Can't we all just get along? Citizen scientists interacting with algorithms.

Responding to the continued and accelerating rise of computational AI based on machine learning and/or neural networking-based paradigms in citizen science, we organized a discussion panel at the third European Citizen Science 2020 Conference to initiate a dialogue on how citizen scientists interact and collaborate with algorithms. This brief summarizes a presentation about two Zooniverse projects, which illustrated the impact that new developments in computational AI are having on citizen science projects that involve visual inspection of large datasets. We also share the results of a poll to elicit opinions and ideas from the audience on two statements, one positive and one critical of using computational AI in citizen science. The discussion with the participants raised several issues that we grouped into four main themes: a) democracy and participation; b) skill-biased technological change; c) data ownership vs public domain/digital commons, and d) transparency. All these issues warrant further research for those who are concerned about computational AI in citizen science.


BACKGROUND
Responding to the continued and accelerating rise of computational AI methods based on machine learning (ML) and/or neural networking-based paradigms in citizen science, we organized a discussion panel at the European Citizen Science 2020 Conference held last September to initiate a dialogue on how citizen scientists interact and collaborate with algorithms in citizen science projects. Currently, several projects using ML are centered around analyzing, coding, and classifying data such as camera and telescope images. A recent literature review of 50 peer-reviewed papers, conducted to map the current division of labor in citizen science projects that integrate humans and computational AI, identified three main types of projects employing both human and machine efforts (Seredko, Gander, & Ponti, 2021):
1. Projects that relate to identifying and classifying objects, when the large size of a dataset makes expert identification or classification unfeasible (e.g., Nguyen et al., 2018; Lukic et al., 2018; Delipetrev et al., 2020). Examples include the Zooniverse projects, Eyewire, and Stall Catchers.
2. Projects that benefit from citizen scientists' ability to collect data in the field covering large territories and producing large volumes of data, while ML approaches are used to predict the distribution of species or the probability that a phenomenon occurs (e.g., Jackson et al., 2015; Robinson et al., 2018).
3. Projects focused on clustering data to discover new classes (Coughlin et al., 2019; Ostermann et al., 2017; Wright et al., 2019). In contrast to the first two types of projects, where citizen science data is used to 'help' an algorithm, here an algorithm is used to 'help' citizen scientists. An example is provided by Gravity Spy, a project that aims to classify the glitches that afflict gravitational wave detectors into 22 classes (Bahaadini et al., 2018). Coughlin et al. (2019) employed transfer learning, a technique that applies knowledge obtained from a model trained on one data set to another data set, to quantify similarities between Gravity Spy images. Transfer learning allowed citizen scientists to search for glitches of similar morphology, facilitating the identification of new classes.
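To make the idea of a similarity search concrete, the following is a minimal sketch rather than the method of Coughlin et al. (2019). It assumes that a pretrained model has already turned each image into a numeric feature vector, and it uses cosine similarity as the comparison measure, which is our assumption. The identifiers (glitch_a, most_similar, and so on) and the feature values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def most_similar(query_id, features, top_k=3):
    """Rank all other images by similarity to the query image."""
    query = features[query_id]
    scores = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in features.items()
        if other_id != query_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical feature vectors, standing in for the output of a
# pretrained image model (assumption: three features per image).
features = {
    "glitch_a": [0.9, 0.1, 0.0],
    "glitch_b": [0.8, 0.2, 0.1],
    "glitch_c": [0.0, 0.1, 0.9],
    "glitch_d": [0.1, 0.9, 0.2],
}

ranked = most_similar("glitch_a", features, top_k=2)
for image_id, score in ranked:
    print(image_id, round(score, 3))
```

In this toy example, a volunteer who selects glitch_a would be shown glitch_b first, because the two vectors point in nearly the same direction.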
These types of projects rely on complex interplays of human efforts and machine learning to reach their goals. These interplays already take a remarkable variety of forms. The most obvious is pattern and species recognition: an algorithm can be trained for a specific image recognition task and then used in projects that require classification of large amounts of image data. Notably, we concentrate on the applications of computational AI methods which are directly applied to the scientific process. We do not include the use of artificial intelligence (AI) for helping prospective citizen scientists to discover new projects they might contribute to, or the recently emerging use of chatbots using natural language processing to increase communication with participants in citizen science projects. Note that some research problems are still considered computationally intractable and need human cognitive skills. For example, machines cannot yet match a person's ability to identify certain objects, and it is unclear to what extent they will ever succeed. Conversely, manual classification or identification of a large data set can be made more efficient in combination with ML approaches. Even so, the participation of citizens remains critical to perform certain tasks, such as the creation of datasets with correctly tagged data to feed algorithms (Torney et al., 2019). The Galaxy Zoo project and the classification and identification of galaxy morphological shapes are a good case in point (Walmsley et al., 2020).
Against this background, we organized a one-hour session around two interconnected events: 1) an instant poll to elicit opinions and ideas from the audience on two statements, one positive and one critical of using ML in citizen science, and 2) a presentation titled Can't we all just get along? Hybrid human-machine approaches to citizen science in the Zooniverse, which presented a fascinating case on the issue. In this brief, we present the results of the poll and a summary of the presentation. We conclude by discussing some open issues raised during the discussion and suggesting some directions for future research.

THE POLL
We conducted a poll with the participants to collect their opinions on two statements. Statement one (Q1) addressed a positive outcome of using AI/ML in citizen science, while statement two (Q2) addressed a potential negative impact:
Question 1: The use of AI/machine learning can result in the democratization of research, as labor-intensive and routine tasks can be carried out by machines.
Question 2: The use of AI/machine learning can reduce accountability (Who can be held responsible for problematic (discriminatory) outcomes? Research designer, algorithm developer, programmer, or untrained contributors?).
Below, we summarize the responses received from the participants. The results of the poll do not have statistical significance but are suggestive of how the audience (about 70 participants, including citizen science researchers and practitioners) perceived a potential issue and an opportunity raised by the use of AI/ML in citizen science. Figure 1 shows the breakdown of responses to Question 1.

Figure 1. Participant responses to Question 1
We also collected some comments from the audience regarding their perceived opportunities (Table 1).

Participant feedback

Reliance on programmer design intentions
• It depends on who is doing the machine learning and how things are programmed -it still is potentially coming from a small group of people who are controlling interpretation of the data, and then shared a common understanding.
• In general, a lot is possible, and the technology can be beneficial for many things, but it depends on the design of the system. Constantly taking care of biases.

Figure 2 shows the breakdown of responses to Question 2.

THE ZOONIVERSE CASES
Grant Miller presented work from within the Zooniverse, done in collaboration with Chris Lintott, Nora Eisner, and Mike Walmsley, in a talk titled Can't we all just get along? Hybrid human-machine approaches to citizen science in the Zooniverse. Using examples from two long-established astronomical projects, Galaxy Zoo and Planet Hunters, they considered the impact that new developments in machine learning are having on citizen science projects which involve visual inspection of large datasets. Such projects, in particular those supported by the Zooniverse platform which hosts both Galaxy Zoo and Planet Hunters, have become widespread in the last decade, and in some fields, most notably ecology and astronomy, are now in common use.
The current rapid progress in machine learning for image recognition and labeling, in particular the use of deep learning through convolutional neural networks, generative adversarial networks, and more, presents an obvious threat to this mode of engagement; if machines can confidently carry out the work required, then there is no space for authentic engagement in the scientific process. The talk considered the changing efficacy of the Planet Hunters project, which relaunched in 2018 having run in its original form since 2011. The new site uses data from NASA's TESS satellite, which is conducting the first space-based all-sky search for transiting exoplanets, but the task is similar to prior versions. Interestingly, the increased competition in the discovery space provided by sophisticated machine learning spurred the project to operate in a much more open fashion, with volunteers' classifications providing a shared resource for researchers using machine learning in parallel.
Furthermore, Miller argued that there is significant evidence that the introduction of machine learning tools into a project with a sufficiently rich dataset can benefit both the scientific return and the engagement goals of a suitable project. Galaxy Zoo (Figure 3) recently introduced an enhanced mode, in which participants are preferentially served images of galaxies for which further classifications are expected to maximally improve machine learning performance. This system, which incorporates a Bayesian neural network capable of predicting confidence as well as classification, has seen increased volunteer engagement: the systems which the machine most needs classified map well to the set of those that volunteers are most interested in classifying.
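The selection step described above can be illustrated with a minimal sketch. It is not the Galaxy Zoo implementation. It assumes that the model produces several sampled probabilities per galaxy for a single yes/no question, and it ranks galaxies by how much those samples disagree (the entropy of the mean prediction minus the mean entropy of the samples). This is a common score in Bayesian active learning, but its use here is our assumption. All names and numbers are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy (in nats) of a yes/no prediction with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def disagreement_score(samples):
    """Entropy of the mean prediction minus the mean per-sample entropy.

    The score is high when individual samples are each fairly confident
    but contradict one another, i.e. when the model itself is uncertain.
    """
    mean_p = sum(samples) / len(samples)
    mean_entropy = sum(binary_entropy(p) for p in samples) / len(samples)
    return binary_entropy(mean_p) - mean_entropy


def pick_for_volunteers(predictions, batch_size):
    """Return the galaxy ids whose sampled predictions disagree the most."""
    ranked = sorted(
        predictions,
        key=lambda galaxy_id: disagreement_score(predictions[galaxy_id]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical sampled probabilities that a galaxy shows a given feature.
predictions = {
    "galaxy_1": [0.98, 0.97, 0.99, 0.98],  # confident and consistent
    "galaxy_2": [0.10, 0.90, 0.20, 0.85],  # samples contradict each other
    "galaxy_3": [0.50, 0.52, 0.49, 0.51],  # ambiguous but consistent
    "galaxy_4": [0.30, 0.70, 0.40, 0.60],  # mild disagreement
}

queue = pick_for_volunteers(predictions, batch_size=2)
print(queue)
```

Under this score, galaxy_3 is not prioritized even though its predictions sit near 0.5: the samples agree with each other, so the ambiguity is attributed to the image rather than to the model, and extra volunteer labels are expected to teach the model less.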

OPEN ISSUES AND FINAL REMARKS
A general problem in citizen science (and in science generally) is that data grow much faster than the number of people who can analyze them. Therefore, combining human efforts with ML can help researchers process more data faster. Human-machine integration can also allow citizen scientists to focus on more difficult classification tasks, while helping researchers validate participants' observations. Deep convolutional neural networks, for example, can reduce the amount of repetitive expert input required. While combining humans and machines has already been shown to increase efficiency, as in the case of Zooniverse projects, it also raises issues that the citizen science community needs to address. The comments from the audience and the discussion during the session identified several open issues that need further investigation. We grouped such issues into four main themes: a) democracy and participation; b) skill-biased technological change; c) data ownership vs public domain/digital commons, and d) transparency. All these issues warrant further research.

Democracy and Participation.
Extending the capabilities of machines may not be enough to broaden the capabilities of citizen scientists and democratize the research capabilities that are now available only to professional scientists. As noted by some participants, having machines take over boring and repetitive tasks does not directly mean that humans will have more time and opportunities for sophisticated work in a project. The idea that democratization of research can impact required contributor skills was contested. It was suggested that access to an AI-powered "Citizen Science Toolkit" could enable citizens to initiate and conduct research for which they otherwise would not have the resources and/or skills.
Another issue concerns the question of "real" participation vs. participation-washing (Sloane et al., 2020, p. 1), a risk that citizen science projects incur when citizen scientists' efforts go unrecognized. Sloane et al. warned the ML community against accepting and supporting exploitative and extractive forms of community involvement. Unlike automated systems that relieve humans of labor, citizen science projects using ML require human contributions. In several projects, citizens play an important role in contributing, classifying, and annotating the data that is used to train and evaluate ML models. Citizens can also improve the performance of these models when algorithms make mistakes.

Skill-biased technological change.
The rise of AI is now changing the capabilities of machines and the potential allocation of tasks between humans and algorithms, hence affecting the role of citizen scientists. These changes might result in different demands and skillsets for volunteers and effectively raise new barriers to contributing. However, it is not clear whether the use of algorithms could result in the disappearance of "low skill" citizen science roles. In any case, what do we mean by low-skilled or unskilled work in the context of citizen science? There are tasks that we might consider low skill, but that does not mean a machine can do them. A further question is how skills are redistributed between humans and machines, depending on what roles they are assigned.

Data ownership vs public domain/digital commons.
Should the outcomes of volunteering always be public domain by default, or have some license restrictions (if only to guarantee that they will remain openly available)? Do CC licenses constitute "ownership"? Ceccaroni et al. (2019) noted that AI computing resources should be openly accessible and available to citizens, to avoid their exclusion from decisions about data use or from involvement in research that uses AI.

Transparency.
Lack of intelligibility and transparency in ML models reduces our understanding of how AI and deep learning function and of the way we work with them. The European project SISCODE (https://siscodeproject.eu), for example, is examining this issue and has introduced the idea of AI co-spectatorship, claiming that we are increasingly spectators alongside AI. SISCODE has organized workshops to explore how AI works outside its functionalities and reconfigures the way we see the world by experiencing this co-spectatorship alongside artificial agents.
The purpose of this brief is not to make a final declaration about the directions that should be taken, but to invite further examination of issues and trends concerning AI/ML in this area. We also consider the interplay between people (especially citizen scientists) and machines a promising field that can contribute to digital literacy, human-centered AI, and explainable AI, all three of which are important priorities of our time.