Dr. Luther’s position paper accompanying the presentation is available online. The abstract for the paper is as follows:
Visual search tasks, such as identifying an unknown person or location in a photo, are a crucial element of many forms of investigative work, from academic research, to journalism, to law enforcement. While AI techniques like computer vision can often quickly and accurately narrow down a large search space of thousands of possibilities to a shortlist of promising candidates, they usually cannot select the correct match(es) among those, a challenge known as the last-mile problem. We have developed an approach called crowd-augmented expert work to leverage the complementary strengths of human intelligence to solve the last-mile problem. We report on case studies developing and deploying two visual search tools, GroundTruth and Photo Sleuth, to illustrate this approach.
Historians spend significant time looking for relevant, high-quality primary sources in digitized archives and through web searches. One reason this task is time-consuming is that historians’ research interests are often highly abstract and specialized. These topics are unlikely to be manually indexed and are difficult to identify with automated text analysis techniques. In this article, we investigate the potential of a new crowdsourcing model in which the historian delegates to a novice crowd the task of labeling the relevance of primary sources with respect to her unique research interests. The model employs a novel crowd workflow, Read-Agree-Predict (RAP), that allows novice crowd workers to label relevance as well as expert historians do. As a useful byproduct, RAP also reveals and prioritizes crowd confusions as targeted learning opportunities. We demonstrate the value of our model with two experiments with paid crowd workers (n=170), with the future goal of extending our work to classroom students and public history interventions. We also discuss broader implications for historical research and education.
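The abstract does not spell out RAP’s internal mechanics, so the following Python sketch is only a rough illustration of the general idea: aggregate each source’s “agree” labels and “predict” estimates, take the majority as the relevance label, and flag sources where actual and predicted agreement diverge as candidate confusions. All field names, the `aggregate_rap` function, and the thresholds are invented for illustration and are not the RAP implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical aggregation of Read-Agree-Predict-style responses.
# Each response records whether the worker judged the source relevant ("agree")
# and what fraction of other workers they predicted would agree ("predict").
# Names and thresholds are illustrative assumptions, not the RAP workflow's code.

def aggregate_rap(responses, confusion_gap=0.3):
    """responses: dicts with keys 'source_id', 'agree' (bool), 'predict' (0..1)."""
    by_source = defaultdict(list)
    for r in responses:
        by_source[r["source_id"]].append(r)

    labels, confusions = {}, []
    for source_id, rs in by_source.items():
        agree_rate = mean(1.0 if r["agree"] else 0.0 for r in rs)
        predicted_rate = mean(r["predict"] for r in rs)
        labels[source_id] = agree_rate >= 0.5  # majority relevance label
        # A large gap between actual and predicted agreement suggests the crowd
        # found this source confusing -- a targeted learning opportunity.
        if abs(agree_rate - predicted_rate) >= confusion_gap:
            confusions.append((source_id, agree_rate, predicted_rate))

    confusions.sort(key=lambda c: abs(c[1] - c[2]), reverse=True)
    return labels, confusions
```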
The Crowd Lab had two posters/demos accepted for AAAI HCOMP 2019! Both of these papers involved substantial contributions from our summer REU interns, who will be attending the conference at Skamania Lodge, Washington, to present their work.
“It’s QuizTime: A study of online verification practices on Twitter” was led by Crowd Lab Ph.D. student Sukrit Venkatagiri, with co-authors Jacob Thebault-Spieker, Sarwat Kazmi, and Efua Akonor. Sarwat and Efua were summer REU interns in the Crowd Lab from the University of Maryland and Wellesley College, respectively. The abstract for the poster is:
Misinformation poses a threat to public health, safety, and democracy. Training novices to debunk visual misinformation with image verification techniques has shown promise, yet little is known about how novices do so in the wild, and what methods prove effective. Thus, we studied 225 verification challenges posted by experts on Twitter over one year with the aim of improving novices’ skills. We collected, annotated, and analyzed these challenges and over 3,100 replies by 304 unique participants. We find that novices employ multiple tools and approaches, and techniques like collaboration and reverse image search significantly improve performance.
“PairWise: Mitigating political bias in crowdsourced content moderation” was led by Crowd Lab postdoc Jacob Thebault-Spieker, with co-authors Sukrit Venkatagiri, David Mitchell, and Chris Hurt. David was a summer REU intern from the University of Illinois, and Chris was a Virginia Tech undergraduate. The abstract for the demo is:
Crowdsourced labeling of political social media content is an area of increasing interest, due to the contextual nature of political content. However, there are substantial risks of human biases causing data to be labeled incorrectly, possibly advantaging certain political groups over others. Inspired by the social computing theory of social translucence and findings from social psychology, we built PairWise, a system designed to facilitate interpersonal accountability and help mitigate biases in political content labeling.
Ph.D. student and lead author Vikram Mohanty will present the paper, co-authored with Dr. Luther and Crowd Lab undergraduate researchers Kareem Abdol-Hamid and Courtney Ebersohl. Here’s the paper’s abstract:
As AI-based face recognition technologies are increasingly adopted for high-stakes applications like locating suspected criminals, public concerns about the accuracy of these technologies have grown as well. These technologies often present a human expert with a shortlist of high-confidence candidate faces from which the expert must select correct match(es) while avoiding false positives, which we term the “last-mile problem.” We propose Second Opinion, a web-based software tool that employs a novel crowdsourcing workflow inspired by cognitive psychology, seed-gather-analyze, to assist experts in solving the last-mile problem. We evaluated Second Opinion with a mixed-methods lab study involving 10 experts and 300 crowd workers who collaborated to identify people in historical photos. We found that crowds can eliminate 75% of false positives from the highest-confidence candidates suggested by face recognition, and that experts were enthusiastic about using Second Opinion in their work. We also discuss broader implications for crowd–AI interaction and crowdsourced person identification.
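The headline result here is that crowd judgments can strip away most of the false positives in a face recognition shortlist before the expert makes the final call. Below is a minimal sketch of that pruning step only, assuming simple vote-share aggregation; the function, threshold, and data layout are invented for illustration and are not Second Opinion’s actual workflow.

```python
# Minimal sketch of crowd-based pruning of a face recognition shortlist.
# 'shortlist' holds candidate IDs from a face recognition model; 'votes' maps
# each candidate to a list of crowd judgments (True = "same person").
# The 0.5 threshold and all names are illustrative assumptions.

def prune_shortlist(shortlist, votes, min_support=0.5):
    pruned = []
    for candidate in shortlist:
        judgments = votes.get(candidate, [])
        if not judgments:
            pruned.append(candidate)  # no crowd signal yet: leave it for the expert
            continue
        support = sum(judgments) / len(judgments)
        if support >= min_support:
            pruned.append(candidate)
    return pruned  # a shorter list handed back to the expert for final matching
```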
The Crowd Lab had two papers accepted for the upcoming ACM Computer Supported Cooperative Work and Social Computing (CSCW 2019) conference in Austin, TX, USA, November 9-13, 2019. The conference had a 31% acceptance rate.
Expert investigators bring advanced skills and deep experience to analyze visual evidence, but they face limits on their time and attention. In contrast, crowds of novices can be highly scalable and parallelizable, but lack expertise. In this paper, we introduce the concept of shared representations for crowd-augmented expert work, focusing on the complex sensemaking task of image geolocation performed by professional journalists and human rights investigators. We built GroundTruth, an online system that uses three shared representations—a diagram, grid, and heatmap—to allow experts to work with crowds in real time to geolocate images. Our mixed-methods evaluation with 11 experts and 567 crowd workers found that GroundTruth helped experts geolocate images, and revealed challenges and success strategies for expert–crowd interaction. We also discuss designing shared representations for visual search, sensemaking, and beyond.
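Since the abstract names a grid and a heatmap among the shared representations, one plausible way to aggregate crowd input is to divide the search region into grid cells and accumulate workers’ judgments per cell so the expert can see where positive evidence concentrates. The sketch below illustrates only that aggregation idea; the weights and names are assumptions, not GroundTruth’s code.

```python
import numpy as np

# Illustrative aggregation of crowd judgments over a search grid.
# Each report is (row, col, match), where match is True if the worker thinks the
# image could have been taken in that cell. Weights and names are assumptions.

def build_heatmap(reports, n_rows, n_cols):
    heat = np.zeros((n_rows, n_cols))
    for row, col, match in reports:
        heat[row, col] += 1.0 if match else -0.5  # negative evidence down-weights a cell
    return heat

reports = [(2, 3, True), (2, 3, True), (2, 4, False), (5, 1, True)]
heatmap = build_heatmap(reports, n_rows=10, n_cols=10)
print(heatmap[2, 3])  # cells with concentrated positive evidence stand out to the expert
```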
Crowdsourced sensemaking has shown great potential for enabling scalable analysis of complex data sets, from planning trips, to designing products, to solving crimes. Yet, most crowd sensemaking approaches still require expert intervention because of worker errors and bottlenecks that would otherwise harm the output quality. Mitigating these errors and bottlenecks would significantly reduce the burden on experts, yet little is known about the types of mistakes crowds make with sensemaking micro-tasks and how they propagate in the sensemaking loop. In this paper, we conduct a series of studies with 325 crowd workers using a crowd sensemaking pipeline to solve a fictional terrorist plot, focusing on understanding why errors and bottlenecks happen and how they propagate. We classify types of crowd errors and show how the amount and quality of input data influence worker performance. We conclude by suggesting design recommendations for integrated crowdsourcing systems and speculating how a complementary top-down path of the pipeline could refine crowd analyses.
Congratulations to Sukrit, Tianyi, and their collaborators!
Our paper, “Flud: a hybrid crowd-algorithm approach for visualizing biological networks,” was accepted to the CHI 2019 workshop titled, Where is the Human? Bridging the Gap Between AI and HCI, in Glasgow, Scotland. Congratulations to Crowd Lab co-authors Aditya Bharadwaj (Ph.D. student) and David Gwizdala (undergraduate researcher), as well as Yoonjin Kim and Aditya’s co-advisor, Dr. T.M. Murali.
Checklists and guidelines have played an increasingly important role in complex tasks ranging from the cockpit to the operating theater. Their role in creative tasks like design is less explored. In a needfinding study with expert web designers, we identified designers’ challenges in adhering to a checklist of design guidelines. We built Critter, which addressed these challenges with three components: Dynamic Checklists that progressively disclose guideline complexity with a self-pruning hierarchical view, AutoQA to automate common quality assurance checks, and guideline-specific feedback provided by a reviewer to highlight mistakes as they appear. In an observational study, we found that the more engaged a designer was with Critter, the fewer mistakes they made in following design guidelines. Designers rated the AutoQA and contextual feedback experience highly, and provided feedback on the tradeoffs of the hierarchical Dynamic Checklists. We additionally found that a majority of designers rated the AutoQA experience as excellent and felt that it increased the quality of their work. Finally, we discuss broader implications for supporting complex creative tasks.
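To make the “self-pruning hierarchical view” concrete, here is a small hypothetical model of a dynamic checklist: guideline nodes with children, where a branch disappears once every check beneath it passes, and an AutoQA-style automated check stands in for the real ones. This is a sketch of the idea described in the abstract, not Critter’s code.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model of a self-pruning hierarchical checklist. A node stays
# visible only while some check in its subtree still fails, so completed
# branches vanish and guideline complexity is disclosed progressively.

@dataclass
class ChecklistNode:
    label: str
    check: Callable[[dict], bool] = lambda design: True  # AutoQA-style automated check
    children: List["ChecklistNode"] = field(default_factory=list)

    def passes(self, design: dict) -> bool:
        return self.check(design) and all(c.passes(design) for c in self.children)

    def visible_items(self, design: dict) -> List[str]:
        if self.passes(design):
            return []  # prune: this branch is done, hide it from the designer
        items = [self.label]
        for child in self.children:
            items.extend(child.visible_items(design))
        return items

# Example: a stand-in AutoQA check that every image declares alt text.
images_have_alt = ChecklistNode(
    "All images have alt text",
    check=lambda d: all(img.get("alt") for img in d.get("images", [])),
)
root = ChecklistNode("Accessibility", children=[images_have_alt])
print(root.visible_items({"images": [{"src": "hero.png"}]}))  # unresolved items stay visible
```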
Two members of the Crowd Lab each had a paper accepted for presentation at the upcoming IUI 2019 conference in Los Angeles, CA. The acceptance rate for this conference, which focuses on the intersection of human-computer interaction and artificial intelligence, was 25%.
Identifying people in historical photographs is important for preserving material culture, correcting the historical record, and creating economic value, but it is also a complex and challenging task. In this paper, we focus on identifying portraits of soldiers who participated in the American Civil War (1861–65), the first widely-photographed conflict. Many thousands of these portraits survive, but only 10–20% are identified. We created Photo Sleuth, a web-based platform that combines crowdsourced human expertise and automated face recognition to support Civil War portrait identification. Our mixed-methods evaluation of Photo Sleuth one month after its public launch showed that it helped users successfully identify unknown portraits and provided a sustainable model for volunteer contribution. We also discuss implications for crowd-AI interaction and person identification pipelines.
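The abstract describes combining crowdsourced human expertise with automated face recognition; one plausible shape for such a pipeline, sketched below with made-up names and fields, is to first narrow the reference collection using human-supplied visual clues and then rank the survivors by a face-similarity score. This illustrates the general pattern, not Photo Sleuth’s internals.

```python
# Illustrative hybrid pipeline: human-supplied visual clues filter the reference
# collection, then an automated face-similarity score ranks what remains.
# 'face_similarity' stands in for a real face recognition model; all names are assumed.

def identify_candidates(unknown_photo, reference_records, clue_tags, face_similarity, top_k=10):
    # Step 1: keep only records consistent with the clues a human spotted in the
    # unknown portrait (e.g., branch of service or uniform details).
    filtered = [rec for rec in reference_records if clue_tags.issubset(rec["tags"])]
    # Step 2: rank the filtered records by automated face similarity.
    ranked = sorted(
        filtered,
        key=lambda rec: face_similarity(unknown_photo, rec["photo"]),
        reverse=True,
    )
    return ranked[:top_k]  # shortlist for the user to review and confirm
```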
Major breaches of sensitive company data, such as those affecting 50 million Facebook user accounts in 2018 and 143 million Equifax user accounts in 2017, have shown the limitations of reactive data security technologies. Companies and government organizations are turning to proactive data security technologies that secure sensitive data at the source. However, data security analysts still face two fundamental challenges in data protection decisions: 1) the information overload from the growing number of data repositories and protection techniques to consider; 2) the optimization of protection plans given the current goals and available resources in the organization. In this work, we propose an intelligent user interface for security analysts that recommends what data to protect, visualizes simulated protection impact, and helps build protection plans. In a domain with limited access to expert users and practices, we elicited user requirements from security analysts in industry and modeled data risks based on architectural and conceptual attributes. Our preliminary evaluation suggests that the design improves understanding and trust of the recommended protections and helps convert risk information into protection plans.
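Since the abstract mentions modeling data risks from architectural and conceptual attributes and building protection plans within available resources, the loose sketch below scores repositories with hypothetical attribute weights and greedily fills a plan under a cost budget. The attributes, weights, costs, and greedy strategy are all assumptions for illustration, not the paper’s risk model.

```python
# Loose illustration of risk-based protection planning under a resource budget.
# The risk attributes, weights, costs, and greedy strategy are all assumptions.

RISK_WEIGHTS = {"contains_pii": 3.0, "externally_exposed": 2.0, "unencrypted": 1.5}

def risk_score(repo):
    return sum(w for attr, w in RISK_WEIGHTS.items() if repo.get(attr))

def build_plan(repositories, budget):
    # Greedily protect the riskiest repositories that still fit within the budget.
    plan, spent = [], 0.0
    for repo in sorted(repositories, key=risk_score, reverse=True):
        cost = repo.get("protection_cost", 1.0)
        if spent + cost <= budget:
            plan.append(repo["name"])
            spent += cost
    return plan

repos = [
    {"name": "customer_db", "contains_pii": True, "externally_exposed": True, "protection_cost": 2.0},
    {"name": "build_logs", "unencrypted": True, "protection_cost": 0.5},
]
print(build_plan(repos, budget=2.5))  # -> ['customer_db', 'build_logs']
```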
Congratulations to Vikram, David, Sneha, Tianyi, and their collaborators!
Dr. Luther recently published a report about the GroupSight Workshop on Human Computation for Image and Video Analysis in AI Magazine. The workshop, held at the HCOMP 2017 conference in Quebec City, Canada, was co-organized by Dr. Luther, Danna Gurari, Genevieve Patterson, and Steve Branson. More information about the workshop can be found in a Follow the Crowd blog post by Dr. Luther.
Two members of the Crowd Lab each had a paper accepted for presentation at the CSCW 2018 conference in Jersey City, NJ. The acceptance rate for this top-tier conference was 26%.
Crowdsourcing more complex and creative tasks is seen as a desirable goal for both employers and workers, but these tasks traditionally require domain expertise. Employers can recruit only expert workers, but this approach does not scale well. Alternatively, employers can decompose complex tasks into simpler micro-tasks, but some domains, such as historical analysis, cannot be easily modularized in this way. A third approach is to train workers to learn the domain expertise. This approach offers clear benefits to workers, but is perceived as costly or infeasible for employers. In this paper, we explore the trade-offs between learning and productivity in training crowd workers to analyze historical documents. We compare CrowdSCIM, a novel approach that teaches historical thinking skills to crowd workers, with two crowd learning techniques from prior work and a baseline. Our evaluation (n=360) shows that CrowdSCIM allows workers to learn domain expertise while producing work of equal or higher quality versus other conditions, but efficiency is slightly lower.
The increasing volume of text data is challenging the cognitive capabilities of expert analysts. Machine learning and crowdsourcing present new opportunities for large-scale sensemaking, but we must overcome the challenge of modeling the overall process so that many distributed agents can contribute to suitable components asynchronously and meaningfully. In this paper, we explore how to crowdsource the sensemaking process via a pipeline of modularized steps connected by clearly defined inputs and outputs. Our pipeline restructures and partitions information into “context slices” for individual workers. We implemented CrowdIA, a software platform to enable unsupervised crowd sensemaking using our pipeline. With CrowdIA, crowds successfully solved two mysteries, and were one step away from solving the third. The crowd’s intermediate results revealed their reasoning process and provided evidence that justifies their conclusions. We suggest broader possibilities to optimize each component, as well as to evaluate and refine previous intermediate analyses to improve the final result.
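To make the pipeline idea concrete, the sketch below chains modularized steps with explicit inputs and outputs, where each step partitions its input into small “context slices” that individual workers could process independently (simulated here by plain functions). The step functions, slice size, and names are illustrative assumptions about the general pattern, not CrowdIA’s implementation.

```python
# Illustrative crowd sensemaking pipeline: each step partitions its input into
# small "context slices," farms each slice out as a micro-task (simulated here
# by plain functions), and passes the combined output to the next step.

def slice_context(items, slice_size=5):
    return [items[i:i + slice_size] for i in range(0, len(items), slice_size)]

def run_step(step_fn, items, slice_size=5):
    outputs = []
    for context_slice in slice_context(items, slice_size):
        outputs.extend(step_fn(context_slice))  # one micro-task per slice
    return outputs

def extract_clues(documents):
    # Stand-in for a crowd micro-task: pull out sentences that look like clues.
    return [s.strip() for doc in documents for s in doc.split(".") if "suspect" in s.lower()]

def group_related(clues):
    # Stand-in for a later micro-task: group related clues for the next analysis step.
    return [sorted(set(clues))]

documents = ["The suspect met a contact at the docks. The weather was clear.",
             "A witness saw the suspect again on Tuesday."]
clues = run_step(extract_clues, documents)
profiles = run_step(group_related, clues)
print(profiles)  # intermediate results stay inspectable between pipeline steps
```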