Presented at University of Maryland CASCI Seminar

Dr. Luther gave an invited presentation at the Center for the Advanced Study of Communities and Information (CASCI) at the University of Maryland on November 19, 2019. The title of his presentation was "Crowd Sleuths: Solving Mysteries with Crowdsourcing, Experts and AI." The abstract was as follows:

Professional investigators in fields such as journalism, law enforcement, and academia have long sought the public’s help in solving mysteries, typically by providing tips. However, as social technologies capture more digital traces of daily life and enable new forms of collaboration, members of the public are increasingly leading their own investigations. These efforts are perhaps best known for high-profile failures characterized by sloppy research and vigilantism, such as the 2013 Boston Marathon Bombing manhunt on Reddit and 4chan. However, other crowdsourced investigations have led to the successful recovery of missing persons and apprehension of violent criminals, suggesting real potential. I will present three projects from my research group, the Crowd Intelligence Lab, where we helped to enable novice crowds to discover a hidden terrorist plot within large quantities of textual evidence documents; collaborate with expert investigators to geolocate and verify (or debunk) photos and videos shared on social media; and use AI-based face recognition to identify unknown soldiers in historical portraits from the American Civil War era.

Two papers accepted for CSCW 2019


The Crowd Lab had two papers accepted for the upcoming ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2019) in Austin, TX, USA, November 9-13, 2019. The conference had a 31% acceptance rate.

Ph.D. student Sukrit Venkatagiri will be presenting "GroundTruth: Augmenting Expert Image Geolocation with Crowdsourcing and Shared Representations," co-authored with Jacob Thebault-Spieker, Rachel Kohler, John Purviance, Rifat Sabbir Mansur, and Kurt Luther, all from Virginia Tech. Here's the paper's abstract:

Expert investigators bring advanced skills and deep experience to analyze visual evidence, but they face limits on their time and attention. In contrast, crowds of novices can be highly scalable and parallelizable, but lack expertise. In this paper, we introduce the concept of shared representations for crowd–augmented expert work, focusing on the complex sensemaking task of image geolocation performed by professional journalists and human rights investigators. We built GroundTruth, an online system that uses three shared representations—a diagram, grid, and heatmap—to allow experts to work with crowds in real time to geolocate images. Our mixed-methods evaluation with 11 experts and 567 crowd workers found that GroundTruth helped experts geolocate images, and revealed challenges and success strategies for expert–crowd interaction. We also discuss designing shared representations for visual search, sensemaking, and beyond.

Ph.D. student Tianyi Li will be presenting "Dropping the Baton? Understanding Errors and Bottlenecks in a Crowdsourced Sensemaking Pipeline," co-authored with Chandler J. Manns, Chris North, and Kurt Luther, also from Virginia Tech. Here's the abstract:

Crowdsourced sensemaking has shown great potential for enabling scalable analysis of complex data sets, from planning trips, to designing products, to solving crimes. Yet, most crowd sensemaking approaches still require expert intervention because of worker errors and bottlenecks that would otherwise harm the output quality. Mitigating these errors and bottlenecks would significantly reduce the burden on experts, yet little is known about the types of mistakes crowds make with sensemaking micro-tasks and how they propagate in the sensemaking loop. In this paper, we conduct a series of studies with 325 crowd workers using a crowd sensemaking pipeline to solve a fictional terrorist plot, focusing on understanding why errors and bottlenecks happen and how they propagate. We classify types of crowd errors and show how the amount and quality of input data influence worker performance. We conclude by suggesting design recommendations for integrated crowdsourcing systems and speculating how a complementary top-down path of the pipeline could refine crowd analyses.

Congratulations to Sukrit, Tianyi, and their collaborators!

Two papers accepted for CSCW 2018


Two members of the Crowd Lab each had a paper accepted for presentation at the CSCW 2018 conference in Jersey City, NJ. The acceptance rate for this top-tier conference was 26%.

Ph.D. student Nai-Ching Wang presented "Exploring Trade-Offs Between Learning and Productivity in Crowdsourced History," co-authored with Virginia Tech professor of education David Hicks and Dr. Luther. Here is the paper's abstract:

Crowdsourcing more complex and creative tasks is seen as a desirable goal for both employers and workers, but these tasks traditionally require domain expertise. Employers can recruit only expert workers, but this approach does not scale well. Alternatively, employers can decompose complex tasks into simpler micro-tasks, but some domains, such as historical analysis, cannot be easily modularized in this way. A third approach is to train workers to learn the domain expertise. This approach offers clear benefits to workers, but is perceived as costly or infeasible for employers. In this paper, we explore the trade-offs between learning and productivity in training crowd workers to analyze historical documents. We compare CrowdSCIM, a novel approach that teaches historical thinking skills to crowd workers, with two crowd learning techniques from prior work and a baseline. Our evaluation (n=360) shows that CrowdSCIM allows workers to learn domain expertise while producing work of equal or higher quality versus other conditions, but efficiency is slightly lower.

Ph.D. student Tianyi Li presented "CrowdIA: Solving Mysteries with Crowdsourced Sensemaking," co-authored with Dr. Luther and Virginia Tech computer science professor Chris North. Here is the paper's abstract:

The increasing volume of text data is challenging the cognitive capabilities of expert analysts. Machine learning and crowdsourcing present new opportunities for large-scale sensemaking, but we must overcome the challenge of modeling the overall process so that many distributed agents can contribute to suitable components asynchronously and meaningfully. In this paper, we explore how to crowdsource the sensemaking process via a pipeline of modularized steps connected by clearly defined inputs and outputs. Our pipeline restructures and partitions information into “context slices” for individual workers. We implemented CrowdIA, a software platform to enable unsupervised crowd sensemaking using our pipeline. With CrowdIA, crowds successfully solved two mysteries, and were one step away from solving the third. The crowd’s intermediate results revealed their reasoning process and provided evidence that justifies their conclusions. We suggest broader possibilities to optimize each component, as well as to evaluate and refine previous intermediate analyses to improve the final result.

Congratulations to Nai-Ching and Tianyi!