Dr. Luther is the lead guest editor for an upcoming special issue of the journal ACM Transactions on Social Computing. The theme of the special issue, “Negotiating Truth and Trust in Socio-Technical Systems”, emerged from the Designing Socio-Technical Systems of Truth workshop that Dr. Luther led at Virginia Tech in March 2018. The special issue co-editors are Crowd Lab postdoc Jacob Thebault-Spieker, Andrea Kavanaugh (Virginia Tech), and Judd Antin (Airbnb).
Congrats to Crowd Lab Ph.D. student Aditya Bharadwaj for his accepted paper at the upcoming CHI 2019 conference in Glasgow, Scotland, in May. The acceptance rate for this top-tier human-computer interaction conference is 24%. The paper, titled “Critter: Augmenting Creative Work with Dynamic Checklists, Automated Quality Assurance, and Contextual Reviewer Feedback”, was co-authored with colleagues Pao Siangliulue and Adam Marcus at the New York-based startup B12, where Aditya interned last summer. The paper’s abstract is as follows:
Checklists and guidelines have played an increasingly important role in complex tasks ranging from the cockpit to the operating theater. Their role in creative tasks like design is less explored. In a needfinding study with expert web designers, we identified designers’ challenges in adhering to a checklist of design guidelines. We built Critter, which addressed these challenges with three components: Dynamic Checklists that progressively disclose guideline complexity with a self-pruning hierarchical view, AutoQA to automate common quality assurance checks, and guideline-specific feedback provided by a reviewer to highlight mistakes as they appear. In an observational study, we found that the more engaged a designer was with Critter, the fewer mistakes they made in following design guidelines. Designers rated the AutoQA and contextual feedback experience highly, and provided feedback on the tradeoffs of the hierarchical Dynamic Checklists. We additionally found that a majority of designers rated the AutoQA experience as excellent and felt that it increased the quality of their work. Finally, we discuss broader implications for supporting complex creative tasks.
Two members of the Crowd Lab each had a paper accepted for presentation at the upcoming IUI 2019 conference in Los Angeles, CA. The acceptance rate for this conference, which focuses on the intersection of human-computer interaction and artificial intelligence, was 25%.
Crowd Lab Ph.D. student Vikram Mohanty will present “Photo Sleuth: Combining Human Expertise and Face Recognition to Identify Historical Portraits”, co-authored with undergraduate David Thames and Ph.D. student Sneha Mehta. Here is the paper’s abstract:
Identifying people in historical photographs is important for preserving material culture, correcting the historical record, and creating economic value, but it is also a complex and challenging task. In this paper, we focus on identifying portraits of soldiers who participated in the American Civil War (1861–65), the first widely-photographed conflict. Many thousands of these portraits survive, but only 10–20% are identified. We created Photo Sleuth, a web-based platform that combines crowdsourced human expertise and automated face recognition to support Civil War portrait identification. Our mixed-methods evaluation of Photo Sleuth one month after its public launch showed that it helped users successfully identify unknown portraits and provided a sustainable model for volunteer contribution. We also discuss implications for crowd-AI interaction and person identification pipelines.
Crowd Lab Ph.D. student Tianyi Li will present “What Data Should I Protect? A Recommender and Impact Analysis Design to Assist Decision Making”, co-authored with Informatica colleagues Gregorio Convertino, Ranjeet Kumar Tayi, and Shima Kazerooni. Here is the paper’s abstract:
Major breaches of sensitive company data, as for Facebook’s 50 million user accounts in 2018 or Equifax’s 143 million user accounts in 2017, are showing the limitations of reactive data security technologies. Companies and government organizations are turning to proactive data security technologies that secure sensitive data at source. However, data security analysts still face two fundamental challenges in data protection decisions: 1) the information overload from the growing number of data repositories and protection techniques to consider; 2) the optimization of protection plans given the current goals and available resources in the organization. In this work, we propose an intelligent user interface for security analysts that recommends what data to protect, visualizes simulated protection impact, and helps build protection plans. In a domain with limited access to expert users and practices, we elicited user requirements from security analysts in industry and modeled data risks based on architectural and conceptual attributes. Our preliminary evaluation suggests that the design improves the understanding and trust of the recommended protections and helps convert risk information in protection plans.
Congratulations to Vikram, David, Sneha, Tianyi, and their collaborators!
Nai-Ching Wang, a Ph.D. student advised by Dr. Luther, successfully defended his dissertation today. His dissertation is titled, “Supporting Historical Research and Education with Crowdsourced Analysis of Primary Sources”, and his committee members were Dr. Luther (chair), Ed Fox, Gang Wang, and Paul Quigley, with Matt Lease (UT Austin School of Information) as the external member. Here is the abstract for his dissertation:
Historians, like many types of scholars, are often researchers and educators, and both roles involve significant interaction with primary sources. Primary sources are not only direct evidence for historical arguments but also important materials for teaching historical thinking skills to students in classrooms, and engaging the broader public. However, finding high quality primary sources that are relevant to a historian’s specialized topics of interest remains a significant challenge. Automated approaches to text analysis struggle to provide relevant results for these “long tail” searches with long semantic distances from the source material. Consequently, historians are often frustrated at spending so much time manually assessing the relevance of the contents of these archives rather than on writing and analysis. To overcome these challenges, my dissertation explores the use of crowdsourcing to support historians in analysis of primary sources. In four studies, I first proposed a class-sourcing model where historians outsource historical analysis to students as a teaching method and students learn historical thinking and gain authentic research experience while doing these analysis tasks. Incite, a realization of this model, was deployed in 15 classrooms with positive feedback. Second, I expanded the class-sourcing model to a broader audience, novice (paid) crowds, and developed the Read-agree-predict (RAP) technique to accurately evaluate relevance between primary sources and research topics. Third, I presented a set of design principles for crowdsourcing complex historical documents via the American Soldier project on Zooniverse. Finally, I developed CrowdSCIM to help crowds learn historical thinking and evaluated the tradeoffs between quality, learning and efficiency.
The outcomes of the studies provide systems, techniques and design guidelines to 1) support historians in their research and teaching practices, 2) help crowd workers learn historical thinking and 3) suggest implications for the design of future crowdsourcing systems.
Congratulations, Dr. Wang!
Our research investigating the use of crowd workers to analyze satellite imagery of tree canopy coverage was accepted as a poster for the American Geophysical Union (AGU 2018) fall meeting in Washington, DC. The lead author is Forestry Ph.D. student Jill Derwin, with co-authors Valerie Thomas, Randolph Wynne, S. Seth Peery, John Coulston, Dr. Luther, Greg Liknes, and Stacie Bender. The abstract for the poster, titled “Validating the 2011 and 2016 NLCD Tree Canopy Cover Products using Crowdsourced Interpretations”, is as follows:
The 2011 and 2016 National Land Cover Database (NLCD) Tree Canopy Cover (TCC) products utilize training data collected by experienced photo interpreters. Observations of tree canopy cover were collected using 1-meter NAIP imagery overlaid on a dot grid. At each point in the dot grid, experts interpreted whether the point fell on canopy or not. The proportion of positive observations yields percent canopy cover. These data are used in conjunction with a set of 30-m resolution predictors (primarily Landsat imagery) to train a random forest model predicting TCC nationwide. We will test the use of crowdsourced observations of canopy cover to validate national products. Crowd-workers will apply the same training data photo interpretation methodology at plot locations across the United States subsampled from the public Forest Inventory and Analysis database. Each plot will have repeated samples, with multiple crowd observers interpreting each location. Using a multi-scale bootstrap-aggregation or ‘bagging’ approach at the plot- and dot-levels, we randomly select sets of interpretations from randomly chosen interpreters to train consecutive models. This bagging methodology is applied at both the plot level as well as the individual dot observations to test the within-plot crowd-sourced interpretation variance. We will compare the NLCD TCC models from 2011 and 2016 to multiple bagged samples and aggregated quality metrics such as the coefficient of determination and root mean square error to evaluate model quality. We will also compare these bagged samples to independent expert interpretations in order to gain insight into the quality of crowd interpretations themselves. This work provides insight into the utility of crowdsourced observations as validation of national tree canopy cover products.
In addition to comparing aggregated crowd interpretations to expert measurements, identifying conditions that result in disagreement in interpreters’ observations may help to inform the methodology and to improve interpreter-training for the crowdsourcing task.
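For readers curious how the dot-grid and bagging ideas in the abstract above fit together, here is a minimal Python sketch. It is an illustration under stated assumptions, not the poster authors' code: the function names, the number of bags, and the choice to average resampled estimates (rather than train consecutive models, as the abstract describes) are all simplifications made for this example.

```python
import random
from statistics import mean


def percent_canopy(dots):
    """Percent canopy cover for one interpretation.

    `dots` is a list of booleans, one per dot in the grid
    (True means the dot fell on tree canopy).
    """
    return 100.0 * sum(dots) / len(dots)


def bagged_plot_estimate(interpretations, n_bags=200, seed=0):
    """Bootstrap-aggregated canopy estimate for a single plot.

    `interpretations` holds one dot list per crowd interpreter.
    Each bag resamples interpreters with replacement (plot level),
    then resamples dots with replacement within each chosen
    interpretation (dot level). This two-level scheme is an
    assumption made for illustration.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_bags):
        chosen = [rng.choice(interpretations) for _ in interpretations]
        resampled = [[rng.choice(dots) for _ in dots] for dots in chosen]
        estimates.append(mean(percent_canopy(d) for d in resampled))
    return mean(estimates)


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return (sum(squared) / len(squared)) ** 0.5


def r_squared(predicted, observed):
    """Coefficient of determination of predictions against observations."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

In use, one would compute a bagged estimate per plot from the crowd interpretations and then pass those estimates, alongside modeled or expert canopy values for the same plots, to `rmse` and `r_squared`.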
Our Civil War Photo Sleuth project got a burst of publicity in recent weeks, leading to hundreds of new site registrations and contributions. Here is a round-up of some highlights:
- Slate Magazine: Who’s Behind That Beard?
- Smithsonian Magazine: Facial Recognition Software Is Helping Identify Unknown Figures in Civil War Photographs
- Family Tree Magazine: Applying Facial Recognition to Civil War Photos
- Fast Company: Online sleuths are using face recognition to identify Civil War soldiers in old photographs
- Vintage News: Unknown Civil War Faces are Being Identified Through Facial Recognition App
- Daily Mail: The facial recognition software that could identify thousands of soldiers in American Civil War photographs
- Virginia Tech News: Computer science professor creates facial recognition software to identify Civil War portraits
- Civil War Times Magazine: New Photo Research Tool
Thanks to these media outlets for the great publicity!
Dr. Luther recently published a report about the GroupSight Workshop on Human Computation for Image and Video Analysis in AI Magazine. The workshop, held at the HCOMP 2017 conference in Quebec City, Canada, was co-organized by Dr. Luther, Danna Gurari, Genevieve Patterson, and Steve Branson. More information about the workshop can be found in a Follow the Crowd blog post by Dr. Luther.
Dr. Luther gave an invited presentation to the Department of Computer and Information Sciences at Virginia Military Institute (VMI). The title of his presentation was “Solving Mysteries with Expert-Led Crowdsourcing”, and the abstract was as follows:
Investigators have enlisted the help of the public since the days of the first “wanted” posters, but in an era where extensive personal information, as well as powerful search tools, are widely available online, the public is increasingly taking matters into its own hands. Some of these crowdsourced investigations have solved crimes and located missing persons, while others have leveled false accusations or devolved into witch hunts. In this talk, Luther describes his lab’s recent efforts to develop software platforms that support effective, ethical crowdsourced investigations in domains such as history, journalism, and national security.
Two members of the Crowd Lab each had a paper accepted for presentation at the CSCW 2018 conference in Jersey City, NJ. The acceptance rate for this top-tier conference was 26%.
Ph.D. student Nai-Ching Wang presented “Exploring Trade-Offs Between Learning and Productivity in Crowdsourced History” with Virginia Tech professor of education David Hicks and Dr. Luther as co-authors. Here is the paper’s abstract:
Crowdsourcing more complex and creative tasks is seen as a desirable goal for both employers and workers, but these tasks traditionally require domain expertise. Employers can recruit only expert workers, but this approach does not scale well. Alternatively, employers can decompose complex tasks into simpler micro-tasks, but some domains, such as historical analysis, cannot be easily modularized in this way. A third approach is to train workers to learn the domain expertise. This approach offers clear benefits to workers, but is perceived as costly or infeasible for employers. In this paper, we explore the trade-offs between learning and productivity in training crowd workers to analyze historical documents. We compare CrowdSCIM, a novel approach that teaches historical thinking skills to crowd workers, with two crowd learning techniques from prior work and a baseline. Our evaluation (n=360) shows that CrowdSCIM allows workers to learn domain expertise while producing work of equal or higher quality versus other conditions, but efficiency is slightly lower.
Ph.D. student Tianyi Li presented “CrowdIA: Solving Mysteries with Crowdsourced Sensemaking” with Dr. Luther and Virginia Tech computer science professor Chris North as co-authors. Here is the paper’s abstract:
The increasing volume of text data is challenging the cognitive capabilities of expert analysts. Machine learning and crowdsourcing present new opportunities for large-scale sensemaking, but we must overcome the challenge of modeling the overall process so that many distributed agents can contribute to suitable components asynchronously and meaningfully. In this paper, we explore how to crowdsource the sensemaking process via a pipeline of modularized steps connected by clearly defined inputs and outputs. Our pipeline restructures and partitions information into “context slices” for individual workers. We implemented CrowdIA, a software platform to enable unsupervised crowd sensemaking using our pipeline. With CrowdIA, crowds successfully solved two mysteries, and were one step away from solving the third. The crowd’s intermediate results revealed their reasoning process and provided evidence that justifies their conclusions. We suggest broader possibilities to optimize each component, as well as to evaluate and refine previous intermediate analyses to improve the final result.
Congratulations, Nai-Ching and Tianyi!
Dr. Luther and his frequent collaborator Ron Coddington, editor and publisher of Military Images magazine, gave an invited presentation on Civil War photo sleuthing at the 18th annual Image of War Seminar in Alexandria, VA, hosted by the Center for Civil War Photography. The presentation included a brief history of American Civil War photography and a live demonstration of the Civil War Photo Sleuth website.