Our paper, “Flud: a hybrid crowd-algorithm approach for visualizing biological networks,” was accepted to the CHI 2019 workshop titled, Where is the Human? Bridging the Gap Between AI and HCI, in Glasgow, Scotland. Congratulations to Crowd Lab co-authors Aditya Bharadwaj (Ph.D. student) and David Gwizdala (undergraduate researcher), as well as Yoonjin Kim and Aditya’s co-advisor, Dr. T.M. Murali.
Dr. Luther gave an invited presentation, titled “Solving Photo Mysteries with Expert-Led Crowdsourcing,” at the University of Washington’s DUB (Design, Use, Build) Seminar on February 27. Here is the abstract for the presentation:
Investigators in domains such as journalism, military intelligence, and human rights advocacy frequently analyze photographs of questionable or unknown provenance. These photos can provide invaluable leads and evidence, but even experts must invest significant time in each analysis, with no guarantee of success. Crowdsourcing, with its affordances for scalability and parallelization, has great potential to augment expert performance, but little is known about how crowds might fit into photo analysts’ complex workflows. In this talk, I present my group’s research with two communities: open-source investigators who geolocate and verify social media photos, and antiquarians who identify unknown persons in 19th-century portrait photography. Informed by qualitative studies of current practice, we developed a novel approach, expert-led crowdsourcing, that combines the complementary strengths of experts and crowds to solve photo mysteries. We built two software tools based on this approach, GroundTruth and Photo Sleuth, and evaluated them with real experts. I conclude by discussing some broader takeaways for crowdsourced investigations, sensemaking, and image analysis.
Dr. Luther was selected as one of eight Emerging Scholars by the American Civil War Museum in Richmond, VA. He will give an invited presentation on Civil War Photo Sleuth to audiences at the grand opening of the newly expanded museum on May 4. The goal of the program is to “highlight some of the most interesting work of the next generation of writers, communicators, and thinkers of Civil War era history/public history.”
Dr. Luther is the lead guest editor for an upcoming special issue of the journal ACM Transactions on Social Computing. The theme of the special issue, “Negotiating Truth and Trust in Socio-Technical Systems”, emerged from the Designing Socio-Technical Systems of Truth workshop that Dr. Luther led at Virginia Tech in March 2018. The special issue co-editors are Crowd Lab postdoc Jacob Thebault-Spieker, Andrea Kavanaugh (Virginia Tech), and Judd Antin (Airbnb).
Congrats to Crowd Lab Ph.D. student Aditya Bharadwaj for his accepted paper at the upcoming CHI 2019 conference in Glasgow, Scotland, in May. The acceptance rate for this top-tier human-computer interaction conference is 24%. The paper, titled “Critter: Augmenting Creative Work with Dynamic Checklists, Automated Quality Assurance, and Contextual Reviewer Feedback”, was co-authored with colleagues Pao Siangliulue and Adam Marcus at the New York-based startup B12, where Aditya interned last summer. The paper’s abstract is as follows:
Checklists and guidelines have played an increasingly important role in complex tasks ranging from the cockpit to the operating theater. Their role in creative tasks like design is less explored. In a needfinding study with expert web designers, we identified designers’ challenges in adhering to a checklist of design guidelines. We built Critter, which addressed these challenges with three components: Dynamic Checklists that progressively disclose guideline complexity with a self-pruning hierarchical view, AutoQA to automate common quality assurance checks, and guideline-specific feedback provided by a reviewer to highlight mistakes as they appear. In an observational study, we found that the more engaged a designer was with Critter, the fewer mistakes they made in following design guidelines. Designers rated the AutoQA and contextual feedback experience highly, and provided feedback on the tradeoffs of the hierarchical Dynamic Checklists. We additionally found that a majority of designers rated the AutoQA experience as excellent and felt that it increased the quality of their work. Finally, we discuss broader implications for supporting complex creative tasks.
Two members of the Crowd Lab each had a paper accepted for presentation at the upcoming IUI 2019 conference in Los Angeles, CA. The acceptance rate for this conference, which focuses on the intersection of human-computer interaction and artificial intelligence, was 25%.
Crowd Lab Ph.D. student Vikram Mohanty will present “Photo Sleuth: Combining Human Expertise and Face Recognition to Identify Historical Portraits”, co-authored with undergraduate David Thames and Ph.D. student Sneha Mehta. Here is the paper’s abstract:
Identifying people in historical photographs is important for preserving material culture, correcting the historical record, and creating economic value, but it is also a complex and challenging task. In this paper, we focus on identifying portraits of soldiers who participated in the American Civil War (1861–65), the first widely-photographed conflict. Many thousands of these portraits survive, but only 10–20% are identified. We created Photo Sleuth, a web-based platform that combines crowdsourced human expertise and automated face recognition to support Civil War portrait identification. Our mixed-methods evaluation of Photo Sleuth one month after its public launch showed that it helped users successfully identify unknown portraits and provided a sustainable model for volunteer contribution. We also discuss implications for crowd-AI interaction and person identification pipelines.
Crowd Lab Ph.D. student Tianyi Li will present “What Data Should I Protect? A Recommender and Impact Analysis Design to Assist Decision Making”, co-authored with Informatica colleagues Gregorio Convertino, Ranjeet Kumar Tayi, and Shima Kazerooni. Here is the paper’s abstract:
Major breaches of sensitive company data, as for Facebook’s 50 million user accounts in 2018 or Equifax’s 143 million user accounts in 2017, are showing the limitations of reactive data security technologies. Companies and government organizations are turning to proactive data security technologies that secure sensitive data at source. However, data security analysts still face two fundamental challenges in data protection decisions: 1) the information overload from the growing number of data repositories and protection techniques to consider; 2) the optimization of protection plans given the current goals and available resources in the organization. In this work, we propose an intelligent user interface for security analysts that recommends what data to protect, visualizes simulated protection impact, and helps build protection plans. In a domain with limited access to expert users and practices, we elicited user requirements from security analysts in industry and modeled data risks based on architectural and conceptual attributes. Our preliminary evaluation suggests that the design improves the understanding and trust of the recommended protections and helps convert risk information into protection plans.
Congratulations to Vikram, David, Sneha, Tianyi, and their collaborators!
Nai-Ching Wang, a Ph.D. student advised by Dr. Luther, successfully defended his dissertation today. His dissertation is titled, “Supporting Historical Research and Education with Crowdsourced Analysis of Primary Sources”, and his committee members were Dr. Luther (chair), Ed Fox, Gang Wang, and Paul Quigley, with Matt Lease (UT Austin School of Information) as the external member. Here is the abstract for his dissertation:
Historians, like many types of scholars, are often researchers and educators, and both roles involve significant interaction with primary sources. Primary sources are not only direct evidence for historical arguments but also important materials for teaching historical thinking skills to students in classrooms, and engaging the broader public. However, finding high quality primary sources that are relevant to a historian’s specialized topics of interest remains a significant challenge. Automated approaches to text analysis struggle to provide relevant results for these “long tail” searches with long semantic distances from the source material. Consequently, historians are often frustrated at spending so much time on manually assessing the relevance of the contents of these archives rather than on writing and analysis. To overcome these challenges, my dissertation explores the use of crowdsourcing to support historians in analysis of primary sources. In four studies, I first proposed a class-sourcing model where historians outsource historical analysis to students as a teaching method and students learn historical thinking and gain authentic research experience while doing these analysis tasks. Incite, a realization of this model, was deployed in 15 classrooms with positive feedback. Second, I expanded the class-sourcing model to a broader audience, novice (paid) crowds, and developed the Read-agree-predict (RAP) technique to accurately evaluate relevance between primary sources and research topics. Third, I presented a set of design principles for crowdsourcing complex historical documents via the American Soldier project on Zooniverse. Finally, I developed CrowdSCIM to help crowds learn historical thinking and evaluated the tradeoffs between quality, learning and efficiency.
The outcomes of the studies provide systems, techniques and design guidelines to 1) support historians in their research and teaching practices, 2) help crowd workers learn historical thinking and 3) suggest implications for the design of future crowdsourcing systems.
Congratulations Dr. Wang!
Our research investigating the use of crowd workers to analyze satellite imagery of tree canopy coverage was accepted as a poster for the American Geophysical Union (AGU 2018) fall meeting in Washington, DC. The lead author is Forestry Ph.D. student Jill Derwin, with co-authors Valerie Thomas, Randolph Wynne, S. Seth Peery, John Coulston, Dr. Luther, Greg Liknes, and Stacie Bender. The abstract for the poster, titled “Validating the 2011 and 2016 NLCD Tree Canopy Cover Products using Crowdsourced Interpretations”, is as follows:
The 2011 and 2016 National Land Cover Database (NLCD) Tree Canopy Cover (TCC) products utilize training data collected by experienced photo interpreters. Observations of tree canopy cover were collected using 1-meter NAIP imagery overlaid on a dot grid. At each point in the dot grid, experts interpreted whether the point fell on canopy or not. The proportion of positive observations yields percent canopy cover. These data are used in conjunction with a set of 30-m resolution predictors (primarily Landsat imagery) to train a random forest model predicting TCC nationwide. We will test the use of crowdsourced observations of canopy cover to validate national products. Crowd-workers will apply the same training data photo interpretation methodology at plot locations across the United States subsampled from the public Forest Inventory and Analysis database. Each plot will have repeated samples, with multiple crowd observers interpreting each location. Using a multi-scale bootstrap-aggregation or ‘bagging’ approach at the plot- and dot-levels, we randomly select sets of interpretations from randomly chosen interpreters to train consecutive models. This bagging methodology is applied at both the plot level as well as the individual dot observations to test the within-plot crowd-sourced interpretation variance. We will compare the NLCD TCC models from 2011 and 2016 to multiple bagged samples and aggregated quality metrics such as the coefficient of determination and root mean square error to evaluate model quality. We will also compare these bagged samples to independent expert interpretations in order to gain insight into the quality of crowd interpretations themselves. This work provides insight into the utility of crowdsourced observations as validation of national tree canopy cover products.
In addition to comparing aggregated crowd interpretations to expert measurements, identifying conditions that result in disagreement in interpreters’ observations may help to inform the methodology and to improve interpreter-training for the crowdsourcing task.
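For readers curious how the pieces of this abstract fit together, here is a minimal Python sketch. It is not the poster authors' code: it only illustrates, under simplifying assumptions, how a dot grid yields percent canopy cover, how interpretations might be bootstrap-resampled at the interpreter and dot levels, and how root mean square error and the coefficient of determination are computed. The function names, the 0/1 encoding of each dot, and the choice to average resampled covers (rather than train consecutive models, as the abstract describes) are all assumptions made for illustration.

```python
import random
from statistics import mean


def percent_cover(dots):
    """Percent canopy cover: share of dot-grid points marked as canopy (1)."""
    return 100.0 * sum(dots) / len(dots)


def bagged_plot_cover(interpretations, n_bags=200, rng=None):
    """Bootstrap one plot's crowd interpretations (hypothetical helper).

    interpretations: one list of 0/1 dot labels per crowd interpreter.
    Returns n_bags resampled estimates of percent canopy cover.
    """
    rng = rng or random.Random(0)  # fixed seed by default for repeatability
    estimates = []
    for _ in range(n_bags):
        # Plot level: resample interpreters with replacement.
        chosen = [rng.choice(interpretations) for _ in interpretations]
        covers = []
        for dots in chosen:
            # Dot level: resample that interpreter's dots with replacement.
            resampled = [rng.choice(dots) for _ in dots]
            covers.append(percent_cover(resampled))
        estimates.append(mean(covers))
    return estimates


def rmse(predicted, observed):
    """Root mean square error between predictions and observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return (sum(squared) / len(squared)) ** 0.5


def r_squared(predicted, observed):
    """Coefficient of determination of predictions against observations."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up example: three interpreters labelling an 8-dot grid on one plot.
    plot = [
        [1, 1, 0, 0, 1, 0, 0, 1],
        [1, 0, 0, 0, 1, 0, 0, 1],
        [1, 1, 1, 0, 1, 0, 0, 1],
    ]
    bags = bagged_plot_cover(plot, n_bags=100)
    print("mean bagged cover:", mean(bags))
    print("spread across bags:", min(bags), "to", max(bags))
```

The spread of the bagged estimates gives a rough sense of within-plot variance among interpreters, while `rmse` and `r_squared` could be applied to per-plot model predictions versus crowd or expert estimates.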
Our Civil War Photo Sleuth project got a burst of publicity in recent weeks, leading to hundreds of new site registrations and contributions. Here is a round-up of some highlights:
- Slate Magazine: Who’s Behind That Beard?
- Smithsonian Magazine: Facial Recognition Software Is Helping Identify Unknown Figures in Civil War Photographs
- Family Tree Magazine: Applying Facial Recognition to Civil War Photos
- Fast Company: Online sleuths are using face recognition to identify Civil War soldiers in old photographs
- Vintage News: Unknown Civil War Faces are Being Identified Through Facial Recognition App
- Daily Mail: The facial recognition software that could identify thousands of soldiers in American Civil War photographs
- Virginia Tech News: Computer science professor creates facial recognition software to identify Civil War portraits (shown above)
- Civil War Times Magazine: New Photo Research Tool (shown below)
Thanks to these media outlets for the great publicity!
Dr. Luther recently published a report about the GroupSight Workshop on Human Computation for Image and Video Analysis in AI Magazine. The workshop, held at the HCOMP 2017 conference in Quebec City, Canada, was co-organized by Dr. Luther, Danna Gurari, Genevieve Patterson, and Steve Branson. More information about the workshop can be found in a Follow the Crowd blog post by Dr. Luther.