Our research investigating the use of crowd workers to analyze satellite imagery of tree canopy coverage was accepted as a poster for the American Geophysical Union (AGU 2018) fall meeting in Washington, DC. The lead author is Forestry Ph.D. student Jill Derwin, with co-authors Valerie Thomas, Randolph Wynne, S. Seth Peery, John Coulston, Dr. Luther, Greg Liknes, and Stacie Bender. The abstract for the poster, titled “Validating the 2011 and 2016 NLCD Tree Canopy Cover Products using Crowdsourced Interpretations”, is as follows:
The 2011 and 2016 National Land Cover Database (NLCD) Tree Canopy Cover (TCC) products utilize training data collected by experienced photo interpreters. Observations of tree canopy cover were collected using 1-meter NAIP imagery overlaid on a dot grid. At each point in the dot grid, experts interpreted whether the point fell on canopy or not. The proportion of positive observations yields percent canopy cover. These data are used in conjunction with a set of 30-m resolution predictors (primarily Landsat imagery) to train a random forest model predicting TCC nationwide. We will test the use of crowdsourced observations of canopy cover to validate national products. Crowd workers will apply the same training data photo interpretation methodology at plot locations across the United States subsampled from the public Forest Inventory and Analysis database. Each plot will have repeated samples, with multiple crowd observers interpreting each location. Using a multi-scale bootstrap-aggregation or ‘bagging’ approach at the plot- and dot-levels, we randomly select sets of interpretations from randomly chosen interpreters to train consecutive models. This bagging methodology is applied both at the plot level and to the individual dot observations to test the within-plot crowdsourced interpretation variance. We will compare the NLCD TCC models from 2011 and 2016 to multiple bagged samples and aggregated quality metrics such as the coefficient of determination and root mean square error to evaluate model quality. We will also compare these bagged samples to independent expert interpretations in order to gain insight into the quality of crowd interpretations themselves. This work provides insight into the utility of crowdsourced observations as validation of national tree canopy cover products.
In addition to comparing aggregated crowd interpretations to expert measurements, identifying conditions that result in disagreement in interpreters’ observations may help to inform the methodology and to improve interpreter-training for the crowdsourcing task.
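To make the quantities in the abstract concrete, here is a minimal Python sketch of the dot-grid cover calculation, a two-level bootstrap over interpreters and dots, and the two quality metrics it names. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the 0/1 encoding of dots, the number of bags, and the choice to average resampled interpreters within a bag are all hypothetical.

```python
import math
import random
from statistics import mean, pvariance


def percent_canopy(dots):
    """Percent canopy cover: share of dots marked canopy (1) rather than not (0)."""
    return 100.0 * sum(dots) / len(dots)


def bagged_plot_estimate(interpretations, n_bags=200, seed=0):
    """Bootstrap-aggregate the crowd interpretations of a single plot.

    interpretations: one dot grid per interpreter, each a list of 0/1 values
    (an assumed encoding). Returns the mean bagged cover estimate and the
    variance of the estimates across bags.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_bags):
        # Plot level: resample interpreters with replacement.
        chosen = [rng.choice(interpretations) for _ in interpretations]
        # Dot level: resample each chosen interpreter's dots with replacement.
        covers = [percent_canopy([rng.choice(grid) for _ in grid])
                  for grid in chosen]
        estimates.append(mean(covers))
    return mean(estimates), pvariance(estimates)


def rmse(predicted, reference):
    """Root mean square error between predictions and reference values."""
    return math.sqrt(mean([(p - r) ** 2 for p, r in zip(predicted, reference)]))


def r_squared(predicted, reference):
    """Coefficient of determination, 1 - SS_res / SS_tot, against the reference."""
    ref_mean = mean(reference)
    ss_res = sum((r - p) ** 2 for p, r in zip(predicted, reference))
    ss_tot = sum((r - ref_mean) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Three hypothetical interpreters labelling the same five-dot plot.
    crowd = [[1, 1, 0, 0, 1], [1, 0, 0, 0, 1], [1, 1, 1, 0, 1]]
    estimate, spread = bagged_plot_estimate(crowd)
    print(f"bagged cover: {estimate:.1f}%, variance across bags: {spread:.1f}")
```

Which series counts as the reference (bagged crowd estimates, expert interpretations, or the NLCD model output) is a study design decision; the metric functions above simply take whichever one is passed as `reference`.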
Investigators have enlisted the help of the public since the days of the first “wanted” posters, but in an era where extensive personal information, as well as powerful search tools, are widely available online, the public is increasingly taking matters into its own hands. Some of these crowdsourced investigations have solved crimes and located missing persons, while others have leveled false accusations or devolved into witch hunts. In this talk, Luther describes his lab’s recent efforts to develop software platforms that support effective, ethical crowdsourced investigations in domains such as history, journalism, and national security.
Crowdsourcing more complex and creative tasks is seen as a desirable goal for both employers and workers, but these tasks traditionally require domain expertise. Employers can recruit only expert workers, but this approach does not scale well. Alternatively, employers can decompose complex tasks into simpler micro-tasks, but some domains, such as historical analysis, cannot be easily modularized in this way. A third approach is to train workers to learn the domain expertise. This approach offers clear benefits to workers, but is perceived as costly or infeasible for employers. In this paper, we explore the trade-offs between learning and productivity in training crowd workers to analyze historical documents. We compare CrowdSCIM, a novel approach that teaches historical thinking skills to crowd workers, with two crowd learning techniques from prior work and a baseline. Our evaluation (n=360) shows that CrowdSCIM allows workers to learn domain expertise while producing work of equal or higher quality versus other conditions, but efficiency is slightly lower.
The increasing volume of text data is challenging the cognitive capabilities of expert analysts. Machine learning and crowdsourcing present new opportunities for large-scale sensemaking, but we must overcome the challenge of modeling the overall process so that many distributed agents can contribute to suitable components asynchronously and meaningfully. In this paper, we explore how to crowdsource the sensemaking process via a pipeline of modularized steps connected by clearly defined inputs and outputs. Our pipeline restructures and partitions information into “context slices” for individual workers. We implemented CrowdIA, a software platform to enable unsupervised crowd sensemaking using our pipeline. With CrowdIA, crowds successfully solved two mysteries, and were one step away from solving the third. The crowd’s intermediate results revealed their reasoning process and provided evidence that justifies their conclusions. We suggest broader possibilities to optimize each component, as well as to evaluate and refine previous intermediate analyses to improve the final result.
Dr. Luther and his frequent collaborator Ron Coddington, editor and publisher of Military Images magazine, gave an invited presentation on Civil War photo sleuthing at the 18th annual Image of War Seminar in Alexandria, VA, hosted by the Center for Civil War Photography. The presentation included a brief history of American Civil War photography and a live demonstration of the Civil War Photo Sleuth website.
Congratulations to Crowd Lab Ph.D. student Sukrit Venkatagiri on his selection as one of 12 Graduate Student Fellows of the Rita Allen Foundation’s Misinformation Solutions Forum, which took place in October 2018 in Washington, DC. As a Graduate Fellow, Sukrit received a travel grant to attend the Forum and co-authored (with Amy Zhang of MIT) an essay that was published in the Forum’s proceedings.
On August 1, we held our public launch party for the Civil War Photo Sleuth website at the National Archives Building in Washington, DC. Our team spent the day helping new users (in person and online) get signed up and contributing to the site. Dr. Luther and Military Images editor Ron Coddington gave brief remarks, and we were joined by many distinguished guests, including Library of Congress and National Archives staff. The National Archives’ Innovation Hub provided the perfect setting for the event. We were also grateful to VT Computer Science and Civil War Times for event photography and social media coverage (more photos are available here).
A highlight of the event was sharing a new identification — made via the website — of a previously unknown Civil War soldier tintype from the Library of Congress collection. The donor of the photo, Tom Liljenquist, was present to receive the identification.
Exploring coordinated relationships (e.g., shared relationships between two sets of entities) is an important analytics task in a variety of real-world applications, such as discovering similarly behaved genes in bioinformatics, detecting malware collusions in cyber security, and identifying product bundles in marketing analysis. Coordinated relationships can be formalized as biclusters. In order to support visual exploration of biclusters, bipartite-graph-based visualizations have been proposed, and edge bundling is used to show biclusters. However, this approach suffers from edge crossings due to possible overlaps of biclusters, and there is a lack of in-depth understanding of its impact on users exploring biclusters in bipartite graphs. To address these issues, we propose a novel bicluster-based seriation technique that can reduce edge crossings in bipartite graph drawings, and we conducted a user experiment to study the effect of edge bundling and the proposed technique on visualizing biclusters in bipartite graphs. We found that both had an impact on reducing entity visits for users exploring biclusters, and edge bundles helped them find more justified answers. Moreover, we identified four key trade-offs that inform the design of future bicluster visualizations. The study results suggest that edge bundling is critical for exploring biclusters in bipartite graphs, which helps to reduce low-level perceptual problems and support high-level inferences.
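As a rough illustration of why node ordering matters here, the Python sketch below counts edge crossings in a two-column bipartite drawing for a given left and right ordering. It is not the seriation technique from the paper; it only computes the quantity such a technique aims to reduce. The function name and the toy gene/product labels are hypothetical. Two straight edges cross exactly when their endpoints appear in opposite relative order on the two sides.

```python
from itertools import combinations


def count_crossings(edges, left_order, right_order):
    """Count pairwise edge crossings in a two-column bipartite drawing.

    edges: list of (left_node, right_node) pairs.
    left_order / right_order: top-to-bottom node order in each column.
    """
    lpos = {node: i for i, node in enumerate(left_order)}
    rpos = {node: i for i, node in enumerate(right_order)}
    crossings = 0
    for (a, b), (c, d) in combinations(edges, 2):
        # Opposite relative order on the two sides means the edges cross;
        # edges sharing an endpoint give a product of 0 and are not counted.
        if (lpos[a] - lpos[c]) * (rpos[b] - rpos[d]) < 0:
            crossings += 1
    return crossings


# Two toy biclusters: {g1, g2} x {p1, p2} and {g3, g4} x {p3, p4}.
edges = [(g, p) for g in ("g1", "g2") for p in ("p1", "p2")]
edges += [(g, p) for g in ("g3", "g4") for p in ("p3", "p4")]

# Ordering that interleaves the two biclusters.
interleaved = count_crossings(
    edges, ["g1", "g3", "g2", "g4"], ["p1", "p3", "p2", "p4"])
# Ordering that keeps each bicluster's members adjacent.
grouped = count_crossings(
    edges, ["g1", "g2", "g3", "g4"], ["p1", "p2", "p3", "p4"])
```

In this toy case, placing each bicluster's members next to one another leaves only the crossings inside each bicluster, while the interleaved ordering adds crossings between the two.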
Dr. Luther gave an invited keynote presentation at Vietnam War / American War Stories: A Symposium on Conflict and Civic Engagement, hosted by the Institute for Digital Arts & Humanities at Indiana University-Bloomington. Other keynote speakers included David Ferriero, the Archivist of the United States; and John Bodnar, Distinguished and Chancellor’s Professor of History at IU. Dr. Luther’s presentation was titled, “Rediscovering American War Experiences through Crowdsourcing and Computation,” and the abstract was as follows:
Stories of war are complex, varied, powerful, and fundamentally human. Thus, crowdsourcing can be a natural fit for deepening our understanding of war, both by scaling up research efforts and by providing compelling learning experiences. Yet, few crowdsourced history projects help the public to do more than read, collect, or transcribe primary sources. In this talk, I present three examples of augmenting crowdsourcing efforts with computational techniques to enable deeper public engagement and more advanced historical analysis around stories of war. In “Mapping the Fourth of July in the Civil War Era,” funded by the NHPRC, we explore how crowdsourcing and natural language processing (NLP) tools help participants learn historical thinking skills while connecting American Civil War-era documents to scholarly topics of interest. In “Civil War Photo Sleuth,” funded by the NSF, we combine crowdsourcing with face recognition technology to help participants rediscover the lost identities of photographs of American Civil War soldiers and sailors. And in “The American Soldier in World War II,” funded by the NEH, we bring together crowdsourcing, NLP, and visualization to help participants explore the attitudes of American GIs in their own words. Across all three projects, I discuss broader principles for designing tools, interfaces, and online communities to support more meaningful and valuable crowdsourced contributions to scholarship about war and conflict.