Dr. Luther gave an invited presentation titled “Civil War Photo Sleuthing: Past, Present, and Future” at Civil War Photo Talks in Arlington, VA, co-sponsored by Military Images Magazine and Civil War Faces. Other invited speakers included Ann Shumard, National Portrait Gallery; Micah Messenheimer, Library of Congress; Bryan Cheeseboro, National Archives; and Rick Brown, Military Images. The abstract for Dr. Luther’s talk was as follows:
People have struggled to identify unknown soldiers and sailors in Civil War photos since even before the war ended. In this talk, I trace the 150-year history of photo sleuthing, showing how the passage of time has magnified some challenges, but also unlocked exciting new possibilities. I show how technologies like social media, face recognition, and digital archives allow us to solve photo mysteries that have eluded families and researchers for a century and a half.
Investigators in domains such as journalism, military intelligence, and human rights advocacy frequently analyze photographs of questionable or unknown provenance. These photos can provide invaluable leads and evidence, but even experts must invest significant time in each analysis, with no guarantee of success. Crowdsourcing, with its affordances for scalability and parallelization, has great potential to augment expert performance, but little is known about how crowds might fit into photo analysts’ complex workflows. In this talk, I present my group’s research with two communities: open-source investigators who geolocate and verify social media photos, and antiquarians who identify unknown persons in 19th-century portrait photography. Informed by qualitative studies of current practice, we developed a novel approach, expert-led crowdsourcing, that combines the complementary strengths of experts and crowds to solve photo mysteries. We built two software tools based on this approach, GroundTruth and Photo Sleuth, and evaluated them with real experts. I conclude by discussing some broader takeaways for crowdsourced investigations, sensemaking, and image analysis.
Dr. Luther was selected as one of eight Emerging Scholars by the American Civil War Museum in Richmond, VA. He will give an invited presentation on Civil War Photo Sleuth to audiences at the grand opening of the newly expanded museum on May 4. The goal of the program is to “highlight some of the most interesting work of the next generation of writers, communicators, and thinkers of Civil War era history/public history.”
Nai-Ching Wang, a Ph.D. student advised by Dr. Luther, successfully defended his dissertation today. His dissertation is titled “Supporting Historical Research and Education with Crowdsourced Analysis of Primary Sources,” and his committee members were Dr. Luther (chair), Ed Fox, Gang Wang, and Paul Quigley, with Matt Lease (UT Austin School of Information) as the external member. Here is the abstract for his dissertation:
Historians, like many types of scholars, are often researchers and educators, and both roles involve significant interaction with primary sources. Primary sources are not only direct evidence for historical arguments but also important materials for teaching historical thinking skills to students in classrooms, and engaging the broader public. However, finding high-quality primary sources that are relevant to a historian’s specialized topics of interest remains a significant challenge. Automated approaches to text analysis struggle to provide relevant results for these “long tail” searches with long semantic distances from the source material. Consequently, historians are often frustrated at spending so much time manually assessing the relevance of the contents of these archives rather than on writing and analysis. To overcome these challenges, my dissertation explores the use of crowdsourcing to support historians in analysis of primary sources. In four studies, I first proposed a class-sourcing model where historians outsource historical analysis to students as a teaching method and students learn historical thinking and gain authentic research experience while doing these analysis tasks. Incite, a realization of this model, was deployed in 15 classrooms with positive feedback. Second, I expanded the class-sourcing model to a broader audience, novice (paid) crowds, and developed the Read-agree-predict (RAP) technique to accurately evaluate relevance between primary sources and research topics. Third, I presented a set of design principles for crowdsourcing complex historical documents via the American Soldier project on Zooniverse. Finally, I developed CrowdSCIM to help crowds learn historical thinking and evaluated the tradeoffs between quality, learning and efficiency.
The outcomes of the studies provide systems, techniques and design guidelines to 1) support historians in their research and teaching practices, 2) help crowd workers learn historical thinking and 3) suggest implications for the design of future crowdsourcing systems.
Our research investigating the use of crowd workers to analyze satellite imagery of tree canopy coverage was accepted as a poster for the American Geophysical Union (AGU 2018) fall meeting in Washington, DC. The lead author is Forestry Ph.D. student Jill Derwin, with co-authors Valerie Thomas, Randolph Wynne, S. Seth Peery, John Coulston, Dr. Luther, Greg Liknes, and Stacie Bender. The abstract for the poster, titled “Validating the 2011 and 2016 NLCD Tree Canopy Cover Products using Crowdsourced Interpretations”, is as follows:
The 2011 and 2016 National Land Cover Database (NLCD) Tree Canopy Cover (TCC) products utilize training data collected by experienced photo interpreters. Observations of tree canopy cover were collected using 1-meter NAIP imagery overlaid on a dot grid. At each point in the dot grid, experts interpreted whether the point fell on canopy or not. The proportion of positive observations yields percent canopy cover. These data are used in conjunction with a set of 30-m resolution predictors (primarily Landsat imagery) to train a random forest model predicting TCC nationwide. We will test the use of crowdsourced observations of canopy cover to validate national products. Crowd workers will apply the same training data photo interpretation methodology at plot locations across the United States subsampled from the public Forest Inventory and Analysis database. Each plot will have repeated samples, with multiple crowd observers interpreting each location. Using a multi-scale bootstrap-aggregation or ‘bagging’ approach at the plot- and dot-levels, we randomly select sets of interpretations from randomly chosen interpreters to train consecutive models. This bagging methodology is applied at both the plot level and the level of individual dot observations to test the within-plot crowdsourced interpretation variance. We will compare the NLCD TCC models from 2011 and 2016 to multiple bagged samples and aggregated quality metrics such as the coefficient of determination and root mean square error to evaluate model quality. We will also compare these bagged samples to independent expert interpretations in order to gain insight into the quality of crowd interpretations themselves. This work provides insight into the utility of crowdsourced observations as validation of national tree canopy cover products.
In addition to comparing aggregated crowd interpretations to expert measurements, identifying conditions that result in disagreement in interpreters’ observations may help to inform the methodology and to improve interpreter-training for the crowdsourcing task.
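The dot-grid calculation and the bagging idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the data layout (one list of 0/1 dot observations per interpreter per plot), and the choice to average resampled estimates rather than train consecutive models are all assumptions made for illustration.

```python
import math
import random


def percent_canopy(dots):
    """Percent canopy cover for one interpretation.

    `dots` is a list of 0/1 values, one per dot-grid point
    (1 = the point fell on canopy). Hypothetical data layout.
    """
    return 100.0 * sum(dots) / len(dots)


def bagged_plot_estimate(interpretations, n_draws, rng):
    """Bootstrap-aggregated canopy estimate for a single plot.

    Each draw resamples interpreters with replacement (plot level),
    then resamples dots within each chosen interpretation (dot level),
    and averages the resulting percent-cover values. Simplified:
    the abstract describes training models on such samples instead.
    """
    draw_means = []
    for _ in range(n_draws):
        chosen = [rng.choice(interpretations) for _ in interpretations]
        covers = []
        for dots in chosen:
            resampled = [rng.choice(dots) for _ in dots]
            covers.append(percent_canopy(resampled))
        draw_means.append(sum(covers) / len(covers))
    return sum(draw_means) / len(draw_means)


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(predicted, observed):
    """Coefficient of determination of `predicted` against `observed`."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up interpretations: three crowd workers, one plot, eight dots each.
    plot = [
        [1, 1, 0, 1, 0, 1, 1, 0],
        [1, 0, 0, 1, 0, 1, 1, 0],
        [1, 1, 0, 1, 1, 1, 1, 0],
    ]
    print(bagged_plot_estimate(plot, n_draws=200, rng=rng))
```

In this sketch, the bagged plot estimates would be compared against the mapped TCC values, or against expert interpretations, with `rmse` and `r_squared`; the spread of estimates across draws gives a rough view of within-plot interpretation variance.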
Investigators have enlisted the help of the public since the days of the first “wanted” posters, but in an era where extensive personal information, as well as powerful search tools, are widely available online, the public is increasingly taking matters into its own hands. Some of these crowdsourced investigations have solved crimes and located missing persons, while others have leveled false accusations or devolved into witch hunts. In this talk, Luther describes his lab’s recent efforts to develop software platforms that support effective, ethical crowdsourced investigations in domains such as history, journalism, and national security.
Crowdsourcing more complex and creative tasks is seen as a desirable goal for both employers and workers, but these tasks traditionally require domain expertise. Employers can recruit only expert workers, but this approach does not scale well. Alternatively, employers can decompose complex tasks into simpler micro-tasks, but some domains, such as historical analysis, cannot be easily modularized in this way. A third approach is to train workers to learn the domain expertise. This approach offers clear benefits to workers, but is perceived as costly or infeasible for employers. In this paper, we explore the trade-offs between learning and productivity in training crowd workers to analyze historical documents. We compare CrowdSCIM, a novel approach that teaches historical thinking skills to crowd workers, with two crowd learning techniques from prior work and a baseline. Our evaluation (n=360) shows that CrowdSCIM allows workers to learn domain expertise while producing work of equal or higher quality versus other conditions, but efficiency is slightly lower.
The increasing volume of text data is challenging the cognitive capabilities of expert analysts. Machine learning and crowdsourcing present new opportunities for large-scale sensemaking, but we must overcome the challenge of modeling the overall process so that many distributed agents can contribute to suitable components asynchronously and meaningfully. In this paper, we explore how to crowdsource the sensemaking process via a pipeline of modularized steps connected by clearly defined inputs and outputs. Our pipeline restructures and partitions information into “context slices” for individual workers. We implemented CrowdIA, a software platform to enable unsupervised crowd sensemaking using our pipeline. With CrowdIA, crowds successfully solved two mysteries, and were one step away from solving the third. The crowd’s intermediate results revealed their reasoning process and provided evidence that justifies their conclusions. We suggest broader possibilities to optimize each component, as well as to evaluate and refine previous intermediate analyses to improve the final result.
Dr. Luther and his frequent collaborator Ron Coddington, editor and publisher of Military Images magazine, gave an invited presentation on Civil War photo sleuthing at the 18th annual Image of War Seminar in Alexandria, VA, hosted by the Center for Civil War Photography. The presentation included a brief history of American Civil War photography and a live demonstration of the Civil War Photo Sleuth website.
Dr. Luther gave an invited keynote presentation at Vietnam War / American War Stories: A Symposium on Conflict and Civic Engagement, hosted by the Institute for Digital Arts & Humanities at Indiana University-Bloomington. Other keynote speakers included David Ferriero, the Archivist of the United States; and John Bodnar, Distinguished and Chancellor’s Professor of History at IU. Dr. Luther’s presentation was titled “Rediscovering American War Experiences through Crowdsourcing and Computation,” and the abstract was as follows:
Stories of war are complex, varied, powerful, and fundamentally human. Thus, crowdsourcing can be a natural fit for deepening our understanding of war, both by scaling up research efforts and by providing compelling learning experiences. Yet, few crowdsourced history projects help the public to do more than read, collect, or transcribe primary sources. In this talk, I present three examples of augmenting crowdsourcing efforts with computational techniques to enable deeper public engagement and more advanced historical analysis around stories of war. In “Mapping the Fourth of July in the Civil War Era,” funded by the NHPRC, we explore how crowdsourcing and natural language processing (NLP) tools help participants learn historical thinking skills while connecting American Civil War-era documents to scholarly topics of interest. In “Civil War Photo Sleuth,” funded by the NSF, we combine crowdsourcing with face recognition technology to help participants rediscover the lost identities of photographs of American Civil War soldiers and sailors. And in “The American Soldier in World War II,” funded by the NEH, we bring together crowdsourcing, NLP, and visualization to help participants explore the attitudes of American GIs in their own words. Across all three projects, I discuss broader principles for designing tools, interfaces, and online communities to support more meaningful and valuable crowdsourced contributions to scholarship about war and conflict.