Nai-Ching Wang, a Ph.D. student advised by Dr. Luther, successfully defended his dissertation today. His dissertation is titled “Supporting Historical Research and Education with Crowdsourced Analysis of Primary Sources,” and his committee members were Dr. Luther (chair), Ed Fox, Gang Wang, and Paul Quigley, with Matt Lease (UT Austin School of Information) as the external member. Here is the abstract for his dissertation:
Historians, like many types of scholars, are often researchers and educators, and both roles involve significant interaction with primary sources. Primary sources are not only direct evidence for historical arguments but also important materials for teaching historical thinking skills to students in classrooms and for engaging the broader public. However, finding high-quality primary sources that are relevant to a historian’s specialized topics of interest remains a significant challenge. Automated approaches to text analysis struggle to provide relevant results for these “long tail” searches with long semantic distances from the source material. Consequently, historians are often frustrated at spending so much time manually assessing the relevance of the contents of these archives rather than on writing and analysis. To overcome these challenges, my dissertation explores the use of crowdsourcing to support historians in analysis of primary sources. In four studies, I first proposed a class-sourcing model where historians outsource historical analysis to students as a teaching method and students learn historical thinking and gain authentic research experience while doing these analysis tasks. Incite, a realization of this model, was deployed in 15 classrooms with positive feedback. Second, I expanded the class-sourcing model to a broader audience, novice (paid) crowds, and developed the Read-agree-predict (RAP) technique to accurately evaluate relevance between primary sources and research topics. Third, I presented a set of design principles for crowdsourcing complex historical documents via the American Soldier project on Zooniverse. Finally, I developed CrowdSCIM to help crowds learn historical thinking and evaluated the tradeoffs between quality, learning and efficiency.
The outcomes of the studies provide systems, techniques and design guidelines to 1) support historians in their research and teaching practices, 2) help crowd workers learn historical thinking and 3) suggest implications for the design of future crowdsourcing systems.
Crowdsourcing more complex and creative tasks is seen as a desirable goal for both employers and workers, but these tasks traditionally require domain expertise. Employers can recruit only expert workers, but this approach does not scale well. Alternatively, employers can decompose complex tasks into simpler micro-tasks, but some domains, such as historical analysis, cannot be easily modularized in this way. A third approach is to train workers to learn the domain expertise. This approach offers clear benefits to workers, but is perceived as costly or infeasible for employers. In this paper, we explore the trade-offs between learning and productivity in training crowd workers to analyze historical documents. We compare CrowdSCIM, a novel approach that teaches historical thinking skills to crowd workers, with two crowd learning techniques from prior work and a baseline. Our evaluation (n=360) shows that CrowdSCIM allows workers to learn domain expertise while producing work of equal or higher quality versus other conditions, but efficiency is slightly lower.
The increasing volume of text data is challenging the cognitive capabilities of expert analysts. Machine learning and crowdsourcing present new opportunities for large-scale sensemaking, but we must overcome the challenge of modeling the overall process so that many distributed agents can contribute to suitable components asynchronously and meaningfully. In this paper, we explore how to crowdsource the sensemaking process via a pipeline of modularized steps connected by clearly defined inputs and outputs. Our pipeline restructures and partitions information into “context slices” for individual workers. We implemented CrowdIA, a software platform to enable unsupervised crowd sensemaking using our pipeline. With CrowdIA, crowds successfully solved two mysteries, and were one step away from solving the third. The crowd’s intermediate results revealed their reasoning process and provided evidence that justifies their conclusions. We suggest broader possibilities to optimize each component, as well as to evaluate and refine previous intermediate analyses to improve the final result.