Dynamic strategies for crowdsourced data management
- As the world becomes ever more connected to the Internet, crowdsourcing marketplaces such as Amazon Mechanical Turk give us a mechanism for including humans in computational workflows at large scale. However, many crowdworkers make mistakes and disagree with one another, some workers are malicious and contribute only spam, and the crowd can often be both slow and expensive. Despite these challenges, in this thesis we develop new algorithms that allow us to use the crowd effectively while still ensuring quick, low-cost, and accurate results. First, we consider the commonly encountered labeling or filtering problem, where we use the crowd to label or filter items in a dataset. We describe CrowdDQS (Crowd Dynamic Question Selection), a general-purpose system we developed that can reduce the cost of labeling by up to a factor of 6 in practice by dynamically issuing questions to workers and automatically detecting and blocking poor workers. Next, we consider the maximum problem, where we are presented with a set of records, each with an unknown intrinsic score, and our goal is to use the crowd to find the record with the highest score. We develop hybrid strategies that judiciously combine a ratings interface and a comparisons interface to find the maximum more efficiently than typical single-interface strategies. Finally, we consider the problem of using the crowd to cluster together similar records or to perform entity resolution (ER). We significantly reduce the cost of pairwise crowd clustering approaches by soliciting attribute labels on records from the crowd, and then asking for pairwise judgments only between records with similar sets of attribute labels. We describe strategies that allow us to finely control the accuracy of our results while still maintaining significant cost reductions.
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Khan, Asif R
|Stanford University, Department of Electrical Engineering.
|Statement of responsibility
|Asif R. Khan.
|Submitted to the Department of Electrical Engineering.
|Thesis (Ph.D.)--Stanford University, 2017.
- © 2017 by Asif Khan
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).