Advancing the use of crowdsourcing for data-intensive tasks
Abstract/Contents
- Abstract
- All aspects of industry, scholarship, and society have recently witnessed rapid growth in the volume of available data and in the demand for data-intensive analytics. While automated techniques built on machine learning algorithms are being applied to many problems, a large class of challenging tasks still requires human intelligence and input. Crowdsourcing is an effective mechanism for addressing problems that are not easily solved by computers alone and that require human insight. Furthermore, crowdsourcing often plays a crucial role in the development of machine learning algorithms, both as the primary source of high-quality labeled training data and as a tool for verifying the output of machine-learned models. The goal of this thesis is to characterize the spectrum of crowdsourcing tasks that are posted on real marketplaces, and to develop new and optimized algorithms for some fundamental classes of crowdsourcing tasks. First, we analyze a dataset comprising over 27 million microtasks, issued to a large crowdsourcing marketplace between 2012 and 2016 and performed by over 70,000 workers. Based on this dataset, we identify two fundamental classes of crowdsourcing task types that are very popular in the marketplace: (a) filtering and rating, where worker responses may be any number in a fixed numerical range $\{1, 2, \ldots, R\}$; and (b) counting, where responses may be any non-negative integer $\{0, 1, 2, \ldots\}$. Next, we design efficient algorithms for these task types. Specifically: (1) for filtering and rating, we design an algorithm that finds a globally optimal maximum-likelihood-based solution for identifying true answers and worker accuracies; (2) for counting, we design algorithms to optimally aggregate crowdsourced responses, along with hybrid algorithms that combine crowdsourcing and computer vision techniques to further improve the quality and reduce the cost of our inferences.
Our findings and algorithms have broad ramifications for how to best use crowdsourcing for collecting or processing large volumes of data at low cost and high accuracy.
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2017 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Das Sarma, Akash | |
---|---|---|
Associated with | Stanford University, Computer Science Department. | |
Primary advisor | Widom, Jennifer | |
Thesis advisor | Widom, Jennifer | |
Thesis advisor | Garcia-Molina, Hector | |
Thesis advisor | Parameswaran, Aditya | |
Advisor | Garcia-Molina, Hector | |
Advisor | Parameswaran, Aditya | |
Subjects
Genre | Theses |
---|
Bibliographic information
Statement of responsibility | Akash Das Sarma. |
---|---|
Note | Submitted to the Department of Computer Science. |
Thesis | Thesis (Ph.D.)--Stanford University, 2017. |
Location | electronic resource |
Access conditions
- Copyright
- © 2017 by Akash Das Sarma
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).