Advancing the use of crowdsourcing for data-intensive tasks


Abstract/Contents

Abstract
All aspects of industry, scholarship, and society have recently witnessed rapid growth in the volume of available data and in the demand for data-intensive analytics. While automated techniques built on machine learning algorithms are being applied to many problems, a large class of challenging tasks still requires human intelligence and input. Crowdsourcing is an effective mechanism for addressing problems that are not easily solved by computers alone and that require human insight. Furthermore, crowdsourcing often plays a crucial role in the development of machine learning algorithms, both as the primary source of high-quality labeled training data and as a tool for verifying the output of machine-learned models. The goal of this thesis is to characterize the spectrum of crowdsourcing tasks that are posted on real marketplaces, and to develop new and optimized algorithms for some fundamental classes of crowdsourcing tasks. First, we analyze a dataset comprising over 27 million microtasks, issued to a large crowdsourcing marketplace between 2012 and 2016 and performed by over 70,000 workers. Based on this dataset, we identify two fundamental classes of crowdsourcing task types that are very popular in the marketplace: (a) filtering and rating, where worker responses may be any number in a fixed numerical range {1, 2, ..., R}; (b) counting, where responses may be any non-negative integer {0, 1, 2, ...}. Next, we design efficient algorithms for these task types. Specifically: (1) For filtering and rating, we design an algorithm to discover a globally optimal maximum-likelihood solution that identifies true answers and worker accuracies. (2) For counting, we design algorithms to optimally aggregate crowdsourced responses, along with hybrid algorithms that combine crowdsourcing and computer vision techniques to further improve the quality and reduce the cost of our inferences.
Our findings and algorithms have broad ramifications for how to best use crowdsourcing for collecting or processing large volumes of data at low cost and high accuracy.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2017
Issuance monographic
Language English

Creators/Contributors

Associated with Das Sarma, Akash
Associated with Stanford University, Computer Science Department.
Primary advisor Widom, Jennifer
Thesis advisor Widom, Jennifer
Thesis advisor Garcia-Molina, Hector
Thesis advisor Parameswaran, Aditya

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Akash Das Sarma.
Note Submitted to the Department of Computer Science.
Thesis Thesis (Ph.D.)--Stanford University, 2017.
Location electronic resource

Access conditions

Copyright
© 2017 by Akash Das Sarma
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
