Advancing the use of crowdsourcing for data-intensive tasks

Das Sarma, Akash; Stanford University, Computer Science Department.

Abstract/Contents

Abstract: Industry, scholarship, and society have all recently witnessed rapid growth in the volume of available data and in the demand for data-intensive analytics. While automated techniques built on machine learning algorithms are being applied to many problems, a large class of challenging tasks still requires human intelligence and input. Crowdsourcing is an effective mechanism for addressing problems that are not easily solved by computers alone and that require human insight. Furthermore, crowdsourcing often plays a crucial role in the development of machine learning algorithms, both as the primary source of high-quality labeled training data and as a tool for verifying the output of machine-learned models. The goal of this thesis is to characterize the spectrum of crowdsourcing tasks that are posted on real marketplaces, and to develop new and optimized algorithms for some fundamental classes of crowdsourcing tasks. First, we analyze a dataset comprising over 27 million microtasks, issued to a large crowdsourcing marketplace between 2012 and 2016 and performed by over 70,000 workers. Based on this dataset, we identify two fundamental classes of crowdsourcing task types that are very popular in the marketplace: (a) filtering and rating, where worker responses are allowed to be any number in a fixed numerical range {1, 2, ..., R}; (b) counting, where responses are allowed to be any non-negative integer {0, 1, 2, ...}. Next, we design efficient algorithms for these task types. Specifically: (1) For filtering and rating, we design an algorithm that finds a globally optimal maximum-likelihood solution identifying both the true answers and the worker accuracies. (2) For counting, we design algorithms to optimally aggregate crowdsourced responses, along with hybrid algorithms that combine crowdsourcing and computer vision techniques to further improve the quality and reduce the cost of our inferences. Our findings and algorithms have broad ramifications for how best to use crowdsourcing to collect or process large volumes of data at low cost and high accuracy.
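To make the two task types in the abstract concrete, here is a minimal Python sketch of response aggregation. It is not the thesis's algorithm. It assumes worker accuracies are already known, and it assumes a simple error model in which a worker reports the true rating with probability p and otherwise picks one of the other R - 1 ratings uniformly at random. The function names `ml_rating` and `aggregate_count`, and the use of the median for counts, are illustrative choices only.

```python
import math
from statistics import median


def ml_rating(responses, accuracies, R):
    """Return the rating in 1..R with the highest log-likelihood.

    Assumed error model (hypothetical, not taken from the thesis):
    worker w reports the true rating with probability accuracies[w]
    (strictly between 0 and 1) and otherwise a uniformly random wrong one.
    Requires R >= 2.
    """
    best_rating, best_ll = None, -math.inf
    for candidate in range(1, R + 1):
        ll = 0.0
        for worker, answer in responses.items():
            p = accuracies[worker]
            ll += math.log(p if answer == candidate else (1 - p) / (R - 1))
        if ll > best_ll:
            best_rating, best_ll = candidate, ll
    return best_rating


def aggregate_count(responses):
    """Aggregate counting responses with the median, a simple robust baseline."""
    return median(responses)
```

Under these assumptions, two mediocre workers can be outvoted by one highly accurate worker, which is the effect that motivates estimating worker accuracies together with the true answers. The thesis addresses the harder problem in which accuracies are unknown.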

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2017
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Das Sarma, Akash
Associated with	Stanford University, Computer Science Department.
Primary advisor	Widom, Jennifer
Thesis advisor	Widom, Jennifer
Thesis advisor	Garcia-Molina, Hector
Thesis advisor	Parameswaran, Aditya
Advisor	Garcia-Molina, Hector
Advisor	Parameswaran, Aditya

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Akash Das Sarma.
Note	Submitted to the Department of Computer Science.
Thesis	Thesis (Ph.D.)--Stanford University, 2017.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
