Tagging and other microtasks

Abstract/Contents

Abstract
Over the past decade, the web has become increasingly participatory. Many web sites would be non-functional without the contribution of many tiny units of work by users and workers around the world. We call such tiny units of work microtasks. Microtasks usually represent less than five minutes of someone's time. However, microtasks can produce massive effects when pooled together. Examples of microtasks include tagging a photo with a descriptive keyword, rating a movie, or categorizing a product. This thesis explores tagging systems, one of the first places where unpaid microtasks became common. Tagging systems allow regular users to attach keywords ("tags") to objects like URLs, photos, and videos. We begin by looking at social bookmarking systems, tagging systems where users tag URLs. We consider whether social bookmarking tags are useful for web search, finding that they often mirror other available metadata. We also show that social bookmarking tags can be predicted to varying degrees with two techniques: support vector machines and market basket data mining. To expand our understanding of tags, we look at social cataloging systems, tagging systems where users tag books. Social cataloging systems allow us to compare user-generated tags and expert library terms that were created in parallel. We find that tags have important features like consistency, quality, and completeness in common with expert library terms. We also find that paid tagging can be an effective supplement to a tagging system. Finally, our work expands to all microtasks, rather than tagging alone. We propose a framework called Human Processing for programming with and studying paid and unpaid microtasks. We then develop a tool called HPROC for programming within this framework, primarily on top of a paid microtask marketplace called Amazon Mechanical Turk (AMT). Lastly, we describe Turkalytics, a system for monitoring workers completing paid microtasks on AMT.
We cover tagging from web search, machine learning, and library science perspectives, and work extensively with both the paid and unpaid microtasks which are becoming a fixture of the modern web.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2011
Issuance monographic
Language English

Creators/Contributors

Associated with Heymann, Paul Brian
Associated with Stanford University, Computer Science Department
Primary advisor Garcia-Molina, Hector
Thesis advisor Garcia-Molina, Hector
Thesis advisor Leskovec, Jurij
Thesis advisor Paepcke, Andreas
Advisor Leskovec, Jurij
Advisor Paepcke, Andreas

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Paul Brian Heymann.
Note Submitted to the Department of Computer Science.
Thesis Thesis (Ph.D.)--Stanford University, 2011.
Location electronic resource

Access conditions

Copyright
© 2011 by Paul Brian Heymann
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
