Tagging and other microtasks
- Over the past decade, the web has become increasingly participatory. Many web sites would be non-functional without the contribution of many tiny units of work by users and workers around the world. We call such tiny units of work microtasks. Microtasks usually represent less than five minutes of someone's time. However, microtasks can produce massive effects when pooled together. Examples of microtasks include tagging a photo with a descriptive keyword, rating a movie, or categorizing a product. This thesis explores tagging systems, one of the first places where unpaid microtasks became common. Tagging systems allow regular users to annotate objects like URLs, photos, and videos with keywords ("tags").
We begin by looking at social bookmarking systems, tagging systems where users tag URLs. We consider whether social bookmarking tags are useful for web search, finding that they often mirror other available metadata. We also show that social bookmarking tags can be predicted to varying degrees with two techniques: support vector machines and market basket data mining. To expand our understanding of tags, we look at social cataloging systems, tagging systems where users tag books. Social cataloging systems allow us to compare user-generated tags with expert library terms that were created in parallel. We find that tags share important features with expert library terms, including consistency, quality, and completeness. We also find that paid tagging can be an effective supplement to a tagging system.
Our work then expands from tagging to microtasks in general. We propose a framework called Human Processing for programming with and studying paid and unpaid microtasks. We then develop a tool called HPROC for programming within this framework, primarily on top of a paid microtask marketplace called Amazon Mechanical Turk (AMT). Lastly, we describe Turkalytics, a system for monitoring workers completing paid microtasks on AMT.
We cover tagging from web search, machine learning, and library science perspectives, and work extensively with both the paid and unpaid microtasks which are becoming a fixture of the modern web.
|Type of resource: electronic; electronic resource; remote
|1 online resource.
|Heymann, Paul Brian
|Stanford University, Computer Science Department
|Statement of responsibility: Paul Brian Heymann.
|Submitted to the Department of Computer Science.
|Thesis (Ph.D.)--Stanford University, 2011.
- © 2011 by Paul Brian Heymann
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).