Studying people, organizations, and the web with statistical text models

Abstract/Contents

Abstract
From social networks to academic publications, information technologies have enabled companies, organizations, and governments to collect huge datasets about the world. Many of these datasets have major textual components organized by human-applied labels or tags, promising to improve our understanding of large-scale topical and social phenomena through the words people write. Realizing that promise requires tools that can discover and quantify word usage patterns that are interpretable, trustworthy, and flexible. In particular, the discovered patterns should exploit the implicit domain knowledge embodied in tags, labels, or other categories of interest, when they are available, and lend themselves to visual exploration and interpretation. This dissertation presents studies of the topical structure of the tagged web, social language in microblogs, and innovation in academia through statistical analyses of text. Several new probabilistic topic models of metadata-enriched document collections are introduced, facilitating domain-specific studies of words associated with tags, emoticons, library subject codes, and other human-provided labels. I find that tags improve high-level clustering of web pages; that language on Twitter can be quantified with respect to its role as substance, status, social, or style; and that interdisciplinary research consistently uses language that looks like academia's future. These results are evaluated both quantitatively, with gold-standard and task-driven metrics, and qualitatively, with visualizations of the textual patterns discovered by the models.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2011
Issuance monographic
Language English

Creators/Contributors

Associated with Ramage, Daniel Robert
Associated with Stanford University, Computer Science Department.
Primary advisor Manning, Christopher D
Thesis advisor Manning, Christopher D
Thesis advisor Jurafsky, Dan, 1962-
Thesis advisor McFarland, Daniel

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Daniel Ramage.
Note Submitted to the Department of Computer Science.
Thesis Ph. D. Stanford University 2011
Location electronic resource

Access conditions

Copyright
© 2011 by Daniel Robert Ramage
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
