Studying people, organizations, and the web with statistical text models
Abstract/Contents
- Abstract
- From social networks to academic publications, information technologies have enabled companies, organizations, and governments to collect huge datasets about the world. Many of these datasets have major textual components organized by human-applied labels or tags, promising to improve our understanding of large-scale topical and social phenomena through the words people write. Doing so requires tools that can discover and quantify word usage patterns that are interpretable, trustworthy, and flexible. In particular, the discovered patterns should exploit the implicit domain knowledge embodied in tags, labels, or other categories of interest, when they are available, and lend themselves to visual exploration and interpretation. This dissertation presents studies of the topical structure of the tagged web, social language in microblogs, and innovation in academia through statistical analyses of text. Several new probabilistic topic models of metadata-enriched document collections are introduced, facilitating domain-specific studies of words associated with tags, emoticons, library subject codes, and other human-provided labels. I find that tags improve high-level clustering of web pages; that language on Twitter can be quantified with respect to its role as substance, status, social, or style; and that interdisciplinary research consistently uses language that looks like academia's future. These results are evaluated both quantitatively, with gold-standard and task-driven metrics, and qualitatively, with visualizations of the textual patterns discovered by the models.
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2011 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Ramage, Daniel Robert |
---|---|
Associated with | Stanford University, Computer Science Department. |
Primary advisor | Manning, Christopher D |
Thesis advisor | Manning, Christopher D |
Thesis advisor | Jurafsky, Dan, 1962- |
Thesis advisor | McFarland, Daniel |
Advisor | Jurafsky, Dan, 1962- |
Advisor | McFarland, Daniel |
Subjects
Genre | Theses |
---|---|
Bibliographic information
Statement of responsibility | Daniel Ramage. |
---|---|
Note | Submitted to the Department of Computer Science. |
Thesis | Ph. D. Stanford University 2011 |
Location | electronic resource |
Access conditions
- Copyright
- © 2011 by Daniel Robert Ramage
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).