Holistic language processing : joint models of linguistic structure


Abstract
Humans are much better than computers at understanding language. This is, in part, because humans naturally employ holistic language processing: they effortlessly keep track of many inter-related layers of low-level information, while simultaneously integrating long-distance information from elsewhere in the conversation or document. This thesis is about joint models for natural language processing which likewise aim to capture the dependencies between different layers of information, and between different parts of a document, when making a decision. I address three aspects of holistic language processing.

First, I present an information extraction model that includes long-distance links in order to jointly make decisions about related words which may be far from one another in a document. Most information extraction systems use sequence models, such as linear-chain conditional random fields, which have access to only a small, local context when making decisions. I show how to add long-distance links between related words which can be arbitrarily far apart within the document. Experiments show that these long-distance links can be used to improve performance on multiple tasks.

I then move to jointly modeling different layers of information. First, I present a sampling-based pipeline. In a typical linguistic annotation pipeline, different components are run one after another, and the best output from each is used as the input to the next stage. The pipeline I present is theoretically equivalent to passing the entire distribution from one stage to the next, instead of just the most likely output. Experimentally, this pipeline outperformed the typical, greedy pipeline, but did not outperform taking the k-best outputs at each stage. I follow this with a full joint model of parsing and named entity recognition. This joint model does not have the directionality constraints inherent in a pipeline, and both levels of annotation can directly influence and constrain one another. Experiments show that this joint model can produce significant improvements on both tasks. I then show how to further improve the joint model using additional data which has been annotated with only one type of structure, unlike the jointly annotated data needed by the original joint model. The additional data is incorporated using a hierarchical prior, which links feature weights between the models for the different tasks.

Lastly, I address the problem of multi-domain learning, where the goal is to jointly model different genres of text annotated for the same task. This is once again done via a hierarchical prior which links the feature weights between the models for the different genres. Experiments show that this technique can improve performance across all domains, though, not surprisingly, domains with smaller training corpora improve more.
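The sampling-based pipeline idea above can be illustrated with a toy sketch: instead of passing only the single most likely output of one stage to the next, we draw samples from the first stage's distribution and aggregate the downstream decisions. Everything here (the two-tag posterior, the "imperative vs. statement" second stage) is invented for illustration and is not taken from the thesis.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical stage-1 posterior over POS taggings of a two-word
# sentence, and a toy second stage conditioned on that tagging.
stage1_posterior = {
    ("VB", "NN"): 0.6,   # e.g. "Book flights" read as an imperative
    ("NN", "NN"): 0.4,   # e.g. "Book flights" read as a compound noun
}

def stage2(tags):
    # Toy downstream decision that depends on the stage-1 output.
    return "imperative" if tags[0] == "VB" else "statement"

def greedy_pipeline():
    # Typical pipeline: commit to the single best stage-1 output.
    best = max(stage1_posterior, key=stage1_posterior.get)
    return stage2(best)

def sampled_pipeline(n_samples=1000):
    # Approximate marginalizing over stage-1 outputs by sampling,
    # rather than committing to the single best tagging.
    taggings = list(stage1_posterior)
    weights = [stage1_posterior[t] for t in taggings]
    outcomes = Counter()
    for _ in range(n_samples):
        tags = random.choices(taggings, weights=weights)[0]
        outcomes[stage2(tags)] += 1
    return outcomes.most_common(1)[0][0], outcomes

print(greedy_pipeline())
print(sampled_pipeline()[0])
```

In this tiny example the greedy and sampled pipelines agree; the sampled version pays off when the downstream decision is sensitive to lower-probability stage-1 analyses.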
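The hierarchical prior used for both the semi-supervised joint model and the multi-domain setting can be sketched as a pair of Gaussian priors: each domain-specific weight vector is pulled toward shared top-level weights, which are in turn pulled toward zero. The function, domain names, and numbers below are all illustrative assumptions, not values from the thesis.

```python
import numpy as np

def hierarchical_penalty(domain_weights, w_star, sigma_d=1.0, sigma_star=10.0):
    """L2 penalties implied by a hierarchical Gaussian prior:
    each domain's weights w_d are pulled toward the shared weights
    w_star, and w_star is pulled toward zero."""
    penalty = np.sum(w_star ** 2) / (2 * sigma_star ** 2)
    for w_d in domain_weights.values():
        penalty += np.sum((w_d - w_star) ** 2) / (2 * sigma_d ** 2)
    return penalty

# Two hypothetical domains sharing top-level weights w_star.
w_star = np.array([1.0, -0.5])
domain_weights = {
    "newswire":   np.array([1.2, -0.4]),
    "biomedical": np.array([0.9, -0.6]),
}
print(hierarchical_penalty(domain_weights, w_star))
```

Because every domain pays a cost for drifting from the shared weights, a domain with little training data ends up close to w_star (borrowing strength from the others), while a well-resourced domain can afford to deviate, which matches the observation that smaller domains benefit most.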

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2010
Issuance monographic
Language English

Creators/Contributors

Associated with Finkel, Jenny Rose
Associated with Stanford University, Computer Science Department
Primary advisor Manning, Christopher D
Thesis advisor Manning, Christopher D
Thesis advisor Jurafsky, Dan, 1962-
Thesis advisor Koller, Daphne
Thesis advisor Ng, Andrew Y, 1976-

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Jenny Rose Finkel.
Note Submitted to the Department of Computer Science.
Thesis Thesis (Ph.D.)--Stanford University, 2010.
Location electronic resource

Access conditions

Copyright
© 2010 by Jenny Rose Finkel
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
