Learning structured probabilistic models for semantic role labeling

Abstract/Contents

Abstract
Teaching a computer to read is one of the most interesting and important artificial intelligence tasks. In this thesis, we focus on semantic role labeling (SRL), one important processing step on the road from raw text to a full semantic representation. Given an input sentence and a target verb in that sentence, the SRL task is to label the semantic arguments, or roles, of that verb. For example, in the sentence "Tom eats an apple, " the verb "eat" has two roles, Eater = "Tom" and Thing Eaten = "apple". Most SRL systems, including the ones presented in this thesis, take as input a syntactic analysis built by an automatic syntactic parser. SRL systems rely heavily on path features constructed from the syntactic parse, which capture the syntactic relationship between the target verb and the phrase being classified. However, there are several issues with these path features. First, the path feature does not always contain all relevant information for the SRL task. Second, the space of possible path features is very large, resulting in very sparse features that are hard to learn. In this thesis, we consider two ways of addressing these issues. First, we experiment with a number of variants of the standard syntactic features for SRL. We include a large number of syntactic features suggested by previous work, many of which are designed to reduce sparsity of the path feature. We also suggest several new features, most of which are designed to capture additional information about the sentence not included in the standard path feature. We build an SRL model using the best of these new and old features, and show that this model achieves performance competitive with current state-of-the-art. The second method we consider is a new methodology for SRL based on labeling canonical forms. A canonical form is a representation of a verb and its arguments that is abstracted away from the syntax of the input sentence. 
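The path feature described above can be sketched concretely. The following is an illustrative toy implementation, not the thesis's actual feature extractor: it encodes the sequence of constituent labels on the tree path from the target verb up to the lowest common ancestor and back down to the candidate argument phrase (a common format is e.g. "VB^VP^SvNP"). The tree representation and label names are assumptions for illustration.

```python
# Toy constituency-tree node; labels like "S", "VP", "NP", "VB" are illustrative.
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for c in self.children:
            c.parent = self

def path_feature(verb, arg):
    """Path from the verb up to the lowest common ancestor ('^' steps),
    then down to the argument node ('v' steps)."""
    # Collect the verb's ancestors (including the verb itself).
    up, n = [], verb
    while n is not None:
        up.append(n)
        n = n.parent
    # Walk up from the argument until we reach one of those ancestors.
    down, n = [], arg
    while n not in up:
        down.append(n)
        n = n.parent
    ancestor = n
    ups = up[:up.index(ancestor) + 1]
    return "^".join(x.label for x in ups) + "v" + "v".join(x.label for x in reversed(down))

# Example: "Tom eats an apple" as S(NP, VP(VB, NP)).
vb = Node("VB")
obj = Node("NP")
vp = Node("VP", [vb, obj])
subj = Node("NP")
s = Node("S", [subj, vp])
print(path_feature(vb, subj))  # VB^VP^SvNP
```

Because the feature is the entire label sequence, even small syntactic variations produce distinct strings, which illustrates the sparsity problem the abstract mentions.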
For example, "A car hit Bob" and "Bob was hit by a car" have the same canonical form, {Verb = "hit", Deep Subject = "a car", Deep Object = "Bob"}. Labeling canonical forms makes it much easier to generalize between sentences with different syntax. To label canonical forms, we first need to automatically extract them given an input parse. We develop a system based on a combination of hand-coded rules and machine learning. This allows us to include a large amount of linguistic knowledge while retaining the robustness of a machine learning system. Our system improves significantly over a strong baseline, demonstrating the viability of this new approach to SRL. This latter method involves learning a large, complex probabilistic model. In the model we present, exact learning is tractable, but there are several natural extensions to the model for which exact learning is not possible. This is quite a general issue; in many different application domains, we would like to use probabilistic models that cannot be learned exactly. We propose a new method for learning these kinds of models based on contrastive objectives. The main idea is to learn by comparing only a few possible values of the model, instead of all possible values. This method generalizes a standard learning method, pseudo-likelihood, and is closely related to another, contrastive divergence. Previous work has mostly focused on comparing nearby sets of values; we focus on non-local contrastive objectives, which compare arbitrary sets of values. We prove several theoretical results about our model, showing that contrastive objectives attempt to enforce probability ratio constraints between the compared values. Based on this insight, we suggest several methods for constructing contrastive objectives, including contrastive constraint generation (CCG), a cutting-plane style algorithm that iteratively builds a good contrastive objective based on finding high-scoring values.
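The active/passive normalization behind canonical forms can be illustrated with a single hand-coded rule. This sketch is an assumption for illustration (the thesis's actual extraction system combines many rules with machine learning): it maps a shallow (voice, subject, object, by-phrase) analysis to a voice-independent canonical form.

```python
# Illustrative canonical-form rule: a passive sentence's surface subject is the
# deep object, and its "by" phrase (if present) supplies the deep subject.
# The dict format and argument names are hypothetical, for illustration only.
def canonical_form(verb, voice, subject, obj=None, by_object=None):
    if voice == "active":
        return {"Verb": verb, "Deep Subject": subject, "Deep Object": obj}
    if voice == "passive":
        return {"Verb": verb, "Deep Subject": by_object, "Deep Object": subject}
    raise ValueError(f"unknown voice: {voice}")

# "A car hit Bob" and "Bob was hit by a car" map to the same canonical form:
active = canonical_form("hit", "active", "a car", obj="Bob")
passive = canonical_form("hit", "passive", "Bob", by_object="a car")
print(active == passive)  # True
```

Once both sentences share one canonical form, a single labeled example covers every syntactic variant, which is the generalization benefit the abstract describes.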
We evaluate CCG on a machine vision task, showing that it significantly outperforms pseudo-likelihood and contrastive divergence, as well as a state-of-the-art max-margin cutting-plane algorithm.
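The core idea of a contrastive objective can be shown on a toy log-linear model. The sketch below is an assumption for illustration, not the thesis's formulation: instead of normalizing the model over all possible values (the full log partition function), the objective normalizes only over a small contrast set containing the observed value, thereby enforcing probability-ratio constraints among just the compared values. Full likelihood is recovered when the contrast set is the entire value space.

```python
import math

# Toy log-linear model: p(y) proportional to exp(w . f(y)).
def score(w, feats):
    return sum(wi * fi for wi, fi in zip(w, feats))

def contrastive_loglik(w, observed, contrast_set, feature_fn):
    """Log-probability of the observed value under the model restricted to
    contrast_set (which must contain the observed value). With contrast_set
    equal to the full value space, this is the ordinary log-likelihood."""
    s_obs = score(w, feature_fn(observed))
    log_z = math.log(sum(math.exp(score(w, feature_fn(y))) for y in contrast_set))
    return s_obs - log_z

# Hypothetical features and values, for illustration only.
feature_fn = lambda y: [1.0 if y == "a" else 0.0, float(len(y))]
w = [0.5, -0.2]
full = contrastive_loglik(w, "a", ["a", "b", "cc"], feature_fn)   # full likelihood
local = contrastive_loglik(w, "a", ["a", "b"], feature_fn)        # contrastive
```

In this framing, pseudo-likelihood corresponds to per-variable neighbor contrast sets, while a CCG-style algorithm would iteratively grow the contrast set with high-scoring competitor values instead of fixing it in advance.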

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2010
Issuance monographic
Language English

Creators/Contributors

Associated with Vickrey, David Terrell
Associated with Stanford University, Computer Science Department
Primary advisor Koller, Daphne
Thesis advisor Koller, Daphne
Thesis advisor Manning, Christopher D
Thesis advisor Ng, Andrew Y, 1976-

Subjects

Genre Theses

Bibliographic information

Statement of responsibility David Vickrey.
Note Submitted to the Department of Computer Science.
Thesis Thesis (Ph. D.)--Stanford University, 2010.
Location electronic resource

Access conditions

Copyright
© 2010 by David Terrell Vickrey
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
