Inducing event schemas and their participants from unlabeled text


Abstract/Contents

Abstract
The majority of information on the Internet is expressed in written text. Understanding and extracting this information is crucial to building intelligent systems that can organize this knowledge, but most algorithms focus on learning atomic facts and relations. For instance, we can reliably extract facts like "Stanford is a University" and "Professors teach Science" by observing redundant word patterns across a corpus. However, these facts do not capture richer knowledge, such as how detonating a bomb is related to destroying a building, or that a perpetrator who was convicted must first have been arrested. A structured model of these events and entities is needed to understand language across many genres, including news, blogs, and even social media. This dissertation describes a new approach to knowledge acquisition and extraction that learns rich structures of events (e.g., plant, detonate, destroy) and participants (e.g., suspect, target, victim) over a large corpus of news articles, beginning from scratch and without human involvement. Compared with the era of early event models in Natural Language Processing (NLP), such as scripts and frames, modern statistical approaches and advances in NLP now enable new representations and large-scale learning over many domains. This dissertation begins by describing a new model of events and entities called Narrative Event Schemas. A Narrative Event Schema is a collection of events that occur together in the real world, linked by the typical entities involved. I describe the representation itself, followed by a statistical learning algorithm that observes chains of entities repeatedly connecting the same sets of events within documents. The learning process extracts schemas covering thousands of verbs from 14 years of newspaper data. I also present novel contributions to temporal ordering, building classifiers that order the events and infer likely orderings within schemas.
I then present several new evaluations for the extracted knowledge. Finally, I apply Narrative Event Schemas to the field of Information Extraction, learning templates of events with sets of semantic roles. Most Information Extraction approaches assume foreknowledge of the domain's templates, but I instead start from scratch and learn schemas as templates, and then extract the entities from text as in a standard extraction task. My algorithm is the first to learn templates without human guidance, and its results approach those of supervised algorithms.
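The abstract describes schema learning as observing chains of entities that repeatedly connect the same events within documents. The following is a minimal sketch of that idea under loudly stated assumptions, not the dissertation's implementation: each event is represented as a hypothetical (verb, dependency) pair, the entity chains are supplied as toy input rather than produced by a coreference system, and pointwise mutual information (PMI) is assumed as the association score. The abstract does not name the statistic, and the function names `count_events`, `pmi`, and `rank_candidates` are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical representation (assumption): an event is a (verb, dependency) pair,
# e.g. ("arrest", "obj") means "the entity was the object of arrest".
Event = tuple[str, str]


def count_events(documents: list[list[list[Event]]]) -> tuple[Counter, Counter]:
    """Count single events and unordered event pairs that share an entity chain.

    Each document is a list of entity chains; each chain lists the events that one
    coreferring entity participates in. Chains are toy input here (assumption).
    """
    event_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for chains in documents:
        for chain in chains:
            events = sorted(set(chain))  # dedupe so each chain counts a pair once
            event_counts.update(events)
            for a, b in combinations(events, 2):  # a < b because events is sorted
                pair_counts[(a, b)] += 1
    return event_counts, pair_counts


def pmi(a: Event, b: Event, event_counts: Counter, pair_counts: Counter) -> float:
    """Pointwise mutual information of two events (assumed scoring choice)."""
    joint = pair_counts[tuple(sorted((a, b)))]
    if joint == 0:
        return float("-inf")  # the two events never shared an entity
    p_ab = joint / sum(pair_counts.values())
    total_events = sum(event_counts.values())
    p_a = event_counts[a] / total_events
    p_b = event_counts[b] / total_events
    return math.log(p_ab / (p_a * p_b))


def rank_candidates(chain, candidates, event_counts, pair_counts):
    """Rank candidate events by summed PMI with every event already in the chain."""
    scored = [
        (sum(pmi(c, e, event_counts, pair_counts) for e in chain), c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored, reverse=True)]


# Toy corpus (hypothetical): three documents, each a list of entity chains.
docs = [
    [[("arrest", "obj"), ("charge", "obj"), ("convict", "obj")]],
    [[("arrest", "obj"), ("convict", "obj")]],
    [[("arrest", "obj"), ("release", "obj")], [("eat", "subj"), ("cook", "subj")]],
]
events, pairs = count_events(docs)
print(rank_candidates([("arrest", "obj"), ("charge", "obj")],
                      [("cook", "subj"), ("convict", "obj")], events, pairs))
# [('convict', 'obj'), ('cook', 'subj')]
```

With this toy data, `convict` ranks above `cook` as the next event for a chain containing `arrest` and `charge`, because its object repeatedly co-refers with the objects of those verbs, while `cook` never shares an entity with them. Summing PMI over the chain is one simple way to reward events that connect to the whole chain rather than to a single member.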

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2011
Issuance monographic
Language English

Creators/Contributors

Associated with Chambers, Nathanael William
Associated with Stanford University, Computer Science Department
Primary advisor Jurafsky, Dan, 1962-
Thesis advisor Jurafsky, Dan, 1962-
Thesis advisor Manning, Christopher D
Thesis advisor Ng, Andrew Y, 1976-
Advisor Manning, Christopher D
Advisor Ng, Andrew Y, 1976-

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Nathanael William Chambers.
Note Submitted to the Department of Computer Science.
Thesis Ph.D. Stanford University 2011
Location electronic resource

Access conditions

Copyright
© 2011 by Nathanael William Chambers
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
