Inducing event schemas and their participants from unlabeled text

Chambers, Nathanael William; Stanford University, Computer Science Department

https://purl.stanford.edu/qk051hh1569

Abstract/Contents

Abstract: The majority of information on the Internet is expressed in written text. Understanding and extracting this information is crucial to building intelligent systems that can organize this knowledge, but most algorithms focus on learning atomic facts and relations. For instance, we can reliably extract facts like "Stanford is a University" and "Professors teach Science" by observing redundant word patterns across a corpus. However, these facts do not capture richer knowledge like the way detonating a bomb is related to destroying a building, or that the perpetrator who was convicted must have been arrested. A structured model of these events and entities is needed to understand language across many genres, including news, blogs, and even social media. This dissertation describes a new approach to knowledge acquisition and extraction that learns rich structures of events (e.g., plant, detonate, destroy) and participants (e.g., suspect, target, victim) over a large corpus of news articles, beginning from scratch and without human involvement. As opposed to early event models in Natural Language Processing (NLP) such as scripts and frames, modern statistical approaches and advances in NLP now enable new representations and large-scale learning over many domains. This dissertation begins by describing a new model of events and entities called Narrative Event Schemas. A Narrative Event Schema is a collection of events that occur together in the real world, linked by the typical entities involved. I describe the representation itself, followed by a statistical learning algorithm that observes chains of entities repeatedly connecting the same sets of events within documents. The learning process extracts thousands of verbs within schemas from 14 years of newspaper data. I present novel contributions in the field of temporal ordering to build classifiers that order the events and infer likely schema orderings. 
I then present several new evaluations for the extracted knowledge. Finally, I apply Narrative Event Schemas to the field of Information Extraction, learning templates of events with sets of semantic roles. Most Information Extraction approaches assume foreknowledge of the domain's templates, but I instead start from scratch and learn schemas as templates, and then extract the entities from text as in a standard extraction task. My algorithm is the first to learn templates without human guidance, and its results approach those of supervised algorithms.

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2011
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Chambers, Nathanael William
Associated with	Stanford University, Computer Science Department
Primary advisor	Jurafsky, Dan, 1962-
Thesis advisor	Jurafsky, Dan, 1962-
Thesis advisor	Manning, Christopher D
Thesis advisor	Ng, Andrew Y, 1976-

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Nathanael William Chambers.
Note	Submitted to the Department of Computer Science.
Thesis	Ph.D. Stanford University 2011
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
