Learning open domain knowledge from text

Placeholder Show Content

Abstract/Contents

Abstract
The increasing availability of large text corpora holds the promise of acquiring an unprecedented amount of knowledge from this text. However, current techniques are either specialized to particular domains or do not scale to large corpora. This dissertation develops a new technique for learning open-domain knowledge from unstructured web-scale text corpora. A first application aims to capture common sense facts: given a candidate statement about the world and a large corpus of known facts, is the statement likely to be true? We appeal to a probabilistic relaxation of natural logic -- a logic which uses the syntax of natural language as its logical formalism -- to define a search problem from the query statement to its appropriate support in the knowledge base over valid (or approximately valid) logical inference steps. We show a 4x improvement in recall over lemmatized lookup for querying common sense facts, while maintaining above 90% precision. This approach is extended to handle longer, more complex premises by segmenting these utterances into a set of atomic statements entailed through natural logic. We evaluate this system in isolation by using it as the main component in an Open Information Extraction system, and show that it achieves a 3% absolute improvement in F1 compared to prior work on a competitive knowledge base population task. A remaining challenge is elegantly handling cases where we could not find a supporting premise for our query. To address this, we create an analogue of an evaluation function in gameplaying search: a shallow lexical classifier is folded into the search program to serve as a heuristic function to assess how likely we would have been to find a premise. Results on answering 4th grade science questions show that this method improves over both the classifier in isolation and a strong IR baseline, and outperforms prior work on the task.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2016
Issuance monographic
Language English

Creators/Contributors

Associated with Angeli, Gabor Gyorgy
Associated with Stanford University, Department of Computer Science.
Primary advisor Manning, Christopher D
Thesis advisor Manning, Christopher D
Thesis advisor Jurafsky, Dan, 1962-
Thesis advisor Liang, Percy
Advisor Jurafsky, Dan, 1962-
Advisor Liang, Percy

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Gábor György Angeli.
Note Submitted to the Department of Computer Science.
Thesis Thesis (Ph.D.)--Stanford University, 2016.
Location electronic resource

Access conditions

Copyright
© 2016 by Gabor Gyorgy Angeli
License
This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).

Also listed in

Loading usage metrics...