Natural language interfaces for semi-structured web pages

Pasupat, Panupong

Abstract/Contents

Abstract: Question answering (QA) systems take natural language questions and then compute answers based on a knowledge source. This dissertation focuses on improving QA systems along two axes. First, instead of operating on knowledge sources with a fixed schema such as a database, we propose to use web pages, which contain a large amount of up-to-date open-domain information (high BREADTH), as the knowledge source. Second, we want the QA system to understand more complex questions and perform different types of multistep reasoning to compute the answer (high DEPTH). Unlike most previous work on retrieval-based QA (which operates on open-domain unstructured text but targets only factoid questions) and knowledge-based QA (which can handle compositional questions but on knowledge sources with fixed schemata), we aim to address the two axes simultaneously. One important aspect of web pages is that they are semi-structured: they contain structural constructs such as tables and template-generated product listings, but the schemata of such structures are not known in advance by the QA system. To explore the semi-structured nature of web pages, we first investigate the task of extracting a list of entities from a web page based on a natural language specification (e.g., from "(What are) hiking trails near Baltimore", extract the trail names from a table column). Then, to increase the complexity of the questions, we study the task of answering complex questions on open-domain semi-structured web tables using question-answer pairs as supervision (e.g., answering "Where did the last 1st place finish occur?" in an athlete's statistics table). To handle compositional questions with different types of operations, we frame the task as learning a semantic parser, which maps questions into compositional logical forms that can be executed to get the answer. Our semantic parser can answer complex questions on unseen web tables and achieves an accuracy of 43.7% on the dataset.
Overall, we show that while the unknown schema of the tables (increased BREADTH) and the complexity of the questions (increased DEPTH) lead to an exploding search space of logical forms, our proposed methods control the search space to a manageable size, enabling us to train a QA system that can operate on open-domain web pages. The resulting QA system can potentially enable virtual assistants, search engines, and other similar products to handle a much wider range of user utterances.
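
Illustrative sketch (not part of the dissertation record): the abstract describes mapping a question to a compositional logical form that is executed against a table. The Python below is a minimal toy version of that idea for the example question "Where did the last 1st place finish occur?". The table contents, the column names, and the helpers `select`, `last`, `project`, and `compose` are all hypothetical, and the hand-written logical form stands in for what a learned semantic parser would produce; it is not the dissertation's actual logical-form language or method.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[list], list]

# Hypothetical athlete statistics table; all values are made up for illustration.
TABLE: List[Row] = [
    {"Year": "2001", "Venue": "City A", "Position": "1st"},
    {"Year": "2003", "Venue": "City B", "Position": "2nd"},
    {"Year": "2005", "Venue": "City C", "Position": "1st"},
    {"Year": "2007", "Venue": "City D", "Position": "3rd"},
]


def select(column: str, value: str) -> Step:
    """Keep only the rows whose cell in `column` equals `value`."""
    return lambda rows: [r for r in rows if r[column] == value]


def last() -> Step:
    """Keep only the final row in table order (empty input stays empty)."""
    return lambda rows: rows[-1:]


def project(column: str) -> Step:
    """Return the cell values in `column` for the remaining rows."""
    return lambda rows: [r[column] for r in rows]


def compose(*steps: Step) -> Step:
    """Chain the steps left to right into one executable logical form."""
    def run(table: list) -> list:
        result = table
        for step in steps:
            result = step(result)
        return result
    return run


# Hand-written logical form for "Where did the last 1st place finish occur?"
logical_form = compose(select("Position", "1st"), last(), project("Venue"))
answer = logical_form(TABLE)
print(answer)
```

Because each operation is a small function over rows, the same pieces can be recombined for other questions on tables with different columns. That flexibility is also why the number of candidate logical forms grows quickly as schemas and question complexity vary, which is the search-space problem the abstract mentions.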

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2019; ©2019
Publication date	2019; 2019
Issuance	monographic
Language	English

Creators/Contributors

Author	Pasupat, Panupong
Degree supervisor	Liang, Percy
Thesis advisor	Liang, Percy
Thesis advisor	Jurafsky, Dan, 1962-
Thesis advisor	Manning, Christopher D
Degree committee member	Jurafsky, Dan, 1962-
Degree committee member	Manning, Christopher D
Associated with	Stanford University, Computer Science Department.

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Panupong Pasupat.
Note	Submitted to the Computer Science Department.
Thesis	Thesis Ph.D. Stanford University 2019.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
