Natural language interfaces for semi-structured web pages
Abstract/Contents
- Abstract
- Question answering (QA) systems take natural language questions and then compute answers based on a knowledge source. This dissertation focuses on improving QA systems along two axes. First, instead of operating on knowledge sources with a fixed schema such as a database, we propose to use web pages, which contain a large amount of up-to-date open-domain information (high BREADTH), as the knowledge source. Second, we want the QA system to understand more complex questions and perform different types of multistep reasoning to compute the answer (high DEPTH). Unlike most previous work on retrieval-based QA (which operates on open-domain unstructured text but targets only factoid questions) and knowledge-based QA (which can handle compositional questions but only on knowledge sources with fixed schemata), we aim to address the two axes simultaneously. One important aspect of web pages is that they are semi-structured: they contain structural constructs such as tables and template-generated product listings, but the schemata of such structures are not known in advance by the QA system. To explore the semi-structured nature of web pages, we first investigate the task of extracting a list of entities from a web page based on a natural language specification (e.g., from "(What are) hiking trails near Baltimore", extract the trail names from a table column). Then, to increase the complexity of the questions, we study the task of answering complex questions on open-domain semi-structured web tables using question-answer pairs as supervision (e.g., answering "Where did the last 1st place finish occur?" in an athlete's statistics table). To handle compositional questions with different types of operations, we frame the task as learning a semantic parser, which maps questions into compositional logical forms that can be executed to get the answer. Our semantic parser can answer complex questions on unseen web tables and achieves an accuracy of 43.7% on the dataset.
Overall, we show that while the unknown schema of the tables (increased BREADTH) and the complexity of the questions (increased DEPTH) lead to an exploding search space of logical forms, our proposed methods control the search space to a manageable size, enabling us to train a QA system that can operate on open-domain web pages. The resulting QA system can potentially enable virtual assistants, search engines, and other similar products to handle a much wider range of user utterances.
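As a rough illustration of what "compositional logical forms that can be executed to get the answer" might look like, the sketch below runs a hand-written logical form for the abstract's example question, "Where did the last 1st place finish occur?", against a small table. Everything in it is an assumption made for illustration: the table rows are invented, and the operator names (`filter_eq`, `last`, `project`, `execute`) are hypothetical and are not taken from the dissertation, which may use a different formalism. The sketch also skips the learning step entirely; a real semantic parser would have to search for and score such logical forms from question-answer pairs.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Rows = List[Row]

# Hypothetical athlete statistics table; the rows are invented for illustration.
TABLE: Rows = [
    {"Year": 2001, "Venue": "Edmonton", "Position": "1st"},
    {"Year": 2003, "Venue": "Paris", "Position": "3rd"},
    {"Year": 2005, "Venue": "Helsinki", "Position": "1st"},
    {"Year": 2007, "Venue": "Osaka", "Position": "2nd"},
]


def filter_eq(column: str, value: object) -> Callable[[Rows], Rows]:
    """Keep only the rows whose cell in `column` equals `value`."""
    return lambda rows: [r for r in rows if r[column] == value]


def last() -> Callable[[Rows], Rows]:
    """Keep only the final row, in table order (empty input stays empty)."""
    return lambda rows: rows[-1:]


def project(column: str) -> Callable[[Rows], List[object]]:
    """Return the cell values of `column` for the remaining rows."""
    return lambda rows: [r[column] for r in rows]


def execute(logical_form: list, table: Rows) -> list:
    """Apply each operator in sequence, feeding each result to the next."""
    result = table
    for op in logical_form:
        result = op(result)
    return result


# One candidate logical form for "Where did the last 1st place finish occur?":
# select rows with Position == "1st", take the last one, read its Venue.
LOGICAL_FORM = [filter_eq("Position", "1st"), last(), project("Venue")]

answer = execute(LOGICAL_FORM, TABLE)
print(answer)  # ['Helsinki']
```

Because the operators refer to columns by name rather than to a fixed schema, the same three building blocks can be recombined for other tables and questions. That flexibility is also what makes the space of candidate logical forms grow so quickly.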
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | ©2019 |
Publication date | 2019 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Pasupat, Panupong | |
---|---|---|
Degree supervisor | Liang, Percy | |
Thesis advisor | Liang, Percy | |
Thesis advisor | Jurafsky, Dan, 1962- | |
Thesis advisor | Manning, Christopher D | |
Degree committee member | Jurafsky, Dan, 1962- | |
Degree committee member | Manning, Christopher D | |
Associated with | Stanford University, Computer Science Department. |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Panupong Pasupat. |
---|---|
Note | Submitted to the Computer Science Department. |
Thesis | Thesis Ph.D. Stanford University 2019. |
Location | electronic resource |
Access conditions
- Copyright
- © 2019 by Panupong Pasupat
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).