Noising and denoising natural language
Abstract/Contents
- Abstract
- Written communication is a crucial human activity, and one that has increasingly been altered by technology, including Artificial Intelligence. In this thesis, we describe an application of AI to build computational writing assistants that suggest edits to text, with the aim of correcting errors, improving fluency, and modifying style. Existing AI writing assistants are highly limited in their ability to provide useful suggestions due to the challenge of data sparsity. When state-of-the-art neural sequence transduction models are fed inputs that do not match the training distribution, low-quality and noninterpretable outputs often result. Thus such models must be trained on large parallel corpora—often on the order of millions of sentence pairs—to attain good performance, and even then, they continue to improve with more data. With the end goal of developing more effective AI writing assistants, this thesis addresses the challenge of data sparsity by investigating the effects and applications of noise in the sequence modeling setting. We begin with a theoretical analysis of the effects of sequence-level noise that illustrates the insufficiency of existing approaches for understanding and modeling such noise. With this understanding in place, we develop a method for synthesizing more diverse and realistic noise in natural language, thus remedying the need for parallel data for the task of "denoising" or suggesting edits to writing. To demonstrate the broader applicability of this method, we describe an extension to generate stylistic edits without parallel data between different styles. We close by describing an AI writing assistant that we deployed to validate the methods proposed in this thesis, along with findings to improve such AI assistants in production.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2018; ©2018 |
Publication date | 2018 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Xie, Ziang |
---|---|
Degree supervisor | Jurafsky, Dan, 1962- |
Degree supervisor | Ng, Andrew Y, 1976- |
Thesis advisor | Jurafsky, Dan, 1962- |
Thesis advisor | Ng, Andrew Y, 1976- |
Thesis advisor | Savarese, Silvio |
Degree committee member | Savarese, Silvio |
Associated with | Stanford University, Computer Science Department. |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Ziang Xie. |
---|---|
Note | Submitted to the Computer Science Department. |
Thesis | Thesis Ph.D. Stanford University 2018. |
Location | electronic resource |
Access conditions
- Copyright
- © 2018 by Ziang Xie
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).