Noising and denoising natural language

Abstract/Contents

Abstract
Written communication is a crucial human activity, and one that has increasingly been altered by technology, including Artificial Intelligence. In this thesis, we describe an application of AI to build computational writing assistants that suggest edits to text, with the aim of correcting errors, improving fluency, and modifying style. Existing AI writing assistants are highly limited in their ability to provide useful suggestions due to the challenge of data sparsity. When state-of-the-art neural sequence transduction models are fed inputs that do not match the training distribution, they often produce low-quality and uninterpretable outputs. Thus, such models must be trained on large parallel corpora (often on the order of millions of sentence pairs) to attain good performance, and even then, they continue to improve with more data. With the end goal of developing more effective AI writing assistants, this thesis addresses the challenge of data sparsity by investigating the effects and applications of noise in the sequence modeling setting. We begin with a theoretical analysis of the effects of sequence-level noise that illustrates the insufficiency of existing approaches for understanding and modeling such noise. With this understanding in place, we develop a method for synthesizing more diverse and realistic noise in natural language, thus addressing the lack of parallel data for the task of "denoising," or suggesting edits to writing. To demonstrate the broader applicability of this method, we describe an extension that generates stylistic edits without parallel data between different styles. We close by describing an AI writing assistant that we deployed to validate the methods proposed in this thesis, along with findings on improving such assistants in production.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2018; ©2018
Publication date 2018
Issuance monographic
Language English

Creators/Contributors

Author Xie, Ziang
Degree supervisor Jurafsky, Dan, 1962-
Degree supervisor Ng, Andrew Y, 1976-
Thesis advisor Jurafsky, Dan, 1962-
Thesis advisor Ng, Andrew Y, 1976-
Thesis advisor Savarese, Silvio
Degree committee member Savarese, Silvio
Associated with Stanford University, Computer Science Department.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Ziang Xie.
Note Submitted to the Computer Science Department.
Thesis Thesis Ph.D. Stanford University 2018.
Location electronic resource

Access conditions

Copyright
© 2018 by Ziang Xie
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
