Efficient and scalable transfer learning for natural language processing


Abstract/Contents

Abstract
Neural networks work best when trained on large amounts of data, but most labeled datasets in natural language processing (NLP) are small. As a result, neural NLP models often overfit to idiosyncrasies and artifacts in their training data rather than learning generalizable patterns. Transfer learning offers a solution: instead of learning a single task from scratch and in isolation, the model can benefit from the wealth of text on the web or from other tasks with rich annotations. This additional data enables the training of bigger, more expressive networks. However, it also dramatically increases the computational cost of training, with recent models taking up to hundreds of GPU-years to train. To alleviate this cost, I develop transfer learning methods that learn much more efficiently than previous approaches while remaining highly scalable. First, I present a multi-task learning algorithm based on knowledge distillation that consistently improves over single-task training even when learning many diverse tasks. I next develop Cross-View Training, which revitalizes semi-supervised learning methods from the statistical era of NLP (self-training and co-training) while taking advantage of neural methods. The resulting models outperform pre-trained LSTM language models such as ELMo while training 10x faster. Lastly, I present ELECTRA, a self-supervised pre-training method for transformer networks based on energy-based models. ELECTRA learns 4x–10x faster than previous approaches such as BERT, resulting in excellent performance on natural language understanding tasks both when trained at large scale and even when trained on a single GPU.
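As background for the ELECTRA method summarized above, the following is a minimal sketch (not code from the thesis) of the replaced-token-detection loss that an ELECTRA-style discriminator optimizes, assuming PyTorch; the function and tensor names are illustrative only.

```python
import torch
import torch.nn.functional as F

def replaced_token_detection_loss(disc_logits, original_ids, corrupted_ids):
    """Binary cross-entropy over every position: was this token replaced?

    disc_logits:   [batch, seq_len] discriminator scores, one per token
    original_ids:  [batch, seq_len] token ids of the original text
    corrupted_ids: [batch, seq_len] token ids after a small generator has
                   filled in a subset of masked positions with plausible
                   alternatives
    """
    # Label 1 where the generator changed the token, 0 where it is original.
    labels = (corrupted_ids != original_ids).float()
    return F.binary_cross_entropy_with_logits(disc_logits, labels)
```

Unlike masked language modeling, this objective provides a learning signal at every input position rather than only at the masked ones, which is the main source of the sample-efficiency gains claimed in the abstract.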

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2021; ©2021
Publication date 2021
Issuance monographic
Language English

Creators/Contributors

Author Clark, Kevin Stefan
Degree supervisor Manning, Christopher D
Thesis advisor Manning, Christopher D
Thesis advisor Jurafsky, Dan, 1962-
Thesis advisor Le, Quoc V
Degree committee member Jurafsky, Dan, 1962-
Degree committee member Le, Quoc V
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Kevin Stefan Clark.
Note Submitted to the Computer Science Department.
Thesis Thesis (Ph.D.)--Stanford University, 2021.
Location https://purl.stanford.edu/nq750zq8333

Access conditions

Copyright
© 2021 by Kevin Stefan Clark
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
