Pre-finetuning methods for domain and task adaptation, with applications to discourse and translation

Abstract/Contents

Abstract
A recently adopted standard technique for training models for NLP applications is to pretrain a model on a large generic dataset and then finetune it on the target domain training data. However, the pretraining task often differs greatly from the target task: target tasks can use data from different domains, incorporate custom labels, and require modifications to the base model, such as adding parameters. Performance on downstream tasks can be improved by bridging the gap between the pretraining and finetuning phases. An intermediate phase, called pre-finetuning, can be used to adapt to the target domain prior to finetuning. Introducing this intermediate phase incorporates domain knowledge into the model, resulting in better performance after finetuning. We study applications to discourse and translation, two settings where high-quality in-domain labels are expensive to acquire.

We present a theoretical framework for language model adaptation and three applications of pre-finetuning, each an exemplar of pre-finetuning focused on the data, the model, and the annotations, respectively: (1) data selection for machine translation, (2) sentence-level objectives for the discourse performance of language models, and (3) weakly-supervised discourse relation recognition.

We present empirical results on data selection for neural machine translation and show that pre-finetuning on the subset of pretraining data that is most similar to the target domain improves the performance of the final model. However, while trivially selecting the most similar data improves performance, the optimal setting requires finding data that complements the target domain data rather than mirroring it. Then, we present Conpono, a novel objective introduced during pre-finetuning. This inter-sentence objective models discourse coherence and the distance between sentences. We show that by pre-finetuning a pretrained language model with Conpono, the model improves on the previous state of the art on discourse representation evaluation benchmarks. Lastly, we introduce DiscoMtB, a discourse representation learning method that discovers discourse structure which serves as pseudo-labels in a text corpus. These weakly-supervised discourse relations are used both to create new pre-finetuning training data for sentence relation classification tasks and to augment text generation models with an interpretable knob for controlling generation, introducing more diversity into the generation space while maintaining discourse coherence in the generated text.

This dissertation presents the benefits of a three-phase training process through three applications that bridge the gap between pretraining and finetuning during pre-finetuning by adapting the data, the model architecture, and the labels, respectively.
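To make the data-selection idea concrete, the following is a minimal, illustrative Python sketch of similarity-based selection for a pre-finetuning phase. It is not the dissertation's actual method: the bag-of-words cosine similarity, the top-k cutoff, and all function names are placeholder assumptions, used only to show the overall shape of the pipeline (pretrain, select target-similar data, pre-finetune, then finetune).

# Illustrative sketch only: rank pretraining examples by similarity to the
# target domain and keep the top k for an intermediate pre-finetuning phase.
# The similarity function and cutoff here are stand-in assumptions, not the
# dissertation's method.
from collections import Counter
import math


def unigram_vector(text: str) -> Counter:
    """Represent a text as a bag-of-words unigram count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_pre_finetuning_data(pretraining_corpus, target_domain_corpus, k):
    """Keep the k pretraining examples most similar to the target domain."""
    # Build a single centroid-like profile of the target domain.
    target_profile = Counter()
    for text in target_domain_corpus:
        target_profile.update(unigram_vector(text))

    scored = [
        (cosine_similarity(unigram_vector(text), target_profile), text)
        for text in pretraining_corpus
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


if __name__ == "__main__":
    generic = [
        "the cat sat on the mat",
        "stock prices rose sharply today",
        "the patient was given a new medication",
    ]
    medical = ["the medication reduced symptoms in most patients"]
    # The selected subset would be used for pre-finetuning before finetuning
    # on the in-domain (here, medical) training data.
    print(select_pre_finetuning_data(generic, medical, k=1))

As the abstract notes, the optimal selection complements the target-domain data rather than simply mirroring it, so a pure top-k-by-similarity rule like the one above should be read as a starting point rather than the recommended configuration.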

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2022; ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Iter, Dan
Degree supervisor Jurafsky, Dan, 1962-
Thesis advisor Jurafsky, Dan, 1962-
Thesis advisor Hashimoto, Tatsunori
Thesis advisor Liang, Percy
Degree committee member Hashimoto, Tatsunori
Degree committee member Liang, Percy
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Dan Iter.
Note Submitted to the Computer Science Department.
Thesis Thesis Ph.D. Stanford University 2022.
Location https://purl.stanford.edu/xn665xc5858

Access conditions

Copyright
© 2022 by Dan Iter
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
