Inner product matrix algorithms for transfer and adaptation of language representations


Abstract/Contents

Abstract
Language is our primary mode of communication. We use language to speak to one another, represent knowledge, interact with and program computers, and express ourselves. As a result, teaching computers to understand language has been a cornerstone task in Artificial Intelligence for many decades. The fundamental challenge in this field has been one of representation - how do we convert language, an arbitrary composition of discrete symbols, into the mathematical structures, vectors and matrices, with which Artificial Intelligence systems operate? Recent breakthroughs in learning this mapping, known as representation learning, have resulted in models that achieve human parity on a number of language tasks - translation, question answering, and text generation, to name a few. Furthermore, these models have influenced and unified perceptual tasks in the fields of vision, acoustics, and decision making, advancing our ability to build intelligent systems.

In this work, we design and analyze algorithms for adapting language representations - the mapping from language as we perceive it to the vectors and matrices that computers can understand. How do we teach computers to convert words, sentences, and documents to numerical form? Naturally, one may expect that a good conversion of language to numbers would preserve similarity: similar language such as "hotel" and "inn" should be represented by nearby numbers, and dissimilar language such as "good" and "bad" by distant ones. The field of language representation learning seeks to answer these questions by encoding language as vectors and matrices; in this setting, similarity and dissimilarity are mathematically represented by inner products between vectors. Two recent neural network based models for learning representations of language, word embeddings and transformers, have led to breakthroughs by encoding these similarities and dissimilarities using large unstructured text corpora from the Internet. However, some fundamental challenges remain. We develop algorithms that allow computational models to adapt language representations to different domains, languages, and modalities - a line of work formally known as domain adaptation and transfer learning. This enables a single Artificial Intelligence model to understand legal reports, medical documents, financial due diligence, text from various languages, and even programming code. The unifying theme of the algorithms discussed is that they operate on the inner products between language inputs - the source of encoded information in these representations.

The initial works in this thesis focus on domain adaptation, which allows models to understand language specific to individual communities - such as that used by engineers, doctors, or lawyers. The first algorithm we discuss establishes a metric equivalence between the Frobenius distance between the Gram matrices of two sets of representations and the residual of the Orthogonal Procrustes problem. The former metric, the Global Anchor Method, is more general as it can be applied regardless of dimensionality. We highlight the benefits of this algorithm in adapting a conversational agent to perform diagnostics and troubleshooting in the networking domain. Next, we describe methods for domain adaptation of transformer language model tokenizers and highlight applications for domain-specific text classification.
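To make the metric equivalence above concrete, here is a minimal sketch in illustrative notation of our own (the symbols A, P, X, Y, and Q are not taken from the thesis): for two embedding matrices X and Y over a shared vocabulary, the Global Anchor metric compares Gram (inner product) matrices, while the Orthogonal Procrustes residual compares the embeddings up to an orthogonal transform.

\[
\mathcal{A}(X, Y) = \bigl\lVert X X^{\top} - Y Y^{\top} \bigr\rVert_{F},
\qquad
\mathcal{P}(X, Y) = \min_{Q^{\top} Q = I} \bigl\lVert X Q - Y \bigr\rVert_{F}.
\]

Under this reading, the equivalence says that the two quantities bound one another up to constant factors, and the Gram-matrix form remains well defined even when X and Y have different embedding dimensions.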
The latter two works in this thesis detail recent approaches in transfer learning for connecting, or aligning, representations from different types of inputs, such as text from different languages or correspondences between natural and programming languages. The first work in this line of research discusses algorithms for mapping sets of representations to a common inner product space. We propose a supervised algorithm, Filtered Inner Product Projection, which operates on pairwise inner products and aligns a source and target embedding on common pieces of information. This approach is shown to provide state-of-the-art performance on word translation retrieval tasks. Lastly, we discuss work that utilizes inner product matrices to assign batches in contrastive learning - a scalable framework commonly used to connect representations trained on different input modalities. In this work, we show that an upper bound on the gap between the total and observed losses in standard contrastive learning settings can be relaxed to a Matrix Bandwidth Minimization problem. An efficient algorithm using bandwidth minimization heuristics, Tail Batch Sampling, is then designed; it is shown to reduce the gap between total and observed contrastive losses and obtains state-of-the-art results in both sentence embedding and code search tasks.
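As an illustration of how inner product matrices can drive batch assignment, the following is a small, hedged sketch of a bandwidth-minimization heuristic for grouping similar examples into the same contrastive batch. It uses a standard Reverse Cuthill-McKee reordering and an illustrative function name (bandwidth_based_batches); it is our own construction under those assumptions, not the thesis's Tail Batch Sampling procedure.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth_based_batches(embeddings, batch_size, k=10):
    # Illustrative sketch (not the thesis's Tail Batch Sampling implementation):
    # reorder examples so that pairs with large inner products lie near the
    # diagonal of the similarity matrix, then slice the ordering into batches.
    n = embeddings.shape[0]
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T                                  # Gram (inner product) matrix
    neighbours = np.argsort(-sims, axis=1)[:, 1:k + 1]    # top-k neighbours per row, skipping self
    rows = np.repeat(np.arange(n), k)
    cols = neighbours.ravel()
    graph = csr_matrix((np.ones(n * k), (rows, cols)), shape=(n, n))
    graph = graph.maximum(graph.T)                        # symmetrize the k-nearest-neighbour graph
    # Reverse Cuthill-McKee is a classical bandwidth-reduction heuristic: it permutes
    # the matrix so that its nonzeros (here, highly similar pairs) cluster near the diagonal.
    order = np.asarray(reverse_cuthill_mckee(graph, symmetric_mode=True))
    return [order[i:i + batch_size] for i in range(0, n, batch_size)]

For example, passing sentence embeddings for a corpus with a batch size of 256 would return index groups in which semantically similar examples tend to co-occur, concentrating hard negative pairs within batches and thereby narrowing the gap between the observed in-batch loss and the total loss.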

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2022; ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Sachidananda, Vinayak
Degree supervisor Prabhakar, Balaji, 1967-
Thesis advisor Prabhakar, Balaji, 1967-
Thesis advisor Weissman, Tsachy
Thesis advisor Zhu, Chenguang (Computer scientist)
Degree committee member Weissman, Tsachy
Degree committee member Zhu, Chenguang (Computer scientist)
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Vinayak Sachidananda.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2022.
Location https://purl.stanford.edu/xj717fb0201

Access conditions

Copyright
© 2022 by Vinayak Sachidananda
