Learned Syntax Aware Embeddings Using Equivariant Graph Neural Networks

Abstract/Contents

Abstract
In this paper, we aim to improve foundational model embeddings by explicitly leveraging syntactic information. Current foundational models do not explicitly model structural information, which can be problematic in domains such as biochemistry. We model this syntactic information as a graph in order to auto-encode domain-specific information into the large model embeddings. In particular, we implement an equivariant GNN version of the SIWR model that treats a given sequence as a vector backbone, with the goal of further improving downstream performance. We pose a generalized learning framework for Syntactically Aware Embeddings (SAEs) that extend across learning domains. We pretrain SAE models on a small amount of data (~30,000 samples) and evaluate downstream performance in two domains: NLP (GLUE and CoNLL benchmarks) and biochemistry (SMP). In the NLP domain, SAEs outperform baseline embeddings on 10/11 tasks. In the biochemical domain, we observe improvements on 11/20 tasks. SAEs show particular promise relative to baselines in settings with limited fine-tuning.

Description

Type of resource text
Publication date May 19, 2023; May 10, 2023

Creators/Contributors

Author Soni, Pratham https://orcid.org/0000-0001-7076-9817 (unverified)
Advisor Dror, Ron

Subjects

Subject Embeddings
Subject graph neural networks
Subject Natural language processing (Computer science)
Subject Biochemical engineering > Computer simulation
Genre Text
Genre Thesis

Access conditions

Use and reproduction
User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
License
This work is licensed under a Creative Commons Attribution Non Commercial 4.0 International license (CC BY-NC).

Preferred citation

Soni, P. (2023). Learned Syntax Aware Embeddings Using Equivariant Graph Neural Networks. Stanford Digital Repository. Available at https://purl.stanford.edu/pm430pg8453. https://doi.org/10.25740/pm430pg8453.

Collection

Undergraduate Theses, School of Engineering