Learned Syntax Aware Embeddings Using Equivariant Graph Neural Networks


Abstract/Contents

Abstract: In this paper, we aim to improve foundation model embeddings by explicitly leveraging syntactic information. Current foundation models do not explicitly model structural information, which can be problematic in certain domains such as biochemistry. We model this syntactic information as a graph to auto-encode domain-specific information into the large model embeddings. In particular, we implement an Equivariant GNN version of the SIWR model to further improve downstream performance by considering a given sequence as a vector backbone. We propose a generalized learning framework for Syntactically Aware Embeddings (SAEs) that extends across learning domains. We pretrain SAE models using a small amount of data (~30,000 samples) and test downstream performance on a variety of learning domains and tasks. We test SAEs on the NLP (GLUE and CoNLL benchmarks) and biochemical (SMP) domains. In the NLP domain, the SAEs outperform baseline embeddings on 10/11 tasks. In the biochemical domain, we observe improvements in 11/20 tasks. SAEs show particular promise in settings with limited fine-tuning when compared to baselines.

Type of resource	text
Publication date	May 19, 2023; May 10, 2023

Author	Soni, Pratham	https://orcid.org/0000-0001-7076-9817 (unverified)
Advisor	Dror, Ron

Subject	Embeddings
Subject	graph neural networks
Subject	Natural language processing (Computer science)
Subject	Biochemical engineering > Computer simulation
Genre	Text
Genre	Thesis

DOI	https://doi.org/10.25740/pm430pg8453
Location	https://purl.stanford.edu/pm430pg8453

Use and reproduction: User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
License: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC).

Preferred citation: Soni, P. (2023). Learned Syntax Aware Embeddings Using Equivariant Graph Neural Networks. Stanford Digital Repository. Available at https://purl.stanford.edu/pm430pg8453. https://doi.org/10.25740/pm430pg8453.

Undergraduate Theses, School of Engineering
