Learned Syntax Aware Embeddings Using Equivariant Graph Neural Networks
Abstract/Contents
- Abstract
- In this paper, we aim to improve foundation model embeddings by explicitly leveraging syntactic information. Current foundation models do not explicitly model structural information, which can be problematic in certain domains such as biochemistry. We represent this syntactic information as a graph and use it to auto-encode domain-specific information into the foundation model's embeddings. In particular, we implement an equivariant GNN version of the SIWR model, which treats a given sequence as a vector backbone, to further improve downstream performance. We propose a generalized learning framework for Syntactically Aware Embeddings (SAEs) that extends across learning domains. We pretrain SAE models on a small amount of data (~30,000 samples) and evaluate downstream performance on a variety of domains and tasks: natural language processing (the GLUE and CoNLL benchmarks) and biochemistry (SMP). In the NLP domain, SAEs outperform baseline embeddings on 10 of 11 tasks; in the biochemical domain, they improve on 11 of 20 tasks. SAEs show particular promise relative to baselines in settings with limited fine-tuning.
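The abstract does not spell out the architecture, so the following is only a rough illustration of what an "equivariant GNN" over a sequence "vector backbone" can look like. It is a minimal NumPy sketch of one E(n)-equivariant message-passing layer in the style of the EGNN layer of Satorras et al. (2021), a generic technique and not the thesis's SIWR-based model. The function names (`egnn_layer`, `init_egnn_layer`), the chain-shaped backbone graph, the extra "syntactic" edge, and all dimensions are hypothetical choices made for the example.

```python
import numpy as np

def init_mlp(rng, d_in, d_hidden, d_out):
    """Random weights for a two-layer perceptron (hypothetical initialization)."""
    return (rng.normal(scale=0.1, size=(d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(scale=0.1, size=(d_hidden, d_out)), np.zeros(d_out))

def mlp(params, z):
    """Two-layer perceptron applied over the last axis: tanh(z W1 + b1) W2 + b2."""
    w1, b1, w2, b2 = params
    return np.tanh(z @ w1 + b1) @ w2 + b2

def init_egnn_layer(rng, d_h, d_m=16):
    return {
        "edge": init_mlp(rng, 2 * d_h + 1, d_m, d_m),  # phi_e(h_i, h_j, ||x_i - x_j||^2)
        "coord": init_mlp(rng, d_m, d_m, 1),           # phi_x(m_ij) -> scalar weight
        "node": init_mlp(rng, d_h + d_m, d_m, d_h),    # phi_h(h_i, sum_j m_ij)
    }

def egnn_layer(params, h, x, adj):
    """One E(n)-equivariant message-passing step (sketch).

    h:   (n, d_h) invariant node features, e.g. token embeddings
    x:   (n, d_x) node coordinates, e.g. a hypothetical "vector backbone"
    adj: (n, n) 0/1 adjacency matrix of the graph, no self-loops
    """
    n, d_h = h.shape
    diff = x[:, None, :] - x[None, :, :]               # (n, n, d_x): x_i - x_j
    dist2 = np.sum(diff ** 2, axis=-1, keepdims=True)  # (n, n, 1): rotation/translation invariant
    h_i = np.broadcast_to(h[:, None, :], (n, n, d_h))
    h_j = np.broadcast_to(h[None, :, :], (n, n, d_h))
    m = mlp(params["edge"], np.concatenate([h_i, h_j, dist2], axis=-1))
    m = m * adj[:, :, None]                            # keep messages on graph edges only
    # Coordinate update: invariant weights times relative vectors, hence equivariant.
    w = mlp(params["coord"], m) * adj[:, :, None]      # (n, n, 1)
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    x_new = x + np.sum(diff * w, axis=1) / deg
    # Feature update uses only invariant quantities, hence invariant.
    h_new = h + mlp(params["node"], np.concatenate([h, m.sum(axis=1)], axis=-1))
    return h_new, x_new

rng = np.random.default_rng(0)
n, d_h, d_x = 5, 8, 3
h = rng.normal(size=(n, d_h))
x = rng.normal(size=(n, d_x))
adj = np.zeros((n, n))
for i in range(n - 1):                 # sequence backbone: token i <-> token i+1
    adj[i, i + 1] = adj[i + 1, i] = 1.0
adj[0, 3] = adj[3, 0] = 1.0            # hypothetical extra syntactic edge
params = init_egnn_layer(rng, d_h)
h1, x1 = egnn_layer(params, h, x, adj)

# Equivariance check: rotate and translate the input coordinates.
q, _ = np.linalg.qr(rng.normal(size=(d_x, d_x)))  # random orthogonal matrix
t = rng.normal(size=(1, d_x))
h2, x2 = egnn_layer(params, h, x @ q + t, adj)
print(np.allclose(h1, h2), np.allclose(x1 @ q + t, x2))  # True True
```

Because each node's coordinates move only along relative vectors x_i - x_j, scaled by weights computed from rotation- and translation-invariant inputs, rotating or translating the input coordinates transforms the output coordinates the same way and leaves the node features unchanged; the last line checks this numerically.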
Description
Field | Value |
---|---|
Type of resource | text |
Publication date | May 19, 2023; May 10, 2023 |
Creators/Contributors
Role | Name | ORCID |
---|---|---|
Author | Soni, Pratham | https://orcid.org/0000-0001-7076-9817 (unverified) |
Advisor | Dror, Ron | |
Subjects
Type | Term |
---|---|
Subject | Embeddings |
Subject | graph neural networks |
Subject | Natural language processing (Computer science) |
Subject | Biochemical engineering > Computer simulation |
Genre | Text |
Genre | Thesis |
Bibliographic information
Access conditions
- Use and reproduction
- User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
- License
- This work is licensed under a Creative Commons Attribution Non Commercial 4.0 International license (CC BY-NC).
Preferred citation
- Soni, P. (2023). Learned Syntax Aware Embeddings Using Equivariant Graph Neural Networks. Stanford Digital Repository. Available at https://purl.stanford.edu/pm430pg8453. https://doi.org/10.25740/pm430pg8453.
Collection
Undergraduate Theses, School of Engineering
Contact information
- Contact
- prathams@stanford.edu