Centering the Voices of First-Generation Immigrant Youth: Multilingual NLP Methods in the Translanguaging Context
Abstract
- Translanguaging, the practice of using multiple languages within a single utterance (e.g., within a sentence or even a word), is a global phenomenon in multilingual communities. In the United States, translanguaging occurs frequently among Latin American immigrant communities. Although there are several large multilingual models, such as XLM-RoBERTa and multilingual BERT, these models have been trained and evaluated on parallel monolingual data. Upholding parallel monolingualism as the standard definition of multilingualism erases the language practices of many communities of color, including Latin American immigrants in the United States. The consequences are even more severe for racialized children in the school system, who may be labeled English Language Learners (ELL) on the very premise that their fluency in multiple languages must be kept separate. The ELL label has immediate consequences for the classes these students can later access, as well as for their sentiment toward and through their own language practices. Moreover, there is currently no labeled NLP dataset that includes Spanish-English translanguaging for the task of sentiment analysis. In collaboration with the Stanford Graduate School of Education, this research aims to center the voices of first-generation Indigenous Latin American immigrant students in NLP research through the task of sentiment analysis. Specifically, this thesis constructs the Interview Transcripts Dataset, an innovative trilingual dataset of transcribed interviews that contain instances of translanguaging, and develops a framework for building such datasets. The findings provide a promising starting point and emphasize the need both to leverage current pre-trained models on similar domains and to develop a more robust, large-scale dataset that centers translanguaging. Ultimately, translanguaging remains an open problem for NLP research tasks.
Description
Type of resource | text |
---|---|
Date modified | December 5, 2022 |
Publication date | June 8, 2022; May 2022 |
Creators/Contributors
Author | Pattichis, Rebecca |
---|---|
Thesis advisor | Manning, Christopher |
Thesis advisor | Martínez, Ramón |
Degree granting institution | Stanford University |
Department | Department of Computer Science |
Subjects
Subject | Translanguaging (Linguistics) |
---|---|
Subject | Education |
Subject | Natural language processing (Computer science) |
Subject | Multilingualism |
Subject | Multilingualism in children |
Subject | Transcription |
Subject | Interviewing |
Subject | Spanish language |
Subject | Native language and education |
Subject | Children of immigrants > Attitudes |
Subject | Children of immigrants > Education (Elementary) |
Subject | Zapotec language > Study and teaching (Elementary) > Spanish speakers |
Genre | Text |
Genre | Thesis |
Bibliographic information
Access conditions
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).
Preferred citation
- Pattichis, R. (2023). Centering the Voices of First-Generation Immigrant Youth: Multilingual NLP Methods in the Translanguaging Context. Stanford Digital Repository. Available at https://purl.stanford.edu/nd602zq5759
Collection
Undergraduate Theses, School of Engineering
Contact information
- Contact
- pattichi@stanford.edu