Centering the Voices of First-Generation Immigrant Youth: Multilingual NLP Methods in the Translanguaging Context

Abstract
Translanguaging, or the act of using multiple languages within a single speech utterance (e.g., within a sentence and/or a word), is a global phenomenon in multilingual communities. In the United States, translanguaging occurs frequently among Latin American immigrant communities. While there are several large multilingual models, such as XLM-RoBERTa and multilingual BERT, these models have been trained and evaluated on parallel monolingual data. Upholding parallel monolingualism as the standard definition of multilingualism erases the language practices of many communities of color, including Latin American immigrants in the United States. The consequences are even worse for racialized children in the school system, who may be labeled English Language Learners (ELL) based on the very notion that their fluency in multiple languages must be separate and apart. This ELL label has immediate consequences for the future classes they can access, as well as for their own sentiment around and through their language practices. Moreover, there is currently no labeled NLP dataset for sentiment analysis that includes translanguaging between Spanish and English. In collaboration with the Stanford Graduate School of Education, this research aims to center the voices of first-generation Indigenous Latin American immigrant students in NLP research through the task of sentiment analysis. Specifically, this thesis constructs the Interview Transcripts Dataset, an innovative trilingual dataset composed of transcribed interview data that contain instances of translanguaging, along with a framework for developing such datasets. The findings of this project provide a promising starting point and emphasize the need both to leverage current pre-trained models on similar domains and to develop a more robust, large-scale dataset that centers translanguaging. Ultimately, translanguaging remains an open problem in NLP research.

Description

Type of resource text
Date modified December 5, 2022
Publication date June 8, 2022; May 2022

Creators/Contributors

Author Pattichis, Rebecca
Thesis advisor Manning, Christopher
Thesis advisor Martínez, Ramón
Degree granting institution Stanford University
Department Department of Computer Science

Subjects

Subject Translanguaging (Linguistics)
Subject Education
Subject Natural language processing (Computer science)
Subject Multilingualism
Subject Multilingualism in children
Subject Transcription
Subject Interviewing
Subject Spanish language
Subject Native language and education
Subject Children of immigrants > Attitudes
Subject Children of immigrants > Education (Elementary)
Subject Zapotec language > Study and teaching (Elementary) > Spanish speakers
Genre Text
Genre Thesis

Bibliographic information

Preferred citation
Pattichis, R. (2023). Centering the Voices of First-Generation Immigrant Youth: Multilingual NLP Methods in the Translanguaging Context. Stanford Digital Repository. Available at https://purl.stanford.edu/nd602zq5759

Collection

Undergraduate Theses, School of Engineering