Centering the Voices of First-Generation Immigrant Youth: Multilingual NLP Methods in the Translanguaging Context

Pattichis, Rebecca

doi:10.25740/nd602zq5759

Abstract/Contents

Abstract: Translanguaging, the act of using multiple languages within a single utterance (e.g., a sentence and/or a word), is a global phenomenon in multilingual communities. In the United States, translanguaging occurs frequently among Latin American immigrant communities. Although several large multilingual models exist, such as XLM-RoBERTa and multilingual BERT, these models have been trained and evaluated on parallel monolingual data. Upholding parallel monolingualism as the standard definition of multilingualism erases the language practices of many communities of color, including Latin American immigrants in the United States. The consequences are even worse for racialized children in the school system, who may be labeled English Language Learners (ELL) on the very assumption that their fluency in multiple languages must be kept separate and apart. The ELL label has immediate consequences for the classes they can access in the future, as well as for their own sentiment around and through their language practices. Moreover, there is currently no labeled NLP dataset that includes translanguaging between Spanish and English for the task of sentiment analysis. In collaboration with the Stanford Graduate School of Education, this research aims to center the voices of first-generation Indigenous Latin American immigrant students in NLP research through the task of sentiment analysis. Specifically, this thesis constructs the Interview Transcripts Dataset, an innovative trilingual dataset of transcribed interviews containing instances of translanguaging, together with a framework for developing such datasets. The findings provide a promising starting point and emphasize the need both to leverage current pre-trained models on similar domains and to develop a more robust, large-scale dataset that centers translanguaging. Ultimately, translanguaging remains an open problem in NLP research.

Description

Type of resource	text
Date modified	December 5, 2022
Publication date	June 8, 2022; May 2022

Creators/Contributors

Author	Pattichis, Rebecca
Thesis advisor	Manning, Christopher
Thesis advisor	Martínez, Ramón
Degree granting institution	Stanford University
Department	Department of Computer Science

Subjects

Subject	Translanguaging (Linguistics)
Subject	Education
Subject	Natural language processing (Computer science)
Subject	Multilingualism
Subject	Multilingualism in children
Subject	Transcription
Subject	Interviewing
Subject	Spanish language
Subject	Native language and education
Subject	Children of immigrants > Attitudes
Subject	Children of immigrants > Education (Elementary)
Subject	Zapotec language > Study and teaching (Elementary) > Spanish speakers
Genre	Text
Genre	Thesis

Bibliographic information

DOI	https://doi.org/10.25740/nd602zq5759
Location	https://purl.stanford.edu/nd602zq5759

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).

Preferred citation

Preferred citation: Pattichis, R. (2023). Centering the Voices of First-Generation Immigrant Youth: Multilingual NLP Methods in the Translanguaging Context. Stanford Digital Repository. Available at https://purl.stanford.edu/nd602zq5759

Collection

Undergraduate Theses, School of Engineering

Contact information

Contact: pattichi@stanford.edu
