Assessing the evolution of written language through data mining in large corpora


Abstract/Contents

Abstract
Across the centuries, the question of the origin of language has captivated the human imagination. Many theories have been proposed to address fundamental questions such as: Where do languages come from? How do they evolve? What are the societal drivers of this change? Historically, one of the biggest challenges in addressing these questions has been a lack of large-scale empirical data, which has made it difficult to test hypotheses rigorously and to correlate language change with cultural patterns. My dissertation research analyzes how written Spanish and Portuguese in the Americas diverged from their European counterparts, focusing on within-language shifts. Methodologically, I carry out this research by data mining large digitized corpora (approximately 300,000 documents dating from the twelfth century to the present) and by performing close reading and contextual analysis of selected material. This interdisciplinary research was carried out collaboratively by the Division of Literatures, Cultures, and Languages, the Department of Biology, and the Stanford Libraries. The breadth of the project requires close attention to historical and social context, which is often a driver of the written-language changes that we trace computationally. For instance, Portugal prohibited the printing press and universities in Brazil for some 300 years, whereas these institutions were introduced into Hispanic America soon after the arrival of Columbus. Consistent with this difference, I found that written Spanish changed relatively "smoothly", reflecting a continuous assimilation of changes in the spoken form, whereas comparable changes appear much more "abruptly" in written Brazilian Portuguese.
In Chapters 1 and 2, I present an extensive study of the evolution of personal pronouns in Portuguese and show that the past prohibition of the printing press, coupled with shifts in literary movements reflecting increased national sentiment, was a major driver of the pronoun shift in nineteenth-century Brazilian Portuguese. In Chapter 3, I analyze the evolution of personal pronouns in Spanish and show how regional writing styles affected the overall pronoun shift. Finally, I hope that this research will shed light on the complex socioeconomic and geographical factors that led to the divergent evolution of written language in the Americas and the Iberian Peninsula.

Description

Type of resource: text
Form: electronic; electronic resource; remote
Extent: 1 online resource
Publication date: 2016
Issuance: monographic
Language: English

Creators/Contributors

Associated with: García-García, Cuauhtémoc
Associated with: Stanford University, Department of Iberian and Latin American Cultures
Primary advisor: Resina, Joan Ramon
Thesis advisor: Resina, Joan Ramon
Thesis advisor: Feldman, Marcus W
Thesis advisor: Rocha, Marília Librandi
Thesis advisor: Predmore, Michael P
Advisor: Feldman, Marcus W
Advisor: Rocha, Marília Librandi
Advisor: Predmore, Michael P

Subjects

Genre: Theses

Bibliographic information

Statement of responsibility: Cuauhtémoc García-García.
Note: Submitted to the Department of Iberian and Latin American Cultures.
Thesis: Thesis (Ph.D.), Stanford University, 2016.
Location: electronic resource

Access conditions

Copyright
© 2016 by Cuauhtemoc Garcia-Garcia
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
