The holistic voice : examining vocal expression through context, perception, and production
Abstract/Contents
- Abstract
- In this dissertation, I explore the complex, crucial role of acoustic paralinguistic attributes in vocal expression, emphasizing the need for speech technologies to accurately capture these nuances. Central to my research are two key propositions: (1) the integration of context awareness and adaptive modifications in vocal expression production and perception significantly enhances the alignment of speech technologies with human communication, and (2) the introduction of the "vocal persona," a concept defined as a chosen set of vocal expressions that orient and respond to a communication context, enriching our understanding of both natural and synthesized voices. Employing a blend of qualitative research, audio signal processing, and machine learning, these studies examine the production, encoding, modification, and perception of paralinguistic attributes in both speech and singing. Together, the studies clarify and formalize the influences of multimodal context on vocal expression alongside the role of acoustic paralinguistic cues in communication. They result in the introduction of the vocal persona as a novel framework for holistic vocal expressiveness. The dissertation encompasses a comprehensive literature review on vocal expression and affect, expressive speech technologies, and the intersection of voice with context and personality, along with the role of machine learning in speech and audio processing. Perception studies on how affect is perceived in speech and song are included, as well as machine learning experiments, such as accent classification and prosodic context embeddings. The research extends to analysis for re-synthesis techniques and specific applications like tracking acoustic speech features post-pediatric traumatic brain injury. A significant portion of the dissertation is dedicated to the thematic analysis of vocal persona, proposing a model and framework for natural persona-guided expression. 
This research culminates in synthesizing these insights to propose a framework for persona-guided speech synthesis, including context-adaptive voice conversion and text-to-speech applications.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | ©2023 |
Publication date | 2023 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Noufi, Camille |
---|---|
Degree supervisor | Berger, Jonathan, 1954- |
Thesis advisor | Berger, Jonathan, 1954- |
Thesis advisor | Chafe, Chris |
Thesis advisor | Smith, Julius O. (Julius Orion) |
Degree committee member | Chafe, Chris |
Degree committee member | Smith, Julius O. (Julius Orion) |
Associated with | Stanford University, School of Humanities and Sciences |
Associated with | Stanford University, Department of Music |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Camille Noufi. |
---|---|
Note | Submitted to the Department of Music. |
Thesis | Thesis Ph.D. Stanford University 2023. |
Location | https://purl.stanford.edu/fv310sy3280 |
Access conditions
- Copyright
- © 2023 by Camille Noufi
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).