The holistic voice : examining vocal expression through context, perception, and production


Abstract/Contents

Abstract
In this dissertation, I explore the complex, crucial role of acoustic paralinguistic attributes in vocal expression, emphasizing the need for speech technologies to accurately capture these nuances. Central to my research are two key propositions: (1) the integration of context awareness and adaptive modifications in vocal expression production and perception significantly enhances the alignment of speech technologies with human communication, and (2) the introduction of the "vocal persona," a concept defined as a chosen set of vocal expressions that orient and respond to a communication context, enriches our understanding of both natural and synthesized voices. Employing a blend of qualitative research, audio signal processing, and machine learning, the studies examine the production, encoding, modification, and perception of paralinguistic attributes in both speech and singing. Together, the studies clarify and formalize the influences of multimodal context on vocal expression alongside the role of acoustic paralinguistic cues in communication. They result in the introduction of the vocal persona as a novel framework for holistic vocal expressiveness. The dissertation encompasses a comprehensive literature review on vocal expression and affect, expressive speech technologies, and the intersection of voice with context and personality, along with the role of machine learning in speech and audio processing. Perception studies of affect in speech and song are included, as well as machine learning experiments, such as accent classification and prosodic context embeddings. The research extends to analysis for re-synthesis techniques and specific applications like tracking acoustic speech features after pediatric traumatic brain injury. A significant portion of the dissertation is dedicated to the thematic analysis of vocal persona, proposing a model and framework for natural persona-guided expression.
This research culminates in synthesizing these insights to propose a framework for persona-guided speech synthesis, including context-adaptive voice conversion and text-to-speech applications.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2023; ©2023
Publication date 2023; 2023
Issuance monographic
Language English

Creators/Contributors

Author Noufi, Camille
Degree supervisor Berger, Jonathan, 1954-
Thesis advisor Berger, Jonathan, 1954-
Thesis advisor Chafe, Chris
Thesis advisor Smith, Julius O. (Julius Orion)
Degree committee member Chafe, Chris
Degree committee member Smith, Julius O. (Julius Orion)
Associated with Stanford University, School of Humanities and Sciences
Associated with Stanford University, Department of Music

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Camille Noufi.
Note Submitted to the Department of Music.
Thesis Thesis Ph.D. Stanford University 2023.
Location https://purl.stanford.edu/fv310sy3280

Access conditions

Copyright
© 2023 by Camille Noufi
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
