Nuisance compensation and prosodic modeling on high-level speech tasks

Sanchez, Michelle Hewlett; Stanford University, Department of Electrical Engineering

Abstract/Contents

Abstract: As automatic speech processing has matured, research has expanded its focus from automatic speech recognition or keyword spotting to applications that focus on paralinguistic speech problems that aim to detect "beyond-the-words" information. Researchers have focused on automatically deriving speaker characteristics from speech and classifying speakers into categories ranging from age, identity, language, dialect, idiolect, and sociolect to truthfulness, cognitive health, and emotion. This dissertation focuses on three of these categories, namely, emotion recognition, psychological state detection, and speaker verification. One of the many difficulties in the areas of emotion recognition and psychological state detection is the lack of real data. Most publicly available emotion databases use acted speech, which is not representative of real speakers' emotions because the emotions are stereotypical and exaggerated. In this work, features and approaches that have been found successful in other speech areas are applied to the new tasks of emotion and psychological state detection using three different databases of real non-acted speech. These various methods, including modeling cepstral features and prosodic features with Gaussian mixture models and applying nuisance compensation on cepstral features to reduce the speaker and channel variability, outperform the standard linear classifier approach on simple prosodic features. Because these techniques require large amounts of data and these emotion and psychological health databases are small, data with only speaker identity labels are used to initially train the models. Although this data is not tagged with emotion or psychological state labels, it is shown in this dissertation that this data can be used successfully for these tasks during training. In this dissertation, all emotion and psychological state detection tasks are binary classifications, including detecting fear vs. neutral in 911 emergency calls, distinguishing severely depressed from nondepressed older males, and differentiating high-risk suicidal adults from both depressed and nondepressed adults in addition to adults who have ideas of committing suicide. With N-fold leave-one-out cross-validation, performance with these new systems is 19% better on average than a basic linear discriminative classifier that uses only prosodic features. Performance is also 17% better than state-of-the-art research on the same data. Results show that fear in 911 calls can be detected with 85% accuracy, and high-risk suicidal males are discriminated from males with ideation, depressed males, and nondepressed males with 90% accuracy. Constrained speaker verification systems, which model standard cepstral features that fall within particular types of speech regions, are also studied. A question in modeling such systems is whether to constrain universal background model (UBM) training, joint factor analysis (JFA), or both. This question is explored, as well as how to optimize the UBM size, using a corpus of Arabic male speakers. Over a large set of phonetic and prosodic constraints, the performance of a system using constrained JFA and UBM is found to be on average 5.2% better than when using constraint-independent (all frames) JFA and UBM. Further improvement is found from optimizing the UBM size based on the percentage of frames covered by the constraint.

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2011
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Sanchez, Michelle Hewlett
Associated with	Stanford University, Department of Electrical Engineering
Primary advisor	El Gamal, Abbas A
Primary advisor	Gray, Robert M, 1943-
Thesis advisor	El Gamal, Abbas A
Thesis advisor	Gray, Robert M, 1943-
Thesis advisor	Ferrer, Luciana
Thesis advisor	Olshen, Richard A, 1942-
Advisor	Ferrer, Luciana
Advisor	Olshen, Richard A, 1942-

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Michelle Hewlett Sanchez.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis (Ph.D.)--Stanford University, 2011.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
