Exploiting speech structure for noise estimation in single channel speech enhancement

Cho, Eunjoon; Stanford University, Department of Electrical Engineering.



Abstract/Contents

Abstract: When a speaker's voice is mixed with background noise, the noise can be a nuisance and can greatly degrade the perceived quality and intelligibility of the underlying speech. The ability to enhance noisy speech has strong implications for cell phones, teleconference systems, hearing aids and automatic speech recognition systems. In many applications we are constrained to a single microphone input of the noisy speech. Having to estimate the underlying clean speech without additional noise references makes the single channel speech enhancement problem challenging. Most single channel speech enhancement methods (e.g., spectral subtraction, Wiener filter, short time spectral amplitude estimator) require an accurate estimate of the background noise. The majority of prior work on background noise estimation assumes that the noise is stationary, or at least slowly varying with respect to speech. This assumption, however, makes it difficult to reduce noise in more realistic environments where the background noise is non-stationary. In this dissertation we present two approaches to estimating the background noise by exploiting knowledge of the underlying speech structure. We first present a statistical model that incorporates the harmonic structure of voiced speech. By observing the spectrum between the harmonics of voiced speech, a more accurate estimate of the background noise can be obtained. The second approach trains a dictionary on clean speech to capture the formant structure of speech. The noise is detected using an outlier framework, in which the noisy input is compared with an entry in the dictionary. By taking advantage of how speech is structured, the noise estimate can be updated more promptly, and we can thus reduce noise in non-stationary, more realistic environments.
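
The idea of reading the noise level from between the harmonics can be illustrated with a minimal Python sketch. This is not the statistical model of the dissertation, only a simple average under stated assumptions: the frame is voiced, its fundamental frequency `f0` is already known, and the function name `noise_power_between_harmonics` and the `guard_hz` parameter are hypothetical names introduced here for illustration.

```python
def noise_power_between_harmonics(power, sample_rate, f0, guard_hz):
    """Average the power of bins lying at least guard_hz from every harmonic of f0.

    power: one-sided power spectrum of a single voiced frame (bins 0..N/2).
    Assumption: f0 is known and the frame is voiced.
    """
    n_bins = len(power)
    bin_hz = (sample_rate / 2) / (n_bins - 1)  # spacing of the one-sided bins
    selected = []
    for k, p in enumerate(power):
        freq = k * bin_hz
        # Closest harmonic m * f0 with m >= 1.
        nearest = max(1, round(freq / f0)) * f0
        if abs(freq - nearest) >= guard_hz:
            selected.append(p)
    if not selected:
        raise ValueError("guard_hz leaves no bins between harmonics")
    return sum(selected) / len(selected)


# Synthetic spectrum (illustrative values): a flat noise floor of 1.0 with
# harmonic peaks every 250 Hz and some leakage into the adjacent bins.
sample_rate, f0 = 8000, 250.0
power = [1.0] * 129                # 129 bins, 31.25 Hz apart
for h in range(8, 129, 8):         # 250 Hz / 31.25 Hz = 8 bins per harmonic
    power[h] = 100.0
    power[h - 1] = 10.0
    if h + 1 < len(power):
        power[h + 1] = 10.0

estimate = noise_power_between_harmonics(power, sample_rate, f0, guard_hz=60.0)
naive = sum(power) / len(power)    # averaging every bin is biased by the speech
```

In this synthetic case the between-harmonics average recovers the planted floor, while the average over all bins is inflated by the harmonic peaks. Because the estimate uses only the current frame, it does not depend on the noise staying stationary across frames.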

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2013
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Cho, Eunjoon
Associated with	Stanford University, Department of Electrical Engineering.
Primary advisor	Smith, Julius O. (Julius Orion)
Primary advisor	Widrow, Bernard, 1929-
Thesis advisor	Smith, Julius O. (Julius Orion)
Thesis advisor	Widrow, Bernard, 1929-
Thesis advisor	Schafer, Ronald W, 1938-
Advisor	Schafer, Ronald W, 1938-

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Eunjoon Cho.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis (Ph.D.)--Stanford University, 2013.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
