A hybrid model for timbre perception : quantitative representations of sound color and density

Terasawa, Hiroko Shiraiwa; Stanford University, Department of Music

Abstract/Contents

Abstract: Timbre, or the quality of sound, is a fundamental attribute of sound. It is important in differentiating between musical sounds, speech utterances, everyday sounds in our environment, and novel synthetic sounds. This dissertation presents quantitative and perceptually valid metrics for sound color and density, where sound color denotes an instantaneous (or atemporal) spectral energy distribution, and density denotes the fine-scale temporal attribute of sound. In support of the proposed metrics, a series of psychoacoustic experiments was performed. The quantitative relationship between the spectral envelope and subjective perception of complex tones was investigated using Mel-frequency cepstral coefficients (MFCCs) as a representation of sound color. The experiments consistently showed that the MFCC model provides a linear and orthogonal coordinate space for human perception of sound color. The statistics for all twelve MFCCs were similar, with an average coefficient of determination (R²) of 85%, suggesting that each MFCC contains perceptually important information. The regression coefficients did suggest, however, that the lower-order Mel-cepstrum coefficients may be more important in human perception than the higher-order coefficients. The quantitative relationship between the fine-scale temporal attribute and subjective perception of noise-like stimuli was investigated using normalized echo density (NED). Regardless of the sound color of the noise-like stimuli, the absolute difference in NED correlated strongly with perceived dissimilarity, with an average R² of 93%. Other experiments showed that NED represents density perception consistently and robustly across bandwidths: static noise-like stimuli with similar NED values were perceived as similar regardless of their bandwidth.
Overall, these experiments showed that NED correlates strongly and linearly with perceived density and estimates perceived density robustly across bandwidths, making NED a promising model for density perception. The elusive nature of timbre description has been a barrier to music analysis, speech research, and psychoacoustics. It is hoped that the metrics presented in this dissertation will form the basis of a quantitative model of timbre perception.
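The abstract relies on two measures, MFCCs for sound color and NED for density. The sketch below is a minimal illustration of both in Python with NumPy, written under stated assumptions rather than taken from the dissertation. For MFCCs it assumes the HTK-style mel formula 2595·log10(1 + f/700), 40 triangular filters spanning 0 Hz to Nyquist, a Hann window, and an orthonormal DCT-II, and it treats c1 to c12 as the "twelve MFCCs". For NED it assumes the common definition: within each window, the weighted fraction of samples whose magnitude exceeds the window's RMS (zero mean assumed), divided by erfc(1/√2) ≈ 0.3173, the fraction expected for Gaussian noise. The function names, window length, and hop size are hypothetical choices for illustration.

```python
import numpy as np
from math import erfc, sqrt


def hz_to_mel(f):
    # HTK-style mel scale (assumption: several mel formulas are in use).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with edges equally spaced on the mel scale, 0 Hz to Nyquist.
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    bins_hz = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_filters, bins_hz.size))
    for i in range(n_filters):
        lo, ctr, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bins_hz - lo) / (ctr - lo)
        falling = (hi - bins_hz) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mfcc(frame, sr, n_filters=40, n_coeffs=13):
    """Return c0..c12 for one frame (hypothetical helper); c1..c12 describe spectral shape."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hanning(n_fft))) ** 2
    # Small floor avoids log(0) for silent frames.
    log_energy = np.log(mel_filterbank(n_filters, n_fft, sr) @ power + 1e-12)
    # Orthonormal DCT-II of the log mel energies.
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    scale = np.full((n_coeffs, 1), sqrt(2.0 / n_filters))
    scale[0] = sqrt(1.0 / n_filters)
    return (scale * basis) @ log_energy


def normalized_echo_density(x, win_len=1024, hop=256):
    """NED profile (hypothetical helper): weighted fraction of samples whose
    magnitude exceeds the window RMS, divided by the Gaussian-noise fraction."""
    x = np.asarray(x, dtype=float)
    w = np.hanning(win_len)
    w = w / w.sum()  # window weights sum to 1
    gaussian_fraction = erfc(1.0 / sqrt(2.0))  # P(|z| > 1) for z ~ N(0, 1)
    profile = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = x[start:start + win_len]
        sigma = np.sqrt(np.sum(w * seg ** 2))  # weighted RMS, zero-mean assumption
        outside = np.sum(w * (np.abs(seg) > sigma))
        profile.append(outside / gaussian_fraction)
    return np.array(profile)
```

With these illustrative parameters, Gaussian white noise gives an NED profile close to 1 and a sparse click train gives values close to 0, which matches the reading of NED as a density scale. Multiplying a frame by a constant changes only c0 of the MFCC output, consistent with treating c1 to c12 as a level-independent description of spectral shape.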

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Copyright date	2010
Publication date	2009, c2010; 2009
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Terasawa, Hiroko Shiraiwa
Associated with	Stanford University, Department of Music
Primary advisor	Berger, Jonathan
Thesis advisor	Berger, Jonathan
Thesis advisor	Chafe, Chris
Thesis advisor	Smith, Julius O. (Julius Orion)
Advisor	Chafe, Chris
Advisor	Smith, Julius O. (Julius Orion)

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Hiroko Shiraiwa Terasawa.
Note	Submitted to the Department of Music.
Thesis	Thesis (Ph.D.)--Stanford University, 2010.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).