A hybrid model for timbre perception : quantitative representations of sound color and density

Abstract/Contents

Abstract
Timbre, or the quality of sound, is a fundamental attribute of sound. It is important in differentiating between musical sounds, speech utterances, everyday sounds in our environment, and novel synthetic sounds. This dissertation presents quantitative and perceptually valid metrics for sound color and density, where sound color denotes an instantaneous (or atemporal) spectral energy distribution, and density denotes the fine-scale temporal attribute of sound. In support of the proposed metrics, a series of psychoacoustic experiments was performed. The quantitative relationship between the spectral envelope and subjective perception of complex tones was investigated using Mel-frequency cepstral coefficients (MFCC) as a representation of sound color. The experiments consistently showed that the MFCC model provides a linear and orthogonal coordinate space for human perception of sound color. The statistics for all twelve MFCC were similar, with an average coefficient of determination (R-squared, or R²) of 85%, suggesting that each MFCC contains perceptually important information. The regression coefficients did suggest, however, that the lower-order Mel-cepstrum coefficients may be more important in human perception than the higher-order coefficients. The quantitative relationship between the fine-scale temporal attribute and subjective perception of noise-like stimuli was investigated using normalized echo density (NED). Regardless of the sound color of the noise-like stimuli, the absolute difference in NED showed a strong correlation with the perceived dissimilarity, with an R² of 93% on average. The other experiments showed that NED could represent density perception in a consistent and robust manner across bandwidths: static noise-like stimuli having similar NED values were perceived as similar regardless of their bandwidth.
Overall, across these experiments, NED showed a strong linear correlation with human perception of density, along with robustness in estimating the perceived density across various bandwidths, demonstrating that NED is a promising model for density perception. The elusive nature of timbre description has been a barrier to music analysis, speech research, and psychoacoustics. It is hoped that the metrics presented in this dissertation will form the basis of a quantitative model of timbre perception.
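
As a rough illustration of the density metric named in the abstract, the sketch below computes a normalized echo density value for a single analysis window. It follows the definition commonly used in the reverberation literature: the fraction of samples whose magnitude exceeds the window's standard deviation, divided by the fraction expected for Gaussian noise, erfc(1/sqrt(2)), which is about 0.317. This record does not give the dissertation's exact windowing or parameters, so the rectangular window, the zero-mean assumption, the function name, and the test signals are all assumptions made for illustration only.

```python
import math
import random

# Fraction of samples expected to lie more than one standard deviation
# from the mean for Gaussian noise (about 0.3173).
GAUSSIAN_FRACTION = math.erfc(1 / math.sqrt(2))


def normalized_echo_density(window):
    """Return an NED value for one window of samples.

    Assumptions (not taken from the dissertation): rectangular weighting
    and a zero-mean signal, so the RMS value is used as the standard
    deviation.
    """
    n = len(window)
    if n == 0:
        raise ValueError("window must not be empty")
    sigma = math.sqrt(sum(x * x for x in window) / n)
    if sigma == 0.0:
        return 0.0  # silent window: no echoes at all
    outliers = sum(1 for x in window if abs(x) > sigma)
    return (outliers / n) / GAUSSIAN_FRACTION


# Hypothetical test signals: dense Gaussian noise and a sparse impulse train.
rng = random.Random(0)
dense = [rng.gauss(0.0, 1.0) for _ in range(20000)]
sparse = [0.0] * 20000
for i in range(0, 20000, 1000):
    sparse[i] = 1.0

ned_dense = normalized_echo_density(dense)
ned_sparse = normalized_echo_density(sparse)
print(f"dense noise NED:  {ned_dense:.3f}")
print(f"sparse train NED: {ned_sparse:.3f}")
```

Under this definition, Gaussian-like noise yields a value near 1 and a sparse impulse train yields a value near 0, which is the scale on which the abstract's absolute NED differences would be taken.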

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Copyright date 2010
Publication date 2009, c2010; 2009
Issuance monographic
Language English

Creators/Contributors

Associated with Terasawa, Hiroko Shiraiwa
Associated with Stanford University, Department of Music
Primary advisor Berger, Jonathan
Thesis advisor Berger, Jonathan
Thesis advisor Chafe, Chris
Thesis advisor Smith, Julius O. (Julius Orion)

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Hiroko Shiraiwa Terasawa.
Note Submitted to the Department of Music.
Thesis Thesis (Ph.D.)--Stanford University, 2010.
Location electronic resource

Access conditions

Copyright
© 2010 by Hiroko Shiraiwa Terasawa
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
