A hybrid model for timbre perception : quantitative representations of sound color and density
Abstract/Contents
- Abstract
- Timbre, or the quality of sound, is a fundamental attribute of sound. It is important in differentiating between musical sounds, speech utterances, everyday sounds in our environment, and novel synthetic sounds. This dissertation presents quantitative and perceptually valid metrics for sound color and density, where sound color denotes an instantaneous (or atemporal) spectral energy distribution, and density denotes the fine-scale temporal attribute of sound. In support of the proposed metrics, a series of psychoacoustic experiments was performed. The quantitative relationship between the spectral envelope and subjective perception of complex tones was investigated using Mel-frequency cepstral coefficients (MFCCs) as a representation of sound color. The experiments consistently showed that the MFCC model provides a linear and orthogonal coordinate space for human perception of sound color. The statistics for all twelve MFCCs were similar, with an average coefficient of determination (R-squared, or R²) of 85%, suggesting that each MFCC contains perceptually important information. The regression coefficients did suggest, however, that the lower-order Mel-cepstrum coefficients may be more important in human perception than the higher-order coefficients. The quantitative relationship between the fine-scale temporal attribute and subjective perception of noise-like stimuli was investigated using normalized echo density (NED). Regardless of the sound color of the noise-like stimuli, the absolute difference in NED showed a strong correlation to the perceived dissimilarity, with an R² of 93% on average. The other experiments showed that NED could represent the density perception in a consistent and robust manner across bandwidths: static noise-like stimuli having similar NED values were perceived as similar regardless of their bandwidth.
Overall, with these experiments, NED showed a strong linear correlation to human perception of density, along with robustness in estimating the perceived density across various bandwidths, demonstrating that NED is a promising model for density perception. The elusive nature of timbre description has been a barrier to music analysis, speech research, and psychoacoustics. It is hoped that the metrics presented in this dissertation will form the basis of a quantitative model of timbre perception.
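As an illustration of the density metric named in the abstract, the following is a minimal sketch of a normalized echo density computation. It assumes the commonly published formulation (the fraction of samples in a window lying more than one standard deviation from the mean, divided by the fraction expected for Gaussian noise); this record does not confirm the exact definition used in the dissertation. The function names `ned_profile` and `ned_distance`, the rectangular non-overlapping windows, and the window length are hypothetical choices made for the example.

```python
import math

import numpy as np

# Fraction of Gaussian samples lying more than one standard deviation
# from the mean (about 0.3173); used as the normalizing constant.
GAUSSIAN_OUTLIER_FRACTION = math.erfc(1.0 / math.sqrt(2.0))


def ned_profile(signal, window_len=1024):
    """Return one NED value per non-overlapping rectangular window.

    Assumption: rectangular, non-overlapping windows; published
    formulations often use a weighted sliding window instead.
    """
    x = np.asarray(signal, dtype=float)
    n_frames = x.size // window_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one window")
    frames = x[: n_frames * window_len].reshape(n_frames, window_len)

    values = np.zeros(n_frames)
    for i, frame in enumerate(frames):
        sigma = frame.std()
        if sigma == 0.0:
            continue  # a constant frame has no outliers; leave NED at 0
        outliers = np.abs(frame - frame.mean()) > sigma
        values[i] = outliers.mean() / GAUSSIAN_OUTLIER_FRACTION
    return values


def ned_distance(signal_a, signal_b, window_len=1024):
    """Absolute difference between the mean NED of two static stimuli.

    Hypothetical helper: it mirrors the 'absolute difference in NED'
    mentioned in the abstract, not the dissertation's actual procedure.
    """
    ned_a = ned_profile(signal_a, window_len).mean()
    ned_b = ned_profile(signal_b, window_len).mean()
    return abs(ned_a - ned_b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dense = rng.standard_normal(16384)   # Gaussian noise, NED near 1
    sparse = np.zeros(16384)
    sparse[::64] = 1.0                   # isolated impulses, NED near 0
    print(ned_profile(dense).mean(), ned_profile(sparse).mean())
    print(ned_distance(dense, sparse))
```

Under these assumptions, Gaussian noise yields values close to 1 and a sparse impulse train yields values close to 0, so the absolute difference gives a simple scalar distance between two static noise-like stimuli.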
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Copyright date | 2010 |
Publication date | 2009, c2010; 2009 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Terasawa, Hiroko Shiraiwa | |
---|---|---|
Associated with | Stanford University, Department of Music | |
Primary advisor | Berger, Jonathan | |
Thesis advisor | Berger, Jonathan | |
Thesis advisor | Chafe, Chris | |
Thesis advisor | Smith, Julius O. (Julius Orion) | |
Advisor | Chafe, Chris | |
Advisor | Smith, Julius O. (Julius Orion) | |
Subjects
Genre | Theses |
---|---|
Bibliographic information
Statement of responsibility | Hiroko Shiraiwa Terasawa. |
---|---|
Note | Submitted to the Department of Music. |
Thesis | Thesis (Ph.D.)--Stanford University, 2010. |
Location | electronic resource |
Access conditions
- Copyright
- © 2010 by Hiroko Shiraiwa Terasawa
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).