Learning feature representations for music classification
Abstract/Contents
- Abstract
- In the recent past, music has become ubiquitous as digital data. The scale of music collections in some online music services surpasses ten million tracks. This significant growth, and the resulting changes in the music content industry, pose challenges for efficient and effective content search, retrieval, and organization. The most common approach to these needs involves the use of text-based metadata or user data. However, limitations of these methods, such as popularity bias, have prompted research into content-based methods that use audio data directly. Content-based methods are generally composed of two processing modules--extracting features from audio, and training a system using those features and ground truth. The audio features, the main interest of this thesis, are conventionally designed in a highly engineered manner based on acoustic knowledge, as in mel-frequency cepstral coefficients (MFCCs) or chroma. As an alternative, there is increasing interest in learning features automatically from data, without relying on domain knowledge or manual refinement. This feature-learning approach has been studied primarily in computer vision and speech recognition. In this thesis, we investigate learning-based feature representations with applications to content-based music information retrieval. Specifically, we propose a data processing pipeline that learns short-term acoustic dependencies from musical signals and builds a song-level feature for music genre classification and music annotation/retrieval. By visualizing the learned acoustic patterns, we attempt to interpret how they are associated with high-level musical semantics such as genre, emotion, or song quality. Through a detailed analysis, we show the effect on performance of individual processing units in the pipeline and of the meta-parameters of the learning algorithms.
In addition to these tasks, we also examine the feature learning approach for classification-based piano transcription. Through experiments on widely used datasets, we show that the learned feature representations achieve results comparable to, or better than, state-of-the-art algorithms.
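The two-stage pipeline the abstract describes (learn short-term acoustic patterns from frames, then encode and pool them into a song-level feature) can be illustrated with a minimal sketch. This is not the thesis's implementation; it is a generic k-means feature-learning pass with soft "triangle" encoding and max-pooling over time, using random data in place of real spectrogram frames.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one song's spectrogram frames: (n_frames, n_bins).
# In practice these would come from an STFT or mel filterbank.
frames = rng.standard_normal((500, 40))
frames /= np.linalg.norm(frames, axis=1, keepdims=True)  # per-frame normalization

# --- Stage 1: learn a dictionary of short-term acoustic patterns (k-means) ---
k = 16
centroids = frames[rng.choice(len(frames), k, replace=False)].copy()
for _ in range(20):
    # Assign each frame to its nearest centroid.
    dists = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Move each centroid to the mean of its assigned frames.
    for j in range(k):
        members = frames[labels == j]
        if len(members):
            centroids[j] = members.mean(axis=0)

# --- Stage 2: encode frames, then pool into one song-level feature ---
# Soft "triangle" encoding: activation = max(0, mean distance - distance).
dists = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
activations = np.maximum(0.0, dists.mean(axis=1, keepdims=True) - dists)
song_feature = activations.max(axis=0)  # max-pooling over time

print(song_feature.shape)  # (16,)
```

The resulting fixed-length vector is what a downstream classifier (e.g., for genre) would consume, regardless of the song's duration.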
Description
Type of resource | text
---|---
Form | electronic; electronic resource; remote
Extent | 1 online resource.
Publication date | 2012
Issuance | monographic
Language | English
Creators/Contributors
Associated with | Nam, Juhan
---|---
Associated with | Stanford University, Department of Music
Primary advisor | Smith, Julius O. (Julius Orion)
Thesis advisor | Berger, Jonathan
Thesis advisor | Slaney, Malcolm
Subjects
Genre | Theses
---|---
Bibliographic information
Statement of responsibility | Juhan Nam.
---|---
Note | Submitted to the Department of Music.
Thesis | Thesis (Ph.D.)--Stanford University, 2012.
Location | electronic resource
Access conditions
- Copyright
- © 2012 by Juhan Nam
- License
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).