Learning feature representations for music classification


Abstract
In the recent past music has become ubiquitous as digital data. The scale of music collections in some online music services surpasses ten million tracks. This significant growth and resulting changes in the music content industry pose challenges in terms of efficient and effective content search, retrieval and organization. The most common approach to these needs involves the use of text-based metadata or user data. However, limitations of these methods, such as popularity bias, have prompted research in content-based methods that use audio data directly. The content-based methods are generally composed of two processing modules--extracting features from audio and training a system using the features and ground truth. The audio features, the main interest of this thesis, are conventionally designed in a highly engineered manner based on acoustic knowledge, such as in mel-frequency cepstral coefficients (MFCCs) or chroma. As an alternative approach, there is increasing interest in learning features automatically from data without relying on domain knowledge or manual refinement. This feature representation approach has been studied primarily in the areas of computer vision or speech recognition. In this thesis, we investigate the learning-based feature representation with applications to content-based music information retrieval. Specifically, we suggest a data processing pipeline to effectively learn short-term acoustic dependencies from musical signals and build a song-level feature for music genre classification and music annotation/retrieval. While visualizing the learned acoustics patterns, we will attempt to interpret how they are associated with high-level musical semantics such as genre, emotion or song quality. Through a detailed analysis, we will show the effect of individual processing units in the pipeline and meta parameters of learning algorithms on performance. 
In addition to these tasks, we also examine the feature learning approach for classification-based piano transcription. Through experiments on widely used datasets, we show that the learned feature representations achieve results comparable to, or better than, state-of-the-art algorithms.
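The two-module pipeline the abstract describes, frame-level feature extraction followed by song-level aggregation and a trained classifier, can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual method: a log-power spectrogram stands in for the learned features, mean pooling stands in for the song-level aggregation, and a nearest-centroid classifier stands in for the trained system. All function and class names here are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping frames (short-term analysis)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def log_power_features(x, frame_len=1024, hop=512):
    """Frame-level log-power spectrogram; a stand-in for learned features."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log1p(spec)

def song_level_feature(x):
    """Aggregate frame-level features into one song-level vector (mean pooling)."""
    return log_power_features(x).mean(axis=0)

class NearestCentroid:
    """Minimal classifier standing in for the trained system in the pipeline."""
    def fit(self, X, y):
        self.labels_ = sorted(set(y))
        self.centroids_ = np.stack(
            [np.mean([x for x, t in zip(X, y) if t == c], axis=0)
             for c in self.labels_])
        return self

    def predict(self, X):
        # Distance from each song-level feature to each class centroid.
        d = np.linalg.norm(
            np.asarray(X)[:, None, :] - self.centroids_[None], axis=2)
        return [self.labels_[i] for i in d.argmin(axis=1)]
```

For example, training on a few synthetic "tone" and "noise" signals and classifying held-out signals exercises the whole extract-aggregate-classify chain; real systems would replace the hand-crafted spectrogram with the learned representation.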

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2012
Issuance monographic
Language English

Creators/Contributors

Associated with Nam, Juhan
Associated with Stanford University, Department of Music
Primary advisor Smith, Julius O. (Julius Orion)
Thesis advisor Berger, Jonathan
Thesis advisor Slaney, Malcolm

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Juhan Nam.
Note Submitted to the Department of Music.
Thesis Thesis (Ph.D.)--Stanford University, 2012.
Location electronic resource

Access conditions

Copyright
© 2012 by Juhan Nam
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
