Learning feature representations for music classification
Abstract/Contents
- Abstract
- In the recent past, music has become ubiquitous as digital data. The scale of music collections in some online music services surpasses ten million tracks. This significant growth, and the resulting changes in the music content industry, pose challenges for efficient and effective content search, retrieval, and organization. The most common approach to these needs involves the use of text-based metadata or user data. However, limitations of these methods, such as popularity bias, have prompted research into content-based methods that use audio data directly. Content-based methods are generally composed of two processing modules--extracting features from audio, and training a system using those features and ground truth. The audio features, the main interest of this thesis, are conventionally designed in a highly engineered manner based on acoustic knowledge, as in mel-frequency cepstral coefficients (MFCCs) or chroma. As an alternative, there is increasing interest in learning features automatically from data, without relying on domain knowledge or manual refinement. This feature-learning approach has been studied primarily in computer vision and speech recognition. In this thesis, we investigate learning-based feature representations with applications to content-based music information retrieval. Specifically, we propose a data processing pipeline that learns short-term acoustic dependencies from musical signals and builds a song-level feature for music genre classification and music annotation/retrieval. By visualizing the learned acoustic patterns, we attempt to interpret how they are associated with high-level musical semantics such as genre, emotion, or song quality. Through a detailed analysis, we show the effect on performance of individual processing units in the pipeline and of the meta-parameters of the learning algorithms.
In addition to these tasks, we also examine the feature learning approach for classification-based piano transcription. Through experiments on widely used datasets, we show that the learned feature representations achieve results comparable to, or better than, state-of-the-art algorithms.
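The two-stage pipeline the abstract describes (learn short-term acoustic patterns from frames, then encode and pool them into a song-level feature) can be illustrated with a minimal sketch. This is not the thesis's implementation; it is a generic k-means feature-learning pass with soft "triangle" encoding and max-pooling over time, using random data in place of real spectrogram frames.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one song's spectrogram frames: (n_frames, n_bins).
# In practice these would come from an STFT or mel filterbank.
frames = rng.standard_normal((500, 40))
frames /= np.linalg.norm(frames, axis=1, keepdims=True)  # per-frame normalization

# --- Stage 1: learn a dictionary of short-term acoustic patterns (k-means) ---
k = 16
centroids = frames[rng.choice(len(frames), k, replace=False)].copy()
for _ in range(20):
    # Assign each frame to its nearest centroid.
    dists = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Move each centroid to the mean of its assigned frames.
    for j in range(k):
        members = frames[labels == j]
        if len(members):
            centroids[j] = members.mean(axis=0)

# --- Stage 2: encode frames, then pool into one song-level feature ---
# Soft "triangle" encoding: activation = max(0, mean distance - distance).
dists = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
activations = np.maximum(0.0, dists.mean(axis=1, keepdims=True) - dists)
song_feature = activations.max(axis=0)  # max-pooling over time

print(song_feature.shape)  # (16,)
```

The resulting fixed-length vector is what a downstream classifier (e.g., for genre) would consume, regardless of the song's duration.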
Description
Type of resource | text
---|---
Form | electronic; electronic resource; remote
Extent | 1 online resource.
Publication date | 2012
Issuance | monographic
Language | English
Creators/Contributors
Associated with | Nam, Juhan
---|---
Associated with | Stanford University, Department of Music
Primary advisor | Smith, Julius O. (Julius Orion)
Thesis advisor | Berger, Jonathan
Thesis advisor | Slaney, Malcolm
Subjects
Genre | Theses
---|---
Bibliographic information
Statement of responsibility | Juhan Nam.
---|---
Note | Submitted to the Department of Music.
Thesis | Thesis (Ph.D.)--Stanford University, 2012.
Location | electronic resource
Access conditions
- Copyright
- © 2012 by Juhan Nam
- License
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).