Interactive sound source separation


Abstract/Contents

Abstract
In applications such as audio denoising, music transcription, music remixing, and audio-based forensics, it is desirable to decompose a single-channel recording into its respective sources. One of the most promising and effective classes of methods to do so is based on non-negative matrix factorization (NMF) and related probabilistic latent variable models (PLVMs). Such techniques, however, typically perform poorly when no isolated training data is given and offer no mechanism to improve upon unsatisfactory results. To overcome these issues, we present a new interaction paradigm and separation algorithm for single-channel source separation. The method works by allowing an end-user to roughly paint on time-frequency displays of sound. The rough annotations are then used, via the framework of posterior regularization, to constrain, regularize, or otherwise inform an NMF/PLVM algorithm and perform separation. The output estimates are presented back to the user, and the entire process is repeated interactively until a desired result is achieved. To test the proposed method, we developed and released an open-source software project embodying our approach, conducted user studies, and submitted separation results to a community-based signal separation evaluation campaign. For a variety of real-world tasks, we found that expert users of our proposed method can achieve state-of-the-art separation quality according to standard evaluation metrics, and inexperienced users can achieve good separation quality with minimal instruction. In addition, we show that our method can perform well with or without isolated training data and is relatively insensitive to model selection, thus improving upon past methods in a variety of ways. Overall, these results demonstrate that our proposed approach is a general and powerful separation method and motivate further work on interactive approaches to source separation. To download the application, code, and audio/video demonstrations, please see http://ccrma.stanford.edu/~njb/thesis.
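
To make the idea above concrete, the NumPy sketch below shows how rough time-frequency annotations ("paint") could bias a two-source NMF decomposition of a mixture spectrogram. It is an illustrative stand-in only: the annotation-weighted soft masks and KL multiplicative updates are simplifying assumptions, not the thesis's posterior-regularized NMF/PLVM algorithm, and the function name, the paint encoding, and the parameters are hypothetical.

    import numpy as np

    def separate_with_annotations(V, paint, n_components=8, n_iter=200, seed=0):
        """Illustrative two-source, annotation-guided NMF separation.

        V     : (F, T) non-negative magnitude spectrogram of the mixture.
        paint : (F, T) values in [0, 1]; ~1 marks bins painted as source A,
                ~0 marks bins painted as source B, 0.5 leaves a bin unannotated.
        Returns magnitude estimates (Va_hat, Vb_hat) for the two sources.
        """
        rng = np.random.default_rng(seed)
        F, T = V.shape
        eps = 1e-12
        ones = np.ones((F, T))

        # One dictionary (W) and activation matrix (H) per source.
        Wa = rng.random((F, n_components)) + eps
        Ha = rng.random((n_components, T)) + eps
        Wb = rng.random((F, n_components)) + eps
        Hb = rng.random((n_components, T)) + eps

        def kl_update(W, H, target):
            """One round of standard KL-NMF multiplicative updates toward `target`."""
            R = target / (W @ H + eps)
            W = W * ((R @ H.T) / (ones @ H.T + eps))
            R = target / (W @ H + eps)
            H = H * ((W.T @ R) / (W.T @ ones + eps))
            return W, H

        for _ in range(n_iter):
            Va, Vb = Wa @ Ha + eps, Wb @ Hb + eps

            # Soft mask from the current source models, reweighted by the user's
            # paint so annotated bins are biased toward the source they were
            # painted for (a crude stand-in for posterior regularization).
            Ma = (Va * paint) / (Va * paint + Vb * (1.0 - paint) + eps)

            # Fit each source model to its (annotation-biased) share of the mixture.
            Wa, Ha = kl_update(Wa, Ha, V * Ma)
            Wb, Hb = kl_update(Wb, Hb, V * (1.0 - Ma))

        # Final Wiener-style soft masks give the magnitude estimates.
        Va, Vb = Wa @ Ha, Wb @ Hb
        Va_hat = V * Va / (Va + Vb + eps)
        Vb_hat = V * Vb / (Va + Vb + eps)
        return Va_hat, Vb_hat

In the interactive workflow the abstract describes, such a step would sit inside a loop: the user listens to the current estimates, repaints regions where the separation is unsatisfactory, and the factorization is re-run until the result is acceptable.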

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2014
Issuance monographic
Language English

Creators/Contributors

Associated with Bryan, Nicholas James
Associated with Stanford University, Department of Music.
Primary advisor Wang, Ge
Thesis advisor Abel, Jonathan (Jonathan Stuart)
Thesis advisor Chafe, Chris
Thesis advisor Smith, Julius O. (Julius Orion)

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Nicholas J. Bryan.
Note Submitted to the Department of Music.
Thesis Thesis (Ph.D.)--Stanford University, 2014.
Location electronic resource

Access conditions

Copyright
© 2014 by Nicholas James Bryan
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License (CC BY-NC 3.0).
