Close-microphone cross-talk cancellation in ensemble recordings with statistical estimation

Abstract/Contents

Abstract
When recording an ensemble of musicians, audio engineers consider microphone cross-talk, or "bleed", a nuisance. When two microphones pick up the same signal with a time delay between them, comb-filtering artifacts result. An expensive solution to the microphone bleed problem is to place acoustic isolation panels between musicians, which is not feasible in a live setting. Simpler solutions include using directional microphones with specific polar patterns to pick up radiation from a desired direction, and using the close-miking technique, where microphones are placed 5 to 50 cm from the sound source. Interference can still become significant in such cases because of nearby strongly reflective surfaces. The complexity lies in the fact that there is usually an arbitrary number and distribution of instruments and microphones, and results are influenced by the room acoustics of the studio where the ensemble is recorded. In this thesis, I propose statistically optimal estimators to cancel microphone bleed offline in the mixing and production stage. First, a calibration stage is proposed, in which one instrument is played at a time and recorded by all the microphones. This single-input, multiple-output (SIMO) system is used to estimate an approximate relative transfer function matrix, which represents the acoustic path from each source to each microphone and encodes the room response as well as the microphone directivity and source radiation patterns. A convex cost function is derived in the time-frequency domain that simultaneously optimizes the sources and the relative transfer function matrix, which is assumed to be time-invariant. It is shown that minimizing this cost function gives the Maximum Likelihood (ML) estimate when the microphone signals are assumed to be normally distributed. The ML estimator is extended to include a priori statistics of the sources, and the Maximum A Posteriori Probability (MAP) estimator is derived.
The proposed methods are evaluated against a state-of-the-art Multichannel Wiener Filter-based algorithm on a simulated dataset of string quartet recordings in a shoebox room, and on a drum kit recorded in the CCRMA recording studio. The results show that cross-talk cancellation is achieved while maintaining the perceptual quality of the separated sources.
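
As a rough illustration of the calibration idea described in the abstract, the following Python sketch estimates relative transfer functions from solo recordings and then inverts them per frequency bin with ordinary least squares. It is not the thesis's estimator: the function names (`estimate_relative_tf`, `cancel_bleed`), the array layout, the noise-free synthetic data, and the assumption that microphone j is the close microphone for source j are all hypothetical choices made for this example.

```python
import numpy as np

def estimate_relative_tf(solo_stfts):
    """Estimate a relative transfer function matrix from solo recordings.

    solo_stfts[j] is a complex array (n_mics, n_bins, n_frames) recorded
    while only source j plays. ASSUMPTION: mic j is source j's close mic,
    so every column is expressed relative to that reference mic.
    Returns H with shape (n_bins, n_mics, n_src).
    """
    n_src = len(solo_stfts)
    n_mics, n_bins, _ = solo_stfts[0].shape
    H = np.zeros((n_bins, n_mics, n_src), dtype=complex)
    for j, X in enumerate(solo_stfts):
        ref = X[j]                                        # (n_bins, n_frames)
        cross = np.sum(X * np.conj(ref)[None], axis=-1)   # (n_mics, n_bins)
        power = np.sum(np.abs(ref) ** 2, axis=-1) + 1e-12 # (n_bins,)
        H[:, :, j] = (cross / power).T
    return H

def cancel_bleed(mix_stft, H):
    """Per-bin least-squares source estimate given a fixed H.

    mix_stft has shape (n_mics, n_bins, n_frames); the result has shape
    (n_src, n_bins, n_frames).
    """
    X = np.transpose(mix_stft, (1, 0, 2))   # (n_bins, n_mics, n_frames)
    S = np.linalg.pinv(H) @ X               # (n_bins, n_src, n_frames)
    return np.transpose(S, (1, 0, 2))

# Synthetic, noise-free demo (all values below are made up for illustration).
rng = np.random.default_rng(0)
n_src = n_mics = 3
n_bins, n_frames = 8, 200

# True relative transfer functions: unit gain to the close mic, weaker bleed.
shape = (n_bins, n_mics, n_src)
A = 0.3 * rng.uniform(0, 1, shape) * np.exp(2j * np.pi * rng.uniform(0, 1, shape))
idx = np.arange(n_src)
A[:, idx, idx] = 1.0

# Source spectrograms as seen at their own close mics.
S = (rng.standard_normal((n_src, n_bins, n_frames))
     + 1j * rng.standard_normal((n_src, n_bins, n_frames)))

# Calibration stage: one source at a time, recorded by every mic.
solo = [A[:, :, j].T[:, :, None] * S[j][None] for j in range(n_src)]
H_est = estimate_relative_tf(solo)

# Ensemble recording: every mic hears every source.
mix = np.transpose(A @ np.transpose(S, (1, 0, 2)), (1, 0, 2))
S_hat = cancel_bleed(mix, H_est)
```

When the transfer matrix is held fixed and the noise is white and Gaussian, per-bin least squares coincides with the ML source estimate. The joint optimization of sources and transfer functions, and the MAP extension described in the abstract, are not reproduced here.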

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2021; ©2021
Publication date 2021; 2021
Issuance monographic
Language English

Creators/Contributors

Author Das, Orchisama
Degree supervisor Chafe, Chris
Degree supervisor Smith, Julius O. (Julius Orion)
Thesis advisor Chafe, Chris
Thesis advisor Smith, Julius O. (Julius Orion)
Thesis advisor Abel, Jonathan (Jonathan Stuart)
Degree committee member Abel, Jonathan (Jonathan Stuart)
Associated with Stanford University, Department of Music

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Orchisama Das.
Note Submitted to the Department of Music.
Thesis Thesis Ph.D. Stanford University 2021.
Location https://purl.stanford.edu/nm995ps4623

Access conditions

Copyright
© 2021 by Orchisama Das
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
