Low power audio feature extraction for machine learning applications

Villamizar, Daniel Augusto

Abstract/Contents

Abstract: Always-on sound classification is a desirable but power-intensive function for a variety of emerging applications such as wearables and IoT devices. The hardware energy consumption of sound classifiers is typically driven by signal digitization, feature extraction, classification model storage, and execution. Yet the complexity of the signal of interest is often much lower than that of the raw signal acquired from the microphone and processed by a machine learning engine. For instance, semi-stationary sounds (e.g., engine noise, baby cries, running water, human chatter) carry less information than more complex sounds such as music or speech. In this dissertation, I will present the benefits of leveraging an engineered feature set to efficiently classify semi-stationary sounds. This approach requires one to three orders of magnitude fewer parameters and can therefore be trained over ten times faster than competitive deep learning models. I will also describe a circuit topology and system architecture that can be used to extract both engineered features and more general-purpose ones. Our work resulted in a 32-channel analog filterbank IC for audio front-end signal processing. It employs a passive N-path switched-capacitor topology to achieve high power efficiency and reconfigurability. The circuit's unwanted harmonic mixing products are absorbed by the machine learning model during training. To enable a systematic pre-silicon study of this effect, we develop a computationally efficient circuit model that can process large machine learning datasets in practical run-times. Measured results using a 130 nm CMOS prototype IC indicate competitive classification accuracy on datasets for baby cry detection (93.7% AUC) and voice commands (92.4% average precision), while lowering the feature extraction energy compared to digital realizations by approximately 2x and 10x, respectively. The 1.44 mm² chip consumes 800 nW, which corresponds to the lowest normalized power per simultaneously sampled channel in recent literature.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2021; ©2021
Publication date	2021; 2021
Issuance	monographic
Language	English

Creators/Contributors

Author	Villamizar, Daniel Augusto
Degree supervisor	Murmann, Boris
Thesis advisor	Murmann, Boris
Thesis advisor	Raina, Priyanka, (Assistant Professor of Electrical Engineering)
Thesis advisor	Rivas-Davila, Juan
Degree committee member	Raina, Priyanka, (Assistant Professor of Electrical Engineering)
Degree committee member	Rivas-Davila, Juan
Associated with	Stanford University, Department of Electrical Engineering

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Daniel Augusto Villamizar Valenzuela.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis Ph.D. Stanford University 2021.
Location	https://purl.stanford.edu/xf872vs2626

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
