Low power audio feature extraction for machine learning applications

Abstract/Contents

Abstract
Always-on sound classification is a desirable but power-intensive function for a variety of emerging applications such as wearables and IoT devices. The hardware energy consumption of sound classifiers is typically driven by signal digitization, feature extraction, classification model storage, and execution. Yet, the complexity of the signal of interest is often much lower than that of the raw signal acquired from the microphone and processed by a machine learning engine. For instance, semi-stationary sounds (e.g., engine noise, baby cry, running water, human chatter) are signals with lower information content than more complex sounds such as music or speech. In this dissertation, I will present the benefits of leveraging an engineered feature set to efficiently classify semi-stationary sounds. This approach requires one to three orders of magnitude fewer parameters and can therefore be trained over ten times faster than competitive deep learning models. I will also describe a circuit topology and system architecture that can be used to extract both engineered features and more general-purpose ones. Our work resulted in a 32-channel analog filterbank IC for audio front-end signal processing. It employs a passive N-path switched-capacitor topology to achieve high power efficiency and reconfigurability. The circuit's unwanted harmonic mixing products are absorbed by the machine learning model during training. To enable a systematic pre-silicon study of this effect, we developed a computationally efficient circuit model that can process large machine learning datasets in practical run times. Measured results using a 130 nm CMOS prototype IC indicate competitive classification accuracy on datasets for baby cry detection (93.7% AUC) and voice commands (92.4% average precision), while lowering the feature extraction energy compared to digital realizations by approximately 2x and 10x, respectively. The 1.44 mm² chip consumes 800 nW, which corresponds to the lowest normalized power per simultaneously sampled channel in recent literature.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2021; ©2021
Publication date 2021; 2021
Issuance monographic
Language English

Creators/Contributors

Author Villamizar, Daniel Augusto
Degree supervisor Murmann, Boris
Thesis advisor Murmann, Boris
Thesis advisor Raina, Priyanka, (Assistant Professor of Electrical Engineering)
Thesis advisor Rivas-Davila, Juan
Degree committee member Raina, Priyanka, (Assistant Professor of Electrical Engineering)
Degree committee member Rivas-Davila, Juan
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Daniel Augusto Villamizar Valenzuela.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis Ph.D. Stanford University 2021.
Location https://purl.stanford.edu/xf872vs2626

Access conditions

Copyright
© 2021 by Daniel Augusto Villamizar
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
