Low power audio feature extraction for machine learning applications
Abstract/Contents
- Abstract
- Always-on sound classification is a desirable but power-intensive function for a variety of emerging applications such as wearables and IoT devices. The hardware energy consumption of sound classifiers is typically driven by signal digitization, feature extraction, classification model storage, and execution. Yet the complexity of the signal of interest is often much lower than that of the raw signal acquired from the microphone and processed by a machine learning engine. For instance, semi-stationary sounds (e.g., engine noise, a baby's cry, running water, human chatter) carry less information than more complex sounds such as music or speech. In this dissertation, I present the benefits of leveraging an engineered feature set to efficiently classify semi-stationary sounds. This approach requires one to three orders of magnitude fewer parameters and can therefore be trained over ten times faster than competitive deep learning models. I also describe a circuit topology and system architecture that can be used to extract both engineered features and more general-purpose ones. Our work resulted in a 32-channel analog filterbank IC for audio front-end signal processing. It employs a passive N-path switched-capacitor topology to achieve high power efficiency and reconfigurability. The circuit's unwanted harmonic mixing products are absorbed by the machine learning model during training. To enable a systematic pre-silicon study of this effect, we develop a computationally efficient circuit model that can process large machine learning datasets in practical run-times. Measured results using a 130 nm CMOS prototype IC indicate competitive classification accuracy on datasets for baby cry detection (93.7% AUC) and voice commands (92.4% average precision), while lowering the feature extraction energy compared to digital realizations by approximately 2x and 10x, respectively. The 1.44 mm² chip consumes 800 nW, which corresponds to the lowest normalized power per simultaneously sampled channel in recent literature.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2021; ©2021 |
Publication date | 2021 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Villamizar, Daniel Augusto |
---|---|
Degree supervisor | Murmann, Boris |
Thesis advisor | Murmann, Boris |
Thesis advisor | Raina, Priyanka, (Assistant Professor of Electrical Engineering) |
Thesis advisor | Rivas-Davila, Juan |
Degree committee member | Raina, Priyanka, (Assistant Professor of Electrical Engineering) |
Degree committee member | Rivas-Davila, Juan |
Associated with | Stanford University, Department of Electrical Engineering |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Daniel Augusto Villamizar Valenzuela. |
---|---|
Note | Submitted to the Department of Electrical Engineering. |
Thesis | Thesis Ph.D. Stanford University 2021. |
Location | https://purl.stanford.edu/xf872vs2626 |
Access conditions
- Copyright
- © 2021 by Daniel Augusto Villamizar
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).