Dataset of "Soundr: Head Position and Orientation Prediction Using a Microphone Array"

Abstract/Contents

Abstract

Although state-of-the-art smart speakers can hear a user's speech, unlike a human assistant these devices cannot resolve the user's verbal references based on their head location and orientation. Soundr presents a novel interaction technique that leverages the built-in microphone array found in most smart speakers to infer the user's spatial location and head orientation using only their voice. With that extra information, Soundr can resolve users' references to objects, people, and locations based on the speaker's gaze, and can also provide relative directions.

To provide training data for our neural network, we collected 751 minutes of data (50x that of the best prior work) from human speakers, leveraging a virtual reality headset to provide accurate head-tracking ground truth. Our results achieve an average positional error of 0.31 m and an orientation angle accuracy of 34.3° for each voice command. A user study evaluating user preferences for controlling IoT appliances by talking at them found this new approach to be fast and easy to use.
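The reported metrics correspond to standard pose-error measures: Euclidean distance between predicted and ground-truth head positions, and the angle between predicted and ground-truth orientation vectors. A minimal sketch of how such metrics could be computed from this dataset is shown below; the function names and example values are illustrative, not taken from the released code.

```python
import math

def positional_error(pred_xyz, true_xyz):
    """Euclidean distance (meters) between predicted and ground-truth head positions."""
    return math.dist(pred_xyz, true_xyz)

def angular_error(pred_dir, true_dir):
    """Angle in degrees between predicted and ground-truth orientation vectors."""
    dot = sum(p * t for p, t in zip(pred_dir, true_dir))
    norm = math.sqrt(sum(p * p for p in pred_dir)) * math.sqrt(sum(t * t for t in true_dir))
    # Clamp to guard against floating-point drift just outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))

# Hypothetical example: a prediction 0.3 m off in x and 90 degrees off in yaw.
print(positional_error((1.0, 1.6, 2.0), (1.3, 1.6, 2.0)))  # 0.3
print(angular_error((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))     # 90.0
```

Averaging these two quantities over all voice commands in the test set would yield figures comparable to the 0.31 m and 34.3° reported in the abstract.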

Description

Type of resource software, multimedia
Date created August 16, 2019 - September 9, 2019

Creators/Contributors

Author Yang, Jackie (Junrui)
Author Banerjee, Gaurab
Author Gupta, Vishesh
Advisor Lam, Monica S.
Advisor Landay, James A.

Subjects

Subject Smart speakers
Subject Internet of Things
Subject Machine learning
Subject Acoustic source localization
Genre Dataset

Bibliographic information

Related Publication Jackie (Junrui) Yang, Gaurab Banerjee, Vishesh Gupta, Monica S. Lam, and James A. Landay. 2020. Soundr: Head Position and Orientation Prediction Using a Microphone Array. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI ’20). Association for Computing Machinery, New York, NY, USA. DOI: https://doi.org/10.1145/3313831.3376427
Related item
Location https://purl.stanford.edu/jn901sr3775

Access conditions

Use and reproduction
User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported license (CC BY-SA 3.0).

Preferred citation

Preferred Citation
Yang, Jackie (Junrui), Banerjee, Gaurab, Gupta, Vishesh, Lam, Monica S., and Landay, James A. Dataset of Soundr: Head Position and Orientation Prediction Using a Microphone Array. Stanford Digital Repository. Available at: https://purl.stanford.edu/jn901sr3775
