Tiered representations for audio-based multimedia and speech retrieval

Abstract/Contents

Abstract
As society moves further into the information age, millions of digital documents are created and stored for private or public viewing every day. Beyond text (e.g., websites or emails), these documents come in many forms, including speech, audio, and video. As the quantity of digital content grows rapidly, so do both the demand for retrieving those items effectively and the difficulty of doing so. In many cases the information needed to match a query with the relevant documents is embedded in a raw signal (e.g., a digitized sound wave), which makes retrieval even more challenging. This dissertation proposes methods for performing retrieval in two particularly challenging scenarios in which both the query and the retrieval item are in an audio format. The first scenario involves personalized spoken utterances, and the second involves audio-based retrieval of videos that contain specific events (e.g., a birthday party). In both cases, because the audio is in a raw format, it must first be converted into a meaningful representation that allows for comparison with the previously created documents. Further, the audio is recorded on personal recording devices, which introduces additional challenges. There are various ways to represent an audio signal, ranging from the unsupervised frame level (tens of milliseconds) to the supervised concept level (a few seconds). Since each tier of representation has its own strengths and weaknesses, in addition to presenting my work in developing diverse representations for audio retrieval, I also present how these representations can be combined to leverage the benefits of each tier.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English

Creators/Contributors

Associated with Pancoast, Stephanie Lynne
Associated with Stanford University, Department of Electrical Engineering.
Primary advisor Gray, Robert M, 1943-
Primary advisor Osgood, Brad
Thesis advisor Gray, Robert M, 1943-
Thesis advisor Osgood, Brad
Thesis advisor Akbacak, Murat
Thesis advisor Gill, John T III
Advisor Akbacak, Murat
Advisor Gill, John T III

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Stephanie Lynne Pancoast.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

Copyright
© 2015 by Stephanie Lynne Pancoast
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
