Tiered representations for audio-based multimedia and speech retrieval

Pancoast, Stephanie Lynne; Stanford University, Department of Electrical Engineering.

Abstract/Contents

Abstract: As society continues to move into an information age, millions of digital documents are created and stored for private or public viewing every day. Beyond text (e.g., websites or emails), these documents can come in many forms, including speech, audio, or video. As the quantity of digital content increases rapidly, retrieving those items effectively is both more in demand and more challenging. In many cases the meaningful information for matching a query with the relevant documents is embedded in a raw signal (e.g., a digitized sound wave), which makes retrieval even more challenging. This dissertation proposes methods for performing retrieval in two particularly challenging scenarios where both the query and the retrieval item are in an audio format. The first scenario involves personalized spoken utterances, and the second involves audio-based retrieval of videos that contain specific events (e.g., a birthday party). In both cases, because the audio is in a raw format, it first needs to be converted into a meaningful representation that allows for comparison with the previously created documents. Further, the audio is recorded on personal recording devices, which introduces additional challenges. There are various ways to represent an audio signal, ranging from the unsupervised frame level (tens of milliseconds) to the supervised concept level (a few seconds). Since each tier of representation has its own strengths and weaknesses, in addition to presenting my work in developing diverse representations for audio retrieval, I also show how these representations can be combined to leverage the benefits of both tasks.

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2015
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Pancoast, Stephanie Lynne
Associated with	Stanford University, Department of Electrical Engineering.
Primary advisor	Gray, Robert M, 1943-
Primary advisor	Osgood, Brad
Thesis advisor	Gray, Robert M, 1943-
Thesis advisor	Osgood, Brad
Thesis advisor	Akbacak, Murat
Thesis advisor	Gill, John T III
Advisor	Akbacak, Murat
Advisor	Gill, John T III

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Stephanie Lynne Pancoast.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis (Ph.D.)--Stanford University, 2015.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).