Tiered representations for audio-based multimedia and speech retrieval
Abstract/Contents
- Abstract
- As society continues to move into an information age, millions of digital documents are created and stored for private or public viewing every day. Beyond text (e.g., searching for websites or emails), these documents can come in many forms, including speech, audio, or video. With a rapidly increasing quantity of digital content, both the demand for and the challenge of retrieving those items effectively are growing. In many cases the meaningful information for matching a query with the relevant documents is embedded in a raw signal (e.g., a digitized sound wave), which makes retrieval even more challenging. This dissertation proposes methods for performing retrieval in two particularly challenging scenarios where both the query and the retrieved item are in an audio format. The first scenario involves personalized spoken utterances, and the second involves audio-based retrieval of videos that contain specific events (e.g., a birthday party). In both cases, because the audio is in a raw format, it first needs to be converted into a meaningful representation that allows for comparison with the previously created documents. Further, the audio is recorded from personal recording devices, which introduces additional challenges. There are various ways to represent an audio signal, ranging from the unsupervised frame level (tens of milliseconds) to the supervised concept level (a few seconds). Since each tier of representation has its own strengths and weaknesses, I present not only my work in developing diverse representations for audio retrieval but also how these diverse representations can be combined to leverage the benefits of both tasks.
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2015 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Pancoast, Stephanie Lynne |
---|---|
Associated with | Stanford University, Department of Electrical Engineering. |
Primary advisor | Gray, Robert M, 1943- |
Primary advisor | Osgood, Brad |
Thesis advisor | Gray, Robert M, 1943- |
Thesis advisor | Osgood, Brad |
Thesis advisor | Akbacak, Murat |
Thesis advisor | Gill, John T III |
Advisor | Akbacak, Murat |
Advisor | Gill, John T III |
Subjects
Genre | Theses |
---|---|
Bibliographic information
Statement of responsibility | Stephanie Lynne Pancoast. |
---|---|
Note | Submitted to the Department of Electrical Engineering. |
Thesis | Thesis (Ph.D.)--Stanford University, 2015. |
Location | electronic resource |
Access conditions
- Copyright
- © 2015 by Stephanie Lynne Pancoast
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).