Efficient event understanding in videos and language
Abstract/Contents
- Abstract
- The visual world offers a smorgasbord of interesting events: human-object interactions, dynamic visual relationships, and activities of daily living. The ability to comprehend them is critical to the development of real-world, interactive AI systems. However, making sense of these events as humans do -- from a continuous and high-volume sensory stream in an efficient and effective manner -- remains a daunting endeavor. The challenges are chiefly two-fold. First, videos are computationally expensive to process; we need more than traditional extensions of systems designed for images. Second, videos capture a broad spectrum of event complexity, from low-level action primitives to higher-order spatiotemporal relationships; we need techniques to learn these semantics from natural language without expensive, dense annotations. This dissertation presents several research contributions aimed at addressing these challenges. First, we will discuss new architectures for recognizing actions in videos, which learn how to allocate a fixed computation budget to improve efficiency-accuracy by an order of magnitude over traditional techniques. Second, we will present new frameworks that advance our capability for efficiently learning about dense visual events from weak natural language supervision, including settings where language is not well-structured or contains ambiguous coreferences. Finally, we will discuss how a novel technique, leveraging progress in multimodal foundation models, reveals fundamental insights into pressing challenges and opportunities for deeper temporal event understanding with improved efficiency.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2022; ©2022 |
Publication date | 2022 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Buch, Shyamal Deep |
---|---|
Degree supervisor | Li, Fei Fei, 1976- |
Degree supervisor | Niebles Duque, Juan Carlos, 1980- |
Thesis advisor | Li, Fei Fei, 1976- |
Thesis advisor | Niebles Duque, Juan Carlos, 1980- |
Thesis advisor | Goodman, Noah (Noah D.) |
Thesis advisor | Wu, Jiajun, (Computer scientist) |
Degree committee member | Goodman, Noah (Noah D.) |
Degree committee member | Wu, Jiajun, (Computer scientist) |
Associated with | Stanford University, Computer Science Department |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Shyamal Buch |
---|---|
Note | Submitted to the Computer Science Department |
Thesis | Thesis Ph.D. Stanford University 2022 |
Location | https://purl.stanford.edu/zj148rk8758 |
Access conditions
- Copyright
- © 2022 by Shyamal Deep Buch
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).