Efficient event understanding in videos and language

Abstract/Contents

Abstract
The visual world offers a smorgasbord of interesting events: human-object interactions, dynamic visual relationships, and activities of daily living. The ability to comprehend them is critical to the development of real-world, interactive AI systems. However, making sense of these events as humans do -- from a continuous and high-volume sensory stream in an efficient and effective manner -- remains a daunting endeavor. The challenges are chiefly two-fold. First, videos are computationally expensive to process; we need more than traditional extensions of systems designed for images. Second, videos capture a broad spectrum of event complexity, from low-level action primitives to higher-order spatiotemporal relationships; we need techniques to learn these semantics from natural language without expensive, dense annotations. This dissertation presents several research contributions aimed at addressing these challenges. First, we will discuss new architectures for recognizing actions in videos, which learn how to allocate a fixed computation budget to improve the efficiency-accuracy trade-off by an order of magnitude over traditional techniques. Second, we will present new frameworks that advance our capability for efficiently learning about dense visual events from weak natural language supervision, including settings where language is not well-structured or contains ambiguous coreferences. Finally, we will discuss how a novel technique, leveraging progress in multimodal foundation models, reveals fundamental insights into pressing challenges and opportunities for deeper temporal event understanding with improved efficiency.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2022; ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Buch, Shyamal Deep
Degree supervisor Li, Fei Fei, 1976-
Degree supervisor Niebles Duque, Juan Carlos, 1980-
Thesis advisor Li, Fei Fei, 1976-
Thesis advisor Niebles Duque, Juan Carlos, 1980-
Thesis advisor Goodman, Noah (Noah D.)
Thesis advisor Wu, Jiajun, (Computer scientist)
Degree committee member Goodman, Noah (Noah D.)
Degree committee member Wu, Jiajun, (Computer scientist)
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Shyamal Buch
Note Submitted to the Computer Science Department
Thesis Thesis Ph.D. Stanford University 2022
Location https://purl.stanford.edu/zj148rk8758

Access conditions

Copyright
© 2022 by Shyamal Deep Buch
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
