Efficient event understanding in videos and language

Buch, Shyamal Deep

Abstract/Contents

Abstract: The visual world offers a smorgasbord of interesting events: human-object interactions, dynamic visual relationships, and activities of daily living. The ability to comprehend them is critical to the development of real-world, interactive AI systems. However, making sense of these events as humans do -- from a continuous and high-volume sensory stream in an efficient and effective manner -- remains a daunting endeavor. The challenges are chiefly two-fold. First, videos are computationally expensive to process; we need more than traditional extensions of systems designed for images. Second, videos capture a broad spectrum of event complexity, from low-level action primitives to higher-order spatiotemporal relationships; we need techniques to learn these semantics from natural language without expensive, dense annotations. This dissertation presents several research contributions aimed at addressing these challenges. First, we will discuss new architectures for recognizing actions in videos, which learn how to allocate a fixed computation budget to improve efficiency-accuracy by an order of magnitude over traditional techniques. Second, we will present new frameworks that advance our capability for efficiently learning about dense visual events from weak natural language supervision, including settings where language is not well-structured or contains ambiguous coreferences. Finally, we will discuss how a novel technique, leveraging progress in multimodal foundation models, reveals fundamental insights into pressing challenges and opportunities for deeper temporal event understanding with improved efficiency.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2022; ©2022
Publication date	2022
Issuance	monographic
Language	English

Creators/Contributors

Author	Buch, Shyamal Deep
Degree supervisor	Li, Fei Fei, 1976-
Degree supervisor	Niebles Duque, Juan Carlos, 1980-
Thesis advisor	Li, Fei Fei, 1976-
Thesis advisor	Niebles Duque, Juan Carlos, 1980-
Thesis advisor	Goodman, Noah (Noah D.)
Thesis advisor	Wu, Jiajun, (Computer scientist)
Degree committee member	Goodman, Noah (Noah D.)
Degree committee member	Wu, Jiajun, (Computer scientist)
Associated with	Stanford University, Computer Science Department

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Shyamal Buch
Note	Submitted to the Computer Science Department
Thesis	Thesis Ph.D. Stanford University 2022
Location	https://purl.stanford.edu/zj148rk8758

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
