Towards comprehensive action understanding in videos
Abstract/Contents
- Abstract
- An enormous number of videos are created, spread, and watched every day. In this ocean of videos, human actions and activities are often the focal point. We want machines to understand human actions in videos because this is essential to many applications, including but not limited to healthcare, security systems, and human-robot interaction. For these applications to be realized, action understanding must go beyond simply answering "what is the action?" and become more comprehensive. An intelligent agent should be able to answer "who/where is the actor?", "what/where is the object?", "what interaction is happening between the actor and the object?", "when does the action start and end?", and more. Achieving comprehensive action understanding is non-trivial because the need for data and labels grows combinatorially when solving multiple problems at once, not to mention that video data and labels are expensive to collect, store, and consume. Therefore, comprehensive action understanding requires not only performing multiple tasks but also ensuring data efficiency. In this dissertation, we discuss three questions toward data-efficient and comprehensive action understanding: How can we reduce the need for data and labels? How can we perform multiple tasks without combinatorial growth in data? How can we solve new problems efficiently by building on problems already solved? For the first question, our work on few-shot video classification and semi-supervised temporal action proposals introduces video-specific techniques and strategies for learning with less supervision. For the second question, a study on actor-action segmentation demonstrates how knowledge disentanglement avoids enumerating all combinations of categories from subtasks. For the third question, we propose constructing compositional representations from human-object relationships in videos, and such representations lead to better generalizability in action recognition models.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2021; ©2021 |
Publication date | 2021 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Ji, Jingwei |
---|---|
Degree supervisor | Li, Fei Fei, 1976- |
Degree supervisor | Niebles, Juan |
Thesis advisor | Li, Fei Fei, 1976- |
Thesis advisor | Niebles, Juan |
Thesis advisor | Guibas, Leonidas J |
Thesis advisor | Savarese, Silvio |
Thesis advisor | Yeung, Serena |
Degree committee member | Guibas, Leonidas J |
Degree committee member | Savarese, Silvio |
Degree committee member | Yeung, Serena |
Associated with | Stanford University, Department of Electrical Engineering |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Jingwei Ji. |
---|---|
Note | Submitted to the Department of Electrical Engineering. |
Thesis | Thesis Ph.D. Stanford University 2021. |
Location | https://purl.stanford.edu/wc099nh9969 |
Access conditions
- Copyright
- © 2021 by Jingwei Ji
- License
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).