Learned motion models for the perception and generation of dynamic humans and objects



Understanding the motion of humans and objects is key for intelligent systems. Motions are the result of physics, but non-physical dynamics also play an important role; for example, social norms and traffic laws determine how pedestrians and vehicles behave. The ability to perceive and generate these motions enables important applications, such as autonomous robots that operate in the real world, mixed reality that augments the real world, and animation and simulation that imitate the real world. Despite often being approached as separate problems, the perception and generation of motion both fundamentally rely on having an accurate model of dynamics for humans and objects in a scene. Perception problems like pose estimation, tracking, and shape estimation require motion understanding to reason about occlusions and noise from partial and ambiguous inputs. Generation problems such as forecasting future motion rely entirely on being able to predict motion. A promising avenue to solve these problems is learning models of motion; however, it is challenging to develop models that accurately reflect the real world, capture the diversity of motion due to inherent uncertainty, and robustly generalize to many possible scenarios. This thesis explores how to effectively learn models of motion to solve important perception and generation problems. We propose several data-driven methods to accurately capture the dynamics of humans and objects, as well as how they interact with each other and their environment. In the first part of the thesis, we introduce two methods for perceiving 3D human pose and 3D object shape, respectively. The first uses a robust generative model of 3D human pose transitions, while the second learns a continuous motion representation entirely from point cloud observations. The second part of the thesis focuses on motion models for synthesizing high-level human behavior in the form of 2D top-down trajectories. In these works, we introduce two new generative models that handle complex multi-agent interactions and can be controlled by a user to produce trajectories with desirable properties. We show that this is useful for creating rare scenarios for testing autonomous vehicles and for animating crowds of pedestrians. Finally, the thesis ends with a discussion of important future directions to continue improving learned models of motion for humans and objects.


Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2023; ©2023
Publication date 2023
Issuance monographic
Language English


Author Rempe, Davis Winston
Degree supervisor Guibas, Leonidas J
Thesis advisor Guibas, Leonidas J
Thesis advisor Bohg, Jeannette, 1981-
Thesis advisor Liu, Cheng-Yun Karen, 1977-
Degree committee member Bohg, Jeannette, 1981-
Degree committee member Liu, Cheng-Yun Karen, 1977-
Associated with Stanford University, School of Engineering
Associated with Stanford University, Computer Science Department


Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Davis Rempe.
Note Submitted to the Computer Science Department.
Thesis Thesis Ph.D. Stanford University 2023.
Location https://purl.stanford.edu/kc338bg9787

Access conditions

© 2023 by Davis Winston Rempe
