See, act, and conceptualize : a learning system for robots to interact with the world


Abstract/Contents

Abstract
Building intelligent systems for robots to interact with the world is a challenging problem due to many factors, such as the high dimensionality of the state and action spaces, the enormous variability of tasks, and the uncertainty of surrounding environments. This dissertation presents our work on guiding robots to perceive objects and motion, master various manipulation skills, and develop concepts to better interact with the world. Robots first need visual systems to perceive objects and motion. When robots move and interact with their surroundings, they autonomously induce motion in the scene. Such motion creates a rich visual sensory signal that facilitates better scene understanding. We introduce our work that jointly estimates the segmentation of a scene into a finite number of rigidly moving objects, the motion trajectories of these objects, and the object scene flow. Once robots perceive objects, they can move their hands to manipulate them. We present approaches for training robots to master both primitive and complex manipulation skills. Choosing the right action representation is important for mastering primitive skills. We present a data-driven grasp synthesis method that considers both object geometry and gripper attributes. Our method leverages contact points as an abstraction that can be re-used by a diverse set of robot hands. Beyond contact points between objects and robotic hands, we propose a contact-point matching representation between two objects and use it to train robots to hang arbitrary objects onto diverse supporting items such as racks or hooks. For complex skills such as tool manipulation and robotic assembly, we describe a learning framework that allows a robot to autonomously modify its environment and discover how such modifications ease manipulation skill learning. As robots master more and more skills, it becomes important for them to learn to abstract and represent these skills. We present a learning framework that endows robots with the ability to acquire various concepts for representing manipulation skills. These manipulation concepts act as mental representations of verbs in natural language instructions. We propose a learning-from-demonstration approach that learns manipulation actions from large-scale video datasets annotated with natural language instructions. Natural language instructions can thus be used to guide robots to better interact with the world. We conclude by summarizing our efforts and discussing future directions.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2021
Publication date 2021
Issuance monographic
Language English

Creators/Contributors

Author Shao, Lin
Degree supervisor Bohg, Jeannette, 1981-
Thesis advisor Bohg, Jeannette, 1981-
Thesis advisor Guibas, Leonidas J
Thesis advisor Khatib, Oussama
Degree committee member Guibas, Leonidas J
Degree committee member Khatib, Oussama
Associated with Stanford University, Institute for Computational and Mathematical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Lin Shao.
Note Submitted to the Institute for Computational and Mathematical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2021.
Location https://purl.stanford.edu/bd998td4251

Access conditions

Copyright
© 2021 by Lin Shao
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
