Deep learning in vision-based robotic manipulation: towards generalization and fast inference


Abstract/Contents

Abstract
Over the past decade, researchers have been working to bring robots into our daily lives and to automate services such as taxis, delivery, housework, and even medical procedures. One of the major roadblocks to making this leap is the diversity and uncertainty of the environments in which robots must operate. Machine perception, i.e., understanding the environment through visual, audio, and contact signals, is indispensable in such diverse and uncertain environments, and is a hard problem in itself. Furthermore, the environment changes due to human activity and other factors, and robots need to react to these changes quickly. Recent developments in deep learning, especially in computer vision, have brought us closer to the goal of deploying robots in our daily environments. However, deep learning methods require large amounts of annotated data, and new datasets and annotations must be collected for each new task. Deep reinforcement learning algorithms have also achieved good performance on a range of locomotion and manipulation tasks, but the number of interactions required to train most algorithms is so large that training can take days even with parallel simulation engines. Highly data-efficient models and learning algorithms are needed to help robots learn faster and with less human effort. Additionally, when designing a learning-based solution to a robotics task, inference speed must be taken into consideration so that the robot can respond to changes quickly. This thesis introduces methods to improve training data efficiency and inference speed for vision-based robotic manipulation. To improve the data efficiency of our models, we analyze the properties and structure of the specific problems and build structural biases into the models based on the insights obtained. In addition, we demonstrate self-supervised learning of the perception model on real images, enabling robots to collect their own training data without human annotation. To improve robots' response speed, we design learning algorithms that explicitly learn the distribution of promising actions when learning motion policies, instead of learning an action-evaluation function that requires online optimization at runtime. The proposed methods are integrated into end-to-end systems and tested on real robots on two tasks: vision-based robotic grasping, and rope manipulation and knotting.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2020
Publication date 2020
Issuance monographic
Language English

Creators/Contributors

Author Yan, Mengyuan
Degree supervisor Bohg, Jeannette, 1981-
Thesis advisor Bohg, Jeannette, 1981-
Thesis advisor Finn, Chelsea
Thesis advisor Sadigh, Dorsa
Degree committee member Finn, Chelsea
Degree committee member Sadigh, Dorsa
Associated with Stanford University, Department of Electrical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Mengyuan Yan
Note Submitted to the Department of Electrical Engineering
Thesis Ph.D., Stanford University, 2020
Location electronic resource

Access conditions

Copyright
© 2020 by Mengyuan Yan
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
