Deep learning in vision-based robotic manipulation: towards generalization and fast inference


Abstract/Contents

Abstract
Over the past decade, researchers have been working to bring robots into our daily lives and to automate services such as taxis, delivery, housework, and even medical procedures. One of the major roadblocks to making this leap is the diversity and uncertainty of the environments in which robots must operate. Machine perception, i.e., understanding the environment through visual, audio, and contact signals, is indispensable in such diverse and uncertain environments, and is a hard problem in itself. Furthermore, the environment changes due to human activity and other factors, and robots need to react to these changes quickly. Recent developments in deep learning, especially in computer vision, have brought us closer to the goal of deploying robots in our daily environments. However, deep learning methods require large amounts of annotated data, and new datasets and annotations must be collected for each new task. Deep reinforcement learning algorithms have also achieved good performance on a range of locomotion and manipulation tasks, but the number of interactions required to train most algorithms is so large that training can take days even with parallel simulation engines. Highly data-efficient models and learning algorithms are needed to help robots learn faster and with less human effort. Additionally, when designing a learning-based solution to a robotics task, inference speed must be taken into consideration so that the robot can respond to changes quickly. This thesis introduces methods to improve training data efficiency and inference speed for vision-based robotic manipulation. To improve the data efficiency of our models, we analyze the properties and structure of the specific problems and build structural biases into the models based on the insights obtained. In addition, we demonstrate self-supervised learning of the perception model on real images, enabling robots to collect their own training data without human annotation. To improve robots' response speed, we design learning algorithms that explicitly learn the distribution of promising actions when learning motion policies, instead of learning an action-evaluation function that requires online optimization at runtime. The proposed methods are integrated into end-to-end systems and tested on real robots on two tasks: vision-based robotic grasping, and rope manipulation and knotting.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2020
Publication date 2020
Issuance monographic
Language English

Creators/Contributors

Author Yan, Mengyuan
Degree supervisor Bohg, Jeannette, 1981-
Thesis advisor Bohg, Jeannette, 1981-
Thesis advisor Finn, Chelsea
Thesis advisor Sadigh, Dorsa
Degree committee member Finn, Chelsea
Degree committee member Sadigh, Dorsa
Associated with Stanford University, Department of Electrical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Mengyuan Yan
Note Submitted to the Department of Electrical Engineering
Thesis Ph.D., Stanford University, 2020
Location electronic resource

Access conditions

Copyright
© 2020 by Mengyuan Yan
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
