Training and deploying visual agents at scale

Fan, Linxi

Abstract/Contents

Abstract: Autonomous agents that perceive and interact with the world, such as home robots and self-driving vehicles, hold great promise for a future that automates mundane tasks and improves living standards for billions of people. However, two major obstacles stand in the way of this grand goal. First, modern AI systems require huge amounts of data to learn meaningful behaviors, yet training them directly on physical robots is unscalable due to high cost and low efficiency. Second, mobile robot platforms typically have limited onboard computing resources but demand low reaction latency, which hinders the mass deployment of large-capacity visual models. In this dissertation, we explore an effective recipe for developing algorithms and systems that can train and deploy visual agents at scale. The key idea is to train the agents in rich simulation, then overcome the sim-to-real gap, and finally deploy efficiently on edge devices with lightweight video processing architectures. This dissertation is organized around four primary components of the pipeline. First, we propose an open-source distributed framework that provides a full-stack solution to significantly accelerate reinforcement learning (RL) for complex robotics tasks. Second, we construct an ecologically valid and visually realistic simulator for home robotic tasks. Third, we introduce a novel policy learning method that achieves zero-shot generalization to unseen visual environments with large distributional shifts, which facilitates sim-to-real transfer. Finally, we design a new family of video learning architectures that enables deep video understanding for visual agents on resource-constrained devices. We hope that the techniques and ideas presented in this dissertation will bring us one step closer to a future where intelligent robots are as ubiquitous as smartphones in our lives.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2021; ©2021
Publication date	2021
Issuance	monographic
Language	English

Creators/Contributors

Author	Fan, Linxi
Degree supervisor	Li, Fei Fei, 1976-
Thesis advisor	Li, Fei Fei, 1976-
Thesis advisor	Niebles Duque, Juan Carlos, 1980-
Thesis advisor	Wu, Jiajun, (Computer scientist)
Degree committee member	Niebles Duque, Juan Carlos, 1980-
Degree committee member	Wu, Jiajun, (Computer scientist)
Associated with	Stanford University, Computer Science Department

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Linxi Fan.
Note	Submitted to the Computer Science Department.
Thesis	Thesis Ph.D. Stanford University 2021.
Location	https://purl.stanford.edu/jk266yw1361

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
