Autonomous learning for control systems with continuous state/action space


Abstract/Contents

Abstract
The principal purpose of this work is to develop an autonomous and unsupervised learning methodology for control problems with continuous state/action spaces. The goal of this learning is to build a control policy that generally works well for random initial states and achieves a locally minimal cost for a selected initial state. The discounted quadratic cost function is used as the performance index. The proposed learning method, designed specifically for continuous state/action space control problems, has two learning stages, which we call phase-I learning and phase-II learning. Phase-I learning builds a working policy for random initial states. It starts from a randomly initialized policy, and the learning agent gradually builds up control knowledge by adding to the policy locally optimal actions computed from local dynamics estimation and the Riccati recursion. The resulting policy can control the system for most random initial states. The second stage, phase-II learning, is a policy refinement process for a given initial state and can be understood as a trajectory optimization process. For phase-II learning, a gradient descent trajectory optimization algorithm is developed. Using this algorithm, the learning agent gradually adjusts the trajectory corresponding to the given initial state, incrementally improving the policy. Phase-II learning continues until the cost reaches a local minimum; the resulting policy is a locally optimal policy for the given initial state. To test the method, we used three systems: the double integrator with gravity (2D), the single inverted pendulum (4D), and the double inverted pendulum on a cart (6D). The results show that the proposed learning method builds locally optimal policies within a reasonable number of learning cycles (fewer than 300 cycles in our experiments). They also show that only a small portion of the state space is visited during learning, which supports the sparsity of the policy approximator grid (PAG). The main contributions of this work are as follows: 1) Development of an autonomous and unsupervised learning method that is designed for, and works effectively on, continuous state/action space control problems. 2) Development of data structures, which we call the policy approximator grid (PAG), the transition memory set (TMSET), and the action memory set (AMSET), that work seamlessly with the proposed learning method; their insertion and retrieval times are constant and independent of the amount of data they contain. 3) Experimental results showing that, as the state dimension increases, only a small portion of the state space is actually visited during learning; this "sparsity" allows the learning agent to carry out the learning process without exponentially increasing memory usage in the policy approximator implementation. 4) Development of a gradient descent trajectory optimization algorithm for phase-II learning; this algorithm is a general form of the first-order differential dynamic programming (DDP) algorithm.
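The abstract does not spell out how a locally optimal action is computed in phase-I learning; the following Python sketch shows one plausible reading of that step, assuming a locally estimated linear model x' = A x + B u (for example, fit to transitions stored in the TMSET) and a discounted quadratic cost with weights Q, R and discount factor gamma. All function and variable names here are illustrative assumptions, not the thesis's implementation.

import numpy as np

def discounted_lqr_gain(A, B, Q, R, gamma=0.95, iters=200):
    # Riccati recursion for a locally estimated linear model x' = A x + B u
    # under the discounted quadratic cost sum_k gamma^k (x'Qx + u'Ru).
    P = Q.copy()
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(iters):
        S = R + gamma * B.T @ P @ B                  # input-cost term of the minimization
        K = gamma * np.linalg.solve(S, B.T @ P @ A)  # feedback gain
        P = Q + gamma * A.T @ P @ (A - B @ K)        # value-function (Riccati) update
    return K

def locally_optimal_action(x, A, B, Q, R, gamma=0.95):
    # Locally optimal feedback action around the current operating point: u = -K x.
    K = discounted_lqr_gain(A, B, Q, R, gamma)
    return -K @ x

Under these assumptions, the gain K from the Riccati recursion defines the locally optimal action u = -K x, which would be the kind of action added to the policy during phase-I learning.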
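Likewise, the constant insertion and retrieval times claimed for the PAG, TMSET, and AMSET are consistent with a hash-backed sparse grid in which only visited cells are stored. The sketch below is an assumed illustration of that idea (names and cell-based layout are hypothetical), not the thesis's data structure.

import numpy as np

class SparsePolicyGrid:
    # Hash-backed sparse grid over the state space: only visited cells are
    # stored, and insertion/retrieval are O(1) dictionary operations.
    # Illustrative stand-in for the thesis's PAG; the actual design may differ.
    def __init__(self, cell_size):
        self.cell_size = np.asarray(cell_size, dtype=float)
        self.cells = {}  # grid-index tuple -> stored action

    def _key(self, x):
        # Discretize a continuous state into its grid-cell index.
        return tuple(np.floor(np.asarray(x, dtype=float) / self.cell_size).astype(int))

    def insert(self, x, u):
        self.cells[self._key(x)] = np.asarray(u, dtype=float)

    def retrieve(self, x, default=None):
        return self.cells.get(self._key(x), default)

# Example: a 4-D state space discretized with cell size 0.1 in every dimension.
pag = SparsePolicyGrid(cell_size=[0.1, 0.1, 0.1, 0.1])
pag.insert([0.03, -0.12, 0.55, 0.0], u=[1.5])
print(pag.retrieve([0.04, -0.11, 0.58, 0.01]))  # same cell, so the stored action [1.5] is returned

Because memory grows only with the number of visited cells rather than with the full discretized volume, this kind of structure is compatible with the sparsity observation reported in the abstract.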

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English

Creators/Contributors

Associated with Park, Dookun
Associated with Stanford University, Department of Electrical Engineering.
Primary advisor Pavone, Marco, 1980-
Primary advisor Widrow, Bernard, 1929-
Thesis advisor Friedman, Jerome
Thesis advisor Schafer, Ronald W., 1938-

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Dookun Park.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

Copyright
© 2015 by Dookun Park
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
