Autonomous learning for control systems with continuous state/action space


Abstract/Contents

Abstract
The principal purpose of this work is to develop an autonomous and unsupervised learning methodology for control problems with continuous state/action spaces. The goal of this learning is to build a control policy that generally works well for random initial states and achieves a locally minimal cost for a selected initial state. The discounted quadratic cost function is used as the performance index. The proposed learning method, designed specifically for continuous state/action space control problems, has two learning stages, which we call phase-I learning and phase-II learning. Phase-I learning builds a working policy for random initial states. It starts from a randomly initialized policy, and the learning agent gradually builds up control knowledge by adding to the policy locally optimal actions computed from local dynamics estimation and the Riccati recursion. The resulting policy can control the system for most random initial states. The second stage, phase-II learning, is a policy refinement process for a given initial state and can be understood as a trajectory optimization process. For phase-II learning, a gradient descent trajectory optimization algorithm is developed. Using this algorithm, the learning agent gradually adjusts the trajectory corresponding to the given initial state, incrementally improving the policy. Phase-II learning continues until the cost reaches a local minimum; the resulting policy is a locally optimal policy for the given initial state. To test the method, we used three systems: the double integrator with gravity (2D), the single inverted pendulum (4D), and the double inverted pendulum on a cart (6D). The results show that the proposed learning method builds locally optimal policies within a reasonable number of learning cycles (fewer than 300 cycles in our experiments). They also show that only a small portion of the state space is visited during learning, which supports the sparsity of the policy approximator grid (PAG). The main contributions of this work are as follows: 1) Development of an autonomous and unsupervised learning method that is designed for, and works effectively on, continuous state/action space control problems. 2) Development of data structures, which we call the policy approximator grid (PAG), the transition memory set (TMSET), and the action memory set (AMSET), that work seamlessly with the proposed learning method; their insertion and retrieval times are constant and independent of the amount of data they contain. 3) Experimental results showing that, as the state dimension increases, only a small portion of the state space is actually visited during learning; this "sparsity" allows the learning agent to carry out the learning process without exponentially increasing memory usage in the policy approximator implementation. 4) Development of a gradient descent trajectory optimization algorithm for phase-II learning; this algorithm is a general form of the first-order differential dynamic programming (DDP) algorithm.
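The abstract does not spell out how a locally optimal action is computed in phase-I learning; the following Python sketch shows one plausible reading of that step, assuming a locally estimated linear model x' = A x + B u (for example, fit to transitions stored in the TMSET) and a discounted quadratic cost with weights Q, R and discount factor gamma. All function and variable names here are illustrative assumptions, not the thesis's implementation.

import numpy as np

def discounted_lqr_gain(A, B, Q, R, gamma=0.95, iters=200):
    # Riccati recursion for a locally estimated linear model x' = A x + B u
    # under the discounted quadratic cost sum_k gamma^k (x'Qx + u'Ru).
    P = Q.copy()
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(iters):
        S = R + gamma * B.T @ P @ B                  # input-cost term of the minimization
        K = gamma * np.linalg.solve(S, B.T @ P @ A)  # feedback gain
        P = Q + gamma * A.T @ P @ (A - B @ K)        # value-function (Riccati) update
    return K

def locally_optimal_action(x, A, B, Q, R, gamma=0.95):
    # Locally optimal feedback action around the current operating point: u = -K x.
    K = discounted_lqr_gain(A, B, Q, R, gamma)
    return -K @ x

Under these assumptions, the gain K from the Riccati recursion defines the locally optimal action u = -K x, which would be the kind of action added to the policy during phase-I learning.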
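Likewise, the constant insertion and retrieval times claimed for the PAG, TMSET, and AMSET are consistent with a hash-backed sparse grid in which only visited cells are stored. The sketch below is an assumed illustration of that idea (names and cell-based layout are hypothetical), not the thesis's data structure.

import numpy as np

class SparsePolicyGrid:
    # Hash-backed sparse grid over the state space: only visited cells are
    # stored, and insertion/retrieval are O(1) dictionary operations.
    # Illustrative stand-in for the thesis's PAG; the actual design may differ.
    def __init__(self, cell_size):
        self.cell_size = np.asarray(cell_size, dtype=float)
        self.cells = {}  # grid-index tuple -> stored action

    def _key(self, x):
        # Discretize a continuous state into its grid-cell index.
        return tuple(np.floor(np.asarray(x, dtype=float) / self.cell_size).astype(int))

    def insert(self, x, u):
        self.cells[self._key(x)] = np.asarray(u, dtype=float)

    def retrieve(self, x, default=None):
        return self.cells.get(self._key(x), default)

# Example: a 4-D state space discretized with cell size 0.1 in every dimension.
pag = SparsePolicyGrid(cell_size=[0.1, 0.1, 0.1, 0.1])
pag.insert([0.03, -0.12, 0.55, 0.0], u=[1.5])
print(pag.retrieve([0.04, -0.11, 0.58, 0.01]))  # same cell, so the stored action [1.5] is returned

Because memory grows only with the number of visited cells rather than with the full discretized volume, this kind of structure is compatible with the sparsity observation reported in the abstract.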

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English

Creators/Contributors

Associated with Park, Dookun
Associated with Stanford University, Department of Electrical Engineering.
Primary advisor Pavone, Marco, 1980-
Primary advisor Widrow, Bernard, 1929-
Thesis advisor Friedman, Jerome
Thesis advisor Schafer, Ronald W., 1938-

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Dookun Park.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

Copyright
© 2015 by Dookun Park
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
