Efficient reinforcement learning with agent states

Dong, Shi, (Researcher of reinforcement learning)

Abstract/Contents

Abstract: In a wide range of decision problems, much focus of academic research has been put on stylized models, whose capacities are usually limited by problem-specific assumptions. In the previous decade, approaches based on reinforcement learning (RL) have received growing attention. With these approaches, a unified method can be applied to a broad class of problems, circumventing the need for stylized solutions. Moreover, when it comes to real-life applications, such RL-based approaches, unfettered from the constraining models, can potentially leverage the growing amount of data and computational resources. As such, continuing innovations might empower RL to tackle problems in the complex physical world. So far, empirical accomplishments of RL have largely been limited to artificial environments, such as games. One reason is that the success of RL often hinges on the availability of a simulator that is able to mass-produce samples. Meanwhile, real environments, such as medical facilities, fulfillment centers, and the World Wide Web, exhibit complex dynamics that can hardly be captured by hard-coded simulators. To bring the achievement of RL into practice, it would be useful to think in terms of how the interactions between the agent and the real world ought to be modeled. Recent works on RL theory tend to focus on restrictive classes of environments that fail to capture certain aspects of the real world. For example, many such works model the environment as a Markov Decision Process (MDP), which requires that the agent always observe a summary statistic of its situation. In practice, this means that the agent designer has to identify a set of "environmental states," where each state incorporates all information about the environment relevant to decision-making. Moreover, to ensure that the agent learns from its trajectories, MDP models presume that some environmental states are visited infinitely often.
This could be a significant simplification of the real world, as the gifted Argentine poet Jorge Luis Borges once said, "Every day, perhaps every hour, is different." To generate insights on agent design in authentic applications, in this dissertation we consider a more general framework of RL that relaxes such restrictions. Specifically, we demonstrate a simple RL agent that implements an optimistic version of Q-learning and establish through regret analysis that this agent can operate with some level of competence in any environment. While we leverage concepts from the literature on provably efficient RL, we consider a general agent-environment interface and provide a novel agent design and analysis that further develop the concept of agent state, which is defined as the collection of information that the agent maintains in order to make decisions. This level of generality positions our results to inform the design of future agents for operation in complex real environments. We establish that, as time progresses, our agent performs competitively relative to policies that require longer times to evaluate. The time it takes to approach asymptotic performance is polynomial in the complexity of the agent's state representation and the time required to evaluate the best policy that the agent can represent. Notably, there is no dependence on the complexity of the environment. The ultimate per-period performance loss of the agent is bounded by a constant multiple of a measure of distortion introduced by the agent's state representation. Our work is the first to establish that an algorithm approaches this asymptotic condition within a tractable time frame, and the results presented in this dissertation resolve multiple open issues in approximate dynamic programming.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	©2022
Publication date	2022
Issuance	monographic
Language	English

Creators/Contributors

Author	Dong, Shi, (Researcher of reinforcement learning)
Degree supervisor	Van Roy, Benjamin
Thesis advisor	Van Roy, Benjamin
Thesis advisor	Ma, Tengyu
Thesis advisor	Montanari, Andrea
Degree committee member	Ma, Tengyu
Degree committee member	Montanari, Andrea
Associated with	Stanford University, Department of Electrical Engineering

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Shi Dong.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis Ph.D. Stanford University 2022.
Location	https://purl.stanford.edu/my903kt3306

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
