Data-driven sequential decision making by understanding and adopting rational behavior
- A remarkable feature of an intelligent agent is the ability to make sequences of smart decisions that are executed in coordination to reach goals. As can be seen by watching humans, a polished sequential decision making policy yields elegant behaviors such as smooth driving, dexterous locomotion, and prudent investments. Learning optimal policies for sequential decision making is challenging because of the difficulty of long-horizon credit assignment, the need to explore exponentially large search spaces, and the difficulty of designing reward functions that encourage the correct behavior. In this dissertation, we are interested in perhaps one of the most natural forms of learning that humans engage in: learning from observations. We focus on algorithms that enable data-driven learning of sequential decision making policies by observing optimal behavior demonstrated by other rational agents. This process comprises two main steps: understanding and adoption. In the first part, we discuss how to design algorithms that allow an agent to understand and thus internalize rational behavior. We develop an active world model learning algorithm that enables an ego-agent to build models of complex behaviors demonstrated by human-like animate agents by efficiently directing its attention. We further investigate the feasibility of building models of other rational agents by Inverse Reinforcement Learning. In the second part, we develop methods to adopt rational behavior from demonstrations. We develop algorithms for Imitation Learning in the presence of domain mismatch such as morphological and viewpoint differences. We further propose algorithms for imitation via Inverse Reinforcement Learning that extract underlying rewards from demonstrations of complex behaviors such as robotic locomotion. We hope that these contributions bring us one step closer to solving real-world sequential decision making problems with machine learning.
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource.
|Degree committee member
|Stanford University, School of Engineering
|Stanford University, Computer Science Department
|Statement of responsibility
|Kun Ho Kim.
|Submitted to the Computer Science Department.
|Thesis Ph.D. Stanford University 2023.
- © 2023 by Kun Ho Kim
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).