Data-driven sequential decision making with deep probabilistic modeling
Abstract
A central ability of intelligent agents is learning by interacting with the surrounding environment. Bayesian optimization (BO) and reinforcement learning (RL) provide techniques for balancing exploration and exploitation in the face of uncertainty. The first part of this dissertation focuses on improving and generalizing current BO approaches. From the perspective of utility maximization, we develop a general likelihood-free BO method. It directly models the acquisition function without an explicit probabilistic surrogate model and applies to any acquisition function with a non-negative utility, leading to a more scalable and sample-efficient optimization procedure. From the perspective of maximal uncertainty reduction, we propose a generalized BO framework based on decision-theoretic entropies. This framework not only provides a unified view of multiple commonly used acquisition functions but also yields a flexible family of acquisition functions that can be easily customized for novel tasks beyond black-box optimization, such as top-k selection and level set estimation. The second part focuses on learning from demonstrations to bypass the limitations of RL, which requires carefully designed reward functions and unsafe online exploration and struggles to generalize to new tasks and multi-agent systems. Its contributions include extensions of maximum entropy inverse reinforcement learning to multi-agent systems and to the meta-learning setting, as well as an offline imitation learning algorithm that seeks proper regularization from suboptimal demonstrations via relaxed distribution matching. Because deep probabilistic modeling is important for capturing uncertainty and balancing exploration and exploitation during decision making, the third part centers on the fundamental problem of learning expressive probabilistic models.
We present new methods for training deep energy-based models by minimizing general f-divergences and maximizing homogeneous proper scoring rules, achieving more flexible modeling preferences, better inference performance, and robust parameter estimation under data contamination. These contributions enable machines to make better decisions in the face of uncertainty.
Description
Field | Value
---|---
Type of resource | text
Form | electronic resource; remote; computer; online resource
Extent | 1 online resource
Place | [Stanford, California]
Publisher | [Stanford University]
Copyright date | ©2022
Publication date | 2022
Issuance | monographic
Language | English
Creators/Contributors
Role | Name
---|---
Author | Yu, Lantao
Degree supervisor | Ermon, Stefano
Thesis advisor | Ermon, Stefano
Thesis advisor | Grover, Aditya
Thesis advisor | Ma, Tengyu
Degree committee member | Grover, Aditya
Degree committee member | Ma, Tengyu
Associated with | Stanford University, Computer Science Department
Subjects
Field | Value
---|---
Genre | Theses
Genre | Text
Bibliographic information
Field | Value
---|---
Statement of responsibility | Lantao Yu.
Note | Submitted to the Computer Science Department.
Thesis | Ph.D. thesis, Stanford University, 2022.
Location | https://purl.stanford.edu/hj568zg8037
Access conditions
- Copyright: © 2022 by Lantao Yu
- License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).