Data-driven sequential decision making with deep probabilistic modeling


Abstract/Contents

Abstract
A central ability of intelligent agents is learning by interacting with the surrounding environment. Bayesian optimization (BO) and reinforcement learning (RL) provide techniques to balance exploration and exploitation in the face of uncertainty.

The first part of this dissertation focuses on improving and generalizing current BO approaches. From the perspective of utility maximization, we develop a general likelihood-free BO method that directly models the acquisition function without an explicit probabilistic surrogate model and applies to any acquisition function with a non-negative utility, leading to a more scalable and sample-efficient optimization procedure. From the perspective of maximal uncertainty reduction, we propose a generalized BO framework based on decision-theoretic entropies, which not only provides a unified view of several commonly used acquisition functions but also yields a flexible family of acquisition functions that can be easily customized for novel tasks beyond black-box optimization, such as top-k selection and level-set estimation.

The second part focuses on learning from demonstrations to bypass limitations of RL, which requires carefully designed reward functions and unsafe online exploration, and struggles to generalize to new tasks and multi-agent systems. Contributions include extensions of maximum entropy inverse reinforcement learning to multi-agent systems and the meta-learning setting, as well as an offline imitation learning algorithm that obtains proper regularization from suboptimal demonstrations via relaxed distribution matching.

Because deep probabilistic modeling is essential for capturing uncertainty and balancing exploration and exploitation during decision making, the third part centers on the fundamental problem of learning expressive probabilistic models. We present new methods for training deep energy-based models by minimizing general f-divergences and maximizing homogeneous proper scoring rules, achieving more flexible modeling preferences, better inference performance, and robust parameter estimation under data contamination. Together, these contributions enable machines to make better decisions in the face of uncertainty.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Yu, Lantao
Degree supervisor Ermon, Stefano
Thesis advisor Ermon, Stefano
Thesis advisor Grover, Aditya
Thesis advisor Ma, Tengyu
Degree committee member Grover, Aditya
Degree committee member Ma, Tengyu
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Lantao Yu.
Note Submitted to the Computer Science Department.
Thesis Thesis (Ph.D.)--Stanford University, 2022.
Location https://purl.stanford.edu/hj568zg8037

Access conditions

Copyright
© 2022 by Lantao Yu
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
