Data-driven sequential decision making with deep probabilistic modeling

Yu, Lantao

Abstract/Contents

Abstract: A central ability of intelligent agents is learning by interacting with the surrounding environment. Bayesian optimization (BO) and reinforcement learning (RL) provide techniques to balance exploration and exploitation in the face of uncertainty. The first part of this dissertation focuses on improving and generalizing current BO approaches. From the perspective of utility maximization, we develop a general likelihood-free BO method that directly models the acquisition function without an explicit probabilistic surrogate model and applies to any acquisition function with a non-negative utility, leading to a more scalable and sample-efficient optimization procedure. From the perspective of maximal uncertainty reduction, we propose a generalized BO framework based on decision-theoretic entropies, which not only provides a unified view of several commonly used acquisition functions but also yields a flexible family of acquisition functions that can be easily customized for novel tasks beyond black-box optimization, such as top-k selection and level set estimation. The second part focuses on learning from demonstrations to bypass the limitations of RL, which requires carefully designed reward functions and unsafe online exploration, and which struggles to generalize to new tasks and multi-agent systems. Contributions in this part include extensions of maximum entropy inverse reinforcement learning to multi-agent systems and to the meta-learning setting, as well as an offline imitation learning algorithm that seeks proper regularization from suboptimal demonstrations via relaxed distribution matching. Because deep probabilistic modeling is important for capturing uncertainty and balancing exploration and exploitation during decision making, the third part centers on the fundamental problem of learning expressive probabilistic models. We present new methods for training deep energy-based models by minimizing general f-divergences and maximizing homogeneous proper scoring rules, to achieve more flexible modeling preferences, better inference performance, and robust parameter estimation under data contamination. These contributions enable machines to make better decisions in the face of uncertainty.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2022; ©2022
Publication date	2022; 2022
Issuance	monographic
Language	English

Creators/Contributors

Author	Yu, Lantao
Degree supervisor	Ermon, Stefano
Thesis advisor	Ermon, Stefano
Thesis advisor	Grover, Aditya
Thesis advisor	Ma, Tengyu
Degree committee member	Grover, Aditya
Degree committee member	Ma, Tengyu
Associated with	Stanford University, Computer Science Department

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Lantao Yu.
Note	Submitted to the Computer Science Department.
Thesis	Thesis Ph.D. Stanford University 2022.
Location	https://purl.stanford.edu/hj568zg8037

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
