Two parameter inference methods in likelihood-free models: approximate Bayesian computation and contrastive divergence


Abstract/Contents

Abstract
Parameter inference is perhaps the most fundamental problem in the field of statistics. Both the Bayesian posterior distribution and the frequentist maximum likelihood estimate rely critically on the availability of the probability mass or density function, namely the likelihood function $l(\theta; X) = p_\theta(X)$. In many applications, however, the likelihood function cannot be obtained explicitly or is intractable to compute, which precludes direct Bayesian computation or maximum likelihood learning. In these cases, approximate inference can still be performed with approximate Bayesian computation (ABC), as long as it is possible to simulate data samples $X$ from the likelihood-free model given a parameter $\theta$. Both the accuracy and the computational efficiency of ABC depend on the choice of summary statistic, especially when dealing with high-dimensional data, but it is unclear which guiding principles can be used to construct effective summary statistics. In Chapter 2, we explore the possibility of automating the construction of summary statistics by training deep neural networks (DNNs) to predict the parameters from artificially generated data; the resulting summary statistics are approximately posterior means of the parameters. With minimal model-specific tuning, our method constructs summary statistics for the Ising model and the moving-average model that match or exceed theoretically motivated summary statistics in terms of the accuracy of the resulting posteriors.

In many important models, the likelihood function is not entirely available but is conditionally computable or known up to a normalizing constant. An example is the model of the form $$p_\theta(x) = e^{-E(x, \theta)-\Lambda(\theta)},$$ where the \textit{energy function} $E(x, \theta)$ is known while the \textit{log-partition function} $\Lambda(\theta)$ is unknown.
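As a hedged sketch (not the thesis code) of the basic ABC rejection scheme referenced above: draw $\theta$ from the prior, simulate data, and keep $\theta$ whenever the simulated summary statistic lies close to the observed one. All function names here are hypothetical, and the toy model (a Gaussian mean with the sample mean as summary statistic) is chosen only for illustration.

```python
import numpy as np

def abc_rejection(observed, simulate, summary, prior_sample,
                  n_draws=10000, eps=0.1):
    """Generic ABC rejection sampler (illustrative sketch only).

    Keeps parameter draws whose simulated summary statistic lies
    within `eps` of the observed summary.
    """
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()          # draw from the prior
        x = simulate(theta)             # simulate from the likelihood-free model
        if abs(summary(x) - s_obs) <= eps:
            accepted.append(theta)      # approximate posterior draw
    return np.array(accepted)

# Toy example: infer the mean of a N(theta, 1) model from 100 observations,
# using the sample mean as the summary statistic.
rng = np.random.default_rng(0)
observed = rng.normal(2.0, 1.0, size=100)
post = abc_rejection(
    observed,
    simulate=lambda th: rng.normal(th, 1.0, size=100),
    summary=np.mean,
    prior_sample=lambda: rng.uniform(-5, 5),
)
```

The accepted draws `post` approximate the posterior of $\theta$; shrinking `eps` tightens the approximation at the cost of a lower acceptance rate, which is exactly where a low-dimensional, informative summary statistic becomes critical.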
Since the Markov chain Monte Carlo (MCMC) method can sample data $X$ given $\theta$, ABC can be used in principle. A more powerful alternative is Geoffrey E. Hinton's contrastive divergence (CD) learning algorithm, which approximates $\nabla \Lambda(\theta)$, the missing term in the gradient of the log-likelihood function, by a short MCMC run, and maximizes the log-likelihood function with this approximate gradient. Despite CD's empirical success, both computer simulation and theoretical analysis show that CD may fail to converge to the maximum likelihood estimate or the true parameter. In Chapter 3, we study the asymptotic properties of the CD algorithm with a fixed learning rate in exponential families and establish conditions that guarantee its convergence. We prove that, given an i.i.d. data sample $X_1, \dots, X_n \sim p_{\theta^*}$ and letting $\{\theta_t\}_{t \ge 0}$ be the sequence generated by the CD algorithm, any limit point of the time averages is an asymptotically consistent estimate, in the sense that $$\lim_{n \to \infty} \mathbb{P}\left(\limsup_{t \to \infty} \left\Vert \frac{1}{t} \sum_{s=0}^{t-1} \theta_s - \theta^*\right\Vert_2 \ge A_m n^{-(1-2\gamma)/3}\right) = 0$$ for any $\gamma \in (0,1/2)$ and some constant $A_m$ depending on $m$, the number of MCMC transition steps in each iteration of the CD algorithm. In Chapter 4, we extend the results of Chapter 3 to the CD algorithm with an annealed learning rate and obtain analogous asymptotic properties.
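As a hedged illustration (not the thesis code) of the CD-$m$ update: for a model $p_\theta(x) = e^{-E(x,\theta)-\Lambda(\theta)}$ one has $\nabla_\theta \log p_\theta(x) = -\nabla_\theta E(x,\theta) + \mathbb{E}_\theta[\nabla_\theta E]$, and CD replaces the intractable model expectation with an average over $m$ MCMC steps started at the data. All function names below are hypothetical, and the toy energy $E(x,\theta) = (x-\theta)^2/2$ is chosen only so that a Metropolis sampler is easy to write.

```python
import numpy as np

def cd_update(theta, data, grad_energy, mcmc_step, m=1, lr=0.05, rng=None):
    """One CD-m update for a scalar parameter (illustrative sketch only).

    grad_energy(x, theta): dE/dtheta, vectorized over x.
    mcmc_step: one MCMC transition leaving p_theta invariant.
    """
    # Positive phase: gradient of the energy averaged over the data.
    pos = grad_energy(data, theta).mean()
    # Negative phase: start the chain AT the data and run only m steps.
    samples = data
    for _ in range(m):
        samples = mcmc_step(samples, theta, rng)
    neg = grad_energy(samples, theta).mean()
    # Gradient ascent on the log-likelihood: d/dtheta log p = -(pos - neg).
    return theta - lr * (pos - neg)

# Toy energy E(x, theta) = (x - theta)^2 / 2, i.e. a Gaussian in disguise.
def grad_energy(x, theta):
    return -(x - theta)

def metropolis_step(x, theta, rng):
    # One Metropolis transition targeting p_theta, vectorized over x.
    prop = x + rng.normal(0.0, 1.0, size=x.shape)
    log_ratio = (x - theta) ** 2 / 2 - (prop - theta) ** 2 / 2
    accept = np.log(rng.uniform(size=x.shape)) < log_ratio
    return np.where(accept, prop, x)

rng = np.random.default_rng(1)
data = rng.normal(3.0, 1.0, size=200)
theta = 0.0
for _ in range(2000):
    theta = cd_update(theta, data, grad_energy, metropolis_step, m=1, rng=rng)
```

Because the negative-phase chain is restarted at the data and truncated after $m$ steps, the update direction is a biased estimate of the log-likelihood gradient, which is precisely why the convergence guarantees studied in Chapters 3 and 4 are nontrivial.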

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2016
Issuance monographic
Language English

Creators/Contributors

Associated with Jiang, Bai
Associated with Stanford University, Department of Statistics.
Primary advisor Wong, Wing Hung
Thesis advisor Wong, Wing Hung
Thesis advisor Diaconis, Persi
Thesis advisor Lai, T. L

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Bai Jiang.
Note Submitted to the Department of Statistics.
Thesis Thesis (Ph.D.)--Stanford University, 2016.
Location electronic resource

Access conditions

Copyright
© 2016 by Bai Jiang
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
