Two parameter inference methods in likelihood-free models: approximate Bayesian computation and contrastive divergence


Abstract/Contents

Abstract
Parameter inference is perhaps the most fundamental problem in the field of statistics. Both the Bayesian posterior distribution and the frequentist maximum likelihood estimate rely critically on the availability of the probability mass or density function, namely the likelihood function $l(\theta; X) = p_\theta(X)$. In many applications, however, the likelihood function cannot be obtained explicitly or is intractable to compute, which precludes direct Bayesian computation or maximum likelihood learning. In these cases, approximate inference can still be performed with approximate Bayesian computation (ABC), as long as it is possible to simulate data samples $X$ from the likelihood-free model given a parameter $\theta$. Both the accuracy and the computational efficiency of ABC depend on the choice of summary statistic, especially when dealing with high-dimensional data, but it is unclear which guiding principles can be used to construct effective summary statistics. In Chapter 2, we explore the possibility of automating the construction of summary statistics by training deep neural networks (DNNs) to predict the parameters from artificially generated data; the resulting summary statistics are approximately posterior means of the parameters. With minimal model-specific tuning, our method constructs summary statistics for the Ising model and the moving-average model that match or exceed theoretically motivated summary statistics in terms of the accuracy of the resulting posteriors.

In many important models, the likelihood function is not entirely available but is conditionally computable or known up to a normalizing constant. An example is the model of the form $$p_\theta(x) = e^{-E(x, \theta)-\Lambda(\theta)},$$ where the \textit{energy function} $E(x, \theta)$ is known while the \textit{log-partition function} $\Lambda(\theta)$ is unknown.
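As a hedged sketch (not the thesis code) of the basic ABC rejection scheme referenced above: draw $\theta$ from the prior, simulate data, and keep $\theta$ whenever the simulated summary statistic lies close to the observed one. All function names here are hypothetical, and the toy model (a Gaussian mean with the sample mean as summary statistic) is chosen only for illustration.

```python
import numpy as np

def abc_rejection(observed, simulate, summary, prior_sample,
                  n_draws=10000, eps=0.1):
    """Generic ABC rejection sampler (illustrative sketch only).

    Keeps parameter draws whose simulated summary statistic lies
    within `eps` of the observed summary.
    """
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()          # draw from the prior
        x = simulate(theta)             # simulate from the likelihood-free model
        if abs(summary(x) - s_obs) <= eps:
            accepted.append(theta)      # approximate posterior draw
    return np.array(accepted)

# Toy example: infer the mean of a N(theta, 1) model from 100 observations,
# using the sample mean as the summary statistic.
rng = np.random.default_rng(0)
observed = rng.normal(2.0, 1.0, size=100)
post = abc_rejection(
    observed,
    simulate=lambda th: rng.normal(th, 1.0, size=100),
    summary=np.mean,
    prior_sample=lambda: rng.uniform(-5, 5),
)
```

The accepted draws `post` approximate the posterior of $\theta$; shrinking `eps` tightens the approximation at the cost of a lower acceptance rate, which is exactly where a low-dimensional, informative summary statistic becomes critical.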
Since the Markov chain Monte Carlo (MCMC) method can sample data $X$ given $\theta$, ABC can be used in principle. A more powerful alternative is Geoffrey E. Hinton's contrastive divergence (CD) learning algorithm, which approximates $\nabla \Lambda(\theta)$, the missing term in the gradient of the log-likelihood function, by a short MCMC run, and maximizes the log-likelihood function with this approximate gradient. Despite CD's empirical success, both computer simulation and theoretical analysis show that CD may fail to converge to the maximum likelihood estimate or the true parameter. In Chapter 3, we study the asymptotic properties of the CD algorithm with a fixed learning rate in exponential families and establish conditions that guarantee its convergence. We prove that, given an i.i.d. data sample $X_1, \dots, X_n \sim p_{\theta^*}$ and letting $\{\theta_t\}_{t \ge 0}$ be the sequence generated by the CD algorithm, any limit point of the time averages is an asymptotically consistent estimate, in the sense that $$\lim_{n \to \infty} \mathbb{P}\left(\limsup_{t \to \infty} \left\Vert \frac{1}{t} \sum_{s=0}^{t-1} \theta_s - \theta^*\right\Vert_2 \ge A_m n^{-(1-2\gamma)/3}\right) = 0$$ for any $\gamma \in (0,1/2)$ and some constant $A_m$ depending on $m$, the number of MCMC transition steps in each iteration of the CD algorithm. In Chapter 4, we extend the results of Chapter 3 to the CD algorithm with an annealed learning rate and obtain analogous asymptotic properties.
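As a hedged illustration (not the thesis code) of the CD-$m$ update: for a model $p_\theta(x) = e^{-E(x,\theta)-\Lambda(\theta)}$ one has $\nabla_\theta \log p_\theta(x) = -\nabla_\theta E(x,\theta) + \mathbb{E}_\theta[\nabla_\theta E]$, and CD replaces the intractable model expectation with an average over $m$ MCMC steps started at the data. All function names below are hypothetical, and the toy energy $E(x,\theta) = (x-\theta)^2/2$ is chosen only so that a Metropolis sampler is easy to write.

```python
import numpy as np

def cd_update(theta, data, grad_energy, mcmc_step, m=1, lr=0.05, rng=None):
    """One CD-m update for a scalar parameter (illustrative sketch only).

    grad_energy(x, theta): dE/dtheta, vectorized over x.
    mcmc_step: one MCMC transition leaving p_theta invariant.
    """
    # Positive phase: gradient of the energy averaged over the data.
    pos = grad_energy(data, theta).mean()
    # Negative phase: start the chain AT the data and run only m steps.
    samples = data
    for _ in range(m):
        samples = mcmc_step(samples, theta, rng)
    neg = grad_energy(samples, theta).mean()
    # Gradient ascent on the log-likelihood: d/dtheta log p = -(pos - neg).
    return theta - lr * (pos - neg)

# Toy energy E(x, theta) = (x - theta)^2 / 2, i.e. a Gaussian in disguise.
def grad_energy(x, theta):
    return -(x - theta)

def metropolis_step(x, theta, rng):
    # One Metropolis transition targeting p_theta, vectorized over x.
    prop = x + rng.normal(0.0, 1.0, size=x.shape)
    log_ratio = (x - theta) ** 2 / 2 - (prop - theta) ** 2 / 2
    accept = np.log(rng.uniform(size=x.shape)) < log_ratio
    return np.where(accept, prop, x)

rng = np.random.default_rng(1)
data = rng.normal(3.0, 1.0, size=200)
theta = 0.0
for _ in range(2000):
    theta = cd_update(theta, data, grad_energy, metropolis_step, m=1, rng=rng)
```

Because the negative-phase chain is restarted at the data and truncated after $m$ steps, the update direction is a biased estimate of the log-likelihood gradient, which is precisely why the convergence guarantees studied in Chapters 3 and 4 are nontrivial.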

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2016
Issuance monographic
Language English

Creators/Contributors

Associated with Jiang, Bai
Associated with Stanford University, Department of Statistics.
Primary advisor Wong, Wing Hung
Thesis advisor Wong, Wing Hung
Thesis advisor Diaconis, Persi
Thesis advisor Lai, T. L

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Bai Jiang.
Note Submitted to the Department of Statistics.
Thesis Thesis (Ph.D.)--Stanford University, 2016.
Location electronic resource

Access conditions

Copyright
© 2016 by Bai Jiang
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
