TypeVote algorithm : theory, computation, extensions and applications

Abstract/Contents

Abstract
This thesis describes the TypeVote algorithm, which computes more accurate probability distribution estimators (both probability mass and density estimators) in the presence of missing data whose missingness pattern is MCAR (missing completely at random). We first review the background literature on MCAR data analysis and present several heuristics that motivate the algorithm's design. After introducing two related concepts, vagueness and the type graph, we describe the generic TypeVote algorithm, which has two parts: the typing process, which standardizes the raw data, and the voting process, an EM algorithm that finds the maximum likelihood estimator using both completely and incompletely observed data entries. We then prove properties of the generic TypeVote algorithm, including its convergence condition, its convergence rate, and its superiority, in terms of asymptotic efficiency among other criteria, over the classical multinomial estimator, which uses only completely observed data entries. We also cover computational aspects of the generic TypeVote algorithm, including its run-time data structure, scalability, and flexibility, and we provide a simulation-based numerical experiment to support the theory. We then extend the study to categorical data and derive the TypeVote kernel density estimator (TV-KDE), which is more accurate than the classical KDE because it takes both completely and incompletely observed data entries into account. Overall, the TypeVote algorithm not only offers an unprecedented methodology for probability distribution estimation with MCAR data but also opens new opportunities, because a number of acclaimed multi-step estimators employ probability distribution estimators as a critical step.
In practice, probability distribution estimation with MCAR data is one of the key problems in applied statistics, and MCAR data are becoming increasingly prevalent in the computational era.
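
The abstract describes the voting process only as an EM algorithm that uses both completely and incompletely observed entries. As a rough illustration of that general idea, and not the thesis's TypeVote implementation, the sketch below runs a standard textbook EM for the cell probabilities of a discrete joint distribution when some coordinates are missing. The function name `em_mcar_pmf`, its parameters, and the use of `None` to mark a missing entry are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def em_mcar_pmf(rows, levels, n_iter=200, tol=1e-10):
    """Illustrative EM estimate of a joint pmf from partially observed rows.

    rows   : list of tuples; None marks a missing coordinate (assumed MCAR).
    levels : list giving the allowed categories of each variable.
    Returns a dict mapping each full cell (tuple) to its estimated probability.
    """
    cells = list(product(*levels))
    p = {c: 1.0 / len(cells) for c in cells}  # uniform starting point
    patterns = Counter(rows)                  # distinct observed patterns
    n = len(rows)

    # For every observed pattern, list the full cells it is compatible with.
    compat = {}
    for pat in patterns:
        matches = [c for c in cells
                   if all(o is None or o == v for o, v in zip(pat, c))]
        if not matches:
            raise ValueError(f"row {pat!r} matches no cell in levels")
        compat[pat] = matches

    for _ in range(n_iter):
        # E-step: split each pattern's count over its compatible cells,
        # in proportion to the current probability estimates.
        expected = dict.fromkeys(cells, 0.0)
        for pat, count in patterns.items():
            mass = sum(p[c] for c in compat[pat])
            if mass == 0.0:  # guard against numerical underflow
                continue
            for c in compat[pat]:
                expected[c] += count * p[c] / mass

        # M-step: renormalize the expected counts into probabilities.
        new_p = {c: expected[c] / n for c in cells}
        delta = max(abs(new_p[c] - p[c]) for c in cells)
        p = new_p
        if delta < tol:
            break
    return p


if __name__ == "__main__":
    data = [(0, 0), (1, 1), (0, None), (None, 1)]
    estimate = em_mcar_pmf(data, levels=[[0, 1], [0, 1]])
    for cell, prob in sorted(estimate.items()):
        print(cell, round(prob, 4))
```

When every row is fully observed, this procedure reduces to the empirical cell frequencies, which corresponds to the classical multinomial estimator the abstract uses as its baseline. The typing process, vagueness, and type graph mentioned in the abstract are not modeled here.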

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2012
Issuance monographic
Language English

Creators/Contributors

Associated with Wang, Boyu
Associated with Stanford University, Institute for Computational and Mathematical Engineering.
Primary advisor Hong, Han
Thesis advisor Hong, Han
Thesis advisor Papanicolaou, George
Thesis advisor Ye, Yinyu
Advisor Papanicolaou, George
Advisor Ye, Yinyu

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Boyu Wang.
Note Submitted to the Institute for Computational and Mathematical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2012.
Location electronic resource

Access conditions

Copyright
© 2012 by Boyu Wang
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
