TypeVote algorithm : theory, computation, extensions and applications
- This thesis describes the TypeVote algorithm, which computes more accurate probability distribution estimators (both probability mass and density estimators) from data with missing values when the missingness pattern is MCAR (missing completely at random). We first review the background literature on MCAR data analysis and present the heuristics that motivate the algorithm's design. After introducing two related concepts, vagueness and the type graph, we describe the generic TypeVote algorithm, which has two parts: the typing process, which standardizes the raw data, and the voting process, an EM algorithm that finds the maximum likelihood estimator using both completely and incompletely observed data entries. We then prove properties of the generic TypeVote algorithm, including its convergence condition, its convergence rate, and its superiority in asymptotic efficiency over the classical multinomial estimator, which uses only completely observed data entries, among other properties. We also cover computational aspects of the generic algorithm, including its run-time data structure, scalability, and flexibility, and provide a simulation-based numerical experiment to support the theory. We then extend the study to categorical data and derive the TypeVote kernel density estimator (TV-KDE), which is more accurate than the classical KDE because it takes both completely and incompletely observed data entries into account. Overall, the TypeVote algorithm not only provides a new methodology for probability distribution estimation with MCAR data but also opens new opportunities, because a number of well-regarded multi-step estimators employ a probability distribution estimator as a critical step.
In practice, probability distribution estimation with MCAR data, which is increasingly prevalent in the computational era, is one of the key problems in applied statistics.
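The abstract does not spell out the typing or voting steps, so the following is not the TypeVote algorithm itself. It is a minimal sketch of the general idea the abstract refers to: a textbook EM iteration that estimates a joint probability mass function from both completely and incompletely observed entries, assuming the missing coordinates are MCAR. The function name `em_joint_pmf`, its parameters, and the convention of marking a missing coordinate with `None` are all hypothetical choices made for illustration.

```python
import numpy as np

def em_joint_pmf(rows, shape, n_iter=200, tol=1e-10):
    """Generic EM estimate of a joint pmf (illustrative, not TypeVote).

    rows  : tuples of category indices; None marks a missing coordinate
    shape : number of categories for each variable
    """
    p = np.full(shape, 1.0 / np.prod(shape))  # start from the uniform pmf
    for _ in range(n_iter):
        counts = np.zeros(shape)
        for row in rows:
            # Select the cells compatible with the observed coordinates.
            idx = tuple(slice(None) if v is None else v for v in row)
            block = p[idx]
            # E-step: spread one unit of count over the compatible cells,
            # in proportion to the current estimate.
            counts[idx] += block / block.sum()
        new_p = counts / counts.sum()  # M-step: normalize expected counts
        if np.abs(new_p - p).max() < tol:
            return new_p
        p = new_p
    return p

# Two binary variables; two rows are only partially observed.
rows = [(0, 0), (1, 1), (0, None), (None, 1)]
estimate = em_joint_pmf(rows, shape=(2, 2))
```

With only completely observed rows, this iteration reduces to the empirical frequencies, which corresponds to the classical multinomial estimator the abstract uses as its baseline. Partially observed rows add information by distributing their weight across the cells they are compatible with.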
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Stanford University, Institute for Computational and Mathematical Engineering.
|Statement of responsibility
|Submitted to the Institute for Computational and Mathematical Engineering.
|Thesis (Ph.D.)--Stanford University, 2012.
- © 2012 by Boyu Wang
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).