Group testing and federated analytics beyond traditional assumptions
- In scientific and engineering disciplines, it is common to make simplifying assumptions about the models and methods under consideration. For instance, data samples are often assumed to be independent and identically distributed (i.i.d.), and algorithms are often designed without regard to resource constraints or societal values. Incorporating practical complexities into problem structures can yield insights and algorithms better suited to the real world. In this thesis, we introduce new models, algorithms, and information-theoretic results for three problems that depart from traditional assumptions. First, we discuss group testing--a procedure for efficiently finding "special items" within a large population, such as people infected by a virus--which has recently proven useful for COVID-19 testing. The most widely adopted infection models in the literature fail to capture the reality that diseases often spread through interactions between human beings. Pushing beyond the canonical settings, we introduce a new community-oriented model of disease spread--called the stochastic block infection model (SBIM)--where infections can be correlated. We then propose an adaptive community-aware algorithm for the SBIM that provably saves testing resources compared to the community-oblivious binary splitting algorithm. Our novel lower bounds demonstrate that the community-aware algorithm is order-optimal in certain parameter regimes of the SBIM. Lastly, we extend our algorithms and lower bounds to a broader range of infection models, testing models, and noise models. As a special case, these results show that our community-oriented group testing approach offers the same gains in testing efficiency in the presence of noise as it does in the noiseless case. Next, we consider a federated analytics problem where a central server wishes to estimate a probability distribution from a limited number of samples while conserving network bandwidth and protecting the privacy of the data sources. We introduce a simple estimation scheme that achieves: (i) the minimax-order-optimal estimation error in all privacy and communication regimes; and (ii) the optimal leading constant in the privacy-dominant regime. If the underlying distribution is assumed to be sparse, our scheme can be extended to a two-stage, non-interactive algorithm that leverages this sparsity to achieve significant utility gains. We use this algorithm to characterize the minimax sample complexity of the sparse case up to logarithmic factors, unifying and extending prior results in the literature. Finally, we study the problem of inferring the true label of some entity given potentially noisy outputs from a set of heterogeneous experts, each of whom specializes in a subset of labels. This can be viewed as the problem of aggregating the predictions of a set of small classifiers to perform a global multiclass classification task. It is also applicable to the areas of dataset construction and cross-silo federated learning. Our assumption of heterogeneity deviates from the i.i.d. assumption typically found in the federated learning literature. We formulate and study this problem through an information-theoretic lens and derive bounds on the minimum number of experts needed to reliably infer the true label of an arbitrary input instance. We also present a scheme for finding a minimal set of experts for solving the global classification task and experimentally demonstrate its favorable performance relative to both a centralized classifier and the well-known FedAvg algorithm.
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource.
|Ahn, Surin L
|Degree committee member
|Degree committee member
|Stanford University, School of Engineering
|Stanford University, Department of Electrical Engineering
|Statement of responsibility
|Submitted to the Department of Electrical Engineering.
|Thesis Ph.D. Stanford University 2023.
- © 2023 by Surin L Ahn
Also listed in
Loading usage metrics...