Group testing and federated analytics beyond traditional assumptions

Ahn, Surin L

Group testing and federated analytics beyond traditional assumptions

<a href="https://embed.stanford.edu/iframe/?url=https%3A%2F%2Fpurl.stanford.edu%2Fks626rv6997" class="su-underline">Show Content</a>

Abstract/Contents

Abstract: In scientific and engineering disciplines, it is common to make simplifying assumptions about the models and methods under consideration. For instance, data samples are often assumed to be independent and identically distributed (i.i.d.), and algorithms are often designed without regard to resource constraints or societal values. Incorporating practical complexities into problem structures can yield insights and algorithms better suited to the real world. In this thesis, we introduce new models, algorithms, and information-theoretic results for three problems that depart from traditional assumptions. First, we discuss group testing--a procedure for efficiently finding "special items" within a large population, such as people infected by a virus--which has recently proven useful for COVID-19 testing. The most widely adopted infection models in the literature fail to capture the reality that diseases often spread through interactions between human beings. Pushing beyond the canonical settings, we introduce a new community-oriented model of disease spread--called the stochastic block infection model (SBIM)--where infections can be correlated. We then propose an adaptive community-aware algorithm for the SBIM that provably saves testing resources compared to the community-oblivious binary splitting algorithm. Our novel lower bounds demonstrate that the community-aware algorithm is order-optimal in certain parameter regimes of the SBIM. Lastly, we extend our algorithms and lower bounds to a broader range of infection models, testing models, and noise models. As a special case, these results show that our community-oriented group testing approach offers the same gains in testing efficiency in the presence of noise as it does in the noiseless case. Next, we consider a federated analytics problem where a central server wishes to estimate a probability distribution from a limited number of samples while conserving network bandwidth and protecting the privacy of the data sources. We introduce a simple estimation scheme that achieves: (i) the minimax-order-optimal estimation error in all privacy and communication regimes; and (ii) the optimal leading constant in the privacy-dominant regime. If the underlying distribution is assumed to be sparse, our scheme can be extended to a two-stage, non-interactive algorithm that leverages this sparsity to achieve significant utility gains. We use this algorithm to characterize the minimax sample complexity of the sparse case up to logarithmic factors, unifying and extending prior results in the literature. Finally, we study the problem of inferring the true label of some entity given potentially noisy outputs from a set of heterogeneous experts, each of whom specializes in a subset of labels. This can be viewed as the problem of aggregating the predictions of a set of small classifiers to perform a global multiclass classification task. It is also applicable to the areas of dataset construction and cross-silo federated learning. Our assumption of heterogeneity deviates from the i.i.d. assumption typically found in the federated learning literature. We formulate and study this problem through an information-theoretic lens and derive bounds on the minimum number of experts needed to reliably infer the true label of an arbitrary input instance. We also present a scheme for finding a minimal set of experts for solving the global classification task and experimentally demonstrate its favorable performance relative to both a centralized classifier and the well-known FedAvg algorithm.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2023; ©2023
Publication date	2023; 2023
Issuance	monographic
Language	English

Creators/Contributors

Author	Ahn, Surin L
Degree supervisor	Ozgur, Ayfer
Thesis advisor	Ozgur, Ayfer
Thesis advisor	Pilanci, Mert
Thesis advisor	Weissman, Tsachy
Degree committee member	Pilanci, Mert
Degree committee member	Weissman, Tsachy
Associated with	Stanford University, School of Engineering
Associated with	Stanford University, Department of Electrical Engineering

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Surin Ahn.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis Ph.D. Stanford University 2023.
Location	https://purl.stanford.edu/ks626rv6997

Access conditions

Also listed in

View in SearchWorks

Loading usage metrics...