Decentralized data analysis : genome-wide association studies and other biomedical applications
- The recent growth of data with potential medical ramifications has led to a better understanding of complex disease pathways and risk factors. Currently, much of the medically and clinically relevant data is generated and maintained in decentralized silos. As the size and sensitivity of this data increase, it is becoming exorbitantly expensive, unsafe, and impractical to host entire datasets in a centralized location. Meta-analysis techniques offer a feasible solution; however, they can introduce bias or may not be applicable in certain cases (e.g., small sample sizes). Federated learning can be used to combine some of the advantages of both data centralization and meta-analysis. I will describe how three optimization algorithms (Newton's method, the alternating direction method of multipliers (ADMM), and Anderson-accelerated Douglas-Rachford splitting) can be used with secure sum or partially homomorphic encryption techniques to perform decentralized regression-based association analysis. I will show that these techniques are practical in many low-dimensional settings and that some can be scaled to computationally intensive tasks such as genome-wide association studies (GWAS) at consortium scale. Finally, I will introduce our federated GWAS platform, HyDRA.
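As a rough illustration of the idea in the abstract, the sketch below pairs Newton's method with a secure sum for a one-coefficient logistic regression split across sites. It is not taken from the thesis or from HyDRA: the additive-share scheme, the fixed-point constants `MODULUS` and `SCALE`, the function names, and the toy data are all assumptions made for this example. Each site shares only masked pieces of its local gradient and Hessian, so only the aggregated totals are ever reconstructed.

```python
import math
import random

MODULUS = 2**61 - 1  # assumed modulus for the masked integer arithmetic
SCALE = 10**6        # assumed fixed-point precision (six decimal places)


def encode(x):
    """Map a float to a non-negative fixed-point residue."""
    return int(round(x * SCALE)) % MODULUS


def decode(v):
    """Map a residue back to a signed float."""
    if v > MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def secure_sum(values, rng):
    """Sum one float per site using additive secret shares.

    Site i splits its value into n random shares that add up to the
    encoded value mod MODULUS and sends share j to site j. Each site
    then publishes only the sum of the shares it received.
    """
    n = len(values)
    shares = []
    for x in values:
        parts = [rng.randrange(MODULUS) for _ in range(n - 1)]
        parts.append((encode(x) - sum(parts)) % MODULUS)
        shares.append(parts)
    partials = [sum(shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return decode(sum(partials) % MODULUS)


def local_stats(beta, xs, ys):
    """Gradient and Hessian of the negative log-likelihood at one site."""
    g = h = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-beta * x))
        g += (p - y) * x
        h += p * (1.0 - p) * x * x
    return g, h


def federated_newton(sites, iters=10, seed=0):
    """Newton iterations that only ever see securely summed statistics."""
    rng = random.Random(seed)  # toy randomness, not cryptographically secure
    beta = 0.0
    for _ in range(iters):
        stats = [local_stats(beta, xs, ys) for xs, ys in sites]
        g = secure_sum([s[0] for s in stats], rng)
        h = secure_sum([s[1] for s in stats], rng)
        beta -= g / h
    return beta


# Hypothetical toy data: three sites, each holding (covariates, labels).
sites = [
    ([-2.0, -1.0, 0.5, 1.0, 2.0], [0, 0, 1, 0, 1]),
    ([-1.5, -0.5, 0.5, 1.5], [0, 1, 0, 1]),
    ([-1.0, 1.0, 2.0, -2.0], [0, 1, 1, 1]),
]
beta_federated = federated_newton(sites)
```

Because the gradient and Hessian of the log-likelihood are sums over samples, the federated iterates should match those computed on pooled data up to the fixed-point rounding. A real deployment would need a cryptographically secure random source and protected channels between sites, which this sketch omits.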
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource
|Degree committee member
|Stanford University, Department of Physics.
|Statement of responsibility
|Submitted to the Department of Physics
|Thesis Ph.D. Stanford University 2020
- © 2020 by Armin Pourshafeie
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).