Empowering disease diagnostics and deriving biological insight with publicly available gene expression data
Abstract/Contents
- Abstract
- Public data repositories and other data sharing platforms have been a massive boon to researchers in the biomedical sciences. Data sharing can reduce costs, save valuable time, and help ensure the transparency of public research. In our lab, these public data have become the foundation of most of our analyses: many projects would not be possible without access to a wide array of datasets comprising many different diseases, regions, ethnicities, medical histories, and so on. Furthermore, as we have repeatedly demonstrated, the robustness that we can achieve by integrating large amounts of heterogeneous data has allowed us to make findings that are both significant and durable. However, designing projects around public data can be a double-edged sword. When relevant data are available, they allow for more efficient and more powerful analyses. But when the data are absent or lacking in quality, this can impose severe limitations on the types of analyses that can be done and on the types of questions that can be asked. Here, we designed a novel multi-cohort analysis framework called Multicohort ANalysis of AggregaTed gEne Expression (MANATEE) to integrate large numbers of gene expression datasets for use in generating signatures of disease. MANATEE uses a conormalization method to pool samples across many datasets, as long as each dataset contains healthy control samples. This framework lets us use far more datasets than was previously possible, which not only makes existing analyses substantially more robust but also opens avenues of exploration that could not previously be pursued using publicly available data. By applying MANATEE to publicly available gene expression datasets, we developed multiple host-response-based signatures of disease, all of which were derived from gene expression in human blood. These include a diagnostic for differentiating between bacterial and viral infection in febrile individuals, a prognostic for assessing whether a patient with viral infection will have a severe or mild outcome, and a signature both for distinguishing tuberculosis from other conditions and for predicting when patients with latent tuberculosis infection will progress to active tuberculosis. We validated each of these signatures in prospective cohorts, demonstrating their ability to generalize to new data. These results highlight MANATEE's ability to leverage public data in creating signatures that maintain performance across the heterogeneity present in real-world patient populations. Furthermore, the signatures we have developed are in the process of being translated into point-of-care, non-invasive diagnostic and prognostic tests, which have the potential to significantly improve clinical practice.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | ©2021 |
Publication date | 2021 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Rao, Aditya Manohar |
---|---|
Degree supervisor | Khatri, Purvesh |
Thesis advisor | Khatri, Purvesh |
Thesis advisor | Andrews, Jason |
Thesis advisor | Jagannathan, Prasanna |
Thesis advisor | Utz, Paul |
Degree committee member | Andrews, Jason |
Degree committee member | Jagannathan, Prasanna |
Degree committee member | Utz, Paul |
Associated with | Stanford University, Department of Immunology |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Aditya Manohar Rao. |
---|---|
Note | Submitted to the Department of Immunology. |
Thesis | Thesis (Ph.D.), Stanford University, 2021. |
Location | https://purl.stanford.edu/qn911xh9571 |
Access conditions
- Copyright
- © 2021 by Aditya Manohar Rao
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).