Data representations and data quality in the context of machine learning and healthcare
Abstract/Contents
- Abstract
- The focus of this thesis is data representations and data quality in the context of machine learning and healthcare. In the first half of this thesis, we discuss three projects focusing on data representations. First, inspired by the recent progress in implicit neural networks, we show how to find a compact implicit representation that accurately fits three-dimensional dose distribution data. We evaluate the quality of the proposed representation using dose distributions of prostate, spine, and head and neck tumor cases. This study lays the groundwork for future applications of neural representations of dose data in radiation oncology. Second, Monte Carlo (MC) simulation is considered the most accurate of the available radiotherapy dose calculation methods. However, its clinical use is limited by the long computation times it requires. To overcome this, we develop a model that predicts high-resolution, low-uncertainty MC dose distributions from low-resolution, high-uncertainty MC dose distributions, thereby reducing computation time. We evaluate the model using dose distributions of spine tumor cases. Our results show that the predicted dose distributions are qualitatively and quantitatively comparable to those generated from high-resolution, low-uncertainty MC simulations. Third, we fill a gap in the literature by investigating image classification using graphs generated from a multiscale superpixel representation that lies between the regular-grid and similar-sized superpixel representations. Prior studies using graph neural networks have focused on either regular-grid or similar-sized superpixel representations. To perform the study, we propose WaveMesh, a new wavelet-based superpixeling algorithm, and WavePool, a novel spatially heterogeneous pooling scheme tailored to WaveMesh superpixels. We perform extensive experiments on three benchmark datasets to show the effect of the choice of superpixel representation and pooling scheme on network performance. In the second half of this thesis, we design a framework to assess the quality of Coronavirus disease 2019 (COVID-19) data reporting. We use this framework to assess the quality of data reporting in India and to calculate a COVID-19 data reporting score for each state in India during three time periods between 2020 and 2021. We discuss the significant disparities in reporting observed during our initial assessment and the improvements observed in the subsequent assessments. We further discuss the lessons learned from our assessments and the societal impact of our work.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2022; ©2022 |
Publication date | 2022; 2022 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Vasudevan, Varun Ayakulangara |
---|---|
Degree supervisor | Xing, Lei |
Degree supervisor | Ye, Yinyu |
Thesis advisor | Xing, Lei |
Thesis advisor | Ye, Yinyu |
Thesis advisor | Bibault, Jean-Emmanuel |
Degree committee member | Bibault, Jean-Emmanuel |
Associated with | Stanford University, Institute for Computational and Mathematical Engineering |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Varun Vasudevan. |
---|---|
Note | Submitted to the Institute for Computational and Mathematical Engineering. |
Thesis | Thesis Ph.D. Stanford University 2022. |
Location | https://purl.stanford.edu/fy373mx0459 |
Access conditions
- Copyright
- © 2022 by Varun Ayakulangara Vasudevan
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).