Data representations and data quality in the context of machine learning and healthcare


Abstract/Contents

Abstract
This thesis focuses on data representations and data quality in the context of machine learning and healthcare. The first half discusses three projects on data representations. First, inspired by recent progress in implicit neural networks, we show how to find a compact implicit representation that accurately fits three-dimensional dose distribution data. We evaluate the quality of the proposed representation using dose distributions of prostate, spine, and head and neck tumor cases. This study lays the groundwork for future applications of neural representations of dose data in radiation oncology. Second, Monte Carlo (MC) simulation is considered the most accurate of the available radiotherapy dose calculation methods, but its clinical use is limited by the long computation times it requires. To overcome this, we develop a model that predicts high-resolution, low-uncertainty MC dose distributions from low-resolution, high-uncertainty MC dose distributions, reducing computation time. We evaluate the model using dose distributions of spine tumor cases. Our results show that the predicted dose distributions are qualitatively and quantitatively comparable to those generated from high-resolution, low-uncertainty MC simulations. Third, we investigate image classification using graphs generated from a multiscale superpixel representation, which can be considered intermediate between the regular-grid and similar-sized superpixel representations. Prior studies using graph neural networks have focused on either regular-grid or similar-sized superpixel representations, so this work fills a gap in the literature. To perform the study, we propose WaveMesh, a new wavelet-based superpixeling algorithm, and WavePool, a novel spatially heterogeneous pooling scheme tailored to WaveMesh superpixels. We perform extensive experiments on three benchmark datasets to show how the choice of superpixel representation and pooling scheme affects network performance.
In the second half of this thesis, we design a framework to assess the quality of coronavirus disease 2019 (COVID-19) data reporting. We use this framework to assess the quality of data reporting in India and to calculate a COVID-19 data reporting score for each state in India during three time periods between 2020 and 2021. We discuss the significant disparities in reporting observed during our initial assessment and the improvements observed in the subsequent assessments. We further discuss the lessons learned from our assessments and the societal impact of our work.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2022; ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Vasudevan, Varun Ayakulangara
Degree supervisor Xing, Lei
Degree supervisor Ye, Yinyu
Thesis advisor Xing, Lei
Thesis advisor Ye, Yinyu
Thesis advisor Bibault, Jean-Emmanuel
Degree committee member Bibault, Jean-Emmanuel
Associated with Stanford University, Institute for Computational and Mathematical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Varun Vasudevan.
Note Submitted to the Institute for Computational and Mathematical Engineering.
Thesis Thesis Ph.D. Stanford University 2022.
Location https://purl.stanford.edu/fy373mx0459

Access conditions

Copyright
© 2022 by Varun Ayakulangara Vasudevan
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
