Data representations and data quality in the context of machine learning and healthcare

Vasudevan, Varun Ayakulangara

Abstract/Contents

Abstract: The focus of this thesis is data representations and data quality in the context of machine learning and healthcare. In the first half of this thesis, we discuss three projects focusing on data representations. First, inspired by the recent progress in implicit neural networks, we show how to find a compact implicit representation to fit three-dimensional dose distribution data accurately. We evaluate the quality of the proposed representation using dose distributions of prostate, spine, and head and neck tumor cases. This study lays the groundwork for future applications of neural representations of dose data in radiation oncology. Second, Monte Carlo (MC) simulation is considered the most accurate of the available radiotherapy dose calculation methods. However, its clinical use is limited by the long computation times it requires. To overcome this, we develop a model to predict high-resolution low-uncertainty MC dose distributions from low-resolution high-uncertainty MC dose distributions with reduced computation time. We evaluate the model using dose distributions of spine tumor cases. Our results show that the predicted dose distributions are qualitatively and quantitatively comparable to those generated from high-resolution low-uncertainty MC simulations. Third, we fill a gap in the literature by investigating image classification using graphs generated from a multiscale superpixel representation that can be considered intermediate between the regular-grid and similar-sized superpixel representations. Prior studies using graph neural networks have focused on either regular-grid or similar-sized superpixel representations. To perform the study, we propose WaveMesh, a new wavelet-based superpixeling algorithm, and WavePool, a novel spatially heterogeneous pooling scheme tailored to WaveMesh superpixels. We perform extensive experiments on three benchmark datasets to show the effect of the choice of superpixel representation and pooling scheme on the performance of the network.
In the second half of this thesis, we design a framework to assess the quality of Coronavirus disease 2019 (COVID-19) data reporting. We use this framework to assess the data reporting quality from India, and to calculate a COVID-19 data reporting score for each state in India during three time periods between 2020 and 2021. We discuss the significant disparities in reporting observed during our initial assessment and the improvements observed in the subsequent assessments. We further discuss the lessons learned from our assessments and the societal impact of our work.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2022; ©2022
Publication date	2022
Issuance	monographic
Language	English

Creators/Contributors

Author	Vasudevan, Varun Ayakulangara
Degree supervisor	Xing, Lei
Degree supervisor	Ye, Yinyu
Thesis advisor	Xing, Lei
Thesis advisor	Ye, Yinyu
Thesis advisor	Bibault, Jean-Emmanuel
Degree committee member	Bibault, Jean-Emmanuel
Associated with	Stanford University, Institute for Computational and Mathematical Engineering

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Varun Vasudevan.
Note	Submitted to the Institute for Computational and Mathematical Engineering.
Thesis	Thesis Ph.D. Stanford University 2022.
Location	https://purl.stanford.edu/fy373mx0459

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
