On the evaluation of deep generative models

Zhou, Sharon

Abstract/Contents

Abstract: Evaluation drives and tracks progress in every field. Evaluation metrics are designed to assess important criteria in an area and help us understand the quantitative differences between one breakthrough and another. In machine learning, evaluation metrics have historically acted as north stars towards which researchers have optimized and organized their methods and findings. While evaluation metrics have been straightforward to construct and implement in some subfields of machine learning, they have been notoriously difficult to design for generative models. Several reasons explain this: (1) there are no gold-standard outputs to compare against, unlike tasks evaluated on held-out test sets; (2) because training methods and formulations are diverse, inherent model properties are difficult to measure consistently, and sampled outputs are often used for evaluation instead; (3) many metrics depend on external (pretrained) models, which add another layer of bias and uncertainty; and (4) results are inconsistent without a large number of samples. As a result, generative models have suffered from noisy assessments that occupy a changing evaluation landscape, in contrast to the relative stability of their discriminative counterparts. In this manuscript, we examine several important criteria for generative models and introduce evaluation metrics to address each one while discussing the aforementioned issues in generative model evaluation. In particular, we examine the challenge of measuring the perceptual realism of generated outputs and introduce a human-in-the-loop evaluation system that draws on psychophysics theory, to ground the method in the human perception literature, and on crowdsourcing techniques, to provide an efficient, reliable, and consistent way of comparing different models.
In addition, we analyze disentanglement, an increasingly important property for assessing learned representations, by measuring an intrinsic property of a generative model's data manifold using persistent homology. The final work in this manuscript takes a step towards assessing a generative model and its different modes with a key application in mind, specifically the stylistic fidelity across different generated modes in a multimodal setting.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2021; ©2021
Publication date	2021
Issuance	monographic
Language	English

Creators/Contributors

Author	Zhou, Sharon
Degree supervisor	Ermon, Stefano
Degree supervisor	Ng, Andrew Y, 1976-
Thesis advisor	Ermon, Stefano
Thesis advisor	Ng, Andrew Y, 1976-
Thesis advisor	Bernstein, Michael S, 1984-
Degree committee member	Bernstein, Michael S, 1984-
Associated with	Stanford University, Computer Science Department

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Sharon Zhou.
Note	Submitted to the Computer Science Department.
Thesis	Thesis Ph.D. Stanford University 2021.
Location	https://purl.stanford.edu/fr445th8838

Access conditions

License: This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
