On the evaluation of deep generative models


Abstract/Contents

Abstract
Evaluation drives and tracks progress in every field. Metrics of evaluation are designed to assess important criteria in an area, and aid us in understanding the quantitative differences between one breakthrough and another. In machine learning, evaluation metrics have historically acted as north stars towards which researchers have optimized and organized their methods and findings. While evaluation metrics have been straightforward to construct and implement in some subfields of machine learning, they have been notoriously difficult to design for generative models. Several reasons emerge to explain this: (1) there are no gold-standard outputs to compare against, unlike held-out test sets; (2) because of their diverse training methods and formulations, inherent model properties are difficult to measure consistently, and sampled outputs are often used for evaluation instead; (3) evaluation depends on external (pretrained) models that add another layer of bias and uncertainty; and (4) results are inconsistent without a large number of samples. As a result, generative models have suffered from noisy assessments that occupy a changing evaluation landscape, in contrast to the relative stability of their discriminative counterparts. In this manuscript, we examine several important criteria for generative models and introduce evaluation metrics to address each one while discussing the aforementioned issues in generative model evaluation. In particular, we examine the challenge of measuring the perceptual realism of generated outputs and introduce a human-in-the-loop evaluation system that leverages psychophysics theory to ground the method in human perception literature and crowdsourcing techniques to construct an efficient, reliable, and consistent method for comparing different models.
In addition to this, we analyze disentanglement, an increasingly important property for assessing learned representations, by measuring an intrinsic property of a generative model's data manifold using persistent homology. The final work in this manuscript takes a step towards assessing a generative model and its different modes with a key application in mind, specifically the stylistic fidelity across different generated modes in a multimodal setting.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2021; ©2021
Publication date 2021; 2021
Issuance monographic
Language English

Creators/Contributors

Author Zhou, Sharon
Degree supervisor Ermon, Stefano
Degree supervisor Ng, Andrew Y, 1976-
Thesis advisor Ermon, Stefano
Thesis advisor Ng, Andrew Y, 1976-
Thesis advisor Bernstein, Michael S, 1984-
Degree committee member Bernstein, Michael S, 1984-
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Sharon Zhou.
Note Submitted to the Computer Science Department.
Thesis Thesis Ph.D. Stanford University 2021.
Location https://purl.stanford.edu/fr445th8838

Access conditions

Copyright
© 2021 by Sharon Zhou
License
This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
