Effective usage of muscle simulation and deep learning for high-quality facial performance capture

Abstract/Contents

Abstract
This dissertation explores the problem of the uncanny valley in digital facial performances; it remains unclear exactly what causes the psychological rejection of near-realistic digital human faces. While the uncanny valley may be caused in part by an imperfect digital rendering of the face, this dissertation hypothesizes that the largest hurdle to solving the uncanny valley problem is unrealistic facial shapes and motions. Such imperfections stem from sources of error in the facial deformation model, the minimized objective function(s), and the regularization and optimization method of choice. Linear blendshape-based deformation models have parameter spaces that permit a wide range of subtly implausible facial shapes. Commonly used objective functions assume perfect correspondences between the captured data and the synthetic model, an assumption that is rarely correct. Lastly, in an attempt to prevent the model from wandering into uncanny territory, simple regularization terms are used to pull the parameter values towards zero. This dissertation addresses these problems by introducing and exploring a fully differentiable muscle simulation model, deep learning in objective functions, and an alternative minimization method that avoids ad hoc regularization. Firstly, we build upon the previously introduced facial muscle track simulation model and make it fully differentiable end-to-end. This is accomplished by driving the muscle tracks with a parallel set of blendshape parameter values. We prove that this model is not only fully differentiable but also as expressive as the muscle track simulation model and, in certain cases, mathematically equivalent to it. We also demonstrate that this model is effective as the deformation model in an optimization problem targeting 3D geometry and 2D monocular RGB images.
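The key property the abstract relies on — that a differentiable deformation model lets the fitting energy be minimized by gradient-based optimization against target 3D geometry — can be illustrated with a minimal sketch. This is not the dissertation's muscle track model; a toy linear blendshape model stands in for it, and all names (`deform`, `neutral`, `deltas`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a "face" with V vertices and K blendshape deltas.
V, K = 30, 4
neutral = rng.normal(size=(V, 3))     # rest-pose vertex positions
deltas = rng.normal(size=(K, V, 3))   # per-blendshape vertex offsets

def deform(w):
    """Linear blendshape model: x(w) = neutral + sum_k w_k * delta_k."""
    return neutral + np.tensordot(w, deltas, axes=1)

# Target geometry generated from hidden "true" parameters.
w_true = np.array([0.5, -0.2, 0.8, 0.1])
target = deform(w_true)

# Because the model is differentiable, the geometry-fitting energy
# E(w) = ||x(w) - target||^2 has an analytic gradient, and plain
# gradient descent recovers the parameters.
w = np.zeros(K)
lr = 0.002
for _ in range(1000):
    residual = deform(w) - target                                  # (V, 3)
    grad = 2.0 * np.tensordot(deltas, residual, axes=([1, 2], [0, 1]))
    w -= lr * grad

print(np.round(w, 3))  # close to w_true
```

The same structure carries over to the dissertation's setting: only the inner `deform` changes, from a linear blendshape sum to the differentiable muscle simulation driven by blendshape-like parameters.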
Furthermore, we show that the resulting activation values are a promising basis for future work on semantic interpretability. Secondly, we address the manual correspondence problem when capturing from 2D RGB images by applying the same pretrained deep neural networks to both the captured image and a synthetic differentiable render of the face model. Such an approach can be used seamlessly in an optimization problem because both the neural network and the differentiable renderer are fully differentiable. We demonstrate the efficacy of this approach for estimating facial pose and expression using facial alignment and optical flow networks. By relying on a trained network, we remove human judgement from the facial performance capture process, which presents a clear path towards improvement in future work. Lastly, we briefly explore the usage of regularization for facial performance capture and demonstrate how an alternative nonlinear least squares optimization method can produce comparable results without modifying the energy landscape.
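The correspondence-free idea above — comparing the captured image and the synthetic render in the feature space of one shared, frozen network rather than via hand-marked landmarks — can be sketched in miniature. This is not the dissertation's actual pipeline (which uses pretrained facial alignment and optical flow networks plus a differentiable renderer); here a fixed random projection plays the role of the frozen network, a 1-D bump plays the role of the render, and the pose parameter is a single shift. All names are hypothetical, and a coarse grid search replaces gradient-based minimization.

```python
import numpy as np

rng = np.random.default_rng(1)

def render(shift):
    """Toy 'renderer': a 1-D bump whose position stands in for head pose."""
    x = np.arange(64)
    return np.exp(-0.5 * ((x - 32 - shift) / 3.0) ** 2)

# A fixed random projection stands in for a pretrained network; the key
# point is only that the *same* frozen feature extractor is applied to
# both the captured image and the synthetic render.
W = rng.normal(size=(16, 64))

def features(img):
    return np.tanh(W @ img)

captured = render(5.0)  # "captured" data with unknown true shift of 5

# Feature-space energy E(s) = ||phi(captured) - phi(render(s))||^2,
# minimized here by grid search over candidate poses.
candidates = np.linspace(-10, 10, 201)
losses = [np.sum((features(captured) - features(render(s))) ** 2)
          for s in candidates]
best = candidates[int(np.argmin(losses))]
print(best)  # recovers the true shift (approx. 5.0)
```

No pixel-to-vertex correspondences are ever specified: the network's features implicitly define what must match, which is exactly what removes human judgement from the loop.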

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2019
Publication date 2019
Issuance monographic
Language English

Creators/Contributors

Author Bao, Michael H
Degree supervisor Fedkiw, Ronald P, 1968-
Thesis advisor Fedkiw, Ronald P, 1968-
Thesis advisor Grabli, Stéphane, 1977-
Thesis advisor Liu, Cheng-Yun Karen, 1977-
Degree committee member Grabli, Stéphane, 1977-
Degree committee member Liu, Cheng-Yun Karen, 1977-
Associated with Stanford University, Computer Science Department.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Michael H. Bao.
Note Submitted to the Computer Science Department.
Thesis Thesis (Ph.D.), Stanford University, 2019.
Location electronic resource

Access conditions

Copyright
© 2019 by Michael H Bao
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported license (CC BY-NC-ND 3.0).
