Thallo : a domain specific language for non-linear least squares optimization in visual computing

Placeholder Show Content


Large-scale non-linear least squares (NLLS) optimization problems lie at the core of many graphics, vision, and imaging applications. The mathematical descriptions of these objective functions are extremely concise, but in order to use them in interactive applications, solvers are often implemented by hand in tedious and error-prone processes (particularly on GPUs). High-level libraries exist but are orders of magnitude slower than specialized optimized implementations. Even competing handwritten implementations can vary in performance by multiple orders of magnitude, depending on the problem. A survey of existing work reveals that the keys for high performance are twofold: exposing the data-parallelism in these problems and a problem-specific schedule that enables efficient usage of the underlying hardware. This thesis argues that cleanly separating the energy formulation and organization of computation from implementation details of NLLS using a domain-specific language (DSL) enables the generation of high performance massively parallel GPU solvers from high-level specifications. Thus I introduce Thallo, a DSL for large-scale non-linear least-squares optimization problems. I observed various code reorganizations performed by implementers of high-performance solvers in the literature and then define a set of basic operations that span these scheduling choices, thereby defining a large scheduling space. Users can either specify code transformations in a scheduling co-language or use an autoscheduler. Thallo takes as input a compact, shader-like representation of an energy function and a (potentially auto-generated) schedule, translating the combination into high-performance GPU solvers. Since Thallo can generate solvers from a large scheduling space, it can handle a large set of large-scale non-linear and non-smooth problems with various degrees of non-locality and compute-to-memory ratios, including diverse applications such as as-rigid-as-possible mesh deformation, bundle adjustment, blendshape fitting, and spatially-varying Poisson deconvolution. By abstracting schedules from the energy formulation and using a domain-specific compiler, we outperform the fastest state-of-the-art comparable high-level systems, the Ceres solver library, by multiple orders of magnitude and even outperform hand-tuned problem-specific GPU implementations.


Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2020; ©2020
Publication date 2020; 2020
Issuance monographic
Language English


Author Mara, Michael Thomas
Degree supervisor Hanrahan, P. M. (Patrick Matthew)
Thesis advisor Hanrahan, P. M. (Patrick Matthew)
Thesis advisor Fatahalian, Kayvon
Thesis advisor James, Doug L
Degree committee member Fatahalian, Kayvon
Degree committee member James, Doug L
Associated with Stanford University, Computer Science Department


Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Michael Mara.
Note Submitted to the Computer Science Department.
Thesis Thesis Ph.D. Stanford University 2020.
Location electronic resource

Access conditions

© 2020 by Michael Thomas Mara
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).

Also listed in

Loading usage metrics...