Design and optimization of a stencil engine



Application-specific processors exploit the structure of algorithms to reduce energy costs and increase performance. These optimizations have become increasingly important as the historical trends in technology scaling and energy scaling have slowed or stopped. Image processing and computer image understanding algorithms contain the kinds of embarrassingly parallel structures that application-specific processors can exploit. Further, these algorithms have very high compute demands, which makes efficient computation critical. As a result, these specialized processors are found on many SoCs today. Yet these image processors are hard to design and program, which slows architectural innovation. To address this issue, we leverage the fact that most image applications can be composed as a set of "stencil" kernels, and we provide a virtual machine model for stencil computation onto which many applications in the domains of image signal processing, computational photography, and computer vision can be mapped. Stencil kernels are a class of functions (e.g. convolution) in which a given pixel within an output image is calculated from a fixed-size sliding window of pixels in its corresponding input image. This fixed window in the input data, where each data element is reused between concurrent computations, allows for a significant reduction in memory traffic through buffering and provides much of the efficiency in specialized image processors. Additionally, the predictable data flow of stencil kernels allows the producer-consumer relationships between stencil kernels in large applications to be statically determined and exploited, further reducing memory traffic. Finally, the functional nature of the computation and the significant number of times it is invoked allow the implementation of the computation to be highly optimized. Stencil kernels play a recurring role in image signal processing, computer vision, and computational photography.
Any process that creates a filter, constructs low-level image features, evaluates relationships of nearby pixels or features, etc. is implementable as a stencil kernel. Many applications in the domain of image processing and understanding are built by cascading these operations (e.g. filtering noise, looking for local features and local segments, then localizing regions and objects from those segments and features). These applications also play a significant role in society, whether by automating the home, car, or factory or by improving the capabilities of our mobile devices in capturing and understanding the world around us. While the computation model may seem restrictive and domain-specific, any improvement in the efficiency of this computation for this domain would permeate many fields and society, increasing the capability and decreasing the cost of innovation and progress. When applications are written in a domain-specific language restricted to stencil computation, they can be compiled to the stencil virtual machine model proposed in this thesis. This model allows an application's behavior to be specified without knowledge of the underlying system implementation. Conversely, such a model allows for a great degree of flexibility in the implementation of that underlying system, which provides opportunity for optimization. The input to this virtual machine model is an intermediate language called Data Path Description Assembler (DPDA), which represents a compiler target for high-level languages. While many hardware-software systems implement the virtual machine and execute DPDA, this thesis presents a method to generate fixed-function hardware from DPDA code. The resulting hardware is two orders of magnitude more efficient than a comparable CPU or GPU implementation.
This hardware generator greatly reduces the cost of designing a customized engine for new imaging applications, and also serves as a critical reference for research exploring the overheads of more flexible compute engines.
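
As an illustration of the stencil kernel definition in the abstract above (each output pixel computed from a fixed-size sliding window of input pixels), the following is a minimal software sketch in Python. It is not taken from the thesis and does not represent DPDA or the generated hardware; the function name `stencil3x3`, the 3x3 window size, and the choice to skip border pixels are assumptions made for illustration only.

```python
def stencil3x3(image, weights):
    """Apply a 3x3 weighted-sum stencil to a 2D list of numbers.

    Hypothetical helper for illustration. Border pixels are skipped, so the
    output is 2 rows and 2 columns smaller than the input. The window is not
    flipped, so for asymmetric weights this is a correlation rather than a
    true convolution.
    """
    height, width = len(image), len(image[0])
    out = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            acc = 0
            # Fixed-size window centered on (y, x): every output pixel reads
            # the same relative neighborhood of the input image.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += weights[dy + 1][dx + 1] * image[y + dy][x + dx]
            row.append(acc)
        out.append(row)
    return out


box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # unnormalized box filter
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
result = stencil3x3(image, box)
print(result)  # [[54, 63], [90, 99]]
```

Neighboring output pixels read overlapping windows (six of the nine inputs are shared between horizontally adjacent outputs in this sketch), which is the data reuse that the abstract says buffering can exploit to reduce memory traffic.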


Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English


Associated with Brunhaver, John S II
Associated with Stanford University, Department of Electrical Engineering.
Primary advisor Horowitz, Mark
Thesis advisor Horowitz, Mark
Thesis advisor Kozyrakis, Christoforos, 1974-
Thesis advisor Olukotun, Oyekunle Ayinde
Advisor Kozyrakis, Christoforos, 1974-
Advisor Olukotun, Oyekunle Ayinde


Genre Theses

Bibliographic information

Statement of responsibility John S. Brunhaver, II.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

© 2015 by John S. Brunhaver
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).

