Techniques for building predictable stream processing pipelines
Abstract/Contents
- Abstract
- This dissertation presents techniques for easily building real-time parallel stream processing pipelines with predictable performance. These techniques enable automatically finding layouts of pipelines onto parallel-processing hardware that guarantee the required performance and use resources efficiently. The automated workflow replaces the tedious and error-prone process of laying out pipelines by trial and error. The key to the automated workflow is a novel performance modeling approach for stream processing pipelines based on two simple principles. First, pipelines are built out of components with predictable performance that also compose in predictable ways. Second, pipelines are built entirely from compute and data transfer as the basic operations. A large class of stream processing pipelines can be built following those principles. For any pipeline built in such a manner, the techniques presented in this dissertation enable the development of accurate models that can predict the performance of parallelized layouts. In turn, those models enable an automated search for efficient pipeline layouts that can meet target performance requirements. The dissertation also includes the design and implementation of two challenging real-time stream processing pipelines from the context of software-defined wireless networks: a WiFi data-plane and an LTE control-plane. Those pipelines are built following the proposed principles and techniques, thus demonstrating their effectiveness. The WiFi data-plane pipeline meets performance requirements of processing 20 million samples per second with latency bounds of tens of microseconds. This pipeline is built using Atomix, a novel programming framework for predictable signal processing embodying the proposed principles. The design and implementation of Atomix are presented along with those of the WiFi data-plane pipeline.
The LTE control-plane pipeline meets peak performance requirements of processing event streams from 3,000 LTE base stations with sub-second latency, even as processing load varies over two orders of magnitude daily. This pipeline is built using Trevor, a novel auto-scaling system for distributed stream processing that leverages the proposed principles. The design and implementation of Trevor are presented along with those of the LTE control-plane pipeline and similar pipelines realized using Trevor. The principles and techniques contained in this dissertation streamline continuous development, predictable execution, and efficient operation of parallel stream processing pipelines at scale through automated workflows. The two specific pipelines used to illustrate those techniques stretch the limits of real-time stream processing and demonstrate the power of model-based pipeline development. The contributions presented here apply broadly to real-time parallel stream processing in both multi-core and distributed settings.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | ©2018 |
Publication date | 2018 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Bansal, Manu Kumar |
---|---|
Degree supervisor | Katti, Sachin |
Thesis advisor | Katti, Sachin |
Thesis advisor | Kozyrakis, Christoforos, 1974- |
Thesis advisor | Levis, Philip |
Degree committee member | Kozyrakis, Christoforos, 1974- |
Degree committee member | Levis, Philip |
Associated with | Stanford University, Department of Electrical Engineering |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Manu Kumar Bansal. |
---|---|
Note | Submitted to the Department of Electrical Engineering. |
Thesis | Thesis (Ph.D.)--Stanford University, 2018. |
Location | electronic resource |
Access conditions
- Copyright
- © 2018 by Manu Kumar Bansal
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).