Techniques for building predictable stream processing pipelines


Abstract/Contents

Abstract
This dissertation presents techniques for easily building real-time parallel stream processing pipelines with predictable performance. These techniques enable automatically finding layouts of pipelines onto parallel-processing hardware that guarantee the required performance and use resources efficiently. The automated workflow replaces the tedious and error-prone process of laying out pipelines by trial and error. The key to the automated workflow is a novel performance modeling approach for stream processing pipelines based on two simple principles. First, pipelines are built out of components with predictable performance that also compose in predictable ways. Second, pipelines are built entirely out of compute and data transfer as the basic operations. A large class of stream processing pipelines can be built following these principles. For any pipeline built in this manner, the techniques presented in this dissertation enable the development of accurate models that can predict the performance of parallelized layouts. In turn, those models enable automated search for efficient pipeline layouts that can meet target performance requirements. The dissertation also includes the design and implementation of two challenging real-time stream processing pipelines from the context of software-defined wireless networks: a WiFi data plane and an LTE control plane. Both pipelines are built following the proposed principles and techniques, demonstrating their effectiveness. The WiFi data-plane pipeline meets the performance requirement of processing 20 million samples per second with latency bounds of tens of microseconds. This pipeline is built using Atomix, a novel programming framework for predictable signal processing that embodies the proposed principles. The design and implementation of Atomix are presented along with those of the WiFi data-plane pipeline.
The LTE control-plane pipeline meets peak performance requirements of processing event streams from 3,000 LTE base stations with sub-second latency, with a processing load that varies over two orders of magnitude daily. This pipeline is built using Trevor, a novel auto-scaling system for distributed stream processing that leverages the proposed principles. The design and implementation of Trevor are presented along with those of the LTE control-plane pipeline and similar pipelines realized using Trevor. The principles and techniques contained in this dissertation streamline the continuous development, predictable execution, and efficient operation of parallel stream processing pipelines at scale through automated workflows. The two specific pipelines used to illustrate these techniques stretch the limits of real-time stream processing and demonstrate the power of model-based pipeline development. The contributions presented here apply broadly to real-time parallel stream processing in both multi-core and distributed settings.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2018; ©2018
Publication date 2018; 2018
Issuance monographic
Language English

Creators/Contributors

Author Bansal, Manu Kumar
Degree supervisor Katti, Sachin
Thesis advisor Katti, Sachin
Thesis advisor Kozyrakis, Christoforos, 1974-
Thesis advisor Levis, Philip
Degree committee member Kozyrakis, Christoforos, 1974-
Degree committee member Levis, Philip
Associated with Stanford University, Department of Electrical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Manu Kumar Bansal.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis Ph.D. Stanford University 2018.
Location electronic resource

Access conditions

Copyright
© 2018 by Manu Kumar Bansal
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
