Application acceleration with scalable programmable data center networks
Abstract/Contents
- Abstract
- The use of data centers, large facilities that house thousands of servers interconnected by fast networks, has grown tremendously in recent years. Most modern Internet applications, small and large, such as cloud computing, search, machine learning, and social networks, rely on data centers as their backbone. This growth has surfaced previously unseen problems in scaling data center networks and improving their performance, both of which directly affect the performance of the applications that rely on data centers. In this dissertation, I present designs and system implementations that focus on improving the scalability of data center networks and the performance of the applications that rely on them. First, I provide a design for building software for data center switches and present a system, called FBOSS, that implements the design. The motivation for FBOSS comes from a measurement study showing that software is the major reason data center networks fail, combined with the recent arrival of customizable switch hardware, which allows switches to run more customized and complex software. Unlike other switch software, FBOSS is built, tested, and deployed just like any other software service that runs on a commodity server, using a set of methods that have been well tested to ensure the scalability and reliability of general software services. This allows it to be developed rapidly and to scale without sacrificing reliability. I show that FBOSS allowed the production data center network at Facebook to scale very quickly, 30x over a two-year period. I then propose a system called λ-NIC that harnesses the architectural advantages inherent to programmable domain-specific network processors, or Network Processing Units (NPUs), to increase the performance of widely implemented general cloud workloads.
The motivation for λ-NIC comes from the premise that CPU-based architectures are ill-suited to running large numbers of small, latency-sensitive workloads, mainly due to the high overheads associated with the operating system, the software networking stack, limited concurrency, and context switching. λ-NIC leverages the SmartNIC's proximity to the network and a vast array of NPU cores to simultaneously run thousands of lambdas on a single NIC with strict tail-latency guarantees. I show that λ-NIC achieves up to 880x and 736x improvements in workloads' response latency and throughput, respectively, and significantly reduces host CPU and memory usage, while incurring comparable infrastructure cost. Finally, I present ARGUS, a system that provides convincing evidence that the approaches used in λ-NIC can be applied to a more complex and widely used application. ARGUS allows a replication protocol, of the kind widely used for fault tolerance across conventional distributed data storage services such as databases and file systems, to achieve lower latency and to scale without compromising tail latency by taking advantage of the architectural advantages of programmable NPUs. I show that, in comparison to CURP, ARGUS reduces mean and 99.9th-percentile latencies by 2x and 2.2x, respectively, achieves 6.7x higher throughput, and lowers the gap between the 99.9th-percentile and median latencies by about 3.3x. Furthermore, increasing the replication factor in ARGUS has a negligible effect on the tail latency of the system: an increase of 0.12 µs per witness, compared to 12.86 µs in CURP.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2019; ©2019 |
Publication date | 2019; 2019 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Choi, Sean Seol Woong | |
---|---|---|
Degree supervisor | Rosenblum, Mendel | |
Thesis advisor | Rosenblum, Mendel | |
Thesis advisor | McKeown, Nick | |
Thesis advisor | Prabhakar, Balaji, 1967- | |
Degree committee member | McKeown, Nick | |
Degree committee member | Prabhakar, Balaji, 1967- | |
Associated with | Stanford University, Department of Electrical Engineering. |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Sean Choi |
---|---|
Note | Submitted to the Department of Electrical Engineering |
Thesis | Thesis Ph.D. Stanford University 2019 |
Location | electronic resource |
Access conditions
- Copyright
- © 2019 by Sean Seol Woong Choi
- License
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).