Pushing transport layer latency down towards its physical limits in data centers with programmable architectures and algorithms
Abstract/Contents
- Abstract
- Data center applications keep scaling horizontally across many machines to accommodate more users and data. This makes the communication performance requirements even more stringent, i.e., higher bandwidth and lower latency. Increasing link capacities address the bandwidth demands, but the latency requirements necessitate more sophisticated solutions. In this thesis, I observe that the transport layer is the only layer in the networking stack that impacts latency both at the end hosts and in the network: the way it handles packets sets the end-host processing delay, and its congestion control determines the queuing delay in the network. Hence, I study transport layer designs that push both latencies down to their physical limits. First, I argue that end-host latency can be minimized by offloading the transport layer to NIC hardware, but fixed-function chips prohibit custom solutions for diverse environments. As a solution, I introduce nanoTransport, a programmable NIC architecture for message-based Remote Procedure Calls (RPCs). It is programmed in the P4 language, making it easy to modify or create transport protocols while packets are processed orders of magnitude faster than in traditional software stacks. nanoTransport identifies common events and primitive operations, including packetization, reassembly, timeouts, and packet generation, to form a streamlined, modular, and programmable pipeline that is entirely expressed by the programmer. Next, I argue that network latency can only be minimized with quick and accurate congestion control decisions, which require precise congestion signals and the shortest possible control loop delay. I present Bolt to address these requirements and push congestion control to its theoretical limits.
Bolt is based on three core ideas: (i) Sub-RTT Control (SRC), which reacts to congestion in less than one RTT; (ii) Proactive Ramp-Up (PRU), which foresees flow completions to promptly occupy the released bandwidth; and (iii) Supply Matching (SM), which matches bandwidth demand with supply to maximize utilization. I show that these mechanisms reduce 99th-percentile latency by 80% and improve 99th-percentile flow completion time by up to 3× compared to Swift and HPCC, even at 400 Gb/s.
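The three ideas above can be illustrated as per-event congestion-window updates. The sketch below is a hypothetical simplification for intuition only, not the thesis implementation: the function name `bolt_update`, the event encoding, and the one-packet step size are assumptions. The key property it conveys is that each feedback event (an SRC notification reflected from a switch, a PRU token from a departing flow, or an SM token for spare link supply) adjusts the window immediately, without waiting a full RTT to aggregate signals.

```python
def bolt_update(cwnd: int, event: str) -> int:
    """Return the new congestion window after one feedback event.

    event is one of (hypothetical encoding):
      'SRC' -- Sub-RTT Control: a switch observed queue buildup and
               reflected a notification straight back to the sender.
      'PRU' -- Proactive Ramp-Up: a competing flow signaled it is in
               its last RTT, so claim the bandwidth it will release.
      'SM'  -- Supply Matching: the switch saw link supply exceeding
               demand, granting headroom to ramp up.
    """
    if event == 'SRC':
        return max(1, cwnd - 1)   # back off, keeping at least one packet in flight
    if event in ('PRU', 'SM'):
        return cwnd + 1           # ramp up by one packet per token
    return cwnd                   # unrecognized events leave the window unchanged
```

Because every event carries precise information from the bottleneck, the window moves in small, accurate steps rather than the coarse multiplicative adjustments of RTT-granularity schemes.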
Description
Type of resource | text
---|---
Form | electronic resource; remote; computer; online resource
Extent | 1 online resource.
Place | California
Place | [Stanford, California]
Publisher | [Stanford University]
Copyright date | ©2024
Publication date | 2024
Issuance | monographic
Language | English
Creators/Contributors
Author | Arslan, Serhat
---|---
Degree supervisor | McKeown, Nick
Thesis advisor | McKeown, Nick
Thesis advisor | Katti, Sachin
Thesis advisor | Prabhakar, Balaji, 1967-
Degree committee member | Katti, Sachin
Degree committee member | Prabhakar, Balaji, 1967-
Associated with | Stanford University, School of Engineering
Associated with | Stanford University, Department of Electrical Engineering
Subjects
Genre | Theses
---|---
Genre | Text
Bibliographic information
Statement of responsibility | Serhat Arslan.
---|---
Note | Submitted to the Department of Electrical Engineering.
Thesis | Thesis (Ph.D.)--Stanford University, 2024.
Location | https://purl.stanford.edu/zj481vg3597
Access conditions
- Copyright
- © 2024 by Serhat Arslan
- License
- This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).