Pushing transport layer latency down towards its physical limits in data centers with programmable architectures and algorithms

Abstract/Contents

Abstract
Data center applications keep scaling horizontally across many machines to accommodate more users and data, which makes communication performance requirements ever more stringent: higher bandwidth and lower latency. Increasing link capacities address the bandwidth demands, but the latency requirements call for more sophisticated solutions. In this thesis, I observe that the transport layer is the only layer in the networking stack that impacts latency both at the end hosts and in the network: the way it handles packets sets the end hosts' processing delay, and its congestion control determines the queuing delay in the network. Hence, I study transport layer designs that push both latencies down to their physical limits.

First, I argue that end-host latency can be minimized by offloading the transport layer to NIC hardware, but fixed-function chips prohibit custom solutions for diversified environments. As a solution, I introduce nanoTransport, a programmable NIC architecture for message-based Remote Procedure Calls. It is programmed in the P4 language, making it easy to modify (or create) transport protocols while packets are processed orders of magnitude faster than in traditional software stacks. It identifies common events and primitive operations to form a streamlined, modular, and programmable pipeline, including packetization, reassembly, timeouts, and packet generation, all expressed by the programmer.

Next, I argue that network latency can only be minimized with quick and accurate congestion control decisions, which require precise congestion signals and the shortest possible control loop delay. I present Bolt to address these requirements and push congestion control to its theoretical limits. Bolt is based on three core ideas: (i) Sub-RTT Control (SRC) reacts to congestion faster than one RTT; (ii) Proactive Ramp-Up (PRU) anticipates flow completions to promptly occupy released bandwidth; and (iii) Supply Matching (SM) matches bandwidth demand with supply to maximize utilization. I show that these mechanisms reduce 99th-percentile latency by 80% and improve 99th-percentile flow completion time by up to 3x compared to Swift and HPCC, even at 400 Gb/s.
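To make the three ideas above concrete, here is a minimal toy sketch (not the thesis artifact; class name, constants, and the per-ACK feedback fields are all illustrative assumptions) of a sender that reacts to congestion feedback on every acknowledgment instead of once per RTT, claims bandwidth advertised by finishing flows, and grows to absorb reported spare supply:

```python
class BoltLikeSender:
    """Toy per-packet congestion controller sketching Bolt's three ideas.

    Hypothetical simplification: the network is assumed to stamp each ACK
    with the bottleneck queue occupancy, a supply surplus indicator, and a
    flag saying a competing flow on the bottleneck is about to finish.
    """

    def __init__(self, cwnd=10.0, min_cwnd=1.0):
        self.cwnd = cwnd          # congestion window, in packets
        self.min_cwnd = min_cwnd
        self.pru_tokens = 0       # ramp-up credits from finishing flows

    def on_ack(self, queue_occupancy, supply_surplus, flow_finishing):
        # (i) Sub-RTT Control: react to each congested ACK immediately,
        # rather than averaging signals over a full round trip.
        if queue_occupancy > 0:
            self.cwnd = max(self.min_cwnd, self.cwnd - 1.0)
        # (ii) Proactive Ramp-Up: a finishing flow advertises the bandwidth
        # it will release; bank a credit and spend it without waiting to
        # observe idle capacity after the fact.
        if flow_finishing:
            self.pru_tokens += 1
        if self.pru_tokens > 0 and queue_occupancy == 0:
            self.cwnd += 1.0
            self.pru_tokens -= 1
        # (iii) Supply Matching: if the bottleneck reports spare supply
        # (departures exceeding arrivals), grow to match it.
        elif supply_surplus > 0 and queue_occupancy == 0:
            self.cwnd += 1.0
        return self.cwnd
```

Because every branch runs on every ACK, the control loop operates at packet granularity: a congested queue shrinks the window within a fraction of an RTT, and released or unused bandwidth is claimed just as quickly, which is the intuition behind the tail-latency improvements the abstract reports.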

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2024
Publication date 2024
Issuance monographic
Language English

Creators/Contributors

Author Arslan, Serhat
Degree supervisor McKeown, Nick
Thesis advisor McKeown, Nick
Thesis advisor Katti, Sachin
Thesis advisor Prabhakar, Balaji, 1967-
Degree committee member Katti, Sachin
Degree committee member Prabhakar, Balaji, 1967-
Associated with Stanford University, School of Engineering
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Serhat Arslan.
Note Submitted to the Department of Electrical Engineering.
Thesis Ph.D., Stanford University, 2024.
Location https://purl.stanford.edu/zj481vg3597

Access conditions

Copyright
© 2024 by Serhat Arslan
License
This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
