Designing datacenter transports for low latency and high throughput

Abstract/Contents

Abstract
Recent trends in datacenter computing have created new operating conditions for network transport protocols. One such trend is applications that require extremely low latency. Modern datacenter networking hardware offers the potential for very low latency communication: round-trip times of 5 µs or less are now possible for short messages. A variety of applications have arisen that can exploit this low latency because their workloads are dominated by very short messages (a few hundred bytes or less); Facebook's Memcached deployment and the RAMCloud storage system are two examples.

Transport protocols, however, have traditionally been designed for high throughput rather than low latency, and they sometimes sacrifice latency to achieve it. TCP and TCP-like transports, the de facto transports for most applications today, are examples: they are known to impose high latency on short messages when network load is high. Their sender-driven congestion control mechanisms build large queues in the network, and short messages can experience long head-of-line blocking delays in those queues. Although these large queues sustain high throughput and bandwidth utilization, they severely degrade short-message latency.

In this thesis we postulate that low latency and high throughput are not mutually exclusive goals for transport protocols. We design a new transport protocol, Homa, that achieves both in datacenter networks. Homa provides exceptionally low latency, especially for workloads with a high volume of very short messages, while also supporting large messages and high network bandwidth utilization. Homa uses in-network priority queues to ensure low latency for short messages; priority allocation is managed dynamically by each receiver and integrated with a receiver-driven flow control mechanism. Homa also uses controlled overcommitment of receiver downlinks to ensure efficient bandwidth utilization at high load. We evaluate Homa in both simulation and a real system implementation. Our implementation delivers 99th-percentile round-trip latencies below 15 µs for short messages on a 10 Gbps network running at 80% load; these latencies are almost 100x lower than the best published measurements of an implementation in prior work. In simulations, Homa's latency is roughly equal to pFabric's and significantly better than that of pHost, PIAS, and NDP for almost all message sizes and workloads. Homa can also sustain higher network loads than pFabric, pHost, or PIAS.
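To make the mechanisms named in the abstract concrete, the following Python fragment sketches how a receiver-driven transport in Homa's style might combine shortest-remaining-processing-time (SRPT) grant scheduling, controlled overcommitment of the downlink, and dynamic priority allocation. Every name and constant here (InboundMessage, issue_grants, RTT_BYTES, OVERCOMMIT, NUM_PRIORITIES) is hypothetical and chosen only for illustration; this is a minimal sketch of the ideas described above, not Homa's actual implementation.

import heapq

RTT_BYTES = 10_000       # assumed: roughly one round-trip of data at link speed
OVERCOMMIT = 4           # assumed: number of senders granted concurrently
NUM_PRIORITIES = 8       # assumed: priority levels available in the switches

class InboundMessage:
    """State a receiver keeps for one partially received message."""
    def __init__(self, msg_id, total_bytes):
        self.msg_id = msg_id
        self.total_bytes = total_bytes
        self.granted = 0     # bytes the receiver has authorized the sender to send
        self.received = 0    # bytes that have actually arrived

    def remaining(self):
        return self.total_bytes - self.received

    def __lt__(self, other):
        # SRPT order: the message with the fewest remaining bytes wins.
        return self.remaining() < other.remaining()

def issue_grants(active_messages):
    """Compute the next batch of grants for the receiver's downlink.

    Grants the OVERCOMMIT shortest messages, keeps at most RTT_BYTES of
    granted-but-unreceived data outstanding per message, and gives shorter
    messages higher network priority (0 is highest here).
    """
    grants = []
    for rank, msg in enumerate(heapq.nsmallest(OVERCOMMIT, active_messages)):
        outstanding = msg.granted - msg.received
        ungranted = msg.total_bytes - msg.granted
        window = min(RTT_BYTES - outstanding, ungranted)
        if window > 0:
            msg.granted += window
            grants.append((msg.msg_id, msg.granted, min(rank, NUM_PRIORITIES - 1)))
    return grants

# Example: a short message outranks a long one and gets the top priority.
msgs = [InboundMessage("A", 5_000), InboundMessage("B", 200_000)]
print(issue_grants(msgs))   # [('A', 5000, 0), ('B', 10000, 1)]

In this sketch the receiver, not the sender, paces traffic into the network: it grants only the few shortest messages at a time (overcommitting to several senders keeps the downlink busy even if one granted sender stalls), and it maps each message's SRPT rank onto the switch priority levels, which is how short messages avoid queuing behind long ones.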

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2019
Publication date 2019
Issuance monographic
Language English

Creators/Contributors

Author Montazeri Najafabadi, Behnam
Degree supervisor Ousterhout, John K.
Thesis advisor Ousterhout, John K.
Thesis advisor Kozyrakis, Christoforos, 1974-
Thesis advisor Rosenblum, Mendel
Degree committee member Kozyrakis, Christoforos, 1974-
Degree committee member Rosenblum, Mendel
Associated with Stanford University, Department of Electrical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Behnam Montazeri Najafabadi.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2019.
Location electronic resource

Access conditions

Copyright
© 2019 by Behnam Montazeri Najafabadi
License
This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
