Improving datacenter network performance with multicast

Abstract/Contents

Abstract
The network constitutes a significant portion of a datacenter's cost, and its performance is critical to scaling datacenter applications. Datacenter operators thus strive to utilize network resources efficiently while providing high, predictable network performance for applications. Recent trends in large-scale data storage and processing result in more and more data being replicated within a datacenter for parallel access and fault tolerance. State-of-the-art systems employ unicast-based replication, which uses network bandwidth inefficiently. IP multicast enables data replication within the network at line rate. However, IP multicast is not congestion controlled and is therefore incompatible with the dominant datacenter transport, TCP. In this thesis, I investigate an enhancement to TCP: Congestion-controlled Single-source Multicast Optimization (TCP-COSMO). TCP-COSMO adds support for multicast transmissions to TCP while congestion controlling multicast flows at high line rate (10 Gbps). I show that, with multicast replication, one can scale distributed storage system read/write rates linearly with offered load and reduce tail write latency by up to two orders of magnitude compared to existing unicast-based replication schemes. I extend TCP-COSMO to support queue-aware congestion control algorithms like Datacenter TCP. This allows running throughput-oriented multicast flows without degrading the performance of concurrent, short-lived, latency-sensitive flows. I further present PredNet, which integrates multicast into systems that provide bandwidth guarantees in datacenters. With bandwidth guarantees, I demonstrate predictable multicast transfer times and rapid convergence of bandwidth shares among competing sets of flows. Furthermore, I study multipath multicast forwarding for leaf-spine datacenter networks. I show that multipath forwarding enables multicast packet replication in the spine layer without causing extra congestion.
I also leverage the network topology to scale the effective capacity of limited hardware multicast forwarding state. I show that distributed storage systems can scale to tens of thousands of servers participating in multicast write replication without exhausting multicast forwarding state in switches.

Description

Type of resource: text
Form: electronic; electronic resource; remote
Extent: 1 online resource.
Publication date: 2014
Issuance: monographic
Language: English

Creators/Contributors

Associated with: Chan, Chi Fung Michael
Associated with: Stanford University, Department of Computer Science.
Primary advisor: Cheriton, David R
Thesis advisor: Cheriton, David R
Thesis advisor: McKeown, Nick
Thesis advisor: Rosenblum, Mendel

Subjects

Genre: Theses

Bibliographic information

Statement of responsibility: Chi Fung Michael Chan.
Note: Submitted to the Department of Computer Science.
Thesis: Thesis (Ph.D.)--Stanford University, 2014.
Location: electronic resource

Access conditions

Copyright
© 2014 by Chi Fung Michael Chan
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
