Improving datacenter network performance with multicast
- The network constitutes a significant portion of a datacenter's cost, and its performance is critical to scaling datacenter applications. Datacenter operators thus strive to utilize network resources efficiently while providing high, predictable network performance for applications. Recent trends in large-scale data storage and processing result in ever more data being replicated within a datacenter for parallel access and fault tolerance. State-of-the-art systems employ unicast-based replication, which uses network bandwidth inefficiently. IP multicast enables data replication within the network at line rate. However, IP multicast is not congestion controlled and is therefore incompatible with the dominant datacenter transport, TCP. In this thesis, I investigate an enhancement to TCP: Congestion-controlled Single-source Multicast Optimization (TCP-COSMO). TCP-COSMO adds support for multicast transmissions to TCP while congestion controlling multicast flows at a high line rate (10 Gbps). I show that, with multicast replication, one can scale distributed storage system read/write rates linearly with offered load and reduce tail write latency by up to two orders of magnitude compared to existing unicast-based replication schemes. I extend TCP-COSMO to support queue-aware congestion control algorithms such as Datacenter TCP. This allows throughput-oriented multicast flows to run without degrading the performance of concurrent, short-lived, latency-sensitive flows. I further present PredNet, which integrates multicast into systems that provide bandwidth guarantees in datacenters. With bandwidth guarantees, I demonstrate predictable multicast transfer times and rapid convergence of bandwidth shares among competing sets of flows. Furthermore, I study multipath multicast forwarding for leaf-spine datacenter networks. I show that multipath forwarding enables multicast packet replication in the spine layer without causing extra congestion.
I also leverage the network topology to scale the effective capacity of limited hardware multicast forwarding state. I show that distributed storage systems can scale to tens of thousands of servers participating in multicast write replication without exhausting multicast forwarding state in switches.
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Chan, Chi Fung Michael
|Stanford University, Department of Computer Science.
|Cheriton, David R
|Statement of responsibility
|Chi Fung Michael Chan.
|Submitted to the Department of Computer Science.
|Thesis (Ph.D.)--Stanford University, 2014.
- © 2014 by Chi Fung Michael Chan
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).