Data center storage systems : performance analysis and network optimization


Abstract/Contents

Abstract
Data centers are large groups of networked servers (typically on the order of tens of thousands of servers) that power Internet services used by billions of people. As Internet usage keeps growing and Internet services become more sophisticated, data center performance will play an even bigger role in users' quality of experience with these services. It is therefore imperative that data center resources be scaled to cope with the projected increase in Internet service traffic. While data center performance can potentially be optimized along many dimensions, this dissertation focuses on analyzing and optimizing the performance of two key components of a typical data center. The first component is the storage system, which stores service-related data such as user state, profiles, accounting, and authentication information. In addition to maintaining the integrity and privacy of such data, storage systems must be extremely responsive in returning data when queried by the services hosted in the data center. The second component is the network that interconnects the servers in the data center. Communication between servers must also be optimized to meet the strict delay and bandwidth requirements of the services. However, optimizing storage and networking involves a number of challenges that must be addressed effectively to meet these critical performance requirements. This dissertation describes our efforts to address some of these challenges, as detailed below. In the first part of this dissertation, we analyze Facebook's Memcached deployment, a distributed memory caching system.
By carefully studying this deployment, which is arguably the world's largest, we detail many characteristics of the caching workload and reveal a number of surprising results: (i) the GET/SET ratio is 30:1, higher than what is assumed in the literature; (ii) some applications of Memcached behave more like persistent storage than a cache; (iii) strong locality metrics, such as keys accessed many millions of times a day, do not always suffice for a high hit rate; and (iv) there is still room for efficiency and hit-rate improvements in Memcached's implementation. We believe the insights from our analysis are critical to understanding the performance of distributed memory caching systems and will help in designing schemes to optimize their deployment. In the second part of the dissertation, we focus on optimizing network performance at both Layer 2 and Layer 3 of the protocol stack. We first present R2D2, a method for rapid and reliable data delivery in data centers. R2D2 exploits the uniformity of data center network topology and latency to collapse individual Layer 3 flows into one meta-flow. Such state sharing among multiple flows leads to a very simple and cost-effective method for making the interconnection fabric reliable. We extensively test a prototype Linux implementation of R2D2 in 1 Gbps and 10 Gbps networks with a variety of switches and under different workloads, and find that it significantly improves TCP performance by preventing timeouts. We deploy R2D2 in a production environment with hundreds of servers and real-world traffic and show that R2D2 performs at least as well as existing solutions that are much more expensive to implement. Finally, we describe the QCN (Quantized Congestion Notification) algorithm and present a mathematical model for understanding its stability.
QCN is a Layer 2 congestion control mechanism developed for the IEEE 802.1Qau standard (part of the IEEE Data Center Bridging Task Group's efforts).
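The abstract names QCN without describing how it works. As a rough, non-authoritative illustration, the sketch below follows the publicly described shape of QCN's two control loops: a congestion point (switch queue) computes feedback Fb = -(Qoff + w * Qdelta), where Qoff = Q - Qeq and Qdelta = Q - Qold, and quantizes its magnitude to 6 bits; a reaction point (sender) cuts its rate multiplicatively on feedback and otherwise recovers toward a remembered target rate. It is simplified (no sampling, timer, or hyperactive increase), and every constant (`Q_EQ`, `W`, `FB_MAX`, `R_AI`, `FAST_RECOVERY_CYCLES`) is an assumed, illustrative value, not one taken from the dissertation or the standard text.

```python
"""Simplified, hypothetical sketch of QCN's congestion-point and reaction-point loops.

All constants below are illustrative assumptions, not values from the dissertation.
"""
from dataclasses import dataclass

Q_EQ = 26.0               # assumed equilibrium (target) queue length at the congestion point
W = 2.0                   # assumed weight on queue growth since the last sample
FB_MAX = 63               # largest 6-bit quantized feedback magnitude
GD = 0.5 / FB_MAX         # decrease gain chosen so that maximal feedback halves the rate
R_AI = 5.0                # assumed active-increase step for the target rate
FAST_RECOVERY_CYCLES = 5  # assumed number of feedback-free cycles spent in fast recovery


def congestion_point_feedback(q: float, q_old: float) -> int:
    """Return the quantized |Fb|; 0 means no congestion message is sent."""
    q_off = q - Q_EQ          # how far the queue is above its equilibrium
    q_delta = q - q_old       # how fast the queue is growing
    fb = -(q_off + W * q_delta)
    if fb >= 0:
        return 0              # queue short or draining: no feedback
    return min(FB_MAX, round(-fb))


@dataclass
class ReactionPoint:
    current_rate: float
    target_rate: float
    cycles_since_feedback: int = 0

    def on_feedback(self, fb_quantized: int) -> None:
        """Multiplicative decrease when a congestion message arrives."""
        if fb_quantized == 0:
            return
        self.target_rate = self.current_rate  # remember the rate before the cut
        self.current_rate *= 1.0 - GD * fb_quantized
        self.cycles_since_feedback = 0

    def on_cycle_end(self) -> None:
        """Called when a rate-increase cycle ends without any feedback."""
        self.cycles_since_feedback += 1
        if self.cycles_since_feedback > FAST_RECOVERY_CYCLES:
            # Active increase: raise the target to probe for spare bandwidth.
            self.target_rate += R_AI
        # Move the current rate halfway back toward the target rate.
        self.current_rate = (self.current_rate + self.target_rate) / 2.0


if __name__ == "__main__":
    rp = ReactionPoint(current_rate=1000.0, target_rate=1000.0)
    fb = congestion_point_feedback(q=60.0, q_old=50.0)  # queue above Q_EQ and growing
    rp.on_feedback(fb)
    print(fb, round(rp.current_rate, 2))
    rp.on_cycle_end()
    print(round(rp.current_rate, 2))
```

Halving the rate at maximal feedback and then averaging back toward the pre-cut rate is what makes recovery fast without overshooting; whether such a loop stays stable under feedback delay is the kind of question the dissertation's mathematical model addresses.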

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2014
Issuance monographic
Language English

Creators/Contributors

Associated with Atikoğlu, Berk
Associated with Stanford University, Department of Electrical Engineering.
Primary advisor Prabhakar, Balaji, 1967-
Thesis advisor Prabhakar, Balaji, 1967-
Thesis advisor Parulkar, Gurudatta M
Thesis advisor Rosenblum, Mendel
Advisor Parulkar, Gurudatta M
Advisor Rosenblum, Mendel

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Berk Atikoğlu.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2014.
Location electronic resource

Access conditions

Copyright
© 2014 by Berk Atikoglu
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
