Reconciling high efficiency with low latency in the datacenter

Abstract/Contents

Abstract
Web services are an integral part of today's society, with billions of people using the Internet regularly. The Internet's popularity is in no small part due to the near-instantaneous access it provides to large amounts of personalized information. Online services such as web search (Bing, Google), social networking (Facebook, Twitter, LinkedIn), online maps (Bing Maps, Google Maps, MapQuest), machine translation (Bing Translate, Google Translate), and webmail (GMail, Outlook.com) are all portals to vast amounts of information, filtered to show only relevant results with sub-second response times. The new capabilities enabled by these services are also responsible for huge economic growth. Online services are typically hosted on warehouse-scale computers located in large datacenters. These datacenters are run at massive scale in order to take advantage of economies of scale. A single datacenter can comprise 50,000 servers, draw tens of megawatts of power, and cost hundreds of millions to billions of dollars to construct. Considering the large number of datacenters worldwide, their total impact is significant. For instance, the electricity consumed by all datacenters is equivalent to the output of 30 large nuclear power plants. At the same time, demand for additional datacenter compute capacity is rising because of the rapid growth in Internet users and the increasing computational complexity of online services.

This dissertation focuses on improving datacenter efficiency in the face of latency-critical online services. There are two major components of this effort. The first is to improve the energy efficiency of datacenters, which reduces operational expenses and helps mitigate the growing environmental footprint of operating datacenters. The first two systems we introduce, autoturbo and PEGASUS, fall under this category. The second efficiency opportunity we pursue is to increase the resource efficiency of datacenters by enabling higher utilization. Higher resource efficiency leads to significantly increased capabilities without increasing the capital expenses of owning a datacenter and is critical to future scaling of datacenter capacity. The third system we describe, Heracles, targets the resource efficiency opportunity for current and future datacenters.

We investigate two avenues for improving energy efficiency: making servers more efficient when they are running at peak load and when they are not. Both cases are important because diurnal load variations on latency-critical online services can cause server utilization to swing from idle to full load within a 24-hour period. Latency-critical workloads present a unique set of challenges that have made improving their energy efficiency difficult. Previous approaches to power management have run afoul of the performance sensitivity of latency-critical workloads. Furthermore, latency-critical workloads do not contain sufficient periods of idleness, complicating efforts to reduce their power footprint via deep-sleep states.

In addition to improving energy efficiency, this dissertation also studies the improvement of resource efficiency. This opportunity takes advantage of the fact that datacenters are chronically run at low utilization, with an industry average of only 10%-50%. Ironically, the low utilization of datacenters is not caused by a lack of work, but rather by fears of performance interference between different workloads. Large-scale latency-critical workloads exacerbate this problem, as they are typically run on dedicated servers or with greatly exaggerated resource reservations. Thus, high resource efficiency through high utilization is achieved by enabling workloads to coexist on the same server without causing performance degradation.

In this dissertation, we describe three practical systems that improve the efficiency of datacenters. Autoturbo uses machine learning to improve the efficiency of servers running at peak load across a variety of energy efficiency metrics; by intelligently selecting the proper power mode on modern CPUs, autoturbo can improve Energy Delay Product by up to 47%. PEGASUS improves energy efficiency for large-scale latency-critical workloads by using a feedback loop to safely reduce the power consumed by servers at low utilization; an evaluation of PEGASUS on production Google websearch yields power savings of up to 20% on a full-sized production cluster. Finally, Heracles improves datacenter utilization by performing coordinated resource isolation on servers to ensure that latency-critical workloads still meet their latency guarantees, enabling other jobs to be co-located on the same servers. We tested Heracles on several production Google workloads and demonstrated an average server utilization of 90%, opening up the potential for integer-factor increases in resource and cost efficiency.
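As a point of reference for the Energy Delay Product figure cited above (an illustrative editorial note, not part of the original record), the metric is conventionally defined as the product of the energy a workload consumes and its execution time:

\[ \mathrm{EDP} = E \times D \]

where E is energy and D is delay (execution time). Lower values are better, so improving EDP by 47% corresponds to reducing it to roughly 53% of its baseline value; the dissertation itself should be consulted for the exact formulation and any generalized metrics autoturbo optimizes.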
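The feedback-loop idea behind PEGASUS can be pictured with a minimal sketch. The Python fragment below is purely illustrative: the helpers measure_tail_latency() and set_power_cap(), the latency target, and the power-cap bounds are all hypothetical assumptions, not the dissertation's implementation. It only shows the general pattern of trading latency slack for lower server power.

    import time

    SLO_TARGET_MS = 25.0                 # assumed tail-latency target (hypothetical value)
    MIN_CAP_W, MAX_CAP_W = 40.0, 100.0   # assumed per-server power-cap bounds in watts
    STEP_W = 2.0                         # how much to lower the cap per control interval

    def control_loop(measure_tail_latency, set_power_cap, interval_s=5.0):
        """Compare measured tail latency to the target each interval and nudge
        the server power cap: lower it when there is latency slack, restore it
        quickly when the target is threatened."""
        cap = MAX_CAP_W
        while True:
            latency_ms = measure_tail_latency()
            if latency_ms > SLO_TARGET_MS:
                cap = MAX_CAP_W                     # latency at risk: give back all headroom
            elif latency_ms < 0.8 * SLO_TARGET_MS:
                cap = max(MIN_CAP_W, cap - STEP_W)  # ample slack: shave power gradually
            set_power_cap(cap)
            time.sleep(interval_s)

In practice such a controller would be driven by cluster-level latency measurements and a hardware power-capping interface on each server; the sketch simply shows why latency slack, rather than CPU utilization alone, is the signal that makes power reduction safe for latency-critical workloads.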

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English

Creators/Contributors

Associated with Lo, David
Associated with Stanford University, Department of Electrical Engineering.
Primary advisor Kozyrakis, Christoforos, 1974-
Thesis advisor Kozyrakis, Christoforos, 1974-
Thesis advisor Olukotun, Oyekunle Ayinde
Thesis advisor Rosenblum, Mendel

Subjects

Genre Theses

Bibliographic information

Statement of responsibility David Lo.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

Copyright
© 2015 by David Lo
