Dynamic resource and power management in virtualized data centers

Abstract/Contents

Abstract
Cloud computing is a multi-billion dollar industry that has permanently changed the internet landscape. As public demand for cloud computing and cloud-based services grows, technology companies are required to build a progressively larger network of data centers to keep pace with expanding traffic loads. The operational cost of powering and cooling many thousands of servers is a growing financial strain that has already reached a critical point across the industry, with data centers additionally posing an emerging environmental hazard as a significant contributor to carbon emissions. Energy-efficient resource management is a looming imperative that can save companies billions of dollars and curb the adverse effects of data centers on the environment. Resource utilization rates in modern data centers are low, typically below 30-50% of capacity. Dynamic Power Management is a multidisciplinary research area that aims to reduce energy consumption by dynamically adjusting the power draw of elements within a data center to match an evolving compute load profile, maximizing utilization and minimizing the energy used on idle processes. A natural tension arises, however, between decreased energy consumption and increased congestion or latency, which is exacerbated by stochastic forces that introduce uncertainty and impede full elasticity with added risks, delays, and switching costs. In this thesis, we explore various stochastic models to optimize the energy and congestion tradeoff in Dynamic Power Management applications for virtualized data centers. We use Dynamic Programming as a template to frame and solve these systems, as well as related methods in Approximate Dynamic Programming to reach beyond the practical limitations of standard Dynamic Programming solution concepts. In this way, we are able to address and implement models at multiple layers of the technology stack.
In the first part of this thesis, we consider cloud applications with partial execution times. Iterative algorithms, which are increasingly popular in internet search and traffic classification, introduce a dimension of quality that induces a three-way tradeoff among output quality, power draw, and latency. Next, we study resource management for virtual machines. The management of resources in a virtual environment is complicated by primitives that are constantly in flux, including bursty traffic and supply and demand dynamics that drive the effective delays and costs of acquiring computing resources. We develop Markov models to capture these effects and leverage their Markov properties to find resource management policies in systems with incomplete information and exploitable market trends. Finally, we consider voltage and frequency scaling in a network switch. We discuss the limitations of Dynamic Programming in this setting and implement two alternative methods that exploit load-balancing and convexity properties to derive an approximate solution.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English

Creators/Contributors

Associated with Valdez-Vivas, Martin Roberto
Associated with Stanford University, Department of Management Science and Engineering.
Primary advisor Bambos, Nicholas
Thesis advisor Bambos, Nicholas
Thesis advisor Apostolopoulos, John G
Thesis advisor Weyant, John P. (John Peter)
Advisor Apostolopoulos, John G
Advisor Weyant, John P. (John Peter)

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Martin Roberto Valdez-Vivas.
Note Submitted to the Department of Management Science and Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

Copyright
© 2015 by Martin Roberto Valdez-Vivas
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
