Energy-performance tradeoffs in data centers and machine learning

Mann, Ariana Joy

Abstract/Contents

Abstract: The ability to scale computing over the coming decade is limited by power and energy: whether that be the power limits of individual processor chips due to heating effects, the ability of the power grid to supply a certain amount of power relative to the massive demand of hyperscale data centers, or the growing emphasis on reducing the carbon impact of computing. All of these concerns benefit not only from revolutionary technological improvements, but also from fine-grained power consumption control. In this thesis, a general three-tiered approach to controlling the performance-energy tradeoff in computing is presented, which combines tools from stochastic modeling, dynamic programming, and operating systems. A concrete instance is developed for the processor speed control setting, where a higher speed improves a program's slowdown performance at the convex cost of unit energy or power. For the modern slowdown performance metric, I show that the processor queue state must be modeled with its underlying multi-level structure; otherwise, the control policy will be sub-optimal even in expectation. To avoid the complexity of the complete multi-level policy solution, I develop an approximate control policy that accounts for the multi-level state and functions under any scheduling policy. While in some settings a clear performance-energy tradeoff is possible, in modern machine learning (and deep neural networks in particular) large gains can be achieved by redesigning the training algorithms themselves for energy efficiency. Distributed Distillation is one example of this in the setting of small, power-limited devices. Presented in this thesis, Distributed Distillation achieves a 10,000x reduction in the power-hungry communication required for distributed on-device training compared to the vanilla distributed stochastic gradient descent algorithm.
Both of these approaches (improved algorithms and tradeoff control policies) can be combined to produce even further improvement.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2023; ©2023
Publication date	2023
Issuance	monographic
Language	English

Creators/Contributors

Author	Mann, Ariana Joy
Degree supervisor	Bambos, Nicholas
Thesis advisor	Bambos, Nicholas
Thesis advisor	Ozgur, Ayfer
Thesis advisor	Rajagopal, Ram
Degree committee member	Ozgur, Ayfer
Degree committee member	Rajagopal, Ram
Associated with	Stanford University, School of Engineering
Associated with	Stanford University, Department of Electrical Engineering

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Ariana J. Mann.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis Ph.D. Stanford University 2023.
Location	https://purl.stanford.edu/qz590zt0478

Access conditions

License: This work is licensed under a Creative Commons Attribution Share Alike 3.0 Unported license (CC BY-SA).
