Energy-performance tradeoffs in data centers and machine learning

Abstract/Contents

Abstract
The ability to scale computing over the coming decade is limited by power and energy: whether through the power limits of individual processor chips due to heating effects, the ability of the power grid to supply enough power for the massive demand of hyperscale data centers, or the growing emphasis on reducing the carbon impact of computing. All of these concerns benefit not only from revolutionary technological improvements, but also from fine-grained control of power consumption. In this thesis, a general three-tiered approach to controlling the performance-energy tradeoff in computing is presented, which combines tools from stochastic modeling, dynamic programming, and operating systems. A concrete instance is developed for the processor speed control setting, where a higher speed improves a program's slowdown performance at the convex cost of unit energy or power. For the modern slowdown performance metric, I show that the processor queue state must be modeled with its underlying multi-level structure; otherwise, the control policy will be sub-optimal even in expectation. To avoid the complexity of the complete multi-level policy solution, I develop an approximate control policy that accounts for the multi-level state and functions under any scheduling policy. While in some settings a clear performance-energy tradeoff is possible, in modern machine learning (and deep neural networks in particular) large gains can be achieved by redesigning the training algorithms themselves for energy efficiency. Distributed Distillation is one example of this in the setting of small, power-limited devices. Presented in this thesis, Distributed Distillation achieves a 10,000x reduction in the power-hungry communication required for distributed on-device training compared to the vanilla distributed stochastic gradient descent algorithm.
Both of these approaches (improved algorithms and tradeoff control policies) can be combined to produce even further improvement.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2023; ©2023
Publication date 2023; 2023
Issuance monographic
Language English

Creators/Contributors

Author Mann, Ariana Joy
Degree supervisor Bambos, Nicholas
Thesis advisor Bambos, Nicholas
Thesis advisor Ozgur, Ayfer
Thesis advisor Rajagopal, Ram
Degree committee member Ozgur, Ayfer
Degree committee member Rajagopal, Ram
Associated with Stanford University, School of Engineering
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Ariana J. Mann.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis Ph.D. Stanford University 2023.
Location https://purl.stanford.edu/qz590zt0478

Access conditions

Copyright
© 2023 by Ariana Joy Mann
License
This work is licensed under a Creative Commons Attribution Share Alike 3.0 Unported license (CC BY-SA).
