Energy-performance tradeoffs in data centers and machine learning
- The ability to scale computing over the coming decade is limited by power and energy: whether that be the power limits of individual processor chips due to heating effects, the ability of the power grid to supply enough power to meet the massive demand of hyperscale data centers, or the growing emphasis on reducing the carbon impact of computing. All of these concerns benefit not only from revolutionary technological improvements, but also from fine-grained control of power consumption. In this thesis, a general three-tiered approach to controlling the performance-energy tradeoff in computing is presented, which combines tools from stochastic modeling, dynamic programming, and operating systems. A concrete instance is developed for the processor speed control setting, where a higher speed improves a program's slowdown performance at a convex cost in energy or power. For the modern slowdown performance metric, I show that the processor queue state must be modeled with its underlying multi-level structure; otherwise, the control policy will be sub-optimal even in expectation. To avoid the complexity of the complete multi-level policy solution, I develop an approximate control policy that accounts for the multi-level state and functions under any scheduling policy. While in some settings a clear performance-energy tradeoff is possible, in modern machine learning (and deep neural networks in particular) large gains can be achieved by redesigning the training algorithms themselves for energy efficiency. Distributed Distillation is one example of this in the setting of small, power-limited devices. Presented in this thesis, Distributed Distillation achieves a 10,000x reduction in the power-hungry communication required for distributed on-device training compared to the vanilla distributed stochastic gradient descent algorithm.
Both of these approaches (improved algorithms and tradeoff control policies) can be combined to produce even further improvement.
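The speed control tradeoff named in the abstract can be illustrated with a deliberately simplified sketch. This is not the thesis's method: it assumes an M/M/1 queue with unit-mean job sizes, uses mean response time in place of the slowdown metric, picks a single static speed rather than a state-dependent policy, and assumes power grows as `speed ** alpha`. All function names and the parameters `beta` (energy weight) and `alpha` (power exponent) are hypothetical.

```python
import math


def mean_cost(speed, arrival_rate, beta=1.0, alpha=2.0):
    """Mean response time plus beta-weighted power at a fixed speed (assumed M/M/1 model)."""
    if speed <= arrival_rate:
        return math.inf  # the queue is unstable, so delay is unbounded
    mean_response_time = 1.0 / (speed - arrival_rate)  # M/M/1 with unit-mean job sizes
    power = speed ** alpha  # assumed convex power curve, not taken from the thesis
    return mean_response_time + beta * power


def marginal_cost(speed, arrival_rate, beta=1.0, alpha=2.0):
    """Derivative of mean_cost with respect to speed, valid for speed > arrival_rate."""
    return -1.0 / (speed - arrival_rate) ** 2 + beta * alpha * speed ** (alpha - 1.0)


def best_static_speed(arrival_rate, beta=1.0, alpha=2.0, tol=1e-10):
    """Bisect on the marginal cost, which increases with speed because the cost is convex."""
    lo = arrival_rate
    hi = arrival_rate + 1.0
    # Widen the bracket until the marginal cost at hi is non-negative.
    while marginal_cost(hi, arrival_rate, beta, alpha) < 0.0:
        hi = arrival_rate + 2.0 * (hi - arrival_rate)
    # Standard bisection for the zero of the marginal cost.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal_cost(mid, arrival_rate, beta, alpha) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for load in (0.2, 0.5, 0.8):
        s = best_static_speed(load)
        print(f"arrival rate {load:.1f}: speed {s:.4f}, cost {mean_cost(s, load):.4f}")
```

Under these assumptions, running faster shortens delay but raises power, so the cost has a single minimum above the arrival rate, and that minimum moves to higher speeds as load grows. A state-dependent policy of the kind the abstract describes would replace the single static speed with one speed per queue state.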
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource.
|Mann, Ariana Joy
|Degree committee member
|Degree committee member
|Stanford University, School of Engineering
|Stanford University, Department of Electrical Engineering
|Statement of responsibility
|Ariana J. Mann.
|Submitted to the Department of Electrical Engineering.
|Thesis Ph.D. Stanford University 2023.
- © 2023 by Ariana Joy Mann
- This work is licensed under a Creative Commons Attribution Share Alike 3.0 Unported license (CC BY-SA).