Hardware-aware algorithms for efficient machine learning

Dao Phuc Quang, Tri

Abstract/Contents

Abstract: Machine learning (ML) training will continue to consume more compute cycles, ML inference will proliferate on more kinds of devices, and ML capabilities will be used in more domains. Two goals central to this future are to make ML models efficient so they remain practical to train and deploy, and to unlock new application domains with new capabilities. We describe some recent developments in hardware-aware algorithms to improve the efficiency-quality tradeoff of ML models and equip them with long context. In Chapter 2, we focus on structured sparsity, a natural approach to mitigate the extensive compute and memory cost of large ML models. We describe a line of work on learnable fast transforms that, thanks to their expressiveness and efficiency, yields some of the first sparse training methods to speed up large models in wall-clock time (2x) without compromising their quality. In Chapter 3, we focus on efficient Transformer training and inference for long sequences. We describe FlashAttention, a fast and memory-efficient algorithm to compute attention with no approximation. By carefully accounting for reads and writes between different levels of the memory hierarchy, FlashAttention is 2-4x faster and uses 10-20x less memory compared to the best existing attention implementations, allowing us to train higher-quality Transformers with 8x longer context. FlashAttention is now widely used in some of the largest research labs and companies. In Chapter 4, we examine state-space models, a promising architecture designed for long-range memory. As we seek to understand why early state-space models did not perform well on language modeling tasks, we propose a simple multiplicative interaction that expands their expressiveness. We also design hardware-friendly algorithms to train them. As a result, we are able to train state-space models to multi-billion parameter scale, demonstrating a new kind of model competitive with the dominant Transformers in language modeling.
We conclude with some exciting directions in ML and systems, such as software-hardware co-design, structured sparsity for scientific AI, and long context for new AI workflows and modalities.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2023; ©2023
Publication date	2023
Issuance	monographic
Language	English

Creators/Contributors

Author	Dao Phuc Quang, Tri
Degree supervisor	Ermon, Stefano
Degree supervisor	Ré, Christopher
Thesis advisor	Ermon, Stefano
Thesis advisor	Ré, Christopher
Thesis advisor	Olukotun, Oyekunle Ayinde
Degree committee member	Olukotun, Oyekunle Ayinde
Associated with	Stanford University, School of Engineering
Associated with	Stanford University, Computer Science Department

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Tri Dao.
Note	Submitted to the Computer Science Department.
Thesis	Thesis Ph.D. Stanford University 2023.
Location	https://purl.stanford.edu/sf563fp9953

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
