Modeling sequences with structured state spaces
Abstract/Contents
- Abstract
- Substantial recent progress in machine learning has been driven by advances in sequence models, which form the backbone of deep learning models that have achieved widespread success across scientific applications. However, existing methods require extensive specialization to different tasks, modalities, and capabilities; suffer from computational efficiency bottlenecks; and have difficulty modeling more complex sequential data, such as when long-range dependencies are involved. As such, it remains of fundamental importance to continue to develop principled and practical methods for modeling general sequences. This thesis develops a new approach to deep sequence modeling using state space models, a flexible method that is theoretically grounded, computationally efficient, and achieves strong results across a wide variety of data modalities and applications. First, we introduce a class of models with numerous representations and properties that generalize the strengths of standard deep sequence models such as recurrent neural networks and convolutional neural networks. However, we show that computing these models can be challenging, and develop new classes of structured state spaces that are very fast on modern hardware, both when scaling to long sequences and in other settings such as autoregressive inference. Finally, we present a novel mathematical framework for incrementally modeling continuous signals, which can be combined with state space models to endow them with principled state representations and improve their ability to model long-range dependencies. Together, this new class of methods provides effective and versatile building blocks for machine learning models, especially towards addressing general sequential data at scale.
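- The "numerous representations" the abstract refers to can be illustrated with a minimal sketch (not the thesis implementation): a discretized linear state space model can be computed either as a recurrence, like an RNN, or as a convolution with a precomputed kernel, like a CNN. The matrices below are illustrative random values, not the structured matrices developed in the thesis.

```python
# Sketch of the two equivalent views of a linear state space model:
#   x_k = A x_{k-1} + B u_k,   y_k = C x_k
# Recurrent view: step the hidden state through time.
# Convolutional view: convolve the input with K = (CB, CAB, CA^2B, ...).
import numpy as np

rng = np.random.default_rng(0)
N, L = 4, 8                       # state size, sequence length
A = rng.normal(size=(N, N)) / N   # scaled down so the recurrence stays tame
B = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))
u = rng.normal(size=L)            # input sequence

# Recurrent computation (O(1) state per step, good for autoregressive inference).
x = np.zeros((N, 1))
y_rec = []
for k in range(L):
    x = A @ x + B * u[k]
    y_rec.append((C @ x).item())

# Convolutional computation (parallelizable over the whole sequence).
K = np.array([(C @ np.linalg.matrix_power(A, j) @ B).item() for j in range(L)])
y_conv = np.convolve(u, K)[:L]    # causal truncation to length L

assert np.allclose(y_rec, y_conv)
```

  The equivalence holds because unrolling the recurrence gives y_k = Σ_j C A^{k-j} B u_j, which is exactly a causal convolution of the input with the kernel K; the thesis's contribution is making this kernel computable efficiently for structured choices of A.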
Description
Type of resource | text
---|---
Form | electronic resource; remote; computer; online resource
Extent | 1 online resource
Place | [Stanford, California]
Publisher | [Stanford University]
Copyright date | ©2023
Publication date | 2023
Issuance | monographic
Language | English
Creators/Contributors
Author | Gu, Albert
---|---
Degree supervisor | Ré, Christopher
Thesis advisor | Ré, Christopher
Thesis advisor | Liang, Percy
Thesis advisor | Linderman, Scott
Degree committee member | Liang, Percy
Degree committee member | Linderman, Scott
Associated with | Stanford University, School of Engineering
Associated with | Stanford University, Computer Science Department
Subjects
Genre | Theses
---|---
Genre | Text
Bibliographic information
Statement of responsibility | Albert Gu.
---|---
Note | Submitted to the Computer Science Department.
Thesis | Thesis (Ph.D.)--Stanford University, 2023.
Location | https://purl.stanford.edu/mb976vf9362 |
Access conditions
- Copyright
- © 2023 by Albert Gu
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).