Abstractions for scaling stateful cloud applications

Abstract/Contents

Abstract
As the scale of both computing and data grows, developers are increasingly building distributed stateful systems in the cloud. However, these systems are challenging to build at scale because they must provide fault tolerance and consistency for stateful computations while managing both compute and data resources. Thus, we need new high-level abstractions that hide the complexity of distributed state management from developers. This dissertation proposes three such abstractions at multiple levels of the stack of a stateful cloud application.
The first part of this dissertation targets cloud application developers, proposing Apiary, a database-oriented transactional function-as-a-service (FaaS) platform for stateful cloud applications. FaaS is an increasingly popular programming model because it abstracts away resource management concerns and reduces the complexity of cloud deployment, but existing FaaS platforms struggle to efficiently or reliably serve stateful applications. Apiary solves this problem by tightly integrating function execution with data management, improving FaaS performance on stateful applications by 2-68x while providing fault tolerance and strong transactional guarantees.
The second part of this dissertation targets developers of the data management systems on which stateful cloud apps depend, proposing data-parallel actors (DPA), a framework for scaling data management systems. DPA targets an increasingly important class of data management systems called query serving systems, which are characterized by data-parallel, low-latency computations and frequent bulk data updates. DPA allows developers to construct query serving systems from purely single-node components while automatically providing critical properties such as data replication, fault tolerance, and update consistency. We use DPA to build a new query serving system, a simplified data warehouse based on MonetDB, and port existing ones, such as Druid, Solr, and MongoDB, enhancing them with new features such as a novel parallelism-optimizing data placement policy that improves query tail latency by 7-64%.
The third part of this dissertation targets application developers utilizing multiple data management systems, proposing Epoxy, a protocol for providing ACID transactions across diverse data stores. Such applications are increasingly common because developers often use multiple data stores to manage heterogeneous data, for example doing transaction processing in Postgres and text search in Elasticsearch while storing image data in a cloud object store like AWS S3. To provide transactional guarantees for these applications, Epoxy adapts multi-version concurrency control to a cross-data store setting. We implement Epoxy for five data stores: Postgres, Elasticsearch, MongoDB, Google Cloud Storage, and MySQL, finding it outperforms existing distributed transaction protocols like XA while providing stronger guarantees and supporting more systems.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2023; ©2023
Publication date 2023; 2023
Issuance monographic
Language English

Creators/Contributors

Author Kraft, Peter, (Researcher in computer science)
Degree supervisor Bailis, Peter
Degree supervisor Zaharia, Matei
Thesis advisor Bailis, Peter
Thesis advisor Zaharia, Matei
Thesis advisor Ousterhout, John K
Degree committee member Ousterhout, John K
Associated with Stanford University, School of Engineering
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Peter Kraft.
Note Submitted to the Computer Science Department.
Thesis Thesis Ph.D. Stanford University 2023.
Location https://purl.stanford.edu/xd205zb2714

Access conditions

Copyright
© 2023 by Peter Kraft
License
This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
