Scheduling and autoscaling methods for low latency applications


Abstract/Contents

Abstract
Modern web applications are commonly architected as collections of services, built to span increasingly large clusters of virtual machines (VMs), and deployed in the multi-tenant setting of the public cloud. As the number of users interacting with an application changes over time, services within an application are commonly scaled to meet the new demands of such changes. Operationally, applications are deployed onto VMs under the assumption that all VMs of a specific configuration are equally capable and are typically scaled to stabilize the utilization of services to a provided threshold. We find that VMs under the same specification exhibit variable performance and that utilization based autoscaling typically overprovisions resources leading to high deployment cost or underprovisions them resulting in high latency for an application. As such, we demonstrate that these assumptions made when deploying and scaling applications can introduce inefficiencies both in terms of application latency and deployment cost. In this dissertation, we present VM scheduling and autoscaling methods that tackle these inefficiencies and can help provide predictably low latency to latency-sensitive applications. We find that while VMs of a specification are considered equally capable, fine grained measurement data can reveal significant discrepancies between them. First, we present a VM selection and scheduling algorithm called LemonDrop. LemonDrop selects a cluster of VMs from an initial pool and aligns the application's communication patterns with the latencies of the resources it has chosen. LemonDrop aims to minimize aggregate cluster latency amongst the selected VMs. It does so by formulating the task of selection and scheduling as a natural Quadratic Assignment Problem which can be approximately solved within a few seconds. 
Across public clouds, we show that LemonDrop reduces the median and tail latencies of a benchmark e-commerce application by 1.1-2.3x on average, depending on the size of the initial pool of VMs. Furthermore, LemonDrop improves order-processing fairness in a benchmark financial exchange by up to 37x at the same or lower order-processing latency. Second, we introduce an autoscaling method called COLA, which efficiently learns autoscaling policies for microservice applications by iteratively identifying and optimizing bottleneck microservices. COLA accomplishes this by exploring the CPU scaling of highly utilized microservices and by only exploiting the scaling of microservices that disproportionately reduce end-to-end latency when scaled up. Once trained, COLA runs as a centralized controller, scaling application resources in response to observed workloads. By explicitly optimizing COLA to meet an end-to-end latency target, we can meet that target with fewer resources, and at lower cost, than policies optimizing other metrics such as utilization. Across several applications and compute settings in Google Cloud, COLA reduces cost by 1.34-52.28% depending on the application. Together, these techniques form new methodologies for deploying and scaling latency-sensitive applications that, compared with existing methods, align more closely with the needs of application developers and end users.
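The Quadratic Assignment Problem mentioned in the abstract can be illustrated with a minimal sketch. All names and matrices below are hypothetical toy data, not the dissertation's actual formulation or measurements: given a service-to-service traffic matrix and a measured VM-pair latency matrix, the goal is to assign services to VMs so that heavily communicating pairs land on low-latency VM pairs.

```python
from itertools import permutations

# Hypothetical toy instance: 3 services, 3 VMs.
# traffic[i][j]: messages per second between services i and j (made-up data).
traffic = [
    [0, 10, 1],
    [10, 0, 2],
    [1, 2, 0],
]

# latency[a][b]: measured round-trip latency in ms between VMs a and b (made-up data).
latency = [
    [0.0, 0.1, 0.5],
    [0.1, 0.0, 0.4],
    [0.5, 0.4, 0.0],
]

def qap_cost(assign):
    """Aggregate communication latency when service i runs on VM assign[i]."""
    n = len(assign)
    return sum(
        traffic[i][j] * latency[assign[i]][assign[j]]
        for i in range(n)
        for j in range(n)
    )

# Exhaustive search works only for tiny instances; the QAP is NP-hard in
# general, which is why a practical system must solve it approximately.
best = min(permutations(range(3)), key=qap_cost)
print(best, qap_cost(best))
```

Here the chatty service pair (0, 1) is placed on the lowest-latency VM pair; an approximate solver replaces the brute-force search at realistic cluster sizes.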

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Sachidananda, Vighnesh
Degree supervisor Prabhakar, Balaji, 1967-
Thesis advisor Prabhakar, Balaji, 1967-
Thesis advisor Rosenblum, Mendel
Thesis advisor Sivaraman, Anirudh
Degree committee member Rosenblum, Mendel
Degree committee member Sivaraman, Anirudh
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Vighnesh Sachidananda.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2022.
Location https://purl.stanford.edu/xq718qd4043

Access conditions

Copyright
© 2022 by Vighnesh Sachidananda
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
