Scheduling and autoscaling methods for low latency applications
Abstract/Contents
- Abstract
- Modern web applications are commonly architected as collections of services, built to span increasingly large clusters of virtual machines (VMs), and deployed in the multi-tenant setting of the public cloud. As the number of users interacting with an application changes over time, the services within an application are scaled to meet the new demand. Operationally, applications are deployed onto VMs under the assumption that all VMs of a given configuration are equally capable, and they are typically scaled to stabilize the utilization of services at a provided threshold. We find that VMs of the same specification exhibit variable performance, and that utilization-based autoscaling typically either overprovisions resources, leading to high deployment cost, or underprovisions them, resulting in high application latency. As such, we demonstrate that these assumptions, made when deploying and scaling applications, can introduce inefficiencies in both application latency and deployment cost. In this dissertation, we present VM scheduling and autoscaling methods that tackle these inefficiencies and help provide predictably low latency to latency-sensitive applications. While VMs of a given specification are considered equally capable, fine-grained measurement data reveal significant discrepancies between them. First, we present a VM selection and scheduling algorithm called LemonDrop. LemonDrop selects a cluster of VMs from an initial pool and aligns the application's communication patterns with the latencies of the resources it has chosen, aiming to minimize aggregate latency across the selected VMs. It does so by formulating selection and scheduling as a Quadratic Assignment Problem (QAP), which can be solved approximately within a few seconds.
Across public clouds, we show that LemonDrop reduces the median and tail latencies of a benchmark e-commerce application by 1.1-2.3x on average, depending on the size of the initial pool of VMs. Furthermore, LemonDrop improves order-processing fairness in a benchmark financial exchange by up to 37x at the same or lower order-processing latency. Second, we introduce an autoscaling method called COLA, which efficiently learns autoscaling policies for microservice applications by iteratively identifying and optimizing bottleneck microservices. COLA accomplishes this by exploring CPU scaling for highly utilized microservices and by exploiting only the scaling of microservices that disproportionately reduce end-to-end latency when scaled up. Once trained, COLA runs as a centralized controller, scaling application resources in response to observed workloads. By explicitly optimizing COLA to meet an end-to-end latency target, we can meet that target with fewer resources, and at lower cost, than policies that optimize other metrics such as utilization. Across several applications and compute settings in Google Cloud, COLA reduces cost by 1.34-52.28% depending on the application. Together, these techniques form new methodologies for deploying and scaling latency-sensitive applications that, compared with existing methods, align more closely with the needs of application developers and end users.
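The QAP formulation mentioned in the abstract can be illustrated with a minimal brute-force sketch (all names here are hypothetical; the dissertation's LemonDrop uses an approximate solver to handle realistic cluster sizes). Given a service-to-service flow matrix and a measured VM-to-VM latency matrix, the problem is to pick and order n of the m VMs so that flow-weighted latency is minimized:

```python
import itertools

def qap_place(flow, lat):
    """Brute-force QAP: assign each of n services to one of m VMs
    (m >= n), minimizing total flow-weighted latency.
    flow[i][j]: traffic between services i and j.
    lat[u][v]:  measured latency between VMs u and v.
    Exhaustive search is only practical for tiny instances; an
    approximate QAP solver is needed at realistic scales."""
    n, m = len(flow), len(lat)
    best, best_cost = None, float("inf")
    # Each permutation maps service i -> VM perm[i].
    for perm in itertools.permutations(range(m), n):
        cost = sum(flow[i][j] * lat[perm[i]][perm[j]]
                   for i in range(n) for j in range(n))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost

# Two chatty services, three VMs; VM 2 has high latency to the others,
# so the minimizer places both services on VMs 0 and 1.
flow = [[0, 10], [10, 0]]
lat = [[0, 1, 5], [1, 0, 5], [5, 5, 0]]
print(qap_place(flow, lat))  # -> ((0, 1), 20)
```

This toy instance shows why joint selection and scheduling matters: a scheduler that treats all VMs of a specification as interchangeable could place a service on the slow VM, while the QAP objective steers the chatty pair onto the low-latency VMs.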
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | ©2022 |
Publication date | 2022 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Sachidananda, Vighnesh |
---|---|
Degree supervisor | Prabhakar, Balaji, 1967- |
Thesis advisor | Prabhakar, Balaji, 1967- |
Thesis advisor | Rosenblum, Mendel |
Thesis advisor | Sivaraman, Anirudh |
Degree committee member | Rosenblum, Mendel |
Degree committee member | Sivaraman, Anirudh |
Associated with | Stanford University, Department of Electrical Engineering |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Vighnesh Sachidananda. |
---|---|
Note | Submitted to the Department of Electrical Engineering. |
Thesis | Thesis (Ph.D.)--Stanford University, 2022. |
Location | https://purl.stanford.edu/xq718qd4043 |
Access conditions
- Copyright
- © 2022 by Vighnesh Sachidananda
- License
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).