Scheduling and autoscaling methods for low latency applications


Abstract/Contents

Abstract
Modern web applications are commonly architected as collections of services, built to span increasingly large clusters of virtual machines (VMs), and deployed in the multi-tenant setting of the public cloud. As the number of users interacting with an application changes over time, services within an application are commonly scaled to meet the new demands of such changes. Operationally, applications are deployed onto VMs under the assumption that all VMs of a specific configuration are equally capable and are typically scaled to stabilize the utilization of services to a provided threshold. We find that VMs under the same specification exhibit variable performance and that utilization based autoscaling typically overprovisions resources leading to high deployment cost or underprovisions them resulting in high latency for an application. As such, we demonstrate that these assumptions made when deploying and scaling applications can introduce inefficiencies both in terms of application latency and deployment cost. In this dissertation, we present VM scheduling and autoscaling methods that tackle these inefficiencies and can help provide predictably low latency to latency-sensitive applications. We find that while VMs of a specification are considered equally capable, fine grained measurement data can reveal significant discrepancies between them. First, we present a VM selection and scheduling algorithm called LemonDrop. LemonDrop selects a cluster of VMs from an initial pool and aligns the application's communication patterns with the latencies of the resources it has chosen. LemonDrop aims to minimize aggregate cluster latency amongst the selected VMs. It does so by formulating the task of selection and scheduling as a natural Quadratic Assignment Problem which can be approximately solved within a few seconds. 
Across public clouds, we show that LemonDrop reduces the median and tail latencies of a benchmark e-commerce application by 1.1-2.3x on average, depending on the size of the initial pool of VMs. Furthermore, LemonDrop improves order-processing fairness in a benchmark financial exchange by up to 37x at the same or lower order-processing latency. Second, we introduce an autoscaling method called COLA, which efficiently learns autoscaling policies for microservice applications by iteratively identifying and optimizing bottleneck microservices. COLA accomplishes this by exploring the CPU scaling of highly utilized microservices and by only exploiting the scaling of microservices that disproportionately reduce end-to-end latency when scaled up. Once trained, COLA runs as a centralized controller, scaling application resources in response to observed workloads. By explicitly optimizing COLA to meet an end-to-end latency target, we can meet that target with fewer resources, and at lower cost, than policies optimizing other metrics such as utilization. Across several applications and compute settings in Google Cloud, COLA reduces cost by 1.34-52.28% depending on the application. Together, these techniques form new methodologies for deploying and scaling latency-sensitive applications that, compared with existing methods, align more closely with the needs of application developers and end users.
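The Quadratic Assignment Problem mentioned in the abstract can be illustrated with a minimal sketch. All names and matrices below are hypothetical toy data, not the dissertation's actual formulation or measurements: given a service-to-service traffic matrix and a measured VM-pair latency matrix, the goal is to assign services to VMs so that heavily communicating pairs land on low-latency VM pairs.

```python
from itertools import permutations

# Hypothetical toy instance: 3 services, 3 VMs.
# traffic[i][j]: messages per second between services i and j (made-up data).
traffic = [
    [0, 10, 1],
    [10, 0, 2],
    [1, 2, 0],
]

# latency[a][b]: measured round-trip latency in ms between VMs a and b (made-up data).
latency = [
    [0.0, 0.1, 0.5],
    [0.1, 0.0, 0.4],
    [0.5, 0.4, 0.0],
]

def qap_cost(assign):
    """Aggregate communication latency when service i runs on VM assign[i]."""
    n = len(assign)
    return sum(
        traffic[i][j] * latency[assign[i]][assign[j]]
        for i in range(n)
        for j in range(n)
    )

# Exhaustive search works only for tiny instances; the QAP is NP-hard in
# general, which is why a practical system must solve it approximately.
best = min(permutations(range(3)), key=qap_cost)
print(best, qap_cost(best))
```

Here the chatty service pair (0, 1) is placed on the lowest-latency VM pair; an approximate solver replaces the brute-force search at realistic cluster sizes.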

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Sachidananda, Vighnesh
Degree supervisor Prabhakar, Balaji, 1967-
Thesis advisor Prabhakar, Balaji, 1967-
Thesis advisor Rosenblum, Mendel
Thesis advisor Sivaraman, Anirudh
Degree committee member Rosenblum, Mendel
Degree committee member Sivaraman, Anirudh
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Vighnesh Sachidananda.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2022.
Location https://purl.stanford.edu/xq718qd4043

Access conditions

Copyright
© 2022 by Vighnesh Sachidananda
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
