Improving the robustness and accuracy of deep learning deployment on edge devices

Cidon, Eyal

Abstract/Contents

Abstract: Deep learning models are increasingly being deployed on a vast array of edge devices, including a wide variety of phones, indoor and outdoor cameras, wearable devices and drones. These deep learning models are used for a variety of applications, including real-time speech translation, object recognition and object tracking. The ever-increasing diversity of edge devices, and their limited computational and storage capabilities, have led to significant efforts to optimize ML models for real-time inference on the edge. Yet, inference on the edge still faces two major challenges. First, the same ML model running on different edge devices may produce highly divergent outputs on a nearly identical input. Second, using edge-based models comes at the expense of accuracy relative to larger, cloud-based models. However, attempting to offload data to the cloud for processing consumes excessive bandwidth and adds latency due to constrained and unpredictable wireless network links. This dissertation tackles these two challenges by first characterizing their magnitude, and second, by designing systems that help developers deploy ML models on a wide variety of heterogeneous edge devices, while having the capability to offload data to cloud models.

To address the first challenge, we examine the possible root causes for inconsistent efficacy across edge devices. To this end, we measure the variability produced by the device sensors, the device's signal processing hardware and software, and its operating system and processors. We present the first methodical characterization of the variations in model prediction across real-world mobile devices. Counter to prevailing wisdom, we demonstrate that accuracy is not a useful metric to characterize prediction divergence across devices, and introduce a new metric, Instability, which directly captures this variation. We characterize different sources for instability and show that differences in compression formats and image signal processing account for significant instability in object classification models. Notably, in our experiments, 14-17% of images produced divergent classifications across one or more phone models. We then evaluate three different techniques for reducing instability. Building on prior work on making models robust to noise, we design a new technique to fine-tune models to be robust to variations across edge devices. We demonstrate that our fine-tuning techniques reduce instability by 75%.

To address the second challenge, of offloading computation to the cloud, we first demonstrate that running deep learning tasks purely on the edge device or purely on the cloud is too restrictive. Instead, we show how we can expand our design space to a modular edge-cloud cooperation scheme. We propose that data collection and distribution mechanisms should be co-designed with the eventual sensing objective. Specifically, we design a modular distributed Deep Neural Network (DNN) architecture that learns end-to-end how to represent the raw sensor data and send it over the network such that it meets the eventual sensing task's needs. Such a design intrinsically adapts to varying network bandwidths between the sensors and the cloud. We design DeepCut, a system that intelligently decides when to offload sensory data to the cloud, combining high accuracy with minimal bandwidth consumption, with no changes to edge and cloud models. DeepCut adapts to the dynamics of both the scene and network and only offloads when necessary and feasible using a lightweight offloading logic. DeepCut can flexibly tune the desired bandwidth utilization, allowing a developer to trade off bandwidth utilization and accuracy. DeepCut achieves results within 10-20% of an offline optimal offloading scheme.
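
Illustrative note (not part of the catalog record or the dissertation): the abstract does not give a formal definition of the Instability metric. As a rough sketch only, one way to quantify the kind of cross-device divergence it describes is the fraction of inputs for which devices do not all predict the same label. The function name `disagreement_rate`, the device names, and the labels below are hypothetical and chosen purely for illustration; the dissertation's actual metric may be defined differently.

```python
from typing import Dict, List, Sequence


def disagreement_rate(preds_by_device: Dict[str, Sequence[int]]) -> float:
    """Fraction of inputs whose predicted label is not identical on all devices.

    Assumption: this is a stand-in for cross-device divergence, not the
    dissertation's own definition of Instability.
    """
    runs: List[Sequence[int]] = list(preds_by_device.values())
    if not runs:
        raise ValueError("need predictions from at least one device")
    n = len(runs[0])
    if n == 0 or any(len(r) != n for r in runs):
        raise ValueError("every device needs the same non-zero number of predictions")
    # zip(*runs) yields, for each input, the tuple of labels across devices.
    divergent = sum(1 for labels in zip(*runs) if len(set(labels)) > 1)
    return divergent / n


# Hypothetical labels for four images captured on three phones.
example = {
    "phone_a": [3, 7, 1, 4],
    "phone_b": [3, 7, 2, 4],
    "phone_c": [3, 7, 1, 4],
}
print(disagreement_rate(example))  # 0.25: only the third image diverges
```

Unlike accuracy, a measure of this shape needs no ground-truth labels: two devices can reach the same accuracy while disagreeing on which individual images they classify correctly.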

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	©2021
Publication date	2021
Issuance	monographic
Language	English

Creators/Contributors

Author	Cidon, Eyal
Degree supervisor	Katti, Sachin
Thesis advisor	Katti, Sachin
Thesis advisor	McKeown, Nick
Thesis advisor	Rosenblum, Mendel
Degree committee member	McKeown, Nick
Degree committee member	Rosenblum, Mendel
Associated with	Stanford University, Department of Electrical Engineering

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Eyal Cidon.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis Ph.D. Stanford University 2021.
Location	https://purl.stanford.edu/xr588df1785
