Angela Jiang

Improving Deep Learning Training and Inference with Dynamic Hyperparameter Optimization
Degree Type: Ph.D. in Computer Science
Advisor(s): Greg Ganger
Graduated: May 2020

Abstract:

Over the past decade, deep learning has demonstrated state-of-the-art accuracy on challenges posed by computer vision and natural language processing, revolutionizing these fields in the process. Deep learning models are now a fundamental building block for applications such as autonomous driving, medical imaging, and neural machine translation. However, many challenges remain when deploying these models in production. Researchers and practitioners must address a diversity of questions, including how to efficiently design, train, and deploy resource-intensive deep learning models and how to automate these approaches while ensuring robustness to changing conditions.

This dissertation provides and evaluates new ways to improve the efficiency of deep learning training and inference, as well as the underlying systems' robustness to changes in the environment. We address these issues by focusing on the many hyperparameters that are tuned to optimize a model's accuracy and resource usage. These hyperparameters include the choice of model architecture, the training dataset, the optimization algorithm, the hyperparameters of the optimization algorithm (e.g., the learning rate and momentum), and the training time budget. In practice, almost all hyperparameters are tuned once before training and held static thereafter. This is suboptimal, because the conditions that dictate the best hyperparameter value change over time (e.g., as training progresses or when hardware used for inference is replaced). Through three case studies, we show that using runtime information to dynamically adapt hyperparameters that are traditionally held static can increase the efficiency of machine learning training and inference.

First, we propose and analyze Selective-Backprop, a new importance sampling approach that prioritizes examples with high loss in an online fashion. In Selective-Backprop, which examples are considered challenging is a tunable hyperparameter. By prioritizing these challenging examples, Selective-Backprop trains to a given target error rate up to 3.5x faster than static approaches.
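For concreteness, below is a minimal sketch of this style of online, loss-based selection in PyTorch. It is an illustration rather than the dissertation's implementation: the selection rule shown (keep an example with probability given by the percentile of its loss among recently seen losses, raised to a power beta) follows the high-level description above, and the names (selective_backprop_step, beta, loss_history) are ours.

    import collections
    import random

    import torch
    import torch.nn.functional as F

    def selective_backprop_step(model, optimizer, inputs, targets,
                                loss_history, beta=1.0):
        """One training step that backpropagates only on challenging examples.

        An example's selection probability is the percentile of its loss
        among recently seen losses, raised to the power beta; beta tunes
        how aggressively low-loss examples are filtered out.
        """
        # Cheap forward pass (no gradients) to score every example.
        with torch.no_grad():
            logits = model(inputs)
            losses = F.cross_entropy(logits, targets, reduction="none")

        selected = []
        for i, loss in enumerate(losses.tolist()):
            rank = (sum(past <= loss for past in loss_history)
                    / max(len(loss_history), 1))
            loss_history.append(loss)
            if random.random() < rank ** beta:
                selected.append(i)

        if not selected:
            return 0.0

        # Full forward/backward pass on the selected subset only; the
        # savings come from skipping backprop on easy examples.
        idx = torch.tensor(selected)
        optimizer.zero_grad()
        sub_loss = F.cross_entropy(model(inputs[idx]), targets[idx])
        sub_loss.backward()
        optimizer.step()
        return sub_loss.item()

    # A bounded window of recent losses, shared across steps:
    loss_history = collections.deque(maxlen=4096)

The extra scoring forward pass is cheap relative to the backward pass it avoids, which is where the speedup comes from.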

Next, we explore AdaptSB, a variant of Selective-Backprop that dynamically adapts how we prioritize challenging examples. In Selective-Backprop, the priority assigned to examples of differing degrees of difficulty is held static. In AdaptSB, we treat the priority assigned to different classes of examples as a tunable hyperparameter. By dynamically tailoring example prioritization to the dataset and stage in training, AdaptSB outperforms Selective-Backprop on datasets with label error.
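The dissertation details AdaptSB's actual policy. Purely as an illustration of treating per-class priority as a runtime-tunable knob, one might scale each example's selection probability by a class weight that is periodically re-tuned; every name and the retune rule in this hypothetical sketch are ours, not AdaptSB's.

    import random

    class ClassPrioritizedSelector:
        """Illustrative per-class example selector (not AdaptSB's policy).

        Each class gets a priority weight that scales its examples'
        selection probability; the weights can be re-tuned as training
        progresses.
        """

        def __init__(self, num_classes):
            self.weights = [1.0] * num_classes

        def select(self, example_indices, labels, base_probs):
            """Keep example i with probability base_probs[i] * weights[label]."""
            kept = []
            for i, label, p in zip(example_indices, labels, base_probs):
                if random.random() < min(1.0, p * self.weights[label]):
                    kept.append(i)
            return kept

        def retune(self, per_class_val_error, suspected_noise_rate):
            # Hypothetical update: upweight classes the model still gets
            # wrong, but back off on classes where high loss likely
            # reflects label error rather than genuinely hard examples.
            for c, (err, noise) in enumerate(
                    zip(per_class_val_error, suspected_noise_rate)):
                self.weights[c] = (1.0 + err) * (1.0 - noise)

Downweighting classes suspected of label error is what lets such a scheme avoid Selective-Backprop's failure mode of repeatedly prioritizing mislabeled, permanently high-loss examples.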

Finally, we propose and analyze Mainstream, a video analysis system that adapts concurrent applications sharing fixed edge resources to maximize aggregate result quality. In Mainstream, we consider the degree of application sharing to be a tunable hyperparameter. Mainstream automatically determines, at deployment time, the right trade-off between using more specialized DNNs to improve per-frame accuracy and keeping more of an unspecialized base model whose computation can be shared across applications. We show that Mainstream improves mean event detection F1-scores by up to 87x, compared to static approaches.
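As a rough illustration of this deployment-time search (simplified from Mainstream, which among other things can choose per-application split points; every name and profile format here is assumed), one can enumerate candidate sharing degrees and keep the one that maximizes aggregate quality under the compute budget:

    def choose_sharing_degree(profiles, base_cost, budget, quality_fn):
        """Pick how many layers to share across co-located apps.

        profiles: {split: [(per_frame_acc, suffix_cost), ...]} measured
            offline: each app's accuracy and per-frame cost of its
            specialized layers when the first `split` layers stay shared.
        base_cost: {split: cost} of running the shared prefix once per frame.
        quality_fn(acc, fps): maps per-frame accuracy and achievable frame
            rate to an app's expected event-detection quality (e.g., F1).
        """
        best_split, best_quality = None, float("-inf")
        for split, apps in profiles.items():
            # The shared prefix runs once per frame; each app adds the
            # cost of its own specialized suffix.
            per_frame_cost = base_cost[split] + sum(c for _, c in apps)
            fps = budget / per_frame_cost
            total = sum(quality_fn(acc, fps) for acc, _ in apps)
            if total > best_quality:
                best_split, best_quality = split, total
        return best_split, best_quality

    # Example quality model: quality rises with per-frame accuracy and
    # with how densely frames can be sampled.
    # choose_sharing_degree(profiles, base_cost, budget=100.0,
    #                       quality_fn=lambda acc, fps: acc * min(1.0, fps / 30))

The sketch captures the core tension: more sharing lowers per-frame cost and raises the achievable frame rate, at the price of lower per-frame accuracy from the less specialized suffix.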

Thesis Committee:
Gregory R. Ganger (Chair)
David G. Andersen
Michael Kaminsky (BrdgAI)
Michael Kozuch (Intel Labs)
Padmanabhan S. Pillai (Intel Labs)
Rahul Sukthankar (Google Research)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Keywords:
Machine learning, deep learning, computer vision, edge computing, hyperparameters

CMU-CS-20-112.pdf (7.54 MB, 94 pages)