Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets

Abstract

Deep learning (DL) has revolutionized unstructured data analytics. But in most cases, DL needs massive labeled datasets and large compute clusters, which hinders its adoption. These limitations can be overcome with a popular paradigm called deep transfer learning (DTL): instead of training a model from scratch, one adapts a pre-trained DL model, which reduces the training data and compute requirements. During adaptation, a common practice is to freeze most parts of the pre-trained model and adapt only the remaining ones. Since no single adaptation scheme is universally the best, one often evaluates several schemes, a process also known as model selection. We also observe that data labeling for DTL is seldom a one-off process: one intermittently adds new labeled data and re-runs model selection to evaluate the accuracy of the trained models. Today, this workload is executed by performing computations over the entire pre-trained model and repeating them for every model selection cycle, which results in redundant computations in the frozen model parts and causes usability and system inefficiency issues. In this work, we reimagine DTL model selection in the presence of frozen layers as an instance of multi-query optimization and propose two optimizations that reduce redundant computations and training overheads. We implement our optimizations in a data system called Nautilus. Experiments with end-to-end workloads on benchmark datasets show that Nautilus reduces DTL model selection runtimes by up to 5X compared to the current practice.
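To make the frozen-layer idea concrete, below is a minimal PyTorch sketch, not Nautilus's API: the backbone, the candidate heads, and the data are toy stand-ins. It illustrates why the current practice repeats work in the frozen parts and how materializing the frozen outputs once removes that redundancy across a model selection cycle.

# Minimal sketch (assumed PyTorch setup, not the Nautilus system) of frozen-layer
# transfer learning: the frozen backbone's forward pass is identical for every
# candidate adaptation scheme, so its outputs can be computed once and reused.
import torch
import torch.nn as nn

# Toy stand-in for a pre-trained backbone whose layers stay frozen.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
for p in backbone.parameters():
    p.requires_grad = False          # freeze: no gradients, no parameter updates
backbone.eval()

# Two candidate adaptation "heads" to be compared during model selection.
heads = {
    "linear": nn.Linear(256, 10),
    "mlp": nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 10)),
}

# Toy labeled batch standing in for the (evolving) training dataset.
x = torch.randn(128, 3, 32, 32)
y = torch.randint(0, 10, (128,))

# Current practice would re-run the frozen backbone for every head and epoch;
# materializing its outputs a single time removes that redundant computation.
with torch.no_grad():
    feats = backbone(x)              # computed once, reused by all heads below

loss_fn = nn.CrossEntropyLoss()
for name, head in heads.items():
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    for epoch in range(5):           # only the small adapted head is trained
        opt.zero_grad()
        loss = loss_fn(head(feats), y)
        loss.backward()
        opt.step()
    print(f"{name}: final training loss {loss.item():.3f}")

In this view, the candidate adaptation schemes behave like multiple queries that share a common frozen sub-plan, which is why the paper frames DTL model selection as an instance of multi-query optimization.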

Cite

Nakandala, S., & Kumar, A. (2022). Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 506–520). Association for Computing Machinery. https://doi.org/10.1145/3514221.3517846
