Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets

Abstract

Deep learning (DL) has revolutionized unstructured data analytics. But in most cases, DL needs massive labeled datasets and large compute clusters, which hinders its adoption. These limitations can be overcome with a popular paradigm called deep transfer learning (DTL): instead of training a model from scratch, one adapts a pre-trained DL model, which reduces the training data and compute requirements. During adaptation, a common practice is to freeze most parts of the pre-trained model and adapt only the remaining ones. Since no single adaptation scheme is universally the best, one often evaluates several schemes, a process also known as model selection. We also observe that data labeling for DTL is seldom a one-off process: one intermittently adds new labeled data and re-runs model selection to evaluate the accuracy of the trained models. Today, this workload is executed by performing computations over the entire pre-trained model and repeating them for every model selection cycle, which results in redundant computations in the frozen model parts and causes usability and system inefficiency issues. In this work, we reimagine DTL model selection in the presence of frozen layers as an instance of multi-query optimization and propose two optimizations that reduce redundant computations and training overheads. We implement our optimizations in a data system called Nautilus. Experiments with end-to-end workloads on benchmark datasets show that Nautilus reduces DTL model selection runtimes by up to 5X compared to the current practice.
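To make the frozen-layer idea concrete, below is a minimal PyTorch sketch, not Nautilus's API: the backbone, the candidate heads, and the data are toy stand-ins. It illustrates why the current practice repeats work in the frozen parts and how materializing the frozen outputs once removes that redundancy across a model selection cycle.

# Minimal sketch (assumed PyTorch setup, not the Nautilus system) of frozen-layer
# transfer learning: the frozen backbone's forward pass is identical for every
# candidate adaptation scheme, so its outputs can be computed once and reused.
import torch
import torch.nn as nn

# Toy stand-in for a pre-trained backbone whose layers stay frozen.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
for p in backbone.parameters():
    p.requires_grad = False          # freeze: no gradients, no parameter updates
backbone.eval()

# Two candidate adaptation "heads" to be compared during model selection.
heads = {
    "linear": nn.Linear(256, 10),
    "mlp": nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 10)),
}

# Toy labeled batch standing in for the (evolving) training dataset.
x = torch.randn(128, 3, 32, 32)
y = torch.randint(0, 10, (128,))

# Current practice would re-run the frozen backbone for every head and epoch;
# materializing its outputs a single time removes that redundant computation.
with torch.no_grad():
    feats = backbone(x)              # computed once, reused by all heads below

loss_fn = nn.CrossEntropyLoss()
for name, head in heads.items():
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    for epoch in range(5):           # only the small adapted head is trained
        opt.zero_grad()
        loss = loss_fn(head(feats), y)
        loss.backward()
        opt.step()
    print(f"{name}: final training loss {loss.item():.3f}")

In this view, the candidate adaptation schemes behave like multiple queries that share a common frozen sub-plan, which is why the paper frames DTL model selection as an instance of multi-query optimization.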

Cite

Nakandala, S., & Kumar, A. (2022). Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 506–520). Association for Computing Machinery. https://doi.org/10.1145/3514221.3517846
