Diagnosing machine learning pipelines with fine-grained lineage

Abstract

We present the Hippo system to enable the diagnosis of distributed machine learning (ML) pipelines by leveraging fine-grained data lineage. Hippo exposes a concise yet powerful API, derived from primitive lineage types, to capture fine-grained data lineage for each data transformation. It records the input datasets, the output datasets and the cell-level mapping between them. It also collects sufficient information that is needed to reproduce the computation. Hippo efficiently enables common ML diagnosis operations such as code debugging, result analysis, data anomaly removal, and computation replay. By exploiting the metadata separation and high-order function encoding strategies, we observe an O(10^3)x total improvement in lineage storage efficiency vs. the baseline of cell-wise mapping recording while maintaining the lineage integrity. Hippo can answer the real use case lineage queries within a few seconds, which is low enough to enable interactive diagnosis of ML pipelines.
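
To illustrate the difference between cell-wise mapping recording and encoding a mapping as a function, here is a minimal Python sketch. It is not Hippo's API: the class names `CellwiseLineage` and `FunctionLineage`, the helper `row_sum_lineage`, and the row-sum transformation are all hypothetical, chosen only to show why storing a rule can be far smaller than storing every output-to-input pair while answering the same backward query.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int]  # (row, column) position in a 2-D dataset


@dataclass
class CellwiseLineage:
    """Baseline (hypothetical): store every output-cell -> input-cells pair."""
    mapping: Dict[Cell, List[Cell]]

    def backward(self, out_cell: Cell) -> List[Cell]:
        # Look up the explicitly stored input cells for this output cell.
        return self.mapping.get(out_cell, [])

    def stored_entries(self) -> int:
        # One stored entry per (output cell, input cell) pair.
        return sum(len(inputs) for inputs in self.mapping.values())


@dataclass
class FunctionLineage:
    """Compact form (hypothetical): store the output shape and a mapping rule."""
    out_shape: Tuple[int, int]
    fn: Callable[[Cell], List[Cell]]

    def backward(self, out_cell: Cell) -> List[Cell]:
        row, col = out_cell
        rows, cols = self.out_shape
        # Cells outside the output dataset have no lineage.
        if not (0 <= row < rows and 0 <= col < cols):
            return []
        # Compute the input cells on demand instead of reading them from storage.
        return self.fn(out_cell)

    def stored_entries(self) -> int:
        # Only the rule itself is stored, regardless of dataset size.
        return 1


def row_sum_lineage(n_rows: int, n_cols: int):
    """Lineage of a row-wise sum: output (r, 0) depends on inputs (r, 0..n_cols-1)."""
    explicit = CellwiseLineage(
        {(r, 0): [(r, c) for c in range(n_cols)] for r in range(n_rows)}
    )
    compact = FunctionLineage(
        out_shape=(n_rows, 1),
        fn=lambda cell: [(cell[0], c) for c in range(n_cols)],
    )
    return explicit, compact


explicit, compact = row_sum_lineage(n_rows=100, n_cols=10)
# Both representations answer the same backward query identically.
same_answer = explicit.backward((3, 0)) == compact.backward((3, 0))
```

In this toy case the explicit form stores one entry per output-input pair (1000 for a 100 x 10 input), while the function form stores a single rule. The actual savings reported in the abstract depend on Hippo's own encoding and metadata separation, which this sketch does not reproduce.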

Cite

Zhang, Z., Sparks, E. R., & Franklin, M. J. (2017). Diagnosing machine learning pipelines with fine-grained lineage. In HPDC 2017 - Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing (pp. 143–153). Association for Computing Machinery, Inc. https://doi.org/10.1145/3078597.3078603
