Reptile: Aggregation-level Explanations for Hierarchical Data

2Citations
Citations of this article
13Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Users often can see from overview-level statistics that some results look "off", but are rarely able to characterize even the type of error. Reptile is an iterative human-in-the-loop explanation and cleaning system for errors in hierarchical data. Users specify an anomalous distributive aggregation result (a complaint), and Reptile recommends drill-down operations to help the user "zoom-in"on the underlying errors. Unlike prior explanation systems that intervene on raw records, Reptile intervenes by learning a group's expected statistics, and ranks drill-down sub-groups by how much the intervention fixes the complaint. This group-level formulation supports a wide range of error types (missing, duplicates, value errors) and uniquely leverages the distributive properties of the user complaint. Further, the learning-based intervention lets users provide domain expertise that Reptile learns from. In each drill-down iteration, Reptile must train a large number of predictive models. We thus extend factorized learning from count-join queries to aggregation-join queries, and develop a suite of optimizations that leverage the data's hierarchical structure. These optimizations reduce runtimes by >6× compared to a Lapack-based implementation. When applied to real-world Covid-19 and African farmer survey data, Reptile correctly identifies 21/30 (vs 2 using existing explanation approaches) and 20/22 errors. Reptile has been deployed in Ethiopia and Zambia, and used to clean nation-wide farmer survey data; the clean data has been used to design national drought insurance policies.

Cite

CITATION STYLE

APA

Huang, Z., & Wu, E. (2022). Reptile: Aggregation-level Explanations for Hierarchical Data. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 399–413). Association for Computing Machinery. https://doi.org/10.1145/3514221.3517854

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free