Effective Discovery of Meaningful Outlier Relationships

Aline Bessa; Juliana Freire; Tamraparni Dasu; Divesh Srivastava

Journal Article, Open Access

ACM/IMS Transactions on Data Science (2020) 1(2) 1-33

DOI: 10.1145/3385192

Abstract

We propose Predictable Outliers in Data-trendS (PODS), a method that, given a collection of temporal datasets, derives data-driven explanations for outliers by identifying meaningful relationships between them. First, we formalize the notion of meaningfulness, which so far has been informally framed in terms of explainability. Next, since outliers are rare and it is difficult to determine whether their relationships are meaningful, we develop a new criterion that does so by checking if these relationships could have been predicted from non-outliers, i.e., whether we could see the outlier relationships coming. Finally, searching for meaningful outlier relationships between every pair of datasets in a large data collection is computationally infeasible. To address that, we propose an indexing strategy that prunes irrelevant comparisons across datasets, making the approach scalable. We present the results of an experimental evaluation using real datasets and different baselines, which demonstrates the effectiveness, robustness, and scalability of our approach.
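The idea of checking whether an outlier relationship "could have been predicted from non-outliers" can be illustrated with a small sketch. This is not the PODS algorithm, whose details the abstract does not give. It assumes two aligned numeric series, flags outliers with a median/MAD modified z-score, fits an ordinary least-squares line on the non-outlier points only, and then asks how many co-occurring outliers fall near that line. The function names, the 3.5 threshold, and the 25% relative tolerance are all hypothetical choices made for illustration.

```python
from statistics import median


def robust_outliers(values, threshold=3.5):
    """Flag points whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median: nothing can be flagged this way.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values are constant; slope is undefined")
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predictable_outlier_fraction(xs, ys, rel_tol=0.25):
    """Fraction of co-occurring outliers that the non-outlier trend predicts.

    Returns None when there are too few non-outliers to fit a line or no
    timestamps at which both series are outliers.
    """
    out_x = robust_outliers(xs)
    out_y = robust_outliers(ys)
    rows = list(zip(xs, ys, out_x, out_y))
    normal = [(x, y) for x, y, ox, oy in rows if not ox and not oy]
    joint = [(x, y) for x, y, ox, oy in rows if ox and oy]
    if len(normal) < 2 or not joint:
        return None
    intercept, slope = fit_line([x for x, _ in normal], [y for _, y in normal])
    hits = sum(
        abs(intercept + slope * x - y) <= rel_tol * abs(y) for x, y in joint
    )
    return hits / len(joint)
```

A fraction near 1 suggests the joint outliers follow the trend learned from ordinary points, while a fraction near 0 suggests they do not. A real system would need a more careful model and tolerance than this toy criterion.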

Citation (APA)

Bessa, A., Freire, J., Dasu, T., & Srivastava, D. (2020). Effective Discovery of Meaningful Outlier Relationships. ACM/IMS Transactions on Data Science, 1(2), 1–33. https://doi.org/10.1145/3385192
