Abstract
We propose Predictable Outliers in Data-trendS (PODS), a method that, given a collection of temporal datasets, derives data-driven explanations for outliers by identifying meaningful relationships between them. First, we formalize the notion of meaningfulness, which so far has been framed only informally in terms of explainability. Next, because outliers are rare and it is therefore difficult to determine whether their relationships are meaningful, we develop a new criterion that checks whether these relationships could have been predicted from non-outliers, i.e., whether we could see the outlier relationships coming. Finally, searching for meaningful outlier relationships between every pair of datasets in a large data collection is computationally infeasible. To address this, we propose an indexing strategy that prunes irrelevant comparisons across datasets, making the approach scalable. An experimental evaluation on real datasets against several baselines demonstrates the effectiveness, robustness, and scalability of our approach.
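The abstract does not specify how the "could we see it coming" criterion is computed. The following is a minimal Python sketch of the general idea under our own assumptions, not the authors' PODS algorithm: two time series aligned by timestamp, a hypothetical z-score outlier definition (`z`), an ordinary least-squares line fitted only on timestamps where neither series is an outlier, and a hypothetical relative tolerance (`tol`). A relationship counts as "seen coming" only if the line learned from non-outliers also predicts the co-occurring outliers. The names `outlier_mask`, `fit_line`, and `seen_coming` are illustrative.

```python
from statistics import mean, pstdev


def outlier_mask(series, z=2.0):
    # Hypothetical outlier definition: more than z population std devs from the mean.
    mu, sd = mean(series), pstdev(series)
    if sd == 0:
        return [False] * len(series)
    return [abs(x - mu) > z * sd for x in series]


def fit_line(xs, ys):
    # Ordinary least-squares fit of y = intercept + slope * x.
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def seen_coming(a, b, z=2.0, tol=0.2):
    """Illustrative check (not PODS itself): do the joint outliers of a and b
    follow the trend learned from non-outlier timestamps, within relative tol?"""
    out_a, out_b = outlier_mask(a, z), outlier_mask(b, z)
    normal = [i for i in range(len(a)) if not out_a[i] and not out_b[i]]
    joint = [i for i in range(len(a)) if out_a[i] and out_b[i]]
    xs = [a[i] for i in normal]
    # Need co-occurring outliers and at least two distinct x values to fit a line.
    if not joint or len(set(xs)) < 2:
        return False
    intercept, slope = fit_line(xs, [b[i] for i in normal])
    for i in joint:
        predicted = intercept + slope * a[i]
        if abs(predicted - b[i]) > tol * abs(b[i]):
            return False
    return True


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
    print(seen_coming(a, [2 * x + 1 for x in a]))               # spike follows the trend
    print(seen_coming(a, [3, 5, 7, 9, 11, 13, 15, 17, 19, -80]))  # spike contradicts it
```

Under these assumptions, a spike in `b` that continues the linear trend of `a` passes the check, while a spike of opposite sign does not. The indexing strategy for pruning dataset pairs is not modeled in this sketch.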
Citation
Bessa, A., Freire, J., Dasu, T., & Srivastava, D. (2020). Effective Discovery of Meaningful Outlier Relationships. ACM/IMS Transactions on Data Science, 1(2), 1–33. https://doi.org/10.1145/3385192