The difference and the norm — Characterising similarities and differences between databases

Kailash Budhathoki; Jilles Vreeken

Conference Proceedings · Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015), Vol. 9285, pp. 206–223

DOI: 10.1007/978-3-319-23525-7_13

Abstract

Suppose we are given a set of databases, such as sales records over different branches. How can we characterise the differences and the norm between these datasets? That is, which patterns characterise the general distribution, and which are important for describing the individual datasets? We study how to discover these pattern sets simultaneously and without redundancy, automatically identifying both the patterns that help describe the overall distribution and those that are characteristic of specific databases. We define the problem in terms of the Minimum Description Length (MDL) principle, and propose the DiffNorm algorithm to approximate the MDL-optimal summary directly from data. Empirical evaluation on synthetic and real-world data shows that DiffNorm efficiently discovers descriptions that accurately characterise the difference and the norm in easily understandable terms.
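
The abstract casts the problem in MDL terms: the preferred summary is the pattern set that minimises the total description length L(M) + L(D | M). As a rough illustration only, the toy Python sketch below scores a candidate split into shared ("norm") patterns and database-specific ("difference") patterns. It is not DiffNorm and not the paper's encoding: the greedy longest-first cover, the Shannon-style code lengths derived from pattern usage counts, and the flat log2(n_items) bits per pattern item are simplifying assumptions, and all function names are hypothetical.

```python
from collections import Counter
from math import log2


def cover(transaction, patterns):
    """Greedily cover a transaction with non-overlapping patterns, longest first;
    any leftover items are covered by singleton patterns (toy assumption)."""
    remaining = set(transaction)
    used = []
    for p in sorted(patterns, key=lambda p: (-len(p), sorted(p))):
        if p <= remaining:
            used.append(p)
            remaining -= p
    used.extend(frozenset([i]) for i in sorted(remaining))
    return used


def data_cost(db, patterns):
    """L(D | M): bits to encode db using optimal code lengths -log2(usage / total)."""
    usage = Counter()
    for t in db:
        usage.update(cover(t, patterns))
    total = sum(usage.values())
    return -sum(u * log2(u / total) for u in usage.values())


def model_cost(patterns, n_items):
    """L(M), simplified: each item of each pattern costs log2(n_items) bits."""
    return sum(len(p) * log2(n_items) for p in patterns)


def total_cost(dbs, shared, specific, n_items):
    """Two-part cost over all databases.
    shared:   patterns usable in every database (the 'norm').
    specific: one pattern list per database (the 'differences')."""
    cost = model_cost(shared, n_items)
    for db, own in zip(dbs, specific):
        cost += model_cost(own, n_items)
        cost += data_cost(db, set(shared) | set(own))
    return cost


if __name__ == "__main__":
    dbs = [
        [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}],
        [{"a", "b", "e"}, {"a", "b"}, {"a", "b", "e"}],
    ]
    baseline = total_cost(dbs, [], [[], []], n_items=5)
    with_norm = total_cost(dbs, [frozenset("ab")], [[], []], n_items=5)
    print(f"singletons only: {baseline:.2f} bits, with shared {{a,b}}: {with_norm:.2f} bits")
```

In this toy example {a, b} occurs in every transaction of both databases, so adding it as a shared pattern lowers the total cost (about 26.98 bits down to about 16.35 bits); a search procedure would accept such a candidate. Whether a pattern belongs in the shared set or in one database's own set is decided by the same comparison of total encoded lengths.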

Cite

Budhathoki, K., & Vreeken, J. (2015). The difference and the norm — Characterising similarities and differences between databases. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9285, pp. 206–223). Springer Verlag. https://doi.org/10.1007/978-3-319-23525-7_13