Exploiting dataset similarity for distributed mining


Abstract

The notion of similarity is an important one in data mining. It can be used to provide useful structural information on data as well as enable clustering. In this paper we present an elegant method for measuring the similarity between homogeneous datasets. The algorithm presented is efficient in storage and scale, has the ability to adjust to time constraints, and can provide the user with likely causes of similarity or dissimilarity. One potential application of our similarity measure is in the distributed data mining domain. Using the notion of similarity across databases as a distance metric, one can generate clusters of similar datasets. Once similar datasets are clustered, each cluster can be independently mined to generate the appropriate rules for that cluster. The similarity measure is evaluated on a dataset from the Census Bureau, and synthetic datasets from IBM. © 2000 Springer-Verlag Berlin Heidelberg.
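
The clustering step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the similarity measure, so the code below substitutes a stand-in: the Jaccard overlap of the frequent itemsets found in each dataset. That choice, the function names (frequent_itemsets, similarity, cluster_datasets), the parameters min_support and threshold, and the grouping by connected components are all assumptions made for illustration, not the authors' algorithm.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support=0.5, max_size=2):
    """Return itemsets (up to max_size items) whose support is >= min_support."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {s for s, c in counts.items() if c / n >= min_support}


def similarity(a, b, min_support=0.5):
    """Stand-in measure (an assumption): Jaccard overlap of frequent itemsets."""
    fa = frequent_itemsets(a, min_support)
    fb = frequent_itemsets(b, min_support)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 1.0


def cluster_datasets(datasets, threshold=0.5, min_support=0.5):
    """Group datasets whose pairwise similarity reaches the threshold.

    Clusters are the connected components of the "similar enough" graph,
    tracked with a small union-find structure.
    """
    names = list(datasets)
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x, y in combinations(names, 2):
        if similarity(datasets[x], datasets[y], min_support) >= threshold:
            parent[find(x)] = find(y)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(c) for c in clusters.values())
```

Each resulting cluster could then be passed to a rule miner on its own, which is the distributed mining use the abstract describes. The threshold trades cluster purity against cluster count and would need tuning on real data.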

Parthasarathy, S., & Ogihara, M. (2000). Exploiting dataset similarity for distributed mining. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1800 LNCS, pp. 399–406). Springer Verlag. https://doi.org/10.1007/3-540-45591-4_52
