Exploiting dataset similarity for distributed mining

Srinivasan Parthasarathy; Mitsunori Ogihara

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2000), Vol. 1800 LNCS, pp. 399–406

DOI: 10.1007/3-540-45591-4_52

8 Citations

9 Readers

Abstract

The notion of similarity is an important one in data mining. It can be used to provide useful structural information on data as well as enable clustering. In this paper we present an elegant method for measuring the similarity between homogeneous datasets. The algorithm presented is efficient in storage and scale, has the ability to adjust to time constraints, and can provide the user with likely causes of similarity or dissimilarity. One potential application of our similarity measure is in the distributed data mining domain. Using the notion of similarity across databases as a distance metric, one can generate clusters of similar datasets. Once similar datasets are clustered, each cluster can be independently mined to generate the appropriate rules for a given cluster. The similarity measure is evaluated on a dataset from the Census Bureau, and synthetic datasets from IBM. © 2000 Springer-Verlag Berlin Heidelberg.
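The workflow the abstract outlines (score pairs of datasets for similarity, then group similar datasets so each group can be mined on its own) might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not define the paper's similarity measure, so the support-agreement score over frequent itemsets used here is a stand-in, and the function names, thresholds, and sample sites are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force {itemset: support} for itemsets of up to max_size items."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for k in range(1, max_size + 1):
        for candidate in combinations(items, k):
            wanted = set(candidate)
            support = sum(1 for t in transactions if wanted <= set(t)) / n
            if support >= min_support:
                result[frozenset(candidate)] = support
    return result


def similarity(freq_a, freq_b):
    """Stand-in measure (an assumption, not the paper's definition):
    each shared frequent itemset contributes 1 - |support difference|,
    normalised by the number of itemsets frequent in either dataset."""
    union = set(freq_a) | set(freq_b)
    if not union:
        return 1.0
    shared = set(freq_a) & set(freq_b)
    return sum(1.0 - abs(freq_a[x] - freq_b[x]) for x in shared) / len(union)


def cluster(datasets, min_support=0.5, threshold=0.5):
    """Group datasets whose pairwise similarity reaches the threshold
    (connected components, i.e. single-link grouping via union-find)."""
    names = list(datasets)
    freq = {n: frequent_itemsets(datasets[n], min_support) for n in names}
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if similarity(freq[a], freq[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical transaction data held at three sites.
datasets = {
    "site_a": [{"bread", "milk"}, {"bread", "milk"}, {"bread"}, {"milk", "eggs"}],
    "site_b": [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"bread"}, {"milk"}],
    "site_c": [{"tea", "sugar"}, {"tea"}, {"tea", "sugar"}, {"sugar"}],
}
clusters = cluster(datasets)
```

With this sample data, site_a and site_b share the same frequent itemsets at the same supports and fall into one group, while site_c forms its own group. Each resulting group could then be handed to a separate rule-mining run.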

Cite

Parthasarathy, S., & Ogihara, M. (2000). Exploiting dataset similarity for distributed mining. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1800 LNCS, pp. 399–406). Springer Verlag. https://doi.org/10.1007/3-540-45591-4_52
