Performance controlled data reduction for knowledge discovery in distributed databases

Slobodan Vucetic; Zoran Obradovic

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2000) 1805 29-39

DOI: 10.1007/3-540-45571-x_6

11 Citations

4 Readers

Abstract

The objective of data reduction is to obtain a compact representation of a large data set, both to facilitate repeated use of non-redundant information with complex and slow learning algorithms and to allow efficient data transfer and storage. For a user-controllable allowed accuracy loss, we propose an effective data reduction procedure based on guided sampling to identify a minimal-size representative subset, followed by a model-sensitivity analysis to determine an appropriate compression level for each attribute. Experiments were performed on three large data sets. Depending on the allowed accuracy loss margin, which ranged from 1% to 5% of the ideal generalization, the achieved compression rates ranged from 95 to 12,500 times. These results indicate that transferring reduced data sets from multiple locations to a centralized site for efficient and accurate knowledge discovery might often be possible in practice.
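The two-stage idea in the abstract (sample down to a small representative subset, then compress each attribute as far as an accuracy budget allows) can be illustrated with a toy sketch. This is not the authors' procedure: it swaps in plain random progressive sampling for their guided sampling, a brute-force search over bit widths for their model-sensitivity analysis, and a one-attribute least-squares line for the learning algorithm. The names `reduce_data`, `fit_line`, `quantize`, and the `margin` parameter (allowed relative increase in validation mean squared error) are all hypothetical.

```python
import random

def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def mse(model, xs, ys):
    """Mean squared error of the line `model` on (xs, ys)."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def quantize(xs, bits, lo, hi):
    """Uniformly quantize values in [lo, hi] to 2**bits levels."""
    if hi == lo:
        return list(xs)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((x - lo) / step) * step for x in xs]

def reduce_data(xs, ys, val_x, val_y, margin, seed=0):
    """Return (reduced_x, reduced_y, bits) within the error budget.

    Assumption: the budget is the validation MSE of a model trained on
    all data, inflated by `margin` (e.g. 0.05 for 5%).
    """
    rng = random.Random(seed)
    budget = mse(fit_line(xs, ys), val_x, val_y) * (1 + margin)

    # Stage 1: grow a random sample geometrically until the model
    # trained on it stays within the budget (or all data is used).
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n = 8
    while True:
        n = min(n, len(xs))
        sx = [xs[i] for i in idx[:n]]
        sy = [ys[i] for i in idx[:n]]
        if n == len(xs) or mse(fit_line(sx, sy), val_x, val_y) <= budget:
            break
        n *= 2

    # Stage 2: find the fewest bits for the attribute that keeps the
    # model trained on the quantized sample within the budget.
    lo, hi = min(sx), max(sx)
    for bits in range(1, 17):
        qx = quantize(sx, bits, lo, hi)
        if mse(fit_line(qx, sy), val_x, val_y) <= budget:
            return qx, sy, bits
    return sx, sy, None  # no tested bit width fit; keep full precision
```

Calling `reduce_data(xs, ys, val_x, val_y, margin=0.05)` returns a subset whose fitted model stays within the 5% budget on the validation data, together with the bit width chosen for the attribute (or `None` if no tested width sufficed). With several attributes, the same search would be repeated per attribute, which is where a sensitivity analysis like the one the abstract mentions would replace the brute-force loop.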

Cite

Vucetic, S., & Obradovic, Z. (2000). Performance controlled data reduction for knowledge discovery in distributed databases. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1805, pp. 29–39). Springer Verlag. https://doi.org/10.1007/3-540-45571-x_6
