Efficient sampling: Application to image data

Surong Wang; Manoranjan Dash; Liang Tien Chia

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2005) 3518 LNAI 452-463

DOI: 10.1007/11430919_53

4 Citations

6 Readers

Abstract

Sampling is an important preprocessing step for mining large data efficiently. Although a simple random sample often works well at a reasonable sample size, accuracy falls sharply as the sample size is reduced. In KDD'03 we proposed EASE, which outputs a sample based on its 'closeness' to the original sample. Reported results show that EASE outperforms simple random sampling (SRS). In this paper we propose EASIER, which extends EASE in two ways. 1) EASE is a halving algorithm, i.e., to achieve the required sample ratio it starts from a suitably large initial sample and halves it iteratively. EASIER, on the other hand, does away with the repeated halving by directly obtaining the required sample ratio in one iteration. 2) EASE was shown to work on the IBM QUEST dataset, which is a categorical count dataset. EASIER, in addition, is shown to work on continuous data such as the Color Structure Descriptor of images. Two mining tasks, classification and association rule mining, are used to validate the efficacy of EASIER samples vis-à-vis EASE and SRS samples. © Springer-Verlag Berlin Heidelberg 2005.
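
To make the SRS baseline and the idea of a sample's 'closeness' more concrete, here is a minimal Python sketch. It does not reproduce EASE or EASIER, whose details the abstract does not give. The function names and the choice of an L2 distance between item-frequency vectors are assumptions made only for illustration.

```python
import random


def simple_random_sample(transactions, ratio, seed=0):
    """Draw a simple random sample (without replacement) at the given ratio."""
    rng = random.Random(seed)
    k = max(1, round(len(transactions) * ratio))
    return rng.sample(transactions, k)


def item_frequencies(transactions):
    """Fraction of transactions that contain each item."""
    counts = {}
    for t in transactions:
        for item in set(t):
            counts[item] = counts.get(item, 0) + 1
    n = len(transactions)
    return {item: c / n for item, c in counts.items()}


def frequency_distance(sample, data):
    """Hypothetical closeness measure (an assumption, not the paper's):
    L2 distance between the item-frequency vectors of sample and data."""
    fs, fd = item_frequencies(sample), item_frequencies(data)
    items = set(fs) | set(fd)
    return sum((fs.get(i, 0.0) - fd.get(i, 0.0)) ** 2 for i in items) ** 0.5


# Toy categorical data: each transaction is a list of items.
data = [["a", "b"], ["a"], ["b", "c"], ["a", "c"]]
sample = simple_random_sample(data, ratio=0.5)
print(len(sample), frequency_distance(sample, data))
```

A smaller distance means the sample's item frequencies track those of the full data more closely. A sampler driven by such a measure would prefer samples with a lower distance than SRS typically achieves at the same ratio.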

Citation (APA)

Wang, S., Dash, M., & Chia, L. T. (2005). Efficient sampling: Application to image data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3518 LNAI, pp. 452–463). Springer Verlag. https://doi.org/10.1007/11430919_53
