Cluster sampling to improve classifier accuracy for numeric data

Abstract

Clustering is an essential technique for grouping similar data. Improving model accuracy remains a challenge for all kinds of data. Training and testing a classifier on the entire dataset is not feasible for large-scale data, so sampling is necessary for modeling and is an important aspect of data mining. Models are typically trained and tested on different samples drawn by traditional techniques, as in the random forest ensemble method. In this paper, we propose cluster sampling, which outperforms other sampling methods in improving classifier accuracy. Samples drawn by the usual methods cannot cover all the variety present in the original data. Cluster sampling is a two-step approach: first it clusters the entire dataset, then it selects samples from each cluster. These samples contain every variety of data in equal proportion. Cluster sampling leverages a tree-based ensemble to handle categorical, numerical, and mixed-type data. Classifiers modeled on cluster-sampled data showed higher accuracy than classifiers modeled on samples from other sampling techniques.
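The two-step approach described in the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the per-cluster sampling rule, so everything here is an assumption made for illustration: plain k-means on numeric points, and the same sampling fraction drawn from every cluster. The function names `kmeans` and `cluster_sample` and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random


def kmeans(points, k, rng, iterations=20):
    """Plain Lloyd's k-means (an assumed choice); returns one cluster index per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_sample(points, k, fraction, seed=0):
    """Step 1: cluster all points. Step 2: draw the same fraction from each cluster."""
    rng = random.Random(seed)
    labels = kmeans(points, k, rng)
    sample = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        # At least one point per non-empty cluster, so every cluster is represented.
        n = max(1, math.ceil(fraction * len(members)))
        sample.extend(rng.sample(members, n))
    return sorted(sample), labels


# Toy numeric data (made up for this sketch): three groups of 2-D points.
data_rng = random.Random(1)
data = [
    [cx + data_rng.gauss(0, 0.3), cy + data_rng.gauss(0, 0.3)]
    for cx, cy in [(0, 0), (5, 5), (10, 0)]
    for _ in range(30)
]

indices, labels = cluster_sample(data, k=3, fraction=0.2)
print(len(indices), "rows sampled from", len(set(labels)), "clusters")
```

The returned indices could then select the training rows for any classifier. Drawing from every cluster is what guarantees that each group found in step one appears in the sample, which a single simple random draw over the whole dataset does not ensure.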

Lakshmi Sreenivasa Reddy, D., & Rajini, M. (2019). Cluster sampling to improve classifier accuracy for numeric data. International Journal of Recent Technology and Engineering, 8(2), 3685–3692. https://doi.org/10.35940/ijrte.B2848.078219