Classification by Clustering (CbC): An Approach of Classifying Big Data Based on Similarities

  • Khan S
  • Ahamed S
  • Jannat M
  • et al.

Abstract

Data classification in supervised learning is the process of assigning data to classes for a data mining task, which helps in analysing data for decision-making. The objective of a classification model is to correctly predict the categorical class labels of known/unknown instances. In machine learning for data mining applications, classification models are trained on labelled training datasets. In this paper, we investigate whether a classification model can be built from the similarities among instances instead of their class labels. Data labelling is a costly and time-consuming process, and it becomes very difficult when the data is big data. The proposed approach clusters the big data and builds the classifier on the resulting clusters without considering the class labels, which improves the performance of the classifier. The clusters can nevertheless be related to class labels. We collected 10 big datasets from the UC Irvine machine learning repository for experimental analysis and applied three popular decision tree induction algorithms for classifier construction: ID3 (Iterative Dichotomiser 3), C4.5 (an extension of the ID3 algorithm), and CART (Classification and Regression Tree).
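The idea in the abstract can be illustrated with a minimal sketch: cluster unlabelled instances, treat the cluster ids as training targets for a decision tree, and then relate clusters to class labels afterwards. The abstract does not name the clustering algorithm, so plain k-means is an assumption here, and the tree is a small CART-style (Gini) learner written for illustration. The function names (`kmeans`, `build_tree`, `predict`), the toy data, and the `known` labels are all hypothetical and not taken from the paper.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (assumed clustering step); returns one cluster id per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if it has none.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(v) / n) ** 2 for v in set(labels))

def build_tree(X, y, depth=0, max_depth=3):
    """CART-style tree: a leaf is a label, an inner node is (feature, threshold, left, right)."""
    if len(set(y)) == 1 or depth == max_depth:
        return max(set(y), key=y.count)
    best = None
    for f in range(len(X[0])):
        for t in sorted(set(x[f] for x in X))[:-1]:
            left = [i for i, x in enumerate(X) if x[f] <= t]
            right = [i for i, x in enumerate(X) if x[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split (all feature values identical)
        return max(set(y), key=y.count)
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))

def predict(tree, x):
    """Walk the tree until a leaf (a cluster id) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

# Toy, unlabelled data: two well-separated groups (hypothetical values).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]

cluster_ids = kmeans(points, k=2)        # step 1: cluster without class labels
tree = build_tree(points, cluster_ids)   # step 2: train the tree on cluster ids

# Step 3 (optional): relate clusters to class labels using a few known instances.
known = {0: "low", 3: "high"}            # hypothetical labels for instances 0 and 3
cluster_to_label = {cluster_ids[i]: label for i, label in known.items()}
```

In this sketch a new instance is classified with `cluster_to_label[predict(tree, x)]`, so only a handful of labelled instances are needed to name the clusters. A production version would more likely use library implementations of the clustering and tree-induction steps.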

Cite

Khan, S. S., Ahamed, S., Jannat, M., Shatabda, S., & Farid, D. Md. (2020). Classification by Clustering (CbC): An Approach of Classifying Big Data Based on Similarities (pp. 593–605). https://doi.org/10.1007/978-981-13-7564-4_50
