Data Mining: The Textbook

  • Aggarwal C
N/ACitations
Citations of this article
1.5kReaders
Mendeley users who have this article in their library.
Get full text

Abstract

The classification problem is closely related to the clustering problem discussed in Chaps. 6 and 7. While the clustering problem is that of determining similar groups of data points, the classification problem is that of learning the structure of a data set of examples, already partitioned into groups, that are referred to as categories or classes. The learning of these categories is typically achieved with a model. This model is used to estimate the group identifiers (or class labels) of one or more previously unseen data examples with unknown labels. Therefore, one of the inputs to the classification problem is an example data set that has already been partitioned into different classes. This is referred to as the training data, and the group identifiers of these classes are referred to as class labels. In most cases, the class labels have a clear semantic interpretation in the context of a specific application, such as a group of customers interested in a specific product, or a group of data objects with a desired property of interest. The model learned is referred to as the training model. The previously unseen data points that need to be classified are collectively referred to as the test data set. The algorithm that creates the training model for prediction is also sometimes referred to as the learner.

Cite

CITATION STYLE

APA

Aggarwal, C. C. (2015). Data Mining: The Textbook. Springer International Publishing (pp. 285–344). https://doi.org/10.1007/978-3-319-14142-8_10

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free