A grouping method for categorical attributes having very large number of values

Abstract

In supervised machine learning, the partitioning of the values (also called grouping) of a categorical attribute aims at constructing a new synthetic attribute which keeps the information of the initial attribute and reduces the number of its values. In the case of a very large number of values, the risk of overfitting the data increases sharply and building good groupings becomes difficult. In this paper, we propose two new grouping methods founded on a Bayesian approach, leading to Bayes optimal groupings. The first method exploits a standard schema for grouping models and the second one extends this schema by managing a "garbage" group dedicated to the least frequent values. Extensive comparative experiments demonstrate that the new grouping methods build high quality groupings in terms of predictive quality, robustness and small number of groups. © Springer-Verlag Berlin Heidelberg 2005.
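
As a loose illustration of the "garbage" group idea only, the sketch below sends every value seen fewer than a fixed number of times to one shared group. It is not the Bayesian criterion proposed in the paper, which this abstract does not detail. The function name `group_rare_values`, the `min_count` threshold, the `__garbage__` label, and the sample data are all assumptions made for the example.

```python
from collections import Counter


def group_rare_values(values, min_count=2, garbage_label="__garbage__"):
    """Keep frequent values as-is; map infrequent ones to a shared garbage label.

    Assumption: "infrequent" means fewer than `min_count` occurrences.
    This fixed threshold is a stand-in, not the paper's Bayesian criterion.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else garbage_label for v in values]


# Hypothetical categorical attribute with two rare values.
cities = ["paris", "paris", "lyon", "lyon", "lyon", "brest", "nice"]
grouped = group_rare_values(cities, min_count=2)
print(grouped)
```

Here the four distinct values collapse to three groups, since "brest" and "nice" each occur once and share the garbage group. A fixed threshold ignores the class labels entirely, which is precisely what a supervised grouping method would not do.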

Cite

Boullé, M. (2005). A grouping method for categorical attributes having very large number of values. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3587 LNAI, pp. 228–242). Springer Verlag. https://doi.org/10.1007/11510888_23
