A grouping method for categorical attributes having very large number of values

Marc Boullé

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2005) 3587 LNAI 228-242

DOI: 10.1007/11510888_23


Abstract

In supervised machine learning, the partitioning of the values (also called grouping) of a categorical attribute aims at constructing a new synthetic attribute which keeps the information of the initial attribute and reduces the number of its values. In case of very large number of values, the risk of overfitting the data increases sharply and building good groupings becomes difficult. In this paper, we propose two new grouping methods founded on a Bayesian approach, leading to Bayes optimal groupings. The first method exploits a standard schema for grouping models and the second one extends this schema by managing a "garbage" group dedicated to the least frequent values. Extensive comparative experiments demonstrate that the new grouping methods build high quality groupings in terms of predictive quality, robustness and small number of groups. © Springer-Verlag Berlin Heidelberg 2005.
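To illustrate the idea of a "garbage" group for the least frequent values, here is a minimal Python sketch. It is not the Bayesian method of the paper: it uses a plain frequency threshold as a stand-in, and the function name `group_rare_values`, the `min_count` parameter and the `__garbage__` label are hypothetical names chosen for this example only.

```python
from collections import Counter


def group_rare_values(values, min_count=2, garbage_label="__garbage__"):
    """Map each categorical value to a group label.

    Assumption for illustration: a value keeps its own group when it occurs
    at least `min_count` times; otherwise it is sent to one shared garbage
    group. The paper selects groupings with a Bayesian criterion instead.
    """
    counts = Counter(values)
    return {
        value: (value if count >= min_count else garbage_label)
        for value, count in counts.items()
    }


# Toy attribute with two frequent values and two values seen only once.
values = ["red", "red", "blue", "blue", "blue", "teal", "mauve"]
mapping = group_rare_values(values, min_count=2)

# The synthetic attribute has fewer distinct values than the original one.
grouped = [mapping[v] for v in values]
```

In this toy case the four original values collapse into three groups, with the two singleton values sharing the garbage group. A fixed threshold like this does not decide how many groups to build or which frequent values to merge; that is the part the paper addresses.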

Cite

Boullé, M. (2005). A grouping method for categorical attributes having very large number of values. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3587 LNAI, pp. 228–242). Springer Verlag. https://doi.org/10.1007/11510888_23
