HANDLING CATEGORICAL FEATURES WITH MANY LEVELS USING A PRODUCT PARTITION MODEL

Abstract

A common difficulty in data analysis is how to handle categorical predictors with a large number of levels or categories. Few proposals have been developed to tackle this important and frequent problem. We introduce a generative model that simultaneously carries out the model fitting and the aggregation of the categorical levels into larger groups. We represent the categorical predictor by a graph whose nodes are the categories and establish a probability distribution over meaningful partitions of this graph. Conditionally on the observed data, we obtain a posterior distribution for the aggregation of levels, allowing inference about the most probable clustering of the categories. Simultaneously, we obtain inference about all the other regression model parameters. We compare our approach with state-of-the-art methods, showing that it has equally good predictive performance and more interpretable results. Our approach balances accuracy against interpretability, a currently important concern in statistics and machine learning.

Cite

Criscuolo, T. L., Assunção, R. M., Loschi, R. H., Meira, W., & Cruz-Reyes, D. (2023). HANDLING CATEGORICAL FEATURES WITH MANY LEVELS USING A PRODUCT PARTITION MODEL. Annals of Applied Statistics, 17(1), 786–814. https://doi.org/10.1214/22-AOAS1651
