Missing categorical data imputation and individual observation level imputation

Abstract

Traditional imputation techniques for missing data predict each missing value from the other observed values. For continuous data, imputation usually relies on regression models. For categorical data, the usual techniques rely on classification, which sets each missing value to its 'most likely' category. This, however, overrepresents the categories that are observed more often and can therefore bias results in many tasks, especially when dominant categories are present. We present an original imputation methodology that yields the most likely structure (distribution) of the missing data conditional on the observed values. The methodology assumes that the categorical variable containing the missing values follows a multinomial distribution. The parameters of this distribution are then estimated using multinomial logistic regression. An illustrative example is described: the reconstruction of missing values of the highest education level of persons in a population.
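The contrast between 'most likely' category imputation and distribution-preserving imputation can be illustrated with a small sketch. It is not the authors' algorithm: it assumes the category probabilities have already been estimated (for instance by a multinomial logistic regression) and uses random draws from those probabilities as one simple way to keep the imputed distribution close to the estimated one. The category labels, the probability vector and the function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical education categories, for illustration only.
CATEGORIES = ["primary", "secondary", "tertiary"]


def impute_mode(probs):
    """Classification-style imputation: index of the most probable category."""
    return max(range(len(probs)), key=lambda k: probs[k])


def impute_draw(probs, rng):
    """Draw one category index from the multinomial distribution given by probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def demo(n_missing=10_000, seed=0):
    """Impute n_missing records with both strategies and count the categories."""
    rng = random.Random(seed)
    # Assumed predicted probabilities, identical for every record with a
    # missing value; in practice they would come from a fitted model and
    # differ from record to record.
    probs = [0.2, 0.6, 0.2]
    mode_counts = Counter(
        CATEGORIES[impute_mode(probs)] for _ in range(n_missing)
    )
    draw_counts = Counter(
        CATEGORIES[impute_draw(probs, rng)] for _ in range(n_missing)
    )
    return mode_counts, draw_counts


if __name__ == "__main__":
    mode_counts, draw_counts = demo()
    print("most likely category:", dict(mode_counts))
    print("drawn from distribution:", dict(draw_counts))
```

In this sketch the mode-based strategy assigns every record to the dominant category, while the draws reproduce the assumed shares approximately, which is the overrepresentation problem the abstract describes.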

Cite

Zimmermann, P., Mazouch, P., & Tesárková, K. H. (2014). Missing categorical data imputation and individual observation level imputation. Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, 62(6), 1527–1534. https://doi.org/10.11118/actaun201462061527
