On classification with incomplete data

  • Williams D
  • Liao X
  • Xue Y
  • et al.

We address the incomplete-data problem, in which the feature vectors to be classified are missing some of their features. A supervised logistic regression algorithm for the classification of incomplete data is developed. Single or multiple imputation of the missing data is avoided by integrating analytically against an estimated conditional density function (conditioned on the observed data). The conditional density functions are estimated using a Gaussian mixture model (GMM), with parameter estimation performed using both Expectation-Maximization (EM) and Variational Bayesian EM (VB-EM). The proposed supervised algorithm is then extended to the semisupervised case by incorporating graph-based regularization. The semisupervised algorithm uses all available data: incomplete and complete, labeled and unlabeled. Experimental results for the proposed classification algorithms are presented.
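
The idea of integrating over missing features instead of imputing them can be illustrated with a minimal sketch. It assumes a GMM over the full feature vector has already been fitted (for instance by EM), that missing entries are marked with NaN, and that at least one feature is observed. The function names (`predict_incomplete`, `gaussian_logpdf`) are hypothetical. The closed form used for the Gaussian-averaged sigmoid is a standard approximation, sigmoid(mu / sqrt(1 + pi * s^2 / 8)), chosen here for illustration; it is not necessarily the exact integration used in the article.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal evaluated at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)


def predict_incomplete(x, w, b, weights, means, covs):
    """Approximate E[sigmoid(w.x + b) | observed entries of x] under a GMM.

    x       : feature vector, NaN marks a missing entry (assumed convention)
    w, b    : logistic regression weights and bias
    weights, means, covs : parameters of an already fitted GMM (assumed given)
    """
    m = np.isnan(x)   # mask of missing features
    o = ~m            # mask of observed features
    if not m.any():
        return float(sigmoid(w @ x + b))
    if not o.any():
        raise ValueError("this sketch assumes at least one observed feature")

    log_resp = np.empty(len(weights))
    probs = np.empty(len(weights))
    for k, (pi_k, mu, S) in enumerate(zip(weights, means, covs)):
        S_oo = S[np.ix_(o, o)]
        S_mo = S[np.ix_(m, o)]
        S_mm = S[np.ix_(m, m)]
        # Conditional Gaussian of the missing block given the observed block.
        gain = np.linalg.solve(S_oo, S_mo.T).T        # S_mo @ inv(S_oo)
        mu_c = mu[m] + gain @ (x[o] - mu[o])
        S_c = S_mm - gain @ S_mo.T
        # The activation w.x + b is Gaussian under this component.
        mean_a = w[o] @ x[o] + w[m] @ mu_c + b
        var_a = w[m] @ S_c @ w[m]
        # Approximate Gaussian average of the sigmoid (illustrative choice).
        probs[k] = sigmoid(mean_a / np.sqrt(1.0 + np.pi * var_a / 8.0))
        # Unnormalized log responsibility of component k given observed data.
        log_resp[k] = np.log(pi_k) + gaussian_logpdf(x[o], mu[o], S_oo)

    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()
    return float(resp @ probs)
```

With a single standard-normal component, `predict_incomplete(np.array([0.5, np.nan]), np.array([1.0, 2.0]), 0.0, [1.0], [np.zeros(2)], [np.eye(2)])` shrinks the prediction toward 0.5 relative to plugging in the conditional mean, which reflects the uncertainty about the missing feature.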

Author-supplied keywords

  • Classification
  • Imperfect labeling
  • Incomplete data
  • Missing data
  • Semisupervised learning
  • Supervised learning


Authors

  • David Williams
  • Xuejun Liao
  • Ya Xue
  • Lawrence Carin
  • Balaji Krishnapuram
