Finding uninformative features in binary data

Xin Wang; Ata Kabán

Conference Proceedings

Lecture Notes in Computer Science, vol. 3578 (2005), pp. 40–47

DOI: 10.1007/11508069_6

10 Citations

9 Readers

Abstract

For statistical modelling of multivariate binary data, such as text documents, data instances are typically represented as vectors over a global vocabulary of attributes. Apart from the issue of high dimensionality, this representation also raises the problem that the presences and absences of different attributes are of uneven importance. This problem has been largely overlooked in the literature; however, it may create difficulties in obtaining reliable estimates of unsupervised probabilistic representation models. In turn, automated feature selection and feature weighting in the context of unsupervised learning is challenging, because there is no known target to guide the search. In this paper we propose and study a relatively simple cluster-based generative model for multivariate binary data, equipped with automated feature weighting capability. Empirical results on both synthetic and real data sets are given and discussed. © Springer-Verlag Berlin Heidelberg 2005.
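As a minimal illustration of the representation the abstract describes, text documents can be encoded as binary presence/absence vectors over a shared vocabulary. This sketch is not taken from the paper: the toy documents, the whitespace tokenisation, and the helper names `build_vocabulary` and `to_binary_vectors` are all assumptions made for the example.

```python
def build_vocabulary(documents):
    """Collect the sorted set of distinct lowercase tokens across all documents."""
    return sorted({tok for doc in documents for tok in doc.lower().split()})


def to_binary_vectors(documents, vocab):
    """Encode each document as a 0/1 vector: 1 if the term occurs, else 0."""
    index = {term: j for j, term in enumerate(vocab)}
    vectors = []
    for doc in documents:
        row = [0] * len(vocab)
        for tok in set(doc.lower().split()):
            row[index[tok]] = 1
        vectors.append(row)
    return vectors


# Hypothetical toy corpus (not from the paper).
docs = ["the cat sat", "the dog barked", "the cat barked"]
vocab = build_vocabulary(docs)
X = to_binary_vectors(docs, vocab)

for term, column in zip(vocab, zip(*X)):
    print(term, column)
```

In this toy corpus the term "the" is present in every document, so its column is constant and cannot help separate documents into groups. This is an intuitive example of an attribute whose presence carries little information; how the paper's model actually weights such features is not specified in the abstract.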

Cite

Wang, X., & Kabán, A. (2005). Finding uninformative features in binary data. In Lecture Notes in Computer Science (Vol. 3578, pp. 40–47). Springer Verlag. https://doi.org/10.1007/11508069_6
