Text mining using non-negative matrix factorizations

V. Paul Pauca; Farial Shahnaz; Michael W. Berry; Robert J. Plemmons

Conference Proceedings

SIAM Proceedings Series (2004) 452-456

DOI: 10.1137/1.9781611972740.45

Abstract

This study presents a methodology for the automatic identification of semantic features and document clusters in a heterogeneous text collection. The methodology is based upon encoding the data using low-rank non-negative matrix factorization (NMF) algorithms to preserve natural data non-negativity and thus avoid the subtractive basis vector and encoding interactions present in techniques such as principal component analysis. Some existing non-negative matrix factorization techniques are reviewed and some new ones are proposed. Numerical experiments are reported on the use of a hybrid NMF algorithm to produce a parts-based approximation of a sparse term-by-document matrix. The resulting basis vectors and matrix projection can be used to identify underlying semantic features (topics) and document clusters of the corresponding text collection.
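The idea in the abstract can be illustrated with a minimal sketch. It is not the hybrid algorithm from the paper, whose details the abstract does not give; it assumes the standard Lee and Seung multiplicative update rules for the Frobenius-norm objective, and the toy term-by-document matrix, the term list, and the function name `nmf_multiplicative` are made up for illustration. The matrix A (terms by documents) is approximated as W H with W, H >= 0, so that columns of W act as candidate topics and columns of H give each document's weights over those topics.

```python
import numpy as np

def nmf_multiplicative(A, k, iters=200, eps=1e-9, seed=0):
    """Approximate A (m x n, non-negative) as W @ H with W, H >= 0.

    Uses Lee-Seung multiplicative updates (an assumption; the paper's
    hybrid algorithm is not reproduced here).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        # Each update multiplies by a non-negative ratio, so signs never flip.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: 6 terms (rows) by 4 documents (columns).
terms = ["matrix", "factor", "rank", "goal", "match", "team"]
A = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
    [0, 0, 1, 2],
], dtype=float)

W, H = nmf_multiplicative(A, k=2)

# Assign each document to the basis vector with the largest weight in H.
clusters = H.argmax(axis=0)

# The heaviest terms in each column of W describe the corresponding topic.
top_terms = [[terms[i] for i in np.argsort(W[:, j])[::-1][:3]] for j in range(2)]

print("document clusters:", clusters)
print("top terms per topic:", top_terms)
```

Because every entry of W and H stays non-negative, each document is built up additively from topics, which is the parts-based property the abstract contrasts with principal component analysis.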

Citation (APA)

Pauca, V. P., Shahnaz, F., Berry, M. W., & Plemmons, R. J. (2004). Text mining using non-negative matrix factorizations. In SIAM Proceedings Series (pp. 452–456). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611972740.45
