Text classification using graph mining-based feature extraction

Chuntao Jiang; Frans Coenen; Robert Sanderson; Michele Zito

Conference Proceedings

Research and Development in Intelligent Systems XXVI: Incorporating Applications and Innovations in Intelligent Systems XVII (2010) 21-34

DOI: 10.1007/978-1-84882-983-1_2

16 Citations

48 Readers

Abstract

A graph-based approach to document classification is described in this paper. The graph representation offers the advantage that it allows for a much more expressive document encoding than the more standard bag of words/phrases approach, and consequently gives improved classification accuracy. Document sets are represented as graph sets to which a weighted graph mining algorithm is applied to extract frequent subgraphs, which are then further processed to produce feature vectors (one per document) for classification. Weighted subgraph mining is used to ensure classification effectiveness and computational efficiency; only the most significant subgraphs are extracted. The approach is validated and evaluated using several popular classification algorithms together with a real-world textual data set. The results demonstrate that the approach can outperform existing text classification algorithms on some datasets. As the size of the dataset increases, further processing of the extracted frequent features becomes essential. © 2010 Springer-Verlag London.
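
The pipeline outlined in the abstract (documents to graphs, frequent substructures, one feature vector per document) can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the graph representation or the weighting scheme, so this example assumes a simple word-adjacency graph, treats single edges as the smallest possible subgraphs, and uses plain unweighted document support. The function names `doc_to_edges`, `frequent_edges`, and `to_feature_vectors` are hypothetical.

```python
from collections import Counter


def doc_to_edges(text):
    """Assumed representation: a directed edge between each pair of adjacent words."""
    words = text.lower().split()
    return {(a, b) for a, b in zip(words, words[1:])}


def frequent_edges(edge_sets, min_support):
    """Keep edges that occur in at least `min_support` documents.

    Single edges stand in for frequent subgraphs here; real subgraph
    mining would also grow larger connected patterns, and the paper
    additionally applies weights, which this sketch omits.
    """
    counts = Counter(edge for edges in edge_sets for edge in edges)
    return sorted(edge for edge, count in counts.items() if count >= min_support)


def to_feature_vectors(edge_sets, features):
    """One binary vector per document: 1 if the document contains the feature."""
    return [[1 if f in edges else 0 for f in features] for edges in edge_sets]


# Toy corpus, invented for illustration only.
docs = [
    "graph mining extracts frequent subgraphs",
    "graph mining supports text classification",
    "text classification uses feature vectors",
]

edge_sets = [doc_to_edges(d) for d in docs]
features = frequent_edges(edge_sets, min_support=2)
vectors = to_feature_vectors(edge_sets, features)

print(features)  # [('graph', 'mining'), ('text', 'classification')]
print(vectors)   # [[1, 0], [1, 1], [0, 1]]
```

The resulting vectors could then be passed to any standard classifier, which mirrors the final step described in the abstract.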

Cite

CITATION STYLE

APA

Jiang, C., Coenen, F., Sanderson, R., & Zito, M. (2010). Text classification using graph mining-based feature extraction. In Research and Development in Intelligent Systems XXVI: Incorporating Applications and Innovations in Intelligent Systems XVII (pp. 21–34). Springer London. https://doi.org/10.1007/978-1-84882-983-1_2
