An efficient method for generating, storing and matching features for text mining

Shing Kit Chan; Wai Lam

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009) 5476 LNAI 86-97

DOI: 10.1007/978-3-642-01307-2_11


Abstract

Log-linear models have been widely used in text mining tasks because they can incorporate a large number of possibly correlated features. In text mining, these possibly correlated features are generated by forming conjunctions of features, which are usually used with log-linear models to estimate robust conditional distributions. To avoid manual construction of feature conjunctions, we propose a new algorithmic framework called F-tree for automatically generating and storing conjunctions of features in text mining tasks. This compact graph-based data structure allows fast one-vs-all matching of features in the feature space, which is crucial for many text mining tasks. Based on this hierarchical data structure, we propose a systematic method for removing redundant features to further reduce memory usage and improve performance. We conduct large-scale experiments on three publicly available datasets and show that this automatic method achieves the state-of-the-art performance previously obtained by manual construction of features. © Springer-Verlag Berlin Heidelberg 2009.
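
The abstract does not spell out how an F-tree is built, so the following is only an illustrative sketch of the general idea it describes: generating conjunctions of atomic features automatically, storing them in a shared-prefix tree, and matching one instance against all stored conjunctions. The names `conjunctions`, `PrefixTree`, `max_order`, and the example feature strings are assumptions made for illustration and are not taken from the paper; a plain prefix tree stands in for the authors' data structure.

```python
from itertools import combinations


def conjunctions(features, max_order=2):
    """Return every conjunction of up to max_order atomic features.

    Each conjunction is a tuple of atoms in sorted order, so that
    conjunctions sharing atoms also share a prefix in the tree.
    """
    atoms = sorted(set(features))
    return [combo
            for k in range(1, max_order + 1)
            for combo in combinations(atoms, k)]


class PrefixTree:
    """A plain prefix tree over conjunctions (hypothetical stand-in, not the paper's F-tree)."""

    def __init__(self):
        self.root = {}

    def insert(self, conj):
        node = self.root
        for atom in conj:
            node = node.setdefault(atom, {})
        node[None] = True  # None marks the end of a stored conjunction

    def match(self, features):
        """Return all stored conjunctions whose atoms all occur in features."""
        present = set(features)
        found = []
        stack = [(self.root, ())]
        while stack:
            node, path = stack.pop()
            for atom, child in node.items():
                if atom is None:
                    found.append(path)
                elif atom in present:
                    # Descend only along atoms the instance actually has.
                    stack.append((child, path + (atom,)))
        return sorted(found)


# Build the tree from one training instance, then match a new instance.
tree = PrefixTree()
for conj in conjunctions(["w=bank", "pos=NN", "prev=the"]):
    tree.insert(conj)

matches = tree.match(["w=bank", "prev=the", "w=river"])
print(matches)
```

Because subtrees are pruned as soon as an atom is absent from the instance, a single traversal checks the instance against every stored conjunction, which is one plausible reading of the "one-vs-all matching" the abstract mentions.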

Cite

Chan, S. K., & Lam, W. (2009). An efficient method for generating, storing and matching features for text mining. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5476 LNAI, pp. 86–97). https://doi.org/10.1007/978-3-642-01307-2_11
