An efficient method for generating, storing and matching features for text mining

Abstract

Log-linear models have been widely used in text mining tasks because they can incorporate a large number of possibly correlated features. In text mining, these correlated features are typically produced by forming conjunctions of simpler features, and they are used with log-linear models to estimate robust conditional distributions. To avoid constructing feature conjunctions by hand, we propose a new algorithmic framework called F-tree for automatically generating and storing conjunctions of features in text mining tasks. This compact graph-based data structure allows fast one-vs-all matching of features in the feature space, which is crucial for many text mining tasks. Based on this hierarchical data structure, we propose a systematic method for removing redundant features to further reduce memory usage and improve performance. We conduct large-scale experiments on three publicly available datasets and show that this automatic method can attain the state-of-the-art performance achieved by manually constructed features. © Springer-Verlag Berlin Heidelberg 2009.
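
The abstract does not specify how F-tree is laid out, so the following is only a rough illustration of the general idea of generating feature conjunctions automatically and storing them in a shared-prefix tree for matching. It is not the authors' F-tree. The class name `ConjunctionTrie`, the helper `build_trie`, the `max_order` parameter, and the example feature strings are all assumptions made for this sketch.

```python
from itertools import combinations


class ConjunctionTrie:
    """Prefix tree over sorted feature conjunctions.

    Illustrative only: a generic trie, not the F-tree from the paper.
    """

    def __init__(self):
        self.children = {}  # atomic feature -> child node
        self.count = 0      # how many times this exact conjunction was inserted

    def insert(self, conjunction):
        # Walk (or create) one node per atomic feature in the conjunction.
        node = self
        for feature in conjunction:
            node = node.children.setdefault(feature, ConjunctionTrie())
        node.count += 1

    def match(self, active):
        """Return every stored conjunction whose atoms all occur in `active`."""
        active = set(active)
        found = []

        def walk(node, prefix):
            if node.count > 0:
                found.append(prefix)
            # Only descend into branches whose feature is active.
            for feature, child in node.children.items():
                if feature in active:
                    walk(child, prefix + (feature,))

        walk(self, ())
        return found


def build_trie(instances, max_order=2):
    """Insert all conjunctions of up to `max_order` atomic features per instance."""
    trie = ConjunctionTrie()
    for atoms in instances:
        ordered = sorted(set(atoms))  # canonical order so prefixes are shared
        for order in range(1, max_order + 1):
            for conjunction in combinations(ordered, order):
                trie.insert(conjunction)
    return trie


# Hypothetical atomic features for two training instances.
trie = build_trie([["w=bank", "pos=NN"], ["w=bank", "cap=no"]])
matches = trie.match(["w=bank", "pos=NN"])
```

Sorting the atomic features before insertion gives each conjunction one canonical path, so conjunctions that share a prefix share nodes. Matching then visits only branches whose features are active in the query instance. Pruning of redundant features, which the abstract also mentions, is not attempted here.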

Cite

Chan, S. K., & Lam, W. (2009). An efficient method for generating, storing and matching features for text mining. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5476 LNAI, pp. 86–97). https://doi.org/10.1007/978-3-642-01307-2_11