Integrating LDA with clustering technique for relevance feature selection

Abdullah Semran Alharbi; Yuefeng Li; Yue Xu

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10400 LNAI 274-286

DOI: 10.1007/978-3-319-63004-5_22

Abstract

Selecting features from documents that describe user information needs is challenging due to the nature of text, where redundancy, synonymy, polysemy, noise and high dimensionality are common problems. The assumption that clustered documents describe only one topic can be too simplistic, given that most long documents discuss multiple topics. LDA-based models show significant improvement over cluster-based models in information retrieval (IR). However, the integration of both techniques for feature selection (FS) is still limited. In this paper, we propose an innovative and effective cluster- and LDA-based model for relevance FS. The model also integrates a new extended random set theory to generalise the LDA local weights for document terms. It can assign a more discriminative weight to terms based on their appearance in LDA topics and the clustered documents. The experimental results, based on the RCV1 dataset and TREC topics for information filtering (IF), show that our model significantly outperforms eight state-of-the-art baseline models in five standard performance measures.
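The abstract does not state the model's weighting formula, so the following is only a toy sketch of the general idea of scoring a term from both its LDA topic membership and the clustering of the documents it comes from. It is not the authors' extended random set formulation. The function name `term_weights`, the input layout and the cluster-share scaling are all assumptions made for illustration.

```python
from collections import defaultdict


def term_weights(doc_topic, topic_term, doc_cluster):
    """Toy term weighting (hypothetical, not the paper's formula).

    doc_topic:   {doc_id: {topic_id: P(topic | doc)}}
    topic_term:  {topic_id: {term: P(term | topic)}}
    doc_cluster: {doc_id: cluster_id}
    """
    # Count documents per cluster so each document can be scaled
    # by the relative size of the cluster it belongs to.
    cluster_size = defaultdict(int)
    for cluster in doc_cluster.values():
        cluster_size[cluster] += 1
    n_docs = len(doc_cluster)

    weights = defaultdict(float)
    for doc, topics in doc_topic.items():
        share = cluster_size[doc_cluster[doc]] / n_docs
        for topic, p_topic in topics.items():
            for term, p_term in topic_term[topic].items():
                # Assumed combination: cluster share * P(topic|doc) * P(term|topic)
                weights[term] += share * p_topic * p_term
    return dict(weights)


# Made-up toy inputs, for illustration only.
doc_topic = {"d1": {0: 1.0}, "d2": {0: 0.5, 1: 0.5}, "d3": {1: 1.0}}
topic_term = {0: {"lda": 0.6, "topic": 0.4}, 1: {"cluster": 0.7, "topic": 0.3}}
doc_cluster = {"d1": "a", "d2": "a", "d3": "b"}

weights = term_weights(doc_topic, topic_term, doc_cluster)
```

In this sketch a term accumulates weight from every topic it appears in, and documents from larger clusters contribute more. A real implementation would take the topic distributions from a fitted LDA model and the cluster labels from a clustering step.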

Cite

Alharbi, A. S., Li, Y., & Xu, Y. (2017). Integrating LDA with clustering technique for relevance feature selection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10400 LNAI, pp. 274–286). Springer Verlag. https://doi.org/10.1007/978-3-319-63004-5_22
