Abstract
This paper presents an empirical study of integrating ngrams and multi-word terms into topic models while preserving the similarities between them and words that follow from their component structure. First, we adapt the PLSA-SIM algorithm to the more widespread LDA model and to ngrams. Then we propose a novel algorithm, LDA-ITER, that incorporates the most suitable ngrams into topic models. Experiments on integrating ngrams and multi-word terms, conducted on five text collections in different languages and domains, demonstrate a significant improvement in all the metrics under consideration.
Cite
Nokel, M., & Loukachevitch, N. (2016). Accounting ngrams and multi-word terms can improve topic models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 44–49). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-1806