Do important words in bag-of-words model of text relatedness help?

Abstract

We address the question of how Bag-of-Words (BoW) models of text relatedness can be improved by using only the important words in a text pair instead of all the words. To find the important words in a text, we use a new approach based on word relatedness. We evaluate two text relatedness methods, Latent Semantic Analysis (LSA) and the Google Trigram Model (GTM), on five datasets in which the words of each text pair are sorted by importance. Comparing a small number of important words against all the words in the texts, we find that both LSA and GTM achieve better results on four of the datasets and the same result on the fifth.
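
The general idea of comparing a few important words against all words can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not give the importance score or the way word scores are aggregated, so both are assumptions here. Importance is assumed to be a word's total relatedness to the words of the other text, and text relatedness is assumed to be the average best-match score in both directions. The relatedness table and the names TOY_RELATEDNESS, word_relatedness, rank_by_importance and text_relatedness are hypothetical; a real system would take word relatedness from LSA or GTM.

```python
# Hypothetical toy word-relatedness table (symmetric, values in [0, 1]).
# Assumption: a real system would obtain these scores from LSA or GTM.
TOY_RELATEDNESS = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("road", "highway")): 0.8,
    frozenset(("car", "highway")): 0.5,
    frozenset(("automobile", "road")): 0.5,
}


def word_relatedness(a, b):
    """Return the toy relatedness of two words (1.0 if identical, 0.0 if unknown)."""
    if a == b:
        return 1.0
    return TOY_RELATEDNESS.get(frozenset((a, b)), 0.0)


def rank_by_importance(words, other_words):
    """Sort the distinct words of one text by an ASSUMED importance score:
    the sum of each word's relatedness to the words of the other text."""
    def score(word):
        return sum(word_relatedness(word, other) for other in other_words)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(set(words), key=lambda w: (-score(w), w))


def text_relatedness(words_a, words_b, top_k=None):
    """Bag-of-words relatedness of two texts, optionally restricted to the
    top_k most important words of each text (top_k=None uses all words)."""
    a = rank_by_importance(words_a, words_b)
    b = rank_by_importance(words_b, words_a)
    if top_k is not None:
        a, b = a[:top_k], b[:top_k]
    if not a or not b:
        return 0.0
    # ASSUMED aggregation: average best-match relatedness in both directions.
    best_a = sum(max(word_relatedness(w, o) for o in b) for w in a) / len(a)
    best_b = sum(max(word_relatedness(w, o) for o in a) for w in b) / len(b)
    return (best_a + best_b) / 2


text_a = ["the", "car", "is", "on", "the", "road"]
text_b = ["an", "automobile", "on", "a", "highway"]

score_all_words = text_relatedness(text_a, text_b)
score_top_two = text_relatedness(text_a, text_b, top_k=2)
print(score_all_words, score_top_two)
```

In this toy pair, function words such as "the" and "an" have no related counterpart and pull the all-words average down, while the top-2 variant compares only the content words. This shows the mechanism only and says nothing about the results reported in the paper.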

Cite

Islam, A., Milios, E., & Kešelj, V. (2015). Do important words in bag-of-words model of text relatedness help? In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9302, pp. 569–577). Springer Verlag. https://doi.org/10.1007/978-3-319-24033-6_64
