We address the question of whether Bag-of-Words (BoW) models of text relatedness can be improved by using only the important words in a text pair rather than all of the words. To identify important words in a text, we use a new approach based on word relatedness. We apply two text relatedness methods, Latent Semantic Analysis (LSA) and the Google Trigram Model (GTM), to five datasets in which the words of each text pair are ranked by importance. Comparing the use of a small number of important words against the use of all the words in the texts, we find that both LSA and GTM achieve better results on four of the datasets and the same result on the fifth.
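The idea of ranking words by relatedness and keeping only the top-k for a BoW comparison can be sketched as follows. This is a minimal illustration, not the authors' method: the `relatedness` function here is a hypothetical stand-in (character-bigram Dice overlap) for a corpus-based measure such as LSA or GTM, and the names `important_words` and `bow_cosine` are invented for this sketch.

```python
from collections import Counter
from math import sqrt

def relatedness(w1, w2):
    # Hypothetical stand-in for an LSA/GTM word-relatedness score:
    # Dice coefficient over character bigrams. The paper uses
    # corpus-based measures; this is only a placeholder.
    bigrams = lambda w: Counter(w[i:i + 2] for i in range(len(w) - 1))
    b1, b2 = bigrams(w1), bigrams(w2)
    inter = sum((b1 & b2).values())
    total = sum(b1.values()) + sum(b2.values())
    return 2 * inter / total if total else 0.0

def important_words(text_words, other_words, k):
    # Score each word by its mean relatedness to the other text's
    # words, then keep only the k highest-scoring words.
    score = lambda w: sum(relatedness(w, o) for o in other_words) / len(other_words)
    return sorted(text_words, key=score, reverse=True)[:k]

def bow_cosine(words_a, words_b):
    # Standard bag-of-words cosine similarity over word counts.
    ca, cb = Counter(words_a), Counter(words_b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

a = "a gem is a jewel or stone that is used in jewellery".split()
b = "a jewel is a precious stone used to decorate valuable things".split()
k = 5  # number of important words to keep (a tunable parameter)
sim_all = bow_cosine(a, b)
sim_topk = bow_cosine(important_words(a, b, k), important_words(b, a, k))
```

The comparison in the paper is then between `sim_topk`-style scores (top-k important words) and `sim_all`-style scores (all words) across datasets.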
Citation:
Islam, A., Milios, E., & Kešelj, V. (2015). Do important words in bag-of-words model of text relatedness help? In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9302, pp. 569–577). Springer Verlag. https://doi.org/10.1007/978-3-319-24033-6_64