Distance weighted cosine similarity measure for text classification

Baoli Li; Liping Han

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8206 LNCS, pp. 611–618 (2013)

DOI: 10.1007/978-3-642-41278-3_74

176 Citations

208 Readers

Abstract

In the Vector Space Model, cosine similarity is widely used to measure the similarity between two vectors. It is efficient to compute, especially for sparse vectors, because only the non-zero dimensions need to be considered. As a fundamental component, it has been applied to a range of text mining problems, including text classification, text summarization, information retrieval, and question answering. Despite its popularity, cosine similarity has some problems. Starting with a few synthetic samples, we demonstrate these problems: it is overly biased by features with higher values, and it does not care much about how many features two vectors share. We therefore propose a distance weighted cosine similarity metric. Extensive experiments on text classification demonstrate the effectiveness of the proposed metric. © 2013 Springer-Verlag.
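The two weaknesses named in the abstract can be illustrated with a small sketch. The code below is not taken from the paper: the vectors, feature names, and helper functions (`cosine`, `shared_features`) are hypothetical, chosen only to show plain cosine similarity on sparse vectors. It does not implement the proposed distance weighted metric, whose formula is not given in this record.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    # Iterate over the smaller vector: only features that are non-zero in
    # both vectors contribute to the dot product.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def shared_features(u, v):
    """Number of features that are non-zero in both vectors."""
    return len(u.keys() & v.keys())


# Hypothetical synthetic vectors (assumed for illustration only).
query = {"a": 10, "b": 1, "c": 1, "d": 1}
one_shared = {"a": 10}                             # shares only the high-valued feature
four_shared = {"a": 1, "b": 1, "c": 1, "d": 1}     # shares all four features

for name, vec in [("one_shared", one_shared), ("four_shared", four_shared)]:
    print(name, shared_features(query, vec), round(cosine(query, vec), 4))
```

In this toy setup, the vector that shares a single high-valued feature with the query receives a higher cosine score than the vector that shares all four features, which is the kind of bias the abstract describes.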

Cite (APA)

Li, B., & Han, L. (2013). Distance weighted cosine similarity measure for text classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8206 LNCS, pp. 611–618). https://doi.org/10.1007/978-3-642-41278-3_74
