Distance weighted cosine similarity measure for text classification

176Citations
Citations of this article
208Readers
Mendeley users who have this article in their library.
Get full text

Abstract

In Vector Space Model, Cosine is widely used to measure the similarity between two vectors. Its calculation is very efficient, especially for sparse vectors, as only the non-zero dimensions need to be considered. As a fundamental component, cosine similarity has been applied in solving different text mining problems, such as text classification, text summarization, information retrieval, question answering, and so on. Although it is popular, the cosine similarity does have some problems. Starting with a few synthetic samples, we demonstrate some problems of cosine similarity: it is overly biased by features of higher values and does not care much about how many features two vectors share. A distance weighted cosine similarity metric is thus proposed. Extensive experiments on text classification exhibit the effectiveness of the proposed metric. © 2013 Springer-Verlag.

Cite

CITATION STYLE

APA

Li, B., & Han, L. (2013). Distance weighted cosine similarity measure for text classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8206 LNCS, pp. 611–618). https://doi.org/10.1007/978-3-642-41278-3_74

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free