The Implementation of Cosine Similarity to Calculate Text Relevance between Two Documents

D. Gunawan; C. A. Sembiring; M. A. Budiman

Conference Proceedings, Open Access

Journal of Physics: Conference Series (2018) 978(1)

DOI: 10.1088/1742-6596/978/1/012120

124 Citations

221 Readers

Abstract

The rapidly increasing number of web pages and documents calls for topic-specific filtering so that relevant pages or documents can be found efficiently. This preliminary research uses cosine similarity to calculate text relevance in order to find topic-specific documents. The research is divided into three parts. The first part is text preprocessing: punctuation is removed from the document, the document is converted to lower case, stop words are removed, and root words are extracted using the Porter Stemming algorithm. The second part is keyword weighting, whose output is used by the third part, the text relevance calculation. The text relevance calculation yields a value between 0 and 1. The closer the value is to 1, the more related the two documents are, and vice versa.
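The three parts described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not specify the weighting scheme or the stop word list, so raw term frequency and a tiny hand-picked stop word set are assumed here. The Porter stemming step is left out to keep the sketch dependency-free. All function names are hypothetical.

```python
import math
import string
from collections import Counter

# Tiny illustrative stop word list (an assumption; the paper's list is not given).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on"}


def preprocess(text):
    """Part 1: strip punctuation, lowercase, and drop stop words.

    Porter stemming is omitted in this sketch; a stemmer would be
    applied to each surviving token at the end of this step.
    """
    table = str.maketrans("", "", string.punctuation)
    tokens = text.translate(table).lower().split()
    return [t for t in tokens if t not in STOP_WORDS]


def term_weights(tokens):
    """Part 2: weight each keyword. Raw term frequency is assumed here."""
    return Counter(tokens)


def cosine_similarity(w1, w2):
    """Part 3: cosine of the angle between two keyword weight vectors."""
    shared = w1.keys() & w2.keys()
    dot = sum(w1[t] * w2[t] for t in shared)
    norm1 = math.sqrt(sum(v * v for v in w1.values()))
    norm2 = math.sqrt(sum(v * v for v in w2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty document is treated as unrelated
    return dot / (norm1 * norm2)


def text_relevance(doc_a, doc_b):
    """Run the full pipeline on two raw documents."""
    weights_a = term_weights(preprocess(doc_a))
    weights_b = term_weights(preprocess(doc_b))
    return cosine_similarity(weights_a, weights_b)
```

Because term frequencies are non-negative, the result stays between 0 and 1, matching the range stated in the abstract: documents with identical keyword distributions score close to 1, and documents with no keywords in common score 0.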

Cite (APA)

Gunawan, D., Sembiring, C. A., & Budiman, M. A. (2018). The Implementation of Cosine Similarity to Calculate Text Relevance between Two Documents. In Journal of Physics: Conference Series (Vol. 978). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/978/1/012120
