Abstract
In document-level sentiment classification, each document must be mapped to a fixed-length vector. Document embedding models map each document to a dense, low-dimensional vector in a continuous vector space. This paper proposes training document embeddings using cosine similarity instead of the dot product. Experiments on the IMDB dataset show that accuracy improves when using cosine similarity rather than the dot product, while feature combination with Naïve Bayes weighted bag of n-grams achieves a new state-of-the-art accuracy of 97.42%. Code to reproduce all experiments is available at https://github.com/tanthongtan/dv-cosine.
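The abstract does not spell out the training objective, so the sketch below is only a rough illustration of the core idea: in a PV-DBOW-style model, the score between a document vector and each output word vector is normally their dot product; the proposed change L2-normalizes both vectors first, so the score becomes their cosine similarity. All names here (scores, use_cosine, the toy dimensions) are illustrative assumptions, not the authors' code, and a full softmax is used only for brevity where practical implementations typically use negative sampling.

import numpy as np

def scores(doc_vec, word_mat, use_cosine=True):
    # Unnormalized scores of every vocabulary word against one document vector.
    # use_cosine=False gives the standard dot-product scores; use_cosine=True
    # L2-normalizes both sides first, so each score is a cosine similarity
    # bounded in [-1, 1].
    if use_cosine:
        doc_vec = doc_vec / np.linalg.norm(doc_vec)
        word_mat = word_mat / np.linalg.norm(word_mat, axis=1, keepdims=True)
    return word_mat @ doc_vec

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy example: 5-word vocabulary, 8-dimensional embeddings.
rng = np.random.default_rng(0)
d = rng.normal(size=8)        # document vector
W = rng.normal(size=(5, 8))   # output word embedding matrix

p_dot = softmax(scores(d, W, use_cosine=False))
p_cos = softmax(scores(d, W, use_cosine=True))
print("dot-product probabilities:", p_dot)
print("cosine probabilities:     ", p_cos)

One consequence of the swap is that cosine scores are bounded, so no single word or document vector can dominate the softmax purely by growing in magnitude during training.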
Citation
Thongtan, T., & Phienthrakul, T. (2019). Sentiment classification using document embeddings trained with cosine similarity. In ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop (pp. 407–414). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p19-2057