Abstract
In document-level sentiment classification, each document must be mapped to a fixed-length vector. Document embedding models map each document to a dense, low-dimensional vector in a continuous vector space. This paper proposes training document embeddings using cosine similarity instead of the dot product. Experiments on the IMDB dataset show that accuracy improves when using cosine similarity rather than the dot product, while feature combination with Naïve Bayes weighted bag of n-grams achieves a new state-of-the-art accuracy of 97.42%. Code to reproduce all experiments is available at https://github.com/tanthongtan/dv-cosine.
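The abstract does not spell out the training objective, so the sketch below is only a rough illustration of the core idea: in a PV-DBOW-style model, the score between a document vector and each output word vector is normally their dot product; the proposed change L2-normalizes both vectors first, so the score becomes their cosine similarity. All names here (scores, use_cosine, the toy dimensions) are illustrative assumptions, not the authors' code, and a full softmax is used only for brevity where practical implementations typically use negative sampling.

import numpy as np

def scores(doc_vec, word_mat, use_cosine=True):
    # Unnormalized scores of every vocabulary word against one document vector.
    # use_cosine=False gives the standard dot-product scores; use_cosine=True
    # L2-normalizes both sides first, so each score is a cosine similarity
    # bounded in [-1, 1].
    if use_cosine:
        doc_vec = doc_vec / np.linalg.norm(doc_vec)
        word_mat = word_mat / np.linalg.norm(word_mat, axis=1, keepdims=True)
    return word_mat @ doc_vec

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy example: 5-word vocabulary, 8-dimensional embeddings.
rng = np.random.default_rng(0)
d = rng.normal(size=8)        # document vector
W = rng.normal(size=(5, 8))   # output word embedding matrix

p_dot = softmax(scores(d, W, use_cosine=False))
p_cos = softmax(scores(d, W, use_cosine=True))
print("dot-product probabilities:", p_dot)
print("cosine probabilities:     ", p_cos)

One consequence of the swap is that cosine scores are bounded, so no single word or document vector can dominate the softmax purely by growing in magnitude during training.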
Citation
Thongtan, T., & Phienthrakul, T. (2019). Sentiment classification using document embeddings trained with cosine similarity. In ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop (pp. 407–414). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p19-2057