INESC-ID at SemEval-2016 task 4-A: Reducing the problem of out-of-embedding words


Abstract

We present the INESC-ID system for the 2016 edition of the SemEval Twitter Sentiment Analysis shared task (subtask 4-A). The system was based on the Non-Linear Sub-space Embedding (NLSE) model developed for last year's competition. This model trains a projection of pre-trained embeddings into a small subspace using the available supervised data. Despite its simplicity, the system attained performance comparable to the best systems of the previous edition with no need for feature engineering. One limitation of this model was the assumption that a pre-trained embedding is available for every word. In this paper, we investigated different strategies to overcome this limitation by exploiting character-level embeddings and learning representations for out-of-embedding vocabulary words. The resulting approach outperforms our previous model by a relatively small margin, while still attaining strong results and consistently good performance across all the evaluation datasets.
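
The following is a minimal, hypothetical sketch of the core idea the abstract describes: pre-trained word embeddings are kept fixed, and only a low-dimensional non-linear projection (the "subspace") plus a classifier are learned from the labelled tweets. The class names, dimensions, pooling choice, and optimizer-free setup below are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical NLSE-style sketch (assumed details, not the paper's code).
import torch
import torch.nn as nn

class NLSESketch(nn.Module):
    def __init__(self, pretrained: torch.Tensor, subspace_dim: int = 10,
                 n_classes: int = 3):
        super().__init__()
        # Frozen pre-trained embeddings (e.g. skip-gram vectors).
        self.emb = nn.Embedding.from_pretrained(pretrained, freeze=True)
        # Learned projection of the embeddings into a small subspace.
        self.proj = nn.Linear(pretrained.size(1), subspace_dim)
        # Tweet-level classifier over the pooled subspace features.
        self.out = nn.Linear(subspace_dim, n_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) word indices into the embedding table.
        h = torch.sigmoid(self.proj(self.emb(token_ids)))  # non-linear subspace
        pooled = h.sum(dim=1)   # aggregate over the tweet's tokens
        return self.out(pooled) # scores for negative / neutral / positive

# Usage: 1000 fake pre-trained 50-dim vectors, a batch of two 5-token tweets.
pretrained = torch.randn(1000, 50)
model = NLSESketch(pretrained)
scores = model(torch.randint(0, 1000, (2, 5)))
print(scores.shape)  # torch.Size([2, 3])
```

The out-of-embedding strategies investigated in the paper would plug in at the embedding lookup step, e.g. by composing a vector for an unseen word from character-level embeddings or by learning a dedicated representation for it, rather than leaving it without a pre-trained vector.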

Citation (APA)

Amir, S., Astudillo, R. F., Ling, W., Silva, M. J., & Trancoso, I. (2016). INESC-ID at SemEval-2016 task 4-A: Reducing the problem of out-of-embedding words. In SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings (pp. 238–242). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/s16-1036
