We present the INESC-ID system for the 2016 edition of the SemEval Twitter Sentiment Analysis shared task (subtask 4-A). The system is based on the Non-Linear Sub-space Embedding (NLSE) model developed for last year's competition. This model trains a projection of pre-trained embeddings into a small subspace using the available supervised data. Despite its simplicity, the system attained performance comparable to the best systems of the last edition with no need for feature engineering. One limitation of this model was the assumption that a pre-trained embedding was available for every word. In this paper, we investigate different strategies to overcome this limitation by exploiting character-level embeddings and learning representations for out-of-embedding vocabulary words. The resulting approach outperforms our previous model by a relatively small margin, while still attaining strong results and consistently good performance across all the evaluation datasets.
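To make the idea of projecting pre-trained embeddings into a small subspace concrete, the following is a minimal sketch of a forward pass, not the authors' implementation. It assumes a fixed embedding matrix `E`, a learned projection `S` into a subspace of much lower dimension, a sigmoid non-linearity, sum-pooling over the words of a message, and a softmax output layer `Y`. All names, dimensions, and the random initialisation are illustrative assumptions, and the sketch deliberately presupposes that every word has a column in `E`, which is the limitation the abstract describes.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(z):
    # Subtract the maximum for numerical stability.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def nlse_forward(word_ids, E, S, Y):
    """Sketch of a subspace-projection classifier (assumed form).

    E: (d, V) pre-trained embeddings, kept fixed.
    S: (s, d) learned projection into a small subspace, s << d.
    Y: (k, s) learned classification layer over k sentiment classes.
    """
    X = E[:, word_ids]      # (d, n) embeddings of the n words in the message
    H = sigmoid(S @ X)      # (s, n) non-linear projection of each word
    h = H.sum(axis=1)       # (s,)  sum-pooled message representation
    return softmax(Y @ h)   # (k,)  class probabilities


# Illustrative sizes only: 50-dim embeddings, 1000-word vocabulary,
# 10-dim subspace, 3 classes (e.g. negative, neutral, positive).
rng = np.random.default_rng(0)
d, V, s, k = 50, 1000, 10, 3
E = rng.normal(size=(d, V))
S = rng.normal(scale=0.1, size=(s, d))
Y = rng.normal(scale=0.1, size=(k, s))

probs = nlse_forward([3, 17, 256], E, S, Y)
```

In this sketch only `S` and `Y` would be trained on the supervised data, which keeps the number of learned parameters small relative to the size of `E`. A word without a column in `E` cannot be looked up at all, which is where character-level or learned representations for out-of-embedding words would be needed.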
CITATION STYLE
Amir, S., Astudillo, R. F., Ling, W., Silva, M. J., & Trancoso, I. (2016). INESC-ID at SemEval-2016 task 4-A: Reducing the problem of out-of-embedding words. In SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings (pp. 238–242). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/s16-1036