INESC-ID at SemEval-2016 task 4-A: Reducing the problem of out-of-embedding words


Abstract

We present the INESC-ID system for the 2016 edition of the SemEval Twitter Sentiment Analysis shared task (subtask 4-A). The system was based on the Non-Linear Sub-space Embedding (NLSE) model developed for last year's competition. This model trains a projection of pre-trained embeddings into a small subspace using the available supervised data. Despite its simplicity, the system attained performance comparable to the best systems of the previous edition with no need for feature engineering. One limitation of this model was the assumption that a pre-trained embedding is available for every word. In this paper, we investigated different strategies to overcome this limitation by exploiting character-level embeddings and learning representations for out-of-embedding vocabulary words. The resulting approach outperforms our previous model by a relatively small margin, while still attaining strong results and consistently good performance across all the evaluation datasets.
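
The following is a minimal, hypothetical sketch of the core idea the abstract describes: pre-trained word embeddings are kept fixed, and only a low-dimensional non-linear projection (the "subspace") plus a classifier are learned from the labelled tweets. The class names, dimensions, pooling choice, and optimizer-free setup below are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical NLSE-style sketch (assumed details, not the paper's code).
import torch
import torch.nn as nn

class NLSESketch(nn.Module):
    def __init__(self, pretrained: torch.Tensor, subspace_dim: int = 10,
                 n_classes: int = 3):
        super().__init__()
        # Frozen pre-trained embeddings (e.g. skip-gram vectors).
        self.emb = nn.Embedding.from_pretrained(pretrained, freeze=True)
        # Learned projection of the embeddings into a small subspace.
        self.proj = nn.Linear(pretrained.size(1), subspace_dim)
        # Tweet-level classifier over the pooled subspace features.
        self.out = nn.Linear(subspace_dim, n_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) word indices into the embedding table.
        h = torch.sigmoid(self.proj(self.emb(token_ids)))  # non-linear subspace
        pooled = h.sum(dim=1)   # aggregate over the tweet's tokens
        return self.out(pooled) # scores for negative / neutral / positive

# Usage: 1000 fake pre-trained 50-dim vectors, a batch of two 5-token tweets.
pretrained = torch.randn(1000, 50)
model = NLSESketch(pretrained)
scores = model(torch.randint(0, 1000, (2, 5)))
print(scores.shape)  # torch.Size([2, 3])
```

The out-of-embedding strategies investigated in the paper would plug in at the embedding lookup step, e.g. by composing a vector for an unseen word from character-level embeddings or by learning a dedicated representation for it, rather than leaving it without a pre-trained vector.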

Citation (APA)

Amir, S., Astudillo, R. F., Ling, W., Silva, M. J., & Trancoso, I. (2016). INESC-ID at SemEval-2016 task 4-A: Reducing the problem of out-of-embedding words. In SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings (pp. 238–242). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/s16-1036
