Tw-StAR at SemEval-2019 task 5: N-gram embeddings for hate speech detection in multilingual tweets

Hala Mulki; Chedi Bechikh Ali; Hatem Haddad; Ismail Babaoğlu

Conference Proceedings

NAACL HLT 2019 - International Workshop on Semantic Evaluation, SemEval 2019, Proceedings of the 13th Workshop (2019) 503-507

DOI: 10.18653/v1/s19-2090

Abstract

In this paper, we describe our contribution to SemEval-2019: subtask A of task 5, “Multilingual detection of hate speech against immigrants and women in Twitter (HatEval)”. We developed two hate speech detection model variants through the Tw-StAR framework. While the first model used one-hot encoded n-grams to train a Naive Bayes (NB) classifier, the second generated and learned n-gram embeddings within a feedforward neural network. For both models, specific terms, selected via MWT patterns, were tagged in the input data. Employing two feature types let us investigate whether n-gram embeddings can rival one-hot n-grams. Our results showed that for English, n-gram embeddings outperformed one-hot n-grams, whereas representing Spanish tweets with one-hot n-grams performed slightly better than n-gram embeddings. In the official ranking, Tw-StAR placed 9th for English and 20th for Spanish.
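To make the first model variant more concrete, the following is a minimal sketch of Naive Bayes trained on one-hot (presence/absence) n-gram features. It is not the Tw-StAR implementation: the class name `OneHotNgramNB`, the helper `ngrams`, the choice of unigrams plus bigrams, the add-one smoothing, the whitespace tokenization, and the toy training texts are all assumptions made for illustration. The labels follow the binary convention 1 = hateful, 0 = not hateful.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) present in a token list.

    Using a set makes each feature binary: an n-gram is either present or not.
    """
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


class OneHotNgramNB:
    """Hypothetical Naive Bayes classifier over binary n-gram features."""

    def fit(self, texts, labels, n_max=2):
        self.n_max = n_max
        self.class_counts = Counter(labels)         # documents per class
        self.feature_counts = defaultdict(Counter)  # n-gram presence counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = ngrams(text.lower().split(), n_max)
            self.feature_counts[label].update(grams)
            self.vocab |= grams
        return self

    def predict(self, text):
        # N-grams never seen in training are ignored.
        grams = ngrams(text.lower().split(), self.n_max) & self.vocab
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total_docs)    # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for gram in grams:
                # Add-one (Laplace) smoothed log likelihood of the n-gram.
                score += math.log((self.feature_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented purely for illustration.
texts = [
    "they should all leave now",
    "go back where you came from",
    "welcome to our city",
    "happy to have new neighbours",
]
labels = [1, 1, 0, 0]

model = OneHotNgramNB().fit(texts, labels)
print(model.predict("they should leave"))
print(model.predict("welcome to our new city"))
```

The second variant described in the abstract would replace these sparse binary features with dense n-gram vectors learned inside a feedforward network; that part is not sketched here because the abstract gives no architectural details.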

Cite

Mulki, H., Ali, C. B., Haddad, H., & Babaoğlu, I. (2019). Tw-StAR at SemEval-2019 task 5: N-gram embeddings for hate speech detection in multilingual tweets. In NAACL HLT 2019 - International Workshop on Semantic Evaluation, SemEval 2019, Proceedings of the 13th Workshop (pp. 503–507). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/s19-2090
