Social media have grown rapidly in recent years, changing many aspects of human life, especially how people communicate and access information. Alongside these important benefits, however, social media have raised a number of significant challenges since their introduction. The spread of fake news and hate speech is among the most pressing of these issues and has attracted considerable attention from researchers in recent years. Various models based on natural language processing have been developed to combat these phenomena and stop them at an early stage, before mass spreading. Given the difficulty of automated harmful information detection (i.e., fake news and hate speech detection), every single step of the detection process can have a noticeable impact on model performance. In this paper, we study the importance of word embedding for the overall performance of deep neural network architectures on the detection of fake news and hate speech on social media. We test various approaches for converting raw input text into vectors, from random weighting to state-of-the-art contextual word embedding models. In addition to comparing different word embedding approaches, we also analyze different strategies for extracting vectors from contextual word embedding models (i.e., taking the weights from the last layer versus averaging the weights of the last layers). Our results show that XLNet embedding outperforms the other embedding approaches on both tasks related to harmful information identification.
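The two extraction strategies mentioned above can be sketched as follows. This is a minimal, model-free illustration using NumPy: the per-layer hidden states are simulated with random arrays standing in for a contextual model's output, and the layer count, token count, and vector dimension are arbitrary assumptions, not values from the paper.

```python
import numpy as np

def embed_last_layer(hidden_states):
    """Strategy 1: use the token vectors from the final layer only."""
    return hidden_states[-1]

def embed_avg_last_layers(hidden_states, k=4):
    """Strategy 2: average the token vectors of the final k layers."""
    return np.mean(hidden_states[-k:], axis=0)

# Toy stand-in for a contextual model's output:
# 12 layers, each producing a (5 tokens x 8 dims) array.
rng = np.random.default_rng(0)
states = [rng.normal(size=(5, 8)) for _ in range(12)]

last = embed_last_layer(states)       # shape (5, 8)
avg4 = embed_avg_last_layers(states)  # shape (5, 8)
```

With a real contextual model, `states` would come from the model's per-layer hidden-state outputs; the pooling logic is the same.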
Mohtaj, S., & Möller, S. (2022). On the Importance of Word Embedding in Automated Harmful Information Detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13502 LNAI, pp. 251–262). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-16270-1_21