“Why do I feel offended?” Korean Dataset for Offensive Language Identification

Abstract

Offensive content is an unavoidable issue on social media. Most existing offensive language identification methods rely on the compilation of labeled datasets. However, existing methods rarely consider low-resource languages that have relatively little data available for training (e.g., Korean). To address these issues, we construct a novel KOrean Dataset for Offensive Language Identification (KODOLI). KODOLI comprises more fine-grained offensiveness categories (i.e., not offensive, likely offensive, and offensive) than existing datasets. The likely offensive category covers texts with implicit offensiveness or abusive language used without offensive intent. In addition, we propose two auxiliary tasks to help identify offensive language: abusive language detection and sentiment analysis. We provide experimental results for baselines on KODOLI and observe that pre-trained language models struggle to identify "LIKELY" offensive statements. Quantitative results and qualitative analysis demonstrate that jointly learning offensive language, abusive language, and sentiment information improves the performance of offensive language identification.
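The abstract describes a multi-task setup in which offensive language identification is learned jointly with abusive language detection and sentiment analysis. The sketch below is not the authors' released code; it only illustrates one common way to realize such joint learning: a shared pre-trained encoder with one classification head per task and a summed cross-entropy loss. The encoder checkpoint ("klue/bert-base"), the auxiliary label sizes, and the equal loss weighting are assumptions for illustration.

```python
# Minimal sketch of a shared-encoder multi-task classifier (assumed design,
# not the paper's exact architecture or hyperparameters).
import torch.nn as nn
from transformers import AutoModel

class MultiTaskOffensiveModel(nn.Module):
    def __init__(self, encoder_name="klue/bert-base",
                 n_offensive=3, n_abusive=2, n_sentiment=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One linear head per task on top of the shared [CLS] representation:
        # offensive (not offensive / likely offensive / offensive),
        # abusive language detection, and sentiment analysis.
        self.offensive_head = nn.Linear(hidden, n_offensive)
        self.abusive_head = nn.Linear(hidden, n_abusive)
        self.sentiment_head = nn.Linear(hidden, n_sentiment)

    def forward(self, input_ids, attention_mask):
        cls = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        return (self.offensive_head(cls),
                self.abusive_head(cls),
                self.sentiment_head(cls))

def joint_loss(logits, labels):
    # Sum of per-task cross-entropy losses; equal weights are assumed here.
    ce = nn.CrossEntropyLoss()
    off_logits, abu_logits, sent_logits = logits
    off_y, abu_y, sent_y = labels
    return ce(off_logits, off_y) + ce(abu_logits, abu_y) + ce(sent_logits, sent_y)
```

In this kind of setup, the auxiliary heads only shape the shared encoder during training; at inference time, predictions come from the offensive-language head alone.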

Citation (APA)

Park, S. H., Kim, K. M., Lee, O. J., Kang, Y., Lee, J., Lee, S. M., & Lee, S. K. (2023). “Why do I feel offended?” Korean Dataset for Offensive Language Identification. In EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023 (pp. 1112–1123). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-eacl.85
