LZ1904 at SemEval-2021 Task 5: Bi-LSTM-CRF for Toxic Span Detection using Pretrained Word Embedding

Abstract

Recurrent Neural Networks (RNNs) are widely used in Natural Language Processing (NLP) tasks such as text classification, sequence tagging, and machine translation. Long Short-Term Memory (LSTM), a specialized RNN unit, can retain information from earlier in a sentence, and a bidirectional LSTM additionally captures context from later words. For the shared task of detecting toxic spans in text, we first tokenize each sentence and map the tokens to pretrained GloVe word vectors. We then build a Bidirectional Long Short-Term Memory-Conditional Random Field (Bi-LSTM-CRF) model, an architecture introduced by Baidu Research, to predict whether each word in a sentence is toxic. We tune the dropout rate, the number of LSTM units, and the embedding size, training for 10 epochs and selecting the epoch with the best validation recall. Our model achieves an F1 score of 66.99% on the test dataset.
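
The sketch below illustrates the kind of architecture the abstract describes; it is not the authors' released code. It assumes the third-party pytorch-crf package for the CRF layer and a pre-built GloVe embedding matrix; names such as glove_matrix and the hyperparameter defaults are illustrative placeholders.

    # Minimal Bi-LSTM-CRF token tagger sketch (assumptions noted above).
    import torch
    import torch.nn as nn
    from torchcrf import CRF  # pip install pytorch-crf

    class BiLSTMCRFTagger(nn.Module):
        def __init__(self, glove_matrix, hidden_units=128, num_tags=2, dropout=0.5):
            super().__init__()
            # Embedding layer initialized from pretrained GloVe vectors.
            self.embedding = nn.Embedding.from_pretrained(glove_matrix, freeze=False)
            self.dropout = nn.Dropout(dropout)
            # Bidirectional LSTM reads past and future context for every token.
            self.lstm = nn.LSTM(glove_matrix.size(1), hidden_units,
                                batch_first=True, bidirectional=True)
            # Project LSTM states to per-token tag scores (toxic / non-toxic).
            self.hidden2tag = nn.Linear(2 * hidden_units, num_tags)
            self.crf = CRF(num_tags, batch_first=True)

        def emissions(self, token_ids):
            out, _ = self.lstm(self.dropout(self.embedding(token_ids)))
            return self.hidden2tag(out)

        def loss(self, token_ids, tags, mask):
            # Negative log-likelihood of the gold tag sequence under the CRF.
            return -self.crf(self.emissions(token_ids), tags, mask=mask)

        def predict(self, token_ids, mask):
            # Viterbi decoding returns the most likely tag sequence per sentence.
            return self.crf.decode(self.emissions(token_ids), mask=mask)

A training loop over such a model would mirror the procedure in the abstract: sweep the dropout rate, hidden_units, and embedding size, train each configuration for 10 epochs, and keep the checkpoint with the best validation recall.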

Citation (APA)

Zou, L., & Li, W. (2021). LZ1904 at SemEval-2021 Task 5: Bi-LSTM-CRF for Toxic Span Detection using Pretrained Word Embedding. In SemEval 2021 - 15th International Workshop on Semantic Evaluation, Proceedings of the Workshop (pp. 1009–1014). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.semeval-1.138
