Pre-trained Transformers are challenging human performance in many NLP tasks. The massive datasets used for pre-training appear to be key to their success on existing tasks. In this paper, we explore how a range of pre-trained Natural Language Understanding models perform on genuinely unseen sentences drawn from classification tasks over a DarkNet corpus. Surprisingly, the results show that syntactic and lexical neural networks perform on par with pre-trained Transformers even after fine-tuning. Only after what we call extreme domain adaptation, that is, retraining with the masked language model objective on the whole novel corpus, do pre-trained Transformers reach their usual high results. This suggests that huge pre-training corpora may give Transformers an unexpected advantage, since they have already been exposed to many of the possible sentences.
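The following is a minimal sketch of the "extreme domain adaptation" step described above: continuing masked-language-model pre-training of a Transformer on the new domain corpus before any downstream fine-tuning. The checkpoint name, corpus file path, and hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
# Sketch: continued MLM pre-training (domain adaptation) with Hugging Face Transformers.
# Assumptions: bert-base-uncased as the base checkpoint and a plain-text file
# "darknet_corpus.txt" holding the domain corpus, one sentence per line.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_name = "bert-base-uncased"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Load the raw domain corpus; the file path is hypothetical.
corpus = load_dataset("text", data_files={"train": "darknet_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking for the MLM objective (15% of tokens masked per batch).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="mlm-domain-adapted",   # assumed output directory
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```

The adapted checkpoint saved in `mlm-domain-adapted` would then be fine-tuned on the downstream classification task in the usual way.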
CITATION STYLE
Ranaldi, L., Nourbakhsh, A., Patrizi, A., Ruzzetti, E. S., Onorati, D., Mastromattei, M., … Zanzotto, F. M. (2023). The Dark Side of the Language: Pre-trained Transformers in the DarkNet. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 949–960). Incoma Ltd. https://doi.org/10.26615/978-954-452-092-2_102