Abstract
State-of-the-art transformer models have achieved robust performance on a variety of NLP tasks. Many of these approaches employ domain-agnostic pre-training tasks to train models that yield highly generalized sentence representations, which can then be fine-tuned for specific downstream tasks. We propose refining a pre-trained NLP model using the objective of detecting shuffled tokens. We take a sequential approach, starting from the pre-trained RoBERTa model and further training it with our objective. Applying a random word-level shuffling strategy, we find that our approach enables the RoBERTa model to achieve better performance on 4 out of 7 GLUE tasks. Our results indicate that learning to detect shuffled tokens is a promising approach for learning more coherent sentence representations.
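The abstract describes the objective only at a high level. Below is a minimal, illustrative sketch of shuffled-token detection using PyTorch and HuggingFace Transformers. The 15% shuffle probability, the binary token-classification head, and the helper names `shuffle_words` and `encode` are assumptions made for illustration, not the authors' exact setup or hyperparameters.

```python
import random

import torch
from transformers import RobertaForTokenClassification, RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
# Token-level binary head: label 0 = token in its original position, 1 = shuffled.
model = RobertaForTokenClassification.from_pretrained("roberta-base", num_labels=2)


def shuffle_words(words, shuffle_prob=0.15):
    """Permute a random subset of word positions; label the words that moved."""
    original = list(words)
    shuffled = list(words)
    picked = [i for i in range(len(words)) if random.random() < shuffle_prob]
    targets = list(picked)
    random.shuffle(targets)
    for src, dst in zip(picked, targets):
        shuffled[dst] = original[src]
    labels = [int(new != old) for new, old in zip(shuffled, original)]
    return shuffled, labels


def encode(words, word_labels):
    """Tokenize pre-split words and project word-level labels onto subword tokens."""
    enc = tokenizer(words, is_split_into_words=True,
                    truncation=True, return_tensors="pt")
    token_labels = [-100 if wid is None else word_labels[wid]  # -100: ignore special tokens
                    for wid in enc.word_ids(batch_index=0)]
    enc["labels"] = torch.tensor([token_labels])
    return enc


words, labels = shuffle_words("the quick brown fox jumps over the lazy dog".split())
batch = encode(words, labels)
loss = model(**batch).loss  # per-token cross-entropy: shuffled vs. not shuffled
loss.backward()
```

Framing the task as token classification, so every subword piece inherits its word's shuffled/not-shuffled label, mirrors ELECTRA-style replaced-token detection; whether the authors handle subwords this way is an assumption of this sketch.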
Citation
Panda, S., Agrawal, A., Ha, J., & Bloch, B. (2021). Shuffled-token Detection for Refining Pre-trained RoBERTa. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Student Research Workshop (pp. 88–93). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-srw.12