PhoBERT: Pre-trained language models for Vietnamese


Abstract

We present PhoBERT with two versions—PhoBERT-base and PhoBERT-large—the first public large-scale monolingual language models pre-trained for Vietnamese. Experimental results show that PhoBERT consistently outperforms the recent best pre-trained multilingual model XLM-R (Conneau et al., 2020) and improves the state-of-the-art in multiple Vietnamese-specific NLP tasks including Part-of-speech tagging, Dependency parsing, Named-entity recognition and Natural language inference. We release PhoBERT to facilitate future research and downstream applications for Vietnamese NLP. Our PhoBERT models are available at: https://github.com/VinAIResearch/PhoBERT.
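As a minimal usage sketch, the released checkpoints can be loaded for feature extraction with the Hugging Face transformers library; this assumes the models are mirrored on the Hugging Face Hub under the IDs "vinai/phobert-base" / "vinai/phobert-large", and that the input text has already been word-segmented as PhoBERT expects.

    # Minimal sketch: extract contextualized embeddings with PhoBERT
    # (assumes the checkpoint ID "vinai/phobert-base" on the Hugging Face Hub).
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
    model = AutoModel.from_pretrained("vinai/phobert-base")

    # PhoBERT is trained on word-segmented Vietnamese, so the example sentence
    # is assumed to be pre-segmented (multi-syllable words joined by underscores).
    sentence = "Chúng_tôi là những nghiên_cứu_viên ."

    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Contextualized embeddings per subword token: shape (1, seq_len, hidden_size)
    print(outputs.last_hidden_state.shape)

These embeddings can then be fed to task-specific layers (e.g., for tagging, parsing, NER, or NLI) as downstream applications require.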

Cite

APA

Nguyen, D. Q., & Nguyen, A. T. (2020). PhoBERT: Pre-trained language models for Vietnamese. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 1037–1042). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.92
