HyperT5: Towards Compute-Efficient Korean Language Modeling

Abstract

Pretraining and fine-tuning language models have become the standard practice in industrial natural language processing (NLP), but developing and deploying general-purpose language models without abundant computation or data resources is a real-world issue for smaller organizations and communities whose main focus is languages with less accessible resources (e.g., non-English). This paper explores the sequence-to-sequence (seq2seq) language model architecture as a more practical and compute-efficient alternative to the decoder-oriented approach (e.g., GPT-3), accompanied by novel findings from compute-optimality analyses. We successfully trained billion-scale Korean-language seq2seq language models that strongly outperform other competitive models on Korean benchmarks. Moreover, we demonstrate that such language models can be utilized more efficiently by employing a heavy pre-finetuning strategy, which we showcase with a case study on dialog-task adaptation. The case study shows that adapting language models on more readily available domain-specific unlabeled data greatly improves fine-tuning data efficiency in low-resource settings.
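
For concreteness, the sketch below illustrates the general seq2seq fine-tuning pattern the abstract refers to, written against the Hugging Face transformers API. It is a minimal illustration, not the authors' code: the checkpoint name google/mt5-small, the toy Korean dialog pairs, and the hyperparameters are stand-ins, since the paper's HyperT5 checkpoints and training setup are not listed on this page.

```python
# Hypothetical sketch (not the authors' code): a few fine-tuning steps of a
# seq2seq LM on toy Korean dialog pairs, in the spirit of the pretrain ->
# pre-finetune -> fine-tune pipeline described in the abstract.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/mt5-small"  # stand-in multilingual seq2seq checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Tiny illustrative dialog-style pairs (source utterance -> target response).
pairs = [
    ("안녕하세요, 주문 상태를 알고 싶어요.", "주문 번호를 알려주시면 확인해 드리겠습니다."),
    ("배송이 언제 도착하나요?", "보통 2~3일 내에 도착합니다."),
]

# Encode sources and targets; T5-style models share one vocabulary for both.
inputs = tokenizer([s for s, _ in pairs], return_tensors="pt",
                   padding=True, truncation=True)
labels = tokenizer([t for _, t in pairs], return_tensors="pt",
                   padding=True, truncation=True).input_ids
labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
for step in range(3):  # a few toy optimization steps
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: loss = {outputs.loss.item():.3f}")
```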

Citation (APA)

Park, D., Ka, S., Yoo, K. M., Lee, G., & Kang, J. (2023). HyperT5: Towards Compute-Efficient Korean Language Modeling. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 5, pp. 412–424). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-industry.40
