SuryaKiran at PragTag 2023 - Benchmarking Domain Adaptation using Masked Language Modeling in Natural Language Processing For Specialized Data

Kunal Suri; Prakhar Mishra; Albert Nanda

Conference Proceedings

EMNLP 2023 - 10th Workshop on Argument Mining, ArgMining 2023 - Proceedings (2023) 218-222

DOI: 10.18653/v1/2023.argmining-1.26

Abstract

Most transformer models are trained on English-language corpora that contain text from sources such as Wikipedia and Reddit. Although these models are used in many specialized domains, such as scientific peer review, law, and healthcare, their performance there is subpar because their training data lacks the information present in domain-specific text. One way to improve performance on a specialized domain is to collect labeled data from that domain and fine-tune the transformer model of choice on it. This approach works, but collecting a large amount of labeled data requires significant manual effort. Another way is to first pre-train the transformer model on unlabeled domain-specific data and then fine-tune it on labeled data. We evaluate how transformer models perform when fine-tuned on labeled data after initial pre-training on unlabeled data, and compare them with a transformer model fine-tuned on labeled data without that pre-training step. We perform this comparison on a dataset of scientific peer reviews provided by the organizers of the PragTag-2023 Shared Task. We observe that a transformer model pre-trained on unlabeled data using Masked Language Modelling and then fine-tuned on labeled data outperforms a transformer model fine-tuned on labeled data alone.
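To make the pre-training step concrete, the following is a minimal sketch of how a single Masked Language Modelling example can be built from unlabeled text. The abstract does not state the masking scheme used, so the BERT-style settings here (15% selection rate, 80/10/10 replacement split) are assumptions, as are the function name `mask_tokens`, the `[MASK]` symbol, the whitespace tokenization, and the sample sentence.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed mask symbol, as in BERT-style vocabularies
IGNORE = None          # label for positions that do not contribute to the loss


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (inputs, labels) for one masked-language-modelling example.

    Each token is selected with probability `mask_prob`. A selected token is
    replaced by MASK_TOKEN 80% of the time, by a random vocabulary token 10%
    of the time, and left unchanged 10% of the time. Labels hold the original
    token at selected positions and IGNORE everywhere else.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for token in tokens:
        if rng.random() >= mask_prob:
            # Not selected: the model sees the token and is not scored on it.
            inputs.append(token)
            labels.append(IGNORE)
            continue
        # Selected: the model must predict the original token here.
        labels.append(token)
        roll = rng.random()
        if roll < 0.8:
            inputs.append(MASK_TOKEN)
        elif roll < 0.9:
            inputs.append(rng.choice(vocab))
        else:
            inputs.append(token)
    return inputs, labels


# Hypothetical unlabeled peer-review sentence, whitespace-tokenized.
review = "the evaluation lacks a comparison with strong baselines".split()
vocab = sorted(set(review))
inputs, labels = mask_tokens(review, vocab, mask_prob=0.15, seed=0)
```

No labels from the downstream task are needed at this stage: the training signal comes from the text itself, which is why unlabeled domain data is sufficient for this step. The model adapted in this way would then be fine-tuned on the smaller labeled set.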

Cite

Suri, K., Mishra, P., & Nanda, A. (2023). SuryaKiran at PragTag 2023 - Benchmarking Domain Adaptation using Masked Language Modeling in Natural Language Processing For Specialized Data. In EMNLP 2023 - 10th Workshop on Argument Mining, ArgMining 2023 - Proceedings (pp. 218–222). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.argmining-1.26
