ASRS-CMFS vs. RoBERTa: Comparing Two Pre-Trained Language Models to Predict Anomalies in Aviation Occurrence Reports with a Low Volume of In-Domain Data Available

13Citations
Citations of this article
18Readers
Mendeley users who have this article in their library.

Abstract

We consider the problem of solving Natural Language Understanding (NLU) tasks characterized by domain-specific data. An effective approach consists of pre-training Transformer-based language models from scratch using domain-specific data before fine-tuning them on the task at hand. A low domain-specific data volume is problematic in this context, given that the performance of language models relies heavily on the abundance of data during pre-training. To study this problem, we create a benchmark replicating realistic field use of language models to classify aviation occurrences extracted from the Aviation Safety Reporting System (ASRS) corpus. We compare two language models on this new benchmark: ASRS-CMFS, a compact model inspired from RoBERTa, pre-trained from scratch using only little domain-specific data, and the regular RoBERTa model, with no domain-specific pre-training. The RoBERTa model benefits from its size advantage, while the ASRS-CMFS benefits from the pre-training from scratch strategy. We find no compelling statistical evidence that RoBERTa outperforms ASRS-CMFS, but we show that ASRS-CMFS is more compute-efficient than RoBERTa. We suggest that pre-training a compact model from scratch is a good strategy for solving domain-specific NLU tasks using Transformer-based language models in the context of domain-specific data scarcity.

Cite

CITATION STYLE

APA

Kierszbaum, S., Klein, T., & Lapasset, L. (2022). ASRS-CMFS vs. RoBERTa: Comparing Two Pre-Trained Language Models to Predict Anomalies in Aviation Occurrence Reports with a Low Volume of In-Domain Data Available. Aerospace, 9(10). https://doi.org/10.3390/aerospace9100591

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free