Abstract
Pre-trained language models (LMs) have led to significant performance gains in various natural language processing (NLP) applications thanks to their strong literacy, e.g., the ability to capture word dependencies. However, existing pre-trained LMs largely ignore numeracy: they treat numbers in text as plain words, without understanding basic numerical concepts. This weak numeracy has become a barrier to applying pre-trained LMs to NLP tasks over number-intensive financial documents such as annual filings and analyst reports, even as the understanding and analysis of such documents grows increasingly important. To bridge this gap, this work explores the central theme of numerical pre-training to endow LMs with numeracy. In particular, we propose two numerical pre-training methods whose objectives encourage the LM to understand the magnitude and value of numbers and to encode the dependency between a number and its context. Applying the proposed methods to BERT, we pre-train two LMs, named BERT-M and BERT-V. Moreover, we construct four datasets of financial documents for evaluating the numeracy of pre-trained LMs, focusing on three fundamental perspectives of numeracy: a) number embedding; b) number-text composition; and c) number-number composition. Extensive experiments on these datasets validate the effectiveness of BERT-M and BERT-V, which outperform the state-of-the-art LM for financial documents (FinBERT) by 4.83% and 4.34% on average, respectively. Furthermore, their aggregation, named BERT-MV, increases the gain to 10.88%.
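As a loose illustration of the magnitude-oriented objective the abstract describes, the sketch below derives order-of-magnitude bucket labels for numbers found in a financial sentence; a classification head on the LM could then be trained to predict these buckets for masked numbers. This is a hypothetical sketch under our own assumptions — the helper names (`magnitude_label`, `number_targets`) and the bucketing scheme are illustrative, not taken from the paper.

```python
import math
import re

def magnitude_label(value: float, n_buckets: int = 10) -> int:
    """Map a number to an order-of-magnitude bucket, clamped to [0, n_buckets - 1].

    Illustrative scheme (not the paper's): bucket = floor(log10(|value|)) + 1,
    so values below 1 fall near bucket 0 and larger values climb the scale.
    """
    if value == 0:
        return 0
    exponent = math.floor(math.log10(abs(value)))
    return max(0, min(n_buckets - 1, exponent + 1))

def number_targets(text: str):
    """Extract (number, magnitude-bucket) pairs from a sentence.

    These pairs could serve as auxiliary pre-training targets: mask each
    number in the input and ask the LM to predict its magnitude bucket.
    """
    numbers = [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", text)]
    return [(v, magnitude_label(v)) for v in numbers]

print(number_targets("Revenue grew 12.5 percent to 4300 million dollars."))
# → [(12.5, 2), (4300.0, 4)]
```

In this toy scheme, predicting the bucket rather than the exact value forces the model to encode coarse magnitude, which is one plausible reading of what the BERT-M objective targets.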
Citation
Feng, F., Rui, X., Wang, W., Cao, Y., & Chua, T. S. (2021). Pre-training and evaluation of numeracy-oriented language model. In ICAIF 2021 - 2nd ACM International Conference on AI in Finance. Association for Computing Machinery, Inc. https://doi.org/10.1145/3490354.3494412