Constructing and analyzing domain-specific language model for financial text mining

Abstract

The application of natural language processing (NLP) to the financial domain is advancing as the number of available financial documents grows. Transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT) have been highly successful in NLP in recent years. These cutting-edge models have been adapted to the financial domain both by further pre-training existing general-domain models on financial corpora and by pre-training on financial corpora from scratch. In Japanese, by contrast, financial terminology cannot be handled with a general-domain vocabulary without additional processing. In this study, we construct language models suited to the financial domain. Furthermore, we compare methods for adapting language models to the financial domain, such as pre-training methods and vocabulary adaptation. We confirm that adapting the pre-training corpus and the tokenizer vocabulary to financial text is effective in several downstream financial tasks. We observe no significant difference between pre-training on the financial corpus from scratch and continued pre-training of a general language model on the financial corpus. We have released our source code and pre-trained models.
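The two adaptation steps the abstract compares can be illustrated concretely. The sketch below (not the authors' released code) shows, under assumed file names and settings, (1) training a domain-specific WordPiece vocabulary on a financial corpus and (2) continued masked-language-model pre-training from an existing general-domain Japanese BERT checkpoint using Hugging Face libraries; "financial_corpus.txt", the vocabulary size, and the training hyperparameters are placeholders.

```python
# Minimal sketch of domain adaptation for a Japanese BERT model.
# Assumptions: "financial_corpus.txt" holds one sentence per line;
# hyperparameters are illustrative, not the paper's settings.
from tokenizers import BertWordPieceTokenizer
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# --- (1) Vocabulary adaptation: train WordPiece on financial text ---
wp = BertWordPieceTokenizer(lowercase=False)
wp.train(files=["financial_corpus.txt"], vocab_size=32000)
wp.save_model(".")  # writes vocab.txt for later use

# --- (2) Continued pre-training from a general-domain checkpoint ---
# "cl-tohoku/bert-base-japanese" is a public general-domain Japanese BERT.
# Swapping in the financial vocabulary would also require resizing and
# re-initializing the embedding matrix, which is omitted here for brevity.
model_name = "cl-tohoku/bert-base-japanese"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

dataset = load_dataset("text", data_files={"train": "financial_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True,
                                 remove_columns=["text"])

# Standard BERT masked-language-model objective (15% masking).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-ja-finance",
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

Pre-training from scratch would follow the same MLM loop but start from a randomly initialized `BertForMaskedLM` configured with the financial vocabulary instead of a loaded checkpoint.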

Citation (APA)

Suzuki, M., Sakaji, H., Hirano, M., & Izumi, K. (2023). Constructing and analyzing domain-specific language model for financial text mining. Information Processing and Management, 60(2). https://doi.org/10.1016/j.ipm.2022.103194
