AstBERT: Enabling Language Model for Financial Code Understanding with Abstract Syntax Trees


Abstract

Using pre-trained language models to understand source code has attracted increasing attention from financial institutions owing to its great potential for uncovering financial risks. However, applying these language models directly to programming-language-related problems poses several challenges. For instance, the domain shift between natural language (NL) and programming language (PL) requires understanding the semantic and syntactic information in the data from different perspectives. To this end, we propose AstBERT, a pre-trained PL model that aims to better understand financial code using the abstract syntax tree (AST). Specifically, we collect a large amount of source code (both Java and Python) from the Alipay code repository and incorporate both syntactic and semantic code knowledge into our model with the help of code parsers, through which the AST information of the source code is interpreted and integrated. We evaluate the proposed model on three tasks: code question answering, code clone detection, and code refinement. Experimental results show that AstBERT achieves promising performance on all three downstream tasks.
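The abstract states that AST information is obtained through code parsers but does not specify the parser or how the tree is encoded for the model. As a hedged illustration only, the sketch below uses Python's standard-library `ast` module as a stand-in parser and shows two simple ways to expose syntactic structure to a sequence model: a depth-first sequence of AST node types, and each identifier paired with the type of its parent node. The function names `ast_node_sequence` and `identifier_contexts` are hypothetical, this is not the AstBERT method, and Java sources (also used in the paper) would need a different parser.

```python
import ast


def ast_node_sequence(source: str) -> list[str]:
    """Hypothetical helper: AST node types of `source` in depth-first pre-order."""
    tree = ast.parse(source)
    sequence: list[str] = []

    def visit(node: ast.AST) -> None:
        sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            # Skip Load/Store/Del context markers, which carry little structure.
            if isinstance(child, ast.expr_context):
                continue
            visit(child)

    visit(tree)
    return sequence


def identifier_contexts(source: str) -> list[tuple[str, str]]:
    """Hypothetical helper: pair each identifier (ast.Name) with its parent node type."""
    tree = ast.parse(source)
    pairs: list[tuple[str, str]] = []
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            if isinstance(child, ast.Name):
                pairs.append((child.id, type(parent).__name__))
    return pairs


if __name__ == "__main__":
    snippet = "def risk(amount):\n    return amount * 0.1\n"
    print(ast_node_sequence(snippet))
    print(identifier_contexts(snippet))
```

For the toy snippet, the node sequence runs from `Module` through `FunctionDef`, `Return`, and `BinOp` down to the `Name` and `Constant` leaves, so the same token `amount` can be distinguished by its syntactic role, which plain token sequences do not encode.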

Citation (APA)

Liang, R., Zhang, T., Lu, Y., Liu, Y., Huang, Z., & Chen, X. (2022). AstBERT: Enabling Language Model for Financial Code Understanding with Abstract Syntax Trees. In FinNLP 2022 - 4th Workshop on Financial Technology and Natural Language Processing, Proceedings of the Workshop (pp. 10–17). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.finnlp-1.2
