Japanese mistakable legal term correction using infrequency-aware BERT classifier

Abstract

We propose a method that assists legislative drafters by locating inappropriate legal terms in Japanese statutory sentences and suggesting corrections. We focus on sets of mistakable legal terms whose usage is defined in legislation drafting rules. Our method predicts the suitable legal term using a classifier based on BERT (Bidirectional Encoder Representations from Transformers). Because BERT is pretrained on a very large number of whole sentences, it carries abundant linguistic knowledge. Classifiers for predicting legal terms suffer from infrequency at two levels: term-level infrequency and set-level infrequency. The former causes a class imbalance problem and the latter causes an underfitting problem, and both degrade classification performance. To overcome these problems, we apply three techniques: preliminary domain adaptation, repetitive soft undersampling, and classifier unification. Preliminary domain adaptation improves overall performance by providing prior knowledge of statutory sentences, repetitive soft undersampling addresses term-level infrequency, and classifier unification addresses set-level infrequency while reducing storage consumption. Our experiments show that our classifier outperforms conventional classifiers based on Random Forest or language models, and that each of the three training techniques improves performance.
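The abstract does not spell out how repetitive soft undersampling works, so the following is only a minimal illustration of the general idea behind repeated undersampling for class imbalance, not the authors' algorithm. It assumes a hard per-label cap that is redrawn in every round, so the frequent term never dominates a round while different frequent examples are still seen over time. The function names, the `cap` and `rounds` parameters, and the toy labels are all hypothetical.

```python
import random
from collections import defaultdict


def undersample_once(examples, cap, rng):
    """Keep at most `cap` randomly chosen examples per label."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    sampled = []
    for items in by_label.values():
        # Frequent labels are cut down to the cap; rare labels are kept whole.
        if len(items) > cap:
            items = rng.sample(items, cap)
        sampled.extend(items)
    rng.shuffle(sampled)
    return sampled


def repeated_undersampling(examples, cap, rounds, seed=0):
    """Yield one freshly undersampled training set per round."""
    rng = random.Random(seed)
    for _ in range(rounds):
        yield undersample_once(examples, cap, rng)


# Toy imbalanced data: one frequent legal term and one rare one (hypothetical labels).
examples = [(f"sentence {i}", "frequent_term") for i in range(100)]
examples += [(f"rare sentence {i}", "rare_term") for i in range(5)]

training_sets = list(repeated_undersampling(examples, cap=10, rounds=3))
```

In this sketch each round would feed one element of `training_sets` to a training epoch. What makes the paper's variant "soft" is not stated in the abstract and is not modeled here.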

Cite

Yamakoshi, T., Komamizu, T., Ogawa, Y., & Toyama, K. (2020). Japanese mistakable legal term correction using infrequency-aware BERT classifier. Transactions of the Japanese Society for Artificial Intelligence, 35(4), 1–17. https://doi.org/10.1527/tjsai.E-K25
