Bootstrapping-Based extraction of dictionary terms from unsegmented legal text

Masato Hagiwara; Yasuhiro Ogawa; Katsuhiko Toyama

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009) 5447 LNAI 213-227

DOI: 10.1007/978-3-642-00609-8_19

Abstract

Recent demands for translating Japanese statutes into foreign languages necessitate the compilation of standard bilingual dictionaries. To support this costly task, we propose Monaka, a bootstrapping-based lexical knowledge extraction algorithm that automatically extracts dictionary term candidates from unsegmented Japanese legal text. Like the Tchai algorithm on which it is based, Monaka extracts reliable patterns and instances iteratively; unlike Tchai, however, it uses character n-grams as contextual patterns and introduces a special constraint to ensure that the extracted terms are properly segmented. The experimental results show that Monaka extracts correctly segmented and important dictionary terms with higher accuracy than conventional methods. © Springer-Verlag Berlin Heidelberg 2009.
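The following is a minimal sketch of the general bootstrapping loop the abstract describes, written under stated assumptions; it is not Monaka or Tchai. Contextual patterns are assumed to be pairs of left and right character n-grams surrounding known terms. A pattern's reliability is approximated by the share of its extractions that are already-known terms (a simple precision heuristic swapped in for Tchai's actual reliability measure), and new candidates are ranked by how many selected patterns extract them. Monaka's segmentation constraint is not modeled. The function names `contexts`, `extract`, and `bootstrap`, their parameters, and the toy corpus are all hypothetical.

```python
import re
from collections import Counter


def contexts(corpus, term, n):
    """Yield (left, right) character n-gram pairs around each occurrence of term."""
    for text in corpus:
        start = text.find(term)
        while start != -1:
            end = start + len(term)
            # Skip occurrences too close to the string edge to have full n-gram contexts.
            if start >= n and end + n <= len(text):
                yield (text[start - n:start], text[end:end + n])
            start = text.find(term, start + 1)


def extract(corpus, pattern, max_len):
    """Count substrings of 1..max_len characters framed by the pattern's contexts."""
    left, right = pattern
    # Non-greedy: for each left-context match, the shortest span followed by the right context is taken.
    regex = re.compile(re.escape(left) + ("(.{1,%d}?)" % max_len) + re.escape(right))
    found = Counter()
    for text in corpus:
        for match in regex.finditer(text):
            found[match.group(1)] += 1
    return found


def bootstrap(corpus, seeds, n=2, max_len=6, iterations=2, top_patterns=3, top_instances=3):
    """Alternate between selecting context patterns and promoting new term candidates."""
    instances = set(seeds)
    patterns = set()
    for _ in range(iterations):
        # 1. Candidate patterns: contexts observed around currently known terms.
        candidates = Counter()
        for term in instances:
            candidates.update(contexts(corpus, term, n))
        # 2. Hypothetical reliability: share of a pattern's extractions that are known terms.
        scored = []
        for pattern in candidates:
            found = extract(corpus, pattern, max_len)
            total = sum(found.values())
            known = sum(count for term, count in found.items() if term in instances)
            scored.append((known / total if total else 0.0, pattern))
        scored.sort(key=lambda item: (-item[0], item[1]))
        patterns |= {pattern for _, pattern in scored[:top_patterns]}
        # 3. Rank unseen substrings by how many selected patterns extract them.
        support = Counter()
        for pattern in patterns:
            for term in extract(corpus, pattern, max_len):
                if term not in instances:
                    support[term] += 1
        ranked = sorted(support.items(), key=lambda item: (-item[1], item[0]))
        instances |= {term for term, _ in ranked[:top_instances]}
    return instances, patterns


# Hypothetical toy corpus of unsegmented sentences.
corpus = [
    "当該契約の効力は",
    "当該債務の効力は",
    "当該登記の効力は",
    "本法の規定により",
]
instances, patterns = bootstrap(corpus, {"契約"}, n=2)
print(sorted(instances), sorted(patterns))
```

Because each candidate is delimited by a left and a right character context, no word segmenter is needed: in the toy corpus, the shared context pair (当該, の効) carries the seed 契約 over to 債務 and 登記, while the unrelated last sentence contributes nothing.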

Citation (APA)

Hagiwara, M., Ogawa, Y., & Toyama, K. (2009). Bootstrapping-Based extraction of dictionary terms from unsegmented legal text. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5447 LNAI, pp. 213–227). https://doi.org/10.1007/978-3-642-00609-8_19