Automatic identification of legal terms in Czech law texts

Karel Pala; Pavel Rychlý; Pavel Šmerk

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010) 6036 LNAI 83-94

DOI: 10.1007/978-3-642-12837-0_5

Abstract

Law texts, including the constitution, acts, public notices, and court judgements, form a huge database of texts. As with many texts from narrow domains, the sublanguage they use is partially restricted and differs from general Czech. As a starting collection of data, we chose the legal database Lexis, which contains approx. 50,000 Czech law documents. We concentrate mostly on noun groups, which are the main candidates for law terms, and were able to recognize 3992 distinct noun groups in the selected text samples. The paper also presents results of morphological analysis, lemmatization, tagging, disambiguation, and basic syntactic analysis of Czech law texts, as these tasks are crucial for any more sophisticated natural language processing. Verbs in legal texts have also been explored in a preliminary way. In this respect, we explore how linguistic analysis can help identify the semantic nature of law terms. © 2010 Springer-Verlag Berlin Heidelberg.
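
To make the idea of collecting noun groups as term candidates concrete, the following is a minimal sketch and not the authors' method. It assumes the text has already been lemmatized and tagged, and it uses a hypothetical simplified tagset ("A" for adjective, "N" for noun, anything else for other words) with a toy pattern: optional adjectives followed by one or more nouns. The function name `noun_groups` and the sample data are illustrative only; real Czech noun groups also involve agreement and genitive modifiers, which this sketch ignores.

```python
from collections import Counter


def noun_groups(tokens):
    """Return lemma tuples for runs of adjectives followed by nouns.

    `tokens` is a list of (word, lemma, tag) triples; the one-letter
    tags "A" and "N" are an assumed simplification, not a real tagset.
    """
    groups = []
    current = []
    seen_noun = False
    for _word, lemma, tag in tokens:
        if tag == "N":
            current.append(lemma)
            seen_noun = True
        elif tag == "A" and not seen_noun:
            current.append(lemma)
        else:
            # The run ends here; keep it only if it contained a noun.
            if seen_noun:
                groups.append(tuple(current))
            # An adjective right after a noun starts a new candidate.
            current = [lemma] if tag == "A" else []
            seen_noun = False
    if seen_noun:
        groups.append(tuple(current))
    return groups


# Hypothetical, hand-tagged sample; lemma sequences serve only as keys.
sample = [
    ("trestného", "trestný", "A"), ("činu", "čin", "N"),
    ("je", "být", "V"),
    ("právnická", "právnický", "A"), ("osoba", "osoba", "N"),
    ("a", "a", "J"),
    ("trestný", "trestný", "A"), ("čin", "čin", "N"),
]

counts = Counter(noun_groups(sample))
distinct = len(counts)  # number of different noun groups found
```

Keying the counter on lemmas rather than surface forms means that inflected variants of the same group are counted once, which is why lemmatization and tagging come first in such a pipeline.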

Pala, K., Rychlý, P., & Šmerk, P. (2010). Automatic identification of legal terms in Czech law texts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6036 LNAI, pp. 83–94). https://doi.org/10.1007/978-3-642-12837-0_5
