Analysis of EU languages through text compression

Kimmo Kettunen; Markus Sadeniemi; Tiina Lindh-Knuutila; Timo Honkela

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006) 4139 LNAI 99-109

DOI: 10.1007/11816508_12

Abstract

In this article, we study the differences between European languages using statistical and unsupervised methods. The analysis is conducted at different levels of language: lexical, morphological and syntactic. Our premise is that the difficulty of translation can be perceived as differences or similarities at these different levels of language. The results are compared to linguistic groupings. The analyses in this paper are based on the concept of Kolmogorov complexity, which is used to compare the structure of languages at the syntactic and morphological levels. The way the languages convey information at these levels is taken as a measure of similarity or dissimilarity between languages, and the results are compared to classical linguistic classification. The results will serve as a tool in developing machine translation system(s), e.g., in the following way: if the source language conveys more information at the morphological level and the target language more at the syntactic level, the (machine) translator must be able to transfer the information from one level to another. © Springer-Verlag Berlin Heidelberg 2006.
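Kolmogorov complexity is uncomputable, so in practice it is approximated by the size of a text after compression, which gives a computable upper-bound proxy. The abstract does not spell out the exact procedure, so the following is only a minimal sketch under an assumed distortion-based approach: the text is deliberately damaged at one level (word-internal structure, or word sequence) and the compressed sizes before and after are compared. The function names (`compressed_size`, `morphological_distortion`, `syntactic_distortion`, `complexity_ratios`), the 10% deletion rate and the choice of zlib are all hypothetical choices for illustration, not the authors' method.

```python
import random
import zlib


def compressed_size(text: str) -> int:
    """Proxy for Kolmogorov complexity: byte length after zlib compression (level 9)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def morphological_distortion(text: str, seed: int = 0) -> str:
    """Hypothetical distortion: map every word type to an arbitrary unique token.

    The token sequence (which word occurs where) is preserved, but any
    word-internal (morphological) regularity such as shared stems or
    affixes is destroyed.
    """
    rng = random.Random(seed)
    words = text.split()
    types = sorted(set(words))
    codes = list(range(len(types)))
    rng.shuffle(codes)
    mapping = {w: f"w{c}" for w, c in zip(types, codes)}
    return " ".join(mapping[w] for w in words)


def syntactic_distortion(text: str, drop_rate: float = 0.1, seed: int = 0) -> str:
    """Hypothetical distortion: randomly delete a fraction of word tokens.

    Word forms stay intact, but word-order regularities are disturbed.
    """
    rng = random.Random(seed)
    return " ".join(w for w in text.split() if rng.random() >= drop_rate)


def complexity_ratios(text: str) -> tuple[float, float]:
    """Compressed size after each distortion divided by the original compressed size.

    A ratio above 1 means the distortion made the text harder to compress.
    """
    base = compressed_size(text)
    morph = compressed_size(morphological_distortion(text)) / base
    syn = compressed_size(syntactic_distortion(text)) / base
    return morph, syn
```

Applied to parallel versions of the same text in several languages, such ratios could be compared across languages; how they should be interpreted as "information carried at the morphological or syntactic level" is a modelling decision this sketch does not settle. Note that zlib's DEFLATE format uses a 32 KB sliding window, so for long texts a compressor with a larger context (for example the standard-library `bz2` or `lzma` modules) may be a better proxy.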

Citation (APA)

Kettunen, K., Sadeniemi, M., Lindh-Knuutila, T., & Honkela, T. (2006). Analysis of EU languages through text compression. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4139 LNAI, pp. 99–109). Springer Verlag. https://doi.org/10.1007/11816508_12