The paper proposes a new method for characterizing and distinguishing between closely related languages, using Serbian and Croatian as an example. In the first step, the method transforms text in different languages into a uniformly coded text. The coding is carried out according to the position of each script character within the text line and its height. The coded text, treated as a 1-D image, is then subjected to texture analysis. From this analysis, a feature vector of 28 elements is established; these elements are extracted from co-occurrence texture analysis and adjacent local binary pattern analysis. The feature vector is the starting point for classification by an extension of a state-of-the-art method called GA-ICDA. As a result, the closely related languages are correctly distinguished. The method is tested on a database of documents in Serbian and Croatian. The experiments give promising results.
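The coding and texture steps described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation. The four script-type classes (base, ascender, descender, full letter) and the letter-to-class mapping are hypothetical and cover only a partial Latin alphabet. The helper names `code_text`, `cooccurrence_features`, `lbp_1d_histogram` and `feature_vector` are also hypothetical. A simplified two-neighbour 1-D local binary pattern stands in for the paper's adjacent local binary pattern, so the sketch yields 20 features rather than the paper's 28. The GA-ICDA classification step is not shown.

```python
import numpy as np

# Hypothetical script-type classes, by a glyph's extent relative to the text line.
BASE, ASCENDER, DESCENDER, FULL = 0, 1, 2, 3
N_CODES = 4

# Hypothetical, partial mapping for Latin letters (an assumption, not the paper's table).
ASCENDER_CHARS = set("bdfhiklt" + "ćčšžđ")  # stroke, dot or diacritic above x-height
DESCENDER_CHARS = set("gpqy")               # stroke below the baseline
FULL_CHARS = set("j")                       # extends both above and below

def code_text(text: str) -> np.ndarray:
    """Map each letter to a script-type code; non-letters are skipped."""
    codes = []
    for ch in text:
        if not ch.isalpha():
            continue
        if ch.isupper() or ch in ASCENDER_CHARS:
            codes.append(ASCENDER)
        elif ch in DESCENDER_CHARS:
            codes.append(DESCENDER)
        elif ch in FULL_CHARS:
            codes.append(FULL)
        else:
            codes.append(BASE)
    return np.array(codes, dtype=np.int64)

def cooccurrence_features(codes: np.ndarray, offset: int = 1) -> np.ndarray:
    """Normalized 4x4 co-occurrence matrix of codes `offset` apart, flattened (16 values)."""
    glcm = np.zeros((N_CODES, N_CODES))
    for a, b in zip(codes[:-offset], codes[offset:]):
        glcm[a, b] += 1
    total = glcm.sum()
    if total > 0:
        glcm /= total
    return glcm.ravel()

def lbp_1d_histogram(codes: np.ndarray) -> np.ndarray:
    """Simplified 1-D LBP: compare left/right neighbours to the centre (4 bins)."""
    hist = np.zeros(4)
    for i in range(1, len(codes) - 1):
        left = int(codes[i - 1] >= codes[i])
        right = int(codes[i + 1] >= codes[i])
        hist[(left << 1) | right] += 1
    total = hist.sum()
    return hist / total if total > 0 else hist

def feature_vector(text: str) -> np.ndarray:
    """Concatenate co-occurrence and 1-D LBP features (20 values in this sketch)."""
    codes = code_text(text)
    return np.concatenate([cooccurrence_features(codes), lbp_1d_histogram(codes)])

vec = feature_vector("Dobar dan")
print(vec.shape)  # (20,)
```

Treating the coded text as a 1-D sequence keeps the texture analysis independent of the font and of the script (Latin or Cyrillic), provided both scripts are mapped onto the same four classes.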
Brodić, D., Amelio, A., & Milivojević, Z. N. (2015). Characterization and distinction between closely related South Slavic languages on the example of Serbian and Croatian. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9256, pp. 654–666). Springer Verlag. https://doi.org/10.1007/978-3-319-23192-1_55