Categorization of multilingual scientific documents by a compound classification system

Jarosław Protasiewicz; Marcin Mirończuk; Sławomir Dadas

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017), Vol. 10246 LNAI, pp. 563–573

DOI: 10.1007/978-3-319-59060-8_51

Abstract

The aim of this study was to propose a classification method for documents that contain text passages in several languages at once. For this purpose, we constructed a three-level classification system. On the first level, a data processing module prepares a suitable vector space model. Next, in the middle tier, a set of monolingual or multilingual classifiers assigns each document, or each of its parts, the probabilities of belonging to every possible category. These models are trained using the Multinomial Naïve Bayes and Long Short-Term Memory algorithms. Finally, in the last tier, a multilingual decision module assigns a target class to each document. This module is built on a logistic regression classifier that receives as input the probabilities produced by the middle-tier classifiers. The system has been verified experimentally. The reported results suggest that the proposed system can handle textual documents whose content is composed of several languages at the same time. The system can therefore be useful for automatically organizing multilingual publications or other documents.
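To make the three-level architecture above more concrete, here is a minimal, self-contained Python sketch of the general stacking idea. It is not the authors' implementation, and several parts are assumptions: level 1 is reduced to whitespace bag-of-words counts (the paper's actual vector space model is not described here); level 2 uses only a hand-written Multinomial Naive Bayes per language (the LSTM models are omitted); level 3 is a multinomial logistic regression fitted by plain batch gradient descent on the concatenated per-language probability vectors. A missing language part is represented by a uniform distribution, which is also an assumption. All class and function names (`CompoundClassifier`, `bag_of_words`, and so on) and the toy English/Polish data are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    # Level 1 (simplified): lowercase whitespace tokens -> term counts
    return Counter(text.lower().split())


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class MultinomialNB:
    """Level 2: one monolingual Multinomial Naive Bayes with Laplace smoothing."""
    def fit(self, docs, labels, classes, alpha=1.0):
        self.classes, self.vocab = classes, {w for d in docs for w in d}
        self.log_prior, self.log_lik = {}, {}
        for c in classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            counts = sum(class_docs, Counter())
            total = sum(counts.values()) + alpha * len(self.vocab)
            # Add-one prior: a class absent from this language keeps a finite score.
            self.log_prior[c] = math.log((len(class_docs) + 1) / (len(docs) + len(classes)))
            self.log_lik[c] = {w: math.log((counts[w] + alpha) / total) for w in self.vocab}
        return self

    def predict_proba(self, doc):
        scores = []
        for c in self.classes:
            # Words never seen during training are ignored.
            known = sum(n * self.log_lik[c][w] for w, n in doc.items() if w in self.vocab)
            scores.append(self.log_prior[c] + known)
        return softmax(scores)


class SoftmaxRegression:
    """Level 3: multinomial logistic regression fitted by batch gradient descent."""
    def fit(self, X, y, classes, lr=0.5, epochs=500):
        self.classes, self.W = classes, [[0.0] * (len(X[0]) + 1) for _ in classes]
        for _ in range(epochs):
            grad = [[0.0] * len(row) for row in self.W]
            for x, label in zip(X, y):
                xb, p = x + [1.0], self.predict_proba(x)  # xb: input plus bias feature
                for j, c in enumerate(classes):
                    err = p[j] - (1.0 if c == label else 0.0)  # cross-entropy gradient
                    grad[j] = [g + err * v for g, v in zip(grad[j], xb)]
            self.W = [[w - lr * g / len(X) for w, g in zip(wr, gr)]
                      for wr, gr in zip(self.W, grad)]
        return self

    def predict_proba(self, x):
        xb = x + [1.0]
        return softmax([sum(w * v for w, v in zip(row, xb)) for row in self.W])

    def predict(self, x):
        p = self.predict_proba(x)
        return self.classes[p.index(max(p))]


class CompoundClassifier:
    """Hypothetical wrapper: per-language base models feeding one decision model."""
    def _features(self, doc):
        feats = []
        for lang in self.languages:
            if lang in doc:
                feats += self.base[lang].predict_proba(bag_of_words(doc[lang]))
            else:  # missing part: uninformative uniform distribution (assumption)
                feats += [1.0 / len(self.classes)] * len(self.classes)
        return feats

    def fit(self, docs, labels, languages):
        self.languages, self.classes, self.base = languages, sorted(set(labels)), {}
        for lang in self.languages:
            part_docs = [bag_of_words(d[lang]) for d in docs if lang in d]
            part_labels = [y for d, y in zip(docs, labels) if lang in d]
            self.base[lang] = MultinomialNB().fit(part_docs, part_labels, self.classes)
        # Simplification: the decision model is trained on in-sample probabilities.
        X = [self._features(d) for d in docs]
        self.meta = SoftmaxRegression().fit(X, labels, self.classes)
        return self

    def predict(self, doc):
        return self.meta.predict(self._features(doc))


train = [  # made-up toy data: English ("en") and Polish ("pl") parts
    ({"en": "protein gene cell", "pl": "białko gen komórka"}, "bio"),
    ({"en": "gene expression cell", "pl": "ekspresja genu"}, "bio"),
    ({"en": "neural network training", "pl": "sieć neuronowa"}, "cs"),
    ({"en": "compiler network code", "pl": "kompilator kod"}, "cs"),
]
model = CompoundClassifier().fit([d for d, _ in train], [y for _, y in train], ["en", "pl"])
print(model.predict({"en": "cell protein", "pl": "gen"}))  # bio
print(model.predict({"pl": "sieć kod"}))  # cs (English part missing)
```

Concatenating the per-language probability vectors gives the decision module a fixed-length input regardless of which language parts a document actually contains. In a real stacked setup, the decision model would normally be trained on held-out (out-of-fold) base-model probabilities rather than the in-sample ones used here, because in-sample probabilities tend to be overconfident.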

Citation (APA)

Protasiewicz, J., Mirończuk, M., & Dadas, S. (2017). Categorization of multilingual scientific documents by a compound classification system. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10246 LNAI, pp. 563–573). Springer Verlag. https://doi.org/10.1007/978-3-319-59060-8_51
