Text classification in the domain of applied linguistics as part of a pre-editing module for machine translation systems

Ksenia Oskina

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 9811 LNCS 691-698

DOI: 10.1007/978-3-319-43958-7_84

Abstract

This article describes a method of document classification based on a vector space model, applied to the domain of Applied Linguistics for Russian. The method classifies input texts into two categories: applied linguistics texts (AL) and non-applied linguistics texts (nonAL). It is implemented using TF-IDF term weighting, with cosine similarity as the similarity measure. The study gives promising results and opens up prospects for applying this approach to text classification in other languages.
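The combination of TF-IDF weighting and cosine similarity named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the preprocessing, the corpora, or the decision rule, so everything below is an assumption made for illustration: whitespace tokenization, a smoothed IDF formula, a nearest-centroid decision between the two categories, and a toy English training set standing in for the Russian data. All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization; a Russian
    # pipeline would more plausibly lemmatize first.
    return text.lower().split()


def build_idf(docs):
    # Smoothed inverse document frequency over the training documents.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(text, idf):
    # Relative term frequency times IDF; out-of-vocabulary terms are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    # Component-wise mean of a list of sparse vectors.
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def train(labelled_docs):
    # Build the IDF table and one centroid vector per category.
    idf = build_idf([text for text, _ in labelled_docs])
    by_label = {}
    for text, label in labelled_docs:
        by_label.setdefault(label, []).append(tfidf(text, idf))
    return idf, {label: centroid(vs) for label, vs in by_label.items()}


def classify(text, idf, centroids):
    # Assumption: pick the category whose centroid is most similar.
    vec = tfidf(text, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy training data (invented for illustration, not from the study).
TRAIN = [
    ("machine translation corpus annotation parser", "AL"),
    ("speech recognition corpus tagging morphology", "AL"),
    ("football match score goal team", "nonAL"),
    ("recipe soup salt pepper kitchen", "nonAL"),
]

idf, centroids = train(TRAIN)
label_al = classify("corpus annotation for machine translation", idf, centroids)
label_non = classify("the team scored a goal", idf, centroids)
```

In this sketch a text that shares no vocabulary with either category scores zero against both centroids and falls back to whichever label was seen first in training; a practical classifier would likely need an explicit similarity threshold for such cases.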

Cite

Oskina, K. (2016). Text classification in the domain of applied linguistics as part of a pre-editing module for machine translation systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9811 LNCS, pp. 691–698). Springer Verlag. https://doi.org/10.1007/978-3-319-43958-7_84
