Text Augmentation Techniques for Document Vector Generation from Russian News Articles

Abstract

In this paper, a document classification system is enhanced by constructing a text augmentation technique: various Part-of-Speech filters and word vector weighting methods are tested with nine different models for document representation. Subject/object tagging is introduced as a new form of text augmentation, along with a novel classification system grounded in a word weighting method based on the distribution of words among classes of documents. When an augmentation combining subject/object tagging, a nouns+adjectives filter, and Inverse Document Frequency word weighting was applied, classification accuracy increased by an average of 4.1 percentage points.
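
To make the combination of a nouns+adjectives filter and Inverse Document Frequency (IDF) weighting concrete, the following is a minimal sketch and not the authors' implementation. It assumes tokens arrive already tagged with Part-of-Speech labels, uses the plain IDF form log(N / df), and builds a document vector as the IDF-weighted average of word vectors. The tag names, the toy corpus, the two-dimensional word vectors, and all function names are hypothetical; subject/object tagging is omitted because the abstract does not describe how it is done.

```python
import math
from collections import Counter

# Hypothetical tag set: keep only nouns and adjectives.
KEEP_POS = {"NOUN", "ADJ"}


def pos_filter(tagged_doc, keep=KEEP_POS):
    """Return the words whose Part-of-Speech tag is in `keep`."""
    return [word for word, pos in tagged_doc if pos in keep]


def inverse_document_frequency(docs):
    """Plain IDF, log(N / df), computed over a list of token lists."""
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(n_docs / count) for word, count in df.items()}


def document_vector(doc, word_vectors, idf):
    """IDF-weighted average of the word vectors of the tokens in `doc`."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word in doc:
        if word not in word_vectors or word not in idf:
            continue  # skip words without a vector or a weight
        weight = idf[word]
        weight_sum += weight
        for i, value in enumerate(word_vectors[word]):
            total[i] += weight * value
    if weight_sum == 0.0:
        return total  # no weighted words: fall back to the zero vector
    return [value / weight_sum for value in total]


# Toy, hand-tagged corpus (assumed data, for illustration only).
tagged_corpus = [
    [("bank", "NOUN"), ("raised", "VERB"), ("key", "ADJ"), ("rate", "NOUN")],
    [("bank", "NOUN"), ("opened", "VERB"), ("new", "ADJ"), ("office", "NOUN")],
]

# Toy two-dimensional word vectors (assumed, not trained).
word_vectors = {
    "bank": [1.0, 0.0],
    "key": [0.0, 1.0],
    "rate": [1.0, 1.0],
    "new": [0.0, 1.0],
    "office": [1.0, 0.0],
}

filtered = [pos_filter(doc) for doc in tagged_corpus]
idf = inverse_document_frequency(filtered)
doc_vec = document_vector(filtered[0], word_vectors, idf)
```

In this toy corpus "bank" occurs in both documents, so its IDF weight is zero and it contributes nothing to the document vector; the rarer words "key" and "rate" determine the result. In practice the word vectors would come from a trained embedding model and the tags from a Part-of-Speech tagger for Russian.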

Cite

Aminoff, C., Romanenko, A., Kosomaa, O., & Vankka, J. (2018). Text Augmentation Techniques for Document Vector Generation from Russian News Articles. In Communications in Computer and Information Science (Vol. 920, pp. 571–586). Springer Verlag. https://doi.org/10.1007/978-3-319-99972-2_47
