This paper improves a document classification system by constructing a text augmentation technique: various Part-of-Speech filters and word vector weighting methods are tested with nine different document representation models. Subject/object tagging is introduced as a new form of text augmentation, along with a novel classification system based on a word weighting method that uses the distribution of words among document classes. Applying an augmentation that combined subject/object tagging, a nouns+adjectives filter, and Inverse Document Frequency word weighting yielded an average increase in classification accuracy of 4.1 percentage points.
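As a rough illustration of two of the ingredients named above, the following sketch applies a nouns+adjectives filter and then builds a document vector as an Inverse Document Frequency weighted average of word vectors. It is not the authors' pipeline: the tag names (`NOUN`, `ADJ`), the toy two-dimensional word vectors, the English stand-in tokens, the `+ 1.0` smoothing term, and all function names are assumptions made for this example. Subject/object tagging is not covered.

```python
import math
from collections import Counter

# Assumption: tokens arrive already POS-tagged with Universal-style tags.
KEEP_POS = {"NOUN", "ADJ"}


def pos_filter(tagged_doc):
    """Keep only the words whose tag is a noun or an adjective."""
    return [word for word, pos in tagged_doc if pos in KEEP_POS]


def idf_weights(docs):
    """Compute log(N / df) + 1 for every word in a list of token lists.

    The + 1 is a smoothing choice made for this sketch so that a word
    occurring in every document still gets a non-zero weight.
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(n_docs / df) + 1.0 for word, df in doc_freq.items()}


def document_vector(doc, vectors, idf):
    """Return the IDF-weighted average of the word vectors in one document."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word in doc:
        if word in vectors and word in idf:
            weight = idf[word]
            for i, value in enumerate(vectors[word]):
                total[i] += weight * value
            weight_sum += weight
    if weight_sum == 0.0:
        return total  # no known words: fall back to the zero vector
    return [component / weight_sum for component in total]


# Toy corpus (hypothetical data, English stand-ins for Russian news text).
tagged_docs = [
    [("new", "ADJ"), ("law", "NOUN"), ("passed", "VERB")],
    [("new", "ADJ"), ("team", "NOUN"), ("won", "VERB")],
]
# Hypothetical 2-dimensional word vectors.
word_vectors = {"new": [1.0, 0.0], "law": [0.0, 1.0], "team": [0.0, -1.0]}

filtered_docs = [pos_filter(doc) for doc in tagged_docs]
idf = idf_weights(filtered_docs)
doc_vectors = [document_vector(doc, word_vectors, idf) for doc in filtered_docs]
```

In this toy corpus "new" appears in both documents and so receives a lower weight than "law" or "team", which pulls each document vector toward its more distinctive word.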
Aminoff, C., Romanenko, A., Kosomaa, O., & Vankka, J. (2018). Text Augmentation Techniques for Document Vector Generation from Russian News Articles. In Communications in Computer and Information Science (Vol. 920, pp. 571–586). Springer Verlag. https://doi.org/10.1007/978-3-319-99972-2_47