Designing an information system for the electronic document management of a university: Automatic classification of documents

Abstract

Automating document flow, including the automatic classification of documents, is essential for the effective functioning of a university's educational environment. This article considers the classification of university documents by machine learning methods, with the aim of improving classification quality. The documents were first preprocessed to isolate their significant words, which increased classification accuracy. The TF (term frequency) and TF-IDF (term frequency-inverse document frequency) feature extraction methods, which identify keywords from the frequency of words in a document, are described. A modification of TF-IDF is proposed in which the importance of a word is calculated according to its part of speech; by emphasizing only the important, significant words in a document, this modification improved classification quality. A classification algorithm is suggested that uses the support vector method to reduce the number of documents involved in classification and the k-nearest neighbors method to perform the classification itself. The algorithm is shown to outperform the analogues described, as measured by a decrease in the number of misclassified documents.
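The feature-extraction steps summarized above (preprocessing, TF, TF-IDF and a part-of-speech-weighted TF-IDF) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation. The abstract does not give the weighting formula, so multiplying a word's TF-IDF score by a per-part-of-speech coefficient is an assumption. The stop-word list, the tiny POS lexicon (standing in for a real part-of-speech tagger), the weight values and all helper names (`preprocess`, `tf`, `idf`, `tf_idf`, `pos_weighted_tf_idf`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a full list for the
# language of the documents.
STOP_WORDS = {"the", "of", "a", "an", "and", "for", "to", "in", "on", "is"}

# Hypothetical POS lexicon standing in for a real tagger, and assumed weights:
# the paper weights words by part of speech, but these values are illustrative.
POS_LEXICON = {
    "order": "NOUN", "students": "NOUN", "enrollment": "NOUN", "exam": "NOUN",
    "schedule": "NOUN", "session": "NOUN", "approve": "VERB",
    "final": "ADJ", "first-year": "ADJ",
}
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.6, "ADJ": 0.4}
DEFAULT_WEIGHT = 0.2  # assumed weight for words with an unknown part of speech


def preprocess(text):
    """Lowercase, tokenize (keeping hyphenated words) and drop stop words."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tf(tokens):
    """Term frequency: occurrences of a word divided by document length."""
    counts = Counter(tokens)
    return {w: c / len(tokens) for w, c in counts.items()}


def idf(corpus_tokens):
    """Inverse document frequency: log(N / number of documents containing w)."""
    n_docs = len(corpus_tokens)
    df = Counter(w for doc in corpus_tokens for w in set(doc))
    return {w: math.log(n_docs / d) for w, d in df.items()}


def tf_idf(tokens, idf_table):
    """Classic TF-IDF score of every word in one document."""
    return {w: f * idf_table.get(w, 0.0) for w, f in tf(tokens).items()}


def pos_weighted_tf_idf(tokens, idf_table):
    """TF-IDF multiplied by a part-of-speech weight (assumed form of the modification)."""
    return {
        w: score * POS_WEIGHTS.get(POS_LEXICON.get(w), DEFAULT_WEIGHT)
        for w, score in tf_idf(tokens, idf_table).items()
    }


if __name__ == "__main__":
    docs = [
        "Order on the enrollment of first-year students",
        "Exam schedule for the final exam session",
        "Order to approve the exam schedule",
    ]
    corpus = [preprocess(d) for d in docs]
    table = idf(corpus)
    weighted = pos_weighted_tf_idf(corpus[0], table)
    print(sorted(weighted, key=weighted.get, reverse=True)[:3])
    # ['enrollment', 'students', 'first-year']
```

In the first toy document, plain TF-IDF gives `enrollment`, `students` and `first-year` identical scores; the assumed adjective weight of 0.4 pushes `first-year` below the two nouns, which is the kind of "highlighting only important and significant words" the modification aims at.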
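The two-stage classifier (support vectors to shrink the training set, then k-nearest neighbors to classify) can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm. The abstract does not specify the SVM variant, kernel, selection rule or value of k, so this sketch assumes a binary problem with labels +1/-1, a linear soft-margin SVM trained by Pegasos-style full-batch subgradient descent, "support vectors" taken as the training documents whose functional margin y(w·x + b) is at most 1 + tol, and a Euclidean k-NN vote with k = 3 over that reduced set only. The function names and parameter values (`lam`, `epochs`, `tol`, `k`) are hypothetical, and the toy 2-D vectors stand in for TF-IDF document vectors.

```python
import math
from collections import Counter


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.1, epochs=1000):
    """Linear soft-margin SVM via full-batch, Pegasos-style subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y*(w.x + b))).
    X holds feature vectors, y holds labels in {+1, -1}; b is unregularized.
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)  # decreasing Pegasos step size
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in zip(X, y):
            if label * (dot(w, x) + b) < 1:  # hinge loss is active for this sample
                for j in range(dim):
                    grad_w[j] += label * x[j]
                grad_b += label
        # Shrink w (regularization) and step along the averaged hinge subgradient.
        w = [(1 - eta * lam) * wj + eta * gj / n for wj, gj in zip(w, grad_w)]
        b += eta * grad_b / n
    return w, b


def select_support_vectors(X, y, w, b, tol=0.1):
    """Keep only samples on or inside the margin: y*(w.x + b) <= 1 + tol."""
    return [(x, label) for x, label in zip(X, y)
            if label * (dot(w, x) + b) <= 1 + tol]


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples nearest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for document feature vectors.
    X = [[1, 0], [1, 1], [1, -1], [3, 0], [3, 2], [3, -2],
         [-1, 0], [-1, -1], [-1, 1], [-3, 0], [-3, -2], [-3, 2]]
    y = [1] * 6 + [-1] * 6
    w, b = train_linear_svm(X, y)
    reduced = select_support_vectors(X, y, w, b)
    print(len(reduced), "of", len(X), "documents kept")  # 6 of 12 documents kept
    print(knn_predict(reduced, [2, 0.5]))   # 1
    print(knn_predict(reduced, [-2.5, 0]))  # -1
```

On the toy data only the six points nearest the decision boundary survive the reduction, so each k-NN query is compared against half of the training set rather than all of it. Real document categories are usually multi-class, which would need an extension such as one-vs-rest SVM training.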

Cite

Tkachenko, A. L., & Denisova, L. A. (2022). Designing an information system for the electronic document management of a university: Automatic classification of documents. In Journal of Physics: Conference Series (Vol. 2182). IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/2182/1/012035
