Automating document flow processes, including the automatic classification of documents, is essential for the effective functioning of a university's educational environment. The article considers the task of classifying university documents with machine learning methods in order to improve classification quality. The documents were preprocessed to identify the significant words in them, which increased classification accuracy. The TF and TF-IDF feature extraction methods are described; both identify keywords from the frequency with which words occur in a document. A modification of TF-IDF is proposed that calculates the importance of a word depending on its part of speech. This improved classification quality by retaining only the important, significant words in each document. A classification algorithm is proposed that uses the support vector machine method to reduce the number of documents involved in classification and the k-nearest neighbors method to perform the classification itself. The algorithm is shown to outperform the analogues described, as measured by a decrease in the number of misclassified documents.
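As a rough illustration of the part-of-speech-weighted TF-IDF idea, the Python sketch below scales each word's TF-IDF score by a weight for its part of speech and then classifies a new document by k-nearest neighbors using cosine similarity. It is a sketch under stated assumptions, not the authors' method: the weight table `POS_WEIGHTS`, the helper names, and the toy documents are hypothetical, since the abstract does not give the actual weights or formulas. Tokens are assumed to be already lemmatized and POS-tagged (for example by an external morphological analyzer). The support-vector step that the abstract uses to shrink the set of reference documents is not shown; here kNN simply compares the query with every training document.

```python
import math
from collections import Counter

# Hypothetical part-of-speech weights (not from the paper). A weight of 0
# (or a tag missing from the table) removes the word from the feature vector.
POS_WEIGHTS = {"NOUN": 1.0, "ADJ": 0.6, "VERB": 0.4}

def tf(doc):
    """Term frequency: occurrences of each (lemma, pos) term / document length."""
    counts = Counter(doc)
    return {term: n / len(doc) for term, n in counts.items()}

def idf(corpus):
    """Inverse document frequency log(N / df) for every term in the corpus."""
    df = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(len(corpus) / d) for term, d in df.items()}

def pos_tfidf(doc, idf_table):
    """TF-IDF vector with each term scaled by the weight of its part of speech."""
    vec = {}
    for term, freq in tf(doc).items():
        weight = POS_WEIGHTS.get(term[1], 0.0)
        if weight > 0:
            # Terms never seen in the training corpus get IDF 0.
            vec[term] = freq * idf_table.get(term, 0.0) * weight
    return vec

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(query_vec, train_vecs, labels, k=3):
    """Majority label among the k training vectors most similar to the query."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query_vec, train_vecs[i]),
                    reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy, hand-tagged documents (lemma, part of speech); purely illustrative.
train = [
    [("order", "NOUN"), ("enrollment", "NOUN"), ("student", "NOUN"), ("approve", "VERB")],
    [("order", "NOUN"), ("expulsion", "NOUN"), ("student", "NOUN"), ("the", "DET")],
    [("schedule", "NOUN"), ("lecture", "NOUN"), ("weekly", "ADJ")],
    [("schedule", "NOUN"), ("exam", "NOUN"), ("final", "ADJ")],
]
labels = ["order", "order", "schedule", "schedule"]

idf_table = idf(train)
train_vecs = [pos_tfidf(doc, idf_table) for doc in train]
query_vec = pos_tfidf([("schedule", "NOUN"), ("exam", "NOUN"), ("spring", "ADJ")], idf_table)
print(knn_classify(query_vec, train_vecs, labels, k=3))  # prints: schedule
```

Setting a part of speech's weight to 0 drops it entirely (the determiner "the" above never enters a vector), which is one simple way to keep only significant words; intermediate weights let, say, adjectives count for less than nouns without being discarded.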
Tkachenko, A. L., & Denisova, L. A. (2022). Designing an information system for the electronic document management of a university: Automatic classification of documents. In Journal of Physics: Conference Series (Vol. 2182). IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/2182/1/012035