Automated document categorization model

Rakhi Patra

Book Chapter

Springer (2021), pp. 19-36

DOI: 10.1007/978-3-030-50641-4_2

Abstract

The aim of this work is to build a generic document clustering model that automatically groups related documents. The model combines unsupervised and supervised learning and assumes no prior knowledge of the domain. No manual effort is required to create the training set; instead, the proposed model generates its training documents automatically and then uses them to categorize text documents. The process is divided into two steps. In step 1, an initial classification is performed in an unsupervised way: the K-means algorithm is applied to the unlabeled documents to prepare the training dataset. Text documents are represented as feature vectors, with extracted keywords serving as features, and selected representative documents serve as the initial centroids. In step 2, a supervised classifier is trained on the documents categorized in step 1. A Naive Bayes classifier, a statistical text classifier that uses word frequencies as features, is used for this purpose.
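The two-step process described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the toy documents, the whitespace tokenizer standing in for keyword extraction, the hand-picked seed documents, and all function names are assumptions made for illustration, and only the Python standard library is used.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: whitespace tokens stand in for the extracted keywords.
    return text.lower().split()

def to_vector(tokens, vocab):
    # Represent a document as a vector of keyword counts.
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, seed_indices, iterations=10):
    # Step 1: K-means whose initial centroids are representative documents.
    centroids = [list(vectors[i]) for i in seed_indices]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
            for v in vectors
        ]
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def train_naive_bayes(token_lists, labels):
    # Step 2: multinomial Naive Bayes trained on the cluster labels.
    vocab = {w for toks in token_lists for w in toks}
    priors, word_counts, totals = {}, {}, {}
    for c in sorted(set(labels)):
        docs = [t for t, lab in zip(token_lists, labels) if lab == c]
        priors[c] = math.log(len(docs) / len(token_lists))
        word_counts[c] = Counter(w for t in docs for w in t)
        totals[c] = sum(word_counts[c].values())
    return vocab, priors, word_counts, totals

def predict(model, tokens):
    vocab, priors, word_counts, totals = model
    def score(c):
        # Log prior plus Laplace-smoothed log likelihood of each known word.
        return priors[c] + sum(
            math.log((word_counts[c][w] + 1) / (totals[c] + len(vocab)))
            for w in tokens if w in vocab
        )
    return max(priors, key=score)

# Hypothetical unlabeled corpus (illustration only).
docs = [
    "stock market shares trading",
    "market shares investors stock",
    "football match goal team",
    "team goal football league",
    "investors trading stock profit",
    "league match team players",
]
token_lists = [tokenize(d) for d in docs]
vocab = sorted({w for toks in token_lists for w in toks})
vectors = [to_vector(toks, vocab) for toks in token_lists]

# Assumption: documents 0 and 2 are chosen by hand as the representatives.
labels = kmeans(vectors, seed_indices=[0, 2])
model = train_naive_bayes(token_lists, labels)
new_label = predict(model, tokenize("stock investors profit"))
```

Here the cluster labels produced by K-means play the role of the automatically generated training set, so the Naive Bayes classifier is trained without any manual labeling. How the representative documents are selected is not specified in the abstract, so the sketch simply takes their indices as input.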

Citation (APA)

Patra, R. (2021). Automated document categorization model. In Studies in Computational Intelligence (Vol. 907, pp. 19–36). Springer. https://doi.org/10.1007/978-3-030-50641-4_2
