This paper studies the use of a document classification technique to categorize a set of random online documents. The technique aims to assign one or more classes or categories to a document, making it easier to manage and sort. The paper describes an experiment on the proposed method, which classifies documents effectively using the decision tree technique. The proposed research framework is Document Analysis based on the Random Forest Algorithm (DARFA). It consists of five components: (i) Document dataset, (ii) Data Preprocessing, (iii) Document Term Matrix, (iv) Random Forest classification, and (v) Visualization. The proposed method analyzes the content of document datasets and classifies documents according to their text content. The framework uses the TF-IDF and Random Forest algorithms. The outcome of this study can enhance document management procedures such as managing documents in daily business operations, consolidating inventory systems, organizing files in databases, and categorizing document folders.
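As an illustration of components (ii) and (iii), the following is a minimal sketch of how preprocessing and a TF-IDF weighted document term matrix could be built. It is not the authors' implementation: the stop-word list, the tokenization rule, the sample documents, and the function names `preprocess` and `tfidf_matrix` are all assumptions made for the example, and the TF-IDF variant shown (relative term frequency times the natural log of inverse document frequency) is only one common choice.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not specify one.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "on"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_matrix(docs):
    """Return (vocab, matrix), where matrix[i][j] is the TF-IDF weight
    of term vocab[j] in document docs[i]."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        row = [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        matrix.append(row)
    return vocab, matrix


# Made-up sample documents, for illustration only.
docs = [
    "The market rallied on strong bank earnings",
    "The team won the football match",
    "Bank shares fell as the market dropped",
]
vocab, matrix = tfidf_matrix(docs)
```

Each row of `matrix` is the feature vector for one document. In a full pipeline, these rows and their class labels would be passed to a random forest classifier (for example, scikit-learn's `RandomForestClassifier`, which is an assumption here rather than something the abstract names).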
CITATION STYLE
Nohuddin, P. N. E., Noormanshah, W. M. U., & Zainol, Z. (2021). Content analytics based on random forest classification technique: An empirical evaluation using online news dataset. International Journal of Advanced and Applied Sciences, 8(2), 77–84. https://doi.org/10.21833/ijaas.2021.02.011