Content analytics based on random forest classification technique: An empirical evaluation using online news dataset

Puteri N.E. Nohuddin; Wan M.U. Noormanshah; Zuraini Zainol

Journal Article (Open Access)

International Journal of Advanced and Applied Sciences (2021) 8(2) 77-84

DOI: 10.21833/ijaas.2021.02.011

Abstract

This paper applies a document classification technique to categorize a random set of online documents. The technique aims to assign one or more classes or categories to each document, making documents easier to manage and sort. The paper describes an experiment evaluating the proposed method, which classifies documents using an ensemble of decision trees (Random Forest). The proposed research framework, Document Analysis based on the Random Forest Algorithm (DARFA), consists of five components: (i) Document dataset, (ii) Data Preprocessing, (iii) Document Term Matrix, (iv) Random Forest classification, and (v) Visualization. The method analyzes the content of a document dataset and classifies each document according to its text. The algorithms used in the framework include TF-IDF term weighting and the Random Forest classifier. The outcome of this study can enhance document management procedures such as managing documents in daily business operations, consolidating inventory systems, organizing files in databases, and categorizing document folders.
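
The abstract lists the DARFA stages but not how each one is implemented. The following standard-library Python sketch illustrates stages (ii) to (iv) under stated assumptions: the stop-word list, the smoothed IDF formula, Gini-impurity splits, the square-root rule for the number of features tried per split, and the defaults `n_trees=25` and `max_depth=5` are illustrative choices, not parameters reported by the authors. All function names are hypothetical, and the toy documents are invented rather than taken from the paper's online news dataset.

```python
import math
import random
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for"}  # hypothetical list


def preprocess(text):
    """Data preprocessing: lower-case, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def fit_tfidf(docs):
    """Learn the vocabulary and IDF weights from the training documents."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    idf = {t: math.log(len(docs) / df[t]) + 1.0 for t in vocab}  # +1 keeps ubiquitous terms nonzero
    return vocab, idf


def to_tfidf_row(text, vocab, idf):
    """One document-term-matrix row: normalized term frequency times IDF."""
    tf = Counter(preprocess(text))
    total = sum(tf.values()) or 1
    return [tf[t] / total * idf[t] for t in vocab]  # out-of-vocabulary terms are ignored


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, depth, max_depth, n_features, rng):
    """Grow one tree; a leaf is a label, an inner node is (feature, threshold, left, right)."""
    if depth >= max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    best = None
    for f in rng.sample(range(len(X[0])), n_features):  # random feature subset at each split
        for thr in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:  # no useful split among the sampled features: make a majority leaf
        return Counter(y).most_common(1)[0][0]
    _, f, thr, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_features, rng)

    return (f, thr, grow(left), grow(right))


def predict_tree(node, x):
    """Follow threshold tests from the root until a leaf label is reached."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def train_forest(X, y, n_trees=25, max_depth=5, seed=0):
    """Random Forest: trees grown on bootstrap samples, sqrt(#features) candidates per split."""
    rng = random.Random(seed)
    n_features = max(1, int(math.sqrt(len(X[0]))))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_features, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote over the individual trees' predictions."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy labelled snippets, invented for illustration only.
    train_docs = ["striker scored a late goal in the match", "team won the league on penalties",
                  "new smartphone chip improves battery life", "software update released for the laptop"]
    labels = ["sport", "sport", "tech", "tech"]
    vocab, idf = fit_tfidf(train_docs)
    X = [to_tfidf_row(d, vocab, idf) for d in train_docs]  # document-term matrix
    forest = train_forest(X, labels)
    print(predict_forest(forest, to_tfidf_row("a late goal won the match", vocab, idf)))
```

In practice, library implementations such as scikit-learn's `TfidfVectorizer` and `RandomForestClassifier` would typically replace these hand-written helpers. The from-scratch version only makes the bootstrap sampling, random feature subsets, and majority voting explicit.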

Citation (APA)

Nohuddin, P. N. E., Noormanshah, W. M. U., & Zainol, Z. (2021). Content analytics based on random forest classification technique: An empirical evaluation using online news dataset. International Journal of Advanced and Applied Sciences, 8(2), 77–84. https://doi.org/10.21833/ijaas.2021.02.011
