Document Content Analysis Based on Random Forest Algorithm

Abstract

This study develops a classification model that performs text analysis and assigns labels (tags) to an unstructured, uncategorized dataset, then applies supervised classification, using labeled training data as input, to predict the category of each document. The proposed technique uses the random forest algorithm to classify the dataset into four categories: crime, education, marriage, and sports. The text analysis and document classification framework consists of five stages: (i) collecting the news dataset, (ii) data pre-processing, (iii) building the document-term matrix and weighting terms, (iv) classification using the random forest technique, and (v) text analytics and visualization of results. The resulting model also supports text analysis by identifying the terms that appear most frequently across the dataset.
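As a rough illustration of stages (ii), (iii), and the frequent-term part of (v), the following is a minimal sketch in plain Python. It is not the authors' implementation: the stop-word list, the sample headlines, the function names, and the specific weighting (term frequency times log of inverse document frequency) are all assumptions made for this example, since the abstract does not state which pre-processing steps or weighting scheme were used. The random forest step in stage (iv) is left out; the weighted matrix produced here is the kind of input such a classifier would be trained on.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not specify the one used.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "for", "at"}


def preprocess(text):
    """Stage (ii), assumed: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Stage (iii): return (vocab, rows); rows[i][j] counts term j in document i."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        rows.append(row)
    return vocab, rows


def tfidf(rows):
    """Assumed weighting: (count / document length) * log(N / document frequency)."""
    n_docs = len(rows)
    n_terms = len(rows[0]) if rows else 0
    df = [sum(1 for r in rows if r[j] > 0) for j in range(n_terms)]
    weighted = []
    for r in rows:
        total = sum(r) or 1  # guard against empty documents
        weighted.append(
            [(r[j] / total) * math.log(n_docs / df[j]) for j in range(n_terms)]
        )
    return weighted


def frequent_terms(vocab, rows, k=3):
    """Part of stage (v): the k terms with the highest total count; ties broken alphabetically."""
    totals = [sum(r[j] for r in rows) for j in range(len(vocab))]
    ranked = sorted(zip(vocab, totals), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


# Invented sample headlines, one or more per category named in the abstract.
docs = [
    "Police arrested a suspect in the robbery",
    "The school opened a new library for students",
    "The couple celebrated the wedding at a garden",
    "The team won the football match",
    "Students at the school won a football match",
]
vocab, dtm = document_term_matrix(docs)
weights = tfidf(dtm)
top = frequent_terms(vocab, dtm, k=3)
```

Each row of `weights` is a feature vector for one document. Terms that occur in only one document (such as "police") receive a higher weight than terms shared across several documents, which is the usual motivation for weighting raw counts before classification.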

Cite

Noormanshah, W. M. U., Nohuddin, P. N. E., & Zainol, Z. (2020). Document Content Analysis Based on Random Forest Algorithm. In Lecture Notes in Networks and Systems (Vol. 118, pp. 485–494). Springer. https://doi.org/10.1007/978-981-15-3284-9_55
