Document Content Analysis Based on Random Forest Algorithm

Wan M.U. Noormanshah; Puteri N.E. Nohuddin; Zuraini Zainol

Book Chapter

Springer, (2020), 485-494

DOI: 10.1007/978-981-15-3284-9_55

1 Citation

4 Readers

Abstract

The aim of this study is to develop a classification model that can perform text analysis, ID labeling, or tagging on an unstructured and uncategorized dataset, and then perform supervised classification, taking labeled training data as input to predict the category of each document. The proposed technique uses the random forest algorithm to classify the dataset into four categories: crime, education, marriage, and sports. The document classification framework consists of five stages: (i) collecting a news dataset, (ii) data pre-processing, (iii) building the document-term matrix and weighting terms, (iv) classification using the random forest technique, and (v) text analytics and visualization of results. The study presents a classification model that can also perform text analysis by searching for terms that appear frequently across the dataset.
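Stage (iii) of the framework can be illustrated with a minimal sketch in standard-library Python. The abstract does not say which weighting scheme the authors used, so TF-IDF is an assumption here. The toy corpus, the stop list, and the function names (`tokenize`, `document_term_matrix`, `tfidf`) are hypothetical and are not taken from the chapter.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; the chapter's news dataset is not reproduced here.
docs = [
    "Police arrest suspect after robbery",
    "University team opens new education program",
    "Couple celebrate marriage ceremony",
    "Team wins football match in sports final",
]

# Illustrative stop list (an assumption, not the authors' list).
STOPWORDS = {"after", "in", "new", "the", "a"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def document_term_matrix(corpus):
    """Return the sorted vocabulary and one row of raw term counts per document."""
    tokenized = [tokenize(d) for d in corpus]
    vocab = sorted({t for doc in tokenized for t in doc})
    counts = [Counter(doc) for doc in tokenized]
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix


def tfidf(matrix):
    """Weight raw counts by log(N / document frequency); assumed scheme."""
    n_docs = len(matrix)
    n_terms = len(matrix[0])
    # Every vocabulary term occurs in at least one document, so df is never 0.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in matrix]


vocab, dtm = document_term_matrix(docs)
weighted = tfidf(dtm)
```

Each row of `weighted` is a feature vector for one document. In stage (iv), rows like these, paired with their category labels, would be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`. Terms shared by several documents (here `team`) receive a lower weight than terms unique to one document.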

Noormanshah, W. M. U., Nohuddin, P. N. E., & Zainol, Z. (2020). Document Content Analysis Based on Random Forest Algorithm. In Lecture Notes in Networks and Systems (Vol. 118, pp. 485–494). Springer. https://doi.org/10.1007/978-981-15-3284-9_55
