The aim of this study is to develop a classification model that can perform text analysis, ID labeling, or tagging on an unstructured and uncategorized dataset, and that can perform supervised classification, using labeled training data as input to predict the category of each document. The proposed technique classifies the dataset into four categories (crime, education, marriage, and sports) using the random forest technique. The framework for text analysis and document classification consists of five stages: (i) collecting the news dataset, (ii) data pre-processing, (iii) building the document-term matrix and weighting terms, (iv) classification using the random forest technique, and (v) text analytics and visualization of results. The study presents a classification model that also supports text analysis by searching for terms that appear frequently across the dataset.
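As an illustration of stages (ii) and (iii), the following minimal sketch builds a document-term matrix from a few toy sentences and applies a term weighting. The abstract does not name the weighting scheme, the stop-word list, or the tokenization rules, so the TF-IDF formula, the `STOP_WORDS` set, the function names, and the example sentences are all assumptions made for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "was", "for", "on"}


def preprocess(text):
    """Stage (ii), assumed form: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Stage (iii): return (vocab, matrix), where matrix[i][j] counts term j in doc i."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, tokens in enumerate(tokenized):
        for term, count in Counter(tokens).items():
            matrix[i][index[term]] = count
    return vocab, matrix


def tfidf(matrix):
    """Assumed weighting: (count / doc length) * log(n_docs / document frequency)."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    weighted = []
    for row in matrix:
        total = sum(row) or 1  # avoid division by zero for empty documents
        weighted.append([
            (row[j] / total) * math.log(n_docs / df[j]) if df[j] else 0.0
            for j in range(n_terms)
        ])
    return weighted


# Toy documents invented for this sketch; they are not from the study's dataset.
docs = [
    "Police arrested a suspect in the robbery",
    "The school opened a new library for students",
    "The team won the football match",
    "Students and the football team visited the school",
]
vocab, dtm = document_term_matrix(docs)
weights = tfidf(dtm)
```

Each row of `weights` could then serve as the feature vector for one document in stage (iv), with the category labels as the supervised target. Under this weighting, a term that occurs in every document receives a weight of zero, so only terms that help distinguish documents contribute.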
CITATION STYLE
Noormanshah, W. M. U., Nohuddin, P. N. E., & Zainol, Z. (2020). Document Content Analysis Based on Random Forest Algorithm. In Lecture Notes in Networks and Systems (Vol. 118, pp. 485–494). Springer. https://doi.org/10.1007/978-981-15-3284-9_55