This paper proposes a new visualization tool for checking the quality of random forest predictions by plotting the proximity matrix as a weighted network. The new technique is compared with the traditional multidimensional scaling plot. The paper also introduces a new accuracy index, the proportion of misplaced cases, and compares it to total accuracy, sensitivity, and specificity. In addition, it applies clustering coefficients for weighted graphs to assess how well the random forest algorithm separates two classes. Two datasets were analyzed, one from medical research (breast cancer) and the other from psychological research (medical students' academic achievement), varying in sample size and predictive accuracy. These differences in the number of observations and in prediction accuracy made it possible to compare how each visualization technique behaves in each situation. The results indicated that the random forest's predictive performance was easier and more intuitive to interpret using the weighted network of the proximity matrix than using the multidimensional scaling plot. The proportion of misplaced cases was highly related to total accuracy, sensitivity, and specificity. This strategy, together with the computation of Zhang and Horvath's (2005) clustering coefficient for weighted graphs, can be very helpful for understanding how well a random forest performs in terms of classification.
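The two quantities behind the weighted network can be illustrated with a minimal sketch. It is not the authors' implementation, which the abstract does not show. It assumes the usual definition of random forest proximity (the fraction of trees in which two cases fall in the same terminal node) and the weighted clustering coefficient as commonly stated for Zhang and Horvath (2005). The function names `proximity_matrix` and `weighted_clustering`, the toy leaf table, and the convention of returning 0 when the denominator is zero are all assumptions made for illustration.

```python
def proximity_matrix(leaves):
    """Build a proximity matrix from terminal-node assignments.

    leaves[i][t] is the terminal-node id of case i in tree t (hypothetical
    input layout). Proximity is the share of trees in which two cases land
    in the same terminal node. The diagonal is left at 0 so the matrix can
    be read directly as a weighted network without self-loops.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def weighted_clustering(weights):
    """Weighted clustering coefficient per node, for weights in [0, 1].

    C_i = sum_{j,k} w_ij * w_jk * w_ki / ((sum_j w_ij)^2 - sum_j w_ij^2),
    with j and k distinct from each other and from i.
    Assumption: a node with a zero denominator gets a coefficient of 0.
    """
    n = len(weights)
    coeffs = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        numerator = sum(
            weights[i][j] * weights[j][k] * weights[k][i]
            for j in others
            for k in others
            if k != j
        )
        strength = sum(weights[i][j] for j in others)
        squared = sum(weights[i][j] ** 2 for j in others)
        denominator = strength ** 2 - squared
        coeffs.append(numerator / denominator if denominator > 0 else 0.0)
    return coeffs


# Toy example (made-up data): 4 cases, 2 trees.
toy_leaves = [[1, 1], [1, 1], [1, 2], [3, 2]]
toy_prox = proximity_matrix(toy_leaves)
toy_coeffs = weighted_clustering(toy_prox)
print(toy_prox)
print(toy_coeffs)
```

In practice the leaf table would come from a fitted forest, for example scikit-learn's `RandomForestClassifier.apply`, which returns one leaf index per case and tree. The resulting matrix can then be drawn as a weighted network, with edge thickness proportional to proximity.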
CITATION STYLE
Golino, H. F., & Gomes, C. M. A. (2014). Visualizing Random Forest’s Prediction Results. Psychology, 05(19), 2084–2098. https://doi.org/10.4236/psych.2014.519211