Interpretation of Machine Learning Models for Data Sets with Many Features Using Feature Importance

26 citations · 55 Mendeley readers

This article is free to access.

Abstract

Feature importance (FI) is used to interpret the machine learning model y = f(x) constructed between the explanatory variables, or features, x, and the objective variable, y. For a large number of features, interpreting the model in the order of decreasing FI is inefficient when there are similarly important features. Therefore, in this study, a method is developed to interpret models by considering the similarities between the features in addition to the FI. The cross-validated permutation feature importance (CVPFI), which can be calculated using any machine learning method and can handle multicollinearity problems, is used as the FI, while the absolute correlation and maximal information coefficients are used as metrics of feature similarity. Machine learning models could be effectively interpreted by considering the features from the Pareto fronts, where CVPFI is large and the feature similarity is small. Analyses of actual molecular and material data sets confirm that the proposed method enables the accurate interpretation of machine learning models.
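The workflow the abstract describes — cross-validated permutation feature importance paired with a feature-similarity metric, with candidate features read off the Pareto front — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it uses a plain least-squares model and absolute correlation as the similarity metric, whereas the paper allows any machine learning model and also uses the maximal information coefficient.

```python
import numpy as np

def cv_permutation_importance(X, y, n_splits=5, n_repeats=10, seed=0):
    """Rough stand-in for CVPFI: average increase in validation MSE
    when each feature is permuted, across cross-validation folds.
    A least-squares model is used here purely for illustration."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    folds = np.array_split(rng.permutation(n), n_splits)
    fi = np.zeros(p)
    for k in range(n_splits):
        val = folds[k]
        tr = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        # Fit y ~ X with an intercept on the training fold.
        A = np.column_stack([X[tr], np.ones(len(tr))])
        coef, *_ = np.linalg.lstsq(A, y[tr], rcond=None)
        Av = np.column_stack([X[val], np.ones(len(val))])
        base = np.mean((Av @ coef - y[val]) ** 2)
        for j in range(p):
            for _ in range(n_repeats):
                Xp = X[val].copy()
                Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j
                Ap = np.column_stack([Xp, np.ones(len(val))])
                fi[j] += np.mean((Ap @ coef - y[val]) ** 2) - base
    return fi / (n_splits * n_repeats)

def pareto_front(fi, sim):
    """Indices of features that are Pareto-optimal for
    (maximize importance fi, minimize similarity sim)."""
    front = []
    for i in range(len(fi)):
        dominated = any(
            fi[j] >= fi[i] and sim[j] <= sim[i]
            and (fi[j] > fi[i] or sim[j] < sim[i])
            for j in range(len(fi)) if j != i
        )
        if not dominated:
            front.append(i)
    return front
```

In use, one would compute `fi` for all features, take `sim` as, e.g., each feature's absolute correlation with the most important feature, and then examine the features on the returned Pareto front first, since they combine high importance with low redundancy.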

Citation (APA)
Kaneko, H. (2023). Interpretation of Machine Learning Models for Data Sets with Many Features Using Feature Importance. ACS Omega, 8(25), 23218–23225. https://doi.org/10.1021/acsomega.3c03722
