Abstract
Variable importance indices relying on the outputs of parametric techniques (e.g. Partial Least Squares, PLS) have been hailed as an efficient course of action for variable selection in wrapper-based frameworks. Using parametric techniques for that purpose, however, may lead to unreliable rankings when the assessed variables do not follow a parametric probability density function, jeopardizing the precision of the variable importance assessment. This paper presents a new variable selection framework that relies on non-parametric statistical tests to classify industrial batches or samples into multiple classes related to quality or authenticity. The framework comprises two phases. In the first phase (i.e. filter), the Mutual Information (MI) technique performs a preliminary removal of less significant variables. In the second phase (i.e. wrapper), three non-parametric tests (Anderson-Darling, Kruskal-Wallis and Steel's test) are used to rank the remaining variables according to their relevance for classification. The robustness of the proposed framework is evaluated by varying the MI cutoff and the type of classifier on data collected from seven industrial processes. On average, the recommended combination of MI cutoff, non-parametric test and classifier for each dataset increased classification accuracy by 17.04% while requiring 78.65% fewer variables than the well-known stepwise variable selection method. The proposed framework also outperformed other variable selection approaches from the literature.
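The two-phase procedure described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: MI is estimated by equal-width binning; the filter keeps variables whose MI is at least a hypothetical fraction `mi_cutoff` of the largest MI; only the Kruskal-Wallis test (without tie correction) is used for ranking, with Anderson-Darling and Steel's test omitted; and the wrapper adds variables in rank order, scoring each subset with a leave-one-out 1-nearest-neighbour classifier chosen purely for illustration. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(x, y):
    """Kruskal-Wallis H statistic of variable x across class labels y (no tie correction)."""
    n = len(x)
    counts = Counter(y)
    rank_sums = Counter()
    for r, label in zip(average_ranks(x), y):
        rank_sums[label] += r
    return 12.0 / (n * (n + 1)) * sum(rank_sums[c] ** 2 / counts[c] for c in counts) - 3 * (n + 1)


def mutual_information(x, y, bins=5):
    """MI (in nats) between x, discretised into equal-width bins (an assumption), and labels y."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # guard against a constant variable
    xb = [min(int((v - lo) / width), bins - 1) for v in x]
    n = len(x)
    p_xy, p_x, p_y = Counter(zip(xb, y)), Counter(xb), Counter(y)
    return sum((c / n) * math.log(c * n / (p_x[a] * p_y[b])) for (a, b), c in p_xy.items())


def select_and_rank(X, y, mi_cutoff=0.5):
    """Filter by MI, then rank the surviving variables by Kruskal-Wallis H (largest first)."""
    columns = [[row[j] for row in X] for j in range(len(X[0]))]
    mi = [mutual_information(col, y) for col in columns]
    max_mi = max(mi)
    # Hypothetical cutoff rule: keep variables with MI >= mi_cutoff * (largest MI).
    kept = [j for j, m in enumerate(mi) if max_mi > 0 and m >= mi_cutoff * max_mi]
    return sorted(kept, key=lambda j: kruskal_wallis_h(columns[j], y), reverse=True)


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on a feature subset."""
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = math.inf, None
        for k, xk in enumerate(X):
            if k != i:
                d = sum((xi[j] - xk[j]) ** 2 for j in features)
                if d < best_d:
                    best_d, best_label = d, y[k]
        correct += int(best_label == y[i])
    return correct / len(X)


def wrapper_select(X, y, ranking):
    """Add variables in rank order; keep the shortest prefix with the highest accuracy."""
    best_acc, best_subset = -1.0, []
    for m in range(1, len(ranking) + 1):
        acc = loo_1nn_accuracy(X, y, ranking[:m])
        if acc > best_acc:
            best_acc, best_subset = acc, ranking[:m]
    return best_subset, best_acc


# Hypothetical toy data: variable 0 separates classes A/B/C, variable 1 is noise.
X_toy = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [5.0, 2.0], [5.3, 4.0],
         [4.8, 1.5], [9.0, 3.5], [9.2, 2.5], [8.9, 4.5]]
y_toy = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

ranking = select_and_rank(X_toy, y_toy)
print(ranking)                                  # [0]
print(wrapper_select(X_toy, y_toy, ranking))    # ([0], 1.0)
```

On the toy data the MI filter discards the noisy variable 1 at `mi_cutoff=0.5`, and the wrapper retains variable 0 alone. Swapping in the other non-parametric tests or a different classifier only changes the ranking key and the scoring function; a real study would also assess accuracy on held-out data.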
Beuren, G. M., & Anzanello, M. J. (2019). Variable selection using statistical non-parametric tests for classifying production batches into multiple classes. Chemometrics and Intelligent Laboratory Systems, 193. https://doi.org/10.1016/j.chemolab.2019.103830