We investigate several global variable importance measures derived from artificial neural networks (ANNs) to address the challenging problem of feature ranking in high-dimensional unstructured problems. While several (local) ANN importance measures have been validated on computer vision and natural language processing tasks, it is unclear how these methods perform on unstructured problems where many variables are expected to be irrelevant. We empirically compare these ANN measures with a standard, state-of-the-art random forest (RF) importance measure on several artificial and real datasets. These experiments show that ANN measures can achieve performance similar to the RF measure, sometimes outperforming it. On some problems, however, the feature rankings returned by ANNs are not as good as those returned by RF, despite significantly better predictive performance. Importantly, reaching the best performance with the ANN-based methods often comes at the cost of introducing a so-called selection layer at the beginning of the network. This specific neural architecture proved critical for both feature ranking and predictive performance on datasets with many irrelevant variables. Finally, we evaluate these methods on the problem of gene network inference, where they yield decent performance without, however, outperforming RF.
CITATION STYLE
Vecoven, N., Begon, J. M., Sutera, A., Geurts, P., & Huynh-Thu, V. A. (2020). Nets Versus Trees for Feature Ranking and Gene Network Inference. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12323 LNAI, pp. 231–245). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-61527-7_16