Ensemble methods are among the most commonly used learning algorithms: they construct a group of models and combine their predictions to improve generalisation. They do so by aggregating multiple diverse models, and it is this diversity that enables the ensemble to perform better than any of its members taken individually. The approach extends to deep learning, where an ensemble combines several well-performing models that differ from one another because they have reached different local minima and make different prediction errors. It has been shown that a large, cumbersome deep neural network can be approximated by a smaller network through a process of distillation, and that an ensemble of other learning algorithms can be approximated by a single neural network with the help of additional, artificially generated pseudo-data. We extend this work to show that an ensemble of deep neural networks can be approximated by a single deep neural network whose size and capacity equal those of a single ensemble member. We develop a recipe that achieves this without any artificial training data or other special provisions, such as soft output targets during distillation. We also show that, under particular circumstances, the distillation process can act as a form of regularisation through its implicit reduction in learning capacity. We corroborate our findings with an experimental analysis on common benchmark datasets in computer vision and deep learning.
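As an illustration only, the idea of distilling an ensemble without soft targets can be sketched in a few lines. The sketch assumes, which the abstract does not state, that the ensemble's averaged class probabilities are reduced to hard labels by an argmax and that those labels then serve as training targets for the single network. The function name `ensemble_hard_targets` and the toy probabilities are hypothetical and are not taken from the paper.

```python
import numpy as np


def ensemble_hard_targets(member_probs):
    """Turn per-member class probabilities into hard ensemble labels.

    member_probs has shape (n_members, n_samples, n_classes).
    Assumption: members are combined by simple averaging, then argmax.
    """
    member_probs = np.asarray(member_probs, dtype=float)
    mean_probs = member_probs.mean(axis=0)  # average over ensemble members
    return mean_probs.argmax(axis=1)        # hard label for each sample


# Made-up outputs of three ensemble members on two samples, two classes.
member_probs = np.array([
    [[0.6, 0.4], [0.2, 0.8]],
    [[0.3, 0.7], [0.4, 0.6]],
    [[0.7, 0.3], [0.1, 0.9]],
])

# These labels would replace the soft targets when training the student.
targets = ensemble_hard_targets(member_probs)
print(targets)
```

In this toy case the second member disagrees with the other two on the first sample, and the averaged prediction still assigns that sample to class 0. A student of the same size as one member would then be trained on the original inputs paired with `targets`, using an ordinary classification loss.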
Mosca, A., & Magoulas, G. D. (2018). Distillation of deep learning ensembles as a regularisation method. In Smart Innovation, Systems and Technologies (Vol. 85, pp. 97–118). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-66790-4_6