In this work, we address the task of predicting phenotypic traits using semi-supervised learning methods. More specifically, we propose to use supervised and semi-supervised classification trees, as well as supervised and semi-supervised random forests of classification trees. We consider 114 datasets for different phenotypic traits of 997 microbial species. These datasets pose a challenge for existing machine learning methods: they are only partially labeled (annotated), and their class distributions are typically imbalanced. We investigate whether treating phenotype prediction as a semi-supervised learning task can improve predictive performance. The results suggest that the semi-supervised methodology considered here is particularly helpful for single trees, especially when labeled examples make up 20 to 40% of the data. Similar improvements are observed when the presence of the phenotype is highly imbalanced.
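The abstract does not spell out how a classification tree can learn from unlabeled examples. The sketch below illustrates one common idea behind semi-supervised trees. It is an assumption for illustration, not the exact heuristic from the paper. A candidate split is scored by a weighted sum of two terms: the Gini impurity of the class labels on the labeled examples only, and the normalised variance of the descriptive attributes over all examples, labeled and unlabeled. The weight `w`, the normalisation by training-set variance, and the functions `gini`, `ssl_impurity` and `best_split` are hypothetical names chosen here. Setting `w = 1` recovers an ordinary supervised tree.

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    """Gini impurity of a 1-D array of class labels (0.0 for an empty array)."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - float(np.sum(p ** 2))


def ssl_impurity(X, y, labeled, w, x_var_full):
    """Hypothetical semi-supervised node impurity.

    w * (Gini over labeled rows) + (1 - w) * (mean attribute variance over
    ALL rows, each attribute divided by its training-set variance).
    """
    target_part = gini(y[labeled])
    desc_part = 0.0 if X.shape[0] == 0 else float(np.mean(X.var(axis=0) / x_var_full))
    return w * target_part + (1.0 - w) * desc_part


def best_split(X, y, labeled, w, x_var_full=None):
    """Exhaustive search for the (attribute, threshold) pair that minimises
    the size-weighted impurity of the two children."""
    n, d = X.shape
    if x_var_full is None:  # treat the given node as the root / training set
        x_var_full = X.var(axis=0)
    # Constant attributes would divide by zero; give them unit scale instead.
    x_var_full = np.where(x_var_full == 0, 1.0, x_var_full)
    best = (None, None, np.inf)
    for j in range(d):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so both children are always non-empty.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= t
            score = sum(
                mask.sum() / n * ssl_impurity(X[mask], y[mask], labeled[mask], w, x_var_full)
                for mask in (left, ~left)
            )
            if score < best[2]:
                best = (j, float(t), score)
    return best


# Toy data: two clusters, only one labeled example in each (-1 = unlabeled).
X = np.array([[0.0], [0.1], [0.2], [0.9], [1.0], [1.1]])
y = np.array([0, -1, -1, 1, -1, -1])
labeled = y != -1
print(best_split(X, y, labeled, w=1.0))  # supervised: threshold 0.05
print(best_split(X, y, labeled, w=0.5))  # semi-supervised: threshold 0.55
```

With `w = 1.0` every threshold that separates the two labeled points has zero impurity, and the search keeps the first one (0.05). With `w = 0.5` the unlabeled points also count, which moves the split into the gap between the clusters (0.55). This is the intuition for why unlabeled data can help when few labels are available. The paper's actual weighting and normalisation may differ.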
Levatić, J., Brbić, M., Perdih, T. S., Kocev, D., Vidulin, V., Šmuc, T., … Džeroski, S. (2018). Phenotype prediction with semi-supervised classification trees. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10785 LNAI, pp. 138–150). Springer Verlag. https://doi.org/10.1007/978-3-319-78680-3_10