Correct machine learning on protein sequences: A peer-reviewing perspective

This article is free to access.

Abstract

Machine learning methods are increasingly used to predict protein features from sequences. Machine learning in bioinformatics can be powerful, but it also carries the risk of introducing unexpected biases, which may lead to an overestimation of performance. This article presents a set of guidelines to help both peer reviewers and authors avoid common machine learning pitfalls. Understanding the biology is necessary to produce useful data sets, which have to be large and diverse. Separating the training and test processes is imperative to avoid overselling method performance, which also depends on several hidden parameters. A novel predictor must always be compared with several existing methods, including simple baseline strategies. Using the presented guidelines will help nonspecialists appreciate the critical issues in machine learning.
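As an illustration of two of these points (keeping training and test data separate, and comparing against a simple baseline), the following is a minimal sketch and not the authors' protocol. It assumes each sequence already carries a cluster label from some prior similarity clustering; the function names, the toy sequences, the labels, and the cluster identifiers are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def split_by_cluster(cluster_ids, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so that no cluster appears in both sets."""
    clusters = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        clusters[cid].append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters to build a split")
    ordered = sorted(clusters)  # fixed order before the seeded shuffle
    random.Random(seed).shuffle(ordered)
    # Hold out at least one cluster, but always leave one for training.
    n_test = min(max(1, round(len(ordered) * test_fraction)), len(ordered) - 1)
    test_clusters = set(ordered[:n_test])
    train_idx = sorted(i for c in ordered[n_test:] for i in clusters[c])
    test_idx = sorted(i for c in test_clusters for i in clusters[c])
    return train_idx, test_idx


def majority_baseline(train_labels):
    """Baseline that always predicts the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _sequence: most_common


def accuracy(predict, sequences, labels):
    """Fraction of sequences whose predicted label matches the true label."""
    return sum(predict(s) == y for s, y in zip(sequences, labels)) / len(labels)


# Hypothetical toy data: near-identical sequences share a cluster id.
sequences = ["MKTAYIAK", "MKTAYIAR", "GSHMLEDP", "GSHMLEDQ",
             "AAVLLPPW", "AAVLLPPF", "QQRRNNDD", "QQRRNNDE"]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
cluster_ids = ["c1", "c1", "c2", "c2", "c3", "c3", "c4", "c4"]

train_idx, test_idx = split_by_cluster(cluster_ids)
baseline = majority_baseline([labels[i] for i in train_idx])
baseline_acc = accuracy(baseline,
                        [sequences[i] for i in test_idx],
                        [labels[i] for i in test_idx])
print(f"majority-class baseline accuracy on held-out clusters: {baseline_acc:.2f}")
```

Any new predictor would be trained only on `train_idx` and scored on `test_idx` with the same metric, so that its result can be read against the baseline figure. Splitting by cluster rather than by individual sequence is one way to keep close relatives of a test sequence out of the training set.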

Cite

Walsh, I., Pollastri, G., & Tosatto, S. C. E. (2016). Correct machine learning on protein sequences: A peer-reviewing perspective. Briefings in Bioinformatics, 17(5), 831–840. https://doi.org/10.1093/bib/bbv082
