Biases of drug-target interaction network data

Abstract

Network-based prediction of interactions between drug compounds and target proteins is a core step in the drug discovery process. The availability of drug-target interaction data has boosted the development of machine learning methods for the in silico prediction of drug-target interactions. In this paper we focus on the crucial issue of data bias. We show that four popular datasets contain a bias that stems from the way they were constructed: every drug compound and target protein has at least one interaction, and some have only a single interaction. We show that prediction methods can exploit this bias to achieve overly optimistic generalization performance as estimated by cross-validation procedures, in particular leave-one-out cross-validation. We discuss possible ways to mitigate the effect of this bias, in particular by adapting the validation procedure. Overall, the results indicate that this data bias should be taken into account when assessing the generalization performance of machine learning methods for the in silico prediction of drug-target interactions. The datasets and source code for this article are available at http://cs.ru.nl/~tvanlaarhoven/bias2014/. © 2014 Springer International Publishing Switzerland.
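The bias described in the abstract can be illustrated on a toy example. The sketch below is not the authors' code and does not use their datasets; the matrix `adj` and the function `flag_empty_lines` are hypothetical names introduced only for illustration. It assumes a binary drug-by-target matrix in which every row and column contains at least one interaction, hides one pair at a time as in leave-one-out cross-validation, and flags the pair whenever its drug or its target is left with no remaining interactions.

```python
# Toy drug-by-target interaction matrix (hypothetical data, not from the paper).
# By construction, every drug (row) and every target (column) has >= 1 interaction.
adj = [
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
]


def flag_empty_lines(adj):
    """Return the pairs (i, j) whose drug row or target column has no
    interactions left once entry (i, j) itself is hidden (leave-one-out)."""
    n_drugs, n_targets = len(adj), len(adj[0])
    flagged = []
    for i in range(n_drugs):
        for j in range(n_targets):
            # Interactions of drug i with all targets other than j.
            row_rest = sum(adj[i][k] for k in range(n_targets) if k != j)
            # Interactions of target j with all drugs other than i.
            col_rest = sum(adj[k][j] for k in range(n_drugs) if k != i)
            if row_rest == 0 or col_rest == 0:
                flagged.append((i, j))
    return flagged


flagged = flag_empty_lines(adj)
true_hits = sum(adj[i][j] for i, j in flagged)
total_true = sum(sum(row) for row in adj)
print(f"flagged pairs: {flagged}")
print(f"{true_hits} of {len(flagged)} flagged pairs are true interactions")
print(f"{true_hits} of {total_true} true interactions recovered")
```

Under the stated assumption, a flagged pair is always a true interaction: a hidden non-interaction leaves the row and column sums unchanged, and both are at least one. A rule that uses no chemical or biological information therefore scores well under leave-one-out cross-validation, which is the kind of optimistic estimate the abstract warns about.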

Cite

Van Laarhoven, T., & Marchiori, E. (2014). Biases of drug-target interaction network data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8626 LNBI, pp. 23–33). Springer Verlag. https://doi.org/10.1007/978-3-319-09192-1_3
