Bigger data is better for molecular diagnosis tests based on decision trees


Abstract

Most molecular diagnosis tests are based on small studies of about twenty patients and use classical statistics. The prevailing view is that such studies can yield accurate tests with just one or two predictors, especially when they use informative molecules such as microRNA in cancer diagnosis. We investigated the relationship between accuracy, the number of microRNA predictors, and the sample size of the dataset used to develop cancer diagnosis tests. We also investigated how well the tests generalize. One of the largest freely available breast cancer datasets was used for binary classification (cancer versus normal) with C5 and CART decision trees. The results show that diagnosis tests with a good compromise between accuracy and the number of predictors (which is related to cost) can be obtained with C5 or CART when the sample size exceeds 100 patients. These tests generalize well.
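
The relationship between sample size and held-out accuracy described above can be illustrated with a small sketch. This is not the authors' pipeline: it uses synthetic data (one informative feature, three noise features) rather than the breast cancer microRNA dataset, and a single-split "stump" chosen by Gini impurity (the split criterion CART uses) rather than a full C5 or CART tree. All function names and parameters (`make_data`, `fit_stump`, `learning_curve`, the class separation of 1.5) are hypothetical choices made for illustration.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def make_data(rng, n, n_features=4, separation=1.5):
    """Synthetic stand-in data: feature 0 is shifted by `separation`
    for class 1; the remaining features are pure noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(label * separation, 1.0)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_features - 1)]
        X.append(row)
        y.append(label)
    return X, y

def fit_stump(X, y):
    """Return (feature, threshold, left_label, right_label) for the single
    split that minimizes the weighted Gini impurity of the two children."""
    n, d = len(X), len(X[0])
    best = None
    for j in range(d):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2.0
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                # Each child predicts its majority class.
                left_label = int(2 * sum(left) >= len(left))
                right_label = int(2 * sum(right) >= len(right))
                best = (score, j, t, left_label, right_label)
    return best[1:]

def accuracy(stump, X, y):
    """Fraction of rows in X whose predicted label matches y."""
    j, t, left_label, right_label = stump
    hits = sum((left_label if row[j] <= t else right_label) == label
               for row, label in zip(X, y))
    return hits / len(y)

def learning_curve(sizes=(10, 20, 50, 100, 200), trials=10, seed=0):
    """Mean held-out accuracy of the stump for each training-set size."""
    rng = random.Random(seed)
    X_test, y_test = make_data(rng, 500)
    curve = {}
    for n in sizes:
        scores = []
        for _ in range(trials):
            X_train, y_train = make_data(rng, n)
            scores.append(accuracy(fit_stump(X_train, y_train), X_test, y_test))
        curve[n] = sum(scores) / trials
    return curve

curve = learning_curve()
for n, acc in curve.items():
    print(f"train size {n:4d}: mean test accuracy {acc:.3f}")
```

Because the data are synthetic, the printed accuracies say nothing about microRNA tests. The sketch only shows how one might tabulate generalization accuracy against training-set size for a tree-based classifier.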

Cite

Floares, A. G., Calin, G. A., & Manolache, F. B. (2016). Bigger data is better for molecular diagnosis tests based on decision trees. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9714 LNCS, 288–295. https://doi.org/10.1007/978-3-319-40973-3_29
