Weak classifiers performance measure in handling noisy clinical trial data

Abstract

Most prior research concludes that machine learning classifiers perform better on cleaned datasets than on dirty ones. In this paper, we evaluate three weak (base) machine learning classifiers, Decision Table, Naive Bayes and k-Nearest Neighbor, on a real-world, noisy and messy clinical trial dataset rather than on a carefully curated one. A clinical trial data scientist guided our exploratory data analysis and helped us evaluate the performance results. Classifier performance was analyzed using accuracy and the Receiver Operating Characteristic (ROC) measure, supported by sensitivity, specificity and precision values, and the results contradict the conclusion of previous research. We applied two pre-processing techniques: the interquartile range technique to remove outliers and mean imputation to handle missing values. Even so, all three classifiers performed better on the dirty dataset than on the imputed and cleaned datasets, showing their highest accuracy and ROC measures there. Decision Table turned out to be the best classifier for real-world noisy clinical trial data.
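The abstract names the pre-processing steps and evaluation measures without giving their details, so the following Python is only a minimal sketch of what such steps commonly look like, not the authors' implementation. Several things here are assumptions: the fence multiplier `k = 1.5` (the usual Tukey convention), the use of `None` to mark a missing value, the order of the two steps, binary labels coded as 0 and 1, and all function names.

```python
from statistics import mean, quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    k = 1.5 is an assumed default; the abstract does not state a multiplier.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Drop observed values outside the IQR fences; keep missing entries (None)."""
    observed = [v for v in values if v is not None]
    lower, upper = iqr_bounds(observed, k)
    return [v for v in values if v is None or lower <= v <= upper]


def impute_mean(values):
    """Replace each missing entry (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def confusion_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity and precision for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "precision": ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    # Hypothetical measurements: one missing value and one extreme reading.
    raw = [10, 12, None, 11, 13, 12, 100]
    cleaned = impute_mean(remove_outliers(raw))
    print(cleaned)
    print(confusion_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

In this sketch the outliers are removed before the mean is computed, so an extreme reading does not distort the imputed value; whether the paper used the same order is not stated in the abstract.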

Cite

Kamaru-Zaman, E. A., Brass, A., Weatherall, J., & Rahman, S. A. (2016). Weak classifiers performance measure in handling noisy clinical trial data. In Communications in Computer and Information Science (Vol. 652, pp. 148–157). Springer Verlag. https://doi.org/10.1007/978-981-10-2777-2_13
