Most prior research concludes that machine learning classifiers perform better on clean datasets than on dirty ones. In this paper, we evaluate three weak (base) machine learning classifiers, Decision Table, Naive Bayes, and k-Nearest Neighbor, on a real-world, noisy, and messy clinical trial dataset rather than on a carefully curated one. A clinical trial data scientist guided our data exploration and helped improve the evaluation of the results. Classifier performance was analyzed using accuracy and the Receiver Operating Characteristic (ROC) measure, supported by sensitivity, specificity, and precision values, and the results contradict the conclusion of previous research. We applied pre-processing techniques such as the interquartile range technique to remove outliers and mean imputation to handle missing values. Even so, all three classifiers performed better on the dirty dataset than on the imputed and cleaned datasets, showing their highest accuracy and ROC measures there. Decision Table turned out to be the best classifier for real-world, noisy clinical trial data.
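The two pre-processing steps and the supporting metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. The 1.5 × IQR fence multiplier, the use of `None` for missing values, the order of the two steps, and all function names are assumptions made for illustration only. The ROC measure is omitted because it requires classifier scores rather than hard labels.

```python
from statistics import mean, quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences from the observed (non-missing) values.

    k=1.5 is the conventional Tukey multiplier, assumed here.
    """
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Drop observed values outside the fences; missing values are kept."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v is None or lower <= v <= upper]


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and precision for 0/1 labels.

    Assumes both classes occur in y_true and at least one positive prediction.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),
    }


# Hypothetical measurements with one missing value and one extreme value.
raw = [4.0, 5.0, None, 6.0, 5.0, 40.0, 4.0, 6.0]
cleaned = remove_outliers(raw)
imputed = mean_impute(cleaned)

# Hypothetical true and predicted labels.
scores = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1, 0])
```

Comparing a classifier on `raw`, `cleaned`, and `imputed` versions of the same features mirrors the dirty, clean, and imputed comparison the abstract describes, although the actual datasets and tooling used in the study are not specified here.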
CITATION STYLE
Kamaru-Zaman, E. A., Brass, A., Weatherall, J., & Rahman, S. A. (2016). Weak classifiers performance measure in handling noisy clinical trial data. In Communications in Computer and Information Science (Vol. 652, pp. 148–157). Springer Verlag. https://doi.org/10.1007/978-981-10-2777-2_13