Machine learning for identifying Randomized Controlled Trials: An evaluation and practitioner's guide

Abstract

Machine learning (ML) algorithms have proven highly accurate for identifying Randomized Controlled Trials (RCTs) but are not used much in practice, in part because the best way to make use of the technology in a typical workflow is unclear. In this work, we evaluate ML models for RCT classification (support vector machines, convolutional neural networks, and ensemble approaches). We trained and optimized support vector machine and convolutional neural network models on the titles and abstracts of the Cochrane Crowd RCT set. We evaluated the models on an external dataset (Clinical Hedges), allowing direct comparison with traditional database search filters. We estimated the area under the receiver operating characteristic curve (AUROC) using the Clinical Hedges dataset. We demonstrate that ML approaches better discriminate between RCTs and non-RCTs than widely used traditional database search filters at all sensitivity levels; our best-performing model also achieved the best results to date for ML in this task (AUROC 0.987, 95% CI, 0.984-0.989). We provide practical guidance on the role of ML in (1) systematic reviews (high-sensitivity strategies) and (2) rapid reviews and clinical question answering (high-precision strategies) together with recommended probability cutoffs for each use case. Finally, we provide open-source software to enable these approaches to be used in practice.
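To make the two quantities in the abstract concrete, the following is a minimal sketch of how AUROC and a probability cutoff relate to sensitivity and precision. It is not the authors' software: the function names are hypothetical, and the labels and scores are made-up toy values, not results from Cochrane Crowd or Clinical Hedges. AUROC is computed here as the probability that a randomly chosen RCT receives a higher score than a randomly chosen non-RCT, with ties counted as one half.

```python
# Toy data (invented for illustration): 1 = RCT, 0 = non-RCT,
# with a classifier's predicted probability for each article.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.60, 0.20, 0.70, 0.30, 0.10, 0.05, 0.02, 0.01]


def auroc(labels, scores):
    """Fraction of (RCT, non-RCT) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_precision(labels, scores, cutoff):
    """Sensitivity and precision when articles scoring >= cutoff are kept."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    return sens, prec


def cutoff_for_sensitivity(labels, scores, target):
    """Highest observed score usable as a cutoff with sensitivity >= target."""
    for c in sorted(set(scores), reverse=True):
        sens, _ = sensitivity_precision(labels, scores, c)
        if sens >= target:
            return c
    return min(scores)


if __name__ == "__main__":
    print("AUROC:", auroc(labels, scores))
    # A high-sensitivity cutoff (systematic review style) keeps every RCT
    # but admits more non-RCTs; a stricter cutoff trades recall for precision.
    for target in (1.0, 0.5):
        c = cutoff_for_sensitivity(labels, scores, target)
        print(target, c, sensitivity_precision(labels, scores, c))
```

On this toy data the low cutoff recovers all RCTs at reduced precision, while the higher cutoff keeps only confident predictions, which mirrors the high-sensitivity and high-precision use cases described above. Real cutoffs should come from a held-out labeled set, not from values chosen by eye.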

Marshall, I. J., Noel-Storr, A., Kuiper, J., Thomas, J., & Wallace, B. C. (2018). Machine learning for identifying Randomized Controlled Trials: An evaluation and practitioner’s guide. In Research Synthesis Methods (Vol. 9, pp. 602–614). John Wiley and Sons Ltd. https://doi.org/10.1002/jrsm.1287
