A substantial proportion of trial registrations are not linked to their corresponding published articles, which limits analyses that depend on these links and the development of new tools. Our aim was to develop a method for finding articles that report the results of trials registered on ClinicalTrials.gov when no metadata link connects the two records. We used a set of 27,280 pairs of trial registrations and articles to train and evaluate methods for identifying missing links in both directions: from articles to registrations and from registrations to articles. We trained a classifier that uses six distance metrics as features to rank the correct article or registration, and we used recall@K to evaluate performance and compare it with baseline methods. When identifying links from registrations to published articles, the classifier ranked the correct article first (recall@1) among 378,048 articles in 80.8% of evaluation cases, compared with 34.9% for the baseline method. Recall@10 was 85.1%, compared with 60.7% for the baseline. When predicting links from articles to registrations, recall@1 was 83.4% for the classifier and 39.8% for the baseline, and recall@10 was 89.5% compared with 65.8%. The proposed method improves on our baseline document similarity method enough to make identifying missing links feasible in practice. Given a ClinicalTrials.gov registration, a user checking the 10 top-ranked articles can expect to identify the matching article in at least 85% of cases, provided the trial has been published. The proposed method can be used to improve the coupling of ClinicalTrials.gov and PubMed, with applications in automating systematic review and evidence synthesis processes.
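The ranking and evaluation steps summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the classifier or the six distance metrics, so `link_probability` is a hypothetical logistic scorer over six precomputed distances with made-up weights (`WEIGHTS`, `BIAS`), and the registration and article IDs are invented toy data. `recall_at_k` computes the metric as the fraction of queries whose correct match appears among the top K ranked candidates.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical weights for six precomputed distance metrics (assumption: the
# abstract does not say which metrics or which classifier the authors used).
WEIGHTS = [1.0] * 6
BIAS = 3.0


def link_probability(distances: Sequence[float],
                     weights: Sequence[float] = WEIGHTS,
                     bias: float = BIAS) -> float:
    """Stand-in logistic scorer: smaller distances give a higher link probability."""
    if len(distances) != len(weights):
        raise ValueError("expected one weight per distance metric")
    z = bias - sum(w * d for w, d in zip(weights, distances))
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(scores: Dict[str, float]) -> List[str]:
    """Return candidate IDs ordered from highest to lowest score."""
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)


def recall_at_k(rankings: Sequence[Sequence[str]],
                gold: Sequence[str], k: int) -> float:
    """Fraction of queries whose correct match appears in the top k candidates."""
    if len(rankings) != len(gold):
        raise ValueError("need exactly one gold ID per ranking")
    if not gold:
        return 0.0
    hits = sum(1 for ranking, target in zip(rankings, gold)
               if target in ranking[:k])
    return hits / len(gold)


# Toy data (invented): each query registration maps candidate article IDs to
# their six distance values, plus the ID of the truly linked article.
queries = {
    "REG_A": ({"art1": [0.1] * 6, "art2": [0.5] * 6, "art3": [0.9] * 6}, "art1"),
    "REG_B": ({"art4": [0.2] * 6, "art5": [0.3] * 6, "art6": [0.8] * 6}, "art5"),
}

rankings, gold = [], []
for candidates, target in queries.values():
    # Score every candidate, then sort so the most likely link comes first.
    scores = {cid: link_probability(d) for cid, d in candidates.items()}
    rankings.append(rank_candidates(scores))
    gold.append(target)

print(recall_at_k(rankings, gold, k=1))  # 0.5
print(recall_at_k(rankings, gold, k=2))  # 1.0
```

In the toy data, the correct article for `REG_B` is ranked second, so it counts toward recall@2 but not recall@1. In the setting the abstract describes, each ranking would instead cover the full pool of candidate articles (378,048 in the registration-to-article direction), and recall@1 and recall@10 would be reported.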
Liu, S., Bourgeois, F. T., & Dunn, A. G. (2022). Identifying unreported links between ClinicalTrials.gov trial registrations and their published results. Research Synthesis Methods, 13(3), 342–352. https://doi.org/10.1002/jrsm.1545