Data management and data analysis techniques used in pharmacoepidemiological multi-database studies: A systematic literature review

  • Bazelier M
  • Eriksson I
  • De Vries F
  • et al.

Abstract

Background: Over the past decade, an increasing number of studies have been performed using healthcare databases from multiple countries, regions or healthcare organisations. Using data from multiple databases offers potential advantages such as increased sample size and generalizability. Data from several independent databases can be combined in various ways, for example by combining aggregate results or by combining individual patient data. Our aim was to identify pharmacoepidemiological multi-database studies and to describe the data management and data analysis techniques they used.

Methods: Systematic literature searches were conducted in PubMed and Embase, complemented by a manual literature search. The search strategies were based on terms that included 'databases', 'drugs' and 'epidemiology'/'observational studies' (using MeSH and free text terms). We included pharmacoepidemiological multi-database studies published from 2007 onwards that combined data for a pre-planned common analysis or quantitative synthesis. Information was retrieved about study characteristics, methods used for individual-level analyses and meta-analyses, data management and motivations for performing the study. Regarding the meta-analyses, we defined three levels of combining data: (1) an aggregate level approach, in which separate analyses are performed on datasets from each database and overall results are collected for meta-analysis, (2) a semi-aggregate level approach, in which stratified datasets with event counts are collected for one common analysis and (3) an individual level approach, in which individual patient data are collected for one common analysis.

Results: We found 3083 articles by the systematic searches and an additional 176 by the manual search. After full-text screening of 75 articles, 22 were selected for final inclusion. The number of databases used per study ranged from 2 to 17 (median = 4.0). Most studies used a cohort design (82%) rather than a case-control design (18%). Logistic regression was most often used for individual-level analyses (41%), followed by Cox regression (23%) and Poisson regression (14%). For the meta-analysis, a majority of the studies combined individual patient data (73%). Six studies performed an aggregate meta-analysis (27%), while a semi-aggregate approach was applied in three studies (14%). Information on central programming or heterogeneity assessment was missing in approximately half of the publications. Most studies were motivated by power (86%).

Conclusion: Pharmacoepidemiological multi-database studies, a well-powered strategy to address safety issues, have increased in popularity. To allow correct interpretation of the results of these studies, researchers should systematically report on database management and analysis techniques, including central programming and heterogeneity testing.
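
To make the aggregate level approach described in the Methods more concrete, the following is a minimal sketch of how per-database results might be pooled and checked for heterogeneity. It assumes a fixed-effect inverse-variance model with Cochran's Q and I², which is one common choice and not necessarily the method used in any of the reviewed studies. The function name `pool_fixed_effect` and the example estimates are hypothetical and serve only as illustration.

```python
import math


def pool_fixed_effect(estimates):
    """Pool per-database effect estimates with inverse-variance weights.

    estimates: list of (log_effect, standard_error) tuples, one per database.
    Returns (pooled_log_effect, pooled_se, cochran_q, i_squared_percent).
    """
    # Each database is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)

    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1

    # I^2: share of total variation attributed to between-database heterogeneity.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


if __name__ == "__main__":
    # Hypothetical log hazard ratios and standard errors from three databases.
    example = [(0.25, 0.10), (0.40, 0.15), (0.10, 0.20)]
    pooled, se, q, i2 = pool_fixed_effect(example)
    low, high = pooled - 1.96 * se, pooled + 1.96 * se
    print(f"Pooled HR: {math.exp(pooled):.2f} "
          f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
    print(f"Cochran's Q: {q:.2f}, I^2: {i2:.1f}%")
```

In this sketch only the effect estimate and its standard error leave each database, which is the defining feature of the aggregate level. The semi-aggregate and individual level approaches would instead transfer stratified event counts or patient-level records for one common analysis.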

Citation (APA)

Bazelier, M. T., Eriksson, I., De Vries, F., Schmidt, M. K., Raitanen, J., Haukka, J., … Andersen, M. (2015). Data management and data analysis techniques used in pharmacoepidemiological multi-database studies: A systematic literature review. European Journal of Epidemiology, 30(8), 732. Retrieved from http://www.embase.com/search/results?subaction=viewrecord&from=export&id=L72274361
