Abstract
Background: General practice data are increasingly used to estimate chronic disease prevalence. Concerns remain about data completeness and fragmentation, particularly when patients attend multiple practices. Previous studies have restricted analyses to ‘active’ patients (those with frequent clinical encounters), assuming that these records are more complete and representative; however, the validity of this approach has not been tested. This study examines whether prevalence estimated from patient-level (linked) general practice records differs from that obtained with the common approach of using active practice-level (unlinked) general practice records.
Methods: This retrospective cohort study used de-identified electronic health records from the MedicineInsight dataset, comprising 694,004 patients aged 18 years and older from 39 general practices in Western Australia, covering approximately 32.7% of the state’s adult population as of January 26, 2022. Patient demographics, diagnoses, and clinical encounters were analysed.
Results: Condition prevalence estimates vary depending on cohort definition and the inclusion of patients with low general practice engagement. Active patients had a higher median number of encounters (9 vs. 4) and consistently higher condition prevalence across all chronic diseases, including hypertension (18.2% vs. 11.6%), diabetes (7.0% vs. 4.6%), and asthma (11.3% vs. 8.1%), demonstrating systematic overestimation when analyses exclude patients with lower healthcare utilisation. The patient-level cohort captured more total diagnoses due to its larger denominator (257,023 total diagnosed conditions across the N = 608,000 patient-level cohort, versus 133,235 total diagnosed conditions across N = 201,817 practice-level active patients).
Conclusion: Diagnostic information in general practice records is often dispersed across practices, affecting population planning and research. Linking patient records across practices enhances diagnostic visibility and reveals a more complete picture of chronic disease burden, highlighting the risk of overestimating disease prevalence when analyses are restricted to active patient records alone. This overestimation likely results from excluding healthier patients with fewer healthcare encounters. Small differences in prevalence estimates can have substantial implications for population-level planning, potentially affecting funding allocations, clinical guidelines, and workforce decisions. These findings suggest the need for linked general practice datasets to improve the accuracy of prevalence estimates and inform effective policy and resource allocation decisions in primary care.
Varhol, R. J., Lee, C. M. Y., Randall, S., Boyd, J. H., & Robinson, S. (2025). Using general practice data for chronic disease prevalence: the impact of record linkage on estimation accuracy. BMC Medical Informatics and Decision Making, 25(1). https://doi.org/10.1186/s12911-025-03244-9