Large Datasets in Biomedicine: A Discussion of Salient Analytic Issues

Abstract

Advances in high-throughput and mass-storage technologies have led to an information explosion in both biology and medicine, presenting novel challenges for analysis and modeling. With regard to multivariate analysis techniques such as clustering, classification, and regression, large datasets present unique and often misunderstood challenges. The authors' goal is to discuss the salient problems encountered in the analysis of large datasets as they relate to modeling and inference, both to inform a principled and generalizable analysis and to highlight the interdisciplinary nature of these challenges. The authors present a detailed study of germane issues, including high dimensionality, multiple testing, scientific significance, dependence, information measurement, and information management, with a focus on the methodologies available to address these concerns. A firm understanding of the challenges and statistical technology involved ultimately contributes to better science. The authors further suggest that the community consider facilitating discussion through interdisciplinary panels, invited papers, and curriculum enhancement to establish guidelines for analysis and reporting. © 2009 J Am Med Inform Assoc.

Cite

Sinha, A., Hripcsak, G., & Markatou, M. (2009). Large Datasets in Biomedicine: A Discussion of Salient Analytic Issues. Journal of the American Medical Informatics Association, 16(6), 759–767. https://doi.org/10.1197/jamia.M2780
