Using topic modeling via non-negative matrix factorization to identify relationships between genetic variants and disease phenotypes: A case study of Lipoprotein(a) (LPA)

Abstract

Genome-wide and phenome-wide association studies are commonly used to identify important relationships between genetic variants and phenotypes. Most studies have treated diseases as independent variables and have carried a heavy multiple-testing adjustment burden because of the large number of genetic variants and disease phenotypes. In this study, we used topic modeling via non-negative matrix factorization (NMF) to identify associations between disease phenotypes and genetic variants. Topic modeling is an unsupervised machine learning approach that can be used to learn patterns from electronic health record (EHR) data. We chose the single nucleotide polymorphism (SNP) rs10455872 in LPA as the predictor because it has been shown to be associated with increased risk of hyperlipidemia and cardiovascular diseases (CVD). Using data from 12,759 individuals with EHRs and linked DNA samples at Vanderbilt University Medical Center, we trained a topic model using NMF on 1,853 distinct phenotypes and identified six topics. We then tested the topics' associations with rs10455872 in LPA. Topics enriched for CVD and hyperlipidemia had positive correlations with rs10455872 (P < 0.001), replicating a previous finding. We also identified a negative correlation between LPA and a topic enriched for lung cancer (P < 0.001), which was not previously identified via phenome-wide scanning. We were able to replicate the top finding in a separate dataset. Our results demonstrate the applicability of topic modeling in exploring the relationship between genetic variants and clinical diseases.
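
The workflow described in the abstract (factor a patient-by-phenotype matrix into topics, then relate each patient's topic weight to a genotype) can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration: the toy matrix, the four phenotype columns, the genotype vector, the choice of Lee-Seung multiplicative updates as the NMF solver, and the use of a plain Pearson correlation in place of whatever association test the authors ran. The abstract does not specify any of these details.

```python
import random

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def nmf(v, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative V (patients x phenotypes) as W @ H, where
    W is patients x topics and H is topics x phenotypes.
    Solver: Lee-Seung multiplicative updates for the squared error
    (an assumed choice, not necessarily the one used in the paper)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h

def squared_error(v, w, h):
    """Squared Frobenius norm of V - W @ H."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for rv, rw in zip(v, wh) for a, b in zip(rv, rw))

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

# Synthetic toy data (hypothetical): rows are patients, columns are
# phenotype indicators [hyperlipidemia, coronary disease, asthma, COPD].
V = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical minor-allele counts (0, 1, or 2) for one SNP per patient.
genotype = [1, 2, 1, 0, 0, 0]

W, H = nmf(V, k=2)
err = squared_error(V, W, H)

# Pick the topic that loads most heavily on the first phenotype, then
# correlate each patient's weight on that topic with the genotype.
topic = max(range(len(H)), key=lambda t: H[t][0])
r = pearson([row[topic] for row in W], genotype)
print("squared error:", err)
print("topic-genotype correlation:", r)
```

At the scale reported in the abstract (12,759 individuals, 1,853 phenotypes, six topics), a numerical library would be the practical choice; the pure-Python loops above only keep the sketch self-contained. Testing one topic score per patient, rather than each phenotype separately, is what reduces the number of comparisons.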

Zhao, J., Feng, Q. P., Wu, P., Warner, J. L., Denny, J. C., & Wei, W. Q. (2019). Using topic modeling via non-negative matrix factorization to identify relationships between genetic variants and disease phenotypes: A case study of Lipoprotein(a) (LPA). PLoS ONE, 14(2). https://doi.org/10.1371/journal.pone.0212112
