Identification of Smoking-Associated Transcriptome Aberration in Blood with Machine Learning Methods

Abstract

Long-term cigarette smoking causes various human diseases, including respiratory disease, cancer, and gastrointestinal (GI) disorders. Smoking-induced alterations in gene expression and alternative splicing are associated with the development of these diseases. This study applied machine learning methods to identify the isoforms that are most important for distinguishing current smokers from former smokers, using isoform expression profiles of current and former smokers collected in a previous study. The isoforms were treated as features and first filtered with Boruta to retain those highly correlated with the target variable. The retained features were then ranked by four feature ranking algorithms, yielding four feature lists. The incremental feature selection method was applied to each list to obtain the optimal feature subsets and build high-performance classification models. In addition, a set of classification rules was extracted from the best-performing decision tree. Finally, the plausibility of the mined isoforms (features) and classification rules was assessed against previous research. Features such as isoforms ENST00000464835 (expressed by LRRN3), ENST00000622663 (expressed by SASH1), and ENST00000284311 (expressed by GPR15), together with pathways revealed by enrichment analysis (natural killer cell-mediated cytotoxicity and cytokine-cytokine receptor interaction), were highly relevant to the smoking response, suggesting the robustness of our analysis pipeline.
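
The incremental feature selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifiers' settings or the ranking algorithms, so the example assumes a simple nearest-centroid classifier as a stand-in, a hypothetical pre-computed feature ranking, and synthetic data in place of isoform expression profiles. All function names are illustrative. The idea shown is only the core loop: evaluate the top-1, top-2, ... prefixes of a ranked feature list with cross-validation and keep the prefix with the best accuracy.

```python
import random
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_accuracy(X, y, n_folds=5):
    """k-fold cross-validated accuracy; fold i holds samples i, i + n_folds, ..."""
    correct = 0
    for fold in range(n_folds):
        test_idx = set(range(fold, len(X), n_folds))
        train_X = [X[i] for i in range(len(X)) if i not in test_idx]
        train_y = [y[i] for i in range(len(X)) if i not in test_idx]
        for i in test_idx:
            correct += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features):
    """Score every prefix of a ranked feature list and return the best one.

    Returns (best prefix size, its accuracy, full accuracy curve).
    Ties are broken in favour of the smaller feature subset.
    """
    curve = []
    for k in range(1, len(ranked_features) + 1):
        cols = ranked_features[:k]
        X_k = [[row[c] for c in cols] for row in X]
        curve.append((k, cv_accuracy(X_k, y)))
    best_k, best_acc = max(curve, key=lambda item: (item[1], -item[0]))
    return best_k, best_acc, curve


# Synthetic stand-in data (assumption): 40 samples, 6 features,
# where only features 0 and 1 carry class signal.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [
    [label * 2.0 + rng.gauss(0, 1) if j < 2 else rng.gauss(0, 1) for j in range(6)]
    for label in y
]

# Hypothetical ranking, as if produced by one feature ranking algorithm.
ranked = [0, 1, 2, 3, 4, 5]

best_k, best_acc, curve = incremental_feature_selection(X, y, ranked)
print("best subset size:", best_k, "accuracy:", round(best_acc, 3))
```

In the study, this kind of loop would be run once per ranked list, so four lists yield four candidate optimal subsets. Swapping in a decision tree for the stand-in classifier would additionally allow classification rules to be read off the fitted tree.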

Cite

Huang, F., Ma, Q., Ren, J., Li, J., Wang, F., Huang, T., & Cai, Y. D. (2023). Identification of Smoking-Associated Transcriptome Aberration in Blood with Machine Learning Methods. BioMed Research International, 2023. https://doi.org/10.1155/2023/5333361
