Non-small-cell lung cancer (NSCLC) is the most common type of lung cancer, accounting for nearly 85% of cases. The increasing availability of genome-wide gene expression data has facilitated the identification of gene signatures that are important for the precise classification of NSCLC subtypes and for personalized treatment decisions. Unsupervised feature selection is an effective computational technique for finding the most discriminative feature subset, both to distinguish different classes and to uncover information embedded in biological data. In this study, we propose a novel unsupervised feature selection method that identifies gene signatures for NSCLC subtype classification from gene expression data. The method incorporates linear discriminant analysis, adaptive structure preservation, and l_{2,1}-norm sparse regression into a joint learning framework that selects the informative genes. We developed an effective algorithm to solve the resulting optimization problem. To reduce the computational cost, we performed module-based gene filtering before feature selection. We evaluated the method on an NSCLC gene expression dataset from The Cancer Genome Atlas (TCGA). The experimental results show that it identified a small number of gene signatures that classify NSCLC subtypes accurately. We also performed enrichment analysis of the identified gene signatures to summarize the key biological processes involved.
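To illustrate the l_{2,1}-norm term mentioned above, here is a minimal sketch of how such a norm is typically computed and used to rank features. It is not the authors' algorithm: the function names `l21_norm` and `rank_features`, the toy matrix `W`, and the convention that rows of `W` correspond to genes are all assumptions made for illustration. The l_{2,1}-norm of a matrix is the sum of the Euclidean norms of its rows, so penalizing it tends to drive entire rows to zero, and the surviving rows mark the selected features.

```python
import numpy as np

def l21_norm(W):
    """l_{2,1}-norm: the sum of the Euclidean (l2) norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def rank_features(W, k):
    """Return the indices of the k rows of W with the largest l2 norm.

    Assumption: each row of W holds the regression weights of one feature
    (gene), so a larger row norm is read as a more informative feature.
    """
    row_norms = np.linalg.norm(W, axis=1)
    # Stable sort on negated norms gives descending order with ties kept in index order.
    return np.argsort(-row_norms, kind="stable")[:k]

# Toy weight matrix: 4 hypothetical genes (rows) x 2 latent dimensions (columns).
W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0],
              [0.0, 2.0]])

total = l21_norm(W)          # row norms are 5, 0, 1, 2
top2 = rank_features(W, 2)   # the two rows with the largest norms
```

In a full method of this kind, `W` would be learned by the joint optimization rather than fixed by hand; the ranking step shown here would only be applied once that optimization has converged.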
CITATION STYLE
Ye, X., Zhang, W., & Sakurai, T. (2020). Adaptive Unsupervised Feature Learning for Gene Signature Identification in Non-Small-Cell Lung Cancer. IEEE Access, 8, 154354–154362. https://doi.org/10.1109/ACCESS.2020.3018480