FSR: Feature set reduction for scalable and accurate multi-class cancer subtype classification based on copy number

Gerard Wong; Christopher Leckie; Adam Kowalczyk

Bioinformatics (2012) 28(2) 151-159

DOI: 10.1093/bioinformatics/btr644

Abstract

Motivation: Feature selection is a key concept in machine learning for microarray datasets, where the number of features, represented by probesets, is typically several orders of magnitude larger than the available sample size. Computational tractability is a key challenge for feature selection algorithms when handling very high-dimensional datasets of more than a hundred thousand features, such as those produced on single nucleotide polymorphism (SNP) microarrays. In this article, we present a novel feature set reduction approach that enables scalable feature selection on datasets with hundreds of thousands of features and beyond. Our approach enables more efficient handling of higher resolution datasets to achieve better disease subtype classification of samples for potentially more accurate diagnosis and prognosis, which allows clinicians to make more informed decisions with regard to patient treatment options.

Results: We applied our feature set reduction approach to several publicly available cancer SNP array datasets and evaluated its performance in terms of its multiclass predictive classification accuracy over different cancer subtypes, its speedup in execution, and its scalability with respect to sample size and array resolution. Feature Set Reduction (FSR) was able to reduce the dimensionality of an SNP array dataset by more than two orders of magnitude while achieving at least equal, and in most cases superior, predictive classification performance over that achieved on features selected by existing feature selection methods alone. An examination of the biological relevance of frequently selected features from FSR-reduced feature sets revealed strong enrichment in association with cancer.

© The Author 2011. Published by Oxford University Press. All rights reserved.
