FSR: Feature set reduction for scalable and accurate multi-class cancer subtype classification based on copy number

Abstract

Motivation: Feature selection is a key concept in machine learning for microarray datasets, where the number of features, represented by probesets, is typically several orders of magnitude larger than the available sample size. Computational tractability is a key challenge for feature selection algorithms on very high-dimensional datasets with more than a hundred thousand features, such as those produced on single nucleotide polymorphism (SNP) microarrays. In this article, we present a novel feature set reduction approach that enables scalable feature selection on datasets with hundreds of thousands of features and beyond. Our approach enables more efficient handling of higher resolution datasets to achieve better disease subtype classification of samples for potentially more accurate diagnosis and prognosis, which allows clinicians to make more informed decisions regarding patient treatment options.

Results: We applied our Feature Set Reduction (FSR) approach to several publicly available cancer SNP array datasets and evaluated its performance in terms of its multiclass predictive classification accuracy over different cancer subtypes, its speedup in execution, and its scalability with respect to sample size and array resolution. FSR reduced the dimensions of an SNP array dataset by more than two orders of magnitude while achieving at least equal, and in most cases superior, predictive classification performance over that achieved on features selected by existing feature selection methods alone. An examination of the biological relevance of frequently selected features from FSR-reduced feature sets revealed strong enrichment in association with cancer.

© The Author 2011. Published by Oxford University Press. All rights reserved.
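The abstract does not spell out how FSR reduces the feature set, so the following is only an illustrative sketch of the general idea of shrinking a feature set before running a feature selection method, and not the authors' algorithm. It assumes, purely for illustration, that adjacent probes whose copy number values agree in every sample can be merged into a single representative feature. The function name `reduce_feature_set`, the `tol` parameter, and the toy matrix are all hypothetical.

```python
from typing import List, Tuple


def reduce_feature_set(
    X: List[List[float]], tol: float = 0.0
) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Merge runs of adjacent columns that agree (within tol) in every sample.

    X is a samples-by-features matrix with features in genomic order.
    Returns the reduced matrix (first column of each run is kept as the
    representative) and the inclusive (start, end) column range of each run.
    NOTE: an assumed, generic reduction rule, not the published FSR method.
    """
    n_features = len(X[0]) if X else 0
    runs: List[Tuple[int, int]] = []
    start = 0
    for j in range(1, n_features + 1):
        # A run ends at the last column, or where any sample's value changes
        # between column j-1 and column j.
        run_ends = j == n_features or any(
            abs(row[j] - row[j - 1]) > tol for row in X
        )
        if run_ends:
            runs.append((start, j - 1))
            start = j
    reduced = [[row[s] for s, _ in runs] for row in X]
    return reduced, runs


# Toy example: 3 samples, 6 probes in genomic order (hypothetical values).
X = [
    [2, 2, 2, 3, 3, 1],
    [2, 2, 2, 2, 2, 2],
    [1, 1, 1, 1, 1, 1],
]
reduced, runs = reduce_feature_set(X)
print(runs)     # column ranges that were merged
print(reduced)  # one representative column per range
```

Any existing feature selection method could then be run on `reduced` instead of `X`, and the `runs` list maps each selected feature back to the original probes it represents.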

Cite

Wong, G., Leckie, C., & Kowalczyk, A. (2012). FSR: Feature set reduction for scalable and accurate multi-class cancer subtype classification based on copy number. Bioinformatics, 28(2), 151–159. https://doi.org/10.1093/bioinformatics/btr644
