RaPiDS: an algorithm for rapid expression profile database search.

ISSN: 09199454

Abstract

In this paper we present a fast algorithm and implementation for computing the Spearman rank correlation (SRC) between a query expression profile and each expression profile in a database of profiles. The running time is linear in the size of the profile database, with a very small constant factor. The algorithm is designed to handle multiple profile platforms and missing values efficiently. We show that our specialized algorithm and C++ implementation can achieve an approximately 100-fold speed-up over a reasonable baseline implementation using Perl hash tables. RaPiDS is designed for general similarity search rather than classification, but to assess the usefulness of SRC as a similarity measure we evaluate the program as a classifier of normal human cell types based on gene expression. Specifically, we use the k nearest neighbor classifier with a t statistic derived from SRC as the similarity measure for profile pairs. We estimate the accuracy using a jackknife test on microarray data with manually checked cell type annotation. Preliminary results suggest the measure is useful for profiles measured under similar conditions (same laboratory and chip platform), with 64% accuracy on 1,685 profiles versus 17.5% for the majority class classifier, but requires improvement when comparing profiles from different experimental series.
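To make the quantities named in the abstract concrete, the following is a minimal Python sketch of a naive linear scan: SRC restricted to the genes measured in both profiles, a t statistic derived from it, and a k nearest neighbor vote. It is not the RaPiDS algorithm or its C++ implementation, and it is closer in spirit to a hash-table baseline. The dict-based profile representation, the function names, and the use of the standard formula t = rho * sqrt((n - 2) / (1 - rho^2)) are all assumptions made for illustration; the abstract does not specify how the t statistic is derived.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(query, profile):
    """SRC over genes present (and not None) in both profiles.

    Profiles are dicts mapping gene id -> value (assumed representation).
    Returns (rho, n), with rho = None if the correlation is undefined.
    """
    shared = sorted(
        g for g in query
        if g in profile and query[g] is not None and profile[g] is not None
    )
    n = len(shared)
    if n < 3:
        return None, n
    rx = average_ranks([query[g] for g in shared])
    ry = average_ranks([profile[g] for g in shared])
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None, n
    # Pearson correlation of the ranks, which is the SRC.
    return cov / math.sqrt(vx * vy), n


def t_statistic(rho, n):
    """Standard t statistic for a correlation (assumed form, see lead-in)."""
    if abs(rho) >= 1.0:
        return math.copysign(math.inf, rho)
    return rho * math.sqrt((n - 2) / (1.0 - rho * rho))


def knn_classify(query, database, k=3):
    """Majority vote among the k database profiles with the highest t.

    database is a list of (label, profile) pairs; one pass over it.
    """
    scored = []
    for label, profile in database:
        rho, n = spearman(query, profile)
        if rho is not None:
            scored.append((t_statistic(rho, n), label))
    scored.sort(key=lambda s: s[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0] if votes else None
```

Using t rather than the raw correlation lets the number of shared genes n influence the score, which may matter when profiles come from different platforms or contain missing values and therefore overlap with the query on different gene sets.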

Cite

Horton, P. B., Kiseleva, L., & Fujibuchi, W. (2006). RaPiDS: an algorithm for rapid expression profile database search. Genome Informatics. International Conference on Genome Informatics, 17(2), 67–76.
