Rapid advances in genome sequencing and gene expression microarray technologies are providing unprecedented opportunities to identify specific genes involved in complex biological processes, such as development, signal transduction, and disease. The vast amount of data generated by these technologies has presented new challenges in bioinformatics. To help organize and interpret microarray data, new and efficient computational methods are needed to: (1) distinguish accurately between different biological or clinical categories (e.g., malignant vs. benign), and (2) identify specific genes that play a role in determining those categories. Here we present a novel and simple method that exhaustively scans microarray data for unambiguous gene expression patterns. Such patterns can serve as the basis for classifying samples into biological or clinical categories. The method, termed the Characteristic Attribute Organization System (CAOS), is derived from fundamental precepts in systematic biology. In CAOS we define two types of characteristic attributes ('pure' and 'private') that may exist in gene expression microarray data. We also consider additional attributes ('compound'), each composed of the expression states of more than one gene, where those states are not characteristic on their own. CAOS was tested on three well-known cancer DNA microarray data sets for its ability to classify new microarray samples. We found CAOS to be a highly accurate and robust class prediction technique. In addition, CAOS identified specific genes, not emphasized in other analyses, that may be crucial to the biology of certain types of cancer. The success of CAOS in this study has significant implications for basic research and the future development of reliable methods for clinical diagnostic tools. © 2002 Elsevier Science (USA). All rights reserved.
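The three attribute types named in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: expression values are taken to be already discretized into states such as "high" and "low"; a 'pure' attribute is assumed to be a gene state shared by every sample of one class and by no sample of the other; a 'private' attribute is assumed to be a gene state found in only some samples of one class and in none of the other; and 'compound' attributes are limited here to pairs of genes. The gene names, sample data, and function names are hypothetical.

```python
from itertools import combinations


def split_groups(samples, labels, target):
    """Separate samples into the target class and everything else."""
    in_group = [s for s, lab in zip(samples, labels) if lab == target]
    out_group = [s for s, lab in zip(samples, labels) if lab != target]
    return in_group, out_group


def characteristic_attributes(samples, labels, target):
    """Return (pure, private) lists of (gene, state) for the target class.

    Assumed definitions: a state absent from the other class is 'pure' if
    every target sample has it, and 'private' if only some of them do.
    """
    in_group, out_group = split_groups(samples, labels, target)
    pure, private = [], []
    for gene in sorted(in_group[0]):
        in_states = [s[gene] for s in in_group]
        out_states = {s[gene] for s in out_group}
        for state in sorted(set(in_states) - out_states):
            if all(x == state for x in in_states):
                pure.append((gene, state))
            else:
                private.append((gene, state))
    return pure, private


def compound_attributes(samples, labels, target):
    """Return gene-state pairs unique to the target class as a combination,
    where neither state is characteristic on its own (pairs only)."""
    in_group, out_group = split_groups(samples, labels, target)
    pure, private = characteristic_attributes(samples, labels, target)
    single = set(pure) | set(private)
    found = []
    for g1, g2 in combinations(sorted(in_group[0]), 2):
        in_pairs = {(s[g1], s[g2]) for s in in_group}
        out_pairs = {(s[g1], s[g2]) for s in out_group}
        for a, b in sorted(in_pairs - out_pairs):
            if (g1, a) not in single and (g2, b) not in single:
                found.append(((g1, a), (g2, b)))
    return found


# Hypothetical, already-discretized toy data (not from the paper).
samples = [
    {"geneA": "high", "geneB": "high", "geneC": "high", "geneD": "high"},
    {"geneA": "high", "geneB": "low", "geneC": "high", "geneD": "high"},
    {"geneA": "high", "geneB": "low", "geneC": "low", "geneD": "low"},
    {"geneA": "low", "geneB": "low", "geneC": "high", "geneD": "low"},
    {"geneA": "low", "geneB": "low", "geneC": "low", "geneD": "high"},
]
labels = ["malignant", "malignant", "malignant", "benign", "benign"]

pure, private = characteristic_attributes(samples, labels, "malignant")
compound = compound_attributes(samples, labels, "malignant")
print("pure:", pure)
print("private:", private)
print("compound:", compound)
```

In this toy data, geneA's "high" state is shared by all malignant samples and no benign ones, geneB's "high" state appears in only one malignant sample, and geneC and geneD separate the classes only when their states are read together. A new sample could then be assigned to the class whose attributes it matches, although the scoring rule CAOS actually uses is not described in the abstract.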
Sarkar, I. N., Planet, P. J., Bael, T. E., Stanley, S. E., Siddall, M., DeSalle, R., & Figurski, D. H. (2002). Characteristic attributes in cancer microarrays. Journal of Biomedical Informatics, 35(2), 111–122. https://doi.org/10.1016/S1532-0464(02)00504-X