Identifying rare classes with sparse training data

Mingwu Zhang; Wei Jiang; Chris Clifton; Sunil Prabhakar

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007) 4653 LNCS 751-760

DOI: 10.1007/978-3-540-74469-6_73

Abstract

Building models and learning patterns from a collection of data are essential tasks for decision making and the dissemination of knowledge. A common way to extract knowledge is to build a classifier. However, when the training dataset is sparse, it is difficult to build an accurate classifier. This is especially true in biological science, because biological data are hard to produce and error-prone. Through empirical results, this paper demonstrates the challenges of building an accurate classifier from a sparse biological training dataset. Our findings reveal inadequacies in well-known classification techniques. Although certain clustering techniques, such as seeded k-Means, show some promise, there is still room for improvement. In addition, we propose a novel idea that could be used to produce a more balanced classifier when training samples are very limited. © Springer-Verlag Berlin Heidelberg 2007.
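The abstract names seeded k-Means as a promising approach. As an illustration only, here is a minimal pure-Python sketch of seeded k-Means as it is commonly described in the semi-supervised clustering literature. The few labeled samples ("seeds") of each class set the initial centroids, and then ordinary k-means iterations run over all points. This is not the authors' implementation. The function name `seeded_kmeans`, its parameters, the toy data, and the use of squared Euclidean distance are all assumptions made for this sketch.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _mean(points: Sequence[Point]) -> Point:
    # Component-wise mean of a non-empty list of points.
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def _sq_dist(a: Point, b: Point) -> float:
    # Squared Euclidean distance (assumed metric for this sketch).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(
    data: List[Point],
    seeds: Dict[str, List[Point]],
    max_iter: int = 100,
) -> Tuple[Dict[str, Point], List[str]]:
    """Hypothetical helper: cluster `data` into one cluster per seeded class.

    `seeds` maps each class label to its few labeled points. The seeds only
    initialize the centroids. After that, every point (seeds included) may be
    reassigned freely, as in plain k-means.
    """
    labels = sorted(seeds)
    # Initialization: each class centroid is the mean of its labeled seeds.
    centroids = {lab: _mean(seeds[lab]) for lab in labels}
    assignments: List[str] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid (ties go to the first sorted label).
        new_assign = [
            min(labels, key=lambda lab: _sq_dist(p, centroids[lab])) for p in data
        ]
        if new_assign == assignments:
            break  # converged: no point changed cluster
        assignments = new_assign
        # Update step: recompute each centroid from its current members.
        for lab in labels:
            members = [p for p, a in zip(data, assignments) if a == lab]
            if members:  # an empty cluster keeps its previous centroid
                centroids[lab] = _mean(members)
    return centroids, assignments


if __name__ == "__main__":
    # Toy data (hypothetical): a single labeled sample for the rare class.
    seeds = {"common": [(0.0, 0.0), (1.0, 0.0)], "rare": [(9.0, 9.0)]}
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
            (9.0, 9.0), (8.0, 9.0), (9.0, 8.0)]
    centroids, assignments = seeded_kmeans(data, seeds)
    print(assignments)  # four "common" labels followed by three "rare"
```

Each class, including a rare one with only one labeled sample, gets its own starting centroid. This prevents the rare class from being merged away at initialization, which can happen with random seeding. Whether it stays separate in later iterations still depends on the data.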

Citation (APA)

Zhang, M., Jiang, W., Clifton, C., & Prabhakar, S. (2007). Identifying rare classes with sparse training data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4653 LNCS, pp. 751–760). Springer Verlag. https://doi.org/10.1007/978-3-540-74469-6_73
