On comparing classifiers: Pitfalls to avoid and a recommended approach

Steven L. Salzberg

Journal Article


Salzberg S

Data Mining and Knowledge Discovery (1997) 1(3) 317-328

DOI: 10.1023/A:1009752403260



Abstract

An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very carefully, comparative studies of classification and other types of algorithms can easily result in statistically invalid conclusions. This is especially true when one is using data mining techniques to analyze very large databases, which inevitably contain some statistically unlikely data. This paper describes several phenomena that can, if ignored, invalidate an experimental comparison. These phenomena and the conclusions that follow apply not only to classification, but to computational experiments in almost any aspect of data mining. The paper also discusses why comparative analysis is more important in evaluating some types of algorithms than for others, and provides some suggestions about how to avoid the pitfalls suffered by many experimental studies. © 1997 Kluwer Academic Publishers.



Salzberg, S. L. (1997). On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Mining and Knowledge Discovery, 1(3), 317–328. https://doi.org/10.1023/A:1009752403260
