On sample size and classification accuracy: A performance comparison

Margarita Sordo; Qing Zeng

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2005) 3745 LNBI 193-201

DOI: 10.1007/11573067_20

64 Citations

85 Readers


Abstract

We investigate the dependency between sample size and classification accuracy for three classification techniques (Naïve Bayes, Support Vector Machines and Decision Trees) over a set of approximately 8,500 text excerpts extracted automatically from narrative reports from the Brigham & Women's Hospital, Boston, USA. Each excerpt refers to the smoking status of a patient as current, past, never a smoker, or denies smoking. Our empirical results, consistent with [1], confirm that the size of the training set and the classification rate are indeed correlated. Even though these algorithms perform reasonably well with small datasets, both SVM and Decision Trees show a substantial improvement in performance as the number of cases increases, suggesting a more consistent learning process. Unlike the majority of evaluations, ours were carried out specifically in a medical domain, where limited amounts of data are a common occurrence [13][14]. This study is part of the I2B2 project, Core 21. © Springer-Verlag Berlin Heidelberg 2005.
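The kind of experiment described above, measuring classification accuracy as the training set grows, is usually summarized as a learning curve. The sketch below is a minimal illustration under loudly stated assumptions: it is not the authors' pipeline, data, or results. The excerpts are synthetic and the cue words, labels, and helper names (`make_excerpt`, `NaiveBayes`, `learning_curve`) are hypothetical. Only one of the three classifiers is shown, a multinomial Naïve Bayes with add-one smoothing written from scratch so the block needs only the standard library; an SVM or decision tree could be dropped into the same loop in place of `NaiveBayes`.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy setup: synthetic "excerpts", NOT the Brigham & Women's data.
LABELS = ["current", "past", "never", "denies"]
CUES = {
    "current": ["smokes", "daily", "packs", "currently"],
    "past": ["quit", "former", "ago", "stopped"],
    "never": ["never", "nonsmoker", "lifelong", "no-history"],
    "denies": ["denies", "negative", "reports-no", "declines"],
}
FILLER = ["patient", "reports", "tobacco", "use", "history", "of", "the", "and"]


def make_excerpt(label, rng):
    """Build a bag of words: 4 filler words plus one cue word for the label."""
    words = rng.sample(FILLER, 4) + rng.sample(CUES[label], 1)
    # Label noise: 30% of excerpts also contain a cue word from another class.
    if rng.random() < 0.3:
        other = rng.choice([lab for lab in LABELS if lab != label])
        words.append(rng.choice(CUES[other]))
    rng.shuffle(words)
    return words


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for words, y in zip(docs, labels):
            self.word_counts[y].update(words)
            self.vocab.update(words)
        self.totals = {y: sum(c.values()) for y, c in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict_one(self, words):
        """Return the class with the highest log posterior (up to a constant)."""
        best, best_score = None, -math.inf
        v = len(self.vocab)
        for y, n_y in self.class_counts.items():
            score = math.log(n_y / self.n_docs)  # log prior
            for w in words:
                # Smoothed log likelihood of each word given the class.
                score += math.log((self.word_counts[y][w] + 1) / (self.totals[y] + v))
            if score > best_score:
                best, best_score = y, score
        return best


def accuracy(model, docs, labels):
    correct = sum(model.predict_one(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


def learning_curve(train, test, sizes):
    """Return (n, test accuracy) for models trained on the first n examples."""
    test_docs, test_labels = zip(*test)
    curve = []
    for n in sizes:
        docs, labels = zip(*train[:n])
        model = NaiveBayes().fit(docs, labels)
        curve.append((n, accuracy(model, test_docs, test_labels)))
    return curve


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [rng.choice(LABELS) for _ in range(2000)]
    data = [(make_excerpt(y, rng), y) for y in labels]
    train, test = data[:1500], data[1500:]
    for n, acc in learning_curve(train, test, [10, 50, 100, 500, 1500]):
        print(f"n={n:5d}  accuracy={acc:.3f}")
```

Keeping the test set fixed while training on growing prefixes of the same shuffled pool isolates the effect of sample size; in practice one would also repeat each size over several random draws and average, since a single small sample can be unrepresentative.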

Citation (APA)

Sordo, M., & Zeng, Q. (2005). On sample size and classification accuracy: A performance comparison. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3745 LNBI, pp. 193–201). https://doi.org/10.1007/11573067_20
