Using Cluster Analysis to Assess the Impact of Dataset Heterogeneity on Deep Convolutional Network Accuracy: A First Glance

Mauro Mendez; Saul Calderon; Pascal N. Tyrrell

Conference Proceedings


Communications in Computer and Information Science (2020) 1087 CCIS 307-319

DOI: 10.1007/978-3-030-41005-6_21

8 Citations

13 Readers


Abstract

In this paper, we perform cluster analysis with Fuzzy K-means on the image-based features of two models to assess how dataset heterogeneity affects model accuracy. A highly heterogeneous dataset is associated with sparse data samples, which usually degrades overall model generalization and accuracy on test samples. We propose measuring the Coefficient of Variation (CV) of the resulting clusters to estimate data heterogeneity, as a metric for predicting model generalization and test accuracy. We show that highly heterogeneous datasets are common when the number of samples is insufficient, which yields a high CV. In our experiments with two different models and datasets, higher CV values decreased model test accuracy considerably. We tested ResNet-18 on binary classification of teeth X-ray scans and VGG16 on age regression from hand X-ray scans. The results suggest that cluster analysis can be used to identify the influence of heterogeneity on CNN test accuracy. Based on our experiments, we recommend a CV < 5% to obtain satisfactory model test accuracy.
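To make the idea concrete, the following is a minimal sketch of clustering feature vectors with a fuzzy clustering step and computing a per-cluster CV. It is not the authors' implementation. The abstract does not say how the CV is computed, so this sketch assumes it is the standard deviation divided by the mean (as a percentage) of the distances between each sample and its cluster center. The functions `fuzzy_c_means` and `cluster_cv`, the fuzzifier `m = 2`, and the synthetic `features` array are all hypothetical choices made for illustration; in practice the features would come from a CNN such as ResNet-18 or VGG16.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain NumPy fuzzy c-means (hypothetical helper, not the paper's code)."""
    rng = np.random.default_rng(seed)
    # Start from a random membership matrix whose rows sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centers are membership-weighted means of the samples.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # Standard membership update: proportional to dist^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def cluster_cv(X, centers, U):
    """Per-cluster CV (%) of sample-to-center distances (assumed definition)."""
    labels = U.argmax(axis=1)  # harden the fuzzy memberships
    cvs = []
    for k in range(centers.shape[0]):
        members = X[labels == k]
        if len(members) < 2:
            cvs.append(float("nan"))  # CV is undefined for tiny clusters
            continue
        d = np.linalg.norm(members - centers[k], axis=1)
        cvs.append(100.0 * d.std() / d.mean())
    return np.array(cvs)


# Synthetic stand-in for CNN features: two Gaussian blobs in 8 dimensions.
rng = np.random.default_rng(42)
features = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 8)),
    rng.normal(5.0, 1.0, size=(100, 8)),
])

centers, U = fuzzy_c_means(features, n_clusters=2)
cvs = cluster_cv(features, centers, U)
print("Per-cluster CV (%):", cvs)
print("All clusters below the 5% guideline:", bool(np.all(cvs < 5.0)))
```

Under this assumed definition, the final check compares each cluster's CV against the 5% guideline from the abstract. A different CV definition (for example, over cluster sizes or over individual feature values) would give different numbers, so the threshold only carries over if the definition matches the one used in the paper.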


Citation (APA)

Mendez, M., Calderon, S., & Tyrrell, P. N. (2020). Using Cluster Analysis to Assess the Impact of Dataset Heterogeneity on Deep Convolutional Network Accuracy: A First Glance. In Communications in Computer and Information Science (Vol. 1087 CCIS, pp. 307–319). Springer. https://doi.org/10.1007/978-3-030-41005-6_21
