Dataset Characteristics (Metafeatures)

Pavel Brazdil; Jan N. van Rijn; Carlos Soares; Joaquin Vanschoren

Book Chapter, Open Access

Springer Science and Business Media Deutschland GmbH, (2022), 53-75

DOI: 10.1007/978-3-030-67024-5_4

5 Citations

16 Readers

Abstract

This chapter discusses dataset characteristics, which play a crucial role in many metalearning systems. Typically, they help to restrict the search in a given configuration space. The basic type of the target variable, for instance, determines the appropriate approach: a numeric target suggests that a regression algorithm should be used, while a categorical target calls for a classification algorithm. This chapter provides an overview of the different types of dataset characteristics, which are sometimes also referred to as metafeatures. They include so-called simple, statistical, information-theoretic, model-based, complexity-based, and performance-based metafeatures. The last group has the advantage that it can be easily defined in any domain. It includes, for instance, sampling landmarkers, which represent the performance of particular algorithms on samples of data, and relative landmarkers, which capture differences or ratios of performance values and provide estimates of performance gains. The final part of this chapter discusses the specific dataset characteristics used in different machine learning tasks, including classification, regression, time series, and clustering.
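As a rough illustration of the kinds of metafeatures named in the abstract, the following is a minimal Python sketch and not the chapter's own implementation. All function names (`task_type`, `simple_metafeatures`, `class_entropy`, `relative_landmarker`) and the toy data are hypothetical, chosen only for this example. The type check in `task_type` is a deliberate simplification: integer-coded class labels would be misread as a numeric target.

```python
import math
from collections import Counter


def task_type(target):
    """Suggest a task family from the basic type of the target variable.

    Simplifying assumption: any all-numeric target is treated as regression,
    so integer-coded class labels would be misclassified here.
    """
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in target
    )
    return "regression" if numeric else "classification"


def simple_metafeatures(rows, target):
    """Simple metafeatures: counts of instances, features, and classes."""
    return {
        "n_instances": len(rows),
        "n_features": len(rows[0]) if rows else 0,
        "n_classes": len(set(target)),
    }


def class_entropy(target):
    """Information-theoretic metafeature: Shannon entropy of the labels, in bits."""
    counts = Counter(target)
    n = len(target)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def relative_landmarker(score_a, score_b):
    """Relative landmarker as a difference between two performance values."""
    return score_a - score_b


# Toy dataset (hypothetical): four instances, two features, two balanced classes.
rows = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0]]
labels = ["a", "a", "b", "b"]

meta = simple_metafeatures(rows, labels)
meta["class_entropy"] = class_entropy(labels)
meta["task"] = task_type(labels)
print(meta)
```

The performance values passed to `relative_landmarker` would, under this sketch, come from evaluating two algorithms on the same data or data sample; a ratio could be used in place of the difference.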

Cite

Brazdil, P., van Rijn, J. N., Soares, C., & Vanschoren, J. (2022). Dataset Characteristics (Metafeatures). In Cognitive Technologies (pp. 53–75). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-67024-5_4
