Dataset Characteristics (Metafeatures)



Abstract

This chapter discusses dataset characteristics, also referred to as metafeatures, which play a crucial role in many metalearning systems. Typically, they help to restrict the search in a given configuration space. The type of the target variable, for instance, determines which approach is appropriate: a numeric target suggests a regression algorithm, while a categorical target calls for a classification algorithm. The chapter gives an overview of the main groups of metafeatures: simple, statistical, information-theoretic, model-based, complexity-based, and performance-based. The last group has the advantage that it can be easily defined in any domain. It includes, for instance, sampling landmarkers, which represent the performance of particular algorithms on samples of the data, and relative landmarkers, which capture differences or ratios of performance values and thereby provide estimates of performance gains. The final part of the chapter discusses the specific dataset characteristics used in different machine learning tasks, including classification, regression, time series, and clustering.
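To make a few of these notions concrete, the following is a minimal Python sketch, not code from the chapter. All function names (`task_type`, `simple_metafeatures`, `class_entropy`, `relative_landmarker`) and the toy data are hypothetical and chosen for illustration only. The target-type check is deliberately naive: it assumes class labels are not encoded as numbers.

```python
import math
from collections import Counter


def task_type(target):
    """Suggest regression for a numeric target, classification otherwise.

    Assumption: integer-coded class labels do not occur; they would be
    misread as numeric by this naive check.
    """
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in target
    )
    return "regression" if numeric else "classification"


def simple_metafeatures(rows, target):
    """Simple metafeatures: counts of instances, features, and classes."""
    return {
        "n_instances": len(rows),
        "n_features": len(rows[0]) if rows else 0,
        "n_classes": len(set(target)),
    }


def class_entropy(target):
    """Information-theoretic metafeature: Shannon entropy of the classes, in bits."""
    n = len(target)
    counts = Counter(target)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def relative_landmarker(perf_a, perf_b):
    """Difference and ratio of two performance values (e.g. accuracies)."""
    return {"difference": perf_a - perf_b, "ratio": perf_a / perf_b}


# Toy dataset, invented for illustration: four instances, two features.
rows = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0]]
target = ["a", "a", "b", "b"]

print(task_type(target))
print(simple_metafeatures(rows, target))
print(class_entropy(target))
print(relative_landmarker(0.9, 0.8))
```

In a sampling-landmarker setting, the two values passed to `relative_landmarker` would be the performances of two algorithms measured on the same sample of the data; here they are placeholder numbers.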

Cite

Brazdil, P., van Rijn, J. N., Soares, C., & Vanschoren, J. (2022). Dataset Characteristics (Metafeatures). In Cognitive Technologies (pp. 53–75). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-67024-5_4
