Exploiting domain knowledge to address class imbalance and a heterogeneous feature space in multi-class classification

Abstract

Real-world data in multi-class classification tasks often show complex characteristics that reduce classification performance. Two major analytical challenges are a high degree of multi-class imbalance and a heterogeneous feature space, which increases the number and complexity of class patterns. Existing classification and data pre-processing solutions address only one of these two challenges in isolation. We propose a novel classification approach that explicitly addresses both multi-class imbalance and a heterogeneous feature space together. As its main contribution, this approach exploits domain knowledge in the form of a taxonomy to systematically prepare the training data. Based on an experimental evaluation on real-world data and several synthetically generated data sets, we show that our approach outperforms all other classification techniques considered in terms of accuracy. Furthermore, it offers considerable practical benefits in real-world use cases; for example, it reduces the rework required in product quality control.
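To make the notion of multi-class imbalance concrete, the following sketch computes a simple imbalance ratio (largest class count divided by smallest class count) and shows how grouping fine-grained classes under a taxonomy splits one imbalanced problem into smaller sub-problems. The taxonomy, the class names, the label counts, and the helper names `imbalance_ratio` and `split_by_taxonomy` are all hypothetical. This generic grouping is only an illustration and is not the data preparation procedure proposed in the paper.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Hypothetical two-level taxonomy: fine-grained class -> coarse parent group.
# Illustrative only; not the taxonomy or procedure used in the paper.
TAXONOMY = {
    "scratch": "surface",
    "dent": "surface",
    "crack": "structural",
    "fracture": "structural",
    "wiring": "electrical",
}

def split_by_taxonomy(labels, taxonomy):
    """Return coarse-level labels plus the fine-grained labels within each coarse group."""
    coarse = [taxonomy[y] for y in labels]
    per_group = {}
    for y in labels:
        per_group.setdefault(taxonomy[y], []).append(y)
    return coarse, per_group

# Made-up label distribution for illustration.
labels = (["scratch"] * 50 + ["dent"] * 40 + ["crack"] * 6
          + ["fracture"] * 4 + ["wiring"] * 10)

coarse, per_group = split_by_taxonomy(labels, TAXONOMY)
print("flat:", imbalance_ratio(labels))    # flat: 12.5
print("coarse:", imbalance_ratio(coarse))  # coarse: 9.0
for group, fine in per_group.items():
    # surface 1.25, structural 1.5, electrical 1.0
    print(group, imbalance_ratio(fine))
```

In this made-up example, the flat problem has a ratio of 12.5, while each node of the taxonomy yields a smaller sub-problem with a lower ratio; a hierarchical classifier could then train one model per node.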

Citation (APA)

Hirsch, V., Reimann, P., Treder-Tschechlov, D., Schwarz, H., & Mitschang, B. (2023). Exploiting domain knowledge to address class imbalance and a heterogeneous feature space in multi-class classification. VLDB Journal, 32(5), 1037–1064. https://doi.org/10.1007/s00778-023-00780-6