Exploiting domain knowledge to address class imbalance and a heterogeneous feature space in multi-class classification

Vitali Hirsch; Peter Reimann; Dennis Treder-Tschechlov; Holger Schwarz; Bernhard Mitschang

Journal Article (Open Access)

VLDB Journal (2023) 32(5) 1037-1064

DOI: 10.1007/s00778-023-00780-6

Abstract

Real-world data of multi-class classification tasks often show complex data characteristics that reduce classification performance. Two major analytical challenges are a high degree of multi-class imbalance within the data and a heterogeneous feature space, which increases the number and complexity of class patterns. Existing solutions for classification or data pre-processing address only one of these two challenges in isolation. We propose a novel classification approach that explicitly addresses both challenges, multi-class imbalance and a heterogeneous feature space, together. As its main contribution, this approach exploits domain knowledge in the form of a taxonomy to systematically prepare the training data. Based on an experimental evaluation on both real-world data and several synthetically generated data sets, we show that our approach outperforms any other classification technique in terms of accuracy. Furthermore, it entails considerable practical benefits in real-world use cases, e.g., it reduces the rework required in product quality control.
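
The abstract states that a taxonomy is used to prepare the training data, but it does not describe the procedure. As a generic illustration of why a class taxonomy can help with multi-class imbalance, the Python sketch below compares the imbalance ratio (largest class count divided by smallest) of a flat label set with the ratios obtained when the same labels are split along a two-level taxonomy. This is NOT the authors' method. The labels, counts, the taxonomy mapping, and the helper names `imbalance_ratio` and `group_by_taxonomy` are all hypothetical and were chosen only for illustration.

```python
from collections import Counter

# Hypothetical fine-grained labels (e.g., defect codes); counts are invented for illustration.
labels = ["A1"] * 500 + ["A2"] * 20 + ["B1"] * 60 + ["B2"] * 5 + ["C1"] * 15

# Hypothetical two-level taxonomy: each fine-grained class maps to a coarse group.
taxonomy = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "C1": "C"}


def imbalance_ratio(y):
    """Ratio of the largest to the smallest class count (1.0 means perfectly balanced)."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


def group_by_taxonomy(y, tax):
    """Partition fine-grained labels by their coarse taxonomy group: {group: [labels]}."""
    groups = {}
    for label in y:
        groups.setdefault(tax[label], []).append(label)
    return groups


# Imbalance of the flat problem (all fine-grained classes at once).
print(f"flat: {imbalance_ratio(labels):.1f}")

# Imbalance at the coarse level of the taxonomy.
coarse = [taxonomy[lab] for lab in labels]
print(f"coarse level: {imbalance_ratio(coarse):.1f}")

# Imbalance within each coarse group (the sub-problems a hierarchical scheme would solve).
for group, members in sorted(group_by_taxonomy(labels, taxonomy).items()):
    print(f"within {group}: {imbalance_ratio(members):.1f}")
```

With these invented counts, the flat problem has a ratio of 100.0, whereas the coarse level (34.7) and each within-group sub-problem (25.0, 12.0, 1.0) are less imbalanced. Decomposing along a taxonomy is one common way to exploit such structure; how the paper actually uses its taxonomy should be taken from the full text.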

Citation (APA)

Hirsch, V., Reimann, P., Treder-Tschechlov, D., Schwarz, H., & Mitschang, B. (2023). Exploiting domain knowledge to address class imbalance and a heterogeneous feature space in multi-class classification. VLDB Journal, 32(5), 1037–1064. https://doi.org/10.1007/s00778-023-00780-6