Abstract
Real-world data in multi-class classification tasks often show complex characteristics that reduce classification performance. Two major analytical challenges are a high degree of multi-class imbalance within the data and a heterogeneous feature space, which increases the number and complexity of class patterns. Existing solutions for classification or data pre-processing address only one of these two challenges in isolation. We propose a novel classification approach that explicitly addresses both challenges, multi-class imbalance and a heterogeneous feature space, together. As its main contribution, this approach exploits domain knowledge in the form of a taxonomy to systematically prepare the training data. Based on an experimental evaluation on both real-world data and several synthetically generated data sets, we show that our approach outperforms the other classification techniques evaluated in terms of accuracy. Furthermore, it offers considerable practical benefits in real-world use cases; e.g., it reduces the rework required in product quality control.
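To make "a high degree of multi-class imbalance" and the role of a taxonomy more concrete, the following is a minimal, hypothetical sketch. It is not the authors' method. It assumes a common imbalance measure (the ratio of the largest to the smallest class frequency) and shows how mapping fine-grained labels to parent nodes of a taxonomy can reduce the imbalance seen at the top level. The taxonomy, the defect class names, the label counts, and the helper functions `imbalance_ratio` and `group_by_taxonomy` are all invented for illustration.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the largest to the smallest class frequency (1.0 = balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def group_by_taxonomy(labels, taxonomy):
    """Replace each fine-grained label with its parent node in the taxonomy."""
    return [taxonomy[label] for label in labels]


# Hypothetical taxonomy: fine-grained defect classes -> parent categories.
taxonomy = {
    "scratch": "surface",
    "dent": "surface",
    "short_circuit": "electrical",
    "open_circuit": "electrical",
    "loose_wire": "electrical",
}

# Hypothetical, heavily imbalanced training labels (100 samples in total).
labels = (
    ["scratch"] * 80
    + ["dent"] * 10
    + ["short_circuit"] * 6
    + ["open_circuit"] * 3
    + ["loose_wire"] * 1
)

print(imbalance_ratio(labels))                               # 80.0
print(imbalance_ratio(group_by_taxonomy(labels, taxonomy)))  # 9.0
```

In this toy setting, the ratio drops from 80.0 over the five fine-grained classes to 9.0 over the two parent categories. A classifier for the sub-problem within "electrical" would still face a 6:3:1 split. Grouping therefore redistributes the imbalance across sub-problems rather than removing it.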
Hirsch, V., Reimann, P., Treder-Tschechlov, D., Schwarz, H., & Mitschang, B. (2023). Exploiting domain knowledge to address class imbalance and a heterogeneous feature space in multi-class classification. VLDB Journal, 32(5), 1037–1064. https://doi.org/10.1007/s00778-023-00780-6