The ever-growing amount of available data requires ever more memory to store it. Machine-learned models are becoming increasingly sophisticated and efficient in order to navigate this data. However, not all data is relevant to a given machine learning task, and storing irrelevant data wastes memory and power. To address this, we propose bitpaths: a novel pattern-based method that compresses datasets using a random forest. During inference, a KNN classifier uses the encoded training examples to make a prediction for the encoded test example. We empirically compare the predictive performance of bitpaths with that of the uncompressed setting. Our method can achieve compression ratios of up to 80 on datasets with a large number of features without affecting predictive performance.
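The encode-then-classify idea in the abstract can be illustrated with a small toy sketch. This is not the authors' bitpaths implementation: the abstract does not specify how patterns are extracted from the random forest, so the `SPLITS` list below is a hypothetical, hard-coded stand-in for split tests that a trained forest would supply. The Hamming distance, the helper names (`encode`, `hamming`, `knn_predict`), and the toy data are likewise assumptions made for illustration.

```python
from collections import Counter

# ASSUMPTION: hand-written (feature index, threshold) tests standing in for
# splits that would, in the real method, come from a trained random forest.
SPLITS = [(0, 0.5), (1, 0.3), (2, 0.7), (0, 0.2)]


def encode(x, splits=SPLITS):
    """Pack the outcome of each split test into one bit of an integer."""
    code = 0
    for i, (feature, threshold) in enumerate(splits):
        if x[feature] > threshold:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def knn_predict(train_codes, train_labels, query_code, k=3):
    """Majority vote among the k training codes closest to the query code."""
    ranked = sorted(
        range(len(train_codes)),
        key=lambda i: hamming(train_codes[i], query_code),
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): three features per example, two classes.
X_train = [
    [0.9, 0.8, 0.9],
    [0.8, 0.6, 0.8],
    [0.7, 0.9, 0.75],
    [0.1, 0.1, 0.2],
    [0.15, 0.2, 0.1],
    [0.05, 0.25, 0.3],
]
y_train = [1, 1, 1, 0, 0, 0]

# Only the compact codes need to be stored, not the original feature vectors.
train_codes = [encode(x) for x in X_train]

# At inference time the test example is encoded the same way before KNN.
prediction = knn_predict(train_codes, y_train, encode([0.85, 0.7, 0.95]))
```

In this sketch each example is reduced to four bits, and the classifier never sees the original feature values. How many bits a real encoding needs, and how much predictive performance it retains, depends on the forest and the dataset.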
Nuyts, L., Devos, L., Meert, W., & Davis, J. (2023). Bitpaths: Compressing Datasets Without Decreasing Predictive Performance. In Communications in Computer and Information Science (Vol. 1752 CCIS, pp. 261–268). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-23618-1_18