Hash-based feature learning for incomplete continuous-valued data


Abstract

Hash-based feature learning is a widely used data mining approach for dimensionality reduction and for building linear models that are comparable in performance to their nonlinear counterparts. Unfortunately, such an approach is inapplicable to many real-world data sets because they are often riddled with missing values. Substantial data preprocessing is therefore needed to impute the missing values before the hash-based features can be derived. Because this preprocessing is performed independently of the subsequent modeling task, it can introduce biases, and the models constructed from the imputed hash-based features can be suboptimal as a result. To overcome this limitation, we present a novel framework called H-FLIP that estimates the missing values while simultaneously constructing a set of nonlinear hash-based features from the incomplete data. The effectiveness of the framework is demonstrated through experiments using both synthetic and real-world data sets.
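For readers unfamiliar with the setup, the two-stage baseline that the abstract criticizes (impute first, then derive hash-based features) can be sketched roughly as follows. This is only an illustration under stated assumptions, not H-FLIP itself: the abstract does not specify the hash family or the imputation method, so the sketch assumes column-mean imputation and sign random projections (a standard locality-sensitive hash for continuous data). The function names `mean_impute` and `make_sign_hash` and the toy data are hypothetical.

```python
import random


def mean_impute(rows):
    """Replace None entries with the column mean of the observed values.

    Assumed baseline imputation; it ignores the downstream modeling task.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def make_sign_hash(n_cols, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes; each yields one binary feature.

    Assumed hash family (sign random projections), chosen for illustration.
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(n_cols)]
              for _ in range(n_bits)]

    def hash_row(row):
        # One bit per hyperplane: which side of the plane the row falls on.
        return [1 if sum(w * x for w, x in zip(plane, row)) >= 0 else 0
                for plane in planes]

    return hash_row


# Made-up toy data with missing entries marked as None.
X = [[1.0, None, 3.0],
     [2.0, 0.5, None],
     [None, 1.5, 1.0]]

# Stage 1: impute independently of the modeling task.
X_filled = mean_impute(X)

# Stage 2: derive binary hash features from the imputed data.
hash_row = make_sign_hash(n_cols=3, n_bits=8, seed=42)
features = [hash_row(r) for r in X_filled]
```

Because the imputed values in stage 1 are fixed before any features or models are built, errors made there propagate unchanged into the hash features, which is the source of bias the abstract describes. A joint approach would instead update the missing-value estimates and the features together.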

Citation (APA)

Yuan, S., Tan, P. N., Cheruvelil, K. S., Fergus, C. E., Skaff, N. K., & Soranno, P. A. (2017). Hash-based feature learning for incomplete continuous-valued data. In Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017 (pp. 678–686). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611974973.76
