While determining model complexity is an important problem in machine learning, many feature learning algorithms rely on cross-validation to choose an optimal number of features, which is usually challenging for online learning from a massive stream of data. In this paper, we propose an incremental feature learning algorithm to determine the optimal model complexity for large-scale, online datasets based on the denoising autoencoder. This algorithm is composed of two processes: adding features and merging features. Specifically, it adds new features to minimize the objective function's residual and merges similar features to obtain a compact feature representation and prevent over-fitting. Our experiments show that the proposed model quickly converges to the optimal number of features in a large-scale online setting. In classification tasks, our model outperforms the (non-incremental) denoising autoencoder, and deep networks constructed from our algorithm perform favorably compared to deep belief networks and stacked denoising autoencoders. Further, the algorithm is effective in recognizing new patterns when the data distribution changes over time in the massive online data stream.
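The two processes named in the abstract, adding features and merging features, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how similarity is measured or how two features are combined, so the cosine-similarity test, the averaging rule, the threshold value, and the function names `add_features` and `merge_most_similar` are all assumptions made for illustration. Training the new features against the denoising autoencoder objective is omitted.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity of two weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def add_features(weights, n_new, input_dim, rng, scale=0.01):
    """Return a new list with n_new small random feature vectors appended.

    Assumption: new features are only initialised here. In the abstract they
    are added to minimise the residual of the objective, which would require
    a training step that this sketch leaves out.
    """
    new = [[rng.gauss(0.0, scale) for _ in range(input_dim)]
           for _ in range(n_new)]
    return weights + new


def merge_most_similar(weights, threshold=0.9):
    """Merge the single most similar pair of features above the threshold.

    Assumption: similarity is cosine similarity and the merged feature is the
    element-wise average of the pair. Returns the input list unchanged when
    no pair exceeds the threshold.
    """
    best = None  # (similarity, i, j) of the best pair found so far
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            sim = cosine_similarity(weights[i], weights[j])
            if sim > threshold and (best is None or sim > best[0]):
                best = (sim, i, j)
    if best is None:
        return weights
    _, i, j = best
    merged = [(a + b) / 2.0 for a, b in zip(weights[i], weights[j])]
    kept = [w for k, w in enumerate(weights) if k not in (i, j)]
    return kept + [merged]


if __name__ == "__main__":
    rng = random.Random(0)
    features = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
    features = merge_most_similar(features)            # 3 features -> 2
    features = add_features(features, 3, 2, rng)       # 2 features -> 5
    print(len(features))
```

In an online setting, a loop of this kind would alternate the two steps as new data arrive, so that the number of features grows when the model underfits and shrinks when features become redundant.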