In this paper, we focus on how to use random-forest-based methods to improve the anomaly detection rate on streaming datasets. The key concept in recent work [12] is to build a random forest in which, at every internal node of every tree, a feature is selected at random and the associated data space is partitioned in half along that feature. However, the model parameters in that work were pre-defined, and how efficiently the model applies under various conditions was not discussed. In this paper, we first give a mathematical justification of the required tree height and number of trees by casting the problem as a classical coupon collector problem. We then design a majority-voting score combination strategy to combine the results from different anomaly detection trees. Finally, we apply feature clustering to group correlated features together in order to find anomalies that are jointly determined by subsets of features.
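The following is a minimal Python sketch, not the authors' implementation, of two of the ideas above under stated assumptions. `expected_draws_to_cover` computes the classical coupon collector expectation n·H_n. Using that value as the tree height, so that a root-to-leaf path is expected to have picked every feature at least once, is an illustrative assumption and not a parameter taken from the paper. `RandomHalvingTree`, `majority_vote`, the leaf-mass score and the `threshold` cut-off are all hypothetical names and choices; feature clustering is not shown.

```python
import math
import random


def expected_draws_to_cover(n: int) -> float:
    """Coupon collector: expected number of uniform draws, with replacement,
    needed to see each of n distinct items at least once, i.e. n * H_n."""
    return n * sum(1.0 / k for k in range(1, n + 1))


class RandomHalvingTree:
    """Hypothetical tree: every internal node picks a feature uniformly at
    random and splits that feature's current range in half."""

    def __init__(self, mins, maxs, height, rng):
        self.mins, self.maxs = list(mins), list(maxs)
        self.height = height
        self.rng = rng
        self.node_feature = {}  # node path (tuple of 0/1) -> feature index
        self.leaf_mass = {}     # leaf path -> number of stream points seen

    def _leaf(self, x, create):
        """Route x to a leaf. Node features are drawn lazily on the first
        visit when create=True; a query through an unseen node returns None."""
        lo, hi = list(self.mins), list(self.maxs)
        path = ()
        for _ in range(self.height):
            if path not in self.node_feature:
                if not create:
                    return None
                self.node_feature[path] = self.rng.randrange(len(x))
            f = self.node_feature[path]
            mid = (lo[f] + hi[f]) / 2.0
            if x[f] < mid:
                hi[f] = mid
                path += (0,)
            else:
                lo[f] = mid
                path += (1,)
        return path

    def update(self, x):
        leaf = self._leaf(x, create=True)
        self.leaf_mass[leaf] = self.leaf_mass.get(leaf, 0) + 1

    def mass(self, x):
        leaf = self._leaf(x, create=False)
        return 0 if leaf is None else self.leaf_mass.get(leaf, 0)


def majority_vote(trees, x, threshold):
    """Flag x as anomalous when more than half of the trees route it to a
    leaf whose mass is below `threshold` (a hypothetical cut-off)."""
    votes = sum(1 for t in trees if t.mass(x) < threshold)
    return votes > len(trees) / 2


# Illustrative usage on synthetic data in [0, 1]^d.
d = 4
# Assumed sizing rule: height ~ n * H_n, the expected number of random
# draws needed before every feature has been picked at least once.
height = math.ceil(expected_draws_to_cover(d))  # 4 * H_4 = 25/3, so 9
tree_rng, data_rng = random.Random(0), random.Random(1)
trees = [RandomHalvingTree([0.0] * d, [1.0] * d, height, tree_rng)
         for _ in range(25)]
for _ in range(2000):
    point = [data_rng.gauss(0.5, 0.05) for _ in range(d)]
    for t in trees:
        t.update(point)

print(majority_vote(trees, [0.5] * d, threshold=5))   # dense region
print(majority_vote(trees, [0.95] * d, threshold=5))  # sparse corner
```

Here leaf mass serves as the per-tree score: a point routed to a sparsely populated leaf gets a low mass, and the majority vote turns the individual per-tree flags into a single decision. Other combination rules, such as averaging the masses, are equally possible; the vote is used only to mirror the strategy named above.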
Zhao, Z., Mehrotra, K. G., & Mohan, C. K. (2018). Online anomaly detection using random forest. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10868 LNAI, pp. 135–147). Springer Verlag. https://doi.org/10.1007/978-3-319-92058-0_13