In this chapter we present a novel method for detecting intrusions into host systems that combines the data flow and the execution flow of programs. To do this, we use sequences of system calls traced by the host's kernel, together with their arguments. The arguments are augmented with contextual information and domain-level knowledge in the form of signatures, and are used to generate clusters for each individual system call and for each application type. The argument-driven cluster models are then used to rewrite process sequences of system calls, and the rewritten sequences are fed to a naïve Bayes classifier that builds class-conditional probabilities from Markov modeling of system call sequences, thus capturing execution flow. The domain-level knowledge augments our machine learning-based detection technique with deep packet inspection capabilities that have, until now, usually been found in network intrusion detection systems. We provide the results for the clustering phase, together with their validation using the Silhouette width, cross-validation, and a manual analysis of the clusters produced on the 1999 DARPA dataset from the MIT Lincoln Lab.
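The rewrite-then-classify pipeline described above can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the first-order Markov chain, the Laplace smoothing parameter `alpha`, the class labels, and the `toy_cluster_of` function (a hypothetical stand-in for the learned argument clusters) are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def toy_cluster_of(name, args):
    # Hypothetical stand-in for the per-syscall argument cluster models:
    # cluster 1 if any argument looks like a path under /etc/, else cluster 0.
    return 1 if any(str(a).startswith("/etc/") for a in args) else 0


def rewrite(trace, cluster_of):
    """Replace each (syscall, args) pair by the symbol 'syscall_<cluster id>'."""
    return [f"{name}_{cluster_of(name, args)}" for name, args in trace]


class MarkovNB:
    """Bayes classifier whose class-conditional likelihood is a
    first-order Markov chain over rewritten system call symbols (assumed order)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha              # Laplace smoothing constant (assumption)
        self.counts = {}                # label -> prev symbol -> next symbol -> count
        self.n_seqs = defaultdict(int)  # label -> number of training sequences
        self.vocab = set()

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            self.n_seqs[label] += 1
            table = self.counts.setdefault(
                label, defaultdict(lambda: defaultdict(int))
            )
            self.vocab.update(seq)
            for prev, nxt in zip(seq, seq[1:]):
                table[prev][nxt] += 1
        return self

    def log_score(self, seq, label):
        # log prior + sum of smoothed log transition probabilities
        score = math.log(self.n_seqs[label] / sum(self.n_seqs.values()))
        table = self.counts[label]
        v = len(self.vocab) + 1  # one extra slot for symbols never seen in training
        for prev, nxt in zip(seq, seq[1:]):
            row = table.get(prev, {})
            num = row.get(nxt, 0) + self.alpha
            den = sum(row.values()) + self.alpha * v
            score += math.log(num / den)
        return score

    def predict(self, seq):
        return max(self.counts, key=lambda label: self.log_score(seq, label))


# Toy traces: (system call name, argument tuple). Entirely made-up data.
normal = [("open", ("/home/u/a.txt",)), ("read", (3,)), ("close", (3,))]
attack = [("open", ("/etc/shadow",)), ("read", (3,)), ("execve", ("/bin/sh",))]

model = MarkovNB().fit(
    [rewrite(t, toy_cluster_of) for t in (normal, normal, attack)],
    ["normal", "normal", "intrusion"],
)
```

Calling `model.predict(rewrite(trace, toy_cluster_of))` on a new trace returns the label with the highest posterior score. Because the arguments determine the rewritten symbol, two `open` calls with different argument clusters follow different transitions, which is how data flow enters the execution-flow model in this sketch.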
Rachidi, T., Koucham, O., & Assem, N. (2016). Combined data and execution flow host intrusion detection using machine learning. Studies in Computational Intelligence, 650, 427–450. https://doi.org/10.1007/978-3-319-33386-1_21