Improving Network Intrusion Detection by Means of Domain-Aware Genetic Programming

by J Blasco, A Orfila, A Ribagorda
2010 International Conference on Availability, Reliability and Security (ARES '10)

Abstract

One of the central questions in network intrusion detection is how to build effective systems that are able to distinguish normal from intrusive traffic. In this paper we explore the use of Genetic Programming (GP) for this purpose. Although GP has already been studied for this task, the inner features of network intrusion detection have been systematically ignored. To avoid the blind use of GP shown in previous research, we guide the search by means of a fitness function based on recent advances in IDS evaluation. For the experimental work we use a well-known dataset (i.e. KDD-99) that has become a standard for comparing research despite its drawbacks. Results clearly show that an intelligent use of GP achieves systems that are comparable (and even better in realistic conditions) to top state-of-the-art proposals in terms of effectiveness, while improving on them in efficiency and simplicity.

Available from ieeexplore.ieee.org
Jorge Blasco∗, Agustin Orfila†, Arturo Ribagorda‡
Computer Science Department
Carlos III University of Madrid
Leganés, Spain
{jbalis∗, adiaz†, arturo‡}@inf.uc3m.es
Keywords-intrusion detection; genetic programming; efficiency;
effectiveness
I. INTRODUCTION
Intrusion detection is the process of monitoring and
analyzing the activity of a network or a computer system in
order to detect possible intrusion attacks [1]. The design of a
network intrusion detection system (NIDS) is determined by
a set of decisions about raw data acquisition, event detection,
analysis rules, data storage and response procedures.
Regarding analysis techniques, artificial intelligence has been
widely explored, including approaches based on machine
learning, neural networks and evolutionary computation.
In this paper we focus on improving the automatic
generation of analysis rules using Genetic Programming
(GP). Our research tries to improve the effectiveness results
found in the literature while enhancing the efficiency
and semantics of the solutions. In terms of effectiveness,
our approach to GP provides IDS analysis rules that at
least match state-of-the-art proposals. In addition, our
system clearly outperforms classical machine learning
algorithms when the dataset is adapted to have a more
realistic prevalence of attacks. For a NIDS, efficiency is as
important as effectiveness. In environments with intensive
network usage, an IDS must analyze huge amounts of data.
If the NIDS is not fast enough, it will begin to drop packets
from the analysis. In this regard, algorithms like C4.5
[2] generate wide and deep trees which may add overhead
to the analysis process. In contrast, GP trees can be quite
simple and are therefore able to process more information
in less time. Furthermore, the use of an appropriate function
set for GP individuals results in analysis rules that provide
better knowledge about the nature of the attacks. Other
paradigms involve specialized structures which are nothing
like computer programs (e.g. weight vectors for neural
networks), which constrains the semantics of the generated
rules.
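
As an illustration of why a small GP tree is both cheap to evaluate and easy to read, the following minimal Python sketch represents an analysis rule as an expression tree and applies it to a connection record. It is not the system proposed in this paper: the function set, the thresholds and the rule itself are hypothetical, and only the field names count and serror_rate are taken from the KDD-99 feature list.

```python
from dataclasses import dataclass
from typing import Dict, Union

Record = Dict[str, float]  # one connection record: feature name -> value

# Hypothetical function set; the set used by the authors is not given here.
FUNCTIONS = {
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "and": lambda a, b: bool(a) and bool(b),
    "or": lambda a, b: bool(a) or bool(b),
}


@dataclass(frozen=True)
class Feature:
    """Terminal node: reads one field of the connection record."""
    name: str

    def evaluate(self, record: Record):
        return record[self.name]


@dataclass(frozen=True)
class Const:
    """Terminal node: a numeric constant."""
    value: float

    def evaluate(self, record: Record):
        return self.value


@dataclass(frozen=True)
class Op:
    """Internal node: applies a binary function to two subtrees."""
    symbol: str
    left: "Node"
    right: "Node"

    def evaluate(self, record: Record):
        func = FUNCTIONS[self.symbol]
        return func(self.left.evaluate(record), self.right.evaluate(record))


Node = Union[Feature, Const, Op]


def size(node: Node) -> int:
    """Number of nodes in the tree, a rough proxy for evaluation cost."""
    if isinstance(node, Op):
        return 1 + size(node.left) + size(node.right)
    return 1


# Hypothetical rule: (count > 100) and (serror_rate > 0.5)
rule = Op(
    "and",
    Op(">", Feature("count"), Const(100.0)),
    Op(">", Feature("serror_rate"), Const(0.5)),
)

suspicious = {"count": 250.0, "serror_rate": 0.9}
benign = {"count": 3.0, "serror_rate": 0.0}
print(rule.evaluate(suspicious), rule.evaluate(benign), size(rule))
```

Because the rule is an ordinary expression over named traffic features, it can be read directly as a statement about the connection, which is the semantic advantage claimed above over weight vectors.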
In addition to traditional metrics, we have also evaluated
our IDS with a recently presented metric (i.e. Cid)
[3] proposed specifically for the intrusion detection domain.
Our recent research [4] has shown that domain-aware GP
is able to produce efficient and easy-to-understand rules for
IDS, specifically to detect probe attacks. To provide an
exhaustive comparison of the efficiency of our approach it
was necessary to use a dataset that covers a wide range of
attack types. We have therefore used the well-known KDD-99
dataset. Although this dataset has been criticized in some
studies [5], [6] due to issues such as its unrealistic prevalence
of attacks or its uncertain relation with reality, it is still used
in recent publications [7] and is considered a standard
benchmark that most research uses to measure effectiveness.
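
To make the role of attack prevalence concrete, the sketch below computes two quantities for a binary detector. It assumes that Cid denotes the intrusion detection capability metric, i.e. the mutual information between the true state X and the IDS output Y normalized by the entropy of X; this definition is an assumption about reference [3] and is not restated in this section. The function and parameter names (base_rate, fp_rate, fn_rate) are illustrative only.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def cid(base_rate, fp_rate, fn_rate):
    """Assumed definition: Cid = I(X;Y) / H(X).

    X = 1 means the event is an intrusion, with P(X=1) = base_rate.
    Y = 1 means the IDS raises an alarm.
    fp_rate = P(Y=1 | X=0), fn_rate = P(Y=0 | X=1).
    """
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must be strictly between 0 and 1")
    joint = {
        (1, 1): base_rate * (1 - fn_rate),
        (1, 0): base_rate * fn_rate,
        (0, 1): (1 - base_rate) * fp_rate,
        (0, 0): (1 - base_rate) * (1 - fp_rate),
    }
    px = {1: base_rate, 0: 1 - base_rate}
    py = {1: joint[(1, 1)] + joint[(0, 1)], 0: joint[(1, 0)] + joint[(0, 0)]}
    mutual_info = sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )
    return mutual_info / entropy(px.values())


def ppv(base_rate, fp_rate, fn_rate):
    """Probability that an alarm corresponds to a real intrusion (Bayes)."""
    true_alarms = base_rate * (1 - fn_rate)
    false_alarms = (1 - base_rate) * fp_rate
    return true_alarms / (true_alarms + false_alarms)


# Same detector (1% false positives, 5% false negatives), two prevalences.
for b in (0.8, 0.001):
    print(f"base rate {b}: Cid={cid(b, 0.01, 0.05):.3f}, "
          f"PPV={ppv(b, 0.01, 0.05):.3f}")
```

With the error rates held fixed, the share of alarms that are real intrusions falls sharply as attacks become rare, which is why results obtained at an unrealistically high prevalence can be misleading.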
The remainder of this paper is structured as follows.
Section II briefly reviews the basics of genetic programming.
Section III reviews related work in the area. Section
IV describes the design of the proposed system. Section
V describes the KDD-99 dataset. Then, Section VI presents
the experimental setup, results and discussion. Finally, the last
section summarizes the main conclusions and future work.
II. GENETIC PROGRAMMING BASICS
Genetic Programming is a supervised search technique
devised by John R. Koza in 1992 [8]. GP is somewhat similar
to Genetic Algorithms (GA), but instead of using chromosomes
to encode the solution, it uses computer programs
represented as trees. IDS are themselves computer programs,
and their size and structure are not known in advance.
Consequently, the use of GP is more appropriate than GA
for the problem
2010 International Conference on Availability, Reliability and Security
978-0-7695-3965-2/10 $26.00 © 2010 IEEE
DOI 10.1109/ARES.2010.53
