Fast monitoring of traffic subpopulations
- ISBN: 9781605583341
- DOI: 10.1145/1452520.1452551
Abstract
Network accounting, forensics, security, and performance monitoring applications often need to examine detailed traces from subsets of flows ("subpopulations"), where the application requires flexibility in specifying the subpopulation (e.g., to detect a portscan, the application must observe many packets between a source and a destination with one packet to each port). However, the dynamism and volume of network traffic on many high-speed links necessitate traffic sampling, which adversely affects subpopulation monitoring: because many subpopulations of interest to operators are low-volume flows, conventional sampling schemes (e.g., uniform random sampling) can miss much of the subpopulation's traffic. Today's routers and network devices provide scant support for monitoring specific traffic subpopulations. This paper presents the design, implementation, and evaluation of FlexSample, a traffic monitoring engine that dynamically extracts traffic from subpopulations that operators define using conditions on packet header fields. FlexSample uses a fast, flexible counter array to provide rough estimates of packets' membership in respective subpopulations. Based on these coarse estimates, FlexSample then makes per-packet sampling decisions to sample proportionately from each subpopulation (as specified by a network operator), subject to an overall sampling constraint. We apply FlexSample to extract subpopulations such as port scans and traffic to high-degree nodes and find that it can capture significantly more packets from these subpopulations than conventional approaches.
Anirudh Ramachandran, Srinivasan Seetharaman, Nick Feamster, and Vijay Vazirani
School of Computer Science, Georgia Tech
266 Ferst Drive, Atlanta, GA, USA
{avr,srini,feamster,vazirani}@cc.gatech.edu
Categories and Subject Descriptors: C.2.3 [Computer-Communication Networks]: Network Monitoring; C.4 [Computer-Communication Networks]: Measurement Techniques
General Terms: Algorithms, Design, Measurement, Security
Keywords: traffic subpopulations, traffic statistics, sampling,
counters, FlexSample
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IMC’08, October 20–22, 2008, Vouliagmeni, Greece.
Copyright 2008 ACM 978-1-60558-334-1/08/10 ...$5.00.
1. INTRODUCTION
Routers and other devices that monitor traffic on high-speed networks cannot collect and record accurate statistics based on every packet in a traffic stream. Updates to statistics typically require access to a large and relatively slow memory (such as DRAM). These devices thus rely on packet sampling: selecting packets uniformly at random from the packet stream. Uniform random sampling works well for the traditional goals of billing and traffic engineering because these tasks require accurate estimates of only the “heavy-hitter” flows, which are typically well-represented in sampled traffic. Although many schemes have improved upon uniform random sampling [14, 15, 43], heavy-hitter identification remains their primary focus.
Recently, however, operators have started to monitor traffic for a broader range of applications: identifying P2P “supernodes”, servers with many clients, and infected computers (bots) engaged in activities such as spam, denial-of-service, or portscanning. Much of this traffic consists of low-volume flows with few packets each (“mouse” flows). Because techniques such as uniform random sampling select more packets from heavy-hitter flows, they will likely miss the presence of small-volume flows. Thus, these conventional or “naïve” sampling techniques are less appropriate for these new monitoring applications.
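
The effect on small flows can be made concrete with a short calculation. If each packet is sampled independently with probability p, a flow of n packets contributes no sampled packet at all with probability (1 - p)^n. The sketch below evaluates this for a few flow sizes; the 1% sampling rate, the flow sizes, and the function name are illustrative assumptions, not values taken from the paper.

```python
def miss_probability(flow_packets, sampling_rate):
    """Probability that independent per-packet sampling at `sampling_rate`
    selects no packet at all from a flow of `flow_packets` packets."""
    return (1.0 - sampling_rate) ** flow_packets


# Hypothetical flow sizes and a hypothetical 1-in-100 sampling rate.
for n in (5, 10, 1000, 100000):
    print(n, miss_probability(n, 0.01))
```

Under these assumed numbers, a 10-packet flow goes entirely unobserved roughly 90% of the time, while a 1000-packet flow is almost never missed, which is why uniform sampling serves heavy-hitter estimation well and mouse flows poorly.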
There is a need for a technique that can sample packets not just from heavy-hitter flows (as uniform random sampling does), but also from other traffic subpopulations: subsets of flows that have some common property or behavior. To capture more packets from certain subpopulations, the packet selection algorithm needs to efficiently determine whether a packet in the stream belongs to a given subpopulation and bias its sampling rate to ultimately capture more or less traffic from that subpopulation. To accomplish this goal, we present FlexSample, a framework and technique to bias packet selection towards certain subpopulations of traffic, subject to an overall sampling constraint. FlexSample provides expressiveness and flexibility beyond naïve packet sampling techniques used by existing systems such as Cisco’s Sampled NetFlow [31]. The key idea behind FlexSample is that high-speed network devices can maintain approximate statistics using fast, space-efficient counters to determine the subpopulation to which each packet belongs; these counters can then be used to bias packet selection towards packets that belong to desired subpopulations.
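
The key idea can be illustrated with a minimal sketch, which is not the FlexSample implementation. It assumes a hash-indexed array of counters that approximates per-flow packet counts (collisions can only inflate an estimate), a single condition that splits traffic into "small" and "large" flows, and fixed per-subpopulation sampling probabilities. The names `CounterArray`, `classify`, and `make_sampler`, the CRC32 hash, the array size, and the threshold of 10 packets are all hypothetical choices made for illustration; how the probabilities are chosen to satisfy an overall sampling constraint is not shown.

```python
import random
import zlib


class CounterArray:
    """Fixed-size array of counters indexed by a hash of the flow key.

    Distinct keys may collide, so a returned count is an upper-bound
    estimate of the true per-key packet count, not an exact value.
    """

    def __init__(self, size=1024):
        self.size = size
        self.counts = [0] * size

    def _index(self, key):
        # CRC32 is an arbitrary stand-in for a fast hash function.
        return zlib.crc32(repr(key).encode()) % self.size

    def increment(self, key):
        """Count one packet for `key` and return the updated estimate."""
        i = self._index(key)
        self.counts[i] += 1
        return self.counts[i]


def classify(estimated_count, small_flow_limit=10):
    """Map a count estimate to a (hypothetical) subpopulation label."""
    return "small" if estimated_count < small_flow_limit else "large"


def make_sampler(probabilities, counters, rng):
    """Return a per-packet decision function biased by subpopulation."""

    def sample(flow_key):
        estimate = counters.increment(flow_key)
        return rng.random() < probabilities[classify(estimate)]

    return sample


if __name__ == "__main__":
    # Assumed probabilities: favor packets from flows that look small.
    sampler = make_sampler({"small": 0.5, "large": 0.001},
                           CounterArray(), random.Random(0))
    kept = sum(sampler(("10.0.0.1", "10.0.0.2")) for _ in range(1000))
    print("packets kept from one 1000-packet flow:", kept)
```

Because the counter is updated before the decision is made, the first few packets of every flow are treated as belonging to the small-flow subpopulation; a long flow moves to the lower probability only once its estimate crosses the threshold.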
FlexSample allows an operator to specify the characteristics of traffic subpopulations (e.g., packets from flows that have fewer than 10 packets, packets from a source IP address that has sent over 100 packets, etc.), as well as a sampling budget (the fraction of the expected number of sampled packets obtained using the original sampling rate) that is set apart for packets selected from each subpopulation. FlexSample selects packets that belong to different subpopulations at different instantaneous sampling probabilities such that: (1) the overall expected number of packets selected is equal to the expected number of packets selected by uniform random sampling; (2) the fraction of sampled packets belonging



