Can DNS-Based Blacklists Keep Up with Bots?
Anirudh Ramachandran, David Dagon, and Nick Feamster
College of Computing, Georgia Institute of Technology
{avr, dagon, feamster}@cc.gatech.edu
1. Introduction
Many Internet Service Providers (ISPs), anti-virus companies, and enterprise email vendors use Domain Name System-based Blackhole Lists (DNSBLs) to keep track of IP addresses that originate spam, so that future emails sent from these IP addresses can be rejected out-of-hand. DNSBL operators populate blocking lists based on complaints from recipients of spam, who report the IP address of the relay from which the unwanted email was sent. To be effective in blocking spam, information in the blacklist must have the following properties:
1. Completeness. The blacklist must contain a reasonable fraction of all spamming IP addresses.
2. Responsiveness (i.e., low response time). We define the response time as the period between when a host first starts sending spam and when it ultimately becomes listed. The blacklist must have a low response time so that other recipients can block subsequent spam originating from the listed IP addresses.
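A DNSBL check of the kind described above is performed as an ordinary DNS query. The sketch below assumes the common DNSBL convention (later documented in RFC 5782) of listing an IPv4 address a.b.c.d under the name d.c.b.a.<zone>. The zone `dnsbl.example.org` and the helpers `dnsbl_query_name`, `ip_from_query_name`, and `is_listed` are hypothetical illustrations, not Spamhaus's actual zone names or tooling. The inverse helper shows how names in a captured trace of DNSBL queries might be mapped back to the IP addresses being looked up.

```python
import ipaddress
import socket
from typing import Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Return the DNS name used to look up an IPv4 address in a DNSBL zone.

    Assumes the common DNSBL convention: address a.b.c.d is queried as
    d.c.b.a.<zone> (octets reversed, zone appended).
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


def ip_from_query_name(name: str, zone: str) -> Optional[str]:
    """Recover the looked-up IPv4 address from a DNSBL query name.

    Returns None if `name` is not a reversed-IPv4 query under `zone`,
    e.g. when scanning a captured trace that also contains other queries.
    """
    name = name.rstrip(".").lower()  # DNS names are case-insensitive
    suffix = "." + zone.rstrip(".").lower()
    if not name.endswith(suffix):
        return None
    labels = name[: -len(suffix)].split(".")
    if len(labels) != 4:
        return None
    try:
        return str(ipaddress.IPv4Address(".".join(reversed(labels))))
    except ValueError:
        return None


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the DNSBL answers the lookup for `ip` (requires network).

    Any answer is treated as "listed"; a failed lookup (NXDOMAIN, or any
    other resolver error) is treated as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # socket.gaierror and socket.herror both subclass OSError
        return False
```

For example, checking 192.0.2.99 against the placeholder zone issues a query for `99.2.0.192.dnsbl.example.org`. Under the convention assumed here, an answer means the address is listed, and NXDOMAIN means it is not.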
Despite the widespread use of DNSBLs, to our knowledge there has not been a thorough evaluation of the effectiveness of blackhole lists in blocking spam. Our previous work briefly surveyed the completeness of DNSBLs, measured at the time each piece of spam was received, for various spamming techniques (specifically, botnets and short-lived BGP routes) [5]; however, neither that study nor any other that we are aware of has examined the response time of DNSBLs.
DNSBLs proved to be an effective mechanism for blocking spam when spammers were less agile (i.e., when they sent spam from a smaller number of open relays). Previous studies, however, have suggested that spammers are becoming increasingly agile, distributing the spam “workload” more widely across mail relays [4]. The recent rise of botnets (large collections of compromised machines under the control of a single user) suggests that spam is being sent from an ever-larger set of IP addresses, that the distribution of workload across those addresses has an even longer tail, and that each spamming host is relatively transient (recent work notes that most spamming bots are transient, at least from the perspective of a single domain [5]). This transience implies that, for DNSBLs to be effective at all, they must be responsive. This paper presents a preliminary evaluation of the responsiveness of blacklists for a specific set of spamming IP addresses that are known to come from a spamming botnet that spreads via the “Bobax” vulnerability [1].

CEAS 2006 - Third Conference on Email and Anti-Spam, July 27-28, 2006, Mountain View, California USA

Figure 1: A conceptual view of a spamming host’s life cycle. [Figure labels: Infection, S-Day, Detection, Blacklisted, Opportunity, Response Time.]
2. A Model of Responsiveness
Figure 1 presents a model of four distinct phases in a host’s life cycle as part of a spamming botnet. Although we acknowledge that spam can certainly originate from uninfected machines (e.g., machines rented for email marketing campaigns), we focus on the responsiveness of DNSBLs in blocking spam from infected machines (i.e., likely botnet “zombies”), which send the vast majority of spam on the Internet today [6]. Initially, the host becomes a member of a spamming botnet; subsequently, the host begins to send spam (labeled “S-day” in Figure 1). After some time, the host’s activities are detected, investigated, and recorded, which ultimately results in the host being blacklisted. Our goal is to determine not only completeness but also response time (Figure 1). Measuring response time is challenging because we lack ground truth: it is difficult to validate when a host becomes infected or first sends spam. We can, however, still estimate lower bounds on response time.
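Both quantities this evaluation targets can be computed from per-host timestamps. The sketch below is a minimal illustration, not the authors’ analysis code. It assumes a hypothetical `first_seen` map (the earliest time each IP appears in the botnet trace, standing in for the unobservable S-day) and a hypothetical `first_listed` map (the earliest time each IP was observed as listed, e.g., its first positive DNSBL answer in a query trace). If a host was already spamming when first seen, its true S-day is no later than `first_seen`, so listing time minus first-seen time is a lower bound on its response time.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Set


def completeness(spamming_hosts: Iterable[str], listed_hosts: Set[str]) -> float:
    """Fraction of known spamming IPs that appear on the blacklist."""
    hosts = set(spamming_hosts)
    if not hosts:
        return 0.0
    return len(hosts & listed_hosts) / len(hosts)


def response_time_lower_bounds(
    first_seen: Dict[str, datetime],
    first_listed: Dict[str, datetime],
) -> Dict[str, timedelta]:
    """Per-host lower bound on response time: first listing minus first sighting.

    Assumes each host was already spamming when first seen, so its true
    S-day is no later than first_seen[ip]. Hosts never observed as listed
    are omitted (their response time is unknown, not zero). A negative value
    means the host was already listed when it was first seen.
    """
    return {
        ip: first_listed[ip] - seen
        for ip, seen in first_seen.items()
        if ip in first_listed
    }
```

Hosts that never appear as listed are left out of the response-time map rather than assigned zero, because their response time is unknown; they still count against completeness.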
3. Data Collection
Two datasets, a trace of DNSBL lookups and a trace of spamming botnet activity, allow us to establish a lower bound on response time: the difference between the time a host first becomes listed in the Spamhaus blacklist and the first time the host appeared in the botnet trace after November 17, 2005 (i.e., a time at which we know the host was infected). We have packet captures of DNSBL queries to a mirror of the Spamhaus blacklist [2] for November 29 and 30, 2005. This mirror sees approximately 1/17 of all Spamhaus queries, most of which originate from hosts in the southeastern United States (where the mirror is located). The names being queried (i.e., the IP addresses being looked up), however, represent the entire population of spamming hosts. To derive some ground truth for hosts that