
Understanding the network-level behavior of spammers

by Anirudh Ramachandran, Nick Feamster
ACM SIGCOMM Computer Communication Review (2006)

Abstract

This paper studies the network-level behavior of spammers, including: IP address ranges that send the most spam, common spamming modes (e.g., BGP route hijacking, bots), how persistent across time each spamming host is, and characteristics of spamming botnets. We try to answer these questions by analyzing a 17-month trace of over 10 million spam messages collected at an Internet "spam sinkhole", and by correlating this data with the results of IP-based blacklist lookups, passive TCP fingerprinting information, routing information, and botnet "command and control" traces.

We find that most spam is being sent from a few regions of IP address space, and that spammers appear to be using transient "bots" that send only a few pieces of email over very short periods of time. Finally, a small, yet non-negligible, amount of spam is received from IP addresses that correspond to short-lived BGP routes, typically for hijacked prefixes. These trends suggest that developing algorithms to identify botnet membership, filtering email messages based on network-level properties (which are less variable than email content), and improving the security of the Internet routing infrastructure, may prove to be extremely effective for combating spam.

Anirudh Ramachandran and Nick Feamster
College of Computing, Georgia Tech
{avr, feamster}@cc.gatech.edu
Categories and Subject Descriptors
C.2.0 [Computer Communication Networks]: Security and pro-
tection; C.2.3 [Computer Communication Networks]: Network
operations – network management
General Terms
Design, Management, Reliability, Security
Keywords
spam, botnet, BGP, network management, security
1. Introduction
This paper presents a study of the network-level characteristics
of unsolicited commercial email (“spam”). Much attention has been
devoted to studying the content of spam, but comparatively little at-
tention has been paid to spam’s network-level properties. Conven-
tional wisdom often asserts that most of today’s spam comes from
botnets, and that a large fraction of spam comes from Asia; a few
studies have attempted to quantify some of these characteristics [5].
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
SIGCOMM’06, September 11-15, 2006, Pisa, Italy.
Copyright 2006 ACM 1-59593-308-5/06/0009...$5.00.
Unfortunately, little is known about how much spam comes from
botnets versus other techniques (e.g., short-lived route announce-
ments, open relays, etc.), the geographic and topological distribu-
tion of where most spam originates (in terms of Internet Service
Providers, countries, and IP address space), the extent to which dif-
ferent spammers use the same network resources, the stationarity
of these properties over time, and so forth. A primary goal of this
paper is to shed some light on these relatively unstudied questions.
Beyond merely exposing spammers’ behavior, gathering infor-
mation about the network-level behavior of spam could be a ma-
jor asset for designing spam filters that are based on spammers’
network-level behavior (presuming that the network-level characteristics
of spam are sufficiently different from those of legitimate mail, a
question we explore further in Section 4). Whereas spammers have the
flexibility to alter the content of emails (both per recipient and over
time as users update spam filters), they have far
less flexibility when it comes to altering the network-level proper-
ties of the spam they send. It is far easier for a spammer to alter the
content of email messages to evade spam filters than it is for that
spammer to change the ISP, IP address space, or botnet from which
spam is sent.
Towards the goal of developing techniques that will help in the
design of more robust network-level spam filters, this paper char-
acterizes the network-level behavior of spammers as observed at
a large spam sinkhole domain, which stores complete logs of all
spam received from August 2004 through December 2005. We
perform a joint analysis of the data collected at this sinkhole with
an archive of BGP route advertisements as heard from the receiving
network, traces from the “command and control” of a Bobax botnet,
and traces of legitimate email from the mail server logs of a large
email service provider. Although many aspects of mail headers can
be forged, we base our analysis strictly on properties of the sender
that are difficult to forge (e.g., IP addresses that made connections
to our mail servers, passive TCP fingerprints, corresponding route
announcements, etc.).
We draw the following surprising conclusions from our study:
• The vast majority of received spam arrives from a few con-
centrated portions of IP address space (Section 4). Spam
filtering techniques currently make no assumptions about
the distribution of spam across IP address space. In a re-
lated area, many worm propagation models assume a uni-
form distribution of vulnerable hosts across IP address space
(e.g., [29]). In contrast, we find that the vast majority
of spamming hosts—and, perhaps not coincidentally, most
Bobax-infected hosts—lie within a small number of IP ad-
dress space regions. Unfortunately, with a few exceptions
(e.g., 60.* – 70.*), most legitimate email comes from the
same regions of IP address space, which suggests that, in
general, effective filtering based on network-level properties
may require determining second-order characteristics (e.g.,
botnet membership).
• Most received spam is sent from Windows hosts, each of
which sends a relatively small volume of spam to our do-
main (Section 5). Most bots send a relatively small volume
of spam to our sinkhole (i.e., less than 100 pieces of spam
over 17 months), and about three-quarters of them are only
active for a single time period of less than two minutes (65%
of them send all spam in a “single shot”).
• A small set of spammers continually use short-lived route an-
nouncements to remain untraceable (Section 6). A small por-
tion of spam is sent by sophisticated spammers, who briefly
advertise IP prefixes, establish a connection to the victim’s
mail relay, and withdraw the route to that IP address space
after spam is sent. Anecdotal evidence has suggested that
spammers might be exploiting the routing infrastructure to
remain untraceable [1, 30]; this paper quantifies and docu-
ments this activity for the first time. To our surprise, we dis-
covered a new class of attack, where spammers attempt to
evade detection by hijacking large IP address blocks (e.g.,
/8s) and sending spam from widely dispersed “dark” (i.e.,
unused or unallocated) IP addresses within this space.
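The first two findings above rest on simple aggregations over the sinkhole log, which can be sketched as follows. This is a minimal illustration in Python, not the authors' analysis code: the function names, the (ip, timestamp) input format, and the sample addresses are all assumptions made for the example. The 120-second threshold mirrors the "less than two minutes" figure quoted above, but the paper's exact definition of an activity period may differ.

```python
from collections import Counter
from ipaddress import IPv4Address


def slash8_shares(sender_ips):
    """Return {first_octet: fraction of messages} for dotted-quad sender IPs."""
    counts = Counter(int(IPv4Address(ip)) >> 24 for ip in sender_ips)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {octet: n / total for octet, n in counts.items()}


def top_k_share(shares, k):
    """Fraction of all messages contributed by the k busiest /8 blocks."""
    return sum(sorted(shares.values(), reverse=True)[:k])


def active_windows(events):
    """Map each sender IP to (first_seen, last_seen, message_count).

    `events` is an iterable of (ip, unix_timestamp) pairs, one per message.
    """
    windows = {}
    for ip, ts in events:
        first, last, n = windows.get(ip, (ts, ts, 0))
        windows[ip] = (min(first, ts), max(last, ts), n + 1)
    return windows


def short_lived_fraction(windows, max_seconds=120):
    """Fraction of senders whose entire activity fits within max_seconds."""
    if not windows:
        return 0.0
    short = sum(1 for first, last, _ in windows.values()
                if last - first <= max_seconds)
    return short / len(windows)


# Hypothetical sample log: (sender IP, seconds since start of trace).
events = [
    ("60.1.2.3", 0), ("60.9.9.9", 30), ("60.1.2.3", 60),
    ("200.5.5.5", 10), ("200.5.5.5", 100000),
]
shares = slash8_shares(ip for ip, _ in events)
windows = active_windows(events)
print(top_k_share(shares, 1), short_lived_fraction(windows))
```

On a real trace, plotting the sorted shares shows how concentrated the senders are, and the same per-host windows give both message counts and activity durations.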
Beyond these findings, this paper’s joint analysis of several
datasets provides a unique window into the network-level charac-
teristics of spam. To our knowledge, this paper presents the first
study that examines the interplay between spam, botnets, and the
Internet routing infrastructure.
We acknowledge that our spam corpus represents only a sin-
gle vantage point, and, as such, drawing general conclusions about
Internet-wide spam is not possible. Our goal is not to present con-
clusive figures about Internet-wide characteristics of spam. Indeed,
the data we have collected is a small, localized sample of all spam
traffic, and our statistics may not be reflective of Internet-wide char-
acteristics. However, the spam we have collected represents an in-
teresting dataset as it reflects the complete set of spam emails re-
ceived by a single Internet domain. This dataset exposes spamming
as a typical network operator for some Internet domain might also
witness it. This unique view can help us better understand whether
the features of spam that any single network operator observes
could be useful in developing more effective filtering techniques.
With these goals in mind and an understanding of the context
of our data, we offer the following additional observations on the
implications of our results for the design of more effective tech-
niques for spam mitigation, which we revisit in more detail in Sec-
tion 7. First, the ability to trace the identities of spammers hinges
on securing the routing infrastructure. Second, the distribution of
spam and botnet activity across IP space suggests that, for some IP
address ranges and networks, spam filters might monitor network-
wide spam arrival patterns and attribute higher levels of suspicion
to spam originating from networks with higher spam activity. Given
the highly variable nature of the content of spam messages, incor-
porating general network-level properties of spam into filters may
ultimately provide significant gains over more traditional methods
(e.g., content-based filtering), both through increased robustness
and the ability to stop spam closer to its source.
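One way to picture the second observation is a filter that keeps per-network counts of spam and legitimate mail and assigns a higher prior suspicion to senders in networks with more observed spam. The sketch below is an illustration only and is not a technique specified in this paper: the class name, the /24 aggregation granularity, and the add-one smoothing are all assumptions.

```python
from ipaddress import ip_network


class PrefixReputation:
    """Track spam and non-spam counts per covering prefix (hypothetical helper)."""

    def __init__(self, prefix_len=24, prior_spam=1.0, prior_ham=1.0):
        self.prefix_len = prefix_len
        self.prior_spam = prior_spam  # smoothing pseudo-counts (assumed values)
        self.prior_ham = prior_ham
        self.counts = {}              # network -> [spam_count, ham_count]

    def _key(self, ip):
        # strict=False masks the host bits, e.g. 192.0.2.77 -> 192.0.2.0/24
        return ip_network(f"{ip}/{self.prefix_len}", strict=False)

    def observe(self, ip, is_spam):
        """Record one message from `ip`."""
        entry = self.counts.setdefault(self._key(ip), [0, 0])
        entry[0 if is_spam else 1] += 1

    def suspicion(self, ip):
        """Smoothed fraction of spam seen from the sender's network."""
        spam, ham = self.counts.get(self._key(ip), (0, 0))
        return (spam + self.prior_spam) / (
            spam + ham + self.prior_spam + self.prior_ham)


rep = PrefixReputation()
for host in range(1, 10):
    rep.observe(f"192.0.2.{host}", is_spam=True)
print(rep.suspicion("192.0.2.200"))   # unseen host in a spam-heavy /24
print(rep.suspicion("198.51.100.1"))  # network with no history
```

The smoothing keeps a network with no history at a neutral score, so a single message does not swing the estimate to either extreme.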
The rest of this paper is organized as follows. Section 2 pro-
vides background on spamming and an overview of previous re-
lated work. In Section 3, we describe our data collection techniques
and the datasets we used in our analysis. In Section 4, we study the
distribution of spammers, spamming botnets, and legitimate mail
senders across IP address space. Section 5 presents our findings
regarding the relationship between the spam received at our sink-
holes and known spamming bots. Section 6 examines the extent to
which spammers use IP addresses that are generally unreachable
(e.g., using short-lived BGP route announcements) to send spam
untraceably. Based on our findings, Section 7 offers positive rec-
ommendations for designing more effective mitigation techniques.
We conclude in Section 8.
2. Background and Related Work
This section provides an overview of techniques both for sending
and for mitigating spam and discusses related work in these areas.
2.1 Spam: Methods and Mitigation
In this section, we offer background on the main techniques used
by spammers to send email, as well as some of the more commonly
used mitigation techniques.
2.1.1 Spamming methods
Spammers use various techniques to send large volumes of mail
while attempting to remain untraceable. We describe several of
these techniques, beginning with “conventional” methods and pro-
gressing to more intricate techniques.
Direct spamming. Spammers may purchase upstream connec-
tivity from “spam-friendly ISPs”, which turn a blind eye to the
activity. Occasionally, spammers buy connectivity and send spam
from ISPs that do not condone this activity and are forced to change
ISPs. Ordinarily, changing from one ISP to another would require
a spammer to renumber the IP addresses of their mail relays. To
remain untraceable and avoid renumbering headaches, spammers
sometimes obtain a pool of dispensable dialup IP addresses, send
outgoing traffic from a high-bandwidth connection with the source IP
address spoofed to appear as if it came from the dialup connection,
and proxy the reverse traffic through the dialup connection back to
the spamming hosts [25].
Open relays and proxies. Open relays are mail servers that
allow unauthenticated Internet hosts to connect and relay email
through them. Originally intended for user convenience (e.g., to let
users send mail from a particular relay while they are traveling or
otherwise in a different network), open relays have been exploited
by spammers due to the anonymity and amplification offered by
the extra level of indirection. It appears that the widespread deploy-
ment and use of blacklisting techniques have all but extinguished
the use of open relays and proxies to send spam [21, 26].
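IP-based blacklists of the kind mentioned above are commonly published over DNS: a client reverses the octets of the sender's IPv4 address, appends the blacklist's zone, and looks up the resulting name, and an answer in 127.0.0.0/8 conventionally means the address is listed. The sketch below illustrates that convention in Python. The zone name dnsbl.example.org is a placeholder rather than a real blacklist, the function names are hypothetical, and nothing here is taken from the paper's own tooling.

```python
import socket
from ipaddress import IPv4Address


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query for `ip` in the blacklist `zone`."""
    octets = str(IPv4Address(ip)).split(".")  # also validates the address
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the blacklist answers with a 127.x.x.x address.

    Performs a live DNS lookup; a failed lookup (e.g., NXDOMAIN) is
    treated as "not listed".
    """
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        return False
    return answer.startswith("127.")


# Placeholder zone; substitute a blacklist you are permitted to query.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Treating every lookup failure as "not listed" is a simplification; a production filter would distinguish a timeout from a definitive negative answer.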
Botnets. Conventional wisdom suggests that the majority of
spam on the Internet today is sent by botnets—collections of ma-
chines acting under one centralized controller [3, 4, 31]. The
W32/Bobax (“Bobax”) worm (of which there are many variants)
exploits the DCOM and LSASS vulnerabilities on Windows sys-
tems [18], allows infected hosts to be used as a mail relay, and at-
tempts to spread itself to other machines affected by the above vul-
nerabilities, as well as over email. This paper studies the network-
level properties of spam sent by Bobax drones. Agobot and SDBot
are two other bots purported to send spam [12].
BGP spectrum agility. This study has discovered a new type of
cloaking mechanism, BGP “spectrum agility”, whereby spammers
briefly announce (often hijacked) IP address space from which they
send spam and then withdraw the routes to that IP address space once
the spam has been sent. Although we observed this behavior
informally several years ago [6] and subsequent anecdotal evidence
has suggested that spammers may use this technique [1], our study
thoroughly documents this activity, and further finds that spammers
may be using spectrum agility to complement spamming by other
methods.
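The announce, send, withdraw pattern described above suggests a simple join between a BGP update log and a spam log. The following Python sketch shows one way to express it. It is an illustration under stated assumptions, not the paper's method: the (timestamp, prefix, kind) update format, the function names, and the 600-second lifetime threshold are all invented for the example.

```python
from ipaddress import ip_address, ip_network


def short_lived_announcements(updates, max_lifetime=600):
    """Find prefixes announced and then withdrawn within max_lifetime seconds.

    `updates` is an iterable of (timestamp, prefix, kind) sorted by time,
    where kind is "A" (announce) or "W" (withdraw).
    Returns a list of (prefix, announced_at, withdrawn_at).
    """
    announced_at = {}
    short = []
    for ts, prefix, kind in updates:
        if kind == "A":
            # Keep the earliest announcement if the prefix is re-announced.
            announced_at.setdefault(prefix, ts)
        elif kind == "W" and prefix in announced_at:
            start = announced_at.pop(prefix)
            if ts - start <= max_lifetime:
                short.append((prefix, start, ts))
    return short


def spam_during(short, spam_events):
    """Return (timestamp, ip, prefix) for spam received while a short-lived
    route covering the sender's address was announced.

    `spam_events` is an iterable of (timestamp, ip) pairs.
    """
    hits = []
    for ts, ip in spam_events:
        addr = ip_address(ip)
        for prefix, start, end in short:
            if start <= ts <= end and addr in ip_network(prefix):
                hits.append((ts, ip, prefix))
    return hits


# Hypothetical logs using documentation prefixes.
updates = [
    (100, "198.51.100.0/24", "A"),
    (160, "203.0.113.0/24", "A"),
    (400, "198.51.100.0/24", "W"),
    (90000, "203.0.113.0/24", "W"),
]
spam = [(250, "198.51.100.7"), (250, "203.0.113.9"), (5000, "198.51.100.7")]
print(spam_during(short_lived_announcements(updates), spam))
```

The nested loop is quadratic and only suitable for small samples; a full-scale analysis would index announcements by prefix or use an interval structure.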
2.1.2 Mitigation techniques
Techniques for mitigating spam are as varied as techniques to
send spam, and most existing techniques have significant draw-