
Facebook Immune System

by Tao Stein, Erdong Chen, Karan Mangla
Defense (2011)

Facebook Immune System
Tao Stein
Facebook
stein@fb.com
Erdong Chen
Facebook
rogerc@fb.com
Karan Mangla
Facebook
kmangla@fb.com
Abstract
Popular Internet sites are under attack all the time from phishers,
fraudsters, and spammers. They aim to steal user information and
expose users to unwanted spam. The attackers have vast resources
at their disposal. They are well-funded, with full-time skilled labor,
control over compromised and infected accounts, and access to
global botnets. Protecting our users is a challenging adversarial
learning problem with extreme scale and load requirements. Over
the past several years we have built and deployed a coherent,
scalable, and extensible realtime system to protect our users and
the social graph. This Immune System performs realtime checks
and classifications on every read and write action. As of March
2011, this amounts to 25B checks per day, reaching 650K per second at peak.
The system also generates signals for use as feedback in classifiers
and other components. We believe this system has contributed to
making Facebook the safest place on the Internet for people and
their information. This paper outlines the design of the Facebook
Immune System, the challenges we have faced and overcome, and
the challenges we continue to face.
Keywords Machine Learning, Adversarial Learning, Security,
Social Network Security
1. Introduction
The Facebook social graph comprises hundreds of millions of users
and their relationships with each other and with objects such as
events, pages, places, and apps. The graph is an attractive target
for attackers, who seek to gain access to information or to
influence actions. They can attack the graph in two ways: either by
compromising existing graph nodes or by injecting new fake nodes
and relationships. Protecting the graph is a challenging problem
with both algorithmic and systems components.
Algorithmically, protecting the graph is an adversarial learning
problem. Adversarial learning differs from more traditional learning
in one important way: the attacker creating the pattern does not
want the pattern to be learned. In many learning problems the
interests of the learner and the pattern creator are aligned: the
pattern creator wants better learning and may even be oblivious to
the efforts of the learner. For example, the receiver of ranked
search results wants better search ranking and may be oblivious to
the work being done to improve it. Such a pattern creator will not
actively work to subvert the learning and may even voluntarily give
hints to aid it. In adversarial learning, by contrast, the attacker
works to hide patterns and subvert detection. To be effective, the
system must respond fast and target the features that are most
expensive for the attacker to change, while taking care not to
overfit on superficial features that are easy for the attacker to
change.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
EuroSys Social Network Systems (SNS) 2011 April 10, 2011, Salzburg
Copyright © 2011 ACM Jan 1, 2011. . . $10.00
Figure 1. The adversarial cycle. [Diagram: the attacker controls the
Attack and Detect phases and the defender controls the Defense and
Mutate phases; the transitions are labeled Begin Attack, Initial
Detection, Defender Responds, and Attacker Detects.]
This diagram shows the adversarial cycle. The attacker controls the upper
phases and the defender controls the bottom phases. In both the Attack and
Detect phases the attacker is limited only by its own resources and global
rate limits. During Attack, the attack has not yet been detected and is
largely unfettered. During Detect, the attack has been detected but the
system is still forming a coherent response. This includes the time to
train a model or expand the set of bad attack vectors and upload the
patterns to online classifier services. The response can form
continuously, with some models deployed earlier than others. During
Defense, the attack has been rendered ineffective. The attacker may
eventually detect this and begin Mutate to work around the defense
mechanism. This cycle can repeat indefinitely. The defender seeks to
shorten Attack and Detect while lengthening Defense and Mutate. The
attacker seeks the opposite: to shorten the bottom phases while
lengthening Attack and Detect. This cycle illustrates why detection
and response latencies are so important for effective defense.
Adversarial learning is a cyclical process, shown in Figure 1.
An example will make the process more concrete. Several years
ago phishers would attack the graph using spammy messages with
predictable subject lines. The messages included links to phishing
sites. The phishers sent these messages repeatedly from compromised
accounts to hundreds of friends of the compromised accounts. The
predictable text patterns and volume made the messages
straightforward to detect and filter. To overcome this filtering,
attackers obfuscated their messages by inserting punctuation, HTML
tags, and images. The attackers also varied their distribution
channels to evade detection. The system responded by using "mark as
spam" feedback features, IP address features, and the presence of
other unusual obfuscation signatures. The IP address features were
an effective response, forcing the attackers to employ botnets and
global proxy networks to attack. This particular attack was
destroyed, but similar attacks happen all the time. The adversarial
cycle is an arms race: as detection and protection improve, the
attackers in turn improve their methods.
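The obfuscation tricks described above, such as punctuation inserted inside words and HTML tags, lend themselves to simple signature features. The following minimal Python sketch is an illustration only and does not reproduce the system's actual signatures. The regular expressions, the feature names, and the idea of normalizing text before pattern matching are all assumptions. The in-word punctuation rule would also fire on ordinary apostrophes such as "don't", so a real feature would need to account for those.

```python
import re

# Hypothetical signatures for the obfuscation tricks described above.
TAG_RE = re.compile(r"<[^>]+>")  # HTML tags such as <b> or </b>
# One punctuation character between two letters, e.g. the dots in "f.r.e.e".
INTRA_WORD_PUNCT_RE = re.compile(r"(?<=[A-Za-z])[^\w\s](?=[A-Za-z])")


def obfuscation_features(text: str) -> dict[str, float]:
    """Count HTML tags and in-word punctuation in a message body."""
    tags = TAG_RE.findall(text)
    visible = TAG_RE.sub("", text)
    inserted = INTRA_WORD_PUNCT_RE.findall(visible)
    letters = sum(ch.isalpha() for ch in visible)
    return {
        "html_tag_count": float(len(tags)),
        "intra_word_punct_count": float(len(inserted)),
        "intra_word_punct_per_letter": len(inserted) / letters if letters else 0.0,
    }


def normalize(text: str) -> str:
    """Undo the simple tricks so that text patterns can be matched again."""
    visible = TAG_RE.sub("", text)
    visible = INTRA_WORD_PUNCT_RE.sub("", visible)
    return " ".join(visible.lower().split())


sample = "F.r.e.e <b>iPad</b>!"
print(normalize(sample))  # free ipad!
print(obfuscation_features(sample))
```

The signature counts can serve as classifier features, and the normalized text can be matched against known patterns. Either way, the attacker has to change more than surface formatting to get through.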
The phase lengths in the adversarial learning cycle can vary and
the goal of an effective defense is to lengthen phases the defender
controls while shortening phases the attacker controls. Attacks are
destroyed by making them unprofitable. In terms of Figure 1, this
means lengthening the bottom phases Defense and Mutate while
shortening the upper phases Attack and Detect. Together, this raises
the attacker’s costs to participate in the cycle and lowers their
returns.
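A toy model of one pass through the cycle makes this trade-off concrete. It uses four phase durations and reports the fraction of the cycle that the attacker controls. The durations are invented numbers. Treating the phases as sequential, non-overlapping intervals is a simplifying assumption and does not describe anything the Immune System actually measures.

```python
from dataclasses import dataclass


@dataclass
class CycleTimings:
    """Hypothetical durations (in hours) of the four phases of one adversarial cycle."""
    attack: float   # attacker in control, attack not yet detected
    detect: float   # attack detected, defender still forming a response
    defense: float  # attack rendered ineffective
    mutate: float   # attacker working around the defense

    def attacker_controlled(self) -> float:
        return self.attack + self.detect

    def defender_controlled(self) -> float:
        return self.defense + self.mutate

    def attacker_share(self) -> float:
        """Fraction of one full cycle spent in attacker-controlled phases."""
        total = self.attacker_controlled() + self.defender_controlled()
        return self.attacker_controlled() / total


before = CycleTimings(attack=12.0, detect=12.0, defense=24.0, mutate=24.0)
after = CycleTimings(attack=6.0, detect=2.0, defense=48.0, mutate=24.0)
print(f"{before.attacker_share():.2f}")  # 0.33
print(f"{after.attacker_share():.2f}")   # 0.10
```

In this model, faster detection and response shrink the attacker's window. Harder-to-detect, harder-to-evade defenses stretch the defender's window. Both changes lower the attacker's share of each cycle.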
Improving the cycle requires work across all phases using multiple
techniques. The Attack phase is shortened by improving detection
methods: better user feedback, and more effective unsupervised
learning and anomaly detection. The Detect phase is shortened by
improving methods for quickly building and deploying new features
and models. The defender-controlled phases, Defense and Mutate, are
lengthened by making it harder for the attacker to detect and adapt
their exploit to the defensive response. The Defense phase is
lengthened by obscuring responses and subverting attack canaries.
For example, a suspected phishing account can see its own phishing
messages, but no one else, including the target victim, can. The
Mutate phase is lengthened by emphasizing features that are more
expensive for the attacker to change. For example, the system can
use IP-related features instead of text patterns when the former
are more expensive for the attacker to adapt. Shortening the length
of attacker control is especially critical to defending the graph
because graph-based distribution is viral and can grow
exponentially. The Immune System is designed to shorten the phases
controlled by attackers and lengthen the phases under defensive
control.
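The phishing example of obscuring a response can be expressed as a visibility check. In the following minimal sketch, the `Message` type, the `is_visible` function, and the set of suspected accounts are all hypothetical. The sketch shows only the rule stated in the text: the sender still sees its own messages, and everyone else does not. It does not model canary accounts that are separate from the sender.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    author_id: int
    recipient_id: int
    body: str


def is_visible(message: Message, viewer_id: int, suspected_phishers: set[int]) -> bool:
    """Decide whether viewer_id may see message (hypothetical rule).

    Messages from accounts that are not suspected are visible as usual.
    A suspected phishing account still sees its own messages, so from
    that account's point of view they appear delivered, but nobody
    else, including the intended victim, sees them.
    """
    if message.author_id not in suspected_phishers:
        return True
    return viewer_id == message.author_id


msg = Message(author_id=1, recipient_id=2, body="check out this link")
print(is_visible(msg, viewer_id=1, suspected_phishers={1}))  # True: the sender
print(is_visible(msg, viewer_id=2, suspected_phishers={1}))  # False: the target victim
```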
The Immune System has two advantages over the attacker: user
feedback and global knowledge. User feedback is both explicit and
implicit. Explicit feedback includes marking a message as spam or
reporting a user. Implicit feedback includes deleting a post or
rejecting a friend request. Both implicit and explicit feedback are
valuable and central to defense. In addition to user feedback, the
system has knowledge of aggregate patterns and of what is normal and
unusual. This facilitates anomaly detection, clustering, and feature
aggregation. The system uses these two advantages in both detection
and response.
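The two advantages can be combined in a simple way. Per-account feedback gets weighted into a rate, and global knowledge of the whole population then flags accounts whose rate is far from normal. The following minimal Python sketch makes several assumptions that the text does not specify: the event names, the weights (explicit feedback counts more than implicit), and the use of a plain z-score as a stand-in for anomaly detection.

```python
from statistics import mean, pstdev

# Hypothetical weights: explicit reports count more than implicit signals.
FEEDBACK_WEIGHTS = {
    "mark_as_spam": 1.0,           # explicit
    "report_user": 1.0,            # explicit
    "delete_post": 0.3,            # implicit
    "reject_friend_request": 0.3,  # implicit
}


def negative_feedback_rate(events: dict[str, int], actions: int) -> float:
    """Weighted negative feedback received per action an account took."""
    if actions <= 0:
        return 0.0
    score = sum(FEEDBACK_WEIGHTS.get(kind, 0.0) * count for kind, count in events.items())
    return score / actions


def unusual_accounts(rates: dict[str, float], threshold: float = 3.0) -> list[str]:
    """Flag accounts whose rate is more than `threshold` population standard
    deviations above the mean rate, a crude stand-in for "what is normal"."""
    values = list(rates.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [acct for acct, rate in rates.items() if (rate - mu) / sigma > threshold]


rates = {f"user{i}": 0.01 for i in range(19)}
rates["suspect"] = negative_feedback_rate({"mark_as_spam": 30, "delete_post": 40}, actions=100)
print(unusual_accounts(rates))  # ['suspect']
```

A single outlier among n otherwise identical values has a z-score of exactly √(n−1). With 20 accounts, that is about 4.36, which clears the threshold of 3.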
Some of the more traditional machine-learning metrics, such as
classifier accuracy, do not really apply to adversarial learning in
our context, or at least are less important. The graph is being
defended against multiple simultaneous attacks using finite
resources. The goal is to protect the graph against all attacks
rather than to maximize the accuracy of any one specific classifier.
Time spent refining a model for one attack is time not spent
improving detection and response on other attacks. For these
reasons, response and detection latencies can be more important
than precision and recall. Even considering an attack in isolation,
spending more time improving a classifier can be problematic for
two reasons. First, damage accumulates quickly: more accounts get
compromised and more users get exposed to spam. A 2% false-positive
rate today on an attack affecting 1,000 users is better than a 1%
false-positive rate tomorrow on the same attack affecting 100,000
users. Second, as time progresses attacks mutate and training data
becomes less relevant. Done is often better than perfect.
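Here is the arithmetic behind the false-positive comparison above. It reads each false-positive rate as applying to the users the attack affects. The text does not state the base, so this reading is an interpretive assumption.

```latex
\begin{aligned}
\text{Today:}    &\quad 0.02 \times 1{,}000   = 20 \text{ false positives, with } 1{,}000 \text{ users affected}\\
\text{Tomorrow:} &\quad 0.01 \times 100{,}000 = 1{,}000 \text{ false positives, with } 100{,}000 \text{ users affected}
\end{aligned}
```

Under that reading, waiting a day halves the false-positive rate. It also produces 50 times as many false positives and 100 times as many affected users.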
Protecting the graph differs from email anti-abuse in several
ways. Users tend to trust Facebook identities more than email. The
Facebook user interface has many different channels for
communication, and new ones emerge as the interface evolves.
Communication can move seamlessly between these different channels.
In addition, communication on Facebook tends strongly toward
realtime. These differences all have implications for the design of
the Immune System and will be discussed further in Section 3.
Attacks can mutate quickly; in some cases the Defense and Mutate
phases may be short, and the system must be ready to detect and
respond. Because distribution over the graph is viral, substantial
damage can happen quickly when the attacker is in control. The need
for a fast response has motivated much of the design described in
this paper. A basic design principle is that all updates are online:
classifier services and feature data providers adapt to new attacks
without going offline or restarting. Responding to new attacks is
part of normal operation, and normal operation should never require
a service restart.
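The "all updates are online" principle can be illustrated in-process. In this minimal sketch, a service holds a reference to its current model, and a new model is swapped in while requests keep being served. The class name, the method names, and the models-as-callables representation are assumptions for illustration. The real classifier services are networked and are not described at this level of detail.

```python
import threading
from typing import Callable, Mapping

Features = Mapping[str, float]
Model = Callable[[Features], float]


class ClassifierService:
    """Hypothetical in-process stand-in for an always-online classifier service.

    New models are swapped in while the service keeps answering
    requests, so responding to a new attack never requires a restart.
    """

    def __init__(self, model: Model) -> None:
        self._model = model
        self._lock = threading.Lock()

    def load_model(self, model: Model) -> None:
        # Replace the model reference atomically; callers that already
        # fetched the old model finish with it, later calls get the new one.
        with self._lock:
            self._model = model

    def classify(self, features: Features) -> float:
        with self._lock:
            model = self._model  # take a consistent snapshot of the current model
        return model(features)   # score outside the lock so a swap never waits on scoring


svc = ClassifierService(lambda f: 0.0)  # placeholder model: everything scores 0
print(svc.classify({"links": 3.0}))     # 0.0
# Hypothetical new model pushed in response to a link-spam attack.
svc.load_model(lambda f: min(1.0, 0.25 * f.get("links", 0.0)))
print(svc.classify({"links": 3.0}))     # 0.75
```

Scoring happens outside the lock, so a slow model never blocks a model update, and an update never interrupts a call that is already in flight.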
In addition to responding quickly, it is important to target
features that are difficult for the attacker to detect (Defense) and
change (Mutate). This differs from traditional machine learning,
where features are chosen solely on how strongly they improve the
accuracy of the classifier. In general, some features of an attack
are much easier and cheaper for an attacker to change than others;
for example, text patterns are cheaper to change than IP addresses.
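One way to encode this preference is to discount each candidate feature's usefulness by how cheaply an attacker can change it. In the following sketch, several elements are hypothetical: the `gain` and `attacker_cost` scores, the multiplicative weighting, and the feature names. The text says only that features which are expensive for the attacker to change should be emphasized. It does not say how features are scored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    gain: float           # hypothetical: how much the feature helps the classifier, 0..1
    attacker_cost: float  # hypothetical: relative cost for the attacker to change it, 0..1


def rank_features(features: list[Feature], cost_weight: float = 1.0) -> list[str]:
    """Order features by gain * attacker_cost ** cost_weight, best first.

    cost_weight=0 reproduces a traditional, accuracy-only ranking.
    """
    ranked = sorted(features, key=lambda f: f.gain * f.attacker_cost ** cost_weight, reverse=True)
    return [f.name for f in ranked]


candidates = [
    Feature("subject_line_text", gain=0.9, attacker_cost=0.1),
    Feature("sender_ip_reputation", gain=0.6, attacker_cost=0.8),
]
print(rank_features(candidates, cost_weight=0.0))  # ['subject_line_text', 'sender_ip_reputation']
print(rank_features(candidates))                   # ['sender_ip_reputation', 'subject_line_text']
```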
The main components of the Immune System will be described
in detail in Section 4. To summarize, these are:
• Classifier services: Classifier services expose an abstract
classifier interface over the network. That abstraction is
implemented by a number of different machine-learning algorithms,
using standard object-oriented methods. Implemented algorithms
include random forests, SVMs, logistic regression, and a version of
boosting, among others. Classifier services are always online and
are designed never to be restarted.
• Feature Extraction Language (FXL): FXL is the dynamically
executed language for expressing features and rules. It is a
Turing-complete, statically typed functional language. Feature
expressions are checked and then loaded into classifier services and
feature tailers¹ online, without service restart.
• Dynamic model loading: Models are built on features, which are
either basic or derived via an FXL expression. Like features, models
are loaded online into classifier services, without service or
tailer restart. In addition, many of the classifier implementations
support online training.
• Policy Engine: Policies organize classification and features to
express business logic, policy, and also holdouts for evaluating
classifier performance. Policies are boolean-valued FXL expressions
that trigger responses. Policies execute on top of machine-learned
classification and feature data providers. Responses are system
actions, and there are many of them; examples include blocking an
action, requiring an authentication challenge, and disabling an
account.
• Feature Loops (Floops): Classification generates a wide variety
of information and associations during feature extraction. The
floops take this data, aggregate it, and make it available to the
classifiers as features. The floops also incorporate user feedback,
data from crawlers², and query data from the data warehouse.
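The relationship between an abstract classifier interface, a concrete algorithm, and boolean policies that trigger responses can be sketched in Python. This sketch does not show FXL itself. Python lambdas stand in for boolean-valued FXL policy expressions. The following are all assumptions for illustration: the class names, the first-match evaluation order, the feature names, the weights, and the three-member response set. Holdouts are not modeled.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Mapping

Features = Mapping[str, float]


class Classifier(ABC):
    """Abstract classifier interface; each algorithm supplies score()."""

    @abstractmethod
    def score(self, features: Features) -> float:
        """Return a probability-like score in [0, 1]."""


class LogisticRegression(Classifier):
    def __init__(self, weights: Mapping[str, float], bias: float = 0.0) -> None:
        self.weights = dict(weights)
        self.bias = bias

    def score(self, features: Features) -> float:
        z = self.bias + sum(w * features.get(name, 0.0) for name, w in self.weights.items())
        return 1.0 / (1.0 + math.exp(-z))


class Response(Enum):
    ALLOW = "allow"
    CHALLENGE = "require authentication challenge"
    BLOCK = "block action"


@dataclass
class Policy:
    name: str
    condition: Callable[[Features, float], bool]  # boolean-valued, like an FXL policy
    response: Response


def evaluate(policies: list[Policy], classifier: Classifier, features: Features) -> Response:
    """Score once, then return the response of the first policy that fires."""
    s = classifier.score(features)
    for policy in policies:
        if policy.condition(features, s):
            return policy.response
    return Response.ALLOW


clf = LogisticRegression({"links_per_message": 2.0}, bias=-3.0)
policies = [
    Policy("block_likely_spam", lambda f, s: s > 0.9, Response.BLOCK),
    Policy("challenge_new_account",
           lambda f, s: s > 0.5 and f.get("account_age_days", 0.0) < 7, Response.CHALLENGE),
]
print(evaluate(policies, clf, {"links_per_message": 1.0}))                           # Response.ALLOW
print(evaluate(policies, clf, {"links_per_message": 2.0, "account_age_days": 3.0}))  # Response.CHALLENGE
print(evaluate(policies, clf, {"links_per_message": 3.0}))                           # Response.BLOCK
```

Keeping policies separate from the classifier lets business logic change, for example tightening a threshold or adding a challenge for new accounts, without retraining or reloading the model.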
The scalability and latency requirements are challenging. As of
early 2011, there are 25B user read and write actions per day. About
20% of these are write operations. In many cases classification
needs to be synchronous with user actions. In those cases latency
¹ Tailers are stream-processing programs. They read and aggregate log file
data in realtime. They are called tailers because they tail logs.
² Crawlers are processes that take URLs and fetch their web contents.
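The first footnote describes tailers, and the floops aggregate such data into features. The following minimal Python sketch covers both ideas. `follow()` reads complete lines as they are appended to a log, and `FeatureAggregator` turns those lines into per-key counts that a classifier could read back. The tab-separated log format, the names, and the polling approach are all assumptions. The text does not describe the production tailers at this level.

```python
import os
import time
from collections import Counter
from typing import Iterator, TextIO


def follow(log: TextIO, poll_seconds: float = 0.5) -> Iterator[str]:
    """Yield complete lines as they are appended to an open log file, like `tail -f`.

    Starts at the current end of the file and never returns.
    """
    log.seek(0, os.SEEK_END)  # skip existing data; only new appends are processed
    partial = ""
    while True:
        chunk = log.readline()
        if not chunk:
            time.sleep(poll_seconds)  # nothing new yet
            continue
        partial += chunk
        if partial.endswith("\n"):  # the writer may still be mid-line
            yield partial
            partial = ""


class FeatureAggregator:
    """Aggregates hypothetical tab-separated 'event<TAB>key' log lines into
    per-key counts that a classifier could read back as features."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def add(self, line: str) -> None:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            return  # skip malformed lines
        event, key = parts
        self.counts[(event, key)] += 1

    def feature(self, event: str, key: str) -> int:
        return self.counts[(event, key)]


agg = FeatureAggregator()
for line in ["share_link\t203.0.113.7\n", "share_link\t203.0.113.7\n", "post\t198.51.100.2\n"]:
    agg.add(line)
print(agg.feature("share_link", "203.0.113.7"))  # 2
# In a live tailer the loop would instead be: for line in follow(open(path)): agg.add(line)
```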
