TaintEraser: protecting sensitive data leaks using application-level taint tracking
- ISSN: 01635980
- DOI: 10.1145/1945023.1945039
Abstract
We present TaintEraser, a new tool that tracks the movement of sensitive user data as it flows through off-the-shelf applications. TaintEraser uses application-level dynamic taint analysis to let users run applications in their own environment while preventing unwanted information exposure. It is made possible by techniques we developed for accurate and efficient tainting: (1) Semantic-aware instruction-level tainting is critical to track taint accurately, without explosion or loss. (2) Function summaries provide an interface to handle taint propagation within the kernel and reduce the overhead of instruction-level tracking. (3) On-demand instrumentation enables fast loading of large applications. Together, these techniques let us analyze large, multi-threaded, networked applications in near real-time. In tests on Internet Explorer, Yahoo! Messenger, and Windows Notepad, TaintEraser generated no false positives and instrumented fewer than 5% of the executed instructions while precisely scrubbing user-defined sensitive data that would otherwise have been exposed to restricted output channels. Our research provides the first evidence that it is viable to track taint accurately and efficiently for real, interactive applications running on commodity hardware.
David (Yu) Zhu∗
UC Berkeley
yuzhu@cs.berkeley.edu
Jaeyeon Jung
Intel Labs Seattle
jaeyeon.jung@intel.com
Dawn Song
UC Berkeley
dawnsong@cs.berkeley.edu
Tadayoshi Kohno
University of Washington
yoshi@cs.washington.edu
David Wetherall
University of Washington & Intel Labs Seattle
djw@cs.washington.edu
Categories and Subject Descriptors
D.4.6 [Operating Systems]: Security and Protection—Information Flow Controls
General Terms
Security, Privacy, Performance, Design
Keywords
Sensitive data protection, dynamic information flow tracking
1. INTRODUCTION
Media and research papers regularly report privacy vulnerabilities
in which sensitive information is leaked to the public
domain. Some of these incidents are due to malware that
maliciously exfiltrates data, but many are not. A confidential
document of the House ethics committee stored on a staffer’s
machine accidentally found its way out to peer-to-peer networks [26].
British companies banned the use of the Google
Desktop application on employees’ machines due to the security
risk to corporate data when the "search across computers"
feature is enabled [13]. Tom-Skype tracks personal chat
messages [12]. An innocuous text editor may unintentionally
leak information via temporary copies [6].
∗This work was mostly done when the first author was at
Intel Labs Seattle.
These examples highlight the fact that legitimate commer-
cial off-the-shelf applications may expose user information in
ways that their users neither expect, nor appreciate. Many
of these leaks are not the result of malicious intent by
the author of the application but rather are a consequence
of misconfiguration of these applications or unexpected side
effects. Unfortunately, it is not feasible for users to check
whether every single configuration option of the applications
they run meets their privacy expectations, company guide-
lines, or any other policies they have for the handling of
sensitive information. Consider Alice, who uses a messenger
client on a company laptop and wants to be sure that her
messages are not recorded in a log that may surface later.
Alice’s messenger client may archive messages locally, which
are then copied to the company’s online backup server. Or
consider Bob, who uses a text editor to view a confidential
document and wants to be sure that no temporary copies
are left around on his laptop that may be recovered later if
the laptop is lost. Bob’s text editor may create temporary
copies of a working document in a directory that is accessible
by any application. Applications typically do not offer
“privacy” options for such concerns, or if they do, enabling
the right option requires onerous searching. Without
accessible solutions, users must simply hope for
the best once they choose to use an application.
Our long-term goal is to develop systems that will help users
enforce when and where applications reveal their sensitive
data without requiring the users to fully understand the
workings or configuration options of the applications. To
be valuable in practice, we have four requirements. First,
we must be able to run on real applications. These may
be large, multi-threaded and make heavy use of operating
system services. Second, we must run in the user’s own
environment without the need for application source code.
Requiring either source code or testing environments would
greatly limit applicability. Third, while we do not target malicious
applications that intentionally avoid our techniques,
we must track sensitive data even when it is transformed before reaching
the output stream. Encryption is one important transformation
that is often used with sensitive data, and there are
many other ways that information is encoded in practice.
Fourth, our system must be fast enough to run networked
and interactive applications. Heavyweight mechanisms can
introduce delays that cause timeouts in client-server pro-
grams (e.g., web browsers) and prevent normal use. These
goals are ambitious, but they define what we believe to be
a highly usable and desirable system.
In this paper, we present TaintEraser, a tool that blocks
unintended data exposure to the network or to the local
file system by applications. TaintEraser is a significant step
towards our long-term goal. It implements dynamic taint
analysis on applications by using dynamic binary translation
with Pin [11]. On this base, we develop a set of techniques
to track where user information goes accurately and with
enough run-time efficiency that it is plausible for end users
to run the tool. TaintEraser supports simple and intuitive
privacy policies; a user specifies sensitive input data (e.g.,
keystrokes or files) to monitor and TaintEraser blocks any
data derived from the input data from escaping to output
channels that are specified as restricted (e.g., file system,
network socket). To do this, TaintEraser monitors applica-
tions’ output to the network and the local file system and
replaces sensitive bytes with randomly chosen bytes.
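The scrubbing step described above can be illustrated with a minimal sketch. It assumes the tracker already knows which byte offsets of an outgoing buffer are tainted; the function name `scrub` and the example payload are hypothetical and are not taken from the TaintEraser implementation.

```python
import secrets


def scrub(buffer: bytes, tainted_offsets: set[int]) -> bytes:
    """Return a copy of buffer in which every tainted offset holds a random byte.

    Untainted bytes pass through unchanged, so the output keeps its length
    and framing while the sensitive content is destroyed.
    """
    out = bytearray(buffer)
    for offset in tainted_offsets:
        if 0 <= offset < len(out):
            out[offset] = secrets.randbelow(256)
    return bytes(out)


# Hypothetical outgoing buffer in which only the password bytes are tainted.
payload = b"user=alice&pw=hunter2"
start = payload.index(b"hunter2")
tainted = set(range(start, start + len(b"hunter2")))
scrubbed = scrub(payload, tainted)
```

Replacing bytes in place, rather than dropping the write, keeps the buffer length intact, which is one plausible reason to prefer scrubbing over blocking the output call outright.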
As researchers pointed out earlier [15, 22, 23], the accuracy
of taint tracking is a key challenge. While the idea is con-
ceptually simple and has been widely applied to other prob-
lems [16,19,31], there are corner cases in which some instruc-
tions (e.g., MOV, AND) or situations (e.g., system call side-
effects) need special taint-propagation logic. As we found
when testing on Windows on a PC, failure to handle these
cases quickly results in taint explosion or loss of taint with
real applications. After finding and overcoming these spe-
cial cases, we have been able to interpose on system calls
and precisely track information between keystroke, file and
network socket input and output.
Our main contribution is developing new approaches for
taint tracking that are simultaneously accurate and efficient,
and that are broadly applicable in the context of every-
day applications. Our contribution manifests in the Taint-
Eraser tool that we built for empowering users to prevent
unexpected personal information leakage while running off-
the-shelf software packages. Specifically, TaintEraser em-
bodies the following mechanisms for accurate and efficient
personal information tracking:
Semantic-aware Taint Propagation Rules. At the in-
struction level, specialized taint routines are prescribed for
uncommon data movements (e.g., the REP MOV string copy
instruction). At the function level, pre-generated models
propagate taint to capture important side-effects for calls
into the kernel. Our evaluation results show that Taint-
Eraser is highly accurate, generating no false positives when
analyzing real world applications running on Windows. It
successfully detected exfiltration of sensitive data even when
some of these applications transformed or encrypted the
data before sending them out or writing them to a file.
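To make the idea of instruction-level rules with special cases concrete, the following is a simplified sketch. The rule set is an assumption chosen for illustration (a copy rule for MOV, a union rule for generic two-operand instructions, and taint-clearing rules for `XOR r, r` and `AND r, 0`); it is not the rule table used by TaintEraser, and the names `TaintState` and `propagate` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TaintState:
    """Tracks which locations (register or memory names) are tainted."""
    tainted: set[str] = field(default_factory=set)

    def is_tainted(self, loc: str) -> bool:
        return loc in self.tainted

    def set_taint(self, loc: str, value: bool) -> None:
        if value:
            self.tainted.add(loc)
        else:
            self.tainted.discard(loc)


def propagate(state: TaintState, op: str, dst: str, src) -> None:
    """Apply one taint rule; src is a location name or an integer immediate."""
    src_taint = isinstance(src, str) and state.is_tainted(src)
    if op == "MOV":
        # A copy overwrites the destination, so it takes the source's taint.
        state.set_taint(dst, src_taint)
    elif op == "XOR" and src == dst:
        # XOR r, r always yields zero: the result carries no information.
        state.set_taint(dst, False)
    elif op == "AND" and src == 0:
        # AND with constant zero also yields a constant result.
        state.set_taint(dst, False)
    else:
        # Generic rule: the result depends on both operands.
        state.set_taint(dst, state.is_tainted(dst) or src_taint)


state = TaintState({"eax"})
propagate(state, "MOV", "ebx", "eax")  # ebx becomes tainted
propagate(state, "ADD", "ecx", "ebx")  # ecx becomes tainted
propagate(state, "XOR", "ecx", "ecx")  # ecx is cleared
```

Without the two clearing rules, the generic union rule would keep `ecx` tainted after it has been zeroed, which is the kind of over-tainting that can snowball into taint explosion.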
Multi-level Instrumentation. We find that function sum-
maries speed up taint tracking eight to nine times com-
pared to instruction-level instrumentation, and their impact
is greatly multiplied by using them for frequently called func-
tions and as part of our approach to instrument the appli-
cation but not the operating system. Our on-demand in-
strumentation dramatically reduces the number of instruc-
tions that are analyzed compared to typical load-time in-
strumentation, e.g., to 5% for Internet Explorer. Together,
our techniques provide almost an order of magnitude speed
up for our experiments. Combined with our approach of ap-
plication rather than whole system instrumentation, Taint-
Eraser reaches a level of efficiency that makes taint tracking
plausible for the first time for real interactive applications
on commodity hardware.
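A function summary can be pictured as a handler that applies the net taint effect of a call in one step, instead of instrumenting every instruction the call executes. The sketch below assumes a byte-granular shadow map stored as a set of tainted addresses and a summary for a `memcpy`-like routine; the names `SUMMARIES`, `summary_memcpy`, and `on_call` are hypothetical and do not describe TaintEraser's internal interfaces.

```python
def summary_memcpy(tainted: set[int], dst: int, src: int, n: int) -> None:
    """Copy the taint of n bytes at src onto the n bytes at dst."""
    # Snapshot the source taint first so overlapping ranges behave like a copy.
    src_bits = [(src + i) in tainted for i in range(n)]
    for i, bit in enumerate(src_bits):
        if bit:
            tainted.add(dst + i)
        else:
            tainted.discard(dst + i)


# Table of summarized functions; anything absent falls back to
# instruction-level tracking.
SUMMARIES = {"memcpy": summary_memcpy}


def on_call(name: str, tainted: set[int], *args: int) -> bool:
    """Apply a summary if one exists; return False to request the slow path."""
    handler = SUMMARIES.get(name)
    if handler is None:
        return False
    handler(tainted, *args)
    return True


shadow = {0x1000, 0x1001}
handled = on_call("memcpy", shadow, 0x2000, 0x1000, 4)
```

The summary touches the shadow map once per byte copied, regardless of how many instructions the real routine executes, which is where the saving over instruction-level instrumentation would come from in this model.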
The rest of the paper is organized as follows: §2 describes
our approach. §3 describes the techniques we developed for
accurate taint tracking of real applications. §4 presents our
optimization techniques and the performance microbenchmarks.
§5 shows application evaluation results. §6 reviews
related work. We discuss remaining challenges in §7 and
conclude in §8.
2. APPROACH
The privacy policy we want to enable is simple and intuitive.
A user specifies sensitive input data to an application and
the output channels to which the application is restricted
from exposing the sensitive data. Alternatively, a user may
specify the output channels through which the sensitive data
should be allowed to leave by the application. In either case,
TaintEraser monitors the application as it runs and enforces
the policy by (a) tracking how the sensitive input data is
processed by the application and (b) interposing when the
application attempts to write the sensitive data to restricted
output channels.
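The two policy styles described above (listing restricted channels, or listing the only channels allowed) can be sketched as a small data structure. This is an illustrative model only; the `Policy` and `Mode` names and the channel labels are assumptions, not TaintEraser's configuration format.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    RESTRICT = "restrict"  # listed channels must not receive sensitive data
    ALLOW = "allow"        # only listed channels may receive sensitive data


@dataclass(frozen=True)
class Policy:
    mode: Mode
    channels: frozenset[str]

    def must_scrub(self, channel: str, data_is_tainted: bool) -> bool:
        """Decide whether a write to channel needs its tainted bytes scrubbed."""
        if not data_is_tainted:
            return False  # untainted output is never interfered with
        listed = channel in self.channels
        return listed if self.mode is Mode.RESTRICT else not listed


# Hypothetical policies: block the network, or permit only the file system.
deny_network = Policy(Mode.RESTRICT, frozenset({"network"}))
only_files = Policy(Mode.ALLOW, frozenset({"file"}))
```

In both modes the decision depends on the taint of the data being written, not on the channel alone, which is what separates this from an all-or-nothing firewall rule.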
TaintEraser differs from existing tools that delete traces left
by web browsers (e.g., cookies, browser cache files) such as
Privacy Eraser [18] or limit network access (e.g., two-way
firewalls such as Little Snitch [17]). These other tools pro-
vide limited all-or-nothing protection, e.g., either block or
allow network access, or delete all or none of the files in a
temp directory. They are only useful if the user knows the
exact content that needs to be blocked. TaintEraser is able
to, for example, block network access when it derives from
sensitive data and allow it otherwise.
Moreover, simply inspecting output content for leaks quickly
fails as applications may transform input data however they
want. Previous works [4,31] have successfully used dynamic
taint analysis to track how sensitive input data is accessed
and propagated by using whole system simulation. How-
ever, instrumenting the whole system incurs significant per-
formance and analysis overheads, making this work valu-
able for offline forensic analysis but unsuitable for inspecting
interactive network applications, let alone providing online
protection.
Hence, we apply application-level dynamic taint analysis for
efficiently tracking sensitive data through an off-the-shelf
application. However, application-level taint analysis loses
track of information flow when the application moves tainted