xCalls: Safe I/O in Memory Transactions
Haris Volos, Andres Jaan Tack, Neelam Goyal∗, Michael M. Swift, and Adam Welc+
University of Wisconsin–Madison, ∗Oracle, +Intel
{hvolos,tack,neelam,swift}@cs.wisc.edu, adam.welc@intel.com
Abstract
Memory transactions, similar to database transactions, allow
a programmer to focus on the logic of their program and let
the system ensure that transactions are atomic and isolated.
Thus, programs using transactions do not suffer from dead-
lock. However, when a transaction performs I/O or accesses
kernel resources, the atomicity and isolation guarantees from
the TM system do not apply to the kernel.
The xCall interface is a new API that provides transac-
tional semantics for system calls. With a combination of de-
ferral and compensation, xCalls enable transactional mem-
ory programs to use common OS functionality within trans-
actions.
We implement xCalls for the Intel Software Transactional
Memory compiler, and find it straightforward to convert
programs to use transactions and xCalls. In tests on a 16-core
NUMA machine, we show that xCalls enable concurrent I/O
and system calls within transactions. Despite the overhead
of implementing transactions in software, transactions with
xCalls improved the performance of two applications with
poor locking behavior by 16 and 70%.
Categories and Subject Descriptors D.4.1 [Operating Sys-
tems]: Process Management-Concurrency
General Terms Design, Languages, Performance
Keywords Concurrent programming, Transactional mem-
ory, xCalls, System calls, I/O
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
EuroSys’09, April 1–3, 2009, Nuremberg, Germany.
Copyright © 2009 ACM 978-1-60558-482-9/09/04...$5.00
1. Introduction
As the microprocessor industry transitions to multithreaded
and multicore chips, programs must use multiple threads to
obtain the full performance of the underlying platform [Sutter
2005]. Transactional memory (TM) [Herlihy 1992] has
garnered interest in research and industry as a mechanism
to simplify concurrent programming. Transactions allow a
programmer to declare a block of code atomic, and the TM
system ensures that (1) it executes to completion or not at
all, and (2) intermediate states of memory are not visible to
other transactions. Programmers are given the illusion that
transactions execute in a serial order, while the TM system
executes them concurrently. As a result, a transaction may
abort when it accesses the same data as another transac-
tion, because the two transactions cannot be serialized. This
prevents deadlock, which after decades of research remains
a problem for many applications [Jula 2008, Wang 2008].
Most major processor and OS vendors have expressed in-
terest in transactional memory [Microsoft Corp. 2008, Saha
2006b, Schlaeger 2008, Tremblay 2008].
While memory within a process is under the TM system’s
control, memory in the kernel may not be. As a result, the
atomicity and isolation properties are not automatically
enforced for changes to kernel data structures. For
example, file data changed by one transaction that subse-
quently aborts may be read by another transaction. In ad-
dition, most I/O operations, such as sending a packet, cannot
be reversed on abort. Analyses of multithreaded programs
written with locks show that system calls are a regular occur-
rence in critical sections [Baugh 2007, Swift 2008]. Forbid-
ding system calls in transactions reduces the utility of TM
and threatens its validity as a solution to real concurrency
problems [Cantrill 2008, Lu 2006].
Prior work on transactions has identified three mecha-
nisms for handling irreversible actions and system calls with
side-effects: (1) defer existing system calls and I/O until
commit [Baugh 2007, McDonald 2006, Rossbach 2007]; (2)
execute existing system calls during the transaction and re-
verse side effects on abort [Baugh 2007, Moravan 2006], or
(3) ensure that transactions with system calls always commit
(called irrevocable transactions) [Blundell 2007, Olszewski
2007, Spear 2008, Welc 2008]. However, each approach is
insufficient on its own. When two operations are deferred, the OS
may not be able to guarantee that both will succeed, leading
to an inconsistent state. When system calls must be reversed
on abort, actions to reverse side effects may fail. To guarantee
they will commit, irrevocable transactions cannot execute
concurrently, limiting performance.
This paper presents a new programming interface for
transactional memory programs called xCalls. The xCall in-
terface provides transactional access to common OS ser-
vices, such as file handling, communication, and threading.
For example, rather than calling the write() system call,
code in a transaction calls x_write(). Data written by this
call is not visible until the transaction commits.
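One way to keep written data invisible until commit is to defer the write. The following is a minimal sketch of that idea, not the xCall implementation: the names tx, tx_defer_write, tx_commit, tx_abort, and demo_deferred_write are hypothetical, and the text here does not state whether x_write() itself defers for any particular kind of resource. The sketch copies the data into a per-transaction log, issues the real write() calls at commit, and discards the log on abort.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not the xCall library's API. */
enum { MAX_DEFERRED = 16 };

struct deferred_write {
    int fd;
    char *data;   /* private copy of the caller's buffer */
    size_t len;
};

struct tx {
    struct deferred_write log[MAX_DEFERRED];
    size_t count;
};

/* Record the write instead of issuing it; nothing reaches the kernel yet. */
int tx_defer_write(struct tx *t, int fd, const void *buf, size_t len)
{
    assert(t != NULL && buf != NULL);
    if (t->count == MAX_DEFERRED)
        return -1;
    char *copy = malloc(len ? len : 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, buf, len);
    t->log[t->count++] = (struct deferred_write){ fd, copy, len };
    return 0;
}

/* Commit: replay the log in program order. A failed or short write is
 * reported to the caller as -1 and stops further replay. */
int tx_commit(struct tx *t)
{
    int status = 0;
    for (size_t i = 0; i < t->count; i++) {
        struct deferred_write *w = &t->log[i];
        if (status == 0) {
            ssize_t n = write(w->fd, w->data, w->len);
            if (n < 0 || (size_t)n != w->len)
                status = -1;
        }
        free(w->data);
    }
    t->count = 0;
    return status;
}

/* Abort: drop the log; the kernel never saw the data. */
void tx_abort(struct tx *t)
{
    for (size_t i = 0; i < t->count; i++)
        free(t->log[i].data);
    t->count = 0;
}

/* Returns 1 if data is invisible before commit, visible after commit,
 * and never visible after abort. */
int demo_deferred_write(void)
{
    int p[2];
    char buf[8];
    struct tx t = { .count = 0 };
    if (pipe(p) != 0)
        return -1;
    fcntl(p[0], F_SETFL, O_NONBLOCK);   /* an empty pipe must not block */

    tx_defer_write(&t, p[1], "hi", 2);
    int before = (read(p[0], buf, sizeof buf) < 0);
    int rc = tx_commit(&t);
    int after = (read(p[0], buf, sizeof buf) == 2);

    tx_defer_write(&t, p[1], "no", 2);
    tx_abort(&t);
    int aborted = (read(p[0], buf, sizeof buf) < 0);

    close(p[0]);
    close(p[1]);
    return before && rc == 0 && after && aborted;
}
```

Calling demo_deferred_write() exercises all three cases on a pipe. Note that tx_commit can fail after the transaction has logically committed, which is why such failures have to be reported to the application rather than hidden.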
This interface is guided by two design principles. First,
system calls should be executed as early as possible, but
no earlier. This ensures that errors from the OS are avail-
able early, to allow application recovery, but that irreversible
actions are deferred until the transaction commits. Second,
xCalls must expose all failures to the application, as do sys-
tem calls. This bypasses the intractable problem of handling
all low-level failures within the xCall API and ensures that
transactional programs can be as reliable as lock-based ones.
We implement xCalls purely in user mode to provide
portability across systems and to avoid costly
kernel modifications. We find that the majority of system
calls can be accessed within transactions without support
from the operating system. Furthermore, our implementation
makes only general demands on the supporting transactional
memory system. While implemented for a single software
TM system, it could easily be implemented or ported to other
software TMs as well as proposed hardware TM systems.
Rather than making every system call transactional, the
xCall API handles the common cases of file access, com-
munication, and threading. xCalls provide isolation for ker-
nel resources with sentinels, which are revocable user-level
locks. A transaction acquires a sentinel when it accesses a
kernel resource, such as a file, through an xCall. Competing
threads must block until the transaction completes and re-
leases the sentinel. xCalls provide atomicity for system calls
through a combination of deferral, delaying execution un-
til the transaction commits, and compensation, calling back
into the kernel to undo the side effects of a previous call.
Rather than concealing the execution model, the xCall inter-
face specifies when every call executes, so programmers are
aware when the side effects of an xCall become visible. Fi-
nally, xCalls return errors after the transaction completes to
notify programs when a deferred system call or compensa-
tion fails.
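The combination of sentinels and compensation described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: every name (sentinel, tx, tx_acquire, tx_register_undo, tx_commit, tx_abort, demo_create) is hypothetical, a plain pthread mutex stands in for a revocable lock so revocation is not modelled, and a transaction holds at most one sentinel. The system call runs immediately, an undo action is logged, and abort runs the undo actions newest-first and reports any compensation failure to the caller.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* All names here are hypothetical; this is not the xCall library's API. */
enum { MAX_UNDO = 8 };
typedef int (*undo_fn)(void *arg);

/* A sentinel guards one kernel resource. A plain mutex stands in for the
 * revocable lock described in the text; revocation is not modelled. */
struct sentinel { pthread_mutex_t lock; };

struct tx {
    struct sentinel *held;     /* at most one sentinel, for brevity */
    undo_fn undo[MAX_UNDO];    /* compensation actions, oldest first */
    void *undo_arg[MAX_UNDO];
    size_t n_undo;
};

/* Block until the resource is free, then hold it until commit or abort. */
void tx_acquire(struct tx *t, struct sentinel *s)
{
    assert(t->held == NULL || t->held == s);
    if (t->held == NULL) {
        pthread_mutex_lock(&s->lock);
        t->held = s;
    }
}

int tx_register_undo(struct tx *t, undo_fn fn, void *arg)
{
    if (t->n_undo == MAX_UNDO)
        return -1;
    t->undo[t->n_undo] = fn;
    t->undo_arg[t->n_undo] = arg;
    t->n_undo++;
    return 0;
}

static void tx_release(struct tx *t)
{
    if (t->held != NULL) {
        pthread_mutex_unlock(&t->held->lock);
        t->held = NULL;
    }
    t->n_undo = 0;
}

/* Commit: side effects stay, so the compensation actions are dropped. */
void tx_commit(struct tx *t) { tx_release(t); }

/* Abort: run compensation newest-first; -1 means some undo step failed. */
int tx_abort(struct tx *t)
{
    int status = 0;
    while (t->n_undo > 0) {
        t->n_undo--;
        if (t->undo[t->n_undo](t->undo_arg[t->n_undo]) != 0)
            status = -1;
    }
    tx_release(t);
    return status;
}

static int undo_create(void *path) { return unlink((const char *)path); }

/* Create a file inside a "transaction". Returns 1 if the file exists once
 * the transaction has finished, 0 if not, -1 if the create itself failed. */
int demo_create(int commit)
{
    static struct sentinel dir_sentinel = { PTHREAD_MUTEX_INITIALIZER };
    struct tx t = { 0 };
    char path[64];
    snprintf(path, sizeof path, "/tmp/xcall_demo_%ld", (long)getpid());

    tx_acquire(&t, &dir_sentinel);
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);  /* runs now */
    if (fd < 0) {
        tx_abort(&t);
        return -1;
    }
    close(fd);
    tx_register_undo(&t, undo_create, path);

    if (commit)
        tx_commit(&t);
    else
        tx_abort(&t);

    int exists = (access(path, F_OK) == 0);
    unlink(path);   /* clean up after the demo */
    return exists;
}
```

With this sketch, demo_create(1) leaves the file in place at commit, while demo_create(0) removes it through the logged unlink(). Because unlink() can itself fail, tx_abort returns a status instead of assuming compensation always succeeds.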
We implement xCalls for the prototype Intel Software
Transactional Memory (STM) compiler [Intel 2008] and apply it
to three applications: the Berkeley DB embedded database,
the BIND DNS server, and the XMMS media player. We find
that using transactions in place of locks is straightforward,
and that adapting existing code to use xCalls is simple.
In line with recent analysis of STM systems [Cascaval
2008], we find that software TM has non-trivial performance
overheads: the Intel STM can slow critical sections by up
to 1100%. Thus, we find that TM is best suited to improve
(1) the programmability of rarely executed critical sections,
where the overhead is small, and (2) the performance of
heavily contended critical sections where additional concur-
rency is possible. Programs with high transaction rates and
conflicting critical sections experience performance degra-
dation.
In tests on a 16-core NUMA machine, transactions with
xCalls performed better than the native transactions
provided by the Intel STM. For one workload,
performance decreased due to the overhead of the transactional
memory system. For another two workloads with heavy lock
contention, performance increased by 16 and 70%. With
hardware support to remove the overhead of transactions,
performance could be even better.
In the next section, we present a primer on transactional
memory. We follow with the design of xCalls in Section 3
and the interface in Section 4. We present an experimental
evaluation of the system in Section 5. We end the paper with
related work and conclusions.
2. Transactional Memory Overview
Transactional memory (TM) seeks to simplify multithreaded
programming by removing the need for explicit locks. In-
stead, a programmer can declare a section of code atomic,
and the TM system will enforce isolation (i.e., no access
to uncommitted data) and atomicity (i.e., all or nothing) for
the code, and resolve any conflicts that occur. Conflicts arise
when two concurrent transactions access the same memory
items and one transaction performs a write. Transactions can
execute concurrently if they do not conflict. Thus, they can
improve performance if critical sections rarely conflict. If a
conflict occurs, a resolution policy may stall or abort one
of the transactions to clear up the conflict. TM systems en-
force isolation by detecting when two transactions conflict,
and provide atomicity by buffering either old or new values
to allow the transaction to abort.
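The conflict rule in this paragraph can be written down directly. The sketch below is a simplified model, not how any particular TM system tracks accesses: it assumes memory is divided into 64 items, represents each transaction's read and write sets as bitmasks, and uses the hypothetical names access_set, make_set, record_access, and conflicts. Two transactions conflict when they touch a common item and at least one of them writes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit i set means memory item i was accessed. */
struct access_set {
    uint64_t reads;
    uint64_t writes;
};

struct access_set make_set(uint64_t reads, uint64_t writes)
{
    struct access_set s = { reads, writes };
    return s;
}

/* Called on every transactional load or store of item 0..63. */
void record_access(struct access_set *s, unsigned item, bool is_write)
{
    assert(item < 64);
    uint64_t bit = UINT64_C(1) << item;
    if (is_write)
        s->writes |= bit;
    else
        s->reads |= bit;
}

/* True when a and b share an item that at least one of them writes. */
bool conflicts(struct access_set a, struct access_set b)
{
    uint64_t a_all = a.reads | a.writes;
    uint64_t b_all = b.reads | b.writes;
    return ((a.writes & b_all) | (b.writes & a_all)) != 0;
}
```

Under this model, two transactions that only read item 0 do not conflict, a reader and a writer of item 0 do, and writers of different items do not, so the last pair could run concurrently.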
Transactional memory has been implemented in software
(an STM) [Dice 2006, Harris 2003, Saha 2006a] and can
be implemented in hardware (an HTM) [Hammond 2004,
Moore 2006], or with a combination [Baugh 2008, Damron
2006, Minh 2007]. Because a software TM system must
perform version management to store both old and new
values written and conflict detection on loads and stores,
performance may drop by 65% or more [Harris 2006, Saha
2006a]. Proposed hardware transactional memory systems
would perform these operations in hardware, so transactions
that do not conflict are executed with almost no overhead.
However, such hardware is not yet available.
Some TM systems implement irrevocable transactions
(also called inevitable transactions) that cannot abort. These
allow system calls to execute within transactions by remov-
ing the need to reverse the system call’s effects [Blundell
2007, Olszewski 2007, Spear 2008, Welc 2008]. However,
this approach allows only a single irrevocable transaction
at a time to prevent conflicts, and thus limits concurrency.
In addition, irrevocable transactions may not abort them-
selves, which prevents the use of transactions for error
handling [Fetzer 2007] or for conditional blocking [Harris
1991].