A Container-based Fast Bridge for Virtual Routers on Commodity Hardware
- ISSN: 1930529X
- ISBN: 9781424456383
- DOI: 10.1109/GLOCOM.2010.5684322
Abstract
Virtual routers on commodity hardware are an attractive solution for service providers that look for extensibility, flexibility, reuse and low deployment cost. However, these routers still suffer from performance limitations due to the virtualization overhead and the commodity hardware architecture itself. In this paper, we first evaluate the baseline forwarding performance of virtual routers based on a Xen environment. Then, we show that memory latency is the bottleneck. Hence, we propose a Fast Bridge that demultiplexes incoming packets and then transfers them to their destination guest machines. It builds packet containers in the driver domain, based on the packets' destination and delay constraints, and then transfers each container as a unit to the guest. This allows more packets to be transferred with faster memory access, and results in much better throughput with an acceptable guaranteed delay.
Manel Bourguiba, Kamel Haddadou and Guy Pujolle
LIP6, Pierre & Marie Curie University
104 Avenue du President Kennedy
75016 Paris, France
Email: {manel.bourguiba, kamel.haddadou, guy.pujolle}@lip6.fr
I. INTRODUCTION
Since ISPs are competing to offer new services, there has
been a need for more flexible, extensible and cost-effective
equipment. High-end routers are built on specialized, closed
hardware and are hence difficult to extend and program.
Software routers, on the other hand, perform packet processing
on general-purpose processors; they offer programmability,
extensibility and reuse, and are therefore well suited to
low-cost deployment. However, these benefits come at the
cost of performance: software routers suffer from low
forwarding rates, inherent to the commodity hardware
architecture. Fortunately, recent advances in multicore
processors and smart network interfaces keep software routers
on commodity hardware viable [9].
Furthermore, many research efforts are exploring how multiple
independent networks can share the same substrate through
network virtualization [10] [11]. This would result in a more
efficient use of physical resources and an easier deployment of
old and new experimental protocols on the same architecture.
Network virtualization can thus also be promoted as an
enabling technology for a smooth migration from the current to
a future Internet architecture. Besides, virtualization offers
increased flexibility since it allows on-the-fly network
instantiation and on-demand topology modification. However,
virtualizing a network requires virtualizing its components
(links, routers, etc.). In our work, we focus on router
virtualization. The capabilities of software routers, combined
with recent advances in virtualization technology, enable
multiple instances of software routers to run concurrently on
the same physical machine. Several virtualization technologies
allow the partitioning of physical router resources at different
levels to enable multiple isolated virtual routers to coexist on
the same physical routing platform.
The performance limitations experienced by software
routers on general-purpose processors are even more severe
once the virtualization overhead is added. Nevertheless, the
benefits (isolation, flexibility, extensibility, reuse, etc.)
offered by virtualization technologies on top of commodity
hardware make building high-speed virtual routers on commodity
hardware worth exploring, although very challenging.
In this paper, we focus on forwarding performance. Specifically,
we evaluate the forwarding performance of virtual routers on
general-purpose multi-core processors when the virtual routers
share the networking device through a dedicated driver domain.
We argue that packet forwarding should be performed in the
different guest machines. We investigate the bottleneck and show
that memory latency drastically limits the forwarding rates of
the guests. Then, we propose a new bridging mechanism to transfer
packets from the driver domain to the guests. The new bridge
transfers containers of packets that are constructed based on
their destination. This reduces memory latency and hence yields
much better throughput. We also study how the proposed bridge
affects the delay experienced by flows.
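The container idea outlined above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper: the class names, the `max_packets` size limit, and the `max_wait` delay bound are hypothetical parameters introduced only to show how packets might be grouped per destination guest and handed over as a unit once a container is full or its oldest packet has waited too long.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Packets bound for one guest, plus the arrival time of the oldest one."""
    first_arrival: float
    packets: list = field(default_factory=list)


class ContainerBridge:
    """Hypothetical per-destination batching (illustrative, not the paper's code)."""

    def __init__(self, max_packets, max_wait):
        self.max_packets = max_packets  # assumed size limit per container
        self.max_wait = max_wait        # assumed delay bound, in seconds
        self.open = {}                  # guest id -> container being filled
        self.delivered = []             # (guest id, packets) handed to guests

    def receive(self, guest, packet, now):
        """Append a packet to its guest's container; flush when it is full."""
        container = self.open.get(guest)
        if container is None:
            container = self.open[guest] = Container(first_arrival=now)
        container.packets.append(packet)
        if len(container.packets) >= self.max_packets:
            self._flush(guest)

    def tick(self, now):
        """Flush every container whose oldest packet has waited max_wait or more."""
        expired = [g for g, c in self.open.items()
                   if now - c.first_arrival >= self.max_wait]
        for guest in expired:
            self._flush(guest)

    def _flush(self, guest):
        """Hand the whole container to the guest in a single transfer."""
        container = self.open.pop(guest)
        self.delivered.append((guest, container.packets))
```

In this sketch, a larger `max_packets` amortizes the per-transfer cost over more packets, while `max_wait` caps the extra queueing delay a packet can incur while its container fills.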
The remainder of this paper is organized as follows. In
Section II, we describe our platform and state our problem.
In Section III, we introduce the related work. In Section IV, we
evaluate the baseline forwarding performance of our system and
analyze the limiting factor. In Section V, we propose a
container-based fast bridge that reduces memory accesses and
thereby increases forwarding rates. Section VI concludes the
paper and introduces our future work.
II. VIRTUAL ROUTERS ON GENERAL-PURPOSE
MULTI-CORE PROCESSORS
The increasing capabilities of commodity hardware, combined
with the levels of isolation and flexibility offered by new
virtualization technologies, make virtual routers on commodity
hardware very attractive. However, trade-offs between
performance, isolation and flexibility have to be made
depending on the adopted level of virtualization.
978-1-4244-5637-6/10/$26.00 ©2010 IEEE
This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE Globecom 2010 proceedings.
A. Platform architecture
1) Hardware architecture: The large speed gap between
high-end routers and software routers is inherent to the
differences in their hardware architectures. In high-end routers,
packets are processed directly by special-purpose processors in
the line card. Each processor stores packets in its own memory
and forwards them through a dedicated high-speed switch
to the outgoing interfaces. On commodity servers, in contrast,
Peripheral Component Interconnect Express (PCIe) buses are
used to transfer packets from the Network Interface Card (NIC)
to main memory. Packets are then accessed by multiple cores
sharing this same memory, which is reached through the Front
Side Bus (FSB). Fast L2 caches hold the packets to be processed
close to the cores in order to avoid costly memory accesses. The
system we are using is a Dell PowerEdge 2950 server (Figure 1)
with two Intel quad-core CPUs. Pairs of cores share the same
L2 cache, and all 8 cores share the same DDR2 667 MHz main
memory.
2) Virtualization technology: A Virtual Machine Monitor
(VMM) enables multiple virtual machines to share the same
physical machine. Modern VMMs for commodity hardware
such as VMware [2] and Xen [1] virtualize processors, memory
and I/O devices in software. The VMM provides shared
access to the network interface and ensures isolation. Shared
access to devices is offered by a special virtual machine called
the I/O domain or driver domain (Figure 2). The driver domain
hosts the real device drivers and is the only domain that
communicates with the devices. In addition to real device
drivers, virtual drivers are implemented in both the driver
domain and the guests. A virtual driver is split into two parts: a
netback (in the driver domain) and a netfront (in the guest). Xen
[1] is a popular open source VMM for the x86 architecture that
uses this networking model. Guests in a Xen environment are
referred to as Unprivileged Domains (DomU). One special
privileged domain called Domain0 (Dom0) is responsible for
managing (creating, migrating, etc.) the guest machines.
The driver domain can be either a DomU or the Dom0 itself.
The Xen I/O architecture is representative of other systems that
use a dedicated machine to host the device drivers. Xen
offers a high level of isolation through its secure memory
sharing mechanism [12]. The driver domain is responsible
for protecting I/O access and is trusted to transfer traffic to
the appropriate virtual machine. Moreover, high flexibility
is offered since it is possible to customize data planes by
modifying the network stack in the kernel. This is not possible
with application-level virtualization, where only the application
level is virtualized and all virtual instances share the same
kernel. For our experiments, we used Xen as the virtualization
layer and Dom0 as the driver domain.
Fig. 2. Packets forwarding through the driver domain
B. Forwarding path
Virtual routers on commodity hardware should provide
speed, flexibility and isolation, and allow implementing both
a custom control plane and a custom data plane. Evaluating the
overhead of virtualizing the control plane is beyond the scope of
this paper. In this work, we focus on evaluating the virtualization
of the data plane. Forwarding packets in the corresponding
guests follows naturally from the driver domain networking
model we build on. In this case, all the virtual routers share
the same NIC. When a packet arrives, it is transferred
to the memory page of the device driver, which sends it
to the bridge. The bridge demultiplexes the packet based
on its destination address and delivers it to the appropriate
netback. The netback transfers the packet to the corresponding
netfront over an I/O channel. A great level of transparency is
hence reached since the guests do not have to implement the
potentially buggy device drivers. Besides, since all the traffic
goes through the driver domain, the latter gains traffic
monitoring abilities such as admission control or establishing
priorities between flows based on their types. Isolation
is also achieved since different flows are forwarded by different
virtual routers with strongly protected memory spaces.
However, the performance of this I/O networking model is
limited by the overhead incurred by the communication
between the driver domain and the guests. We will further
analyze this limitation in the next section.
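The per-packet path described above (bridge, then netback, then netfront) can be sketched in a few lines of Python. This is a toy model rather than Xen code: the class and method names are hypothetical, packets are plain dictionaries with an assumed `dst` field, and dropping packets with an unknown destination is a simplifying assumption of the sketch.

```python
class Netfront:
    """Guest-side half of the split virtual driver (toy model)."""

    def __init__(self):
        self.received = []

    def deliver(self, packet):
        self.received.append(packet)


class Netback:
    """Driver-domain half of the split virtual driver (toy model)."""

    def __init__(self, netfront):
        self.netfront = netfront

    def transfer(self, packet):
        # Stands in for the I/O channel between driver domain and guest.
        self.netfront.deliver(packet)


class Bridge:
    """Demultiplexes each packet to a netback by destination address."""

    def __init__(self):
        self.table = {}  # destination address -> Netback

    def attach(self, address, netback):
        self.table[address] = netback

    def forward(self, packet):
        netback = self.table.get(packet["dst"])
        if netback is None:
            return False  # assumption: unknown destinations are dropped
        netback.transfer(packet)  # one transfer per packet
        return True
```

Each call to `forward` triggers exactly one `transfer`, so the cost of crossing from the driver domain to a guest is paid once per packet in this model.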
To overcome this limitation, two other forwarding
scenarios are explored in [6]. The first one consolidates
all the guests' forwarding planes in the driver domain.
This avoids the costly communication between the driver
domain and the guests but lacks isolation and live migration.
In the second one, NICs are directly mapped to the guests
and packets are forwarded directly in the guests. A high level of
isolation and better performance are then reached. However,
this scenario loses all the advantages of the driver domain model.


