Real-time voice communication over the internet using packet path diversity
Proceedings of the ninth ACM international conference on Multimedia, MULTIMEDIA '01 (2001)
- ISBN: 1581133944
- DOI: 10.1145/500203.500205
Available from portal.acm.org
Real-time Voice Communication over the Internet Using
Packet Path Diversity
Yi J. Liang, Eckehard G. Steinbach, and Bernd Girod
Information Systems Laboratory, Department of Electrical Engineering
Stanford University, Stanford, CA 94305, USA
{yiliang, steinb, bgirod}@stanford.edu
ABSTRACT
The quality of real-time voice communication over best-
effort networks is mainly determined by the delay and loss
characteristics observed along the network path. Excessive
playout buffering at the receiver is prohibitive and signifi-
cantly delayed packets have to be discarded and considered
as late loss. We propose to improve the tradeoff among
delay, late loss rate, and speech quality using multi-stream
transmission of real-time voice over the Internet, where mul-
tiple redundant descriptions of the voice stream are sent over
independent network paths. Scheduling the playout of the
received voice packets is based on a novel multi-stream adap-
tive playout scheduling technique that uses a Lagrangian
cost function to trade delay versus loss. Experiments over
the Internet suggest largely uncorrelated packet erasure and
delay jitter characteristics for different network paths, which
lead to a noticeable path diversity gain. We observe signif-
icant reductions in mean end-to-end latency and loss rates
as well as improved speech quality when compared to FEC-
protected single-path transmission at the same data rate. In
addition to our Internet measurements, we analyze the per-
formance of the proposed multi-path voice communication
scheme using the ns network simulator for different network
topologies, including shared network links.
Keywords
Packet path diversity, multi-stream transmission, multi-path
transmission, adaptive playout scheduling, multiple descrip-
tion coding, forward error correction, voice over IP.
1. INTRODUCTION
High-quality real-time voice communication over the Inter-
net requires low end-to-end delay and low loss rate. Best-
effort networks such as today’s Internet, however, are char-
acterized by highly varying delay and loss characteristics
that contradict our Quality-of-Service (QoS) requirements.
One widely accepted way to reduce the effective packet loss
observed by the receiver is to add redundancy to the voice
stream at the sender. This is possible without imposing too
much extra network load since the data rate of voice traffic
is very low when compared with other types of data and
multimedia traffic.
A common method to add redundancy is forward error cor-
rection (FEC), which transmits redundant information of
each packet in subsequent packets [6][5][16]. In this sender-
based scheme, a lost packet can be recovered from the copies
piggybacked in subsequent packets should they be received
successfully. In this scheme, loss recovery is performed at
the cost of higher latency [5]. In many cases, however, the
loss of successive packets is correlated, due to the way pack-
ets are dropped as networks get congested and router buffers
fill up. A packet loss is often followed by
a burst of losses, which significantly decreases the efficiency of
FEC schemes [4]. In order to combat burst loss, redundant
information has to be added into temporally distant pack-
ets, which introduces even higher delay. Hence, the repair
capability of FEC is limited by the delay budget.
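As a rough illustration of the piggybacking idea described above, the following Python sketch marks which packets remain playable when each packet also carries a copy of the packet sent `offset` positions earlier. The function name, the boolean loss trace, and the `offset` parameter are assumptions made for this example; they are not taken from the cited FEC schemes.

```python
def recover_with_piggyback_fec(received, offset=1):
    """Return, per packet, whether it can be played out.

    received[i] is True if packet i arrived. Packet i + offset is assumed
    to carry a redundant copy of packet i (hypothetical layout).
    """
    n = len(received)
    playable = []
    for i in range(n):
        # The copy of packet i travels inside packet i + offset, if it exists.
        copy_arrived = i + offset < n and received[i + offset]
        playable.append(received[i] or copy_arrived)
    return playable


isolated_loss = [True, False, True, True, True, True]
burst_loss = [True, False, False, True, True, True]

print(recover_with_piggyback_fec(isolated_loss, offset=1))  # every packet playable
print(recover_with_piggyback_fec(burst_loss, offset=1))     # packet 1 stays lost
print(recover_with_piggyback_fec(burst_loss, offset=2))     # burst fully repaired
```

In this toy trace an offset of 1 repairs the isolated loss but not the burst, while an offset of 2 repairs the burst at the price of waiting two packet intervals for the copy, which is the delay penalty the text refers to.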
Another sender-based loss recovery technique, interleaving,
which does not increase the data rate of transmission, also
faces the same dilemma. The efficiency of loss recovery de-
pends on the number of packets over which the source packet
is interleaved and spread. Again, the wider the spread, the higher
the introduced delay [19].
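The spreading effect of interleaving can be sketched with a simple stride-based block interleaver. This is a generic textbook construction used only for illustration, not the specific scheme of [19]; the function names and the `depth` parameter are assumptions.

```python
def interleave(units, depth):
    """Reorder units so that neighbours in the source end up far apart.

    Emits units 0, depth, 2*depth, ..., then 1, depth+1, ... (hypothetical
    stride interleaver).
    """
    return [units[i] for start in range(depth)
            for i in range(start, len(units), depth)]


def deinterleave(units, depth):
    """Invert interleave(); lost units are represented by None."""
    n = len(units)
    order = [i for start in range(depth) for i in range(start, n, depth)]
    restored = [None] * n
    for position, source_index in enumerate(order):
        restored[source_index] = units[position]
    return restored


source = list(range(12))
sent = interleave(source, depth=4)

# A burst erases three consecutive transmitted units (positions 3, 4, 5).
damaged = [None if 3 <= pos <= 5 else unit for pos, unit in enumerate(sent)]

restored = deinterleave(damaged, depth=4)
missing = [i for i, unit in enumerate(restored) if unit is None]
print(missing)  # [1, 5, 9]
```

The burst of three consecutive losses turns into three isolated gaps in the restored sequence, which are easier to conceal. The receiver, however, has to collect the whole block before it can restore the original order, so a larger block means a longer wait.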
In this work we look at the problem of reliable voice com-
munication over best-effort networks from a different an-
gle. Instead of restricting our transmission to one network
path, we send multiple redundant descriptions of the voice
stream over different independent paths and take advantage
of their largely uncorrelated loss and delay characteristics.
As a result, the probability of a negative disturbance, such
as packet erasure or increasing delay, impacting all channels
at the same time will be small.
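To make this argument concrete, consider a simplified model (an assumption for illustration, not a result from the measurements) in which a packet is lost on path 1 with probability $p_1$ and on path 2 with probability $p_2$. If the two loss events are independent, the probability that both descriptions of the same speech segment are lost is

```latex
\[
  P(\text{both lost}) = p_1 \, p_2 ,
  \qquad \text{e.g. } p_1 = p_2 = 0.05 \;\Rightarrow\; P(\text{both lost}) = 0.0025 .
\]
% With a correlation coefficient \rho between the two loss indicators:
\[
  P(\text{both lost}) = p_1 p_2 + \rho \sqrt{p_1 (1 - p_1)\, p_2 (1 - p_2)} .
\]
```

For $\rho = 0$ the second expression reduces to the first, and the gain from path diversity shrinks as $\rho$ approaches 1. This is why the choice of largely uncorrelated paths matters.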
In previous literature, path diversity has been proposed for
reliable video communication over lossy networks using mul-
tiple state encoding, where odd and even frames of a video
sequence are transmitted on different network paths [2]. It
has been observed in [2] that for multi-path transmission
the end-to-end application sees a virtual average path which
exhibits a smaller variability in quality than any of the in-
dividual paths. Multi-path transmission also alleviates the
problem that the default path determined by the routing
algorithm is not optimum, which might often be the case
according to [17].
In the context of delay-sensitive applications such as interac-
tive VoIP, the novelty and key point of this work lies in the
fact that we explicitly take advantage of the largely uncor-
related characteristics of the delay variation (also known as
jitter) on multiple network paths using an adaptive multi-
stream playout scheduling technique. Packet loss in such
applications is a result of not only packet erasure, but also
delay jitter, which greatly impairs communication quality.
Due to the stringent delay budget and the need to output
speech periodically and continuously, packets experiencing
sudden high delay have to be discarded at the receiving
end if they arrive later than the scheduled playout deadline
(which results in late loss). With multi-stream voice trans-
mission along different network paths we now have more
freedom to trade off delay, late loss, and speech reconstruc-
tion quality. We formulate this tradeoff as a Lagrangian cost
function where we can vary the relative importance of these
quantities.
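The exact cost function is not given in this part of the text. The Python sketch below therefore assumes one plausible form, cost = playout delay + lambda_both * P(both descriptions late) + lambda_one * P(exactly one description late), and picks the playout deadline that minimizes it over paired delay traces. The function names, the weights, and the candidate deadlines are all hypothetical.

```python
def playout_cost(deadline, delays1, delays2, lam_both, lam_one):
    """Assumed Lagrangian-style cost for a given playout deadline (ms).

    delays1[i] and delays2[i] are the delays of the two descriptions of
    packet i; an erased packet can be modelled as float("inf").
    """
    n = len(delays1)
    both_late = 0
    one_late = 0
    for d1, d2 in zip(delays1, delays2):
        late1, late2 = d1 > deadline, d2 > deadline
        if late1 and late2:
            both_late += 1      # nothing to play out for this packet
        elif late1 or late2:
            one_late += 1       # degraded quality from a single description
    return deadline + lam_both * both_late / n + lam_one * one_late / n


def best_deadline(candidates, delays1, delays2, lam_both, lam_one):
    """Return the candidate deadline with the smallest assumed cost."""
    return min(candidates,
               key=lambda d: playout_cost(d, delays1, delays2, lam_both, lam_one))


path1 = [40, 42, 41, 120, 43]   # made-up delays in ms, with one spike
path2 = [60, 61, 59, 62, 150]   # made-up delays in ms, spike at another time
candidates = [45, 65, 125, 155]

print(best_deadline(candidates, path1, path2, lam_both=1000, lam_one=50))  # 65
print(best_deadline(candidates, path1, path2, lam_both=0, lam_one=0))      # 45
```

With the weights set to zero only delay counts, so the shortest deadline wins. With a heavy penalty on losing both descriptions, the deadline moves to 65 ms: the delay spikes on the two paths do not coincide, so every packet still has at least one description on time without waiting for the spikes themselves.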
The multiple streams to be delivered via different paths are
formed by multiple description coding (MDC), which gen-
erates multiple descriptions of the source signal of equal im-
portance. These descriptions can be decoded independently
at the receiver. If all descriptions are received, the source
signal can be reconstructed in full quality. If we receive only
a subset of the descriptions, the quality of the reconstruc-
tion is degraded, but is still better than the quality result-
ing from losing all descriptions. Depending on the MDC
scheme selected, the overall data rate of the payload does
not necessarily increase as a result of transmitting multiple
streams. The data rate only increases if we desire redun-
dancy between the multiple streams. A small increase in
data rate with the use of FEC has been widely accepted for
speech communication and we therefore compare our scheme
of transmitting two MDC streams with a standard scheme
that uses FEC-protected single-stream transmission at the
same payload data rate.
In order to maximize the benefits of path diversity we have
to select paths that exhibit largely uncorrelated jitter and
loss characteristics. Sending streams along different routers
from source to destination naturally leads to path diversity
which could include streams traversing different ISPs or even
streams being sent in different directions around the globe.
With today’s Internet protocols, the path a packet takes
across the Internet is a function of its source and destina-
tion IP addresses as well as the entries of the routing tables
involved. Selecting a specific path for a packet is largely
unsupported in today’s infrastructure. As discussed in [2],
IPv4 source routing is usually turned off within the Inter-
net for security reasons. More promising is to implement
path diversity by means of an overlay network that con-
sists of relay nodes [2],[1]. Here, packets can be sent along
different routes by encapsulating them into IP packets that
have the addresses of different relay nodes as their destina-
tion. At the relay nodes, packets are forwarded to other
relay nodes such that the packets from different description
streams travel along as few common links as possible. In
the context of a peer-to-peer framework, every peer could
serve as a relay node for voice traffic, potentially leading
to many different paths a voice stream could take from its
source to its destination. With the next-generation IP pro-
tocol IPv6, the source node has a great amount of control
over each packet’s route. IPv6’s loose source routing (LSR)
allows packets to be sent via specified intermediate nodes.
This source routing feature of IPv6 will make future imple-
mentation of multi-stream transmission with path diversity
even simpler.
Figure 1: Source encoding: (a) MDC (stream 1: E, O-E; stream 2: O, E-O); (b) single-stream with FEC.
This paper is organized as follows. In Section 2, we present the
multiple description coding scheme used to produce
two redundant voice streams that can be sent across
two different paths. In Section 3, we introduce our receiver
playout scheduling algorithm for multiple streams. Section 4
presents multi-path measurements performed in the Inter-
net. Using the measured traces we compare single-path
and multi-path transmissions and show that considerable
improvements can be obtained for voice transmission with
packet path diversity. In Section 5, we analyze the per-
formance of the proposed multi-path voice communication
scheme more systematically using a network simulator for
different network topologies and varying network load.
2. MULTIPLE DESCRIPTION CODING OF VOICE STREAMS
Various MDC schemes have been proposed for speech coding
[11][18][10]. For low complexity, we use the scheme described
in [10] to generate the two streams with redundancy at the
sender. The basic idea is to quantize the even samples in
finer resolution (e.g., PCM, 8 bits/sample) and the differ-
ence between adjacent even and odd samples in coarser res-
olution (e.g., ADPCM, 2 bits/sample), and then packetize
them into stream 1. For stream 2, we quantize even and odd
samples in the opposite way (Fig. 1(a)). Using this scheme,
the redundancy imposed when neglecting packet headers is
25%.
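The stream layout and the 25% figure can be checked with a short Python sketch. It is a simplification under stated assumptions: the coarse path uses a plain 4-level uniform quantizer as a stand-in for ADPCM, samples are integers, and all names and the step size are hypothetical.

```python
FINE_BITS = 8     # bits per finely quantized sample (PCM in the text)
COARSE_BITS = 2   # bits per coarsely quantized difference (ADPCM in the text)
STEP = 16         # hypothetical step size of the stand-in coarse quantizer


def quantize_coarse(diff, step=STEP):
    """Map a difference to one of 4 levels (-2..1); NOT real ADPCM."""
    return max(-2, min(1, diff // step))


def mdc_encode(samples):
    """Split integer samples into two descriptions.

    Stream 1: even sample (fine) + coarse (odd - even).
    Stream 2: odd sample (fine) + coarse (even - odd).
    """
    evens, odds = samples[0::2], samples[1::2]
    stream1 = [(e, quantize_coarse(o - e)) for e, o in zip(evens, odds)]
    stream2 = [(o, quantize_coarse(e - o)) for e, o in zip(evens, odds)]
    return stream1, stream2


def redundancy(fine_bits=FINE_BITS, coarse_bits=COARSE_BITS):
    """Extra payload of both descriptions relative to fine coding alone."""
    single = 2 * fine_bits                   # one even + one odd sample
    mdc = 2 * (fine_bits + coarse_bits)      # the same pair in both streams
    return (mdc - single) / single


stream1, stream2 = mdc_encode([100, 105, 110, 90])
print(stream1)       # [(100, 0), (110, -2)]
print(stream2)       # [(105, -1), (90, 1)]
print(redundancy())  # 0.25
```

Per pair of samples, the two descriptions together carry 2 x (8 + 2) = 20 bits instead of the 16 bits needed for fine quantization alone, which gives the 25% redundancy stated above.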
If a packet from one stream, e.g., stream 1, is dropped by the
network or discarded at the receiver due to its late arrival,
the chances are good that the corresponding packet from
stream 2 is successfully received and can be played out if
the packet erasure and delay on the two channels are largely
uncorrelated. Should that take place, the odd samples of the
source signal can be reconstructed in full resolution, while
the even samples are reproduced at a coarser resolution. The
overall speech quality is degraded with quantization noise,
but is still better than the quality when losing both packets.
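The reconstruction cases described above can be sketched as follows. As before, the coarse path is a hypothetical 4-level uniform quantizer standing in for ADPCM, and the description layout (fine sample, coarse difference level) and all names are assumptions made for this example.

```python
STEP = 16  # hypothetical step size of the stand-in coarse quantizer


def dequantize_coarse(level, step=STEP):
    """Return the midpoint of the quantization cell for a coarse level."""
    return level * step + step // 2


def mdc_decode(desc1, desc2):
    """Reconstruct one (even, odd) sample pair.

    desc1 = (even_fine, coarse level of odd - even) or None if missing.
    desc2 = (odd_fine, coarse level of even - odd) or None if missing.
    """
    if desc1 is not None and desc2 is not None:
        return desc1[0], desc2[0]                    # both at full resolution
    if desc1 is not None:
        even = desc1[0]
        return even, even + dequantize_coarse(desc1[1])   # odd is coarse
    if desc2 is not None:
        odd = desc2[0]
        return odd + dequantize_coarse(desc2[1]), odd     # even is coarse
    return None                                      # both lost: conceal


print(mdc_decode((100, 0), (105, -1)))  # (100, 105)
print(mdc_decode(None, (105, -1)))      # (97, 105)
print(mdc_decode((100, 0), None))       # (100, 108)
print(mdc_decode(None, None))           # None
```

When only stream 2 arrives, the odd sample is exact and the even sample is off by the coarse quantization error (97 instead of 100 in this toy case), matching the degraded but usable quality described in the text.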
In order to make a fair comparison to previous work, we
compare our proposed packet path diversity voice communi-


