Packet loss recovery for streaming video
12th International Packet Video Workshop (2002)
Nick Feamster and Hari Balakrishnan
M.I.T. Laboratory for Computer Science
Cambridge, MA 02139
{feamster,hari}@lcs.mit.edu
http://nms.lcs.mit.edu/projects/videocm/
ABSTRACT
While there is an increasing demand for streaming video applications
on the Internet, various network characteristics make the deployment
of these applications more challenging than traditional TCP-based
applications like email and the Web. Packet loss can be detrimental
to compressed video with interdependent frames because errors
potentially propagate across many frames. While latency requirements
do not permit retransmission of all lost data, we leverage the
characteristics of MPEG-4 to selectively retransmit only the most
important data in the bitstream. When latency constraints do not
permit retransmission, we propose a mechanism for recovering this
data using postprocessing techniques at the receiver. We quantify
the effects of packet loss on the quality of MPEG-4 video, develop
an analytical model to explain these effects, present a system to
adaptively deliver MPEG-4 video in the face of packet loss and
variable Internet conditions, and evaluate the effectiveness of the
system under various network conditions.
1. INTRODUCTION
Streaming media is becoming increasingly prominent on the Internet.
Although some progress has been made in media delivery, today’s
solutions (e.g., RealPlayer and Windows Media Player) [35, 43] are
proprietary, inflexible, and do not provide the user with a pleasant
viewing experience. The lack of an open framework hampers innovative
research, particularly in the area of video delivery that adapts to
changing network conditions. While today’s streaming applications
are closed and proprietary, the emerging MPEG-4 standard is gaining
increasing acceptance and appears to be a promising open standard
for Internet video [13, 17, 20, 28, 31, 37].
In this paper, we describe a system that enables the adaptive
unicast delivery of streaming MPEG-4 video by responding to varying
network conditions. This paper primarily focuses on techniques to
deal with packet losses, which are common on the Internet.
Inter-frame video compression algorithms such as MPEG-4 exploit
temporal correlation between frames to achieve high levels of
compression by independently coding reference frames and
representing the majority of the frames as the difference between
each frame and one or more reference frames. However, these
algorithms suffer from the well-known error-propagation effect:
errors due to packet loss in a reference frame propagate to all of
the dependent difference frames. The resulting stream is not
resilient to even small amounts of packet loss. There is a
fundamental tradeoff between bandwidth efficiency (obtained by
compression) and error resilience (obtained by coding or
retransmission). Inter-frame compression schemes (such as MPEG-4)
achieve significant compression in comparison to schemes that do not
exploit temporal correlation (such as motion JPEG [21]), but they
are also less resilient to packet loss because of the dependencies
that exist between data from different frames. While many methods
have been proposed to add redundancy to the bitstream to allow for
more effective error correction [9, 10, 51, 54], they also give up
much of the gain obtained from compression.
Because errors propagate, errors in reference frames are more
detrimental than those in derived frames, so reference frames should
be given a higher level of protection than other data in the
bitstream. One solution is to add redundancy to more important
portions of the bitstream, or to code more important portions of the
stream at a relatively higher bitrate [1, 25, 39, 50]; however, this
approach reduces compression gains and in many cases does not
adequately handle packet losses that occur in bursts.
Prior work has gathered experimental results that describe packet
loss characteristics for MPEG video and suggest the need for better
error recovery and concealment techniques [11]. Motivated by prior
analysis, as well as a general model we have developed to explain
the effects of packet loss on MPEG video, we have developed a system
that uses receiver-driven selective reliability in conjunction with
receiver postprocessing to efficiently recover from packet losses in
reference frames.
Some researchers have argued that retransmission-based error
resilience is infeasible for Internet streaming because
retransmission of lost data takes at least one additional round-trip
time, which may be too much latency to allow for adequate
interactivity [10, 51, 54]. However, because of the nature of
inter-frame compression, certain types of packet loss can be
excessively detrimental to the quality of the received bitstream. We
show that such losses can be corrected via retransmission without
significantly increasing delay, using only a few frames’ worth of
extra buffering. In a streaming system that transports video
bitstreams with inter-dependent frames, careful retransmission of
lost packets provides significant benefits by alleviating the
propagation of errors.
While our system primarily focuses on the use of selective
retransmission for packet loss recovery, we also show how our system
allows selective retransmission to be used in conjunction with other
error control and concealment techniques. When delay or transient
loss is prohibitively high, retransmission of lost packets may not
always be feasible. In these circumstances, we propose a mechanism
for recovering data in reference frames using postprocessing at the
receiver. Specifically, we propose a scheme for reconstructing
important missing data in reference frames using texture and motion
information from surrounding frames. Our motion-compensated recovery
techniques allow partial recovery of important data, limiting
propagation of errors without imposing the buffering constraints
required for selective retransmissions.
To recover from packet losses, our system uses application-level
framing (ALF) [16]. Because dealing with data loss is application-dependent,
the application, rather than the transport layer, is most
capable of handling these losses appropriately. Moreover, in the
case of video, the receiver is best equipped to make decisions with
regard to packet loss recovery (e.g., whether to request a
retransmission, to use postprocessing and error concealment, or
simply to drop the frame). The ALF principle states that data must
be presented to the application in units that are both meaningful to
that application and independently processible. These units, called
application data units (ADUs), are also the unit of error recovery.
We have used this philosophy in our design of a backwards-compatible
receiver-driven selective retransmission extension to RTP [48]
called SR-RTP. This extension provides semantics for requesting the
retransmission of independently processible portions of the
bitstream and a means for reassembling fragmented portions of
independently processible units. ALF allows the application to be
notified when incomplete frames arrive and to control error
concealment decisions.
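The three receiver-side options named above (request a retransmission, conceal, or drop) can be illustrated with a minimal sketch. The decision rule, the `IncompleteAdu` type, and the `choose_recovery` function are hypothetical and are not taken from SR-RTP; the sketch assumes only that a retransmission costs at least one extra round trip and that losses in non-reference (B) frames do not propagate.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RETRANSMIT = "request retransmission"
    CONCEAL = "conceal via postprocessing"
    DROP = "drop frame"


@dataclass
class IncompleteAdu:
    """Hypothetical record for an ADU that arrived with packets missing."""
    frame_type: str          # "I", "P", or "B"
    ms_until_playout: float  # time left before the frame must be displayed


def choose_recovery(adu: IncompleteAdu, rtt_ms: float) -> Action:
    """Illustrative policy only: pick a recovery action for an incomplete ADU."""
    if adu.frame_type == "B":
        # No other frame is predicted from a B-frame, so the loss stays local.
        return Action.DROP
    if adu.ms_until_playout > rtt_ms:
        # A retransmission needs at least one extra round trip to arrive.
        return Action.RETRANSMIT
    # Too late to retransmit: fall back to receiver postprocessing.
    return Action.CONCEAL
```

Under this assumed policy, a damaged I-frame with 200 ms of playout slack and an 80 ms round-trip time would be retransmitted, while the same frame with only 50 ms of slack would be concealed at the receiver.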
In addition to providing a means for recovering from packet loss,
a video streaming system for the Internet should adapt its sending
rate and the quality of the video stream it sends in accordance with
the available bandwidth. It is widely believed that the stability of
the modern Internet is in large part due to the cooperative behavior
of the end hosts implementing the window increase/decrease
algorithms described in [2, 29]. A video streaming system should
deliver video at the highest possible quality for the available
bandwidth and share bandwidth fairly with TCP flows. To accomplish
this, our video server uses information in RTCP receiver reports
to discover lost packets and round-trip time variations and adapt
its sending rate according to a certain congestion control algorithm
using the Congestion Manager (CM) [3, 5] framework. Rapid
oscillations in the instantaneous sending rate often degrade the
quality of the received video by increasing the required buffering
and inducing layer oscillations. To achieve smoothing of video
quality, our system exploits binomial congestion control algorithms
[6, 19], a family of TCP-friendly congestion control algorithms that
reduce rate oscillation.
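As a rough illustration of why binomial algorithms oscillate less, the sketch below uses the window update rule from the binomial congestion control literature: increase by alpha / w^k per round trip without loss, decrease by beta * w^l on loss, with k + l = 1 for TCP-friendliness. The function name `binomial_update`, the default constants, and the floor of one packet are assumptions made for this example and do not describe the system's actual implementation.

```python
def binomial_update(w: float, loss: bool, k: float, l: float,
                    alpha: float = 1.0, beta: float = 0.5) -> float:
    """Return the next congestion window (in packets) after one round trip.

    k = 0, l = 1 gives TCP-style AIMD; k = l = 0.5 gives the SQRT variant.
    """
    if loss:
        # Decrease is proportional to w**l; keep at least one packet in flight.
        return max(1.0, w - beta * w ** l)
    # Increase is inversely proportional to w**k.
    return w + alpha / w ** k


# Compare the reaction to a single loss at a window of 16 packets.
aimd_after_loss = binomial_update(16.0, loss=True, k=0, l=1)      # halves the window
sqrt_after_loss = binomial_update(16.0, loss=True, k=0.5, l=0.5)  # much gentler cut
```

With these parameters AIMD cuts the window from 16 to 8 packets, whereas SQRT cuts it only to 14, which is the kind of smaller rate swing that benefits video quality.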
This paper focuses on packet loss recovery and describes our
implementation that enables this framework. We describe:
- A system employing SR-RTP, receiver postprocessing, and
  the CM to enable the adaptive transmission of MPEG-4
  video in the face of packet loss, bandwidth variation, and
  delay variation.
- An analytical model to explain the effects of packet loss on
  the overall quality of an MPEG-4 bitstream and an evaluation
  of our system based on this model.
Section 2 presents an overview of the MPEG-4 video compression
standard, derives a model for propagation of error due to packet
loss based on empirical observations, and quantifies the effects of
packet losses in reference frames. Section 3 presents our framework
and implementation for streaming multimedia data in a manner that is
resilient to packet loss and adaptive to varying network conditions.
Section 4 discusses experiments that we performed with our streaming
system that demonstrate situations in which selective reliability
can provide considerable benefit. We discuss related projects in
Section 5 and conclude in Section 6.
2. MODEL
In this section, we develop the case for selective reliability,
whereby certain portions of an MPEG-4 bitstream can be transmitted
reliably.
Prior work has proposed protocols that use selective retransmission
for recovering from bit errors [36]. Others have gathered empirical
data on the effect of transmitting MPEG video over the Internet
[11]. In our work, we derive a general packet loss model that
explains the quality degradation of MPEG-4 in the face of packet
loss as seen on the Internet, validate our packet loss model with
experiments, and show through analysis and experiments how our
system provides performance benefits. This section presents our
packet loss model and quantifies the benefits of selective
retransmission for packet loss recovery.
We describe the problem in detail, present an analysis of video
quality in the presence of packet loss, make a quantitative case for
selective reliability, and argue how selective reliability can be used
in conjunction with other loss recovery techniques.
2.1 Problem Description
We start with a description of the MPEG-4 video compression
standard, then analyze the quality degradation caused by packet
loss. We focus on whole packet erasures, modeling congestion-related
loss, rather than bit corruption.
2.1.1 MPEG-4 Background
The MPEG-4 compression standard achieves high compression
ratios by exploiting both spatial and temporal redundancy in video
sequences. While spatial redundancy can be exploited by simply
coding each frame separately (just as it is exploited in still images),
many video sequences exhibit temporal redundancy, as two consecutive
frames are often very similar. An MPEG bitstream takes
advantage of this by using three types of frames.¹
“I-VOPs” or “I-frames” are intra-coded images, coded independently
of other frames in a manner similar to a JPEG image.
These are reference frames and do not exploit temporal redundancy.
MPEG uses two types of dependent frames: predictively coded
frames (“P-VOPs” or “P-frames”), and bi-directionally coded
frames (“B-VOPs” or “B-frames”). P-frames are coded predictively
from the closest previous reference frame (either an I-frame
or a preceding P-frame), and B-frames are coded bi-directionally
from the preceding and succeeding reference frames.
2.1.2 Error Propagation
The ability to successfully decode a compressed bitstream with
inter-frame dependencies depends heavily on the receipt of reference
frames (i.e., I-frames, and to a lesser degree P-frames). While
the loss of one or more packets in a frame can degrade its quality,
the more problematic situation is the propagation of errors to
dependent frames. An example of error propagation is shown in
Figures 1 and 2; the rectangular patch near the bottom of Figure 1 is
the result of a single loss in an I-frame (no local error concealment
is done in this example). This error spreads to neighboring frames
as well, as shown in Figure 2, which depends on several preceding
differentially coded frames.
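The dependency structure described in this section can be sketched with a toy model of a single group of pictures (GOP). The string encoding of the GOP, the function name `affected_frames`, and the closed-GOP assumption (frames depend only on references inside the same GOP) are simplifications introduced for this illustration, not part of the paper's model.

```python
def affected_frames(gop: str, lost: int) -> list:
    """Return display-order indices of frames degraded by a loss in frame `lost`.

    `gop` lists frame types in display order, e.g. "IBBPBBPBB".
    """
    if gop[lost] == "B":
        # B-frames are never used as references, so the damage stays local.
        return [lost]
    # B-frames immediately before the lost reference use it as their
    # succeeding reference, so they are affected as well.
    start = lost
    while start > 0 and gop[start - 1] == "B":
        start -= 1
    # Every later frame in the GOP is predicted, directly or through a chain
    # of P-frames, from the lost reference.
    return list(range(start, len(gop)))


gop = "IBBPBBPBB"
i_loss = affected_frames(gop, 0)  # loss in the I-frame
p_loss = affected_frames(gop, 6)  # loss in the last P-frame
b_loss = affected_frames(gop, 1)  # loss in a B-frame
```

In this toy GOP, a loss in the I-frame touches all nine frames, a loss in the last P-frame touches five, and a loss in a B-frame touches only that frame, which mirrors the ordering of importance stated above.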
Figure 3 shows the evolution of frame-by-frame PSNR² for the
luminance (i.e., “Y”) component as a function of the original raw
frame number for various packet loss rates. The evolution for a
¹ In fact, MPEG-4 codes each independent object within a frame
as a “VOP”, or “video object plane”, but for simplicity and without
loss of generality, we will use the terms frame and VOP
interchangeably.
² Peak Signal to Noise Ratio (PSNR) is a coarse and controversial
indicator of picture quality that is derived from the root mean
squared error (RMSE). The PSNR for a degraded $N_1 \times N_2$ 8-bit
image $f'$ from the original image $f$ is computed by the formula
$$20 \log_{10} \left( \frac{255}{\sqrt{\frac{1}{N_1 N_2} \sum_{x=0}^{N_1-1} \sum_{y=0}^{N_2-1} \left[ f(x,y) - f'(x,y) \right]^2}} \right).$$
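The PSNR computation described in footnote 2 can be sketched in a few lines. This is a minimal illustration that assumes images are given as nested lists of 8-bit luminance samples (peak value 255); the function name `psnr` and the choice to return infinity for identical images are assumptions of this example.

```python
import math


def psnr(original, degraded, peak=255):
    """PSNR in dB between two equally sized images given as lists of rows."""
    squared_error = 0
    count = 0
    for row_o, row_d in zip(original, degraded):
        for a, b in zip(row_o, row_d):
            squared_error += (a - b) ** 2
            count += 1
    if squared_error == 0:
        # Identical images: RMSE is zero, so PSNR is unbounded.
        return math.inf
    rmse = math.sqrt(squared_error / count)
    return 20 * math.log10(peak / rmse)


reference = [[10, 20], [30, 40]]
damaged = [[10, 20], [30, 41]]    # one sample off by one
quality_db = psnr(reference, damaged)
```

Because the measure depends only on the mean squared sample difference, two very different visual artifacts can yield the same PSNR, which is one reason the footnote calls it a coarse indicator.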