Scalable Video Coding

The H.264 Scalable Video Coding standard was developed jointly by the ITU and ISO standards organizations. The two groups formed the Joint Video Team (JVT) to develop the H.264 standard, then continued with the development of the H.264 Scalable Video Coding extension, which was ratified in 2007.

H.264 Scalable Video Coding (hereinafter “SVC”) enables the encoding of a high-quality video bit stream that contains an H.264-compatible base layer and one or more enhancement layers. This layered approach to coding allows multiple representations of the video to be derived from the same stream. The following modalities are possible:

  • Temporal scalability: the ability to present the video at different frame rates
  • Spatial scalability: the ability to present the video at different spatial resolutions
  • SNR (quality) scalability: the ability to present the video at different quality levels (i.e., bit rates)
  • Combined scalability: any combination of the three modalities described above

An example of spatial scalability is presented in Figure 1. To deliver video at multiple resolutions using traditional methods, a separate copy of the video must be produced, stored and delivered at each resolution. With the layered approach of SVC, a single stream can be produced from which derivative streams can be extracted at each of the required resolutions with minimal overhead.

[Figure 1: Spatial scalability]
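The extraction of a derivative stream can be pictured as a simple filter over the layered stream. The sketch below is a minimal illustration, not a bitstream parser: it assumes each coded unit is already tagged with three layer identifiers. The field names `dependency_id`, `temporal_id` and `quality_id` follow the SVC NAL unit header extension, while the `Packet` class, the `extract_substream` function and the sample stream are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Packet:
    """Hypothetical stand-in for one coded unit of a scalable stream."""
    dependency_id: int  # spatial layer (0 = base layer)
    temporal_id: int    # temporal layer (0 = lowest frame rate)
    quality_id: int     # SNR/quality layer (0 = base quality)
    payload: bytes = b""


def extract_substream(packets: Iterable[Packet], max_spatial: int,
                      max_temporal: int, max_quality: int) -> List[Packet]:
    """Keep only the packets at or below the requested operating point.

    Simplification: one quality ceiling is applied to every spatial layer.
    """
    return [
        p for p in packets
        if p.dependency_id <= max_spatial
        and p.temporal_id <= max_temporal
        and p.quality_id <= max_quality
    ]


# Illustrative stream: 2 spatial layers x 2 temporal layers, one quality level.
stream = [Packet(d, t, 0) for d in range(2) for t in range(2)]

# Base layer at the lowest frame rate, e.g. for a small, low-power device.
base_only = extract_substream(stream, max_spatial=0, max_temporal=0, max_quality=0)

# Everything, e.g. for a large display on a fast connection.
full = extract_substream(stream, max_spatial=1, max_temporal=1, max_quality=0)
```

Because extraction only discards packets, no re-encoding is involved; this is what keeps the cost of producing each derivative stream low.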

This type of layering facilitates what is called Heterogeneous Device Support, meaning the ability to support multiple devices with different capabilities from a single stream. An example of this is shown below.

[Figure 2: Heterogeneous device support]


Video Communications

Real-time video communication applications (consumer video chat, enterprise videoconferencing, etc.) are experiencing extremely rapid growth. Fueling that growth are ubiquitous, inexpensive personal computers, high-speed broadband services and smartphones with unprecedented computing power. H.264 Scalable Video Coding technology addresses the most critical factors for deploying real-time video communication. These are:

  • Low latency
  • Error resilience
  • Rate adaptation
  • Scalable multi-endpoint support

Because real-time video communication must minimize latency, error resilience methods based on packet retransmission, as used by TCP (the transport underlying HTTP), are not viable. Unacknowledged transport protocols such as UDP meet the low-latency requirement but are subject to packet loss. SVC addresses the packet loss problem with robust error resilience in three ways: 1) the SVC base layer is typically a small portion of the overall stream, so a given loss is statistically less likely to hit it; 2) the base layer can also be strongly protected using forward error correction (FEC) with far less overhead than applying FEC to the entire stream; 3) error concealment is significantly improved, because predicting higher-resolution layers from the base layer at the current time avoids the glitches caused by using higher-resolution layers from earlier in the stream.

[Figure: Improved error resilience using SVC]
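The first two points can be made concrete with a little arithmetic. The sketch below uses purely illustrative numbers (a 1000 kbps stream, a base layer carrying 20% of it, 50% FEC redundancy, 1% packet loss, 10 packets per frame of which 2 belong to the base layer) and assumes packet losses are independent; none of these figures come from the SVC standard or from measurements.

```python
def prob_any_loss(packet_loss_rate: float, packet_count: int) -> float:
    """Probability that at least one of `packet_count` packets is lost,
    assuming independent losses (a simplifying assumption)."""
    return 1.0 - (1.0 - packet_loss_rate) ** packet_count


def fec_overhead_kbps(protected_kbps: float, redundancy: float) -> float:
    """Extra bit rate needed to protect `protected_kbps` of media with FEC
    at the given redundancy ratio (0.5 = 50% extra data)."""
    return protected_kbps * redundancy


# Illustrative assumptions, not measured values.
total_kbps = 1000.0
base_share = 0.2
redundancy = 0.5
loss_rate = 0.01
packets_per_frame = 10
base_packets_per_frame = 2

base_kbps = total_kbps * base_share

# Point 2: FEC on the base layer only versus FEC on the whole stream.
overhead_base_only = fec_overhead_kbps(base_kbps, redundancy)
overhead_full_stream = fec_overhead_kbps(total_kbps, redundancy)

# Point 1: chance that a loss touches the base layer versus any layer.
p_base_hit = prob_any_loss(loss_rate, base_packets_per_frame)
p_frame_hit = prob_any_loss(loss_rate, packets_per_frame)

print(f"FEC overhead, base layer only: {overhead_base_only:.0f} kbps")
print(f"FEC overhead, whole stream:    {overhead_full_stream:.0f} kbps")
print(f"P(loss in base layer, per frame): {p_base_hit:.4f}")
print(f"P(loss anywhere, per frame):      {p_frame_hit:.4f}")
```

Under these assumptions, protecting only the base layer costs one fifth of the FEC bit rate needed for the whole stream, and a frame's base layer is hit by loss far less often than the frame as a whole.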

SVC also supports real-time rate adaptation, providing each endpoint with the highest-quality stream possible given its capabilities (resolution, frame rate, power) and current network conditions.

[Figure: SVC enables each endpoint to have optimal video quality while minimizing bandwidth usage]
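One way to picture this selection is as a lookup over the operating points that the layered stream offers. The sketch below is a simplified illustration under stated assumptions: the `OperatingPoint` class, the `pick_operating_point` function and the three-entry ladder are hypothetical, and "maximum quality" is approximated by the highest cumulative bit rate that fits the endpoint.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class OperatingPoint:
    """Hypothetical description of one decodable layer combination."""
    width: int
    height: int
    fps: int
    kbps: int  # cumulative bit rate of all layers needed to decode it


def pick_operating_point(points: Iterable[OperatingPoint], available_kbps: int,
                         max_width: int, max_height: int,
                         max_fps: int) -> Optional[OperatingPoint]:
    """Return the highest-bit-rate point that fits both the network and the
    device, or None if even the lowest point does not fit."""
    feasible = [
        p for p in points
        if p.kbps <= available_kbps
        and p.width <= max_width
        and p.height <= max_height
        and p.fps <= max_fps
    ]
    return max(feasible, key=lambda p: p.kbps, default=None)


# Illustrative ladder; the resolutions and bit rates are assumptions.
ladder = [
    OperatingPoint(320, 180, 15, 150),    # base layer
    OperatingPoint(640, 360, 30, 500),    # + first enhancement layer
    OperatingPoint(1280, 720, 30, 1500),  # + second enhancement layer
]

phone = pick_operating_point(ladder, available_kbps=600,
                             max_width=640, max_height=360, max_fps=30)
desktop = pick_operating_point(ladder, available_kbps=2000,
                               max_width=1920, max_height=1080, max_fps=60)
congested = pick_operating_point(ladder, available_kbps=100,
                                 max_width=1920, max_height=1080, max_fps=60)
```

Re-running the selection whenever the bandwidth estimate changes gives the real-time behaviour described above: the endpoint moves up or down the ladder without the sender producing a new encoding.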

In multi-endpoint applications, SVC eliminates the costs associated with today’s solutions based on Multipoint Control Units (MCUs). Because each receiver can be served by extracting layers from a single stream, the video no longer has to be decoded and re-encoded inside an MCU; this removes the generational loss and processing latency incurred there and so significantly improves video quality. By reducing the cost of high-quality, multi-endpoint chats so dramatically, SVC is a key enabling technology for deploying real-time video communication on a massive scale.

[Figure: SVC enables telepresence-quality multi-party videoconferencing without requiring an MCU]