H.264 Advanced Video Coding: A Whirlwind Tour


Why the buzz about H.264? It's the bitrate!

 

Concepts

H.264 is getting so much attention because it can encode video with approximately 3 times
fewer bits than comparable MPEG-2 encoders. This opens new possibilities such as: 

  • Squeezing more television programs into a given channel bandwidth;
  • Delivering quality video over bandwidth-constrained networks (e.g., 3G and 4G mobile);
  • Fitting a high-definition feature film onto a standard DVD.

Because H.264 is up to twice as efficient as MPEG-4 Part 2 (natural video) encoding, it has recently been welcomed into the MPEG-4 standard as Part 10 – Advanced Video Coding. Many established encoder and decoder vendors are moving directly to H.264 and skipping the intermediate step of MPEG-4 Part 2.

If you have some experience with video compression, the best way to appreciate the buzz is to run a case yourself. An easy approach is to acquire our Expert-H264 demo and interactively encode video content from a variety of popular formats.

The resulting .26L file will seem too small to be believed. When you run the decoder and view the decoded file, you will be amazed at how good the quality is at such a low bit rate.


Goals & Approach of H.264

The International Telecommunication Union (ITU) initiated the H.26L ("long term") effort in 1998 as a continuation of work following the MPEG-2 and H.263 standards. The overriding goal was to achieve a factor-of-2 reduction in bit rate compared to any competing standard.

Recall that MPEG-2 was optimized with specific focus on Standard and High Definition digital television services, which are delivered via circuit-switched head-end networks to dedicated satellite uplinks, cable infrastructure, or terrestrial facilities. MPEG-2's ability to cope is being strained as the range of delivery media expands to include heterogeneous mobile networks, packet-switched IP networks, and multiple storage formats, and as the variety of services grows to include multimedia messaging, increased use of HDTV, and others. Thus, a second goal for H.264 was to accommodate a wider variety of bandwidth requirements, picture formats, and unfriendly network environments that throw high jitter, packet loss, and bandwidth instability into the mix.

The H.264 approach is a strictly evolutionary extension of the block-based encoding approach so well established in the MPEG and ITU standards. Key steps include:
  • Use of Motion Estimation to support Inter-picture prediction for eliminating temporal redundancies.
  • Use of spatial correlation of data to provide Intra-picture prediction.
  • Construction of residuals as the difference between predicted images and source images.
  • Use of a discrete spatial transform and filtering to eliminate spatial redundancies in the residuals.
  • Entropy coding of the transformed residual coefficients and of the supporting data such as motion vectors.

H.264 introduces several techniques that push the block-based approach to the limits of its efficiency. It is not a fundamentally different approach, but rather a significant refinement of well-established methods.
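To make these steps concrete, here is a minimal, schematic sketch of that hybrid coding loop in Python. The 4x4 block size, the floating-point DCT stand-in, and the single quantizer step are illustrative simplifications of ours, not the actual H.264 tools (those are described in the sections that follow).

```python
import numpy as np

def dct_matrix(n=4):
    """Orthonormal DCT-II basis, standing in for the codec's real transform."""
    c = np.zeros((n, n))
    for k in range(n):
        for i in range(n):
            scale = np.sqrt((1.0 if k == 0 else 2.0) / n)
            c[k, i] = scale * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    return c

def code_block(source, prediction, q_step=8.0):
    """Residual -> transform -> quantize: the levels that would be entropy coded."""
    C = dct_matrix(source.shape[0])
    residual = source.astype(np.float64) - prediction.astype(np.float64)
    coeffs = C @ residual @ C.T
    return np.round(coeffs / q_step)

def reconstruct_block(levels, prediction, q_step=8.0):
    """De-quantize, inverse transform, and add the prediction back."""
    C = dct_matrix(levels.shape[0])
    residual = C.T @ (levels * q_step) @ C
    return np.clip(prediction + residual, 0, 255)
```

The better the prediction (temporal or spatial), the smaller the residual and the fewer bits its quantized coefficients cost; most of what H.264 adds is aimed at making the prediction and transform steps more effective.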

Major Features of H.264

To understand the new features of H.264, it is assumed that you are familiar with the approach and terminology used in MPEG-1 and MPEG-2 – if not, there are many fine books, reviews, and MPEG-2 tutorials that summarize MPEG-2 video and systems.

Now we will summarize the key features and point you to some useful background material. Note that some H.264 features not supported by initial profiles are identified by an asterisk (*).

Improved Inter-Prediction and Motion Estimation

First recall the limitations of motion estimation in MPEG-2, which searches reference pictures for a 16x16 set of pixels that closely matches the current macro block. The matching set of pixels must be completely within the reference picture. In contrast, H.264 provides:

  • Fine-grained motion estimation. Temporal search seeks matching sub-macro blocks of variable size as small as 4x4, and finds the motion vector to ¼-pel resolution. Searches may also identify motion vectors associated with matching sub-macro blocks of 4x8, 8x4, 8x8, 8x16, 16x8, or the full 16x16. In the future, even finer 1/8-pel resolution is expected to be supported.
  • Multiple reference frames. H.264 provides additional flexibility by allowing a frame to reference more than one other frame – which may be any combination of past and future frames. This capability provides opportunities for more precise inter-prediction, as well as improved robustness to lost picture data.
  • Unrestricted motion search. Motion search allows for reference frames that may be partly outside the picture; missing data can be spatially predicted from boundary data. Users may choose to disable this feature by specifying a Restricted Motion search.
  • Motion vector prediction. Where sufficient temporal correlation exists, motion vectors may be accurately predicted and only their residuals transmitted explicitly in the bitstream.

Such techniques not only provide more accurate inter-prediction, but also help to partition and scale the bitstream, with priority given to data that is more globally applicable. Thus they improve not only compression but also resilience to errors and network instabilities.
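For readers who want to see the mechanics, below is a minimal integer-pel, full-search block matcher for a single 16x16 block, assuming numpy arrays for the current and reference pictures; the function names and search radius are ours. A real H.264 encoder goes much further: it also evaluates the smaller partitions listed above, refines vectors to ¼-pel accuracy by interpolation, and searches multiple reference frames.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences, the usual block-matching cost."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def full_search(ref, cur, top, left, block=16, radius=8):
    """Return the motion vector (dy, dx) and cost that minimize SAD for one block."""
    target = cur[top:top + block, left:left + block]
    best_mv, best_cost = None, float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Keep candidates fully inside the reference picture; H.264's
            # unrestricted motion search relaxes exactly this constraint.
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue
            cost = sad(ref[y:y + block, x:x + block], target)
            if cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost
```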

Improved Intra Spatial Prediction and Transform

Because "intra prediction" is concerned with only one picture at a time, it relies upon spatial rather than temporal correlations. As the algorithm works through a picture's macro blocks in raster scan order, earlier results may be used to "predict" the downstream calculations. Then we need only transmit residuals as refinements to the predicted results.

 

H.264 performs intra prediction in the spatial domain (prior to the transform), and it is a key part of the approach. Even for an intra-picture, every block of data is predicted from its neighbors before being transformed and its coefficients generated for inclusion in the bitstream.
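As a concrete illustration before the feature list, here is a toy version of spatial prediction for a single 4x4 block from its already-reconstructed neighbors. The mode names mirror three of H.264's directional modes (vertical, horizontal, DC); the functions and array layout are our own simplification.

```python
import numpy as np

def predict_4x4(above, left, mode):
    """Predict a 4x4 block from the 4 pixels above and the 4 pixels to its left."""
    above = np.asarray(above, dtype=np.int32)
    left = np.asarray(left, dtype=np.int32)
    if mode == "vertical":        # copy the row above down every column
        return np.tile(above, (4, 1))
    if mode == "horizontal":      # copy the left column across every row
        return np.tile(left.reshape(4, 1), (1, 4))
    if mode == "dc":              # flat block at the rounded mean of the neighbors
        return np.full((4, 4), (above.sum() + left.sum() + 4) // 8)
    raise ValueError("only three of the directional modes are sketched here")

def best_mode(source, above, left):
    """Pick the mode whose prediction is closest (in SAD) to the source block."""
    costs = {m: int(np.abs(np.asarray(source) - predict_4x4(above, left, m)).sum())
             for m in ("vertical", "horizontal", "dc")}
    return min(costs, key=costs.get)
```

An encoder compares the candidate predictions against the source block, picks the cheapest mode, and transmits only the mode choice plus the transformed residual.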

  • Coarse versus fine intra prediction. Intra prediction may be performed either on 4x4 blocks, or 16x16 macro blocks. The latter is more efficient for uniform areas of a picture.
  • Direction Dependent Intra Modes. By doing intra prediction in the spatial domain (rather than in the transform domain), H.264 can employ prediction that is direction dependent, and thus can focus on the most highly correlated neighbors. For Intra 4x4 coding and Intra 16x16 coding, there are 9 and 4 directional modes, respectively.
  • 4x4 transform of Residual Data. For initially supported profiles, residual data transforms are always performed for 4x4 blocks of data, and coefficients transmitted on this fine-grained basis.
  • Variable block sizes for spatial transform*. Future profiles will allow transform of variable size blocks (4x8, 8x8, etc.) with the same level of flexibility as motion estimation blocks. This will provide more flexibility and further reduction of bitrate.
  • Integer transforms. Efficiency in both computation and bitrate is gained by implementing the traditional Discrete Cosine Transform (DCT) as an integer transform that requires no multiplications, except for a single normalization. It can also be inverted exactly without mismatch (a minimal sketch of this transform follows the list).
  • Deblocking filter. To eliminate fine structure blockiness that might be aggravated by the smaller transform blocks, a context-sensitive deblocking filter smooths out the internal edges. Its filter strength depends upon the prediction modes and relationship between the neighboring blocks. In addition to increasing signal-to-noise ratio (S/N), this technique significantly improves the subjective quality of the image for a given S/N.
  • SI and SP pictures (or slices)*. These switching slice types are described below under error mitigation.
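The integer-transform bullet above is worth making concrete. The 4x4 forward "core" matrix below is the one used by H.264 (its normalization/scaling is folded into quantization); the inverse here is computed numerically purely to demonstrate exact invertibility, whereas the standard defines a matching integer inverse transform.

```python
import numpy as np

# Forward 4x4 core transform of H.264: all entries are small integers, so it
# needs only additions, subtractions, and shifts.
CF = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]], dtype=np.int64)

def core_transform(block4x4):
    """Apply the forward core transform to a 4x4 residual block."""
    x = np.asarray(block4x4, dtype=np.int64)
    return CF @ x @ CF.T

def core_inverse(coeffs):
    """Numerical inverse, for illustration only (the codec uses an integer inverse)."""
    inv = np.linalg.inv(CF.astype(np.float64))
    return np.rint(inv @ coeffs @ inv.T).astype(np.int64)

residual = np.random.randint(-32, 32, size=(4, 4))
assert np.array_equal(core_inverse(core_transform(residual)), residual)  # no mismatch
```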

 

Improved Algorithms for Encoding

Two alternative methods improve the efficiency of the entropy coding process by selecting variable-length codes depending upon the context of the data being encoded.

  • Context-Adaptive Variable Length Coding (CAVLC) employs multiple variable length codeword tables to encode transform coefficients, which consume the bulk of bandwidth. Based upon a priori statistics of already processed data, the best table is selected adaptively. For non-coefficient data, a simpler scheme is used that relies upon only a single table.
  • Context-Adaptive Binary Arithmetic Coding (CABAC) provides an extremely efficient encoding scheme when it is known that certain symbols are much more likely than others. Such dominant symbols may be encoded with extremely small bit/symbol ratios. The CABAC method continually updates frequency statistics of the incoming data and adaptively adjusts the algorithm in real-time. This method is an advanced option available in profiles beyond the baseline profile.
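The idea shared by both schemes, choosing codes from statistics that adapt to the context of the data being coded, can be shown with a toy model. The sketch below is neither CAVLC nor CABAC; it merely estimates the ideal code length of a binary stream when each context keeps its own adaptive counts, which is why heavily skewed statistics end up costing far less than one bit per symbol.

```python
import math

def adaptive_bits(bits, contexts):
    """Ideal code length (in bits) for a binary stream with per-context adaptive counts."""
    counts = {}                              # context -> [count of 0s, count of 1s]
    total = 0.0
    for b, ctx in zip(bits, contexts):
        c = counts.setdefault(ctx, [1, 1])   # +1 smoothing so no probability is zero
        p = c[b] / (c[0] + c[1])
        total += -math.log2(p)               # ideal cost of coding b under this model
        c[b] += 1                            # adapt after coding, as the decoder can too
    return total

# A heavily skewed stream costs far less than 1 bit/symbol once the model adapts:
stream = [0] * 95 + [1] * 5
print(round(adaptive_bits(stream, ["sig_flag"] * 100), 1), "bits vs 100 for a fixed 1-bit code")
```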

Techniques for Mitigation of Errors, Packet Losses, and Network Variability

Error containment and scalability. H.264 includes several other features that are useful in containing the impact of errors, and in enabling the use of scalable or multiple bit streams:
  • Slice coding. Each picture is subdivided into one or more slices. The slice is given increased importance in H.264 as the basic spatial segment that is independent from its neighbors. Thus, errors or missing data from one slice cannot propagate to any other slice within the picture. This also increases flexibility to extend picture types (I, P, B) down to the level of "slice types." Redundant slices are permitted.
  • Data partitioning is supported to allow higher priority data (e.g., sequence headers) to be separated from lower priority data (e.g., B-picture transform coefficients).
  • Flexible macro block ordering (FMO) can be used to scatter the bits associated with adjoining macro blocks more randomly throughout the bit stream. This reduces the chance that a packet loss will affect a large region and enables error concealment by ensuring that neighboring macro blocks will be available for prediction of a missing macro block (a toy mapping sketch follows this list).
  • The Multiple Reference Frames that are used for improved motion estimation also allow for partial motion compensation for a P picture when one of its referenced frames is missing or corrupted.
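As a small illustration of the FMO idea above, the sketch below assigns macroblocks to two slice groups in a checkerboard pattern, a simple case of H.264's "dispersed" mapping; the function name and layout are ours.

```python
def checkerboard_slice_groups(mb_width, mb_height):
    """Map each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(mb_width)] for y in range(mb_height)]

# A QCIF picture is 11x9 macroblocks; each group gets roughly half of them.
groups = checkerboard_slice_groups(11, 9)
```

If the packet carrying group 0 is lost, every missing macroblock still has its immediate neighbors available from group 1, which is exactly what error concealment needs.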

MPEG-2 practice is to insert intra pictures (I) at regular intervals to contain errors that otherwise could propagate through the picture sequence indefinitely. In addition, intra pictures provide a means for random access or fast-forward actions, because intra frames do not require any knowledge of other referenced frames. Similarly, regular I pictures would be necessary to switch promptly between higher and lower bitrate streams – an important feature for accommodating the bandwidth variability in mobile networks. However, I pictures typically require far more bits than P pictures and thus are an inefficient means for addressing these two requirements.

H.264 introduces two new slice types, "Switching I Pictures" (SI) and "Switching P Pictures" (SP), which help address these needs at significantly reduced bit rate. Identical SP frames can be obtained even though different reference frames are used – thus, they can be substituted for I frames as temporal resynchronization points, but at a much lower cost in bits. SP pictures rely upon the transformation and quantization of predicted inter blocks; because they do not take full advantage of intra prediction, they can be extended, at the cost of some additional bits, to SI pictures, which do.

Note that because slices are coded independently, switching slices (SI or SP) can be defined at that level.

Low Latency Feature

Arbitrary Slice Ordering (ASO) relaxes the constraint that all macro blocks must be sequenced in decoding order, and thus enhances flexibility for the low-delay performance that is important in teleconferencing and interactive Internet applications.

 

Simplified Profiles

H.264 is completely focused on efficient coding of natural video and does not directly address the object-oriented functionality, synthetic video, and other systems functionality in MPEG-4, which carries a very complex structure of over 50 profiles.

In contrast, H.264 is initially defined with only three profiles:

  • Baseline Profile. A basic goal of H.264 was to provide a royalty-free baseline profile to encourage early application of the standard. The baseline profile consists of most of the major features described above, with the exception of: B slices and weighted prediction; CABAC encoding; field coding; and SP & SI slices. Thus, the baseline profile is appropriate for many progressive-scan applications such as video conferencing and video-over-IP, but not for interlaced television or multiple-stream applications.
  • Main Profile. Main profile contains all of the features in Baseline, except flexible macro block ordering (FMO), arbitrary slice order (ASO), and redundant slices. However, it adds field coding, B slices and weighted prediction, and CABAC entropy coding. This profile is appropriate for efficient coding of interlaced television applications where bit or packet error rates are not excessive, and where low latency is not a requirement.
  • Extended Profile. This profile contains all features from the Baseline and Main profiles, except that CABAC is not supported. In addition, the Extended profile adds SP and SI slices for stream switching, and up to 8 slice groups. This profile is appropriate for server-based streaming applications where bit-rate scalability and error resilience are very important. Mobile video services would be an example.
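As a compact summary of the three profiles just described, here is a small feature map. The feature names are informal shorthand, and the sets reflect the description above rather than the normative profile tables.

```python
# Rough feature map of the three initial H.264 profiles (informal shorthand).
PROFILES = {
    "Baseline": {"I/P slices", "CAVLC", "FMO", "ASO", "redundant slices"},
    "Main":     {"I/P slices", "CAVLC", "B slices", "weighted prediction",
                 "field coding", "CABAC"},
    "Extended": {"I/P slices", "CAVLC", "B slices", "weighted prediction",
                 "field coding", "FMO", "ASO", "redundant slices", "SP/SI slices"},
}

def supports(profile, feature):
    return feature in PROFILES[profile]

assert not supports("Baseline", "CABAC")        # CABAC arrives with Main
assert not supports("Extended", "CABAC")        # Extended drops CABAC as well
assert supports("Extended", "SP/SI slices")     # stream switching lives in Extended
```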

Where will H.264 have the biggest impact?

Any video application can benefit from a reduction in bandwidth requirements, but the highest impact will be in applications where such a reduction relieves a hard technical constraint, or makes more cost-effective use of bandwidth as a limiting resource.

In addition, other H.264 features such as error containment, error concealment, and efficient bitstream switching are especially useful for IP and wireless environments.
  • Squeeze More Services into a Broadcast Channel. Reduction in bandwidth requirements by factors of 2-3 provides cost savings for bandwidth-constrained services such as satellite and DVB-Terrestrial, or alternatively allows such providers to expand services at reduced incremental cost.
  • Facilitate High Quality Video Streaming over IP Networks. H.264 can produce very good, TV-quality streaming of standard-definition video at less than 1 Mbps. This slips under the roughly 1 Mbps thresholds typical of xDSL and thus opens possibilities for new access methods for high-quality, larger-format video.
  • High Definition Transmission and Storage. Recall that MPEG-2 consumes 15-20 Mbps for High Definition video at quality suitable for broadcast or DVD. Use of H.264 will bring this down to about 8 Mbps, making it possible for bandwidth-strapped satellite service providers to fit 4 HD channels per QPSK channel. Even more significant is that this reduction enables burning one HD movie onto a conventional DVD, thus avoiding the need for the industry to adopt a higher density ("blue laser") DVD format.
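The HD satellite claim in the last bullet is easy to check with back-of-the-envelope arithmetic. The transponder payload rate below is an assumed round number for illustration, not a quoted specification.

```python
# Rough channel-count arithmetic for the HD example above.
transponder_mbps = 34        # assumed usable payload of one QPSK satellite transponder
mpeg2_hd_mbps = 18           # mid-range of the 15-20 Mbps MPEG-2 figure above
h264_hd_mbps = 8             # the H.264 figure quoted above

print("MPEG-2 HD channels per transponder:", transponder_mbps // mpeg2_hd_mbps)  # 1
print("H.264 HD channels per transponder: ", transponder_mbps // h264_hd_mbps)   # 4
```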

Mobile Video Applications

Mobile networks such as 3G and 4G present an unusual array of technical challenges that have driven many features in H.264. Applications include video conferencing, streaming video on demand, multimedia messaging services, and low-resolution broadcast. Some key issues, and the H.264 tools for dealing with them, include:

  • Low bandwidth (50 – 300 kbps) is the key issue. The expected trend is for 3G deployment to start with H.263 and move up to H.264 as it matures. An industry analyst points out: "… 3G networks are only likely to offer 57.6kbit/s initially. As those bit rates increase, mobiles and networks will move to the new H.264 codec, which offers twice the performance of H.263. This should result in the same picture quality being achieved at half the bit rate."
  • Small devices with many formats; variability of available bandwidth. For streaming applications, these two separate issues can be addressed by providing multiple streams with different formats and bandwidths, and selecting the appropriate stream at run-time. H.264's SP and SI pictures facilitate dynamic switching among multiple streams to accommodate bandwidth variability.
  • High bit error rates, packet losses, and latency. For video applications, retransmissions are impractical for dropped or delayed packets, so H.264 provides several means (e.g., FMO, data partitioning, etc.) to contain error impacts and facilitate error concealment.

What is the relationship to MPEG-4 and MPEG-2?

Compared to MPEG-2 - H.264 employs the same general approach as MPEG-1 and MPEG-2, as well as the H.261 and H.263 standards, but adds many incremental improvements to obtain a coding-efficiency improvement of about a factor of 3.


Compared to MPEG-4 - During 2002, the ITU-T Video Coding Experts Group combined forces with MPEG-4 experts to form the Joint Video Team (JVT), so H.264 is being published as MPEG-4 Part 10 (Advanced Video Coding) and will in essence become part of future releases of MPEG-4.

MPEG-4 is really a family of standards whose overall theme is object-oriented multimedia applications. It thus has much broader scope than H.264, which is strictly focused on more efficient and robust video coding. The comparable part of MPEG-4 is Part 2 Visual (sometimes called "Natural Video"). Other parts of MPEG-4 address scene composition, object description and Java representation of behavior, animation of human body and facial movements, audio, and systems.

Compared to other results - Numerous comparisons between H.264 performance and other standards can be found at the end of general articles, or within the standards group. Such comparisons are frequently based upon Signal-to-Noise ratios or upon subjective comparisons of the video clip or of individual frames. We include a few frames from our own comparisons, as well as some articles presenting results independent from the Joint Video Team.
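Many of those comparisons are reported as peak signal-to-noise ratio (PSNR) in decibels, computed per frame against the original source. A minimal version, assuming 8-bit frames held in numpy arrays, looks like this:

```python
import numpy as np

def psnr(reference, decoded, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between a source frame and a decoded frame."""
    ref = reference.astype(np.float64)
    dec = decoded.astype(np.float64)
    mse = np.mean((ref - dec) ** 2)
    if mse == 0:
        return float("inf")      # identical frames
    return 10.0 * np.log10((max_value ** 2) / mse)
```

Keep in mind that PSNR does not always track perceived quality, which is why subjective viewing comparisons such as those below remain important.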

Static Comparisons - Of course, if our hardware and networks provided an infinite bitrate, efficient video compression would not be such an important issue, and the advanced compression methods of H.264 would be unnecessary. However, bitrate is limited in the real world of broadcasting, DVD, and mobile video, and advanced standards such as H.264 provide greatly improved quality at any given bitrate.

Figures 1 & 2 provide some static comparisons of an individual frame from the popular Foreman 176x144 (QCIF) clip. Later, we will tell you how to download the short (100 frame) clips corresponding to these cases in a form that can be played back on any popular media player and compared.

 

[Figure 1A and Figure 1B: Foreman frame, MPEG-2 at 2400 kbps and 400 kbps]
Figure 1A shows a picture encoded in MPEG-2 at 2400 kbps (kilobits per second) – in essence an infinite number of bits for such a small format. MPEG-2 encoding at this bitrate is essentially lossless, so the resulting quality closely matches that of the original source data. In Figure 1B, the bitrate has been reduced to 400 kbps, and you can begin to see some fuzziness in the frame as the quantization has been made coarser to drop the bit rate; when you view the corresponding video clip, you will also see some smearing of fast panning motions.
[Figure 1C: Foreman frame, MPEG-2 at 100 kbps]
When the MPEG-2 bitrate is further dropped to 100 kbps in Figure 1C, things begin to fall apart. You begin to see blockiness at the macro block level as some macro blocks can only be resolved as uniform (DC) values, and any fast motion is distorted.

The value of H.264 is most obvious at low bit rates. In Figure 2, you can see the difference between encoding via MPEG-2 and H.264 at 100 kbps. The improvement in quality produced by H.264 speaks for itself.


Figure 2A: MPEG-2 at 100 kbps.  Figure 2B: H.264 at 100 kbps.

Another way to look at this is to compare the bit rates needed by MPEG-2 and H.264 for images of similar quality. In our judgment, Figure 1B (MPEG-2 at 400 kbps) and Figure 2B (H.264 at 100 kbps) show very similar quality. While comparing quality is very subjective, this is consistent with PixelTools evaluations of many tests – generally showing a 3- to 4-fold decrease in bit rate relative to MPEG-2 for the same level of quality.

H.264 is so new that few free decoding tools are easily available, so making evaluations can be awkward. For example, you can decode with the reference code and then view the result on a YUV Viewer, but that requires quite a few steps.

To make it easier for you to compare video clips from MPEG-2 and H.264, we have performed a little sleight of hand so that you can simply use a standard media player such as RealPlayer or Windows Media Player to view and compare all our results. After producing the 100 kbps H.264 stream shown in Figure 2B, we ran the result through our MPEG-2 encoder at a very high bitrate – 2400 kbps – so as not to introduce any further distortion. So you can easily compare the H.264 clip listed below with any of the pure MPEG-2 results at various bitrates.

To get these files, go to the PixelTools FTP site and enter the folder; you will see 6 files that can be opened or downloaded:

    1. foreman_H264_100kbps.26L
    2. foreman_H264_100kbps.mp
    3. foreman_mpeg2_100kbps.mpg
    4. foreman_mpeg2_200kbps.mpg
    5. foreman_mpeg2_400kbps.mpg
    6. foreman_mpeg2_2400kbps.mpg

The first file is the actual H.264 encoded stream, if you have easy access to an H.264 decoder. If you would prefer to view the wrapped version through a media player, use the second file instead. The remaining 4 files are MPEG-2-generated cases at very high, medium, and very low bitrates for comparison.

Independent Evaluations of H.264 Performance

A paper from Nokia and Tampere University of Technology focuses on comparisons between H.264 and H.263 for the very low bit rates of interest to wireless video conferencing applications. The results confirm that most of the encoding time is spent in the motion estimation search involving variable block sizes. Very low bit rates are achieved by lowering the global quantization parameter.

What H.264 products are available now or on the way?

As of February 2003, there was considerable standards evaluation, prototyping, and development activity under way among many digital systems vendors, many of whom are active participants in the Joint Video Team. The number of released products and announcements continues to grow, with an impact on an ever-growing market.

PixelTools Activities with H.264

PixelTools Corporation has been providing MPEG solutions and products to customers since 1994. Anticipating the next step in our product line, we have been engaged in following and evaluating H.264 progress since mid-2002. Our focus is to serve the off-line content encoding market with a very flexible, high quality software implementation of H.264 reference software, similar to our current MPEG2 products such as MPEG Repair, DVD Expert, and Expert HD.

As a first step, we are providing Expert H264, which provides a Windows interface for running the most current reference implementation of H.264. The interface is consistent with that of our newest high-performance encoder, Expert HD, and provides the ability to encode from a variety of popular source formats and to monitor progress through the encoding UI. In addition, we are adding value to the specification and reference implementation in the following areas:

  • Global Rate Control. We are currently prototyping a client-side mechanism for global rate control. This mechanism provides soft control of overall bit rate, without interfering with the rate-distortion optimization at the macro block level (a generic sketch of the underlying idea follows this list).
  • Performance Optimization. In Expert HD, our latest software encoder for MPEG-2, we have applied a variety of optimization techniques to retain our high flexibility and quality while reducing execution times by a factor of 3 to 10. Some of these techniques are platform independent, while others take advantage of platform-specific capabilities such as Intel SSE2. We are currently working closely with leading H.264 experts to employ fast motion estimation algorithms and performance optimization techniques to produce a fast, highly flexible, and high-quality software implementation of H.264 reference code behavior.
  • Shrink-wrapped Package with GUI for Content Producers. We are extending our MPEG-Repair and DVD-Expert products to add H.264 encoding to the current MPEG 1 & 2 capabilities. Higher flexibility UIs will be extended to provide user control of all options supported by the Baseline, Main and Extended profiles.
  • Optimization for Error Robustness. Many H.264 encoding features must be applied at the video layer to optimize the trade-offs between bitrate and error reduction. PixelTools is developing UI support and quantitative guidance for optimally employing these features under different system environments.
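As background for the rate-control item above, the sketch below shows the generic idea of buffer-based rate control: a virtual buffer drains at the target rate, and the quantization parameter (QP, 0-51 in H.264) is nudged up or down according to the buffer's fullness. This is a textbook illustration with arbitrary thresholds, not PixelTools' mechanism.

```python
def update_buffer(buffer_bits, frame_bits, target_bits_per_frame):
    """Drain the virtual buffer at the target rate, then add what the frame cost."""
    return max(0, buffer_bits - target_bits_per_frame + frame_bits)

def next_qp(qp, buffer_bits, buffer_capacity, qp_min=0, qp_max=51):
    """Nudge QP according to virtual-buffer fullness (thresholds are arbitrary)."""
    fullness = buffer_bits / buffer_capacity
    if fullness > 0.8:
        qp += 2       # buffer nearly full: code the next frame more coarsely
    elif fullness < 0.2:
        qp -= 1       # plenty of headroom: spend more bits on quality
    return max(qp_min, min(qp_max, qp))
```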

 

Let us know if we can help, or request a free demo of our products. View our product features at a glance.

Visit our products page and check out our PixelTools Store to purchase any of our products.


Thank you for your interest in PixelTools

 


 