Sunday, August 26, 2007

MPEG-1/2/4

MPEG-1, ISO/IEC 11172, was designed for progressively scanned video with a target bit rate of around 1.2 Mbps (about 1.5 Mbps in total including audio and system data). It also had to support basic VCR-like interactivity such as fast forward and fast reverse. MPEG-1 consists of five parts; only the video part is considered here.

* Picture Formats Supported
SIF, progressive scanning

* Video Coding Tools
- I, P and B frames. B frames use bidirectional motion compensation and need two motion vectors. The coding order therefore differs from the display order (see the sketch after this list).
- For I frames, a weight matrix adapts the DCT coefficients to the human visual system before they are uniformly quantized. The DC coefficient of an intra block is coded predictively.
- Half-pel prediction precision; the MV range is +/-64 pels (7 bits).
- GOP structure for random access and interactivity
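
Because B frames reference a future anchor, the encoder transmits that anchor before the B frames that depend on it. A minimal sketch of this display-to-coding-order shuffle (my own illustration, not reference code; the function name and frame labels are just for readability):

    def coding_order(display_frames):
        """Reorder a display-order GOP (frame types 'I', 'P', 'B') into coding order."""
        out, pending_b = [], []
        for frame in display_frames:
            if frame[0] in ('I', 'P'):   # anchor frame
                out.append(frame)        # send the anchor first ...
                out.extend(pending_b)    # ... then the B frames that reference it
                pending_b = []
            else:                        # B frame: hold until the next anchor arrives
                pending_b.append(frame)
        return out + pending_b           # trailing Bs (open-GOP edge case)

    gop = ['I0', 'B1', 'B2', 'P3', 'B4', 'B5', 'P6', 'B7', 'B8']   # display order
    print(coding_order(gop))
    # ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5', 'B7', 'B8']       # coding order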

MPEG-2, ISO/IEC 13818 or ITU-T H.262, was designed to extend the MPEG-1 functionality to interlaced pictures, primarily using the BT.601 4:2:0 format. The target was TV-quality pictures at data rates of 4-8 Mbps and high-quality pictures at 10-15 Mbps. MPEG-2 consists of nine parts.
* Picture Formats Supported
SIF, BT.601, SMPTE 296M/295M; progressive and interlaced scanning

* Video Coding Tools
- Chroma samples in the 4:2:0 format are shifted horizontally by 0.5 pel compared to MPEG-1, H.261 and H.263 (MPEG-2 co-sites them horizontally with the luma samples)
- Two picture structures, frame picture and field picture, each with I, P and B picture modes
- More prediction modes for interlaced video:
1) Field prediction for field pictures: the reference field can be chosen more flexibly for P and B pictures
2) Field prediction for frame pictures: the MB is split into two 16x8 field parts and field prediction is applied; two MVs for a P picture and four for a B picture
3) Dual prime for P pictures: one MV plus a differential MV (DMV) address the two reference fields, and the two field predictions are averaged to form the prediction block
4) 16x8 motion prediction: the two halves of an MB in a field picture are predicted separately with two (P) or four (B) MVs, owing to the low vertical resolution of field pictures
- Field DCT
Reorganize the lines of a macroblock so that each block contains lines from a single field, increasing the vertical correlation within a block (see the sketch after this list)
- Alternate scan
Besides the zig-zag scan, an alternate scan order can be selected; it suits interlaced material in frame pictures, where the residual energy is skewed toward higher vertical frequencies
- Modes of scalability: data partitioning (like spectral selection in progressive JPEG), spatial (like hierarchical coding in JPEG), temporal (subsampling in time), and SNR (like successive approximation in progressive JPEG)
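
The field DCT and the field prediction for frame pictures both rest on the same line shuffle: the 16 luma lines of a macroblock are de-interleaved so that each 16x8 half holds lines from only one field and keeps its vertical correlation when there is motion between the fields. A toy sketch of that shuffle (my own illustration, not normative code):

    def frame_to_field_mb(mb):
        """mb: 16 rows of 16 samples in frame (interleaved) order."""
        top    = [mb[y] for y in range(0, 16, 2)]   # even lines -> top-field half
        bottom = [mb[y] for y in range(1, 16, 2)]   # odd lines  -> bottom-field half
        return top + bottom                         # upper 16x8 = top, lower 16x8 = bottom

    def field_to_frame_mb(mb):
        """Inverse shuffle: re-interleave the two 16x8 field halves."""
        out = [None] * 16
        for i in range(8):
            out[2 * i]     = mb[i]       # top-field lines back to even rows
            out[2 * i + 1] = mb[8 + i]   # bottom-field lines back to odd rows
        return out

    mb = [[y] * 16 for y in range(16)]              # dummy MB, rows labelled 0..15
    assert field_to_frame_mb(frame_to_field_mb(mb)) == mb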

- For a progressive sequence, all pictures are frame pictures. For an interlaced sequence, a picture can be coded as either a frame picture or a field picture. At the beginning of an interlaced sequence the field parity order is specified and kept throughout, e.g. {top, bottom, top, ...} for NTSC or {bottom, top, bottom, ...} for PAL and 1080i. Field pictures occur in pairs of opposite parity but the same picture mode, so the GOP is still counted in units of frames. Note the luma and chroma sample lattices of field pictures with 4:2:0 color sampling: the chroma samples of the top field are shifted up by one quarter luma sample relative to the field sampling grid, while those of the bottom field are shifted down by one quarter luma sample. Frame and field describe how a picture is stored and have almost nothing to do with how it is coded.
- In order to display a progressive sequence on interlaced devices, 2-3 pull-down is applied: half of the frames repeat one of their fields to generate the required field rate (24 * 2 + 12 = 60 fields/s for NTSC; see the sketch after this list).
- In a field picture the predictor must be a field predictor for all MBs. In a frame picture the predictor can be chosen MB by MB, either field or frame prediction, based on the motion activity between the two fields. The MB type (I, P or B) together with the prediction mode determines the exact prediction used for a single MB.
- Frame DCT can be used in progressive sequences, and in interlaced sequences as long as the motion activity between the two fields is small. In an interlaced sequence, field DCT can be used in frame pictures when that motion activity is high. It is even possible for a field picture to use frame DCT, for example in a 2-3 pull-down sequence.
- The horizontal MV range is coded with up to 10 bits and the vertical with up to 7 bits, at half-pel precision.
- Unlike JPEG, the quantization tables and VLC tables of MPEG are standardized, although they can be overridden within the video sequence. Like JPEG, intra MBs are perceptually weighted during quantization. Non-intra MBs carry prediction error that is not viewed directly, so their quantization matrix is flat. In addition, to allow adaptive quantization for rate control, a quantization scale factor (MQUANT) scales the standardized quantization tables up and down (see the sketch below).
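
A simplified sketch of how the intra weight matrix and MQUANT interact (encoder-side quantization is not normative, the standard's exact rounding and mismatch control are omitted, and the weight value below is only illustrative):

    def quantize_intra(coeff, weight, mquant):
        """coeff: one AC DCT coefficient; weight: its entry in the intra matrix."""
        step = weight * mquant / 16.0     # effective step size for this frequency
        return round(coeff / step)        # transmitted level

    def dequantize_intra(level, weight, mquant):
        return level * weight * mquant // 16   # reconstructed coefficient

    print(quantize_intra(-75, weight=52, mquant=4))    # -6
    print(dequantize_intra(-6, weight=52, mquant=4))   # -78

Doubling MQUANT doubles every effective step size, which is how the rate controller trades quality for bits without touching the tables themselves.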
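
The 2-3 pull-down arithmetic mentioned above can be checked in a few lines (illustrative only): 24 progressive frames per second are emitted with an alternating 2-field / 3-field cadence, producing the 60 fields per second that NTSC needs.

    def pulldown_field_count(num_frames):
        fields = 0
        for i in range(num_frames):
            fields += 2 if i % 2 == 0 else 3   # every other frame repeats one field
        return fields

    print(pulldown_field_count(24))   # 60 = 24*2 fields + 12 repeated fields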

MPEG-4 was designed for a new generation of highly interactive multimedia applications while still supporting traditional ones. It is an object-based coding standard for video, audio, and graphics.

Profiles and Levels
* Profiles describe the tools required to decode a bit stream. For MPEG-2 these are primarily the picture modes, chroma subsampling format, and scalability; MPEG-4 defines many more.

* Levels describe the parameter ranges for those tools: picture size, frame rate, and bit rate.

For MPEG-2, common combinations are MP@ML and MP@HL; for MPEG-4, the Simple and Advanced Simple profiles.
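
To make the level constraints concrete, here are the Main Profile upper bounds as I remember them (quoted from memory, so verify against ISO/IEC 13818-2 before relying on them):

    # Illustrative MPEG-2 Main Profile level limits (upper bounds, from memory)
    MAIN_PROFILE_LEVELS = {
        "ML": {"width": 720,  "height": 576,  "fps": 30, "max_mbit_s": 15},
        "HL": {"width": 1920, "height": 1152, "fps": 60, "max_mbit_s": 80},
    }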
