Monday, November 19, 2007

H.264

H.264, also known as Advanced Video Coding (AVC) or MPEG-4 Part 10, outperforms the MPEG-4 Visual and H.263 standards, providing better compression of video images. At the same quality, its output bitrate can be about half that of an MPEG-2 bitstream.

* Picture Format Supported
Almost all video resolutions from SubQCIF to BT.709 are supported, with both progressive and interlaced scanning.
Like MPEG-2, the default color sampling is 4:2:0, and the phase relationship between Y and C samples is the same as in MPEG-2.

* Coded Data Format (Data Stream Syntax)
- Video Coding Layer (VCL): the output of the encoding process, a sequence of bits representing the coded video data, which is mapped to NAL units prior to transmission or storage.
- Network Abstraction Layer (NAL): the NAL unit is the basic unit of a coded H.264 video sequence. Each NAL unit contains a Raw Byte Sequence Payload (RBSP). The type of RBSP is indicated in the one-byte NAL unit header, and the RBSP data makes up the rest of the NAL unit. Some important RBSP types are Parameter Set (sequence or picture), Coded Slice and End of Sequence.
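As a rough illustration of the one-byte NAL unit header, the sketch below splits it into its three fields. The function name and the small lookup table are illustrative choices, not part of any library, and the table lists only a few of the defined type codes.

```python
# Hypothetical helper: split the one-byte NAL unit header into its fields.
# Bit layout: 1 bit forbidden_zero_bit, 2 bits nal_ref_idc, 5 bits nal_unit_type.
def parse_nal_header(first_byte: int) -> dict:
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": first_byte & 0x1F,
    }

# A few nal_unit_type values (partial table, for illustration only).
NAL_TYPE_NAMES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    7: "sequence parameter set",
    8: "picture parameter set",
    10: "end of sequence",
}

if __name__ == "__main__":
    header = parse_nal_header(0x67)
    print(header, NAL_TYPE_NAMES.get(header["nal_unit_type"], "other"))
```

For example, a header byte of 0x67 decodes to nal_ref_idc 3 and nal_unit_type 7, i.e. a sequence parameter set.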

* Profile and Level
Unlike MPEG-4 Visual, H.264 supports only four profiles: Baseline for low-bitrate applications, Main for broadcasting and storage, Extended for media streaming applications and High for HD and video studio use.

* Video Coding Tools
- There is no GOB or GOP in the bitstream; a sequence plays a role similar to a GOP. Both progressive and interlaced sequences are supported, and a picture may be coded as a field or a frame. Each picture has a picture order count, which defines its presentation order. Reference pictures are organized into one or two lists, list0 and list1, with frame numbers.

- A coded picture consists of slices, each of which is a set of MBs or MB pairs in raster scan order. There are I-, P- and B-slices, and likewise three MB types: I-, P- and B-MBs. An I-slice contains only I-MBs, a P-slice may contain P- and I-MBs, and a B-slice may contain B- and I-MBs. Slices are still the basic unit for resynchronization and error recovery. They stay independent of each other because intra prediction and motion vector prediction are applied only within the same slice.

- The I-MB, i.e. intra MB, is totally different from that of previous standards. An I-MB is predicted from decoded samples in the current slice (intra prediction), and the residual data is transformed, coded and transmitted. This is essentially the DPCM technique. Note that it operates on pixel samples, not on the DC component in the frequency domain. An alternative to intra prediction for an I-MB is I-PCM, which enables an encoder to transmit the values of the image samples directly, without prediction or transformation.

P- and B-MBs are inter MBs, coded with inter prediction. A P-MB uses list0 and a B-MB uses both list0 and list1. MB partitions and MB sub partitions are supported, and the reference picture might be different for each of them. The reference pictures can be before or after the current picture in temporal order.

For a B-MB, several prediction modes can be used: direct mode, MC from list0, MC from list1, or MC from both list0 and list1. A different mode may be chosen for each partition. If the 8X8 partition size is used, the chosen mode is applied to all sub partitions within that partition. Note that the terms backward and forward prediction are not really applicable here anymore.

- Inter Prediction
The differences between H.264 and earlier standards include support for a range of block sizes and for fine sub-sample motion vectors.

The luma component of one MB can be split up in FOUR ways: one 16X16 partition, two 16X8 partitions, two 8X16 partitions or four 8X8 partitions. Each 8X8 partition can in turn be split in another FOUR ways into sub partitions: one 8X8, two 8X4, two 4X8 or four 4X4. The chroma components are partitioned in the same way, except that the block sizes have exactly half the horizontal and vertical resolution of the luma ones.
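The partition sizes above can be tabulated in a few lines. This is only a sketch with made-up names, and it assumes the default 4:2:0 sampling mentioned earlier, where chroma blocks are half the luma size in each dimension.

```python
# Illustrative tables of luma partition sizes (width, height).
MB_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_PARTITIONS = [(8, 8), (8, 4), (4, 8), (4, 4)]

def partition_count(width: int, height: int, parent: int = 16) -> int:
    """Number of width x height blocks that tile a parent x parent area."""
    return (parent // width) * (parent // height)

def chroma_size(width: int, height: int) -> tuple:
    """Chroma block size for a luma partition, assuming 4:2:0 sampling."""
    return (width // 2, height // 2)

if __name__ == "__main__":
    for w, h in MB_PARTITIONS:
        print(f"MB partition {w}x{h}: {partition_count(w, h)} per MB, chroma {chroma_size(w, h)}")
    for w, h in SUB_PARTITIONS:
        print(f"sub partition {w}x{h}: {partition_count(w, h, parent=8)} per 8x8, chroma {chroma_size(w, h)}")
```

In the extreme case, four 8X8 partitions each split into four 4X4 sub partitions give sixteen luma blocks, each with its own MV.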

Each partition or sub partition in an inter MB is predicted from an area of the SAME size in the reference picture. Note that different MBs or partitions might have different reference pictures. The offset between the two areas has quarter-sample resolution for the luma component and one-eighth-sample resolution for the chroma components, so interpolation of the reference picture may be necessary. The resolution of the chroma components is half that of luma, so the MV is effectively halved when applied to the chroma blocks, which gives chroma prediction a precision of one eighth of a sample. For luma interpolation, half samples are generated first with a six-tap FIR filter, and quarter samples are then obtained by averaging. For chroma interpolation, linear interpolation (a weighted average) is used.
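The luma interpolation steps can be sketched as follows. The six-tap weights (1, -5, 20, 20, -5, 1)/32 are the ones usually quoted for H.264; the function names are illustrative, 8-bit samples are assumed, and only the one-dimensional case is shown (half samples located between four integer samples need an extra pass that is omitted here).

```python
def clip255(value: int) -> int:
    """Clip to the 8-bit sample range (8-bit video assumed)."""
    return max(0, min(255, value))

def half_sample(e: int, f: int, g: int, h: int, i: int, j: int) -> int:
    """Half-sample value between g and h from six neighbouring integer
    samples in a row or column, using taps (1, -5, 20, 20, -5, 1) / 32."""
    acc = e - 5 * f + 20 * g + 20 * h - 5 * i + j
    return clip255((acc + 16) >> 5)  # +16 rounds before dividing by 32

def quarter_sample(a: int, b: int) -> int:
    """Quarter-sample value as the rounded average of two adjacent
    integer or half-sample values."""
    return (a + b + 1) >> 1

if __name__ == "__main__":
    b = half_sample(0, 0, 100, 100, 0, 0)
    print("half sample:", b)
    print("quarter sample:", quarter_sample(100, b))
```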

Using smaller prediction blocks can yield residual data with less energy to be coded. However, the number of MVs increases greatly, and more side information is needed for correct decoding. To reduce the bitrate further, motion vector prediction is used. MV prediction is in the unit of MB, which might have partitions or sub partitions. Different prediction modes might be used depending on the motion compensation partition size and on the availability of nearby vectors. In general, three neighboring partitions are used: the left one, the upper one and the upper-right one.
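A minimal sketch of MV prediction from the three neighbours follows. It assumes the component-wise median, which is the rule usually described for most partition sizes; the text above does not spell out the rule, so treat the median as an assumption. The special cases for 16X8 and 8X16 partitions and for unavailable neighbours are not modelled, and the function names are hypothetical.

```python
def median_mv_prediction(mv_left, mv_up, mv_upright):
    """Component-wise median of three neighbouring MVs, each an (x, y) tuple.
    Assumption: all three neighbours are available."""
    xs = sorted(mv[0] for mv in (mv_left, mv_up, mv_upright))
    ys = sorted(mv[1] for mv in (mv_left, mv_up, mv_upright))
    return (xs[1], ys[1])

def mv_difference(mv, predicted):
    """Only this difference needs to be coded, not the full MV."""
    return (mv[0] - predicted[0], mv[1] - predicted[1])

if __name__ == "__main__":
    pred = median_mv_prediction((4, 0), (6, -2), (5, 8))
    print("predicted MV:", pred)
    print("coded difference:", mv_difference((6, 1), pred))
```

When neighbouring motion is similar, the coded difference is small, which is where the bitrate saving comes from.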

- Direct Prediction
No MV is transmitted for a B-MB or partition coded in Direct Mode; note that this is different from a skipped B-MB. The MVs are instead reconstructed using direct prediction.

- Weighted Prediction
Weighted prediction modifies (scales) the motion-compensated prediction samples of a P- or B-MB.
There are two ways to do this: explicit and implicit weighted prediction.
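The scaling can be sketched as below for the explicit case. The formulas follow the form commonly given for H.264 explicit weighted prediction; the parameter names (weight, log_wd, offset) are chosen here for illustration, 8-bit samples are assumed, and the derivation of implicit weights is not shown.

```python
def clip255(value: int) -> int:
    """Clip to the 8-bit sample range (8-bit video assumed)."""
    return max(0, min(255, value))

def weighted_uni(pred: int, weight: int, log_wd: int, offset: int) -> int:
    """Weight one prediction sample from a single list.
    The effective scale factor is weight / 2**log_wd."""
    if log_wd >= 1:
        value = ((pred * weight + (1 << (log_wd - 1))) >> log_wd) + offset
    else:
        value = pred * weight + offset
    return clip255(value)

def weighted_bi(pred0: int, pred1: int, w0: int, w1: int,
                log_wd: int, o0: int, o1: int) -> int:
    """Combine list0 and list1 prediction samples with separate weights."""
    value = (pred0 * w0 + pred1 * w1 + (1 << log_wd)) >> (log_wd + 1)
    return clip255(value + ((o0 + o1 + 1) >> 1))

if __name__ == "__main__":
    # Weight 16 with log_wd 5 halves the sample, e.g. for a fade.
    print(weighted_uni(100, weight=16, log_wd=5, offset=10))
    # Equal weights reduce to a plain average of the two predictions.
    print(weighted_bi(100, 50, 32, 32, log_wd=5, o0=0, o1=0))
```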

- Intra Prediction
For the luma component, intra prediction can use 4X4 blocks with nine modes or 16X16 blocks with four modes. For the chroma components, 8X8 blocks are used with four modes.

Note the intra prediction depends on the availability of all the required prediction samples.

Predictive coding is used to signal 4X4 intra modes. The modes of the left and upper 4X4 blocks are used to derive the most probable prediction mode.
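A sketch of how the most probable mode could be derived and used follows. It assumes the usual rule that the most probable mode is the smaller of the two neighbouring mode numbers, with DC (mode 2) as the fallback when a neighbour is unavailable. The function names are hypothetical, and `None` stands for an unavailable neighbour.

```python
DC_MODE = 2  # mode number of DC prediction among the nine 4x4 modes

def most_probable_mode(left_mode, upper_mode):
    """Smaller of the two neighbouring mode numbers, or DC if either
    neighbour is unavailable (represented here as None)."""
    if left_mode is None or upper_mode is None:
        return DC_MODE
    return min(left_mode, upper_mode)

def decode_intra4x4_mode(use_mpm: bool, rem_mode: int, left_mode, upper_mode):
    """Recover the mode from a 1-bit flag and, if the flag is not set,
    a 3-bit remainder that skips over the most probable mode."""
    mpm = most_probable_mode(left_mode, upper_mode)
    if use_mpm:
        return mpm
    return rem_mode if rem_mode < mpm else rem_mode + 1

if __name__ == "__main__":
    print(decode_intra4x4_mode(True, 0, left_mode=1, upper_mode=4))
    print(decode_intra4x4_mode(False, 1, left_mode=1, upper_mode=4))
```

When the encoder picks the most probable mode, a single flag bit is enough; otherwise the eight remaining modes fit in three bits.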

- Deblocking Filter

- Transform and Quantisation
Because the minimum prediction size is 4X4, the transform size is 4X4 instead of 8X8. The DC components of the chroma blocks are transformed further with a 2X2 Hadamard transform. One special case is an intra MB with 16X16 prediction, where a 4X4 Hadamard transform is applied to the DC components of each 4X4 luma block.
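The 4X4 transform and the 2X2 Hadamard transform can be sketched with plain integer matrices. The matrix below is the forward core transform usually quoted for H.264; the scaling that would make it orthonormal is folded into quantisation and is not shown. The helper names are illustrative.

```python
# Forward core transform matrix (integer approximation of a 4x4 DCT).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]
H2 = [[1, 1], [1, -1]]  # 2x2 Hadamard matrix

def matmul(a, b):
    """Multiply two small matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_core_transform(block):
    """W = CF * X * CF^T for a 4x4 residual block X (unscaled)."""
    return matmul(matmul(CF, block), transpose(CF))

def hadamard_2x2(dc):
    """2x2 Hadamard transform of the four chroma DC coefficients."""
    return matmul(matmul(H2, dc), H2)

if __name__ == "__main__":
    flat = [[1] * 4 for _ in range(4)]
    print(forward_core_transform(flat))  # only the DC term is non-zero
    print(hadamard_2x2([[1, 1], [1, 1]]))
```

Only additions, subtractions and shifts are needed, so the transform can be computed exactly in integer arithmetic.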

The transmission order for one MB: in general, 26 blocks with indices 0 to 25 are transmitted in order (24 blocks + 2 additional chroma DC sub blocks). For an intra MB with 16X16 prediction, one additional block with index -1 is needed and is transmitted first.

The quantisation step is determined by the Quantisation Parameter (QP). A total of 52 values are supported, and the Q step doubles in size for every increment of six in QP. This arrangement makes fine control of bitrate and video quality possible. Predictive coding is also used for QP within one slice.
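The relation between QP and Q step can be written down directly. The six base values below are the ones commonly tabulated for QP 0 to 5; the function name is an illustrative choice.

```python
# Q step values for QP 0..5; every further 6 QP steps double the value.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp: int) -> float:
    """Quantisation step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be between 0 and 51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))

if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Each single QP increment therefore raises the step size by roughly 12%, which is what allows the fine control mentioned above.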

Each 4X4 block is scanned in zig-zag order for a frame block and in an alternate order for a field block.
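The zig-zag scan for a frame block can be sketched with a lookup table of raster positions. Only the frame (zig-zag) order is shown here; the field scan uses a different table that is not reproduced. The names are illustrative.

```python
# Zig-zag order for a 4x4 block, as raster indices (row * 4 + column).
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]

def zigzag_scan(block):
    """Flatten a 4x4 block (list of four rows) into zig-zag order."""
    flat = [value for row in block for value in row]
    return [flat[index] for index in ZIGZAG_4X4]

if __name__ == "__main__":
    coeffs = [
        [9, 3, 0, 0],
        [2, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    # Low-frequency coefficients come first, trailing zeros are grouped at the end.
    print(zigzag_scan(coeffs))
```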

- Entropy Coding

- Interlaced Video
