Tuesday, November 20, 2007

Concepts of H.264

0. Abbreviations
- Access Unit: a set of NAL units that always contains exactly one primary coded picture. In addition to the primary coded picture, it may contain one or more redundant coded pictures, or other NAL units that do not contain slices or slice data partitions of a coded picture. Decoding an access unit always results in a decoded picture.
- Coded Frame/Field: H.264 does not use the frame picture / field picture concepts of MPEG-2. A coded frame consists of two fields coded together as a single picture. A complementary field pair consists of two fields coded as separate pictures. No picture order count relationship is required between the two fields of a coded frame or of a complementary field pair; the only requirement is that no other picture has an order count that falls between the order counts of these two fields. In general, the two fields are stored in one frame buffer. Once a frame is decoded, it contains two fields. The two fields together can be used to predict a coded frame, or each field can be used separately as a reference picture to predict a coded field. Two consecutive fields can be coded as separate pictures which, once decoded, are combined into a complementary reference OR non-reference field pair. Note that coded fields may either be part of a complementary field pair or be non-paired fields.
There are two kinds of complementary field pairs: complementary reference field pairs (both fields are reference pictures) and complementary non-reference field pairs (both fields are non-reference pictures). If the two fields of a frame differ in their reference property, for example one is a reference picture and the other is a non-reference picture, each field is a non-paired field. The reference one is called a non-paired reference field, and the non-reference one is called a non-paired non-reference field.
If field pictures are used they should occur in pairs and together constitute one coded frame. When coding interlaced sequences using frame pictures, two fields should be interleaved with one another and then the entire frame is coded as one frame picture. -- MPEG-2
- IDR: Instantaneous Decoding Refresh, similar to an I picture. A picture with a memory management control operation equal to 5, which marks all reference pictures as unused for reference, has the same function. A video sequence shall start with one IDR picture, and all following pictures are non-IDR pictures, so an H.264 sequence is similar to a GOP in MPEG-2. There is also an end-of-stream NAL unit, which indicates the end of the video stream. An IDR picture can be used as a short-term or long-term reference picture. A non-IDR reference picture is a short-term reference picture by default.
- Decoded Picture Buffer (DPB): stores the reconstructed pictures that are held for reference or for later output
- Picture order count: a value that is non-decreasing with increasing position in output order, counted relative to the previous IDR picture in decoding order. It is used to identify the OUTPUT ordering of pictures.
- Frame number: numbers REFERENCE pictures in decoding order rather than presentation order. If B pictures are not reference pictures, they can be ignored and the frame number increments only for I/P pictures. If B pictures are reference pictures, the frame number follows the decoding order exactly. Note: it is reset to zero at each IDR picture.
- PicNum: the picture number of a short-term reference picture, derived from the current frame number and the reference picture's frame number
- LongTermPicNum: specified externally
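The PicNum derivation can be sketched as follows. This is a minimal sketch based on my reading of the picture number decoding process in the standard, not a reference implementation. The function and parameter names are made up for illustration; max_frame_num stands for the modulus of the frame number counter (MaxFrameNum in the standard).

```python
def frame_num_wrap(ref_frame_num, cur_frame_num, max_frame_num):
    """Unwrap the frame number of a short-term reference picture.

    The frame number is coded modulo max_frame_num, so a reference whose
    frame number is larger than the current one must have been decoded
    before the counter wrapped around.
    """
    if ref_frame_num > cur_frame_num:
        return ref_frame_num - max_frame_num
    return ref_frame_num


def pic_num_frame(ref_frame_num, cur_frame_num, max_frame_num):
    """PicNum of a short-term reference when the current picture is a frame."""
    return frame_num_wrap(ref_frame_num, cur_frame_num, max_frame_num)


def pic_num_field(ref_frame_num, cur_frame_num, max_frame_num, same_parity):
    """PicNum of a short-term reference field when the current picture is a field.

    same_parity is True when the reference field has the same parity
    (top/bottom) as the field being decoded.
    """
    wrap = frame_num_wrap(ref_frame_num, cur_frame_num, max_frame_num)
    return 2 * wrap + 1 if same_parity else 2 * wrap
```

For example, with max_frame_num = 16 and a current frame number of 1, a reference with frame number 14 was decoded before the wrap-around, so its PicNum for frame decoding is 14 - 16 = -2.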

1. The index of MB in pictures
In general, MBs are indexed in raster scan order. In MB-adaptive frame/field (MBAFF) mode, MB pairs are used: each MB is indexed first within its MB pair, and the index then increments in the raster scan order of the MB pairs. This indexing is used by the inverse scanning processes (6.4).
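The two indexing schemes can be illustrated by mapping an MB index to the luma location of its upper-left sample. This is a sketch of the inverse MB scanning as I understand it from (6.4); the function name and parameters are hypothetical, and pic_width_in_mbs is the picture width counted in MBs.

```python
MB_SIZE = 16  # luma samples per MB side


def mb_location(mb_addr, pic_width_in_mbs, mbaff=False, field_mb=False):
    """Return (x, y) of the upper-left luma sample of MB mb_addr.

    mbaff    -- True when MB-adaptive frame/field coding is in use
    field_mb -- True when the MB pair containing mb_addr is a field MB pair
    """
    if not mbaff:
        # Plain raster scan over single MBs.
        x = (mb_addr % pic_width_in_mbs) * MB_SIZE
        y = (mb_addr // pic_width_in_mbs) * MB_SIZE
        return x, y

    # MBAFF: even index = top MB of a pair, odd index = bottom MB,
    # and the pairs themselves follow raster scan order.
    pair = mb_addr // 2
    is_bottom = mb_addr % 2
    x = (pair % pic_width_in_mbs) * MB_SIZE
    y_pair = (pair // pic_width_in_mbs) * 2 * MB_SIZE
    if field_mb:
        # Field MB: rows of the two MBs are interleaved, so the bottom
        # MB starts one row below the top of the pair.
        return x, y_pair + is_bottom
    # Frame MB: the bottom MB sits directly below the top MB.
    return x, y_pair + is_bottom * MB_SIZE
```

With a picture 4 MBs wide, index 5 is the second MB of the second row in plain raster scan, but the bottom MB of the third pair of the first pair row in MBAFF mode.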

3. Availability for current MB and neighbouring MB
- An MB is marked as not available if any one of three conditions is satisfied relative to the current MB.
- Special cases for neighbouring MB
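A minimal sketch of the availability check for non-MBAFF pictures is given below. It assumes, from my reading of the standard, that the three conditions are: the index is negative, the index is greater than the current MB index, or the MB belongs to a different slice than the current MB. It also assumes the special cases for neighbouring MBs are the picture-edge cases (A and D at the left edge, C at the right edge). The names is_available, neighbour_mbs and slice_of_mb are hypothetical; slice_of_mb is a list giving a slice id for each MB index.

```python
def is_available(mb_addr, cur_mb_addr, slice_of_mb):
    """Apply the three general conditions to one candidate MB index."""
    if mb_addr < 0 or mb_addr > cur_mb_addr:
        return False
    return slice_of_mb[mb_addr] == slice_of_mb[cur_mb_addr]


def neighbour_mbs(cur_mb_addr, pic_width_in_mbs, slice_of_mb):
    """Return the indices of neighbours A, B, C, D, or None if not available."""
    col = cur_mb_addr % pic_width_in_mbs
    candidates = {
        "A": cur_mb_addr - 1,                     # left
        "B": cur_mb_addr - pic_width_in_mbs,      # above
        "C": cur_mb_addr - pic_width_in_mbs + 1,  # above-right
        "D": cur_mb_addr - pic_width_in_mbs - 1,  # above-left
    }
    # Special cases: the raster index would wrap to another row at the
    # left or right picture edge, so those neighbours do not exist.
    at_edge = {
        "A": col == 0,
        "B": False,
        "C": col == pic_width_in_mbs - 1,
        "D": col == 0,
    }
    result = {}
    for name, addr in candidates.items():
        ok = not at_edge[name] and is_available(addr, cur_mb_addr, slice_of_mb)
        result[name] = addr if ok else None
    return result
```

For a picture 4 MBs wide with all MBs in one slice, MB 5 has neighbours A=4, B=1, C=2, D=0, while MB 4 (left edge) has only B=0 and C=1.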

4. Coordinates in the picture
- X: right is positive
- Y: down is positive

5. Derivation process for neighbouring MB, block (4x4 or 8x8) and partitions (6.4.8)
- The objective is to get the index of the neighbouring units (A, B, C, D). The key step is the routine in (6.4.9). Its input is a luma or chroma location (xN, yN) expressed relative to the upper-left corner of the current MB. It outputs the index of the MB that contains (xN, yN), together with that location expressed relative to the upper-left corner of the resulting MB.
- What is the location difference in Table 6-2 for?
The processes in (6.4.1) to (6.4.6) give the location of a unit relative to the picture, the MB, or the sub-partition. To use the routine in (6.4.9) to get the index of a neighbouring unit, a location inside that neighbouring unit is needed. Table 6-2 gives the offset between these two locations.
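The two steps can be sketched together for the non-MBAFF case. The offsets and the location-to-MB mapping below follow my recollection of Table 6-2 and of the routine in (6.4.9), so treat them as assumptions to check against the standard. The function names are hypothetical; max_w and max_h are the MB width and height in samples (16 for luma).

```python
def neighbour_offsets(pred_part_width):
    """Location differences (xD, yD) for neighbours A, B, C, D (Table 6-2)."""
    return {
        "A": (-1, 0),
        "B": (0, -1),
        "C": (pred_part_width, -1),
        "D": (-1, -1),
    }


def neighbouring_location(x_n, y_n, max_w=16, max_h=16):
    """Map (x_n, y_n), relative to the current MB, to the MB that contains it.

    Returns (owner, (x_w, y_w)) where owner is "A", "B", "C", "D" or "Curr"
    and (x_w, y_w) is the location relative to the owner's upper-left corner.
    Returns (None, None) for locations to the right of or below the current
    MB, which are not yet decoded.
    """
    if y_n > max_h - 1:
        return None, None
    if y_n < 0:
        if x_n < 0:
            owner = "D"
        elif x_n <= max_w - 1:
            owner = "B"
        else:
            owner = "C"
    else:
        if x_n < 0:
            owner = "A"
        elif x_n <= max_w - 1:
            owner = "Curr"
        else:
            return None, None
    return owner, ((x_n + max_w) % max_w, (y_n + max_h) % max_h)


def neighbours_of_partition(x, y, pred_part_width):
    """Neighbours of a partition whose upper-left sample is (x, y) in the MB."""
    return {
        name: neighbouring_location(x + dx, y + dy)
        for name, (dx, dy) in neighbour_offsets(pred_part_width).items()
    }
```

For an 8x8 partition at (8, 0), neighbour A lies inside the current MB at (7, 0), B and D both fall in the MB above, and C falls in the above-right MB at (0, 15). The returned owner still has to pass the availability check of section 3.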
