Wednesday, December 26, 2007

Inter Prediction MV Derivation

Three modes in total are used to derive the inter prediction motion vectors for MB partitions and sub MB partitions:
* P skip mode
* B skip/direct mode (spatial/temporal)
* Others

The mvs can be calculated directly in the first two modes, while in the third mode they are derived from mvp and mvd (mv = mvp + mvd). Two issues are very important in the course of the derivation: how to locate the colPartition/colBlk in the colPic, and how to locate the neighboring partition/blk in the current pic.
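
As a minimal sketch of the third mode, the snippet below forms mvp as the component-wise median of the three neighbors A, B and C and then adds mvd. It only covers the general median case; the directional rules for 16X8/8X16 partitions and the handling of unavailable neighbors are left out. The function names are made up for this illustration.

```python
def median3(a, b, c):
    """Return the middle value of three integers."""
    return sorted((a, b, c))[1]


def predict_mv(mv_a, mv_b, mv_c):
    """Component-wise median of the neighboring mvs A, B and C (general case only)."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))


def reconstruct_mv(mvp, mvd):
    """mv = mvp + mvd, applied to each component."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


mvp = predict_mv((4, -2), (6, 0), (1, 3))   # median per component
mv = reconstruct_mv(mvp, (-1, 2))           # add the decoded difference
print(mvp, mv)
```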

* ColPartition/colBlk
Note: The basic processing unit for B_skip, B_16X16_direct and B_8X8_direct is the 4X4 blk. For P skip it is 16X16.
Three parameters are needed for temporal direct mode: colPic, colPart/colBlk, and refIdxL0.
- colPic: depends on the combination of fld, frm and afrm; see Table 8-6.
- colPart/colBlk: The basic unit is the 4X4 block in both the current MB and the colMB. If a partition in the colMB is larger than this unit, its motion information is copied to all the 4X4 blocks it covers. When direct_8x8_inference_flag is enabled, the basic units in one 8X8 sub partition of the current MB share the same mv and refidx; otherwise, each of the 16 blocks has its own mv and refidx. To save memory, mv and refidx need to be kept for only a few units of the colMB. The basic unit mapping is defined by Table 8-8 and can be implemented with a LUT.
- refIdxL0: Keep in mind that the picture structure of colRef might be different. The case where the current MB is a field MB in afrm always needs special handling.
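
The LUT idea can be sketched as follows for the simplest case, where both the current picture and colPic are frames with frame MBs. As far as I read the standard, the co-located 4X4 block index is 5 * mbPartIdx when direct_8x8_inference_flag is 1 (the four corner blocks of the colMB) and 4 * mbPartIdx + subMbPartIdx otherwise. The field/frame combinations of Table 8-8 are not handled, and the function and table names are my own.

```python
def col_luma4x4_blk_idx(mb_part_idx, sub_mb_part_idx, direct_8x8_inference):
    """Index of the 4X4 block in colMB that supplies mv/refidx (frame/frame case only)."""
    if direct_8x8_inference:
        return 5 * mb_part_idx              # 0, 5, 10, 15: the corner blocks
    return 4 * mb_part_idx + sub_mb_part_idx


def luma4x4_blk_xy(blk_idx):
    """Upper-left luma sample of a 4X4 block inside the MB (8X8 blocks first, then 4X4)."""
    x = ((blk_idx // 4) % 2) * 8 + ((blk_idx % 4) % 2) * 4
    y = ((blk_idx // 4) // 2) * 8 + ((blk_idx % 4) // 2) * 4
    return x, y


# LUT[flag][current 4X4 block index] -> 4X4 block index in colMB
COL_BLK_LUT = {
    flag: [col_luma4x4_blk_idx(b // 4, b % 4, flag) for b in range(16)]
    for flag in (False, True)
}

print(COL_BLK_LUT[True])
print([luma4x4_blk_xy(i) for i in (0, 5, 10, 15)])
```

With the flag enabled, only four units of the colMB are ever read, which is where the memory saving comes from.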

* Neighboring partition/blk
The basic unit is still the 4X4 blk, as above, and multiple units may share the same mv information because of the copy operation. It is not necessary to store all the blk information for the derivation. For example, only 4 units are needed for B and C. The processing of A and D is special, however. All the blk information of the current MB is needed.

Tuesday, December 11, 2007

Neighboring Location Derivation

Clause 6.4.9 defines the address of the neighboring MB/partition/sub MB partition for a given location. The derivation is much more complicated when MBAFF is enabled, in which case Table 6-4 is used. Under this circumstance, the following points should be kept in mind.

- The basic unit in one slice is a MB pair instead of a MB. Therefore the indexing order of the MBs is different.
- If MBAFF is disabled, only PAFF is applicable, which means the slice/MB structure is the same for the whole picture. When MBAFF is enabled, the neighboring MB may be either a field or a frame MB.
- Only the top MbAddr can be derived by using clause 6.4.7.
- Field MBs and frame MBs have different vertical starting points and steps. Keep in mind that the two fields are interleaved in the FRAME grid. The top field starts at the leftmost point of the first line, while the bottom field starts at the leftmost point of the second line. The unit step is two lines for fields instead of one for frames.
- When coordinates are negative, only one value is possible: -1.
- Note that this clause only defines the derivation of the neighboring MB index, since MaxW and MaxH describe the MB boundary. When calculating neighboring sub MB partitions, it is possible that neighboring partition C is located in the top MB. Here the key parameter is the difference of luma/chroma location: xD, yD for MB, MB partition, sub MB partition, luma8X8Blk, luma4X4Blk, chroma4X4Blk.
- The predPartWidth used for xD is special for these cases: for P skip, B skip, B direct 16X16 and B direct 8X8 it is 16. Otherwise, it is SubMbPartWidth/MbPartWidth.
- Generally the MB can be divided into 16 blocks, each of which stores the motion information. The motion information might be the same for all blocks of one partition or sub MB. Given one MB/partition/sub MB partition, the neighboring ones can be located from the first block of this partition with the help of x, y and predPartWidth, according to Table 6-3/6-4.
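
For the non-MBAFF case, the lookup described above can be sketched like this. The offsets for A, B, C and D and the mapping from (xN, yN) to the owning MB follow my reading of Table 6-3; the MBAFF mapping of Table 6-4 and the availability check based on decoding order are not modelled. The function names are hypothetical.

```python
def neighbor_offsets(pred_part_width):
    """(xD, yD) offsets of neighbors A, B, C, D relative to the partition's upper-left sample."""
    return {"A": (-1, 0), "B": (0, -1), "C": (pred_part_width, -1), "D": (-1, -1)}


def locate_neighbor(x_n, y_n, max_w=16, max_h=16):
    """Map a luma location relative to the current MB to (owning MB, xW, yW), or None."""
    if y_n > max_h - 1:
        return None                          # below the current MB: not available
    if x_n < 0:
        owner = "mbAddrD" if y_n < 0 else "mbAddrA"
    elif x_n <= max_w - 1:
        owner = "mbAddrB" if y_n < 0 else "CurrMbAddr"
    else:
        if y_n >= 0:
            return None                      # right of the current MB: not available
        owner = "mbAddrC"
    # Coordinates inside the owning MB wrap around the MB size.
    return owner, (x_n + max_w) % max_w, (y_n + max_h) % max_h


def neighbors_of_partition(x, y, pred_part_width):
    """Locate A, B, C, D for a partition whose first block starts at (x, y) in the MB."""
    return {name: locate_neighbor(x + dx, y + dy)
            for name, (dx, dy) in neighbor_offsets(pred_part_width).items()}


print(neighbors_of_partition(8, 0, 8))    # upper-right 8X8 partition
print(neighbors_of_partition(0, 0, 16))   # P skip style: predPartWidth is 16
```

For the upper-right 8X8 partition, C falls into mbAddrC while D falls into the top MB (mbAddrB), which shows why the neighbors of a partition cannot be read off from the MB-level A/B/C/D alone.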

Monday, December 3, 2007

Reference Picture Management

- The processing flow is reference list initialization (setup and sort) -> resorting -> decoding one picture -> reference picture marking
- Before decoding one picture, the reference pictures for every MB/partition should be ready in reference list0/1. The pictures in the DPB that are flagged as reference are put into the lists (should a picture flagged as non reference be put into the DPB at all?). Sorting means that, for P slices, the short term reference pictures come first in decreasing PicNum order and the long term reference pictures follow in increasing LongTermPicNum order.
- Some reference pictures, whether short or long term, may be used more often than others by the MBs. Giving these pictures small indexes in the lists reduces the bitrate further. So the resorting (reordering) process is applied to the initialized lists based on the slice header information of the current slice.
- After the current picture is decoded, it is marked with one of three states: unused for reference, used for short term reference, or used for long term reference. It is stored into the DPB if it is used for reference. This is called reference picture marking, and it also includes the memory management of the DPB. Two methods are used for this management: sliding window, or adaptive marking (adaptive_ref_pic_marking_mode_flag) with 7 memory_management_control_operation commands (values 0 to 6).
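
The list setup and the sliding window can be sketched as below for P slices in frame decoding. The RefPic record and both function names are hypothetical, and the sketch assumes that PicNum, LongTermPicNum and FrameNumWrap have already been computed for every picture in the DPB. Reordering and the adaptive marking commands are not covered.

```python
from dataclasses import dataclass


@dataclass
class RefPic:
    """Hypothetical record for one reference frame held in the DPB."""
    name: str
    long_term: bool
    pic_num: int = 0
    long_term_pic_num: int = 0
    frame_num_wrap: int = 0


def init_ref_list_p(dpb):
    """Initial list0 for a P slice: short term by descending PicNum, then long term by ascending LongTermPicNum."""
    short = sorted((p for p in dpb if not p.long_term),
                   key=lambda p: p.pic_num, reverse=True)
    long_ = sorted((p for p in dpb if p.long_term),
                   key=lambda p: p.long_term_pic_num)
    return short + long_


def sliding_window(dpb, num_ref_frames):
    """When the DPB is full, drop the short term picture with the smallest FrameNumWrap."""
    dpb = list(dpb)
    if len(dpb) >= max(num_ref_frames, 1):
        short = [p for p in dpb if not p.long_term]
        if short:
            dpb.remove(min(short, key=lambda p: p.frame_num_wrap))
    return dpb


dpb = [
    RefPic("a", False, pic_num=3, frame_num_wrap=3),
    RefPic("b", False, pic_num=5, frame_num_wrap=5),
    RefPic("L1", True, long_term_pic_num=1),
    RefPic("L0", True, long_term_pic_num=0),
]
print([p.name for p in init_ref_list_p(dpb)])
print([p.name for p in sliding_window(dpb, num_ref_frames=4)])
```

Note that the sliding window only ever removes short term pictures; long term pictures stay until an adaptive marking command releases them.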

Syntax elements of H.264

* Syntax for video sequence stream
- VCL represents the content of the video data. NAL formats that data and provides header information for communication and storage.
- The bytes of the video stream are in big-endian order, while the bits within one byte are written with the LSBit on the right. The MSBit always comes first in the bit stream.
- One coded slice NAL needs to contain all the data of one slice. The data structure is slice header, slice data and trailing bits. This means a NAL represents one slice of a picture instead of one whole picture.
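
The bit order can be illustrated with a small reader that consumes bytes in stream order and, within each byte, takes the MSBit first. BitReader is a made-up helper for this note, not part of any library, and it ignores emulation prevention bytes.

```python
class BitReader:
    """Read fixed-length unsigned fields from a byte string, MSBit first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0                      # current position in bits

    def read_bits(self, n):
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1   # bit 7 of each byte comes first
            value = (value << 1) | bit
            self.pos += 1
        return value


r = BitReader(bytes([0b10110000, 0xFF]))
first = r.read_bits(1)    # the leading 1
second = r.read_bits(3)   # the next three bits, 011
third = r.read_bits(8)    # crosses the byte boundary
print(first, second, third)
```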

- Syntax elements for NAL
1). nal_ref_idc:
For seq and pic parameter set NAL units, it shall not be 0.
For slices of a reference pic, it shall not be 0.
For slices of a non reference pic, it shall be 0.
2). nal_unit_type:
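
A minimal sketch of splitting the one-byte NAL header into its three fields: forbidden_zero_bit (1 bit), nal_ref_idc (2 bits) and nal_unit_type (5 bits). The function name is made up, and only a few well-known nal_unit_type values are listed in the lookup table.

```python
# A few common nal_unit_type values; the full table in the standard is longer.
NAL_UNIT_TYPES = {1: "non-IDR slice", 5: "IDR slice", 7: "SPS", 8: "PPS"}


def parse_nal_header(first_byte):
    """Split the first byte of a NAL unit into its header fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": first_byte & 0x1F,
    }


for b in (0x67, 0x41, 0x01):
    hdr = parse_nal_header(b)
    print(hex(b), hdr, NAL_UNIT_TYPES.get(hdr["nal_unit_type"], "other"))
```

Since nal_ref_idc is a 2-bit field, a reference slice or parameter set may carry any of the values 1 to 3.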

- Syntax elements for Seq Parameter Set (SPS)
0) seq_parameter_set_id: [0, 31]
1) log2_max_frame_num_minus4: [0, 12] (the maximum value of MaxFrameNum is 2^16)
2) pic_order_cnt_type: specify the method to decode picture order count. [0, 2]
3) log2_max_pic_order_cnt_lsb_minus4: [0, 12] (MaxPicOrderCntLsb)
4) several elements for the decoding of picture order count
5) num_ref_frames: [0, MaxDpbSize], the total number of reference frames, complementary reference field pairs and non-paired reference fields
6) frame_mbs_only_flag: indicates that only frames exist in the video seq
7) mb_adaptive_frame_field_flag:
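
The two "minus4" elements translate into their derived variables as sketched below: MaxFrameNum = 2^(log2_max_frame_num_minus4 + 4) and MaxPicOrderCntLsb = 2^(log2_max_pic_order_cnt_lsb_minus4 + 4). The helper names are my own, and the range check simply mirrors the [0, 12] ranges listed above.

```python
def _pow2_minus4(value, name):
    """Shared helper: validate the [0, 12] range and return 2^(value + 4)."""
    if not 0 <= value <= 12:
        raise ValueError(f"{name} out of range [0, 12]: {value}")
    return 1 << (value + 4)


def max_frame_num(log2_max_frame_num_minus4):
    return _pow2_minus4(log2_max_frame_num_minus4, "log2_max_frame_num_minus4")


def max_pic_order_cnt_lsb(log2_max_pic_order_cnt_lsb_minus4):
    return _pow2_minus4(log2_max_pic_order_cnt_lsb_minus4,
                        "log2_max_pic_order_cnt_lsb_minus4")


print(max_frame_num(0), max_frame_num(12), max_pic_order_cnt_lsb(4))
```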

- Syntax elements for Pic Parameter Set (PPS)
0) pic_parameter_set_id: [0, 255]
1) mb to slice group map
2) QP initial value for Y/C

- Syntax elements for Slice header
0) first_mb_in_slice: MB index in general and MB pair index for MBAFF
1) slice_type: an IDR picture contains only I/SI slices, and so does the whole video seq when num_ref_frames is 0
2) frame_num: used to number (identify) the reference pictures.
3) field_pic_flag: this slice is a slice of a coded field, i.e. the picture is a field picture. The picture structure can be determined with this flag. But if it is 0, the MB structure may still be either frame or field.
4) bottom_field_flag: this slice is part of a coded bottom field, i.e. the picture is a bottom field.
5) pic_order_cnt_lsb: the picture order count modulo MaxPicOrderCntLsb for the top field of a coded frame or for a coded field.
6) delta_pic_order_cnt_bottom
7) delta_pic_order_cnt[0-1]?
8) idr_pic_id: identifies an IDR picture. All slices in one IDR have the same value of idr_pic_id.
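
How these flags combine can be sketched as follows. The picture structure comes from field_pic_flag and bottom_field_flag; MbaffFrameFlag is set when mb_adaptive_frame_field_flag is 1 and field_pic_flag is 0; and the address of the first MB in the slice is first_mb_in_slice * (1 + MbaffFrameFlag), since first_mb_in_slice counts MB pairs in MBAFF. The function names are made up for this illustration.

```python
def picture_structure(field_pic_flag, bottom_field_flag=0):
    """Return 'frame', 'top field' or 'bottom field' for the current slice."""
    if not field_pic_flag:
        return "frame"
    return "bottom field" if bottom_field_flag else "top field"


def mbaff_frame_flag(mb_adaptive_frame_field_flag, field_pic_flag):
    """MBAFF applies only to frame pictures of a sequence that enables it."""
    return int(bool(mb_adaptive_frame_field_flag) and not field_pic_flag)


def first_mb_addr(first_mb_in_slice, mbaff):
    """MB address of the first MB: an MB pair index is doubled in MBAFF frames."""
    return first_mb_in_slice * (1 + mbaff)


mbaff = mbaff_frame_flag(mb_adaptive_frame_field_flag=1, field_pic_flag=0)
print(picture_structure(0), mbaff, first_mb_addr(3, mbaff))
print(picture_structure(1, 1), mbaff_frame_flag(1, 1), first_mb_addr(3, 0))
```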

- Syntax elements for slice data
0) mb_field_decoding_flag: identify if the current MB is field or frame structure in MBAFF mode