Saturday, January 22, 2011

H.264 decoding delay

Quoting an email thread about Intel IPP that helps explain DPB operation and decoding delay in H.264:

Well, I think you're incorrect in saying that the buffer (DPB) described in the H.264 specification is merely a suggestion. The buffering mechanism (however it is handled) has to adhere to the specification for the decoder to claim conformance (see Annex C). Note that there are two types of conformance: output timing conformance and output order conformance.

To my knowledge: The Intel implementation in the IPP samples can only deliver pictures in the correct reordered output order, whereas some other codecs also allow output in decoding order (immediate output). When providing reordered output, the decoder has to buffer pictures to handle the reordered frames. Usually these would be B-frames, but in H.264 they can also be P-frames. Therefore, for the GOP pattern described, we cannot really know, but we assume that no reordering is taking place (as it just adds to the delay). In general, the decoder cannot know in advance whether a reordered picture may appear at some point in the stream. It therefore seems that Intel has chosen a "safe path" in which the decoder uses the "worst" possible buffering (delay) that would be necessary to deliver the stream in a smooth manner. Elaborating on that, if the decoder did not buffer (delay) and an out-of-order picture suddenly appeared, the flow out of the decoder would contain a gap, as the out-of-order picture would need to be buffered before output. In other words, the decoder buffers up to the maximum number of pictures allowed for a given stream (I'll come to that later) so that it can deliver the frames in a steady (one-by-one) flow.
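The effect of such a fixed "worst-case" delay can be sketched with a small simulation. This is only an illustration, not the Intel implementation and not the full output ("bumping") process of Annex C: the function name reorder_output and the parameter reorder_depth are hypothetical, and pictures are represented only by their picture order count (POC).

```python
import heapq

def reorder_output(decode_order_pocs, reorder_depth):
    """Simulate a decoder that holds back `reorder_depth` pictures.

    decode_order_pocs: picture order counts, listed in decoding order.
    reorder_depth: hypothetical number of pictures kept buffered before
                   the first one is released.
    Returns a list of (decode_step, poc) pairs: the decode step at which
    each picture leaves the buffer, and its POC.
    """
    held = []      # min-heap of POCs waiting for output
    released = []
    for step, poc in enumerate(decode_order_pocs):
        heapq.heappush(held, poc)
        # Release the lowest POC once more than `reorder_depth` are held.
        if len(held) > reorder_depth:
            released.append((step, heapq.heappop(held)))
    # End of stream: flush whatever is still buffered, lowest POC first.
    end = len(decode_order_pocs)
    while held:
        released.append((end, heapq.heappop(held)))
    return released

# A stream with reordering (I0 P3 B1 B2 P6 B4 B5 in decoding order).
reordered = reorder_output([0, 3, 1, 2, 6, 4, 5], reorder_depth=1)
print([poc for _, poc in reordered])

# A stream without reordering still pays the full delay at depth 4.
plain = reorder_output([0, 1, 2, 3, 4, 5], reorder_depth=4)
print(plain[0])
```

With a depth of 0 the pictures come out immediately but in decoding order; with a depth larger than the stream needs, the order is still correct but the first picture is released only after that many further pictures have been decoded, which is the delay being discussed.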

The maximum buffering required is determined by the 'max_dec_frame_buffering' parameter described in Annex E of the H.264 specification. It is part of the bitstream restriction fields (signalled by 'bitstream_restriction_flag') in the VUI parameters of the SPS. As it is optional, when it is absent its value is inferred from 'MaxDpbSize', which in turn is derived from the profile, the level and the coded picture resolution as defined in Annex A. Note that the 'max_dec_frame_buffering' parameter is constrained at the low end to be >= the 'num_ref_frames' parameter of the SPS. The Intel decoder uses the 'max_dec_frame_buffering' parameter to set the "worst-case" buffering. Thus, with the right encoding parameters and the proper VUI parameters added to the stream, you can set this value as low as possible to obtain the smallest possible buffering.
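As a rough illustration of that derivation, the sketch below computes 'MaxDpbSize' from the level and the picture size, using the Annex A rule MaxDpbSize = Min(MaxDpbMbs / (PicWidthInMbs * FrameHeightInMbs), 16). The function name max_dpb_size and the table name are hypothetical, only a subset of levels is listed, and progressive (frame) coding is assumed so that the frame height in macroblocks is simply the height rounded up to a multiple of 16.

```python
# MaxDpbMbs per level (subset of Table A-1 in the H.264 specification).
MAX_DPB_MBS = {
    "3": 8100,
    "3.1": 18000,
    "3.2": 20480,
    "4": 32768,
    "4.1": 32768,
    "4.2": 34816,
    "5": 110400,
    "5.1": 184320,
}

def max_dpb_size(level, width, height):
    """Return MaxDpbSize in frames for a level and luma picture size.

    Assumes progressive frame coding; width and height are in luma samples.
    """
    width_in_mbs = (width + 15) // 16    # round up to whole macroblocks
    height_in_mbs = (height + 15) // 16
    frame_mbs = width_in_mbs * height_in_mbs
    return min(MAX_DPB_MBS[level] // frame_mbs, 16)

print(max_dpb_size("4", 1920, 1080))    # 1080p at level 4
print(max_dpb_size("3.1", 1280, 720))   # 720p at level 3.1
```

A decoder that falls back on this value whenever the VUI fields are missing will buffer several frames even for a stream that never reorders, which is why signalling a smaller 'max_dec_frame_buffering' matters for low delay.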

The 'max_dec_frame_buffering' parameter is the upper bound for the 'num_reorder_frames' parameter, which is also given in the VUI parameters. This sets a limit on the amount of reordering that can occur in a stream, and it is actually the only information about reordering that a decoder can derive directly from the H.264 stream. The SPS does not explicitly state whether there will be B-pictures in a stream (and P-pictures may also be reordered), nor does it state whether reordering actually occurs: even if the 'num_reorder_frames' parameter is > 0, the stream is not required to make use of it.
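The relationship between these parameters can be written down as a small check. This is a sketch under stated assumptions: the function effective_buffering and the dictionary layout of the VUI fields are hypothetical, and the fallback to 'MaxDpbSize' when the fields are absent ignores the special case the specification makes for some intra-only profiles.

```python
def effective_buffering(max_dpb_size, num_ref_frames, vui=None):
    """Return (max_dec_frame_buffering, num_reorder_frames) to assume.

    max_dpb_size: MaxDpbSize derived from level and picture size.
    num_ref_frames: value from the SPS.
    vui: optional dict holding the two bitstream restriction fields,
         or None when the stream does not carry them (hypothetical layout).
    """
    if vui is None:
        # Nothing signalled: both values fall back to the worst case.
        return max_dpb_size, max_dpb_size

    buffering = vui["max_dec_frame_buffering"]
    reorder = vui["num_reorder_frames"]

    # max_dec_frame_buffering must lie between num_ref_frames and MaxDpbSize.
    if not num_ref_frames <= buffering <= max_dpb_size:
        raise ValueError("max_dec_frame_buffering out of range")
    # num_reorder_frames is bounded above by max_dec_frame_buffering.
    if not 0 <= reorder <= buffering:
        raise ValueError("num_reorder_frames out of range")
    return buffering, reorder

# No VUI restriction: the decoder has to assume the worst case.
print(effective_buffering(4, 1))

# Low-delay stream: one reference frame, no reordering signalled.
low_delay = {"max_dec_frame_buffering": 1, "num_reorder_frames": 0}
print(effective_buffering(4, 1, low_delay))
```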

Anyway, this does not mean you cannot handle it otherwise, especially if you have a closed-circuit system with control over both the encoder and the decoder side, as seems to be the case. In that case, it is essential to choose the right encoding parameters and provide the right information in the stream, and/or to adapt the decoder to use as little buffering as possible.

Hope this helps shed some light on the subject...


- Jay