How to decide between MTAP and STAP in an H.264 RTP payload

I've been digging into encoding and streaming H.264 over the past week. Tonight I'm implementing the RTP H.264 payload.
According to RFC 3984 ("RTP Payload Format for H.264 Video - February 2005")
several new NAL unit types were introduced, among them MTAP (Multi-Time Aggregation Packet) and STAP (Single-Time Aggregation Packet).
As the names indicate, in STAP mode all units are assumed to have the same timestamp. Doesn't that mean we can't use STAP for VCL NAL units?
For example, one may use STAP to transmit NAL types 7 or 8 (SPS, PPS), but can't use it for types 1, 2, or 3?

You can use STAP packets for aggregating both VCL and Non-VCL NALUs that have the same presentation time, which should be the case if they are part of the same frame.
Your encoder should be providing a series of NALUs for the frame, and they should have the same presentation time.
I've worked with encoders that produced a NALU byte stream containing all NALUs for the frame; this byte stream was assigned a single presentation time. I've also seen encoders that produced individual NALUs, where multiple NALUs would share the same presentation time if they were part of the same frame.
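To make the aggregation concrete, here is a minimal sketch (function name my own, not from any library) of building a STAP-A payload as RFC 3984 describes it: a one-byte STAP-A NAL header (type 24) followed, for each aggregated NALU, by a 16-bit big-endian size and the NALU bytes. All aggregated units share the RTP timestamp of the packet that carries them.

```python
import struct

STAP_A_TYPE = 24  # RFC 3984 Single-Time Aggregation Packet, type A


def build_stap_a(nalus):
    """Aggregate NAL units sharing one RTP timestamp into a STAP-A payload.

    The payload is one STAP-A NAL header byte, then for each NALU a
    16-bit big-endian size followed by the NALU itself.
    """
    if not nalus:
        raise ValueError("need at least one NAL unit")
    # Per RFC 3984: the F bit is the OR of the aggregated NALUs' F bits,
    # and NRI is the maximum NRI among them.
    f = 0
    nri = 0
    for nalu in nalus:
        f |= nalu[0] & 0x80
        nri = max(nri, nalu[0] & 0x60)
    payload = bytes([f | nri | STAP_A_TYPE])
    for nalu in nalus:
        payload += struct.pack(">H", len(nalu)) + nalu
    return payload
```

A typical use is aggregating an SPS (type 7) and PPS (type 8) in one packet, but per the answer above nothing stops you from aggregating VCL NALUs of the same frame the same way, as long as they fit in the MTU.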

Related

Comparing H.264 encoding/decoding performance

I am a beginner with video codecs, not an expert.
I just want to know, based on the same criteria, comparing H.264 encoding and decoding, which is more efficient?
Thanks
Decoding is more efficient. To be useful, decoding must run in real time, where encoding does not (except in videophone / conferencing applications).
How much more efficient? An encoder can generate motion vectors. The more compute power used on generating those motion vectors, the more accurate they are. And, the more accurate they are, the more bandwidth is available for the difference frames, so the quality goes up.
So, the kind of encoding used to generate video for streaming or distribution on DVD or BD discs can run many times slower than real time on server farms. But decoding for that kind of program is useless unless it runs in real time.
Even in the case of real-time encoding, it takes more power (actual milliwatts, compute cycles, etc.) than decoding.
This is true of H.264, H.265, VP8, VP9, and other video codecs.

What is the difference between H.264 & H.264+

I have been looking at the RFC for H.264, trying to find the difference between H.264 and H.264+, not in the context of compression quality but in how the packets are configured in an RTP stream.
Does H.264+ have different NAL type numbers (SPS, PPS, IDR, etc.)? Is there any NAL type that is configured in H.264+ but not in H.264?
I saw that in H.264+, SPS, PPS, and IDR are sent very rarely compared to H.264. Why is that?
H.264+ seems to be only a commercial name; there is no specification for it.

Re-encode an audio stream recording on the fly?

Is it possible to rip an audio stream with Variable Bit Rate encoding and re-encode it on the fly, as it is being recorded, with Constant Bit Rate encoding?
I am downloading an audio stream in AAC format with VBR encoding using cURL.
The duration of a VBR-encoded file is estimated from its byte length, resulting in a discrepancy in the reported duration across different players. This discrepancy does not allow me to seek and slice precisely. I would need to re-encode it somehow with a constant bit rate to get seeking to work properly.
The audio stream is hours long, so re-encoding it afterwards takes way too much time and processing power.
Is there anything I can do about this?
Perhaps I can specify some settings in cURL to achieve a constant recording bit rate?

Is it possible to remove start codes using NVENC?

I'm using NVENC SDK to encode OpenGL frames and stream them over RTSP. NVENC gives me encoded data in the form of several NAL units. In order to stream them with Live555 I need to find the start code (0x00 0x00 0x01) and remove it. I want to avoid this operation.
NVENC has a sliceOffset attribute which I can consult, but it indicates slices, not NAL units. It only points to the end of the SPS and PPS headers, where the actual data starts. I understand that a slice is not the same as a NAL unit (correct me if I'm wrong). I'm already forcing single slices for the encoded data.
Is any of the following possible?
Force NVENC to encode individual NAL units
Force NVENC to indicate where the NAL units in each encoded data block are
Make Live555 accept the sequence parameters for streaming
There seems to be a point where every person trying to do H.264 over RTSP/RTP comes down to this question. Well here are my two cents:
1) There is a concept of an access unit. An access unit is a set of NAL units (it may well be only one) that represent an encoded frame. That is the level of logic you should work at. If you say you want the encoder to give you individual NAL units, what behavior do you expect when encoding one raw frame produces multiple NAL units (e.g. SPS + PPS + coded picture)? That being said, there are ways to configure the encoder to reduce the number of NAL units per access unit (not including the AUD NAL, not repeating SPS/PPS, excluding SEI NALs); with that knowledge you can know what to expect and more or less force the encoder to give you a single NAL per frame (of course this will not work for all frames, but with what you know about the encoder configuration you can handle those cases). I'm not an expert on the NVENC API, I've also just started using it, but at least with Intel Quick Sync, turning off AUD and SEI and disabling repetition of SPS/PPS gave me roughly one NAL per frame for frames 2...N.
2) I won't be able to answer this since, as I mentioned, I'm not familiar with the API, but I highly doubt it.
3) SPS and PPS should be in the first access unit (the first bit stream buffer you get from the encoder); you could just find the right NALs in the bit stream and extract them, or there may be a dedicated API call to obtain them from the encoder.
All that being said, I don't think it is that hard to run through the bit stream, parse the start codes, and extract the NAL units to feed them to Live555 one by one. Of course, if the encoder can output the bit stream in AVCC format (which, unlike Annex B with its start codes, interleaves a length value between the NAL units, so you can jump straight to the next one without scanning for a prefix), you should use it. When it is just RTP, it's easy enough to implement the transport yourself (I've had bad luck with GStreamer, which did not have proper support for FU-A packetization); in the case of RTSP the overhead of the transport infrastructure is bigger, and it is reasonable to use a third-party library like Live555.
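As a sketch of that "run through the bit stream" approach, here is a minimal Annex B splitter (function name my own) that handles both 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes:

```python
def split_annex_b(stream: bytes):
    """Split an Annex B byte stream into raw NAL units, start codes removed.

    Treats a zero byte immediately preceding the next 3-byte start code as
    part of a 4-byte start code. (Rare trailing cabac_zero_words beyond
    that are kept with the NALU; a full parser would trim them too.)
    """
    nalus = []
    i = stream.find(b"\x00\x00\x01")
    while i != -1:
        start = i + 3
        nxt = stream.find(b"\x00\x00\x01", start)
        if nxt == -1:
            nalus.append(stream[start:])
            break
        end = nxt
        if end > start and stream[end - 1] == 0:  # 4-byte start code ahead
            end -= 1
        nalus.append(stream[start:end])
        i = nxt
    return nalus
```

Each returned NALU starts with its one-byte header, so `nalu[0] & 0x1F` gives the NAL unit type (7 = SPS, 8 = PPS, 5 = IDR slice), which is all you need to hand SPS/PPS to Live555 separately from the coded slices.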

MPEG-2 PES demultiplexer: how to extract PES packets carrying an H.264 video stream?

I detect a new PES packet in my PES demultiplexer by searching for the packet_start_code_prefix (0x000001). When it occurs, I can read PES_packet_length and thus extract the current PES packet from the byte stream. But if it is an H.264 video stream, then PES_packet_length = 0.
How do I extract the PES packet in that case? 0x000001 may also occur inside the H.264 NAL unit byte stream, so I can't use this prefix to find the next PES packet.
I noticed that in every H.264 PES packet the last NAL unit is filler data (nal_unit_type = 12). Should I use this fact to detect the end of the current PES packet?
Normally no, this is impossible without knowing the length of the PES packet. However, because you limit yourself to H.264, we can take advantage of a lucky accident.
An H.264 video stream_id is 0xE0. The first bit of a NAL unit header (forbidden_zero_bit) is always 0, while the first bit of 0xE0 is 1, so the sequence 0x000001E0 happens to be illegal within an Annex B stream. You must still parse the PES header to determine its length, because the first byte after the PES header may be the tail of a previous NALU, and thus not necessarily an Annex B start code.
Keeping this for posterity.
You cannot simply look for start codes; you need to parse the packet. If this is a transport stream, you find the start of the PES by looking for the payload_unit_start_indicator. Then parse the adaptation field, if one exists. Now you will have your start code (0x000001E0 in this case). Then look at the flags: parse out the 33-bit PTS/DTS (you will need them for playback) and skip any optional fields (determined by the flags in the PES header). You now have the start of your H.264 ES. Continue parsing the TS: for every TS packet with the same PID and payload_unit_start_indicator = false, you are still reading the same frame. Once the payload_unit_start_indicator is true, you have a new PES packet/frame.
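The steps above can be sketched roughly like this (a minimal sketch, assuming 188-byte-aligned, unscrambled TS packets; it skips all optional PES header fields wholesale via PES_header_data_length instead of parsing PTS/DTS, and the function name is my own):

```python
TS_PACKET_SIZE = 188


def extract_pes_payloads(ts: bytes, pid: int):
    """Reassemble the elementary-stream bytes of each PES packet for one PID.

    Uses payload_unit_start_indicator to delimit PES packets, which is the
    only option when PES_packet_length == 0, as is usual for H.264 video.
    """
    current = None
    for off in range(0, len(ts) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = ts[off:off + TS_PACKET_SIZE]
        if pkt[0] != 0x47:                       # sync byte
            continue
        pusi = (pkt[1] >> 6) & 1                 # payload_unit_start_indicator
        pkt_pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        if pkt_pid != pid:
            continue
        afc = (pkt[3] >> 4) & 0x3                # adaptation_field_control
        payload_off = 4
        if afc in (2, 3):                        # adaptation field present
            payload_off += 1 + pkt[4]
        if afc in (0, 2) or payload_off >= TS_PACKET_SIZE:
            continue                             # no payload in this packet
        payload = pkt[payload_off:]
        if pusi:
            if current is not None:
                yield bytes(current)             # previous PES is complete
            if payload[:3] != b"\x00\x00\x01":
                current = None
                continue
            # 9 fixed PES header bytes, then PES_header_data_length
            # optional bytes (PTS/DTS etc.), then the ES data.
            pes_header_data_length = payload[8]
            current = bytearray(payload[9 + pes_header_data_length:])
        elif current is not None:
            current.extend(payload)
    if current is not None:
        yield bytes(current)
```

Note that a PES packet is only known to be finished when the next one starts (or the stream ends), which is exactly the latency cost of PES_packet_length = 0.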