What ISO/IEC standard describes H.264 NALUs, SPS, PPS, etc.?

Which ISO standard describes H.264 NAL units, SPS, PPS, etc. in detail?
Is it ISO/IEC 14496-10, ISO/IEC 14496-15, or some other?

It is ISO/IEC 14496-10, which is technically identical to ITU-T H.264 (both bodies publish the same text). ISO/IEC 14496-15 does not define the bitstream syntax itself; it specifies how AVC streams are carried in the ISO base media file format (e.g., MP4).

Related

Understanding AVC Codecs: avc1.42c020 vs avc1.428020

Looking for help understanding the difference between the codecs:
avc1.42c020 and avc1.428020
I have a program that can request video in either of these formats but I'm not sure which one I should choose. Is one higher quality than the other? Will one impact CPU usage / network bitrate more than the other? Or are these mostly the same?
Hoping someone can explain what the numbers represent or point me in the right direction to look it up. Thanks!
These codec strings are defined in RFC 6381. The three bytes that intrigue you are the profile_idc, the constraint flags, and the level_idc of the stream; in SDP, the same triplet is carried as the profile-level-id parameter defined in Section 8.1 of RFC 6184. In your particular case,
42c020 indicates support for the Constrained Baseline profile at level 3.2, and 428020 indicates support for the Baseline profile at level 3.2.
The Baseline profile includes a few extra error-resilience tools (FMO, ASO, redundant slices) that help with packet loss, but some devices might not support it: only the Constrained Baseline profile is compulsory to implement in WebRTC according to RFC 7742. In practice, however, WebRTC doesn't need the features omitted from the Constrained Baseline profile (it has other mechanisms for dealing with packet loss), so it should be fine to choose the Constrained Baseline profile in all cases.
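If you want to decode such strings yourself, here is a minimal C++ sketch (my own code, not from any library): per RFC 6381, the six hex digits are profile_idc, the constraint flags, and level_idc, one byte each. The profile table below only covers the Baseline family.

    #include <cstdint>
    #include <cstdio>
    #include <string>

    int main() {
        const std::string codecs[] = {"42c020", "428020"};
        for (const std::string& s : codecs) {
            unsigned v = std::stoul(s, nullptr, 16);
            unsigned profile_idc = (v >> 16) & 0xff;  // 0x42 = 66 = Baseline family
            unsigned constraints = (v >> 8) & 0xff;   // constraint_set0..5 flags
            unsigned level_idc   = v & 0xff;          // 0x20 = 32 -> level 3.2
            // constraint_set1_flag (0x40) turns Baseline into Constrained Baseline.
            const char* name = profile_idc == 66
                ? ((constraints & 0x40) ? "Constrained Baseline" : "Baseline")
                : "other profile";
            std::printf("%s: profile_idc=%u (%s), constraints=0x%02x, level %u.%u\n",
                        s.c_str(), profile_idc, name, constraints,
                        level_idc / 10, level_idc % 10);
        }
    }

Running it prints Constrained Baseline level 3.2 for 42c020 and Baseline level 3.2 for 428020.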

Comparing H.264 encoding and decoding performance

I am a beginner with video codecs, not an expert.
I just want to know: based on the same criteria, comparing H.264 encoding and decoding, which is more efficient?
Thanks
Decoding is more efficient. To be useful, decoding must run in real time, whereas encoding need not (except in videophone / conferencing applications).
How much more efficient? An encoder has to generate motion vectors. The more compute power spent searching for those motion vectors, the more accurate they are. And the more accurate they are, the fewer bits the difference frames need, so at a given bitrate the quality goes up.
So, the kind of encoding used to generate video for streaming or distribution on DVD or BD discs can run many times slower than real time on server farms. But decoding for that kind of program is useless unless it runs in real time.
Even in the case of real-time encoding, encoding takes more power (actual milliwatts, compute cycles, etc.) than decoding.
This is true of H.264, H.265, VP8, VP9, and other video codecs.
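To see why the asymmetry exists, here is a toy full-search block matcher in C++ (a sketch of the idea, not code from any real encoder). The encoder evaluates (2*range+1)^2 candidate positions per 16x16 block; the decoder just copies the block the winning vector points at.

    #include <cstdint>
    #include <cstdlib>
    #include <climits>

    struct MV { int dx, dy; };

    // Sum of absolute differences over one 16x16 block.
    static int sad16(const uint8_t* ref, const uint8_t* cur, int stride) {
        int sad = 0;
        for (int y = 0; y < 16; ++y)
            for (int x = 0; x < 16; ++x)
                sad += std::abs(cur[y * stride + x] - ref[y * stride + x]);
        return sad;
    }

    // Encoder side: (2*range+1)^2 SAD evaluations per block.
    // Caller must keep bx/by +- range inside the frame.
    MV full_search(const uint8_t* ref, const uint8_t* cur, int stride,
                   int bx, int by, int range) {
        MV best{0, 0};
        int best_sad = INT_MAX;
        for (int dy = -range; dy <= range; ++dy)
            for (int dx = -range; dx <= range; ++dx) {
                int sad = sad16(ref + (by + dy) * stride + (bx + dx),
                                cur + by * stride + bx, stride);
                if (sad < best_sad) { best_sad = sad; best = {dx, dy}; }
            }
        return best;
    }

    // Decoder side: one 16x16 copy per block, no search at all.
    void motion_compensate(const uint8_t* ref, uint8_t* out, int stride,
                           int bx, int by, MV mv) {
        for (int y = 0; y < 16; ++y)
            for (int x = 0; x < 16; ++x)
                out[(by + y) * stride + bx + x] =
                    ref[(by + mv.dy + y) * stride + (bx + mv.dx + x)];
    }

Doubling the search range quadruples the encoder's work while the decoder's cost stays constant. Real encoders use much smarter searches than this, but the asymmetry remains.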

What is the difference between H.264 and H.264+?

I have been looking at the H.264 RFC, trying to find the difference between H.264 and H.264+, not in the context of compression quality but in how the packets are configured in an RTP stream.
Does H.264+ use different NAL type numbers (SPS, PPS, IDR, etc.)? Is there any NAL type that is defined in H.264+ but not in H.264?
I saw that in H.264+, SPS, PPS, and IDR units are sent very rarely compared to H.264. Why is that?
H.264+ seems to be only a commercial name; there is no specification for it. The bitstream and RTP packetization are plain H.264. The rarer SPS/PPS/IDR transmission you observed is just an encoder configuration choice (a longer GOP and less frequent parameter-set repetition), not a different set of NAL types.

How to decide between MTAP and STAP in the H.264 RTP payload

I've been digging into encoding and streaming H.264 over the past week. Tonight I'm implementing the RTP H.264 payload.
According to RFC 3984 ("RTP Payload Format for H.264 Video", February 2005),
multiple new NALU types were introduced, among them MTAP (Multi-Time Aggregation Packet) and STAP (Single-Time Aggregation Packet).
As the names indicate, in STAP mode all units are assumed to have the same timestamp. Doesn't that mean we can't use STAP for VCL NAL units?
For example, one may use STAP for transmitting NAL types 7 or 8 (SPS, PPS), but not for types 1, 2, 3?
You can use STAP packets to aggregate both VCL and non-VCL NALUs that have the same presentation time, which should be the case if they are part of the same frame.
Your encoder should be providing a series of NALUs for each frame, and they should all have the same presentation time.
I've worked with encoders that produced a single NALU byte stream containing all the NALUs for a frame, with one presentation time assigned to the whole stream. I've also seen encoders that produced individual NALUs, several of which would share the same presentation time when they belonged to the same frame.
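For illustration, here is a sketch in C++ of building a STAP-A payload the way RFC 6184 (Section 5.7.1) lays it out: one aggregation NAL header (F=0, NRI = the maximum of the members, type = 24), then each NALU prefixed with its 16-bit big-endian size. The function name is mine, and MTU checks are omitted.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // All aggregated NALUs share the RTP timestamp of the packet.
    std::vector<uint8_t> build_stap_a(const std::vector<std::vector<uint8_t>>& nalus) {
        uint8_t nri = 0;
        for (const auto& n : nalus)
            nri = std::max<uint8_t>(nri, n[0] & 0x60);  // highest NRI of the members

        std::vector<uint8_t> payload;
        payload.push_back(nri | 24);  // STAP-A NAL header, type 24
        for (const auto& n : nalus) {
            payload.push_back(static_cast<uint8_t>(n.size() >> 8));   // size, big-endian
            payload.push_back(static_cast<uint8_t>(n.size() & 0xff));
            payload.insert(payload.end(), n.begin(), n.end());        // NALU incl. its header
        }
        return payload;  // must still fit in one RTP packet
    }

A typical use is bundling SPS and PPS (and other small NALUs of the same access unit) into a single packet.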

Is it possible to remove start codes using NVENC?

I'm using NVENC SDK to encode OpenGL frames and stream them over RTSP. NVENC gives me encoded data in the form of several NAL units. In order to stream them with Live555 I need to find the start code (0x00 0x00 0x01) and remove it. I want to avoid this operation.
NVENC has a sliceOffset attribute which I can consult, but it indicates slices, not NAL units. It only points to the end of the SPS and PPS headers, where the actual data starts. I understand that a slice is not the same as a NAL unit (correct me if I'm wrong). I'm already forcing single slices for encoded data.
Is any of the following possible?
1. Force NVENC to encode individual NAL units
2. Force NVENC to indicate where the NAL units in each encoded data block are
3. Make Live555 accept the sequence parameters for streaming
There seems to be a point where everyone trying to do H.264 over RTSP/RTP comes down to this question. Well, here are my two cents:
1) There is a concept of an access unit: a set of NAL units (possibly just one) that represents one encoded frame. That is the level of logic you should work at. If you say you want the encoder to give you individual NAL units, what behavior do you expect when encoding one raw frame produces multiple NAL units (e.g. SPS + PPS + coded picture)? That being said, there are ways to configure the encoder to reduce the number of NAL units per access unit (not including the AUD NAL, not repeating SPS/PPS, excluding SEI NALs). With that knowledge you know what to expect and can more or less force the encoder to give you a single NAL unit per frame; it won't work for every frame, but knowing how your encoder behaves you can handle the exceptions. I'm not an expert on the NVENC API, I've also just started using it, but at least with Intel Quick Sync, turning off AUD and SEI and disabling repetition of SPS/PPS gave me roughly one NAL unit per frame for frames 2...N.
2) I can't answer this since, as I mentioned, I'm not familiar enough with the API, but I highly doubt it.
3) The SPS and PPS should be in the first access unit (the first chunk of bit-stream you get from the encoder); you could just find the right NAL units in that bit-stream and extract them, or there may be a dedicated API call to obtain them from the encoder.
All that being said, I don't think it is that hard to run through the bit-stream, parse the start codes, extract the NAL units, and feed them to Live555 one by one. Of course, if the encoder can output the bit-stream in the AVCC format (unlike Annex B's start codes, AVCC puts a length field before each NAL unit, so you can jump straight to the next one without scanning for a prefix), you should use it. When it is plain RTP, it's easy enough to implement the transport yourself; I had bad luck with GStreamer, which did not have proper support for FU-A packetization. For RTSP the overhead of the transport infrastructure is bigger, and it is reasonable to use a third-party library like Live555.
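As a sketch of that "run through the bit-stream" approach, here is a small C++ helper (my own code; it assumes Annex B input, where emulation prevention guarantees that 00 00 01 never occurs inside a NALU):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct NaluView { const uint8_t* data; size_t size; };

    // Split an Annex B buffer into NAL units by scanning for 00 00 01
    // start codes. Trailing zero bytes before a start code (the extra
    // zero of a 4-byte start code, or trailing_zero_8bits) are trimmed.
    std::vector<NaluView> split_annexb(const uint8_t* buf, size_t len) {
        std::vector<NaluView> out;
        size_t start = len;  // len means "no NALU open yet"
        auto flush = [&](size_t end) {
            while (end > start && buf[end - 1] == 0) --end;  // trim trailing zeros
            if (start < len && end > start) out.push_back({buf + start, end - start});
        };
        for (size_t pos = 0; pos + 3 <= len; ) {
            if (buf[pos] == 0 && buf[pos + 1] == 0 && buf[pos + 2] == 1) {
                flush(pos);
                pos += 3;
                start = pos;  // first byte after the start code is the NAL header
            } else {
                ++pos;
            }
        }
        flush(len);
        return out;
    }

Each NaluView then points at a start-code-free NAL unit that can be handed to Live555 one by one.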