SPS and PPS (aka dwSequenceHeader) in Media Foundation's H264 encoder - h.264

I'm using the H.264 encoder from Media Foundation (an MFT). I extracted the SPS and PPS from it because I need them for Smooth Streaming. MSDN says that the number of bytes used for the length field that appears before each NALU can be 1, 2, or 4, all in network byte order. The first 4 bytes in the buffer are 0, 0, 0, 1. None of the possible length sizes gives a sensible result: with 1 byte the length is zero, with 2 bytes it is zero again, and with 4 bytes the length of the first NALU is 1, which is not correct. Does anybody know how I should interpret this SPS and PPS concatenated together?

The answer here is simple: the data is valid and formatted according to Annex B. Each NAL unit is prefixed by the start code 00 00 00 01 rather than by a length field.
H.264 extradata (partially) explained - for dummies
Annex B format
In this format, each NAL is preceded by a four-byte start code: 0x00 0x00 0x00 0x01. Thus, in order to know where a NAL starts and where it stops, you would need to read each byte of the bitstream, looking for these start codes, which can be a pain if you need to convert between this format and the other format.
More details on H.264 spec - freely available for download. Page 326 starts with "Annex B - Byte stream format".
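To make the start-code scan concrete, here is a minimal sketch in C that splits a buffer such as the concatenated SPS and PPS into NAL units. The names `find_start_code`, `split_annexb` and `nal_span` are made up for this illustration and are not part of Media Foundation. It assumes the whole buffer is in memory and matches the 3-byte pattern 00 00 01, which also covers the 4-byte form because the extra leading zero is trimmed from the previous unit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: where one NAL unit's bytes sit in the buffer. */
typedef struct {
    size_t offset; /* index of the first byte after the start code */
    size_t size;   /* number of bytes in the NAL unit */
} nal_span;

/* Return the index of the next 00 00 01 pattern at or after pos,
 * or len if there is none. */
static size_t find_start_code(const uint8_t *buf, size_t len, size_t pos)
{
    for (size_t i = pos; i + 3 <= len; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
    }
    return len;
}

/* Fill out[] with up to max NAL units found in buf; return how many. */
static size_t split_annexb(const uint8_t *buf, size_t len,
                           nal_span *out, size_t max)
{
    size_t count = 0;
    size_t sc = find_start_code(buf, len, 0);

    while (sc < len && count < max) {
        size_t start = sc + 3;
        size_t next = find_start_code(buf, len, start);
        size_t end = next;

        /* Trim trailing zeros: they belong to the next 4-byte start code
         * (or are padding), since a NAL unit never ends in a zero byte. */
        while (end > start && buf[end - 1] == 0)
            end--;

        out[count].offset = start;
        out[count].size = end - start;
        count++;
        sc = next;
    }
    return count;
}
```

Run on a buffer of the form 00 00 00 01 67 ... 00 00 00 01 68 ..., this reports two units whose first bytes are 0x67 (SPS) and 0x68 (PPS).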

Related

How to understand header of H.265

Could someone explain the difference between the H.264 header and the H.265 header? I just need to parse the H.265 header, but I have difficulty finding a proper reference.
I wrote a first version of the parser. I need to retrieve pic_width_in_luma_samples, pic_height_in_luma_samples, aspectRatioH, and aspectRatioV.
My code is something like:
while (buf->Size > 0)
{
    // forbidden bit
    flushbits(buf, 1);
    int nNALType = showbits(buf, 6);
    if (nNALType == NAL_TYPE_SPS)
    {
        // flushbits until I retrieve desired parameter
        flushbits(buf, 4); // sps_video_parameter_set_id
    }
    else
    {
        // align bits
        buf->Size -= buf->BitsLeft & 0x7;
    }
}
Is this the correct way to do it? Is there a method by which I can skip bits until I find a "start sequence" that indicates my desired SPS NAL type?
Follow these steps to parse H.265:
Each NAL unit starts with a 3-byte start code, 00 00 01. Use it to identify each NAL unit.
Parse the NAL unit header (2 bytes).
For the rest of the NAL unit, look for the 3-byte sequence 00 00 03; keep the first 2 bytes (00 00) and discard the 03 byte.
With the bytes that remain, you can do the parsing (depending on the NAL unit type you have).
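As a rough sketch of the header and 00 00 03 steps, the C code below decodes the 2-byte H.265 NAL unit header and drops the 03 bytes from the payload. The struct and function names are invented for this example. The field widths (1, 6, 6 and 3 bits) come from the NAL unit header syntax in the H.265 specification, and it assumes the caller has already located the NAL unit via its start code.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the 2-byte H.265 NAL unit header (names follow the spec). */
typedef struct {
    unsigned forbidden_zero_bit;    /* 1 bit  */
    unsigned nal_unit_type;         /* 6 bits, 33 = SPS */
    unsigned nuh_layer_id;          /* 6 bits */
    unsigned nuh_temporal_id_plus1; /* 3 bits */
} hevc_nal_header;

static hevc_nal_header parse_hevc_nal_header(const uint8_t hdr[2])
{
    hevc_nal_header h;
    h.forbidden_zero_bit = hdr[0] >> 7;
    h.nal_unit_type = (hdr[0] >> 1) & 0x3F;
    /* nuh_layer_id straddles the two bytes: 1 bit + 5 bits. */
    h.nuh_layer_id = ((hdr[0] & 0x01u) << 5) | (hdr[1] >> 3);
    h.nuh_temporal_id_plus1 = hdr[1] & 0x07;
    return h;
}

/* Copy src to dst, discarding every 03 byte that follows two zero bytes.
 * dst must have room for len bytes. Returns the number of bytes written. */
static size_t strip_emulation_prevention(const uint8_t *src, size_t len,
                                         uint8_t *dst)
{
    size_t out = 0;
    size_t zeros = 0; /* consecutive zero bytes seen so far */

    for (size_t i = 0; i < len; i++) {
        if (zeros >= 2 && src[i] == 0x03) {
            zeros = 0; /* drop the 03 and restart the zero count */
            continue;
        }
        zeros = (src[i] == 0x00) ? zeros + 1 : 0;
        dst[out++] = src[i];
    }
    return out;
}
```

For the header bytes 42 01 this yields nal_unit_type 33, which is the SPS type in H.265, so the asker's loop can test that value before reading SPS fields from the stripped payload.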
The syntax of H.264 and H.265 is relatively similar.
Both have parameter sets (PPS, SPS); you can find the details in the specifications below.
For H.265, section 7.3 (page 33) describes the video parameter sets in detail.
The specification is written in C-like pseudocode, so it is relatively easy to translate into compiling code.
You can always look at some existing code - for example:
https://github.com/GStreamer/gstreamer/blob/main/subprojects/gst-plugins-bad/gst-libs/gst/codecparsers/gsth265parser.c
The H.264 (AVC) specification is here:
https://www.itu.int/rec/T-REC-H.264-202108-I/en
The H.265 (HEVC) specification is here:
https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-H.265-201802-S!!PDF-E&type=items

The maximum size with bytes of one NAL unit

I can't find any clue in the H.264 spec. Can anybody give a clear and simple maximum size that holds regardless of profile and level?
I'd like to parse an H.264 stream and copy one complete NAL unit into a fixed-size buffer that can hold all the bytes of any NAL unit.
Thanks.
AVC level 6.2 allows up to 139264 macroblocks per frame. With 10-bit 4:4:4 color, that is 30 bits per pixel, so (30*139264*16*16)/8 gives about 133.7 MB for an uncompressed image. H.264 has an I_PCM macroblock type that allows for uncompressed images. There is a little overhead for the NAL header, so let's call it 134 MB. But in the real world the frame probably will not be this large, and will likely be compressed.
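The arithmetic can be checked with a few lines of C. The function name is made up for this sketch, and it assumes every macroblock covers 16x16 pixels with the same bit depth on all three planes, as in the 4:4:4 case above.

```c
#include <stdint.h>

/* Size in bytes of one uncompressed frame made of `macroblocks`
 * 16x16 macroblocks at `bits_per_pixel` bits per pixel. */
static uint64_t raw_frame_bytes(uint64_t macroblocks, unsigned bits_per_pixel)
{
    const uint64_t pixels = macroblocks * 16u * 16u;
    return pixels * bits_per_pixel / 8u;
}
```

Calling `raw_frame_bytes(139264, 30)` gives 133693440 bytes, which is the 133.7 MB figure before NAL header overhead.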
I don't think there is a maximum for video NALs. In Annex B format the NALs are delimited by the start code 0x00 0x00 0x00 0x01, so no size field limits them. In MP4 format the size field can hold more than most computers' RAM. I'd make a reasonable assumption about the buffer size to expect and then reallocate if that size is exceeded.

H.264 payload with no '0x00 0x00 0x00 0x01 0x65'

I'm trying to detect I-frames in a TS by searching for the byte sequence:
0x00 0x00 0x00 0x01 0x65
But it doesn't work on some streams; in some streams this sequence occurs very rarely. Is there any other way of detecting I-frames?
Edit:
I also tried saving the TS to a file and then extracting the H.264 payload. The extracted payload contains only a few 0x00 0x00 0x00 0x01 0x65 byte sequences.
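What you are trying to do looks like a blind guess. The H.264 specification is freely available. 00 00 00 01 is described in the Annex B "Byte stream format" section. Your 65 then maps to section 7.3.1 "NAL unit syntax": the first byte of a NAL unit holds forbidden_zero_bit (1 bit), nal_ref_idc (2 bits), and nal_unit_type (5 bits).
So you can split your byte stream into NAL units correctly and identify why your heuristic is not detecting I-frames. Specifically, by matching 0x65 you are assuming that the two-bit nal_ref_idc is exactly three, while an IDR slice (nal_unit_type 5) may carry any nonzero nal_ref_idc.
Also, slice types are defined by slice_type in the slice header: 0 = P, 1 = B, 2 = I, 3 = SP, 4 = SI, and 5 to 9 repeat the same types in the same order.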
See also:
Possible Locations for Sequence/Picture Parameter Set(s) for H.264 Stream
How to detect I/P/B frame from H264 RTP packet
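To show why matching the literal byte 0x65 misses frames, here is a small C sketch that splits the first NAL byte into its three fields. The struct and function names are invented for this example, and the field names and widths follow section 7.3.1 of the H.264 specification. It only detects IDR slices. An I-frame coded as a non-IDR slice (nal_unit_type 1) can only be recognized by reading slice_type from the slice header, which needs Exp-Golomb decoding and is not shown here.

```c
#include <stdint.h>

/* Fields of the one-byte H.264 NAL unit header. */
typedef struct {
    unsigned forbidden_zero_bit; /* 1 bit  */
    unsigned nal_ref_idc;        /* 2 bits */
    unsigned nal_unit_type;      /* 5 bits, 5 = IDR slice */
} h264_nal_header;

static h264_nal_header parse_h264_nal_header(uint8_t b)
{
    h264_nal_header h;
    h.forbidden_zero_bit = b >> 7;
    h.nal_ref_idc = (b >> 5) & 0x03;
    h.nal_unit_type = b & 0x1F;
    return h;
}

/* True for any IDR slice, whatever nal_ref_idc the encoder chose. */
static int is_idr_slice(uint8_t first_nal_byte)
{
    return (first_nal_byte & 0x1F) == 5;
}
```

With this, 0x65, 0x45 and 0x25 are all recognized as IDR slices (nal_ref_idc 3, 2 and 1 respectively), whereas the byte search only finds the first.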

usage of start code for H264 video

I have a general question about the usage of the start code (0x00 0x00 0x00 0x01) for H.264 video. I am not clear about its usage, as there is no reference to it in the RTP RFCs that relate to H.264 video. But I do see a lot of references to it on the net, particularly on Stack Overflow.
I am confused because I see one client that doesn't have this start code and another client that uses it. So I am looking for a specific answer on where this start code should be used and where it shouldn't.
KMurali
There are two H.264 stream formats, sometimes called:
Annex B (as found in a raw H.264 stream)
AVCC (as found in containers like MP4)
An H.264 stream is made of NALs (a unit of packaging).
(1) Annex B: has a 4-byte start code [x00][x00][x00][x01] before each NAL unit's bytes.
[start code]--[NAL]--[start code]--[NAL] etc.
(2) AVCC: is size-prefixed (meaning each NALU begins with the byte size of this NALU).
[SIZE (4 bytes)]--[NAL]--[SIZE (4 bytes)]--[NAL] etc.
Some notes:
The AVCC (MP4) stream format doesn't contain any NALs of type SPS, PPS or AU delimiter, since that information is placed within the MP4 metadata instead.
The Annex B format is what you'll find in MPEG-2 TS and in the default output of some encoders. RTP is a separate case: the H.264 payload format (RFC 6184) carries NAL units without start codes, which is why the RTP RFCs never mention them.
The AVCC format is what you'll find in MP4, FLV, MKV, AVI and similar A/V container formats.
Both formats can be converted into each other.
Annex B to MP4: remove start codes, insert the length of each NAL, filter out SPS, PPS and AU delimiter.
MP4 to Annex B: remove the length, insert a start code, insert SPS for each I-frame, insert PPS for each frame, insert AU delimiter for each GOP.
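A minimal sketch of the length-to-start-code half of the MP4 to Annex B direction, in C. The function name is made up for this example. It assumes a 4-byte big-endian length prefix, so each prefix can be overwritten in place by a 4-byte start code. It does not insert SPS, PPS or AU delimiters, which would have to come from the container metadata.

```c
#include <stddef.h>
#include <stdint.h>

/* Overwrite each 4-byte big-endian length prefix in buf with the start
 * code 00 00 00 01. Returns the number of NAL units converted, or -1 if
 * a length runs past the end of the buffer (units before the bad one
 * have already been converted at that point). */
static int avcc_to_annexb_inplace(uint8_t *buf, size_t len)
{
    size_t pos = 0;
    int count = 0;

    while (pos < len) {
        if (len - pos < 4)
            return -1; /* truncated length field */

        uint32_t nal_size = ((uint32_t)buf[pos] << 24) |
                            ((uint32_t)buf[pos + 1] << 16) |
                            ((uint32_t)buf[pos + 2] << 8) |
                            (uint32_t)buf[pos + 3];
        if (nal_size > len - pos - 4)
            return -1; /* NAL unit would extend past the buffer */

        buf[pos] = 0x00;
        buf[pos + 1] = 0x00;
        buf[pos + 2] = 0x00;
        buf[pos + 3] = 0x01;

        pos += 4 + (size_t)nal_size;
        count++;
    }
    return count;
}
```

The in-place trick only works because the prefix and the start code have the same size. With 1-byte or 2-byte length fields the output has to go to a separate, larger buffer.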

Little Endian - Memory content/address

Consider a system that has a byte-addressable memory organized in 32-bit words according to the big-endian scheme. A program reads ASCII characters entered at a keyboard and stores them in successive byte locations starting at location 1000.
Show the contents of the two memory words at locations 1000 and 1004 after the name johnson has been entered. Write this in the little-endian scheme.
What I got was:
[NULL, n], [o, s], [n,h], [o,j]
00, 6E 6F, 73 6E, 68 6F, 6A
I just want to know if this is correct and if not, what I did wrong.
Thank you all!
There is no such thing as endianness for storing a single byte (such as an ASCII character). Endianness only comes into play when a value is represented as multiple bytes. So storing a sequence of bytes is the same in little- and big-endian; only the byte order of multi-byte values differs. For example, taking the number 3 735 928 559 (or 0xdeadbeef in hex notation) and storing it as a 32-bit word (e.g., an int) at memory location 1000 will give:
ADR: 1000 1001 1002 1003
BE:  de   ad   be   ef
LE:  ef   be   ad   de
So, if you were to actually represent your ASCII character as a 32-bit word you would get:
[0, 0, 0, 6a], [0, 0, 0, 6f], ... or,
[6a, 0, 0, 0], [6f, 0, 0, 0], ...
for BE and LE respectively.
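The same layout can be reproduced in C without depending on the host machine's own byte order, by writing the bytes out with explicit shifts. The function names here are made up for this sketch.

```c
#include <stdint.h>

/* Store v most significant byte first (big-endian). */
static void store_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Store v least significant byte first (little-endian). */
static void store_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}
```

Storing 0xdeadbeef gives de ad be ef with `store_be32` and ef be ad de with `store_le32`, and storing the single character 'j' (0x6a) as a word gives 00 00 00 6a and 6a 00 00 00 respectively.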
I find this question quite confusing.
A byte is normally defined as the smallest addressable unit, so saying that a machine has byte-addressable memory tells us nothing: every machine has byte-addressable memory because that's the definition of what a byte is. What can change is how many bits are in a byte.
If the question is talking about a 32-bit byte machine (I know they exist, but I have personally used only machines with 8-bit and 16-bit bytes), then it's not clear what role endianness plays, given that no multibyte processing is needed for storing ASCII.
What is often done in large-byte machines, however, is storing multiple characters per byte to save space (a 16-bit byte machine is not necessarily "big": the one I know is a DSP with a very limited amount of memory), but this seems unrelated to the question and there is no "standard" way to do so anyway.
If instead the question assumes that a byte is always 8 bits by definition and talks about storing ASCII chars, then once again endianness plays no role; chars are just stored in memory one after another in consecutive locations. For example, if the string "johnson" has been stored (assuming a C string convention), the content of memory would be:
0x6A 0x6F 0x68 0x6E 0x73 0x6F 0x6E 0x00
Reading this memory content as two 32-bit words would be affected by endian-ness of course, but saying that the machine uses big-endian and asking to display the result in little-endian scheme is nonsense.
In a big endian scheme (e.g. 68k) the two 32-bit words would be 0x6A6F686E and 0x736F6E00, in a little-endian scheme (e.g. x86) they would be 0x6E686F6A and 0x006E6F73.
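Those four word values can be checked with a short C sketch that reads the same eight bytes under each convention. The function names are invented for this example, and the shifts make the result independent of the machine the code runs on.

```c
#include <stdint.h>

/* Interpret 4 consecutive bytes as a big-endian 32-bit word. */
static uint32_t load_be32(const uint8_t p[4])
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Interpret 4 consecutive bytes as a little-endian 32-bit word. */
static uint32_t load_le32(const uint8_t p[4])
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}
```

Applied to the bytes 6A 6F 68 6E 73 6F 6E 00, `load_be32` returns 0x6A6F686E and 0x736F6E00 for the two words, and `load_le32` returns 0x6E686F6A and 0x006E6F73.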