How to understand the header of H.265 compared with H.264

Could someone explain the difference between the H.264 header and the H.265 header? I just need to parse the H.265 header, but I am having difficulty finding a proper reference.
I wrote a first version of the parser. I need to retrieve pic_width_in_luma_samples, pic_height_in_luma_samples, aspectRatioH and aspectRatioV.
My code is something like:
while (buf->Size > 0)
{
    // forbidden bit
    flushbits(buf, 1);
    int nNALType = showbits(buf, 6);
    if (nNALType == NAL_TYPE_SPS)
    {
        // flushbits until I retrieve desired parameter
        flushbits(buf, 4); // sps_video_parameter_set_id
    }
    else
    {
        // align bits
        buf->Size -= buf->BitsLeft & 0x7;
    }
}
Is this the correct way to do it? Is there a way to skip bits until I find a "start sequence" that marks the SPS NAL unit I am looking for?

Follow these steps to parse H.265:
Each NAL unit starts with a 3-byte start code, 00 00 01 (it is often preceded by an extra zero byte, giving 00 00 00 01). Use it to locate each NAL unit.
Parse the NAL unit header (2 bytes).
In the rest of the NAL unit, look for the 3-byte sequence 00 00 03; keep the first 2 bytes (00 00) and discard the 03 byte (the emulation prevention byte).
With the bytes that remain, you can do the parsing, which depends on the NAL unit type you have.
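The header and unescaping steps above can be sketched in C as follows. This is only an illustration under stated assumptions: the names HevcNalHeader, hevc_parse_nal_header and strip_emulation_prevention are hypothetical, and the caller is assumed to have already located the start code and to pass a destination buffer at least as large as the input.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the 2-byte H.265 NAL unit header (hypothetical struct name). */
typedef struct {
    unsigned forbidden_zero_bit;    /* 1 bit, must be 0 */
    unsigned nal_unit_type;         /* 6 bits, 33 is the SPS */
    unsigned nuh_layer_id;          /* 6 bits */
    unsigned nuh_temporal_id_plus1; /* 3 bits */
} HevcNalHeader;

/* Parse the two header bytes that follow the start code. */
static HevcNalHeader hevc_parse_nal_header(const uint8_t *p)
{
    HevcNalHeader h;
    h.forbidden_zero_bit    = (p[0] >> 7) & 0x01u;
    h.nal_unit_type         = (p[0] >> 1) & 0x3Fu;
    h.nuh_layer_id          = ((p[0] & 0x01u) << 5) | (p[1] >> 3);
    h.nuh_temporal_id_plus1 = p[1] & 0x07u;
    return h;
}

/* Copy src to dst, dropping every 03 byte that follows two zero bytes.
   dst must have room for n bytes. Returns the number of bytes written. */
static size_t strip_emulation_prevention(const uint8_t *src, size_t n,
                                         uint8_t *dst)
{
    size_t out = 0;
    size_t zeros = 0; /* consecutive zero bytes seen so far */
    for (size_t i = 0; i < n; i++) {
        if (zeros >= 2 && src[i] == 0x03) {
            zeros = 0; /* skip the emulation prevention byte */
            continue;
        }
        dst[out++] = src[i];
        zeros = (src[i] == 0x00) ? zeros + 1 : 0;
    }
    return out;
}
```

For an SPS the header bytes are typically 42 01, which this sketch decodes as nal_unit_type 33 with nuh_layer_id 0. The bit reader used for the SPS fields should run on the output of strip_emulation_prevention, not on the raw bytes.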

The syntax of H.264 and H.265 is relatively similar.
Both have parameter sets (PPS, SPS); you will find the details in the specifications below.
For H.265, section 7.3 (page 33) describes the parameter sets in detail.
The specification is written in C-like pseudocode, so it is relatively easy to translate it into code that compiles.
You can always look at some existing code - for example:
https://github.com/GStreamer/gstreamer/blob/main/subprojects/gst-plugins-bad/gst-libs/gst/codecparsers/gsth265parser.c
The H.264 (AVC) specification is here:
https://www.itu.int/rec/T-REC-H.264-202108-I/en
The H.265 (HEVC) specification is here:
https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-H.265-201802-S!!PDF-E&type=items
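As an illustration of how the pseudocode translates, the syntax tables in both specifications mark fixed-width fields as u(n) and Exp-Golomb coded fields as ue(v); pic_width_in_luma_samples and pic_height_in_luma_samples are ue(v) fields. A minimal sketch of the two readers is shown below. The names BitReader, read_bits and read_ue are hypothetical, and the input is assumed to already have its emulation prevention bytes removed.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader (hypothetical helper, not from the spec). */
typedef struct {
    const uint8_t *data;
    size_t size_bits; /* total number of bits available */
    size_t pos;       /* index of the next bit to read */
} BitReader;

/* u(n): read n bits (n <= 32) as an unsigned value.
   Bits past the end of the buffer are read as 0. */
static unsigned read_bits(BitReader *br, unsigned n)
{
    unsigned v = 0;
    while (n-- > 0) {
        unsigned bit = 0;
        if (br->pos < br->size_bits)
            bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
        br->pos++;
        v = (v << 1) | bit;
    }
    return v;
}

/* ue(v): count leading zero bits up to the first 1 bit, then read that
   many more bits as the suffix. */
static unsigned read_ue(BitReader *br)
{
    unsigned leading_zeros = 0;
    while (leading_zeros < 31 && br->pos < br->size_bits &&
           read_bits(br, 1) == 0)
        leading_zeros++;
    return (1u << leading_zeros) - 1 + read_bits(br, leading_zeros);
}
```

With these two helpers, each row of a syntax table becomes one call, for example read_bits(&br, 4) for sps_video_parameter_set_id. Fields that are conditional in the table need the same conditions in code, so skipping ahead by a fixed bit count is generally not safe.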

Related

Is it possible to get H.264's Picture Parameter Set (PPS) from Android encoder?

I am encoding bitmap to H.264. Please allow me to skip the code here because other places have excellent descriptions such as this one. It would take up a lot of space. The main idea is configuring MediaCodec to do the encoding.
The encoding appears to work well. The output frames have the following H.264 NAL unit types:
7 (Sequence parameter set)
5 (Coded slice of an IDR picture)
1 (Coded slice of a non-IDR picture)
1
1
1
...
You can see it generates SPS but not PPS. My understanding is that PPS is needed for producing a valid MP4 file.
Is there a way to obtain PPS from the encoder?

Reading / Computing Hex received over RS232

I am using Docklight Scripting to put together a VBScript that communicates with a device via RS232. All the commands are sent in Hex.
When I want to read from the device, I send a 32-bit address, a 16-bit read length, and an 8-bit checksum.
When I want to write to the device, I send a 16-bit data length, the data, followed by an 8-bit checksum.
In Hex, the data that is sent to the device is the following:
AA0001110200060013F81800104D
AA 00 01 11 02 0006 0013F818 0010 4D
(spaced for ease of reading)
AA000111020006 is the protocol header, where:
AA is the Protocol Byte
00 is the Source ID
01 is the Dest ID
11 is the Message Type
02 is the Command Byte
0006 is the Length Byte(s)
The remainder of the string is broken down as follows:
0013F818 is the 32-bit address
0010 is the 16 bit read length
4D is the 8-bit checksum
If the string is not correct, or the checksum is invalid the device replies back with an error string. However, I am not getting an error. The device replies back with the following hex string:
AA0100120200100001000000000100000000000001000029
AA 01 00 12 02 0010 00010000000001000000000000010000 29
(spaced for ease of reading)
Again, the first part of the string (AA01001202) is part of the protocol header, where:
AA is the Protocol Byte
01 is the Source ID
00 is the Dest ID
12 is the Message Type
02 is the Command Byte
The difference between what is sent to the device and what the device replies with is that the length bytes are not a "static" part of the protocol header and will change based on the request. The remainder of the string is broken down as follows:
0010 is the Length Byte(s)
00010000000001000000000000010000 is the data
29 is the 8-bit Check Sum
The goal is to read a timer that is stored in the NVM. The timer is stored in the upper halves of 60 4-byte NVM words.
The instructions specify that I need to read the first two bytes of each word, and then sum the results.
Verbatim, the instructions say:
Read the NVM elapsed timer. The timer is stored in the upper halves of 60 4-byte words.
Read the first two bytes of each word of the timer. Read the 16 bit values of these locations:
13F800H, 13F804H, 13808H, and continue to 13F8ECH.
Sum the results. Multiply the sum by 409.6 seconds, then divide by 3600 to get the results in hours.
My knowledge of bits, and bytes, and all other things is a bit cloudy. The first thing I need to confirm is that I am understanding the read protocol correctly.
I am assuming that when I specify 0010 as the 16 bit read length, that translates to the 16-bit values that the instructions want me to read.
The second thing I need to understand a little better is that when it tells me to read the first two bytes of each word, what exactly constitutes the first two bytes of each word?
I think what confuses me a little more is that the instructions say the timer is stored in the upper half of the 4 byte word (which to me seems like the first half).
I've sat with another colleague of mine for a day trying to figure out how to make this all work, and we haven't had any consistent results with our trials.
I have looked on the internet to find something that would explain this better in the context being used.
Another worry is that the technical data I am using for this project isn't 100% accurate in its instructions: the publication (which is probably close to 1000 pages long) contains conflicting information and skips information in places.
What I would really appreciate is someone who has a much better understanding of hex / binary to review the instructions I've posted, and provide some feedback on my interpretation of the instructions provided, and provide any information.
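The arithmetic in the quoted instructions can be sketched as below. Two things are assumptions rather than facts from the device documentation: that the bytes of each word arrive most significant byte first (so the "upper half" is the first two bytes received), and the function names sum_upper_halves and timer_hours, which are hypothetical. Note also that the reply shown above carries 16 data bytes for a read length of 0010, which suggests the read length counts bytes (four 4-byte words), not bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum the first two bytes of every 4-byte word in the reply data.
   Assumption: most significant byte first within each word. */
static uint32_t sum_upper_halves(const uint8_t *data, size_t n_bytes)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 4 <= n_bytes; i += 4)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    return sum;
}

/* Convert the summed count to hours: each count is 409.6 seconds. */
static double timer_hours(uint32_t sum)
{
    return sum * 409.6 / 3600.0;
}
```

Applied to the 16 data bytes of the reply above, the four upper halves are 0001, 0000, 0000 and 0001, so the partial sum is 2. A full reading would need all 60 words from 13F800H onward, which at 16 bytes per request means several reads.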

validation of single h264 AVC nal unit

I have extracted several NAL units from a hard disk. I want to know which of them are valid NAL units. Is there any tool or code that can validate the structure or syntax of a single H.264 AVC NAL unit?
It depends. First you need to figure out what the NAL type is by the first byte. If the NAL is an SPS or PPS you can basically decode that as-is and see if the result is sane.
If the NAL is an actual coded slice, you will need at least three NALs to decode it. The corresponding SPS, PPS and the coded slice. You can decode the first few elements of the slice header without the SPS and PPS, but then you would need the corresponding SPS and PPS based on the PPS ID in the slice header to continue.
There were some command line tools (maybe h264_parse) that would dump this type of header information for you, or you can hack the reference decoder to help you out.
http://iphome.hhi.de/suehring/tml/
In the end the only way to know if your NAL is "good" is to either match it up with the bitstream you started out with or fully decode it and verify the resulting picture output as bit-exact.
Checking the NAL byte length and maybe a checksum or CRC of each NAL can be helpful too, but no such mechanism exists in the bitstream, you'd have to add that on.
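As a first filter, the one-byte H.264 NAL header can be checked on its own before attempting any deeper decode. The sketch below is only a plausibility test under assumptions: the names AvcNalHeader, avc_parse_nal_header and avc_nal_header_plausible are hypothetical, the input is assumed to start at the header byte (start code already removed), and passing the check does not mean the payload is intact.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the 1-byte H.264 NAL unit header (hypothetical struct name). */
typedef struct {
    unsigned forbidden_zero_bit; /* 1 bit, must be 0 */
    unsigned nal_ref_idc;        /* 2 bits */
    unsigned nal_unit_type;      /* 5 bits: 1 non-IDR slice, 5 IDR slice,
                                    7 SPS, 8 PPS */
} AvcNalHeader;

static AvcNalHeader avc_parse_nal_header(uint8_t b)
{
    AvcNalHeader h;
    h.forbidden_zero_bit = (b >> 7) & 0x01u;
    h.nal_ref_idc        = (b >> 5) & 0x03u;
    h.nal_unit_type      = b & 0x1Fu;
    return h;
}

/* Returns 1 if the header byte passes basic checks, 0 otherwise. */
static int avc_nal_header_plausible(const uint8_t *nal, size_t len)
{
    if (len < 1)
        return 0;
    AvcNalHeader h = avc_parse_nal_header(nal[0]);
    if (h.forbidden_zero_bit != 0)
        return 0;
    /* IDR slices, SPS and PPS must not have nal_ref_idc equal to 0. */
    if ((h.nal_unit_type == 5 || h.nal_unit_type == 7 ||
         h.nal_unit_type == 8) && h.nal_ref_idc == 0)
        return 0;
    return 1;
}
```

A header byte of 67 decodes as an SPS with nal_ref_idc 3 and passes, while 07 (SPS with nal_ref_idc 0) is rejected. Anything beyond this, such as checking slice data, still requires the matching SPS and PPS as described above.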

SPS and PPS (aka dwSequenceHeader) in Media Foundation's H264 encoder

I'm using the H264 encoder from Media Foundation (MFT). I extracted the SPS and PPS from it because I need them for smooth streaming. MSDN says that the number of bytes used for the length field that appears before each NALU can be 1, 2, or 4, all in network byte order. As you can see, the first 4 bytes in the buffer are 0, 0, 0, 1. None of the possible length sizes gives a sensible result: with a 1-byte length field the length is zero, with 2 bytes it is zero again, and with 4 bytes the length of the first NALU is 1, which is not correct. Does anybody know how I should interpret this SPS and PPS concatenated together?
The answer here is simple: the data is valid and formatted according to Annex B, so each NAL unit is prefixed by the start code 00 00 00 01 rather than by a length field.
H.264 extradata (partially) explained - for dummies
Annex B format
In this format, each NAL is preceded by a four-byte start code: 0x00 0x00 0x00 0x01. Thus, in order to know where a NAL starts and where it stops, you would need to read each byte of the bitstream, looking for these start codes, which can be a pain if you need to convert between this format and the other format.
More details are in the H.264 spec, which is freely available for download. Page 326 starts with "Annex B - Byte stream format".
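The byte-by-byte scan described in the quote can be sketched as follows, for example to separate a concatenated SPS and PPS. The names NalSpan, find_start_code and split_annexb are hypothetical. The sketch searches for 00 00 01, which also matches the four-byte form because 00 00 00 01 ends with the same three bytes; the leftover zero byte is trimmed from the end of the preceding NAL unit.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset and length of one NAL unit inside the buffer (hypothetical type). */
typedef struct {
    size_t offset;
    size_t length;
} NalSpan;

/* Return the index of the next 00 00 01 at or after `from`, or n if none. */
static size_t find_start_code(const uint8_t *buf, size_t n, size_t from)
{
    for (size_t i = from; i + 3 <= n; i++)
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
    return n;
}

/* Fill `out` with up to max_out NAL spans. Returns the number found. */
static size_t split_annexb(const uint8_t *buf, size_t n,
                           NalSpan *out, size_t max_out)
{
    size_t count = 0;
    size_t sc = find_start_code(buf, n, 0);
    while (sc < n && count < max_out) {
        size_t start = sc + 3; /* first byte after the start code */
        size_t next = find_start_code(buf, n, start);
        size_t end = next;
        /* Trailing zero bytes belong to the next start code or to padding;
           a NAL unit itself never ends with a zero byte. */
        while (end > start && buf[end - 1] == 0)
            end--;
        out[count].offset = start;
        out[count].length = end - start;
        count++;
        sc = next;
    }
    return count;
}
```

For a buffer holding 00 00 00 01 67 ... 00 00 00 01 68 ..., this yields two spans, the first starting at the 67 byte (SPS) and the second at the 68 byte (PPS).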

Create MDAT from I-frame/P-frame fragments

I am creating an MPEG-4 file from an H.264 stream. The H.264 stream comes in NAL format (e.g. 0,0,0,1,67,...,0,0,1,68,...).
Each video frame is transmitted as multiple I-frame/P-frame fragments. For example, Frame 1 contains approximately 80 I-frame fragments and Frame 2 contains around 10 P-frame fragments.
I understand that MDAT atom of the MPEG-4 file is supposed to contain H.264 streams in NAL format.
I would like to know how these fragments can be converted to a single I-frame before I can put it into MDAT atom of MPEG-4.
I do not want to use any libraries.
Thanks for your help.
You are going to convert H.264 Annex B NAL stream into MP4 file packets. In order to do that you need to:
Split your original file into NAL units ( 00 00 00 01 yy xx xx ... );
Locate frame boundaries: each H.264 frame typically contains a number of slices and optionally one or more of these: SPS, PPS, SEI. You'll need to parse the 'yy' octet above to determine what kind of NAL unit you are looking at. Then, in order to know the boundary of a frame, you will need to parse the first part of each slice, called 'SliceHeader', and compare the 'frame_num' of consecutive slices.
As soon as you know the frame boundaries you can form MP4 packets. Each packet will contain exactly one frame, with its NAL units in this format:
l1 l1 l1 l1 yy xx xx ...
l2 l2 l2 l2 yy xx xx ...
So basically you replace each delimiter '00 00 00 01' with a 4-byte integer holding the length of the NAL unit that follows.
Then, in order to obtain a correct MP4 header, you'll need to use an MP4 muxer and populate a correct 'avcC' atom inside the sample entry of your video track.
This is a rather tedious process but if you want to get into specifics you can study the source code of JCodec ( http://jcodec.org ): org.jcodec.samples.transcode.TranscodeMain , org.jcodec.containers.mp4.MP4Muxer
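The delimiter replacement described above can be sketched in C as below. This is a minimal illustration, not a muxer: the name write_length_prefixed_nal is hypothetical, the length prefix is assumed to be 4 bytes in big-endian order (the size actually used has to match what the 'avcC' atom declares), and the caller is assumed to have already split the stream into NAL units without their start codes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write one NAL unit preceded by its length as a 4-byte big-endian integer.
   dst must have room for nal_len + 4 bytes. Returns the bytes written. */
static size_t write_length_prefixed_nal(uint8_t *dst, const uint8_t *nal,
                                        size_t nal_len)
{
    dst[0] = (uint8_t)(nal_len >> 24);
    dst[1] = (uint8_t)(nal_len >> 16);
    dst[2] = (uint8_t)(nal_len >> 8);
    dst[3] = (uint8_t)nal_len;
    memcpy(dst + 4, nal, nal_len);
    return nal_len + 4;
}
```

Calling this once per slice of a frame, appending each result to the same buffer, produces the payload for one MP4 sample. The fragments of a frame are not merged into a single NAL unit; they stay separate units inside the same sample.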