Why is pts in PES packet zero for h264? - h.264

I detect a new PES packet in my PES demultiplexer by searching for packet_start_code_prefix (0x000001). When it occurs, I can read PES_packet_length and extract the current PES packet from the byte stream. But for an H.264 video stream, the PTS is zero for alternate PES packets. In that case, can I assume those two packets are actually one access unit split across PES packets, and use the previous PES packet's PTS as the PTS of the packet with the zero timestamp?

PTS is optional, so this is not a violation of the specification. If you don't have PTS/DTS, then you can derive it from information in the elementary stream. If we ignore frame reordering for the moment, you can assume that the next frame's DTS(1) is DTS(0) + frame length (the duration of one frame). You can assume one access unit per PES packet.
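As a rough sketch of the idea above (not a full demultiplexer), the following Python reads the PTS_DTS_flags field to tell an absent PTS apart from a real one, then extrapolates missing timestamps by one frame duration. The function names and the 3600-tick frame duration (25 fps at the 90 kHz PES clock) are assumptions for illustration. It also assumes a stream_id that carries the optional PES header, as video streams do, and ignores frame reordering.

```python
PES_START_CODE_PREFIX = b"\x00\x00\x01"


def parse_pes_pts(pes: bytes):
    """Return the 33-bit PTS of a PES packet, or None if no PTS is signalled."""
    if len(pes) < 9 or pes[0:3] != PES_START_CODE_PREFIX:
        raise ValueError("not a PES packet")
    pts_dts_flags = pes[7] >> 6  # '10' = PTS only, '11' = PTS and DTS
    if not (pts_dts_flags & 0b10):
        return None  # PTS is absent, which is different from PTS == 0
    b = pes[9:14]
    if len(b) < 5:
        raise ValueError("truncated PES header")
    # The 33 bits are spread over 5 bytes, separated by marker bits.
    return (
        (((b[0] >> 1) & 0x07) << 30)
        | (b[1] << 22)
        | ((b[2] >> 1) << 15)
        | (b[3] << 7)
        | (b[4] >> 1)
    )


def fill_missing_timestamps(timestamps, frame_ticks):
    """Replace None entries with previous timestamp + one frame duration."""
    filled = []
    for ts in timestamps:
        if ts is None:
            if not filled:
                raise ValueError("no earlier timestamp to extrapolate from")
            ts = (filled[-1] + frame_ticks) % (1 << 33)  # PTS wraps at 33 bits
        filled.append(ts)
    return filled


if __name__ == "__main__":
    # Hypothetical sequence: every second PES packet carries no PTS.
    print(fill_missing_timestamps([9000, None, 16200, None], 3600))
```

If the demultiplexer reports zero whenever the flags say no PTS is present, returning None instead makes the two cases distinguishable.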

Related

usage of start code for H264 video

I have a general question about the usage of the start code (0x00 0x00 0x00 0x01) for H264 video. I am not clear about its usage, as there is no reference to it in the RTP RFCs related to H264 video. But I do see a lot of references to it on the net, particularly on Stack Overflow.
I am confused because I see one client that doesn't use this start code and another client that does. So I am looking for a specific answer on where this start code should be used and where it shouldn't.
KMurali
There are two H.264 stream formats and they are sometimes called
Annex B (as found in raw H.264 stream)
AVCC (as found in containers like MP4)
An H.264 stream is made of NALs (a unit of packaging)
(1) Annex B : has a 4-byte start code before each NAL unit's bytes [x00][x00][x00][x01] (a 3-byte form, [x00][x00][x01], is also allowed).
[start code]--[NAL]--[start code]--[NAL] etc
(2) AVCC : is size prefixed (meaning each NALU begins with byte size of this NALU)
[SIZE (4 bytes)]--[NAL]--[SIZE (4 bytes)]--[NAL] etc
Some notes :
The AVCC (MP4) stream format doesn't contain any NALs of type SPS, PPS or AU delimiter, since that specific information is now placed within MP4 metadata.
The Annex B format you'll find in MPEG-2 TS and in some encoders' default output. RTP (RFC 6184) carries NAL units directly in the packet payload, without start codes.
The AVCC format you'll find in MP4, FLV, MKV, AVI and such A/V container formats.
Both formats can be converted into each other.
Annex B to MP4 : Remove start codes, insert length of NAL, filter out SPS, PPS and AU delimiter.
MP4 to Annex B : Remove length, insert start code, insert SPS for each I-frame, insert PPS for each frame, insert AU delimiter for each GOP.
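A minimal Python sketch of the two conversions, covering only the start-code and length-prefix swap. It does not filter or re-insert SPS, PPS or AU delimiter NALs, and the function names and the default 4-byte length size are assumptions for illustration.

```python
START_CODE_3 = b"\x00\x00\x01"
START_CODE_4 = b"\x00\x00\x00\x01"


def split_annexb(stream: bytes):
    """Split an Annex B byte stream into NAL units with start codes removed."""
    nals = []
    pos = stream.find(START_CODE_3)
    while pos != -1:
        start = pos + 3
        nxt = stream.find(START_CODE_3, start)
        end = len(stream) if nxt == -1 else nxt
        # A NAL unit never ends in 0x00, so trailing zeros here belong to
        # the next (4-byte) start code or to padding and can be dropped.
        nal = stream[start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        pos = nxt
    return nals


def annexb_to_avcc(stream: bytes, length_size: int = 4) -> bytes:
    """Replace each start code with a big-endian NAL length prefix."""
    out = bytearray()
    for nal in split_annexb(stream):
        out += len(nal).to_bytes(length_size, "big") + nal
    return bytes(out)


def avcc_to_annexb(sample: bytes, length_size: int = 4) -> bytes:
    """Replace each length prefix with a 4-byte start code."""
    out = bytearray()
    pos = 0
    while pos + length_size <= len(sample):
        size = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        out += START_CODE_4 + sample[pos:pos + size]
        pos += size
    return bytes(out)
```

Searching for the 3-byte pattern handles both start code lengths, because a 4-byte start code is just the 3-byte one with an extra leading zero.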

validation of single h264 AVC nal unit

I have extracted several nal units from hard disk. I want to know which of them is valid nal unit or not. Is there any tool or code that can validate the structure or syntax of single h264 AVC nal unit.
It depends. First you need to figure out what the NAL type is by the first byte. If the NAL is an SPS or PPS you can basically decode that as-is and see if the result is sane.
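For illustration, the first-byte check could look like the Python sketch below. The helper names are made up for this example, and the plausibility test covers only a few necessary conditions from the NAL header; passing it does not mean the NAL is valid.

```python
def parse_nal_header(nal: bytes) -> dict:
    """Decode the one-byte H.264 NAL unit header."""
    if not nal:
        raise ValueError("empty NAL unit")
    first = nal[0]
    return {
        "forbidden_zero_bit": first >> 7,      # must be 0
        "nal_ref_idc": (first >> 5) & 0x03,    # 0 = not used for reference
        "nal_unit_type": first & 0x1F,         # 5 = IDR slice, 7 = SPS, 8 = PPS
    }


def header_looks_plausible(nal: bytes) -> bool:
    """Cheap sanity check on the header only (necessary, not sufficient)."""
    header = parse_nal_header(nal)
    if header["forbidden_zero_bit"] != 0:
        return False
    if header["nal_unit_type"] == 0:
        return False  # type 0 is unspecified
    if header["nal_unit_type"] == 5 and header["nal_ref_idc"] == 0:
        return False  # an IDR slice is always a reference picture
    return True


if __name__ == "__main__":
    for first_byte in (0x67, 0x68, 0x65, 0x41):
        print(hex(first_byte), parse_nal_header(bytes([first_byte])))
```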
If the NAL is an actual coded slice, you will need at least three NALs to decode it. The corresponding SPS, PPS and the coded slice. You can decode the first few elements of the slice header without the SPS and PPS, but then you would need the corresponding SPS and PPS based on the PPS ID in the slice header to continue.
There were some command line tools (maybe h264_parse) that would dump this type of header information for you, or you can hack the reference decoder to help you out.
http://iphome.hhi.de/suehring/tml/
In the end the only way to know if your NAL is "good" is to either match it up with the bitstream you started out with or fully decode it and verify the resulting picture output as bit-exact.
Checking the NAL byte length and maybe a checksum or CRC of each NAL can be helpful too, but no such mechanism exists in the bitstream, you'd have to add that on.
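Since the bitstream has no such mechanism, a side record is something you would have to create yourself while the data is still known to be good. A small Python sketch, assuming a hypothetical (length, CRC-32) pair stored per NAL:

```python
import zlib


def nal_fingerprint(nal: bytes):
    """Return (byte length, CRC-32) for one NAL unit."""
    return len(nal), zlib.crc32(nal) & 0xFFFFFFFF


def matches_fingerprint(nal: bytes, fingerprint) -> bool:
    """Compare a recovered NAL against a fingerprint recorded earlier."""
    return nal_fingerprint(nal) == fingerprint
```

This only tells you whether the bytes changed; it says nothing about whether they were a valid NAL to begin with.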

SPS and PPS (aka dwSequenceHeader) in Media Foundation's H264 encoder

I'm using the H264 encoder from Media Foundation (MFT). I extracted the SPS and PPS from it, because I need it for smooth streaming. The MSDN says that the number of bytes used for the length field that appears before each NALU can be 1, 2, or 4. This is all in network byte order. As you can see, the first 4 bytes in the buffer are 0, 0, 0, 1. If we apply any of the possible lengths, we will get nothing. If the number of bytes used for length is 1, then the length is zero, if it is 2, the length is zero again. If it is 4, the length of first NALU is 1?! And, that's not correct. Does anybody know how should I interpret this SPS and PPS concatenated together??
The answer here is simple: the data is valid and formatted according to Annex B, with each NAL prefixed by the start code 00 00 00 01 rather than by a length field.
H.264 extradata (partially) explained - for dummies
Annex B format
In this format, each NAL is preceded by a four-byte start code: 0x00 0x00 0x00 0x01. Thus, in order to know where a NAL starts and where it stops, you would need to read each byte of the bitstream, looking for these start codes, which can be a pain if you need to convert between this format and the other format.
More details on H.264 spec - freely available for download. Page 326 starts with "Annex B - Byte stream format".
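To make this concrete, here is a hedged Python sketch that guesses the layout of such a buffer from its first bytes and, for the Annex B case, separates the SPS and PPS. The function names are invented for this example, and the check for a leading 0x01 relies on the configurationVersion field of an avcC record being 1, so treat it as a heuristic.

```python
def extradata_format(buf: bytes) -> str:
    """Guess whether a sequence header is Annex B or an avcC record."""
    if buf[:4] == b"\x00\x00\x00\x01" or buf[:3] == b"\x00\x00\x01":
        return "annexb"
    if buf[:1] == b"\x01":
        return "avcc"  # configurationVersion == 1
    return "unknown"


def split_sps_pps(buf: bytes):
    """Return (sps_list, pps_list) from Annex B formatted extradata."""
    sps, pps = [], []
    # Everything before the first start code is discarded by [1:].
    for chunk in buf.split(b"\x00\x00\x01")[1:]:
        nal = chunk.rstrip(b"\x00")  # drop the next start code's leading zero
        if not nal:
            continue
        nal_type = nal[0] & 0x1F
        if nal_type == 7:
            sps.append(nal)
        elif nal_type == 8:
            pps.append(nal)
    return sps, pps
```

With the buffer from the question, extradata_format would report "annexb", which is why none of the 1, 2 or 4 byte length interpretations give a sensible size.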

Create MDAT from I-frame/P-frame fragments

I am creating an MPEG-4 file from H.264 stream. H.264 stream comes in NAL format (EG: 0,0,0,1,67,...,0,0,1,68,...).
Each video frame is transmitted as multiple I-frame/P-frame fragments. For example, Frame 1 contains approximately 80 I-frame fragments and Frame 2 contains around 10 P-frame fragments.
I understand that MDAT atom of the MPEG-4 file is supposed to contain H.264 streams in NAL format.
I would like to know how these fragments can be converted to a single I-frame before I can put it into MDAT atom of MPEG-4.
I do not want to use any libraries.
Thanks for your help.
You are going to convert H.264 Annex B NAL stream into MP4 file packets. In order to do that you need to:
Split your original file into NAL units ( 00 00 00 01 yy xx xx ... );
Locate frame boundaries: each H.264 frame typically contains a number of slices and optionally one of these: SPS, PPS, SEI. You'll need to parse the 'yy' octet above to determine what kind of NAL unit you are looking at. Now, in order to know the boundary of a frame you will need to parse the first part of each slice, called the slice header, and compare the 'frame_num' of consecutive slices.
As soon as you know the frame boundaries you can form MP4 packets. Each packet will contain exactly one frame, with its NAL units in this format:
l1 l1 l1 l1 yy xx xx ...
l2 l2 l2 l2 yy xx xx ...
so basically you replace each delimiter '00 00 00 01' with an integer holding the length of this NAL unit.
Then in order to obtain correct MP4 header you'll need to use MP4 muxer and populate correct 'AvcC' atom inside of a sample entry of your video track.
This is a rather tedious process but if you want to get into specifics you can study the source code of JCodec ( http://jcodec.org ): org.jcodec.samples.transcode.TranscodeMain , org.jcodec.containers.mp4.MP4Muxer
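The grouping and packing steps can be sketched in Python as follows. Note that this sketch swaps in a simpler boundary test than the frame_num comparison described above: it treats a slice whose first_mb_in_slice is 0 as the first slice of a new picture, which shows up as the top bit of the byte after the NAL header being set. That shortcut assumes slices arrive in order, with no arbitrary slice ordering, redundant pictures or data partitioning. The function names are hypothetical, and the sketch does not move SPS/PPS into the 'AvcC' atom.

```python
VCL_TYPES = {1, 5}               # non-IDR slice, IDR slice
AU_START_TYPES = {6, 7, 8, 9}    # SEI, SPS, PPS, AU delimiter


def group_access_units(nals):
    """Group NAL units (start codes already removed) into frames."""
    units, current, has_vcl = [], [], False
    for nal in nals:
        nal_type = nal[0] & 0x1F
        if nal_type in VCL_TYPES:
            # ue(v) code for 0 is a single '1' bit: first_mb_in_slice == 0
            first_slice = len(nal) > 1 and bool(nal[1] & 0x80)
            starts_new = has_vcl and first_slice
        else:
            # These types after a slice belong to the next frame.
            starts_new = has_vcl and nal_type in AU_START_TYPES
        if starts_new:
            units.append(current)
            current, has_vcl = [], False
        current.append(nal)
        if nal_type in VCL_TYPES:
            has_vcl = True
    if current:
        units.append(current)
    return units


def to_mp4_sample(unit) -> bytes:
    """Concatenate one frame's NAL units, each with a 4-byte length prefix."""
    return b"".join(len(nal).to_bytes(4, "big") + nal for nal in unit)
```

Each element returned by group_access_units becomes one sample in the MDAT; the many fragments of a frame are not merged into a single NAL, they are simply stored back to back inside the same sample.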

how to calculate number of bytes going through network with tcpdump?

I have tcpdump like this
sudo tcpdump tcp -n -i eth0 -w test.dmp
I want to calculate the number of TCP bytes going through eth0. I capture all the packets using tcpdump as above. Is the file size equal to the number of bytes, or does tcpdump add additional information to the dump file?
Yes, tcpdump adds additional information to the file.
It (currently) writes only in pcap format, which means there's a 24-byte header at the beginning of the file, giving information such as the link-layer header type for packets in the file, so the first thing you'd need to do would be to subtract 24 from the size of the file.
In addition, each packet has a 16-byte header giving an arrival time stamp for the packet, the length of the packet, and the number of bytes of packet data that was captured. This means that you would need to subtract 16*{number of packets} from the length - but the only way to get the number of packets is to read the file, so you can't get the number of bytes just by looking at the file size!
Note also that some versions of tcpdump did not default to a "snapshot length" of 0, so the number of bytes of packet data that is captured may be less than the number of packet bytes on the network.
Therefore, what you should do is write a program (use libpcap, as it already knows pcap format and you don't have to write your own code to understand it) that reads all the packets and adds up the "length of the packet" field (it's the len field in the struct pcap_pkthdr structure; do not use caplen, as that's the number of bytes of packet data that was captured) values for all the packets.
You say eth0, so the link-layer header type is probably Ethernet, and there is, for example, no radio meta-data, as might be the case if you were capturing in monitor mode on a Wi-Fi adapter. In the cases where there's extra meta-data in the link-layer header, you'd need to subtract that.
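The answer recommends libpcap; purely as an illustration of the file layout described above, here is a standard-library Python sketch that walks a classic pcap file and adds up the on-the-wire length of every packet. It is not a substitute for libpcap, it does not handle pcapng, and the function name is an assumption. The totals include link-layer headers (14 bytes per packet for plain Ethernet).

```python
import struct

LITTLE_ENDIAN_MAGICS = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")
BIG_ENDIAN_MAGICS = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")


def pcap_byte_counts(data: bytes):
    """Return (packets, captured_bytes, on_wire_bytes) for a pcap file image."""
    magic = data[:4]
    if magic in LITTLE_ENDIAN_MAGICS:
        endian = "<"
    elif magic in BIG_ENDIAN_MAGICS:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    offset = 24  # skip the 24-byte file header
    packets = captured = on_wire = 0
    while offset + 16 <= len(data):
        # Per-packet header: timestamp (2 fields), captured length, original length
        _sec, _frac, incl_len, orig_len = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16 + incl_len
        packets += 1
        captured += incl_len   # corresponds to caplen
        on_wire += orig_len    # corresponds to len
    return packets, captured, on_wire


if __name__ == "__main__":
    with open("test.dmp", "rb") as f:  # file name taken from the question
        print(pcap_byte_counts(f.read()))
```

The file size equals 24 + 16 * packets + captured_bytes, which is why it cannot be turned into a byte count without reading the per-packet headers.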