What will be a quick way to parse bits used per frame for an encoded H264 bitstream in JM? - h.264

If you only want the average number of bits per frame, just take number_of_bits/number_of_frames for the whole stream. For the bits used by an individual frame, you're going to have to decode the actual bitstream.
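As a rough sketch of the averaging approach, assuming you already know the stream size in bytes and the frame count (the function name is made up for illustration and is not part of JM):

```python
def average_bits_per_frame(stream_size_bytes: int, frame_count: int) -> float:
    """Average bits per frame: total bits in the stream divided by frame count."""
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    return stream_size_bytes * 8 / frame_count
```

This only gives the mean. It says nothing about how the bits are distributed between I, P and B frames.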

Related

Identifying frame type from h264 bitstream over RTP

I need to identify each frame type within an H.264 bitstream over RTP. So far, I've been able to identify basically everything that Wireshark can detect from the bitstream, including:
Sequence parameter set
NAL and Fragmentation unit headers, including the nal_ref_idc, and nal type
NAL Unit payload, including the slice_type bit.
From what I understand, nal_ref_idc can be combined with the slice_type bit to identify the slice_type - that is, I, P or B. But I'm struggling to understand how that is identified.
Finally, I'm not sure how to identify the type of the frame from this. At first I thought that the slices were the same as frames, but that isn't the case. How can I tell, or estimate which slices belong to the same frame, and then identify the frame type?
Thanks!
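For what it's worth, here is a minimal sketch of how the slice type can be read from a reassembled NAL unit. In the H.264 syntax, slice_type is not a single bit but an unsigned Exp-Golomb value, ue(v), that follows first_mb_in_slice in the slice header. The class and function names below are made up for illustration. The sketch assumes the RTP fragments (FU-A) have already been reassembled into a complete NAL unit, and it ignores emulation prevention bytes, which normally do not occur this early in a slice header.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read_bit(self) -> int:
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_ue(self) -> int:
        """Decode one unsigned Exp-Golomb value, ue(v)."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        suffix = 0
        for _ in range(leading_zeros):
            suffix = (suffix << 1) | self.read_bit()
        return (1 << leading_zeros) - 1 + suffix


# slice_type values 5..9 mean the same as 0..4, hence the modulo below.
SLICE_TYPE_NAMES = {0: "P", 1: "B", 2: "I", 3: "SP", 4: "SI"}


def describe_nal(nal: bytes) -> dict:
    """Return header fields and, for coded slices (types 1 and 5), the slice type."""
    header = nal[0]
    info = {
        "nal_ref_idc": (header >> 5) & 0x3,
        "nal_unit_type": header & 0x1F,
    }
    if info["nal_unit_type"] in (1, 5):
        reader = BitReader(nal[1:])
        info["first_mb_in_slice"] = reader.read_ue()
        info["slice_type"] = SLICE_TYPE_NAMES[reader.read_ue() % 5]
    return info
```

To group slices into frames, one common heuristic is that a slice with first_mb_in_slice equal to 0 starts a new picture, and that all RTP packets of one access unit share the same RTP timestamp. A frame whose slices are all I slices can then be treated as an I frame, and so on. Streams that use arbitrary slice order would break the first_mb_in_slice heuristic.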

Is a single integer considered an "audio frame" in MediaCodec?

I read the following from the official docs on MediaCodec:
Raw audio buffers contain entire frames of PCM audio data, which is one sample for each channel in channel order. Each PCM audio sample is either a 16 bit signed integer or a float, in native byte order.
https://source.android.com/devices/graphics/arch-sh
The way I read this is that a buffer contains an entire frame of audio but a frame is just one signed integer. This doesn't seem to make sense. Or is this two values for the left and right audio? Why call it a buffer when it only contains a single value? To me, a buffer refers to several values spanning several milliseconds.
Here's what the docs for AudioFormat say:
For linear PCM, an audio frame consists of a set of samples captured at the same time, whose count and channel association are given by the channel mask, and whose sample contents are specified by the encoding. For example, a stereo 16 bit PCM frame consists of two 16 bit linear PCM samples, with a frame size of 4 bytes.
You are right that it doesn't make sense to use a buffer for just one frame. And in practice buffers are filled with many frames.
You can figure out the number of frames in a buffer from the size property of MediaCodec.BufferInfo and the frame size.
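The arithmetic can be sketched like this. It is language-neutral and not Android API code: size_bytes stands in for the size value from MediaCodec.BufferInfo, and the function and parameter names are hypothetical.

```python
def frames_in_buffer(size_bytes: int, channel_count: int, bytes_per_sample: int) -> int:
    """Number of whole PCM frames in a buffer.

    One frame holds one sample per channel, so the frame size is
    channel_count * bytes_per_sample (4 bytes for stereo 16-bit PCM).
    """
    frame_size = channel_count * bytes_per_sample
    return size_bytes // frame_size
```

For example, a 4096-byte buffer of stereo 16-bit PCM holds 4096 / 4 = 1024 frames, which at a 44100 Hz sample rate is roughly 23 ms of audio.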

The maximum size in bytes of one NAL unit

I can't find any clue in the H.264 spec. Can anybody give a clear and simple maximum size, regardless of profile and level?
I'd like to parse an H.264 stream and copy each complete NAL unit into a fixed-size buffer that can hold all bytes of any one NAL unit.
Thanks.
AVC level 6.2 allows up to 139264 macroblocks per frame. With 10-bit 4:4:4 color, that is 30 bits per pixel, so (30*139264*16*16)/8 gives about 133.7 MB for an uncompressed image. H.264 has an I_PCM macroblock mode that allows uncompressed images. There is a little overhead for the NAL header, so let's call it 134 MB. But in the real world the frame probably will not be this large, and will likely be compressed.
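The estimate above can be reproduced with a few lines. The constant and function names are made up for illustration, and the numbers are just the ones from this answer.

```python
MAX_MACROBLOCKS_LEVEL_6_2 = 139264  # macroblocks per frame, as quoted above
MACROBLOCK_SIDE = 16                # a macroblock covers 16x16 luma samples
BITS_PER_PIXEL = 30                 # three 10-bit components at 4:4:4


def uncompressed_frame_bytes(macroblocks: int, bits_per_pixel: int) -> int:
    """Size in bytes of one uncompressed frame with the given macroblock count."""
    pixels = macroblocks * MACROBLOCK_SIDE * MACROBLOCK_SIDE
    return pixels * bits_per_pixel // 8


worst_case = uncompressed_frame_bytes(MAX_MACROBLOCKS_LEVEL_6_2, BITS_PER_PIXEL)
```

Here worst_case comes out to 133693440 bytes, which is the 133.7 MB figure before header overhead.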
I don't think there is a maximum for video NALs. In Annex B format the NALs are delimited by start codes (0x00 0x00 0x00 0x01, or the three-byte 0x00 0x00 0x01), so there is no limit imposed by a size field. In MP4 format the size field can be up to 4 bytes wide, which allows NAL units of up to about 4 GiB. I'd make a reasonable assumption about what buffer size to expect and then reallocate if that size is exceeded.
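If you would rather measure than guess, a minimal sketch of an Annex B splitter could look like the following. The function name is hypothetical, and it assumes the whole stream fits in memory. It searches for the three-byte start code, which also matches the four-byte form, and strips the zero bytes left over at the end of each unit. That is safe because a NAL unit is not allowed to end in 0x00.

```python
from typing import List

START_CODE = b"\x00\x00\x01"


def split_annexb(stream: bytes) -> List[bytes]:
    """Split an Annex B byte stream into NAL units, start codes removed."""
    nals = []
    start = stream.find(START_CODE)
    while start != -1:
        payload_start = start + len(START_CODE)
        next_start = stream.find(START_CODE, payload_start)
        end = len(stream) if next_start == -1 else next_start
        # Trailing zeros are the leading zero of a 4-byte start code or padding.
        nals.append(stream[payload_start:end].rstrip(b"\x00"))
        start = next_start
    return nals
```

Running max(len(n) for n in split_annexb(data)) over a few representative streams gives a starting buffer size. The reallocate-on-overflow fallback is still needed for streams you have not seen.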

Zero-padded h264 in mdat

I'd like to do some stuff with H.264 data recorded from an Android phone.
My colleague told me there should be 4 bytes right after mdat which specify the NALU size, then one byte with NALU metadata, then the raw data, and then (after NALU size bytes) another 4 bytes with the next NALU size, and so on.
But I have a lot of zeros right after mdat:
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0e00000000000000000000000000000000000000000000000000000000000000
8100000000000000000000000000000000000000000000000000000000000000
65b84f0f87fa890001022e7fcc9feef3e7fabb8e0007a34000f2bbefffd07c3c
bfffff08fbfefff04355f8c47bdfd05fd57b1c67c4003e89fe1fe839705a699d
c6532fb7ecacbfffe82d3fefc718d15ffffbc141499731e666f1e4c5cce8732f
bf7eb0a8bd49cd02637007d07d938fd767cae34249773bf4418e893969b8eb2c
Before the mdat atom there are just ftyp mp42, isom mp42 and free atoms. All other atoms (moov, ...) are at the end of the file (that's what Android does when it writes to a socket and not to a file). But if necessary, I've got the PPS and SPS from another file recorded just a second before this one with the same camera and encoder settings.
So how exactly can I get the NALUs from that?
You can't. The moov atom contains information required to parse the mdat, and without it the mdat has little value. For instance, the first NALU does not need to start at the beginning of the mdat; it can start anywhere within it. The byte it starts at is recorded in (I believe) the stco box. If the file has audio, you will find audio and video mixed within the mdat, with no way to tell which is which without the chunk offsets. In addition, if the video has B-frames, there is no way to determine render order without the cts, which again is only available in the moov. And technically, the NALU size field does not need to be 4 bytes, and you can't know its length without the moov.
I recommend not using MP4. Use a streamable container such as TS or FLV. Now, if you can make some assumptions about the code that is producing the file, such as that the chunk offset is always the same and there are no B-frames, you can hard-code these values. But that is not guaranteed to keep working after a software update.
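If you do go the hard-coded route, a minimal sketch might look like this. Everything here is an assumption rather than something read from the file: first_offset (where the first NALU starts inside the mdat payload), length_size (the width of the size field), and the idea that the payload contains only video. The function name is made up for illustration.

```python
from typing import Iterator


def iter_length_prefixed_nalus(
    mdat_payload: bytes, first_offset: int, length_size: int = 4
) -> Iterator[bytes]:
    """Yield NALUs stored as a big-endian length followed by that many bytes."""
    pos = first_offset
    while pos + length_size <= len(mdat_payload):
        nal_len = int.from_bytes(mdat_payload[pos:pos + length_size], "big")
        pos += length_size
        if nal_len == 0 or pos + nal_len > len(mdat_payload):
            # Padding, or data that does not match the assumptions: stop here.
            break
        yield mdat_payload[pos:pos + nal_len]
        pos += nal_len
```

If the assumptions are wrong, the lengths quickly turn into nonsense and the loop stops. That is a useful hint that the offset or the size-field width needs to come from the moov after all.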

validation of single h264 AVC nal unit

I have extracted several NAL units from a hard disk. I want to know which of them are valid NAL units. Is there any tool or code that can validate the structure or syntax of a single H.264 AVC NAL unit?
It depends. First you need to figure out what the NAL type is by the first byte. If the NAL is an SPS or PPS you can basically decode that as-is and see if the result is sane.
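As a first, cheap filter, the header byte itself can be checked against a few constraints from the NAL unit semantics. This is only a sketch with a made-up function name, and passing these checks does not prove the NAL is valid. It only rules out some obviously broken candidates.

```python
from typing import List


def check_nal_header(nal: bytes) -> List[str]:
    """Return a list of problems found in the NAL header (empty if none)."""
    if not nal:
        return ["empty NAL unit"]
    problems = []
    header = nal[0]
    forbidden_zero_bit = header >> 7
    nal_ref_idc = (header >> 5) & 0x3
    nal_unit_type = header & 0x1F
    if forbidden_zero_bit:
        problems.append("forbidden_zero_bit is set")
    if nal_unit_type == 0:
        problems.append("nal_unit_type 0 is unspecified")
    if nal_unit_type == 5 and nal_ref_idc == 0:
        problems.append("IDR slice must have nal_ref_idc != 0")
    if nal_unit_type in (6, 9, 10, 11, 12) and nal_ref_idc != 0:
        problems.append("this nal_unit_type requires nal_ref_idc == 0")
    if nal[-1] == 0:
        problems.append("last byte of a NAL unit must not be 0x00")
    return problems
```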
If the NAL is an actual coded slice, you will need at least three NALs to decode it. The corresponding SPS, PPS and the coded slice. You can decode the first few elements of the slice header without the SPS and PPS, but then you would need the corresponding SPS and PPS based on the PPS ID in the slice header to continue.
There were some command line tools (maybe h264_parse) that would dump this type of header information for you, or you can hack the reference decoder to help you out.
http://iphome.hhi.de/suehring/tml/
In the end the only way to know if your NAL is "good" is to either match it up with the bitstream you started out with or fully decode it and verify the resulting picture output as bit-exact.
Checking the NAL byte length, and maybe a checksum or CRC of each NAL, can be helpful too, but no such mechanism exists in the bitstream; you'd have to add that yourself.
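A minimal sketch of such an add-on check, assuming you can record a fingerprint for each NAL of the original bitstream beforehand (the function names are hypothetical, and CRC-32 via zlib is just one convenient choice):

```python
import zlib
from typing import Tuple


def nal_fingerprint(nal: bytes) -> Tuple[int, int]:
    """Return (byte length, CRC-32) of a NAL unit."""
    return len(nal), zlib.crc32(nal)


def matches_reference(nal: bytes, reference: Tuple[int, int]) -> bool:
    """True if a recovered NAL has the same length and CRC-32 as the recorded one."""
    return nal_fingerprint(nal) == reference
```

A match only shows that the bytes are almost certainly unchanged. It cannot tell you anything about NALs for which no reference fingerprint was recorded.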