Is it possible to get H.264's Picture Parameter Set (PPS) from Android encoder? - android-mediacodec

I am encoding bitmaps to H.264. Please allow me to skip the code here, because other places have excellent descriptions (such as this one) and it would take up a lot of space. The main idea is configuring MediaCodec to do the encoding.
The encoding appears to work well. The output frames have the following H.264 NAL unit types:
7 (Sequence parameter set)
5 (Coded slice of an IDR picture)
1 (Coded slice of a non-IDR picture)
1
1
1
...
As you can see, it generates an SPS but no PPS. My understanding is that a PPS is needed to produce a valid MP4 file.
Is there a way to obtain PPS from the encoder?
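For context, a minimal Java sketch (inspectOutput is a hypothetical helper, not from the question) of where H.264 parameter sets normally surface on Android: the encoder delivers SPS and PPS together in a single buffer flagged BUFFER_FLAG_CODEC_CONFIG, and once the output format is available they are also exposed as "csd-0" (SPS) and "csd-1" (PPS).

import android.media.MediaCodec;
import android.media.MediaFormat;
import java.nio.ByteBuffer;

// Hypothetical helper: call from the encoder's output loop (API 21+).
static void inspectOutput(MediaCodec codec, int outIndex, MediaCodec.BufferInfo info) {
    if ((info.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG) != 0) {
        // This buffer normally carries SPS *and* PPS back to back, each
        // preceded by an Annex B start code; the PPS is NAL unit type 8.
        ByteBuffer config = codec.getOutputBuffer(outIndex);
    }
    // After INFO_OUTPUT_FORMAT_CHANGED, the same data is in the format:
    MediaFormat fmt = codec.getOutputFormat();
    ByteBuffer sps = fmt.getByteBuffer("csd-0"); // SPS for video/avc
    ByteBuffer pps = fmt.getByteBuffer("csd-1"); // PPS for video/avc
}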

Related

Where can I cut a H.264 video stream without re-compressing?

I am trying to cut a H.264 video stream without decoding and re-encoding. To find a cutting point in the video stream:
Do I first detect the I-frame and then capture the video for the desired time?
Am I right, or do I have to look for a combination of I, P, and B frames?
Typically an H.264 bitstream starts with a sequence parameter set (SPS) and a picture parameter set (PPS), followed by an IDR frame, which is then followed by arbitrary other frames (P, B, etc.). The parameter sets are required to initialise the decoder correctly.
Therefore, for each segment you cut to be decodable, it should ideally begin with the parameter sets; however, whether each IDR is preceded by parameter sets depends on both the codec and the codec configuration.
You'll be able to determine your requirements by looking at the NAL unit types of the bitstream you're wanting to cut.
However, it is possible to supply the SPS and PPS to a decoder out of band; in that case it would be able to decode the bitstream starting at an IDR.
You don't have to look for combinations of I, P, B frames, just make sure you have the parameter sets, and your segment begins with an IDR.
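As a rough illustration of "looking at the NAL unit types", here is a minimal Java sketch (assuming an Annex B stream already loaded into memory) that walks the start codes and prints each NAL unit type, which is enough to see whether parameter sets precede each IDR:

// Minimal Annex B scanner: prints the type of every NAL unit.
// Handles both 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes.
static void listNalTypes(byte[] stream) {
    for (int i = 0; i + 3 < stream.length; i++) {
        if (stream[i] != 0 || stream[i + 1] != 0) continue;
        int headerPos = -1;
        if (stream[i + 2] == 1) headerPos = i + 3;         // 3-byte start code
        else if (stream[i + 2] == 0 && i + 4 < stream.length
                 && stream[i + 3] == 1) headerPos = i + 4; // 4-byte start code
        if (headerPos < 0) continue;
        int nalType = stream[headerPos] & 0x1F; // low 5 bits of the NAL header
        System.out.println("NAL type " + nalType
            + (nalType == 7 ? " (SPS)" : nalType == 8 ? " (PPS)"
             : nalType == 5 ? " (IDR)" : ""));
        i = headerPos; // resume the scan after this header
    }
}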

Zero-padded h264 in mdat

I'd like to do some stuff with H.264 data recorded on an Android phone.
My colleague told me there should be 4 bytes right after mdat which specify the NALU size, then one byte with NALU metadata, then the raw data; after the NALU's data, another 4 bytes with the next NALU size, and so on.
But I have a lot of zeros right after mdat:
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0e00000000000000000000000000000000000000000000000000000000000000
8100000000000000000000000000000000000000000000000000000000000000
65b84f0f87fa890001022e7fcc9feef3e7fabb8e0007a34000f2bbefffd07c3c
bfffff08fbfefff04355f8c47bdfd05fd57b1c67c4003e89fe1fe839705a699d
c6532fb7ecacbfffe82d3fefc718d15ffffbc141499731e666f1e4c5cce8732f
bf7eb0a8bd49cd02637007d07d938fd767cae34249773bf4418e893969b8eb2c
Before the mdat atom there are just ftyp mp42, isom mp42 and free atoms. All other atoms (moov, ...) are at the end of the file (that's what Android does when it writes to a socket instead of a file). If necessary, I've got the PPS and SPS from another file recorded a second earlier with the same camera and encoder settings, just to have that PPS and SPS data.
So how exactly can I get NALUs from that?
You can't. The moov atom contains information required to parse the mdat; without it, the mdat has little value. For instance, the first NALU does not need to start at the beginning of the mdat. It can start anywhere within the mdat, and the byte at which it starts is recorded in (I believe) the stco box. If the file has audio, you will find audio and video mixed within the mdat with no way to determine which is which without the chunk offsets. In addition, if the video has B-frames, there is no way to determine render order without the cts, again only available in the moov. And technically, the NALU size field does not need to be 4 bytes, and you can't know that without the moov.
I recommend not using MP4 here; use a streamable container such as TS or FLV. Now, if you can make some assumptions about the code that is producing the file, such as the chunk offset always being the same and there being no B-frames, you can hard-code these values. But that is not guaranteed to work after a software update.
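If you do take the hard-coded-assumptions route, a rough Java sketch of what that looks like (readLengthPrefixedNalus is a made-up helper; the 4-byte length field and the starting offset are exactly the guesses the answer warns about):

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Best-effort NALU extraction from a bare mdat payload.
// ASSUMPTIONS (not verifiable without the moov atom):
//   - every NALU is prefixed by a 4-byte big-endian length
//   - the payload is video only, with the first NALU at 'offset'
static List<byte[]> readLengthPrefixedNalus(byte[] mdat, int offset) {
    List<byte[]> nalus = new ArrayList<>();
    ByteBuffer buf = ByteBuffer.wrap(mdat, offset, mdat.length - offset);
    while (buf.remaining() > 4) {
        int len = buf.getInt(); // assumed 4-byte length prefix
        if (len <= 0 || len > buf.remaining()) break; // padding or wrong guess
        byte[] nalu = new byte[len];
        buf.get(nalu);
        nalus.add(nalu);
    }
    return nalus;
}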

usage of start code for H264 video

I have a general question about the usage of the start code (0x00 0x00 0x00 0x01) for H.264 video. I am not clear about the usage of this start code, as there is no reference to it in the RTP RFCs related to H.264 video, but I do see a lot of references on the net, and particularly on Stack Overflow.
I am confused because one client doesn't use this start code while another client does. So I am looking for a specific answer as to where this start code should be used and where it shouldn't.
There are two H.264 stream formats and they are sometimes called
Annex B (as found in raw H.264 stream)
AVCC (as found in containers like MP4)
An H.264 stream is made of NALs (a unit of packaging)
(1) Annex B: has a start code before each NAL unit's bytes, either 3 bytes [x00][x00][x01] or 4 bytes [x00][x00][x00][x01].
[start code]--[NAL]--[start code]--[NAL] etc
(2) AVCC: is size-prefixed (meaning each NALU is preceded by its byte length)
[SIZE (4 bytes)]--[NAL]--[SIZE (4 bytes)]--[NAL] etc
Some notes:
The AVCC (MP4) stream format doesn't contain any NAL units of type SPS, PPS or AU delimiter, since that information is instead placed in the MP4 metadata (the avcC box).
The Annex B format you'll find in MPEG-2 TS, RTP and some encoders' default output.
The AVCC format you'll find in MP4, FLV, MKV, AVI and such A/V container formats.
Both formats can be converted into each other.
Annex B to MP4: remove the start codes, insert the length of each NAL, filter out SPS, PPS and AU delimiters.
MP4 to Annex B: remove the lengths, insert start codes, insert the SPS before each I-frame, insert the PPS before each frame, insert an AU delimiter for each GOP.
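For the MP4-to-Annex-B direction, a minimal Java sketch (avccToAnnexB is a hypothetical helper; it assumes the common 4-byte length field, though the real field size is recorded in the avcC box and can be 1, 2 or 4 bytes):

import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Rewrites AVCC (length-prefixed) data as Annex B (start-code-prefixed).
static byte[] avccToAnnexB(byte[] avcc) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    ByteBuffer in = ByteBuffer.wrap(avcc);
    byte[] startCode = {0, 0, 0, 1};
    while (in.remaining() > 4) {
        int len = in.getInt(); // assumed 4-byte length field
        if (len <= 0 || len > in.remaining()) break;
        out.write(startCode, 0, 4);
        byte[] nalu = new byte[len];
        in.get(nalu);
        out.write(nalu, 0, len);
    }
    // Note: SPS/PPS from the container metadata still have to be
    // prepended to the result, as described above.
    return out.toByteArray();
}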

validation of single h264 AVC nal unit

I have extracted several NAL units from a hard disk. I want to know which of them are valid NAL units. Is there any tool or code that can validate the structure or syntax of a single H.264 AVC NAL unit?
It depends. First you need to figure out the NAL type from the first byte. If the NAL is an SPS or PPS, you can basically decode it as-is and see if the result is sane.
If the NAL is an actual coded slice, you will need at least three NAL units to decode it: the corresponding SPS, the PPS, and the coded slice itself. You can decode the first few elements of the slice header without the SPS and PPS, but to continue you need to look up the corresponding SPS and PPS via the PPS ID in the slice header.
There were some command line tools (maybe h264_parse) that would dump this type of header information for you, or you can hack the reference decoder to help you out.
http://iphome.hhi.de/suehring/tml/
In the end the only way to know if your NAL is "good" is to either match it up with the bitstream you started out with or fully decode it and verify the resulting picture output as bit-exact.
Checking the NAL byte length and maybe a checksum or CRC of each NAL can be helpful too, but no such mechanism exists in the bitstream; you'd have to add that yourself.
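As an illustration of that first step, a small Java sketch that decodes the one-byte NAL unit header (a set forbidden_zero_bit is the quickest sign that a unit is not valid):

// Decodes the one-byte NAL unit header (H.264/AVC).
static void checkNalHeader(byte header) {
    int forbiddenZeroBit = (header >> 7) & 0x01; // must be 0 in a valid NAL
    int nalRefIdc        = (header >> 5) & 0x03; // 0 means disposable
    int nalUnitType      = header & 0x1F;        // 7 = SPS, 8 = PPS, 5 = IDR slice
    System.out.printf("forbidden=%d ref_idc=%d type=%d%n",
            forbiddenZeroBit, nalRefIdc, nalUnitType);
    if (forbiddenZeroBit != 0) System.out.println("=> not a valid NAL header");
}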

How to encode a stream of RGBA values to video?

More specifically:
I have a sequence of 32-bit unsigned RGBA integers for pixels, e.g. 640 integers per row starting at the left pixel, 480 rows per frame starting at the top row, repeated for n frames. Is there an easy way to feed this to ffmpeg (or some other encoder) without first encoding it to a common image format?
I'm assuming ffmpeg is the best tool for me to use in this case, but I'm open to suggestions (the output video format doesn't matter too much).
I know the documentation would enlighten me if I just knew the right keywords... In case I'm asking the wrong question, here's what I'm trying to do at the highest level:
I have some Actionscript code that draws and animates on the display tree, and I've wrapped it in an AIR application that draws BitmapData frame by frame. AIR has proved to be woefully inefficient at directly encoding this output: the best I've managed is a few frames per second, and I need to render at least 15 fps, preferably more like 100 fps, which I do get out of ffmpeg when I feed it PNG images (AIR can take over a second to encode one 640x480 PNG... appalling). Instead of encoding inside AIR, I can send the raw byte data out to an encoder or to disk as fast as it's rendered.
If you're wondering why I'm using Actionscript to render an animation or why it has to be encoded quickly, don't. Suffice it to say, the frames are computed at execution time (not stored as an animation in a .swf file, for example), I have a very large amount of video to create and limited time to do so, and using something other than Actionscript to produce the frames is not an option.
The solution I've come up with is to use x264 instead of ffmpeg.
For testing purposes, I saved frames as files: 00.bin, 01.bin, .. nn.bin, containing 640x480x4 ARGB pixel values. The command I used to verify that the approach is feasible is the following horrible hack:
cat *.bin | \
perl -e 'while (sysread(STDIN,$d,4)){print pack("N",unpack("V",$d));}' | \
x264 --demuxer raw --input-csp bgra --fps 15 --input-res 640x480 --qp 0 \
--muxer flv -o out.flv -
The ugly Perl snippet in there is a hack to swap the four-byte endian order, since x264 can only take BGRA and my test files contained ARGB.
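For clarity, the same swap expressed in Java (equivalent logic only; the original pipeline uses Perl):

// Equivalent of the Perl one-liner: reverses the byte order of each
// 32-bit pixel, turning ARGB into BGRA in place.
static void argbToBgra(byte[] frame) {
    for (int i = 0; i + 3 < frame.length; i += 4) {
        byte a = frame[i];     // save A
        byte r = frame[i + 1]; // save R
        frame[i]     = frame[i + 3]; // B moves to the front
        frame[i + 1] = frame[i + 2]; // G
        frame[i + 2] = r;
        frame[i + 3] = a;
    }
}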
In a nutshell,
Actionscript renders ARGB values into ByteArray
swap the endian to BGRA
pipe it to x264: raw demuxer, bgra colorspace, specify fps/w/h/quality
??
profit.