Including SPS and PPS in a raw h264 track

Should a raw H.264 video track start with SPS and PPS data so that a player (at least VLC Player) can play it properly?
In this Android question, I see that in one specific context (the MediaCodec decoder in the Android API) the two buffers need to be provided. Is this true in general? Should I provide them as 0,0,0,1,[sps],0,0,0,1,[pps]?
[EDIT]
I've added the SPS and PPS as is, and now VLC Player is willing to play the video. It seems to decode the right number of frames at the right framerate, but the frames are pink and green noise, with motion that seems to follow the motion in the original movie. I feel like some information about the format of my video track data is missing. When I demux an MP4 with FFmpeg, the SPS it writes at the beginning of the raw H.264 stream is richer than mine.
FFMPEG h.264 raw video track stream:
SPS
00 00 00 01 67 42 c0 0d 9a 74 03 c0 11 3f 2e 02 20 00 00 03 00 20 00 00 06 51 e2 85 54
PPS
00 00 00 01 68 ce 3c 80
The rest...
00 00 00 01 65 etc....

Which library are you using to get the stream? For example, if you use live555 to get the stream, you need to put the SPS and PPS information before the frames, because FFmpeg needs this information for decoding.
00 00 00 01 is the start code for the H.264 (Annex B) byte-stream format, so the layout is:
00 00 00 01 sps 00 00 00 01 pps 00 00 00 01 data frame
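
As a rough sketch of that layout in Python (not taken from the original answer): each NAL unit is prefixed with the 4-byte start code and the results are concatenated. The SPS and PPS bytes are the ones from the FFmpeg dump in the question; the function name `to_annex_b` and the one-byte `idr_slice` placeholder are assumptions made only for illustration.

```python
START_CODE = b"\x00\x00\x00\x01"

def to_annex_b(nal_units):
    """Prefix every NAL unit with the start code and concatenate them."""
    return b"".join(START_CODE + nal for nal in nal_units)

# SPS and PPS payloads copied from the FFmpeg dump in the question.
sps = bytes.fromhex(
    "67 42 c0 0d 9a 74 03 c0 11 3f 2e 02 20 00 00 03 00 20 00 00 06 51 e2 85 54"
)
pps = bytes.fromhex("68 ce 3c 80")

# Placeholder only: a real IDR slice carries the coded picture after 0x65.
idr_slice = bytes.fromhex("65")

stream = to_annex_b([sps, pps, idr_slice])
print(stream[:8].hex(" "))
```

The SPS and PPS have to come before the first slice that refers to them, which is why they go at the very start of the stream.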

Related

buffer vs base64 in NodeJs?

What is the difference between base64 and Buffer in Node.js?
"Input passed within the POST body. Supported input methods: raw image binary." What does raw image binary mean here: base64 or Buffer?
buffer: <Buffer 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 00 00 01 77 00 00 01 b6 08 06 00 00 00 84 74 c6 ef 00 00 18 34 69 43 43 50 49 43 43 20 50 72 6f 66 69 ... 275877 more bytes>,
Would this work, for the input format if I passed it to the API?

Base64 is a way of encoding binary data as text: each character carries 6 bits, so every 3 bytes become 4 ASCII characters. It is commonly used to carry binary data such as images inside text formats.
Buffer is Node's type for holding raw bytes in memory. It has no text encoding of its own; an encoding such as utf-8 or base64 only comes into play when you convert between a Buffer and a string (for example when working with sockets or network packets).
Raw image binary means the raw byte values. You may keep them in a Buffer or store them as a base64 string as per your convenience, but an API that asks for raw binary in the POST body most likely expects the bytes themselves (the Buffer), not the base64 text.
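
To make the difference concrete, here is a small sketch in Python rather than Node.js (the idea is the same in both). It takes the first 8 bytes of the Buffer shown in the question, which are the PNG signature, and encodes them as base64 text; the variable names are only illustrative.

```python
import base64

# First 8 bytes of the Buffer in the question (the PNG file signature).
raw = bytes.fromhex("89 50 4e 47 0d 0a 1a 0a")

# Base64 turns those bytes into printable ASCII, 6 bits per character.
text = base64.b64encode(raw).decode("ascii")
print(text)                      # iVBORw0KGgo=
print(len(raw), len(text))       # 8 raw bytes become 12 characters

# Decoding gives back exactly the same raw bytes.
assert base64.b64decode(text) == raw
```

The raw bytes and the base64 string carry the same information, but they are different byte sequences on the wire, so the receiver has to know which one it is getting.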

Raspberry Pi Camera -- extract NAL units from Raspivid

I'm trying to extract NAL units from raw .h264 files generated by Raspivid. I'm piping the output of Raspivid to netcat like so:
Raspivid | nc -u IPaddress Port
I can receive and save the stream on a client. The .h264 file that results actually DOES play in VLC.
However, my ultimate goal is to parse the NAL units out of the file and feed them into MediaCodec on Android. To do this, I need the SPS and PPS data.
The problem is that I'm not finding the corresponding NAL units when examining the Hex output of the generated file. I'm looking for "00 00 00 01 67" for SPS.
All I'm seeing are a ton of "00 00 00 01 21",
"00 00 00 01 27",
"00 00 00 01 28"
etc.
Any idea what I'm doing wrong here?
Edit: I AM using the -ih option on Raspivid so it should be inserting those values regularly.

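Guys on the Pi forums helped me out. I was basing my 67 number on a blog post describing NAL units, but I didn't consider that the header byte can have a different hex value while its last five bits (the NAL unit type) are still 7. So "00 00 00 01 27" is the SPS and "00 00 00 01 28" is the PPS; only the nal_ref_idc bits differ from 67 and 68. Total noob.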
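
A minimal Python sketch of that header split, assuming the standard one-byte H.264 NAL header (1 bit forbidden_zero_bit, 2 bits nal_ref_idc, 5 bits nal_unit_type). The function name and the small lookup table are made up for illustration and only cover the types seen in this thread.

```python
# Only the NAL unit types that show up in this thread.
NAL_TYPE_NAMES = {1: "non-IDR slice", 5: "IDR slice", 7: "SPS", 8: "PPS"}

def parse_nal_header(byte):
    """Split the first byte after the start code into its three fields."""
    return {
        "forbidden_zero_bit": byte >> 7,
        "nal_ref_idc": (byte >> 5) & 0x03,
        "nal_unit_type": byte & 0x1F,
    }

for byte in (0x67, 0x27, 0x28, 0x21):
    header = parse_nal_header(byte)
    name = NAL_TYPE_NAMES.get(header["nal_unit_type"], "other")
    print(f"{byte:02x}: ref_idc={header['nal_ref_idc']} "
          f"type={header['nal_unit_type']} ({name})")
```

So when scanning a stream it is safer to mask the byte with 0x1F and compare the result to 7 or 8 than to search for the literal bytes 67 and 68.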

calculating the encoded framerate in H264

I have a video with an unknown frame rate. I need to calculate the frame rate it was encoded for. I am trying to calculate it using the data in SPS but I cannot decode it.
The bitstream for the NAL is:
67 64 00 1e ac d9 40 a0 2f f9 61 00 00 03 00 7d 00 00 17 6a 0f 16 2d 96
From an online guide (http://www.cardinalpeak.com/blog/the-h-264-sequence-parameter-set/), I could figure out its profile and level fields, but to figure out everything after the "seq_parameter_set_id" field in the table, I need to know the ue(v). Here is where I get confused. According to this page the "ue(v)" should be called with the value v=32? (why?) What exactly should I feed into the exponential-golomb function? Do I read 32 digits from the beginning of the bitstream, or from after the previously read bytes, to regard it as the "seq_parameter_set_id"?
(My ultimate goal is to decode the VUI parameters so that I can recalculate the framerate.)
Thanks!

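ue = unsigned Exponential-Golomb coding.
(v) = variable number of bits. The v is not a value you pass in: the length of each code is determined by the bits themselves (count the leading zero bits, then read that many bits after the first 1). You start reading at the current bit position, right after the fields you have already read.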
http://en.wikipedia.org/wiki/Exponential-Golomb_coding
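
A small Python sketch of reading ue(v) values, applied to the SPS bytes from the question. The `BitReader` class is a hypothetical helper written for this example. It stops after the first two ue(v) fields, and it does not strip emulation prevention bytes (the 03 in each "00 00 03" sequence), which a real parser would have to remove before reading further into the SPS and the VUI.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read_bit(self):
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n):
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self):
        # Count leading zeros, then read that many bits after the 1 bit.
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)

sps = bytes.fromhex(
    "67 64 00 1e ac d9 40 a0 2f f9 61 00 00 03 00 7d 00 00 17 6a 0f 16 2d 96"
)
profile_idc, level_idc = sps[1], sps[3]   # fixed 8-bit fields
reader = BitReader(sps[4:])               # ue(v) fields start after level_idc
seq_parameter_set_id = reader.read_ue()
chroma_format_idc = reader.read_ue()      # present because profile_idc is 100 (High)
print(profile_idc, level_idc, seq_parameter_set_id, chroma_format_idc)
```

The remaining fields are read the same way, in the order given by the SPS syntax table, each one starting where the previous one ended.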

Interpret PNG pixel data

Looking at the PNG specification, it appears that the PNG pixel data chunk starts with IDAT and ends with IEND (slightly clearer explanation here). In the middle are values that don't make sense to me.
How can I get usable RGB values from this, without using any libraries (i.e. from the raw binary file)?
As an example, I made a 2x2px image with 4 black rgb(0,0,0) pixels in Photoshop:
Here's the resulting data (in the raw binary input, the hex values, and the human-readable ASCII):
BINARY HEX ASCII
01001001 49 'I'
01000100 44 'D'
01000001 41 'A'
01010100 54 'T'
01111000 78 'x'
11011010 DA '\xda'
01100010 62 'b'
01100000 60 '`'
01000000 40 '@'
00000110 06 '\x06'
00000000 00 '\x00'
00000000 00 '\x00'
00000000 00 '\x00'
00000000 00 '\x00'
11111111 FF '\xff'
11111111 FF '\xff'
00000011 03 '\x03'
00000000 00 '\x00'
00000000 00 '\x00'
00001110 0E '\x0e'
00000000 00 '\x00'
00000001 01 '\x01'
10000011 83 '\x83'
11010100 D4 '\xd4'
11101100 EC '\xec'
10001110 8E '\x8e'
00000000 00 '\x00'
00000000 00 '\x00'
00000000 00 '\x00'
00000000 00 '\x00'
01001001 49 'I'
01000101 45 'E'
01001110 4E 'N'
01000100 44 'D'

You missed a rather crucial detail in both specifications:
The official one:
.. The IDAT chunk contains the actual image data which is the output stream of the compression algorithm.
[...]
Deflate-compressed datastreams within PNG are stored in the "zlib" format.
Wikipedia:
IDAT contains the image, which may be split among multiple IDAT chunks. Such splitting increases filesize slightly, but makes it possible to generate a PNG in a streaming manner. The IDAT chunk contains the actual image data, which is the output stream of the compression algorithm.
Both state the raw image data is compressed. Looking at your data, the first 2 bytes
78 DA
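contain the compression flags as specified in RFC 1950. The rest of the data is compressed, up to the 4-byte zlib checksum (00 0E 00 01); the 4 bytes after that (83 D4 EC 8E) are the chunk's CRC and not part of the zlib stream.
Decompressing this with a general zlib-compatible routine shows 14 bytes of output: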
00 00 00 00 00 00 00
00 00 00 00 00 00 00
where each first byte is the PNG row filter (0 for both rows), followed by 2 RGB triplets (0,0,0), for the 2 lines of your image.
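
This can be checked with Python's built-in zlib module, used here only to verify the numbers (it does not satisfy the "no libraries" requirement). The input is the bytes from the question between the IDAT tag and the chunk CRC; the width of 2 pixels and 3 bytes per pixel are taken from the description of the image as 2x2 RGB.

```python
import zlib

# Bytes after "IDAT" in the question, up to (not including) the CRC 83 D4 EC 8E.
idat_payload = bytes.fromhex(
    "78 da 62 60 40 06 00 00 00 00 ff ff 03 00 00 0e 00 01"
)

raw = zlib.decompress(idat_payload)
print(len(raw), raw.hex(" "))

# Each row is 1 filter byte followed by width * 3 bytes of RGB data.
width, bytes_per_pixel = 2, 3
stride = 1 + width * bytes_per_pixel
for y in range(len(raw) // stride):
    row = raw[y * stride:(y + 1) * stride]
    filter_type, pixels = row[0], row[1:]
    rgb = [tuple(pixels[i:i + 3]) for i in range(0, len(pixels), 3)]
    print(f"row {y}: filter={filter_type} pixels={rgb}")
```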
"Without using any libraries" you need 3 separate routines to:
1. read and parse the PNG superstructure; this provides the IDAT compressed data, as well as essential information such as width, height, and color depth;
2. decompress the zlib part(s) into raw binary data;
3. parse the decompressed data, handling Adam-7 interlacing if required, and applying row filters.
Only after performing these three steps will you have access to the raw image data. Of these, you seem to have a good grasp of step (1). Step (2) is way harder to "do" yourself; personally, I cheated and used miniz in my own PNG handling programs. Step (3), again, is merely a question of determination. All the necessary bits of information can be found on the web, but it takes a while to put everything in the right order. (Just recently I found an error in my implementation of the Paeth row filter; it had gone unnoticed because that filter is fairly rarely used in 'real world' images.)
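
For step (3), a minimal Python sketch of undoing the five PNG row filters (None, Sub, Up, Average, Paeth). It assumes a non-interlaced image with 8 bits per sample, so it does not handle Adam-7 or other bit depths; the function names are mine, not from any library.

```python
def paeth(a, b, c):
    """Paeth predictor: a = left, b = above, c = upper left."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c

def unfilter(data, width, bpp):
    """Turn decompressed IDAT data into a list of unfiltered rows."""
    stride = width * bpp
    prev = bytearray(stride)          # the row above the first row is all zeros
    rows = []
    for pos in range(0, len(data), stride + 1):
        filter_type = data[pos]
        cur = bytearray(data[pos + 1:pos + 1 + stride])
        for i in range(stride):
            a = cur[i - bpp] if i >= bpp else 0    # left (already unfiltered)
            b = prev[i]                            # above
            c = prev[i - bpp] if i >= bpp else 0   # upper left
            if filter_type == 0:
                pred = 0
            elif filter_type == 1:
                pred = a
            elif filter_type == 2:
                pred = b
            elif filter_type == 3:
                pred = (a + b) // 2
            elif filter_type == 4:
                pred = paeth(a, b, c)
            else:
                raise ValueError(f"unknown filter type {filter_type}")
            cur[i] = (cur[i] + pred) & 0xFF
        rows.append(bytes(cur))
        prev = cur
    return rows

# The 14 zero bytes from the 2x2 black image: two rows, filter 0, RGB.
print(unfilter(bytes(14), width=2, bpp=3))
```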
See "Building a fast PNG encoder issues" for a similar discussion and "Trying to understand zlib/deflate in PNG files" for an in-depth look into the Deflate scheme.

questions about hex code shown in hex editor

I am stuck on the HEX (Intel HEX) format which I see in a hex editor (Hex Editor Neo).
OK, I know hex, decimal, and binary, and how to add, multiply, and convert them.
Example: sample.jpg
(this is a JPG file I opened with Hex Editor Neo in hex view with 4 columns)
ff d8 ff e0
00 10 4a 46
49 46 00 01
01 01 00 48
00 48 00 00
ff db 00 43
00 05 03 04
04 04 03 05
04 04 04 05
05 05 06 07
0c 08 07 07
07 07 0f 0b
0b 09 0c 11
This is the kind of hex code I see (these are just some of the rows from the whole file).
I am interested in what the bytes mean.
I know ff d8 ff e0 tells you it is a JPG.
I know a JPG ends with ff d9.
I want to know about the other codes. I mean, why are they there? They must have some meaning. How does the conversion from picture to hex take place?
What is meant by "4a 46 49 46 00" and the many other values present there?

Some of it will be standard header information and a lot of it will be picture data. For example, "4a 46 49 46 00" is the ASCII text "JFIF" followed by a zero byte, which identifies the JFIF header segment. Don't forget that this is a binary file, and as part of the compression algorithm the picture will be converted into a different type of binary encoding from what it originally was. I doubt you would be able to tell what the picture looks like by reading the binary data that relates to it :)
You can read all about the JPEG standard here.
BTW, hex is just a means of representing the binary data in a more easily understood form than binary. The data is the same: it's binary data. If you opened the file in an editor supporting octal it would look different again.
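
To show how the header bytes break down, here is a rough Python sketch that parses the first 24 bytes of the dump in the question, assuming the usual JFIF layout (SOI marker, then an APP0 segment holding the "JFIF" identifier, version, density units, densities, and thumbnail size). The function name and dictionary keys are made up for illustration.

```python
import struct

def parse_jfif_header(data):
    """Parse the SOI marker and the JFIF APP0 segment at the start of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker (ff d8)")
    marker, length = struct.unpack(">HH", data[2:6])
    if marker != 0xFFE0:
        raise ValueError("expected APP0 marker (ff e0)")
    # The segment length counts its own 2 bytes, leaving 14 bytes of fields.
    ident, major, minor, units, x_density, y_density, thumb_w, thumb_h = \
        struct.unpack(">5sBBBHHBB", data[6:20])
    next_offset = 4 + length
    next_marker = struct.unpack(">H", data[next_offset:next_offset + 2])[0]
    return {
        "identifier": ident,              # b"JFIF\x00"
        "version": (major, minor),
        "units": units,                   # 1 means dots per inch
        "density": (x_density, y_density),
        "thumbnail": (thumb_w, thumb_h),
        "next_marker": next_marker,       # 0xFFDB starts a quantization table
    }

# First six rows of the hex dump in the question.
dump = bytes.fromhex(
    "ff d8 ff e0 00 10 4a 46 49 46 00 01 01 01 00 48 00 48 00 00 ff db 00 43"
)
print(parse_jfif_header(dump))
```

Read this way, the dump says: JFIF version 1.01, 72 x 72 dots per inch, no thumbnail, followed by a quantization table segment (ff db) that is 0x43 = 67 bytes long. The rows after that are the table values themselves, and the compressed picture data comes later in the file.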
