More specifically:
I have a sequence of 32-bit unsigned RGBA integers for pixels, e.g. 640 integers per row starting at the left pixel, 480 rows per frame starting at the top row, repeated for n frames. Is there an easy way to feed this to ffmpeg (or some other encoder) without first encoding it to a common image format?
I'm assuming ffmpeg is the best tool for me to use in this case, but I'm open to suggestions (the output video format doesn't matter too much).
I know the documentation would enlighten me if I just knew the right keywords... In case I'm asking the wrong question, here's what I'm trying to do at the highest level:
I have some Actionscript code that draws and animates on the display tree, and I've wrapped it in an AIR application that draws BitmapData frame by frame. AIR has proved to be woefully inefficient at encoding this output directly: the best I've managed is a few frames per second, and I need to render at least 15 fps, preferably more like 100 fps, which is what I get out of ffmpeg when I feed it PNG images (AIR can take more than a second to encode one 640x480 PNG... appalling). Instead of encoding inside AIR, I can send the raw byte data out to an encoder or to disk as fast as it's rendered.
If you're wondering why I'm using Actionscript to render an animation or why it has to be encoded quickly, don't. Suffice it to say, the frames are computed at execution time (not stored as an animation in a .swf file, for example), I have a very large amount of video to create and limited time to do so, and using something other than Actionscript to produce the frames is not an option.
The solution I've come up with is to use x264 instead of ffmpeg.
For testing purposes, I saved frames as files 00.bin, 01.bin, .. nn.bin, each containing 640x480 ARGB pixel values at 4 bytes per pixel. The command I used to verify that the approach is feasible is the following horrible hack:
cat *.bin | \
perl -e 'while (sysread(STDIN,$d,4)){print pack("N",unpack("V",$d));}' | \
x264 --demuxer raw --input-csp bgra --fps 15 --input-res 640x480 --qp 0 \
--muxer flv -o out.flv -
The ugly perl snippet in there is a hack that reverses the byte order of each four-byte pixel (it reads each one as little-endian and writes it back as big-endian), since x264 takes BGRA but not ARGB, and my test files contained ARGB.
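The same per-pixel byte reversal can be sketched in Python, for example to check converted frames or to replace the perl stage. This is a minimal sketch, assuming tightly packed 4-byte ARGB pixels and 640x480 frames as in the test files; `argb_to_bgra` and `convert_stream` are hypothetical helper names, not part of x264 or AIR.

```python
from typing import BinaryIO

# Assumption: 640x480 frames, 4 bytes per pixel, as in the test .bin files.
FRAME_BYTES = 640 * 480 * 4


def argb_to_bgra(data: bytes) -> bytes:
    """Reverse the byte order of every 4-byte pixel: A,R,G,B -> B,G,R,A."""
    if len(data) % 4:
        raise ValueError("input length must be a multiple of 4")
    out = bytearray(len(data))
    out[0::4] = data[3::4]  # B
    out[1::4] = data[2::4]  # G
    out[2::4] = data[1::4]  # R
    out[3::4] = data[0::4]  # A
    return bytes(out)


def convert_stream(src: BinaryIO, dst: BinaryIO, frame_bytes: int = FRAME_BYTES) -> int:
    """Copy src to dst one frame at a time, swapping every pixel; return the frame count."""
    frames = 0
    while True:
        chunk = src.read(frame_bytes)
        if not chunk:
            break
        dst.write(argb_to_bgra(chunk))
        frames += 1
    return frames
```

As a pipeline stage, a small script calling `convert_stream(sys.stdin.buffer, sys.stdout.buffer)` could sit between `cat *.bin` and the x264 command. Swapping the bytes in Actionscript before writing them out would avoid the extra stage altogether.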
In a nutshell,
Actionscript renders ARGB values into a ByteArray
swap the byte order of each pixel to get BGRA
pipe it to x264: raw demuxer, bgra colorspace, specify fps/width/height/quality
??
profit.
Related
I am encoding bitmaps to H.264. Please allow me to skip the code here, because other places have excellent descriptions, such as this one, and it would take up a lot of space. The main idea is configuring MediaCodec to do the encoding.
The encoding appears to work well. The output frames have the following H.264 NAL unit types:
7 (Sequence parameter set)
5 (Coded slice of an IDR picture)
1 (Coded slice of a non-IDR picture)
1
1
1
...
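For reference, the numbers in that list are the nal_unit_type field: the low five bits of the first byte after each start code. The sketch below assumes the encoder output is an Annex B byte stream (00 00 01 or 00 00 00 01 start codes); `nal_unit_types` is a hypothetical helper. Because it scans the whole buffer instead of reading only the first NAL header, it would also report a PPS (type 8) if one sits in the same buffer right after the SPS.

```python
from typing import List


def nal_unit_types(stream: bytes) -> List[int]:
    """Return the nal_unit_type of every NAL unit in an Annex B byte stream."""
    types = []
    i = 0
    n = len(stream)
    while True:
        # A 4-byte start code is just 0x00 followed by the 3-byte one,
        # so searching for 00 00 01 finds both forms.
        i = stream.find(b"\x00\x00\x01", i)
        if i < 0 or i + 3 >= n:
            break
        header = stream[i + 3]
        types.append(header & 0x1F)  # low 5 bits = nal_unit_type
        i += 3
    return types
```

Emulation prevention bytes in H.264 keep 00 00 01 from appearing inside a NAL unit's payload, which is what makes a plain byte search like this workable.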
You can see it generates SPS but not PPS. My understanding is that PPS is needed for producing a valid MP4 file.
Is there a way to obtain PPS from the encoder?
I would like to know how one could generate a wavetable from, for example, a WAV file.
I know a wavetable can be used in the Web Audio API with setPeriodicWave, and I know how to use it.
But what do I need to do to create my own wavetables? I read about the inverse FFT, but I found almost nothing. I don't need any code, just an idea or a formula for how to get the wavetable from a WAV file into a buffer.
There are a few constraints here and I'm not sure how good the result will be.
Your WAV file source can't be too long; the PeriodicWave object only supports arrays up to a size of 8192 or so.
I'm going to assume your waveform is intended to be periodic. If the last sample and the first aren't reasonably close to each other, there will be a jump that the Fourier series can reproduce only poorly.
The waveform must have zero mean, so if it doesn't, you should remove the mean.
With that taken care of, select a power of two greater than or equal to the length of your wave file (not strictly needed, but most FFTs expect powers of two). Zero-pad the wave file if its length is not a power of two; keep in mind that the padding then becomes part of every cycle.
Then compute the FFT. You'll get either an array of complex numbers or two arrays. Separate these into real and imaginary arrays and use them to construct the PeriodicWave.
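The steps above can be sketched as follows. This is a minimal sketch, assuming `cycle` already holds exactly one period of samples (decoded from the WAV file by some other means) with a power-of-two length. It uses a textbook recursive radix-2 FFT only to stay self-contained; any FFT library would do. `fft` and `periodic_wave_coeffs` are hypothetical names. The imaginary part is negated because createPeriodicWave's imag array holds sine coefficients, while the FFT's exp(-i...) convention gives their negatives.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def periodic_wave_coeffs(cycle: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (real, imag) cosine/sine coefficients for one waveform cycle."""
    n = len(cycle)
    mean = sum(cycle) / n
    spectrum = fft([s - mean for s in cycle])  # remove the DC offset first
    half = n // 2
    real = [0.0] * (half + 1)  # index 0 (DC) stays 0; Web Audio ignores it anyway
    imag = [0.0] * (half + 1)
    for k in range(1, half + 1):
        scale = 1.0 / n if k == half else 2.0 / n  # Nyquist bin is not doubled
        real[k] = scale * spectrum[k].real
        imag[k] = -scale * spectrum[k].imag  # FFT sign convention -> sine terms
    return real, imag
```

In the browser, the arrays would then be passed as `audioCtx.createPeriodicWave(new Float32Array(real), new Float32Array(imag))`. The Web Audio API normalizes the waveform by default, so the 2/N scaling mainly matters if normalization is disabled.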
I'd like to do some stuff with h.264 data recorded from Android phone.
My colleague told me there should be 4 bytes right after mdat which specify the NALU size, then one byte with NALU metadata (the NAL header), then the raw data, and then, after NALU-size bytes, another 4 bytes with the next NALU size, and so on.
But I have a lot of zeros right after mdat:
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0e00000000000000000000000000000000000000000000000000000000000000
8100000000000000000000000000000000000000000000000000000000000000
65b84f0f87fa890001022e7fcc9feef3e7fabb8e0007a34000f2bbefffd07c3c
bfffff08fbfefff04355f8c47bdfd05fd57b1c67c4003e89fe1fe839705a699d
c6532fb7ecacbfffe82d3fefc718d15ffffbc141499731e666f1e4c5cce8732f
bf7eb0a8bd49cd02637007d07d938fd767cae34249773bf4418e893969b8eb2c
Before the mdat atom there are only the ftyp (mp42, isom mp42) and free atoms. All the other atoms (moov, ...) are at the end of the file (that's what Android does when it writes to a socket rather than to a file). But if necessary, I've got the PPS and SPS from another file, recorded just a second before this one with the same camera and encoder settings, just to get that PPS and SPS data.
So how exactly can I get NALUs from that?
You can't. The moov atom contains information required to parse the mdat; without it, the mdat has little value. For instance, the first NALU does not need to start at the beginning of the mdat; it can start anywhere within it. The byte it starts at is recorded in (I believe) the stco box. If the file has audio, you will find audio and video interleaved within mdat, with no way to determine which is which without the chunk offsets. In addition, if the video has B-frames, there is no way to determine render order without the CTS (composition timestamps), again only available in the moov. And technically, the NALU size field does not need to be 4 bytes, and you can't know its width without the moov either. I recommend not using MP4; use a streamable container such as TS or FLV instead. Now, if you can make some assumptions about the code that is producing the file, like the chunk offset always being the same and there being no B-frames, you can hard-code these values. But that is not guaranteed to keep working after a software update.
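If you do hard-code such assumptions, the layout the colleague described (a length field, then the NAL unit starting with its one-byte header) can be walked as sketched below. This is a minimal sketch, assuming video-only data, a known start offset inside mdat, and 4-byte big-endian lengths; `iter_length_prefixed_nalus` is a hypothetical helper, and the offset has to come from your own inspection of the recordings, since without moov nothing in the file says where the first sample starts.

```python
from typing import Iterator, Tuple


def iter_length_prefixed_nalus(
    data: bytes, offset: int, length_size: int = 4
) -> Iterator[Tuple[int, bytes]]:
    """Yield (nal_unit_type, nal_unit) from length-prefixed (AVCC-style) data.

    offset and length_size are assumptions: normally they come from the
    stco and avcC boxes inside moov, which this file does not have yet.
    """
    pos = offset
    end = len(data)
    while pos + length_size <= end:
        size = int.from_bytes(data[pos:pos + length_size], "big")
        pos += length_size
        if size == 0 or pos + size > end:
            break  # zero padding or truncated data: stop instead of guessing
        nal = data[pos:pos + size]
        yield nal[0] & 0x1F, nal  # low 5 bits of the header = nal_unit_type
        pos += size
```

Starting at the wrong offset (for example at the zeros right after the mdat header) makes the first length read as 0, so the walk stops immediately rather than producing garbage.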
I have recently been reading the x264 source code, mostly the rate-control (RC) part, and I am confused about the parameters --bitrate and --vbv-maxrate. When bitrate is set, CBR mode is used at the frame level. To enable MB-level RC, the parameters bitrate, vbv-maxrate and vbv-bufsize should all be set. But I don't understand the relationship between bitrate and vbv-maxrate. What determines the actual encoding result when both bitrate and vbv-maxrate are set?
And what is the recommended value for bitrate? Equal to vbv-maxrate?
Also what is the recommended value for vbv-bufsize? Half of vbv-maxrate?
Please give me some advice.
bitrate addresses the "target file size" when you are encoding. It is understandably confusing, because it applies a "budget" of a certain size and then tries to apportion this budget among the frames; that is why later parts of a movie can get a smaller amount of data, which results in lower video quality. For example, if you have 10 seconds of completely black images followed by 10 seconds of natural video, the final encoded file will be very different than if the order were reversed.
vbv-bufsize is the size of the buffer that has to be filled before a "transmission" would occur, say in a streaming scenario. Now, let's tie this to I-frames and P-frames: vbv-bufsize limits the size of any single encoded frame, most likely the I-frames.
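To make the buffer idea concrete, here is a deliberately simplified leaky-bucket model of how vbv-maxrate and vbv-bufsize interact. It is an illustration, not x264's rate-control code: it assumes the buffer refills at vbv-maxrate/fps per frame, never holds more than vbv-bufsize, and starts `init_fill` full (0.9 here, intended to mirror x264's --vbv-init default, but treat that as an assumption). `vbv_underflows` is a hypothetical name.

```python
from typing import List, Sequence


def vbv_underflows(
    frame_sizes_kbit: Sequence[float],
    maxrate_kbps: float,
    bufsize_kbit: float,
    fps: float,
    init_fill: float = 0.9,
) -> List[int]:
    """Return indices of frames that would not fit in a simplified VBV buffer."""
    fullness = bufsize_kbit * init_fill
    refill_per_frame = maxrate_kbps / fps  # data arriving during one frame interval
    violations = []
    for i, size in enumerate(frame_sizes_kbit):
        if size > fullness:
            violations.append(i)  # the decoder would run out of data here
        fullness = max(fullness - size, 0.0)
        fullness = min(fullness + refill_per_frame, bufsize_kbit)  # capped by bufsize
    return violations
```

A frame larger than vbv-bufsize can never fit, which is why the buffer size effectively caps individual frame sizes (usually the I-frames), while vbv-maxrate controls how quickly that capacity comes back after a large frame.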
I am a programmer and not a good mathematician, so FFT is something of a black box to me. I would like to throw some data into an FFT library and get out plottable AFR (amplitude-frequency response) data, like software such as RightMark Audio does:
http://www.ixbt.com/proaudio/behringer/3031a/fr-hf.png
Now I have a system which plays back a logarithmic swept sine (with short fade-in/fade-out to avoid sharp edges) and records the response from the audio system.
As far as I understand, I need to pad the input with zeros to 2^n, use the audio samples as the real parts of complex numbers, set imaginary = 0, and the FFT will give me back an array of frequency bins half the length of the input data.
But if I don't need the high frequency resolution that a several-second audio buffer gives me, what is the right way to use, let's say, a 1024-point FFT window, feed it chunks of audio, and get back 512 frequency points that take into account all the data I passed in? Or is that not possible, and I need to feed the entire swept sine at once to get all the AFR data I need?
Also, is any smoothing needed? I have seen that the raw output from an FFT can be really noisy. What is the right way to avoid the noise as early as possible, so that I see only the noise that comes from the AFR itself and not from the FFT calculations (the plot in the link I gave seems pretty smooth)?
I am a C++/C# programmer. I would be grateful for any examples that show how to process chunks of a swept sine and get back AFR data. So far I have found only examples that process data in small chunks in real time, and that is not what I need.
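One common way to get a fixed number of frequency points out of a long recording is to cut both the played sweep and the recorded response into overlapping windowed chunks, FFT each chunk, and accumulate the spectra before dividing (a Welch-style average with the H1 estimator H = Sxy / Sxx). This is a minimal sketch, assuming the stimulus and response are time-aligned, equally long sample arrays at the same rate; the Hann window, the 50% overlap and all function names are illustrative choices, not something the question or the answer specifies. The FFT is a textbook radix-2 recursion, included only to keep the example self-contained.

```python
import cmath
import math
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def hann(n: int) -> List[float]:
    """Periodic Hann window, one common choice of w(n)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def averaged_response_db(
    stimulus: Sequence[float], response: Sequence[float], nfft: int = 1024
) -> List[float]:
    """Estimate |H(f)| in dB for bins 0..nfft/2 using 50%-overlapping segments.

    Cross- and auto-spectra are summed over every segment before dividing,
    so each chunk of the sweep contributes to the bins it excited.
    """
    w = hann(nfft)
    hop = nfft // 2
    half = nfft // 2 + 1
    sxx = [0.0] * half
    sxy = [0j] * half
    for start in range(0, len(stimulus) - nfft + 1, hop):
        xs = fft([stimulus[start + i] * w[i] for i in range(nfft)])
        ys = fft([response[start + i] * w[i] for i in range(nfft)])
        for k in range(half):
            sxx[k] += abs(xs[k]) ** 2
            sxy[k] += xs[k].conjugate() * ys[k]
    eps = 1e-20  # avoids division by zero / log of zero in bins the sweep never excited
    return [20 * math.log10(abs(sxy[k]) / (sxx[k] + eps) + eps) for k in range(half)]
```

Bin k corresponds to k * sample_rate / nfft Hz, so nfft = 1024 yields 513 points from DC to Nyquist. Averaging over many segments also reduces the noise in the estimate compared with a single raw FFT.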
A window function should help you reduce the noise (strictly speaking, the spectral leakage that makes raw FFT output look noisy).
All you need to do is multiply your input data by w(n):