H264 decode always has one frame of latency - h.264

The encoder encodes a video (IDR P P P ...) and sends it to a decoder that displays it in real time, and I noticed that when the encoder sends the i-th frame, the decoder displays the (i-1)-th frame.
I know a P frame needs to be removed from the reference list so it can be removed from the decoded picture buffer, but that would cause the next P frame's decode to fail.
Is there a way to fix the one-frame delay?
Because one frame of delay is about 33 ms of latency at 30 fps, which is unacceptable.

When you use 30 fps, the time budget for each frame, be it I or P, is 1000 ms / 30 ≈ 33 ms. Encoding, transmitting, and decoding a frame cannot possibly take less than that 33 ms; it probably takes more than that, maybe even twice as much.
If you need less latency, increase the frame rate.

Related

Optimizing h264 MediaFoundation encoding

I'm writing a screen capture application that uses the UWP Screen Capture API.
As a result, I get a callback each frame with an ID3D11Texture2D containing the target screen or application's image, which I want to use as input to MediaFoundation's SinkWriter to create an H.264 video in an MP4 container file.
There are two issues I am facing:
The texture is not CPU readable and attempts to map it fail
The texture has padding (image stride > pixel width * pixel format size), I assume
To solve them, my approach has been to:
Use ID3D11DeviceContext::CopyResource to copy the original texture into a new one created with D3D11_USAGE_STAGING and D3D11_CPU_ACCESS_READ flags
Since that texture too has padding, create an IMFMediaBuffer wrapping it with MFCreateDXGISurfaceBuffer, cast it to IMF2DBuffer, and use IMF2DBuffer::ContiguousCopyTo to copy into another IMFMediaBuffer that I create with MFCreateMemoryBuffer
So I'm basically copying each and every frame twice, once on the GPU and once on the CPU, which works, but seems very inefficient.
What is a better way of doing this? Is MediaFoundation smart enough to deal with input frames that have padding?
The inefficiency comes from your attempt to Map the texture rather than use it directly as video encoder input. The MF H.264 encoder, which is in most cases a hardware encoder, can take video-memory-backed textures as direct input, and this is what you want to do (set up the encoder accordingly - see the D3D/DXGI device managers).
Padding does not apply to texture frames. In the case of padding in frames carrying traditional system-memory data, Media Foundation primitives are normally capable of processing data with padding: see Image Stride and MF_MT_DEFAULT_STRIDE, and the other Video Format Attributes.

Can't Get DirectShow MPEG-2 Decoder to Output YV12/NV12 Progressive

I am attempting to get the MPEG-2 Decoder (aka DTV-DVD Video Decoder) to give me progressive YV12 or NV12 frames that can be uploaded to OpenGL for rendering. But what I'm seeing rendered looks like some form of uncompressed adaptive motion interlacing or else just B or P frames that don't give the full image. (The code that renders the YV12/NV12 in OpenGL works well with other sources, so that's not the problem.)
One important clue: I see one perfectly rendered frame when the movie starts and whenever it loops back to the beginning. This tells me that's the only time I'm getting a full frame of valid YV12/NV12 data.
Shortest description possible:
1) Created a custom Sample Grabber (based on CTransInPlaceFilter) so that I could get samples that have a VIDEOINFOHEADER2. This works as expected, and the sample sizes match expectations for YV12/NV12 at the resolution I'm playing. (Helpful example of rolling your own Sample Grabber here.)
2) To ensure I only get progressive frames, I wrote the CheckInputType() method of my Sample Grabber to return E_FAIL if the dwInterlaceFlags field of the VIDEOINFOHEADER2 has the AMINTERLACE_IsInterlaced flag set.
3) I am setting the eAVDecVideoSoftwareDeinterlaceMode_ProgressiveDeinterlacing flag on the decoder using the ICodecAPI interface with CODECAPI_AVDecVideoSoftwareDeinterlaceMode. (If I don't do this, the decoder won't connect to my Sample Grabber because it doesn't accept interlaced frames.)
4) To debug this, I'm using the IMediaSample2 interface to get the properties of the incoming media samples in the Sample Grabber. The dwTypeSpecificFlags member of the AM_SAMPLE2_PROPERTIES struct tells me that the frames are AM_VIDEO_FLAG_INTERLEAVED_FRAME, which I believe indicates I'm getting a full frame instead of a single field. The AM_VIDEO_FLAG_I_SAMPLE bit is also set, for all frames, indicating that I'm getting full "I" frames and not "B" or "P" frames.
5) Given that all frames are "I" frames, I'd expect to see my image instead of gobbledygook. As mentioned above, the only time I see a valid image is when the movie loops back around to the first frame.
6) Last thing: I do see that my samples have the AM_VIDEO_FLAG_WEAVE set. Is this "weaving" of the image the problem?
Thanks,
Mark

AS3 NetStream AppendBytes Seek issue

I'm having trouble with NetStream in AS3. The project I am working on allows users to browse to a video (locally) and play it back. The issue I am having is that netStream.seek(0), from what I can tell, doesn't do anything, although I get inside a NetStatusEvent function and NetStream.Seek.Notify is triggered. I'm using NativeProcess, and here is the function I use, in case it makes any difference.
public function ProgressEventOutputHandler(e:ProgressEvent):void {
    videoByteArray = new ByteArray();
    nativeProcess.standardOutput.readBytes(videoByteArray, 0, nativeProcess.standardOutput.bytesAvailable);
    netStream.appendBytes(videoByteArray);
}
Am I missing something here? I am pausing netStream before using netStream.seek(0);.
EDIT:
In an attempt to fix this issue I followed the instructions by VC.One and did the following:
Moved videoByteArray = new ByteArray(); to my init function and also created tempVideoByteArray = new ByteArray(); in that function.
Updated my ProgressEventOutputHandler function so that it no longer creates a new ByteArray for videoByteArray, and changed this line: nativeProcess.standardOutput.readBytes(videoByteArray, videoByteArray.length, nativeProcess.standardOutput.bytesAvailable);
I have changed nothing else, and now the video will not load. If I allow a new ByteArray to be created inside the ProgressEventOutputHandler function, the video does load again.
Short Version :
Try the code I pasted here: Github Snippet link
Long version :
This one's kinda long, but I hope it helps once and for all... Don't worry about the brick wall thing; walls are made to be smashed. To keep you inspired, check out some in-house demos from the VC:One labs using appendBytes :
MP4 Seeking Experiment : research for appendBytes frame data access and time/seek handling. Real-time frame bytes convert from MP4 to FLV format using only AS3 code.
Speed Adjust of Audio & Video : for real-time MP3 audio in video separation & effecting experiment. Requires MP4/FLV file with MP3 data in the audio track.
Synchronised Video Frames : for multiple videos displaying by the same frame number.
PS: I'll be using the URLStream method as that's a more useful answer for those loading local or online files. You could change from urlstream.progressEvent to your usual nativeProcess.progressEvent.
I know FFMPEG but only used AIR for making Android apps. So for this AIR/FFMPEG connection you know more than me.
Also this answer assumes you're using FLV with MPEG H.264 video & MP3 or AAC audio.
ffmpeg -i input.mp4 -c:v copy -c:a mp3 -b:a 128k -ac 2 -ar 44100 FLV_with_MP3.flv
This assumption matters because it affects what kind of bytes we look for.
In the case of the above FLV with a H.264 video and AAC or MP3 audio we can expect the following (when seeking) :
Since this is MPEG the first video tag will hold AVC Decoder Config bytes and the first audio tag holds the Audio Specific Config bytes. This data is not actual media frames but simply packaged like an audio/video tag. These are needed for MPEG playback. The same bytes can be found in the STSD metadata entry (MOOV atom) inside an MP4 container. Now the next found video tag will (or should) be the video's actual first frame.
Video TAG (keyframe) : begins 0x09 and the 11th byte after it is 0x17 & the 12th byte is 0x01
Audio TAG (AAC) : begins 0x08 and the 11th byte after it is 0xAF & the 12th byte is 0x01
Audio TAG (MP3) : begins 0x08 and the 11th byte after it is 0x2F & the 12th byte is 0xFF
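To make those signatures concrete, here is a minimal AS3 sketch (my own helper names, not code from the demos above) that tests the tag starting at a given offset xPos in a source ByteArray, assuming the 11-byte FLV tag header described above sits between the tag-type byte and the first data byte:

import flash.utils.ByteArray;

// Hypothetical helpers: classify the tag whose first (tag-type) byte is at xPos.
function isVideoKeyframeTag(source_BA:ByteArray, xPos:uint):Boolean
{
    // 0x09 = video tag, data byte 0x17 = keyframe + AVC, next data byte 0x01 = AVC NALU (a real coded frame)
    return source_BA[xPos] == 0x09 && source_BA[xPos + 11] == 0x17 && source_BA[xPos + 12] == 0x01;
}

function isAudioTag(source_BA:ByteArray, xPos:uint):Boolean
{
    // 0x08 = audio tag; first data byte 0xAF = AAC, 0x2F = MP3
    var codecByte:uint = source_BA[xPos + 11];
    return source_BA[xPos] == 0x08 && (codecByte == 0xAF || codecByte == 0x2F);
}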
1) Bytes copying and checking values :
You are looking for bytes that represent a video "tag". Apart from the Metadata tag, you can now expect "tag" to mean a container of an audio or video frame. There are two ways to get tag bytes into your "temporary byte array" (we'll name it as temp_BA ).
ReadBytes (slow) : extracts the individual byte values within a start/end range in source_BA
WriteBytes (fast) : instant duplication of a start/end range of bytes from source_BA
Readbytes explained : tells Source to read its bytes into Target. Source will read forwards up to the length from its current offset (position). Go to correct Source position before reading onwards...
source_BA.readBytes( into Target_BA, Pos within Target_BA, length of bytes required );
After the above line executes, Source position will now have moved forward to account for the new length travelled. (formula : Source new Pos = previousPos + BytesLengthRequired).
Writebytes explained : tells Target to duplicate a range of bytes from Source. Is fast since copying from already-known information (from Source). Target writes onwards from its current position...
target_BA.writeBytes( from source_BA, Pos within source_BA, length of bytes required );
After the above line executes, the Source position is unchanged (the range is addressed purely by the offset/length arguments), while the Target position moves forward by the number of bytes written.
Use above methods to get required tag bytes into temp_BA from a specific source_BA.position = x.
To check any byte (its value), use the methods below to update some variable of int type:
Read a one-byte value : use my_Integer = source_BA.readByte();
Read a two-byte value : use my_Integer = source_BA.readUnsignedShort();
Read a four-byte value : use my_Integer = source_BA.readUnsignedInt();
Read an eight-byte value (into a Number variable) : use my_Number = source_BA.readDouble();
note : Don't confuse .readByte(); which extracts a numerical value (of byte) with the similar sounding .readBytes() which copies a chunk of bytes to another byte array.
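A small runnable sketch of both copy styles and the value reads (the array contents and offsets here are made-up placeholders, just to keep the example self-contained):

import flash.utils.ByteArray;

var source_BA:ByteArray = new ByteArray();      // stands in for your videoByteArray
var temp_BA:ByteArray   = new ByteArray();

// fill the source with some dummy bytes so the reads below have data to work on
for (var i:int = 0; i < 32; i++) source_BA.writeByte(i);

// readBytes (slow) : Source reads forward from its own position into Target
source_BA.position = 0;                          // go to the correct Source position first
source_BA.readBytes(temp_BA, 0, 16);             // Source position is now 0 + 16 = 16

// writeBytes (fast) : Target duplicates a known range of Source;
// Source position stays untouched, Target position advances past the written bytes
temp_BA.clear();
temp_BA.writeBytes(source_BA, 0, 16);

// reading values (each read advances source_BA.position by the size read)
source_BA.position = 0;
var oneByte:int       = source_BA.readByte();           // 1 byte (signed; use readUnsignedByte() for 0-255)
var twoBytes:int      = source_BA.readUnsignedShort();  // 2 bytes
var fourBytes:uint    = source_BA.readUnsignedInt();    // 4 bytes
var eightBytes:Number = source_BA.readDouble();         // 8 bytes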
2) Finding A Video KeyFrame (or I-frame) :
[ illustration image of Video TAG with Keyframe H264/AAC ]
To find a video keyframe
From a starting offset, use a while loop to travel forward through the bytes, searching each byte for a one-byte value of "9" (hex: 0x09). When found, we check bytes further ahead to confirm that it is indeed a true keyframe and not just a random occurrence of "9".
In the case of the H.264 video codec, at the correct "9" byte position (xPos) we expect the 11th & 12th bytes ahead always to be "17" and "01" respectively.
If that is == true then we read the three Tag Size bytes and add 15 to that integer to get the total length of bytes to be written from Source into Target ( temp_BA ). We add 15 to account for the 11 header bytes before the TAG DATA and also the 4 bytes after it. Those 4 bytes at the tag's end are the "Previous Tag Size", and that value counts the 11 front bytes but not the 4 end bytes themselves.
We tell temp_BA to write bytes of Source (your videoByteArray) starting from the position of the "9" byte (xPos) for a length of "Tag Size" + 15. You have now extracted an MPEG keyframe. example : temp_BA.writeBytes( videoByteArray, xPos, TAG_size + 15 );
This temp_BA with tag of a Keyframe can now be appended using:
example : netStream.appendBytes( temp_BA ); //displays a single frame
note : For reading the 3 bytes of Tag Size I will show a custom bytes_toInt() converting function (since processors read either 1, 2 or 4 bytes at once for integers, reading 3 bytes here is an awkward request).
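The author's own bytes_toInt() is in the source linked further below; a minimal equivalent, assuming the 3 size bytes sit right after the tag-type byte and are big-endian, could look like this:

// Hypothetical sketch: read the 3-byte big-endian Tag Data Size that follows
// the tag-type byte at xPos (i.e. bytes xPos+1, xPos+2 and xPos+3).
function tagSize3Bytes(source_BA:ByteArray, xPos:uint):uint
{
    return (source_BA[xPos + 1] << 16) | (source_BA[xPos + 2] << 8) | source_BA[xPos + 3];
}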
Searching tip : Tags always follow each other in a trail. We can seek faster by also checking if bytes are for a non-keyframe (P frame) video tag or even some audio tag. If so then we check that particular tag size and now increment our xPos to jump this new length. This way we can skip by whole tag sizes not just by single bytes. Stopping only when we have a keyframe tag.
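Putting the pieces together, a rough search loop (just a sketch built from the hypothetical helpers above, not the code from the Github link; the 0x27 value for a non-keyframe AVC tag is the usual FLV value, not something stated in the answer) might look like this:

// Find the next H.264 keyframe tag at or after startPos and copy the whole tag
// (11 header bytes + data + 4 trailing "Previous Tag Size" bytes) into temp_BA.
function extractNextKeyframeTag(source_BA:ByteArray, startPos:uint, temp_BA:ByteArray):Boolean
{
    var xPos:uint = startPos;
    while (xPos + 15 < source_BA.length)
    {
        // keyframe signature: 0x09 tag type, then 0x17 and 0x01 at +11 and +12
        if (source_BA[xPos] == 0x09 && source_BA[xPos + 11] == 0x17 && source_BA[xPos + 12] == 0x01)
        {
            var tagSize:uint = tagSize3Bytes(source_BA, xPos);
            if (xPos + tagSize + 15 > source_BA.length) return false;   // not enough bytes downloaded yet
            temp_BA.writeBytes(source_BA, xPos, tagSize + 15);          // copy the complete keyframe tag
            return true;
        }
        // searching tip from above: if this is some other complete tag, jump over it whole
        if ((source_BA[xPos] == 0x09 && source_BA[xPos + 11] == 0x27) ||                                  // AVC non-keyframe (P/B)
            (source_BA[xPos] == 0x08 && (source_BA[xPos + 11] == 0xAF || source_BA[xPos + 11] == 0x2F)))  // AAC or MP3 audio
        {
            xPos += tagSize3Bytes(source_BA, xPos) + 15;
        }
        else
        {
            xPos++;    // not on a tag boundary yet, crawl forward one byte
        }
    }
    return false;      // ran out of bytes before finding a keyframe
}

Once extracted, that temp_BA can be appended as in the example above ( netStream.appendBytes( temp_BA ); ).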
3) Playback From A Video KeyFrame :
When you think about it, play is simply like an auto-seek going on a frame by frame basis. Where the expected speed of getting each next frame is defined by the video's encoded framerate.
So your playback function can simply be a Timer that gets X amount of video tags (frames) every second (or 1000 milliseconds). You do that, for example, with my_Timer = new Timer( 1000 / video_FPS ). When the timer runs and reaches each FPS slice of a second it will run the append_PLAY(); function, which in turn runs a get_frame_Tag(); function.
NS.seek(0) : Puts NetStream into "seek mode". (The number doesn't matter, but must exist in the command.) Any "ahead frames" buffer is cleared and there'll be no (image) frame updates until..
RESET_SEEK : Ends the "seek mode" and now allows image updates. The first tag you append after using the RESET_SEEK command must be a tag with a video keyframe. (For audio-only this can be any tag, since technically all audio tags are audio keyframes.)
END_SEQUENCE : (for MPEG H.264) Plays out any remaining "ahead frames" (drains the buffer). Once drained you can append any type of video tag. Remember H.264 expects forward-moving timestamps; if you see f**ked up pixels then your next tag's timestamp is wrong (too high or too low). If you are appending just one frame (a poster image?) you could use END_SEQUENCE to drain the buffer and display that one frame (without waiting for the buffer to fill up to x-amount of frames first)...
The play function acts as a middle-man function to manage things without cluttering the get frame function with If statements etc. Managing things means for example checking that there are enough bytes downloaded to even begin getting a frame according to Tag Size.
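A rough outline of that plumbing, as a sketch only (my_Timer, video_FPS, get_frame_Tag, netStream and the keyframe ByteArray are assumed to exist elsewhere in your code, e.g. a NetStream already switched to data-generation mode with netStream.play(null)):

import flash.events.TimerEvent;
import flash.net.NetStream;
import flash.net.NetStreamAppendBytesAction;
import flash.utils.ByteArray;
import flash.utils.Timer;

// PLAY : fire once per frame period and append one tag (frame) per tick.
var frame_ms:Number = 1000 / video_FPS;                 // e.g. 1000 / 25 = 40 ms per frame
var my_Timer:Timer = new Timer(frame_ms);
my_Timer.addEventListener(TimerEvent.TIMER, append_PLAY);
my_Timer.start();

function append_PLAY(e:TimerEvent):void
{
    // "middle-man" checks go here, e.g. are enough bytes downloaded for the next Tag Size?
    get_frame_Tag();                                    // extracts AND appends the next tag
}

// SEEK : enter seek mode, then the very next appended tag must hold a video keyframe.
function seek_to_keyframe(keyframe_tag:ByteArray):void
{
    netStream.seek(0);                                                      // puts NetStream into "seek mode"
    netStream.appendBytesAction(NetStreamAppendBytesAction.RESET_SEEK);     // ends seek mode, allows image updates
    netStream.appendBytes(keyframe_tag);                                    // must contain a keyframe tag
    netStream.appendBytesAction(NetStreamAppendBytesAction.END_SEQUENCE);   // drain so this single frame shows now
}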
4) Source Code For A Working Example :
Code is too long.. see this link below:
https://gist.github.com/Valerio-Charles-VC1/657054b773dba9ba1cbc
Hope it helps. VC

How to use cache on Movieclip with createJS and Flash CC

Hi all, I want to ask something. I've made a game with Flash CC and createJS. It's a drag-and-drop game (3 objects to drag and 3 objects to drop on) with a lot of vector MovieClip objects. But when I run it on mobile, the game seems to have performance issues. I've read some articles that talk about caching the objects, but I really don't know anything about caching or how to use it on an object like a MovieClip. Do you have any explanation or solution, or maybe a tutorial on how to use the cache function? Thank you very much.
From the docs:
Draws the display object into a new canvas, which is then used for subsequent draws. For complex content that does not change frequently (ex. a Container with many children that do not move, or a complex vector Shape), this can provide for much faster rendering because the content does not need to be re-rendered each tick. The cached display object can be moved, rotated, faded, etc freely, however if its content changes, you must manually update the cache by calling updateCache() or cache() again. You must specify the cache area via the x, y, w, and h parameters. This defines the rectangle that will be rendered and cached using this display object's coordinates.
http://createjs.com/Docs/EaselJS/classes/DisplayObject.html#method_cache
So, you don't want to cache a playing MovieClip (you would have to update the cache every frame, which is slow). However, you could cache elements in the MC that are just being transformed.
For example, an animation of a walking character, with complex vector shapes for the arms, legs, head, and body that are being transformed (scaled, rotated, translated) to create the walk animation. You wouldn't cache the character MC, but you could cache the body parts themselves.

Netstream and step() or seek()?

I'm on an AS3 project, playing a video (H264). I want, for some special reasons, to go to a certain position.
a) I tried it with NetStream.seek(). It only goes to keyframes. In my current setting, this means I can find a position every 1 second. (For better resolution, I'd have to encode the movie with as many keyframes as possible, i.e. every frame a keyframe.)
This is definitely not my favourite way, because I don't want to re-encode all the vids.
b) I tried it with NetStream.step(). This should give me the opportunity to step slowly from frame to frame. But the documentation says:
This method is available only when data is streaming from Flash Media Server 3.5.3 or higher and when NetStream.inBufferSeek is true.
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/net/NetStream.html#step()
Does this mean it is not possible with AIR for Desktop? When I try it, nothing works.
Any suggestions, how to solve this problem?
Greetings & Thank you!
Nicolas
Flash video can only be advanced by seconds unless you have Flash Media Server hosting your video. Technically, that means you can have it working as intended in AIR; however, the video would have to be streaming (silly Adobe...).
You have two options:
1) Import the footage as a movieclip. The Flash IDE has a wizard for this, and if you're developing exclusively in non-FlashIDE environment, you can convert and export as an external asset such as an SWF or SWC. This would then be embedded or runtime loaded into your app giving you access to the per-frame steppable methods of MovieClip. This, however, does come with some audio syncing issues (iirc). Also, scrubbing backwards is not an MC's forté.
2) Write your own video object that loads an image sequence and displays each frame in order. You'd have to setup your own audio syncing abilities, but it might be the most direct solution apart from FLVComponent or NetStream.
I've noticed that Flash Player 9 scrubs nice and smooth, but in players 10+ I get this no-scrub problem.
My fix was to limit the frequency of the calls to the seek function to at most one call every 200 ms. This fixed scrubbing but is much less smooth than player 9. Perhaps because of the "Flash video can only be advanced by seconds" limitation? I used a timer to trigger the function that calls seek() for the video.
private var scrubInterval:Timer = new Timer(200);   // throttles seek() calls to one every 200 ms

// Called when the user grabs the scrubber: pause, start the throttle timer, start dragging the thumb.
private function videoScrubberTouch():void {
    _ns.pause();
    var bounds:Rectangle = new Rectangle(0, 0, 340, 0);   // 340 px = width of the scrub track
    scrubInterval.addEventListener(TimerEvent.TIMER, scrubTimeline);
    scrubInterval.start();
    videoThumb.startDrag(false, bounds);
}

// Every 200 ms, map the thumb position onto the video duration and seek there.
private function scrubTimeline(e:TimerEvent):void {
    var amt:Number = Math.floor((videoThumb.x / 340) * duration);
    trace("SCRUB duration: " + duration + " videoThumb.x: " + videoThumb.x + " amt " + amt);
    _ns.seek(amt);
}
Please check this Demo link (or get the SWF file to test outside of browser via desktop Flash Player).
Note: Demo requires FLV with H.264 video codec and AAC or MP3 audio codec.
The source code for that is here: Github link
In the above demo there is (bytes-based) seeking and frame by frame stepping. The functions you want to study mainly are:
Append_SEEK ( position amount ) - This will go to the specified position in bytes and search for the nearest available keyframe.
get_frame_TAG - This will extract a tag holding one frame of data. Audio can be in frames too, but let's assume you have video-only. That function is your opportunity to adjust timestamps. When it runs it will also append the tag (so each "get_frame_TAG" is also a "frame step").
For example : You have a 25 fps video and you want the third frame at 4 seconds into playback...
1000 milliseconds / 25 fps = 40 units for each timestamp. So 4000 ms == 4 secs, plus 40 x 3 for the 3rd frame == an expected timestamp of 4120.
So getting that frame means... First find a keyframe. Then step through each frame, checking the timestamps, until you reach the one that represents the frame you want. If a frame isn't the one you want, change its timestamp to the same as the most recent keyframe's timestamp (this forces Flash to fast-forward through the frames to keep things in sync, as it assumes a frame [with a smaller-than-expected timestamp] should have been played by that time). You can "hide" the video object during this process if you don't like the look of the fast-forwarding.
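As a small sketch of that timestamp arithmetic (helper names are my own, and the byte offsets assume the standard FLV tag header, with the 3-byte timestamp at offsets 4-6 and the extended byte at offset 7):

import flash.utils.ByteArray;

var video_FPS:Number = 25;
var frame_ms:Number = 1000 / video_FPS;                 // 25 fps -> 40 ms per frame/timestamp step

// expected timestamp of "the 3rd frame after the 4-second mark"
var target_ms:Number = (4 * 1000) + (3 * frame_ms);     // 4000 + 120 = 4120

// Hypothetical: overwrite the timestamp of the tag starting at xPos (e.g. to force it
// to the most recent keyframe's timestamp so Flash fast-forwards through it).
function setTagTimestamp(tag_BA:ByteArray, xPos:uint, ts:uint):void
{
    tag_BA[xPos + 4] = (ts >> 16) & 0xFF;   // timestamp, upper byte of the lower 24 bits
    tag_BA[xPos + 5] = (ts >> 8)  & 0xFF;
    tag_BA[xPos + 6] =  ts        & 0xFF;
    tag_BA[xPos + 7] = (ts >> 24) & 0xFF;   // TimestampExtended (bits 24-31)
}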