How to step frame-by-frame without skipping any, with HLS playback of H.264 MPEG-TS, across browsers?

I don't seem to be able to get to certain frames like 00:00:04:24 with
// javascript
videojs("player").currentTime(4.96); // for 25 fps video
And this is happening at every second of any video with closed GOP length 25.
All I wanted to achieve was to step frame-by-frame (without skipping any) through the video when my users feel the need to.
Things I'm using:
videojs 5.11.9 / 5.19.1
videojs-contrib-hls 4.0.3 / 5.4.1
videojs-framebyframe
Chrome 57.0.2987.133 (64-bit) / Firefox 51.0.1 (32-bit)
Windows 7 (64-bit)
Details (Chrome)
I have the timecode overlaid on the video I'm working with. With luck, I could pause the video at the 'missing' frames, meaning the frames are there. For example, 00:00:03:24. If I call currentTime() now
videojs("player").currentTime(); // returns **4.002128**
Then if I call
videojs("player").currentTime(4.002128);
That brings me to 00:00:04:00. The same goes for all of these calls, which stayed at 00:00:04:00.
videojs("player").currentTime(4); // should be **00:00:03:23**
videojs("player").currentTime(4.04); // should be **00:00:03:24**
videojs("player").currentTime(4.08); // should be **00:00:04:00**, correct
But this
videojs("player").currentTime(3.999999999999999);
will bring me to 00:00:03:23, effectively skipping 00:00:03:24 entirely.
Curiously enough, when I changed the GOP length to 50, the same thing happens only every 2 seconds. Meaning I could jump to 00:00:02:24, 00:00:04:24, but still not 00:00:03:24.
The layout of a closed GOP length 25 (GOP M=2, N=25) looks like this for every second:
IBPBPBPBPBPBPBPBPBPBPBPBP
and for a closed GOP length 50 (GOP M=2, N=50) looks like this for every second:
IBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPP
Which brings me to suspect the P-frame at the end of each GOP is acting up.
With my limited knowledge of how GOPs truly work, I searched for other possible layouts of the I/P/B frames. Then I saw open GOP. Using ffmpeg, I re-encoded the video with -flags -cgop as below.
open GOP M=5, N=25
IBBBBPBBBBPBBBBPBBBBPBBBB
or open GOP M=2, N=50
IBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPBPB
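For reference, a re-encode along these lines produces such an open GOP layout (the -g/-bf values here are illustrative for M=5, N=25; -flags -cgop is the relevant part):
// command prompt or bash
ffmpeg -i input.ts -c:v libx264 -flags -cgop -g 25 -bf 4 -c:a copy output.ts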
Both of these open GOP videos have no problem jumping to any frame. With one other problem...
Problem with the open GOP 'solution'
The HLS playlist, instead of downloading only the required/playing portion of the video, now tries to download one clip after another right from the start. Maybe because the B-frame at the end of each open GOP now references the next GOP?
Searching for more
While typing out this question, I searched for more and came across this post at bugs.chromium.org. People in that post suggested 2 things:
to re-encode the video without B-frames for Chrome
that IE11, Safari 8, and Firefox 34 (back then) seek frames accurately
Here goes testing the first suggestion.
closed GOP M=1, N=25
IPPPPPPPPPPPPPPPPPPPPPPPP
and closed GOP M=1, N=50
IPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP
Both worked! (Not so fast: it doesn't work on Firefox.)
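(For reference, an encode without B-frames can be produced along these lines; the exact options are illustrative:)
// command prompt or bash
ffmpeg -i input.ts -c:v libx264 -flags +cgop -g 25 -bf 0 -c:a copy output.ts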
Details (Firefox)
Remember the second suggestion from that post at bugs.chromium.org? At first, when I tried Firefox with any GOP length 25 video, they were all good (at seeking frames accurately). But it has a similar problem to Chrome when dealing with GOP length 50 (both M=1 and M=2) and GOP length 25 (M=1).
videojs("player").currentTime(1.96); // seeks to **00:00:01:24** correctly
videojs("player").currentTime(2.04); // seeks to **00:00:02:01** correctly
but this
videojs("player").currentTime(2); // should be **00:00:02:00**
randomly gave me 00:00:02:00 or 00:00:02:01 (biased towards the latter, when repeatedly calling only this line).
One easier way to spot this is to continuously step 25 frames ahead with videojs-framebyframe. You would get 00:00:02:01, 00:00:03:01, 00:00:04:01, and then suddenly a 00:00:05:00. Stop there and step 1-frame back and forth and you might not be able to get back to 00:00:05:00 every time.
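One workaround I'm experimenting with (a sketch only, assuming a constant 25 fps; no guarantee it's correct): never seek to an exact frame boundary, and instead target the middle of the wanted frame's display interval, so boundary rounding can't land on a neighbouring frame.
// javascript
var FPS = 25; // assumed constant frame rate
function stepFrames(player, n) {
  player.pause();
  var frame = Math.floor(player.currentTime() * FPS) + n;
  // frame 99 -> (99 + 0.5) / 25 = 3.98, frame 100 -> 4.02: never exactly 4.0
  player.currentTime((frame + 0.5) / FPS);
}
stepFrames(videojs("player"), 1); // step one frame forward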
edited portion starts
More findings before the questions
I came across this other question. It's almost two years old, with a still-working downloadable sample video that has variable GOP lengths.
Its layout (first few GOPs)
IBPBPBPBPPPPPPPPPP
IPPP
IPPBPBPBPPBPPPPPPPP
IBPPPPPBPP
IPBPPPP
IBPPPPPPBPPBPPBPBPPPPBPBP
Here's what I use to show the layout of frame types
// command prompt or bash
ffprobe -show_frames -select_streams 0 video.ts | awk -f iframe.awk
and here's the small awk program (iframe.awk)
# collect pict_type values (I/P/B); print one line per GOP, starting at each I-frame
BEGIN {
    FS = "=";
    i = "";
}
/pict_type/ {
    if (match($2, "I")) {
        print i;    # flush the previous GOP
        i = $2;     # start a new one with this I-frame
    } else {
        i = i $2;   # append P/B frames to the current GOP
    }
}
END {
    print i;        # flush the last GOP
}
edited portion ends
Questions
Sorry for the long post, and thanks for reading it through.
Does this mean there are things that need to be fixed (bugs)?
Did videojs play a part in creating this bug?
Is writing our own player going to help?
With current browsers/scripts releases, is it possible to seek frame-by-frame with HTML5 videojs HLS player across browsers (well, Chrome and Firefox at least) with one format?
(added question) Any agreement or disagreement, at all?
(added question) Is this a bad question as a whole? Too long?
Appreciate any sort of help and/or pointers to solve the problem! Thanks!

Related

Can't set the frame rate when recording a video with VecVideoRecorder

I have a working RL model and setup that produces a video for me. However, because the model is reasonably good, the videos are very short (it reaches the destination, so better = shorter).
Is there a way to drop the frame rate of the video output? I know it can be done with a GIF, and that it can be done with ffmpeg, but I can't work out how to pass it down.
I've dropped the fps in my environment from 50 to 10, expecting the video to be five times as long, but that didn't work.
Save me Stack Overflow, you're my only hope (apart from posting on GitHub).
When you say that you dropped the fps from 50 to 10, I assume you changed env.metadata['video.frames_per_second'], which initializes to 50 (for gym versions below 0.22.0):
{
    'render.modes': ['human', 'rgb_array'],
    'video.frames_per_second': 50
}
Starting with gym 0.22.0, the VideoRecorder class gets the frames per second to write the video with from:
env.metadata.get("render_fps", 30)
Setting env.metadata["render_fps"] = 4 should result in a slowed down video.

Chrome: Wrong sound when changing the audio source for Audio element and MediaStreamAudioDestinationNode

I have an app where I play different code-generated sounds. I place these sounds in an AudioBufferSourceNode.
I allow the user to choose which output device to play the sound through, so I use a MediaStreamAudioDestinationNode with its stream used as the source for an Audio element. This way, when the user chooses an audio output to play the sound to, I set the sink ID of the Audio element to the requested audio output.
So I have AudioBufferSourceNode -> some Audio Graph (gain nodes, etc) -> MediaStreamAudioDestinationNode -> Audio element.
When I play the first sound, it sounds fine. But when I create a new source and connect it to the same MediaStreamAudioDestinationNode, the sound is played at the wrong pitch.
I created a Fiddle that shows the problem.
Is this a bug, or am I doing something wrong?
The problem was identified based on the OP's Chrome ticket.
It seems to come from a lack of sync between the Audio element and its source AudioNode (AudioBufferSourceNode, OscillatorNode, etc.) when you pause the source and play it back again.
The solution is to always call pause() and play() on the Audio element alongside your source's stop() and start().
https://jsfiddle.net/k1r7o0xj/3/
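In code, that pairing looks roughly like this (a sketch; the node and element names are mine, and the buffer is assumed to be decoded elsewhere):
// javascript
var ctx = new AudioContext();
var dest = ctx.createMediaStreamDestination();
var audioEl = new Audio();
audioEl.srcObject = dest.stream; // older browsers: audioEl.src = URL.createObjectURL(dest.stream)

function playBuffer(buffer) {
  var src = ctx.createBufferSource();
  src.buffer = buffer;
  src.connect(dest);
  audioEl.play(); // resume the element together with the node
  src.start();
  return src;
}
function stopBuffer(src) {
  src.stop();
  audioEl.pause(); // pause the element together with the node
}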
It's possible to dynamically change your graph layout by using .connect() and .disconnect(), even when audio is playing or sent through a stream (which could even be streamed over WebRTC).
I couldn't find a reference in the spec, so I'm pretty sure this is taken for granted.
For example, if you have two AudioBufferSourceNodes bufferSource1 and bufferSource2, and a MediaStreamAudioDestinationNode streamDestination:
bufferSource1.connect(streamDestination);
//do some other things here, and after some time, switch to bufferSource2:
//(streamDestination doesn't need to be explicitly specified here)
bufferSource1.disconnect(streamDestination);
bufferSource2.connect(streamDestination);
Example in action.
Edit 1:
Proper implementation:
According to the Editor's Draft of the Audio Output API, it is planned to be possible to choose a custom audio output device for the AudioContext as well (by means of new AudioContext({ sinkId: requestedSinkId })). I couldn't find any info on the progress, and even found a related discussion which the asker has apparently already read. According to this and (many) other references, it doesn't seem to be an easy task, but it's planned for Web Audio V1.
Edit:
That section has been removed from the API Draft, but you can still find it in an older version.
Current workaround:
I played around with your workaround (using a MediaStreamAudioDestinationNode and an Audio object), and it seems to be related to nothing being connected. I modified my example to toggle a single buffer (similar to your example, but with an AudioBufferSourceNode) and observed a similar frequency drop. However, when using a GainNode in between and setting its gain.value to either 0 or 1, the frequency drops disappeared. (This isn't going to be the solution if you want to create and connect new AudioBuffers dynamically.)
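A sketch of that GainNode workaround (the names are mine, and someBuffer is assumed to be decoded elsewhere): the gain node stays permanently connected to the destination, and "pausing" just gates the signal, so the stream never sees a fully disconnected graph.
// javascript
var ctx = new AudioContext();
var streamDestination = ctx.createMediaStreamDestination();
var gate = ctx.createGain();
gate.connect(streamDestination); // stays connected for the lifetime of the stream

var source = ctx.createBufferSource();
source.buffer = someBuffer;
source.connect(gate);
source.start();

gate.gain.value = 0; // 'paused': the graph stays connected, so no pitch drift
gate.gain.value = 1; // 'playing'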

Play same base64 data concurrently multiple times

I am making an all-client-side audio/music editor. I have created a few tones mathematically that are stored as base64 in the <audio> src attribute. I can play DIFFERENT tones at the same time, BUT I can only play ONE instance of ONE specific tone at a time.
For example, clicking the key to play C like crazy will sound very awkward, since the C that was playing gets stopped and the new C starts. I would like it to be possible to play several C tones at the same time!
Now I guess this could be done by simply copying the audio element (one or more times) and making the keypress, sort of, cycle through them. For example, if the first C tone is playing and the key to play C is clicked, then play the second C audio element, and so on and so forth.
That would work... but since I am using base64 in the source, I would also have to have that copied.
<audio id="C1"><source src="data:audio/wav;base64,audio_data"></source></audio>
<audio id="C2"><source src="data:audio/wav;base64,audio_data"></source></audio>
If "audio_data" would be really long then the html would become humongous and also I think that the browser would not understand that both are actually the exactly same data, so it would be come very unoptimized.
So to the concrete question: Is there a way to play the same base64 data several times at the same time without the need of copying the whole src-attribute with the base64 string in it? (Since my application is all-client so I have not the ability to save the data to a sound-file on a server)
See a simple example. It might not work in other browsers than Firefox because I have not tested:
https://jsfiddle.net/tx3hpptL/
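One way to avoid duplicating the base64 string in the markup (a sketch of the cycling idea above, untested across browsers): keep the single audio element and clone it on demand; each clone reuses the same src data and plays independently.
// javascript
var original = document.getElementById('C1');
function playC() {
  var copy = original.cloneNode(true); // deep clone includes the <source> child
  copy.play(); // clones play without being attached to the document
}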

How does Youtube's HTML5 video player control buffering?

I was watching a YouTube video and decided to investigate some parts of its video player. I noticed that, unlike most HTML5 video I have seen, YouTube's video player does not use a normal video source and instead utilizes a blob URL as the source.
Previously, I have tested HTML5 videos and found that the server starts streaming the whole video from the start and buffers the complete rest of the video in the background. This means that if your video is 300 megs, all 300 megs will be downloaded. If you seek to the middle, it will start downloading from the seek position all the way to the end.
Youtube does not work this way (at least in chrome). Instead it manages to control buffering so it only buffers a certain amount while paused. It also seems to only buffer the relevant pieces, so if you skip around it will make sure not to buffer pieces that are unlikely to be watched.
In my attempts to investigate how this worked, I noticed the video src attribute has a value of blob:http%3A//www.youtube.com/ee625eee-2802-49b2-a13f-eb374d551d54, which pointed me to blobs, which then led me to typed arrays. Using those two resources, I am able to load an MP4 video into a blob and display it in an HTML5 video tag.
However, what I am now stuck on is how YouTube deals with the pieces. Looking at the network traffic, it appears to send requests to http://r6---sn-p5q7ynee.c.youtube.com/videoplayback, which returns binary video data in chunks of 1.1 MB. It also seems worth noting that most normal HTML5 video requests receive a 206 response code back while streaming, yet YouTube's videoplayback calls get a 200 back.
I tried to load only a range of bytes (by setting the Range HTTP header), which unfortunately failed (I'm assuming because no metadata for the video came with it).
At this point I'm stuck on figuring out how Youtube accomplishes this. I came up with several ideas though none of which I am completely sold on:
1) YouTube is sending down self-contained video and audio chunks with each /videoplayback call. This seems like a pretty heavy burden on the upload side, and it seems like it would be difficult to stitch these together to make them appear like one seamless video. Also, the video tag seems to think it's one full video, judging from calling $('video').duration and $('video').currentTime, which leads me to believe that the video tag thinks it's a single video file. Finally, the video src attribute never changes, which makes me believe it is working with a single blob and not switching out blobs.
2) YouTube constructs an empty blob pre-sized to the full video array and updates the blob with pieces as it downloads them. It would then make sure the user has not gotten too close to the last downloaded piece (to prevent the user from entering an undownloaded section of the blob). The problem I see with this is that I don't see any way to dynamically update a blob through JavaScript (although maybe I'm just having trouble googling for it).
3) YouTube downloads the metadata and then starts constructing the blob in order by appending the video pieces as it downloads them. The problem I see with this method is that I don't understand how it would handle seeks into post-buffered territory.
Maybe I'm just missing an obvious answer that's right in front of me. Anyone have any ideas?
edit: I just thought of a fourth option. Another idea is that they might use the File API to write the binary chunks to a file and stream off that file. The File API seems to have the ability to seek to specific positions, therefore allowing you to fill a video with empty bytes and fill them in as they are received. This would definitely accommodate video seeking as well.
Okay, a few things you need to know: YouTube is based on this great open source project. It behaves differently in every browser, and if your browser supports more efficient decoding, like WebM, it will use that to save Google's bandwidth. Also, if you look at this demo,
you will find a section which downloads the entire video into a thing called "offline storage". I know Chrome has it and some other browsers don't; in some cases they do have to use the entire video source instead of a blob. So the blob is streamed depending on the user's interaction with the video. Yes, the video is just one file, and they have metadata for that video, like a little database, that tells the duration of the video and the points at which chunks can be divided.
You can find out more by reading the project's documentation. I really recommend you have a look at the demo.
When you look at the AppData of Google Chrome while playing a YouTube video, you will see that it buffers in segmented files. The videos uploaded to YouTube are segmented, which is why you can't perfectly pinpoint a timeframe with the first click on the bar if that timeframe is outside of the current segment.
The number of segments depends on the length of the video, and on the time from which you start and stop playing back the video.
When you are linked to a timeframe of a video, it will simply skip the buffering of the segments that come before that timeframe.
Unfortunately I don't know much about the coding for video playback, but I hope this points you in the right direction.
There is a canvas element in the page. Maybe this will help:
http://html5doctor.com/video-canvas-magic/
We know the video has been segmented; the question is how to stitch the segments together. I think the real video element doesn't do the playback work itself: it supplies the data source, and each frame of the segments is drawn to the canvas element.
var v = document.getElementById('v');
var canvas = document.getElementById('c');
var ctx = canvas.getContext('2d');
function draw(v, c, w, h) {
  if (v.paused || v.ended) return;
  c.drawImage(v, 0, 0, w, h);       // copy the current video frame to the canvas
  setTimeout(draw, 20, v, c, w, h); // schedule the next copy (~50 times a second)
}
v.addEventListener('play', function () {
  draw(v, ctx, canvas.width, canvas.height);
}, false);
YouTube uses this technique only in browsers that support Media Source Extensions, so everything beyond that is up to each browser's support for the feature.
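A minimal sketch of what such an MSE setup looks like (the segment URL and codec string below are placeholders, not YouTube's actual endpoints); it accounts for both the blob: src and the chunked appends seen in the network traffic:
// javascript
var video = document.querySelector('video');
var mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource); // yields a "blob:http://..." src

mediaSource.addEventListener('sourceopen', function () {
  // segments must be fragmented MP4 (or WebM) for MSE to accept them
  var sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E"');
  var segment = 0;
  function appendNext() {
    fetch('/videoplayback?segment=' + segment++) // placeholder URL
      .then(function (r) { return r.arrayBuffer(); })
      .then(function (chunk) { sb.appendBuffer(chunk); });
  }
  sb.addEventListener('updateend', appendNext); // one chunk at a time
  appendNext();
});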

Netstream and step() or seek()?

I'm on an AS3 project, playing a video (H264). I want, for some special reasons, to go to a certain position.
a) I tried it with NetStream.seek(). It only goes to keyframes. With my current settings, this means I can find a position every 1 second. (For a better resolution, I'd have to encode the movie with as many keyframes as possible, i.e. every frame a keyframe.)
This is definitely not my favourite way, because I don't want to re-encode all the vids.
b) I tried it with NetStream.step(). This should give me the opportunity to step slowly from frame to frame. But the documentation says:
This method is available only when data is streaming from Flash Media Server 3.5.3 or higher and when NetStream.inBufferSeek is true.
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/net/NetStream.html#step()
Does this mean it is not possible with AIR for desktop? When I try it, nothing works.
Any suggestions, how to solve this problem?
Greetings & Thank you!
Nicolas
Flash video can only be advanced by seconds unless you have Flash Media Server hosting your video. Technically, that means you can have it working as intended in AIR; however, the video would have to be streaming (silly Adobe...).
You have two options:
1) Import the footage as a MovieClip. The Flash IDE has a wizard for this, and if you're developing exclusively in a non-Flash-IDE environment, you can convert and export it as an external asset such as a SWF or SWC. This would then be embedded or runtime-loaded into your app, giving you access to the per-frame steppable methods of MovieClip. This, however, does come with some audio-syncing issues (IIRC). Also, scrubbing backwards is not an MC's forté.
2) Write your own video object that loads an image sequence and displays each frame in order. You'd have to set up your own audio-syncing abilities, but it might be the most direct solution apart from FLVComponent or NetStream.
I've noticed that Flash Player 9 scrubs nice and smooth, but in players 10+ I get this no-scrub problem.
My fix was to limit the frequency of calls to the seek function to at most one every 200 ms. This fixed scrubbing, but it's much less smooth than player 9. Perhaps because of the "Flash video can only be advanced by seconds" limitation? I used a timer to trigger the function that calls seek() for the video.
private var scrubInterval:Timer = new Timer(200);

private function videoScrubberTouch():void {
    _ns.pause();
    var bounds:Rectangle = new Rectangle(0, 0, 340, 0);
    scrubInterval.addEventListener(TimerEvent.TIMER, scrubTimeline);
    scrubInterval.start();
    videoThumb.startDrag(false, bounds);
}

private function scrubTimeline(e:TimerEvent):void {
    var amt:Number = Math.floor((videoThumb.x / 340) * duration);
    trace("SCRUB duration: " + duration + " videoThumb.x: " + videoThumb.x + " amt " + amt);
    _ns.seek(amt);
}
Please check this demo link (or get the SWF file to test outside of the browser via the desktop Flash Player).
Note: Demo requires FLV with H.264 video codec and AAC or MP3 audio codec.
The source code for that is here: Github link
In the above demo there is (byte-based) seeking and frame-by-frame stepping. The functions you mainly want to study are:
Append_SEEK(position amount) - This will go to the specified position in bytes and search for the nearest available keyframe.
get_frame_TAG - This will extract a tag holding one frame of data. Audio can be in frames too, but let's assume you have video only. That function is your opportunity to adjust timestamps. When it runs, it will also append the tag (so each get_frame_TAG is also a "frame step").
For example: you have a 25 fps video and you want the third frame at 4 seconds into playback...
1000 ms / 25 fps = 40 ms per timestamp unit. So 4000 ms == 4 secs, and adding 40 ms × 3 frames gives an expected timestamp of 4120.
So getting that frame means: first find a keyframe, then step through each frame, checking whether the timestamp represents the frame you want. If it doesn't, change it to the same value as the most recent keyframe timestamp (this forces Flash to fast-forward through the frames to keep things in sync, as it assumes a frame with a smaller-than-expected timestamp should have been played by that time). You can "hide" the video object during this process if you don't like the look of the fast-forwarding.
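The same arithmetic in code (25 fps assumed, matching the numbers above):
// javascript
var msPerFrame = 1000 / 25; // 40 ms per timestamp unit at 25 fps
function expectedTimestamp(seconds, frameWithinSecond) {
  return seconds * 1000 + frameWithinSecond * msPerFrame;
}
expectedTimestamp(4, 3); // 4120, as computed above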