Explain the result JSON of the Microsoft Video API for face detection and tracking

How do I plot the detected faces onto the video frames using the result JSON of face detection and tracking? That is, how do I calculate the frame number for a particular event in the JSON file?

This gives some details, in case you hadn't seen it.
In essence, the video is divided into one or more fragments, and each fragment is divided into intervals. There will be one event per interval. The times and durations of fragments and intervals are expressed in ticks, which you can convert to seconds by dividing by the timescale. You can map times to frames, and back, using the framerate.
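The tick-to-frame arithmetic described above can be sketched in Python. This is a minimal sketch, not the API's own code: the field names ("timescale", "framerate", "fragments", "start", "interval", "events") are assumptions about the layout of the result JSON, and the sample values are made up for illustration.

```python
def event_frames(result):
    """Return (frame_number, event) pairs for every interval of every fragment.

    Assumed layout: result["timescale"] is ticks per second,
    result["framerate"] is frames per second, and each fragment has a
    "start" and an "interval" in ticks plus one "events" entry per interval.
    """
    timescale = result["timescale"]
    framerate = result["framerate"]
    pairs = []
    for fragment in result["fragments"]:
        for i, event in enumerate(fragment.get("events", [])):
            # Start of the i-th interval, in ticks from the start of the video.
            ticks = fragment["start"] + i * fragment["interval"]
            seconds = ticks / timescale
            pairs.append((round(seconds * framerate), event))
    return pairs


# Made-up sample: 1000 ticks per second, 25 fps, one fragment starting at 2 s.
sample = {
    "timescale": 1000,
    "framerate": 25,
    "fragments": [
        {"start": 2000, "interval": 40, "events": ["a", "b", "c"]},
    ],
}
frames = [frame for frame, _ in event_frames(sample)]
print(frames)
```

With one interval per frame, as in this sample, consecutive events land on consecutive frame numbers; with a longer interval, each event would apply to the frames up to the next interval's start.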

Downsampling possible within RTSP/RTP?

I have a media server serving several cameras.
I'd like for the server to downsample the data from, say, 20 fps to 1 fps.
Obviously I could do this by decoding and re-encoding the video frames; however, the server is a little resource-constrained. I notice that if I simply drop RTP UDP packets, the output is not so good: I see both tearing and junk in the images (at least with an OpenCV/FFmpeg client).
Is it possible to downsample within an RTP stream by dropping more carefully chosen frames/packets, to avoid junk and tearing in the output? (Currently I'm able to extract RTP|H264 raw data chunks on the server, but am not running them through a full codec).
An H.264 stream consists of different frame types: I (or IDR), P, and B.
I (or IDR) frames are full pictures and can be decoded without reference to any other frame.
So you could filter out the P and B frames and pass on only the I frames.
The resulting frame rate depends on the I (or IDR) frame frequency of the original stream. I am guessing you would get somewhere between 0.1 and 2 fps.
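As a rough sketch of that filter, the NAL unit type can be read from the first byte of each RTP payload without running a decoder. The code below assumes the payload has already been separated from the RTP header and that the stream uses the common single-NAL, STAP-A, and FU-A packetizations; the function names are made up for illustration. It keeps IDR slices plus the SPS and PPS parameter sets that a decoder needs.

```python
KEEP_TYPES = {5, 7, 8}  # 5 = IDR slice, 7 = SPS, 8 = PPS
STAP_A = 24             # aggregation packet holding several NAL units
FU_A = 28               # fragmentation unit: one NAL unit split across packets


def nal_types(payload: bytes) -> list[int]:
    """Return the H.264 NAL unit types carried by one RTP payload."""
    if not payload:
        return []
    nal_type = payload[0] & 0x1F
    if nal_type == FU_A and len(payload) >= 2:
        # The type of the fragmented NAL unit is in the FU header (second byte).
        return [payload[1] & 0x1F]
    if nal_type == STAP_A:
        types = []
        pos = 1
        while pos + 2 < len(payload):
            size = int.from_bytes(payload[pos:pos + 2], "big")
            if size == 0:
                break
            types.append(payload[pos + 2] & 0x1F)
            pos += 2 + size
        return types
    return [nal_type]


def keep_packet(payload: bytes) -> bool:
    """True if the packet carries an IDR slice or a parameter set."""
    return any(t in KEEP_TYPES for t in nal_types(payload))


print(keep_packet(bytes([0x65])), keep_packet(bytes([0x41])))
```

Dropping packets this way leaves gaps in the RTP sequence numbers, so the forwarding side may also need to renumber the packets it passes on; that part is not shown here.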

Deconstructing slot machine reels?

I have a movie of slot machine gameplay. How can I extract only the frames in which the reels are stopped? While spinning, the game shows fake symbols, which are not part of the game mathematics. Until now I have been doing it manually (with screenshots), which takes too much time, so it would be nice to automate it.
I know how to do image processing on single images, and I get segments of symbols for each reel. Can you suggest an algorithm for connecting the different segments and recovering the original strips? It is like solving a puzzle, but without clear information about the number of pieces or how exactly they match.

Reduce FPS on OSMF stream - Issue with MPEG-2 header

I've been searching all over and can't find a solution. I have a 25 FPS video that I'm playing on OSMF, but OSMF insists on playing with 29-31 FPS. This causes the video to play ~15% faster than real time. The result is extremely noticeable if you open the same video in VLC and play it side by side.
The problem comes in when I try to do a live stream. It will eat through the buffer and catch up to real time then the stream crashes because there's no new video waiting.
I've tried tracing the code to find out where the frames are actually output to the screen, but I hit a dead end at the SWC file. I also have tried searching online but I can't find anything about limiting the FPS - everyone is just interested in increasing it.
I'd rather play at 15 FPS and drop 10 frames per second than catch up to real time and crash tragically.
Edit - after an entire weekend spent staring at this issue I've made some incredible headway. First and foremost, the only way to limit the FPS in OSMF is by sending a custom FLV header with the timestamp set appropriately (a difference of 1000 / FPS milliseconds between consecutive frames).
Realizing this, I could temporarily solve my issue by manually setting the timestamps based on an internal counter: each time a frame is processed, set timestamp = last_timestamp + 40;. The problem is that I don't know whether the video will always be 25 FPS. Some day I may have 30 FPS or even 60 FPS video streams. To make this more robust I decided to decode the MPEG-2 header (read the PTS value) and convert it to an FLV header.
Now here's the issue… This video file (theoretically 25 FPS) plays perfectly in QuickTime. As a result I know the headers are fine, because an expensive piece of software with billions of dollars behind it properly calculated the frame rate. But when I read the PTS from the header (as per this SO post) and divide by 90 (to convert the 90 kHz clock to a millisecond timestamp), each timestamp is 33 or 34 milliseconds apart, which is the 29~31 FPS I was getting.
So why is the PTS giving me timestamps that are 33-34 milliseconds apart when I know the video is 25 FPS (40 milliseconds apart)? More importantly, how is QuickTime reading the MPEG-2 header so that everything plays correctly?
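For reference, the conversion used in the question can be written out as follows. This is a minimal sketch: the function names are made up, and it assumes the PTS values are listed in presentation order.

```python
PTS_CLOCK_HZ = 90_000  # MPEG-2 PTS values count ticks of a 90 kHz clock


def pts_to_ms(pts: int) -> float:
    """Convert a PTS value to milliseconds (equivalent to dividing by 90)."""
    return pts * 1000 / PTS_CLOCK_HZ


def implied_fps(pts_values: list[int]) -> float:
    """Frame rate implied by the mean spacing of consecutive PTS values."""
    deltas = [b - a for a, b in zip(pts_values, pts_values[1:])]
    mean_delta = sum(deltas) / len(deltas)
    return PTS_CLOCK_HZ / mean_delta


# 3600 ticks between frames is 40 ms, i.e. 25 FPS;
# 3003 ticks between frames is about 33.4 ms, i.e. about 29.97 FPS.
print(pts_to_ms(3600), implied_fps([0, 3600, 7200]))
print(pts_to_ms(3003), implied_fps([0, 3003, 6006]))
```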
First, to answer my original question:
Reducing FPS in OSMF
You must create your own implementation of the file handler which parses the video header and modifies the timestamps. There is no way to tell OSMF to play in "slow motion" or "fast forward". For an example of creating your own file handler, look at the FLVParser class. Notice that there are separate classes for parsing video and audio tags. The headers must be updated in EACH of these to ensure that video and audio play back in sync.
When a file is passed to the video parser, each timestamp is considered relative to the first. So if the first timestamp is 1234 then this will be set as "time zero" and all future timestamps will be relative to this. This is important. If you skip a video tag and the first timestamp you send is from later in the video, it will use the wrong value for "time zero" and things will not be sync'd properly.
This brings us to…
My issue
First and foremost, the durations in the M3U8 did not match up with the sum of the differences of timestamps. Starting at the first timestamp and labeling it "time zero", then looking at the last timestamp and subtracting time zero from it, the resulting time span was not equal to the expected duration of the TS file.
It turns out that when VLC or QuickTime encounter this situation (the time span covered by the PTS values does not equal the duration of the video), they generate new headers on the fly: take the duration and divide by the number of frames, and there's your new offset between PTS values.
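The re-timing rule described here (duration divided by number of frames) is simple enough to sketch. The function below is an illustration under that description only, not the code VLC or QuickTime actually run; its name and parameters are made up.

```python
def retime(frame_count: int, duration_ms: float, start_ms: float = 0.0) -> list[float]:
    """Spread frame_count timestamps evenly across duration_ms."""
    step = duration_ms / frame_count  # new offset between consecutive frames
    return [start_ms + i * step for i in range(frame_count)]


# A 10-second segment holding 250 frames gets a 40 ms step, i.e. 25 FPS,
# regardless of what the original PTS values said.
timestamps = retime(250, 10_000)
print(timestamps[:3], timestamps[-1])
```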
That was my first problem. Once that was solved I was no longer gaining a second every 10 seconds, but instead gaining a second every 2 minutes. Turns out I was also loading the next TS file a bit too early (off-by-one error) which was causing me to drop a packet from each TS. This led to losing one frame every 10 seconds.
Additional Info
I ran into another issue with FPS shortly after this which I have found a solution for. Anyone experiencing OSMF playing videos too quickly, I urge you to grep your code for bufferTimeMax.
When bufferTimeMax > 0 and bufferLength >= bufferTimeMax, audio plays faster until bufferLength reaches bufferTime. If a live stream is video-only, video plays faster until bufferLength reaches bufferTime.
Depending on how much playback is lagging (the difference between bufferLength and bufferTime), Flash Player controls the rate of catch-up between 1.5% and 6.25%. If the stream contains audio, faster playback is achieved by frequency domain downsampling which minimizes audible distortion.
6.25% = 0.9375 seconds per 15 seconds (gaining a second every 10~20 seconds)
You would need to set the FPS manually.
In Flash Professional, you can change the framerate in Document Properties.
In Flash Builder, you can set the framerate using metadata, as described here and here.
Thus...
[SWF(frameRate='15')]

Obtain the result ByteArray of the current playing sounds

I am developing an AIR application for desktop that simulates a drum set. Pressing a key on the keyboard plays the corresponding drum sound in the application. I have placed music notes in the application so the user can try to play a particular song.
Now I want to record the whole performance and export it to a video file, say FLV. I have already succeeded in recording the video using this encoder:
http://www.zeropointnine.com/blog/updated-flv-encoder-alchem/
However, this encoder does not have the ability to record sound automatically. I need to find a way to get the sound in ByteArray at that particular frame, and pass it to the encoder. Each frame may have different Sound objects playing at the same time, so I need to get the bytes of the final sound.
I am aware that SoundMixer.computeSpectrum() can return the current sound in bytes. However, the ByteArray returned has a fixed length of 512, which does not meet the requirement of the encoder. After a bit of testing, with a sample rate of 44 kHz, 8-bit stereo, the encoder expects the audio byte data array to have a length of 5880. The data returned by SoundMixer.computeSpectrum() is much, much shorter than the encoder requires.
My application is running at 60FPS, and recording at 15FPS.
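The 5880 figure is consistent with one recorded video frame's worth of audio, as this small calculation shows. It assumes that "44khz" means 44100 Hz and that the encoder wants the audio for exactly one 15 FPS frame; the function name is made up.

```python
def audio_bytes_per_video_frame(sample_rate_hz: int, channels: int,
                                bytes_per_sample: int, video_fps: int) -> int:
    """Number of audio bytes that span one video frame."""
    samples, remainder = divmod(sample_rate_hz, video_fps)
    if remainder:
        raise ValueError("sample rate is not a whole multiple of the frame rate")
    return samples * channels * bytes_per_sample


# 44100 Hz / 15 FPS = 2940 samples per frame, times 2 channels, times 1 byte.
needed = audio_bytes_per_video_frame(44_100, channels=2, bytes_per_sample=1, video_fps=15)
print(needed)
```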
So my question is: is there any way I can obtain the audio bytes for the current frame, mixed from more than one Sound object, with a data length sufficient for the encoder to work? If there is no API to do that, I will have to mix the audio and get the resulting bytes myself; how can that be done?

AS3: How to access pixel data efficiently?

I'm working on a game.
The game requires entities to analyse an image and head towards pixels with specific properties (high red channel, etc.)
I've looked into Pixel Bender, but this only seems useful for writing new colors to the image. At the moment, even at a low resolution (200x200) just one entity scanning the image slows to 1-2 Frames/second.
I'm embedding the image and instance it as a Bitmap as a child of the stage. The 1-2 FPS situation is using BitmapData.getPixel() (on each pixel) with a distance calculation beforehand.
I'm wondering if there's any way I can do this more efficiently... My first thought was some sort of spatial partitioning coupled with splitting the image up into many smaller pieces.
I also feel like Pixel Bender should be able to help somehow, however I've had little experience with it.
Cheers for any help.
Jonathan
Let us call the pixels which entities head towards "attractors" because they attract the entities.
You describe a low frame rate due to scanning for attractors. This indicates that you may be scanning the image at every frame. You don't specify whether the image scanned is static or changes as frequently as, e.g., a video input. If the image changes with every frame, so that you must somehow recalculate the attractors each time, then what you are attempting is real-time computer vision on the ActionScript Virtual Machine; please see below.
If you have an unchanging image, then the most important optimization you can make is to scan the image one time only, then save a summary (or "memoization") of the locations of the attractors. At each rendering frame, rather than scan the entire image, you can search the list or array of known attractors. When the user causes the image to change, you can recalculate from scratch, or update your calculations incrementally -- as you see fit.
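The scan-once idea can be sketched as follows. Python is used here only for illustration (the same structure carries over to AS3), and the function names, the grid-of-red-values input, and the threshold are all made up for the example.

```python
import math


def find_attractors(red, threshold):
    """Scan the image once; return (x, y) for pixels whose red value is high.

    `red` is a list of rows, each row a list of red-channel values (0-255).
    """
    return [(x, y)
            for y, row in enumerate(red)
            for x, value in enumerate(row)
            if value >= threshold]


def nearest_attractor(attractors, px, py):
    """Per-frame lookup: search the saved list instead of the whole image."""
    if not attractors:
        return None
    return min(attractors, key=lambda p: math.hypot(p[0] - px, p[1] - py))


image = [
    [0, 200, 0],
    [0, 0, 0],
    [255, 0, 0],
]
attractors = find_attractors(image, threshold=128)  # done once, not per frame
print(attractors, nearest_attractor(attractors, 2, 0))
```

The per-frame cost then depends on the number of attractors rather than on the number of pixels.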
If you are attempting to do real-time computer vision with ActionScript 3, I suggest you look at the new vector types of Flash 10.1 and also that you look into using either abcsx to write ABC assembly code, or use Adobe's Alchemy to compile C onto the Flash runtime. ABC is the byte code of Flash. In other words, reconsider the use of AS3 for real-time computer vision.
BitmapData has a getPixels method (notice the plural). It returns a byte array of all the pixels in a given rectangle, which can be iterated much faster than a for loop with a call to getPixel inside, nested inside another for loop. Unfortunately, byte arrays are, as their name implies, one-dimensional arrays of bytes, so stepping through the pixels (4 bytes each) requires an indexed for loop rather than a for each loop. On the other hand, each pixel's color channels are individually accessible as separate bytes, which sounds like what you want (finding pixels with a "high red channel"): you won't have to bitwise-AND each pixel value to isolate a particular channel.
I read somewhere that getPixel is very slow, so that's where I figured you'd save the most. I could be wrong, so it'd be worth timing it.
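The shape of that loop, written in Python for illustration rather than AS3, looks roughly like this. It assumes the flat byte array stores 4 bytes per pixel in alpha, red, green, blue order, row by row; the function name and sample data are made up.

```python
def high_red_pixels(pixels: bytes, width: int, threshold: int):
    """Return (x, y) of pixels whose red byte is at least `threshold`.

    `pixels` is a flat buffer with 4 bytes per pixel, assumed to be in
    alpha, red, green, blue order, stored row by row.
    """
    hits = []
    for offset in range(0, len(pixels) - 3, 4):  # step one pixel at a time
        red = pixels[offset + 1]                 # second byte of each pixel
        if red >= threshold:
            index = offset // 4
            hits.append((index % width, index // width))
    return hits


# A 2x2 image: only the two pixels in the right-hand column are strongly red.
data = bytes([255, 10, 0, 0,   255, 200, 0, 0,
              255, 0, 0, 0,    255, 255, 9, 9])
print(high_red_pixels(data, width=2, threshold=128))
```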
I would say Heath Hunnicutt's answer is a good one. If the image doesn't change, just store all the color values in a Vector, a ByteArray, or whatever, and use it as a lookup table so you don't need to call getPixel() every frame.