What governs playback speed when encoding with Android's MediaCodec + mp4parser? - android-mediacodec

I'm trying to record, encode and finally create a short movie on Android (using API 16) with a combination of MediaCodec and Mp4Parser (to encapsulate into .mp4).
Everything is working just fine, except for the duration of the .mp4: its always 3 seconds long - and runs at about twice the 'right' speed.
The input to encoder is 84 frames (taken 100ms apart).
The last frame sets the 'end of stream' flag.
I set the presentation time for each frame on queueInputBuffer
I've tried to tweak every conceivable parameter - but nothing seems to make a difference - the film is always 3 seconds long - and always played way too fast.
So what governs the playback seepd? how do I generate a film with 'natuarl' speed?

I figured it out: When encapsulating with mp4parser (needed if you target API<18), you need to set the speed in mp4parser's API. The presentation time you provide to queueInputBuffer appearently make no difference if you're not using Android's built-in muxer (available only from API18).
I stumbled on this question on github, which indicates the following is required:
new H264TrackImpl(new FileDataSourceImpl(rawDataFile), "eng", 100, 10);
the last two parameters (timeScale and frameTick) set the playback speed to 'noraml'.

Related

Chrome Dev tools .har file for _webSocketTraffic has a "time" field - what does it mean?

I am trying to understand the websocketTraffic data exported from my Chrome dev tools. An example looks like this:
{
'type': 'receive',
'time': 1640291138.212745,
'opcode': 1,
'data': '<r xmlns=\'urn:xmpp:sm:3\'/>',
}
I see a "time" field but I actually cant find anything about what it means except this from the spec (http://www.softwareishard.com/blog/har-12-spec/):
time [number] - Total elapsed time of the request in milliseconds. This is the sum of all timings available in the timings object (i.e. not including -1 values) .
Is this really milliseconds, down to the millionth of a millisecond? I am trying to see how much time has elapsed between two WS events, so any insight would be very helpful. Thanks
Disclaimer:
This answer is not backed by official docs. However, I studied this problem for quite some time now, and my solution seems to make sense.
Answer:
Move the dot 3 places to the right, (i.e 1640291138.212745 -> 1640291138212.745) and you will get the actual time. Try to run this
new Date(1640291138212.745).toISOString()
and see if it fits your startedDateTime in the parent WebSocket entry in your har.
Probably Chrome saves the "time" field as seconds since epoch, instead of milliseconds since epoch. So "moving the dot 3 places to the right" actually means to multiply by a 1000 and that means converting to milliseconds.

Issue with delta time server-side

How can I make server-side game run the same on every machine, because when I use server's delta time it works different on every computer/phone.
Would something called 'fixed timestep' help me ?
Yes fixed timestep can help you. But also simple movement with a delta can help you.
Fixed timestep commonly using with a physics because sometimes physics needs to be update more often (120-200hz)than game's render method.
However you can still use fixed timestep without physic.
You just need to interpolate your game objects with
lerp(oldValue, newValue, accumulator / timestep);
In your case probably small frame rate differences causes unexpected results.
To avoid that you should use movement depends delta.
player.x+=5*60*delta;//I assume your game is 60 fps
Instead of
player.x+=5;
So last delta will be only difference between machines.And its negligible since delta difference between 60 and 58 fps is only ~0.0005 secs

understanding getByteTimeDomainData and getByteFrequencyData in web audio

The documentation for both of these methods are both very generic wherever I look. I would like to know what exactly I'm looking at with the returned arrays I'm getting from each method.
For getByteTimeDomainData, what time period is covered with each pass? I believe most oscopes cover a 32 millisecond span for each pass. Is that what is covered here as well? For the actual element values themselves, the range seems to be 0 - 255. Is this equivalent to -1 - +1 volts?
For getByteFrequencyData the frequencies covered is based on the sampling rate, so each index is an actual frequency, but what about the actual element values themselves? Is there a dB range that is equivalent to the values returned in the returned array?
getByteTimeDomainData (and the newer getFloatTimeDomainData) return an array of the size you requested - its frequencyBinCount, which is calculated as half of the requested fftSize. That array is, of course, at the current sampleRate exposed on the AudioContext, so if it's the default 2048 fftSize, frequencyBinCount will be 1024, and if your device is running at 44.1kHz, that will equate to around 23ms of data.
The byte values do range between 0-255, and yes, that maps to -1 to +1, so 128 is zero. (It's not volts, but full-range unitless values.)
If you use getFloatFrequencyData, the values returned are in dB; if you use the Byte version, the values are mapped based on minDecibels/maxDecibels (see the minDecibels/maxDecibels description).
Mozilla 's documentation describes the difference between getFloatTimeDomainData and getFloatFrequencyData, which I summarize below. Mozilla docs reference the Web Audio
experiment ; the voice-change-o-matic. The voice-change-o-matic illustrates the conceptual difference to me (it only works in my Firefox browser; it does not work in my Chrome browser).
TimeDomain/getFloatTimeDomainData
TimeDomain functions are over some span of time.
We often visualize TimeDomain data using oscilloscopes.
In other words:
we visualize TimeDomain data with a line chart,
where the x-axis (aka the "original domain") is time
and the y axis is a measure of a signal (aka the "amplitude").
Change the voice-change-o-matic "visualizer setting" to Sinewave to
see getFloatTimeDomainData(...)
Frequency/getFloatFrequencyData
Frequency functions (GetByteFrequencyData) are at a point in time; i.e. right now; "the current frequency data"
We sometimes see these in mp3 players/ "winamp bargraph style" music players (aka "equalizer" visualizations).
In other words:
we visualize Frequency data with a bar graph
where the x-axis (aka "domain") are frequencies or frequency bands
and the y-axis is the strength of each frequency band
Change the voice-change-o-matic "visualizer setting" to Frequency bars to see getFloatFrequencyData(...)
Fourier Transform (aka Fast Fourier Transform/FFT)
Another way to think about "time domain vs frequency" is shown the diagram below, from Fast Fourier Transform wikipedia
getFloatTimeDomainData gives you the chart on on the top (x-axis is Time)
getFloatFrequencyData gives you the chart on the bottom (x-axis is Frequency)
a Fast Fourier Transform (FFT) converts the Time Domain data into Frequency data, in other words, FFT converts the first chart to the second chart.
cwilso has it backwards.
the time data array is the longer one (fftSize), and the frequency data array is the shorter one (half that, frequencyBinCount).
fftSize of 2048 at the usual sample rate of 44.1kHz means each sample has 1/44100 duration, you have 2048 samples at hand, and thus are covering a duration of 2048/44100 seconds, which 46 milliseconds, not 23 milliseconds. The frequencyBinCount is indeed 1024, but that refers to the frequency domain (as the name suggests), not the time domain, and it the computation 1024/44100, in this context, is about as meaningful as adding your birth date to the fftSize.
A little math illustrating what's happening: Fourier transform is a 'vector space isomorphism', that is, a mapping going bijectively (i.e., reversible) between 2 vector spaces of the same dimension; the 'time domain' and the 'frequency domain.' The vector space dimension we have here (in both cases) is fftSize.
So where does the 'half' come from? The frequency domain coefficients 'count double'. Either because they 'actually are' complex numbers, or because you have the 'sin' and the 'cos' flavor. Or, because you have a 'magnitude' and a 'phase', which you'll understand if you know how complex numbers work. (Those are 3 ways to say the same in a different jargon, so to speak.)
I don't know why the API only gives us half of the relevant numbers when it comes to frequency - I can only guess. And my guess is that those are the 'magnitude' numbers, and the 'phase' numbers are thrown out. The reason that this is my guess is that in applications, magnitude is far more important than phase. Still, I'm quite surprised that the API throws out information, and I'd be glad if some expert who actually knows (and isn't guessing) can confirm that it's indeed the magnitude. Or - even better (I love to learn) - correct me.

How to detect local maxima and curve windows correctly in semi complex scenarios?

I have a series of data and need to detect peak values in the series within a certain number of readings (window size) and excluding a certain level of background "noise." I also need to capture the starting and stopping points of the appreciable curves (ie, when it starts ticking up and then when it stops ticking down).
The data are high precision floats.
Here's a quick sketch that captures the most common scenarios that I'm up against visually:
One method I attempted was to pass a window of size X along the curve going backwards to detect the peaks. It started off working well, but I missed a lot of conditions initially not anticipated. Another method I started to work out was a growing window that would discover the longer duration curves. Yet another approach used a more calculus based approach that watches for some velocity / gradient aspects. None seemed to hit the sweet spot, probably due to my lack of experience in statistical analysis.
Perhaps I need to use some kind of a statistical analysis package to cover my bases vs writing my own algorithm? Or would there be an efficient method for tackling this directly with SQL with some kind of local max techniques? I'm simply not sure how to approach this efficiently. Each method I try it seems that I keep missing various thresholds, detecting too many peak values or not capturing entire events (reporting a peak datapoint too early in the reading process).
Ultimately this is implemented in Ruby and so if you could advise as to the most efficient and correct way to approach this problem with Ruby that would be appreciated, however I'm open to a language agnostic algorithmic approach as well. Or is there a certain library that would address the various issues I'm up against in this scenario of detecting the maximum peaks?
my idea is simple, after get your windows of interest you will need find all the peaks in this window, you can just compare the last value with the next , after this you will have where the peaks occur and you can decide where are the best peak.
I wrote one simple source in matlab to show my idea!
My example are in wave from audio file :-)
waveFile='Chick_eco.wav';
[y, fs, nbits]=wavread(waveFile);
subplot(2,2,1); plot(y); legend('Original signal');
startIndex=15000;
WindowSize=100;
endIndex=startIndex+WindowSize-1;
frame = y(startIndex:endIndex);
nframe=length(frame)
%find the peaks
peaks = zeros(nframe,1);
k=3;
while(k <= nframe - 1)
y1 = frame(k - 1);
y2 = frame(k);
y3 = frame(k + 1);
if (y2 > 0)
if (y2 > y1 && y2 >= y3)
peaks(k)=frame(k);
end
end
k=k+1;
end
peaks2=peaks;
peaks2(peaks2<=0)=nan;
subplot(2,2,2); plot(frame); legend('Get Window Length = 100');
subplot(2,2,3); plot(peaks); legend('Where are the PEAKS');
subplot(2,2,4); plot(frame); legend('Peaks in the Window');
hold on; plot(peaks2, '*');
for j = 1 : nframe
if (peaks(j) > 0)
fprintf('Local=%i\n', j);
fprintf('Value=%i\n', peaks(j));
end
end
%Where the Local Maxima occur
[maxivalue, maxi]=max(peaks)
you can see all the peaks and where it occurs
Local=37
Value=3.266296e-001
Local=51
Value=4.333496e-002
Local=65
Value=5.049438e-001
Local=80
Value=4.286804e-001
Local=84
Value=3.110046e-001
I'll propose a couple of different ideas. One is to use discrete wavelets, the other is to use the geographer's concept of prominence.
Wavelets: Apply some sort of wavelet decomposition to your data. There are multiple choices, with Daubechies wavelets being the most widely used. You want the low frequency peaks. Zero out the high frequency wavelet elements, reconstruct your data, and look for local extrema.
Prominence: Those noisy peaks and valleys are of key interest to geographers. They want to know exactly which of a mountain's multiple little peaks is tallest, the exact location of the lowest point in the valley. Find the local minima and maxima in your data set. You should have a sequence of min/max/min/max/.../min. (You might want to add an arbitrary end points that are lower than your global minimum.) Consider a min/max/min sequence. Classify each of these triples per the difference between the max and the larger of the two minima. Make a reduced sequence that replaces the smallest of these triples with the smaller of the two minima. Iterate until you get down to a single min/max/min triple. In your example, you want the next layer down, the min/max/min/max/min sequence.
Note: I'm going to describe the algorithmic steps as if each pass were distinct. Obviously, in a specific implementation, you can combine steps where it makes sense for your application. For the purposes of my explanation, it makes the text a little more clear.
I'm going to make some assumptions about your problem:
The windows of interest (the signals that you are looking for) cover a fraction of the entire data space (i.e., it's not one long signal).
The windows have significant scope (i.e., they aren't one pixel wide on your picture).
The windows have a minimum peak of interest (i.e., even if the signal exceeds the background noise, the peak must have an additional signal excess of the background).
The windows will never overlap (i.e., each can be examined as a distinct sub-problem out of context of the rest of the signal).
Given those, you can first look through your data stream for a set of windows of interest. You can do this by making a first pass through the data: moving from left to right, look for noise threshold crossing points. If the signal was below the noise floor and exceeds it on the next sample, that's a candidate starting point for a window (vice versa for the candidate end point).
Now make a pass through your candidate windows: compare the scope and contents of each window with the values defined above. To use your picture as an example, the small peaks on the left of the image barely exceed the noise floor and do so for too short a time. However, the window in the center of the screen clearly has a wide time extent and a significant max value. Keep the windows that meet your minimum criteria, discard those that are trivial.
Now to examine your remaining windows in detail (remember, they can be treated individually). The peak is easy to find: pass through the window and keep the local max. With respect to the leading and trailing edges of the signal, you can see n the picture that you have a window that's slightly larger than the actual point at which the signal exceeds the noise floor. In this case, you can use a finite difference approximation to calculate the first derivative of the signal. You know that the leading edge will be somewhat to the left of the window on the chart: look for a point at which the first derivative exceeds a positive noise floor of its own (the slope turns upwards sharply). Do the same for the trailing edge (which will always be to the right of the window).
Result: a set of time windows, the leading and trailing edges of the signals and the peak that occured in that window.
It looks like the definition of a window is the range of x over which y is above the threshold. So use that to determine the size of the window. Within that, locate the largest value, thus finding the peak.
If that fails, then what additional criteria do you have for defining a region of interest? You may need to nail down your implicit assumptions to more than 'that looks like a peak to me'.

Unlimited Map Dimensions for a game in AS3

Recently I've been planning out how I would run a game with an environment/map that is capable of unlimited dimensions (unlimited being a loose terms as there's obviously limitations on how much data can be stored in memory, etc). I've achieved this using a "grid" that contains level data stored as a String that can be converted to a 2D Array that would represent objects and their properties.
Here's an example of two objects stored as a String:
"game.doodads.Tree#200#10#terrain$game.mobiles.Player#400#400#mobiles"
The "grid" is a 3D Array, of which the contents would represent the x/y coordinate of the grid cell. The grid cells would be, say, 600x600.
An example of this "grid" Array would be as follows:
var grid:Array = [[["leveldata: 0,0"],["leveldata 0,1"]],
[["leveldata: 1,0"],["leveldata 1,1"]]];
The environment will handle loading a grid square and it's 8 surrounding squares based on a given point. ie the position of the Player. It would have a function along the lines of
function loadCells(xp:int, yp:int):void
This would also handle the unloading of the previously loaded cells that are no longer close enough to be required. In the unload process, the data at grid[x][y] would be overwritten with the new data, which is created by looping through the objects in that cell and appending each new set of data to the grid cell data.
Everything works fine in terms of when you move in a direction, cells are unloaded/saved and new cells are loaded. The problem is this:
Say this is a large city infested by zombies. If you walk three grid squares in any direction and return, everything is as you left it. I'm struggling to find a way to at least simulate all objects still moving around and doing their thing. It looks silly when you for example throw a grenade, walk away, return and the grenade still hasn't detonated.
I've considered storing a timestamp on each object when I unload the level, and when it's initialized there's a loop that runs it's "step" function amount of times. Problem here is obviously that when you come back 5 minutes later, 20 zombies are going to try and step 248932489 times and the game will crash.
Ideas?
I don't know AS3 but let me try give you some tips.
It seems you want to make a seamless world since you load / unload cells as a player moves. That's a good idea. What you should do here is deciding what data a cell should load / unload( or, even further, what data a cell should hold or handle ).
For example, the grenade should not be unloaded by the cell as it needs to be updated even after the player leaves the cell. It's NOT a good idea for a cell to manage all game objects just because they are located in the cell. Instead, the player object could handle the grenade as an owner. Or, there could be one EntityManager which handles all game entities like grenades or zombies.
With this idea, you can update your 20 zombies even when they are in an unloaded zone (their location does not matter anymore) instead of calling update() 248932489 tiems at once. However, do you really need to keep updating the zombies? Perhaps, unloading all of them and spawning new zombies with the number of the latest active zombies in the cell would work. It depends on your game design but, usually, you don't have to update entities which are not visible or far enough from the player. Hope it helps. Good luck! :D
Interesting problem. Of course, game cannot simulate unlimited environment precisely. If you left some zone for a few minutes, you don't need every step of zombies (or any actors there) to be simulated precisely. Every game has its own simplications. Maybe approximate simulation will help you. For example, if survivor was in heavily infested zone for a long time, your simulator could decide that he turned into zombie, without computing every step of the process. Or, if horde was rampant in some part of the city, this part should be damaged randomly.
I see two methods as to how you could handle this issue.
first: you have the engine keep any active cells loaded, active means there are object-initiated events involving cell-owned objects OR player-owned objects (if you have made such a distinction that is). Each time such a cell is excluded from normal unloading behaviour it is assigned a number and if memory happens to run out the cells which have the lowest number will be unloaded. Clearly, this might be the hardest method code-wise but still it might be the only one truly doing what you desire.
second: use very tiny cells and let the engine keep a narrow path loaded. The cells migth then be 100x100 instead of 600x600 and 36 of them do the work one bigger cell would do ( risk: more cluttter code-wise) then each cell your player actually traverses ( not all that have been loaded to produce a natural visibility-range ) is kept in memory including every cell that has player-owned objects in it for a limited amount of time ( i.e. 5 minutes ).
I believe you can find out how to check for these conditions upon unloading yourself and hope to have been of help to you.