Generate random variable in real-time without state - language-agnostic

I want a function that takes, as input, the number of seconds elapsed since the last time it was called, and returns true or false for whether an event should have happened in that time period. I want it to fire, on average, once per X seconds, say 5 seconds. I am also interested in whether it's possible to do this without any state, which the answer to this question used.
I guess that to be fully accurate it would have to return an integer for the number of events that should have happened, in case it is only called once every 10*X seconds or so, so bonus points for that!

It sounds like you're describing a Poisson process, in which the number of events in a time interval of length t follows the Poisson distribution with mean lambda*t, where lambda = 1/X.
The way to use the Poisson probability formula, for a given value of lambda and interval length t, is as follows:
Calculate a random number between zero and one; call this p
Calculate Pr(k=0) (ie, exp(-lambda*t) * (lambda*t)**0 / factorial(0))
If this number is bigger than p, then the number of simulated events is 0. END
Otherwise, calculate Pr(k=1) and add it to Pr(k=0).
If this number is bigger than p, then the answer is 1. END
...and so on.
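The steps above translate fairly directly into code. This is a minimal Python sketch (the name poisson_events and its parameters are illustrative, not from the answer), assuming lambda*t stays modest: for very large lambda*t, exp(-lambda*t) underflows and a different sampling method would be needed. Each Pr(k) is computed from the previous one instead of from factorials.

```python
import math
import random

def poisson_events(elapsed, mean_interval, rng=random):
    """Sampled number of events in `elapsed` seconds for a process that fires
    on average once per `mean_interval` seconds (lambda = 1/mean_interval)."""
    mu = elapsed / mean_interval           # lambda * t
    if mu > 700:
        raise ValueError("lambda*t too large for this simple method")
    p = rng.random()                       # uniform number in [0, 1)
    k = 0
    term = math.exp(-mu)                   # Pr(k=0) = exp(-mu) * mu**0 / 0!
    cumulative = term
    # Keep adding Pr(k) until the running total exceeds p; the term > 0 check
    # stops the loop if floating-point rounding keeps the total just below p.
    while cumulative <= p and term > 0.0:
        k += 1
        term *= mu / k                     # Pr(k) = Pr(k-1) * mu / k
        cumulative += term
    return k
```

For small t the loop almost always stops at k = 0 or 1, which is the simplified rule in the note that follows.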
Note that, yes, this can end up with more than one event in a time period, if t is large compared with 1/lambda (ie X). If t is always going to be small compared to 1/lambda, then you are very unlikely to get more than one event in the period, and so the algorithm is simplified considerably (if p < exp(-lambda*t), then 0, else 1).
Note 2: there is no guarantee that you will get at least one event per interval X. It's just that it'll average out to that.
(the above is rather off the top of my head; test your implementation carefully)

Assume some event type happens on average once every 10 seconds, and you want to print a simulated list of timestamps at which the events happened.
A good method would be to generate a random integer in the range [0,9] every second. If it is 0, fire the event for this second. Of course you can control the resolution: you can generate a random integer in the range [0,99] every 0.1 seconds, and if it comes up 0, fire the event for this decisecond.
Assuming there is no dependency between events, there is no need to keep state.
To find out how many times the event happens in a given time slice, just generate enough random integers for the required resolution.
Edit
You should use high resolution (at least 20 randoms per period of one event) for the simulation to be valid.
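A minimal Python sketch of the fixed-resolution idea, generalised to any mean interval (the function name and the slices_per_interval parameter are illustrative). Each slice fires with probability 1/slices_per_interval, so the count over `elapsed` seconds is binomial, which approaches the Poisson count as the resolution grows. A leftover fraction of a slice is simply rounded, which is an assumption of this sketch.

```python
import random

def events_by_slicing(elapsed, mean_interval, slices_per_interval=20, rng=random):
    """Approximate number of events in `elapsed` seconds.

    Time is cut into slices of mean_interval / slices_per_interval seconds;
    in each slice we draw an integer in [0, slices_per_interval - 1] and
    count an event when it is 0."""
    n_slices = round(elapsed * slices_per_interval / mean_interval)
    return sum(1 for _ in range(n_slices)
               if rng.randrange(slices_per_interval) == 0)
```

With mean_interval=10 and slices_per_interval=10, each one-second slice draws a single integer in [0,9], which is exactly the first example in the answer; the default of 20 follows the "at least 20 randoms per period" advice.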

Related

Picking JSON objects out of array based on their value

Perhaps I am thinking about this wrong, but here is the problem:
I have an NSMutableArray full of JSON objects. Each object looks like this; here are two of them as an example:
{
player = "Lorenz";
speed = "12.12";
},
{
player = "Firmino";
speed = "15.35";
}
Okay, so this is fine; this is dynamic info I get from a web server feed. Now, let's pretend there are 22 such entries and the speeds vary.
I want to have a timer going that starts at 1.0 seconds and goes to 60.0 seconds, and a few times a second I want it to grab all the players whose speed has just been passed. So for instance if the timer goes off at 12.0 , and then goes off again at 12.5, I want it to grab out all the player names who are between 12.0 and 12.5 in speed, you see?
The obvious easy way would be to iterate over the array completely every time that the timer goes off, but I would like the timer to go off a LOT, like 10 times a second or more, so that would be a fairly wasteful algorithm I think. Any better ideas? I could attempt to alter the way data comes from the webserver but don't feel that should be necessary.
NOTE: edited to reflect a corrected understanding that the number from 1 to 60 is incremented continuously across that range rather than being a random number in that interval.
Before you enter the timer loop, you should do some common preprocessing:
Convert the speeds from strings to numeric values upfront for fast comparison without having to parse each time. This is O(1) for each item and O(n) to process all the items.
Put the data in an ordered container such as a sorted list or sorted binary tree. This will allow you to easily find elements in the target range. This is O(n log n) to sort all the items.
On the first iteration:
Use binary search to find the start index. This is O(log n).
Use binary search to find the end index, using the start index to bound the search.
On subsequent iterations:
If each iteration increases by a predictable amount and the step between elements in the list is likewise a predictable amount, then just maintain a pointer and increment as per Pete's comment. This would make each iteration cost O(1) (just stepping ahead by a fixed amount).
If the steps between iterations and/or the entries in the list are not predictable, then do a binary search as in the initial case. If the values increase monotonically (as I now understand the problem to be stating), you can still exploit that even when the increments are unpredictable: remember the index where the previous iteration stopped, and instead of resuming iteration directly from there, use it as the lower bound of the next binary search so that you narrow the region being searched. This would make each iteration cost O(log m), where m is the number of elements remaining to be considered.
Overall, this produces an algorithm that is no worse than O((N + I) log N), where I is the number of iterations, compared with O(I * N) for the previous algorithm (and it shifts most of the computation outside of the loop rather than inside it).
A modern computer can do billions of operations per second. Even if your timer goes off 1000 times per second and you need to process 1000 entries, you will still be fine with a naive approach.
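A minimal Python sketch of the preprocessing plus bounded binary search, using the standard bisect module. It assumes each player is a dict with "player" and "speed" keys (mirroring the JSON above), treats each window as lo <= speed < hi so consecutive windows never report a player twice, and uses hypothetical function names.

```python
import bisect

def prepare(players):
    """One-off preprocessing: parse speeds once and sort by them (O(n log n))."""
    ordered = sorted(players, key=lambda p: float(p["speed"]))
    speeds = [float(p["speed"]) for p in ordered]
    return ordered, speeds

def players_between(ordered, speeds, lo, hi, start_hint=0):
    """Players with lo <= speed < hi, plus the index to pass as the next hint.

    start_hint is the remembered index from the previous call; because the
    timer only moves forward, it is a valid lower bound for both searches."""
    start = bisect.bisect_left(speeds, lo, start_hint)
    end = bisect.bisect_left(speeds, hi, start)
    return ordered[start:end], end

players = [{"player": "Firmino", "speed": "15.35"},
           {"player": "Lorenz", "speed": "12.12"}]
ordered, speeds = prepare(players)
batch, hint = players_between(ordered, speeds, 12.0, 12.5)
print([p["player"] for p in batch])    # ['Lorenz']
batch, hint = players_between(ordered, speeds, 12.5, 15.5, hint)
print([p["player"] for p in batch])    # ['Firmino']
```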
But to answer the question, the best approach would be to sort the data first based on speed, and then keep the index of the first player whose speed has not yet been passed. At the beginning the index, obviously, points at the first player. Then every time your timer goes off, you will need to process some contiguous chunk of players starting at that index. Something along the lines of (JavaScript-style, assuming each speed has already been converted to a number):
let index = 0;
players.sort((a, b) => a.speed - b.speed); // sort on speed, once
function onTimer(currentSpeed) {
  // Process every player whose speed has just been passed.
  while (index < players.length && players[index].speed < currentSpeed) {
    processPlayer(players[index]);
    ++index;
  }
}

Zabbix trigger expression - detect a drop and stay in problem state

I have this trigger that fires upon a match of the rule below:
{monitoring:test.item.change(0)}<-100
When my graph goes down by over 100 units, an event gets created. The event should switch to OK status when the graph goes back up. The graph has different average values at different times of day, and besides, the item is a trapper value, which does not support flexible intervals. My problem is this: when the graph falls by over 100 units, say from 300 to 10, a PROBLEM event is created. At the next interval, if the value is still low (e.g. 13), Zabbix creates an OK event, because although the value is still low, the expression no longer returns true, since the graph hasn't gone down by a further 100 units. Any ideas on how I could fix this? I have been trying to use
{{monitoring:test.item.avg(1800)}-{monitoring:test.item.last(0)}>100}
but Zabbix wouldn't take that expression. This is supposed to compare the last value of test.item to the average value of the past 30 minutes and raise an alert when the difference exceeds 100.
This, I believe, would sort out my problem situation of a false OK status when the graph remains at a low value.
EDIT: I think I have cracked it. Zabbix has accepted the below expression:
{monitoring:test.item.avg(1800)}-{monitoring:test.item.last(0)}>100
I think you'll soon realize that expression won't solve your targeted behavior and will keep on flapping between PROBLEM and OK.
You have just shifted the "did a -100 change occur" check from "the last vs. the previous value" to "the last value vs. the average of the last half hour".
Checking whether there was either an abrupt change OR the value is still too low will probably mimic your expected scenario better:
{monitoring:test.item.change(0)}<-100 | {monitoring:test.item.max(#2)}<20
max(#2)<20 checks whether the maximum of the last 2 values is below 20.
EDIT: After reading your comment, maybe this approach (after some tweaking for your expected values) will serve you better:
({monitoring:test.item.avg(1800)}<10 & {monitoring:test.item.avg(1800)}-{monitoring:test.item.last(0)}>20) | ({monitoring:test.item.avg(1800)}>100 & {monitoring:test.item.avg(1800)}-{monitoring:test.item.last(0)}>100)
This way, you'll better fit your trigger for the different volumes during the day.

Generate (Poisson?) random variable in real-time

I have a program running in real time with a variable frame rate, e.g. it can be 15 fps or 60 fps. I want an event to happen, on average, once every 5 seconds. Each frame, I want to call a function that takes the time since the last frame as input and returns True on average once every 5 seconds of elapsed time, given that it's called every frame. I figure it has something to do with the Poisson distribution. How would I do this?
It really depends on what distribution you want to use; all you specified was the mean. Like you said, I would expect a Poisson distribution to suit your needs nicely, but you also put "uniform random variable" in the title, which is a different distribution. Anyway, let's go with the former.
So if a Poisson distribution is what you want, you can generate samples pretty easily using the cumulative distribution function. Just follow the pseudocode here: Generating Poisson RVs, with 5 seconds being your value for lambda. Let's call this function Poisson_RN().
The algorithm at this point is pretty simple.
global float next_time = current_time()

boolean function foo()
    if (next_time < current_time())
        next_time = current_time() + Poisson_RN();
        return true;
    return false;
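A Python sketch of the next_time scheme above, with one deliberate substitution: it draws exponential gaps (random.expovariate) instead of calling Poisson_RN(). In a Poisson process with rate 1/5 per second the waiting time between events is exponentially distributed with mean 5 s, whereas Poisson-distributed gaps would always be whole numbers. The class name NextTimeTrigger is hypothetical, and time.monotonic stands in for current_time().

```python
import random
import time

class NextTimeTrigger:
    """Hypothetical helper: returns True on average once per mean_interval
    seconds, using the next_time pattern from the pseudocode above."""

    def __init__(self, mean_interval=5.0, clock=time.monotonic, rng=random):
        self.mean_interval = mean_interval
        self.clock = clock            # stands in for current_time()
        self.rng = rng
        # Draw the first gap up front instead of firing on the very first call.
        self.next_time = clock() + self._gap()

    def _gap(self):
        # Exponential waiting time with mean mean_interval (rate 1/mean_interval),
        # used here in place of Poisson_RN().
        return self.rng.expovariate(1.0 / self.mean_interval)

    def poll(self):
        """Call once per frame."""
        now = self.clock()
        if self.next_time < now:
            self.next_time = now + self._gap()
            return True
        return False
```

Create one trigger at start-up and call poll() once per frame; it returns True roughly once every 5 seconds regardless of frame rate, at the cost of storing next_time.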
A random variable that produces true/false outcomes with a fixed probability on independent trials is a Bernoulli random variable; the number of trials until the first true is a geometric random variable. In any frame, generate true with probability 1/(5*fps), and in the long run you will get an average of one true per 5 seconds.
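Since the frame rate varies, 1/fps for a given frame is just that frame's duration dt, so the per-frame probability can be written in terms of dt and no state is needed. A minimal Python sketch (fires_this_frame is a hypothetical name); it swaps in 1 - exp(-dt/5), the probability of at least one event of a rate-1/5 Poisson process in dt seconds, which is approximately dt/5 = 1/(5*fps) for short frames.

```python
import math
import random

def fires_this_frame(dt, mean_interval=5.0, rng=random):
    """Stateless per-frame test: True with probability 1 - exp(-dt/mean_interval),
    which is about dt/mean_interval (= 1/(5*fps) here) when dt is small."""
    p = -math.expm1(-dt / mean_interval)    # 1 - exp(-dt/mean_interval), accurately
    return rng.random() < p
```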

Coder's block: How to fire timer at intervals, compensating for early/late firing

I'm having a silly-yet-serious case of coder's block. Please help me work through it so my brain stops hurting and refusing to answer my questions.
I want to fire a timer at intervals up to a final time. For example, if t = 0, my goal is 100, and my interval is 20, I want to fire at 0, 20, 40, 60, 80, and 100.
The timer is not precise, and may fire early or late. If it first fires at 22, I want to fire again in 18. If it first fires at 19, I want to fire in 21. All I know when the timer fires is the current time, goal time, and firing interval. What do I do?
Edit: Sorry, I wasn't very specific about what the heck I'm actually asking. I'm trying to figure out what kind of math (probably involving taking the modulus of something) needs to be done to calculate the delay until the next firing. Ideally, I also want the timer to be matched to the end time, so if I start the timer initially at 47, it schedules itself to fire at 60 and not at 67, so the last firing will still be at 100.
If the primitive functionality you have is "schedule X to fire once at time T", then your procedure handling X should know the time T0 at which it was supposed to fire (the time T1 at which it actually fired is not needed) as well as the desired firing interval DT and schedule itself for time T0+DT. If the primitive is "fire D from now", then it should schedule for D = T0+DT-T1 (if that's negative then it needs to schedule itself immediately again but record that the scheduled time and the "was supposed to fire at" time are different so it can keep compensating on following firings).
Somebody already mentioned that .NET's Timer does this for you; so does Python's sched stdlib module; so, I'm sure, do many other languages, frameworks, and libraries. But in the end you can build it, if needed, on top of either of the single-scheduling primitives above (one for an absolute time, or one for a relative delta from now), as long as you keep track of desired as well as actual firing times!
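The relative-delay rule D = T0+DT-T1 fits in a small Python function (schedule_next is a hypothetical name). It carries the intended time forward separately from the actual firing time, and clamps a negative delay to zero so an overdue firing is rescheduled immediately while the intended time keeps compensating.

```python
def schedule_next(intended, interval, now):
    """Return (next_intended, delay) for a relative "fire D from now" timer.

    intended: when this firing was supposed to happen (T0)
    interval: desired spacing (DT)
    now:      when it actually fired (T1)
    """
    next_intended = intended + interval      # T0 + DT, independent of T1
    delay = next_intended - now              # D = T0 + DT - T1
    # Already overdue: fire again right away, but keep next_intended so the
    # following firings keep compensating.
    return next_intended, max(delay, 0)

print(schedule_next(20, 20, 22))   # (40, 18): fired late, wait 18
print(schedule_next(20, 20, 19))   # (40, 21): fired early, wait 21
```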
I would use the system clock to check your interval. For example if you know that your interval is every 20 minutes, fire off the first interval, check what the time was, and adjust the next interval start time.
If your language/platform's underlying timers don't do what you want, then it's usually best to implement timers in terms of "target times", which means the absolute time at which you want the timer to fire next. If your platform asks for an "absolute time", then you give it the target time. If it asks for a "relative time" (or, like sleep, a duration), then it is of course target_time - current_time.
The quick way to calculate each target time in turn is:
When you first set up the timer, calculate the "interval" (which might have to be a floating-point value, assuming that won't cripple performance) and also the "target time" of the first timer fire (again, you might need fractions). Record both, and set your underlying timer mechanism, whatever that is.
When the timer fires, work out the next target time by adding the interval to the previous target time.
The problem with that approach is that you might get some very tiny accumulating errors as you add the interval to the target time (or not so tiny, if you haven't used floats).
So the longer and more accurate way is to store the very first start time, the target finishing time, and the number of firings (n). Then you recalculate the target time for each new firing in turn, which makes sure that you don't get cumulative rounding errors. The formula for that is:
target(k) = start + ((target_end - start) * k) / n
Or if you prefer:
target(k) = (k/n) * target_end + (1-k/n) * start
Where the firings of the timer are k=1, 2, 3, ... n. I was going to make it 0-based, then realised that was daft ;-)
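A short Python illustration of why the longer formula helps (target is an illustrative name): recomputing each target from start avoids the drift you get from adding a floating-point interval over and over.

```python
def target(k, start, target_end, n):
    """Target time of firing k (k = 1..n), recomputed from scratch each time."""
    return start + (target_end - start) * k / n

# The question's example: start at 0, goal 100, five firings of 20.
print([target(k, 0, 100, 5) for k in range(1, 6)])   # [20.0, 40.0, 60.0, 80.0, 100.0]

# Repeated addition drifts; the formula lands exactly on the end time.
acc = 0.0
for _ in range(10):
    acc += 0.1
print(acc)                         # 0.9999999999999999
print(target(10, 0.0, 1.0, 10))    # 1.0
```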
The last thing you have to wrestle with when implementing timers is the difference between "wall clock" time, and real elapsed time as measured by your hardware clock. Wall clock time can suddenly jump forwards or backwards (either by an hour if your wall clock is affected by daylight savings, or by any amount if the system's clock is adjusted or corrected). Real time always increases (as long as it doesn't wrap). Which you want your timer to respect depends on the intended purpose. If you want to know when your last bus leaves, you want a timer firing daily according to wall clock time, but most commonly you care about real time elapsed. A good timer API has options for these kinds of things.
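A minimal Python sketch of the target-time approach on a monotonic clock: time.monotonic() only moves forward and is unaffected by wall-clock adjustments, so it suits the "real time elapsed" case (for the daily bus-timetable case you would compute targets from wall-clock time instead). run_at_targets is a hypothetical name; the clock and sleep parameters exist only so the loop can be driven by a simulated clock.

```python
import time

def run_at_targets(interval, count, action, clock=time.monotonic, sleep=time.sleep):
    """Call action(k) for k = 1..count, aiming at start + k*interval on a
    monotonic clock, so wall-clock jumps (DST, manual corrections) don't matter."""
    start = clock()
    for k in range(1, count + 1):
        target = start + k * interval      # absolute target, recomputed from start
        remaining = target - clock()       # relative delay for sleep()
        if remaining > 0:
            sleep(remaining)
        action(k)                          # a late wake-up shortens the next wait
```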
Build a table listing the desired fire times, say 10:00, 10:20, 10:40, 11:00, and 11:20.
If your timer function takes an absolute time, the rest is trivial. Set it to fire at each of the desired times. If for whatever reason you can only set one timer at a time, okay, set it to fire at the first desired time. When that event happens, set it to fire again at the next time in the table, without regard to what time it is now. Each time through, pick up the next time until you're done.
If your timer function only accepts an interval, no big deal either. Find the difference between the desired time and the current time, and set it to fire at that interval. For example, if the first time is 10:00 and it's now 9:23, set it to fire in 10:00 minus 9:23 equals 37 minutes. Then when that happens, set the interval to the next desired time minus the current time. If it really fired at 10:02, then the interval is 10:20 minus 10:02 equals 18 minutes. Etc.
You probably should check for the possibility that the next fire time has already passed. If the process can take longer than the interval you might run past it, and even if not, the system might have been down. If a fire time is missed, you may want to do catch up runs, or just skip it and go to the next desired time, depending on the details of your app.
If you can't keep the entire table -- like it goes on to infinity -- then just keep the next fire time. Each time through the process, add a fixed amount to the next fire time, without regard to when the current process ran. Then calculate the interval based on the current time. Like if you have a desired interval of 20 minutes going on forever starting at 10:00, and it's now 9:23, you set the first interval to 37 minutes. Say that actually happens at 9:59. You set the next fire time to 10:00 plus 20 minutes equals 10:20, i.e. base it on the goal time rather than the actual time. Then calculate the interval to the next fire time based on the current time, i.e. 10:20 minus 9:59 equals 21 minutes. Etc.
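A minimal Python sketch of the "just keep the next fire time" variant, with times as minutes since midnight and the two missed-slot policies mentioned above (advance and hhmm are hypothetical names).

```python
def hhmm(h, m):
    """Clock time as minutes since midnight (illustrative helper)."""
    return h * 60 + m

def advance(target, step, now, skip_missed=True):
    """Return (next_target, wait) after a run finishing at `now`.

    The next target is based on the previous goal time, not on `now`.
    skip_missed=True jumps over slots that have already passed;
    skip_missed=False returns them one by one with wait 0 (catch-up runs)."""
    target += step
    if skip_missed:
        while target < now:
            target += step
    return target, max(target - now, 0)

first = hhmm(10, 0)
print(first - hhmm(9, 23))                                  # 37
print(advance(first, 20, hhmm(9, 59)))                      # (620, 21): 10:20, in 21 minutes
print(advance(first, 20, hhmm(10, 45)))                     # (660, 15): 10:20 and 10:40 skipped
print(advance(first, 20, hhmm(10, 45), skip_missed=False))  # (620, 0): catch up now
```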

Reconstructing state from time series data events

For a particular project, we acquire data for a number of events and collect variables about those events at the same time. After the data has been collected, we perform a user-customizable analysis on said data to determine whatever it is that the user is interested in.
The data is collected in a form similar to this:
Timestamp Event
0 x = 0
0 y = 1
3 Event A occurred
3 x = 1
4 Event A occurred
4 x = 2
9 Event B occurred
9 y = 2
9 x = 0
To understand the entire state at any time, the most straightforward approach is to walk over the entire set of data. For example, if I start at time 0, and "analyze" until timestamp 5, I know that at that point x = 2, y = 1, and Event A has occurred twice. That's a really simple example. The user might be (and often is) interested in the time between events, say from A to B, and they might specify the first occurrence of A, then B, or the last occurrence of A, then B (respectively, 9-3 = 6 or 9-4 = 5). Like I said, this is easy to analyze when you're walking over the entire set.
Now, we need to adapt the model to analyze an arbitrary window of time. If we look at 0-N, that's the easy case. But if I look at 1-5, for instance, I have no notion of y unless I begin at 0 and know that y was initially 1 and did not change in the window 1-5.
Our approach is essentially to create a dictionary of variables and run callbacks on events. If one analysis was "What is x when Event A occurs and time > 3", then we would run that callback on the first Event A, and it would immediately return because time is not greater than 3. It would run again at 4, and it would report that x was 1 at t=4.
To adapt to the "time-windowing", I think I am going to (in the background) tack on additional conditions to the analysis. If their analysis is just "What is x when Event A occurs", and the current window is 1-5, then I will change it to "What is x when Event A occurs and time >= 1 and time <= 5". Then if the next window is 6-10, I can readjust the condition as necessary.
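A minimal Python sketch of the dictionary-plus-callbacks approach described above, with the window tacked on as an extra time condition. The event log is transcribed from the table; the function names are hypothetical. State is still rebuilt from time 0 on every run, which is exactly the cost the answers below try to reduce.

```python
# (timestamp, kind, payload), transcribed from the table above.
EVENTS = [
    (0, "set", ("x", 0)), (0, "set", ("y", 1)),
    (3, "event", "A"), (3, "set", ("x", 1)),
    (4, "event", "A"), (4, "set", ("x", 2)),
    (9, "event", "B"), (9, "set", ("y", 2)), (9, "set", ("x", 0)),
]

def run_analysis(events, on_event, window=None):
    """Replay every record into a dictionary of variables and call
    on_event(name, t, state) for each event inside the optional window.
    Because replay always starts at time 0, y is known even in a 1-5 window."""
    state = {}
    results = []
    for t, kind, payload in events:
        if kind == "set":
            name, value = payload
            state[name] = value
        elif window is None or window[0] <= t <= window[1]:
            # The tacked-on "time >= lo and time <= hi" condition.
            result = on_event(payload, t, state)
            if result is not None:
                results.append(result)
    return results

def x_at_a_after_3(name, t, state):
    """'What is x when Event A occurs and time > 3?'"""
    return (t, state["x"]) if name == "A" and t > 3 else None

def x_at_a(name, t, state):
    """'What is x when Event A occurs?'"""
    return (t, state["x"]) if name == "A" else None

print(run_analysis(EVENTS, x_at_a_after_3))          # [(4, 1)]
print(run_analysis(EVENTS, x_at_a, window=(1, 5)))   # [(3, 0), (4, 1)]
print(run_analysis(EVENTS, x_at_a, window=(6, 10)))  # []
```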
My main question is: what pattern does this fit? We are obviously not the first people to approach a problem like this, but I have not been able to find how others have approached it. I probably just don't know exactly what to search for on Google. Is there any other approach besides keeping a dictionary of the entire global state for looking up a single state at a given time? Note also that the data could have several thousand, maybe tens of thousands, of records, so the fewer iterations over the data set, the better.
I think your best approach here would be to take periodic "snapshots" of the full state data, say every 1000 samples, along with recording the deltas. When you're storing your data as offsets from some original value (aka deltas), you don't have any choice but to reconstruct the full data starting with the original values. Storing periodic snapshots will lessen the amount of reconstruction you have to do: the design tradeoff is between low storage requirements but long reconstruction time on the one hand, and higher storage requirements but shorter reconstruction time on the other.
MPEGs, for example, store each frame as the differences between the current frame and the previous frame. Ordinarily, this would force an MPEG to be viewed from the beginning, but the format also periodically stores full frames so that the decoder doesn't have to backtrack all the way to the beginning of the file.
You can search by time in O(log N), and you can get a feel for how many updates are acceptable per lookup, so here's my solution:
Pick a number, N, of updates that are acceptable in order to return a result. 256 might be good, given the scales you've mentioned so far.
Every N records, commit an entry of all state to a dictionary, with a timestamp.
Now you have a tradeoff: dictionary size against speed. As N goes to infinity you get plain replay from the start; N = 1 is your current solution (the full state at every record); any N in between requires less memory than N = 1 but is slower to query.
Your implementation is now (for time X):
O(log n) search of the subsampled global dictionary for the last timestamp before X (call it Y).
O(log n) search of the event list for timestamp Y, then perform fewer than N updates.
Picking N as a power of two will even allow you to do some nice shift tricks to do a rounded-down integer divide nice and fast.
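A Python sketch of the steps above, under simplifying assumptions: each record is a variable assignment (timestamp, name, value) taken from the x/y rows of the table earlier, records are already sorted by timestamp, and N = 2**shift so the rounded-down divide is a shift. The names build_snapshots and state_at are hypothetical, and event counts are not tracked here.

```python
import bisect

# Variable assignments from the table above, as (timestamp, name, value).
RECORDS = [(0, "x", 0), (0, "y", 1), (3, "x", 1), (4, "x", 2), (9, "y", 2), (9, "x", 0)]

def build_snapshots(records, shift):
    """Store a full copy of the state after every N = 2**shift records."""
    n = 1 << shift
    snapshots = [{}]                      # snapshots[j] = state after j*N records
    state = {}
    for i, (_, name, value) in enumerate(records, start=1):
        state[name] = value
        if i % n == 0:
            snapshots.append(dict(state))
    return snapshots

def state_at(records, times, snapshots, shift, x):
    """Full state after every record with timestamp <= x."""
    count = bisect.bisect_right(times, x)        # O(log n) search by time
    j = count >> shift                           # rounded-down count // N
    state = dict(snapshots[j])
    for _, name, value in records[j << shift:count]:   # fewer than N updates
        state[name] = value
    return state

TIMES = [t for t, _, _ in RECORDS]
SNAPS = build_snapshots(RECORDS, shift=1)        # N = 2, only to keep the demo small
print(state_at(RECORDS, TIMES, SNAPS, 1, 5))     # {'x': 2, 'y': 1}
```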