Algorithm for online approximation of a slowly-changing, real valued function


I'm tackling an interesting machine learning problem and would love to hear if anyone knows a good algorithm to deal with the following:
The algorithm must learn to approximate a function of N inputs and M outputs
N is quite large, e.g. 1,000-10,000
M is quite small, e.g. 5-10
All inputs and outputs are floating point values, could be positive or negative, likely to be relatively small in absolute value but no absolute guarantees on bounds
Each time period I get N inputs and need to predict the M outputs, at the end of the time period the actual values for the M outputs are provided (i.e. this is a supervised learning situation where learning needs to take place online)
The underlying function is non-linear, but not too nasty (e.g. I expect it will be smooth and continuous over most of the input space)
There will be a small amount of noise in the function, but signal/noise is likely to be good - I expect the N inputs will explain 95%+ of the output values
The underlying function is slowly changing over time - unlikely to change drastically in a single time period but is likely to shift slightly over the 1000s of time periods range
There is no hidden state to worry about (other than the changing function), i.e. all the information required is in the N inputs
I'm currently thinking some kind of back-propagation neural network with lots of hidden nodes might work - but is that really the best approach for this situation and will it handle the changing function?

With your number of inputs and outputs, I'd also go for a neural network, it should do a good approximation. The slight change is good for a back-propagation technique, it should not have to 'de-learn' stuff.

I think stochastic gradient descent (http://en.wikipedia.org/wiki/Stochastic_gradient_descent) would be a straightforward first step; it will probably work nicely given the operating conditions you have.
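To make the online loop concrete, here is a minimal sketch of stochastic gradient descent on a plain linear model: predict, receive the true outputs, take one gradient step. The class name OnlineLinearSGD, the learning rate, the drift size, and the linear model itself are assumptions for illustration only; a neural network would swap in a different prediction and gradient but keep the same predict/update loop.

```python
import random


class OnlineLinearSGD:
    """Linear model y = W x + b trained one sample at a time (hypothetical helper)."""

    def __init__(self, n_inputs, n_outputs, learning_rate=0.01):
        self.lr = learning_rate
        self.W = [[0.0] * n_inputs for _ in range(n_outputs)]
        self.b = [0.0] * n_outputs

    def predict(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.W, self.b)]

    def update(self, x, y_true):
        """One SGD step on squared error; returns the pre-update squared error."""
        y_pred = self.predict(x)
        errors = [p - t for p, t in zip(y_pred, y_true)]
        for j, e in enumerate(errors):
            row = self.W[j]
            for i, xi in enumerate(x):
                row[i] -= self.lr * e * xi
            self.b[j] -= self.lr * e
        return sum(e * e for e in errors)


def run_demo(steps=3000, n_inputs=5, seed=0):
    """Track a slowly drifting linear target; return (early_error, late_error)."""
    rng = random.Random(seed)
    true_w = [rng.uniform(-1, 1) for _ in range(n_inputs)]
    model = OnlineLinearSGD(n_inputs, 1, learning_rate=0.05)
    losses = []
    for _ in range(steps):
        # The target function drifts a little every time period.
        true_w = [w + rng.gauss(0, 0.001) for w in true_w]
        x = [rng.uniform(-1, 1) for _ in range(n_inputs)]
        y = [sum(w * xi for w, xi in zip(true_w, x))]
        losses.append(model.update(x, y))
    early = sum(losses[:100]) / 100
    late = sum(losses[-100:]) / 100
    return early, late
```

Because every period ends with an update, the model keeps following the drifting target instead of being trained once and frozen.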

I'd also go for an ANN. Single layer might do fine since your input space is large. You might wanna give it a shot before adding a lot of hidden layers.
@mikera What is it going to be used for? Is it an assignment in an ML course?


How much w and b values are adjusted in every trial of functioning? Is there any effective number of networks?

I am having trouble understanding deep learning.
As I understand it, deep learning is basically an inductive process, so the function must be adjusted until it hits the right target.
But I cannot figure out how much the w and b values should be changed in each trial. Is there a rule for the adjustment?
If not, is there a trick, such as formulas that are normally used?
And do larger networks always perform better?
I understand that a single layer cannot hit as many targets as multiple layers can, but I don't know whether 3 layers are better than 2.

The changes to w and b are based on their gradients.
You can calculate the gradient by taking the derivative of the error (which depends on the loss function) with respect to w and b. Moving w and b against the gradient decreases the error.
The maximum change is gradient/gradient_magnitude * total_error/gradient_magnitude.
When you move w and b by one unit along the gradient direction, the error changes by about the magnitude of the gradient. For that reason, the maximum changes of w and b are gradient*err/mag^2.
However, stepping all the way to that limit is not recommended, because the search can get stuck in a local minimum. Therefore a learning rate, or techniques such as dropout, are usually applied.
The method above is not the only way to adjust the parameters. Genetic algorithms, RBMs, or reinforcement learning methods can replace or complement it.
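A small sketch of the gradient*err/mag^2 step, assuming "err" is a signed error and "gradient" is its derivative with respect to the weights. The function names and the toy linear unit are made up for illustration. For a linear unit the full step removes the whole error, and a learning rate below 1 takes only part of it.

```python
def gradient_step(weights, gradient, error, learning_rate=1.0):
    """Move weights against the gradient.

    With learning_rate=1.0 this is the 'maximum' step gradient * error / mag^2,
    which removes the whole error if the error is locally linear in the weights.
    """
    mag_sq = sum(g * g for g in gradient)
    if mag_sq == 0.0:
        return list(weights)  # flat spot: no direction to move in
    scale = learning_rate * error / mag_sq
    return [w - scale * g for w, g in zip(weights, gradient)]


def linear_error(weights, inputs, target):
    """Signed error of a linear unit: w . x - target."""
    return sum(w * x for w, x in zip(weights, inputs)) - target


inputs, target = [1.0, 2.0, -1.0], 3.0
w0 = [0.5, -0.5, 0.25]
err0 = linear_error(w0, inputs, target)        # its gradient w.r.t. w is `inputs`
w_full = gradient_step(w0, inputs, err0)       # full step
w_small = gradient_step(w0, inputs, err0, 0.1) # damped by a learning rate
```

With a learning rate of 0.1 the error shrinks by 10% per step in this linear case; for a real network the error is only approximately linear near the current weights, which is one reason small steps are preferred.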

gnuRadio Dual Tone detection

I am trying to come up with an efficient way to characterize two narrowband tones separated by about 900 kHz (one at around 100 kHz and one at around 1 MHz once translated to baseband). They don't move much in frequency over time but may have amplitude variations we want to monitor.
Each tone is roughly about 100Hz wide and we are required to characterize these two beasts over long periods of time down to a resolution of about 0.1 Hz. The samples are coming in at over 2M Samples/sec (TBD) to adequately acquire the highest tone.
I'm trying to avoid (if possible) doing brute force >2MSample FFTs on the data once a second to extract frequency domain data. Is there an efficient approach? Something akin to performing two (much) smaller FFTs around the bands of interest? I've looked at Goertzel and chirp z methods but I am not certain they help save processing.

Something akin to performing two (much) smaller FFTs around the bands of interest
There is, it's called Goertzel, and is kind of the FFT for single bins, and you already have looked at it. It will save you CPU time.
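For reference, a minimal pure-Python sketch of the Goertzel recurrence for one bin. It is not a GNU Radio block, and the sample rate and tone frequency in the demo are scaled-down stand-ins rather than the real 100 kHz / 1 MHz tones.

```python
import math


def goertzel_power(samples, sample_rate, target_freq):
    """Return the squared magnitude of the DFT bin nearest target_freq."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)  # nearest bin index
    omega = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


# Demo with assumed, scaled-down numbers: a 1 kHz tone sampled at 8 kHz.
fs = 8000.0
tone = [math.sin(2.0 * math.pi * 1000.0 * i / fs) for i in range(800)]
power_on_tone = goertzel_power(tone, fs, 1000.0)
power_off_tone = goertzel_power(tone, fs, 1500.0)
```

Each bin costs one multiply and two adds per sample, so watching two tones is far cheaper than a full-length FFT.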
Anyway, there's no reason to do a 2M-point FFT; first of all, you only want a resolution of about 1/20 the sampling rate, hence, a 20-point FFT would totally do, and should be pretty doable for your CPU at these low rates; since you don't seem to care about phase of your tones, FFT->complex_to_mag.
However, there's one thing that you should always do: look at your signal of interest, and decimate down to the rate that fits exactly that. Since GNU Radio's filters are implemented cleverly, the filter itself will only run at the decimated rate, and you can spend the CPU cycles saved on a better filter.
Because a direct decimation from 2MHz to 100Hz (decimation: 20000) will really have an ugly filter length, you should do this in multiple stages:
I'd try first decimating by 100, and then in a second step by 100, leaving you with 200Hz observable spectrum. The xlating fir filter blocks will let you use a simple low-pass filter (use the "Low-Pass Filter Taps" block to define a variable that contains such taps) as a band-selector.
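What the translate-then-decimate chain does can be sketched in plain Python. This is a stand-in for the idea, not the GNU Radio API: the boxcar average replaces properly designed low-pass taps, and the sample rate, tone frequency and decimation factors are scaled-down assumptions.

```python
import cmath
import math


def translate(samples, sample_rate, center_freq):
    """Shift center_freq down to 0 Hz by mixing with a complex exponential."""
    step = -2j * math.pi * center_freq / sample_rate
    return [x * cmath.exp(step * n) for n, x in enumerate(samples)]


def decimate(samples, factor):
    """Crude low-pass (boxcar average), keeping one output per block of `factor`."""
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]


def peak_frequency(samples, sample_rate):
    """Frequency of the strongest DFT bin (plain O(n^2) DFT; fine for short inputs)."""
    n = len(samples)
    best_k, best_mag = 0, -1.0
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    if best_k > n // 2:
        best_k -= n  # map upper bins to negative frequencies
    return best_k * sample_rate / n


# Assumed demo: a tone 3 Hz above a 5 kHz center, sampled at 20 kHz for one second.
fs = 20000.0
signal = [math.cos(2.0 * math.pi * 5003.0 * i / fs) for i in range(20000)]
baseband = translate(signal, fs, 5000.0)
stage1 = decimate(baseband, 10)   # 20 kHz -> 2 kHz
stage2 = decimate(stage1, 10)     # 2 kHz -> 200 Hz
offset_hz = peak_frequency(stage2, 200.0)
```

After the two stages only 200 samples per second remain, so a short transform on them resolves the tone's offset from the chosen center.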

Best-case Running-time to solve an NP-Complete problem?

What is the fastest known algorithm for solving a particular NP-Complete problem? For example, a naive implementation of travelling salesman is O(n!), but with dynamic programming it can be done in O(n^2 * 2^n). Is there perhaps an "easier" NP-Complete problem that has a better running time?
I'm curious about exact solutions, not approximations.
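For readers who haven't seen the O(n^2 * 2^n) dynamic program mentioned above (usually called Held-Karp), here is a compact sketch. The distance matrix is a made-up example, and the brute-force function exists only to cross-check the result on small inputs.

```python
from itertools import combinations, permutations


def held_karp(dist):
    """Exact TSP tour length via dynamic programming over subsets, O(n^2 * 2^n)."""
    n = len(dist)
    if n == 1:
        return 0
    # best[(mask, j)]: shortest path from city 0 through the cities in mask, ending at j
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev_mask = mask & ~(1 << j)
                best[(mask, j)] = min(best[(prev_mask, k)] + dist[k][j]
                                      for k in subset if k != j)
    full = (1 << n) - 2  # every city except city 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))


def brute_force(dist):
    """Reference O(n!) search, used only to check the DP on small inputs."""
    n = len(dist)
    return min(sum(dist[a][b] for a, b in zip((0,) + p, p + (0,)))
               for p in permutations(range(1, n)))


example = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, 12, 0],
]
shortest = held_karp(example)
```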

[...] with dynamic programming it can be done in O(n^2 * 2^n). Is there any perhaps "easier" NP-Complete problem that has a better running time?
Sort of. You can make the running time look better by creating an artificial problem that encodes the same instance in a polynomially larger input. As long as the input is only polynomially larger, the resulting problem is still NP-complete. Since the complexity is by definition the function that maps input size to running time, if the input size grows the function drops into lower O classes.
So the same algorithm running on TSP with the input padded with useless bits to size m = n^2 has complexity O(m * 2^sqrt(m)) in terms of the padded size m.

A characteristic of the NP-Complete problems is that any of the problems in NP can be mechanically transformed into any of the NP-Complete problems in, at most, polynomial time.
Therefore, whatever the best solution for any given NP-Complete problem is, it is automatically a similarly-good solution for any other NP problem.
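Given that dynamic programming can solve the Traveling Salesman Problem in roughly 2^n time and 2^n space (ignoring polynomial factors), every other NP problem can be solved in exponential time as well [plus the time to apply the transformation; since the transformation can enlarge the instance polynomially, the bound becomes 2^p(n) for some polynomial p rather than exactly 2^n].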

Generally you cannot find the best solution for the generic Travelling Salesman problem without trying all combinations (there might be negative distances, etc).
By adding additional restrictions and loosening the requirement of getting the best solution, you can speed up things quite a bit.
For instance, you can get polynomial running time if the distances in the problem obey the rule "it is not longer to go directly from A to B than to go from A to C to B" (i.e. a shortcut is never longer), and you can live with the result being at most 1.5 times the optimal value. See http://en.wikipedia.org/wiki/Travelling_salesman_problem#Metric_TSP

Benchmarking: When can I stop making measurements?

I have a series of functions that are all designed to do the same thing. The same inputs produce the same outputs, but the time that it takes to do them varies by function. I want to determine which one is 'fastest', and I want to have some confidence that my measurement is 'statistically significant'.
Perusing Wikipedia and the interwebs tells me that statistical significance means that a measurement or group of measurements is different from a null hypothesis by a p-value threshold. How would that apply here? What is the null hypothesis between function A being faster than function B?
Once I've got that whole setup defined, how do I figure out when to stop measuring? I'll typically see that a benchmark is run three times, and then the average is reported; why three times and not five or seven? According to this page on Statistical Significance (which I freely admit I do not understand fully), Fisher used 8 as the number of samples that he needed to measure something with 98% confidence; why 8?

I would not bother applying statistics principles to benchmarking results. In general, the term "statistical significance" refers to the likelihood that your results were achieved accidentally, and do not represent an accurate assessment of the true values. In statistics, as a result of simple probability, the likelihood of a result being achieved by chance decreases as the number of measurements increases. In the benchmarking of computer code, it is a trivial matter to increase the number of trials (the "n" in statistics) so that the likelihood of an accidental result is below any arbitrary threshold you care to define (the "alpha" or level of statistical significance).
To simplify: benchmark by running your code a huge number of times, and don't worry about statistical measurements.
Note to potential down-voters of this answer: this answer is somewhat of a simplification of the matter, designed to illustrate the concepts in an accessible way. Comments like "you clearly don't understand statistics" will result in a savage beat-down. Remember to be polite.

You are asking two questions:
How do you perform a test of statistical significance that the mean time of function A is greater than the mean time of function B?
If you want a certain confidence in your answer, how many samples should you take?
The most common answer to the first question is that you either want to compute a confidence interval or perform a t-test. It's not different than any other scientific experiment with random variation. To compute the 95% confidence interval of the mean response time for function A simply take the mean and add 1.96 times the standard error to either side. The standard error is the square root of the variance divided by N. That is,
95% CI = mean +/- 1.96 * sqrt(sigma2 / N)
where sigma2 is the variance of speed for function A and N is the number of runs you used to calculate mean and variance.
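As a quick sketch of that formula in Python: the timings below are made-up numbers, and the 1.96 factor is the normal approximation, which is a little optimistic for very small N (a t-based interval would be wider).

```python
import math
import statistics


def confidence_interval_95(samples):
    """Return (low, high) for the mean using the normal approximation (1.96 * SE)."""
    n = len(samples)
    mean = statistics.mean(samples)
    std_err = math.sqrt(statistics.variance(samples) / n)  # sqrt(sigma2 / N)
    return mean - 1.96 * std_err, mean + 1.96 * std_err


times_a = [10.2, 10.4, 9.9, 10.1, 10.3, 10.0, 10.2, 10.1]  # made-up timings, ms
times_b = [11.0, 11.3, 10.9, 11.1, 11.2, 11.0, 11.4, 10.8]
ci_a = confidence_interval_95(times_a)
ci_b = confidence_interval_95(times_b)
a_is_faster = ci_a[1] < ci_b[0]  # the two intervals do not overlap
```

Non-overlapping intervals are a conservative shortcut; a t-test on the two samples is the more standard comparison.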
Your second question relates to statistical power analysis and the design of experiments. You describe a sequential setup where you are asking whether to continue sampling. The design of sequential experiments is actually a very tricky problem in statistics, since in general you are not allowed to calculate confidence intervals or p-values and then draw additional samples conditional on not reaching your desired significance. If you wish to do this, it would be wiser to set up a Bayesian model and calculate your posterior probability that speed A is greater than speed B. This, however, is massive overkill.
In a computing environment it is generally pretty trivial to achieve a very small confidence interval both because drawing large N is easy and because the variance is generally small -- one function obviously wins.
Given that Wikipedia and most online sources are still horrible when it comes to statistics, I recommend buying Introductory Statistics with R. You will learn both the statistics and the tools to apply what you learn.

The research you cite sounds more like a highly controlled environment. This is purely a practical answer that has proven itself time and again to be effective for performance testing.
If you are benchmarking code in a modern, multi-tasking, multi-core, computing environment, the number of iterations required to achieve a useful benchmark goes up as the length of time of the operation to be measured goes down.
So, if you have an operation that takes ~5 seconds, you'll want, typically, 10 to 20 iterations. As long as the deviation across the iterations remains fairly constant, then your data is sound enough to draw conclusions. You'll often want to throw out the first iteration or two because the system is typically warming up caches, etc...
If you are testing something in the millisecond range, you'll want 10s of thousands of iterations. This will eliminate noise caused by other processes, etc, firing up.
Once you hit the sub-millisecond range -- 10s of nanoseconds -- you'll want millions of iterations.
Not exactly scientific, but neither is testing "in the real world" on a modern computing system.
When comparing the results, consider the difference in execution speed as percentage, not absolute. Anything less than about 5% difference is pretty close to noise.
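A rough harness along those lines might look like the following. The function names, the default of two discarded warm-up runs, and the 5% threshold parameter are assumptions lifted from the rules of thumb above, not a standard API.

```python
import statistics
import time


def time_runs(func, iterations, warmup=2):
    """Time func() repeatedly and drop the first `warmup` runs (cache warm-up)."""
    samples = []
    for _ in range(iterations + warmup):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples[warmup:]


def percent_difference(samples_a, samples_b):
    """Difference of the mean times as a percentage of the slower mean."""
    mean_a, mean_b = statistics.mean(samples_a), statistics.mean(samples_b)
    return abs(mean_a - mean_b) / max(mean_a, mean_b) * 100.0


def is_meaningful(samples_a, samples_b, noise_floor_percent=5.0):
    """Apply the ~5% rule of thumb: smaller differences are treated as noise."""
    return percent_difference(samples_a, samples_b) >= noise_floor_percent
```

Typical use would be is_meaningful(time_runs(f, 20), time_runs(g, 20)), with the iteration count raised as the operation gets shorter.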

Do you really care about statistical significance or plain old significance? Ultimately you're likely to have to form a judgement about readability vs performance - and statistical significance isn't really going to help you there.
A couple of rules of thumb I use:
Where possible, test for enough time to make you confident that little blips (like something else interrupting your test for a short time) won't make much difference. Usually I reckon 30 seconds is enough for this, although it depends on your app. The longer you test for, the more reliable the test will be - but obviously your results will be delayed :)
Running a test multiple times can be useful, but if you're timing for long enough then it's not as important IMO. It would alleviate other forms of error which made a whole test take longer than it should. If a test result looks suspicious, certainly run it again. If you see significantly different results for different runs, run it several more times and try to spot a pattern.

The fundamental question you're trying to answer is: how likely is it that what you observe could have happened by chance? Is this coin fair? Throw it once: HEADS. No, it's not fair, it always comes down heads. Bad conclusion! Throw it 10 times and get 7 heads, now what do you conclude? 1000 times and 700 heads?
For simple cases we can imagine how to figure out when to stop testing. But you have a slightly different situation - are you really doing a statistical analysis?
How much control do you have over your tests? Does repeating them add any value? Your computer is deterministic (maybe). Einstein's definition of insanity is to repeat something and expect a different outcome. So when you run your tests do you get repeatable answers? I'm not sure that statistical analyses help if you are doing good enough tests.
For what you're doing I would say that the first key thing is to make sure that you really are measuring what you think you are. Run every test for long enough that any startup or shutdown effects are hidden. Useful performance tests tend to run for quite extended periods for that reason. Make sure that you are not actually measuring the time in your test harness rather than the time in your code.
You have two primary variables: how many iterations of your method to run in one test? How many tests to run?
Wikipedia says this:
"In addition to expressing the variability of a population, standard deviation is commonly used to measure confidence in statistical conclusions. For example, the margin of error in polling data is determined by calculating the expected standard deviation in the results if the same poll were to be conducted multiple times. The reported margin of error is typically about twice the standard deviation."
Hence if your objective is to be sure that one function is faster than another you could run a number of tests of each, compute the means and standard deviations. My expectation is that if your number of iterations within any one test is high then the standard deviation is going to be low.
If we accept that definition of margin of error, you can see whether the two means are further apart than the sum of their margins of error.

How should I start designing an AI algorithm for an artillery warfare game?

Here's the background... in my free time I'm designing an artillery warfare game called Staker (inspired by the old BASIC games Tank Wars and Scorched Earth) and I'm programming it in MATLAB. Your first thought might be "Why MATLAB? There are plenty of other languages/software packages that are better for game design." And you would be right. However, I'm a dork and I'm interested in learning the nuts and bolts of how you would design a game from the ground up, so I don't necessarily want to use anything with prefab modules. Also, I've used MATLAB for years and I like the challenge of doing things with it that others haven't really tried to do.
Now to the problem at hand: I want to incorporate AI so that the player can go up against the computer. I've only just started thinking about how to design the algorithm to choose an azimuth angle, elevation angle, and projectile velocity to hit a target, and then adjust them each turn. I feel like maybe I've been overthinking the problem and trying to make the AI too complex at the outset, so I thought I'd pause and ask the community here for ideas about how they would design an algorithm.
Some specific questions:
Are there specific references for AI design that you would suggest I check out?
Would you design the AI players to vary in difficulty in a continuous manner (a difficulty of 0 (easy) to 1 (hard), all still using the same general algorithm) or would you design specific algorithms for a discrete number of AI players (like an easy enemy that fires in random directions or a hard enemy that is able to account for the effects of wind)?
What sorts of mathematical algorithms (pseudocode description) would you start with?
Some additional info: the model I use to simulate projectile motion incorporates fluid drag and the effect of wind. The "fluid" can be air or water. In air, the air density (and thus effect of drag) varies with height above the ground based on some simple atmospheric models. In water, the drag is so great that the projectile usually requires additional thrust. In other words, the projectile can be affected by forces other than just gravity.

In a real artillery situation all these factors would be handled either with formulas or simply brute-force simulation: Fire an electronic shell, apply all relevant forces and see where it lands. Adjust and try again until the electronic shell hits the target. Now you have your numbers to send to the gun.
Given the complexity of the situation I doubt there is any answer better than the brute-force one. While you could precalculate a table of expected drag effects vs velocity I can't see it being worthwhile.
Of course a game where the AI dropped the first shell on your head every time wouldn't be interesting. Once you know the correct values you'll have to make the AI a lousy shot. Apply a random factor to the shot and then walk it towards the target--move it say 30+random(140)% towards the true target each time it shoots.
Edit:
I do agree with BCS's notion of improving it as time goes on. I said that but then changed my mind on how to write a bunch of it and then ended up forgetting to put it back in. The tougher it's supposed to be the smaller the random component should be.
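The "30+random(140)%" walk could be sketched like this. I'm reading random(140) as a uniform draw between 0 and 140, which is an assumption, and the function names are invented for the example.

```python
import random


def walk_toward(current, target, rng=random):
    """Move `current` between 30% and 170% of the way to `target`.

    Mirrors the '30 + random(140) %' rule: the shot may undershoot or overshoot.
    """
    fraction = (30.0 + rng.uniform(0.0, 140.0)) / 100.0
    return current + fraction * (target - current)


def shrink_error(current, target, turns, rng=random):
    """Apply walk_toward once per turn and return the aim after each turn."""
    history = []
    for _ in range(turns):
        current = walk_toward(current, target, rng)
        history.append(current)
    return history
```

Since the remaining error is multiplied by a factor between -0.7 and 0.7 each turn, the aim always gets at least 30% closer; narrowing the random range would make a tougher AI.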

Loren's brute force solution is appealing because it would allow easy "intelligence adjustments" by adding more iterations. Also, the adjustment factors for the iteration could be part of the intelligence, as some values will make it converge faster.
Also, for the basic system (no drag, wind, etc.) there is a closed-form solution that can be derived from a basic physics text. I would make the first guess be that and then do one or more iterations per turn. You might want to try to come up with an empirical correction to improve the first shot (something that will bring the average of the first-shot distribution closer to correct).
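For the no-drag, flat-ground case the closed form is the textbook range equation R = v^2 * sin(2*theta) / g. A sketch in Python (the game itself is in MATLAB; the names and the value of g here are assumptions), covering elevation only, since azimuth is just the bearing to the target:

```python
import math

G = 9.81  # m/s^2, assumed gravity


def launch_angles(speed, distance, g=G):
    """Elevation angles (radians) that hit a target at `distance` on flat ground.

    Returns (low, high), or None when the target is out of range.
    """
    s = g * distance / (speed * speed)
    if s > 1.0:
        return None
    low = 0.5 * math.asin(s)
    return low, math.pi / 2.0 - low


def flat_range(speed, angle, g=G):
    """Range on flat ground without drag: v^2 * sin(2*theta) / g."""
    return speed * speed * math.sin(2.0 * angle) / g
```

Either returned angle makes a reasonable first guess before drag and wind corrections are iterated in.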

Thanks Loren and BCS, I think you've hit upon an idea I was considering (which prompted question #2 above). The pseudocode for an AI's turn would look something like this:
    % nSims:       number of projectile simulations done per turn
    %              for the AI (i.e. difficulty)
    % prevParams:  the previous shot parameters
    % prevResults: some measure of accuracy of the last shot
    newParams = get_new_guess(prevParams, prevResults);
    for iSim = 1:nSims
        newResults = simulate_projectile_flight(newParams);
        newParams = get_new_guess(newParams, newResults);
    end
    fire_projectile(newParams);
In this case, the variable nSims is essentially a measure of "intelligence" for the AI. A "dumb" AI would have nSims=0, and would simply make a new guess each turn (based on results of the previous turn). A "smart" AI would refine its guess nSims times per turn by simulating the projectile flight.
Two more questions spring from this:
1) What goes into the function get_new_guess? How should I adjust the three shot parameters to minimize the distance to the target? For example, if a shot falls short of the target, you can try to get it closer by adjusting the elevation angle only, adjusting the projectile velocity only, or adjusting both of them together.
2) Should get_new_guess be the same for all AIs, with the nSims value being the only determiner of "intelligence"? Or should get_new_guess be dependent on another "intelligence" parameter (like guessAccuracy)?

A difference between artillery games and real artillery situations is that all sides have 100% information, and that there are typically more than 2 opponents.
As a result, your evaluation function should consider which opponent it would be more urgent to try and eliminate. For example, if I have an easy kill at 90%, but a 50% chance on someone who's trying to kill me and just missed two shots near me, it's more important to deal with that chance.
I think you would need some way of evaluating the risk everyone poses to you in terms of ammunition, location, activity, past history, etc.

I'm now addressing the response you posted:
While you have the general idea I don't believe your approach will be workable--it's going to converge way too fast even for a low value of nSims. I doubt you want more than one iteration of get_new_guess between shells and it very well might need some randomizing beyond that.
Even if you can use multiple iterations they wouldn't be good at making a continuously increasing difficulty as they will be big steps. It seems to me that difficulty must be handled by randomness.
First, get_initial_guess:
To start out I would have a table that divides the world up into zones--the higher the difficulty the more zones. The borders between these zones would have precalculated power for 45, 60 & 75 degrees. Do a test plot, if a shell smacks terrain try again at a higher angle--if 75 hits terrain use it anyway.
The initial shell should be fired at a random power between the values given for the low and high bounds.
Now, for get_new_guess:
Did the shell hit terrain? Increase the angle. I think there will be a constant ratio of how much power needs to be increased to maintain the same distance--you'll need to run tests on this.
Assuming it didn't smack a mountain, note if it's short or long. This gives you a bound. The new guess is somewhere between the two bounds (if you're missing a bound, use the value from the table in get_initial_guess in its place).
Note what percentage of the way between the low and high bound impact points the target is and choose a power that far between the low and high bound power.
This is probably far too accurate and will likely require some randomizing. I've changed my mind about adding a simple random %. Rather, multiple random numbers should be used to get a bell curve.
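The interpolation between bounds and the bell-curve randomizing could look roughly like this. The difficulty parameter, the averaging of four uniform draws, and the scaling of the noise to half the power gap are my own assumptions for the sketch.

```python
import random


def interpolate_power(low_power, low_impact, high_power, high_impact, target):
    """Pick the power the same fraction of the way between the bounds as the
    target is between the two impact points."""
    fraction = (target - low_impact) / (high_impact - low_impact)
    return low_power + fraction * (high_power - low_power)


def bell_noise(spread, rng=random, draws=4):
    """Average several uniform numbers: a roughly bell-shaped value in [-spread, spread]."""
    return spread * sum(rng.uniform(-1.0, 1.0) for _ in range(draws)) / draws


def get_new_guess(low, high, target, difficulty, rng=random):
    """low/high are (power, impact_distance) pairs; difficulty in [0, 1], 1 = most accurate."""
    power = interpolate_power(low[0], low[1], high[0], high[1], target)
    max_error = (1.0 - difficulty) * 0.5 * (high[0] - low[0])
    return power + bell_noise(max_error, rng)
```

For example, with a short shot at power 40 landing at 100 and a long shot at power 60 landing at 200, a target at 150 gives a base power of 50 before the noise is added.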

Another thought: Are we dealing with a system where only one shell is active at once? Long ago I implemented an artillery game where you had 5 barrels, each with a fixed reload time that was above the maximum possible flight time.
With that I found myself using a strategy of firing shells spread across the range between my current low bound and high bound. It's possible that being a mere human I wasn't using an optimal strategy, though--this was realtime, getting a round off as soon as the barrel was ready was more important than ensuring it was aimed as well as possible as it would converge quite fast, anyway. I would generally put a shell on target on the second salvo and the third would generally all be hits. (A kill required killing ALL pixels in the target.)
In an AI situation I would model both this and a strategy of holding back some of the barrels to fire more accurate rounds later. I would still fire a spread across the target range, the only question is whether I would use all barrels or not.

I have personally created such a system - for the web-game Zwok, using brute force. I fired lots of shots in random directions and recorded the best result. I wouldn't recommend doing it any other way as the difference between timesteps etc will give you unexpected results.
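A bare-bones version of "fire lots of random shots and keep the best" might look like this. The physics here (Euler steps, linear drag, constant wind) is a placeholder I made up; in a real game the AI should call the game's own simulation, for exactly the timestep reason given above.

```python
import math
import random


def simulate_impact(speed, angle, wind=0.0, drag=0.01, dt=0.01, g=9.81):
    """Euler-integrate a 2-D shot with linear drag and return where it lands (x)."""
    x, y = 0.0, 0.0
    vx, vy = speed * math.cos(angle), speed * math.sin(angle)
    while True:
        # Drag acts on the velocity relative to the moving air.
        ax = -drag * (vx - wind)
        ay = -g - drag * vy
        vx, vy = vx + ax * dt, vy + ay * dt
        x, y = x + vx * dt, y + vy * dt
        if y <= 0.0:
            return x


def best_random_shot(target_x, n_shots, rng, max_speed=100.0):
    """Fire n_shots random simulated shots and keep the one landing closest."""
    best = None
    for _ in range(n_shots):
        speed = rng.uniform(1.0, max_speed)
        angle = rng.uniform(math.radians(5), math.radians(85))
        miss = abs(simulate_impact(speed, angle) - target_x)
        if best is None or miss < best[0]:
            best = (miss, speed, angle)
    return best  # (miss distance, speed, angle)
```

The number of simulated shots is a natural difficulty knob: more shots can only match or improve the best miss distance.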
