Most effecient way to compute a series of moves in peg solitaire - language-agnostic

Given an arbitary peg solitaire board configuration, what is the most effecient way to compute any series of moves that results in the "end game" position.
For example, the standard starting position is:
..***..
..***..
*******
***O***
*******
..***..
..***..
And the "end game" position is:
..OOO..
..OOO..
OOOOOOO
OOO*OOO
OOOOOOO
..OOO..
..OOO..
Peg solitare is described in more detail here: Wikipedia, we are considering the "english board" variant.
I'm pretty sure that it is possible to solve any given starting board in just a few secconds on a reasonable computer, say an P4 3Ghz.
Currently this is my best strategy:
def solve:
for every possible move:
make the move.
if we haven't seen a rotation or flip of this board before:
solve()
if solved: return
undo the move.

The wikipedia article you link to already mentions that there only 3,626,632 possible board positions, so it it easy for any modern computer to do an exhaustive search of the space.
Your algorithm above is right, the trick is implementing the "haven't seen a rotation or flip of this board before", which you can do using a hash table. You probably don't need the "undo the move" line as a real implementation would pass the board state as an argument to the recursive call so you would use the stack for storing the state.
Also, it is not clear what you might mean by "efficient".
If you want to find all sequences of moves that lead to a winning stage then you need to do the exhaustive search.
If you want to find the shortest sequence then you could use a branch-and-bound algorithm to cut off some search trees early on. If you can come up with a good static heuristic then you could try A* or one of its variants.

Start from the completed state, and walk backwards in time. Each move is a hop that leaves an additional peg on the board.
At every point in time, there may be multiple unmoves you can make, so you'll be generating a tree of moves. Traverse that tree (either depth-first or breadth-) stopping any branch when it reaches the starting state or no longer has any possible moves. Output the list of paths that led to the original starting state.

Related

what is the principle of readout and teacher forcing?

These days I study something about RNN and teacher forcing. But there is one point that I can't figure out. What is the principle of readout and teacher forcing? How can we feeding the output(or ground truth) of RNN from the previous time step back to the current time step, by using the output as features together with the input of this step, or using the output as this step's cell state? I have read some paper but still it confused me.o(╯□╰)o. Hoping someone can answer for me。
Teacher forcing is the act of using the the ground truth as the input for each time step, rather than the output of the network, the following is some pseudo code to describe the situation.
x = inputs --> [0:n]
y = expected_outputs --> [1:n+1]
out = network_outputs --> [1:n+1]
teacher_forcing():
for step in sequence:
out[step+1] = cell(x[step])
As you can see rather than feeding the output of the network at the previous time step as input it instead provides the ground truth.
It was originally motivated to avoid BPTT (back propagation through time) for models that dont contain hidden to hidden connections, eg GRU's (gated recurrent units).
It can also be used in a training regime, the idea being that from the beginning of the train to the end you slowly decrease the amount of teacher forcing. This has been shown to have regularizing effects on the network. The paper linked here has further reading or the Deep Learning Book is good too.

Explain process noise terminology in Kalman Filter

I am just learning Kalman filter. In the Kalman Filter terminology, I am having some difficulty with process noise. Process noise seems to be ignored in many concrete examples (most focused on measurement noise). If someone can point me to some introductory level link that described process noise well with examples, that’d be great.
Let’s use a concrete scalar example for my question, given:
x_j = a x_j-1 + b u_j + w_j
Let’s say x_j models the temperature within a fridge with time. It is 5 degrees and should stay that way, so we model with a = 1. If at some point t = 100, the temperature of the fridge becomes 7 degrees (ie. hot day, poor insulation), then I believe the process noise at this point is 2 degrees. So our state variable x_100 = 7 degrees, and this is the true value of the system.
Question 1:
If I then paraphrase the phrase I often see for describing Kalman filter, “we filter the signal x so that the effects of the noise w are minimized “, http://www.swarthmore.edu/NatSci/echeeve1/Ref/Kalman/ScalarKalman.html if we minimize the effects of the 2 degrees, are we trying to get rid of the 2 degree difference? But the true state at is x_100 == 7 degrees. What are we doing to the process noise w exactly when we Kalmen filter?
Question 2:
The process noise has a variance of Q. In the simple fridge example, it seems easy to model because you know the underlying true state is 5 degrees and you can take Q as the deviation from that state. But if the true underlying state is fluctuating with time, when you model, what part of this would be considered state fluctuation vs. “process noise”. And how do we go about determining a good Q (again example would be nice)?
I have found that as Q is always added to the covariance prediction no matter which time step you are at, (see Covariance prediction formula from http://greg.czerniak.info/guides/kalman1/) that if you select an overly large Q, then it doesn’t seem like the Kalman filter would be well-behaved.
Thanks.
EDIT1 My Interpretation
My interpretation of the term process noise is the difference between the actual state of the system and the state modeled from the state transition matrix (ie. a * x_j-1). And what Kalman filter tries to do, is to bring the prediction closer to the actual state. In that sense, it actually partially "incorporate" the process noise into the prediction through the residual feedback mechanism, rather than "eliminate" it, so that it can predict the actual state better. I have not read such an explanation anywhere in my search, and I would appreciate anyone commenting on this view.
In Kalman filtering the "process noise" represents the idea/feature that the state of the system changes over time, but we do not know the exact details of when/how those changes occur, and thus we need to model them as a random process.
In your refrigerator example:
the state of the system is the temperature,
we obtain measurements of the temperature on some time interval, say hourly,
by looking the thermometer dial. Note that you usually need to
represent the uncertainties involved in the measurement process
in Kalman filtering, but you didn't focus on this in your question.
Let's assume that these errors are small.
At time t you look at the thermometer, see that it says 7degrees;
since we've assumed the measurement errors are very small, that means
that the true temperature is (very close to) 7 degrees.
Now the question is: what is the temperature at some later time, say 15 minutes
after you looked?
If we don't know if/when the condenser in the refridgerator turns on we could have:
1. the temperature at the later time is yet higher than 7degrees (15 minutes manages
to get close to the maximum temperature in a cycle),
2. Lower if the condenser is/has-been running, or even,
3. being just about the same.
This idea that there are a distribution of possible outcomes for the real state of the
system at some later time is the "process noise"
Note: my qualitative model for the refrigerator is: the condenser is not running, the temperature goes up until it reaches a threshold temperature a few degrees above the nominal target temperature (note - this is a sensor so there may be noise in terms of the temperature at which the condenser turns on), the condenser stays on until the temperature
gets a few degrees below the set temperature. Also note that if someone opens the door, then there will be a jump in the temperature; since we don't know when someone might do this, we model it as a random process.
Yeah, I don't think that sentence is a good one. The primary purpose of a Kalman filter is to minimize the effects of observation noise, not process noise. I think the author may be conflating Kalman filtering with Kalman control (where you ARE trying to minimize the effect of process noise).
The state does not "fluctuate" over time, except through the influence of process noise.
Remember, a system does not generally have an inherent "true" state. A refrigerator is a bad example, because it's already a control system, with nonlinear properties. A flying cannonball is a better example. There is some place where it "really is", but that's not intrinsic to A. In this example, you can think of wind as a kind of "process noise". (Not a great example, since it's not white noise, but work with me here.) The wind is a 3-dimensional process noise affecting the cannonball's velocity; it does not directly affect the cannonball's position.
Now, suppose that the wind in this area always blows northwest. We should see a positive covariance between the north and west components of wind. A deviation of the cannonball's velocity northwards should make us expect to see a similar deviation to westward, and vice versa.
Think of Q more as covariance than as variance; the autocorrelation aspect of it is almost incidental.
Its a good discussion going over here. I would like to add that the concept of process noise is that what ever prediction that is made based on the model is having some errors and it is represented using the Q matrix. If you note the equations in KF for prediction of Covariance matrix (P_prediction) which is actually the mean squared error of the state being predicted, the Q is simply added to it. PPredict=APA'+Q . I suggest, it would give a good insight if you could find the derivation of KF equations.
If your state-transition model is exact, process noise would be zero. In real-world, it would be nearly impossible to capture the exact state-transition with a mathematical model. The process noise captures that uncertainty.

How to find the Shortest Path between all the nodes in a graph without having a pre-defined start or end points?

What I want to get is: the path which connect all the points in my graph, but without having to tell the algorithm where to start and where to finish.
It need to use the driving direction in google-maps api but without setting a start or end point.
It is not the TSP problem because I don't have a "start city" and I don't have to get back to the "start city" neither.
As expressed in this question: Find the shortest path in a graph which visits certain nodes,
I could just use permutation because I have a few nodes, but the problem is that I need to analyze several groups of this few nodes So I would like the function to be the less time consuming posible.
NOTE: Im not looking for a Minimum Spaning Tree as this one neither: https://math.stackexchange.com/questions/130863/connecting-all-points-on-a-plane-with-shortest-path-possible
I want a path which tell me you will save gas if you go first here, then overthere, then overthere, and finally there.
Question: is there any library which can help me with that? Or is it a know problem that has already an exact answer? How could I solve it?
It sounds like you want an all pairs shortest path algorithm. This is the class of shortest path algorithms that attempt to compute the shortest path (or the length of the shortest path) between every pair of vertices in the graph.
These is a well-known problem, and solutions exist. Here's some reading material that describes other possible algorithms. There might be implementations of Johnson's algorithm for your chosen language and development environment.
Keep in mind, this is an expensive problem, computationally speaking.
If I understand you correctly, you want 1 route to visit all the nodes, without a predefined start/end and you want that to be minimal. A possible solution could be to modify your graph a bit to allow a travelling salesman algorithm to get a complete tour.
You start with your graph and add 1 extra node E. You connect that node to all other nodes in your graph and set the cost of all those edges to a very high constant M. You then unleash a travelling salesman algorithm on that graph which will give you a path P starting at E, passing all nodes and returning to E. If you remove the 2 edges in P that connected E to the rest of your path you will have what you were looking for.
A quick intuitive proof that it is indeed what you were looking for: Suppose it's not the cheapest way to connect all nodes. Let's call the supposedly better path Q. Q and P both connect all nodes in your original graph. The end points of Q would be A and B. Both of these would be connected to node E with an edge of cost M. If you would add those 2 edges to Q, you would get a better TSP solution than P, which is not possible as P was the best.
As you are using google map, your particular instance of TSP might satisfy the Triangle inequality.
Are you really speaking of distances or travel time ?
In the case of distances:
try Googling: "triangle traveling salesman problem"
IMPORTANT: The result is a very good approximation of the best result with guaranteed uper bound, not always the best.
One way to go would be using (self-organized) kohonen networks.
Assume you have n cities on a map (works the same in any dimension).
Take a chain of n connected "neurons" and place it randomly on the map.
Then you do several iterations, one iteration contains:
choose any city. (e.g. go through them in a ordered fashion)
determine the "closest" neuron, call it x. (e.g. euclidian distance)
Move this x closer to the city (e.g. take the direction vector from the neuron to the city and multiply it with a learning rate 0
Move neighbors of this neuron also towards this city (but less than in 3., dependend of distance from the neighbors to the "current closest" neuron x)
One can choose various functions in step 2, 3 and 4.
Notice also that this might not give the globally shortest path since it depends on where the start chain is located and different other things. For this on may consider doing several runs with different starting conditions or (depending of the problem) one can help a bit with pre-knowlege.
I hope this helps to complete this question for further readers...

How to detect local maxima and curve windows correctly in semi complex scenarios?

I have a series of data and need to detect peak values in the series within a certain number of readings (window size) and excluding a certain level of background "noise." I also need to capture the starting and stopping points of the appreciable curves (ie, when it starts ticking up and then when it stops ticking down).
The data are high precision floats.
Here's a quick sketch that captures the most common scenarios that I'm up against visually:
One method I attempted was to pass a window of size X along the curve going backwards to detect the peaks. It started off working well, but I missed a lot of conditions initially not anticipated. Another method I started to work out was a growing window that would discover the longer duration curves. Yet another approach used a more calculus based approach that watches for some velocity / gradient aspects. None seemed to hit the sweet spot, probably due to my lack of experience in statistical analysis.
Perhaps I need to use some kind of a statistical analysis package to cover my bases vs writing my own algorithm? Or would there be an efficient method for tackling this directly with SQL with some kind of local max techniques? I'm simply not sure how to approach this efficiently. Each method I try it seems that I keep missing various thresholds, detecting too many peak values or not capturing entire events (reporting a peak datapoint too early in the reading process).
Ultimately this is implemented in Ruby and so if you could advise as to the most efficient and correct way to approach this problem with Ruby that would be appreciated, however I'm open to a language agnostic algorithmic approach as well. Or is there a certain library that would address the various issues I'm up against in this scenario of detecting the maximum peaks?
my idea is simple, after get your windows of interest you will need find all the peaks in this window, you can just compare the last value with the next , after this you will have where the peaks occur and you can decide where are the best peak.
I wrote one simple source in matlab to show my idea!
My example are in wave from audio file :-)
waveFile='Chick_eco.wav';
[y, fs, nbits]=wavread(waveFile);
subplot(2,2,1); plot(y); legend('Original signal');
startIndex=15000;
WindowSize=100;
endIndex=startIndex+WindowSize-1;
frame = y(startIndex:endIndex);
nframe=length(frame)
%find the peaks
peaks = zeros(nframe,1);
k=3;
while(k <= nframe - 1)
y1 = frame(k - 1);
y2 = frame(k);
y3 = frame(k + 1);
if (y2 > 0)
if (y2 > y1 && y2 >= y3)
peaks(k)=frame(k);
end
end
k=k+1;
end
peaks2=peaks;
peaks2(peaks2<=0)=nan;
subplot(2,2,2); plot(frame); legend('Get Window Length = 100');
subplot(2,2,3); plot(peaks); legend('Where are the PEAKS');
subplot(2,2,4); plot(frame); legend('Peaks in the Window');
hold on; plot(peaks2, '*');
for j = 1 : nframe
if (peaks(j) > 0)
fprintf('Local=%i\n', j);
fprintf('Value=%i\n', peaks(j));
end
end
%Where the Local Maxima occur
[maxivalue, maxi]=max(peaks)
you can see all the peaks and where it occurs
Local=37
Value=3.266296e-001
Local=51
Value=4.333496e-002
Local=65
Value=5.049438e-001
Local=80
Value=4.286804e-001
Local=84
Value=3.110046e-001
I'll propose a couple of different ideas. One is to use discrete wavelets, the other is to use the geographer's concept of prominence.
Wavelets: Apply some sort of wavelet decomposition to your data. There are multiple choices, with Daubechies wavelets being the most widely used. You want the low frequency peaks. Zero out the high frequency wavelet elements, reconstruct your data, and look for local extrema.
Prominence: Those noisy peaks and valleys are of key interest to geographers. They want to know exactly which of a mountain's multiple little peaks is tallest, the exact location of the lowest point in the valley. Find the local minima and maxima in your data set. You should have a sequence of min/max/min/max/.../min. (You might want to add an arbitrary end points that are lower than your global minimum.) Consider a min/max/min sequence. Classify each of these triples per the difference between the max and the larger of the two minima. Make a reduced sequence that replaces the smallest of these triples with the smaller of the two minima. Iterate until you get down to a single min/max/min triple. In your example, you want the next layer down, the min/max/min/max/min sequence.
Note: I'm going to describe the algorithmic steps as if each pass were distinct. Obviously, in a specific implementation, you can combine steps where it makes sense for your application. For the purposes of my explanation, it makes the text a little more clear.
I'm going to make some assumptions about your problem:
The windows of interest (the signals that you are looking for) cover a fraction of the entire data space (i.e., it's not one long signal).
The windows have significant scope (i.e., they aren't one pixel wide on your picture).
The windows have a minimum peak of interest (i.e., even if the signal exceeds the background noise, the peak must have an additional signal excess of the background).
The windows will never overlap (i.e., each can be examined as a distinct sub-problem out of context of the rest of the signal).
Given those, you can first look through your data stream for a set of windows of interest. You can do this by making a first pass through the data: moving from left to right, look for noise threshold crossing points. If the signal was below the noise floor and exceeds it on the next sample, that's a candidate starting point for a window (vice versa for the candidate end point).
Now make a pass through your candidate windows: compare the scope and contents of each window with the values defined above. To use your picture as an example, the small peaks on the left of the image barely exceed the noise floor and do so for too short a time. However, the window in the center of the screen clearly has a wide time extent and a significant max value. Keep the windows that meet your minimum criteria, discard those that are trivial.
Now to examine your remaining windows in detail (remember, they can be treated individually). The peak is easy to find: pass through the window and keep the local max. With respect to the leading and trailing edges of the signal, you can see n the picture that you have a window that's slightly larger than the actual point at which the signal exceeds the noise floor. In this case, you can use a finite difference approximation to calculate the first derivative of the signal. You know that the leading edge will be somewhat to the left of the window on the chart: look for a point at which the first derivative exceeds a positive noise floor of its own (the slope turns upwards sharply). Do the same for the trailing edge (which will always be to the right of the window).
Result: a set of time windows, the leading and trailing edges of the signals and the peak that occured in that window.
It looks like the definition of a window is the range of x over which y is above the threshold. So use that to determine the size of the window. Within that, locate the largest value, thus finding the peak.
If that fails, then what additional criteria do you have for defining a region of interest? You may need to nail down your implicit assumptions to more than 'that looks like a peak to me'.

How to implement dead reckoning when turning is involved?

"Dead reckoning is the process of estimating one's current position based upon a previously determined position and advancing that position based upon known or estimated speeds over elapsed time, and course." (Wikipedia)
I'm currently implementing a simple server that makes use of dead reckoning optimization, which minimizes the updates required by making logical assumptions on both the clients and the server.
The objects controlled by users can be said to be turning, or not turning. This presents an issue with dead reckoning (the way I see it.)
For example, say you have point A in time defined by [position, velocity, turning: left/right/no]. Now you want point B after t amount of time. When not turning, the new position is easy to extrapolate. The resulting direction is also easy to extrapolate. But what about when these two factors are combined? The direction of the velocity will be changing along a curve as the object is turning over t amount of time.
Should I perhaps go with another solution (such as making the client send an update for every new direction rather than just telling the server "I'm turning left now")?
This is in a 2D space, by the way, for the sake of simplicity.
For simplicity let's say that your vehicles have a turning radius r that's independant of speed. So to compute the new position given the initial coords and the time:
compute the distance (that's velocity * time)
compute how much you turned (that's distance / (2*pi*r))
add that arc to the original position.
The last steps needs elaboration.
Given the angle a computed in step 2, if you started at (0,0) with a due north heading (i.e. pi/2 radians) and are turning left then your new positions is: (rcos(a)-1, rsin(a)).
If your original heading was different, say it was "b", then simply rotate the new position accordingly, i.e. multiply by this rotation matrix:
[ cos b , -sin b ]
[ sin(b), cos(b) ]
Finally, add the initial position and you're done. Now you only need to send an update if you change the velocity or turning direction.
Well, I think "turning: left/right/no" is insufficient to determine position B - you also need to know the arc at which the turn is being made. If you are turning left along a circular path of radius 1, you will end up at a different place than if you are turning along a circular path of radius 10, even though your initial position, velocity, and direction of turn will all be the same.
If making the client send an update for every new direction and treating them as linear segments is an option, that is going to be a much easier calculation to make. You can simply treat each new report from the client as a vector, and sum them. Calculating a bunch of curves is going to be more complex.