Will alpha-beta pruning remove randomness in my solution with minimax? - tic-tac-toe

Existing implementation:
In my implementation of Tic-Tac-Toe with minimax, I look for all boxes where I can get the best result and choose one of them randomly, so that the same solution isn't displayed each time.
For example, if the returned list is [1, 0, 1, -1], I randomly choose between the two highest values.
Question about Alpha-Beta Pruning:
Based on what I understand, once the algorithm finds that one path is winning, it no longer needs to explore other paths that might or might not lead to a win.
So will this, as I suspect, cause the earliest box that leads to the best outcome to be returned every time, so the result looks the same on each run? For example, on the first move every square leads to a draw, so will the first square be selected each time?
How can I bring randomness into the solution, as I did with plain minimax? One idea I have is to pass the board indices to the alpha-beta algorithm in random order, so the result is the first best move in that randomly shuffled list of positions.
Thanks in advance. If there is some literature on this, I'd be glad to read it.
If someone could post a good reference for alpha-beta pruning, that would be excellent, as I had a hard time understanding how to apply it.

To randomly pick among multiple best solutions (all equal) in alpha-beta pruning, you can modify your evaluation function to add a very small random number whenever you evaluate a game state. You should just make sure that the magnitude of that random number is always smaller than the smallest true difference between the evaluations of two states.
For example, if the true evaluation function for your game state can only return values -1, 0, and 1, you could add a randomly generated number in the range [0.0, 0.01] to the evaluation of every game state.
Without this, alpha-beta pruning doesn't necessarily find only one solution. Consider this example from Wikipedia. In the middle, you see that two solutions with an evaluation of 6 were found, so it can find more than one. I do think it will still find all moves leading to optimal solutions at the root node, but not necessarily all solutions deep down in the tree. Suppose, in the example image, that the pruned node with a score of 9 in the middle actually had a score of 6. It would still get pruned there, so that particular solution wouldn't be found, but the move from the root node leading to it (the middle move at the root) would still be found. So, eventually, you would be able to reach it.
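To make this concrete, here is a minimal, self-contained Python sketch (not your code; the board representation and helpers are assumptions for illustration): plain alpha-beta for Tic-Tac-Toe where the only change is the tiny random term added to the terminal evaluation, so ties between equally good moves are broken differently on every run.

import random

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    # board is a list of 9 cells holding 'X', 'O' or None
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def evaluate(board):
    w = winner(board)
    base = 1 if w == 'X' else -1 if w == 'O' else 0
    return base + random.uniform(0.0, 0.01)   # noise is always smaller than 1, the smallest true gap

def alphabeta(board, player, alpha=float('-inf'), beta=float('inf')):
    if winner(board) is not None or all(c is not None for c in board):
        return evaluate(board), None
    best_move = None
    for i in range(9):
        if board[i] is not None:
            continue
        board[i] = player
        score, _ = alphabeta(board, 'O' if player == 'X' else 'X', alpha, beta)
        board[i] = None
        if player == 'X' and score > alpha:
            alpha, best_move = score, i
        elif player == 'O' and score < beta:
            beta, best_move = score, i
        if alpha >= beta:
            break   # prune: the opponent will never allow this branch
    return (alpha if player == 'X' else beta), best_move

# The opening move now varies between runs, even though every opening leads to a draw:
print(alphabeta([None] * 9, 'X')[1])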
Some interesting notes:
This implementation would also work in plain minimax, and it avoids the need to store a list of multiple (equally good) solutions.
In more complex games than Tic-Tac-Toe, where you cannot search the complete state space, adding a small random number for the max player and subtracting a small random number for the min player like this may actually slightly improve your heuristic evaluation function. The reason is as follows. Suppose in state A you have 5 moves available, and in state B you have 10 moves available, all of which result in the same heuristic evaluation score. Intuitively, the successors of state B may be slightly better, because you had more moves available; in many games, having more moves available means that you are in a better position. Because you generated 10 random numbers for the 10 successors of state B, it is also a bit more likely that the highest generated random number is among those 10 (instead of among the 5 numbers generated for the successors of A).

Related

OpenAI gym GuessingGame-v0 possible solutions

I have been struggling to solve the GuessingGame-v0 environment which is part of the OpenAI gym.
In the environment each episode a random number within a range is selected and the agent must "guess" what this random number is. The agent is only provided with the observation of whether the guess was too large or too small.
After researching how to frame the problem I think it may be possible to frame the problem as a Hidden Markov Model, but I am unsure of how to do this.
Each episode the randomly selected number changes, and because of this I don't see how the model can avoid changing each episode, since the goal state is continually shifting.
I could not find any resources on the environment or any environments similar to it other than the documentation provided by OpenAI which I did not find useful.
I would greatly appreciate any assistance on how to solve this environment.
I'm putting this as an answer so people don't have to read through the list of comments.
You need a program that can simply cycle through:
generate the random number
agent guesses a number (within the allowable guess range)
test whether the guess is within 1% of the target number.
if the guess is within 1%, stop the iteration, and maybe print the guess at this point
if the iteration is at step 200, stop the iteration and maybe produce some output that gives the final guessed number and the fact that it is not within 1%
if neither 200 steps nor 1%: a) if the guess is too high, record the guess and that it is too high, or b) if the guess is too low, record the guess and that it is too low. Then narrow the guessing bounds accordingly. Repeat until either the 1% or the 200-step criterion is reached.
Another thought for you: do you need a starting low number and a starting high number?
There are a number of ways in which to implement this solution, and a range of programming languages in which it can be implemented; the one you use is probably the one with which you are most familiar.
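For example, here is a rough, self-contained Python sketch of that loop (it simulates the game directly rather than driving the gym environment, and the guessing range below is a made-up assumption):

import random

LOW, HIGH = -1000.0, 1000.0        # assumed guessing range
MAX_STEPS = 200

def play_one_episode():
    target = random.uniform(LOW, HIGH)      # 1. generate the random number
    low, high = LOW, HIGH
    guess = low
    for step in range(1, MAX_STEPS + 1):
        guess = (low + high) / 2.0          # 2. agent guesses (midpoint of the current bounds)
        if abs(guess - target) <= 0.01 * abs(target):   # 3. within 1% of the target?
            return step, guess, True
        if guess > target:
            high = guess                    # too high: tighten the upper bound
        else:
            low = guess                     # too low: tighten the lower bound
    return MAX_STEPS, guess, False          # 200 steps reached without getting within 1%

steps, final_guess, solved = play_one_episode()
print("solved:", solved, "steps:", steps, "final guess:", round(final_guess, 4))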
Good luck!

Can I find price floors and ceilings with CUDA?

Background
I'm trying to convert an algorithm from sequential to parallel, but I am stuck.
Point and Figure Charts
I am creating point and figure charts.
Decreasing
While the stock is going down, add an O every time it breaks through the floor.
Increasing
While the stock is going up, add an X every time it breaks through the ceiling.
Reversal
If the stock reverses direction but the change is less than a reversal threshold (3 units), do nothing. If the change is greater than the reversal threshold, start a new column (X or O).
Sequential vs Parallel
Sequentially, this is pretty straightforward. I keep a variable for the floor and ceiling. If the current price breaks through the floor or ceiling, or changes more than the reversal threshold, I can take the appropriate action.
My question is, is there a way to find these reversal points in parallel? I'm fairly new to thinking in parallel, so I'm sorry if this is trivial. I am trying to do this in CUDA, but I have been stuck for weeks. I have tried using the finite difference algorithms from NVIDIA. These produce local max/min but not the reversal points. Small fluctuations produce numerous relative max/min, but most of them are trivial because the change is not greater than the reversal size.
My question is, is there a way to find these reversal points in parallel?
One possible approach:
use thrust::unique to remove periods where the price is numerically constant
use thrust::adjacent_difference to produce 1st difference data
use thrust::adjacent_difference on the 1st difference data to get the 2nd difference data, i.e. the points where there is a change in the sign of the slope.
use these points of change in sign of slope to identify separate regions of data - build a key vector from these (e.g. with a prefix sum). This key vector segments the price data into "runs" where the price change is in a particular direction.
use thrust::exclusive_scan_by_key on the 1st difference data, to produce the net change of the run
Wherever the net change of the run exceeds a threshold, flag as a "reversal"
Your description of what constitutes a reversal may also be slightly unclear. The above method would not flag a reversal on certain data patterns that you might classify as a reversal. I suspect you are looking beyond a single run as I have defined it here. If that is the case, there may be a method to address that as well - with more steps.
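To illustrate the run-segmentation idea, here is a rough CPU/NumPy sketch of the same pipeline logic (useful for validating the approach before porting it to Thrust; the prices and the reversal threshold are made-up example values):

import numpy as np

prices = np.array([10., 10., 11., 12., 12., 11., 10., 9., 10., 11., 12., 13.])
threshold = 3.0            # reversal size in price units

# 1. drop consecutive repeats (thrust::unique)
keep = np.concatenate(([True], np.diff(prices) != 0))
p = prices[keep]

# 2. first difference (thrust::adjacent_difference)
d = np.diff(p)

# 3. points where the sign of the slope changes (the "2nd difference" step)
sign_change = np.concatenate(([False], np.sign(d[1:]) != np.sign(d[:-1])))

# 4. key vector segmenting the data into monotone runs (prefix sum of the flags)
run_key = np.cumsum(sign_change)

# 5. net change of each run (what the scan_by_key step computes on the GPU)
net_change = np.array([d[run_key == k].sum() for k in np.unique(run_key)])

# 6. flag runs whose net move meets or exceeds the reversal threshold
reversals = np.where(np.abs(net_change) >= threshold)[0]
print("net change per run:", net_change)
print("runs flagged as reversals:", reversals)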

GNU Radio dual tone detection

I am trying to come up with an efficient way to characterize two narrowband tones separated by about 900 kHz (one at around 100 kHz and one at around 1 MHz once translated to baseband). They don't move much in frequency over time, but they may have amplitude variations we want to monitor.
Each tone is roughly 100 Hz wide, and we are required to characterize these two beasts over long periods of time down to a resolution of about 0.1 Hz. The samples are coming in at over 2 MSamples/s (TBD) to adequately acquire the higher tone.
I'm trying to avoid (if possible) doing brute-force >2M-point FFTs on the data once a second to extract frequency-domain data. Is there an efficient approach? Something akin to performing two (much) smaller FFTs around the bands of interest? I've looked at Goertzel and chirp-z methods, but I am not certain they help save processing.
Something akin to performing two (much) smaller FFTs around the bands of interest
There is: it's called Goertzel, it is essentially the FFT for single bins, and you have already looked at it. It will save you CPU time.
Anyway, there's no reason to do a 2M-point FFT; first of all, you only want a resolution of about 1/20 the sampling rate, hence, a 20-point FFT would totally do, and should be pretty doable for your CPU at these low rates; since you don't seem to care about phase of your tones, FFT->complex_to_mag.
However, there's one thing that you should always do: look at your signal of interest, and decimate down to the rate that fits exactly that. Since GNU Radio's filters are implemented cleverly, the filter itself will only run at the decimated rate, and you can spend the CPU cycles saved on a better filter.
Because a direct decimation from 2 MHz to 100 Hz (decimation: 20000) would give you a really ugly filter length, you should do this multi-rate:
I'd try first decimating by 100, and then in a second step by 100 again, leaving you with 200 Hz of observable spectrum. The xlating FIR filter blocks will let you use a simple low-pass filter (use the "Low-Pass Filter Taps" block to define a variable that contains such taps) as a band selector.
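If you do want to try Goertzel directly, here is a minimal sketch in plain Python/NumPy (not a GNU Radio block; the sample rate, tone frequencies and block length below are made-up example values). It computes the power in one narrow bin, so you would run one instance per tone instead of a full FFT:

import numpy as np

def goertzel_power(samples, sample_rate, target_freq):
    n = len(samples)
    k = int(0.5 + n * target_freq / sample_rate)   # nearest DFT bin
    w = 2.0 * np.pi * k / n
    coeff = 2.0 * np.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:                              # one multiply-accumulate per sample
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2**2 + s_prev**2 - coeff * s_prev * s_prev2

fs = 2_500_000                                     # example sample rate, 2.5 MS/s
t = np.arange(25_000) / fs                         # 10 ms block -> 100 Hz bin width
sig = np.sin(2*np.pi*100_000*t) + 0.5*np.sin(2*np.pi*1_000_000*t)
print(goertzel_power(sig, fs, 100_000), goertzel_power(sig, fs, 1_000_000))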

Project Euler 298 - there must be a correct answer? (only pastebinned code)

Project Euler has a paging file problem (though it's disguised in other words).
I tested my code (pastebinned so as not to spoil it for anyone) against the sample data and got the same memory contents and score as the problem. However, my scores are nowhere near consistently grouped. The problem asks for the expected difference in scores after 50 turns. A random sampling of scores:
1.50000000
1.78000000
1.64000000
1.64000000
1.80000000
2.02000000
2.06000000
1.56000000
1.66000000
2.04000000
I've tried a few of those as answers, but none of them have been accepted... I know some people have succeeded, so I'm really confused - what the heck am I missing?
Your problem is likely that you are overlooking the definition of expected value.
You will have to run the simulation many times, maintain the frequency of each score difference that occurs, and then take the weighted mean to get the expected value.
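As a Python sketch of that weighted mean, here the handful of sampled score differences listed in the question stand in for the tally (in practice you would accumulate frequencies over far more runs of your own simulation):

from collections import Counter

samples = [1.50, 1.78, 1.64, 1.64, 1.80, 2.02, 2.06, 1.56, 1.66, 2.04]
freq = Counter(samples)                     # frequency of each observed |L - R| difference
total = sum(freq.values())
expected = sum(diff * count for diff, count in freq.items()) / total
print(round(expected, 8))                   # still a rough estimate; more runs tighten it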
Of course, given that it is Project Euler problem, there is probably a mathematical formula which can be used readily.
Yep, there is a correct answer. To be honest, Monte Carlo can theoretically close in on the expected value, thanks to the law of large numbers. However, you won't want to try it here, because in practice each run of the simulation gives a slightly different result when rounded to eight decimal places (and I think this setup is exactly what deprives anybody of any chance of even considering Monte Carlo). If you are lucky, one simulation will deliver the answer after lots of trials, given that you have submitted all the previous ones and failed. I think the captcha is the second way Project Euler makes you give up on any brute-force approach.
Well, I agree with Moron: you have to figure out "expected value" first. The principle of this problem is that you have to find a way to enumerate every possible "essential" outcome after 50 rounds. Each outcome has its own |L - R|, so sum them up (weighted by their probabilities) and you have the answer. Needless to say, a brute-force approach fails in most cases, and especially in this one. Fortunately, we have dynamic programming (DP), which is fast!
Basically, DP saves the computation results of each round as states and uses them in the next round, so it avoids repeating the same computation over and over again. The difficult part of this problem is finding a way to represent a state, that is to say, how you would like to save your intermediate results. If you have solved problem 290 with DP, you can get some hints there about how to understand the problem and formulate a state.
Actually, that isn't the most difficult part. The hardest mental piece is realizing that some memory states of the two players are numerically different but substantially equivalent, for example L:12345 R:12345 vs L:23456 R:23456, or even vs L:98765 R:98765. That is due to the fact that the call is random, and it is also why I wrote possible "essential" outcomes: you can collapse several such states into one, and only by doing so can your program finish in reasonable time.
I would run your simulation a whole bunch of times and then take a weighted average of the |L - R| value over all the runs. That should get you closer to the expected value.
Just submitting one run as an answer is really unlikely to work. Imagine it were the expected value of a die roll: roll one die, score a 6, and submit that as the expected value.
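A tiny Python sketch of that die analogy: a single roll is almost never the expected value, but averaging many simulated rolls converges towards it (3.5):

import random

rolls = [random.randint(1, 6) for _ in range(100_000)]
print(sum(rolls) / len(rolls))    # close to 3.5, unlike whatever a single roll happened to be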

What statistics can be maintained for a set of numerical data without iterating?

Update
Just for future reference, I'm going to list all of the statistics that I'm aware of that can be maintained in a rolling collection, recalculated as an O(1) operation on every addition/removal (this is really how I should've worded the question from the beginning):
Obvious
Count
Sum
Mean
Max*
Min*
Median**
Less Obvious
Variance
Standard Deviation
Skewness
Kurtosis
Mode***
Weighted Average
Weighted Moving Average****
OK, so to put it more accurately: these are not "all" of the statistics I'm aware of. They're just the ones that I can remember off the top of my head right now.
*Can be recalculated in O(1) for additions only, or for additions and removals if the collection is sorted (but in this case, insertion is not O(1)). Removals potentially incur an O(n) recalculation for non-sorted collections.
**Recalculated in O(1) for a sorted, indexed collection only.
***Requires a fairly complex data structure to recalculate in O(1).
****This can certainly be achieved in O(1) for additions and removals when the weights are assigned in a linearly descending fashion. In other scenarios, I'm not sure.
Original Question
Say I maintain a collection of numerical data -- let's say, just a bunch of numbers. For this data, there are loads of calculated values that might be of interest; one example would be the sum. To get the sum of all this data, I could...
Option 1: Iterate through the collection, adding all the values:
double sum = 0.0;
for (int i = 0; i < values.Count; i++) sum += values[i];
Option 2: Maintain the sum, eliminating the need to ever iterate over the collection just to find the sum:
void Add(double value) {
    values.Add(value);
    sum += value;
}

void Remove(double value) {
    values.Remove(value);
    sum -= value;
}
EDIT: To put this question in more relatable terms, let's compare the two options above to a (sort of) real-world situation:
Suppose I start listing numbers out loud and ask you to keep them in your head. I start by saying, "11, 16, 13, 12." If you've just been remembering the numbers themselves and nothing more, and then I say, "What's the sum?", you'd have to think to yourself, "OK, what's 11 + 16 + 13 + 12?" before responding, "52." If, on the other hand, you had been keeping track of the sum yourself while I was listing the numbers (i.e., when I said "11" you thought "11", when I said "16" you thought "27," and so on), you could answer "52" right away. Then if I say, "OK, now forget the number 16," if you've been keeping track of the sum inside your head you can simply take 16 away from 52 and know that the new sum is 36, rather than taking 16 off the list and then summing up 11 + 13 + 12.
So my question is, what other calculations, other than the obvious ones like sum and average, are like this?
SECOND EDIT: As an arbitrary example of a statistic that (I'm almost certain) does require iteration -- and therefore cannot be maintained as simply as a sum or average -- consider if I asked you, "how many numbers in this collection are divisible by the min?" Let's say the numbers are 5, 15, 19, 20, 21, 25, and 30. The min of this set is 5, which divides into 5, 15, 20, 25, and 30 (but not 19 or 21), so the answer is 5. Now if I remove 5 from the collection and ask the same question, the answer is now 2, since only 15 and 30 are divisible by the new min of 15; but, as far as I can tell, you cannot know this without going through the collection again.
So I think this gets to the heart of my question: if we can divide kinds of statistics into these categories, those that are maintainable (my own term, maybe there's a more official one somewhere) versus those that require iteration to compute any time a collection is changed, what are all the maintainable ones?
What I am asking about is not strictly the same as an online algorithm (though I sincerely thank those of you who introduced me to that concept). An online algorithm can begin its work without having even seen all of the input data; the maintainable statistics I am seeking will certainly have seen all the data, they just don't need to reiterate through it over and over again whenever it changes.
First, the term that you want here is online algorithm. All moments (mean, standard deviation, skew, etc.) can be calculated online. Others include the minimum and maximum. Note that median and mode cannot be calculated online.
To consistently maintain the high/low you store your data in sorted order. There are algorithms for maintaining data structures which preserve ordering.
Median is trivial if the data is ordered.
If the data is reduced slightly to a frequency table, you can maintain mode. If you keep your data as a random, flat list of values, you can't easily compute mode in the presence of change.
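As a small sketch of keeping the first two moments online (Welford's update for additions, shown here in Python; the same idea extends to removals and to the higher moments needed for skew and kurtosis):

class RunningStats:
    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0                      # sum of squared deviations from the mean

    def add(self, value):                  # O(1) per addition
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def variance(self):
        return self.m2 / self.count if self.count else 0.0

stats = RunningStats()
for x in [11, 16, 13, 12]:
    stats.add(x)
print(stats.count, stats.mean, stats.variance())   # 4 13.0 3.5 (population variance)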
The answers to this question on online algorithms might be useful. Regarding the usability for your needs, I'd say that while some online algorithms can be used for estimating summary statistics with partial data, others may be used to maintain them from a data flow just as you like.
You might also want to look at complex event processing (or CEP), which is used for tracking and analysing real time data, for example in finance or web commerce. The only free CEP product I know of is Esper.
As Jason says, you are indeed describing an online algorithm. I've also seen this type of computation referred to as the Accumulator Pattern, whether the loop is implemented explicitly or by recursion.
Not really a direct answer to your question, but for many statistics that are not online statistics you can usually find rules that let you recalculate by iteration only part of the time and use a cached value the rest of the time. Is this possibly good enough for you?
For the high value, for example:
public void Add(double value) {
    values.Add(value);
    if (value > highValue)
        highValue = value;          // additions can update the cached high in O(1)
}

public void Remove(double value) {
    values.Remove(value);
    // Only if the removed value was (within tolerance) the cached high do we need
    // to fall back to an O(n) pass; WithinTolerance and
    // RecalculateHighValueByIteration are placeholders for your own helpers.
    if (value.WithinTolerance(highValue))
        highValue = RecalculateHighValueByIteration();
}
It's not possible to maintain high or low with constant-time add and remove operations because that would give you a linear-time sorting algorithm. You can use a search tree to maintain the data in sorted order, which gives you logarithmic-time minimum and maximum. If you also keep subtree sizes and the count, it's simple to find the median too.
And if you just want to maintain the high or low in the presence of additions and removals, look into priority queues, which are more efficient for that purpose than search trees.
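For instance, here is a small Python sketch of maintaining the minimum under additions and removals with a heap plus "lazy" deletion (removed values are only counted, and stale entries are discarded when they surface at the top of the heap):

import heapq
from collections import Counter

class RunningMin:
    def __init__(self):
        self.heap = []
        self.removed = Counter()           # values removed but not yet purged from the heap

    def add(self, value):
        heapq.heappush(self.heap, value)   # O(log n)

    def remove(self, value):
        self.removed[value] += 1           # O(1); cleaned up lazily

    def minimum(self):
        while self.heap and self.removed[self.heap[0]] > 0:
            self.removed[heapq.heappop(self.heap)] -= 1
        return self.heap[0] if self.heap else None

m = RunningMin()
for x in [5, 15, 19, 20]:
    m.add(x)
m.remove(5)
print(m.minimum())    # 15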
If you don't know the exact size of the dataset in advance, or if it is potentially unlimited, or you just want some ideas, you should definitely look into the techniques used in streaming algorithms.
It does sound (even after your second edit) like you are describing online algorithms, with the additional requirement that you want to allow "delete" operations. An example of this is the "sketch algorithms" used for finding frequent items in a stream.