I am working on a simple Tic Tac Toe game in C. I have most of the code finished, but I want the AI to never lose.
I have read about the minimax algorithm, but I don't understand it. How do I use this algorithm to enable the computer to either Win or Draw but never lose?
The way to approach this sort of problem is to explore possible futures. Usually (for a chess or draughts AI) you'd only consider futures a certain number of moves ahead, but because tic tac toe games are so short you can explore to the end of the game.
Conceptual implementation
So you create a branching structure:
The AI imagines itself making every legal move
The AI then imagines the user making each legal move they can make after each of its legal moves
Then the AI imagines each of its next legal moves
etc.
Then, going from the most branched end (the furthest forward in time) the player whose turn it is (AI or user) chooses which future is best for it (win, lose or draw) at each branching point. Then it hands over to the player higher up the tree (closer to the present); each time choosing the best future for the player whose imaginary turn it is until finally you're at the first branching point where the AI can see futures which play out towards it losing, drawing and winning. It chooses a future where it wins (or if unavailable draws).
Actual implementation
Note that conceptually this is what is happening, but it's not necessary to create the whole tree and then judge it like this. You can just as easily work through the tree, getting to the furthest points in time and making the choices there.
Here this approach works nicely with a recursive function. Each level of the function polls all of its branches, passing the possible future board to each; each branch returns -1, 0 or +1, and the level keeps the best score for the player whose turn it is at that point. The top level chooses the move without actually knowing how each future pans out, just how well they pan out.
Pseudo code
I assume in this pseudo code that +1 means the AI wins, 0 a draw, and -1 the user wins (i.e. the AI loses)
determineNextMove(currentStateOfBoard)
    currentBestMove = null
    currentBestScore = -veryLargeNumber
    for each legalMove
        score = getFutureScoreOfMove(stateOfBoardAfterLegalMove, user'sTurn)  // the user moves next
        if score > currentBestScore
            currentBestMove = legalMove
            currentBestScore = score
        end
    end
    make currentBestMove
end
getFutureScoreOfMove(stateOfBoard, playersTurn)
    if game is over (a player has three in a row, or there are no legal moves left)
        return +1 if AI has won, 0 if draw, -1 if user has won
    end
    if playersTurn == AI'sTurn
        currentBestScore = -veryLargeNumber  // this is the worst case for the AI
    else
        currentBestScore = +veryLargeNumber  // this is the worst case for the user
    end
    for each legalMove
        score = getFutureScoreOfMove(stateOfBoardAfterLegalMove, INVERT playersTurn)
        if playersTurn == AI'sTurn AND score > currentBestScore    // AI wants the highest score
            currentBestScore = score
        end
        if playersTurn == user'sTurn AND score < currentBestScore  // user wants the lowest score
            currentBestScore = score
        end
    end
    return currentBestScore
end
This pseudo code doesn't care what the starting board is (you call this function every AI move with the current board) and doesn't return what path the future will take (we can't know if the user will play optimally so this information is useless), but it will always choose the move that goes towards the optimum future for the AI.
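Since your question mentions C, here is a minimal sketch of the pseudo code above in that language. It assumes a 3x3 board stored as a char array of nine cells with 'X' for the AI, 'O' for the user and ' ' for empty; the names and board representation are illustrative, so adapt them to your existing code.

#include <limits.h>

/* Returns +1 if the AI ('X') has won, -1 if the user ('O') has won, 0 otherwise. */
static int winner(const char b[9]) {
    static const int lines[8][3] = {
        {0,1,2},{3,4,5},{6,7,8},{0,3,6},{1,4,7},{2,5,8},{0,4,8},{2,4,6}
    };
    for (int i = 0; i < 8; i++) {
        char c = b[lines[i][0]];
        if (c != ' ' && c == b[lines[i][1]] && c == b[lines[i][2]])
            return (c == 'X') ? 1 : -1;
    }
    return 0;
}

/* getFutureScoreOfMove from the pseudo code: the value of this position
   assuming both sides play optimally from here on. */
static int futureScore(char b[9], int aiToMove) {
    int w = winner(b);
    if (w != 0) return w;                         /* somebody already has three in a row */
    int best = aiToMove ? INT_MIN : INT_MAX;
    int anyMove = 0;
    for (int i = 0; i < 9; i++) {
        if (b[i] != ' ') continue;
        anyMove = 1;
        b[i] = aiToMove ? 'X' : 'O';
        int s = futureScore(b, !aiToMove);
        b[i] = ' ';
        if (aiToMove ? (s > best) : (s < best)) best = s;
    }
    return anyMove ? best : 0;                    /* board full with no winner: draw */
}

/* determineNextMove from the pseudo code: returns the cell index the AI should play. */
int determineNextMove(char b[9]) {
    int bestMove = -1, bestScore = INT_MIN;
    for (int i = 0; i < 9; i++) {
        if (b[i] != ' ') continue;
        b[i] = 'X';
        int s = futureScore(b, 0);                /* after this move, the user moves next */
        b[i] = ' ';
        if (s > bestScore) { bestScore = s; bestMove = i; }
    }
    return bestMove;
}

You would call determineNextMove(board) whenever it is the AI's turn and put an 'X' in the returned cell; searching to the end of the game like this means the AI can only win or draw.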
Considerations for larger problems
In this case, where you explore to the end of the game, it is obvious which future is best (win, lose or draw), but if you're only going (for example) five moves into the future, you'd have to find some other way of scoring positions; in chess or draughts, piece score is the easiest way to do this, with piece position being a useful enhancement.
I did something like this about 5 years ago and did some research on it. In tic tac toe it doesn't take long; you just need to prepare patterns for the first two or three moves.
You need to check how to play:
Computer starts first.
Player starts first.
There are 9 different starting positions, but actually only 3 of them are distinct (the others are rotations): corner, edge and centre.
After that you will see what should be done after each specific move. I don't think you need any search algorithm in this case, because the outcome of tic tac toe is largely determined by the first moves; a few if-else or switch statements and a random generator are enough.
Tic tac toe belongs to the group of games that cannot be lost if you know how to play, so for such games you do not need trees or elaborate search algorithms. To write such an algorithm you need just a few functions:
CanIWin() to check whether the computer has 2 in a row and can win immediately.
ShouldIBlock() to check whether the player has 2 in a row that needs to be blocked.
These two functions must be called in this order; if either returns true you must act on it, either to win or to stop the player from winning (a sketch of both checks is below).
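As a rough sketch in C (assuming a 3x3 board stored as a char array of nine cells, with the marks passed in; the function name is made up), both checks boil down to scanning the 8 lines for two of a given mark plus an empty cell:

/* Returns the index of the empty cell that completes a line of two `mark`s,
   or -1 if no such cell exists. */
int findCompletingCell(const char b[9], char mark) {
    static const int lines[8][3] = {
        {0,1,2},{3,4,5},{6,7,8},{0,3,6},{1,4,7},{2,5,8},{0,4,8},{2,4,6}
    };
    for (int i = 0; i < 8; i++) {
        int count = 0, empty = -1;
        for (int j = 0; j < 3; j++) {
            int cell = lines[i][j];
            if (b[cell] == mark) count++;
            else if (b[cell] == ' ') empty = cell;
        }
        if (count == 2 && empty != -1) return empty;   /* two in a row, third cell free */
    }
    return -1;
}

CanIWin() is then findCompletingCell(board, computerMark) and ShouldIBlock() is findCompletingCell(board, playerMark); if either returns a cell index, play that cell.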
After that you need to do other calculations for the move.
One special situation is when the computer starts the game. You need to choose the cell that belongs to the largest number of different lines (there are 8 of them: 3 horizontal, 3 vertical and 2 diagonal). With such an algorithm the computer will always choose the centre, because it lies on 4 lines; you should add a small chance of choosing the second-best option to make the game a bit more interesting.
So when you reach a situation where some cells are already taken and the computer has to move, you need to rate every free cell. (If the first or second function returned true, you have to act on that before reaching this point!) To rate a cell, count how many open lines are left through it; you should also try to block at least one of the opponent's lines.
After that you will have a few candidate cells for your mark. You then need to check the resulting move sequences, because one of the options may lead you into losing. Once you have that set you can choose a move at random, or take the one with the biggest score.
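A small C sketch of the cell rating described above (assuming the same kind of 3x3 char-array board as in the earlier sketch; purely illustrative): for each free cell, count how many of the 8 lines through it contain no opposing mark.

/* Rate a free cell by how many of the 8 lines through it are still winnable,
   i.e. contain no opposing mark. Higher is better. */
int rateCell(const char b[9], int cell, char opponent) {
    static const int lines[8][3] = {
        {0,1,2},{3,4,5},{6,7,8},{0,3,6},{1,4,7},{2,5,8},{0,4,8},{2,4,6}
    };
    int score = 0;
    for (int i = 0; i < 8; i++) {
        int onLine = 0, blocked = 0;
        for (int j = 0; j < 3; j++) {
            if (lines[i][j] == cell) onLine = 1;
            if (b[lines[i][j]] == opponent) blocked = 1;
        }
        if (onLine && !blocked) score++;
    }
    return score;
}

On an empty board this gives the centre 4, corners 3 and edges 2, matching the "choose the cell with the most directions" rule above.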
I have to say something similar to what was said at the beginning of the post. Bigger games do not have such a simple perfect strategy; chess, say, is largely based on patterns, but also on forward-looking search (for which structures such as a patricia trie are sometimes used). So, to sum up, you do not need difficult algorithms here, just a few functions that count how much you gain and how much the opponent loses with each move.
Make a subsidiary routine that predicts the positions from which the user can win. Then you can tell your AI to make the moves the user would need to make in order to win.
Related
I think the title says it. A "game" takes a number of moves to complete, at which point a total score is computed. The goal is to maximize this score, and there are no rewards provided for specific moves during the game. Is there an existing algorithm that is geared toward this type of problem?
EDIT: By "continuously variable" reward, I mean it is a floating point number, not a win/loss binary. So you can't, for example, respond to "winning" by reinforcing the moves made to get there. All you have is a number. You can rank different runs in order of preference, but a single result is not especially meaningful.
First of all, in my opinion, the title of your question seems a little confusing when you talk about "continuously variable reward". Maybe you could clarify this aspect.
On the other hand, leaving the previous point aside, it looks like you are talking about the temporal credit-assignment problem: how do you distribute credit for a sequence of actions that only obtains a reward (positive or negative) at the end of the sequence?
E.g., a Tic-tac-toe game where the agent doesn't receive any reward until the game ends. In this case, almost any RL algorithm tries to solve the temporal credit-assignment problem. See, for example, Section 1.5 of Sutton and Barto's RL book, where they explain the working principles of RL and its advantages over other approaches using a Tic-tac-toe game as the example.
Existing implementation:
In my implementation of Tic-Tac-Toe with minimax, I look for all boxes where I can get the best result and choose one of them randomly, so that the same solution isn't displayed each time.
For example, if the returned list is [1, 0, 1, -1], at some point I will randomly choose between the two highest values.
Question about Alpha-Beta Pruning:
Based on what I understood, when the algorithm finds that it is winning along one path, it no longer needs to look at other paths that might or might not lead to a winning case.
So will this, as I suspect, cause the earliest box that leads to the best solution to be chosen every time, so the result looks the same each game? For example, at the time of the first move all moves lead to a draw, so will the first box be selected each time?
How can I bring randomness to the solution like with the minimax solution? One way that I thought about now could be to randomly pass the indices to the alpha-beta algorithm. So the result will be the first best solution in that randomly sorted list of positions.
Thanks in advance. If there is some literature on this, I'd be glad to read it.
If someone could post a good reference for alpha-beta pruning, that would be excellent, as I had a hard time understanding how to apply it.
To randomly pick among multiple best solutions (all equal) in alpha-beta pruning, you can modify your evaluation function to add a very small random number whenever you evaluate a game state. You should just make sure that the magnitude of that random number can never exceed the smallest possible true difference between the evaluations of two states.
For example, if the true evaluation function for your game state can only return values -1, 0, and 1, you could add a randomly generated number in the range [0.0, 0.01] to the evaluation of every game state.
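A minimal sketch of that in C (hypothetical names: evaluateTerminal() stands in for whatever true evaluation you already have, and rand()/RAND_MAX is just a convenient uniform source):

#include <stdlib.h>

/* Assumed helper: your existing evaluation, returning -1, 0 or +1. */
extern int evaluateTerminal(const char board[9]);

/* The true score stays in {-1, 0, +1}; the jitter stays well below the
   smallest possible gap of 1, so it only breaks ties between equal moves. */
double evaluateWithJitter(const char board[9]) {
    double jitter = 0.01 * ((double)rand() / RAND_MAX);   /* in [0.0, 0.01] */
    return (double)evaluateTerminal(board) + jitter;
}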
Without this, alpha-beta pruning doesn't necessarily find only one solution. Consider this example from wikipedia. In the middle, you see that two solutions with an evaluation of 6 were found, so it can find more than one. I do actually think it will still find all moves leading to optimal solutions at the root node, but not actually find all solutions deep down in the tree. Suppose, in the example image, that the pruned node with score of 9 in the middle actually had a score of 6. It would still get pruned there, so that particular solution wouldn't be found, but the move from root node leading to it (the middle move at root) would still be found. So, eventually, you would be able to reach it.
Some interesting notes:
This implementation would also work in minimax, and avoid the need to store a list of multiple (equally good) solutions
In more complex games than Tic Tac Toe, where you cannot search the complete state space, adding a small random number for the max player and deducting a small random number for the min player like this may actually slightly improve your heuristic evaluation function. The reason for this is as follows. Suppose in state A you have 5 moves available, and in state B you have 10 moves available, which all result in the same heuristic evaluation score. Intuitively, the successors of state B may be slightly better, because you had more moves available; in many games, having more moves available means that you are in a better position. Because you generated 10 random numbers for the 10 successors of state B, it is also a bit more likely that the highest generated random number is among those 10 (instead of the 5 numbers generated for successors of A)
I am implementing a small grid-based, turn-based strategy game along the lines of Final Fantasy Tactics.
Do you have any ideas on how I can approach the target selection, movement and skill selection process?
I am considering having the decisions disconnected, but all these 3 decisions are largely coupled.
(E.g. I can't decide where to move unless I know who I am going to attack and what range the skill I will use has; and vice versa, I can't decide who to attack unless I know how many turns it will take me to reach each target.)
I want to move towards a unified system, but trying out ideas from potential field research, used in a manner like the Killzone 1 AI, has me getting stuck on local maxima.
=== Update 1
I am currently trying to use potential fields / influence maps to generate the data I base decisions on.
I have no idea how to handle having many skills, and skills that don't do damage but rather buff/debuff or alter the world.
Someone elsewhere suggested using Monte Carlo Tree Search, used currently in Go games.
I believe the space my actors will be using is not well suited to it, as many moves in the game don't result in a position from which you can attack and affect the world (I am in a world bigger than Final Fantasy Tactics).
In final fantasy tactics it might be applied successfully, although the branching factor is much bigger than that of 9x9 Go (from what i understand)
===
Thanks in advance, Xtapodi.
ps.1 - A problem is that to know accurately how far away an enemy is, I would need to pathfind to him, because although the enemy may be near, an impassable cliff might separate us that takes 4 turns to go around. Or worse, a unit might be blocking the way on, let's say, a bridge, so there is actually no way to reach him.
One approach I've used is to do a two-pass system.
First, find out where your unit can go. Use A* or whatever to flag out the terrain to see how far the unit can move this turn.
Once you know that, step through your available tactics (melee attack, heal friendly unit, whatever), and assign a fitness function for all available uses of the tactic. If you pass in the flagged terrain, you can very quickly determine what your space of possible tactics are.
This gives you a list of available tactics and their fitness functions for each move. Select the best one or randomize from the top. If there aren't any tactics available, repeat the process with flagging the terrain for two moves, and so on.
What I mean by fitness function is to decide on the "value" of performing the tactic on a certain unit or location. For instance, your "heal a friendly unit" tactical decision phase might step through all friendly units. If a friendly unit is within range (i.e., is reachable from a location your unit can reach), add it to the list of possible tactics and give it a fitness rating equal to, say, 100 * (1.0 - unit health), where unit health ranges from 0 to 1. Thus, healing a character down to only 10% health remaining would be worth 90 points, while a unit only down 5% would only be worth 5, and the unit wouldn't even consider healing an undamaged unit. Special units (i.e., "protect the boss" scenario units required to retain victory conditions) could be given a higher base number, so that they are given more attention by friendly units.
Similarly, your "melee attack" decision phase would step through all reachable enemy units, compute the likely damage, and compare that to the unit's health. Give each unit a "desirability" to attack, and multiply it by the percentage of remaining health you'd likely do, and you've got a pretty detailed fitness function that favors eliminating units when you can, but still goes after high-value targets.
Using a process like this, you'll get a list of options like "Move to location A and heal friendly unit B : 50 points", "Move to location C and attack hostile unit D : 15 points", etc. Suddenly, it's really easy to choose a tactic.
Further detail may be added by multiplying the fitness of the tactic by a fitness for the path you'd have to take to implement it. For instance, if the place you'd have to move to in order to heal a friendly unit puts you in severe danger (i.e., standing on a lava space or something), you might factor that in by multiplying the fitness of that tactic by .2 or so, so that the unit may still consider it, but only if it's really important. All this takes is writing an algorithm to assess the fitness of a given location, and could be as simple as a pre-computed "terrain desirability" number or as complex as maintaining "threat maps" of enemy units.
The hard part, of course, is finding the right measures to make the engine smart. But that's the fun part of your system to tweak.
If the terrain where the battle occurs is pre-determined, or not too large, there is an article on terrain reasoning in FPS games that can be used as a basis for a turn-based game.
In short, you pre-calculate, for each cell of the map, a set of values such as suitability for shooting in a given direction, protection, visibility, and so on. The AI can then use these values to choose a sensible action. For example, a fighter will walk as quickly as possible toward the enemy, using cover where available, while a thief will take a path where visibility from the enemy's direction is as low as possible, with the goal of attacking from the flank or rear.
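One possible shape for those precalculated per-cell values, sketched in C (the field names are only examples of the kinds of data the article suggests):

/* Precomputed tactical data for one map cell. Filled once when the map loads,
   then read by each unit's decision code instead of being recomputed per turn. */
typedef struct {
    float cover;            /* how much protection the cell offers, 0..1 */
    float visibility;       /* how visible the cell is from enemy-held areas, 0..1 */
    float lineOfFire[8];    /* suitability for shooting in each of 8 directions, 0..1 */
} CellTactics;

/* e.g. a "thief" archetype prefers cells with low visibility, while a "fighter"
   prefers cover along the shortest path toward the enemy. */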
If the terrain is randomized and/or too large, however, the precalculation can take too long to be useful.
regards
Guillaume
A good question; the answers can be all over the place. Personally, I don't have a lot of experience with this, but I would build the strategy around concepts rather than distance.
You would create a state machine for each NPC. It would pick a character to attack based on some settings.
For example, an NPC could be flagged as Attack Weakest, Attack Strongest or Attack Most Injured. Then I would attempt to position them so that they can damage their desired target.
If you also have healers you can do the same thing in reverse for the healer target.
Target changing will be an important part of this system too. So you will want to think about that. A simple version is to reevaluate changing target a given percentage of the turns.
And finally, I would add random chance into the system. For example a character could be set as follows
Attack Weakest .25
Attack Strongest .50
Attack Most Injured .25
Change target .1
When it's time to attack, you generate a random number from 0 to 1. If it's under your Change Target value, you change target by generating another random number to decide which attack mode to use.
You can begin to factor distance into your system by augmenting the attack mode percentages.
For example, if it would take 3 turns to attack the most injured target, decrease its chance of being targeted by dividing that value by 3 and distributing the difference to the other two possibilities (a small sketch of the weighted choice follows below).
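A small C sketch of that weighted choice (the weights are the illustrative values above; the names and the distance handling are made up):

#include <stdlib.h>

enum AttackMode { ATTACK_WEAKEST, ATTACK_STRONGEST, ATTACK_MOST_INJURED };

/* Pick an attack mode from weights that sum to 1.0. A mode whose target is
   several turns away can be penalised first, e.g. weight /= turnsToReach,
   with the removed weight spread over the other modes. */
enum AttackMode pickAttackMode(double wWeakest, double wStrongest, double wInjured) {
    double r = (double)rand() / RAND_MAX;
    if (r < wWeakest) return ATTACK_WEAKEST;
    if (r < wWeakest + wStrongest) return ATTACK_STRONGEST;
    return ATTACK_MOST_INJURED;
    (void)wInjured;   /* implied by the other two weights */
}

/* Each turn, first roll against the change-target chance (e.g. 0.1); only if it
   hits do you call pickAttackMode() again and re-select a target. */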
Here's the background... in my free time I'm designing an artillery warfare game called Staker (inspired by the old BASIC games Tank Wars and Scorched Earth) and I'm programming it in MATLAB. Your first thought might be "Why MATLAB? There are plenty of other languages/software packages that are better for game design." And you would be right. However, I'm a dork and I'm interested in learning the nuts and bolts of how you would design a game from the ground up, so I don't necessarily want to use anything with prefab modules. Also, I've used MATLAB for years and I like the challenge of doing things with it that others haven't really tried to do.
Now to the problem at hand: I want to incorporate AI so that the player can go up against the computer. I've only just started thinking about how to design the algorithm to choose an azimuth angle, elevation angle, and projectile velocity to hit a target, and then adjust them each turn. I feel like maybe I've been overthinking the problem and trying to make the AI too complex at the outset, so I thought I'd pause and ask the community here for ideas about how they would design an algorithm.
Some specific questions:
Are there specific references for AI design that you would suggest I check out?
Would you design the AI players to vary in difficulty in a continuous manner (a difficulty of 0 (easy) to 1 (hard), all still using the same general algorithm) or would you design specific algorithms for a discrete number of AI players (like an easy enemy that fires in random directions or a hard enemy that is able to account for the effects of wind)?
What sorts of mathematical algorithms (pseudocode description) would you start with?
Some additional info: the model I use to simulate projectile motion incorporates fluid drag and the effect of wind. The "fluid" can be air or water. In air, the air density (and thus effect of drag) varies with height above the ground based on some simple atmospheric models. In water, the drag is so great that the projectile usually requires additional thrust. In other words, the projectile can be affected by forces other than just gravity.
In a real artillery situation all these factors would be handled either with formulas or simply brute-force simulation: Fire an electronic shell, apply all relevant forces and see where it lands. Adjust and try again until the electronic shell hits the target. Now you have your numbers to send to the gun.
Given the complexity of the situation I doubt there is any answer better than the brute-force one. While you could precalculate a table of expected drag effects vs velocity I can't see it being worthwhile.
Of course a game where the AI dropped the first shell on your head every time wouldn't be interesting. Once you know the correct values you'll have to make the AI a lousy shot. Apply a random factor to the shot and then walk it towards the target--move it, say, 30+random(140)% towards the true target each time it shoots.
Edit:
I do agree with BCS's notion of improving it as time goes on. I said that but then changed my mind on how to write a bunch of it and then ended up forgetting to put it back in. The tougher it's supposed to be the smaller the random component should be.
Loren's brute force solution is appealing because it would allow easy "intelligence adjustments" by adding more iterations. Also, the adjustment factors for the iteration could be part of the intelligence, as some values will make it converge faster.
Also, for the basic system (no drag, wind, etc.) there is a closed-form solution that can be derived from a basic physics text. I would make the first guess be that and then do one or more iterations per turn. You might want to try and come up with an empirical correction to improve the first shot (something that will make the average of the first-shot distribution closer to correct).
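For reference, the no-drag, flat-ground case has a standard closed-form solution; a hedged C sketch of using it as the first guess (assuming equal launch and impact heights, and ignoring wind and drag entirely):

#include <math.h>

/* Elevation angle (radians) to hit a target at horizontal distance d with
   muzzle speed v and gravity g, with no drag and launch/impact at the same
   height. Returns the low-arc solution; NAN means the target is out of range. */
double firstGuessElevation(double v, double d, double g) {
    double s = g * d / (v * v);
    if (s > 1.0) return NAN;          /* v too small to reach d */
    return 0.5 * asin(s);             /* use (M_PI/2 - this) for the high arc */
}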
Thanks Loren and BCS, I think you've hit upon an idea I was considering (which prompted question #2 above). The pseudocode for an AIs turn would look something like this:
nSims;       % A variable storing the number of projectile simulations
             % done per turn for the AI (i.e. difficulty)
prevParams;  % A variable storing the previous shot parameters
prevResults; % A variable storing some measure of accuracy of the last shot

newParams = get_new_guess(prevParams, prevResults);
for i = 1:nSims,
    newResults = simulate_projectile_flight(newParams);
    newParams = get_new_guess(newParams, newResults);
end
fire_projectile(newParams);
In this case, the variable nSims is essentially a measure of "intelligence" for the AI. A "dumb" AI would have nSims=0, and would simply make a new guess each turn (based on results of the previous turn). A "smart" AI would refine its guess nSims times per turn by simulating the projectile flight.
Two more questions spring from this:
1) What goes into the function get_new_guess? How should I adjust the three shot parameters to minimize the distance to the target? For example, if a shot falls short of the target, you can try to get it closer by adjusting the elevation angle only, adjusting the projectile velocity only, or adjusting both of them together.
2) Should get_new_guess be the same for all AIs, with the nSims value being the only determiner of "intelligence"? Or should get_new_guess be dependent on another "intelligence" parameter (like guessAccuracy)?
A difference between artillery games and real artillery situations is that all sides have 100% information, and that there are typically more than 2 opponents.
As a result, your evaluation function should consider which opponent it is more urgent to try to eliminate. For example, if I have an easy kill at 90%, but only a 50% chance on someone who's trying to kill me and has just missed two shots near me, it's more important to deal with that threat.
I think you would need some way of evaluating the risk everyone poses to you in terms of ammunition, location, activity, past history, etc.
I'm now addressing the response you posted:
While you have the general idea I don't believe your approach will be workable--it's going to converge way too fast even for a low value of nSims. I doubt you want more than one iteration of get_new_guess between shells and it very well might need some randomizing beyond that.
Even if you can use multiple iterations they wouldn't be good at making a continuously increasing difficulty as they will be big steps. It seems to me that difficulty must be handled by randomness.
First, get_initial_guess:
To start out I would have a table that divides the world up into zones--the higher the difficulty, the more zones. The borders between these zones would have precalculated power values for 45, 60 and 75 degrees. Do a test shot; if the shell smacks into terrain, try again at a higher angle--if 75 degrees still hits terrain, use it anyway.
The initial shell should be fired at a random power between the values given for the low and high bounds.
Now, for get_new_guess:
Did the shell hit terrain? Increase the angle. I think there will be a constant ratio of how much power needs to be increased to maintain the same distance--you'll need to run tests on this.
Assuming it didn't smack into a mountain, note whether it fell short or long. This gives you a bound. The new guess is somewhere between the two bounds (if you're missing a bound, use the value from the table in get_initial_guess in its place).
Note what percentage of the way between the low and high bound impact points the target is and choose a power that far between the low and high bound power.
This is probably far too accurate and will likely require some randomizing. I've changed my mind about adding a simple random %. Rather, multiple random numbers should be used to get a bell curve.
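Roughly, that interpolation-plus-bell-curve step might look like this in C (all names are made up; lowPower/highPower are the bracketing shots and difficulty scales the noise):

#include <stdlib.h>

/* Interpolate a new power between the bracketing shots: lowPower landed at
   lowImpact (short), highPower at highImpact (long), target is at targetX.
   difficulty in (0,1]: smaller values mean a wider error bell curve. */
double nextPowerGuess(double lowPower, double highPower,
                      double lowImpact, double highImpact,
                      double targetX, double difficulty) {
    double t = (targetX - lowImpact) / (highImpact - lowImpact);
    double power = lowPower + t * (highPower - lowPower);

    /* Sum of a few uniforms approximates a bell curve centred on 0. */
    double noise = 0.0;
    for (int i = 0; i < 4; i++)
        noise += ((double)rand() / RAND_MAX) - 0.5;
    noise *= (1.0 - difficulty) * 0.25 * (highPower - lowPower);

    return power + noise;
}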
Another thought: Are we dealing with a system where only one shell is active at once? Long ago I implemented an artillery game where you had 5 barrels, each with a fixed reload time that was above the maximum possible flight time.
With that I found myself using a strategy of firing shells spread across the range between my current low bound and high bound. It's possible that being a mere human I wasn't using an optimal strategy, though--this was realtime, getting a round off as soon as the barrel was ready was more important than ensuring it was aimed as well as possible as it would converge quite fast, anyway. I would generally put a shell on target on the second salvo and the third would generally all be hits. (A kill required killing ALL pixels in the target.)
In an AI situation I would model both this and a strategy of holding back some of the barrels to fire more accurate rounds later. I would still fire a spread across the target range, the only question is whether I would use all barrels or not.
I have personally created such a system - for the web-game Zwok, using brute force. I fired lots of shots in random directions and recorded the best result. I wouldn't recommend doing it any other way as the difference between timesteps etc will give you unexpected results.
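In sketch form that brute-force approach is just a loop over random shots through the real simulation (C, with made-up names; simulateImpactDistance() is an assumed helper that runs the full simulation and returns the miss distance):

#include <float.h>
#include <stdlib.h>

/* Assumed helper: runs the full projectile simulation for one shot and
   returns how far from the target it landed (smaller is better). */
extern double simulateImpactDistance(double angle, double power);

/* Brute force: try nShots random (angle, power) pairs through the real
   simulation and keep whichever lands closest to the target. */
void bestRandomShot(int nShots, double maxPower, double *bestAngle, double *bestPower) {
    double bestMiss = DBL_MAX;
    for (int i = 0; i < nShots; i++) {
        double angle = ((double)rand() / RAND_MAX) * 1.5707963;  /* 0..90 degrees, in radians */
        double power = ((double)rand() / RAND_MAX) * maxPower;
        double miss  = simulateImpactDistance(angle, power);
        if (miss < bestMiss) { bestMiss = miss; *bestAngle = angle; *bestPower = power; }
    }
}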