I am implementing a small grid-based, turn-based strategy game along the lines of Final Fantasy Tactics.
Do you have any ideas on how I can approach the target selection, movement and skill selection process?
I am considering keeping the decisions separate, but all three of these decisions are heavily coupled.
(e.g. I can't decide where to move unless I know who I am going to attack and what range the skill I will use has; and vice versa, I can't decide who to attack unless I know how many turns it will take me to reach each target.)
I want to move towards a unified system, but trying out ideas from potential-field research, applied in a manner similar to the Killzone 1 AI, has me getting stuck in local maxima.
=== Update 1
I am currently trying to use potential fields / influence maps to generate the data I base decisions on.
I have no idea how to handle having many skills, and skills that don't do damage but rather buff/debuff or alter the world.
Someone elsewhere suggested using Monte Carlo Tree Search, which is currently used in Go programs.
I believe the space my actors will be operating in is not a good fit for it, as many moves in the game don't result in a position from which you can attack and affect the world (my world is bigger than Final Fantasy Tactics').
In Final Fantasy Tactics it might be applied successfully, although the branching factor is much bigger than that of 9x9 Go (from what I understand).
===
Thanks in advance, Xtapodi.
ps.1 - A problem is that to know accurately how far away an enemy is, I would need to pathfind to him, because although the enemy may be near, an impassable cliff might be separating us that takes 4 turns to go around. Or worse, a unit might be blocking the way on, let's say, a bridge, so there is actually no way to reach him.
One approach I've used is a two-pass system.
First, find out where your unit can go. Use A* or whatever to flag the terrain and see how far the unit can move this turn.
Once you know that, step through your available tactics (melee attack, heal friendly unit, whatever), and assign a fitness function for all available uses of the tactic. If you pass in the flagged terrain, you can very quickly determine what your space of possible tactics are.
This gives you a list of available tactics and their fitness values for each move. Select the best one, or randomize from the top. If there aren't any tactics available, repeat the process, flagging the terrain for two moves' worth of movement, and so on.
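As a sketch of that first pass, a breadth-first flood fill over a square grid is enough to flag every tile reachable this turn. This assumes uniform movement cost and a made-up grid encoding; a real implementation would plug in your own pathfinder and terrain costs:

    from collections import deque

    # Hypothetical first pass: flag every tile reachable within move_points steps.
    # grid[y][x] == 0 means passable; a real game would also treat occupied tiles as blocked.
    def reachable_tiles(grid, start, move_points):
        h, w = len(grid), len(grid[0])
        reachable = {start: 0}                       # tile -> movement cost to reach it
        queue = deque([start])
        while queue:
            x, y = queue.popleft()
            cost = reachable[(x, y)]
            if cost == move_points:                  # out of movement, stop expanding here
                continue
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] == 0 and (nx, ny) not in reachable:
                    reachable[(nx, ny)] = cost + 1
                    queue.append((nx, ny))
        return reachable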
What I mean by fitness function is to decide on the "value" of performing the tactic on a certain unit or location. For instance, your "heal a friendly unit" tactical decision phase might step through all friendly units. If a friendly unit is within range (i.e., is reachable from a location your unit can reach), add it to the list of possible tactics and give it a fitness rating equal to, say, 100 * (1.0 - unit health), where unit health ranges from 0 to 1. Thus, healing a character down to only 10% health remaining would be worth 90 points, while a unit only down 5% would only be worth 5, and the unit wouldn't even consider healing an undamaged unit. Special units (i.e., "protect the boss" scenario units required to retain victory conditions) could be given a higher base number, so that they are given more attention by friendly units.
Similarly, your "melee attack" decision phase would step through all reachable enemy units, compute the likely damage, and compare that to the unit's health. Give each unit a "desirability" to attack, and multiply it by the percentage of remaining health you'd likely do, and you've got a pretty detailed fitness function that favors eliminating units when you can, but still goes after high-value targets.
Using a process like this, you'll get a list of options like "Move to location A and heal friendly unit B : 50 points", "Move to location C and attack hostile unit D : 15 points", etc. Suddenly, it's really easy to choose a tactic.
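To make the shape of this concrete, here is a minimal Python sketch of that second pass. The Unit fields, ranges, and weights are placeholders I made up; only the structure (enumerate reachable tiles, enumerate tactics, score each, keep the best) reflects the approach above. The reachable parameter is any iterable of (x, y) tiles, for example the keys of the flood-fill result earlier:

    from dataclasses import dataclass

    @dataclass
    class Unit:
        name: str
        pos: tuple            # (x, y) grid position
        health: float         # 0.0 .. 1.0
        damage: float = 0.3   # fraction of a full health bar dealt per attack
        value: float = 10.0   # base desirability of attacking this unit

    def in_range(a, b, rng):
        # Manhattan distance; a real game would check against pathfound attack tiles.
        return abs(a[0] - b[0]) + abs(a[1] - b[1]) <= rng

    def best_tactic(actor, allies, enemies, reachable, heal_range=1, attack_range=1):
        """Return the best (score, tile, description) option, or None if nothing applies."""
        options = []
        for tile in reachable:
            for ally in allies:
                if ally.health < 1.0 and in_range(tile, ally.pos, heal_range):
                    options.append((100 * (1.0 - ally.health), tile, "heal " + ally.name))
            for enemy in enemies:
                if in_range(tile, enemy.pos, attack_range):
                    kill_fraction = min(actor.damage / max(enemy.health, 1e-6), 1.0)
                    options.append((enemy.value * kill_fraction, tile, "attack " + enemy.name))
        return max(options, default=None)

The path-fitness multiplier described below would simply scale each score before it is appended to the list.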
Further detail may be added by multiplying the fitness of the tactic by a fitness for the path you'd have to take to implement it. For instance, if the place you'd have to move to in order to heal a friendly unit puts you in severe danger (i.e., standing on a lava space or something), you might factor that in by multiplying the fitness of that tactic by .2 or so, so that the unit may still consider it, but only if it's really important. All this takes is writing an algorithm to assess the fitness of a given location, and could be as simple as a pre-computed "terrain desirability" number or as complex as maintaining "threat maps" of enemy units.
The hard part, of course, is finding the right measures to make the engine smart. But that's the fun part of your system to tweak.
If the terrain where the battle occurs is pre-determined, or not too large, there is an article on terrain reasoning in FPS games that can be used as a basis for a turn-based game.
In short, you pre-calculate for each cell of the map a set of values, such as suitability for shooting in a given direction, protection, visibility, and so on. The AI can then use these values to choose a suitable action. For example, a fighter will walk as quickly as possible toward the enemy, using cover if available, while a thief will take a path where visibility from the enemy's direction is as low as possible, with the goal of attacking from the flank or rear.
If the terrain is randomized and/or too large, the pre-calculation can take too long to be useful, however.
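As a rough illustration of the idea, the pre-computation pass could look something like this. The grid encoding and the two example metrics (cover and exposure) are my own assumptions; a real table would also hold per-direction shooting suitability, visibility, and whatever else the AI needs:

    def precompute_cell_values(grid):
        """For each passable cell, count adjacent walls (cover) and open sides (exposure)."""
        h, w = len(grid), len(grid[0])
        values = {}
        for y in range(h):
            for x in range(w):
                if grid[y][x] == 1:                       # impassable cell, skip it
                    continue
                cover = exposure = 0
                for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] == 1:
                        cover += 1                        # a wall next to us gives cover
                    else:
                        exposure += 1                     # an open (or off-map) side leaves us exposed
                values[(x, y)] = {"cover": cover, "exposure": exposure}
        return values

At decision time the fighter would weight paths by low distance, the thief by low exposure, as described above.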
regards
Guillaume
A good question; the answers can be all over the place. Personally, I don't have a lot of experience with this, but I would set a strategy around concepts, not distance.
You are going to create a state machine for each NPC. It will pick a character to attack based on some settings.
For example, an NPC could be flagged as Attack Weakest, Attack Strongest or Attack Most Injured. Then I would attempt to position them such that they can damage their desired target.
If you also have healers you can do the same thing in reverse for the healer target.
Target changing will be an important part of this system too. So you will want to think about that. A simple version is to reevaluate changing target a given percentage of the turns.
And finally, I would add random chance into the system. For example a character could be set as follows
Attack Weakest .25
Attack Strongest .50
Attack Most Injured .25
Change target .1
When it's time to attack, you generate a random number from 0 to 1. If it's under your Change Target value, you change targets by generating another random number to decide which target to attack.
You can begin to factor distance into your system by augmenting the attack mode percentages.
For example, if it would take 3 turns to reach the most injured target, decrease its chance of being targeted by dividing that value by 3 and distributing the difference to the other two possibilities.
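A minimal Python sketch of that distance-adjusted roll; the mode names, weights and penalty rule are just the example values above, and everything else is an assumption:

    import random

    def choose_attack_mode(weights, turns_to_reach):
        """weights: dict mode -> probability; turns_to_reach: dict mode -> turns needed to attack it."""
        adjusted = dict(weights)
        for mode, turns in turns_to_reach.items():
            if turns > 1:
                reduced = adjusted[mode] / turns              # e.g. 0.25 / 3
                freed = adjusted[mode] - reduced              # the difference to redistribute
                adjusted[mode] = reduced
                others = [m for m in adjusted if m != mode]
                for m in others:
                    adjusted[m] += freed / len(others)
        roll = random.random() * sum(adjusted.values())       # walk the cumulative distribution
        for mode, weight in adjusted.items():
            roll -= weight
            if roll <= 0:
                return mode
        return mode                                           # floating-point fallback

    # Example: the weights above, with the most injured target 3 turns away.
    mode = choose_attack_mode(
        {"weakest": 0.25, "strongest": 0.50, "most_injured": 0.25},
        {"most_injured": 3},
    )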
Related
I was studying the Markov property in reinforcement learning, which is supposed to be one of the important assumptions of this field. It says that when considering the probability of the future, we consider only the present state and actions, not those of the past. An important corollary arises when we consider the probability of the present state given a future state/action: the future state/action can't be ignored, as it holds valuable information for computing the present probability.
I do not understand this second statement. From the point of view of the future event, the present event seems to be the past for this future event. Then why are we considering this past event?
Let's focus on these two sentences individually. The Markov property (which should apply in your problem, but in reality doesn't have to) says that the current state is all you need to look at to make your decision (e.g. a "screenshot", a.k.a. observation, of the chess board is all you need to look at to make an optimal action). On the other hand, if you need to look at some old state (or observation) to understand something that is not implied by your current state, then the Markov property is not satisfied (e.g. you usually can't use a single frame of a video game as a state, since you may be missing info regarding the velocity and acceleration of some moving objects. This is also why people use frame-stacking to "solve" video games using RL).
Now, regarding the future events which seem to be considered as past events: when the agent takes an action, it moves from one state to another. Remember that in RL you want to maximize the cumulative reward, that is, the sum of all the rewards in the long term. This also means that you may want to take an action even if it sacrifices an immediately "good" reward, if doing so means obtaining a better "future" (long-term) reward (e.g. sometimes you don't want to take the enemy queen if that allows the enemy to checkmate you on the next move). This is why in RL we try to estimate value functions (state and/or action). A state value function assigns to each state a value representing how good it is to be in that state from a long-term perspective.
How is an agent supposed to know the future reward (i.e. calculate these value functions)? By exploring a lot of states and taking random actions (literally trial and error). Therefore, when an agent is in a certain "state1" and has to choose between taking action A and action B, it will NOT choose the one that has given it the best instantaneous reward, but the one which has earned it better rewards long-term, that is, the action with the bigger action-value, which takes into account not only the instantaneous reward it gets from the transition from state1 to the next state, but also the value function of that next state!
Therefore, future events in that sentence may seem to be treated as past events because estimating the value function requires that you have visited those "future states" many times during past iterations!
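As a concrete (if simplified) illustration of "instantaneous reward plus the value of the next state", here is a one-step tabular Q-learning update sketched in Python; the environment interface and hyperparameters are my own assumptions, not something from the question:

    import random
    from collections import defaultdict

    Q = defaultdict(float)              # (state, action) -> estimated long-term value
    alpha, gamma, epsilon = 0.1, 0.99, 0.1

    def update(state, action, reward, next_state, next_actions):
        # Target = instantaneous reward + discounted value of the best action in the next state.
        best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

    def choose_action(state, actions):
        # Epsilon-greedy: mostly exploit the learned action-values, sometimes explore.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])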
Hope I've been helpful
I think the title says it. A "game" takes a number of moves to complete, at which point a total score is computed. The goal is to maximize this score, and there are no rewards provided for specific moves during the game. Is there an existing algorithm that is geared toward this type of problem?
EDIT: By "continuously variable" reward, I mean it is a floating point number, not a win/loss binary. So you can't, for example, respond to "winning" by reinforcing the moves made to get there. All you have is a number. You can rank different runs in order of preference, but a single result is not especially meaningful.
First of all, in my opinion, the title of your question seems a little confusing when you talk about "continuously variable reward". Maybe you could clarify this aspect.
On the other hand, without taking into account the previous point, it looks like you are talking about the temporal credit-assignment problem: how do you distribute credit over a sequence of actions that only obtains a reward (positive or negative) at the end of the sequence?
E.g., a Tic-tac-toe game where the agent doesn't receive any reward until the game ends. In this case, almost any RL algorithm tries to solve the temporal credit-assignment problem. See, for example, Section 1.5 of the Sutton and Barto RL book, where they explain the working principles of RL and its advantages over other approaches using a Tic-tac-toe game as an example.
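For instance, a plain Monte Carlo method handles this by playing whole episodes and then crediting every visited state-action pair with the final score. A minimal Python sketch, assuming the episode is available as a list of (state, action) pairs and the only reward is the number you get at the end:

    from collections import defaultdict

    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    Q = defaultdict(float)              # running average of returns per (state, action)

    def update_from_episode(episode, final_score, gamma=1.0):
        """episode: list of (state, action) pairs in play order; final_score: the end-of-game number."""
        G = final_score
        for state, action in reversed(episode):
            returns_sum[(state, action)] += G
            returns_count[(state, action)] += 1
            Q[(state, action)] = returns_sum[(state, action)] / returns_count[(state, action)]
            G *= gamma                  # earlier moves see a (possibly) discounted return

Because the score is a continuous number, the running averages themselves tell the agent which actions tend to lead to higher final scores.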
I am working on a simple game of Tic Tac Toe code for C. I have most of the code finished, but I want the AI to never lose.
I have read about the minimax algorithm, but I don't understand it. How do I use this algorithm to enable the computer to either Win or Draw but never lose?
The way to approach this sort of problem is one of exploring possible futures. Usually (for a chess or draughts AI) you'd consider futures a certain number of moves ahead, but because tic tac toe games are so short, you can explore to the end of the game.
Conceptual implementation
So you create a branching structure:
The AI imagines itself making every legal move
The AI then imagines the user making each legal move they can make after each of its legal moves
Then the AI imagines each of its next legal moves
etc.
Then, going from the most branched end (the furthest forward in time), the player whose turn it is (AI or user) chooses which future is best for it (win, lose or draw) at each branching point. Then it hands over to the player higher up the tree (closer to the present), each time choosing the best future for the player whose imaginary turn it is, until finally you're at the first branching point, where the AI can see the futures which play out towards it losing, drawing and winning. It chooses a future where it wins (or, if that's unavailable, draws).
Actual implementation
Note that conceptually this is what is happening, but it's not necessary to create the whole tree and then judge it like this. You can just as easily work through the tree, getting to the furthest points in time and choosing then.
Here, this approach works nicely with a recursive function. Each level of the function polls all its branches, passing the possible future to them; each branch returns -1, 0 or +1, and the level chooses the best score for the current player. The top level chooses the move without actually knowing how each future pans out, just how well they pan out.
Pseudo code
I assume in this pseudo code that +1 is the AI winning, 0 is a draw, -1 is the user winning
determineNextMove(currentStateOfBoard)
currentBestMove= null
currentBestScore= - veryLargeNumber
for each legalMove
score=getFutureScoreOfMove(stateOfBoardAfterLegalMove , User’sTurn) //after the AI's candidate move it is the user's turn
if score>currentBestScore
currentBestMove=legalMove
currentBestScore=score
end
end
make currentBestMove
end
getFutureScoreOfMove(stateOfBoard, playersTurn)
if no LegalMoves
return 1 if AI wins, 0 if draw, -1 if user wins
end
if playersTurn=AI’sTurn
currentBestScore= - veryLargeNumber //this is the worst case for AI
else
currentBestScore= + veryLargeNumber //this is the worst case for Player
end
for each legalMove
score=getFutureScoreOfMove(stateOfBoardAfterLegalMove , INVERT playersTurn)
if playersTurn ==AI’sTurn AND score>currentBestScore //AI wants positive score
currentBestScore=score
end
if playersTurn ==Users’sTurn AND score<currentBestScore //user wants negative score
currentBestScore=score
end
end
return currentBestScore
end
This pseudo code doesn't care what the starting board is (you call this function every AI move with the current board) and doesn't return what path the future will take (we can't know if the user will play optimally so this information is useless), but it will always choose the move that goes towards the optimum future for the AI.
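For reference, here is a compact, runnable version of that pseudocode. I've written it in Python for brevity (the question is about C, but the structure translates directly); the board encoding, a list of nine cells holding 'X' for the AI, 'O' for the user or ' ' for empty, is my own assumption:

    LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

    def winner(board):
        for a, b, c in LINES:
            if board[a] != ' ' and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def future_score(board, turn):
        """Value of 'board' with 'turn' to move, assuming both sides then play optimally."""
        w = winner(board)
        if w == 'X':
            return 1                                    # AI wins
        if w == 'O':
            return -1                                   # user wins
        moves = [i for i, cell in enumerate(board) if cell == ' ']
        if not moves:
            return 0                                    # draw
        scores = []
        for m in moves:
            board[m] = turn
            scores.append(future_score(board, 'O' if turn == 'X' else 'X'))
            board[m] = ' '                              # undo the imagined move
        return max(scores) if turn == 'X' else min(scores)

    def determine_next_move(board):
        """The AI ('X') picks the move leading towards the best achievable future."""
        moves = [i for i, cell in enumerate(board) if cell == ' ']
        return max(moves, key=lambda m: future_score(board[:m] + ['X'] + board[m+1:], 'O'))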
Considerations for larger problems
In this case, where you explore to the end of the game, it is obvious which the best possible future is (win, lose or draw), but if you're only going (for example) five moves into the future, you'd have to find some way of determining that; in chess or draughts, piece score is the easiest way to do this, with piece position being a useful enhancement.
I did something like this about 5 years ago and researched it a bit. In tic tac toe it doesn't take long; you just need to prepare patterns for the first two or three moves.
You need to check how to play:
Computer starts first.
Player starts first.
There are 9 different start positions:
But actually just 3 of them are different (others are rotated).
So after that you will see what should be done after certain specific moves. I don't think you need any algorithms in this case, because a tic tac toe ending is determined by the first moves. So in this case you will need a few if-else or switch statements and a random generator.
Tic tac toe belongs to the group of games that won't be lost if you know how to play, so for such games you do not need to use trees or modified sorting algorithms. To write such an algorithm you need just a few functions:
CanIWin() checks whether the computer has 2 in a row and can win this move.
ShouldIBlock() checks whether the player has 2 in a row that needs to be blocked.
Those two functions must be called in this order; if either returns true, you need to either take the win or stop the player from winning (see the sketch below).
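A minimal sketch of those two checks in Python; the board is assumed to be a list of nine cells ('X', 'O' or ' '), and the C version from the question would follow the same logic:

    LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

    def completing_cell(board, mark):
        """Index of an empty cell that completes a line for 'mark', or None if there isn't one."""
        for a, b, c in LINES:
            trio = [board[a], board[b], board[c]]
            if trio.count(mark) == 2 and trio.count(' ') == 1:
                return (a, b, c)[trio.index(' ')]
        return None

    # CanIWin()      -> completing_cell(board, computer_mark) is not None
    # ShouldIBlock() -> completing_cell(board, player_mark) is not None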
After that you need to do other calculations for the move.
One special situation is when the computer starts the game. You need to choose the cell that belongs to the biggest number of different lines (there are 8 of them: 3 horizontal, 3 vertical and 2 diagonal). With such an algorithm the computer will always choose the center, because it lies on 4 lines; you should add a small chance of choosing the second-best option to make the game a bit more interesting.
So when you reach a situation where some parts of the board are already taken and the computer has to move, you need to rate every free cell. (If the first or second function returned true, you have to take that action before reaching this point!) To rate a cell, count how many open lines are left through it; you also need to block at least one opponent line.
After that you will have a few possible cells to put your mark in. You then need to check the necessary move sequences, because some of those options may lead you into losing. After that you will have a set of candidates and can choose a move randomly, or pick the one with the biggest score.
I have to say a similar thing to what was said at the beginning of the post: bigger games do not have a (known) perfect strategy, and chess, let's say, is largely based on patterns, but also on forward-thinking strategy (for which things like a Patricia trie are used). So to sum up, you do not need difficult algorithms here, just a few functions to count how much you gain and how much the opponent loses with each move.
Make a subsidiary program to predict the cases in which the user can win. Then you can tell your AI to do the things that the user would have to do to win.
Looking back at my past projects I often encounter this one:
A client or a manager presents a task to me and asks for an estimate. I give an estimate, say 24 hours. They also ask a business analyst whose experience, from what I've heard, is mostly non-technical. They give an estimate, say 16 hours. In the end, they would go with the value given by the analyst, even though, besides providing an estimate on my side, I've explained the feasibility of the task from the technical side. They treat the analyst's estimate as a "fact of life", even though it is only an estimate and the true cost lies in the actual task itself. Worse, I see a pattern that they tend to be biased towards choosing the lower value (say I presented a lower estimate than the analyst; they quickly accept it) regardless of the feasibility of the task. If you have read Peopleware, they are the types of people who, given a set of work hours, will do anything and everything in their power to shorten it, even though that is not really possible.
Do you have specific negotiation skills and tactics that you used before to avoid this?
If I can help it, I would almost never give a number like "24 hours". Doing so makes several implicit assumptions:
The estimate is accurate to within an hour.
All of the figures in the number are significant figures.
The estimate is not sensitive to conditions that may arise between the time you give the estimate and the time the work is complete.
In most cases these are demonstrably wrong. To avoid falling into the trap posed by (1), quote ranges to reflect how uncertain you are about the accuracy of the estimate: "3 weeks, plus or minus 3 days". This also takes care of (2).
To close the loophole of (3), state your assumptions explicitly: "3 weeks, plus or minus 3 days, assuming Alice and Bob finish the Frozzbozz component".
IMO, being explicit about your assumptions this way will show a greater depth of thought than the analyst's POV. I'd much rather pay attention to someone who's thought about this more intensely than someone who just pulled a number out of the air, and that will certainly count for plus points on your side of the negotiation.
Do you not have a work breakdown structure that validates your estimate?
If your manager/customer does not trust your estimate, you should be able to easily prove it beyond the ability of an analyst.
Nothing makes your estimate intrinsically better than his beyond the breakdown that shows it to be true. Something like this for example:
Gather Feature Requirements (2 hours)
Design Feature (4 hours)
Build Feature
1 easy form (4 hours)
1 easy business component (4 hours)
1 easy stored procedure (2 hours)
Test Feature
3 easy unit tests (4 hours)
1 regression test (4 hours)
Deploy Feature
1 easy deployment (4 hours)
==========
(28 hours)
Then you say "Okay, I came up with 28 hours, show me where I am wrong. Show me how you can do it in 16."
Sadly, Scott Adams has a lot to contribute to this debate:
Dilbert: "In a perfect world the project would take eight months. But based on past projects in this company, I applied a 1.5 incompetence multiplier. And then I applied an LWF of 6.3."
Pointy-Haired Boss: "LWF?"
Alice: "Lying Weasel Factor."
You can "control" clients a little easier than managers since the only power they really have is to not give the work to you (that solves your incorrect estimates problem pretty quickly).
But you just need to point out that it's not the analyst doing the work, it's you. And nobody is better at judging your times than you are.
It's a fact of life that people paying for the work (including managers) will focus on the lower figure. Many times I've submitted proper estimates with lower (e.g., $10,000) and upper bounds (e.g., $11,000) and had emails back saying that the clients were quite happy that I'd quoted $10,000 for the work.
Then, for some reason, they take umbrage when I bill them $10,500. You have to make it clear up front that estimates are, well, estimates, not guarantees. Otherwise they wouldn't be paying time-and-materials but fixed-price (and the fixed price would be considerably higher to cover the fact that the risk is now yours, not theirs).
In addition, you should include all assumptions and risks in any quotes you give. This will both cover you and demonstrate that your estimate is to be taken more seriously than some back-of-an-envelope calculation.
One thing you can do to try to fix this over time, and improve your estimating skills as well, is to track all of the estimates you make, and match those up with the actual time taken. If you can go back to your boss with a list of the last twenty estimates from both you and the business analyst, and the time each actually took, it will be readily apparent whose estimates you should trust.
Under no circumstances give a single figure; give a best, a worst and a most likely. If you respond correctly, then the next question should be "How do I get a more accurate number?", to which the answer should be more detailed requirements and/or design, depending on where you are in the lifecycle.
Then you give another, more refined range of best, most likely and worst. This continues until you are done.
This is known as the cone of uncertainty. I have lost count of the number of times I have drawn it on a whiteboard when talking estimates with clients.
Do you have specific negotiation skills and tactics that you used before to avoid this?
Don't work for such people.
Seriously.
Changing their behavior is beyond your control.
Here's the background... in my free time I'm designing an artillery warfare game called Staker (inspired by the old BASIC games Tank Wars and Scorched Earth) and I'm programming it in MATLAB. Your first thought might be "Why MATLAB? There are plenty of other languages/software packages that are better for game design." And you would be right. However, I'm a dork and I'm interested in learning the nuts and bolts of how you would design a game from the ground up, so I don't necessarily want to use anything with prefab modules. Also, I've used MATLAB for years and I like the challenge of doing things with it that others haven't really tried to do.
Now to the problem at hand: I want to incorporate AI so that the player can go up against the computer. I've only just started thinking about how to design the algorithm to choose an azimuth angle, elevation angle, and projectile velocity to hit a target, and then adjust them each turn. I feel like maybe I've been overthinking the problem and trying to make the AI too complex at the outset, so I thought I'd pause and ask the community here for ideas about how they would design an algorithm.
Some specific questions:
Are there specific references for AI design that you would suggest I check out?
Would you design the AI players to vary in difficulty in a continuous manner (a difficulty of 0 (easy) to 1 (hard), all still using the same general algorithm) or would you design specific algorithms for a discrete number of AI players (like an easy enemy that fires in random directions or a hard enemy that is able to account for the effects of wind)?
What sorts of mathematical algorithms (pseudocode description) would you start with?
Some additional info: the model I use to simulate projectile motion incorporates fluid drag and the effect of wind. The "fluid" can be air or water. In air, the air density (and thus effect of drag) varies with height above the ground based on some simple atmospheric models. In water, the drag is so great that the projectile usually requires additional thrust. In other words, the projectile can be affected by forces other than just gravity.
In a real artillery situation all these factors would be handled either with formulas or simply brute-force simulation: Fire an electronic shell, apply all relevant forces and see where it lands. Adjust and try again until the electronic shell hits the target. Now you have your numbers to send to the gun.
Given the complexity of the situation I doubt there is any answer better than the brute-force one. While you could precalculate a table of expected drag effects vs velocity I can't see it being worthwhile.
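For what it's worth, the "electronic shell" could be as simple as a fixed-timestep Euler integration of the forces on the projectile. Everything below (the timestep, drag coefficient and wind model) is a placeholder sketch in Python, not the questioner's MATLAB model:

    import math

    def simulate_shot(angle_deg, speed, wind=0.0, drag=0.001, dt=0.01, g=9.81):
        """Return the horizontal distance at which the shell comes back to launch height."""
        vx = speed * math.cos(math.radians(angle_deg))
        vy = speed * math.sin(math.radians(angle_deg))
        x = y = 0.0
        while y >= 0.0:
            v_rel_x = vx - wind                  # velocity relative to the moving air
            v = math.hypot(v_rel_x, vy)
            vx -= drag * v * v_rel_x * dt        # quadratic drag opposes relative motion
            vy -= (g + drag * v * vy) * dt       # gravity plus the vertical drag component
            x += vx * dt
            y += vy * dt
        return x

Adjusting the aim is then just a matter of calling this repeatedly with corrected inputs until the landing point is close enough to the target.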
Of course a game where the AI dropped the first shell on your head every time wouldn't be interesting. Once you know the correct values you'll have to make the AI a lousy shot. Apply a random factor to the shot and then walk it towards the target--move it, say, 30+random(140)% towards the true target each time it shoots.
Edit:
I do agree with BCS's notion of improving it as time goes on. I said that but then changed my mind on how to write a bunch of it and then ended up forgetting to put it back in. The tougher it's supposed to be the smaller the random component should be.
Loren's brute force solution is appealing because it would allow easy "intelligence adjustments" by adding more iterations. Also, the adjustment factors for the iteration could be part of the intelligence, as some values will make it converge faster.
Also, for the basic system (no drag, wind, etc.) there is a closed-form solution that can be derived from a basic physics text. I would make the first guess be that and then do one or more iterations per turn. You might want to try to come up with an empirical correction to improve the first shot (something that will make the average of the first-shot distribution closer to correct).
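For reference, that closed-form first guess comes from the textbook range equation R = v^2 * sin(2*theta) / g, solved for the elevation angle. This assumes flat ground, no drag and no wind, so it is only a starting point for the real model:

    import math

    def first_guess_elevation(distance, speed, g=9.81):
        """Low-arc elevation angle in degrees, ignoring drag, wind and height differences."""
        s = g * distance / (speed * speed)
        if s > 1.0:
            return 45.0                 # target is out of range at this speed; use maximum range
        return 0.5 * math.degrees(math.asin(s))

The mortar-style high arc is simply 90 degrees minus this value.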
Thanks Loren and BCS, I think you've hit upon an idea I was considering (which prompted question #2 above). The pseudocode for an AI's turn would look something like this:
nSims; % A variable storing the numbers of projectile simulations
% done per turn for the AI (i.e. difficulty)
prevParams; % A variable storing the previous shot parameters
prevResults; % A variable storing some measure of accuracy of the last shot
newParams = get_new_guess(prevParams,prevResults);
for iSim = 1:nSims
newResults = simulate_projectile_flight(newParams);
newParams = get_new_guess(newParams,newResults);
end
fire_projectile(newParams);
In this case, the variable nSims is essentially a measure of "intelligence" for the AI. A "dumb" AI would have nSims=0, and would simply make a new guess each turn (based on results of the previous turn). A "smart" AI would refine its guess nSims times per turn by simulating the projectile flight.
Two more questions spring from this:
1) What goes into the function get_new_guess? How should I adjust the three shot parameters to minimize the distance to the target? For example, if a shot falls short of the target, you can try to get it closer by adjusting the elevation angle only, adjusting the projectile velocity only, or adjusting both of them together.
2) Should get_new_guess be the same for all AIs, with the nSims value being the only determiner of "intelligence"? Or should get_new_guess be dependent on another "intelligence" parameter (like guessAccuracy)?
A difference between artillery games and real artillery situations is that all sides have 100% information, and that there are typically more than 2 opponents.
As a result, your evaluation function should consider which opponent it would be more urgent to try and eliminate. For example, if I have an easy kill at 90%, but a 50% chance on someone who's trying to kill me and just missed two shots near me, it's more important to deal with that chance.
I think you would need some way of evaluating the risk everyone poses to you in terms of ammunition, location, activity, past history, etc.
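A toy illustration of such an evaluation in Python; the weighting and the fields are made up, and in a real game ammunition, past accuracy and the rest would feed into the threat number:

    def target_priority(hit_chance, threat):
        """hit_chance: 0..1 chance of eliminating the target; threat: 0..1 danger the target poses to us."""
        return hit_chance * (1.0 + 2.0 * threat)    # bias towards opponents actively hunting us

    targets = [
        {"name": "easy kill",       "hit_chance": 0.9, "threat": 0.1},
        {"name": "active attacker", "hit_chance": 0.5, "threat": 0.9},
    ]
    best = max(targets, key=lambda t: target_priority(t["hit_chance"], t["threat"]))

With these numbers the 50% shot at the active attacker (priority 1.4) wins out over the 90% easy kill (priority 1.08), matching the intuition above.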
I'm now addressing the response you posted:
While you have the general idea I don't believe your approach will be workable--it's going to converge way too fast even for a low value of nSims. I doubt you want more than one iteration of get_new_guess between shells and it very well might need some randomizing beyond that.
Even if you can use multiple iterations they wouldn't be good at making a continuously increasing difficulty as they will be big steps. It seems to me that difficulty must be handled by randomness.
First, get_initial_guess:
To start out I would have a table that divides the world up into zones--the higher the difficulty the more zones. The borders between these zones would have precalculated power for 45, 60 & 75 degrees. Do a test plot, if a shell smacks terrain try again at a higher angle--if 75 hits terrain use it anyway.
The initial shell should be fired at a random power between the values given for the low and high bounds.
Now, for get_new_guess:
Did the shell hit terrain? Increase the angle. I think there will be a constant ratio of how much power needs to be increased to maintain the same distance--you'll need to run tests on this.
Assuming it didn't smack a mountain, note whether it's short or long. This gives you a bound. The new guess is somewhere between the two bounds (if you're missing a bound, use the value from the table in get_initial_guess in its place).
Note what percentage of the way between the low and high bound impact points the target is and choose a power that far between the low and high bound power.
This is probably far too accurate and will likely require some randomizing. I've changed my mind about adding a simple random %. Rather, multiple random numbers should be used to get a bell curve.
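A sketch of that update in Python; the bound bookkeeping and the noise model are assumptions, and the point is just the interpolation between the short and long impacts plus a rough bell curve of error:

    import random

    def get_new_guess(low_power, high_power, low_impact, high_impact, target, error=0.05):
        """Interpolate power between the bracketing shots, then blur it so the AI isn't perfect."""
        if high_impact == low_impact:
            return low_power
        t = (target - low_impact) / (high_impact - low_impact)   # where the target sits between the impacts
        power = low_power + t * (high_power - low_power)
        noise = sum(random.uniform(-error, error) for _ in range(3)) / 3.0   # sum of uniforms approximates a bell curve
        return power * (1.0 + noise)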
Another thought: Are we dealing with a system where only one shell is active at once? Long ago I implemented an artillery game where you had 5 barrels, each with a fixed reload time that was above the maximum possible flight time.
With that I found myself using a strategy of firing shells spread across the range between my current low bound and high bound. It's possible that being a mere human I wasn't using an optimal strategy, though--this was realtime, getting a round off as soon as the barrel was ready was more important than ensuring it was aimed as well as possible as it would converge quite fast, anyway. I would generally put a shell on target on the second salvo and the third would generally all be hits. (A kill required killing ALL pixels in the target.)
In an AI situation I would model both this and a strategy of holding back some of the barrels to fire more accurate rounds later. I would still fire a spread across the target range, the only question is whether I would use all barrels or not.
I have personally created such a system - for the web-game Zwok, using brute force. I fired lots of shots in random directions and recorded the best result. I wouldn't recommend doing it any other way as the difference between timesteps etc will give you unexpected results.