Evaluating a rank based on several variables (Game Programming) - language-agnostic

I am designing a fairly simple space combat desktop game with no graphics, but I want the back end to be robust enough for lots of expansion. I want to rank three different aspects of a ship's capabilities on a scale from 1 to 100 (although I'm willing to reconsider these numbers).
For instance, I have the hitpoint section of the ship class as follows:
// section private defense
float baseHull;
float hullMod;
float baseArmor;
float armorMod;
float baseShield;
float shieldMod;
float miscMod = 1.0; // used for rarer ship types, i.e. elites, bosses, stations, or the rich
These can be any arbitrary values for now; I haven't designed anything to fit into the variables yet, because I'm trying to figure out how to rank the ships based on these sections: one each for movement, hitpoints, and offensive capabilities. As an added bonus, a global ranking would be nice too. The hitpoints section above would just show up as "hitpoints" on the screen, e.g. 50,000 HP for a moderate support-class ship and 100 for the space shuttles we have on Earth.
The ranking would determine the likelihood of winning a fight, and the "XP" rewarded for winning one. Adding the values all up wouldn't work, because a ship with 10 meters of uranium plating isn't necessarily better than one with 1 meter of lead plating and shields. For reference, Earth clothing would be rank 1, an M1A1 tank would be around a 5, and the Death Star would be up around 40-50.
I've been searching for ways to do this with real-world data, but I am neither a mathematics whiz nor a statistician. Is there a way to weight this into a handy function? Is it possible to reverse the function, i.e. input a rank and have it assign the internals? (That would be really cool, but it isn't necessary.)

Well, a simple way to combine those variables to a total hitpoint value would be:
hitpoints = baseHull * hullMod + baseArmor * armorMod + baseShield * shieldMod;
You could then assign, say, values between 0 and 100 for the base values determining "how much" of hull, armor, and shield they have, and values between 1 and 10 for the modifying values, which define "how strong" each item is.
Calculating the winner of a fight could be done like this, for example:
totalPoints = ship1Points + ship2Points;
ship1won = (rand() % totalPoints) < ship1Points;
Here the points of the ships are values calculated from the hitpoints and the offensive values of the ships. So you calculate the total points of the two combating ships and pick a random number between 0 and the total points. If ship1Points is, say, 20, and ship2Points is 50, ship 1 has a probability of winning of 20 / 70. To make the stronger ship win more reliably, you could square both point values before the final calculation; note that multiplying both by the same constant would leave the ratio unchanged.
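For illustration, here's a minimal Python sketch of both ideas (a sketch only: the exponent parameter and the use of miscMod as an overall multiplier are assumptions, not part of the answer above):
import random

def hitpoints(base_hull, hull_mod, base_armor, armor_mod,
              base_shield, shield_mod, misc_mod=1.0):
    # Combine the defensive variables into one hitpoint value.
    # Folding miscMod in as an overall multiplier is an assumption,
    # not part of the formula above.
    return (base_hull * hull_mod + base_armor * armor_mod
            + base_shield * shield_mod) * misc_mod

def ship1_wins(ship1_points, ship2_points, exponent=1.0):
    # Probability of winning is proportional to points;
    # exponent > 1 makes the stronger ship win more often.
    p1 = ship1_points ** exponent
    p2 = ship2_points ** exponent
    return random.uniform(0, p1 + p2) < p1

# With exponent=1, 20 vs 50 gives ship 1 a 20/70 chance of winning;
# with exponent=2 that drops to 400/2900.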

Related

Function to dampen a value

I have a list of documents each having a relevance score for a search query. I need older documents to have their relevance score dampened, to try to introduce their date in the ranking process. I already tried fiddling with functions such as 1/(1+date_difference), but the reciprocal function is too discriminating for close recent dates.
I was thinking of a mathematical function with range (0..1) and domain (0..x) to scale their score, where the x-axis is the age of a document. It's best to explain what I further need from the function with an image:
Decaying behavior is often modeled well by an exponential function (many decaying processes in nature also follow one). You would use two positive parameters A and B and get
y(x) = A exp(-B x)
Since you want a y-range of [0, 1], set A = 1. Larger B gives a faster decay.
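A minimal sketch, assuming the age is measured in days and the decay constant b is something you tune:
import math

def decay_weight(age_days, b=0.05):
    # Exponential dampening factor in (0, 1]; larger b means faster decay.
    return math.exp(-b * age_days)

# dampened_score = relevance_score * decay_weight(age_days)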
If a simple 1/(1+x) decreases too quickly too soon, a sigmoid function like 1/(1+e^-x) or the error function might be better suited to your purpose. Let the current date be somewhere in the negative numbers for such a function, and you can get a value that is current for some configurable time and then decreases towards a base value.
log((x+1)-age_of_document)
Where the base of the logarithm is (x+1). Note the x is as per your diagram and is the "threshold". If the age of the document is greater than x the score goes negative. Multiply by the maximum possible score to introduce scaling.
E.g. Domain = (0,10) with a maximum score of 10: 10*(log(11-x))/log(11)
A bit late, but as thiton says, you might want to use a sigmoid function instead, since it has a "floor" value for your long tail data points. E.g.:
0.8/(1+5^(x-3)) + 0.2. You can adjust the constants 5 and 3 to control the steepness and the position of the drop-off; the 0.2 is where the floor will be.
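A minimal sketch of that curve, assuming x is the document's age and treating the constants as tuning knobs:
def sigmoid_dampen(age, floor=0.2, midpoint=3.0, steepness=5.0):
    # Decreasing sigmoid: close to 1.0 for recent documents,
    # levelling off at `floor` for old ones.
    return (1.0 - floor) / (1.0 + steepness ** (age - midpoint)) + floor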

Determining edge weights given a list of walks in a graph

These questions regard a set of data with lists of tasks performed in succession and the total time required to complete them. I've been wondering whether it would be possible to determine useful things about the tasks' lengths, either as they are or with some initial guesstimation based on appropriate domain knowledge. I've come to think graph theory would be the way to approach this problem in the abstract, and have a decent basic grasp of the stuff, but I'm unable to know for certain whether I'm on the right track. Furthermore, I think it's a pretty interesting question to crack. So here we go:
Is it possible to determine the weights of edges in a directed weighted graph, given a list of walks in that graph with the lengths (summed weights) of said walks? I recognize the amount and quality of permutations on the routes taken by the walks will dictate the quality of any possible answer, but let's assume all possible walks and their lengths are given. If a definite answer isn't possible, what kind of things can be concluded about the graph? How would you arrive at those conclusions?
What if there were several similar walks with possibly differing lengths given? Can you calculate a decent average (or other illustrative measure) for each edge, given enough permutations on different routes to take? How will discounting some permutations from the available data set affect the calculation's accuracy?
Finally, what if you had a set of initial guesses as to the weights and had to refine those using the walks given? Would that improve upon your guesstimation ability, and how could you apply the extra information?
EDIT: Clarification on the difficulties of a plain linear algebraic approach. Consider the following set of walks:
a = 5
b = 4
b + c = 5
a + b + c = 8
A matrix equation with these values is unsolvable, but we'd still like to estimate the terms. There might be some helpful initial data available, such as in scenario 3, and in any case we can apply knowledge of the real world - such as that the length of a task can't be negative. I'd like to know if you have ideas on how to ensure we get reasonable estimations and that we also know what we don't know - e.g. when there's not enough data to tell a from b.
Seems like an application of linear algebra.
You have a set of linear equations which you need to solve, the variables being the lengths of the tasks (or edge weights).
For instance, suppose the task lengths were t1, t2, t3 for three tasks, and you were given:
t1 + t2 = 2 (task 1 and 2 take 2 hours)
t1 + t2 + t3 = 7 (all 3 tasks take 7 hours)
t2 + t3 = 6 (tasks 2 and 3 take 6 hours)
Solving gives t1 = 1, t2 = 1, t3 = 5.
You can use standard linear algebra techniques (e.g. http://en.wikipedia.org/wiki/Gaussian_elimination) to solve these, which will tell you whether there is a unique solution, no solution, or infinitely many solutions (no other outcome is possible).
If you find that the linear equations do not have a solution, you can try adding a very small random number to some of the task weights/coefficients of the matrix and solving again (I believe this falls under perturbation theory). Matrices are notorious for radically changing behavior with small changes in their values, so this will likely give you an approximate answer reasonably quickly.
Or you can try introducing a 'slack' task in each walk (i.e. add more variables) and pick the solution to the new equations where the slack tasks satisfy some linear constraints (like 0 < s_i < 0.0001, minimizing the sum of the s_i), using linear programming techniques.
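As an illustrative sketch (not a method given in this answer), an ordinary least-squares fit handles both over- and under-determined systems such as the one in the question's edit; non-negativity would need a constrained solver on top:
import numpy as np

# Walks from the question's edit: a = 5, b = 4, b + c = 5, a + b + c = 8.
# Rows are walks, columns are the edges (a, b, c).
A = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)
lengths = np.array([5, 4, 5, 8], dtype=float)

# Least-squares estimate of the edge weights; the residual shows how
# inconsistent the walks are with any single assignment.
weights, residual, rank, _ = np.linalg.lstsq(A, lengths, rcond=None)
print(weights)  # approximate estimates for a, b, c
print(rank)     # rank < 3 would mean some edges cannot be told apart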
Assume you have an unlimited number of arbitrary symbols to represent each edge (a, b, c, d, etc.).
Let w be a list of all the walks, each in the form 0,a,b,c,d,e and so on (the leading 0 will be explained later).
i = 1
while walk w[i] has more than one unknown edge:
    express its first unknown edge as (length of w[i]) minus the other edges in w[i]
    substitute that expression into every other walk
    i = i + 1
Example:
0,a,b,c,d,e 50
0,a,c,b,e 20
0,c,e 10
So:
a comes first. Replace every instance of "a" with 50 - b - c - d - e.
New data:
50, 50
50,-d, 20
0,c,e 10
Then repeat until only one unknown value is left, and you're finished! Alternatively, the first number can simply be subtracted from the length of each walk.
I'd forget about graphs and treat lists of tasks as vectors: every task is represented as a component with value equal to its cost (time to complete, in this case).
If tasks are in different orders initially, that's where to use domain knowledge to bring them to a canonical form, and to assign multipliers if domain knowledge tells you that the ratio of costs will be substantially influenced by ordering / timing. Timing is implicit in the initial ordering, but you may have to make a function of time just for adjustment factors (say, driving at lunch time vs. driving at midnight). The function might be tabular/discrete. In general it's always much easier to evaluate ratios and relative biases (the hardness of doing something). You may need a functional language to do repeated rewrites of your vectors until there's nothing more that domain knowledge and rules can change.
With canonical vectors, consider just the presence or absence of a task (just 0|1 for this iteration) and look for minimal diffs - single-task diffs first - which will provide estimates with a small number of variables. Keep doing this recursively, be ready to backtrack, and have a heuristic rule for the goodness or quality of the estimates so far. Keep track of good "rounds" that you backtracked from.
When you reach a minimal irreducible state - you can't make any more diffs and all vectors have the same remaining tasks - you can do some basic statistics like variance, mean, and median, and look for big outliers and for ways to improve the initial domain-knowledge-based estimates that led to the canonical form. If you find a lot of outliers and can infer new rules, take them in and start the whole process over from the beginning.
Yes, this can cost a lot :-)

mysql/stats: Weighting an average to accentuate differences from the mean

This is for a new feature on http://cssfingerprint.com (see /about for general info).
The feature looks up the sites you've visited in a database of site demographics, and tries to guess what your demographic stats are based on that.
All my demographics are in 0..1 probability format, not ratios or absolute numbers or the like.
Essentially, you have a large number of data points, each of which tends to pull you towards its own demographics. However, just taking the average is poor, because adding in a lot of generic data drags the number down.
For example, suppose you've visited sites S0..S50. All except S0 are 48% female; S0 is 100% male. If I'm guessing your gender, I want to have a value close to 100%, not just the 49% that a straight average would give.
Also, consider that most demographics (i.e. everything other than gender) do not have their average at 50%. For example, the average probability of having kids 0-17 is ~37%. The more a given site's demographics differ from this average (e.g. maybe it's a site for parents, or for child-free people), the more it should count in my guess of your status.
What's the best way to calculate this?
For extra credit: what's the best way to calculate this, that is also cheap & easy to do in mysql?
ETA: I think that something approximating what I want is Φ(AVG(z-score ^ 2, sign preserved)). But I'm not sure if this is a good weighting function.
(Φ is the standard normal distribution function - http://en.wikipedia.org/wiki/Standard_normal_distribution#Definition)
A good framework for these kinds of calculations is Bayesian inference. You have a prior distribution of the demographics - e.g. 50% male, 37% childless, etc. Preferably, you would have it multivariate: 10% male childless 0-17 Caucasian ..., but you can start one-at-a-time.
After establishing this prior, each site contributes new information about the likelihood of a demographic category, and you get a posterior estimate which informs your final guess. Under some independence assumptions, the updating formula is as follows:
posterior odds = (prior odds) * (site likelihood ratio),
where odds = p/(1-p) and the likelihood ratio is a multiplier modifying the odds after visiting the site. There are various formulas for it, but in this case I would just use the above formula for the general population and the site's population to calculate it.
For example, for a site that has 35% of its visitors in the "under 20" age group, which represents 20% of the population, the site likelihood ratio would be
LR = (0.35/0.65) / (0.2/0.8) = 2.154
so visiting this site would raise the odds of being "under 20" 2.154-fold.
A site that is 100% male would have an infinite LR, but you would probably want to limit it somewhat by, say, using only 99.9% male. A site that is 50% male would have an LR of 1, so it would not contribute any information on gender distribution.
Suppose you start knowing nothing about a person - his or her odds of being "under 20" are 0.2/0.8 = 0.25. Suppose the first site has an LR=2.154 for this outcome - now the odds of being "under 20" becomes 0.25*(2.154) = 0.538 (corresponding to the probability of 35%). If the second site has the same LR, the posterior odds become 1.16, which is already 54%, etc. (probability = odds/(1+odds)). At the end you would pick the category with the highest posterior probability.
There are loads of caveats with these calculations - for example, the assumption of independence likely being wrong, but it can provide a good start.
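A minimal sketch of that odds-updating loop (the 0.001/0.999 cap is an assumption along the lines of the 99.9% suggestion above):
def posterior_probability(prior_p, site_shares, population_share):
    # Update the odds of belonging to a category after each visited site.
    # site_shares: fraction of each site's visitors in the category;
    # population_share: fraction of the general population in the category.
    odds = prior_p / (1.0 - prior_p)
    pop_odds = population_share / (1.0 - population_share)
    for share in site_shares:
        share = min(max(share, 0.001), 0.999)  # cap to avoid infinite likelihood ratios
        odds *= (share / (1.0 - share)) / pop_odds
    return odds / (1.0 + odds)

# Two sites that are 35% "under 20" against a 20% population base rate:
print(posterior_probability(0.2, [0.35, 0.35], 0.2))  # ~0.54, matching the example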
The naive Bayesian formula for your case looks like this:
SELECT  probability
FROM    (
        SELECT  @apriori := CAST(@apriori * ratio / (@apriori * ratio + (1 - @apriori) * (1 - ratio)) AS DECIMAL(30, 30)) AS probability,
                @step := @step + 1 AS step
        FROM    (
                SELECT  @apriori := 0.5,
                        @step := 0
                ) vars,
                (
                SELECT  0.99 AS ratio
                UNION ALL
                SELECT  0.48
                UNION ALL
                SELECT  0.48
                UNION ALL
                SELECT  0.48
                UNION ALL
                SELECT  0.48
                UNION ALL
                SELECT  0.48
                UNION ALL
                SELECT  0.48
                UNION ALL
                SELECT  0.48
                ) q
        ) q2
ORDER BY
        step DESC
LIMIT   1
Quick 'n' dirty: get a male score by multiplying the male probabilities, and a female score by multiplying the female probabilities. Predict the larger. (Actually, don't multiply; sum the log of each probability instead.) I think this is a maximum likelihood estimator if you make the right (highly unrealistic) assumptions.
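A rough sketch of that quick-and-dirty scoring, assuming each site reports the share of male visitors as a probability:
import math

def predict_gender(male_shares, eps=1e-6):
    # Sum log-probabilities for each class and predict the larger score.
    # Shares are clamped away from 0 and 1 to avoid log(0).
    male_score = sum(math.log(min(max(p, eps), 1.0 - eps)) for p in male_shares)
    female_score = sum(math.log(min(max(1.0 - p, eps), 1.0 - eps)) for p in male_shares)
    return "male" if male_score > female_score else "female"

print(predict_gender([1.0, 0.48, 0.48]))  # prints "male"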
The standard formula for calculating the weighted mean is given in this question and this question
I think you could look into these approaches and then work out how you calculate your weights.
In your gender example above you could adopt something along the lines of a set of weights {1, ..., 0, ..., 1}: a linear decrease from 1 to 0 as the gender value goes from 0% male to 50%, and then a corresponding increase back up to 1 at 100%. If you want the effect to be skewed in favour of the outlying values, then you can easily come up with an exponential or trigonometric function that provides a different set of weights. If you wanted, a normal distribution curve would also do the trick.
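A minimal sketch of such a weighted mean, using distance from 50% as the weight (one possible choice, not the formulas from the linked questions):
def weighted_gender_estimate(male_shares):
    # Weighted mean of per-site male shares, weighting each site by how
    # far it deviates from the uninformative 50% mark.
    weights = [abs(p - 0.5) * 2.0 for p in male_shares]
    total = sum(weights)
    if total == 0:
        return 0.5  # every site was exactly 50/50: no information
    return sum(w * p for w, p in zip(weights, male_shares)) / total

print(weighted_gender_estimate([1.0] + [0.48] * 50))  # pulled well above the ~0.49 straight average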

Placement of defensive structures in a game

I am working on an AI bot for the game Defcon. The game has cities, with varying populations, and defensive structures with limited range. I'm trying to work out a good algorithm for placing defence towers.
Cities with higher populations are more important to defend
Losing a defence tower is a blow, so towers should be placed reasonably close together
Towers and cities can only be placed on land
So, with these three rules, we see that the best kind of placement is towers being placed in a ring around the largest population areas (although I don't want an algorithm to just blindly place a ring around the highest area of population; sometimes there might be 2 sets of cities far apart, in which case the algorithm should make 2 circles, each one using half my total towers).
I'm wondering what kind of algorithms might be used for determining placement of towers?
I would define a function that determines the value of a tower placed at a given position, then search for maxima of that function and place a tower there.
A sketch for the function could look like this:
if water return 0
popsum = sum over all cities of (population/distance) // it's better to have towers close by
towersum = - sum over all existing towers of (1/distance) // you want your towers spread somewhat evenly
return popsum + towersum*f // f adjusts the relative importance of spreading towers evenly versus protecting the population centers with many towers
This should give a reasonable algorithm to start with. For improvement, you might change the 1/distance function to something different, to get a faster or slower drop-off.
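A minimal Python sketch of that scoring function, assuming plain (x, y) tuples for positions, (position, population) pairs for cities, and a caller-supplied is_water() test:
import math

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def position_value(pos, cities, towers, f=1.0, is_water=lambda p: False):
    # Score a candidate tower position: pulled towards population,
    # pushed apart from existing towers. Distances are clamped to >= 1
    # to avoid dividing by zero right on top of a city or tower.
    if is_water(pos):
        return 0.0
    popsum = sum(pop / max(distance(pos, c), 1.0) for c, pop in cities)
    towersum = -sum(1.0 / max(distance(pos, t), 1.0) for t in towers)
    return popsum + towersum * f

# cities = [((10, 20), 50000), ((40, 5), 12000)]; evaluate candidate
# positions and build on the one with the highest value.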
I'd start with implementing a fitness function that calculates the expected protection provided by a set of towers on a given map.
You'd calculate the amount of population inside the "protected" area, where area covered by two towers is rated a bit higher than area covered by only one tower (the exact scaling factor depends a lot on the game mechanics, though).
Then you could use a genetic algorithm to experiment with different sets of placements and let that run for several (hundred?) iterations.
If your fitness function is a good fit to the real quality of the placement and your implementation of the genetic algorithm is correct, then you should get a reasonable result.
And once you've done all that you can start developing an attack plan that tries to optimize the casualties for any given set of defense tower placements. Once you have that you can set the two populations against each other and reach even better defense plans this way (that is one of the basic ideas of artificial life).
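A rough sketch of that fitness-plus-evolution idea; the helpers here (random_land_position, the per-city coverage test, the double-coverage bonus) are stand-ins for the real game data, and land/water checks and crossover are omitted:
import random

def fitness(towers, cities, tower_range, double_bonus=1.25):
    # Population-weighted protection: cities covered by two or more
    # towers score a bit higher than cities covered by only one.
    total = 0.0
    for city_pos, population in cities:
        covering = 0
        for tx, ty in towers:
            if (tx - city_pos[0]) ** 2 + (ty - city_pos[1]) ** 2 <= tower_range ** 2:
                covering += 1
        if covering >= 2:
            total += population * double_bonus
        elif covering == 1:
            total += population
    return total

def evolve(cities, n_towers, tower_range, random_land_position,
           pop_size=30, generations=200, mutation=5.0):
    # Very small (mu + lambda)-style loop: keep the better half of the
    # placements and create mutated children from them.
    population = [[random_land_position() for _ in range(n_towers)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda p: fitness(p, cities, tower_range),
                        reverse=True)
        parents = ranked[:pop_size // 2]
        children = [[(x + random.gauss(0, mutation), y + random.gauss(0, mutation))
                     for x, y in parent]
                    for parent in parents]
        population = parents + children
    return max(population, key=lambda p: fitness(p, cities, tower_range))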
I don't know the game, but from your description it seems that you need an algorithm similar to the one for solving the (weighted) k-centers problem. Unfortunately, this is an NP-hard problem, so in the best case you'll get an approximation whose cost is upper-bounded by some factor of the optimum.
Take a look here: http://algo2.iti.kit.edu/vanstee/courses/kcenter.pdf
Just define a utility function that takes a potential build position as input and returns a "rating" for that position. I imagine it would look something like:
utility(position p) = k1 * population_of_city_at_p +
                      k2 * new_area_covered_if_placed_at_p +
                      k3 * number_of_nearby_defences
(k1, k2, and k3 are arbitrary constants that you'll need to tune)
Then, just randomly sample a bunch of different points p and choose the one with the highest utility.
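A small sketch of that sampling step, with utility() and random_land_position() as placeholders for the pieces described above:
def best_build_position(utility, random_land_position, samples=1000):
    # Sample candidate positions and keep the one with the highest utility.
    candidates = [random_land_position() for _ in range(samples)]
    return max(candidates, key=utility)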

What is the proper method of constraining a pseudo-random number to a smaller range?

What is the best way to constrain the values of a PRNG to a smaller range? If you use modulus and the old range is not evenly divisible by the new range, you bias toward 0 through (old_max mod new_max) - 1. I assume the best way would be something like this (this is floating point, not integer math):
random_num = PRNG() / max_original_range * max_smaller_range
But something in my gut makes me question that method (maybe floating point implementation and representation differences?).
The random number generator will produce consistent results across hardware and software platforms, and the constraint needs to as well.
I was right to doubt the pseudocode above (but not for the reasons I was thinking). MichaelGG's answer got me thinking about the problem in a different way. I can model it using smaller numbers and test every outcome. So, let's assume we have a PRNG that produces a random number between 0 and 31 and you want the smaller range to be 0 to 9. If you use modulus you bias toward 0, 1, 2, and 3. If you use the pseudocode above you bias toward 0, 2, 5, and 7. I don't think there can be a good way to map one set into the other. The best that I have come up with so far is to regenerate the random numbers that are greater than old_max/new_max, but that has deep problems as well (reducing the period, time to generate new numbers until one is in the right range, etc.).
I think I may have naively approached this problem. It may be time to start some serious research into the literature (someone has to have tackled this before).
I know this might not be a particularly helpful answer, but I think the best way would be to conceive of a few different methods, then try them out a few million times and check the result sets.
When in doubt, try it yourself.
EDIT
It should be noted that many languages (like C#) have built-in limiting in their random functions:
int maximumvalue = 20;
Random rand = new Random();
rand.Next(maximumvalue);
And whenever possible, you should use those rather than any code you would write yourself. Don't Reinvent The Wheel.
This problem is akin to rolling a k-sided die given only a p-sided die, without wasting randomness.
In this sense, by Lemma 3 in "Simulating a dice with a dice" by B. Kloeckner, this waste is inevitable unless "every prime number dividing k also divides p". Thus, for example, if p is a power of 2 (and any block of random bits is the same as rolling a die with a power of 2 number of faces) and k has prime factors other than 2, the best you can do is get arbitrarily close to no waste of randomness, such as by batching multiple rolls of the p-sided die until p^n is "close enough" to a power of k.
Let me also go over some of your concerns about regenerating random numbers:
"Reducing the period": Besides batching of bits, this concern can be dealt with in several ways:
Use a PRNG with a bigger "period" (maximum cycle length).
Add a Bays–Durham shuffle to the PRNG's implementation.
Use a "true" random number generator; this is not trivial.
Employ randomness extraction, which is discussed in Devroye and Gravel 2015-2020 and in my Note on Randomness Extraction. However, randomness extraction is pretty involved.
Ignore the problem, especially if it isn't a security application or serious simulation.
"Time to generate new numbers until one is in the right range": If you want unbiased random numbers, then any algorithm that does so will generally have to run forever in the worst case. Again, by Lemma 3, the algorithm will run forever in the worst case unless "every prime number dividing k also divides p", which is not the case if, say, k is 10 and p is 32.
See also the question: How to generate a random integer in the range [0,n] from a stream of random bits without wasting bits?, especially my answer there.
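For reference, a minimal sketch of the "regenerate until in range" (rejection) approach discussed in the question, using Python's getrandbits as the bit source:
import random

def unbiased_below(n):
    # Uniform integer in [0, n), by drawing just enough bits and
    # rejecting out-of-range values. Runs forever in the worst case,
    # as noted above, but needs fewer than two draws on average.
    bits = n.bit_length()
    while True:
        r = random.getrandbits(bits)
        if r < n:
            return r

counts = [0] * 10
for _ in range(100000):
    counts[unbiased_below(10)] += 1
print(counts)  # roughly uniform; no values are favoured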
If PRNG() is generating uniformly distributed random numbers then the above looks good. In fact (if you want to scale the mean etc.) the above should be fine for all purposes. I guess you need to ask what the error associated with the original PRNG() is, and whether further manipulation will add to that substantially.
If in doubt, generate an appropriately sized sample set and look at the results in Excel or similar (to check that your mean / std. dev. etc. are what you'd expect).
If you have access to a PRNG function (say, random()) that'll generate numbers in the range 0 <= x < 1, can you not just do:
random_num = (int) (random() * max_range);
to give you numbers in the range 0 to max_range?
Here's how the CLR's Random class works when limited (as per Reflector):
long num = maxValue - minValue;
if (num <= 0x7fffffffL) {
    return (((int) (this.Sample() * num)) + minValue);
}
return (((int) ((long) (this.GetSampleForLargeRange() * num))) + minValue);
Even if you're given a positive int, it's not hard to get it to a double. Just multiply the random int by (1/maxint). Going from a 32-bit int to a double should provide adequate precision. (I haven't actually tested a PRNG like this, so I might be missing something with floats.)
Pseudo-random number generators essentially produce a random series of 1s and 0s which, when appended to each other, form an infinitely large number in base two. Each time you consume a bit from your PRNG, you are dividing that number by two and keeping the modulus. You can do this forever without wasting a single bit.
If you need a number in the range [0, N), then you need the same thing, but in base N instead of base two. It's basically trivial to convert the bases: consume the number of bits you need, and return the remainder of those bits back to your PRNG to be used the next time a number is needed.