Will this random double generator work? - language-agnostic

Intuitively one might write a random double generator as follows:
double randDouble(double lowerBound, double upperBound)
{
    double range = upperBound - lowerBound;
    return lowerBound + range * rand();
}
Suppose we assume that rand() returns an evenly distributed pseudorandom double on the interval [0, 1).
Is this method guaranteed to return a random double within [lowerBound, upperBound) with a uniform probability distribution? I'm specifically interested in whether the nature of floating point calculations might cause spikes or dips in the final distribution for some ranges.

First, rand() generates pseudo-random numbers, not truly random ones. I will therefore assume you are asking whether your function generates pseudo-random numbers within the specified range.
Second, as Oli Charlesworth said, many rand() implementations return an integer between 0 and RAND_MAX, where RAND_MAX is the largest value it can return. In these cases, you can obtain a value in [0, 1) with
double r = rand()/((double)RAND_MAX+1);
The +1 ensures that r can never equal 1.
Other languages have a rand that returns a value between 0 and 1, in which case you don't need the division above. Either way, your function returns a decent approximation of a uniform distribution. See the following link for more details: http://www.thinkage.ca/english/gcos/expl/c/lib/rand.html Note that this page gives slightly different functions that it claims work a bit better, but the one you have is probably good enough.
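A minimal Python sketch of the divide-by-(RAND_MAX + 1) approach, for illustration only. Python has no C-style rand(), so `c_style_rand` is a hypothetical stand-in built on `random.Random.randint`, and `RAND_MAX = 32767` is an assumption (it is the smallest value the C standard permits; glibc, for instance, uses 2147483647).

```python
import random

RAND_MAX = 32767  # assumption: the smallest RAND_MAX the C standard permits


def c_style_rand(rng: random.Random) -> int:
    """Hypothetical stand-in for C's rand(): an integer in [0, RAND_MAX]."""
    return rng.randint(0, RAND_MAX)


def rand_double(lower: float, upper: float, rng: random.Random) -> float:
    """Scale a rand()-style integer into [lower, upper)."""
    # Dividing by RAND_MAX + 1 rather than RAND_MAX keeps r strictly below 1.
    r = c_style_rand(rng) / (RAND_MAX + 1.0)
    return lower + (upper - lower) * r


rng = random.Random(42)
samples = [rand_double(10.0, 20.0, rng) for _ in range(1000)]
print(min(samples), max(samples))
```

With RAND_MAX = 32767 the largest possible result is 10 + 10 * 32767/32768, comfortably below the upper bound of 20.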

If your upper and lower bounds are adjacent powers of two, then your resulting distribution will be as good as the one you get from rand(), since you're effectively just altering the exponent of what rand() gives you, without altering the mantissa.
If you want to stretch the range to cover more than one power of two, then there will be valid floating point numbers in the lower half of your range that will never be generated by your method. (You're effectively shifting one or more bits of the mantissa into the exponent, leaving the least significant bit(s) of the mantissa as non-random.)
If you use the method on a more general range (such that the mantissa is modified by the calculation), then you also run into the same non-uniformity you get when trying to convert a random integer to a random integer modulo n without using rejection sampling.
Any correct method for generating a uniform distribution of floating-point numbers has to take into account that the interval of real numbers that round to any given floating-point number is not always the same width. In the lower part of a range, floating-point numbers are denser, so each individual floating-point number in that part of the range should be selected less often than the larger numbers.
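To put numbers on the "never generated" claim, the Python sketch below assumes the unit generator returns multiples of 2**-53 (CPython's `random.random()` is built that way; other generators may differ) and counts what fraction of the doubles in one binade can be hit after scaling by a power of two. `reachable_fraction` is a hypothetical helper written for this illustration; `math.ulp` needs Python 3.9 or later.

```python
import math
import random

# Assumption: the unit generator returns k * 2**-53 for an integer k,
# as CPython's random.random() does.
STEP = 2.0 ** -53


def reachable_fraction(lo: float, hi: float, scale: float) -> float:
    """Fraction of the doubles in [lo, hi) that scale * random() can produce.

    Assumes [lo, hi) lies inside one binade, so every double in it is
    math.ulp(lo) apart, and that scale is a power of two, so the multiply
    is exact and only shifts the exponent.
    """
    doubles_in_interval = (hi - lo) / math.ulp(lo)
    generator_hits = (hi - lo) / scale / STEP
    return min(1.0, generator_hits / doubles_in_interval)


# Stretching [0, 1) to [0, 4): full coverage at the top, gaps further down.
for lo, hi in [(2.0, 4.0), (1.0, 2.0), (0.5, 1.0)]:
    print(lo, hi, reachable_fraction(lo, hi, 4.0))

# Empirically, every scaled sample sits on a grid of spacing 4 * 2**-53 = 2**-51.
rng = random.Random(0)
on_grid = all((4.0 * rng.random() / (4.0 * STEP)).is_integer() for _ in range(10_000))
print(on_grid)
```

Only the top binade [2, 4) is fully covered; each binade below it can reach half as large a share of its doubles as the one above.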

Well, no. rand() returns a number between 0 and RAND_MAX; this quantisation will leave big holes in your distribution; in fact, almost all floating-point values in-between lowerBound and upperBound will never be selected.
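The size of those holes is easy to count. A short Python sketch, again assuming RAND_MAX = 32767 (an assumption; a larger value such as 2^31 - 1 raises the share to 2^-21, which still leaves almost every double unreachable):

```python
RAND_MAX = 32767  # assumption: the smallest RAND_MAX the C standard permits

# Every value the rand() / (RAND_MAX + 1) recipe can produce in [1, 2).
outputs = {1.0 + r / (RAND_MAX + 1.0) for r in range(RAND_MAX + 1)}

# Doubles in [1, 2) share one exponent, so there is one per 52-bit significand.
doubles_in_range = 2 ** 52

print(len(outputs))                      # 32768
print(len(outputs) / doubles_in_range)   # 2**-37: the share of doubles that can ever appear
```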

Related

"Safe" Arithmetic of float numbers containing whole number on CUDA

I am currently doing some research on using float numbers instead of integers for arithmetic operations on CUDA. The need arises because integer arithmetic is very slow compared to floating-point arithmetic, so there is a possible performance gain from using floats instead of integers.
I ran a small experiment: a simple program that just loops and adds 1.0f to a variable. It turns out this works up to 16777216.0f; adding a further 1.0f leaves the number unchanged. So I was wondering whether this is the maximum number up to which the operators +, - and * applied solely to whole numbers give accurate whole-number results, say with +/-0.0001 accuracy?
Regards
Daniel
Jonathan Dursi pointed to some important links to explain floating point in his comment.
If you look, you'll note that 16777216 is 2^24. Floating point (single precision) has 23 bits plus the implicit '1' (since values are normalised). With 24 bits you would be able to represent any integer from 1.0 * 2^0 to 1.11..11b * 2^23 (actually you get the negative numbers too since the sign bit is separate, and zero with special coding). You get the extra value (2^24) since that can be represented as 1.0 * 2^24.
As soon as you try to add 1 to 2^24 you will observe that you are subject to the rounding error described in the links Jonathan posted.
So for integers, you will need to restrict your range to [-2^24,2^24]. If you cannot do that, you either need some careful checking or else restrict yourself to integers!
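The 2^24 boundary can be reproduced off the GPU. This Python sketch emulates single precision by round-tripping through `struct`'s 4-byte `'f'` format; it is an emulation on the host, not CUDA code, and it assumes round-to-nearest-even (the IEEE default, which is also what CUDA uses for an ordinary float addition). `to_f32`, `f32_add` and `exact_in_f32` are hypothetical helpers.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def f32_add(a: float, b: float) -> float:
    """Emulate a float32 addition.

    The sum of two float32 values is rounded to double and then to float32;
    because 53 >= 2 * 24 + 2, this double rounding gives the same result as
    a single correctly rounded float32 add.
    """
    return to_f32(to_f32(a) + to_f32(b))


def exact_in_f32(n: int) -> bool:
    """True if the integer n survives a round trip through float32 unchanged."""
    return to_f32(float(n)) == n


print(f32_add(16777215.0, 1.0))   # 16777216.0
print(f32_add(16777216.0, 1.0))   # 16777216.0: the tie rounds to even, so +1 is lost
print(f32_add(16777216.0, 2.0))   # 16777218.0: even numbers are still reachable
print(exact_in_f32(2 ** 24), exact_in_f32(2 ** 24 + 1))   # True False
```

For * the limit applies to the product as well as to the operands: 4097 * 4097 = 16785409 is above 2^24 and odd, so a float32 multiply cannot return it exactly.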

as3 Number type - Logic issues with large numbers

I'm curious about an issue spotted in our team with a very large number:
var n:Number = 64336512942563914;
trace(n < Number.MAX_VALUE); // true
trace(n); // 64336512942563910
var a1:Number = n +4;
var a2:Number = a1 - n;
trace(a2); // 8 Expect to see 4
trace(n + 4 - n); // 8
var a3:Number = parseInt("64336512942563914");
trace(a3); // 64336512942563920
n++;
trace(n); //64336512942563910
trace(64336512942563914 == 64336512942563910); // true
What's going on here?
Although n is large, it's smaller than Number.MAX_VALUE, so why am I seeing such odd behaviour?
I thought that perhaps it was an issue with formatting large numbers when being trace'd out, but that doesn't explain n + 4 - n == 8
Is this some weird floating point number issue?
Yes, it is a floating point issue, but it is not a weird one. It is all expected behavior.
Number data type in AS3 is actually a "64-bit double-precision format as specified by the IEEE Standard for Binary Floating-Point Arithmetic (IEEE-754)" (source). Because the number you assigned to n has more significant digits than fit into those 64 bits (a double carries 53 bits of significand), it gets rounded off, and that's the reason for all the "weird" results.
If you need to do some exact big integer arithmetic, you will have to use a custom big integer class, e.g. this one.
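Python's `float` uses the same IEEE-754 double format as AS3's `Number`, so the question's results can be reproduced there. A sketch (`math.ulp` needs Python 3.9 or later); note that AS3's trace prints fewer digits, which is why it shows 64336512942563910 for the stored value 64336512942563912:

```python
import math

# Python floats are IEEE-754 doubles, the same format as AS3's Number.
n = float(64336512942563914)

print(int(n))        # 64336512942563912: the nearest double to the literal
print(math.ulp(n))   # 8.0: neighbouring doubles near n are 8 apart

# n + 4 lands exactly halfway between n and n + 8; ties round to the
# neighbour with an even significand, which here is n + 8.
print((n + 4) - n)   # 8.0
print(n + 1 == n)    # True: n + 1 rounds back to n, as with n++ in AS3
print(float(64336512942563914) == float(64336512942563910))   # True
```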
Numbers in Flash are double-precision floating-point numbers, meaning they store a number of significant digits and an exponent. Because of memory constraints, the format favors a larger range of expressible numbers over precision. Once a number has more significant digits than fit, rounding occurs, which is what you are seeing: the least significant digits are being rounded. Google "double precision floating point" and you'll find plenty of technical information on why.
It is the nature of the datatype. If you need precise numbers you should really stick to uint or int based integers. Other languages have fixed point or bigint number processing libraries (sometimes called BigInt or Decimal) which are wrappers around ints and longs to express much larger numbers at the cost of memory consumption.
We have an as3 implementation of BigDecimal copied from Java that we use for ALL calculations. In a trading app the floating point errors were not acceptable.
To be safe with integer math when using Numbers, I checked the AS3 documentation for Number and it states:
The Number class can be used to represent integer values well beyond
the valid range of the int and uint data types. The Number data type
can use up to 53 bits to represent integer values, compared to the 32
bits available to int and uint.
A 53-bit integer gets you to 2^53 - 1, which, if I'm not mistaken, is 9007199254740991, or about 9 quadrillion. The 64-bit Number stores 52 significand bits (plus an implicit leading 1, which gives the 53 bits of precision), 11 exponent bits and a sign bit. The number used in the question is about 64.3 quadrillion. Going past 2^53 requires more bits for the significand (the mantissa) than are allotted, so rounding occurs. A helpful video explaining why this makes sense is by PBS Digital Studios' Infinite Series.
So yeah, one must search for outside resources such as the BigInt. Hopefully, the resources I linked to are useful to someone.
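The 2^53 boundary and the big-integer alternative, sketched in Python, where `int` is arbitrary precision and so plays the role of a BigInt class (it is not an AS3 type):

```python
LIMIT = 2 ** 53   # 9007199254740992

# Every integer up to 2**53 converts to a double exactly...
print(float(LIMIT - 1) == LIMIT - 1)   # True
print(float(LIMIT) == LIMIT)           # True
# ...but 2**53 + 1 is the first one that does not: it rounds to 2**53.
print(float(LIMIT + 1) == LIMIT + 1)   # False

# Python's int never rounds, so the question's arithmetic comes out exact.
big = 64336512942563914
print(big + 4 - big)                   # 4
```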
This is, indeed, a floating point number approximation issue.
I guess n is too large compared to 4, so it has to stick with children of its age:
trace(n - n + 4) is ok since it does n-n = 0; 0 + 4 = 4;
Actually, Number is meant for floating-point numbers, not large integers. If you want to compute with large integers, you have to stay within the limit of uint.MAX_VALUE.
Cheers!

CUDA, float precision

I am using CUDA 4.0 on a GeForce GTX 580 (Fermi). I have numbers as small as 7.721155e-43, and I want to multiply two of them, i.e. calculate 7.721155e-43 * 7.721155e-43.
Experience has shown me that I can't do it straightforwardly. Could you please give me some suggestions? Do I need to use double precision? How?
The magnitude of the smallest normal IEEE single-precision number is about 1.18e-38; the smallest denormal gets you down to about 1.40e-45. As a consequence, an operand of magnitude 7.72e-43 will have only about 10 significant bits, which in itself may already be a problem even before you get to the multiplication (whose result will underflow to zero in single precision). So you may also want to look at any upstream computation that produces these tiny numbers.
If these small numbers are intermediate terms in a mathematical expression, rewriting that expression into a mathematically equivalent one that does not involve tiny intermediates would be one way of addressing the issue. Or you could scale some operands by factors that are powers of two (so as to not incur additional round-off due to the scaling). For example, scale by 2^24 = 16777216.
Lastly, you can switch part of the computation to double precision. To do so, simply introduce temporary variables of type double, perform the computation on them, then convert the final result back to float:
float r, f = 7.721155e-43f;
double d, t;
d = (double)f; // explicit cast is not necessary, since converting to wider type
t = d * d;
/* ... more intermediate computation, leaving result in t ... */
r = (float)t; // since conversion is to narrower type, cast will avoid warnings; t must be back within float range here, or r will underflow or overflow
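The underflow, and why the double temporary avoids it, can be checked on the host with a float32 emulation. A Python sketch; `to_f32` is a hypothetical helper that rounds a double to float32 through `struct`, not a CUDA API:

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


f = to_f32(7.721155e-43)      # a float32 denormal
print(f / 2.0 ** -149)        # 551.0: f is 551 times the smallest float32 denormal

product_f32 = to_f32(f * f)   # what a single-precision multiply would return
product_f64 = f * f           # the same product kept in double precision

print(product_f32)            # 0.0: underflows in single precision
print(product_f64)            # about 5.96e-85, well inside double's range
```

Since 551 needs 10 binary digits, f carries only 10 significant bits, even before any multiplication.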
In statistics we often have to work with likelihoods that end up being very small numbers, and the standard technique is to use logs for everything: multiplication on a log scale is just addition, and all intermediate numbers are stored as logs. It can take a bit of getting used to, but the alternative will often fail even for relatively modest computations. Here is an example in R (for my convenience), which uses doubles and by default prints 7 significant figures:
> 7.721155e-43 * 7.721155e-43
[1] 5.961623e-85
> exp(log(7.721155e-43) + log(7.721155e-43))
[1] 5.961623e-85
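The same log-scale idea in Python, with a hypothetical batch of likelihoods chosen to show where the direct product gives up. `math.fsum` is used only to keep the sum of logs accurate.

```python
import math

# Hypothetical data: 1000 independent likelihoods of 1e-5 each.
likelihoods = [1e-5] * 1000

direct = 1.0
for p in likelihoods:
    direct *= p   # underflows to 0.0 after roughly 65 factors

log_total = math.fsum(math.log(p) for p in likelihoods)

print(direct)      # 0.0
print(log_total)   # about -11512.9: the product is exp(log_total), far below double's range

# The question's single product, done both ways as in the R session.
a = 7.721155e-43
print(a * a, math.exp(math.log(a) + math.log(a)))
```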

What is the proper method of constraining a pseudo-random number to a smaller range?

What is the best way to constrain the values of a PRNG to a smaller range? If you use modulus and the old max number is not evenly divisible by the new max number, you bias toward the low results 0 through (old_max mod new_max) - 1, counting old_max and new_max as the number of possible values. I assume the best way would be something like this (this is floating-point, not integer, math):
random_num = PRNG() / max_original_range * max_smaller_range
But something in my gut makes me question that method (maybe floating point implementation and representation differences?).
The random number generator will produce consistent results across hardware and software platforms, and the constraining step needs to do so as well.
I was right to doubt the pseudocode above (but not for the reasons I was thinking). MichaelGG's answer got me thinking about the problem in a different way: I can model it using smaller numbers and test every outcome. So, let's assume we have a PRNG that produces a random number between 0 and 31 and we want the smaller range to be 0 to 9. If you use modulus, you bias toward 0 and 1 (32 = 3*10 + 2, so those two outputs get four inputs each while the other eight get three). If you use the pseudocode above (dividing by 32 and truncating), you bias toward 0 and 5 in the same way. I don't think there can be a good way to map one set onto the other. The best I have come up with so far is to regenerate any random number at or above the largest multiple of the new range that fits in the old one (30 here), but that has deep problems as well (reducing the period, the time spent generating new numbers until one is in the right range, etc.).
I think I may have naively approached this problem. It may be time to start some serious research into the literature (someone has to have tackled this before).
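The toy model is small enough to enumerate exhaustively. A Python sketch that counts how many of the 32 inputs land on each output; the "scaled" mapping assumes the pseudocode divides by 32 and truncates, which is one reading of it.

```python
from collections import Counter

N_IN, N_OUT = 32, 10   # the toy model: PRNG values 0..31, target range 0..9

modulus = Counter(x % N_OUT for x in range(N_IN))
# floor(x / 32 * 10), written in exact integer arithmetic
scaled = Counter(x * N_OUT // N_IN for x in range(N_IN))

print(sorted(k for k, c in modulus.items() if c == 4))   # [0, 1]
print(sorted(k for k, c in scaled.items() if c == 4))    # [0, 5]

# Rejection: redraw anything at or above the largest multiple of N_OUT that fits.
limit = N_IN - N_IN % N_OUT                              # 30
kept = Counter(x % N_OUT for x in range(limit))
print(set(kept.values()))                                # {3}: every output equally likely
```

Here a redraw happens with probability 2/32, so the expected number of draws per result is 32/30, about 1.07.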
I know this might not be a particularly helpful answer, but I think the best way would be to come up with a few different methods, try each of them a few million times, and check the result sets.
When in doubt, try it yourself.
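In that spirit, a small Python simulation comparing two ways of mapping 0..31 onto 0..9. The 5-bit generator is a hypothetical stand-in built on `random.Random.getrandbits`, and the seed and trial count are arbitrary choices.

```python
import random
from collections import Counter


def tally(method, trials: int = 320_000, seed: int = 7) -> Counter:
    """Count the outputs of method(draw), where draw() returns 0..31."""
    rng = random.Random(seed)

    def draw() -> int:
        return rng.getrandbits(5)   # hypothetical 5-bit PRNG

    return Counter(method(draw) for _ in range(trials))


def by_modulus(draw) -> int:
    return draw() % 10


def by_rejection(draw) -> int:
    while True:
        x = draw()
        if x < 30:          # discard 30 and 31, then reduce
            return x % 10


for name, method in [("modulus", by_modulus), ("rejection", by_rejection)]:
    counts = tally(method)
    print(name, [counts[k] for k in range(10)])
```

With 320,000 trials, modulus should give roughly 40,000 hits for 0 and 1 against roughly 30,000 for the rest, while rejection should give roughly 32,000 for every output.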
EDIT
It should be noted that many languages (like C#) have built-in range limiting in their random functions:
int maximumvalue = 20;
Random rand = new Random();
rand.Next(maximumvalue); // returns an int in [0, maximumvalue), i.e. 0 to 19
And whenever possible, you should use those rather than any code you would write yourself. Don't Reinvent The Wheel.
This problem is akin to rolling a k-sided die given only a p-sided die, without wasting randomness.
In this sense, by Lemma 3 in "Simulating a dice with a dice" by B. Kloeckner, this waste is inevitable unless "every prime number dividing k also divides p". Thus, for example, if p is a power of 2 (and any block of random bits is the same as rolling a die with a power of 2 number of faces) and k has prime factors other than 2, the best you can do is get arbitrarily close to no waste of randomness, such as by batching multiple rolls of the p-sided die until p^n is "close enough" to a power of k.
Let me also go over some of your concerns about regenerating random numbers:
"Reducing the period": Besides batching of bits, this concern can be dealt with in several ways:
Use a PRNG with a bigger "period" (maximum cycle length).
Add a Bays–Durham shuffle to the PRNG's implementation.
Use a "true" random number generator; this is not trivial.
Employ randomness extraction, which is discussed in Devroye and Gravel 2015-2020 and in my Note on Randomness Extraction. However, randomness extraction is pretty involved.
Ignore the problem, especially if it isn't a security application or serious simulation.
"Time to generate new numbers until one is in the right range": If you want unbiased random numbers, then any algorithm that does so will generally have to run forever in the worst case. Again, by Lemma 3, the algorithm will run forever in the worst case unless "every prime number dividing k also divides p", which is not the case if, say, k is 10 and p is 32.
See also the question: How to generate a random integer in the range [0,n] from a stream of random bits without wasting bits?, especially my answer there.
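The batching idea can be sketched with p = 2 and k = 10: since 2^10 = 1024 is close to 10^3 = 1000, one 10-bit draw can yield three decimal digits with little waste. This is an illustrative Python sketch; the 10-bit/three-digit pairing is an illustrative choice rather than something taken from Kloeckner's paper, and `next_bits` is a hypothetical source of uniform random bits.

```python
import random
from typing import Callable, List


def three_decimal_digits(next_bits: Callable[[int], int]) -> List[int]:
    """Three unbiased decimal digits from batches of 10 random bits.

    Only 24 of every 1024 batches are thrown away (about 2.3%), versus
    2 of every 32 five-bit draws (6.25%) when producing one digit at a
    time by rejecting 30 and 31.
    """
    while True:
        x = next_bits(10)          # uniform in [0, 1024)
        if x < 1000:               # reject 1000..1023 to keep the result unbiased
            return [x // 100, (x // 10) % 10, x % 10]


rng = random.Random(2024)
print(three_decimal_digits(rng.getrandbits))
```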
If PRNG() is generating uniformly distributed random numbers then the above looks good. In fact (if you want to scale the mean etc.) the above should be fine for all purposes. I guess you need to ask what the error associated with the original PRNG() is, and whether further manipulating will add to that substantially.
If in doubt, generate an appropriately sized sample set, and look at the results in Excel or similar (to check your mean / std.dev etc. for what you'd expect)
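A quick version of that sanity check in Python (seed and sample size are arbitrary). Matching the mean and standard deviation is necessary but not sufficient: a mapping can hit both and still favor particular values, so a per-value frequency count is the stricter test.

```python
import random
import statistics

rng = random.Random(11)
max_range = 10.0
sample = [rng.random() * max_range for _ in range(100_000)]

# A continuous uniform distribution on [0, m) has mean m / 2 and
# standard deviation m / sqrt(12).
print(statistics.fmean(sample), max_range / 2)
print(statistics.pstdev(sample), max_range / 12 ** 0.5)
```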
If you have access to a PRNG function (say, random()) that'll generate numbers in the range 0 <= x < 1, can you not just do:
random_num = (int) (random() * max_range);
to give you numbers in the range 0 to max_range - 1?
Here's how the CLR's Random class works when limited (as per Reflector):
long num = maxValue - minValue;
if (num <= 0x7fffffffL) {
return (((int) (this.Sample() * num)) + minValue);
}
return (((int) ((long) (this.GetSampleForLargeRange() * num))) + minValue);
Even if you're given a positive int, it's not hard to turn it into a double: just multiply the random int by (1/maxint), or by 1/(maxint + 1.0) if the result must stay strictly below 1. Going from a 32-bit int to a double should provide adequate precision. (I haven't actually tested a PRNG like this, so I might be missing something with floats.)
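The difference between the two divisors shows up only at the top value. A Python sketch, assuming a generator that returns integers in [0, 2^31 - 1]; `MAXINT`, `via_reciprocal` and `via_exclusive` are hypothetical names for this illustration.

```python
MAXINT = 2 ** 31 - 1   # assumption: the generator returns integers in [0, MAXINT]


def via_reciprocal(x: int) -> float:
    """Multiply by 1 / MAXINT: the top value lands within about an ulp of 1.0."""
    return x * (1.0 / MAXINT)


def via_exclusive(x: int) -> float:
    """Divide by MAXINT + 1 instead, so the result always stays below 1."""
    return x / (MAXINT + 1.0)


print(via_reciprocal(MAXINT))   # 1.0, or within an ulp of it
print(via_exclusive(MAXINT))    # exactly 1 - 2**-31
```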
Pseudo-random number generators essentially produce a random series of 1s and 0s which, appended to each other, form an infinitely large number in base two. Each time you consume a bit from your PRNG, you are dividing that number by two and keeping the remainder. You can do this forever without wasting a single bit.
If you need a number in the range [0, N), then you need the same thing, but in base N instead of base two. Converting between the bases is basically trivial: consume the number of bits you need, and return the remainder of those bits to your PRNG to be used the next time a number is needed.
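The idea of not throwing leftover randomness away is close in spirit to J. Lumbroso's Fast Dice Roller (2013), sketched below in Python. This is a swapped-in, well-known algorithm rather than the answer's own method: it carries leftover state forward within a single draw instead of returning bits to the PRNG. `next_bit` is a hypothetical source of fair bits.

```python
import random
from typing import Callable


def fast_dice_roller(n: int, next_bit: Callable[[], int]) -> int:
    """Uniform integer in [0, n) from fair bits (Lumbroso's Fast Dice Roller).

    v counts how many equally likely values c can currently take. When c
    falls outside [0, n), the leftover c - n (uniform over v - n values) is
    kept and extended with further bits instead of being discarded.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    v, c = 1, 0
    while True:
        v, c = 2 * v, 2 * c + next_bit()
        if v >= n:
            if c < n:
                return c
            v, c = v - n, c - n


rng = random.Random(3)
print([fast_dice_roller(6, lambda: rng.getrandbits(1)) for _ in range(10)])
```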

Generating Uniform Random Deviates within a given range

I'd like to generate uniformly distributed random integers over a given range. The interpreted language I'm using has a built-in fast random number generator that returns a floating-point number in the range 0 (inclusive) to 1 (inclusive). Unfortunately, this means that I can't use the standard solution seen in another SO question (for an RNG that returns numbers from 0 (inclusive) to 1 (exclusive)) for generating uniformly distributed random integers in a given range:
result=Int((highest - lowest + 1) * RNG() + lowest)
The only sane method I can see at the moment is to ask for a new number in the rare case that the random number generator returns 1.
But if anyone knows a better method I'd be glad to hear it.
Rob
NB: Converting an existing random number generator to this language would result in something infeasibly slow so I'm afraid that's not a viable solution.
Edit: To link to the actual SO answer.
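The redraw-on-1 approach from the question, sketched in Python. `rng01` is a hypothetical stand-in for the language's built-in generator on [0, 1]; since 1.0 should come up only rarely, the loop almost always runs once.

```python
import random
from typing import Callable


def rand_int_inclusive(lowest: int, highest: int, rng01: Callable[[], float]) -> int:
    """Uniform integer in [lowest, highest] from a generator on [0, 1].

    A draw of exactly 1.0 is discarded and redrawn, which turns the generator
    into one on [0, 1) so the usual Int() formula applies.
    """
    while True:
        u = rng01()
        if u < 1.0:
            # The product is non-negative, so int() acts as floor here even
            # when lowest is negative.
            return int((highest - lowest + 1) * u) + lowest


rng = random.Random(1)
print([rand_int_inclusive(1, 6, rng.random) for _ in range(10)])
```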
Presumably you are desperately interested in speed, or else you would just suck up the conditional test with every RNG call. Any other alternative is probably going to be slower than the branch anyway...
...unless you know exactly what the internal structure of the RNG is. Particularly, what are its return values? If they're not IEEE-754 floats or doubles, you have my sympathies. If they are, how many real bits of randomness are in them? You would expect 24 for floats and 53 for doubles (the number of mantissa bits). If those are naively generated, you may be able to use shifts and masks to hack together a plain old random integer generator out of them, and then use that in your function (depending on the size of your range, you may be able to use more shifts and masks to avoid any branching if you have such a generator). If you have a high-quality generator that produces full quality 24- or 53-bit random numbers, then with a single multiply you can convert them from [0,1] to [0,1): just multiply by the largest generatable floating-point number that is less than 1, and your range problem is gone. This trick will still work if the mantissas aren't fully populated with random bits, but you'll need to do a bit more work to find the right multiplier.
You may want to look at the C source to the Mersenne Twister to see their treatment of similar problems.
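The single-multiply fix from this answer, sketched in Python for IEEE doubles (53-bit significands). `math.nextafter` requires Python 3.9 or later, and `closed_to_half_open` is a hypothetical helper name.

```python
import math

# Largest double below 1.0, i.e. 1 - 2**-53.
JUST_BELOW_ONE = math.nextafter(1.0, 0.0)


def closed_to_half_open(u: float) -> float:
    """Map u in [0, 1] into [0, 1) with a single multiply."""
    return u * JUST_BELOW_ONE


print(closed_to_half_open(1.0) < 1.0)                           # True
# For positive normal inputs the multiply moves u to the next double below it.
print(closed_to_half_open(0.75) == math.nextafter(0.75, 0.0))   # True
```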
I don't see why the + 1 is needed. If the random number generator delivers a uniform distribution of values in the [0,1] interval then...
result = lowest + (rng() * (highest - lowest))
should give you a uniform distribution of values between lowest and highest:
rng() == 0, result = lowest + 0 = lowest
rng() == 1, result = lowest + highest - lowest = highest
Including + 1 means that the upper bound on the generated number can be above highest
rng() == 1, result = lowest + highest - lowest + 1 = highest + 1.
The resulting distribution of values will be identical to the distribution of the random numbers, so uniformity depends on the quality of your random number generator.
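Following on from your comment below: you are right to point out that Int() will be the source of a lop-sided distribution at the tails. Better to use Round() to the nearest integer, or whatever equivalent you have in your scripting language. Be aware, though, that rounding gives lowest and highest only half the weight of each interior value, so the tails are still not uniform.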
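Both rounding choices can be compared exactly by feeding an evenly spaced grid of rng() values through each formula. A Python sketch with lowest = 0 and highest = 3; `fractions.Fraction` keeps the arithmetic exact, and rounding half up is an assumption about how Round() behaves (Python's own round() rounds ties to even).

```python
import math
from collections import Counter
from fractions import Fraction


def tally(mapping, lowest: int = 0, highest: int = 3, steps: int = 600) -> Counter:
    """Apply mapping to an evenly spaced grid of rng() values in [0, 1]."""
    counts = Counter()
    for i in range(steps + 1):
        u = Fraction(i, steps)   # exact, so no float rounding muddies the tally
        counts[mapping(u, lowest, highest)] += 1
    return counts


def with_round(u, lowest, highest):
    # Round half up: lowest + rng() * (highest - lowest), then Round().
    return math.floor(lowest + u * (highest - lowest) + Fraction(1, 2))


def with_int_plus_one(u, lowest, highest):
    # The question's formula: Int((highest - lowest + 1) * RNG() + lowest).
    return math.floor((highest - lowest + 1) * u + lowest)


print(sorted(tally(with_round).items()))         # [(0, 100), (1, 200), (2, 200), (3, 101)]
print(sorted(tally(with_int_plus_one).items()))  # [(0, 150), (1, 150), (2, 150), (3, 150), (4, 1)]
```

Rounding gives the two endpoints about half the weight of the interior values, while the question's Int() formula is uniform apart from the single rng() == 1 case, which is why redrawing on 1 is enough to fix it.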