Evenly distributed random numbers relatively prime to 2 - language-agnostic

A specific example
I need to generate a random number between 0 and 2, inclusive (or, equivalently, to choose randomly between -1, 0, and 1).
The naive approach would be to do something like rand() mod 3, where rand() returns an integer. This approach will not generate uniformly distributed numbers unless the number of values rand() can return is a multiple of 3 (and its lower bound is 0).
For instance, assuming rand() returned 2 bits (from 0 to 3, inclusive), the modulus would map:
0 -> 0
1 -> 1
2 -> 2
3 -> 0
This skew toward 0 would obviously be much less pronounced if more bits were returned, but regardless, some skew would remain.
The generic question
Is there a way of generating an evenly distributed random number between 0 and n-1, inclusive, where n is relatively prime to 2?

A common approach is to discard random values above the last full cycle, and just ask for a new random number.

It might help to choose your rand() upper bound to be k*n, where k is an integer. This way the outcome will be evenly distributed, provided that rand() is a good random generator.
If it's not possible to change the upper bound, you can pick k so that k*n is as close to rand()'s upper bound as possible, discard the results above this number, and try again.

See my answer to a similar question.
Basically, use your RNG and discard everything at or above n, then try again. As an optimization, you can use mod and only discard everything at or above n * floor(MAX / n).
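In C, that rejection loop might look like the following sketch (rand_mod is my name for it; it assumes rand() itself is uniform on 0..RAND_MAX):

    #include <stdlib.h>

    /* Uniform integer in [0, n-1] by rejection sampling. */
    int rand_mod(int n)
    {
        /* Largest multiple of n among the RAND_MAX + 1 possible outcomes. */
        unsigned long limit = ((unsigned long)RAND_MAX + 1UL) / n * n;
        unsigned long r;
        do {
            r = (unsigned long)rand();
        } while (r >= limit);   /* discard the partial cycle, try again */
        return (int)(r % n);
    }

rand_mod(3) - 1 then picks -1, 0, or 1 with equal probability.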

Generic Answer: You need to use more than just 2 bits of the number.
My rule of thumb is to generate a floating-point value x, 0.0 <= x < 1.0, multiply by 3, and truncate. That should get you values in the range 0, 1, and 2 that depend on a larger number of bits.
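As a sketch in C, assuming rand()/(RAND_MAX + 1.0) is an acceptable stand-in for such a floating-point source:

    #include <stdlib.h>

    /* Truncation approach sketched above: scale a [0, 1) value by 3.
     * Note: a tiny skew still remains unless the number of underlying
     * rand() outcomes is a multiple of 3; this just spreads it thin. */
    int rand3(void)
    {
        double x = rand() / ((double)RAND_MAX + 1.0); /* 0.0 <= x < 1.0 */
        return (int)(3.0 * x);                        /* 0, 1, or 2 */
    }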

Related

LC-3 algorithm for converting ASCII strings to Binary Values

Figure 10.4 provides an algorithm for converting ASCII strings to binary values. Suppose the decimal number is arbitrarily long. Rather than store a table of 10 values for the thousands-place digit, another table of 10 values for the ten-thousands-place digit, and so on, design an algorithm to do the conversion without resorting to any tables whatsoever.
I have attached pictures of figure 10.4. I am not looking for an answer to the problem, but rather can someone please explain this problem and perhaps give some direction on how to go about creating the algorithm?
[Figure 10.4, attached as two images]
I am unsure as to what it means by tables and do not know where to start really.
The tables are those global, initialized arrays: one called Lookup10 holding 10, 20, 30, 40, ..., and another called Lookup100 holding 100, 200, 300, 400...
You can ignore the tables: as per the assignment instructions, you're supposed to find a different way to accomplish this anyway. Or, you can run that code in a simulator, or trace it mentally, to understand how it works.
The bottom line is that the LC-3, while it can compute anything (it is Turing complete), can't do much in any one instruction. For arithmetic & logic, it can do add, not, and. That's pretty much it! But that's enough: note that modern hardware does everything with only one kind of logic gate, NAND, which is a binary operator (NAND is directly available; NOT by giving NAND the same operand for both inputs; AND by doing NOT after NAND; OR by applying NOT to both inputs first and then NAND; etc.).
For example, LC-3 cannot multiply, divide, take a modulus, or right-shift directly: each of those operations takes many instructions and, in the general case, some looping construct. Multiplication can be done by repeated addition, and division/modulus by repeated subtraction. These are very inefficient for larger operands; there are much more efficient algorithms, but they are substantially more complex, so they increase program complexity well beyond the repetitive approach.
That subroutine goes backwards through the user's input string. It takes a string length count in R1 as a parameter supplied by the caller (not shown). It looks at the last character in the input and converts it from an ASCII character to a binary number.
(We would commonly do that conversion from ASCII character to numeric value using subtraction: moving the character values from the ASCII character range of 0x30..0x39 to numeric values in the range 0..9. Here they do it with masking, which also works. The subtraction approach integrates better with error detection (checking whether the character is a valid digit, which is not done here), whereas the masking approach is simpler for LC-3.)
The subroutine then obtains the 2nd-to-last digit (moving backwards through the user's input string), converting it to binary using the mask approach. That yields a number between 0 and 9, which is used as an index into the first table, Lookup10. The value obtained from the table at that index is the index × 10, so this table is a × 10 table. The same approach is used for the third digit (the first one, reading forwards), except it uses the 2nd table, which is a × 100 table.
The standard approach for string-to-binary conversion is called atoi (search for it), standing for ASCII to integer. It moves forwards through the string, and for every new digit it multiplies the value computed so far by 10 before adding in the next digit's numeric value.
So, if the string is 456, first it obtains 4; then, because there is another digit, it computes 4 × 10 = 40, then + 5 for 45, then × 10 for 450, then + 6 for 456.
The advantage of this approach is that it can handle any number of digits (up to overflow).  The disadvantage, of course, is that it requires multiplication, which is a complication for LC-3.
Multiplication where one operand is the constant 10 is fairly easy even in LC-3's limited capabilities, and can be done with simple addition without looping.  Basically:
n × 10 = n + n + n + n + n + n + n + n + n + n
and LC-3 can do those 9 additions in just 9 instructions.  Still, we can also observe that:
n × 10 = n × 8 + n × 2
and also that:
n × 10 = (n × 4 + n) × 2     (which is n × 5 × 2)
which can be done in just 4 instructions on LC-3 (and none of these needs looping)!
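In C notation (with shifts standing in for LC-3's register self-additions; the helper name is mine), that last identity is:

    /* n * 10 as (n * 4 + n) * 2, mirroring the four LC-3 ADDs:
     *   t = n + n   ; 2n
     *   t = t + t   ; 4n
     *   t = t + n   ; 5n
     *   t = t + t   ; 10n
     */
    int mul10(int n)
    {
        return ((n << 2) + n) << 1;
    }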
So, if you want to do this approach, you'll have to figure out how to go forwards through the string instead of backwards as the given table version does, and how to multiply by 10 (use any one of the above suggestions).
There are other approaches as well if you study atoi. You could keep the backwards approach, but then you will have to multiply by 10, by 100, by 1000: a different factor for each successive digit. That might be done by repeated addition, or with a count of how many times to multiply by 10, e.g. n × 1000 = n × 10 × 10 × 10.
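Putting the forward direction and the ×10 trick together, a possible C sketch of the atoi-style loop (error checking omitted, as in the figure; on LC-3 each line of the body becomes a short run of instructions):

    /* Forward string-to-binary conversion, no lookup tables:
     * multiply the running value by 10, then add each digit. */
    int str2int(const char *s)
    {
        int value = 0;
        while (*s >= '0' && *s <= '9') {         /* stop at any non-digit */
            value = ((value << 2) + value) << 1; /* value *= 10 */
            value += *s - '0';                   /* ASCII digit -> 0..9 */
            s++;
        }
        return value;
    }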

Theory behind multiplying two numbers without operands

I have been reading Elements of Programming Interviews and am struggling to understand the passage below:
"The algorithm taught in grade-school for decimal multiplication does
not use repeated addition- it uses shift and add to achieve a much
better time complexity. We can do the same with binary numbers- to
multiply x and y we initialize the result to 0 and iterate through the
bits of x, adding (2^k)y to the result if the kth bit of x is 1.
The value (2^k)y can be computed by left-shifting y by k. Since we
cannot use add directly, we must implement it. We can apply the
grade-school algorithm for addition to the binary case, i.e., compute
the sum bit-by-bit and "rippling" the carry along.
As an example, we show how to multiply 13 = (1101) and 9 = (1001)
using the algorithm described above. In the first iteration, since
the LSB of 13 is 1, we set the result to (1001). The second bit of
(1101) is 0, so we move on to the third bit. That bit is 1, so we shift
(1001) to the left by 2 to obtain (100100), which we add to (1001) to
get (101101). The fourth and final bit of (1101) is 1, so we shift
(1001) to the left by 3 to obtain (1001000), which we add to (101101)
to get (1110101) = 117.
My Questions are:
What is the overall idea behind this, how is it a "bit-by-bit" addition
where does (2^k)y come from
what does it mean by "left-shifting y by k"
In the example, why do we set result to (1001) just because the LSB of 13 is 1?
The algorithm relies on the way numbers are coded in binary.
Let A be an unsigned number. A is coded by a set of bits a_{n-1} a_{n-2} ... a_0 in such a way that A = ∑_{i=0}^{n-1} a_i × 2^i.
Now, assume you have two numbers A and B coded in binary and you want to compute A×B:
B×A = B × ∑_{i=0}^{n-1} a_i × 2^i
    = ∑_{i=0}^{n-1} B × a_i × 2^i
a_i is equal to 0 or 1. If a_i = 0, the sum is not modified. If a_i = 1, we need to add B × 2^i.
So, we can simply deduce the multiplication algorithm:
result = 0
for i in 0 to n-1
    if a[i] = 1      // a[i] is the ith bit of A
        result = result + B * 2^i
    end
end
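For concreteness, here is the same loop as a runnable C sketch; it shifts B one place per iteration so B * 2^i never has to be recomputed (and it uses C's built-in + in place of the bit-by-bit ripple-carry addition the passage says must be implemented by hand):

    #include <stdint.h>

    /* Shift-and-add multiplication: scan the bits of a from least to
     * most significant, adding the appropriately shifted b whenever
     * the current bit is 1. */
    uint32_t multiply(uint32_t a, uint32_t b)
    {
        uint32_t result = 0;
        while (a != 0) {
            if (a & 1)          /* current bit of a is 1 */
                result += b;    /* b already holds B * 2^i */
            b <<= 1;            /* next bit weighs twice as much */
            a >>= 1;
        }
        return result;
    }

multiply(13, 9) walks through exactly the book's example and returns 117.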
What is the overall idea behind this, how is it a "bit-by-bit" addition
It is just an application of the previous method, where you process every bit of the multiplier successively.
where does (2^k)y come from
As mentioned above, it comes from the way binary numbers are coded: if the ith bit is set, then there is a 2^i term in the decomposition of the number.
what does it mean by "left-shifting y by k"
Left shift means "pushing" the bits leftwards and filling the "holes" with zeroes. Hence if the number is 1101 and it is left-shifted by three, it becomes 1101000.
This is the way to multiply the number by 2^i (just as "left shifting" a decimal number by 2 and putting zeroes in the vacated places is the way to multiply by 100 = 10^2).
In the example, why do we set result to (1001) just because the LSB of 13 is 1?
Because there is a 1 at the rightmost position, which corresponds to 2^0. So we left-shift by 0 (i.e. keep (1001) as-is) and add it to the result, which is initialized to 0.

Three-way xor-like function

I'm trying to solve the following puzzle:
Given a stream of numbers (only 1 iteration over them is allowed) in which all numbers appear 3 times except one number that appears only 2 times, find this number using O(1) memory.
I started with the idea that, if all numbers appeared 2 times, and 1 number only once, I could use xor operation between all numbers and the result would be the incognito number.
So I want to extend this idea to solve the puzzle. All I need is a xor-like function (or operator), which would yield 0 on the third apply:
SEED xor3 X xor3 X xor3 X = SEED
X xor3 Y xor3 SEED xor3 X xor3 Y xor3 Y xor3 X = SEED
Any ideas for such a function?
Regard XOR as summation on each bit of a number expressed in binary (i.e. a radix of 2), modulo 2.
Now consider a numeral system consisting of trits (base-3 digits) 0, 1, and 2. That is, it has a radix of 3.
The operator T now becomes an operation on any number decomposed into this radix. As in XOR, you sum the digits, but the difference is that operator T works modulo 3.
You can easily show that a T a T a is zero for any a. You can also show that T is both commutative and associative. That is necessary since, in general, your sequence will have the numbers jumbled up.
Now apply this to your list of numbers. At the end of the operation, the output will be b where b = o T o and o is the number that occurs exactly twice.
Your solution for the simpler case (all numbers appear twice, one number appears once) works since xor operates on each bit x as
x xor x = 0 and 0 xor x = x
xor is basically a bit-wise summation modulo 2. You need the base-3 equivalent: transform each number into a base-3 representation, and then use summation modulo 3 for each digit:
+ | 0 1 2
--+------
0 | 0 1 2
1 | 1 2 0
2 | 2 0 1
Call this operation xor3. Now you have, for each digit x:
x xor3 x xor3 x = 0 and 0 xor3 x = x
If you apply that to all your numbers, then all values that appear 3 times will vanish. The result is x xor3 x of the number x that appears twice. To recover x, you need to apply digit-wise division by 2 modulo 3, which is the same as doubling each digit modulo 3 (since 2 × 2 = 4 ≡ 1 (mod 3)).
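A C sketch of the whole procedure (the function names are mine; uint64_t gives headroom, since the digit-wise sum of two 32-bit values has valid base-3 digits but can exceed 32 bits as an integer):

    #include <stdint.h>

    /* Digit-wise addition modulo 3 on base-3 representations (xor3). */
    uint64_t xor3(uint64_t a, uint64_t b)
    {
        uint64_t result = 0, place = 1;
        while (a != 0 || b != 0) {
            result += ((a % 3 + b % 3) % 3) * place; /* one base-3 digit */
            a /= 3;
            b /= 3;
            place *= 3;
        }
        return result;
    }

    /* Digit-wise division by 2 modulo 3: doubling a digit mod 3 is the
     * same as halving it, since 2 * 2 = 4 = 1 (mod 3). */
    uint64_t div2mod3(uint64_t acc)
    {
        uint64_t result = 0, place = 1;
        while (acc != 0) {
            result += (acc % 3 * 2 % 3) * place;
            acc /= 3;
            place *= 3;
        }
        return result;
    }

Folding the whole stream into an accumulator with xor3 (seeded with 0) makes every triple vanish; div2mod3 of what remains gives the number that appears twice.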
I believe there are more efficient ways to implement that. The advantage of the xor function in the first case relies on the fact that xor is a natural base-2 operation. Is there any practical application for that?
This approach is a bit fragile: if the precondition (all numbers appear 3 times except one that appears twice) is broken, the algorithm will not help you.
Take a Map with int keys and int values. Then walk through your numbers and, for each number x, increment the corresponding value, taking 0 as the start value if x is a new key.
Then you can analyze it easily: walk through all keys and check the cardinality. It should be three for all keys except one, for which it should be two. This is more robust, and my gut feeling says it is also faster.

Will this random double generator work?

Intuitively one might write a random double generator as follows:
double randDouble(double lowerBound, double upperBound)
{
    double range = upperBound - lowerBound;
    return lowerBound + range * rand();
}
Suppose we assume that rand() returns an evenly distributed pseudorandom double on the interval [0, 1).
Is this method guaranteed to return a random double within [lowerBound, upperBound) with a uniform probability distribution? I'm specifically interested in whether the nature of floating point calculations might cause spikes or dips in the final distribution for some ranges.
First, rand() generates pseudo-random numbers, not truly random ones. Thus, I will assume you are asking whether your function generates pseudo-random numbers within the specified range.
Second, as Oli Charlesworth said, many rand implementations return a number between 0 and RAND_MAX, where RAND_MAX is the largest possible value it can take. In these cases, you can obtain a value in [0, 1) with
double r = rand()/((double)RAND_MAX+1);
the +1 is there so that r can't be 1.
Other languages have a rand that returns a value between 0 and 1, in which case you don't need to do the above division. Either way, it turns out that your function returns a decent approximation of a uniform distribution. See the following link for more details: http://www.thinkage.ca/english/gcos/expl/c/lib/rand.html Note that this link gives slightly different functions which they claim work a bit better, but the one you have probably works well enough.
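So with a C-style integer rand(), the question's randDouble might be written as follows (a sketch, not the one true way):

    #include <stdlib.h>

    /* randDouble on top of an integer rand(): the + 1.0 keeps the
     * scaled value strictly below 1, so the result stays below upperBound. */
    double randDouble(double lowerBound, double upperBound)
    {
        double r = rand() / ((double)RAND_MAX + 1.0); /* in [0, 1) */
        return lowerBound + (upperBound - lowerBound) * r;
    }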
If your upper and lower bounds are adjacent powers of two, then your resulting distribution will be as good as the one you get from rand(), since you're effectively just altering the exponent of what rand() gives you, without altering the mantissa.
If you want to stretch the range to cover more than one power of two, then there will be valid floating point numbers in the lower half of your range that will never be generated by your method. (You're effectively shifting one or more bits of the mantissa into the exponent, leaving the least significant bit(s) of the mantissa as non-random.)
If you use the method on a more general range (such that the mantissa is modified by the calculation), then you also run into the same non-uniformity you get when trying to convert a random integer to a random integer modulo n without using rejection sampling.
Any correct method for generating a uniform distribution of floating-point numbers has to take into account that the interval of real numbers that round to a given floating-point number is not always the same width. In the lower part of a range, floating-point numbers are more dense, so each individual floating-point number in that part of the range should be selected less often than the larger numbers.
Well, no. rand() returns a number between 0 and RAND_MAX; this quantisation will leave big holes in your distribution; in fact, almost all floating-point values between lowerBound and upperBound will never be selected.

Generating Uniform Random Deviates within a given range

I'd like to generate uniformly distributed random integers over a given range. The interpreted language I'm using has a builtin fast random number generator that returns a floating-point number in the range 0 (inclusive) to 1 (inclusive). Unfortunately this means that I can't use the standard solution seen in another SO question (for when the RNG returns numbers from 0 (inclusive) to 1 (exclusive)) for generating uniformly distributed random integers in a given range:
result=Int((highest - lowest + 1) * RNG() + lowest)
The only sane method I can see at the moment is, in the rare case that the random number generator returns 1, to just ask for a new number.
But if anyone knows a better method I'd be glad to hear it.
Rob
NB: Converting an existing random number generator to this language would result in something infeasibly slow so I'm afraid that's not a viable solution.
Edit: To link to the actual SO answer.
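For reference, that retry idea could look like this in C-style pseudocode (rng() stands in for the builtin [0, 1]-inclusive generator, which the question doesn't name):

    /* Assumed: rng() returns a double uniform on [0, 1], both ends inclusive. */
    extern double rng(void);

    /* Uniform integer in [lowest, highest]: reject the rare 1.0 so the
     * standard half-open scaling formula from the linked answer applies. */
    int randInt(int lowest, int highest)
    {
        double r;
        do {
            r = rng();
        } while (r == 1.0);   /* just ask for a new number */
        return lowest + (int)((highest - lowest + 1) * r);
    }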
Presumably you are desperately interested in speed, or else you would just suck up the conditional test with every RNG call. Any other alternative is probably going to be slower than the branch anyway...
...unless you know exactly what the internal structure of the RNG is. In particular, what are its return values? If they're not IEEE-754 floats or doubles, you have my sympathies. If they are, how many real bits of randomness are in them? You would expect 24 for floats and 53 for doubles (the number of mantissa bits).

If those are naively generated, you may be able to use shifts and masks to hack together a plain old random integer generator out of them, and then use that in your function. (Depending on the size of your range, you may be able to use more shifts and masks to avoid any branching if you have such a generator.)

If you have a high-quality generator that produces full-quality 24- or 53-bit random numbers, then with a single multiply you can convert them from [0,1] to [0,1): just multiply by the largest generatable floating-point number that is less than 1, and your range problem is gone. This trick will still work if the mantissas aren't fully populated with random bits, but you'll need to do a bit more work to find the right multiplier.
You may want to look at the C source to the Mersenne Twister to see their treatment of similar problems.
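For full-width doubles, that single-multiply trick is tiny; a C sketch (1 - DBL_EPSILON/2 is the largest double below 1.0; nextafter(1.0, 0.0) from <math.h> would give the same constant):

    #include <float.h>

    /* Squeeze an inclusive [0, 1] value into [0, 1) with one multiply:
     * scale by the largest representable double that is less than 1. */
    double to_half_open(double x)
    {
        return x * (1.0 - DBL_EPSILON / 2.0);
    }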
I don't see why the + 1 is needed. If the random number generator delivers a uniform distribution of values in the [0,1] interval then...
result = lowest + (rng() * (highest - lowest))
should give you a uniform distribution of values between lowest
rng() == 0, result = lowest + 0 = lowest
and highest
rng() == 1, result = lowest + highest - lowest = highest
Including + 1 means that the upper bound on the generated number can be above highest
rng() == 1, result = lowest + highest - lowest + 1 = highest + 1.
The resulting distribution of values will be identical to the distribution of the random numbers, so uniformity depends on the quality of your random number generator.
Following on from your comment below, you are right to point out that Int() will be the source of a lopsided distribution at the tails. Better to use Round() to the nearest integer, or whatever equivalent your scripting language has.