Three-way xor-like function

I'm trying to solve the following puzzle:
Given a stream of numbers (only one pass over them is allowed) in which every number appears 3 times except one number that appears only twice, find that number using O(1) memory.
I started with the idea that, if all numbers appeared 2 times, and 1 number only once, I could use xor operation between all numbers and the result would be the incognito number.
So I want to extend this idea to solve the puzzle. All I need is an xor-like function (or operator) that yields 0 on its third application:
SEED xor3 X xor3 X xor3 X = SEED
X xor3 Y xor3 SEED xor3 X xor3 Y xor3 Y xor3 X = SEED
Any ideas for such a function?

Regard XOR as summation on each bit of a number expressed in binary (i.e. a radix of 2), modulo 2.
Now consider a numerical system consisting of tribits 0, 1, and 2. That is, it has a radix of 3.
The operator T now becomes an operation on any number decomposed into this radix: as with XOR, you sum the digits position by position, but T works modulo 3 instead of modulo 2.
You can easily show that a T a T a is zero for any a. You can also show that T is both commutative and associative. That is necessary since, in general, your sequence will have the numbers jumbled up.
Now apply this to your list of numbers. At the end of the operation, the output will be b where b = o T o and o is the number that occurs exactly twice.
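A minimal Python sketch of the operator T, assuming the stream contains non-negative integers; the helper names (to_trits, from_trits, t_op, fold_stream) are illustrative. The accumulator is a single integer, so the state is O(1) for bounded-width inputs, although the digit lists built inside t_op grow with the number of base-3 digits.

```python
def to_trits(n: int) -> list[int]:
    """Base-3 digits of a non-negative integer, least significant first."""
    trits = []
    while n:
        n, d = divmod(n, 3)
        trits.append(d)
    return trits


def from_trits(trits: list[int]) -> int:
    """Inverse of to_trits."""
    n = 0
    for d in reversed(trits):
        n = n * 3 + d
    return n


def t_op(a: int, b: int) -> int:
    """Operator T: add base-3 digits position by position modulo 3 (no carries)."""
    ta, tb = to_trits(a), to_trits(b)
    width = max(len(ta), len(tb))
    ta += [0] * (width - len(ta))   # pad the shorter number with leading zeros
    tb += [0] * (width - len(tb))
    return from_trits([(x + y) % 3 for x, y in zip(ta, tb)])


def fold_stream(stream) -> int:
    """Single pass: returns o T o, where o is the number that occurs twice."""
    acc = 0
    for x in stream:
        acc = t_op(acc, x)
    return acc
```

For example, t_op(t_op(7, 7), 7) is 0, and fold_stream([4, 9, 4, 4, 9, 9, 5, 5]) equals t_op(5, 5).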

Your solution for the simpler case (all numbers appear twice, one number appears once) works since xor operates on each bit x as
x xor x = 0 and 0 xor x = x
xor is basically bit-wise summation modulo 2. You need the base-3 equivalent: transform each number into its base-3 representation and then add each digit modulo 3:
xor3 | 0 1 2
-----+------
0    | 0 1 2
1    | 1 2 0
2    | 2 0 1
Call this operation xor3. Now you have, for each digit x:
x xor3 x xor3 x = 0 and 0 xor3 x = x
If you apply that to all your numbers, all values that appear 3 times vanish. The result is x xor3 x, where x is the number that appears twice. To recover x, apply digit-wise division by 2 modulo 3, which is the same as multiplying each digit by 2, because 2 × 2 = 4 ≡ 1 (mod 3).
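A sketch of the full procedure in Python, assuming non-negative integers: xor3 adds base-3 digits modulo 3 without carries, and halve_mod3 undoes x xor3 x by multiplying every digit by 2 modulo 3. Both function names are hypothetical.

```python
from functools import reduce


def xor3(a: int, b: int) -> int:
    """Digit-wise base-3 addition modulo 3, with no carries between digits."""
    result, place = 0, 1
    while a or b:
        a, da = divmod(a, 3)
        b, db = divmod(b, 3)
        result += (da + db) % 3 * place
        place *= 3
    return result


def halve_mod3(b: int) -> int:
    """Recover x from b = x xor3 x.

    Each digit of b is 2*d mod 3, and 2 is its own inverse modulo 3,
    so multiplying each digit by 2 (mod 3) gives back d.
    """
    result, place = 0, 1
    while b:
        b, d = divmod(b, 3)
        result += (2 * d) % 3 * place
        place *= 3
    return result


def find_twice(stream) -> int:
    """One pass over the stream; the only state is the running xor3 value."""
    return halve_mod3(reduce(xor3, stream, 0))
```

With this, find_twice([4, 9, 4, 4, 9, 9, 5, 5]) returns 5.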
I believe there are more efficient ways to implement this. The advantage of xor in the first case comes from the fact that xor is a natural base-2 operation. Is there any practical application for this?
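One possibility that stays in base 2 (a different technique, not the one described in this answer) is to keep one counter per bit position and count modulo 3: bits of numbers seen three times cancel out, and every bit set in the twice-occurring number leaves a count of 2. A sketch, assuming non-negative integers below 2**32; find_twice_bits is a hypothetical name. The counter array has a fixed size, so memory stays O(1).

```python
def find_twice_bits(stream, width: int = 32) -> int:
    """Per-bit counts modulo 3; assumes every value is in [0, 2**width)."""
    counts = [0] * width
    for x in stream:
        for k in range(width):
            if (x >> k) & 1:
                counts[k] = (counts[k] + 1) % 3
    # Numbers seen three times add 0 (mod 3) to every counter; the number
    # seen twice adds 2 to each counter where it has a 1 bit.
    result = 0
    for k, c in enumerate(counts):
        if c == 2:
            result |= 1 << k
    return result
```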

This approach is a bit fragile: if the precondition (all numbers appear 3 times except one that appears twice) is violated, the algorithm will not help you.
Take a Map with int keys and int values. Then walk through your numbers and, for each number x, increment the corresponding value. If x is a new key, start from 0.
Then you can analyze it easily: walk through all keys and check the counts. They should be three for all keys except one, which should be two. This is more robust (although it needs memory proportional to the number of distinct values rather than O(1)), and my gut feeling says it is also faster.
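A sketch of the counting approach using Python's collections.Counter as the map; find_by_counting is a hypothetical name, and raising ValueError when the precondition is broken is one possible design choice.

```python
from collections import Counter


def find_by_counting(stream) -> int:
    """Count every value, then check the expected counts (3 for all but one, 2 for one)."""
    counts = Counter(stream)
    twice = [value for value, n in counts.items() if n == 2]
    if len(twice) != 1 or any(n not in (2, 3) for n in counts.values()):
        raise ValueError("input violates the 'all three times except one twice' precondition")
    return twice[0]
```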

Related

Theory behind multiplying two numbers without operands

I have been reading Elements of Programming Interviews and am struggling to understand the passage below:
"The algorithm taught in grade school for decimal multiplication does not use repeated addition; it uses shift and add to achieve a much better time complexity. We can do the same with binary numbers: to multiply x and y, we initialize the result to 0 and iterate through the bits of x, adding (2^k)y to the result if the kth bit of x is 1. The value (2^k)y can be computed by left-shifting y by k. Since we cannot use add directly, we must implement it. We can apply the grade-school algorithm for addition to the binary case, i.e., compute the sum bit-by-bit, "rippling" the carry along.
As an example, we show how to multiply 13 = (1101) and 9 = (1001) using the algorithm described above. In the first iteration, since the LSB of 13 is 1, we set the result to (1001). The second bit of (1101) is 0, so we move on to the third bit. The bit is 1, so we shift (1001) to the left by 2 to obtain (100100), which we add to (1001) to get (101101). The fourth and final bit of (1101) is 1, so we shift (1001) to the left by 3 to obtain (1001000), which we add to (101101) to get (1110101) = 117."
My questions are:
What is the overall idea behind this, and how is it a "bit-by-bit" addition?
Where does (2^k)y come from?
What does it mean by "left-shifting y by k"?
In the example, why do we set result to (1001) just because the LSB of 13 is 1?
The algorithm relies on the way numbers are coded in binary.
Let A be an unsigned number. A is coded by a set of bits $a_{n-1} a_{n-2} \ldots a_0$ in such a way that $A = \sum_{i=0}^{n-1} a_i \times 2^i$.
Now, assume you have two numbers A and B coded in binary and you want to compute A×B:
$$B \times A = B \times \sum_{i=0}^{n-1} a_i \times 2^i = \sum_{i=0}^{n-1} B \times a_i \times 2^i$$
$a_i$ is equal to 0 or 1. If $a_i = 0$, the sum is not modified. If $a_i = 1$, we need to add $B \times 2^i$.
So we can directly deduce the multiplication algorithm:
result=0
for i in 0 to n-1
if a[i]=1 // assumes a[i] is the ith bit
result = result + B * 2^i
end
end
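The pseudocode above, made runnable as a Python sketch for non-negative integers. Following the quoted passage, the addition itself is done bit by bit with a rippling carry, and (2^k)y is produced by shifting y left once per iteration; the names ripple_add and multiply are illustrative.

```python
def ripple_add(a: int, b: int) -> int:
    """Grade-school addition in binary: sum bit by bit, rippling the carry."""
    result, carry_in, mask = 0, 0, 1
    rest_a, rest_b = a, b                 # bits of a and b not yet processed
    while rest_a or rest_b or carry_in:
        ak, bk = a & mask, b & mask       # current bit of each operand (0 or mask)
        # carry out is the majority of the three input bits
        carry_out = (ak & bk) | (ak & carry_in) | (bk & carry_in)
        result |= ak ^ bk ^ carry_in      # sum bit at this position
        carry_in = carry_out << 1         # move the carry to the next position
        mask <<= 1
        rest_a >>= 1
        rest_b >>= 1
    return result


def multiply(x: int, y: int) -> int:
    """Shift-and-add: for every set bit k of x, add (2**k) * y, i.e. y << k."""
    result = 0
    shifted_y = y                         # y << k for the current bit position k
    while x:
        if x & 1:
            result = ripple_add(result, shifted_y)
        x >>= 1
        shifted_y <<= 1
    return result
```

Tracing multiply(13, 9) reproduces the example: after each bit of 13 the result is 9, 9, 45 and finally 117.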
What is the overall idea behind this, and how is it a "bit-by-bit" addition?
It is just an application of the method above, where you successively process every bit of the multiplier.
Where does (2^k)y come from?
As mentioned above, from the way binary numbers are coded: if the ith bit is set, then there is a $2^i$ term in the decomposition of the number.
What does it mean by "left-shifting y by k"?
Left shift means "pushing" the bits leftwards and filling the "holes" with zeroes. Hence if the number is 1101 and it is left-shifted by three, it becomes 1101000.
This is the way to multiply a number by $2^i$ (just as "left shifting" a decimal number by 2, i.e. appending zeroes at the right, multiplies it by $100 = 10^2$).
In the example, why do we set result to (1001) just because the LSB of 13 is 1?
Because there is a 1 at the rightmost position, which corresponds to $2^0$. So we left-shift (1001) by 0 and add it to the result, which is initialized to 0.

How to map number in a range to another in the same range with no collisions?

Effectively what I'm looking for is a function f(x) that outputs into a range that is pre-defined. Calling f(f(x)) should be valid as well. The function should be cyclical, so calling f(f(...(x))) where the number of calls is equal to the size of the range should give you the original number, and f(x) should not be time dependent and will always give the same output.
While I can see that taking a list of all possible values and shuffling it would give me something close to what I want, I'd much prefer it if I could simply plug values into the function one at a time so that I do not have to compute the entire range all at once.
I've looked into Minimal Perfect Hash Functions but haven't been able to find one that doesn't use external libraries. I'm okay with using them, but would prefer to not do so.
If an actual range is necessary to help answer my question, I don't think it would need to be bigger than [0, 2^24-1], but the starting and ending values don't matter too much.
You might want to take a look at linear congruential generators (LCGs). You should look for a full-period generator (say, m = 2^24), which means the parameters must satisfy the Hull-Dobell theorem.
Calling f(f(x)) should be valid as well.
It should work.
the number of calls is equal to the size of the range should give you the original number
Yes: for an LCG whose parameters satisfy the Hull-Dobell theorem, the full period is covered once, and after m calls you are back where you started.
The period of such an LCG is exactly m.
should not be time dependent and will always give the same output
An LCG step is O(1) and 100% reproducible.
An LCG is reversible as well: the extended Euclidean algorithm gives the inverse of the multiplier modulo m. See Reversible pseudo-random sequence generator for details.
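A Python sketch for m = 2**24. For a power-of-two modulus the Hull-Dobell conditions reduce to "c is odd" and "a % 4 == 1"; the values of A and C below are arbitrary examples that satisfy them, not parameters taken from the answer. The inverse uses Python's built-in pow(a, -1, m) (Python 3.8+), which returns the same modular inverse the extended Euclidean algorithm computes.

```python
M = 1 << 24                  # size of the range [0, 2**24 - 1]
A = 1664525                  # example multiplier: A % 4 == 1
C = 1013904223               # example increment: odd, so gcd(C, M) == 1
A_INV = pow(A, -1, M)        # modular inverse of A (exists because A is odd)


def hull_dobell_pow2(a: int, c: int, m: int) -> bool:
    """Hull-Dobell conditions for m = 2**k with k >= 2: c odd and a - 1 divisible by 4."""
    is_pow2 = m >= 4 and m & (m - 1) == 0
    return is_pow2 and c % 2 == 1 and a % 4 == 1


def f(x: int) -> int:
    """One LCG step; with full-period parameters this is a single cycle of length M."""
    return (A * x + C) % M


def f_inv(y: int) -> int:
    """Undo one step: x = A^-1 * (y - C) mod M."""
    return (A_INV * (y - C)) % M
```

Each call is O(1), and calling f M times on any x returns x.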
Minimal perfect hash functions are overkill; all you've asked for is a function f that is:
bijective, and
"cyclical" (i.e. $f^N(x) = x$ for every x, where $f^N$ means f applied N times).
For a permutation to be cyclical in that way, its order must divide N (being exactly N is just a special case of dividing N). This in turn means the LCM of the orders of the sub-cycles must divide N. One way to achieve that is to have a single "sub"-cycle of order N. For power-of-two N, it's also really easy to have lots of small cycles of some other power-of-two order. General permutations do not necessarily satisfy the cycle requirement: they are bijective, of course, but the LCM of the orders of their sub-cycles may not divide N.
In the following I will leave all reduction modulo N implicit. Without loss of generality I will assume the range starts at 0 and goes up to N-1, where N is the size of the range.
The only thing I can immediately think of for general N is f(x) = x + c where gcd(c, N) == 1. The GCD condition ensures there is only one cycle, which necessarily has order N.
For power-of-two N I have more inspiration:
f(x) = cx where c is odd. Bijective because gcd(c, N) == 1, so c has a modular multiplicative inverse. Also $c^N = 1$: since N is a power of two, $\varphi(N) = N/2$, so $c^{\varphi(N)} = 1$ by Euler's theorem and hence $c^N = (c^{N/2})^2 = 1$. Therefore $f^N(x) = c^N x = x$.
f(x) = x XOR c where c < N. Trivially bijective and trivially cycles with a period of 2, which divides N.
f(x) = clmul(x, c) where c is odd and clmul is carry-less multiplication. Bijective because any odd c has a carry-less multiplicative inverse. Has some power-of-two cycle length (less than N) so it divides N. I don't know why though. This is a weird one, but it has decent special cases such as x ^ (x << k). By symmetry, the "mirrored" version also works.
For example, x ^ (x >> k).
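f(x) = x >>> k where >>> is bit-rotation. Obviously bijective, and $f^N(x) = x \ggg Nk$. The rotation amount is taken modulo the bit width w = log2(N), so this returns to x only when Nk mod w = 0. That holds for every k when w is itself a power of two (e.g. 16 bits), but not in general: for w = 24 and k = 1 the rotation has order 24, which does not divide 2^24.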
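To check candidates like these on a small range, a Python sketch (Python 3.9+ for math.lcm) can compute the order of a permutation as the LCM of its cycle lengths and test whether it divides N. The bit width W = 4, the constants and all function names are illustrative choices.

```python
from math import lcm


def is_bijection(f, n):
    """f maps range(n) onto itself with no collisions."""
    return sorted(f(x) for x in range(n)) == list(range(n))


def permutation_order(f, n):
    """Smallest t > 0 with f^t == identity, i.e. the LCM of the cycle lengths."""
    seen = [False] * n
    order = 1
    for start in range(n):
        if seen[start]:
            continue
        length, x = 0, start
        while not seen[x]:            # walk the cycle containing start
            seen[x] = True
            x = f(x)
            length += 1
        order = lcm(order, length)
    return order


def is_cyclical(f, n):
    """The question's requirement: applying f n times returns every x to itself."""
    return is_bijection(f, n) and n % permutation_order(f, n) == 0


W = 4                  # bit width for this demo, so N = 2**W = 16
N = 1 << W
MASK = N - 1

candidates = {
    "x + c, gcd(c, N) == 1": lambda x: (x + 5) & MASK,
    "c * x, c odd": lambda x: (3 * x) & MASK,
    "x XOR c": lambda x: x ^ 6,
    "x ^ (x << 1)": lambda x: (x ^ (x << 1)) & MASK,
    "rotate right by 1": lambda x: ((x >> 1) | (x << (W - 1))) & MASK,
}

for name, f in candidates.items():
    print(f"{name}: order {permutation_order(f, N)}, cyclical: {is_cyclical(f, N)}")
```

Running the same check with a 3-bit rotation, ((x >> 1) | (x << 2)) & 7 on N = 8, finds an order of 3, which does not divide 8, so the bit width matters for rotations.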

Reverse function

I have been trying to reverse a simple-looking function.
The function is given in assembly:
(Argument is loaded into AX)
AND AX, 0xFFFE (round down to even number)
MUL AX (Multiply AX by AX ; the result is represented as DX:AX)
XOR AX,DX
The function can be described as: H(X) = F(X & 0xFFFE); F(X) = ((X * X) mod 2^16) xor ((X * X) div 2^16)
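A Python model of the assembly, handy for experimenting; it assumes 16-bit inputs and mirrors each instruction (the name h is illustrative).

```python
def h(x: int) -> int:
    """Model of AND AX,0xFFFE / MUL AX / XOR AX,DX for a 16-bit argument x."""
    ax = x & 0xFFFE              # AND AX, 0xFFFE: clear bit 0 (round down to even)
    product = ax * ax            # MUL AX: 32-bit result, DX = high word, AX = low word
    dx = product >> 16
    ax = product & 0xFFFF
    return ax ^ dx               # XOR AX, DX: the value left in AX
```

For instance, h(2) and h(3) are both 4, and h(256) is 1 because 256 * 256 = 0x10000 puts a single 1 in DX.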
I calculated all of the values from 1 to 2^16 and plotted them in MATLAB in order to "see" the function.
Can anyone help me find an answer to this (given y, what is the argument x)?
It might be that for some values there is more than one answer, so narrowing it down is my goal.
Thanks,
Or.
It's a hash function.
You can't reverse a hash function, because the whole point of it is that it's a one-way function.
The multiply is clearly reversible, it's the xor that's not. By combining the low and high part of the multiplication you lose information.
As you can see in the plot, there are some white gaps. Since the plot has 2^16 positions and there are just as many inputs, every gap means that some other output value is produced by more than one input.
This is common in a hash function.
The only way to 'reverse' it is to build a lookup table that translates output values into possible input values. However, you will find that each output value that occurs can correspond to one or more input values.
An even number times an even number is always a multiple of 4.
So the low 2 bits of the product are always 0, ergo the low 2 bits of the result are bits 16 and 17 of the product.
Bits 2..15 of the result are bits 2..15 of the product XORed with bits 18..31.
A quick simulation shows 24350 unique outputs, i.e. on average about 1.34 inputs (0.34 duplicates) for every output value, not bad.
The maximum number of collisions is 6, but most numbers don't collide.
For all those outputs that don't collide, you can uniquely look up your input value in the lookup table (disregarding odd input values, which the AND maps to the even number below them).
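A sketch of the lookup-table inversion in Python, reusing a compact model of the same function. It only tabulates even inputs, since the AND maps every odd input to the even number below it; preimages is a hypothetical helper name. The two statistics are computed by the code rather than hard-coded.

```python
from collections import defaultdict


def h(x):
    """Same function: low word of (x & 0xFFFE)**2 XOR its high word."""
    even = x & 0xFFFE
    product = even * even
    return (product & 0xFFFF) ^ (product >> 16)


# Reverse lookup table: output value -> list of even 16-bit inputs producing it.
table = defaultdict(list)
for x in range(0, 1 << 16, 2):   # odd inputs behave exactly like x - 1
    table[h(x)].append(x)


def preimages(y):
    """Even inputs x with h(x) == y (empty list if y never occurs)."""
    return table.get(y, [])


unique_outputs = len(table)
max_preimages = max(len(xs) for xs in table.values())
print(unique_outputs, max_preimages)
```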

Simulating a roll with a biased dice

I did a search but didn't really get any proper hits. Maybe I used incorrect terms?
What I want to ask about is an algorithm for simulating a biased roll rather than a standard, supposedly random roll.
It wouldn't be a problem if you can't give me exact answers (maybe the explanation is lengthy?), but I would appreciate pointers to material I can read about it.
What I have in mind is, for example, to shift the bias towards 5 and 6 so that rolls have a higher chance of coming up 5 or 6; that's the sort of problem I'm trying to solve.
[Update]
Upon further thought and by inspecting some of the answers, I've realized that what I want to achieve is really the Roulette Wheel Selection operator that's used in genetic algorithms since having a larger sector means increasing the odds the ball will land there. Am I correct with this line of thought?
In general, if your probabilities are {p1,p2, ...,p6}, construct the following helper list:
{a1, a2, ... a5} = { p1, p1+p2, p1+p2+p3, p1+p2+p3+p4, p1+p2+p3+p4+p5}
Now get a random number X in [0,1]
If
X <= a1 choose 1 as outcome
a1 < X <= a2 choose 2 as outcome
a2 < X <= a3 choose 3 as outcome
a3 < X <= a4 choose 4 as outcome
a4 < X <= a5 choose 5 as outcome
a5 < X choose 6 as outcome
Or, as more efficient pseudocode:
if X > a5 then N=6
elseif X > a4 then N=5
elseif X > a3 then N=4
elseif X > a2 then N=3
elseif X > a1 then N=2
else N=1
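A Python sketch of the cumulative-list method; make_biased_die is a hypothetical name. It uses bisect.bisect_left on the running sums a1, a2, ... to find the first a_i that is >= X, which gives the same outcome as the if/elseif chain but with a binary search. Python's standard library also offers random.choices(population, weights=...) for the same job.

```python
import bisect
import itertools
import random


def make_biased_die(probabilities):
    """probabilities[i] is the chance of face i + 1; they should sum to 1."""
    cumulative = list(itertools.accumulate(probabilities))   # a1, a2, ..., a6

    def roll(rng=random):
        x = rng.random()                                     # X in [0, 1)
        # First index i with x <= cumulative[i], matching "a_(i) < X <= a_(i+1)".
        i = bisect.bisect_left(cumulative, x)
        # Guard against the sum of floats ending slightly below 1.
        return min(i, len(probabilities) - 1) + 1

    return roll


die = make_biased_die([0.1, 0.1, 0.1, 0.1, 0.3, 0.3])   # bias towards 5 and 6
```

Calling die() then returns 5 or 6 about 60% of the time.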
Edit
This is equivalent to the roulette wheel selection you mention in your question update.
Let's say the die is biased towards a 3.
Instead of picking a random entry from the array 1..6 (6 entries), pick a random entry from the array 1..6, 3, 3 (8 entries).
Make a two-dimensional array of possible values and their weights. Sum up all the weights. Randomly choose a value in the range from 0 to the sum of the weights.
Now iterate through the array while keeping an accumulator of the weights seen so far. Once this value exceeds your random number, pick the value of the die represented here.
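The accumulator approach with integer weights, as a Python sketch (weighted_roll is a hypothetical name). The usage below encodes the "1..6, 3, 3" array as weights, so a 3 has weight 3 and every other face weight 1.

```python
import random


def weighted_roll(weighted_faces, rng=random):
    """Pick a face with probability weight / total weight.

    weighted_faces: list of (face, weight) pairs with non-negative integer weights.
    """
    total = sum(weight for _, weight in weighted_faces)
    if total <= 0:
        raise ValueError("the weights must add up to a positive number")
    r = rng.randrange(total)          # uniform integer in [0, total)
    accumulated = 0
    for face, weight in weighted_faces:
        accumulated += weight
        if accumulated > r:           # running sum has passed the random number
            return face
    raise AssertionError("unreachable: r < total")


# The "1..6, 3, 3" array, expressed as weights.
BIASED_TOWARD_3 = [(1, 1), (2, 1), (3, 3), (4, 1), (5, 1), (6, 1)]
```

weighted_roll(BIASED_TOWARD_3) returns a 3 with probability 3/8.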
Hope this helps
Hmm. Say you want to have a 1/2 chance of getting a six, and a 1/10 chance of getting any other face. To simulate this, you could generate a random integer n in [1, 2, ... , 10] , and the outcome would map to six if n is in [6, 7, 8, 9, 10] and map to n otherwise.
One way that's usually fairly easy is to start with a random number in an expanded range, and break that range up into unequal pieces.
For example, with a perfectly even (six-sided) die, each number should come up 1/6th of the time. Let's assume you decide on round percentages: all the other numbers should come up 16 percent of the time, but 2 should come up 20 percent of the time (5 × 16 + 20 = 100).
You could do that by generating numbers from 1 to 100. If the number is from 1 to 16, it comes out as a 1. If it's from 17 to 36, it comes out as a 2. If it's from 37 to 52, it comes out as a 3 (and the rest are blocks of 16 apiece).
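A Python sketch of the expanded-range idea. The cut points are hypothetical: 16 values per face except 20 for face 2, so the blocks cover 1..100 exactly. half_six_roll encodes the 1..10 mapping from the previous answer, where any n in 6..10 becomes a six.

```python
import bisect
import random

# Hypothetical cut points: upper end of each face's block within 1..100.
# Face 1 -> 1..16, face 2 -> 17..36, face 3 -> 37..52,
# face 4 -> 53..68, face 5 -> 69..84, face 6 -> 85..100.
UPPER_BOUNDS = [16, 36, 52, 68, 84, 100]


def face_for(n):
    """Map n in 1..100 to the face whose block contains it."""
    return bisect.bisect_left(UPPER_BOUNDS, n) + 1


def biased_roll(rng=random):
    return face_for(rng.randint(1, 100))   # randint includes both ends


def half_six_roll(rng=random):
    """1..5 keep their value (1/10 each); 6..10 all become a six (1/2 in total)."""
    return min(rng.randint(1, 10), 6)
```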

Evenly distributed random numbers relatively prime to 2

A specific example
I need to generate a random number between 0 and 2, inclusive. (or choose randomly between -1, 0, and 1).
The naive approach would be to do something like rand() mod 3, where rand() returns an integer. This approach will not generate uniformly distributed numbers unless the number of values rand() can return (starting from 0) is a multiple of 3.
For instance, assuming rand() returned 2 bits (from 0 to 3, inclusive), the modulus would map:
0 -> 0
1 -> 1
2 -> 2
3 -> 0
This skew toward 0 would be much smaller if rand() returned more bits, but the skew would remain regardless.
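A quick way to see the skew is to enumerate every output of a hypothetical b-bit rand() and tabulate the residues (mod_histogram is an illustrative name).

```python
from collections import Counter


def mod_histogram(bits, n):
    """Residue counts when each of the 2**bits equally likely rand() values is reduced mod n."""
    return Counter(value % n for value in range(1 << bits))
```

mod_histogram(2, 3) gives the mapping above (0 appears twice, 1 and 2 once each); with 8 bits the counts are 86, 85 and 85, closer but still uneven.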
The generic question
Is there a way of generating an evenly distributed random number between 0 and n-1, inclusive, where n is relatively prime to 2?
A common approach is to discard random values above the last full cycle, and just ask for a new random number.
It might help to choose your rand() upper bound to be k*n, where k is an integer (so that rand() returns one of exactly k*n values). This way the outcome will be evenly distributed, provided that rand() is a good random generator.
If it's not possible to reduce the upper bound, you can pick k so that k*n is as close to rand() upper bound as possible and discard the results above this number trying again.
See my answer to a similar question.
Basically, use your RNG, discard every value at or above n, and try again. As an optimization, you can use mod and discard only the values at or above n * floor(MAX / n), where MAX is the number of distinct values the RNG can return.
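A Python sketch of the rejection approach, assuming the underlying source produces `bits` uniformly random bits per call (here Random.getrandbits); uniform_below is a hypothetical name. Each attempt is accepted with probability above one half, so the expected number of draws is below two.

```python
import random


def uniform_below(n, bits=32, rng=random):
    """Uniform integer in [0, n) from a source of `bits` random bits (rejection sampling)."""
    if not 0 < n <= (1 << bits):
        raise ValueError("n must be in 1..2**bits")
    source_size = 1 << bits                 # the source returns 0 .. source_size - 1
    limit = n * (source_size // n)          # largest multiple of n that fits
    while True:
        r = rng.getrandbits(bits)
        if r < limit:                       # discard the incomplete last cycle
            return r % n
```

With bits=2 and n=3 this is exactly the example above: a draw of 3 is rejected and a new value is requested.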
Generic Answer: You need to use more than just 2 bits of the number.
My rule of thumb is to generate floating-point values, x, 0.0 <= x < 1.0, multiply by 3 and truncate. That should get you values in the range 0, 1 and 2 that depend on a larger number of bits.