Cast some light on a population count algorithm - binary

I looked for good methods to do popcount (count of set bits). I found this one here:
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; c++)
{
    v &= v - 1; // clear the least significant bit set
}
Trying it on a few examples, it's true that it works. What property of binary operations / representation makes it work?
Could you hint at some further reading on popcount and binary representation?

You're starting off with a v that initially has n bits set.
The point of the game is to have one fewer bit to count at each iteration of the loop. That way, we can just count the number of loop iterations that were needed before we got to the point where n = 0 to figure out the value of the initial n.
Notice that if n = 0, then v = 0, and so the loop will stop at this point. But as long as v > 0, we'll run the body of the loop at least once. At each iteration, we end up with a v that has one fewer bit set.
Here's why. The first property we need is that v & v == v. Now v is a sequence of bits (the exact number of bits depends on your machine / OS) that you can order from most significant to least significant. When you decrement v, we can note the following:
the least significant bit that is set to 1, call it v[k], will become 0;
all bits more significant than v[k] will not change when you decrement v.
Therefore, ANDing v with its decrement preserves all the more significant bits but sets v[k] to 0. And by definition, all bits less significant than v[k], i.e. v[k-1] ... v[0], are already 0, because v[k] is "the least significant bit that is 1". Therefore, after ANDing, all the less significant bits remain 0. The upshot is that v & (v - 1) contains one fewer bit set to 1 than v does.
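In code form, here is a minimal, self-contained C sketch of that same loop with a trace added, so you can watch one set bit disappear per iteration (the popcount name and the sample value are just for illustration):

#include <stdio.h>

/* Minimal sketch: the Kernighan loop from the question, with a trace
   printed so you can watch one set bit disappear per iteration. */
unsigned int popcount(unsigned int v)
{
    unsigned int c;
    for (c = 0; v; c++)
    {
        printf("v = %08x\n", v);
        v &= v - 1;   /* clear the least significant set bit */
    }
    return c;
}

int main(void)
{
    printf("popcount(0x2C) = %u\n", popcount(0x2Cu));   /* 0x2C = 101100 in binary -> 3 */
    return 0;
}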

Subtracting 1 from a 0 bit turns that bit into a 1 and causes a borrow from the next bit to the left, so 1 is subtracted there as well. This cascades to the left until you reach a 1 bit, where subtracting 1 from 1 gives 0. At that point the subtraction is finished: you've converted every 0 into a 1 up to the first set bit, and converted that set bit from 1 to 0.
When you AND the before and after values, the before value has zeros to the right of that first set bit and the after value has a zero at that bit itself. Since anything ANDed with zero is zero, you keep all the zeros from the original value and set that single bit to zero as well.
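For instance, with an 8-bit value:

v         = 01011000
v - 1     = 01010111   (the trailing zeros became ones; the lowest set bit became zero)
v & (v-1) = 01010000   (exactly one fewer bit set than v)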

Related

Theory behind multiplying two numbers without arithmetic operators

I have been reading Elements of Programming Interviews and am struggling to understand the passage below:
"The algorithm taught in grade-school for decimal multiplication does
not use repeated addition- it uses shift and add to achieve a much
better time complexity. We can do the same with binary numbers- to
multiply x and y we initialize the result to 0 and iterate through the
bits of x, adding (2^k)y to the result if the kth bit of x is 1.
The value (2^k)y can be computed by left-shifting y by k. Since we
cannot use add directly, we must implement it. We can apply the
grade-school algorithm for addition to the binary case, i.e., compute
the sum bit-by-bit, "rippling" the carry along.
As an example, we show how to multiply 13 = (1101) and 9 = (1001)
using the algorithm described above. In the first iteration, since
the LSB of 13 is 1, we set the result to (1001). The second bit of
(1101) is 0, so we move on to the third bit. This bit is 1, so we shift
(1001) to the left by 2 to obtain (100100), which we add to (1001) to
get (101101). The fourth and final bit of (1101) is 1, so we shift
(1001) to the left by 3 to obtain (1001000), which we add to (101101)
to get (1110101) = 117.
My Questions are:
What is the overall idea behind this, how is it a "bit-by-bit" addition
where does (2^k)y come from
what does it mean by "left-shifting y by k"
In the example, why do we set result to (1001) just because the LSB of 13 is 1?
The algorithm relies on the way numbers are coded in binary.
Let A be an unsigned number. A is coded by a set of bits a_(n-1) a_(n-2) ... a_0 in such a way that
A = sum(i = 0 to n-1) of a_i × 2^i
Now, assume you have two numbers A and B coded in binary and you want to compute A×B:
B×A = B × sum(i = 0 to n-1) of a_i × 2^i
    = sum(i = 0 to n-1) of B × a_i × 2^i
a_i is equal to 0 or 1. If a_i = 0, the sum is not modified. If a_i = 1, we need to add B × 2^i.
So, we can simply deduce the multiplication algorithm
result = 0
for i in 0 to n-1
    if a[i] = 1     // a[i] is the ith bit of A
        result = result + B * 2^i
    end
end
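To make the idea concrete, here is a rough C sketch of the same scheme, with addition built only out of bitwise operations as the passage describes (the names add_bitwise and multiply_shift_add are just illustrative):

/* Sketch: grade-school addition on bits ("rippling" the carry), then
   multiplication as shift-and-add. */
unsigned int add_bitwise(unsigned int a, unsigned int b)
{
    while (b != 0)
    {
        unsigned int carry = a & b;   /* positions that generate a carry */
        a = a ^ b;                    /* sum without the carries */
        b = carry << 1;               /* ripple the carries one place left */
    }
    return a;
}

unsigned int multiply_shift_add(unsigned int x, unsigned int y)
{
    unsigned int result = 0;
    for (unsigned int k = 0; x != 0; k++, x >>= 1)
    {
        if (x & 1)                                   /* kth bit of x is 1 ... */
            result = add_bitwise(result, y << k);    /* ... so add (2^k) * y  */
    }
    return result;
}

/* multiply_shift_add(13, 9) == 117, matching the worked example above. */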
What is the overall idea behind this, how is it a "bit-by-bit" addition
It is just an application of the previous method, where you successively process every bit of the multiplier.
where does (2^k)y come from
As mentioned above, it comes from the way binary numbers are coded: if the ith bit is set, then there is a 2^i factor in that term of the decomposition of the number.
what does it mean by "left-shifting y by k"
Left shift means "pushing" the bits leftwards and filling the "holes" with zeroes. Hence if the number is 1101 and it is left shifted by three, it becomes 1101000.
This is the way to multiply the number by 2^i (just as "left shifting" a decimal number by two places and filling with zeroes is the way to multiply it by 100 = 10^2).
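A quick way to convince yourself of the equivalence in C, with values matching the book's example:

unsigned int y = 9;        /* 1001 in binary   */
unsigned int a = y << 2;   /* 100100 in binary */
unsigned int b = y * 4;    /* 9 * 2^2          */
/* a == b == 36: left-shifting by k multiplies by 2^k, as long as nothing overflows */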
In the example, why do we set result to (1001) just because the LSB of 13 is 1?
Because there is a 1 at the rightmost position, which corresponds to 2^0. So we left shift y by 0 (leaving it unchanged) and add it to the result, which is initialized to 0.

How to know when a float variable is going to stop increasing by 0.001?

I want to know how to determine at which value a float (or double) variable is going to stop increasing its value if I am increasing it by 0.001.
If we talk about the binary representation of the float value: 1 bit for the sign, 8 exponent bits and 23 bits for the mantissa. We know that when we reach a certain high value (32768) and then add a very small value (0.001), due to the excess-127 representation of the exponent, the result of the addition will be:
32768 + 0.001 = 32768
According to that, the variable will keep the same value even though we are adding 0.001.
The following code never breaks out of the loop.
float max = 100000;
float delta = 0.001F;
float time = 0;
while (time < max)
{
    time += delta;
    if (time == max)
        break;
}
Can someone help me determine an equation that tells when a variable is going to stop increasing? (Regardless of whether it is a float or a double; the idea is that it is a floating-point variable.)
Your addition will become idempotent (that is, the result will not change) once time gets large enough that its ULP (unit in the last place) is more than twice the size of your delta, so that adding delta rounds back to the same value.
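As a rough experiment (a sketch, not an exact analysis; the precise stall point also depends on the rounding mode), you can locate that threshold for delta = 0.001 like this:

#include <math.h>
#include <stdio.h>

int main(void)
{
    float delta = 0.001F;
    float time = 1.0F;

    /* Walk up by powers of two until adding delta no longer changes the value.
       The cast forces the sum to be rounded back to float before comparing. */
    while ((float)(time + delta) != time)
        time *= 2.0F;

    printf("additions become no-ops around %g\n", time);                    /* 32768 here */
    printf("ULP at that point: %g\n", nextafterf(time, INFINITY) - time);   /* ~0.0039    */
    return 0;
}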
If your time variable starts off greater than your max variable, it's REALLY simple: time is never going to equal max (say, if time started off at 215100 while max is 100000) as long as you are adding some positive number to time. Also, comparing floats with == is problematic due to floating-point imprecision.
To answer your question for an equation:
the addition will fail completely (have no effect at all), roughly, if
(log a) / (log 2) < (log b) / (log 2) - c
where
a is the small float you want to add,
b is the large float you are adding it to,
c is the length of the mantissa (23 for float).

complexity calculation and logic concept

I'm trying to find the complexity of the following code, and I don't know if my logic is right. Please correct me if I made a mistake.
1)
For i = 1 to N
    j = v
    j = j / 2
    k = i
    While k >= 1
        do some kind of processing
        k = k / 2 // integer division
2)
For i = 1 to N
    d = d / 2 // integer division
    k = i
    While k >= 1
        k = k - 1
This one should also be N * log N?
3)
For i = 1 to N
    call functiontwo(i)

functiontwo(x)
    if (x <= 0)
        return some value
    call functiontwo(x / 2)
This one should also be N * log N, or am I wrong? Because it is calling functiontwo, and functiontwo is log N?
Please let me know if I did it the right way, or give advice on how to figure out the loop logic better. Thank you.
(Disclaimer: I haven't done these in a while, but since nobody else has jumped in yet, my two cents are hopefully better than nothing.)
I believe your logic is sound on #1. The i loop should be O(N), and the j and k loops appear to be O(log N), making the overall complexity O(N log N).
I question your conclusion on #2, though. Since k is decremented by one instead of halved, it seems to me that the k loop would be O(N), making for O(N^2) overall.
Hmmm...#3 is weird. I see why your first thought is O(N log N). The division ordinarily would make it analogous to #1. Except... the first argument sent to functiontwo will be a positive value from the i loop. Since x > 0, it will then call functiontwo with half the original argument, which is still positive. Which will happen again, and again, etc. The mathematician in me starts to think that will never end. But I suppose one could argue that eventually you will reach the limit of the precision of your numeric data type and the result of x/2 will be so close to zero that the computer counts it as zero. In that case, I imagine O(N log N) would be accurate.
BTW, my answer for #3 is assuming that the x/2 is not integer division since you specified it for the others, but not for this one.
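If you want to sanity-check #1 and #2 empirically, a quick counting experiment (just a sketch; the variable names are illustrative) shows the difference between halving and decrementing the inner counter:

#include <stdio.h>

int main(void)
{
    long n = 1000, halving = 0, decrementing = 0;

    for (long i = 1; i <= n; i++)
    {
        for (long k = i; k >= 1; k /= 2)   /* like #1: inner loop is O(log i) */
            halving++;
        for (long k = i; k >= 1; k--)      /* like #2: inner loop is O(i)     */
            decrementing++;
    }

    printf("halving:      %ld (grows like N log N)\n", halving);
    printf("decrementing: %ld (grows like N^2 / 2)\n", decrementing);
    return 0;
}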

Three-way xor-like function

I'm trying to solve the following puzzle:
Given a stream of numbers (only 1 iteration over them is allowed) in which all numbers appear 3 times, except for 1 number that appears only 2 times, find this number using O(1) memory.
I started with the idea that, if all numbers appeared 2 times and 1 number only once, I could use the xor operation between all numbers and the result would be the unknown number.
So I want to extend this idea to solve the puzzle. All I need is a xor-like function (or operator), which would yield 0 on the third apply:
SEED xor3 X xor3 X xor3 X = SEED
X xor3 Y xor3 SEED xor3 X xor3 Y xor3 Y xor3 X = SEED
Any ideas for such a function?
Regard XOR as summation on each bit of a number expressed in binary (i.e. a radix of 2), modulo 2.
Now consider a numerical system whose digits (trits) are 0, 1, and 2. That is, it has a radix of 3.
The operator T now becomes an operation on any number decomposed into this radix. As in XOR, you sum the digits, but the difference is that operator T works modulo 3.
You can easily show that a T a T a is zero for any a. You can also show that T is both commutative and associative. That is necessary since, in general, your sequence will have the numbers jumbled up.
Now apply this to your list of numbers. At the end of the operation, the output will be b where b = o T o and o is the number that occurs exactly twice.
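If it helps to see the T operator concretely, here is a minimal C sketch of it as digit-wise addition modulo 3 with no carry (the name xor3 and the choice of unsigned long long are just for the sketch):

/* Digit-wise addition modulo 3, with no carry between base-3 digit positions. */
unsigned long long xor3(unsigned long long a, unsigned long long b)
{
    unsigned long long result = 0, place = 1;
    while (a != 0 || b != 0)
    {
        unsigned long long digit = (a % 3 + b % 3) % 3;   /* sum of base-3 digits, mod 3 */
        result += digit * place;
        a /= 3;
        b /= 3;
        place *= 3;
    }
    return result;
}

/* xor3(x, xor3(x, x)) == 0 for any x, and xor3 is commutative and associative. */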
Your solution for the simpler case (all numbers appear twice, one number appears once) works since xor operates on each bit x as
x xor x = 0 and 0 xor x = x
xor is basically a bit-wise summation modulo 2. You need the base-3 equivalent: transform each number into a base-3 representation, and then use summation modulo 3 for each digit:
xor3 | 0 1 2
-----+------
   0 | 0 1 2
   1 | 1 2 0
   2 | 2 0 1
Call this operation xor3. Now you have, for each digit x:
x xor3 x xor3 x = 0 and 0 xor3 x = x
If you apply that to all your numbers, then all values that appear 3 times will vanish. The result is x xor3 x of the number x that appears twice. To recover x, you need to apply digit-wise division by 2 modulo 3 (which is the same as multiplying each digit by 2 modulo 3, since 2 × 2 = 4 ≡ 1 mod 3).
I believe there are more efficient ways to implement that. The advantage of the xor function in the first case relies on the fact that xor is a natural base-2 operation. Is there any practical application for that?
This approach is a bit fragile: if the precondition (all numbers appear 3 times except one that appears twice) is broken, the algorithm will not help you.
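Putting the pieces together, a sketch of the whole procedure could look like this (it reuses the xor3 sketch from the answer above; find_twice and halve_digits_mod3 are illustrative names, and the stated precondition is assumed to hold):

#include <stddef.h>

unsigned long long xor3(unsigned long long a, unsigned long long b);   /* from the sketch above */

/* Recover x from x xor3 x by "dividing" each base-3 digit by 2 modulo 3.
   Since 2 * 2 = 4 = 1 (mod 3), dividing by 2 is the same as multiplying by 2. */
unsigned long long halve_digits_mod3(unsigned long long v)
{
    unsigned long long result = 0, place = 1;
    while (v != 0)
    {
        result += ((v % 3) * 2 % 3) * place;
        v /= 3;
        place *= 3;
    }
    return result;
}

unsigned long long find_twice(const unsigned long long *stream, size_t len)
{
    unsigned long long acc = 0;                /* O(1) extra memory */
    for (size_t i = 0; i < len; i++)
        acc = xor3(acc, stream[i]);            /* values appearing 3 times cancel to 0 */
    return halve_digits_mod3(acc);             /* acc is x xor3 x; recover x */
}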
Take a Map with int keys and int values. Then walk through your numbers and, for each number x, increase the corresponding value by one; if x is a new key, start from 0.
Then you can analyze it easily: walk through all keys and check the cardinality. It should be three for all keys except one, for which it should be two. This is more robust, and my gut feeling says it is also faster.

Evenly distributed random numbers relatively prime to 2

A specific example
I need to generate a random number between 0 and 2, inclusive. (or choose randomly between -1, 0, and 1).
The naive approach would be to do something like rand() mod 3, where rand() returns an integer. This approach will not generate uniformly distributed numbers unless the number of values rand() can return is a multiple of 3 (and the lower bound is 0).
For instance, assuming rand() returned 2 bits (values from 0 to 3, inclusive), the modulo operation would map:
0 -> 0
1 -> 1
2 -> 2
3 -> 0
This skew toward 0 would obviously be much smaller if more bits were returned, but regardless, some skew would remain.
The generic question
Is there a way of generating an evenly distributed random number between 0 and n-1, inclusive, where n is relatively prime to 2?
A common approach is to discard random values above the last full cycle, and just ask for a new random number.
It might help to choose your rand() upper bound to be k*n, where k is an integer. That way the outcome will be evenly distributed, provided that rand() is a good random generator.
If it's not possible to change the upper bound, you can pick k so that k*n is as close to rand()'s upper bound as possible, discard any results above this number, and try again.
See my answer to a similar question.
Basically, use your RNG, discard everything at or above n, and try again. As an optimization, you can use mod and only discard everything at or above n * floor(MAX / n).
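For example, a C sketch of the discard-and-retry idea, assuming rand() itself is uniform on [0, RAND_MAX] (uniform_mod is an illustrative name):

#include <stdlib.h>

/* Uniform integer in [0, n-1] by rejection: values in the final partial
   cycle of size n are thrown away so every residue class is equally likely. */
int uniform_mod(int n)
{
    unsigned long range = (unsigned long)RAND_MAX + 1;
    unsigned long limit = range - (range % (unsigned long)n);   /* last full cycle ends here */
    unsigned long r;

    do
    {
        r = (unsigned long)rand();
    } while (r >= limit);   /* reject the partial cycle and try again */

    return (int)(r % (unsigned long)n);
}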
Generic Answer: You need to use more than just 2 bits of the number.
My rule of thumb is to generate floating-point values, x, 0.0 <= x < 1.0, multiply by 3 and truncate. That should get you values in the range 0, 1 and 2 that depend on a larger number of bits.
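A sketch of that rule of thumb in C (it is still only as good as rand() itself, and a tiny bias remains because the number of possible rand() outcomes is not a multiple of 3):

#include <stdlib.h>

/* Scale a value in [0.0, 1.0) up to [0.0, 3.0) and truncate to 0, 1 or 2. */
int rand_0_to_2(void)
{
    double x = rand() / ((double)RAND_MAX + 1.0);   /* 0.0 <= x < 1.0 */
    return (int)(3.0 * x);                          /* 0, 1, or 2 */
}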