Meaning of removing first digit from a binary number? - binary

If you have a binary number say 1010 (which is 10 in base 10), is saying that dividing by two will remove the first digit (making it end up as 010), true?
Basically how do you remove the first digit (i.e. if the binary number is 0 or 1, then it will end up as nothing)? I don't want code or anything, I just want to know like something like you divide or multiply by two.
Also do not consider any of the left most zeroes, of a binary number.

It works the same way as it does in base ten. The number 401, without its first digit, is 1. You've subtracted 400, no? Now, to divide by ten, you would SHIFT the digits to the right. 401 shifted right is 040. 401/10 = 40. Note that the 1 is discarded because we're working with integer division.
So in binary, it's exactly the same, but with powers of 2. Removing the first bit does not DIVIDE by two. It SUBTRACTS the value of its position. So 101b (which is 4+1 = 5), without its largest bit, is 001b, or 1 decimal. It's subtraction: 5 - 4 = 1.
To divide by two, you shift the bits to the right, just like in base 10. So 101b would become 010b, which is 2 decimal. 5/2 == 2 (we're dropping the fractional part since it's integer division)
Make sense? If you're ever confused about binary, just think of how the digits & positions work in base ten, and instead of powers of ten, use powers of two.

If by "first digit" you mean "first significant digit", then what you're looking for is something like number and not (1 shl (int(log number / log 2))), where and and not are the bitwise operations, shl means shift left, and int is rounding down (never up) to integer. log is just a logarithm, in any base (same base for both cases).
If by "first digit" you mean the digit in some nth position (let the rightmost position be 0, counting to the left), then you just do number and not (1 shl position).
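For anyone who does want to try this, both formulas above can be sketched in Python. This is only an illustration: the function names are made up here, and `int.bit_length()` stands in for `int(log number / log 2)` so that no floating point is involved.

```python
def clear_leading_bit(number: int) -> int:
    """Clear the most significant set bit of a non-negative integer."""
    if number == 0:
        return 0  # there is no set bit to remove
    top = number.bit_length() - 1   # position of the highest set bit
    return number & ~(1 << top)     # "number and not (1 shl top)"


def clear_bit(number: int, position: int) -> int:
    """Clear the bit at `position`, counting from 0 at the rightmost bit."""
    return number & ~(1 << position)


print(bin(clear_leading_bit(0b1010)))  # 0b10  (1010 - 1000 = 0010)
print(clear_leading_bit(0b101))        # 1     (5 - 4)
print(0b101 >> 1)                      # 2     (5 // 2, a right shift instead)
```

The last line shows the contrast with dividing by two: a right shift drops the rightmost bit, while clearing the leading bit subtracts that bit's place value.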

Removing a digit is like changing it to 0. Changing 1010 to 0010 is accomplished by subtracting 1000: 1010 - 1000 = 0010.

Related

Theory behind multiplying two numbers without operands

I have been reading Elements of Programming Interviews and am struggling to understand the passage below:
"The algorithm taught in grade-school for decimal multiplication does
not use repeated addition- it uses shift and add to achieve a much
better time complexity. We can do the same with binary numbers- to
multiply x and y we initialize the result to 0 and iterate through the
bits of x, adding (2^k)y to the result if the kth bit of x is 1.
The value (2^k)y can be computed by left-shifting y by k. Since we
cannot use add directly, we must implement it. We can apply the
grade-school algorithm for addition to the binary case, i.e, compute
the sum bit-by-bit and "rippling" the carry along.
As an example, we show how to multiply 13 = (1101) and 9 = (1001)
using the algorithm described above. In the first iteration, since
the LSB of 13 is 1, we set the result to (1001). The second bit of
(1101) is 0, so we move on to the third bit. The bit is 1, so we shift
(1001) to the left by 2 to obtain (100100), which we add to (1001) to
get (101101). The fourth and final bit of (1101) is 1, so we shift
(1001) to the left by 3 to obtain (1001000), which we add to (101101)
to get (1110101) = 117.
My Questions are:
What is the overall idea behind this, how is it a "bit-by-bit" addition
where does (2^k)y come from
what does it mean by "left-shifting y by k"
In the example, why do we set result to (1001) just because the LSB of 13 is 1?
The algorithm relies on the way numbers are coded in binary.
Let A be an unsigned number. A is coded by a set of bits $a_{n-1} a_{n-2} \dots a_0$ in such a way that $A = \sum_{i=0}^{n-1} a_i \times 2^i$.
Now, assume you have two numbers A and B coded in binary and you want to compute $A \times B$:
$$B \times A = B \times \sum_{i=0}^{n-1} a_i \times 2^i = \sum_{i=0}^{n-1} B \times a_i \times 2^i$$
Each $a_i$ is equal to 0 or 1. If $a_i = 0$, the sum will not be modified. If $a_i = 1$, we need to add $B \times 2^i$.
So, we can simply deduce the multiplication algorithm:
result = 0
for i in 0 to n-1
    if a[i] = 1    // assumes a[i] is the ith bit
        result = result + B * 2^i
    end
end
What is the overall idea behind this, how is it a "bit-by-bit" addition
It is just an application of the previous method, where you process every bit of the multiplier in turn.
where does (2^k)y come from
As mentioned above, from the way binary numbers are coded. If the ith bit is set, then there is a 2^i in the decomposition of the number.
what does it mean by "left-shifting y by k"
Left shift means "pushing" the bits leftwards and filling the "holes" with zeroes. Hence if number is 1101 and it is left shifted by three, it becomes 1101000.
This is the way to multiply the number by 2^i (just as shifting a decimal number left by 2 places and filling in zeroes on the right is the way to multiply by 100 = 10^2).
In the example, why do we set result to (1001) just because the LSB of 13 is 1?
Because there is a 1 at the rightmost position, which corresponds to 2^0. So we left shift by 0 and add it to the result, which is initialized to 0.
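The whole procedure can be sketched in Python using only bitwise operators. This is a rough sketch rather than the book's code: the function names are made up, the inputs are assumed to be non-negative, and `add` propagates all carries at once on each pass instead of rippling one column at a time as the passage describes.

```python
def add(a: int, b: int) -> int:
    """Add two non-negative integers using only bitwise operations."""
    while b != 0:
        carry = (a & b) << 1   # columns where both bits are 1 produce a carry
        a = a ^ b              # column sums, ignoring the carries
        b = carry              # add the carries in on the next pass
    return a


def multiply(x: int, y: int) -> int:
    """Shift-and-add multiplication of two non-negative integers."""
    result = 0
    while x != 0:
        if x & 1:                    # is the current lowest bit of x set?
            result = add(result, y)  # y has been shifted left k times, so it is (2^k) * y
        x >>= 1                      # move on to the next bit of x
        y <<= 1                      # keep y in step with the bit position
    return result


print(multiply(13, 9))  # 117
```

Shifting `x` right and `y` left on every iteration avoids keeping a separate counter k, which would itself need an addition.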

how to convert negative fraction decimal to binary

I would like to ask about how to represent negative fraction decimal numbers to signed binary. for example what is -0.5 in binary.
First, treat the negative number as positive. After finding the integer and fraction parts separately, you can convert the result to negative by changing all 1s to 0s and all 0s to 1s. Finally, you need to add 1.
For example, to convert -5.75 to binary, we first need to decide the position of the point and the length of the binary number. Let's define the length of the number as 8 bits: 4 bits for the integer part and 4 bits for the fraction part.
First find the binary representation of (positive) 5.75:
5=0101
0.75=.1100 then
5.75=0101.1100 in 8 bits
To negate it, the procedure is standard: invert all bits and add 1 at the least significant bit.
number: 0101.1100
inversion: 1010.0011
addition of 1: 1010.0011 + 0.0001=1010.0100
Let's check whether our representation is correct:
1010.0100 => -8 + 2 + 0.25 = -5.75
That's all.
I hope you find this helpful.
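The same steps can be sketched in Python. This is only an illustration under the layout chosen above (4 integer bits, 4 fraction bits); the function name and parameters are made up, and masking with `&` is used as a shortcut that gives the same bits as "invert and add 1".

```python
from fractions import Fraction


def to_fixed_twos_complement(value, int_bits=4, frac_bits=4):
    """Encode `value` as a two's complement fixed-point bit string."""
    total_bits = int_bits + frac_bits
    scaled = Fraction(value) * (1 << frac_bits)   # move the binary point to the right end
    if scaled.denominator != 1:
        raise ValueError("value needs more fraction bits")
    n = int(scaled)
    if not -(1 << (total_bits - 1)) <= n < (1 << (total_bits - 1)):
        raise ValueError("value does not fit in the chosen width")
    # Masking a negative integer yields its two's complement bit pattern.
    bits = format(n & ((1 << total_bits) - 1), f"0{total_bits}b")
    return bits[:int_bits] + "." + bits[int_bits:]


print(to_fixed_twos_complement(5.75))   # 0101.1100
print(to_fixed_twos_complement(-5.75))  # 1010.0100
print(to_fixed_twos_complement(-0.5))   # 1111.1000
```

The last line covers the -0.5 from the question: 1111.1000 is -8 + 4 + 2 + 1 + 0.5 = -0.5.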
The below deals with converting decimal to binary. If you want to make this negative, then just add a minus sign after you are done. (e.g. convert 25.5 to binary 11001.1, so -25.5 is -11001.1)
If you want to deal with signed binary (something different) then go to the source I linked and read more about that. The source has an online calculator for you to use.
Source: http://www.realbinaryconverter.com/index.php
Converting Decimal Fractions to Binary Fractions
Taking a number like 436.656625 and trying to convert to binary is an easy process. Let's not concern ourselves here with converting the whole integer part (436) because that has been covered at the linked site. All we have to do now is convert the decimal fraction part (.656625) and then combine it with the binary whole integer part.
Multiply 0.656625 by 2: 0.656625*2=1.31325
Keep note of that 1 from the whole part
Remove the 1. Now we have 0.31325
Multiply by 2 again: 0.31325*2=0.6265
Keep note of that 0 from the whole part (there is no 1 to remove this time)
Multiply by 2: 0.6265*2=1.253
Keep note of that 1 from the whole part
Remove the 1. Now we have 0.253
Multiply by 2: 0.253*2=0.506
Keep note of that 0 from the whole part
Multiply by 2: 0.506*2=1.012
Keep note of that 1 from the whole part
Remove the 1. Now we have 0.012
Repeat this process of: multiplying by two and noting the resulting whole integer part. If the whole integer part is 1, note the one then remove the one and continue this process.
...
In the end, our binary fraction comes out as .10101000000110... (and it goes on). I will now demonstrate with a simpler result. Converting 0.8125 to binary.
0.8125*2=1.625, we get 1
0.625*2=1.25, we get 1
0.25*2=0.5, we get 0
0.5*2=1.0, we get 1
Nothing left!
Now take the numbers we got and place them after the decimal point in the order we got them. 0.8125 = 0.1101 in binary.
Now to answer your specific question
Multiply 0.5 by 2: 0.5*2=1.0
Keep note of that 1 from the whole part
Remove the 1. Now we have 0
Nothing left!
Now we take the numbers we got and place them after the decimal point in the order we got them. 0.5 = 0.1 in binary.
So -0.5 = -0.1 in binary. You can check this at the source provided.
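The multiply-by-two procedure above can be sketched in Python. The function name and the `max_bits` cutoff are made up for this illustration, and `fractions.Fraction` is used so the decimal input is handled exactly.

```python
from fractions import Fraction


def fraction_to_binary(value: str, max_bits: int = 14) -> str:
    """Convert a decimal fraction in [0, 1), given as a string, to binary digits."""
    frac = Fraction(value)
    if not 0 <= frac < 1:
        raise ValueError("expected a value in [0, 1)")
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        if frac >= 1:
            bits.append("1")   # note the 1 from the whole part...
            frac -= 1          # ...and remove it
        else:
            bits.append("0")
    digits = "".join(bits) if bits else "0"
    return "0." + digits


print(fraction_to_binary("0.8125"))    # 0.1101
print(fraction_to_binary("0.5"))       # 0.1
print(fraction_to_binary("0.656625"))  # 0.10101000000110 (cut off after 14 bits)
```

The input is a string on purpose: passing the float 0.656625 would convert the value to binary before the function ever sees it.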
Just like decimal, digits to the left of the decimal point are rising powers of the base, and digits to the right are reducing powers.
So 124.71 is:
10^2   10^1   10^0   10^-1   10^-2
100  +  20  +  4   +  7/10 +  1/100
With binary, it's the same, except the rising powers are 1, 2, 4, 8, ... rather than 1, 10, 100, 1000, ....
And the reducing powers aren't 1/10, 1/100, 1/1000, ... but 1/2, 1/4, 1/8, ....
So -0.5 is an easy one since it consists of only the 1/2 place. In binary it is -0.1:
-                   means it will be negative.
0 x 2^0  (1)   =  0
1 x 2^-1 (1/2) =  0.5
                 ----
                 -0.5
More complex numbers are built by summing the individual places. For example, decimal 65.625 is:
1 x 2^6      = 64
0 x 2^5
0 x 2^4
0 x 2^3
0 x 2^2
0 x 2^1
1 x 2^0 (1)  =  1
.
1 x 2^-1     =  0.5
0 x 2^-2
1 x 2^-3     =  0.125
^              ------
|              65.625
|
+-> 1000001.101
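Summing the places can be checked with a few lines of Python. This is a sketch with a made-up function name; it treats a leading minus sign the same way as above, as a sign in front of the magnitude rather than as two's complement.

```python
from fractions import Fraction


def binary_to_value(text: str) -> Fraction:
    """Sum the place values of a binary string such as '1000001.101'."""
    sign = -1 if text.startswith("-") else 1
    whole, _, frac = text.lstrip("+-").partition(".")
    total = Fraction(0)
    for power, bit in enumerate(reversed(whole)):   # 2^0, 2^1, 2^2, ...
        total += int(bit) * Fraction(2) ** power
    for place, bit in enumerate(frac, start=1):     # 1/2, 1/4, 1/8, ...
        total += int(bit) * Fraction(1, 2 ** place)
    return sign * total


print(float(binary_to_value("1000001.101")))  # 65.625
print(float(binary_to_value("-0.1")))         # -0.5
```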
It is a relatively simple operation to convert a negative decimal(base-10) number to binary(base-2).
I'll take -10 for example.
The first thing you should do is represent positive 10 in binary, with a leading zero as the sign bit: 01010.
The next step is to flip all the bits, which means changing the zeros to ones and vice versa: 10101.
The final step is to add 1 to 10101, which results in 10110.

signed arithmetic on multiword numbers?

Say I have a 4 bit ALU, I have a carry flag, overflow flag, and a sign flag(MSB). How would I go about subtracting for example, two signed 8 bit numbers? I take the lower nibble of both numbers and subtract them right, but I don't understand how to know if there needs to be a 5th bit, and carry that over to the LSB of the high nibble of the number, and if so, how to add it considering I am doing this in 2's complement so I already have Carryin being used.. Any help would be appreciated.
This has been asked and answered here many times.
You know the rule about twos complement yes? Invert and add one. Also from grade school
a + b = a + (-b).
We don't have subtract hardware; we have add hardware. What you do is a + (-b). Also from grade school we learned about carrying: 9 + 3 = 2, carry the 1. From the second column on we have either two or three operands to add together (a + (-b) + c, c being the carry in). If you think about it, we can have a carry in on every column; sometimes it is zero. That is how the hardware works: each column is three in, two out. The inputs are carry in[n], a[n], and b[n]; the outputs are result[n] and carry out[n]. And as we know from grade school, the carry out of this column is the carry in of the next column. So for a normal add the carry in of the least significant bit is always zero, but for subtract we want to invert and add one, so we invert b and change the carry in of that first bit to 1, which is the same as
a + (~b) + 1 which equals a + (-b) which equals a - b.
As far as addition and subtract hardware is concerned there is no such thing as signed or unsigned add or subtract. There does exist an unsigned overflow (carry out of the msbit) and a signed overflow (true if carry in and carry out of the msbit are not the same, false if they match).
This works for any number of bits. For example, if you have 8-bit hardware but want to do math on 256-bit numbers, just do them 8 bits at a time and apply the carry out to the next 8 bits (using an add with carry or subtract with borrow instruction). Visualize the single columns one at a time: 4 bits is just four of those columns, and the same goes for 8 bits, 9 bits, 37 bits, etc. You can take any of those larger numbers and draw a vertical line anywhere, separating it into two operations. All you have to do is what you do for single columns: the carry out of the msbit of the part on the right of the dividing line becomes the carry in of the lsbit of the part on the left. Apply this to 8-bit math with 4-bit hardware...
So a subtract is an add with a carry in of 1 and the second operand inverted. Some hardware inverts the carry out (unsigned overflow) on a subtract so that it becomes 1 for borrow and 0 for no borrow. Some doesn't. You have to know which your hardware does if you don't have a subtract with borrow instruction. If you do have a subtract with borrow, it doesn't matter: hardware that inverts the carry out on a subtract will generally also invert the carry in on a subtract with borrow, and hardware that doesn't invert one won't invert the other. But if you have to use an add with carry to simulate a subtract with borrow, you may need to invert not only the second operand but also the carry bit. If you don't have an add with carry, then you have to simulate that as well, by simply adding 1 or not.
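To make the 8-bit case with 4-bit hardware concrete, here is a small Python model. It is a sketch under stated assumptions: `alu4_add` and `sub8` are made-up names, and the carry flag is modeled as the raw carry out of the adder (1 means no borrow), not the inverted borrow flag that some hardware produces.

```python
def alu4_add(a: int, b: int, carry_in: int):
    """Model a 4-bit adder. Returns (result, carry_out, signed_overflow)."""
    total = a + b + carry_in
    result = total & 0xF
    carry_out = total >> 4
    # Signed overflow: the carry into the top bit differs from the carry out of it.
    carry_into_msb = ((a & 0x7) + (b & 0x7) + carry_in) >> 3
    return result, carry_out, carry_into_msb ^ carry_out


def sub8(a: int, b: int):
    """Compute a - b on 8-bit values with two passes through the 4-bit adder."""
    a &= 0xFF
    not_b = ~b & 0xFF                                   # invert the second operand
    lo, carry, _ = alu4_add(a & 0xF, not_b & 0xF, 1)    # carry in of 1 is the "add one"
    hi, carry, overflow = alu4_add(a >> 4, not_b >> 4, carry)  # low carry out feeds the high nibble
    return (hi << 4) | lo, carry, overflow


print([hex(v) for v in sub8(0x35, 0x17)])  # ['0x1e', '0x1', '0x0']  53 - 23 = 30
print([hex(v) for v in sub8(0x10, 0x20)])  # ['0xf0', '0x0', '0x0']  16 - 32 = -16
print([hex(v) for v in sub8(0x80, 0x01)])  # ['0x7f', '0x1', '0x1']  -128 - 1 overflows
```

Only the flags from the high nibble matter for the 8-bit result; the low nibble's overflow flag is discarded.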

Calculate which numbers cause problems when converting decimal to floating point?

I know that when converting some numbers from decimal to floating point there can be problems.
For example: 1.4 results in 1.39999. 6.2 is 6.1999999, 6.6 is 6.599999, etc.
Is there a way to calculate which numbers will cause these problems? Like create some sort of spreadsheet or database with numbers from 1-50,000 which don't convert exactly?
The errors in floating point calculations are rooted in the way floating point numbers are stored, and their size depends on the precision you are using (usually single (float) or double). Double-precision floats take more space but carry more than twice as many significant bits (53 versus 24, or roughly 15-16 decimal digits versus about 7).
Floating point numbers are typically stored in an IEEE format and thus only the most significant bits are included.
In a string of bits stored in this way to represent a floating point number, there are different parts that comprise the number. First, there is a sign bit indicating whether the number is positive or negative. Next, the exponent portion of the number is stored (in single-precision floats, this takes 8 bits). After the exponent, all remaining bits represent the significant digits of the number. Thus, the bits 1 10000000 01000000000000000000000 represent -2.5. The sign bit (first one) signifies that the number is negative. The next eight bits are the exponent. The exponent is stored with a bias so that small numbers may be stored precisely in addition to large ones: the exponent used for calculations is the exponent bits interpreted as an integer (in our case 128) minus 127 (for an 8-bit exponent), which gives 1 here. The remaining bits are the fraction. For normal numbers an implicit leading 1 occupies the ones place and is not stored; the stored bits start at the 1/2 place and each is worth half the previous one moving rightward (1 is 1/2, 01 is 1/4, 001 is 1/8). In our case the significand is 1 + 1/4 = 1.25, so the total is -2.5 (-1 for the sign bit * 2^1 for the exponent * 1.25 for the significand). By the same rule, -0.5 is 1 01111110 00000000000000000000000 (-1 * 2^-1 * 1.0).
For further example, here is a converter that uses checkboxes to indicate the bits. At worst, you could write a bot to test all of the combinations you care about.
In general, if a number cannot be written as a finite sum of powers of two, it will be rounded. (If you can't get it with 1/2 + 1/4 + 1/8 etc., it will be rounded.)
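Any 32-bit pattern can also be checked with Python's `struct` module instead of a checkbox converter. A minimal sketch, assuming IEEE 754 single precision; `decode_float32` is a made-up helper name.

```python
import struct


def decode_float32(bits: str):
    """Split a 32-bit pattern (spaces allowed) into its fields and its value."""
    raw = int(bits.replace(" ", ""), 2)
    sign = raw >> 31
    exponent = (raw >> 23) & 0xFF     # stored with a bias of 127
    fraction = raw & 0x7FFFFF         # 23 stored bits, implicit leading 1 not included
    value = struct.unpack(">f", raw.to_bytes(4, "big"))[0]
    return sign, exponent, fraction, value


print(decode_float32("1 10000000 01000000000000000000000"))  # (1, 128, 2097152, -2.5)
print(decode_float32("1 01111110 00000000000000000000000"))  # (1, 126, 0, -0.5)
```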
Almost all numbers are troublesome. The minor exception is those that, when repeatedly multiplied by 2, lose their fractional part and end up less than 2^24 (for single precision).
e.g.
0.125 -> 0.25 -> 0.5 -> 1.0 ! OK
6.4 --> 12.8 --> 25.6 -> 51.2 -> 102.4 : we got a cycle! (.4 .8 .6 .2 ... )
EDIT
Given the purpose/context of the issue, the inexactness seems to cause trouble when the floating point is rounded towards zero.
e.g.
10.2 == 10.199999809, when the next floating point would be
next == 10.200000763 <-- the difference to the wanted value is ~4 times higher
vs.
10.3 == 10.300000191, when the previous (rounded down) fp would be
prev == 10.299999237, <-- this would also have been ~4x further away
Perhaps it's time to talk to the programmer of the CNC driver...
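To build the kind of list the question asks for, one could test each decimal string directly. The sketch below assumes single precision and uses made-up function names; it rounds through a double first, which does not affect the exactness test.

```python
import struct
from fractions import Fraction


def float32_nearest(text: str) -> Fraction:
    """Exact value of the single-precision float obtained from a decimal string."""
    as_single = struct.unpack(">f", struct.pack(">f", float(text)))[0]
    return Fraction(as_single)


def is_exact_in_float32(text: str) -> bool:
    """True if the decimal string survives conversion to float32 unchanged."""
    return float32_nearest(text) == Fraction(text)


# Which one-decimal values from 1.0 to 2.0 convert exactly?
tenths = [f"{k // 10}.{k % 10}" for k in range(10, 21)]
print([t for t in tenths if is_exact_in_float32(t)])  # ['1.0', '1.5', '2.0']

# Direction of the rounding error for the two examples above.
print(float32_nearest("10.2") < Fraction("10.2"))  # True: stored value is slightly low
print(float32_nearest("10.3") > Fraction("10.3"))  # True: stored value is slightly high
```

Extending the range to 1-50,000 is a matter of changing the loop bounds; with one decimal digit, only values ending in .0 or .5 come out exact.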

Binary subtraction : 2's complement & carry

I want to subtract 1 from the number with binary representation 1010 1101. I write the two's complement of 1, which is 1111 1111, and add it to the first number:
Bitwise addition with carry gives 1 1010 1100; because of the carry, I end up with one extra bit. How is this dealt with in binary addition?
Also, am I right to use two's complement to do the subtraction?
Thanks.
That is an entirely valid and common way to do subtraction, but the 'carry' flag doesn't mean the same thing that it does for normal addition. Since instead of subtracting n, you're adding a large number, the carry flag needs to be handled differently. That extra 1 would usually signify a carry in bitwise addition, whereas here it signifies that everything worked out right. If there wasn't a carry there, it actually means that the result should have been negative - a - b was converted to a + 2^n - b which was less than 2^n, meaning that b > a and so a - b < 0. Either way, it doesn't matter as your result will show up correctly within the 8 bits of your result.
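The example from the question can be reproduced in a few lines of Python. This is a sketch with a made-up function name, assuming 8-bit values; it adds the inverted operand plus 1 in a single step, the way an adder with a carry in would.

```python
def subtract_8bit(a: int, b: int):
    """Return (8-bit result, carry out) of a - b computed as a + ~b + 1."""
    total = (a & 0xFF) + (~b & 0xFF) + 1
    return total & 0xFF, total >> 8   # the ninth bit is the carry out, not part of the result


result, carry = subtract_8bit(0b10101101, 1)
print(format(result, "08b"), carry)   # 10101100 1  (carry set: result is not negative)

result, carry = subtract_8bit(1, 2)
print(format(result, "08b"), carry)   # 11111111 0  (no carry: b > a, result is -1)
```

In both cases the low 8 bits are the answer; the extra bit is simply dropped from the result and kept only as a flag.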