Least Significant Bit if there is a binary point

If I have eight binary positions for instance:
7 6 5 4 3 2 1 0
it's clear what the MSB and LSB are: the Most Significant Bit is the leftmost position (7) and the Least Significant Bit the rightmost position (0).
But what if I have a binary point (that is, a fraction, speaking in decimal terms), e.g.
7 6 5 4 3 2 1 0 -1 -2 -3
What could one say is the least significant bit here: is it the bit closest to the binary point, or the one at position -3?
It's regarding floating-point numbers.

Well, in any case, it is the "last digit".
From a conceptual point of view, you can just think in decimal numbers:
which digit is the least interesting, that is, the least important if it changes, in
123.45 $ ?
Clearly, if I change the 5 I'm changing the "cents", which are the least important part.
As in 12,345 $, the least important is still the "5".
Though in 12,345.00 $, the least important is the last 0.
You're talking about decimal points.
Do you mean a fixed representation, or a floating-point number?
If it is a fixed representation, then how do you know where the decimal point is? Is it a convention somewhere?
Because both 123.45 and 12,345 could, in decimal, be represented by "12345": in the first case assuming that we have two-decimal-digit numbers, in the second assuming that we have integers.
If you are talking about floating point, then the notion of "least significant digit" would mean the least significant bit of the mantissa.
But the "least significant bit" of a sequence of bits, whatever it means, is always just the "last" one (or the "first" one, depending on conventions).
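For instance, treating a bit pattern as a plain integer (a minimal Python sketch; any language with integer bit operations works the same way):

# Minimal sketch: LSB and MSB of a bit pattern treated as a plain integer.
n = 0b10110100

lsb = n & 1                        # least significant bit: the rightmost one
msb_position = n.bit_length() - 1  # index of the most significant set bit

print(lsb)           # 0
print(msb_position)  # 7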
EDIT
For floating-point numbers, you have to remember that the representation is a complex thing.
I mean complex in the sense that you have several "numbers" packed into the representation in bits.
To take the "usual" 64-bit floating point (see http://en.wikipedia.org/wiki/Double-precision_floating-point_format):
you have the first bit for the sign, the next 11 are an integer that is the "exponent", and the remaining 52 are the "mantissa" ("fractional part"), interpreted as 1.something_meant_by_your_mantissa_bits.
The last (64th) bit is the 52nd (last) bit of the mantissa, thus the least significant bit of the corresponding number. But the 12th bit is also a last bit: the last one of the 11-bit exponent.
I'll try an example with decimal digits instead (it is a bit simpler, and deliberately different from the IEEE "double precision").
Let's say we write 8-digit decimal numbers with a 5-digit "mantissa" and a 3-digit exponent.
In "1.2345 E 678" (meaning 1.2345 times ten to the 678th),
the "5" is the "least important digit".
If we "pack" the number like the IEEE double, "3-digit exponent then 5-digit mantissa", we would have
67812345
so the last digit of the big block is indeed the least significant digit of the number.
But if we had another packing convention, for example if the floating-point number were packed as "5-digit mantissa then 3-digit exponent",
that is
12345678
then in our 8-digit block the least significant digit is actually the 5th one and not the last (8th) one:
the least significant digit is not the last one.
It strongly depends on your convention of what your numbers (bits) mean.
Usually we talk about least significant bits of bytes or words that carry a meaning like a memory address, or are just a raw group of bits; in those cases you can think of them as integer numbers anyway.
For convenience, I guess, the IEEE standard for floating-point numbers puts the mantissa at the end of the group, so that the last bit, which is the least significant bit of the number, corresponds to the usual programming sense of "least significant bit of the group seen as one big pack of bits".
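As a small sketch of that binary64 layout (using Python's standard struct module; the field widths are the 1 + 11 + 52 described above):

import struct

# Split a binary64 double into its sign / exponent / fraction fields.
def decompose(x):
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]  # raw 64-bit pattern
    sign = bits >> 63                  # 1 bit
    exponent = (bits >> 52) & 0x7FF    # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)  # 52-bit "mantissa" field
    return sign, exponent, fraction

sign, exponent, fraction = decompose(-2.5)
print(sign, exponent - 1023, bin(fraction))
# 1 1 0b10...0  -> -1.01_2 * 2^1 = -2.5; the 64th bit of the pack is the
# last fraction bit, the least significant bit of the number.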


Representing numbers with IEEE754 with Round to Nearest Even

I'm currently learning about IEEE754 standard and rounding, and I have an exercise which is the following:
Add -325.875 to 0.546875 in IEEE754, but with 3 bits dedicated to the mantissa instead of 23.
I'm having a lot of trouble doing this, especially representing the intermediary values and the guard/round/sticky bits. Can someone give me a step-by-step solution to the problem?
My biggest problem is that obviously I can't represent 0.546875 as 0.100011 as that would have more precision than the system has. So how would that be represented?
Apologies if the wording is confusing.
Preliminaries
The preferred term for the fraction portion of a floating-point number is “significand,” not “mantissa.” “Mantissa” is an old word for the fraction portion of a logarithm. Mantissas are logarithmic; adding to the mantissa multiplies the number represented. Significands are linear; adding to the significand adds to the number represented (as scaled by the exponent).
When working with a significand, use its mathematical precision, not the number of bits in the storage format. The IEEE-754 binary32 format has 23 bits in its primary field for the encoding of a significand, but another bit is encoded via the exponent field. Mathematically, numbers in the binary32 format behave as if they have 24 bits in their significands.
So, the task is to work with numbers with four bits in their significands, not three.
Work
In binary, −325.875 is −101000101.111₂. In scientific notation, that is −1.01000101111₂ • 2⁸. Rounding it to four bits in the significand gives −1.010₂ • 2⁸.
In binary, 0.546875 is .100011₂. In scientific notation, that is 1.00011₂ • 2⁻¹. Rounding it to four bits in the significand gives 1.001₂ • 2⁻¹. Note that the first four bits are 1000, but they are immediately followed by 11, so we round up: 1.00011 is closer to 1.001 than it is to 1.000.
So, in a floating-point format with four-bit significands, we want to add −1.010₂ • 2⁸ and 1.001₂ • 2⁻¹. If we adjust the latter number to have the same exponent as the former, we have −1.010₂ • 2⁸ and 0.000000001001₂ • 2⁸. To add those, we note the signs are different, so we want to subtract the magnitudes. It may help to line up the digits as we were taught in elementary school:
1.010000000000
0.000000001001
——————————————
1.001111110111
Thus, the mathematical result would be −1.001111110111₂ • 2⁸. However, we need to round the significand to four bits. The first four bits are 1001, but they are followed by 11, so we round up, producing 1010. So the final result is −1.010₂ • 2⁸.
−1.010₂ • 2⁸ is −1.25 • 2⁸ = −320.
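A quick cross-check of that arithmetic (a sketch using Python's exact fractions module; round_sig is a hypothetical helper written just for this, implementing round-to-nearest, ties-to-even, on a 4-bit significand):

from fractions import Fraction

# Round a nonzero value to a 4-bit significand, round-to-nearest, ties-to-even.
def round_sig(x, bits=4):
    sign = -1 if x < 0 else 1
    mag = abs(x)
    e = 0
    while mag >= 2:              # normalize so that 1 <= mag < 2
        mag /= 2
        e += 1
    while mag < 1:
        mag *= 2
        e -= 1
    scaled = mag * 2 ** (bits - 1)   # significand as integer part plus remainder
    n = int(scaled)
    rem = scaled - n
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and n % 2 == 1):
        n += 1                       # round up; ties go to the even significand
    return sign * Fraction(n, 2 ** (bits - 1)) * Fraction(2) ** e

a = round_sig(Fraction('-325.875'))   # -1.010_2 * 2^8  = -320
b = round_sig(Fraction('0.546875'))   #  1.001_2 * 2^-1 = 9/16
print(a, b, round_sig(a + b))         # -320 9/16 -320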

How does exponent bias make comparison easier

I'm reading this article about exponent bias in floating point numbers and it says the following:
In IEEE 754 floating point numbers, the exponent is biased in the
engineering sense of the word – the value stored is offset from the
actual value by the exponent bias. Biasing is done because exponents
have to be signed values in order to be able to represent both tiny
and huge values, but two's complement, the usual representation for
signed values, would make comparison harder. To solve this problem the
exponent is biased before being stored, by adjusting its value to put
it within an unsigned range suitable for comparison. By arranging the
fields so that the sign bit is in the most significant bit position,
the biased exponent in the middle, then the mantissa in the least
significant bits, the resulting value will be ordered properly,
whether it's interpreted as a floating point or integer value. This
allows high speed comparisons of floating point numbers using fixed
point hardware.
I've also found this explanation from wikipedia's article about offset binary:
This has the consequence that the "zero" value is represented by a 1
in the most significant bit and zero in all other bits, and in general
the effect is conveniently the same as using two's complement except
that the most significant bit is inverted. It also has the consequence
that in a logical comparison operation, one gets the same result as
with a two's complement numerical comparison operation, whereas, in
two's complement notation a logical comparison will agree with two's
complement numerical comparison operation if and only if the numbers
being compared have the same sign. Otherwise the sense of the
comparison will be inverted, with all negative values being taken as
being larger than all positive values.
I don't really understand what kind of comparison they are talking about here. Can someone please explain using a simple example?
'Comparison' here refers to the usual comparison of numbers by size: 5 > 4, etc. Suppose floating-point numbers were stored as
[sign bit] [unbiased exponent] [mantissa]
For example, if the exponent is a 2's-complement 3-bit binary number and the mantissa is a 4-bit unsigned binary number (sign bit 0 meaning positive), you'd have
0 010 1001 = 4.5
0 110 0111 = 0.21875
You can see that the first is bigger than the second, but to figure this out the computer would have to calculate 1.001 x 2^2 and 0.111 x 2^(-2) and then compare the resulting floating-point numbers. This is already complex with floating-point hardware, and if there is no such hardware for this computer, then...
So the number is stored as
[sign bit] [biased exponent] [mantissa]
Using the same 3-bit binary number for the exponent (this time biased by 3; see a related question) and an unsigned 4-bit mantissa, we have
0 101 1001 = 4.5
0 001 0111 = 0.21875
But now comparison is very easy! You can treat the two numbers as the integers 01011001 and 00010111 and see that the first is obviously bigger: obvious even to a computer, as integer comparisons are easy. This is why biased exponents are used.
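You can see the same trick with real IEEE-754 singles (a Python sketch; for positive floats the raw bit patterns order exactly like the values):

import struct

# Raw 32-bit pattern of a single-precision float, as an unsigned integer.
def bits(x):
    return struct.unpack('>I', struct.pack('>f', x))[0]

a, b = 4.5, 0.21875
print(bits(a) > bits(b))  # True: plain integer comparison...
print(a > b)              # True: ...agrees with the float comparison

For negative numbers the sign bit has to be handled as described in the Wikipedia quote above, but same-sign comparisons come for free.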

Calculate which numbers cause problems when converting decimal to floating point?

I know when converting some numbers from decimal to floating point there can be problems.
For example: 1.4 results in 1.39999. 6.2 is 6.1999999, 6.6 is 6.599999, etc.
Is there a way to calculate which numbers will cause these problems? Like create some sort of spreadsheet or database with numbers from 1-50,000 which don't convert exactly?
The errors in floating-point calculations are rooted in the way that floating-point numbers are stored, and depend on which precision you are using (usually either single (float) or double precision). Double-precision floats take twice the space but are considerably more precise (53 significand bits versus 24).
Floating point numbers are typically stored in an IEEE format and thus only the most significant bits are included.
In a string of bits stored in this way to represent a floating-point number, there are different parts that comprise the number. First, there is a sign bit indicating whether the number is positive or negative. Next comes the exponent portion of the number (in single-precision floats, this takes 8 bits). After the exponent, all remaining bits represent the significant digits of the number.
Thus, the bits 1 10000000 01000000000000000000000 represent -2.5. The sign bit (the first one) signifies that the number is negative. The next eight bits are the exponent; it is stored with a bias so that small numbers may be stored precisely in addition to large ones. The exponent used for calculations is the exponent bits interpreted as an integer (in our case 128) minus the bias of 127, giving 1.
The remaining 23 bits signify the significant digits of the number. There is an implicit leading 1 before the binary point, and moving rightward from there each bit is worth half the previous one (the first fraction bit is 1/2, the next 1/4, and so on), so the fraction bits 0100... mean 1 + 1/4 = 1.25. In our case the total number is -2.5 (-1 for the sign bit * 2^1 for the exponent * 1.25 for the significand).
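You can check that decoding with a couple of lines of Python (a sketch; struct reinterprets the raw bytes as a single-precision float):

import struct

# The pattern 1 10000000 01000000000000000000000, decoded as binary32.
pattern = 0b1_10000000_01000000000000000000000
x = struct.unpack('>f', pattern.to_bytes(4, 'big'))[0]
print(x)  # -2.5  (sign -1, exponent 128 - 127 = 1, significand 1.01_2 = 1.25)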
For further example, here is a converter that uses checkboxes to indicate the bits. At worst, you could write a bot to test all of the combinations you care about.
In general, if the fractional part cannot be written as a sum of powers of two (1/2 + 1/4 + 1/8, etc.) using a limited number of terms, it will be rounded.
Almost all numbers are troublesome. The minor exceptions are those that, when repeatedly multiplied by 2, lose their fractional part and end up being less than 2^24.
e.g.
0.125 -> 0.25 -> 0.5 -> 1.0 : OK!
6.4 -> 12.8 -> 25.6 -> 51.2 -> 102.4 -> ... : we got a cycle in the fractional part (.4 .8 .6 .2 .4 ...), so it never terminates.
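The same test can be automated instead of building a spreadsheet (a Python sketch; exact_in_binary is a hypothetical helper: a decimal converts exactly iff, in lowest terms, its denominator is a power of two and the numerator fits in the significand):

from fractions import Fraction

# True iff the decimal string converts exactly to a binary float with
# sig_bits significand bits (24 for single precision, 53 for double).
# Slightly conservative for large integers with trailing zero bits.
def exact_in_binary(s, sig_bits=24):
    f = Fraction(s)  # exact decimal value, e.g. '6.4' = 32/5
    power_of_two = f.denominator & (f.denominator - 1) == 0
    return power_of_two and f.numerator.bit_length() <= sig_bits

for s in ['0.125', '6.4', '1.4', '0.546875']:
    print(s, exact_in_binary(s))
# 0.125 True / 6.4 False / 1.4 False / 0.546875 True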
EDIT
Given the purpose/context of the issue, the inexactness seems to cause trouble when the floating-point value is rounded towards zero.
e.g.
10.2 == 10.199999809, when the next floating-point value up would be
next == 10.200000763 <-- about 4 times further from the wanted value
vs.
10.3 == 10.300000191, when the previous (rounded-down) value would be
prev == 10.299999237 <-- this would also have been ~4x further away
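You can inspect those neighbouring values by nudging the raw single-precision bits (a Python sketch using the standard struct module):

import struct

def f32_bits(x):                     # float -> raw 32-bit pattern
    return struct.unpack('>I', struct.pack('>f', x))[0]

def bits_f32(b):                     # raw 32-bit pattern -> float
    return struct.unpack('>f', struct.pack('>I', b))[0]

b = f32_bits(10.2)                   # nearest single to 10.2
for label, v in [('prev', bits_f32(b - 1)),
                 ('10.2', bits_f32(b)),
                 ('next', bits_f32(b + 1))]:
    print(label, format(v, '.9f'))
# prev 10.199998856 / 10.2 10.199999809 / next 10.200000763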
Perhaps it's time to talk to the programmer of the CNC driver...

what would indicate an overflow?

I'm doing this question and some clarification would be super helpful. What exactly would an overflow entail? Is it when an extra bit would be needed on conversion to decimal notation? For part 3, "consider the bits as two's complement numbers", does he mean find the 2's complement? Thanks a bunch.
For number 3 he does not mean find the 2's complement. He is telling you to treat the values as signed numbers using 2's complement notation. That would mean the first value in a) is positive and the other three are negative.
For overflow it is different for 2 and 3. For 2, unsigned numbers, overflow occurs if there is a carry out of the high bit. For 3, 2's complement signed numbers, overflow occurs if the sign of the result is not correct. For example, if you add two positive numbers and the result is negative, there was overflow.
If you add x and y and get a result that is less than x or less than y, then the addition has overflowed (wrapped-around).
An overflow would be if the resulting sum is a larger number than can be expressed in an 8-bit system. That would be any number greater than 255, i.e. (1 << 8) - 1.
Your assumption about "an extra bit" is mostly correct. In an 8-bit system, all numbers are stored in 8 bits, so any operation that results in a number greater than the maximum that can be represented is an overflow. This doesn't happen when you convert to decimal, but when you actually perform the sum on the binary values: if all numbers are 8 bits, you can't just add an additional bit when you need to store a larger result.
Yes, "two's complement" is the same as "2's complement". I'm not aware of any distinction between whether you spell it out or use the numeral.

how to represent floating points in binary?

I have been working on these three lab questions for about 5 hours. I'm stuck on the last question.
Consider the following floating-point number representation, which
stores a floating-point number in 16 bits. You have a sign bit, a
six-bit (excess-32) exponent, and a nine-bit mantissa.
Explain how the 9-bit mantissa might get you into trouble.
Here is the preceding question. Not sure if it will help in analysis.
What is the range of exponents it supports?
000000 to 111111 or 0 to 63 where exponent values less
than 32 are negative, and exponent values greater than 32 are
positive.
I have a pretty good foundation for floating points and converting between decimals and floating points. Any guidance would be greatly appreciated.
To me, the ratio of mantissa to exponent is a bit off. Even if we assume there is a hidden bit, effectively making this a 10-bit mantissa (with the top bit always set), you can represent magnitudes up to about 2^31, but near the top of the range only in steps of 2^31/2^10 = 2^21 (i.e. steps of 2097152).
I'd rather use an 11-bit mantissa and a 5-bit exponent, making the top step 2^15/2^11 = 2^4, i.e. steps of 16.
So for me the trouble would be that 9+1 bits is simply too imprecise, compared to the relatively large exponent.
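In numbers (a Python sketch; top_step is a hypothetical helper giving the gap between adjacent representable values at the largest exponent):

# Gap between adjacent representable values at the top of the range.
def top_step(significand_bits, exponent_bits):
    max_exp = 2 ** (exponent_bits - 1) - 1   # excess bias: half the range
    return 2 ** max_exp / 2 ** significand_bits

print(top_step(10, 6))  # 9+1-bit significand, 6-bit exponent: 2097152.0
print(top_step(11, 5))  # 11-bit significand, 5-bit exponent: 16.0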
My guess is that a nine-bit mantissa simply provides far too little precision, so that any operation apart from trivial ones will make the calculation far too inexact to be useful.
I admit this answer is a little far-fetched, but apart from this I can't see a problem with the representation.