Representing floating point numbers and integers in binary

Sorry for the earlier not-so-clear questions,
I should have asked my actual doubt rather than just posting the question.
Just to clarify, let's say it is a 3-bit float and a 3-bit integer. For the int it's
2^3 values, [_ _ _ _], e.g. 8 = [1 0 0 0].
What about the float? Will it be like
[ _ _ _ _ . _ _ _ _ ]
For example, can I represent 8.5 as
[1 0 0 0 . 1 0 0 0]?
Is what I wrote correct? I get how to represent an integer value given how many bits it should use, but for a float, is the number of bits the same before and after the decimal point?

A 3-bit floating point number has to have a maximum of 3 bits, which makes it, if not impossible, completely impractical. Here is why, from the definition of a floating point representation (IEEE-754):
Floating point comprises finite numbers, which may be either base 2 (binary) or base 10 (decimal). Each finite number is described by three integers: s = a sign (zero or one), c = a significand (or 'coefficient'), q = an exponent. This means in your case that you have a sign bit (bit 1), a significand (bit 2) and an exponent (bit 3). The range this gives is useless.
You don't string series of three bits together, because floating point is a predetermined fixed length that everybody agrees on ahead of time, at least in the sense you mean.
There are examples of how to do extended floating point in GSL source:
http://www.gnu.org/software/gsl/
Decimal numbers can also be represented by other formats like: BCD
http://en.wikipedia.org/wiki/Binary-coded_decimal
Final point: you CAN make 3-bit FP numbers, and then string them together if you want to make bigger numbers. They just won't be 3-bit anymore: 6-bit, 12-bit, etc.
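To see how cramped that is, here is a small Python sketch of a hypothetical 3-bit format (one sign bit, one exponent bit, one significand bit; the bias and implicit leading 1 are arbitrary choices made up for this illustration, not any standard):
# Hypothetical 3-bit "float": 1 sign bit, 1 exponent bit, 1 significand bit.
# Not an IEEE-754 format; just an illustration of how few values fit in 3 bits.
def tiny_float(bits):
    sign = (bits >> 2) & 1          # bit 3
    exponent = (bits >> 1) & 1      # bit 2
    significand = bits & 1          # bit 1
    # assume bias 0 and an implicit leading 1 (arbitrary choices)
    return (-1) ** sign * (1 + significand / 2) * 2 ** exponent

for b in range(8):
    print(format(b, '03b'), '->', tiny_float(b))
# Only the magnitudes 1.0, 1.5, 2.0 and 3.0 (and their negatives) can be encoded.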

Related

JSON.parse big numbers result in incorrect numbers [duplicate]

See this code:
var jsonString = '{"id":714341252076979033,"type":"FUZZY"}';
var jsonParsed = JSON.parse(jsonString);
console.log(jsonString, jsonParsed);
When I look at my console in Firefox 3.5, the value of jsonParsed is the number rounded:
Object id=714341252076979100 type=FUZZY
I tried different values, with the same outcome (number rounded).
I also don't get its rounding rules. 714341252076979136 is rounded to 714341252076979200, whereas 714341252076979135 is rounded to 714341252076979100.
Why is this happening?
You're overflowing the capacity of JavaScript's number type, see §8.5 of the spec for details. Those IDs will need to be strings.
IEEE-754 double-precision floating point (the kind of number JavaScript uses) can't precisely represent all numbers (of course). Famously, 0.1 + 0.2 == 0.3 is false. That can affect whole numbers just like it affects fractional numbers; it starts once you get above 9,007,199,254,740,991 (Number.MAX_SAFE_INTEGER).
Beyond Number.MAX_SAFE_INTEGER + 1 (9007199254740992), the IEEE-754 floating-point format can no longer represent every consecutive integer. 9007199254740991 + 1 is 9007199254740992, but 9007199254740992 + 1 is also 9007199254740992 because 9007199254740993 cannot be represented in the format. The next that can be is 9007199254740994. Then 9007199254740995 can't be, but 9007199254740996 can.
The reason is we've run out of bits, so we no longer have a 1s bit; the lowest-order bit now represents multiples of 2. Eventually, if we keep going, we lose that bit and only work in multiples of 4. And so on.
Your values are well above that threshold, and so they get rounded to the nearest representable value.
As of ES2020, you can use BigInt for integers that are arbitrarily large, but there is no JSON representation for them. You could use strings and a reviver function:
const jsonString = '{"id":"714341252076979033","type":"FUZZY"}';
// Note it's a string ----^------------------^
const obj = JSON.parse(jsonString, (key, value) => {
    if (key === "id" && typeof value === "string" && value.match(/^\d+$/)) {
        return BigInt(value);
    }
    return value;
});
console.log(obj);
(Look in the real console, the snippets console doesn't understand BigInt.)
If you're curious about the bits, here's what happens: An IEEE-754 binary double-precision floating-point number has a sign bit, 11 bits of exponent (which defines the overall scale of the number, as a power of 2 [because this is a binary format]), and 52 bits of significand (but the format is so clever it gets 53 bits of precision out of those 52 bits). How the exponent is used is complicated (described here), but in very vague terms, if we add one to the exponent, the value of the significand is doubled, since the exponent is used for powers of 2 (again, caveat there, it's not direct, there's cleverness in there).
So let's look at the value 9007199254740991 (aka, Number.MAX_SAFE_INTEGER):
+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− sign bit
/ +−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− exponent
/ / | +−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+− significand
/ / | / |
0 10000110011 1111111111111111111111111111111111111111111111111111
= 9007199254740991 (Number.MAX_SAFE_INTEGER)
That exponent value, 10000110011, means that every time we add one to the significand, the number represented goes up by 1 (the whole number 1, we lost the ability to represent fractional numbers much earlier).
But now that significand is full. To go past that number, we have to increase the exponent, which means that if we add one to the significand, the value of the number represented goes up by 2, not 1 (because the exponent is applied to 2, the base of this binary floating point number):
+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− sign bit
/ +−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− exponent
/ / | +−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+− significand
/ / | / |
0 10000110100 0000000000000000000000000000000000000000000000000000
= 9007199254740992 (Number.MAX_SAFE_INTEGER + 1)
Well, that's okay, because 9007199254740991 + 1 is 9007199254740992 anyway. But! We can't represent 9007199254740993. We've run out of bits. If we add just 1 to the significand, it adds 2 to the value:
+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− sign bit
/ +−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− exponent
/ / | +−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+− significand
/ / | / |
0 10000110100 0000000000000000000000000000000000000000000000000001
= 9007199254740994 (Number.MAX_SAFE_INTEGER + 3)
The format just cannot represent odd numbers anymore as we increase the value; the exponent is too big.
Eventually, we run out of significand bits again and have to increase the exponent, so we end up only being able to represent multiples of 4. Then multiples of 8. Then multiples of 16. And so on.
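You can watch the step size grow with a short Python sketch (Python floats are the same IEEE-754 doubles JavaScript uses):
x = float(2 ** 53)       # 9007199254740992.0, Number.MAX_SAFE_INTEGER + 1
print(x + 1 == x)        # True: the 1s bit is gone, adding 1 is lost to rounding
print(x + 2 == x)        # False: multiples of 2 are still representable
y = float(2 ** 54)
print(y + 2 == y)        # True: now only multiples of 4 survive
print(y + 4 == y)        # False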
What you're seeing here is actually the effect of two roundings. Numbers in ECMAScript are internally represented as double-precision floating-point. When id is set to 714341252076979033 (0x9e9d9958274c359 in hex), it is actually assigned the nearest representable double-precision value, which is 714341252076979072 (0x9e9d9958274c380). When you print out the value, it is rounded to 16 significant decimal digits, which gives 714341252076979100.
It is not caused by the JSON parser. Just try entering 714341252076979033 in Firebug's console. You'll see the same 714341252076979100.
See this blog post for details:
http://www.exploringbinary.com/print-precision-of-floating-point-integers-varies-too
JavaScript uses double precision floating point values, i.e. a total precision of 53 bits, but you need
ceil(log2(714341252076979033)) = 60
bits to represent the value exactly.
The nearest exactly representable number is 714341252076979072 (write the original number in binary, replace the last 7 digits with 0 and round up because the highest replaced digit was 1).
You'll get 714341252076979100 instead of this number because ToString() as described by ECMA-262, §9.8.1 works with powers of ten, and in 53-bit precision all these numbers are equal.
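Both claims are easy to check in Python, whose floats are the same IEEE-754 doubles:
n = 714341252076979033
print(n.bit_length())    # 60 -- more than the 53 bits of precision available
print(int(float(n)))     # 714341252076979072, the nearest representable double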
The problem is that your number requires a greater precision than JavaScript has.
Can you send the number as a string? Separated in two parts?
JavaScript can only handle exact whole numbers up to about 9000 million million (that's 9 with 15 zeros). Higher than that and you get garbage. Work around this by using strings to hold the numbers. If you need to do math with these numbers, write your own functions or see if you can find a library for them: I suggest the former as I don't like the libraries I've seen. To get you started, see two of my functions at another answer.

Why offset is calculated as `2^{n-1} - 1` instead of `2^{n-1}` for floating point exponent representation

I'm trying to understand why offset K in binary offset notation is calculated as
2^{n-1}-1 instead of 2^{n-1} for floating point exponent representation. Here is my reasoning for 2^{n-1}.
Four bits can represent values in the range [-8;7], so 0000 represents -8. An offset from zero here is 8 and can be calculated as 2^{n-1}. Using this offset we can define representation of any number, for example, the number 3.
What number do we need to add to -8 to get 3? It's 11, so 3 in offset binary is represented as 1011. And the formula seems to be number to represent + offset.
However, the real formula is number to represent + offset - 1, and so the correct representation is 1010. Can someone please explain why we also subtract additional one?
I am posting this as an answer to better explain my thoughts, but even though I'll quote the standard a few times, I haven't found an explicitly stated reason.
In the following, I'll refer to the IEEE 754 standard (and successive revisions) for floating point representation, even though the OP doesn't mention it (if I'm wrong, please let me know).
The question is about the particular representation of the exponent in a floating point number.
In subclause 3.3, Sets of floating-point data, it is said (emphasis mine):
The set of finite floating-point numbers representable within a
particular format is determined by the following integer parameters:
― b = the radix, 2 or 10
― p = the number of digits in the significand (precision)
― emax = the maximum exponent e
― emin = the minimum exponent e
emin shall be 1 − emax for all formats.
Later it specifies:
The smallest positive normal floating-point number is b^emin and
the largest is b^emax × (b − b^(1−p)). The non-zero floating-point numbers for a format with magnitude less than b^emin
are called subnormal because their magnitudes lie between
zero and the smallest normal magnitude.
In 3.4 Binary interchange format encodings:
Representations of floating-point data in the binary interchange formats are encoded in k bits in the following three fields (...):
a) 1-bit sign S
b) w-bit biased exponent E = e + bias
c) (t = p−1)-bit trailing significand field digit string T = d_1 d_2 ... d_(p−1); the leading bit of the significand, d_0, is implicitly encoded in the biased exponent E
(...)
The range of the encoding’s biased exponent E shall include:
― every integer between 1 and 2^w − 2, inclusive, to encode normal numbers
― the reserved value 0 to encode ±0 and subnormal numbers
― the reserved value 2^w − 1 to encode ±∞ and NaNs.
For example a 32-bit floating point number has those parameters:
k, storage width in bits 32
p, precision in bits 24
emax, maximum exponent e 127
emin, minimum exponent e -126
bias, E − e 127
w, exponent field width in bits 8
t, trailing significand field width in bits 23
In this Q&A it is pointed out that: "The purpose of the bias is so that the exponent is stored in unsigned form, making it easier to do comparisons."
Considering the above mentioned 32-bit floating point representation a normal (not subnormal) number has an encoded biased exponent E in the range between 1 and 254.
The reason behind the particular choice of the range -126, 127 for the exponent could be, in my opinion, to extend the range of representable numbers: very low numbers are represented by subnormals, so a bigger (even if only by one) maximum exponent can take care of the big ones.
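To make the bias concrete, here is a small Python sketch that unpacks the fields of a 32-bit float and applies bias = 127 (the helper name binary32_fields is just made up for this example):
import struct

def binary32_fields(x):
    """Return (sign, biased exponent E, trailing significand T) of a binary32."""
    (n,) = struct.unpack('>I', struct.pack('>f', x))
    sign = n >> 31
    E = (n >> 23) & 0xFF      # biased exponent, 0..255
    T = n & 0x7FFFFF          # 23-bit trailing significand
    return sign, E, T

print(binary32_fields(1.0))          # (0, 127, 0): e = E - 127 = 0
print(binary32_fields(2.0 ** -126))  # (0, 1, 0): smallest normal, e = emin = -126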

Why is hexadecimal more machine-friendly than decimal?

This may be a silly question since I can see a relationship between "16 is a power of two" and "each bit represents 2^(n of bits) values", so it is logical that two hex numerical digits represent a byte's possible values. However I am still too blind to see how it would be any better than decimal as either way it needs to be converted into binary.
Can someone explain the numerical notation conversion process in low-level pseudo-programming to make it more understandable? I really don't see how the conversion process is shortened if there is a conversion going on anyway.
As you have noticed, both are equally machine-friendly. Both 10 and 0x0a are the same number, in the same way that 1/2 and 0.5 are the same number in decimal.
From the machine's point of view it doesn't matter. The machine doesn't see "10" or "0x0a". The machine just sees 00001010. Actually, not even that. What's really in the machine are two bits at a voltage (or charge) level that represents on and six bits at a voltage level that represents off.
The reason for the popularity of hexadecimal numbers is that it is easier to read for a "human". At least for engineers and programmers (to be honest I'm a bit hesitant to mention programmers because it isn't true for a vast majority of programmers).
First of all, engineers and programmers who have to write device drivers rarely care whether a variable has the value 10 or 62 or 233 or whatever. They do care whether a number fits within memory or not. What values are sent to the hardware is the user's problem. Whether or not that value can be sent is what the engineer or driver writer has to deal with.
For this, hex numbers have a very significant advantage because each character aligns to exactly one nybble. That means a byte is represented by exactly two characters and two characters represent exactly one byte. Contrast this with decimal, where a byte requires three characters but three characters can represent values needing up to 10 bits.
Quick, can you fit 133 in one byte? Maybe you know that one byte can represent numbers from 0-255 so it may seem obvious, but a number like 0x133 obviously requires two bytes. In fact, to make it more clear you'll often see engineers write (hex) 01 33 in data dumps or documentation to make it more obvious that it's two bytes.
It's even more useful at higher bit counts. Quick, can you fit 4311111111 in 32 bits? I have a vague idea that 32 bits is roughly 4 billion but am not sure whether that number fits. Contrast this with its hexadecimal representation, 0x1 00F6 55C7: it is much more obvious that you need at least 33 bits to represent that number.
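A quick Python sketch of the same check, if you would rather let the machine count the bits:
n = 4311111111
print(hex(n))             # 0x100f655c7 -- nine hex digits, so more than eight nybbles
print(n.bit_length())     # 33
print(n < 2 ** 32)        # False: it does not fit in 32 bits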
Engineers are also used to seeing bit patterns in hexadecimal. The basic pattern is simple:
1 = first bit
2 = second bit
4 = third bit
8 = fourth bit
So, if you want to set bit number 19 in a register, your thought process would go something like this:
Bit 19 is after the 16th bit. And you know that the 16th bit is `0x0000 8000`
(when I think in hex, my mind always adds the blank spaces for clarity).
19 is 3 bits higher than 16. And the number that represents the third bit is 4.
So bit number 19 is: `0x0004 0000`
That part I mentioned about the 16th bit is another basic pattern most engineers trained to read hex recognize:
00 00 00 80 = bit no 8
00 00 80 00 = bit no 16
00 80 00 00 = bit no 24
80 00 00 00 = bit no 32
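Both rules of thumb above are easy to confirm with a Python sketch (bit numbers counted from 1, as in the text):
print(hex(1 << (19 - 1)))         # 0x40000 -- bit number 19
for bit_no in (8, 16, 24, 32):
    print(bit_no, hex(1 << (bit_no - 1)))
# 8 0x80
# 16 0x8000
# 24 0x800000
# 32 0x80000000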
Another common pattern that I've trained myself to recognize is square waves:
_ _ _ _
0x55 = _| |_| |_| |_| (01010101)
_ _ _ _
0xaa = |_| |_| |_| |_ (10101010)
_ _ _ _
0x33 = _ _| |_ _| (00110011)
_ _ _ _
0xcc = |_ _| |_ _ (11001100)
_ _ _ _
0x66 = _| |_ _| |_ (01100110)
_ _ _ _
0x99 = |_ _| |_ _| (10011001)
_ _ _ _
0xf0 = |_ _ _ _ (11110000)
_ _ _ _
0x0f = _ _ _ _| (00001111)
Which means that, for someone like me, if you tell me you want to set the first, third and fourth bits of a variable, in under 2 seconds I'd go aha: 0x0d. Because obviously the third and fourth bits together are c (square wave pattern), so add bit one and it becomes d.
You may be thinking to yourself: wait a minute, wouldn't binary representation be better? Well, yes, (bin) 0000 1101 much more obviously has the first, third and fourth bits set. The problem with binary representation is that at anything higher than 8 bits the number becomes too large for the human eye to easily comprehend, and in practice it has led to even more mistakes and misunderstandings compared to hexadecimal.
Also, not many languages have support for binary representation and of those that do most don't allow spaces or other separators (underscore for example) between digits which makes the number even less readable compared to hex.
Hexadecimal hasn't always been the preferred representation among engineers and (some) programmers. At one point octal numbers were just as popular. But octals have the same problem as decimal. A byte requires three octal digits to represent but three octal digits can actually represent 9 bits. So it doesn't have that nice perfect division that hexadecimal has.
There is at least one other method to represent binary values that's very popular. You've seen it many times but may not have realized what it actually is: the dotted decimal notation of IPv4 addresses is actually an encoding that represents the values as decimal yet attempts to fit them within the structure of memory:
192.168.0.1 = 11000000 10101000 00000000 00000001
(192) (168) (0) (1)
Alas, as we move forward with IPv6, we seem to have decided to go back to hexadecimal encoding.
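A tiny Python sketch of that encoding, just to show how each decimal octet lands in exactly one byte:
octets = [192, 168, 0, 1]
value = 0
for o in octets:
    value = (value << 8) | o      # each dotted-decimal octet fills exactly one byte
print(format(value, '032b'))      # 11000000101010000000000000000001
print(hex(value))                 # 0xc0a80001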
It isn't more machine-friendly, it's more people-friendly. As the question comments have pointed out, machines deal almost exclusively with bits. But they expose design elements to people in a variety of ways. Different machine designs have encouraged different powers-of-two encodings in a number of ways. There's the obvious effect of the now-standard 8-bit byte, but bytes haven't always been 8 bits. A lot of the use of hexadecimal comes from how machines, especially those designed prior to near-universal microcoding, lay out instruction codes. In some designs, octal makes better sense. For example, the Intel 8080 CPU had a MOV instruction that contained two register ids. Expressed in hexadecimal, the various register combinations produce opcodes ranging from 40 to 7F, with no obvious pattern. But in octal it's obvious: 100 through 177, or rather 1xy, where x and y are register ids.

What is this floating point format?

I am trying to figure out how to read a historical binary data file. I believe it came from an older 32-bit Solaris system. I am looking at a section of the file that I believe contains 32-bit floating point numbers (not IEEE floats). The format appears to be (as a hex dump):
xx41 xxxx
xx42 xxxx
The 41 and 42 in those positions appear consistently through the floating point numbers. I'm afraid that I do not have any additional information to add to this. So the first part of my question is, what format is this? If the first part can not be answered directly, a list of likely possibilities would be great. Lastly, how would you suggest going about determining what format this is? Thank you for your input.
Could this be PDP-11 format? The giveaway for me is that the second byte is mostly constant, which suggests that the exponent of the floating-point format is ending up in the second byte rather than the first (as you'd expect for a big-endian machine) or the last (for a little-endian machine). The PDP-11 is notorious for its funny byte order for floats and integers; see the material near the bottom of this Floating-Point Formats page.
The values of 41 and 42 would appear to be consistent with positive values of roughly unit-magnitude: the exponent bias for the PDP-11 format appears to be 128, so with the unusual byte-order I'd expect the 2nd byte that you list to contain the sign and the topmost 7 bits of the exponent; that would make the unbiased exponent for a second byte of 41 be either 2 or 3 depending on the 8th exponent bit (which should appear as the MSB of the first byte).
See also this page for a brief description of the PDP-11 format.
[EDIT] Here's some Python code to convert from a 4-byte string in the form you describe to a Python float, assuming that the 4-byte string represents a float in PDP-11 format.
import struct

def pdp_to_float(xs):
    """Convert 4 bytes holding a PDP-11 single-precision float to a Python float."""
    # PDP-11 "middle-endian" word order: swap the two bytes within each 16-bit word.
    ordered_bytes = bytes(xs[i] for i in (1, 0, 3, 2))
    n = struct.unpack('>I', ordered_bytes)[0]
    fraction = n & 0x007fffff             # 23-bit fraction
    exponent = (n & 0x7f800000) >> 23     # 8-bit exponent, bias 128
    sign = (n & 0x80000000) >> 31
    hidden = 1 if exponent != 0 else 0    # implicit leading bit, absent for zero
    return (-1)**sign * 2**(exponent - 128) * (hidden + fraction / 2.0**23)
Example:
>>> pdp_to_float(b'\x00\x00\x00\x00')
0.0
>>> pdp_to_float(b'\x23\x41\x01\x00')
5.093750476837158
>>> pdp_to_float(b'\x00\x42\x00\x00')
16.0
The data described is consistent with the usual IEEE 754 format, stored in big-endian order, then displayed by a little-endian dump program two bytes at a time.
32-bit floats in the interval [8, 128) have first bytes of 0x41 or 0x42. Consider such a number, perhaps 0x41010203. Stored big end first, it would appear in memory as the four bytes 0x41, 0x01, 0x02, and 0x03. When the dump program reads 16-bit integers, little end first, it will read and display 0x0141 and 0x0302.
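If you want to reproduce that effect, here is a Python sketch: pack an IEEE-754 binary32 big-endian, then read the same bytes back as little-endian 16-bit words, the way such a dump program would (the value 8.0625 is just an arbitrary example from the [8, 128) interval):
import struct

x = 8.0625                                  # any value in [8, 128)
data = struct.pack('>f', x)                 # big-endian IEEE-754 binary32: 41 01 00 00
words = struct.unpack('<2H', data)          # read back as little-endian 16-bit words
print(' '.join(format(w, '04x') for w in words))   # 0141 0000 -- the "xx41 xxxx" pattern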

Why prefer two's complement over sign-and-magnitude for signed numbers?

I'm just curious if there's a reason why in order to represent -1 in binary, two's complement is used: flipping the bits and adding 1?
-1 is represented by 11111111 (two's complement) rather than (to me more intuitive) 10000001 which is binary 1 with first bit as negative flag.
Disclaimer: I don't rely on binary arithmetic for my job!
It's done so that addition doesn't need to have any special logic for dealing with negative numbers. Check out the article on Wikipedia.
Say you have two numbers, 2 and -1. In your "intuitive" way of representing numbers, they would be 0010 and 1001, respectively (I'm sticking to 4 bits for size). In the two's complement way, they are 0010 and 1111. Now, let's say I want to add them.
Two's complement addition is very simple. You add numbers normally and any carry bit at the end is discarded. So they're added as follows:
0010
+ 1111
=10001
= 0001 (discard the carry)
0001 is 1, which is the expected result of "2+(-1)".
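The same 4-bit sum can be mimicked in Python by masking off the carry (a small sketch, not how real hardware is built, of course):
a = 0b0010                       # 2
b = 0b1111                       # -1 in 4-bit two's complement
result = (a + b) & 0b1111        # add, then throw away anything past bit 4
print(format(result, '04b'))     # 0001, i.e. 2 + (-1) = 1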
But in your "intuitive" method, adding is more complicated:
0010
+ 1001
= 1011
Which is -3, right? Simple addition doesn't work in this case. You need to note that one of the numbers is negative and use a different algorithm if that's the case.
For this "intuitive" storage method, subtraction is a different operation than addition, requiring additional checks on the numbers before they can be added. Since you want the most basic operations (addition, subtraction, etc) to be as fast as possible, you need to store numbers in a way that lets you use the simplest algorithms possible.
Additionally, in the "intuitive" storage method, there are two zeroes:
0000 "zero"
1000 "negative zero"
Which are intuitively the same number but have two different values when stored. Every application will need to take extra steps to make sure that non-zero values are also not negative zero.
There's another bonus with storing ints this way, and that's when you need to extend the width of the register the value is being stored in. With two's complement, storing a 4-bit number in an 8-bit register is a matter of repeating its most significant bit:
0001 (one, in four bits)
00000001 (one, in eight bits)
1110 (negative two, in four bits)
11111110 (negative two, in eight bits)
It's just a matter of looking at the sign bit of the smaller word and repeating it until it pads the width of the bigger word.
With your method you would need to clear the existing bit, which is an extra operation in addition to padding:
0001 (one, in four bits)
00000001 (one, in eight bits)
1010 (negative two, in four bits)
10000010 (negative two, in eight bits)
You still need to set those extra 4 bits in both cases, but in the "intuitive" case you need to clear the 5th bit as well. It's one tiny extra step in one of the most fundamental and common operations present in every application.
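Sign extension really is just copying the top bit; a Python sketch of widening a 4-bit two's complement value to 8 bits:
def sign_extend_4_to_8(x):
    """Widen a 4-bit two's complement value to 8 bits by repeating its sign bit."""
    if x & 0b1000:                    # sign bit of the 4-bit value is set
        return x | 0b11110000         # repeat 1s into the upper four bits
    return x                          # sign bit is 0: the upper bits stay 0

print(format(sign_extend_4_to_8(0b0001), '08b'))   # 00000001 (one)
print(format(sign_extend_4_to_8(0b1110), '08b'))   # 11111110 (negative two)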
Wikipedia says it all:
The two's-complement system has the advantage of not requiring that the addition and subtraction circuitry examine the signs of the operands to determine whether to add or subtract. This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic. Also, zero has only a single representation, obviating the subtleties associated with negative zero, which exists in ones'-complement systems.
In other words, adding is the same whether or not the number is negative.
Even though this question is old, let me put in my 2 cents.
Before I explain this, let's get back to basics: 2's complement is 1's complement + 1.
Now, what is 1's complement, and what is its significance in addition?
The sum of any n-bit number and its 1's complement gives you the highest possible number that can be represented by those n bits.
Example:
0010 (2 in 4 bit system)
+1101 (1's complement of 2)
___________________________
1111 (the highest number that we can represent by 4 bits)
Now what will happen if we try to add 1 more to the result? It will result in an overflow.
The result will be 1 0000, which is 0 (as we are working with 4-bit numbers; the 1 on the left is an overflow).
So,
Any n-bit number + its 1's complement = max n-bit number
Any n-bit number + its 1's complement + 1 = 0 (as explained above, overflow will occur as we are adding 1 to the max n-bit number)
Someone then decided to call 1's complement + 1 the 2's complement. So the above statement becomes:
Any n-bit number + its 2's complement = 0
which means the 2's complement of a number = −(that number)
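Both identities are easy to verify for every 4-bit value with a short Python sketch:
MASK = 0b1111                              # 4-bit system
for n in range(16):
    ones = ~n & MASK                       # 1's complement
    twos = (ones + 1) & MASK               # 2's complement
    assert (n + ones) & MASK == MASK       # n + 1's complement = 1111, the max 4-bit number
    assert (n + twos) & MASK == 0          # n + 2's complement = 0, carry discarded
print("both identities hold for every 4-bit value")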
All this yields one more question: why can we use only (n−1) of the n bits to represent positive numbers, and why does the leftmost nth bit represent the sign (0 in the leftmost bit means a +ve number, 1 means a −ve number)? For example, why do we use only the first 31 bits of an int in Java to represent positive numbers, and treat the value as negative if the 32nd bit is 1?
1100 (let's assume this is 12 in a 4-bit system)
+0100 (2's complement of 12)
___________________________
1 0000 (result is zero, with the carry 1 overflowing)
Thus the system of (n + 2's complement of n) = 0 still works. The only ambiguity here is that the 2's complement of 12 is 0100, which ambiguously also represents +4, besides representing -12 in the 2's complement system.
This problem is solved if positive numbers always have a 0 in their leftmost bit. In that case their 2's complement will always have a 1 in the leftmost bit, and we won't have the ambiguity of the same set of bits representing both a 2's complement number and a +ve number.
Two's complement allows addition and subtraction to be done in the normal way (like you would for unsigned numbers). It also prevents -0 (a separate way to represent 0 that would not be equal to 0 with the normal bit-by-bit method of comparing numbers).
Two's complement allows negative and positive numbers to be added together without any special logic.
If you tried to add 1 and -1 using your method
10000001 (-1)
+00000001 (1)
you get
10000010 (-2)
Instead, by using two's complement, we can add
11111111 (-1)
+00000001 (1)
you get
00000000 (0)
The same is true for subtraction.
Also, if you try to subtract 4 from 6 (two positive numbers) you can 2's complement 4 and add the two together 6 + (-4) = 6 - 4 = 2
This means that subtraction and addition of both positive and negative numbers can all be done by the same circuit in the cpu.
This is to simplify sums and differences of numbers. A sum of a negative number and a positive one codified in 2's complement is the same as summing them up in the normal way.
The usual implementation of the operation is "flip the bits and add 1", but there's another way of defining it that probably makes the rationale clearer. 2's complement is the form you get if you take the usual unsigned representation where each bit controls the next power of 2, and just make the most significant term negative.
Taking an 8-bit value a7 a6 a5 a4 a3 a2 a1 a0
The usual unsigned binary interpretation is:
2^7*a7 + 2^6*a6 + 2^5*a5 + 2^4*a4 + 2^3*a3 + 2^2*a2 + 2^1*a1 + 2^0*a0
11111111 = 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 255
The two's complement interpretation is:
-2^7*a7 + 2^6*a6 + 2^5*a5 + 2^4*a4 + 2^3*a3 + 2^2*a2 + 2^1*a1 + 2^0*a0
11111111 = -128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = -1
None of the other bits change meaning at all, and carrying into a7 is "overflow" and not expected to work, so pretty much all of the arithmetic operations work without modification (as others have noted). Sign-magnitude representations generally inspect the sign bit and use different logic.
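Here is that "negative most significant term" reading written out as a Python sketch for 8 bits:
def twos_complement_value(bits):
    """Interpret an 8-bit pattern with the top bit weighted -128 instead of +128."""
    weights = [-128, 64, 32, 16, 8, 4, 2, 1]
    return sum(w for w, b in zip(weights, format(bits, '08b')) if b == '1')

print(twos_complement_value(0b11111111))   # -1
print(twos_complement_value(0b10000000))   # -128
print(twos_complement_value(0b01111111))   # 127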
To expand on the other answers:
In two's complement
Addition is the same mechanism as adding plain positive integers.
Subtraction doesn't change either.
Multiplication too!
Division does require a different mechanism.
All these are true because two's complement is just normal modular arithmetic, where we choose to look at some numbers as negative by subtracting the modulus.
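A Python sketch of that point: reduced modulo 2^8, the same byte-level result serves both the unsigned and the two's complement reading for addition, subtraction and multiplication, but not for division:
def as_signed(x):                 # read an 8-bit pattern as two's complement
    return x - 256 if x & 0x80 else x

a, b = 250, 7                     # unsigned view; the signed view of a is -6
print(as_signed((a + b) & 0xFF))  # 1    = -6 + 7
print(as_signed((a - b) & 0xFF))  # -13  = -6 - 7
print(as_signed((a * b) & 0xFF))  # -42  = -6 * 7
print((a // b) & 0xFF)            # 35 -- not -6 // 7; division needs its own signed logic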
Reading the answers to this question, I came across this comment [edited].
2's complement of 0100(4) will be 1100. Now 1100 is 12 if I say normally. So,
when I say normal 1100 then it is 12, but when I say 2's complement 1100 then
it is -4? Also, in Java when 1100 (lets assume 4 bits for now) is stored then
how it is determined if it is +12 or -4 ?? – hagrawal Jul 2 at 16:53
In my opinion, the question asked in this comment is quite interesting and so I'd like first of all to rephrase it and then to provide an answer and an example.
QUESTION – How can the system establish how one or more adjacent bytes have to be interpreted? In particular, how can the system establish whether a given sequence of bytes is a plain binary number or a 2's complement number?
ANSWER – The system establishes how to interpret a sequence of bytes through types.
Types define
how many bytes have to be considered
how those bytes have to be interpreted
EXAMPLE – Below we assume that
char's are 1 byte long
short's are 2 bytes long
int's and float's are 4 bytes long
Please note that these sizes are specific to my system. Although pretty common, they can be different from system to system. If you're curious of what they are on your system, use the sizeof operator.
First of all we define an array containing 4 bytes and initialize all of them to the binary number 10111101, corresponding to the hexadecimal number BD.
// BD(hexadecimal) = 10111101 (binary)
unsigned char l_Just4Bytes[ 4 ] = { 0xBD, 0xBD, 0xBD, 0xBD };
Then we read the array content using different types.
unsigned char and signed char
// 10111101 as a PLAIN BINARY number equals 189
printf( "l_Just4Bytes as unsigned char -> %hi\n", *( ( unsigned char* )l_Just4Bytes ) );
// 10111101 as a 2'S COMPLEMENT number equals -67
printf( "l_Just4Bytes as signed char -> %i\n", *( ( signed char* )l_Just4Bytes ) );
unsigned short and short
// 1011110110111101 as a PLAIN BINARY number equals 48573
printf( "l_Just4Bytes as unsigned short -> %hu\n", *( ( unsigned short* )l_Just4Bytes ) );
// 1011110110111101 as a 2'S COMPLEMENT number equals -16963
printf( "l_Just4Bytes as short -> %hi\n", *( ( short* )l_Just4Bytes ) );
unsigned int, int and float
// 10111101101111011011110110111101 as a PLAIN BINARY number equals 3183328701
printf( "l_Just4Bytes as unsigned int -> %u\n", *( ( unsigned int* )l_Just4Bytes ) );
// 10111101101111011011110110111101 as a 2'S COMPLEMENT number equals -1111638595
printf( "l_Just4Bytes as int -> %i\n", *( ( int* )l_Just4Bytes ) );
// 10111101101111011011110110111101 as a IEEE 754 SINGLE-PRECISION number equals -0.092647
printf( "l_Just4Bytes as float -> %f\n", *( ( float* )l_Just4Bytes ) );
The 4 bytes in RAM (l_Just4Bytes[ 0..3 ]) always remain exactly the same. The only thing that changes is how we interpret them.
Again, we tell the system how to interpret them through types.
For instance, above we have used the following types to interpret the contents of the l_Just4Bytes array
unsigned char: 1 byte in plain binary
signed char: 1 byte in 2's complement
unsigned short: 2 bytes in plain binary notation
short: 2 bytes in 2's complement
unsigned int: 4 bytes in plain binary notation
int: 4 bytes in 2's complement
float: 4 bytes in IEEE 754 single-precision notation
[EDIT] This post has been edited after the comment by user4581301. Thank you for taking the time to drop those few helpful lines!
Two's complement is used because it is simpler to implement in circuitry and also does not allow a negative zero.
If there are x bits, two's complement will range from +(2^(x−1) − 1) down to −2^(x−1). One's complement will run from +(2^(x−1) − 1) to −(2^(x−1) − 1), but will permit a negative zero (0000 is equal to 1111 in a 4-bit 1's complement system).
It's worthwhile to note that on some early adding machines, before the days of digital computers, subtraction would be performed by having the operator enter values using a different-colored set of legends on each key (so each key would enter nine minus the number to be subtracted), and press a special button that would assume a carry into a calculation. Thus, on a six-digit machine, to subtract 1234 from a value, the operator would hit keys that would normally indicate "998,765" and hit a button to add that value plus one to the calculation in progress. Two's complement arithmetic is simply the binary equivalent of that earlier "ten's-complement" arithmetic.
The advantage of performing subtraction by the complement method is a reduction in hardware complexity. There is no need for separate digital circuits for addition and subtraction; both addition and subtraction are performed by the adder alone.
I have a slight addendum that is important in some situations: two's complement is the only representation that is possible given these constraints:
Unsigned numbers and two's complement are commutative rings with identity. There is a homomorphism between them.
They share the same representation, with a different branch cut for negative numbers (hence why addition and multiplication are the same between them).
The high bit determines the sign.
To see why, it helps to reduce the cardinality; for example, Z_4.
Sign-and-magnitude and ones' complement both do not form a ring with the same number of elements; a symptom is the double zero. It is therefore difficult to work with on the edges; to be mathematically consistent, they require checking for overflow or trap representations.
Well, your intent is not really to reverse all the bits of your binary number. It is actually to subtract each of its digits from 1. It's just a fortunate coincidence that subtracting 1 from 1 results in 0 and subtracting 0 from 1 results in 1. So flipping the bits is effectively carrying out this subtraction.
But why are you finding each digit's difference from 1? Well, you're not. Your actual intent is to compute the given binary number's difference from another binary number which has the same number of digits but contains only 1's. For example, if your number is 10110001, when you flip all those bits you're effectively computing (11111111 - 10110001).
This explains the first step in the computation of Two's Complement. Now let's include the second step -- adding 1 -- also in the picture.
Add 1 to the above binary equation:
11111111 - 10110001 + 1
What do you get? This:
100000000 - 10110001
This is the final equation. And by carrying out those two steps you're trying to find this final difference: the binary number subtracted from another binary number with one extra digit and containing zeros except at the most significant bit position.
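In Python, with 8 bits, the two routes give the same answer (a small sketch of the identity above):
x = 0b10110001
flip_and_add = (~x + 1) & 0xFF        # invert the bits, add 1, keep 8 bits
subtract_from = 0b100000000 - x       # 2**8 - x, the difference described above
print(format(flip_and_add, '08b'))    # 01001111
print(format(subtract_from, '08b'))   # 01001111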
But why are we hankerin' after this difference really? Well, from here on, I guess it would be better if you read the Wikipedia article.
We perform only the addition operation for both addition and subtraction. We add the second operand to the first operand for addition. For subtraction we add the 2's complement of the second operand to the first operand.
With a 2's complement representation we do not need separate digital components for subtraction; only adders and complementers are used.
A major advantage of two's-complement representation which hasn't yet been mentioned here is that the lower bits of a two's-complement sum, difference, or product depend only upon the corresponding bits of the operands. The reason the 8-bit signed value for -1 is 11111111 is that subtracting any integer whose lowest 8 bits are 00000001 from any other integer whose lowest 8 bits are 00000000 will yield an integer whose lowest 8 bits are 11111111. Mathematically, the value -1 would be an infinite string of 1's, but all values within the range of a particular integer type will be either all 1's or all 0's past a certain point, so it's convenient for computers to "sign-extend" the most significant bit of a number as though it represented an infinite number of 1's or 0's.
Two's-complement is just about the only signed-number representation that works well when dealing with types larger than a binary machine's natural word size, since when performing addition or subtraction, code can fetch the lowest chunk of each operand, compute the lowest chunk of the result, and store that, then load the next chunk of each operand, compute the next chunk of the result, and store that, etc. Thus, even a processor which requires all additions and subtractions to go through a single 8-bit register can handle 32-bit signed numbers reasonably efficiently (slower than with a 32-bit register, of course, but still workable).
When using any of the other signed representations allowed by the C Standard, every bit of the result could potentially be affected by any bit of the operands, making it necessary either to hold an entire value in registers at once or else to follow computations with an extra step that would, in at least some cases, require reading, modifying, and rewriting each chunk of the result.
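Here is a Python sketch of that chunked arithmetic: two 32-bit two's complement values added 8 bits at a time, with only a carry passed between chunks (the helper name is made up for the example):
def add32_in_8bit_chunks(a, b):
    """Add two 32-bit two's complement values using only 8-bit operations."""
    result, carry = 0, 0
    for i in range(4):                           # lowest chunk first
        chunk = ((a >> 8 * i) & 0xFF) + ((b >> 8 * i) & 0xFF) + carry
        result |= (chunk & 0xFF) << 8 * i        # low bits depend only on low bits
        carry = chunk >> 8
    return result & 0xFFFFFFFF                   # discard the final carry

# -1 is 0xFFFFFFFF in 32-bit two's complement
print(hex(add32_in_8bit_chunks(0xFFFFFFFF, 0x00000002)))   # 0x1, i.e. -1 + 2 == 1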
There are different types of representations. They are:
unsigned number representation
signed number representation
one's complement representation
two's complement representation
Unsigned number representation is used to represent only positive numbers.
Signed number representation is used to represent positive as well as negative numbers. In signed number representation the MSB represents the sign bit and the remaining bits represent the number. When the MSB is 0 the number is positive, and when the MSB is 1 the number is negative.
The problem with signed (sign-magnitude) number representation is that there are two values for 0.
The problem with one's complement representation is also that there are two values for 0.
But if we use two's complement representation then there is only one value for 0; that's why we represent negative numbers in two's complement form.
Source: "Why negative numbers are stored in two's complement form", bytesofgigabytes
One satisfactory answer to why two's complement is used to represent negative numbers rather than the one's complement system is that
two's complement solves the problem of multiple representations of 0 and the need for end-around carry, which exist in the one's complement system of representing negative numbers.
For more information visit https://en.wikipedia.org/wiki/Signed_number_representations
For end-around carry visit
https://en.wikipedia.org/wiki/End-around_carry