How does adding MIN_VALUE compare integers as unsigned? - language-agnostic

In Java the int type is signed, but it has a method that compares two ints as if they were unsigned:
public static int compareUnsigned(int x, int y) {
return compare(x + MIN_VALUE, y + MIN_VALUE);
}
It adds Integer.MIN_VALUE to each argument, then calls the normal signed comparison method, which is:
public static int compare(int x, int y) {
return (x < y) ? -1 : ((x == y) ? 0 : 1);
}
How does adding MIN_VALUE to each argument magically make the comparison unsigned?

This technique works for any size of integer, but I'll use an 8-bit byte-sized integer to explain, because the numbers are smaller and easier to work with.
An 8-bit type has 2^8 = 256 possible values. At a low level these are just bits, and signed vs unsigned is a matter of how we interpret those bits. When interpreted as an unsigned integer, they have a range of 0 to 255. When interpreted as a signed two's complement integer, they have a range of −128 to +127.
The number line for the types looks like this:
 
Notice that the positive numbers from 0 to 127 can be represented by both signed and unsigned types, and they are represented by exactly the same bit patterns (00000000 to 01111111).
The bit patterns which represent the large positive numbers from 128 to 255 in the unsigned interpretation are reused for the numbers −128 to −1 in the signed interpretation. It is as if someone took the unsigned number line, chopped off the upper half of the range, and glued it on at the lower end of the line.
Now, let's look at what happens when we compare a pair of integers.
Case 1: Both values are in the signed positive range
With both values in the range 0 to 127, they have the same numeric value whether the bits are interpreted as signed or unsigned.
We unconditionally add MIN_VALUE to each value. MIN_VALUE for our signed byte type is −128, so adding that means we are actually subtracting 128.
An example: our comparison function, using signed types, is given x = 20 and y = 60. Adding MIN_VALUE, we get x' = 20 − 128 = −108 and y' = 60 − 128 = −68:
 
Adding MIN_VALUE to a positive value will always map it to a negative value. At the extreme ends of the range, 0 would become −128, and 127 would become −1. The operation will not change the order of x and y relative to each other, so the result of any comparison between x' and y' will be the same as if we had not added MIN_VALUE, which is correct.
Case 2: Both values are in the signed negative range
In this case, both values are in the range −128 to −1 if interpreted as signed. If interpreted as unsigned they are in the range 128 to 255 (which is 256 greater).
When we unconditionally add MIN_VALUE to each of our signed negative values, it always causes overflow and wrap-around, to signed positive values. Numerically, this wrap-around is the same as adding 256. If we are given x = −35 and y = −80 to compare, we get x' = −35 − 128 + 256 = 93 and y' = −80 − 128 + 256 = 48:
 
We can also visualize this with the unsigned interpretations of −35 and −80, which are 221 and 176. When subtracting 128, we get exactly the same results for x' and y'. One of the advantages of two's complement is that addition and subtraction give the same results regardless of whether you treat the data as signed or unsigned, so CPUs can use the same circuitry.
As in case 1, the operation does not change the results of any comparisons between the two numbers. Our x was greater than y (being of lesser negative magnitude), and x' is also greater than y'. So comparisons between these inputs will be correct.
Case 3: One value is in the signed positive range, the other negative
This is the interesting case. Notice that when we add MIN_VALUE, it always changes a number's sign. Positive values are mapped to negative values and negative values are mapped to positive values.
Let's compare x = −35 and y = 60. Since we want these to be compared as unsigned, we really intend x to be interpreted as −35 + 256 = 221. So x needs to be interpreted as greater than y, even though our signed data type will not normally do this.
Because the numbers have opposite signs, the MIN_VALUE operation which changes the signs will reverse the numbers' order on the number line. x' = −35 − 128 + 256 = 93, and y' = 60 − 128 = −68. So we get x' is greater than y', which is what we wanted:
 
Generalization
Since we've handled all combinations of positive and negative, we know the technique works for all possible values.
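For the 8-bit case, the claim can also be checked by brute force. The following is only a sketch in Python: Python integers never overflow, so the wrap-around is simulated with a mask, and the helper names (wrap_signed, compare_unsigned, check_all) are made up for this illustration rather than taken from any library.

```python
BITS = 8
MASK = (1 << BITS) - 1            # 0xFF
MIN_VALUE = -(1 << (BITS - 1))    # -128 for a signed byte

def wrap_signed(n):
    """Reduce n to the signed 8-bit range, the way wrapping addition would."""
    n &= MASK
    return n - (1 << BITS) if n & (1 << (BITS - 1)) else n

def compare(x, y):
    """Ordinary signed comparison: -1, 0 or 1."""
    return -1 if x < y else (0 if x == y else 1)

def compare_unsigned(x, y):
    """Add MIN_VALUE to both arguments (with wrap-around), then compare as signed."""
    return compare(wrap_signed(x + MIN_VALUE), wrap_signed(y + MIN_VALUE))

def check_all():
    """Compare every pair of signed bytes against their unsigned interpretations."""
    for x in range(-128, 128):
        for y in range(-128, 128):
            expected = compare(x & MASK, y & MASK)  # x & MASK is the unsigned reading
            if compare_unsigned(x, y) != expected:
                return False
    return True
```

check_all() visits all 65,536 pairs, which covers the three cases above at once.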
In the case of 32-bit ints, the ranges are bigger (signed range is −2,147,483,648 (MIN_VALUE) to +2,147,483,647, and unsigned range is 0 to 4,294,967,295) but it works just the same. In fact it works for every size of integer, and in every programming language, provided that:
The signed integers use two's complement representation (which is nearly universal).
Addition wraps around on overflow (rather than raising an error or promoting to a bigger number type or being undefined).
You can also do the reverse: if you have only an unsigned integer type, and you want to do a two's complement signed comparison, add (the unsigned interpretation of) the signed minimum value to each number.
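The reverse direction can be sketched the same way. This assumes an 8-bit unsigned type whose addition wraps modulo 256; the function names are hypothetical and Python's % operator stands in for the hardware wrap-around.

```python
BITS = 8
MOD = 1 << BITS
SIGNED_MIN_AS_UNSIGNED = 1 << (BITS - 1)   # 128, the bit pattern 10000000

def to_signed(u):
    """Two's complement reading of an unsigned 8-bit value."""
    return u - MOD if u >= SIGNED_MIN_AS_UNSIGNED else u

def compare(x, y):
    return -1 if x < y else (0 if x == y else 1)

def compare_signed_using_unsigned(u, v):
    """Add 128 to both values with wrap-around, then compare as unsigned."""
    return compare((u + SIGNED_MIN_AS_UNSIGNED) % MOD,
                   (v + SIGNED_MIN_AS_UNSIGNED) % MOD)

def check_all():
    """Check every pair of unsigned bytes against the signed ordering."""
    return all(
        compare_signed_using_unsigned(u, v) == compare(to_signed(u), to_signed(v))
        for u in range(MOD)
        for v in range(MOD)
    )
```

For example, the bit patterns 221 and 60 stand for −35 and 60 when read as signed, so the function reports the first as smaller.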
Because the technique is just two unconditional addition operations, it is extremely efficient even if not treated specially by a compiler or VM.

Related

What is the difference between signed and unsigned binary

I've been reading a few sites, but none of them make sense to me. Is signed and unsigned binary the same as signed and unsigned variables? I'd be glad if you could help :)
The "signed" indicator means that the item can hold positive or negative values. "Unsigned" doesn't distinguish between positive and negative values. A signed/unsigned variable can refer to any numerical data type (such as binary, integer, float, etc). Each data type might be further defined as signed or unsigned.
For example, an 8-bit signed binary value stored as sign and magnitude can hold magnitudes from 0 to 127, either positive or negative (1 bit is used for the sign and 7 bits for the value); in two's complement, which most hardware uses, the same 8 bits cover −128 to +127. An 8-bit unsigned binary value can hold values from 0 to 255 (no bit is reserved for a sign, so every value is treated as non-negative).
A signed binary is a specific data type of a signed variable.
Hope that helps!
A "signed" variable can hold a positive or negative value, and uses its most significant bit (the leftmost bit) to tell them apart; we call that bit the "sign bit". An "unsigned" variable has no sign bit; its most significant bit is just the next power of two.
When the sign bit of a signed number is 1 the number is negative, whereas in an unsigned number that bit follows the regular binary place-value rules.
For example, with all bits set: Unsigned Char 0b11111111 (0xFF in hex) = 255 in decimal, (128+64+32+16+8+4+2+1 = 255)
Signed Char 0b11111111 (0xFF in hex) = -127 in decimal if read as sign and magnitude, (-1 * (64+32+16+8+4+2+1) = -127); in two's complement, which C implementations use in practice, the same bits are -1 (-128+64+32+16+8+4+2+1 = -1)
Additionally what you might see in code:
Unsigned Char 0b10000001 (0x81 in hex) = 129 in decimal, (128 + 1 = 129)
Signed Char 0b10000001 (0x81 in hex) = -1 in decimal if read as sign and magnitude, (-1 * 1); in two's complement it is -127 (-128 + 1 = -127)
(Note: char is one byte which means it has eight digits in binary that can be changed)
(For anyone who is wondering, 0b means the number is written in binary and 0x means it is written in hex)
Signed and unsigned binary refer to how a bit pattern is converted to a number, depending on whether the pattern carries a sign. For variables, they refer to whether the variable is able to store negative values or not.
In binary with a sign bit, we say 1 is negative and 0 is positive. So in the second example below, the first bit is 1, which means the number is negative. In sign-and-magnitude form we don't include that bit in the conversion from base 2 to base 10.
For example: 1001
As unsigned (no sign bit): 9
For example: 1001
As signed (MSB is a sign bit): -1 in sign and magnitude; in two's complement the same four bits are -7 (-8 + 1)
A signed variable is one that can store negative numbers.
MSB: Most Significant Bit
It depends on the position or situation. For example, in assembly (MIPS syntax here), suppose we want to load a byte with the value 0xFF (11111111 in binary) from memory, and $s3 holds the base address of this value.
With lbu (load byte unsigned), the byte is loaded as an unsigned value: lbu rt, offset(rs).
lbu $s0, 32($s3) : lbu loads the byte and zero-extends it to 32 bits, giving 0x000000FF, which is interpreted as 255.
With lb (load byte), the byte is loaded as a signed value: lb rt, offset(rs).
lb $s0, 32($s3) : lb loads the byte and sign-extends it to 32 bits, giving 0xFFFFFFFF, which is interpreted as -1.
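To see the same byte read both ways outside of assembly, here is a small Python sketch. int.from_bytes is a standard Python method; zero_extend_to_32 and sign_extend_to_32 are names invented for this example to mimic what the two load instructions do.

```python
raw = bytes([0xFF])

# The same 8 bits, read under two different interpretations.
as_unsigned = int.from_bytes(raw, "big", signed=False)   # 255
as_signed = int.from_bytes(raw, "big", signed=True)      # -1 (two's complement)

def zero_extend_to_32(byte):
    """Pad the upper 24 bits with zeros (unsigned load)."""
    return byte & 0xFF

def sign_extend_to_32(byte):
    """Copy bit 7 into the upper 24 bits (signed load)."""
    value = byte & 0xFF
    if value & 0x80:
        value |= 0xFFFFFF00
    return value
```

With 0xFF as input, the first helper produces the 32-bit pattern 0x000000FF and the second produces 0xFFFFFFFF.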

Understanding offset-k method of representing negative integers

I'm reading this article about offset-k method of representing negative integers. Can someone please explain the following passage using some examples:
One logical way to represent signed integers is to have enough range
in binary numbers so that the zero can be offset to the middle of the
range of positive binary numbers. Then the magnitude of a negative
binary number can be simply subtracted from that zero point.
I understand the mechanics, e.g. to represent number 4 in 11 bits, I'll do 4+1023=1027, but can't understand the logic behind it and why it works.
If we have numbers ranging from -8 to +8, we can remove the sign by adding 8 to all of our numbers. The numbers would then be 0 to +16. It is rather like shifting the scale (as in converting Celsius to Kelvin) to obtain only positive values.
This representation allows operations on the biased numbers to be the same as for unsigned integers, but actually represents both positive and negative values.
This method goes by several names: excess-K, offset binary, or biased representation. It uses a fixed value K as the bias.
A value is represented by the unsigned number which is K greater than the intended value.
Biased representations are now primarily used for the exponent of floating-point numbers. The IEEE floating-point standard defines the exponent field of a single-precision (32-bit) number as an 8-bit excess-127 field.
Two examples should make this clearer:
Example:1
4-bit Pattern
0110 the digit/column value of the most significant bit is 8, so 4 bit patterns are referred to as an Excess (8) notation.
To convert this example find the sum value of the entire pattern as though a standard binary number:
=(0 x 8) + (1 x 4) + (1 x 2) + (0 x 1 )
=0 + 4 + 2 + 0
=6
Then subtract the excess value,8, from the sum, (6 - 8)
The result is a signed value, -2.
Example 2
5-bit pattern
11110, the digit/column value of the most significant bit is 16, so 5-bit patterns are referred to as an Excess (16) notation.
To convert this example find the sum value of the entire pattern as though a standard binary number:
(1x16) + (1x8) + (1x4) + (1x2) + (0x1)
= 16 + 8 + 4 + 2 + 0
= 30
Then subtract the current excess value, 16, from the sum, (30 - 16)
The result is a signed value, + 14.
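The two conversions above can be written as a short Python sketch. It assumes, as in these examples, that the excess value K is the weight of the most significant bit (8 for 4 bits, 16 for 5 bits); other choices of K exist, such as the 1023 used in the question. The function names are made up for illustration.

```python
def excess_decode(bits):
    """Read a bit string as plain binary, then subtract the excess value K."""
    k = 1 << (len(bits) - 1)          # weight of the most significant bit
    return int(bits, 2) - k

def excess_encode(value, width):
    """Add K to the value and write the sum as an unsigned bit string."""
    k = 1 << (width - 1)
    stored = value + k
    if not 0 <= stored < (1 << width):
        raise ValueError("value does not fit in this width")
    return format(stored, f"0{width}b")
```

One consequence of the bias: comparing the stored patterns as unsigned numbers gives the same order as comparing the signed values they represent.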
I hope this clarifies the logic. Note that in this notation a leading bit of 0 marks a negative value and a leading bit of 1 marks a non-negative value, the opposite of a two's complement sign bit.

when two 16-bit signed data are multiplied, what should be the size of resultant?

I have faced an interview question related to embedded systems and C/C++. The question is:
If we multiply 2 signed (2's complement) 16-bit data, what should be the size of resultant data?
I've started attempting it with an example of multiplying two signed 4-bit, so, if we multiply +7 and -7, we end up with -49, which requires 7 bits. But, I could not formulate a general relation.
I think I need to understand binary multiply deeply to solve this question.
First, an n-bit signed integer holds a value in the range -(2^(n-1))..+(2^(n-1))-1.
For example, for n=4, the range is -(2^3)..(2^3)-1 = -8..+7
The range of the multiplication result is -8*+7 .. -8*-8 = -56..+64.
+64 is more than 2^6-1: it is exactly 2^6 = 2^(2n-2). Storing such a positive integer takes 2n-1 bits for the magnitude alone.
Unless you're doing proprietary encoding (see next paragraph), you'll need 2n bits:
One bit for the sign, and 2n-1 bits for the absolute value of the multiplication result.
If M is the result of the multiplication, you can store -M or M-1 instead. This can save you 1 bit.
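The 2n-bit conclusion can be checked numerically. The sketch below is only an illustration with invented helper names; it looks at the extreme products of two n-bit two's complement values and counts how many bits a two's complement result needs.

```python
def signed_range(bits):
    """Smallest and largest value of a two's complement integer of this width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

def bits_needed_signed(value):
    """Smallest two's complement width that can hold value."""
    bits = 1
    while not (signed_range(bits)[0] <= value <= signed_range(bits)[1]):
        bits += 1
    return bits

def product_bits(n):
    """Width needed for any product of two n-bit signed integers."""
    lo, hi = signed_range(n)
    extremes = (lo * lo, lo * hi, hi * hi)   # the largest and smallest products are among these
    return max(bits_needed_signed(p) for p in extremes)
```

The only product that forces the full 2n bits is MIN * MIN (for example -8 * -8 = +64 with n = 4); every other product fits in 2n-1 bits.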
This will depend on context. In C/C++, operands smaller than int are promoted to int before the multiply. So if int is wider than 16 bits, the result has type int (a signed 32-bit integer on most current platforms).
However, if you assign it back to a 16-bit integer, it will truncate leaving only bottom 16 bits of the two's complement of the new number.
So if your definition of "result" is the intermediate immediately following the multiply, then the answer is the size of int. If you define the size as after you've stored it back to a 16-bit variable, then answer is the size of the 16-bit integer type.

What is the total number of unique values for a double in the range [0.0, 1.0)?

Random.NextDouble() (a Double from the range [0.0,1.0)) is sometimes multiplied with a large Int64 (let Int64 big = 9000000000L), and the result floored to obtain a random Int64 value larger than what can be obtained from Random.Next() (an Int32 from the range [0,Int32.MaxValue)).
Random r = new Random();
long big = 9000000000L;
long answer = (long) (r.NextDouble() * big);
It seems to me that the total number of unique values for a Double in the range [0.0, 1.0) provides an upper-bound for the number of unique Int64 it can possibly generate. A loose upper-bound, in fact, as many different Doubles will map to the same Int64.
Hence, I would like to know: what is the total number of unique values for a double in the range [0.0, 1.0)?
Even better if you can tell me what is the largest value "big" can take so that "answer" can be a value from the range [0,big), and whether the distribution of values of "answer" is uniform, assuming that Random.NextDouble() is uniform.
Edit: Double (double) here refers to IEEE 754 floating-point double, while Int64 (long) and Int32 (int) refer to 64-bit and 32-bit signed 2's complement respectively.
Inspired by this question: Generating 10 digits unique random number in java
While I used C#, this question is language-agnostic and is more about discrete mathematics than programming. It bothers me not so much out of mathematical curiosity as from the viewpoint of a programmer who wants to use a formula only if it does what it is supposed to do, and from a security viewpoint.
IEEE-754 has 11 bits of exponent, and 52 of mantissa. Assuming the sign bit is 0 (positive), If the exponent ranges from 0x001 to 0x3FE, the value is a standard floating point number between 0 and 1. The mantissa is interpreted with a leading 1 that is not stored. For each of these 0x3FE values for the exponent, there are 2^52 values of the mantissa. In addition, if the exponent is 0x000, the mantissa is interpreted without that leading value, but as if the exponent were 0x001, for a total of 0x3FF = 1023 exponents where all mantissas are valid. This is a total of 1023*2^52 values. In addition, negative 0 may count, which is one more value.
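The count of 1023*2^52 can be cross-checked by looking at raw bit patterns. This sketch assumes Python's float is an IEEE 754 double (true on essentially all current platforms) and uses the standard struct module; double_bits is a name invented here.

```python
import struct

def double_bits(x):
    """Return the 64-bit pattern of a double as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

# Non-negative doubles are ordered the same way as their bit patterns, so the
# number of doubles in [0.0, 1.0) is the distance between the two patterns.
count = double_bits(1.0) - double_bits(0.0)
```

The pattern of 1.0 is 0x3FF0000000000000, so count comes out as 0x3FF * 2^52, which is the 1023*2^52 figure above (negative zero not included).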
If random doubles were generated uniformly from all values, then this would indeed produce a bias when multiplying in order to generate an Int64. However, any reasonable random library will approximate a uniform distribution on [0, 1), and this will not be biased when turning it into an Int64. The largest value for "big" that will allow all integers in [0, big) to be produced is 2^53 -- the resolution of the 2^52 numbers between 1/2 and 1 is 2^(-53). However, it's often the case that these numbers are produced by dividing random integers by the integer range (usually Int32) meaning you can't actually produce more numbers than this source. Consider directly combining two Int32s instead, e.g. by shifting one by 32 bits and combining them into an Int64. (Though be wary -- the state space for the generator might only be 32 bits.)
As a corollary to your question, I'll tell you that the C# Random class internally uses a generator that produces numbers between 0 and Int32.MaxValue - 1. It then divides the number by Int32.MaxValue (technically it multiplies by the inverse of that number) to return a double. So in C#, only Int32.MaxValue distinct doubles can be returned (one for each of 0...Int32.MaxValue - 1).
The IEEE754 is pretty clear on the precision of doubles:
http://en.wikipedia.org/wiki/IEEE_754-2008
You have 52 bits of precision plus an additional assumed bit.
You have exponents from -1022 to 1023, about 11 bits, including a sign.
The 64th bit is the overall sign for the number.
We'll ignore subnormalized numbers.
You're asking about exponents between -1022 and 0. This means you have about 10 of the available 11 bits of exponent available to you.
You have 52+1 bits of mantissa available.
This is about 62 bits of usable precision to represent 2**62 distinct values from
@wnoise pretty much nailed it, but here's my two cents.
IEEE floats can be compared and incremented as integers with some restrictions, see this question for details. So, if we cast +0.0 and 1.0 to 64 bit integers, we get the number of steps between zero and one:
#include <cstdint>
#include <cstring>
#include <iostream>

// Copy the bytes of a double into a 64-bit integer. memcpy avoids the
// undefined behaviour of reading a double through an integer pointer.
static std::uint64_t bits_of(double d)
{
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

int main()
{
    std::cout << bits_of(0.0) << std::endl;
    std::cout << bits_of(1.0) << std::endl;
}
This gives me 0 and 4607182418800017408, respectively, i.e. there are 4607182418800017408 unique double values in the range [0.0, 1.0).
The total number of unique values for a double in the range [0.0, 1.0) depends on the representation of double in the particular environment.
One of the most common representation is the one specified by IEEE 754. That format is e. g. mandated by Java and C# (see 1.3 Types and variables for the latter).
That depends on the implementation of double. There are implementations that do not allow denormalized values and leave out the leading one; determining the number of possible values here is easy:
there are a few "special" values (0, +0, -0, +∞, -∞, silent NaN, signalling NaN) that typically cost you one possible exponent
there is no way that shifting the mantissa and modifying the exponent gives an equivalent number
If your implementation allows denormalized values, determining this number becomes a bit more difficult, but I'd start by mapping the possible values in this representation to the equivalent representation with the fixed leading one (which will use one bit less in the mantissa); if you've found an appropriate mapping, this will be injective, and you have reduced the problem to a simpler one.

Why prefer two's complement over sign-and-magnitude for signed numbers?

I'm just curious if there's a reason why in order to represent -1 in binary, two's complement is used: flipping the bits and adding 1?
-1 is represented by 11111111 (two's complement) rather than (to me more intuitive) 10000001 which is binary 1 with first bit as negative flag.
Disclaimer: I don't rely on binary arithmetic for my job!
It's done so that addition doesn't need to have any special logic for dealing with negative numbers. Check out the article on Wikipedia.
Say you have two numbers, 2 and -1. In your "intuitive" way of representing numbers, they would be 0010 and 1001, respectively (I'm sticking to 4 bits for size). In the two's complement way, they are 0010 and 1111. Now, let's say I want to add them.
Two's complement addition is very simple. You add numbers normally and any carry bit at the end is discarded. So they're added as follows:
   0010
+  1111
= 10001
=  0001 (discard the carry)
0001 is 1, which is the expected result of "2+(-1)".
But in your "intuitive" method, adding is more complicated:
  0010
+ 1001
= 1011
Which is -3, right? Simple addition doesn't work in this case. You need to note that one of the numbers is negative and use a different algorithm if that's the case.
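The two additions above can be replayed in a few lines of Python. This is just a sketch: add_4bit models a plain 4-bit adder that drops the carry, and the two decoder functions (names invented here) read the resulting bits under each representation.

```python
def add_4bit(a, b):
    """Plain binary addition, keeping only the low 4 bits (carry discarded)."""
    return (a + b) & 0b1111

def twos_complement_value(bits):
    """Read 4 bits as two's complement."""
    return bits - 16 if bits & 0b1000 else bits

def sign_magnitude_value(bits):
    """Read 4 bits as a sign bit followed by a 3-bit magnitude."""
    magnitude = bits & 0b0111
    return -magnitude if bits & 0b1000 else magnitude

twos_sum = add_4bit(0b0010, 0b1111)        # 2 + (-1) in two's complement
sign_mag_sum = add_4bit(0b0010, 0b1001)    # 2 + (-1) in sign and magnitude
```

The same adder gives 0001 (which is 1) for the two's complement operands, but 1011 (which reads as -3) for the sign-and-magnitude operands.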
For this "intuitive" storage method, subtraction is a different operation than addition, requiring additional checks on the numbers before they can be added. Since you want the most basic operations (addition, subtraction, etc) to be as fast as possible, you need to store numbers in a way that lets you use the simplest algorithms possible.
Additionally, in the "intuitive" storage method, there are two zeroes:
0000 "zero"
1000 "negative zero"
Which are intuitively the same number but have two different values when stored. Every application will need to take extra steps to make sure that non-zero values are also not negative zero.
There's another bonus with storing ints this way, and that's when you need to extend the width of the register the value is being stored in. With two's complement, storing a 4-bit number in an 8-bit register is a matter of repeating its most significant bit:
0001 (one, in four bits)
00000001 (one, in eight bits)
1110 (negative two, in four bits)
11111110 (negative two, in eight bits)
It's just a matter of looking at the sign bit of the smaller word and repeating it until it pads the width of the bigger word.
With your method you would need to clear the existing bit, which is an extra operation in addition to padding:
0001 (one, in four bits)
00000001 (one, in eight bits)
1010 (negative two, in four bits)
10000010 (negative two, in eight bits)
You still need to set those extra 4 bits in both cases, but in the "intuitive" case you need to clear the 5th bit as well. It's one tiny extra step in one of the most fundamental and common operations present in every application.
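As a sketch of the difference, here are both widening operations from 4 to 8 bits in Python. The function names are invented for this example; the bit manipulations follow the description above.

```python
def extend_twos_complement(bits4):
    """Widen 4 bits to 8 by repeating the sign bit in the new upper bits."""
    return bits4 | 0b11110000 if bits4 & 0b1000 else bits4

def extend_sign_magnitude(bits4):
    """Widen 4 bits to 8: move the sign bit to bit 7 and keep the 3-bit magnitude."""
    sign = (bits4 & 0b1000) << 4      # new sign position
    magnitude = bits4 & 0b0111        # clears the old sign bit
    return sign | magnitude
```

For negative two, the first function maps 1110 to 11111110 and the second maps 1010 to 10000010, matching the tables above.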
Wikipedia says it all:
The two's-complement system has the advantage of not requiring that the addition and subtraction circuitry examine the signs of the operands to determine whether to add or subtract. This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic. Also, zero has only a single representation, obviating the subtleties associated with negative zero, which exists in ones'-complement systems.
In other words, adding is the same, whether or not the number is negative.
Even though this question is old, let me put in my 2 cents.
Before I explain this, let's get back to basics. 2's complement is 1's complement + 1.
Now what is 1's complement, and what is its significance in addition?
The sum of any n-bit number and its 1's complement gives you the highest possible number that can be represented by those n bits.
Example:
0010 (2 in 4 bit system)
+1101 (1's complement of 2)
___________________________
1111 (the highest number that we can represent by 4 bits)
Now what will happen if we try to add 1 more to the result? It will result in an overflow.
The result will be 1 0000, which is 0 (as we are working with 4-bit numbers, the 1 on the left is an overflow).
So,
Any n-bit number + its 1's complement = max n-bit number
Any n-bit number + its 1's complement + 1 = 0 (as explained above, overflow will occur as we are adding 1 to the max n-bit number)
Someone then decided to call 1's complement + 1 the 2's complement. So the above statement becomes:
Any n-bit number + its 2's complement = 0
which means 2's complement of a number = - (of that number)
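These identities are easy to check for every 4-bit number. A minimal Python sketch, with helper names made up for the purpose and a mask standing in for the 4-bit register:

```python
N = 4
MASK = (1 << N) - 1      # 1111, the largest 4-bit number

def ones_complement(x):
    """Flip all N bits."""
    return ~x & MASK

def twos_complement(x):
    """Flip all N bits and add 1, discarding any carry out of the top bit."""
    return (ones_complement(x) + 1) & MASK

def identities_hold():
    """Check both identities for every N-bit number."""
    return all(
        x + ones_complement(x) == MASK and (x + twos_complement(x)) & MASK == 0
        for x in range(1 << N)
    )
```

For instance, ones_complement(0b0010) is 0b1101 and twos_complement(0b1100) is 0b0100.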
All this yields one more question: why can we use only (n-1) of the n bits to represent a positive number, and why does the leftmost nth bit represent the sign (0 in the leftmost bit means a +ve number, and 1 means a -ve number)? E.g. why do we use only the first 31 bits of an int in Java to represent a positive number, so that if the 32nd bit is 1, it's a -ve number?
1100 (lets assume 12 in 4 bit system)
+0100(2's complement of 12)
___________________________
1 0000 (result is zero , with the carry 1 overflowing)
Thus the system of (n + 2's complement of n) = 0 still works. The only ambiguity here is that the 2's complement of 12 is 0100, which ambiguously also represents +4, other than representing -12 in the 2's complement system.
This problem will be solved if positive numbers always have a 0 in their leftmost bit. In that case their 2's complement will always have a 1 in the leftmost bit, and we won't have the ambiguity of the same set of bits representing a 2's complement number as well as a +ve number.
Two's complement allows addition and subtraction to be done in the normal way (like you would for unsigned numbers). It also prevents -0 (a separate way to represent 0 that would not be equal to 0 with the normal bit-by-bit method of comparing numbers).
Two's complement allows negative and positive numbers to be added together without any special logic.
If you tried to add 1 and -1 using your method
10000001 (-1)
+00000001 (1)
you get
10000010 (-2)
Instead, by using two's complement, we can add
11111111 (-1)
+00000001 (1)
you get
00000000 (0)
The same is true for subtraction.
Also, if you try to subtract 4 from 6 (two positive numbers) you can 2's complement 4 and add the two together 6 + (-4) = 6 - 4 = 2
This means that subtraction and addition of both positive and negative numbers can all be done by the same circuit in the cpu.
This is to simplify sums and differences of numbers. A sum of a negative number and a positive one encoded in 2's complement is computed the same way as summing them in the normal way.
The usual implementation of the operation is "flip the bits and add 1", but there's another way of defining it that probably makes the rationale clearer. 2's complement is the form you get if you take the usual unsigned representation where each bit controls the next power of 2, and just make the most significant term negative.
Taking an 8-bit value a7 a6 a5 a4 a3 a2 a1 a0
The usual unsigned binary interpretation is:
2^7*a7 + 2^6*a6 + 2^5*a5 + 2^4*a4 + 2^3*a3 + 2^2*a2 + 2^1*a1 + 2^0*a0
11111111 = 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 255
The two's complement interpretation is:
-2^7*a7 + 2^6*a6 + 2^5*a5 + 2^4*a4 + 2^3*a3 + 2^2*a2 + 2^1*a1 + 2^0*a0
11111111 = -128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = -1
None of the other bits change meaning at all, and carrying into a7 is "overflow" and not expected to work, so pretty much all of the arithmetic operations work without modification (as others have noted). Sign-magnitude implementations generally have to inspect the sign bit and use different logic.
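The two weighted sums can be written out directly. A small Python sketch follows (function names invented here); the only difference between the two functions is the sign given to the most significant term.

```python
def unsigned_value(bits):
    """Sum of bit * 2^position over all positions of a bit string."""
    return sum(int(b) << i for i, b in enumerate(reversed(bits)))

def twos_complement_value(bits):
    """Same sum, except the most significant term is negated."""
    top = len(bits) - 1
    return sum(
        (-1 if i == top else 1) * (int(b) << i)
        for i, b in enumerate(reversed(bits))
    )
```

For "11111111" these give 255 and -1 respectively, as in the sums above.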
To expand on others answers:
In two's complement
Adding is the same mechanism as plain positive integers adding.
Subtracting doesn't change either
Multiplication too!
Division does require a different mechanism.
All these are true because two's complement is just normal modular arithmetic, where we choose to look at some numbers as negative by subtracting the modulus.
Reading the answers to this question, I came across this comment [edited].
2's complement of 0100(4) will be 1100. Now 1100 is 12 if I say normally. So,
when I say normal 1100 then it is 12, but when I say 2's complement 1100 then
it is -4? Also, in Java when 1100 (lets assume 4 bits for now) is stored then
how it is determined if it is +12 or -4 ?? – hagrawal Jul 2 at 16:53
In my opinion, the question asked in this comment is quite interesting and so I'd like first of all to rephrase it and then to provide an answer and an example.
QUESTION – How can the system establish how one or more adjacent bytes have to be interpreted? In particular, how can the system establish whether a given sequence of bytes is a plain binary number or a 2's complement number?
ANSWER – The system establishes how to interpret a sequence of bytes through types.
Types define
how many bytes have to be considered
how those bytes have to be interpreted
EXAMPLE – Below we assume that
char's are 1 byte long
short's are 2 bytes long
int's and float's are 4 bytes long
Please note that these sizes are specific to my system. Although pretty common, they can be different from system to system. If you're curious of what they are on your system, use the sizeof operator.
First of all we define an array containing 4 bytes and initialize all of them to the binary number 10111101, corresponding to the hexadecimal number BD.
// BD(hexadecimal) = 10111101 (binary)
unsigned char l_Just4Bytes[ 4 ] = { 0xBD, 0xBD, 0xBD, 0xBD };
Then we read the array content using different types.
unsigned char and signed char
// 10111101 as a PLAIN BINARY number equals 189
printf( "l_Just4Bytes as unsigned char -> %hhu\n", *( ( unsigned char* )l_Just4Bytes ) );
// 10111101 as a 2'S COMPLEMENT number equals -67
printf( "l_Just4Bytes as signed char -> %i\n", *( ( signed char* )l_Just4Bytes ) );
unsigned short and short
// 1011110110111101 as a PLAIN BINARY number equals 48573
printf( "l_Just4Bytes as unsigned short -> %hu\n", *( ( unsigned short* )l_Just4Bytes ) );
// 1011110110111101 as a 2'S COMPLEMENT number equals -16963
printf( "l_Just4Bytes as short -> %hi\n", *( ( short* )l_Just4Bytes ) );
unsigned int, int and float
// 10111101101111011011110110111101 as a PLAIN BINARY number equals 3183328701
printf( "l_Just4Bytes as unsigned int -> %u\n", *( ( unsigned int* )l_Just4Bytes ) );
// 10111101101111011011110110111101 as a 2'S COMPLEMENT number equals -1111638595
printf( "l_Just4Bytes as int -> %i\n", *( ( int* )l_Just4Bytes ) );
// 10111101101111011011110110111101 as a IEEE 754 SINGLE-PRECISION number equals -0.092647
printf( "l_Just4Bytes as float -> %f\n", *( ( float* )l_Just4Bytes ) );
The 4 bytes in RAM (l_Just4Bytes[ 0..3 ]) always remain exactly the same. The only thing that changes is how we interpret them.
Again, we tell the system how to interpret them through types.
For instance, above we have used the following types to interpret the contents of the l_Just4Bytes array
unsigned char: 1 byte in plain binary
signed char: 1 byte in 2's complement
unsigned short: 2 bytes in plain binary notation
short: 2 bytes in 2's complement
unsigned int: 4 bytes in plain binary notation
int: 4 bytes in 2's complement
float: 4 bytes in IEEE 754 single-precision notation
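The same experiment can be repeated without pointer casts, for example with Python's standard struct module. This is only a sketch: the format codes below pick fixed sizes (1, 2 and 4 bytes, little-endian), which matches the sizes assumed above, and the variable names are made up.

```python
import struct

four_bytes = bytes([0xBD, 0xBD, 0xBD, 0xBD])

# One buffer, read through several different "types".
as_uchar = struct.unpack_from("<B", four_bytes)[0]    # 1 byte, plain binary
as_schar = struct.unpack_from("<b", four_bytes)[0]    # 1 byte, two's complement
as_ushort = struct.unpack_from("<H", four_bytes)[0]   # 2 bytes, plain binary
as_short = struct.unpack_from("<h", four_bytes)[0]    # 2 bytes, two's complement
as_uint = struct.unpack_from("<I", four_bytes)[0]     # 4 bytes, plain binary
as_int = struct.unpack_from("<i", four_bytes)[0]      # 4 bytes, two's complement
as_float = struct.unpack_from("<f", four_bytes)[0]    # 4 bytes, IEEE 754 single precision
```

Because all four bytes are identical, the byte order chosen in the format string does not affect the results here.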
[EDIT] This post has been edited after the comment by user4581301. Thank you for taking the time to drop those few helpful lines!
Two's complement is used because it is simpler to implement in circuitry and also does not allow a negative zero.
If there are x bits, two's complement will range from +(2^x/2-1) to -(2^x/2). One's complement will run from +(2^x/2-1) to -(2^x/2-1), but will permit a negative zero (0000 is equal to 1111 in a 4 bit 1's complement system).
It's worthwhile to note that on some early adding machines, before the days of digital computers, subtraction would be performed by having the operator enter values using a different colored set of legends on each key (so each key would enter nine minus the number to be subtracted), and press a special button which would assume a carry into a calculation. Thus, on a six-digit machine, to subtract 1234 from a value, the operator would hit keys that would normally indicate "998,765" and hit a button to add that value plus one to the calculation in progress. Two's complement arithmetic is simply the binary equivalent of that earlier "ten's-complement" arithmetic.
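The six-digit procedure can be mimicked in Python. This is a sketch of the arithmetic only, not of any particular machine; the function names are invented, and the minuend 5000 in the note below is an arbitrary example value.

```python
DIGITS = 6
MODULUS = 10 ** DIGITS            # a six-digit register wraps at 1,000,000

def nines_complement(n):
    """Replace each of the six digits d by 9 - d."""
    return (MODULUS - 1) - n

def subtract_by_complement(a, b):
    """Compute a - b by adding the nines' complement of b plus one, dropping the carry."""
    return (a + nines_complement(b) + 1) % MODULUS
```

nines_complement(1234) gives 998765, the keys the operator would press, and subtract_by_complement(5000, 1234) gives 3766.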
The advantage of performing subtraction by the complement method is a reduction in hardware complexity. There is no need for different digital circuits for addition and subtraction. Both addition and subtraction are performed by the adder only.
I have a slight addendum that is important in some situations: two's complement is the only representation that is possible given these constraints:
Unsigned numbers and two's complement are commutative rings with identity. There is a homomorphism between them.
They share the same representation, with a different branch cut for negative numbers, (hence, why addition and multiplication are the same between them.)
The high bit determines the sign.
To see why, it helps to reduce the cardinality; for example, Z_4.
Sign and magnitude and ones' complement both do not form a ring with the same number of elements; a symptom is the double zero. They are therefore difficult to work with at the edges; to be mathematically consistent, they require checking for overflow or trap representations.
Well, your intent is not really to reverse all bits of your binary number. It is actually to subtract each of its digits from 1. It's just a fortunate coincidence that subtracting 1 from 1 results in 0 and subtracting 0 from 1 results in 1. So flipping bits is effectively carrying out this subtraction.
But why are you finding each digit's difference from 1? Well, you're not. Your actual intent is to compute the given binary number's difference from another binary number which has the same number of digits but contains only 1's. For example if your number is 10110001, when you flip all those bits, you're effectively computing (11111111 - 10110001).
This explains the first step in the computation of Two's Complement. Now let's include the second step -- adding 1 -- also in the picture.
Add 1 to the above binary equation:
11111111 - 10110001 + 1
What do you get? This:
100000000 - 10110001
This is the final equation. By carrying out those two steps you're finding this final difference: the binary number subtracted from another binary number that has one extra digit and contains only zeros except at the most significant bit position.
But why are we hankerin' after this difference really? Well, from here on, I guess it would be better if you read the Wikipedia article.
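For what it's worth, the two steps are easy to check mechanically. Here is a minimal Java sketch for 8-bit values (the class and method names are my own, not from any library), using the same 10110001 example:

```java
public class FlipAndAddOne {
    // Step 1: flip all eight bits, which equals 11111111 - x.
    static int flipBits8(int x) {
        return ~x & 0xFF;
    }

    // Step 2: add one, which equals 100000000 - x, truncated to eight bits.
    static int twosComplement8(int x) {
        return (flipBits8(x) + 1) & 0xFF;
    }

    public static void main(String[] args) {
        int x = 0b10110001; // 177
        System.out.println(flipBits8(x) == 0b11111111 - x);        // true
        System.out.println(twosComplement8(x) == 0b100000000 - x); // true
        System.out.println(twosComplement8(x));                    // 79
    }
}
```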
We perform only the addition operation for both addition and subtraction. For addition, we add the second operand to the first operand. For subtraction, we add the 2's complement of the second operand to the first operand.
With a 2's complement representation we do not need separate digital components for subtraction—only adders and complementers are used.
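A small Java sketch of that idea (names chosen here for illustration only): subtraction is built from a complementer, an increment, and an adder, with no subtract operation used.

```java
public class AdderOnlySubtraction {
    // Two's complement negation: invert every bit, then add one.
    static int twosComplement(int b) {
        return ~b + 1;
    }

    // Subtraction expressed purely as an addition.
    static int subtract(int a, int b) {
        return a + twosComplement(b);
    }

    public static void main(String[] args) {
        System.out.println(subtract(60, 20)); // 40
        System.out.println(subtract(20, 60)); // -40
    }
}
```

Because Java int arithmetic wraps around, this gives the same result as `a - b` for every pair of inputs, including the overflow cases.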
A major advantage of two's-complement representation which hasn't yet been mentioned here is that the lower bits of a two's-complement sum, difference, or product are dependent only upon the corresponding bits of the operands. The reason that the 8-bit signed value for -1 is 11111111 is that subtracting any integer whose lowest 8 bits are 00000001 from any other integer whose lowest 8 bits are 00000000 will yield an integer whose lowest 8 bits are 11111111. Mathematically, the value -1 would be an infinite string of 1's, but all values within the range of a particular integer type will either be all 1's or all 0's past a certain point, so it's convenient for computers to "sign-extend" the most significant bit of a number as though it represented an infinite number of 1's or 0's.
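To make the "lower bits depend only on lower bits" point concrete, here is a short Java sketch (the helper names and the sample values are arbitrary choices of mine). It compares the low 8 bits of a full 32-bit result with the low 8 bits computed from only the low 8 bits of each operand:

```java
public class LowBitsDemo {
    static int low8(int v) {
        return v & 0xFF;
    }

    // True if the low byte of a - b can be found from the low bytes alone.
    static boolean differenceAgrees(int a, int b) {
        return low8(a - b) == low8(low8(a) - low8(b));
    }

    // True if the low byte of a * b can be found from the low bytes alone.
    static boolean productAgrees(int a, int b) {
        return low8(a * b) == low8(low8(a) * low8(b));
    }

    public static void main(String[] args) {
        int a = 0x1200; // lowest 8 bits are 00000000
        int b = 0x3401; // lowest 8 bits are 00000001
        System.out.println(Integer.toHexString(low8(a - b))); // ff
        System.out.println(differenceAgrees(a, b));           // true
        System.out.println(productAgrees(0x12345678, -987654321)); // true
    }
}
```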
Two's-complement is just about the only signed-number representation that works well when dealing with types larger than a binary machine's natural word size, since when performing addition or subtraction, code can fetch the lowest chunk of each operand, compute the lowest chunk of the result, and store that, then load the next chunk of each operand, compute the next chunk of the result, and store that, etc. Thus, even a processor which requires all additions and subtractions to go through a single 8-bit register can handle 32-bit signed numbers reasonably efficiently (slower than with a 32-bit register, of course, but still workable).
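The chunk-by-chunk scheme can be sketched in Java as follows. This is only a model of what an 8-bit processor would do (the class and method names are invented for this example), using an int to hold the four chunks:

```java
public class ChunkedAddition {
    // Adds two 32-bit values one byte at a time, lowest byte first,
    // passing a carry between chunks the way an 8-bit ALU with a carry flag would.
    static int addByBytes(int x, int y) {
        int result = 0;
        int carry = 0;
        for (int i = 0; i < 4; i++) {
            int shift = 8 * i;
            int xByte = (x >>> shift) & 0xFF;
            int yByte = (y >>> shift) & 0xFF;
            int sum = xByte + yByte + carry;  // at most 0x1FF
            result |= (sum & 0xFF) << shift;  // store this chunk of the result
            carry = sum >>> 8;                // carry into the next chunk
        }
        return result;                        // final carry out is discarded
    }

    public static void main(String[] args) {
        System.out.println(addByBytes(-5, 3));                // -2
        System.out.println(addByBytes(1_000_000, 2_345_678)); // 3345678
    }
}
```

Note that the loop never looks at a sign bit: each chunk of the result is final as soon as it is computed, whether the operands are meant as signed or unsigned.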
When using any of the other signed representations allowed by the C Standard, every bit of the result could potentially be affected by any bit of the operands, making it necessary to either hold an entire value in registers at once or else follow computations with an extra step that would, in at least some cases, require reading, modifying, and rewriting each chunk of the result.
There are different types of number representation:
unsigned number representation
signed number (sign-magnitude) representation
one's complement representation
two's complement representation
-Unsigned number representation is used to represent only non-negative numbers.
-Signed number representation is used to represent positive as well as negative numbers. In signed number representation the MSB is the sign bit and the remaining bits represent the magnitude. When the MSB is 0 the number is positive, and when the MSB is 1 the number is negative.
The problem with signed number representation is that there are two values for 0.
The problem with one's complement representation is also that there are two values for 0.
But if we use two's complement representation then there is only one value for 0, which is why we represent negative numbers in two's complement form.
Source: "Why negative numbers are stored in two's complement form", bytesofgigabytes
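The "two values for 0" point can be checked by decoding all 256 bit patterns of an 8-bit value under each scheme and counting how many decode to zero. The following Java is only an illustrative sketch; the decoder names are hypothetical helpers written for this example:

```java
import java.util.function.IntUnaryOperator;

public class ZeroRepresentations {
    // Sign-magnitude: top bit is the sign, low seven bits are the magnitude.
    static int decodeSignMagnitude(int bits) {
        int magnitude = bits & 0x7F;
        return (bits & 0x80) != 0 ? -magnitude : magnitude;
    }

    // One's complement: a negative value is the bitwise inverse of its magnitude.
    static int decodeOnesComplement(int bits) {
        return (bits & 0x80) != 0 ? -(~bits & 0xFF) : bits;
    }

    // Two's complement: the same reading Java uses for its byte type.
    static int decodeTwosComplement(int bits) {
        return (byte) bits;
    }

    // Counts how many of the 256 bit patterns decode to the value zero.
    static int countZeroPatterns(IntUnaryOperator decoder) {
        int count = 0;
        for (int bits = 0; bits < 256; bits++) {
            if (decoder.applyAsInt(bits) == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countZeroPatterns(ZeroRepresentations::decodeSignMagnitude));  // 2
        System.out.println(countZeroPatterns(ZeroRepresentations::decodeOnesComplement)); // 2
        System.out.println(countZeroPatterns(ZeroRepresentations::decodeTwosComplement)); // 1
    }
}
```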
One satisfactory answer to why the Two's Complement system is used to represent negative numbers rather than the One's Complement system is that Two's Complement solves two problems that exist in One's Complement: multiple representations of 0, and the need for an end-around carry.
For more information, visit https://en.wikipedia.org/wiki/Signed_number_representations
For end-around carry, visit https://en.wikipedia.org/wiki/End-around_carry
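To give a feel for the end-around carry mentioned above, here is a rough Java sketch of 8-bit one's complement addition (the class and method names are made up for illustration). Whenever the adder produces a carry out of the top bit, that carry has to be added back into the lowest bit:

```java
public class EndAroundCarry {
    // Encodes a value in the range -127..127 as an 8-bit one's complement pattern.
    static int encodeOnesComplement8(int value) {
        return value >= 0 ? value : (~(-value)) & 0xFF;
    }

    // One's complement addition: a plain binary add, then the carry out of
    // bit 7 (if any) is wrapped around and added to bit 0.
    static int addOnesComplement8(int a, int b) {
        int sum = a + b;             // may need nine bits
        if (sum > 0xFF) {
            sum = (sum & 0xFF) + 1;  // end-around carry
        }
        return sum & 0xFF;
    }

    public static void main(String[] args) {
        int five = encodeOnesComplement8(5);
        int minusThree = encodeOnesComplement8(-3);
        // 5 + (-3) = 2, but only once the carry is wrapped around.
        System.out.println(addOnesComplement8(five, minusThree)); // 2
    }
}
```

In two's complement the same addition needs no correction step: the carry out of the top bit is just discarded.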