I have signed numbers (two's complement) stored in 32-bit integers, and I want to extract 16-bit fields from them. Is it true that if I extract the low 16 bits of a 32-bit signed number, the result will be correct as long as the original 32-bit number fits into 16 bits?
For positive numbers it is trivially true, and it seems to hold for negative numbers as well. But can it be proven?
Thanks in advance
Yes, in two's complement the sign bit extends "all the way" to the left. When you cast a signed short to a signed int, the number is sign-extended and has the same value.
Example: Nibble(-2) = 1110 => Byte(-2) = 1111_1110
The opposite is true too: if the bits you discard are all copies of the sign bit and you keep at least one of them, the value of the number remains unchanged.
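Here's a minimal C sketch of the original question (the variable names are mine; converting an out-of-range value to int16_t is implementation-defined in ISO C, but it wraps modulo 2^16 on essentially all two's complement platforms):
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int32_t original = -4711;                      // fits in 16 bits: -32768..32767
    int16_t field = (int16_t)(original & 0xFFFF);  // extract the low 16-bit field
    int32_t restored = field;                      // assigning back sign-extends
    printf("%ld %ld\n", (long)original, (long)restored);  // prints the same value twice
    return 0;
}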
From my (second) reading of your question, it doesn't seem as if you need to "extract" any bits, but rather convert the whole number?
I.e. do something like this:
int negative = -4711;
short x = (short) negative;
In this case, the compiler will make sure that as much as possible of the original number's precision gets converted in the assignment. This would be the case even if the underlying hardware did not use two's complement. If it does, then this is likely to be just a truncation, as explained by Motti.
Related
I am writing a generic routine for converting fixed-point numbers between decimal and binary representations.
For positive numbers the processing is simple, but when it comes to negative ones I found divergent sources. One says there is a single bit used to hold the sign, while others say the whole number should be represented as a pseudo-integer using two's complement, even when it is negative.
Can anyone tell me which source is correct, or whether there is a standard representation for signed fixed-point numbers?
Additionally, if the two's complement representation is correct, then how do you represent negative numbers with a zero integer part, for example -0.125?
Fixed-point numbers are just binary values where the place values have been changed. Assigning place values to the bits is an arbitrary human activity, and we can do it in any way that makes sense. Normally we talk about binary integers so it is convenient to assign the place value 2^0 = 1 to the LSB, 2^1=2 to the bit to the left of the LSB, and so on. For an N bit integer the place value of the MSB becomes 2^(N-1). If we want a two's-complement representation, we change the place value of the MSB to -2^(N-1) and all of the other bit place values are unchanged.
For fixed-point values, if we want F bits to represent a fractional part of the number, then the place value of the LSB becomes 2^(0-F), and the place value of the MSB becomes 2^(N-1-F) for unsigned numbers and -2^(N-1-F) for signed numbers.
So, how would we represent -0.125 in a two's-complement fixed-point value? That is equal to 0.875 - 1, so we can use a representation where the place value of the MSB is -1 and the value of all of the other bits adds up to 0.875. If you choose a 4-bit fixed-point number with 3 fraction bits, you would say that 1111 binary equals -0.125 decimal. Adding up the place values of the bits we have (-1) + 0.5 + 0.25 + 0.125 = -0.125. My personal preference is to write the binary number as 1.111 to note which bits are fraction and which are integer.
The reason we use this approach is that the normal integer arithmetic operators still work.
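For instance, a small C sketch using the 3-fraction-bit format from the example above (the int8_t storage type and the names are my own choices; right-shifting a negative value is implementation-defined in ISO C, though compilers in practice do an arithmetic shift):
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int8_t a = -1;                         // 11111.111 binary: -1/8 = -0.125
    int8_t b = 12;                         // 00001.100 binary: 12/8 = 1.5
    int8_t sum = a + b;                    // plain integer add: 11/8 = 1.375
    int16_t prod = ((int16_t)a * b) >> 3;  // the product has 6 fraction bits; drop 3
    printf("sum = %g, prod = %g\n", sum / 8.0, prod / 8.0);
    return 0;
}
The product prints as -0.25 rather than the exact -0.1875, because the right shift rounds toward negative infinity.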
It's easiest to think of fixed-point numbers as scaled integers rather than shifted integers. For a given fixed-point type there is a fixed scale, which is a power of two (or ten). To convert from the real value to the integer representation, multiply by that scale. To convert back again, simply divide. Then the issue of how negative values are represented becomes a detail of the integer type with which you represent your number.
Please anyone tell me which source is correct...
Both are problematic.
Your first source is incorrect. The given example is not... the same as 2's complement numbers.
In two's complement, the MSB's (most significant bit's) weight is negated, but the other bits still contribute positive values. Thus a two's complement number with all bits set to 1 does not produce the minimum value: in 4 bits, 1111 is (-8) + 4 + 2 + 1 = -1, while the minimum, -8, is 1000.
Your second source could be a little misleading where it says...
shifting the bit pattern of a number to the right by 1 bit always divide the number by 2.
This statement brushes over the matter of underflow that occurs when the LSB (least significant bit) is set to 1, and the resultant rounding. Right-shifting commonly results in rounding towards negative infinity while division results in rounding towards zero (truncation). Both produce the same behavior for positive numbers: 3/2 == 1 and 3>>1 == 1. For negative numbers, they are contrary: -3/2 == -1 but -3>>1 == -2.
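A quick C check of that difference (the results in the comments assume the usual arithmetic right shift for negative values, which ISO C leaves implementation-defined):
#include <stdio.h>

int main(void) {
    printf(" 3/2 = %d,  3>>1 = %d\n",  3 / 2,  3 >> 1);  // 1 and 1
    printf("-3/2 = %d, -3>>1 = %d\n", -3 / 2, -3 >> 1);  // -1 and -2
    return 0;
}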
...is there a standard representation for signed fixed point numbers?
I don't think so. There are language-specific standards, e.g. ISO/IEC TR 18037 (draft). But the convention of scaling integers to approximate real numbers of predetermined range and resolution is well established. How the underlying integers are represented is another matter.
Additionally, if the 2's complement representation was correct then how to represent negative numbers with zero integer part. For example -0.125?
That depends on the format of your integer and your choice of radix. Assuming a 16-bit two's complement number holding binary fixed-point values with 15 fraction bits, the scaling factor is 2^15, which is 32,768. Multiply to store the value as an integer: -0.125 * 32768. == -4096, and divide to retrieve it: -4096 / 32768. == -0.125.
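As a minimal round-trip sketch in C (assuming the 15-fraction-bit layout described above):
#include <stdint.h>
#include <stdio.h>

int main(void) {
    double scale = 32768.0;                      // 2^15 for 15 fraction bits
    int16_t stored = (int16_t)(-0.125 * scale);  // -4096
    double back = stored / scale;                // -0.125 again
    printf("stored = %d, back = %g\n", stored, back);
    return 0;
}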
I already know the concept of negative binary numbers: a 0 in the most significant bit position means the binary number is positive, and a 1 in the most significant bit position means it is negative.
But the problem that prompted me to ask a question on Stack Overflow is this:
What about the times when we want to represent a large number whose representation happens to have a 1 in the MSB?
Let me explain it this way: by the above rule for making negative counterparts of our binary numbers, in an 8-bit system a value of positive 12 (decimal) would be written as 00001100 in binary, and negative 12 (decimal) would be written as 10001100. But what confuses me a bit is that 10001100 could also be interpreted as 140 in decimal, while we know it is the negative form of 12 under this method of conversion.
I just want to know how to deal with this tricky, two-faced way of interpreting a binary number, as in the example above (it seems to be negative, but wait, it might also not be!).
It depends on the type you use. If you're using an 8-bit representation that is signed, then the largest number you can store is 01111111 (the top bit is set aside for the sign), which converts to an integer value of 127.
If you use an unsigned type, then you have the extra bit available, allowing you to store 11111111, or 255 expressed as an integer.
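A small sketch of the same byte read both ways (the cast to signed char is implementation-defined in ISO C but behaves as shown on ordinary two's complement machines; note that a real two's complement machine reads 10001100 as -116, not the -12 of the sign-magnitude rule in the question):
#include <stdio.h>

int main(void) {
    unsigned char bits = 0x8C;                  // 10001100 binary
    signed char as_signed = (signed char)bits;  // the same byte, signed view
    printf("unsigned: %u\n", (unsigned)bits);   // 140
    printf("signed  : %d\n", as_signed);        // -116 in two's complement
    return 0;
}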
A strongly typed language with a good compiler should catch you trying to assign, say, 134 to a signed 8-bit integer and vomit errors all over you.
If you're doing something strange fiddling around with bits yourself, you're on your own! There's no way of reconstructing, post hoc, whether it was intended to be a negative or a large positive, so you'll have to choose a system and stick with it.
The general convention nowadays is to stick with signed representations, although I have seen code (usually for extreme compute tasks like astrophysical calculations) use unsigned values simply to save memory. And image data, by convention, usually uses unsigned values.
I'm learning integer data formats from a computer science book. As far as I understand, the binary representation of an integer, whether positive or negative, has the leftmost bit (MSB) be 0 for positive or 1 for negative. Say on an 8-bit computer, how would I know whether 10000010 means 130 in base 10 or negative 2?
I might be wrong; if I am, please correct me.
If you were to just see the string 10000010 somewhere, I don't know... written on a wall or something, how would you know how to interpret it?
You might say, hey, that's ten million and ten (you thought it was base 10), or you might say, hey, that's -126 (you thought it was two's complement binary), or you might say that's positive 130 (you thought it was standard binary).
It is, in a theoretical sense, up to whatever is doing the interpreting how it is interpreted.
So, when a computer is holding 8 bits of data, it's up to it how it interprets it.
Now if you're programming, you can tell the computer how you want something interpreted. For example, in C++:
// char is 1 byte
unsigned char x = 130u;
Here I have told the compiler to put 130 unsigned into a byte, so the computer will store 10000010 and will interpret it as the value 130
Now consider
// char is 1 byte
char x = -126;
Here I have told the compiler to put -126 signed into a byte, so the computer will again store 10000010 but this time it will interpret it as the value -126.
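One way to see both readings of the very same stored byte (assuming char is signed, as it is on most platforms):
#include <stdio.h>
#include <string.h>

int main(void) {
    char x = -126;                             // stored as 10000010
    unsigned char y;
    memcpy(&y, &x, 1);                         // copy the raw byte; no value conversion
    printf("as signed  : %d\n", x);            // -126
    printf("as unsigned: %u\n", (unsigned)y);  // 130: same bits, different reading
    return 0;
}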
Take a look at the answer posted to this question: Negative numbers are stored as 2's complement in memory, how does the CPU know if it's negative or positive?
The CPU uses something called an opcode to determine which operation it will perform when manipulating a memory location (in this case, the value 10000010). It is that operation that treats the number as negative or positive. The CPU has no record of whether a number is signed or unsigned; the opcode used when manipulating the number determines whether the operation is a signed or an unsigned one. For example, many instruction sets provide separate signed and unsigned variants of comparison, division, and widening multiplication.
I have faced an interview question related to embedded systems and C/C++. The question is:
If we multiply 2 signed (2's complement) 16-bit data, what should be the size of resultant data?
I started attempting it with an example of multiplying two signed 4-bit values: if we multiply +7 and -7, we end up with -49, which requires 7 bits. But I could not formulate a general relation.
I think I need a deeper understanding of binary multiplication to solve this question.
First, an n-bit signed integer holds a value in the range -(2^(n-1)) .. +(2^(n-1))-1.
For example, for n=4, the range is -(2^3) .. (2^3)-1 = -8..+7.
The range of the multiplication result is then -8*+7 .. -8*-8 = -56..+64.
+64 is more than 2^6 - 1; it is 2^6 = 2^(2n-2)! You need 2n-1 bits to store such a positive value as an unsigned integer.
Unless you're doing a proprietary encoding (see the next paragraph), you'll need 2n bits: one bit for the sign, and 2n-1 bits for the absolute value of the multiplication result.
If M is the result of the multiplication, you can store -M or M-1 instead. This can save you 1 bit.
This will depend on context. In C/C++, all operands smaller than int are promoted to int before the multiplication. So if int is larger than 16 bits, the result will be a signed 32-bit integer.
However, if you assign it back to a 16-bit integer, it will be truncated, leaving only the bottom 16 bits of the result.
So if your definition of "result" is the intermediate immediately following the multiply, the answer is the size of int. If you define the result as the value after storing it back in a 16-bit variable, the answer is the size of the 16-bit integer type.
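A sketch showing both views at once, using the worst case from the first answer (this assumes int is at least 32 bits; the names are mine):
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int16_t a = -32768, b = -32768;  // worst case: (-2^15) * (-2^15) = 2^30
    int32_t full = (int32_t)a * b;   // widen first: the product needs 31 bits plus sign
    int16_t back = (int16_t)full;    // storing back keeps only the low 16 bits
    printf("full = %ld, back = %d\n", (long)full, back);  // 1073741824 and 0
    return 0;
}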
5 (decimal) in binary 00000101
-5 (two's complement) in binary 11111011
but 11111011 is also 251 (decimal)!
How does the computer discern one from the other??
How does it know whether it's -5 or 251??
It's THE SAME 11111011!
Thanks in advance!!
Signed bytes have a maximum of 127.
Unsigned bytes cannot be negative.
The compiler knows whether the variable holding that value is of a signed or unsigned type, and treats it accordingly.
If your program chooses to treat the byte as signed, the run-time system decides whether the byte is to be considered positive or negative according to the high-order bit. A 1 in that high-order bit (bit 7, counting from the low-order bit 0) means the number is negative; a 0 in that bit position means the number is positive. So, in the case of 11111011, bit 7 is set to 1 and the number is treated, accordingly, as negative.
Because the sign bit takes up one bit position, the absolute magnitude of the number can range from 0 to 127, as was said before.
If your program chooses to treat the byte as unsigned, on the other hand, what would have been the sign bit is included in the magnitude, which can then range from 0 to 255.
Two's complement is designed to allow signed numbers to be added and subtracted in the same way as unsigned numbers. So there are only two cases where the signedness of numbers affects the computer at a low level:
when there are overflows
when you are performing operations on mixed operands: one signed, one unsigned
Different processors take different tacks on this. With regard to overflows, the MIPS RISC architecture, for example, deals with them using traps. See http://en.wikipedia.org/wiki/MIPS_architecture#MIPS_I_instruction_formats
To the best of my knowledge, mixing signed and unsigned needs to be avoided at the program level.
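As a small demonstration of the point above, the same addition produces the same bit pattern whether the operands are viewed as signed or unsigned (a sketch using the fixed-width types from <stdint.h>):
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t ua = 251, ub = 5;  // 251 has the same bit pattern as -5
    int8_t  sa = -5,  sb = 5;
    uint8_t usum = ua + ub;    // 251 + 5 wraps to 0 (bits 00000000)
    int8_t  ssum = sa + sb;    // -5 + 5 is 0 (bits 00000000)
    printf("unsigned: %u, signed: %d\n", (unsigned)usum, ssum);
    return 0;
}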
If you're asking "how does the program know how to interpret the value": in general it's because you've told the compiler the "type" of the variable you assigned the value to. The program doesn't actually care that 00000101 is "5 decimal"; it just has an unsigned integer with the value 00000101, on which it can perform the operations legal for unsigned integers, and it will behave in a given manner if you try to compare it with, or cast it to, a different "type" of variable.
At the end of the day, everything in programming comes down to binary: all data (strings, numbers, images, sounds, and so on) and the compiled code just end up as a large binary blob.