Why does Xilinx's Multiplier IP product bitwidth have an extra bit?

Xilinx's complex multiplier IP documentation (PG104) has this to say about input and output bit-width setting:
Output Width: Selects the width of the output product real and imaginary
components. The values are automatically initialized to provide the full-precision
product when the A and B operand widths are set. The natural width of a complex
multiplication is the sum of the input widths plus one. If Output Width is set to be
less than this natural width, the least significant bits are truncated or rounded, as
selected by the next GUI field.
(Italics are mine.) So if I multiply an 8-bit number by another 8-bit number, it wants to have the full precision output be a 17-bit number. The inputs and outputs are assumed to be signed integers.
The largest magnitude signed number that 8 bits can represent is -128 (0x80). 128*128=16384 or 0x4000 which is 15 bits. Add a sign bit and we're safe with a 16-bit output.
The largest positive is 127 (0x7F). 127*127=16129 or 0x3F01. Again, safe with 16 bits.
What am I missing? Why do they insist on the extra bit?

It is a complex multiplication, that is, a sum of two multiplications:
pr = ar x br - ai x bi
pi = ar x bi + ai x br
Each individual product of two 8-bit signed operands fits in 16 bits, but the sum of two such products can reach a magnitude of 2^15, which no longer fits in 16 signed bits. The sum adds the extra bit.
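For a concrete check, here is a minimal C sketch (variable names are mine) of the worst case for 8-bit operands:

#include <stdio.h>

int main(void) {
    /* Worst case for 8-bit signed inputs: every component is -128. */
    int ar = -128, ai = -128, br = -128, bi = -128;

    int pr = ar * br - ai * bi;   /* 16384 - 16384 = 0 */
    int pi = ar * bi + ai * br;   /* 16384 + 16384 = 32768 = 2^15 */

    /* +32768 does not fit in a signed 16-bit word (max is +32767),
     * so the full-precision complex product needs 8 + 8 + 1 = 17 bits. */
    printf("pr = %d, pi = %d\n", pr, pi);
    return 0;
}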

Related

Given the following, how do I find the number of normalized floating-point numbers and why?

I'm trying to understand how floating point number arithmetic plays a role in computer science when using the binary system. I came across an excerpt from What Every Computer Scientist Should Know About Floating-Point Arithmetic which defines normalized numbers as unique floating-point numbers with the leading significand being non-zero. It goes on to say...
When β = 2, p = 3, e_min = -1 and e_max = 2 there are 16 normalized floating-point numbers, as shown in Figure D-1.
Where β is the base, p is the precision, e_min is the minimum exponent, and e_max is the maximum exponent.
My attempt at understanding how he came to the conclusion of there being 16 normalized floating-point numbers was to multiply together the possible number of significands, β^p, and the possible number of exponents, e_max - e_min + 1. My result was 32 possible normalized floating-point values. I am unsure of how to get the correct result of 16 normalized floating-point values as declared in the paper above. I assumed negative floating-point values were excluded; in any case, I did not include them in my calculations.
This question is more geared toward mathematical formulae. But it will help me to better understand how floating-point arithmetic works in computer science.
I would like to know how to get the correct result of 16 normalized floating-point numbers and why.
Since the first bit is always 1, with 3 bits for the mantissa you have only two bits to vary, yielding 4 different mantissa values. Combined with 4 different exponent values that's 16. I haven't looked at the paper though.
My attempt at understanding how he came to the conclusion of there being 16 normalized floating-point numbers was to multiply together the possible number of significands β^p and the possible number of exponents e_max - e_min + 1
This is correct, except that the number of possible significands is not β^p in binary with an implicit leading 1. In these conditions, the number of possible significands is β^(p-1), encoded over p-1 bits.
In other words, the missing values for the possible significands have already been taken advantage of when the encoding reserved, say, 52 bits to encode a precision of 53 binary digits.
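To make this concrete, here is a small C sketch (my own construction) that enumerates the normalized numbers for β = 2, p = 3, e_min = -1, e_max = 2 and counts 16:

#include <stdio.h>
#include <math.h>

int main(void) {
    /* beta = 2, p = 3, e_min = -1, e_max = 2. The leading significand
     * digit of a normalized number is fixed at 1, leaving p - 1 = 2
     * free bits, hence 2^(p-1) = 4 significands per exponent. */
    int count = 0;
    for (int e = -1; e <= 2; e++)
        for (int frac = 0; frac < 4; frac++) {   /* the two free bits */
            double sig = 1.0 + frac / 4.0;       /* 1.00, 1.25, 1.50, 1.75 */
            printf("%g ", ldexp(sig, e));        /* sig * 2^e */
            count++;
        }
    printf("\ncount = %d\n", count);             /* prints count = 16 */
    return 0;
}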

Calculate which numbers cause problems when converting decimal to floating point?

I know that when converting some decimal numbers to binary floating point there can be problems.
For example: 1.4 results in 1.39999. 6.2 is 6.1999999, 6.6 is 6.599999, etc.
Is there a way to calculate which numbers will cause these problems? Like creating some sort of spreadsheet or database of the numbers from 1 to 50,000 which don't convert exactly?
The errors in floating point calculations are rooted in the way that floating point numbers are stored. The size of the error depends on which precision you are using (usually either single (float) or double). Double-precision floats take more space, but are roughly twice as precise.
Floating point numbers are typically stored in an IEEE format and thus only the most significant bits are included.
In a string of bits stored this way to represent a floating point number, there are different parts that comprise the number. First, there is a sign bit indicating whether the number is positive or negative. Next, the exponent portion of the number is stored (in single-precision floats, this takes 8 bits). After the exponent, all remaining bits represent the significant digits of the number. Thus, the bits 1 01111110 00000000000000000000000 represent -0.5. The sign bit (the first one) signifies that the number is negative. The next eight bits are the exponent. The exponent is stored with a bias so that small numbers may be stored precisely in addition to large ones: the exponent used for calculations is the exponent bits interpreted as an integer (in our case 126) minus the bias of 127 (for an 8-bit exponent), giving -1. The remaining bits are the fraction of the significand; a normalized number has an implicit leading 1, and moving rightward from it each bit is worth half the previous one (the first fraction bit is 1/2, the next 1/4, and so on). In our case all fraction bits are zero, so the significand is 1.0, and the total number is -0.5 (-1 for the sign bit * 2^-1 for the exponent * 1.0 for the significand).
For further example, here is a converter that uses checkboxes to indicate the bits. At worst, you could write a bot to test all of the combinations you care about.
In general, if a value cannot be described by a fraction that is directly made of combinations of powers of two, it will be rounded. (If you can't get it with 1/2 + 1/4 + 1/8, etc., it will be rounded.)
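A small C sketch of the decoding described above (decode_float is my own helper name), pulling the three fields out of -0.5f:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Pull apart the sign, exponent, and fraction fields of a float. */
static void decode_float(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* reinterpret the 32 bits */

    uint32_t sign = bits >> 31;              /* 1 bit */
    uint32_t exp  = (bits >> 23) & 0xFF;     /* 8 bits, biased by 127 */
    uint32_t frac = bits & 0x7FFFFF;         /* 23 fraction bits */

    printf("%g: sign=%u exponent=%d (raw %u) fraction=0x%06X\n",
           (double)f, sign, (int)exp - 127, exp, frac);
}

int main(void) {
    decode_float(-0.5f);  /* sign=1 exponent=-1 (raw 126) fraction=0x000000 */
    return 0;
}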
Almost all decimal fractions are troublesome. The minor exceptions are those that, when repeatedly multiplied by 2, get rid of the fractional part and end up being less than 2^24.
e.g.
0.125 -> 0.25 -> 0.5 -> 1.0 ! OK
6.4 --> 12.8 --> 25.6 -> 51.2 -> 102.4 : we got a cycle! (.4 .8 .6 .2 ... )
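A minimal C sketch of that doubling test (exact_in_float is a helper name of my own); to keep the arithmetic itself exact, the value is passed as a decimal fraction numer/10^digits, e.g. 6.4 as (64, 1):

#include <stdio.h>
#include <stdbool.h>

/* True if numer/10^digits is exactly representable as a float:
 * repeated doubling must remove the fractional part before the
 * integer part reaches 2^24. Keep digits small to avoid overflow. */
static bool exact_in_float(long long numer, int digits) {
    long long denom = 1;
    for (int i = 0; i < digits; i++) denom *= 10;        /* 10^digits */
    while (numer % denom != 0) {                         /* fraction left */
        numer *= 2;                                      /* doubling step */
        if (numer / denom >= (1LL << 24)) return false;  /* out of bits */
    }
    return numer / denom < (1LL << 24);
}

int main(void) {
    printf("0.125: %s\n", exact_in_float(125, 3) ? "OK" : "rounded"); /* OK */
    printf("6.4:   %s\n", exact_in_float(64, 1) ? "OK" : "rounded");  /* rounded */
    return 0;
}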
EDIT
Given the purpose/context of the issue, the inexactness seems to cause trouble when the floating point value is rounded towards zero.
e.g.
10.2 == 10.199999809, when the next floating point value would be
next == 10.200000763 <-- the difference from the wanted value is ~4 times larger
vs.
10.3 == 10.300000191, when the previous (rounded-down) floating point value would be
prev == 10.299999237 <-- this would also have been ~4x further away
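Those neighbouring values can be verified with a short C program, using nextafterf from math.h:

#include <stdio.h>
#include <math.h>

int main(void) {
    float f = 10.2f;   /* rounds down to the nearest float */
    printf("10.2f = %.9f, next = %.9f\n", f, nextafterf(f, INFINITY));

    float g = 10.3f;   /* rounds up to the nearest float */
    printf("10.3f = %.9f, prev = %.9f\n", g, nextafterf(g, -INFINITY));
    return 0;
}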
Perhaps it's time to talk to the programmer of the CNC driver...

When two 16-bit signed data are multiplied, what should be the size of the resultant?

I have faced an interview question related to embedded systems and C/C++. The question is:
If we multiply 2 signed (2's complement) 16-bit data, what should be the size of resultant data?
I've started attempting it with an example of multiplying two signed 4-bit numbers: if we multiply +7 and -7, we end up with -49, which requires 7 bits. But I could not formulate a general relation.
I think I need to understand binary multiplication more deeply to solve this question.
First, an n-bit signed integer contains a value in the range -(2^(n-1))..+(2^(n-1))-1.
For example, for n=4, the range is -(2^3)..(2^3)-1 = -8..+7.
The range of the multiplication result is -8*+7 .. -8*-8 = -56..+64.
+64 is more than 2^6-1; it is 2^6 = 2^(2n-2). You'll need 2n-1 bits to store such a POSITIVE integer.
Unless you're doing a proprietary encoding (see the next paragraph), you'll need 2n bits:
one bit for the sign, and 2n-1 bits for the absolute value of the multiplication result.
If M is the result of the multiplication, you can store -M or M-1 instead. This can save you 1 bit.
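A brute-force C check of the n = 4 case from this answer (my own sketch):

#include <stdio.h>

int main(void) {
    /* Scan every product of two signed 4-bit values (-8..+7). */
    int min = 0, max = 0;
    for (int a = -8; a <= 7; a++)
        for (int b = -8; b <= 7; b++) {
            int p = a * b;
            if (p < min) min = p;
            if (p > max) max = p;
        }
    /* min = -56, max = +64: +64 = 2^6 forces 2n = 8 signed bits. */
    printf("min = %d, max = %d\n", min, max);
    return 0;
}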
This will depend on context. In C/C++, all operands smaller than int are promoted to int. So if int is larger than 16 bits, then the result will be a signed 32-bit integer.
However, if you assign it back to a 16-bit integer, it will truncate, leaving only the bottom 16 bits of the two's complement result.
So if your definition of "result" is the intermediate immediately following the multiply, then the answer is the size of int. If you define the size as after you've stored it back to a 16-bit variable, then the answer is the size of the 16-bit integer type.
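A minimal sketch of that promotion behaviour, assuming a platform where int is 32 bits:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int16_t a = -32768, b = -32768;        /* most negative 16-bit values */

    /* Both operands are promoted to int, so the multiply is done at
     * int width and the full product survives. */
    int32_t full = a * b;                  /* 1073741824 = 2^30 */

    /* Assigning back to 16 bits keeps only the bottom 16 bits. */
    int16_t truncated = (int16_t)(a * b);  /* 2^30 mod 2^16 = 0 */

    printf("full = %d, truncated = %d\n", full, truncated);
    return 0;
}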

Trying to understand why page sizes are a power of 2?

I read this:
Recall that paging is implemented by breaking up an address into a page and offset number. It is most efficient to break the address into X page bits and Y offset bits, rather than perform arithmetic on the address to calculate the page number and offset. Because each bit position represents a power of 2, splitting an address between bits results in a page size that is a power of 2.
I don't quite understand this answer, can anyone give a simpler explanation?
If you are converting a (linear) address to page:offset, you want to divide the address by the page size and take the integer quotient as the page, and the remainder as the offset.
This is done using integer division and modulus (MOD, "%") operators in your programming language.
A computer represents an address as a number, stored as binary bits.
Here's an example address: 12 is 1100 in binary.
If the page size is 3, then we'd need to calculate 12/3 and 12%3 to find the page and offset (4:0 respectively).
However, if the page size is 4 (a power of 2), then 4 in binary is 100, and integer division and modulus can be computed using special 'shortcuts': you can strip the last two binary digits to divide, and you can keep only the last two binary digits for modulus. So:
12/4 == 12>>2 (shifting to remove the last two digits)
12%4 == 12&(4-1) (4-1=3 is binary 11, and the '&' (AND) operator only keeps those)
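In C, the same shortcuts look like this (a sketch assuming a toy 4-byte page, i.e. 2 offset bits):

#include <stdio.h>

#define PAGE_SIZE   4   /* must be a power of 2 */
#define OFFSET_BITS 2   /* log2(PAGE_SIZE) */

int main(void) {
    unsigned addr = 12;  /* 1100 in binary */

    unsigned page   = addr >> OFFSET_BITS;     /* addr / PAGE_SIZE */
    unsigned offset = addr & (PAGE_SIZE - 1);  /* addr % PAGE_SIZE */

    printf("addr %u -> page %u, offset %u\n", addr, page, offset);  /* 3, 0 */
    return 0;
}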
Working with powers of two leads to more efficient hardware so that's what hardware designers do. Consider a cpu with a 32-bit address, and an n-bit page number:
+----------------------+--------------------+
| page number (n bits) | byte offset (32-n) |
+----------------------+--------------------+
This address is sent to the virtual memory unit, which separates the page number and byte offset without any arithmetic operations at all. That is, it treats the 32-bit value as an array of bits and has (more or less) a wire running directly to each bit, which allows the memory hardware to extract the page number and byte offset in parallel.
At the same time, this approach requires splitting on a bit boundary, which leads directly to power-of-2 page sizes.
If you have n binary digits at your disposal, then you can encode 2^n different values.
Given an address, your description states some bits will be used for the page, and some for the offset. As you're using a whole number of binary bits for the offset Y, the page size is naturally a power of 2, specifically 2^Y.
Because of the representation of the data. The page offset determines the size of the page, and since the data is represented in binary, you will have some number n of bits to define the offset, so you will have pages with a size of 2^n.
Let's suppose you have an address like 10011001, and you split it into 1001:1001 for page:offset. Since you have 4 bits to define the offset, your page's size is 2⁴.
If it is not an exact power of 2, then some memory addresses will be invalid. E.g., if the page size is 5 bytes, then to distinguish each byte we need 3 bits in the offset part of the address, because with only 2 bits just 4 bytes can be addressed. But using a 3-bit offset for a 5-byte page leaves three addresses unused. That's why the page size should be a power of 2.
Since all addresses are in binary and are divided into f and d (f = frame number, d = offset): because the size is a power of 2, there is no need for heavy mathematical calculation; just by looking at the address bits you can identify f and d. If the page size is 2^n, then the last n bits of the address represent the offset and the remaining bits represent the page number.
Hopefully this answer will help you.
I realized what I didn't get was how the page sizes are a power of 2. I found out afterwards:
At the smallest level, the page number is 1 bit (0 or 1) and the offset is 1 bit (0 or 1). Combining these gives 2 x 2 = 4 (or 2^2) addresses in total: 2 pages of 2 locations each, because of the number of bits each part has.
If the page/offset fields were larger, with n bits each, it would be 2^n pages of 2^n locations; the page size is a power of 2 either way.
The page size is always a power of 2 (2^n).
There are several reasons:
1. The memory address space is also a power of 2, so it is easier to move a page.
2. There are two fields which represent the page: (a) the page number and (b) the offset, and each takes a whole number of bits. This is also part of the reason.
3. Memory is in quantised form, like charge (q).

When I'm multiplying a float using multu, should I ignore the result in the LO register?

In our project, we take two floats from the user, store them in integer registers, and treat them as IEEE 754 single-precision floats, manipulating the bits by masking. So after I multiply the 23 bits of the fraction value, should I take into account the result placed in the LO register if I want to return a single-precision float (32 bits) as the product?
First off, I hope you mean 24 bits of value, since you'll need to include the implicit mantissa bit in your multiplication.
Second, if you want your multiplication to be correctly rounded, as in IEEE-754, you will (sometimes) need the low part of the multiply in order to deliver the correctly rounded result.
On the other hand, if you don't need to implement correct rounding, and you left-shift your fraction bits before multiplication, you will be able to ignore the low word of the result.
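Sticking to C for illustration rather than MIPS assembly, here is a sketch of the 24 x 24-bit significand product: the full result is 48 bits wide (what multu would leave split across HI and LO), and the low half is exactly what correct rounding has to consult.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* Two 24-bit significands with the implicit leading 1 included. */
    uint32_t a = 0xFFFFFF;  /* 1.11...1 in binary, just under 2.0 */
    uint32_t b = 0xFFFFFF;

    uint64_t prod = (uint64_t)a * b;  /* 48-bit product (HI:LO on MIPS) */

    uint32_t hi24 = (uint32_t)(prod >> 24);       /* candidate result bits */
    uint32_t lo24 = (uint32_t)(prod & 0xFFFFFF);  /* round/sticky bits */

    /* Round-to-nearest must consult lo24; dropping it silently
     * truncates toward zero instead. */
    printf("product = 0x%012llX -> hi24 = 0x%06X, lo24 = 0x%06X\n",
           (unsigned long long)prod, hi24, lo24);
    return 0;
}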