Can unsigned multiplication be performed using a signed multiplier? - binary

For example, the RISC-V ISA defines four different multiplication instructions:
MUL: signed x signed (low XLEN bits of the product)
MULH: signed x signed (high XLEN bits)
MULHU: unsigned x unsigned (high XLEN bits)
MULHSU: signed x unsigned (high XLEN bits)
My question is: is it possible to reuse a single signed multiplier to perform all the instructions above? Or, the other way round, a single unsigned multiplier?
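A minimal sketch in C of how one signed multiplier can serve all four instructions, assuming 32-bit operands (RV32) and modeling the hardware multiplier as a single signed 32x32 to 64-bit multiply. The function names model_mul, model_mulh, model_mulhu and model_mulhsu are hypothetical. The idea: an operand whose top bit is set has unsigned value = signed value + 2^32, so the unsigned high word equals the signed high word plus the other operand for each such operand (all mod 2^32).

```c
#include <stdint.h>

/* Model of ONE signed 32x32 -> 64-bit multiplier; everything below reuses it.
 * Assumes the usual two's-complement conversion from uint32_t to int32_t
 * (implementation-defined before C23, but what GCC, Clang and MSVC do). */
static int64_t signed_mul64(uint32_t a, uint32_t b)
{
    return (int64_t)(int32_t)a * (int64_t)(int32_t)b;
}

/* MUL: the low 32 bits are identical for signed and unsigned operands. */
uint32_t model_mul(uint32_t a, uint32_t b)
{
    return (uint32_t)signed_mul64(a, b);
}

/* MULH: high 32 bits of signed x signed. */
uint32_t model_mulh(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)signed_mul64(a, b) >> 32);
}

/* MULHU: unsigned x unsigned. Each operand with its top bit set contributes
 * an extra (other operand) * 2^32, i.e. adds the other operand to the high
 * word. uint32_t addition wraps mod 2^32, as required. */
uint32_t model_mulhu(uint32_t a, uint32_t b)
{
    uint32_t hi = model_mulh(a, b);
    if (a >> 31) hi += b;
    if (b >> 31) hi += a;
    return hi;
}

/* MULHSU: signed a x unsigned b; only the unsigned operand needs correcting. */
uint32_t model_mulhsu(uint32_t a, uint32_t b)
{
    uint32_t hi = model_mulh(a, b);
    if (b >> 31) hi += a;
    return hi;
}
```

The same identity answers the reverse question: starting from an unsigned multiplier, subtracting the corrections instead of adding them yields MULH and MULHSU. In hardware, a common alternative is a single 33x33-bit signed multiplier, where each operand is sign-extended (if signed) or zero-extended (if unsigned) by one bit; the exact product then fits in the result and each instruction simply selects the low or high half.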

Related

Unexpected CUDA_ERROR_INVALID_VALUE from cuLaunchKernel()

I'm trying to launch a kernel using the CUDA driver API. Specifically I'm calling
CUresult CUDAAPI cuLaunchKernel(
CUfunction f,
unsigned int gridDimX, unsigned int gridDimY, unsigned int gridDimZ,
unsigned int blockDimX, unsigned int blockDimY, unsigned int blockDimZ,
unsigned int sharedMemBytes,
CUstream hStream,
void **kernelParams,
void **extra);
I'm only using kernelParams, and passing nullptr for extra. Now, for one of my kernels, I get CUDA_ERROR_INVALID_VALUE.
The documentation says:
The error CUDA_ERROR_INVALID_VALUE will be returned if kernel parameters are specified with both kernelParams and extra (i.e. both kernelParams and extra are non-NULL).
Well, I'm not doing that, and I still get CUDA_ERROR_INVALID_VALUE. To be extra safe, I synchronized the stream right before launching the kernel, but to no avail.
What are the other reasons for getting CUDA_ERROR_INVALID_VALUE when trying to launch?
Apparently, you can get a CUDA_ERROR_INVALID_VALUE error in several cases involving your kernelParams and/or extra arguments:
Both kernelParams and extra are null, but the kernel takes parameters.
Both kernelParams and extra are non-null (this is the officially documented case).
The number of elements in kernelParams before the terminating nullptr value doesn't match the number of kernel parameters.
This is not an exhaustive list; misusing extra can probably cause it too.
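A host-only C sketch of how kernelParams is laid out: one entry per kernel parameter, in declaration order, each entry being the address of the argument's value (the driver copies the argument from that memory). The saxpy kernel signature, demo_deviceptr and pack_saxpy_params are hypothetical; demo_deviceptr stands in for CUdeviceptr so the snippet compiles without cuda.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CUdeviceptr so this compiles without cuda.h. */
typedef uint64_t demo_deviceptr;

/* Kernel assumed for this sketch:
 *   extern "C" __global__ void saxpy(int n, float a, const float *x, float *y);
 * It has four parameters, so kernelParams needs four entries. */
#define SAXPY_PARAM_COUNT 4

/* Store the ADDRESS of each argument, in the kernel's declaration order.
 * Returns the number of entries written. */
size_t pack_saxpy_params(void *params[SAXPY_PARAM_COUNT],
                         int *n, float *a,
                         demo_deviceptr *x, demo_deviceptr *y)
{
    params[0] = n;
    params[1] = a;
    params[2] = x;
    params[3] = y;
    return SAXPY_PARAM_COUNT;
}
```

With the real driver API, the array would then be passed as cuLaunchKernel(f, gridX, 1, 1, blockX, 1, 1, 0, stream, params, NULL), leaving extra as NULL. Passing a device pointer's value instead of its address, or an array whose entry count differs from the kernel's parameter list, is the kind of mismatch the answer describes.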

What is the difference between signed and unsigned binary?

I've been reading a few sites, but none of them make sense to me. Are signed and unsigned binary the same as signed and unsigned variables? I'd be glad if you could help :)
The "signed" indicator means that the item can hold positive or negative values. "Unsigned" values have no sign, so they are never negative. Signed/unsigned applies to numeric data types, most commonly integers of various sizes (floating-point types are always signed). An integer type can further be defined as signed or unsigned.
For example, an 8-bit signed integer can hold values from -128 to 127 in two's complement (with sign-magnitude, where 1 bit is used for the sign and 7 bits for the magnitude, the range is -127 to 127), while an 8-bit unsigned integer can hold values from 0 to 255 (no bit is reserved for a sign, so every value is non-negative).
A signed binary is a specific data type of a signed variable.
Hope that helps!
A "signed" variable holds a positive or negative value, using its most significant bit (the leftmost bit) as what we call the "sign bit". An "unsigned" variable does not; instead, its most significant bit is just the next power of two.
If the sign bit is 1, the number is negative, whereas in an unsigned number the same bit follows the regular binary place-value rules. In two's complement, the representation virtually all hardware and C compilers use, the sign bit of an 8-bit value is worth -128 instead of +128.
For example, here is the all-ones bit pattern: Unsigned Char 0b11111111 (0xFF in hex) = 255 in decimal, (128+64+32+16+8+4+2+1 = 255)
Signed Char 0b11111111 (0xFF in hex) = -1 in decimal in two's complement, (-128+64+32+16+8+4+2+1 = -1); reading the sign bit as a plain minus sign (sign-magnitude) would instead give -127, (-1 * (64+32+16+8+4+2+1))
Additionally, what you might see in code:
Unsigned Char 0b10000001 (0x81 in hex) = 129 in decimal, (128 + 1 = 129)
Signed Char 0b10000001 (0x81 in hex) = -127 in decimal in two's complement, (-128 + 1); sign-magnitude would give -1, (-1 * 1)
(Note: a char is one byte, which means it has eight binary digits that can be changed)
(For anyone who is wondering, 0b means the number is written in binary and 0x means it is written in hex)
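The different readings of the same bit pattern can be compared with a small C sketch. The helper names are hypothetical and widths from 1 to 32 bits are assumed; as_twos_complement matches how C compilers treat int8_t and other signed integer types, while as_sign_magnitude only reproduces the sign-magnitude figures.

```c
#include <stdint.h>

/* Keep only the low `width` bits (1 <= width <= 32). */
static uint32_t low_bits(uint32_t bits, unsigned width)
{
    return width >= 32 ? bits : (bits & ((1u << width) - 1u));
}

/* Unsigned: every bit, including the MSB, has its usual positive weight. */
int64_t as_unsigned(uint32_t bits, unsigned width)
{
    return (int64_t)low_bits(bits, width);
}

/* Two's complement: the MSB has weight -2^(width-1) instead of +2^(width-1). */
int64_t as_twos_complement(uint32_t bits, unsigned width)
{
    int64_t v = (int64_t)low_bits(bits, width);
    int64_t msb_weight = (int64_t)1 << (width - 1);
    return (v & msb_weight) ? v - 2 * msb_weight : v;
}

/* Sign-magnitude: the MSB is only a minus sign; the other bits are the size. */
int64_t as_sign_magnitude(uint32_t bits, unsigned width)
{
    int64_t v = (int64_t)low_bits(bits, width);
    int64_t msb_weight = (int64_t)1 << (width - 1);
    int64_t magnitude = v & (msb_weight - 1);
    return (v & msb_weight) ? -magnitude : magnitude;
}
```

For instance, as_unsigned(0xFF, 8) is 255, as_twos_complement(0xFF, 8) is -1, and as_sign_magnitude(0xFF, 8) is -127, matching the Unsigned Char and Signed Char examples above.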
For binary numbers, signed and unsigned describe how a bit pattern is converted to a value. For variables, they describe whether the variable is able to store negative values.
With a sign bit, 1 means negative and 0 means positive. So if the first bit is 1, the number is negative. In a sign-magnitude reading, that bit is then left out when converting the remaining bits from base 2 to base 10.
For example, take 1001:
Unsigned (no sign bit): 9
Signed (MSB is the sign bit): -1 in sign-magnitude, or -7 in two's complement (-8 + 1), the representation most hardware uses
MSB: Most Significant Bit
It depends on the context. For example, in MIPS assembly, suppose we want to load a byte with value 0xFF (11111111 in binary) from memory, at offset 32 from the address held in $s3.
lbu (load byte unsigned), written lbu rt, offset(rs), treats the byte as unsigned:
lbu $s0, 32($s3) : lbu loads the byte and zero-extends it to 32 bits, giving 0x000000FF, which is interpreted as 255.
lb (load byte), written lb rt, offset(rs), treats the byte as signed:
lb $s0, 32($s3) : lb loads the byte and sign-extends it (copies its top bit, here 1) to 32 bits, giving 0xFFFFFFFF, which is interpreted as -1.
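The two extensions can be mirrored in C; the helper names are hypothetical, and on MIPS the hardware does this as part of the load. Zero extension fills bits 8 to 31 with 0; sign extension fills them with copies of bit 7.

```c
#include <stdint.h>

/* What lbu does to the loaded byte: upper 24 bits become 0. */
uint32_t zero_extend8(uint8_t b)
{
    return (uint32_t)b;
}

/* What lb does to the loaded byte: upper 24 bits copy bit 7. */
uint32_t sign_extend8(uint8_t b)
{
    return (b & 0x80u) ? (0xFFFFFF00u | b) : (uint32_t)b;
}
```

With b = 0xFF, zero_extend8 gives 0x000000FF (255) and sign_extend8 gives 0xFFFFFFFF (-1 when read as a 32-bit signed value); for bytes below 0x80 the two agree.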

Can an unsigned long long int be used to store the output from clock64()?

I need to update a global array storing clock64() from different threads atomically. All of the atomic functions in CUDA support only unsigned for long long int sizes. But the return type of clock64() is signed. Is it safe to store the output from clock64() in an unsigned?
There are various atomic functions that support atomic operations on unsigned long long int (i.e. a 64-bit unsigned integer), such as atomicCAS, atomicExch and atomicAdd. And if you have a GPU of compute capability 3.5 or higher, you have even more options.
Referring to the documentation on clock64():
long long int clock64(); when executed in device code, returns the value of a per-multiprocessor counter that is incremented every clock cycle.
So, since it is a 64-bit signed quantity, it is bit-wise identical to an unsigned long long int until it becomes negative. Let's assume the counter is reset to zero either at the start of your kernel, the start of the CUDA context, or machine power-on. This counter will not become negative until around:
2^63 cycles / 1,000,000,000 cycles/s ≈ 9.2 × 10^9 s ≈ 292 years after whichever of the above events is the actual reset point.
(I'm using 1 GHz here as an estimate of the GPU core clock)
So for the first 200-300 years (after machine power-on, let's say), the clock64() function will not return a negative value. So I'd say it's pretty safe to consider it as "always" positive, and therefore always identical to unsigned long long int, meaning you can safely cast it to that, and use it in one of the atomic functions that support unsigned long long int.
On the other hand, it's probably not safe to cast it into a 32-bit unsigned (unsigned int) quantity. That arithmetic would be:
2^32 cycles / 1,000,000,000 cycles/s ≈ 4.3 seconds (after machine power-on)
So in about 4 seconds, the value returned by clock64() will exceed the largest value that a 32-bit unsigned can hold.
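The arithmetic above can be checked on the host with a small C sketch. Assumptions: the 1 GHz clock from the answer, 365.25-day years, and the hypothetical helper names seconds_until and to_ull. It also shows that converting a non-negative long long to unsigned long long keeps the value, while truncating it to 32 bits does not.

```c
#include <stdint.h>

#define ASSUMED_CLOCK_HZ 1e9                   /* the answer's 1 GHz estimate */
#define SECONDS_PER_YEAR (365.25 * 24 * 3600)  /* assumed year length */

/* Seconds for a counter starting at 0 and ticking at clock_hz to reach
 * 2^bits (valid for bits < 64). */
double seconds_until(unsigned bits, double clock_hz)
{
    return (double)((uint64_t)1 << bits) / clock_hz;
}

/* A non-negative clock64() value converts to unsigned long long unchanged;
 * signed-to-unsigned conversion is well defined in C (modulo 2^64). */
unsigned long long to_ull(long long t)
{
    return (unsigned long long)t;
}
```

seconds_until(63, ASSUMED_CLOCK_HZ) / SECONDS_PER_YEAR comes out at about 292, and seconds_until(32, ASSUMED_CLOCK_HZ) at about 4.29, while a count of 5,000,000,000 survives to_ull intact but wraps when stored in a uint32_t.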

CUDA integer of 16 bits

Is there any 16-bit variable type in CUDA? I need a 16-bit unsigned integer. I've tried:
uint16
uint16_t
But neither is recognized by nvcc.
Maybe you should try an ordinary C unsigned short?
CUDA 8 (compute capability 6.x) comes with half-precision intrinsics. You can use the 16-bit floating-point type half, or the integral vector types short2 / char4. These mixed-precision types are packed into 32-bit device registers, which can double your performance compared with just using unsigned short.
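A sketch, assuming the missing piece in the question was the header: uint16_t is declared in <stdint.h> (<cstdint> in C++), so the compiler does not know the name until that header is included. The helpers add_u16, pack_u16x2 and unpack_hi are hypothetical; the packing pair only illustrates the idea of two 16-bit values sharing one 32-bit word, not the exact layout rules of short2.

```c
#include <stdint.h>  /* declares uint16_t; without it the name is unknown */

/* 16-bit unsigned addition wraps modulo 2^16 when the result is stored back. */
uint16_t add_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

/* Two 16-bit values in one 32-bit word: lo in bits 0-15, hi in bits 16-31. */
uint32_t pack_u16x2(uint16_t lo, uint16_t hi)
{
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

/* Recover the upper 16-bit value from a packed word. */
uint16_t unpack_hi(uint32_t packed)
{
    return (uint16_t)(packed >> 16);
}
```

An unsigned short works the same way on the platforms CUDA supports, where it is also 16 bits wide.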

Funnel shift - what is it?

When reading through the CUDA 5.0 Programming Guide I stumbled on a feature called "funnel shift", which is present on devices of compute capability 3.5 but not 3.0. It is annotated "see reference manual", but when I search the manual for the term "funnel shift", I don't find anything.
I tried googling for it, but only found a mention on http://www.cudahandbook.com, in chapter 8:
8.2.3 Funnel Shift (SM 3.5)
GK110 added a 64-bit “funnel shift” instruction that may be accessed with the following intrinsics:
__funnelshift_lc(): returns most significant 32 bits of a left funnel shift.
__funnelshift_rc(): returns least significant 32 bits of a right funnel shift.
These intrinsics are implemented as inline device functions (using inline PTX assembler) in sm_35_intrinsics.h.
...but it still does not explain what the "left funnel shift" or "right funnel shift" is.
So, what is it and where does one need it?
In the case of CUDA, two 32-bit registers are concatenated together into a 64-bit value; that value is shifted left or right; and the most significant (for a left shift) or least significant (for right shift) 32 bits are returned.
The intrinsics from sm_35_intrinsics.h are as follows:
unsigned int __funnelshift_lc(unsigned int lo, unsigned int hi, unsigned int shift);
unsigned int __funnelshift_rc(unsigned int lo, unsigned int hi, unsigned int shift);
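To make the description concrete, here is a plain C model of the two intrinsics, meant to run on the host. The names model_funnelshift_lc and model_funnelshift_rc are hypothetical, and clamping the shift amount to 32 reflects my reading of the "c" suffix in NVIDIA's documentation, so treat that detail as an assumption.

```c
#include <stdint.h>

/* Left funnel shift: concatenate hi:lo, shift left, keep the upper 32 bits.
 * Shift amount clamped to 32 (assumed meaning of the "c" suffix). */
uint32_t model_funnelshift_lc(uint32_t lo, uint32_t hi, uint32_t shift)
{
    uint64_t v = ((uint64_t)hi << 32) | lo;  /* hi:lo as one 64-bit value */
    uint32_t s = shift < 32 ? shift : 32;     /* clamp */
    return (uint32_t)((v << s) >> 32);        /* most significant 32 bits */
}

/* Right funnel shift: concatenate hi:lo, shift right, keep the lower 32 bits. */
uint32_t model_funnelshift_rc(uint32_t lo, uint32_t hi, uint32_t shift)
{
    uint64_t v = ((uint64_t)hi << 32) | lo;
    uint32_t s = shift < 32 ? shift : 32;
    return (uint32_t)(v >> s);                /* least significant 32 bits */
}
```

For example, with hi = 0x01234567 and lo = 0x89ABCDEF, a left funnel shift by 8 returns 0x23456789 and a right funnel shift by 8 returns 0x6789ABCD: bits from one register "funnel" into the other instead of being lost.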
According to Andy Glew, applications for funnel shift include fast misaligned memcpy; and, as njuffa mentioned in the comments, it can be used to implement rotate if the two input words are the same.
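Both applications can be sketched in plain C with the same concatenate-and-shift step (hypothetical helper names, host code for illustration; on an sm_35 or newer device one would call the intrinsics instead). Rotation is a left funnel shift with both inputs equal. A misaligned read takes two aligned words and shifts right by 8 times the byte offset, which is what __funnelshift_rc(w0, w1, 8 * k) should return for k in 0..3, assuming little-endian byte order as on NVIDIA GPUs.

```c
#include <stdint.h>

/* Rotate left as a left funnel shift of x:x (s taken mod 32). */
uint32_t rotl32_via_funnel(uint32_t x, uint32_t s)
{
    uint64_t v = ((uint64_t)x << 32) | x;     /* hi = lo = x */
    return (uint32_t)((v << (s & 31)) >> 32); /* upper 32 bits */
}

/* Read the 32-bit little-endian value starting at byte offset k (0..3)
 * within two consecutive aligned words w0 (lower address) and w1.
 * A misaligned copy needs two aligned loads and one such shift per word. */
uint32_t extract_misaligned_le(uint32_t w0, uint32_t w1, unsigned k)
{
    uint64_t v = ((uint64_t)w1 << 32) | w0;   /* hi = w1, lo = w0 */
    return (uint32_t)(v >> (8 * k));          /* lower 32 bits after shift */
}
```

With bytes 00 11 22 33 44 55 66 77 in memory, w0 = 0x33221100 and w1 = 0x77665544, and extract_misaligned_le(w0, w1, 1) yields 0x44332211, the word that starts one byte past the aligned boundary.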