I am struggling to understand how to create a 2D array in MIPS using single-precision floating-point numbers.
I get how to create a 2D array by itself but cannot figure out how to make it hold single-precision numbers instead of just integers.
Single-precision floating-point numbers are 4 bytes long, just like integer words. An array, whether 1D, 2D, or 3D, and whether it holds characters/bytes, integers, or single-precision floats, simply uses memory: as much memory as is needed for all the elements in the array.
Memory doesn't care how you use the bytes — it is up to the program to use the array in a meaningful way. If you want to copy the array to another memory location, the equivalent of memcpy or memmove is appropriate. If you want to do floating point arithmetic on elements of the array, then you would use floating point instructions.
Basically, you don't have to do anything except reserve some memory to have a single-precision floating-point array. You might want to initialize the array, though if it is declared as a global array (e.g. using .space 400), all the bytes will be zero, which corresponds to the integer 0 as well as the floating-point value 0.0. Just as with an integer array, you want the array to start on a 4-byte boundary; .space does not guarantee that, but putting .align 2 in front of it will.
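To make the indexing concrete, here is a minimal sketch in C rather than MIPS assembly. The names ROWS, COLS, grid, elem_offset, grid_set, grid_get and is_all_zero_bits are made up for this illustration, and it assumes a 4-byte float, as on MIPS. It shows that a 2D float array is just ROWS * COLS * 4 bytes indexed in row-major order, and that zero-filled storage reads back as 0.0f.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dimensions: 10 x 10 floats = 400 bytes, like `.space 400`. */
enum { ROWS = 10, COLS = 10 };

/* Static storage is zero-filled, much like a global .space block. */
static float grid[ROWS * COLS];

/* Byte offset of element [row][col] in row-major order.
   In MIPS this is what you would add to the array's base address
   before a floating-point load or store (l.s / s.s). */
size_t elem_offset(size_t row, size_t col)
{
    assert(row < ROWS && col < COLS);
    return (row * COLS + col) * sizeof(float);
}

/* Store and load through the flat array using the same index math. */
void grid_set(size_t row, size_t col, float v) { grid[row * COLS + col] = v; }
float grid_get(size_t row, size_t col)         { return grid[row * COLS + col]; }

/* Returns 1 if the float's bit pattern consists only of zero bytes. */
int is_all_zero_bits(float v)
{
    unsigned char zero[sizeof(float)] = {0};
    return memcmp(&v, zero, sizeof v) == 0;
}
```

For example, elem_offset(2, 3) is (2 * 10 + 3) * 4 = 92 bytes from the start of the array. The only difference from an integer array is which load, store, and arithmetic instructions you apply to the element once you have its address.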
If I have an encoder with 8 data inputs, what is its maximum number of outputs?
I know that an encoder is a combinational circuit that performs the reverse operation of a decoder. It has a maximum of 2^n input lines and ‘n’ output lines, hence it encodes the information from 2^n inputs into an n-bit code. Since I have 8 data input, the output will be 3, since 2^3 = 8. Is that the correct assumption?
Let's try to tease apart the concepts of one hot (decoded) lines and an encoding using a number of bits. Both these concepts are a way to represent information, but their form and typical usage is different.
One hot is a technique wherein at most one line is 1/true and all the other lines are 0/false. These one hot lines are not considered digits in a number, but rather individual signals or conditions (only one of which can be true at any given time). This form is particularly useful in certain circuits, as each of the one hot lines can activate some other hardware. (A hardware lookup table (LUT), a RAM or a ROM may use one-hot within its internal array indexing.)
Encoding is a technique where we use N lines as digits in an N-bit number, as would be found in a CPU register holding a number, or as we might write normal binary numbers in text. By contrast, in this form any of the N bits can be 1 (or 0).
Simple encoders & decoders translate between encoded form (N-bit numbers) and one hot form (2^N lines).
... encoder ... has a maximum of 2^n input lines and ‘n’ output lines
In your statement, the 2^n input lines are in one hot form, while the output lines are normal numbers in binary (i.e. encoded).
Both the inputs (2^n lines) and the outputs (n lines) are capable of representing exactly 2^n different values! As a result, decode/encode is a 1:1 mapping, back & forth. (It would be an error to have multiple hots on the input side of such an encoder, and bad things would happen in a system that allowed that.)
In the formulas you're speaking to, 2^N = V and N = log2(V), N stands for the number of bits (a bit is a binary digit), and V stands for the number of values that can be represented in N bits.
(The 2s in these formulas are for binary; substitute 10 for 2 to get the same relationships between a number of decimal digits and the number of values those digits can represent/store/communicate.)
In one hot form we need V number of lines, whereas in encoded form we need N lines (as bits/digits) to represent the same information (one of V different values).
Consider whether a number you're looking at is a digit count (as with N) or a value count (as with V).
And bear in mind that in one hot form, we need one line for each possible value, V (whereas in encoded form we need N bits for V possible values).
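The relationship between the two forms can be sketched in C for the 8-line case from the question. The function names encode8to3, decode3to8 and bits_needed are hypothetical, and the encoder assumes a valid one-hot input (exactly one bit set), which is the precondition discussed above.

```c
#include <assert.h>
#include <stdint.h>

/* Encoder: 8 one-hot input lines -> 3-bit binary code.
   Precondition: exactly one input bit is set. */
unsigned encode8to3(uint8_t one_hot)
{
    assert(one_hot != 0 && (one_hot & (one_hot - 1)) == 0);
    unsigned code = 0;
    while ((one_hot >>= 1) != 0)   /* count how far the hot bit is from bit 0 */
        code++;
    return code;
}

/* Decoder: 3-bit binary code -> 8 one-hot output lines. */
uint8_t decode3to8(unsigned code)
{
    assert(code < 8);
    return (uint8_t)(1u << code);
}

/* Smallest N such that 2^N >= values, i.e. N = ceil(log2(V)).
   Intended for small value counts (values <= 2^31). */
unsigned bits_needed(unsigned values)
{
    unsigned n = 0;
    while ((1u << n) < values)
        n++;
    return n;
}
```

With line 5 hot (input 0x20), encode8to3 returns 5, and decode3to8(5) gives back 0x20. bits_needed(8) is 3, which matches the 8-input, 3-output answer.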
A MIPS processor will feed the 6 bit opcode field into a lookup table of some sort, in order to determine which set of control signals to activate for any given instruction. (The opcode field is not one hot, but rather a bit field of N=6 bits).
These control signals are (also) not one hot, and the MIPS instruction decoder is not a simple decoder but rather a mapper from encoded opcode values to a set of control signals; this mapping is accomplished by lookup in a table.
The control signals are individual boolean values rather than a one-hot set or an encoded number. One hot may be used internally to index this mapping. The mapping is basically an array lookup where the index is the opcode and each array element holds all the individual control signal values appropriate to its index.
(R-Type instructions all share a common opcode value, so when the R-Type opcode value is present, then additional lookup/mapping is done on the func bit field to generate the proper control signals.)
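As a rough software model of that array lookup, the sketch below indexes a 64-entry table with the 6-bit opcode. The struct Control, control_table and decode_opcode are hypothetical names; the signal names and values follow the common textbook single-cycle datapath, which is an assumption rather than something stated above. Only four opcodes are filled in, and the secondary lookup on the func field for R-Type instructions is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One table entry: a bundle of independent boolean control signals. */
typedef struct {
    bool reg_dst, alu_src, mem_to_reg, reg_write;
    bool mem_read, mem_write, branch;
    bool valid;   /* false for opcodes this sketch does not model */
} Control;

/* 2^6 = 64 possible opcode values; unlisted entries are all false. */
static const Control control_table[64] = {
    [0x00] = { .reg_dst = true, .reg_write = true, .valid = true },   /* R-Type */
    [0x23] = { .alu_src = true, .mem_to_reg = true, .reg_write = true,
               .mem_read = true, .valid = true },                     /* lw  */
    [0x2B] = { .alu_src = true, .mem_write = true, .valid = true },   /* sw  */
    [0x04] = { .branch = true, .valid = true },                       /* beq */
};

/* The opcode is the top 6 bits of the 32-bit instruction word. */
Control decode_opcode(uint32_t instruction)
{
    uint32_t opcode = instruction >> 26;
    assert(opcode < 64);
    return control_table[opcode];
}
```

For instance, the word 0x8FA80000 (lw $t0, 0($sp)) has opcode 0x23, so the lookup returns an entry with mem_read and reg_write set and mem_write clear.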
How to allocate pinned memory to a 2-dimensional array using cudaMallocHost()?
Looking forward to any help!
(Host) memory is one-dimensional. Just as you allocate n * m * sizeof(T) bytes for a two-dimensional, n-by-m array of type-T elements with malloc() (or new[], or std::make_unique()), you do the same with cudaMallocHost().
Now, it's true that the above is not the only way to model a two-dimensional array. As explained in the C FAQ, question 6.16, we may sometimes use an array-of-pointers, each of which points to a 1-D array of the minor dimension. This too can be done using cudaMallocHost() - again, by simply substituting it for malloc(). However, note this indirection has a performance penalty.
If you want array rows to be nicely-aligned, you might want to pad each row with some unused elements; but that's again true both for regular host-side memory allocation and for cudaMallocHost().
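A minimal host-side sketch of the flat, row-padded layout follows. It uses malloc() so that it compiles without the CUDA toolkit; for pinned memory you would swap in cudaMallocHost() and cudaFreeHost() as noted in the comment. The helper names padded_cols, alloc_matrix and elem are hypothetical, and the padding granularity (a multiple of `align` elements) is just one possible choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Round a row length up to the next multiple of `align` elements. */
size_t padded_cols(size_t cols, size_t align)
{
    assert(align > 0);
    return (cols + align - 1) / align * align;
}

/* Allocate a rows x cols float matrix as one flat block with padded rows.
   *pitch receives the padded row length, in elements.
   With CUDA, the allocation would instead be something like:
       cudaMallocHost((void **)&p, rows * *pitch * sizeof(float));
   and the matching release would be cudaFreeHost(p). */
float *alloc_matrix(size_t rows, size_t cols, size_t align, size_t *pitch)
{
    *pitch = padded_cols(cols, align);
    return malloc(rows * *pitch * sizeof(float));
}

/* Address of element [row][col]; note the stride is pitch, not cols. */
float *elem(float *base, size_t pitch, size_t row, size_t col)
{
    return &base[row * pitch + col];
}
```

With 5 columns padded to a multiple of 8, each row occupies 8 elements, so element [2][4] lives at flat index 2 * 8 + 4 = 20. There is a single allocation and no per-row pointer to chase, which avoids the indirection penalty of the array-of-pointers layout.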
Is it possible to assign value to a texture memory, for a non-integer co-ordinate?
i.e. assume we have a 1 Dimensional texture memory array. I understand we can allocate array elements at integer co-ordinates. We can then READ values at fractional co-ordinates, using linear interpolation.
My question is: does CUDA allow the programmer to WRITE values to fractional co-ordinates?
Thanks.
It is not possible to write to fractional coordinates. There would be nowhere for the hardware to store the new values. Even though you can read with linear interpolation, the values between which interpolation is being performed can only be stored at integer locations in memory.
One way to implement this might be to write a kernel that reads your initial array of values and creates a higher resolution array with interpolated values. Then, you write your new values in this new array at the integer locations that are closest to the ones you actually want to write to.
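The idea can be sketched on the host in plain C; in practice the loop body would live in a kernel. The function names upsample_linear and nearest_fine_index, and the choice of an integer upsampling factor, are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Build a higher-resolution array by linear interpolation.
   `fine` must hold (n - 1) * factor + 1 elements. */
void upsample_linear(const float *coarse, size_t n, size_t factor, float *fine)
{
    assert(n >= 1 && factor >= 1);
    for (size_t i = 0; i + 1 < n; i++) {
        for (size_t k = 0; k < factor; k++) {
            float t = (float)k / (float)factor;   /* position between i and i+1 */
            fine[i * factor + k] = (1.0f - t) * coarse[i] + t * coarse[i + 1];
        }
    }
    fine[(n - 1) * factor] = coarse[n - 1];       /* last sample copied as-is */
}

/* Integer index in the fine array closest to a fractional coordinate
   expressed in coarse-array units. Assumes x >= 0. */
size_t nearest_fine_index(float x, size_t factor)
{
    return (size_t)(x * (float)factor + 0.5f);
}
```

With coarse values {0, 4, 8} and a factor of 4, the fine array is 0, 1, 2, ..., 8, and a write aimed at coordinate 1.3 lands on fine index 5. The finer the grid, the smaller the rounding error in the write position, at the cost of more memory.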
The discussion is restricted to compute capability 2.x
Question 1
The size of a curandState is 48 bytes (measured by sizeof()). When an array of curandStates is allocated, is each element somehow padded (for example, to 64 bytes)? Or are they just placed contiguously in the memory?
Question 2
The OP of Passing structs to CUDA kernels states that "the align part was unnecessary". But without alignment, access to that structure will be divided into two consecutive accesses to a and b. Right?
Question 3
struct Position
{
    double x, y, z;
};
Suppose each thread is accessing the structure above:
int globalThreadID=blockIdx.x*blockDim.x+threadIdx.x;
Position positionRegister=positionGlobal[globalThreadID];
To optimize memory access, should I simply use three separate double variables x, y, z to replace the structure?
Thanks for your time!
(1) They are placed contiguously in memory.
(2) If the array is in global memory, each memory transaction is 128 bytes, aligned to 128 bytes. You get two transactions only if a and b happen to span a 128-byte boundary.
(3) Performance can often be improved by using a structure of arrays instead of an array of structures. This just means that you pack all your x values together in one array, then all the y values in another, and so on. This makes sense when you look at what happens when all 32 threads in a warp get to the point where, for instance, x is needed. By having all the values packed together, all the threads in the warp can be serviced with as few transactions as possible. Since a global memory transaction is 128 bytes, a single transaction can service all the threads if the value is a 32-bit word. The code example you gave might cause the compiler to keep the values in registers until they are needed.
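The effect of the layout on transaction count can be estimated with a small host-side C model. This is a simplification that only counts how many distinct 128-byte segments one warp touches when every thread reads one field; it ignores caching and anything else the hardware does. The names PositionAoS, PositionsSoA and segments_touched are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { WARP_SIZE = 32, SEGMENT_BYTES = 128 };

/* Array of structures: thread i's x sits 24 bytes after thread i-1's x. */
struct PositionAoS { double x, y, z; };

/* Structure of arrays: all x values are contiguous, 8 bytes apart. */
struct PositionsSoA { double *x, *y, *z; };

/* Number of distinct 128-byte segments touched when thread i of a warp
   reads elem_bytes bytes at byte address base + i * stride_bytes.
   Addresses only increase with i, so counting segment changes is enough. */
unsigned segments_touched(size_t base, size_t stride_bytes, size_t elem_bytes)
{
    unsigned count = 0;
    size_t last_seg = (size_t)-1;   /* sentinel: no segment seen yet */
    for (size_t i = 0; i < WARP_SIZE; i++) {
        size_t addr  = base + i * stride_bytes;
        size_t first = addr / SEGMENT_BYTES;
        size_t last  = (addr + elem_bytes - 1) / SEGMENT_BYTES;
        for (size_t s = first; s <= last; s++) {
            if (s != last_seg) {
                count++;
                last_seg = s;
            }
        }
    }
    return count;
}
```

Under this model, reading x from the array of structures (stride 24, size 8) touches 6 segments per warp, while the structure-of-arrays layout (stride 8, size 8) touches 2. A 32-bit value with stride 4 fits in a single segment, which is the one-transaction case described above.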
Just like the topic says. Can one access a CUDA texture using integer coordinates?
ex.
tex2D(myTex, 1, 1);
I'd like to store float values in texture, and use it as my framebuffer.
I will pass it to OpenGL then to render on a screen.
Is this addressing possible? I don't want to interpolate between pixels. I want value from exactly specified point.
Note: there isn't really interpolation going on when you use the 0.5 offset notation for multi-dimensional textures (the actual pixel values start at (0.5, 0.5)). If you're really worried, set round-to-nearest point rather than default of bilinear.
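To see why the 0.5 offset returns the stored value unchanged, here is a simplified 1D model in C of the linear-filtering formula, tex(x) = (1 - a) * T[i] + a * T[i+1] with i = floor(x - 0.5) and a = frac(x - 0.5). The function names are hypothetical, and the model assumes unnormalized coordinates and clamp addressing and ignores the reduced precision the hardware uses for the weight a. The 2D case applies the same rule in each dimension.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp addressing: out-of-range indices read the nearest edge texel. */
static float texel_clamped(const float *t, long n, long i)
{
    if (i < 0) i = 0;
    if (i > n - 1) i = n - 1;
    return t[i];
}

/* Floor that is also correct for negative inputs, without needing libm. */
static long floor_to_long(float v)
{
    long i = (long)v;          /* truncates toward zero */
    if ((float)i > v) i--;     /* adjust for negative non-integers */
    return i;
}

/* Linear filtering at unnormalized coordinate x. */
float sample_linear_1d(const float *t, long n, float x)
{
    float xb = x - 0.5f;
    long i = floor_to_long(xb);
    float alpha = xb - (float)i;   /* weight of the right-hand texel */
    return (1.0f - alpha) * texel_clamped(t, n, i)
         + alpha * texel_clamped(t, n, i + 1);
}

/* Point (nearest) sampling at unnormalized coordinate x. */
float sample_point_1d(const float *t, long n, float x)
{
    return texel_clamped(t, n, floor_to_long(x));
}
```

For texels {10, 20, 40, 80}, sampling at 1.5 gives a weight of 0 and returns exactly 20 in both modes, while linear sampling at 2.0 blends texels 1 and 2 equally and returns 30.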
If you use 1D textures instead (when the underlying data is 2D), you may lose performance due to lack of data locality in the other dimension.
If you want to use the texture cache without using any of the texture-specific operations such as interpolation, you can use tex1Dfetch(). This lets you index with integers.
The size limit is 2^27 elements, so you will be able to access 512 MB with floats, or 1 GB with int2 [which can also be used to retrieve doubles via __hiloint2double()]. Larger data can be accessed by mapping multiple textures on top of it that cover the data.
You will have to map any multi-dimensional array accesses to the one-dimensional array supported by tex1Dfetch(). I have always used simple C macros for that.
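A macro of that kind might look like the following sketch. IDX2, TEX1D_MAX_ELEMS and max_bytes are hypothetical names, not part of CUDA, and the row-major layout is an assumption. The helper also reproduces the size arithmetic above for the 2^27-element limit.

```c
#include <assert.h>
#include <stddef.h>

/* Map a 2D (row, col) index to the flat index of a row-major array
   that is `width` elements wide. In a kernel the result would be the
   integer index handed to tex1Dfetch(), e.g. IDX2(y, x, width). */
#define IDX2(row, col, width) \
    ((size_t)(row) * (size_t)(width) + (size_t)(col))

/* Element limit quoted above for a texture read with tex1Dfetch(). */
#define TEX1D_MAX_ELEMS ((size_t)1 << 27)

/* Bytes addressable through one such texture for a given element size. */
size_t max_bytes(size_t elem_bytes)
{
    return TEX1D_MAX_ELEMS * elem_bytes;
}
```

For a 10-element-wide array, IDX2(2, 3, 10) is 23. max_bytes(4) is 2^29 bytes (512 MB) for float, and max_bytes(8) is 2^30 bytes (1 GB) for int2.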