How would one go about implementing an add immediate in Verilog for an ALU? - mips

I'm working with a 32-bit ALU for a MIPS processor.
I've read Pong Chu's book on Verilog and other texts, but I haven't come across a concrete answer as to how exactly I would implement an add immediate in Verilog.
For example, with the asm code:
addi Y, A, immediate
add is as simple as y = a + b,
but how do I interpret an immediate operand?

In overview, you can implement different operand capability for a function such as this in the following way:
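Implement an add function where the operands are fed via a multiplexer. The multiplexer will have a few inputs, one of which will be an immediate value from your instruction word. For addi, that immediate is the 16-bit field in bits 15:0 of the instruction, and it must be sign-extended to 32 bits before it reaches the multiplexer. Use the opcode part of your instruction word to select which multiplexer input to use for the addition.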
Other inputs to the multiplexer might be the output of a 'registers' memory, a forwarding path from somewhere else in your processor, etc.
I have not provided any code, because it would depend entirely on the structure you already have. Hopefully this overview is enough to put you on the right track.
The Wikipedia page on the MIPS architecture has a diagram showing multiplexers used in this way.
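Here is a rough behavioral sketch of that operand path. It is written in C++ rather than Verilog so it can be run and tested on its own, and it reduces to a sign-extender, a 2-input multiplexer and an unchanged adder. The names sign_extend16, alu_src_for, operand_b_mux and alu_add, along with the alu_src select signal, are hypothetical. The field positions follow the standard MIPS I-type layout, and only addi (opcode 0x08) is decoded.

```cpp
#include <cassert>
#include <cstdint>

// Behavioral model of the add/addi operand path (a sketch, not Verilog).
// Field positions assume the standard MIPS I-type layout:
// opcode[31:26] rs[25:21] rt[20:16] imm[15:0].

// Sign-extend the 16-bit immediate field to 32 bits, as addi requires.
inline uint32_t sign_extend16(uint32_t instr) {
    int16_t imm = static_cast<int16_t>(instr & 0xFFFFu);
    return static_cast<uint32_t>(static_cast<int32_t>(imm));
}

// Decoder stub: only opcode 0x08 (addi) selects the immediate input.
inline bool alu_src_for(uint32_t instr) {
    return (instr >> 26) == 0x08u;
}

// Operand-B multiplexer: register file read port 2, or the extended immediate.
inline uint32_t operand_b_mux(bool alu_src, uint32_t reg_b, uint32_t instr) {
    return alu_src ? sign_extend16(instr) : reg_b;
}

// The adder is the same for add and addi: y = a + b (wraps modulo 2^32).
inline uint32_t alu_add(uint32_t a, uint32_t b) {
    return a + b;
}
```

In Verilog, the same multiplexer is usually a one-line continuous assignment such as `assign b = alu_src ? {{16{instr[15]}}, instr[15:0]} : rd2;`, where rd2 is an assumed name for the register file's second read port.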

Related

Extending MIPS one-cycle data path to implement movcn

I have trouble implementing the movcn instruction in MIPS. (MIPS One-Cycle Datapath)
Here is how the instruction is defined:
R[rd] = R[rs] if R[rt] < 0
I am not sure what to use to compare if R[rt] < 0. Should I add a comparator in the path?
I think we're in the same UdeM class! Movcn isn't native to MIPS.
You already have a comparator in the datapath: the ALU. Consider that the read data 2 output from the register file (RD2) should be changed to zero before it is fed into the ALU if a control signal indicates that the instruction is movcn.
I'm not gonna say anything else, but hopefully this helps you out enough to set you on the right track. Good luck with the homework, and godspeed.
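The sketch below is one way to read that hint, written as a C++ behavioral model rather than hardware. When a hypothetical is_movcn control signal is asserted, RD2 is replaced by zero, so an ordinary ALU add passes R[rs] through unchanged, and RegWrite is enabled only when R[rt] is negative. Testing R[rt] < 0 only needs the sign bit, which gives the same result as an slt against zero. RegFile and exec_rtype are illustrative names, not part of any standard datapath.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical register file: 32 general-purpose registers, $zero at index 0.
struct RegFile {
    std::array<uint32_t, 32> r{};
};

// One R-type step of a single-cycle model: add when is_movcn is false,
// movcn (R[rd] = R[rs] if R[rt] < 0) when is_movcn is true.
inline void exec_rtype(RegFile& rf, unsigned rs, unsigned rt, unsigned rd, bool is_movcn) {
    assert(rs < 32 && rt < 32 && rd < 32);
    const uint32_t rd1 = rf.r[rs];                 // read data 1 = R[rs]
    const uint32_t rd2 = rf.r[rt];                 // read data 2 = R[rt]
    const uint32_t alu_b = is_movcn ? 0u : rd2;    // movcn forces RD2 to zero
    const uint32_t alu_out = rd1 + alu_b;          // plain ALU add: R[rs] + 0 for movcn
    const bool rt_negative = (rd2 >> 31) != 0;     // R[rt] < 0 <=> sign bit set
    const bool reg_write = is_movcn ? rt_negative : true;
    if (reg_write && rd != 0) {                    // writes to $zero are discarded
        rf.r[rd] = alu_out;
    }
}
```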

How to transfer a part of a device array to another device array using PGI cuda fortran compiler?

Now I want to transfer a piece of a device array to another device array using following code:
program main
implicit none
integer :: a(5,5,5,5)
integer, device :: a_d(5,5,5,5),b_d(5,5,5,5)
a=0
a_d=a
b_d(1:2,:,:,:)=a_d(2:3,:,:,:)
end program
The pgi compiler returns following error for b_d(1:2,:,:,:)=a_d(2:3,:,:,:):
PGF90-S-0519-More than one device-resident object in assignment.
How to solve this problem or, is there an efficient way to transfer only a piece of a device array to another device array?
The documentation says the following:
3.4.2. Implicit Data Transfer in Expressions
Some limited data transfer can be enclosed within expressions. In
general, the rule of thumb is all arithmetic or operations must occur
on the host, which normally only allows one device array to appear on
the right-hand-side of an expression.
I'm pretty sure that means a slicing assignment of the kind you are trying to do is not supported, because you have a device array on the left-hand side of the expression as well as on the right.
I am not sure what the best way to solve this is. cudaMemcpy2D will definitely work if you can deduce the pitch of the arrays. Hopefully someone from PGI can edit a solution into this community wiki entry.
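Here is the pitch arithmetic worked through. Fortran arrays are column-major, so in a(5,5,5,5) the first index is contiguous. The slice assignment is therefore 5*5*5 = 125 rows of 2 ints, and each row starts 5 ints after the previous one in both arrays. The host-only function memcpy2d_host below is a hypothetical stand-in that mimics the parameter meaning of the CUDA runtime's cudaMemcpy2D (dst, dpitch, src, spitch, width, height). It works on ordinary memory, so the offsets can be checked without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Host-side stand-in with cudaMemcpy2D-style parameters: copy `height` rows
// of `width` bytes, where consecutive rows start `dpitch`/`spitch` bytes apart.
inline void memcpy2d_host(void* dst, std::size_t dpitch, const void* src,
                          std::size_t spitch, std::size_t width, std::size_t height) {
    assert(width <= dpitch && width <= spitch);
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(d + row * dpitch, s + row * spitch, width);
    }
}

// b(1:2,:,:,:) = a(2:3,:,:,:) for 5x5x5x5 column-major int arrays.
inline void copy_slice(int* b, const int* a) {
    const std::size_t n = 5;
    memcpy2d_host(b, n * sizeof(int),       // destination and its pitch
                  a + 1, n * sizeof(int),   // source starts at a(2,1,1,1)
                  2 * sizeof(int),          // width: 2 elements per row
                  n * n * n);               // height: 125 rows
}
```

With the C runtime API, the device-to-device equivalent would be `cudaMemcpy2D(b_d, 5*sizeof(int), a_d + 1, 5*sizeof(int), 2*sizeof(int), 125, cudaMemcpyDeviceToDevice)`. The CUDA Fortran interface may take its arguments in different units, so check the PGI CUDA Fortran reference before translating it.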

How is 'pass by reference' implemented without actually passing an address to a function? [closed]

I am well aware of the fact that in C and C++ everything is passed by value (even if that value is of reference type). I think (but I'm no expert there) the same is true for Java.
So (and that's why I include language-agnostic as a tag): in what language can I pass anything to a function without passing some value?
And if that exists, what does the mechanism look like? I thought hard about that, and I fail to come up with any mechanism that does not involve the passing of a value.
Even if the compiler optimizes in a way that I don't have a pointer/reference as a true variable in memory, it still has to calculate an address as an offset from the stack (frame) pointer - and pass that.
Anybody who could enlighten me?
From C perspective:
There are no references as a language level concept. Objects are referred to by pointing at them with pointers.
The value of a pointer is the address of the pointed object. Pointers are passed by value just like any other arguments. A pointed object is conceptually passed by reference.
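A small sketch of that point follows. It is C++ to match the other examples, but only the C-style pointer parameter matters, and set_through_pointer is a made-up name. Writing through the pointer changes the caller's object. Re-aiming the pointer parameter changes only the function's own copy.

```cpp
#include <cassert>

// The pointer parameter p is a copy of the caller's pointer.
inline void set_through_pointer(int* p, int value) {
    assert(p != nullptr);
    *p = value;    // changes the object the caller's pointer points at
    p = nullptr;   // changes only this function's copy of the pointer
    (void)p;       // silence "assigned but never used" warnings
}
```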
At least from C++ perspective:
How is 'pass by reference' implemented [...] ?
Typically, by copying the address of the object.
... without actually passing an address to a function?
If a function invocation is expanded inline, there is no need to copy the address anywhere. Same applies to pointers too, because copies may be elided due to the as-if rule.
in what language can I pass anything to a function without passing some value?
Such a language would need a significantly different concept of a function than C. There would have to be no stack frame push.
Function-like C pre-processor macros, as their name implies, are similar to functions, but their arguments are not passed around at runtime, because pre-processing happens before compilation.
On the other hand, you can have global variables. If you change the global state of the program, and call a function with no arguments, you have conceptually "passed the new global state to the function" without having passed any value.
At a machine-code level, "pass X by reference" is essentially "pass the address of X by value".
Pointers are values. Values have a unique identity and require storage.
References are not values. References have no identity. If we have:
int x=0;
int& y=x;
int& z=x;
both y and z are references to x, and they have no independent identity.
In comparison:
int x=0;
int* py=&x;
int* pz=&x;
both py and pz are pointers to x, and they have independent identity. You could modify py and not pz, take their size with sizeof, or memset them.
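The identity difference can be checked directly. In this sketch, identity_demo and IdentityReport are illustrative names. Taking the address of a reference yields the address of x itself, while the two pointers are separate objects at different addresses. Applying sizeof to a reference type gives the size of the referenced type, not the size of some hidden pointer.

```cpp
#include <cassert>

// sizeof on a reference type yields the size of the referred-to type.
static_assert(sizeof(int&) == sizeof(int), "a reference has no size of its own");

struct IdentityReport {
    bool refs_alias_x;       // &y == &x and &z == &x
    bool pointers_distinct;  // &py != &pz
    bool pz_unaffected;      // re-aiming py leaves pz pointing at x
};

inline IdentityReport identity_demo() {
    int x = 0;
    int& y = x;              // y and z are just other names for x
    int& z = x;
    int* py = &x;            // py and pz are objects that hold &x
    int* pz = &x;
    IdentityReport r{};
    r.refs_alias_x = (&y == &x) && (&z == &x);
    r.pointers_distinct = (&py != &pz);
    py = nullptr;            // legal for a pointer; a reference cannot be re-seated
    r.pz_unaffected = (pz == &x) && (py == nullptr);
    return r;
}
```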
In some circumstances, at the machine-code level, references are implemented the same way as pointers, except that certain operations (like re-aiming them) are never performed on them.
But C++ is not defined in terms of machine code. It is defined in terms of the behaviour of an abstract machine. Compilers compile your code to operations on this abstract machine, which (per the standard) has no fixed calling convention, no layout for references, no stack, no heap, etc. They then perform arbitrary transformations that do not change the as-if behaviour (a common one is conversion to single-assignment form), rearrange things, and at some point emit assembly/machine code that produces equivalent behaviour on the actual hardware you are running on.
Now the near-universal way to compile C++ is the compilation unit/linker model, where functions are exported as symbols and a fixed ABI calling convention is provided for other compilation units to use them. Then, at the link stage, the compilation units are connected together.
In those ABIs, references are passed as pointers.
How is 'pass by reference' implemented without actually passing an address to a function?
Within the context of the C languages, the short answers are:
In C, it is not.
In C++, a type followed by an ampersand (&) is a reference type. For instance, int& is a reference to an int. When passing an argument to a function that takes reference type, the object is truly passed by reference. (More on this in the scholarly link below.)
But in truth, most of the confusion is semantics. Some of the confusion could be helped by:
1) Stop using the word emulated to describe passing an address.
2) Stop using the word reference to describe address
Or
3) Recognize that within the context of the C/C++ languages, in the phrase pass-by-reference, the word reference is defined as: value of address.
Beyond this, there are many examples of illusions and concepts created to convey impossible ideas. The concept of non-emulated pass-by-reference is arguably one of them, no matter how many scholarly papers or practical discussions argue otherwise.
This one (scholarly paper category) is yet another that presents a distinction between emulated and actual pass-by-reference in a discussion using both C and C++, but whose conclusions stick closely to reality. The following is an excerpt:
...Somehow, it is only a matter of how the concept of “passing by reference” is actually realized by a programming language: C implements this by using pointers and passing them by value to functions whereas C++ provides two implementations. From a side, it reuses the same mechanism derived from C (i.e., pointers + pass by value). On the other hand, C++ also provides a native “pass by reference” solution which makes use of the idea of reference types. Thus, even in C++ if you are passing a pointer à la C, you are not truly passing by reference, you are passing a pointer by value (that is, of course, unless you are passing a reference to a pointer! e.g., int*&).
Because of this potential ambiguity in the term “pass by reference”, perhaps it’s best to only use it in the context of C++ when you are using a reference type.
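The int*& case mentioned in the excerpt is easy to demonstrate. In this sketch, g_target, reaim_copy and reaim_ref are hypothetical names. A function that receives a pointer by value can re-aim it only locally, whereas a reference to a pointer lets the function re-aim the caller's pointer.

```cpp
#include <cassert>

int g_target = 99;   // hypothetical object to re-aim pointers at

// Pointer passed by value: the caller's pointer is unaffected.
inline void reaim_copy(int* p) {
    p = &g_target;   // changes the local copy only
    (void)p;
}

// Reference to a pointer: the caller's pointer itself is re-aimed.
inline void reaim_ref(int*& p) {
    p = &g_target;
}
```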
But as you and others have already noted, in the concept of passing anything via an argument, whether by value or by reference, that something must by definition have a value.
What is meant by pass by value is that (a copy of) the object itself is passed.
In pass by pointer, we pass the value of the pointer to the object.
In pass by reference, we pass a reference (basically a pointer that we know points to an object) in the same way.
So yes, we always pass a value, but the question is: what is that value? It is not always the object itself. When we say we pass a variable by value, by pointer or by reference, we are describing how the object we want to pass is conveyed, not the value that is actually passed.

CUDA tridiagonal solver function (cusparse)

In my CUDA code I am using cusparse<t>gtsv() function (more precisely, cusparseZgtsv and cusparseZgtsvStridedBatch ones).
The documentation says that this function solves the equation A*x = alpha*B. My question is: what is alpha? I didn't find it among the input parameters, so I have no idea how to specify it. Is it always equal to 1?
I performed some testing (solved some random systems of equations where tridiagonal matrices were always diagonally dominant and checked my solution using direct matrix by vector multiplication).
It looks like in the current version alpha = 1 always, so one can just ignore it. I suspect that it will be added as an input parameter in future releases.
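A CPU reference solver is handy for repeating that kind of check. This sketch is not cuSPARSE code: it uses the Thomas algorithm on real doubles rather than the complex values cusparseZgtsv works with, and thomas_solve and max_residual are illustrative names. The arrays follow a gtsv-style layout: sub-diagonal dl with dl[0] unused, main diagonal d, and super-diagonal du with du[n-1] unused. It does no pivoting, so it assumes a well-behaved (e.g. diagonally dominant) matrix, and it solves A*x = B, i.e. alpha = 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Thomas algorithm for A*x = B with tridiagonal A (no pivoting).
std::vector<double> thomas_solve(const std::vector<double>& dl,
                                 const std::vector<double>& d,
                                 const std::vector<double>& du,
                                 std::vector<double> b) {
    const std::size_t n = d.size();
    assert(n > 0 && dl.size() == n && du.size() == n && b.size() == n);
    std::vector<double> c(n, 0.0);            // modified super-diagonal
    c[0] = du[0] / d[0];
    b[0] /= d[0];
    for (std::size_t i = 1; i < n; ++i) {     // forward elimination
        const double m = d[i] - dl[i] * c[i - 1];
        c[i] = (i + 1 < n) ? du[i] / m : 0.0;
        b[i] = (b[i] - dl[i] * b[i - 1]) / m;
    }
    for (std::size_t i = n - 1; i-- > 0;) {   // back substitution
        b[i] -= c[i] * b[i + 1];
    }
    return b;                                  // b now holds x
}

// Largest |(A*x)_i - B_i|: a direct matrix-by-vector check of a solution.
double max_residual(const std::vector<double>& dl, const std::vector<double>& d,
                    const std::vector<double>& du, const std::vector<double>& x,
                    const std::vector<double>& b) {
    const std::size_t n = d.size();
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double ax = d[i] * x[i];
        if (i > 0) ax += dl[i] * x[i - 1];
        if (i + 1 < n) ax += du[i] * x[i + 1];
        worst = std::max(worst, std::fabs(ax - b[i]));
    }
    return worst;
}
```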

How do interpreters load their values?

I mean, interpreters work on a list of instructions, which seem to be more or less sequences of bytes, usually stored as integers. Opcodes are extracted from these integers with bitwise operations and used in a big switch statement that contains all the operations.
My specific question is: How do the object values get stored/retrieved?
For example, let's (non-realistically) assume:
Our instructions are unsigned 32 bit integers.
We've reserved the first 4 bits of the integer for opcodes.
If I wanted to store data in the same integer as my opcode, I'm limited to a 28-bit integer. If I wanted to store it in the next instruction, I'm limited to a 32-bit value.
Values like Strings require lots more storage than this. How do most interpreters get away with this in an efficient manner?
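Here is the bit budget made concrete. With 4 opcode bits in a 32-bit word, 28 bits remain for an inline operand. The sketch below uses illustrative names (encode, opcode_of, operand_of) to show the usual shift-and-mask decoding. A 28-bit field is far too small to hold a string, but it is easily large enough to hold an index or address that refers to one.

```cpp
#include <cassert>
#include <cstdint>

// Toy encoding from the question: top 4 bits opcode, low 28 bits operand.
constexpr unsigned kOpcodeBits  = 4;
constexpr unsigned kOperandBits = 32 - kOpcodeBits;           // 28
constexpr uint32_t kOperandMask = (1u << kOperandBits) - 1;   // 0x0FFFFFFF

// Pack an opcode and an operand; operand bits above 28 are dropped.
constexpr uint32_t encode(uint32_t opcode, uint32_t operand) {
    return (opcode << kOperandBits) | (operand & kOperandMask);
}

constexpr uint32_t opcode_of(uint32_t word)  { return word >> kOperandBits; }
constexpr uint32_t operand_of(uint32_t word) { return word & kOperandMask; }
```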
I'm going to start by assuming that you're interested primarily (if not exclusively) in a byte-code interpreter or something similar (since your question seems to assume that). An interpreter that works directly from source code (in raw or tokenized form) is a fair amount different.
For a typical byte-code interpreter, you basically design some idealized machine. Stack-based (or at least stack-oriented) designs are pretty common for this purpose, so let's assume that.
So, first let's consider the choice of 4 bits for op-codes. A lot here will depend on how many data formats we want to support, and whether we're including that in the 4 bits for the op code. Just for the sake of argument, let's assume that the basic data types supported by the virtual machine proper are 8-bit and 64-bit integers (which can also be used for addressing), and 32-bit and 64-bit floating point.
For integers we pretty much need to support at least: add, subtract, multiply, divide, and, or, xor, not, negate, compare, test, left/right shift/rotate (right shifts in both logical and arithmetic varieties), load, and store. Floating point will support the same arithmetic operations, but remove the logical/bitwise operations. We'll also need some branch/jump operations (unconditional jump, jump if zero, jump if not zero, etc.) For a stack machine, we probably also want at least a few stack oriented instructions (push, pop, dupe, possibly rotate, etc.)
That gives us a two-bit field for the data type, and at least 5 (quite possibly 6) bits for the op-code field. Instead of conditional jumps being special instructions, we might want to have just one jump instruction, and a few bits to specify conditional execution that can be applied to any instruction. We also pretty much need to specify at least a few addressing modes:
Optional: small immediate (N bits of data in the instruction itself)
Large immediate (data in the 64-bit word following the instruction)
Implied (operand(s) on top of stack)
Absolute (address specified in 64 bits following instruction)
Relative (offset specified in or following instruction)
I've done my best to keep everything about as minimal as is at all reasonable here -- you might well want more to improve efficiency.
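Here is one illustration of that budget. The exact layout is an assumption, and addressing-mode and condition bits are left out. The 2-bit data-type field and a 6-bit operation field fit in a single byte. The DataType and Op enums are hypothetical and list the operations from the previous paragraphs. There are 25 of them, which needs 5 bits, so a 6-bit field leaves room to grow.

```cpp
#include <cassert>
#include <cstdint>

enum class DataType : uint8_t { I8 = 0, I64 = 1, F32 = 2, F64 = 3 };

// Integer/float arithmetic and logic, load/store, jumps, and stack operations.
enum class Op : uint8_t {
    Add, Sub, Mul, Div, And, Or, Xor, Not, Neg, Cmp, Test,
    Shl, Shr, Sar, Rol, Ror, Load, Store, Jmp, Jz, Jnz,
    Push, Pop, Dupe, Rot
};
static_assert(static_cast<unsigned>(Op::Rot) + 1 == 25, "25 operations fit in 5 bits");

// Top 2 bits: data type. Low 6 bits: operation.
inline uint8_t encode_op(DataType t, Op op) {
    assert(static_cast<unsigned>(op) < 64);
    return static_cast<uint8_t>((static_cast<unsigned>(t) << 6) | static_cast<unsigned>(op));
}

inline DataType type_of(uint8_t b) { return static_cast<DataType>(b >> 6); }
inline Op op_of(uint8_t b) { return static_cast<Op>(b & 0x3F); }
```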
Anyway, in a model like this, an object's value is just some locations in memory. Likewise, a string is just some sequence of 8-bit integers in memory. Nearly all manipulation of objects/strings is done via the stack. For example, let's assume you had some classes A and B defined like:
struct A {
    int x;
    int y;
};
struct B {
    int a;
    int b;
};
...and some code like:
A a {1, 2};
B b {3, 4};
a.x += b.a;
The initialization would mean values in the executable file loaded into the memory locations assigned to a and b. The addition could then produce code something like this:
push immediate a.x // put &a.x on top of stack
dupe // copy address to next lower stack position
load // load value from a.x
push immediate b.a // put &b.a on top of stack
load // load value from b.a
add // add two values
store // store back to a.x using address placed on stack with `dupe`
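Below is a minimal C++ interpreter that can run exactly that sequence. It is a sketch built on simplifying assumptions: memory is a flat array of 64-bit cells, an "address" is just an index into it, and the instruction names (PushImm, Dupe, Load, Add, Store) mirror the pseudo-assembly above rather than any real VM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class OpCode : uint8_t { PushImm, Dupe, Load, Add, Store };

struct Instr {
    OpCode op;
    int64_t imm;   // only meaningful for PushImm
};

inline void run(const std::vector<Instr>& code, std::vector<int64_t>& mem) {
    std::vector<int64_t> stack;
    auto pop = [&stack]() -> int64_t {
        assert(!stack.empty());
        int64_t v = stack.back();
        stack.pop_back();
        return v;
    };
    for (const Instr& in : code) {
        switch (in.op) {
        case OpCode::PushImm: stack.push_back(in.imm); break;
        case OpCode::Dupe:    assert(!stack.empty()); stack.push_back(stack.back()); break;
        case OpCode::Load: {  // replace the address on top with the value stored there
            int64_t addr = pop();
            stack.push_back(mem.at(static_cast<std::size_t>(addr)));
            break;
        }
        case OpCode::Add: {
            int64_t rhs = pop();
            int64_t lhs = pop();
            stack.push_back(lhs + rhs);
            break;
        }
        case OpCode::Store: {  // value on top, destination address beneath it
            int64_t value = pop();
            int64_t addr = pop();
            mem.at(static_cast<std::size_t>(addr)) = value;
            break;
        }
        }
    }
}
```

Place a.x, a.y, b.a and b.b at indices 0 to 3 and run PushImm 0, Dupe, Load, PushImm 2, Load, Add, Store. Cell 0 then holds 4, i.e. a.x += b.a.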
Assuming one byte for each instruction proper, we end up around 23 bytes for the sequence as a whole, 16 bytes of which are addresses. If we use 32-bit addressing instead of 64-bit, we can reduce that by 8 bytes (i.e., a total of 15 bytes).
The most obvious thing to keep in mind is that the virtual machine implemented by a typical byte-code interpreter (or similar) isn't all that different from a "real" machine implemented in hardware. You might add some instructions that are important to the model you're trying to implement (e.g., the JVM includes instructions to directly support its security model), or you might leave out a few if you only want to support languages that don't include them (e.g., I suppose you could leave out a few like xor if you really wanted to). You also need to decide what sort of virtual machine you're going to support. What I've portrayed above is stack-oriented, but you can certainly do a register-oriented machine if you prefer.
Either way, most of object access, string storage, etc., comes down to them being locations in memory. The machine will retrieve data from those locations into the stack/registers, manipulate as appropriate, and store back to the locations of the destination object(s).
Bytecode interpreters that I'm familiar with do this using constant tables. When the compiler is generating bytecode for a chunk of source, it is also generating a little constant table that rides along with that bytecode. (For example, if the bytecode gets stuffed into some kind of "function" object, the constant table will go in there too.)
Any time the compiler encounters a literal like a string or a number, it creates an actual runtime object for the value that the interpreter can work with. It adds that to the constant table and gets the index where the value was added. Then it emits something like a LOAD_CONSTANT instruction that has an argument whose value is the index in the constant table.
Here's an example:
static void string(Compiler* compiler, int allowAssignment)
{
// Define a constant for the literal.
int constant = addConstant(compiler, wrenNewString(compiler->parser->vm,
compiler->parser->currentString, compiler->parser->currentStringLength));
// Compile the code to load the constant.
emit(compiler, CODE_CONSTANT);
emit(compiler, constant);
}
At runtime, to implement a LOAD_CONSTANT instruction, you just decode the argument, and pull the object out of the constant table.
Here's an example:
CASE_CODE(CONSTANT):
PUSH(frame->fn->constants[READ_ARG()]);
DISPATCH();
For things like small numbers and frequently used values like true and null, you may devote dedicated instructions to them, but that's just an optimization.
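Here is the same idea as a self-contained C++17 sketch. It is not Wren's implementation: ConstChunk, emit_constant, run_chunk and the opcode names are hypothetical. On the compiler side, emit_constant appends the literal to the constant table and emits OP_CONSTANT followed by a one-byte index. On the interpreter side, run_chunk decodes that index and pushes the stored value. OP_NULL, OP_TRUE and OP_FALSE are examples of the dedicated instructions mentioned above, which need no table entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Runtime values: null, bool, number, or string (a stand-in for heap objects).
using Value = std::variant<std::monostate, bool, double, std::string>;

enum ChunkOp : uint8_t { OP_CONSTANT, OP_NULL, OP_TRUE, OP_FALSE };

struct ConstChunk {
    std::vector<uint8_t> code;     // opcodes and their one-byte arguments
    std::vector<Value> constants;  // constant table that rides along with the code

    // Compiler side. Pass std::string explicitly: a bare string literal
    // may convert to the bool alternative of the variant.
    void emit_constant(Value v) {
        constants.push_back(std::move(v));
        assert(constants.size() <= 256);   // index must fit in one byte here
        code.push_back(OP_CONSTANT);
        code.push_back(static_cast<uint8_t>(constants.size() - 1));
    }
};

// Interpreter side: decode the argument and push the constant onto the stack.
inline std::vector<Value> run_chunk(const ConstChunk& chunk) {
    std::vector<Value> stack;
    for (std::size_t ip = 0; ip < chunk.code.size();) {
        switch (chunk.code[ip++]) {
        case OP_CONSTANT: stack.push_back(chunk.constants.at(chunk.code.at(ip++))); break;
        case OP_NULL:     stack.push_back(std::monostate{}); break;
        case OP_TRUE:     stack.push_back(true); break;
        case OP_FALSE:    stack.push_back(false); break;
        default:          assert(false && "unknown opcode"); break;
        }
    }
    return stack;
}
```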