What if a RISC-V function has too many arguments?

Register   ABI Name   Description                        Saver
x10–11     a0–1       Function arguments/return values   Caller
x12–17     a2–7       Function arguments                 Caller
What if I have ten arguments? Where are the values that don't fit in registers passed?
I can't find information about this.

The remaining arguments that don't fit in registers are passed on the stack, just as in other calling conventions. Where else could they go? See the ABI:
Scalars that are at most XLEN bits wide are passed in a single argument register, or on the stack by value if none is available. When passed in registers or on the stack, integer scalars narrower than XLEN bits are widened according to the sign of their type up to 32 bits, then sign-extended to XLEN bits. When passed in registers or on the stack, floating-point types narrower than XLEN bits are widened to XLEN bits, with the upper bits undefined.
Scalars that are 2×XLEN bits wide are passed in a pair of argument registers, with the low-order XLEN bits in the lower-numbered register and the high-order XLEN bits in the higher-numbered register. If no argument registers are available, the scalar is passed on the stack by value. If exactly one register is available, the low-order XLEN bits are passed in the register and the high-order XLEN bits are passed on the stack.
Scalars wider than 2×XLEN are passed by reference and are replaced in the argument list with the address.
Aggregates whose total size is no more than XLEN bits are passed in a register, with the fields laid out as though they were passed in memory. If no register is available, the aggregate is passed on the stack. Aggregates whose total size is no more than 2×XLEN bits are passed in a pair of registers; if only one register is available, the first XLEN bits are passed in a register and the remaining bits are passed on the stack. If no registers are available, the aggregate is passed on the stack. Bits unused due to padding, and bits past the end of an aggregate whose size in bits is not divisible by XLEN, are undefined.
Aggregates or scalars passed on the stack are aligned to the greater of the type alignment and XLEN bits, but never more than the stack alignment.
...
In the base integer calling convention, variadic arguments are passed in the same manner as named arguments, with one exception. Variadic arguments with 2×XLEN-bit alignment and size at most 2×XLEN bits are passed in an aligned register pair (i.e., the first register in the pair is even-numbered), or on the stack by value if none is available. After a variadic argument has been passed on the stack, all future arguments will also be passed on the stack (i.e. the last argument register may be left unused due to the aligned register pair rule).
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc
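A minimal sketch of how this plays out, assuming RV64 with the standard LP64 integer calling convention (the function name is made up, and the register/stack placement in the comments follows the quoted ABI text rather than actual compiler output):

/* Hypothetical 10-argument function.  Under the quoted convention the
   first eight integer arguments travel in a0-a7 and the last two are
   passed by value in the caller's outgoing-argument area on the stack. */
long sum10(long a, long b, long c, long d,   /* a0-a3 */
           long e, long f, long g, long h,   /* a4-a7 */
           long i, long j)                   /* stack: 0(sp), 8(sp) */
{
    return a + b + c + d + e + f + g + h + i + j;
}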

Related

How to define, allocate, and initialise the values of an array of user-defined length

I am quite new to MIPS32 and am working on an assignment that requires me to first ask the user for the length of the array they would like to define, and then ask them for the respective values. I have written rough C code which does the same, as follows:
int main()
{
    int N;
    scanf("%d\n", &N); // will be the first line, will tell us the number of inputs we will get
    int i = 0, A[N];   // (1)
    // Loop to enter the values to be sorted into the array we have made, A[]. The values are entered as a1, a2, ... and so on.
    while (i != N)
    {
        scanf("%d\n", &A[i]);
        i++;
    }
}
I am mainly having trouble with how to write the code above, particularly line (1), in MIPS32. I know that defining the size of the array in the data section itself is not an option, but I am unsure about how to dynamically define an array of size N and then also store values into the array. Any help or advice on what I can do would be really helpful.
Arrays can be stored in global, stack or heap memory.
Global memory
Global memory is essentially fixed-size at program build time: you put a label in your .data section and reserve some constant amount of space, using .space or another data directive.
One approach here is to have a maximum (say 100), so reserve space for that many, and program a limit test to make sure the code doesn't try to use more than the pre-defined maximum.
As an exception, the last global data item can be used to store an array of relatively unknown size.  This happens to work in QtSpim and MARS because a fair amount of space behind the global data is there for us to use.  This approach is not very professional, since the code can't really know at what size it will no longer work, but it is a valid approach for sample toy programs and throwaway assignments.  Put a label at the end of your global data and reserve no space, or just one word of space.
Integer element arrays have alignment requirements, so putting integer data after string data often requires explicit alignment (as a separate directive, or by reserving a word with .word, which injects alignment automatically).
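A sketch of what such a .data section could look like (label names are made up, and the size assumes a fixed maximum of 100 word-sized elements):

        .data
prompt: .asciiz "Enter N: "
        .align 2               # word-align the integer data that follows the string
arrayA: .space 400             # fixed maximum: room for 100 words
lastA:  .word  0               # alternative "last item" trick: a label at the very end,
                               # informally using the space behind the global data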
Heap memory
Heap memory can be allocated using the MARS/QtSpim syscall #9.  If the allocation fails, the size was too large; if it succeeds, you have all the space that was asked for.  Syscall #9 returns a pointer to the newly allocated memory in $v0, and you will generally want to store that value somewhere (a register or a global) for later use.
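For example, a minimal sketch (the register choices are assumptions, not part of the assignment):

        # assume N (the element count read from the user) is in $t0
        sll   $a0, $t0, 2        # bytes needed = N * 4
        li    $v0, 9             # syscall #9: allocate heap memory (sbrk)
        syscall                  # address of the new block is returned in $v0
        move  $s0, $v0           # keep the pointer somewhere safe for later use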
Stack memory
The stack grows in the downward direction: stack memory can be allocated by decrementing the stack pointer.  The stack pointer — after a decrement — refers to the newly allocated memory.  You can decrement the stack pointer by a fixed amount or by a variable amount.  In your case, you would use a variable amount.  It is generally required that the stack pointer maintain alignment, so in computing the amount to decrement, we would round up.  If you need multiple entities, you can decrement the stack pointer multiple times, or, sum the sizes together and decrement once, which would be the more common approach.
Before (or as) a function returns to its caller, the stack pointer must be returned to the value it had upon function entry.  This releases any allocated stack memory and returns to the caller the same stack environment that it had when it made the function call.  It should stand to reason that it would be a logic error to return released memory to a caller, so this approach cannot be used within a function that needs to return an array to its caller.
Any function that uses syscall #10 to terminate the program does not have to honor this requirement, since the program terminates immediately upon that syscall.  This approach is often used to exit main: MARS requires it, since it doesn't "call" main, whereas QtSpim, by default, inserts a small startup stub that does "call" main.
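Putting the stack approach together, a rough sketch (again the register choices are assumptions; the rounding keeps $sp 8-byte aligned):

        # assume N (the element count) is in $t0
        sll   $t1, $t0, 2        # bytes needed = N * 4
        addiu $t1, $t1, 7
        li    $t2, -8
        and   $t1, $t1, $t2      # round the size up to a multiple of 8
        subu  $sp, $sp, $t1      # allocate: $sp now points at A[0]
        move  $s0, $sp           # keep a pointer to the array
                                 # ... fill and use the array through $s0 ...
        addu  $sp, $sp, $t1      # release before returning (unnecessary if exiting via syscall #10)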

Why MIPS doesn't take additional function arguments in $v0 and $v1

According to the MIPS documentation, a function's output is stored in $v0–$v1 (up to 64 bits), and the function arguments are given in $a0–$a3, with any additional arguments written to the stack.
Since the function is allowed to overwrite the values of $v0–$v1, wouldn't it be better to pass the function's fifth argument (if one exists) in $v0?
What is the motivation for using the stack in this case?
You are right that the $v registers are available to be used to pass parameters.
MIPS has, at times, updated the calling convention; for example, the "MIPS EABI 32-bit Calling Convention" redefines 4 of the original $t registers, $8–$11, as additional argument registers, to pass up to 8 integer arguments in total.
We might also consider that $at (aka $1, the assembler temp) is available at that point for parameter passing.
However, object model invocations, e.g. those involving vtables, thunks and other stubs such as long calls, perhaps cross library (DLL) calls, can require an available register or two that are scratch, so it would not necessarily be best to use every one of the scratch registers for arguments.
Discussion
In general, I'm not sure why they don't just get rid of most of the $t registers (and $v registers) and make them all $a registers; these would only be used when needed, and otherwise the unused argument registers would serve the same purpose as $t registers.  The more parameters, the fewer scratch registers (in both caller and callee), but I think that tradeoff could be made instead of guaranteeing some larger minimum number of scratch registers as current ABIs do.
Still, without some bare minimum number of scratch registers, you would sometimes end up using memory anyway: spilling already-computed arguments to memory in order to free registers to compute the last couple of parameters, only to reload those spilled values back into registers.  If that were to happen, you might as well have passed some of them in memory in the first place, especially since the callee may also have to store some of the arguments to memory anyway (e.g. when the callee is not a leaf function and the parameters are needed after further calls).
Eight argument registers is probably already at the tapering end of the usefulness curve, so adding more past that point has negligible returns on real code bases.
Also, a language can invent/define its own calling convention: the standard calling conventions exist for C-language interoperability.  Even a C compiler can use custom calling conventions when it is certain that such interoperability is not required, as we can also do in assembly when we know more about a function's implementation (i.e. its internal register usage) than just its signature.
A nicely collected set of details on various calling conventions:
https://www.dyncall.org/docs/manual/manualse11.html
Addendum:
Let's assume a machine with only 2 registers, call them A and B, and that it uses both to pass parameters.  Say the first parameter is computed into A (using B as scratch if needed).  In computing the value of the 2nd parameter, for B, the compiler may run out of scratch registers, especially if the expression for that actual argument is complicated.  When out of registers, we spill something to memory; here, say, the already-computed A.  Now the parameter for B can be computed with that extra register.  However, the A parameter value, now in memory, needs to be returned to the A register before the call.  Thus, this is worse than passing A in memory, because the caller has to do both a store and a load, whereas passing in memory means just the store.
Now add to that the possibility that the callee may have to store the parameter to memory as well (for various possible reasons).  That means another store.  So, in total, if the above scenario coincides with this one, we get a store, a load, and another store, contrasted with memory parameter passing, which would need just the one store by the caller.

What is the difference between SSE and SSEUP in the x86-64 psABI, chapter 3.2.3?

In the x86-64 psABI (https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf), chapter 3.2.3 defines some classes corresponding to the AMD64 registers.
1) What is the difference between SSE and SSEUP? For SSEUP it says "The class consists of types that fit into a vector register and can be passed and returned in the upper bytes of it." What does "can be passed and returned in the upper bytes of it" mean?
2) What is the difference between X87, X87UP and COMPLEX_X87? They all look identical.
3.2.3 Parameter Passing
After the argument values have been computed, they are placed either in registers or pushed on the stack. The way how values are passed is described in the following sections.
Definitions We first define a number of classes to classify arguments. The classes are corresponding to AMD64 register classes and defined as:
INTEGER This class consists of integral types that fit into one of the general purpose registers.
SSE The class consists of types that fit into a vector register.
SSEUP The class consists of types that fit into a vector register and can be passed and returned in the upper bytes of it.
X87, X87UP These classes consists of types that will be returned via the x87 FPU.
COMPLEX_X87 This class consists of types that will be returned via the x87 FPU.
NO_CLASS This class is used as initializer in the algorithms. It will be used for padding and empty structures and unions.
MEMORY This class consists of types that will be passed and returned in memory via the stack.
The SSE registers have been extended from 128 bits (xmm) to 256 (ymm) and 512 bits (zmm).
The ABI doesn't try to use them horizontally but vertically first: if you have two __m128 arguments, they are not packed into a single ymm register but passed in two xmm registers.
Types such as __m256 or __m512, however, are passed whole in a ymm or zmm register.
The SSEUP classification is there to model this: the lower 128 bits of a vector register are the "lower bytes", and the eightbytes above them are the "upper bytes".
I think it is also assumed that 256- or 512-bit types can only be used with CPUs that have 256- or 512-bit registers.
I don't think it is legal to pass the four 128-bit chunks of a __m512 in three xmm registers (the first fully used and the other two only used in their upper parts).
The wording "that fit into a vector register" seems to imply so.
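A small illustration of the distinction, assuming the System V x86-64 ABI with AVX available (the function names are made up, and the register assignments in the comments follow the classification rules quoted above rather than actual compiler output):

#include <immintrin.h>

/* Each __m128 is classified SSE (low eightbyte) + SSEUP (high eightbyte)
   and gets its own vector register. */
__m128 take_two_m128(__m128 a, __m128 b);   /* a -> xmm0, b -> xmm1 */

/* A __m256 is classified SSE + 3x SSEUP, so it travels whole in one
   ymm register instead of being split across two xmm registers. */
__m256 take_one_m256(__m256 v);             /* v -> ymm0 */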

How to know how many arguments a function takes?

I have this function:
BOOL WINAPI MyFunction(WORD a, WORD b, WORD *c, WORD *d)
When disassembling, I'm getting something like this:
PUSH EBP
MOV ESP, EBP
SUB ESP, C
...
LEAVE
RETN C
As far as I know, SUB ESP, C means that the function takes 12 bytes for all its arguments, right? Each argument is 4 bytes, and there are 4 arguments, so shouldn't this function be disassembled as SUB ESP, 10?
Also, if I don't have the C header for the function, how can I know the size of each parameter (not just the total size of all the parameters)?
No, the SUB instruction only tells you that the function needs 12 bytes for its local variables. Inferring the arguments requires looking at the code that calls this function. You'll see it setting up the stack before the CALL instruction.
In the specific case of a WINAPI function (aka __stdcall), the RET instruction gives you information, since that calling convention requires the function to clean up the stack arguments before it returns. So RET 0x0C tells you that the arguments required 12 bytes; the match with the stack-frame size is accidental. That usually means the function takes 3 arguments, although it depends on the argument types. A WORD-sized argument gets promoted to a 32-bit value, so the signature you theorized is not a match.
If the calling convention uses the stack (as it seems) to pass parameters, you can figure out how many parameters there are and what size they have.
For "how many", you can look at the operand of the RET instruction, if any (stdcall convention). This will give you how many bytes the parameters are using. Of course, this figure alone is of not much use.
You have to read the function code and search for memory references of the form [EBP+n], where n is a positive offset from the value of EBP. Positive offsets address parameters, and negative offsets address local variables (created with the SUB ESP, x instruction).
Hopefully, you will be able to spot all the distinct parameters. If the function has been compiled with optimizations, this may be hard to figure out.
For size and type, more reverse engineering is needed. Look at the instructions that use the addressed parameters. If you find something like dword ptr [ebp+n], then that parameter is 32 bits long. word ptr [ebp+n] tells you that the parameter is 16 bits long, and byte ptr [ebp+n] means a byte-sized parameter.
For byte and word sized parameters, the most plausible options are char/unsigned char and short/unsigned short.
For double-word-sized parameters, the type may be int/unsigned int/long/unsigned long, but it may be a pointer as well. To differentiate a pointer from a plain integer, you will have to look further, to see if the dword read from the parameter is itself used as a memory address to access memory (i.e. it is dereferenced).
To tell the signedness of a parameter, you have to search for a code fragment in which that parameter is compared against some other value and a conditional jump is then issued. The particular condition used in the jump tells you whether the comparison took the sign into account. For example, a comparison followed by a JA / JB / JAE / JBE conditional jump indicates an unsigned comparison and hence an unsigned parameter; conditional jumps such as JG / JL / JGE / JLE indicate a signed parameter involved in the comparison.
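As a sketch, a hypothetical __stdcall callee taking one dword and one word parameter might look like this in a disassembler (the offsets assume the usual PUSH EBP / MOV EBP, ESP frame, so the first parameter sits at [EBP+8], just above the saved EBP and the return address):

PUSH  EBP
MOV   EBP, ESP
MOV   EAX, DWORD PTR [EBP+8]    ; first parameter, read as a dword
MOVZX ECX, WORD PTR [EBP+12]    ; second parameter: only 16 bits read, but it still occupies 4 bytes
...
POP   EBP
RETN  8                         ; stdcall: callee pops 2 arguments * 4 bytes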
That depends on your ABI.
In your case, it seems you're using Windows x86 (32 bit), which allows several C calling conventions. Some pass parameters in registers, others on the stack.
If the parameters are passed on the stack, they will be above the frame pointer, so subtracting from the stack pointer is used to make space for local variables, not to read the function parameters.

How varargs works in MIPS

According to the MIPS ABI, the caller puts the first few arguments in GPRs for performance and doesn't push these arguments onto the stack frame.
But when I use the varargs API (stdarg.h) to define a function with a variable argument list, such as void func(int type, ...);, the API works.
I found that the stdarg.h APIs only look for the arguments on the stack.
If the compiler only puts the first few arguments into GPRs, why does stdarg.h work?
Did I miss something about the ABI?
Conventions for variadic functions are described in the MIPS ELF ABI, page 3-46. Basically, when the called function is variadic (its declared argument list ends with '...'), the compiler adds some code which writes the first arguments (passed in registers) into the stack. The stack frame always includes some space for the first four arguments (precisely, for the four words which are passed in registers $4 to $7). Thus, the caller need not be aware of whether the function is variadic or not (except possibly for floating-point arguments; and, in any case, it is best if both caller and callee see and use the same prototype).
If you compile a C variadic function and look at the produced assembly, you will see, near the beginning of the function, lines like:
sw $5,52($sp)
sw $6,56($sp)
sw $7,60($sp)
which correspond to that argument-to-stack process.
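For completeness, a minimal sketch of such a variadic function in C (the function itself is made up; the point is that once the prologue has spilled $a1–$a3 next to any stack-passed arguments, va_arg can walk every argument with a single pointer):

#include <stdarg.h>

int sum_ints(int count, ...)
{
    va_list ap;
    int i, total = 0;

    va_start(ap, count);            /* ap points just past the slot reserved for 'count' */
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);   /* reads each argument from the stack, including the homed $a1-$a3 */
    va_end(ap);
    return total;
}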