According to the MIPS ABI, the caller puts the first few arguments in GPRs for performance and does not push them onto the stack frame.
But when I use the varargs API (stdarg.h) to define a function with a variable argument list, such as void func(int type, ...);, the API works.
As far as I can tell, the stdarg.h macros only look for arguments on the stack.
If the compiler only passes the first few arguments in GPRs, why does stdarg.h work?
Did I miss something about the ABIs?
Conventions for variadic functions are described in the MIPS ELF ABI, page 3-46. Basically, when the called function is variadic (its declared argument list ends with a '...'), the compiler adds some code which writes the first arguments (passed in registers) into the stack. The stack frame always includes some space for the first four arguments (precisely, for the four words which are passed in registers $4 to $7). Thus, the caller need not be aware of whether the function is variadic or not (except possibly for floating-point arguments; and, anyway, it is best if both caller and callee see and use the same prototype).
If you compile a C variadic function and look at the produced assembly, you will see, near the beginning of the function, lines like:
sw $5,52($sp)
sw $6,56($sp)
sw $7,60($sp)
which correspond to that argument-to-stack process.
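For a concrete (hypothetical) example, a variadic function like the one below works precisely because of that copy: the callee's prologue stores $4-$7 into the stack slots the caller always reserves, so the va_arg macro can simply walk the arguments in memory. This is a minimal sketch, not code from the question:

#include <stdarg.h>

int sum(int count, ...)
{
    va_list ap;
    int i, total = 0;

    va_start(ap, count);          /* points just past 'count' in the frame */
    for (i = 0; i < count; i++)
        total += va_arg(ap, int); /* reads successive argument slots */
    va_end(ap);
    return total;
}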
I want to detect all occurrences of function pointers, both calls and assignments. I am able to detect indirect calls, but how do I detect the assignments? Is there a character iterator in the Instruction.h header, or something that iterates character by character over each instruction in the .ll file?
Any help or links will be appreciated.
Start with unoptimized LLVM IR, then scan the use-list for all functions.
In the comments we established that this is for a school project, not a production system. The difference when building a production system is that you should rely only on the properties that are guaranteed, while for research or prototyping or schoolwork, you can build something that happens to work by relying on properties that are not guaranteed. That's what we're going to do here.
It so happens (but is not guaranteed!) that as clang converts C to LLVM IR, it will[1] emit a store for every function pointer used. As long as we don't run the LLVM optimizer, they will still be there.
The "forwards" direction would be to look into instructions and see whether any of them do the action you want (assign or call a function pointer). I've got a better idea: let's do it backwards. Every llvm::Value has a list of all places where that value is used, called the use-list. For example %X = add i32 %Y, %Z, %X would appear in the use-list of %Y because %X is using %Y.
Starting with your Module, iterate over every function (i.e. for (auto &F : M->functions()) {), then scan the use-list of that function (i.e. for (const auto *U : F.users())) and look at each of those users. If the user is an Instruction, you can query which Function the Instruction belongs to -- the Instruction has a parent BasicBlock and the BasicBlock has a parent Function. If it's a ConstantExpr, you'll need to recurse through that ConstantExpr's use-list until you find all the instructions using those constants. If the instruction is a direct call, you want to skip it (unless it's a direct call with an argument that is also a function pointer, like somefun(1, &somefun, 2);).
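Here is a rough sketch of that walk (assuming an llvm::Module and recent LLVM headers; findFunctionPointerUses and scanUsers are names made up for illustration, and API details can differ between LLVM versions). It iterates uses() rather than users() so a direct call's callee operand can be told apart from a function pointer passed as an argument:

#include "llvm/IR/Module.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Constants.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

// Walk the use-list of V (the Function itself, or a ConstantExpr wrapping it)
// and report every instruction that takes F's address or calls it indirectly.
static void scanUsers(Value *V, Function &F) {
  for (const Use &U : V->uses()) {
    User *Usr = U.getUser();
    if (auto *CE = dyn_cast<ConstantExpr>(Usr)) {
      scanUsers(CE, F);                 // e.g. a bitcast of the function
      continue;
    }
    if (auto *I = dyn_cast<Instruction>(Usr)) {
      if (auto *CB = dyn_cast<CallBase>(I))
        if (CB->isCallee(&U))
          continue;                     // F is the callee of a direct call: skip
      // Other uses (stores, argument operands, indirect calls) are what we want.
      errs() << F.getName() << " used in " << I->getFunction()->getName()
             << ": " << *I << "\n";
    }
    // Users that are neither Instructions nor ConstantExprs (e.g. global
    // variable initializers) are ignored in this sketch.
  }
}

void findFunctionPointerUses(Module &M) {
  for (Function &F : M.functions())
    scanUsers(&F, F);
}

Because the callee-operand use is what gets skipped, a direct call that also passes the function as an argument, like somefun(1, &somefun, 2);, is still reported for the argument use.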
This approach will miss any code that uses C function-pointer types but never points them at any function, for instance C code like void (*fun_ptr)(int) = NULL;. If this is important to you, then I suggest writing a Clang AST tool with RecursiveASTVisitor instead of using LLVM.
[1] clang is free not to, for instance given if (0) { /* ... */ } clang could skip emitting LLVM IR for the disabled code because it's faster to do that. If your function pointer use was in there, LLVM would never get the opportunity to see it.
According to the MIPS documentation, a function's output is stored in $v0-$v1 (up to 64 bits), and the function arguments are passed in $a0-$a3, with any additional arguments written to the stack.
Since the function is allowed to overwrite the values of $v0-$v1, wouldn't it be better to pass the function's fifth argument (if one exists) in $v0?
What is the motivation for using the stack in this case?
You are right that the $v registers are available to be used to pass parameters.
MIPS has, at times, updated the calling convention; for example, the "MIPS EABI 32-bit Calling Convention" redefines four of the original $t registers, $8-$11, as additional argument registers, passing up to 8 integer arguments in total.
We might also consider that $at, aka $1 (the assembler temp), is available at that point for parameter passing as well.
However, object model invocations, e.g. those involving vtables, thunks and other stubs such as long calls, perhaps cross library (DLL) calls, can require an available register or two that are scratch, so it would not necessarily be best to use every one of the scratch registers for arguments.
Discussion
In general, I'm not sure why they don't just get rid of most of the $t registers (and $v registers) and make them all $a registers: they would only be used when needed, and otherwise the unused argument registers would serve the same purpose as $t registers. The more parameters, the fewer scratch registers (in both caller and callee), but I think that tradeoff could be made instead of guaranteeing some larger minimum number of scratch registers as current ABIs do.
Still, without some bare minimum number of scratch registers, you would sometimes end up using memory: spilling already computed arguments to memory in order to have free registers to compute the last couple of parameters, only to have to reload those spilled values back into registers. If that were to happen, you might as well have passed some of them in memory in the first place, especially since the callee may also have to store some of the arguments to memory anyway (e.g. when the callee is not a leaf function and the parameters are needed after further calls).
Eight argument registers is probably already at the tapering end of the curve of usefulness, so beyond that, adding more probably yields negligible returns on real code bases.
Also, a language can invent/define its own calling convention: these calling conventions are the standard for C language interoperability. Even the C compiler can use custom calling conventions when it is certain that such language interoperability is not required, as we can also do in assembly when we know more details about function implementations (i.e. their internal register usages) than just the function signature.
A nicely collected set of details on various calling conventions:
https://www.dyncall.org/docs/manual/manualse11.html
Addendum:
Let's assume a machine with only 2 registers, call them A and B, and that it uses both to pass parameters. Say the first parameter is computed into A (using register B as scratch if needed). While computing the value of the 2nd parameter, destined for B, the compiler may run out of scratch registers, especially if the expression for that actual argument is complicated. When out of registers, it spills something to memory; here, say, the already computed A. Now the parameter for B can be computed with that extra register. However, the A parameter value, now in memory, needs to get back into the A register before the call. This is worse than passing A in memory, because the caller has to do both a store and a load, whereas passing in memory means just the store.
Now add to that situation that the callee may have to store the parameter to memory as well (for various possible reasons). That means another store. So, in total, if the above scenario coincides with this one, we get a store, a load, and another store, contrasted with memory parameter passing, which would involve just the one store by the caller.
I was just wondering: if I have this ASM function:
PUSH EBP
MOV EBP, ESP
SUB ESP, 8
LEAVE
RETN 8
It does nothing and takes two 4-byte arguments. It seems that the first argument is at EBP+8 and the second at EBP+12. But how do I know that? Because if the function takes three 4-byte parameters, then the third will be at EBP+16. Will the first argument always be at EBP+8, so that I just have to add the argument size to get to the next one? If yes, why 8?
Thanks in advance.
It's at 8 because, generally, EBP+0 = caller's saved EBP, EBP+4 = return address, EBP+8 = first stack based argument.
Also, offsets like this are normally expressed in hexadecimal, so the 2nd stack-based argument will be at EBP+C and the third will be at EBP+10.
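To make the arithmetic concrete, here is a hypothetical 32-bit function (not your snippet) annotated with where each stack argument lands after the usual push ebp / mov ebp, esp prologue; when every argument takes one 4-byte slot, argument n (counting from 1) sits at [EBP + 8 + 4*(n-1)]:

int sum3(int a, int b, int c)   /* a -> [EBP+8], b -> [EBP+0C], c -> [EBP+10] */
{
    return a + b + c;
}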
A good way (not 100% though) to deduce the calling convention of the function is to see how callers of the function prepare registers and/or the stack just prior to calling the function (and also just after the function returns).
The first stack argument will always be at [EBP+8] when using a stack frame, but calling conventions can pass arguments in both registers (general purpose and SIMD) and on the stack.
Your example assumes a standardized convention such as __stdcall or __cdecl, but arguments in __fastcall and VC++ 2013's new __vectorcall will be in general-purpose and SIMD registers respectively (and the registers themselves differ between the Sys-V and MS ABIs).
The layout of function arguments depends on the calling convention being used for the function, and the calling convention can be anything the function's creator cared to invent.
I have this function:
BOOL WINAPI MyFunction(WORD a, WORD b, WORD *c, WORD *d)
When disassembling, I'm getting something like this:
PUSH EBP
MOV EBP, ESP
SUB ESP, C
...
LEAVE
RETN C
As far as I know, the SUB ESP, C means that the function takes 12 bytes for all its arguments, right? Each argument is 4 bytes, and there are 4 arguments, so shouldn't this function be disassembled with SUB ESP, 10?
Also, if I don't have the C header for the function, how can I know the size of each parameter (not just the total size of all the parameters)?
No, the SUB instruction only tells you that the function needs 12 bytes for its local variables. Inferring the arguments requires looking at the code that calls this function. You'll see it setting up the stack before the CALL instruction.
In the specific case of a WINAPI function (aka __stdcall), the RET instruction gives you information, since that calling convention requires the function to clean up the stack before it returns. So a RET 0x0C tells you that the arguments required 12 bytes; the match with the stack-frame size is accidental. That usually means the function takes 3 arguments, though it depends on the argument types. A WORD-sized argument gets promoted to a 32-bit value, so the signature you theorized is not a match.
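As a rough illustration of that arithmetic, here is a hypothetical function, assuming plain __stdcall and no compiler tricks:

#include <windows.h>

/* Three 4-byte stack slots, so the epilogue is RETN 0Ch (3 * 4 = 12). */
BOOL WINAPI ThreeArgs(WORD a, WORD b, WORD *c)
{
    return (a == b) && (c != NULL);
}

The four-argument signature from the question would occupy 4 * 4 = 16 = 0x10 bytes (each WORD widened to a 4-byte slot) and therefore end in RETN 10h, which is why it cannot match the RETN C shown above.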
If the calling convention uses the stack (as it seems to) to pass parameters, you can figure out how many parameters there are and what sizes they have.
For "how many", you can look at the operand of the RET instruction, if any (stdcall convention). This will tell you how many bytes the parameters are using. Of course, this data alone is of not much use.
You have to read the function code and search for memory references like [EBP+n], where n is a positive offset from the value of EBP. Positive offsets address parameters, and negative offsets address local variables (created with the SUB ESP,x instruction).
Hopefully, you will be able to spot all the distinct parameters. If the function has been compiled with optimizations, this may be hard to figure out.
For size and type, more reverse engineering is needed. Look at the instructions that use the addressed parameters. If you find something like dword ptr [ebp+n], then that parameter is 32 bits long. word ptr [ebp+n] tells you that the parameter is 16 bits long, and byte ptr [ebp+n] means a byte-sized parameter.
For byte and word sized parameters, the most plausible options are char/unsigned char and short/unsigned short.
For double-word-sized parameters, the type may be int/unsigned int/long/unsigned long, but it may be a pointer as well. To differentiate a pointer from a plain integer, you will have to look further and see whether the dword read from the parameter is itself used as a memory address to access memory (i.e. it is being dereferenced).
To tell the signedness of a parameter, you have to search for a code fragment in which that parameter is compared against some other value and a conditional jump is then issued. The particular condition used in the jump will tell you whether the comparison took the sign into account. For example, a comparison followed by a JA / JB / JAE / JBE conditional jump indicates an unsigned comparison and hence an unsigned parameter, while conditional jumps such as JG / JL / JGE / JLE indicate a signed parameter involved in the comparison.
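As a small sketch of why that works (hypothetical functions; the exact instructions depend on the compiler and options), the same comparison against a constant typically compiles to a signed or an unsigned conditional jump/set depending only on the parameter's type:

int above_signed(int x)        { return x > 10;  }   /* cmp ...; jg/jle or setg (signed)   */
int above_unsigned(unsigned x) { return x > 10u; }   /* cmp ...; ja/jbe or seta (unsigned) */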
That depends on your ABI.
In your case, it seems you're using Windows x86 (32 bit), which allows several C calling conventions. Some pass parameters in registers, others on the stack.
If the parameters are passed on the stack, they will be above the frame pointer, so subtracting from the stack pointer is used to make space for local variables, not to read the function parameters.
I have a function that is called by main. Assume that function's name is funct1. funct1 calls another function named read_input.
Now assume that funct1 starts as follows:
push %rbp
push %rbx
sub $0x28, %rsp
mov %rsp, %rsi
callq 4014f0 read_input
cmpl $0x0, (%rsp)
jne (some terminating function)
So just a few of questions:
In this case, does read_input only have one argument, which is %rbx?
Furthermore, if the stack pointer is being decreased by 0x28, does this mean a string of size 0x28 is getting pushed onto the stack? (I know it's a string.)
And what is the significance of mov %rsp, %rsi before calling a function?
And lastly, when read_input returns, where is the return value put?
Thank you and sorry for the questions but I am just starting to learn x86!
It looks like your code is using the Linux AMD64 (System V x86-64) ABI. I'll answer your questions in that context.
No, rbx is a callee-saved (nonvolatile) register. Your function is saving it so that it doesn't disturb the caller's value. It's not being restored in the code you've shown, but that's because you haven't shown the whole function. If there's more to this function, and I think there is, it's because rbx is being used somewhere later on in this routine.
Yes, space for 0x28 bytes of data is being made on the stack. Assuming read_input is taking a string as a parameter, your description is reasonable. It's not necessarily accurate, however. Some of that data might be used for other local variables aside from just the buffer being allocated to pass to read_input.
This instruction is putting a pointer to the newly allocated stack buffer into rsi. rsi is the second parameter register for the AMD x64 calling convention. That means you're going to be calling read_input with whatever the first parameter passed to this function is, along with a pointer to your new stack buffer.
In rax if it's a 64-bit value or smaller, or in rdx:rax if it's larger. If it's floating point, in xmm0, ymm0, or st(0). You probably should look at a description of your calling convention to get a handle on this stuff - there's a great PDF file at this link. Check out Table 4.
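As a quick sketch of where returns land under that convention (hypothetical functions; the 128-bit case relies on the GCC/Clang __int128 extension):

long        ret_gpr()  { return 42;   }   /* comes back in rax     */
__int128    ret_pair() { return 1;    }   /* comes back in rdx:rax */
double      ret_sse()  { return 1.0;  }   /* comes back in xmm0    */
long double ret_x87()  { return 1.0L; }   /* comes back in st(0)   */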