How to detect function pointers from assignment statements in LLVM IR? - function

I want to detect all occurences of function pointers, calls as well as assignments. I am able to detect indirect calls but how to do the assignment ones? Is there a char Iterator in the Instruction.h header or something which iterates character by character for each instruction in the .ll file?
Any help or links will be appreciated.

Start with unoptimized LLVM IR, then scan the use-list for all functions.
In the comments we established that this is for a school project, not a production system. The difference when building a production system is that you should rely only on the properties that are guaranteed, while for research or prototyping or schoolwork, you can build something that happens to work by relying on properties that are not guaranteed. That's what we're going to do here.
It so happens (but is not guaranteed!) that as clang converts C to LLVM IR, it will[1] emit a store for every function pointer used. As long as we don't run the LLVM optimizer, they will still be there.
The "forwards" direction would be to look into instructions and see whether any of them do the action you want (assign or call a function pointer). I've got a better idea: let's do it backwards. Every llvm::Value has a list of all places where that value is used, called the use-list. For example %X = add i32 %Y, %Z, %X would appear in the use-list of %Y because %X is using %Y.
Starting with your Module, iterate over every function (ie. for (auto &F : M->functions()) {), then scan the use-list for the function (ie. for (const auto *U : F.users())) and look at those Values (ie. U.get()). If the user value is an Instruction, you can query which Function this Instruction belongs to -- the Instruction has a parent BasicBlock and the BasicBlock has a parent Function. If it's a ConstantExpr, you'll need to recurse through that ConstantExpr's use-list until you find all the instructions using those constants. If the instruction is a direct call, you want to skip it (unless it's a direct call with an argument that is also a function pointer, like somefun(1, &somefun, 2);).
This will miss any code that uses C typed function pointers but never point to any functions, for instance C code like void (*fun_ptr)(int) = NULL;. If this is important to you then I suggest writing a Clang AST tool with the RecursiveASTVisitor instead of using LLVM.
[1] clang is free not to, for instance given if (0) { /* ... */ } clang could skip emitting LLVM IR for the disabled code because it's faster to do that. If your function pointer use was in there, LLVM would never get the opportunity to see it.

Related

Why MIPS doesn't take additional function arguments in $v0 and $v1

According to the MIPS documentation, functions output is stored in $v0-$v1 (up to 64 bits), and the function arguments are given in $a0-$a3, where any additional arguments are written to the stack.
Since the function is allowed to overwrite the values of $v0-$v1, wouldn't it be better to pass the function fifth argument (if such exist) on $v0?
What is the motivation for using the stack in this case?
You are right that the $v registers are available to be used to pass parameters.
MIPS has, at times, updated the calling convention, for example: the "MIPS EABI 32-bit Calling Convention", redefines 4 of the original $t registers, $8-$11, as additional argument registers, to pass up to 8 integer arguments in total.
We might also consider that $at aka $1 — the assembler temp — is also available at the point for parameter passing.
However, object model invocations, e.g. those involving vtables, thunks and other stubs such as long calls, perhaps cross library (DLL) calls, can require an available register or two that are scratch, so it would not necessarily be best to use every one of the scratch registers for arguments.
Discussion
In general, other than that I'm not sure why they don't just get rid of most of the $t registers (and $v registers) and make them all $a registers — these would only be used when needed, and otherwise those unused argument registers would serve the same purpose as $t registers.  The more parameters, the fewer scratch registers — though in both caller and callee — but I think tradeoff can be made instead of guaranteeing some larger minimum number of scratch registers as in current ABIs.
Still, without some bare minimum number of scratch registers, you would sometimes end up using memory, spilling already computed arguments to memory in order to have free registers to compute the last couple of parameters, only to have to reload those spilled values back into registers.  If that were to happen, might as well have passed some of them in memory in the first place, especially since the callee may also have to store some of the arguments to memory anyway (e.g. the callee is not a leaf function, and parameters are needed after further calls).
8 argument registers is probably already on the tapering end of the curve of usefulness, so past thereabouts adding more probably has negligible returns on real code bases.
Also, a language can invent/define its own calling convention: these calling conventions are the standard for C language interoperability.  Even the C compiler can use custom calling conventions when it is certain that such language interoperability is not required, as we can also do in assembly when we know more details about function implementations (i.e. their internal register usages) than just the function signature.
Nicely collected set details on various calling conventions:
https://www.dyncall.org/docs/manual/manualse11.html
Addendum:
Let's assume a machine with only 2 registers, call them A & B, and it uses both to pass parameters.  Let's say a first parameter is computed into A (using B register as scratch if needed).  In computing the value of the 2nd parameter, for B, it may run out of scratch registers, especially if the expression for that actual argument is complicated.  When out of registers, we spill something to memory, here let's say, the already computed A.  Now the parameter for B can be computed with that extra register. However, the A parameter value, now in memory, needs to return back to the A register before the call.  Thus, this is worse than passing A in memory b/c the caller has to do both a store and a load, whereas passing in memory means just the store.
Now add to that situation that the callee may have to store the parameter to memory as well (various possible reasons).  That means another store to memory.  So, in total, if the above scenario coincides with this one, then a store, a load and another store — contrasted with memory parameter passing, which would have just the one store by the caller.

Cuda _sync functions, how to handle unknown thread mask?

This question is about adapting to the change in semantics from lock step to independent program counters. Essentially, what can I change calls like int __all(int predicate); into for volta.
For example, int __all_sync(unsigned mask, int predicate);
with semantics:
Evaluate predicate for all non-exited threads in mask and return non-zero if and only if predicate evaluates to non-zero for all of them.
The docs assume that the caller knows which threads are active and can therefore populate mask accurately.
a mask must be passed that specifies the threads participating in the call
I don't know which threads are active. This is in a function that is inlined into various places in user code. That makes one of the following attractive:
__all_sync(UINT32_MAX, predicate);
__all_sync(__activemask(), predicate);
The first is analogous to a case declared illegal at https://forums.developer.nvidia.com/t/what-does-mask-mean-in-warp-shuffle-functions-shfl-sync/67697, quoting from there:
For example, this is illegal (will result in undefined behavior for warp 0):
if (threadIdx.x > 3) __shfl_down_sync(0xFFFFFFFF, v, offset, 8);
The second choice, this time quoting from __activemask() vs __ballot_sync()
The __activemask() operation has no such reconvergence behavior. It simply reports the threads that are currently converged. If some threads are diverged, for whatever reason, they will not be reported in the return value.
The operating semantics appear to be:
There is a warp of N threads
M (M <= N) threads are enabled by compile time control flow
D (D subset of M) threads are converged, as a runtime property
__activemask returns which threads happen to be converged
That suggests synchronising threads then using activemask,
__syncwarp();
__all_sync(__activemask(), predicate);
An nvidia blog post says that is also undefined, https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/
Calling the new __syncwarp() primitive at line 10 before __ballot(), as illustrated in Listing 11, does not fix the problem either. This is again implicit warp-synchronous programming. It assumes that threads in the same warp that are once synchronized will stay synchronized until the next thread-divergent branch. Although it is often true, it is not guaranteed in the CUDA programming model.
That marks the end of my ideas. That same blog concludes with some guidance on choosing a value for mask:
Don’t just use FULL_MASK (i.e. 0xffffffff for 32 threads) as the mask value. If not all threads in the warp can reach the primitive according to the program logic, then using FULL_MASK may cause the program to hang.
Don’t just use __activemask() as the mask value. __activemask() tells you what threads happen to be convergent when the function is called, which can be different from what you want to be in the collective operation.
Do analyze the program logic and understand the membership requirements. Compute the mask ahead based on your program logic.
However, I can't compute what the mask should be. It depends on the control flow at the call site that the code containing __all_sync was inlined into, which I don't know. I don't want to change every function to take an unsigned mask parameter.
How do I retrieve semantically correct behaviour without that global transform?
TL;DR: In summary, the correct programming approach will most likely be to do the thing you stated you don't want to do.
Longer:
This blog specifically suggests an opportunistic method for handling an unknown thread mask: precede the desired operation with __activemask() and use that for the desired operation. To wit (excerpting verbatim from the blog):
int mask = __match_any_sync(__activemask(), (unsigned long long)ptr);
That should be perfectly legal.
You might ask "what about item 2 mentioned at the end of the blog?" I think if you read that carefully and taking into account the previous usage I just excerpted, it's suggesting "don't just use __activemask()" if you intend something different. That reading seems evident from the full text there. That doesn't abrogate the legality of the previous construct.
You might ask "what about incidental or enforced divergence along the way?" (i.e. during the processing of my function which is called from elsewhwere)
I think you have only 2 options:
grab the value of __activemask() at entry to the function. Use it later when you call the sync operation you desire. That is your best guess as to the intent of the calling environment. CUDA doesn't guarantee that this will be correct, however this should certainly be legal if you don't have enforced divergence at the point of your sync function call.
Make the intent of the calling environment clear - add a mask parameter to your function and rewrite the code everywhere (which you've stated you don't want to do).
There is no way to deduce the intent of the calling environment from within your function, if you permit the possibility of warp divergence prior to entry to your function, which obscures the calling environment intent. To be clear, CUDA with the Volta execution model permits the possibility of warp divergence at any time. Therefore, the correct approach is to rewrite the code to make the intent at the call site explicit, rather than trying to deduce it from within the called function.

How to transfer a part of a device array to another device array using PGI cuda fortran compiler?

Now I want to transfer a piece of a device array to another device array using following code:
program main
implicit none
integer :: a(5,5,5,5)
integer, device :: a_d(5,5,5,5),b_d(5,5,5,5)
a=0
a_d=a
b_d(1:2,:,:,:)=a_d(2:3,:,:,:)
end program
The pgi compiler returns following error for b_d(1:2,:,:,:)=a_d(2:3,:,:,:):
PGF90-S-0519-More than one device-resident object in assignment.
How to solve this problem or, is there an efficient way to transfer only a piece of a device array to another device array?
The documentation says the following:
3.4.2. Implicit Data Transfer in Expressions
Some limited data transfer can be enclosed within expressions. In
general, the rule of thumb is all arithmetic or operations must occur
on the host, which normally only allows one device array to appear on
the right-hand-side of an expression.
I'm pretty sure that means that a slicing assignment of the kind you are trying to do is not supported because you have a device array on the left hand side of the expression.
I am not sure what is the best way to solve this. cudaMemcpy2D will definitely work if you can deduce the pitch of the arrays. Hopefully someone from PGI can edit a solution into this community wiki entry.

Flexible programing for inverse function or root finding in Freepascal

I have a huge lib of math functions, like pdf or cdf of statistical distributions. But often e.g. the inverse cdf can be only calculated numerically, e.g. using Newton-Raphson or bisection, in the latter we would need to check if cdf(x) is > or < then the target y0.
However, many functions have further parameters like a Gaussian distribution having certain mean and sigma, so cdf is cdf(x,mean,sigma). Whereas other functions, such as standard normal cdf, have no further parameters, or some have even 3 or 4 further parameters.
A similar problem would happen if you want to apply bisection for either linear functions (2 parameters) or parabolas (3 parameters). Or if you want not the inverse function, but e.g. the integral of f.
The easiest implementation would be to define cdf as global function f(x); and to check for >y0 or global variables.
However, this is a very old-fashioned way, and Freepascal also supports procedural parameters, for calls like x=icdf(0.9987,#cdfStdNorm)
Even overloading is supported to allow calls like x2=icdf(0.9987,0,2,#cdfNorm) to pass also mean and sigma.
But this ends up still in two separate code blocks (even whole functions), because in one case we need to call cdf only with x, and in 2nd example also with mean and sigma.
Is there an elegant solution for this problem in Freepascal? Maybe using variant records? Or an object-oriented approach? I have no glue about OO, but I know the variant object style would require to change at least the headers of many functions because I want to apply the technique not only for inverse cdf calculation, but also to numerical integration, root finding, optimization, etc.
Or is it "best" just to define a real function type with e.g. x + 5 parameters (maybe as array), and to ignore the unused parameters? But for me it looks that then I would need many "wrapper" functions or to re-code all the existing functions (to use the arrays, even if they are sometimes not needed!).
Maybe macros can help as well? Any Freepascal hints are very welcome!
If you make it a (function .. of object), mean and sigma could be part of the class, and the function could internally just access it. Only the really changing parameters during the iteration would be parameters. (read: x)
Anonymous methods as talked about by David and Rudy is a further step to avoid having to declare a class for each such invocation, but that is convenience thing and IMHO not the core of the question. At the expense of declaring the class, your core code is free of global variable use and anonymous methods might also come with a performance cost, depending on usage.
Free Pascal also supports nested functions (function... is nested), which is the original Pascal closure-like way which was never adopted by Pascal compilers from Borland. A nested procedure passed as callback can access local variables in the procedure where it was declared. The Free Pascal numlib numeric math package uses this in some cases for similar cases like yours. For math it is even more natural.
Delphi never implements old constructs because borrowing syntax from other languages looks better on bulletlists and keeps the subscriptions flowing.

What is the difference between a function and a subroutine?

What is the difference between a function and a subroutine? I was told that the difference between a function and a subroutine is as follows:
A function takes parameters, works locally and does not alter any value or work with any value outside its scope (high cohesion). It also returns some value. A subroutine works directly with the values of the caller or code segment which invoked it and does not return values (low cohesion), i.e. branching some code to some other code in order to do some processing and come back.
Is this true? Or is there no difference, just two terms to denote one?
I disagree. If you pass a parameter by reference to a function, you would be able to modify that value outside the scope of the function. Furthermore, functions do not have to return a value. Consider void some_func() in C. So the premises in the OP are invalid.
In my mind, the difference between function and subroutine is semantic. That is to say some languages use different terminology.
A function returns a value whereas a subroutine does not. A function should not change the values of actual arguments whereas a subroutine could change them.
Thats my definition of them ;-)
If we talk in C, C++, Java and other related high level language:
a. A subroutine is a logical construct used in writing Algorithms (or flowcharts) to designate processing functionality in one place. The subroutine provides some output based on input where the processing may remain unchanged.
b. A function is a realization of the Subroutine concept in the programming language
Both function and subroutine return a value but while the function can not change the value of the arguments coming IN on its way OUT, a subroutine can. Also, you need to define a variable name for outgoing value, where as for function you only need to define the ingoing variables. For e.g., a function:
double multi(double x, double y)
{
double result;
result = x*y;
return(result)
}
will have only input arguments and won't need the output variable for the returning value. On the other hand same operation done through a subroutine will look like this:
double mult(double x, double y, double result)
{
result = x*y;
x=20;
y = 2;
return()
}
This will do the same as the function did, that is return the product of x and y but in this case you (1) you need to define result as a variable and (2) you can change the values of x and y on its way back.
One of the differences could be from the origin where the terminology comes from.
Subroutine is more of a computer architecture/organization terminology which means a reusable group of instructions which performs one task. It is is stored in memory once, but used as often as necessary.
Function got its origin from mathematical function where the basic idea is mapping a set of inputs to a set of permissible outputs with the property that each input is related to exactly one output.
In terms of Visual Basic a subroutine is a set of instructions that carries out a well defined task. The instructions are placed within Sub and End Sub statements.
Functions are similar to subroutines, except that the functions return a value. Subroutines perform a task but do not report anything to the calling program. A function commonly carries out some calculations and reports the result to the caller.
Based on Wikipedia subroutine definition:
In computer programming, a subroutine is a sequence of program
instructions that perform a specific task, packaged as a unit. This
unit can then be used in programs wherever that particular task should
be performed.
Subroutines may be defined within programs, or separately in libraries
that can be used by many programs. In different programming languages,
a subroutine may be called a procedure, a function, a routine, a
method, or a subprogram. The generic term callable unit is sometimes
used.
In Python, there is no distinction between subroutines and functions.
In VB/VB.NET function can return some result/data, and subroutine/sub can't.
In C# both subroutine and function referred to a method.
Sometimes in OOP the function that belongs to the class is called a method.
There is no more need to distinguish between function, subroutine and procedure because of hight level languages abstract that difference, so in the end, there is very little semantic difference between those two.
Yes, they are different, similar to what you mentioned.
A function has deterministic output and no side effects.
A subroutine does not have these restrictions.
A classic example of a function is int multiply(int a, int b)
It is deterministic as multiply(2, 3) will always give you 6.
It has no side effects because it does not modify any values outside its scope, including the values of a and b.
An example of a subroutine is void consume(Food sandwich)
It has no output so it is not a function.
It has side effects as calling this code will consume the sandwich and you can't call any operations on the same sandwich anymore.
You can think of a function as f(x) = y, or for the case of multiply, f(a, b) = c. Yes, this is programming and not math. But math models and begs to be used. So we use math in cs. If you are interested to know why the distinction between function and subroutine, you should check out functional programming. It works like magic.
From the view of the user, there is no difference between a programming function and a subroutine but in theory, there definitely is!
The concept itself is different between a subroutine and a function. Formally, the OP's definition is correct. Subroutines don't take arguments or give return values by formal semantics. That's just an interpretion with conventions. And variables in subroutines are accessible in other subroutines of the same file although this can be achieved as well in C with some difficulties.
Summary:
Subroutines work only based on side-effects, in the view of the programming language you are programming with. The concept itself has no explicit arguments or return values. You have to use side effects to simulate them.
Functions are mappings of input to output value(s) in the original sense, some kind of general substitution operation. In the adopted sense of the programming world, functions are an abstraction of subroutines with information about return value and arguments, inspired by mathematical functions. The additional formal abstraction differentiates a function from a subroutine in programming context.
Details:
The subroutine originally is simply a repeatable snippet of code which you can call in between other code. It originates in Assembly or Machine language programming and designates the instruction sequence itself. In the light of this meaning, Perl also uses the term subroutine for its callable code snippets.
Subroutines are concrete objects.
This is what I understood: the concept of a (pure) function is a mathematical concept which is a special case of mathematical relations with an own formal notation. You have an input or argument and it is defined what value is represented by the function with the given argument. The original function concept is entirely unrelated to instructions or calculations. Mathematical operations (or instructions in the programming world) only are a popular formal representation (description) of the actual mapping. The original function term itself is not defined as code. Calculations do not constitute the function, so that functions actually don't have any computational overhead because they are direct mappings. Function complexity considerations only arrived as there is an overhead to find the mapping.
Functions are abstract objects.
Now, since the whole PC-stuff is running on small machine instructions, the easiest way to model (or instantiate) mathematics is with a sequence of instructions itself. Computer Science has been founded by mathematicians (noteworthy: Alan Turing) and the first programming concepts are based on it so there is a need to bring mathematics into the machine. That's how I imagine the reason why "function" is the name of something which is implemented as subroutine and why the term "pure" function was coined to differentiate the original function concept from the overly broad term-use in programming languages.
Note: in Assembly Language Programming, it is typically said, that a subroutine has been passed arguments and gives a return value. This is an interpretation on top of the concrete formal semantics. Calling conventions specify the location where values, to be considered as arguments and return values, should be written to before calling a subroutine or returning. The call itself takes only a subroutine address, and has no formal arguments or return values.
PS: functions in programming languages don't necessarily need to be a subroutine (even though programming language terminology developed this way). Functions in functional programming languages can be constant variables, arrays or hash tables. Isn't every datastructure in ECMAScript a function?
The difference is isolation. A subroutine is just a piece of the program that begins with a label and ends with a go to. A function is outside the namespace of the rest of the program. It is like a separate program that can have the same variable names as used in the calling program, and whatever it does to them does not affect the state of those variables with the same name in the calling program.
From a coding perspective, the isolation means that you don’t have to use the variable names that are local to the function.
Sub double:
a = a + a
Return
fnDouble(whatever):
whatever = whatever + whatever
Return whatever
The subroutine works only on a. If you want to double b you have to set a = b before calling the subroutine. Then you may need to set a to null or zero after. Then when you want to double c you have to again set a to equal c.
Also the sub might have in it some other variable, z, that is changed when the sub is jumped to, which is a bit dangerous.
The essential is isolation of names to the function (unless declared global in the function.)
I am writing this answer from a VBA for excel perspective. If you are writing a function then you can use it as an expression i. e. you can call it from any cell in excel.
eg: normal vlookup function in excel cannot look up values > 256 characters. So I used this function:
Function MyVlookup(Lval As Range, c As Range, oset As Long) As Variant
Dim cl As Range
For Each cl In c.Columns(1).Cells
If UCase(Lval) = UCase(cl) Then
MyVlookup = cl.Offset(, oset - 1)
Exit Function
End If
Next
End Function
This is not my code. Got it from another internet post. It works fine.
But the real advantage is I can now call it from any cell in excel. If wrote a subroutine I couldn't do that.
Every subroutine performs some specific task. For some subroutines, that task is to compute or retrieve some data value. Subroutines of this type are called functions. We say that a function returns a value. Generally, the returned value is meant to be used somehow in the program that calls the function.