Constructor/Function overload signature lookup time complexity?

I was reading up on the std::string class in C++ and noticed there are quite a few different constructors available, giving us a wide set of initialization features. This got me wondering how a compiler picks which constructor to use for a given set of arguments, or, in the case of overloads, how it matches a call against the available function signatures.
If we have the following functions declared (in simplified C++):
void f1(int numberHere) {
    // ...do something
}
void f1(int numberHere, std::string stringHere) {
    // ...do something
}
If I then call f1(4), there are obviously two options to choose from, but what if there were 10000 options/signatures? Would it take proportionally longer? If so, what exactly takes longer? Does the compiler have some sneaky O(n) way to index the overloads at compile time so that it can call the right one in O(1) once the program is running, or does compilation stay O(1) no matter how many overloads exist, with the finished program running slower because of on-the-fly signature matching?
Can this question even be answered effectively?
Thanks!

Matching function signatures is actually no different from any other search or lookup problem. There are three basic ways to do it, depending on the data structure in which the available function signatures are stored:
Use an unsorted list or array and get O(n) time complexity.
Use a sorted array or a tree-like structure and get O(log(n)). (You can sort by type of 1st argument, then 2nd and so on, assuming that each type has an integer id assigned to it.)
Use a hash map and get O(1).
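As a rough illustration of the hash map option, here is a minimal C++ sketch that keys each overload by a hand-written "mangled" signature string. Real compilers build far more elaborate symbol tables, and in a statically typed language like C++ this resolution happens at compile time, so the number of overloads affects compilation speed rather than the finished program:
#include <iostream>
#include <string>
#include <unordered_map>

// Hypothetical sketch: one table entry per overload of f1, keyed by a
// "mangled" name built from the parameter types.
void f1_int(int n)                            { std::cout << "f1(int)\n"; }
void f1_int_string(int n, const std::string&) { std::cout << "f1(int,string)\n"; }

int main() {
    std::unordered_map<std::string, void(*)()> table = {
        {"f1(int)",        reinterpret_cast<void(*)()>(&f1_int)},
        {"f1(int,string)", reinterpret_cast<void(*)()>(&f1_int_string)},
    };

    // A call such as f1(4) has the argument types (int), so the lookup key
    // is "f1(int)". Finding it is an O(1) average-case hash lookup no matter
    // how many other overloads exist.
    auto chosen = reinterpret_cast<void(*)(int)>(table.at("f1(int)"));
    chosen(4); // prints "f1(int)"
}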
But I doubt that time complexity has any practical relevance in this case. It describes the asymptotic behaviour of algorithms for large values of n. Even for n=100, an unsorted array search might be faster than a hash map lookup because it has less overhead.
And from a usability point of view it is a very bad idea to design an API having functions with 10 or even 100 overloads.

Related

The TFPCustomHashTable constructor uses the constant 196613. Why this particular value?

The following code is part of the contnrs unit of FreePascal:
constructor TFPCustomHashTable.Create;
begin
  CreateWith(196613,@RSHash);
end;
I am curious about 196613. I know it is the hash table size. Is there any particular reason why this value is used?
In my test, the constructor took about 3-4 ms to execute, which in my particular situation is not acceptable. I suspect this is related to this constant value.
Update:
My questions are:
Why was 196613 chosen? Why not some other prime number?
Does it affect the constructor's execution time?
196613 is one of the numbers recommended for hash table sizes (prime and far away from powers of two); for more info see e.g. https://planetmath.org/goodhashtableprimes.
It affects the constructor's execution time, yes. You can always construct TFPCustomHashTable using CreateWith and pass a size of your choice (any number is fine, as the resizing algorithm checks for suitable sizes anyway) as well as your own hash function (or the pre-defined RSHash):
MyHashTable:=TFPCustomHashTable.CreateWith(193,@RSHash);
Keep in mind though that resizing a hash table is an expensive operation, as it requires recalculating the hash function for all elements, so starting with too small a value is not a good idea either.
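To illustrate that cost, here is a small sketch in C++ (not the FreePascal implementation) of what growing a separate-chaining hash table involves; every stored key has to go through the hash function again, so a resize is linear in the number of elements:
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Illustrative only: each bucket is a vector of keys; growing the table
// means re-hashing every key into the new, larger bucket array.
using Table = std::vector<std::vector<std::string>>;

Table rehash(const Table& old, std::size_t newBucketCount) {
    Table grown(newBucketCount);
    for (const auto& bucket : old)
        for (const auto& key : bucket)   // visits every stored element once
            grown[std::hash<std::string>{}(key) % newBucketCount].push_back(key);
    return grown;
}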

Why did the designer make vector, map, and set functions in Clojure?

Rich made vectors, maps, and sets functions, while lists and sequences are not functions.
Why can't all these collections be functions, for consistency?
Further, why not make every composite data structure a function that maps a position to its internal data?
If we made all composite data structures functions, there would be only functions and atomic data in Clojure. That would minimize the number of fundamental elements in the language, right?
I believe a minimal set of fundamental elements, ideally only two, would make the language simpler, more expressive, and more flexible. Is this correct?
Vectors, maps, and sets are all associative data structures. Maps are the most obvious; they simply associate arbitrary keys with arbitrary values. A vector can be thought of as a map whose key set must be the set of all nonnegative integers less than the vector's size. Finally, sets can be thought of as maps that map keys to themselves.
It's important to understand that the sequential nature of a vector and the associative nature of a vector are two orthogonal things. It's a data structure that's designed to be good at supporting both abstractions (to some extent; for instance, you can't efficiently insert at the beginning of a vector).
Lists are simpler than vectors; they are finite sequential data structures, nothing more. A list can't efficiently return the element at a particular index, so it doesn't expose that functionality as part of its core interface. Of course, you can get an element of a list by index using nth, but in that case, you're explicitly treating it as a sequence, not as an associative structure.
So to answer your question, the IFn implementations for vectors, maps, and sets are there because of the extremely close relationship between the idea of an associative data structure and the idea of a pure function. Lists and other sequences are not inherently associative, so for consistency, they do not implement IFn.
Elogent's answer is excellent. There is one more reason that it wouldn't make sense for lists to be functions:
Literal lists already have a different, very important role, so they can't also be treated as functions in the way that vectors are.
Let's start with a vector containing two functions, partial and +, and a number, 5. We can treat the vector as a function, as you know, to return the value indexed by its argument:
user=> ([partial + 5] 2)
5
So far, so good. Suppose we want to use a list (partial + 5) in place of the vector, as you suggested, to return the value 5. Will we get an error message? No! But we won't get 5 as the result, either:
user=> ((partial + 5) 2)
7
What happened? (partial + 5) returned a function--the function that adds 5 to its single argument--and then this function was applied to the argument 2.
When a list is evaluated, its first element is evaluated first and should yield a function. If the first element is a symbol, the symbol is evaluated, and the function that is its value is applied to the arguments, i.e. the remaining elements of the list. If the first element of a list is itself a list, it is evaluated in the same way that it would be evaluated at the top level. The entire expression in that inner list should return a function, which will then be applied to the other elements of the outer list.
Since an inner list that's the first element of a list that's being evaluated already has this role, it can't also play the kind of role that vectors in that position play.

Octave force deepcopy

The question
What are the ways of coercing Octave into creating a real copy of an arbitrary object? Structures are the main interest.
My underlying problem
In my problem I'm obtaining a rather large structure from another function in a loop but for the current task only a few pieces of it are needed. For example:
for i=1:many
  res=solver(params);
  store1{i}=res.string1;
  store2{i}=res.arr(:,1);
end
res is a sizable chunk of data, and due to lazy copying those store-s are references to tiny portions of bytes within that chunk. After I store those tiny portions I no longer need res itself; however, since the middle of the chunk is referenced by the stores, the memory area cannot be reused for the res obtained on the next iteration (they are of the same size), and thus another sizable piece of memory is allocated, which is then again pinned by a few tiny references, and so on.
Without storing parts of res, the program successfully keeps memory consumption constant after the first couple of iterations.
So how do I make a complete copy of structure field?
I've tried using struct-related functions like rmfield, but those keep references instead of making their own copies.
I've tried to wrap the assignment in its own function:
new_struct=copy( rmfield(old_struct,"bigdata") );
function c=copy(a);
  c=a;
end;
This by the way doesn't work even for arrays.
I'm interested in a method applicable to any generic variable.
Minimal working example of the problem
a=cell(3,1);
for i=1:length(a);
  r=rand(100000,1000);
  a{i}=r(1:100,end);
  whos; fflush(stdout);
  pause(2);
end;
The above code will cause memory usage to grow gradually by far more than the 8.08 kB reported by whos, because the references stored in a{i} keep alive a much bigger memory block than they actually need. If you force a proper copy, the problem is not present.
Numerical arrays
For numeric types, adding zero is enough to force a new array:
c=a+0;
Strings
For a string, which is a 1 x n char array, something along the following lines will work:
c=[a "a"](1:end-1);
Multidimensional char arrays will require concatenation with a column:
c=[a true(size(a,1),1)](:,1:end-1);
Here true is used to generate a dummy array of a size compatible with char. (There seems to be no procedural method of generating a char array of arbitrary size.) char(zeros(size(a,1),1)) and char(true(size(a,1),1)) caused excess memory usage during their creation on some calls.
Note that the empty concatenation c=[a ""]; will not result in a copy. It is also possible to do c=[a+0 ""];, which does result in a copy thanks to the +0, but that incurs type conversions to and from double, which is 8 times larger in size. (The char(zeros(...)) variant doesn't seem to cause that.)
Other types
In general you can use typecasting for the types that support it, so you don't have to tailor the expressions manually as I had to do above:
typelist={"double","single","char"}; %full list of supported types is available in the link
class_of_a = typelist{ isa(a,typelist) };
c=typecast( [typecast(a,'single'); single(1)] (1:end-1), class_of_a);
single is seemingly the smallest datatype available in Octave.
Note that logical is not supported by this method.
Copying structures
Apparently you'd have to write your own function that walks over the struct fields, copies them with the above methods, and recurses into substructs.
(As it doesn't involve any complexities relevant here, I'd rather leave that to those who actually need it; my own problem is solved by the +0 trick.)

Guidelines for listing the order of function arguments

Are there any rules that you follow to determine the order of function arguments? For example, float pow(float x, float exponent) vs float pow(float exponent, float x). For concreteness, C++ could be used, but the question is valid for all programming languages.
My main concern is from the usability point of view, not runtime performance.
Edit:
Some possible bases for ordering could be:
Inputs versus Output
The way a "formula" is usually written, i.e., arguments from left-to-write.
Specificity to the argument to the context of the function, i.e., whether it is a "general" argument, e.g., a singleton object of the system, or specific.
In the example you cite, I think the order was decided on the basis of the mathematical notation x^exponent, in which the base is written before the exponent and so becomes the left parameter.
I'm not aware of any really sound general principle other than to try to imagine what your users will expect and/or easily remember. People aren't even wholly agreed whether you should write (source, destination) or (destination, source) when copying (compare std::copy with std::memcpy), although I'm pretty sure that the former is now much more common.
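For reference, the two copying conventions look like this in C++ (both calls below copy src into dst, using the standard functions and their actual parameter orders):
#include <algorithm>
#include <cstring>
#include <vector>

int main() {
    std::vector<int> src{1, 2, 3, 4}, dst(4);

    // C convention: destination first, then source (reads like an assignment, dst = src).
    std::memcpy(dst.data(), src.data(), src.size() * sizeof(int));

    // STL convention: source range first, destination last.
    std::copy(src.begin(), src.end(), dst.begin());
}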
There are a whole lot of general conventions, though, followed to different extents by different people:
if the function is considered primarily to act upon a particular object, put it first
parameters that are considered to "configure" the operation of the function come after parameters that are considered the main subject of the function.
out-params come last (but I suspect some people follow the reverse)
To some extent it doesn't really matter -- namely the extent to which your users have IDEs that tell them the parameter order as they type the function name.

What is (or should be) the cyclomatic complexity of a virtual function call?

Cyclomatic complexity provides a rough metric for how hard a given function is to understand, or how much potential it has for containing bugs. In the implementations I've read about, usually all of the basic control flow constructs (if, case, while, for, etc.) increase the complexity of a function by 1. Given that cyclomatic complexity is intended to determine "the number of linearly independent paths through a program's source code", it appears to me that virtual function calls should increase the cyclomatic complexity of a function as well, because of the ambiguity of which implementation will be called at runtime (the call creates another branch in the path of execution).
However, penalizing the function the same amount that one would if it contained an equivalent switch statement (one point for every 'case' keyword, with one case keyword for every class in the hierarchy implementing the virtual function in question) feels overly harsh, because a virtual function call is generally regarded as much better programming practice.
What should the cost in cyclomatic complexity of a virtual function call be? I'm not sure if my reasoning is an argument against the utility of cyclomatic complexity as a metric or one against the use of virtual functions or something different.
Edit: After people's responses I realized that it shouldn't add to cyclomatic complexity, because we could consider the virtual function call equivalent to a call to a global function containing the massive switch statement. Even though that function would get a bad score, it exists only once in the program, whereas replacing each virtual function call directly with a switch statement would incur that cost many times over.
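To make that equivalence concrete, here is a minimal C++ sketch (the Shape/Circle/Square names are invented for illustration):
struct Shape  { virtual ~Shape() = default; virtual double area() const = 0; };
struct Circle : Shape { double area() const override { return 3.14; } };
struct Square : Shape { double area() const override { return 1.0; } };

// The caller has a single path; the virtual call adds nothing to its CC.
double caller(const Shape& s) {
    return s.area();
}

enum class Kind { Circle, Square };

// The hand-rolled alternative: a shared dispatch function whose switch
// carries the extra complexity once, rather than at every call site.
double areaOf(Kind k) {
    switch (k) {
        case Kind::Circle: return 3.14;
        case Kind::Square: return 1.0;
    }
    return 0.0;
}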
Cyclomatic complexity usually does not apply across function call boundaries, but is an intra-function metric. Hence, virtual calls do not count any more than non-virtual, static function calls.
A virtual function call does not increase the cyclomatic complexity, because the "ambiguity of which implementation will be called" is external to the function call. Once the object's value is set, there is no ambiguity: we know exactly which methods will be called.
BaseClass baseObj = null;
// this part has multiple paths & adds to CC
if (x == y)
    baseObj = new Derived1();
else
    baseObj = new Derived2();
// this part has one path and does not add to the CC
baseObj.virtualMethod1();
baseObj.virtualMethod2();
baseObj.virtualMethod3();
"virtual function calls should increase the cyclomatic complexity of a function as well, because of the ambiguity of which implementation will be called at runtime"
Ah, but it isn't ambiguous at runtime (unless you're doing metaprogramming / monkey patching); it's completely determined by the type/class of the receiver.
I'm not a big fan of cyclomatic complexity, but in this case you're calling a function. It will do approximately the same thing (unless the class hierarchy design is really screwed up), with some variations depending on what calls it. The thing is, if you call any function, you can get some varied behavior depending on the arguments you pass in, and this isn't counted in CC.
Therefore, I'd completely ignore that cost.