I'm looking at the recent C++14 overloads for std::equal, and I can't figure out just what they do and what they are used for...
The two overloads are:
template< class InputIt1, class InputIt2 >
bool equal( InputIt1 first1, InputIt1 last1,
            InputIt2 first2, InputIt2 last2 );

template< class InputIt1, class InputIt2, class BinaryPredicate >
bool equal( InputIt1 first1, InputIt1 last1,
            InputIt2 first2, InputIt2 last2,
            BinaryPredicate p );
I fully understand the traditional std::equal that uses just one InputIt2, but the second InputIt2 last2 is twisting my brain. Could someone explain and give an example of this?
The new overloads are actually pretty great. You pass in two full ranges, beginning and end, and rather than run off the end of the shorter one and invoke undefined behaviour, the algorithm stops.
Such improvements were also added to std::mismatch and std::is_permutation. You can read more about this in the proposal.
For std::equal, the algorithm will simply return false if the lengths are not equal.
For std::mismatch, if the algorithm hits the end of one range, it will return that iterator and the corresponding iterator from the other range.
For std::is_permutation, the algorithm will also simply return false if the ranges are not equal in length.
As for why: checking the lengths yourself is not necessarily possible or cheap. A range obtained from a std::list, without access to the original list, would need to be traversed to get its size. A range that uses an InputIterator, such as one reading from standard input, is potentially infinite until it hits its end, and it may only be traversed once, so the algorithm could no longer use it after you measured it. Thanks to Benjamin Lindley below for that last example.
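A minimal example of the difference (the values are made up for illustration):

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> a{1, 2, 3};
    std::vector<int> b{1, 2, 3, 4};

    // The C++14 four-iterator overload compares whole ranges: different
    // lengths simply yield false instead of reading past the shorter range.
    std::cout << std::boolalpha
              << std::equal(a.begin(), a.end(), b.begin(), b.end()) << '\n';  // false

    // The traditional three-iterator form assumes the second range is at
    // least as long as the first, so it reports true here (the first three
    // elements match) and would be undefined if b were shorter than a.
    std::cout << std::equal(a.begin(), a.end(), b.begin()) << '\n';  // true
}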
Related
I was reading up on the std::string class in C++ and noticed there are quite a few different constructors available giving us a wide set of initialization features. This got me wondering how a compiler picks which constructor to choose when given parameters, or in the case of overloads, how a compiler matches a function signature with a given set of parameters.
If we have the following functions declared in pseudo-code:
function f1(int numberHere) {
    //....do something
}

function f1(int numberHere, string stringHere) {
    //....do something
}
And I decide to call f1(4): there are obviously two options to choose from, but what if there were 10,000 options/signatures? Would it take proportionally longer? If so, what takes longer? Does the compiler have some sneaky O(n) way to index overloads such that it can call the right one in O(1) time once the program is running, or would it compile in O(1) no matter how many overloads exist but take longer to run the finished result because of on-the-fly signature matching?
Can this question even be answered effectively?
Thanks!
Matching function signatures is actually not different from any other search or lookup problem. (Note that in C++ this matching happens entirely at compile time, so the finished program pays no runtime cost for it.) There are three basic ways to do it, depending on the data structure you are storing the available function signatures in:
Use an unsorted list or array and get O(n) time complexity.
Use a sorted array or a tree-like structure and get O(log(n)). (You can sort by type of 1st argument, then 2nd and so on, assuming that each type has an integer id assigned to it.)
Use a hash map and get O(1). (See the sketch below.)
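A minimal sketch of the hash-map idea, using a string key built from the function name and parameter types, much like a compiler's name mangling (the key format here is made up for illustration; a real compiler's tables are far richer):

#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    // Each overload of "f1" is stored under a key encoding its signature.
    std::unordered_map<std::string, std::function<void()>> overloads;
    overloads["f1(int)"] = [] { std::cout << "picked f1(int)\n"; };
    overloads["f1(int,string)"] = [] { std::cout << "picked f1(int,string)\n"; };

    // Resolving a call means building the same key from the argument types
    // and doing a single average-O(1) hash lookup.
    overloads.at("f1(int)")();
}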
But I doubt that time complexity has any practical relevance in this case. It describes the asymptotic behaviour of algorithms for large values of n. Even for n = 100, an unsorted array search might be faster than a hash map lookup because it has less overhead.
And from a usability point of view it is a very bad idea to design an API having functions with 10 or even 100 overloads.
Are there any rules that you follow to determine the order of function arguments? For example, float pow(float x, float exponent) vs float pow(float exponent, float x). For concreteness, C++ could be used, but the question is valid for all programming languages.
My main concern is from the usability point of view, not runtime performance.
Edit:
Some possible bases for ordering could be:
Inputs versus Output
The way a "formula" is usually written, i.e., arguments from left-to-write.
Specificity of the argument to the context of the function, i.e., whether it is a "general" argument, e.g., a singleton object of the system, or a specific one.
In the example you cite, I think the order was decided on the basis of the mathematical notation x^exponent, in which the base is written before the exponent and so becomes the left parameter.
I'm not aware of any really sound general principle other than to try to imagine what your users will expect and/or easily remember. People don't even wholly agree on whether you should write (source, destination) or (destination, source) when copying (compare std::copy with std::memcpy), although I'm pretty sure the former is now much more common.
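The two conventions side by side:

#include <algorithm>
#include <cstring>
#include <vector>

int main() {
    std::vector<int> src{1, 2, 3};
    std::vector<int> dst(3);

    // The standard algorithm takes (source_first, source_last, destination)...
    std::copy(src.begin(), src.end(), dst.begin());

    // ...while the C library function puts the destination first.
    std::memcpy(dst.data(), src.data(), src.size() * sizeof(int));
}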
There are a whole lot of general conventions, though, followed to different extents by different people:
if the function is considered primarily to act upon a particular object, put it first
parameters that are considered to "configure" the operation of the function come after parameters that are considered the main subject of the function.
out-params come last (but I suspect some people follow the reverse)
To some extent it doesn't really matter -- namely the extent to which your users have IDEs that tell them the parameter order as they type the function name.
I have a question that I cannot answer myself, but it seems like a fundamentally good question to clear up:
Why do some languages restrict the data returned from a function to a single item?
Is this serving some benefit? Or is it a practice brought over from Maths?
An example being (in Scala):
def login(username: String, password: String): User
If I wanted to return multiple items, I cannot express them in the same manner as I just did for the input arguments (now entering imaginary Scala land):
def login(username: String, password: String): (User, Context, String)
Or even with named data returned:
def login(username: String, password: String): (user: User, context: Context, serverMessage: String)
There is no relationship: as observed, an arbitrary number of values can be returned, even if they must be "packaged" into a single value.
Imagine a language that can only accept a single tuple and can only return a single tuple from a function (the tuples can be any size). These functions then resemble mathematical functions transforming a vector from one space to another.
However, some reasons why it might be so:
Most functions only return one value, which may be a collection of values (object, sequence, etc.). Decomposition of the single value is supported in a number of languages, even though "only one value is returned".
The calling conventions and signatures are simpler: there is no special case/overhead to signal that n values are being returned; there is no need to use part of the stack to return multiple values, as a single register will do.
The need to fit in with the target architecture: earlier, especially lower-level languages, were heavily influenced by the computer architecture. In the case of Scala, for instance, it must work on the JVM.
It's just how the language was designed. Many (most?) languages borrow heavily -- syntax and/or methodologies -- from existing languages. Sometimes this is good, sometimes it is not so good. C# aped Java, which aped C++, which aped C, for instance: it's all about the market share.
It Just Works.
Even while "returning only one value", programming languages already have different ways of dealing with it. As noted in the post, some languages allow decomposition (the tuple returned as "decomposed" into it's two values during an assignment):
def multiMath(i):
    return (i + i, i * i)

doubled, squared = multiMath(4)
# doubled is 8
# squared is 16
Additionally, other languages like C#, which lacks decomposition, allow pass-by-reference (or emulate it with mutation of an object):
void multiMath(int a, out int doubled, out int squared) {
    doubled = a + a;
    squared = a * a;
}

int d, s;
multiMath(4, out d, out s);
// d is now 8
// s is now 16
And, of course... ;-)
class ANewClassForThisFunctionsReturn {
...
}
There are likely more methods I am not aware of.
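For instance, C++17 added decomposition too, under the name structured bindings: a single std::tuple is returned, and the caller unpacks it in one statement. A minimal sketch:

#include <iostream>
#include <tuple>

std::tuple<int, int> multiMath(int i) {
    return {i + i, i * i};  // one value is returned: a tuple of two ints
}

int main() {
    auto [doubled, squared] = multiMath(4);  // decomposed at the call site
    std::cout << doubled << ' ' << squared << '\n';  // prints "8 16"
}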
Happy coding.
Because typically the returned data is assigned to a variable, and there are only a few languages that can assign two variables in a single statement:
A = sum(1,2)
B,C = dateTime()
Technically there is no problem returning more than one value, because parameters are passed on the stack; the issue is in the assignment. Here is a sample of where this would be needed:
/* div example */
#include <stdio.h>
#include <stdlib.h>

int main ()
{
    div_t divresult;
    divresult = div (38,5);
    printf ("38 div 5 => %d, remainder %d.\n", divresult.quot, divresult.rem);
    return 0;
}
vs.
long quot, rem;
quot, rem = div(38,5)
I already posted a question about function equality. It quickly concluded that general function equality is an incredibly hard problem, and that it might even be provably impossible.
I would like to stub up a function
function equal(f, g, domain) {
}
f and g are halting functions that take one argument. Their argument is a natural number. These functions will return a boolean.
If no domain is passed then you may assume the domain defaults to all natural numbers.
The structure of domain is whatever is most convenient for the equal function.
Another important fact is that f and g are deterministic, and will consistently return the same boolean m for f(n).
You may assume that f and g always return and don't throw any exceptions or crash due to errors as long as their input is within the domain
The question is language-agnostic and asks for an implementation of the equal function. I'm not certain whether SO is the right place for this anymore.
f and g have no side effects, and the domain does not have to be finite.
It's still not possible.
You could test both functions for some finite number of inputs and check them for equality on those inputs. If they are unequal for any input then the two functions are not identical. If they are equal in every case you tested then there is a reasonable chance that they are the same function, but you can't be completely certain.
In general it is infeasible to test every possible input unless the domain is small. If the domain is a 32 bit integer and your function is quite fast to evaluate then it might be feasible to check every possible input.
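A minimal sketch of that brute-force check, assuming f and g are fast, pure functions of a 32-bit input (the sample functions are made up for illustration):

#include <cstdint>
#include <iostream>

// Two deterministic, side-effect-free functions that happen to be equal.
bool isEven(std::uint32_t n) { return n % 2 == 0; }
bool lowBitClear(std::uint32_t n) { return (n & 1) == 0; }

// Exhaustively compare f and g on every possible 32-bit input.
bool equalOn32Bits(bool (*f)(std::uint32_t), bool (*g)(std::uint32_t)) {
    std::uint32_t x = 0;
    do {
        if (f(x) != g(x)) return false;  // found a witness of inequality
    } while (++x != 0);                  // ++ wraps back to 0 after the max
    return true;                         // equal on the entire domain
}

int main() {
    std::cout << std::boolalpha << equalOn32Bits(isEven, lowBitClear) << '\n';
}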
I believe the following to be the best you can do without doing static analysis on the source code:
function equal(f, g, domain) {
    var n;
    for (n in domain) {
        if (f(domain[n]) != g(domain[n])) return false;
    }
    return true;
}
Note that this assumes the domain to be finite.
If the domain is not finite, Rice's theorem prevents such an algorithm from existing:
If we let f and g be the implementations and F and G be the mathematical functions those implementations calculate the values of, then Rice's theorem says that it's impossible to determine whether f calculates G or g calculates F, as these are non-trivial properties of the implementations.
For further detail, see my answer to the previous question.
Depending on your use case, you might be able to make some assumptions about f and g. Maybe in your case they apply under specific conditions, which might make it solvable.
In other cases, the only thing I might recommend is fuzz testing, on the abstract syntax tree or some other representation.
The word seems to get used in a number of contexts. The best I can figure is that they mean a variable that can't change. Isn't that what constants/finals (darn you Java!) are for?
An invariant is more "conceptual" than a variable. In general, it's a property of the program state that is always true. A function or method that ensures that the invariant holds is said to maintain the invariant.
For instance, a binary search tree might have the invariant that for every node, the key of the node's left child is less than the node's own key. A correctly written insertion function for this tree will maintain that invariant.
As you can tell, that's not the sort of thing you can store in a variable: it's more a statement about the program. By figuring out what sort of invariants your program should maintain, then reviewing your code to make sure that it actually maintains those invariants, you can avoid logical errors in your code.
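A minimal sketch of checking the tree invariant described above (the Node type is made up for illustration); a correctly written insertion function keeps this returning true:

// A hypothetical node type, for illustration only.
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
};

// True if, for every node, the left child's key (when present) is less
// than the node's own key -- the invariant described above.
bool maintainsInvariant(const Node* n) {
    if (!n) return true;  // an empty subtree trivially holds the invariant
    if (n->left && n->left->key >= n->key) return false;
    return maintainsInvariant(n->left) && maintainsInvariant(n->right);
}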
It is a condition you know to always be true at a particular place in your logic and can check for when debugging to work out what has gone wrong.
The magic of wikipedia: Invariant (computer science)
In computer science, a predicate that, if true, will remain true throughout a specific sequence of operations, is called (an) invariant to that sequence.
This answer is for my 5-year-old kid. Do not think of an invariant as a constant or a fixed numerical value. It can be, but it is more than that.
Rather, an invariant is something like a fixed relationship between varying entities. For example, your age will always be less than that of your biological parents. Both your age and your parents' ages change with the passage of time, but the relationship I mentioned above is an invariant.
An invariant can also be a numerical constant. For example, the value of pi is the invariant ratio of a circle's circumference to its diameter. No matter how big or small the circle is, that ratio will always be pi.
I usually view them more in terms of algorithms or structures.
For example, you could have a loop invariant that could be asserted -- always true at the beginning or end of each iteration. That is, if your loop is supposed to process a collection of objects from one stack to another, you could say that |stack1| + |stack2| = c at the top or bottom of the loop.
If the invariant check failed, it would indicate something went wrong. In this example, it could mean that you forgot to push the processed element onto the final stack, etc.
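A minimal sketch of that example, with the invariant asserted at the top and bottom of each iteration:

#include <cassert>
#include <cstddef>
#include <stack>

int main() {
    std::stack<int> stack1, stack2;
    for (int i = 0; i < 5; ++i) stack1.push(i);

    // The loop invariant: |stack1| + |stack2| = c, a constant.
    const std::size_t c = stack1.size() + stack2.size();

    while (!stack1.empty()) {
        assert(stack1.size() + stack2.size() == c);  // holds at the top...
        stack2.push(stack1.top());
        stack1.pop();  // forgetting the push above would trip the assert
        assert(stack1.size() + stack2.size() == c);  // ...and at the bottom
    }
}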
As this line states:
In computer science, a predicate that, if true, will remain true throughout a specific sequence of operations, is called (an) invariant to that sequence.
To understand this better, I hope this example in C++ helps.
Consider a scenario where you have to read some values, keep the total count of them in a variable called count, and accumulate their sum in a variable called sum.
The invariant (again it's more like a concept):
// invariant:
// we have read count grades so far, and
// sum is the sum of the first count grades
The code for the above would be something like this,
int count = 0;
double sum = 0, x = 0;

while (cin >> x) {
    ++count;
    sum += x;
}
What does the above code do?
1) Reads the input from cin and puts it in x
2) After one successful read, increments count and adds x to sum (sum = sum + x)
3) Repeats 1-2 until the read stops (i.e., Ctrl+D)
Loop invariant:
The invariant must be True ALWAYS. So initially you start out your code with just this
while (cin >> x) {
}
This loop reads data from standard input and stores it in x. Well and good. But the invariant becomes false because the first part of our invariant isn't being kept true:
// we have read count grades so far, and
How to keep the invariant true?
Simple! Increment count.
So ++count; does the job! Now our code becomes something like this:
while (cin >> x) {
    ++count;
}
But
Even now our invariant (a concept which must be TRUE) is false, because we haven't satisfied the second part of our invariant:
// sum is the sum of the first count grades
So what to do now?
Add x to sum and store it (sum += x), and the next time cin >> x will read a new value into x.
Now our code becomes something like this,
while (cin >> x) {
    ++count;
    sum += x;
}
Let's check whether the code matches our invariant.
// invariant:
// we have read count grades so far, and
// sum is the sum of the first count grades
code:
while (cin >> x) {
    ++count;
    sum += x;
}
Ah! Now the loop invariant is always true and the code works fine.
The above example was taken and modified from the book Accelerated C++ by Andrew Koenig and Barbara E. Moo.
Something that doesn't change within a block of code
All the answers here are great, but I felt that I could shed more light on the matter:
Invariant, from a language point of view, means something that never changes. The concept actually comes from math, though; it's one of the popular proof techniques when combined with induction.
Here is how such a proof goes: if you can find an invariant that holds in the initial state, and that invariant persists regardless of any [legal] transformation applied to the state, then you can prove that if a certain state does not have this invariant, it can never occur, no matter what sequence of transformations is applied to the initial state.
Now the previous way of thinking (again combined with induction) makes it possible to predicate the logic of computer software. This is especially important when the execution goes in loops, in which an invariant can be used to prove that a certain loop will yield a certain result, or that it will never change the state of the program in a certain way.
When an invariant is used to predicate loop logic, it's called a loop invariant. Invariants can be used outside loops, but for loops they are really important, because you often have a lot of possibilities, or an infinite number of them.
Notice that I say "predicate" the logic of computer software, not "prove" it. That's because while in math an invariant can be used as a proof, it can never prove that computer software, when executed, will yield what is expected, due to the fact that the software is executed on top of many abstractions that can never be proved to yield what is expected (think of the hardware abstraction, for example).
Finally, while theoretically and rigorously predicting software logic is only important for highly critical applications like medical and military ones, invariants can still be used to aid the typical programmer when debugging: they can be used to find where a program failed because it failed to maintain a certain invariant. Many of us use them anyway without giving it a thought.
Class Invariant
A class invariant is a condition that should always be true before and after calling a relevant function.
For example, a balanced tree has an invariant called isBalanced. When you modify the tree through some methods (e.g. addNode, removeNode, ...), isBalanced should always be true before and after modifying the tree.
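A minimal C++ sketch of the same idea (SortedVector and its invariant are made up for illustration): every public method checks the invariant on entry and on exit.

#include <cassert>
#include <cstddef>
#include <vector>

class SortedVector {
    std::vector<int> data_;

    // The class invariant: elements are always in ascending order.
    bool isSorted() const {
        for (std::size_t i = 1; i < data_.size(); ++i)
            if (data_[i - 1] > data_[i]) return false;
        return true;
    }

public:
    void insert(int value) {
        assert(isSorted());  // the invariant holds before the modification...
        auto it = data_.begin();
        while (it != data_.end() && *it < value) ++it;
        data_.insert(it, value);
        assert(isSorted());  // ...and after it
    }
};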
Following on from what it is: invariants are quite useful in writing clean code, since knowing conceptually what invariants should be present in your code allows you to easily decide how to organize your code to reach those aims. As mentioned earlier, they're also useful in debugging, as checking whether the invariant is being maintained is often a good way of seeing whether whatever manipulation you're attempting to perform is actually doing what you want it to.
It's typically a quantity that does not change under certain mathematical operations.
An example is a scalar, which does not change under rotations. In magnetic resonance imaging, for example, it is useful to characterize a tissue property by a rotational invariant, because then its estimation ideally does not depend on the orientation of the body in the scanner.
The ADT invariant specifies relationships among the data fields (instance variables) that must always be true before and after the execution of any instance method.
There is an excellent example of an invariant and why it matters in the book Java Concurrency in Practice.
Although Java-centric, the example describes some code that is responsible for calculating the factors of a provided integer. The example code attempts to cache the last number provided and the factors that were calculated, to improve performance. In this scenario there is an invariant that was not accounted for in the example code, which leaves the code susceptible to race conditions in a concurrent scenario.