I have a question that I cannot answer myself, but it seems like a fundamentally good question to clear up:
Why do some languages restrict the data returned from a function to a single item?
Is this serving some benefit? Or is it a practice brought over from Maths?
An example being (in Scala):
def login(username: String, password: String): User
If I wanted to return multiple items I cannot say it in the same manner as I just did for the input arguments (now entering imaginary Scala land)
def login(username: String, password: String): (User, Context, String)
Or even with named data returned:
def login(username: String, password: String): (user: User, context: Context, serverMessage: String)
There is no inherent restriction: as observed, an arbitrary number of values can be returned, even if they must be "packaged" into a single value.
Imagine a language that can only accept a single tuple and can only return a single tuple from a function (the tuples can be any size). These functions then resemble math function transforming a vector from one space to another.
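That shape exists in real languages today. As a hedged illustration, here is a C++17 sketch (the User and Context types and the sessionId value are hypothetical stand-ins) in which the function returns one tuple that the caller decomposes:

#include <iostream>
#include <string>
#include <tuple>

// Hypothetical stand-ins for the question's User and Context types.
struct User    { std::string name; };
struct Context { int sessionId; };

// The "multiple" results are packaged into a single tuple value.
std::tuple<User, Context, std::string> login(const std::string& username,
                                             const std::string& password) {
    return { User{username}, Context{42}, "welcome" };
}

int main() {
    // C++17 structured bindings decompose the single returned value.
    auto [user, context, serverMessage] = login("alice", "hunter2");
    std::cout << user.name << ": " << serverMessage << '\n';
}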
However, some reasons why it might be so:
Most functions only return one value, which may be a collection of values (object, sequence, etc.). Decomposition of the single value is supported in a number of languages, even though "only one value is returned".
The calling conventions and signatures are simpler: there is no special case/overhead to signal that n values are being returned; there is no need to use part of the stack to return multiple values when a single register will do.
The need to fit in with the target architecture: earlier (especially lower-level) languages were heavily influenced by the computer architecture. In the case of Scala, for instance, it must work on the JVM.
It's just how the language was designed. Many (most?) languages borrow heavily -- syntax and/or methodologies -- from existing languages. Sometimes this is good, sometimes it is not so good. C# aped Java, which aped C++, which aped C, for instance: it's all about the market share.
It Just Works.
Even while "returning only one value", programming languages already have different ways of dealing with it. As noted in the post, some languages allow decomposition (the tuple returned as "decomposed" into it's two values during an assignment):
def multiMath(i):
    return (i + i, i * i)

doubled, squared = multiMath(4)
# doubled is 8
# squared is 16
Additionally, other languages, like C# (which lacks decomposition), allow pass-by-reference (or emulate it with mutation of an object):
void multiMath (int a, out int doubled, out int squared) {
doubled = a + a;
squared = a * a;
}
int d, s;
multiMath(4, out d, out s);
// d is now 8
// s is now 16
And, of course... ;-)
class ANewClassForThisFunctionsReturn {
...
}
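(A minimal C++ sketch of that last approach, with hypothetical field names; a dedicated result type is still a single returned value, but it supplies the "named return" the question wished for:)

#include <string>

struct User    { std::string name; };   // hypothetical stand-ins, as above
struct Context { int sessionId; };

// One dedicated result type: a single returned value with named parts.
struct LoginResult {
    User user;
    Context context;
    std::string serverMessage;
};

LoginResult login(const std::string& username, const std::string& password) {
    return { User{username}, Context{42}, "welcome" };
}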
There are likely more methods I am not aware of.
Happy coding.
Because typically the returned data is assigned to a variable, and there are only a few languages that can assign two variables in a single statement.
A = sum(1,2)
B,C = dateTime()
Technically there is no problem returning more than one value, because return values, like parameters, can be passed on the stack; the issue is with assignment. Here is a sample of where this would be needed:
/* div example */
#include <stdio.h>
#include <stdlib.h>
int main ()
{
div_t divresult;
divresult = div (38,5);
printf ("38 div 5 => %d, remainder %d.\n", divresult.quot, divresult.rem);
return 0;
}
vs. the hypothetical:
long quot, rem;
quot, rem = div(38,5)
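Modern C++ actually gets close to that hypothetical via structured bindings; a minimal C++17 sketch (divmod here is a hypothetical helper, not a standard function):

#include <cstdio>
#include <utility>

// Hypothetical helper: quotient and remainder packaged as one pair.
std::pair<long, long> divmod(long a, long b) {
    return { a / b, a % b };
}

int main() {
    // Structured bindings decompose the single returned value.
    auto [quot, rem] = divmod(38, 5);
    std::printf("38 div 5 => %ld, remainder %ld.\n", quot, rem);
    return 0;
}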
I'm looking at the new C++14 overloads for std::equal, and I can't figure out just what they do and what they are used for...
The two overloads are:
template< class InputIt1, class InputIt2 >
bool equal( InputIt1 first1, InputIt1 last1,
InputIt2 first2, InputIt2 last2 );
template< class InputIt1, class InputIt2, class BinaryPredicate >
bool equal( InputIt1 first1, InputIt1 last1,
InputIt2 first2, InputIt2 last2,
BinaryPredicate p );
I fully understand the traditional std::equal that uses just one InputIt2, but the second InputIt2 last2 is twisting my brain. Could someone explain and give an example of this?
The new overloads are actually pretty great. You pass in two full ranges, beginning and end, and rather than run off the end of the shorter one and invoke undefined behaviour, the algorithm stops.
Such improvements were also added to std::mismatch and std::is_permutation. You can read more about this in the proposal
For std::equal, the algorithm will simply return false if the lengths are not equal.
For std::mismatch, if the algorithm hits the end of one range, it will return that iterator and the corresponding iterator from the other range.
For std::is_permutation, the algorithm will also simply return false if the ranges are not equal in length.
As for why, consider that checking the lengths up front is not necessarily possible or cheap for the caller. A range obtained from a std::list without the original list would need to be traversed just to get its size. A range that uses an InputIterator, such as one reading from standard input, is potentially infinite until it hits an end, and it may only be traversed once, so the algorithm could no longer use it after you did that. Thanks to Benjamin Lindley for that last example.
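A quick sketch of the difference in C++14:

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> a{1, 2, 3};
    std::vector<int> b{1, 2, 3, 4};

    // Two-range overload: different lengths simply mean "not equal".
    bool same = std::equal(a.begin(), a.end(), b.begin(), b.end());
    std::cout << std::boolalpha << same << '\n';    // false

    // Classic overload: compares a against the first a.size() elements of b.
    // Safe here because b is longer; if b were shorter than a, this would
    // read past its end -- undefined behaviour.
    bool prefix = std::equal(a.begin(), a.end(), b.begin());
    std::cout << prefix << '\n';                    // true
}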
The concept of lambdas (anonymous functions) is very clear to me. And I'm aware of polymorphism in terms of classes, with runtime/dynamic dispatch used to call the appropriate method based on the instance's most derived type. But how exactly can a lambda be polymorphic? I'm yet another Java programmer trying to learn more about functional programming.
You will observe that I don't talk about lambdas much in the following answer. Remember that in functional languages, any function is simply a lambda bound to a name, so what I say about functions translates to lambdas.
Polymorphism
Note that polymorphism doesn't really require the kind of "dispatch" that OO languages implement through derived classes overriding virtual methods. That's just one particular kind of polymorphism, subtyping.
Polymorphism itself simply means a function allows not just for one particular type of argument, but is able to act accordingly for any of the allowed types. The simplest example: you don't care for the type at all, but simply hand on whatever is passed in. Or, to make it not quite so trivial, wrap it in a single-element container. You could implement such a function in, say, C++:
template<typename T> std::vector<T> wrap1elem( T val ) {
  return std::vector<T>{ val };
}
but you couldn't implement it as a lambda, because C++ (time of writing: C++11) doesn't support polymorphic lambdas.
Untyped values
...At least not in this way, that is. C++ templates implement polymorphism in rather an unusual way: the compiler actually generates a monomorphic function for every type that anybody passes to the function, in all the code it encounters. This is necessary because of C++' value semantics: when a value is passed in, the compiler needs to know the exact type (its size in memory, possible child-nodes etc.) in order to make a copy of it.
In most newer languages, almost everything is just a reference to some value, and when you call a function it doesn't get a copy of the argument objects but just a reference to the already-existing ones. Older languages require you to explicitly mark arguments as reference / pointer types.
A big advantage of reference semantics is that polymorphism becomes much easier: pointers always have the same size, so the same machine code can deal with references to any type at all. That makes, very uglily1, a polymorphic container-wrapper possible even in C:
typedef struct{
void** contents;
int size;
} vector;
vector wrap1elem_by_voidptr(void* ptr) {
  vector v;
  v.contents = malloc(sizeof(void*));  /* room for one generic pointer */
  v.contents[0] = ptr;
  v.size = 1;
  return v;
}
#define wrap1elem(val) wrap1elem_by_voidptr(&(val))
Here, void* is just a pointer to any unknown type. The obvious problem thus arising: vector doesn't know what type(s) of elements it "contains"! So you can't really do anything useful with those objects. Except if you do know what type it is!
int sum_contents_int(vector v) {
int acc = 0, i;
for(i=0; i<v.size; ++i) {
acc += * (int*) (v.contents[i]);
}
return acc;
}
Obviously, this is extremely laborious. What if the type is double? What if we want the product, not the sum? Of course, we could write each case by hand. Not a nice solution.
What would be better is if we had a generic function that takes the instruction for what to do as an extra argument! C has function pointers:
int accum_contents_int(vector v, void (*combine)(int*, int)) {
int acc = 0, i;
for(i=0; i<v.size; ++i) {
combine(&acc, * (int*) (v.contents[i]));
}
return acc;
}
That could then be used like
void multon(int* acc, int x) {
  *acc *= x;  /* note: for a product, the accumulator in accum_contents_int would need to start at 1, not 0 */
}
int main() {
int a = 3, b = 5;
vector v = wrap2elems(a, b);  /* wrap2elems: hypothetical two-element analogue of wrap1elem */
printf("%i\n", accum_contents_int(v, multon));
}
Apart from still being cumbersome, all the above C code has one huge problem: it's completely unchecked whether the container elements actually have the right type! The casts from void* will happily fire on any type, but in doubt the result will be complete garbage2.
Classes & Inheritance
That problem is one of the main issues which OO languages solve by trying to bundle all operations you might perform right together with the data, in the object, as methods. While compiling your class, the types are monomorphic so the compiler can check the operations make sense. When you try to use the values, it's enough if the compiler knows how to find the method. In particular, if you make a derived class, the compiler knows "aha, it's ok to call that method from the base class even on a derived object".
Unfortunately, that would mean all you achieve by polymorphism is equivalent to composing data and simply calling the (monomorphic) methods on a single field. To actually get different behaviour (but in a controlled way!) for different types, OO languages need virtual methods. What this amounts to is basically that the class has extra fields with pointers to the method implementations, much like the pointer to the combine function I used in the C example – with the difference that you can only implement an overriding method by adding a derived class, for which the compiler again knows the type of all the data fields etc., so you're safe and all.
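A minimal C++ sketch of that correspondence (the names are hypothetical): the virtual combine plays the role of the C function pointer, but the compiler checks the types:

#include <iostream>

// The base class fixes the interface; its vtable holds the pointer to the
// actual implementation, much like the combine pointer in the C example.
struct Combiner {
    virtual void combine(int& acc, int x) const = 0;
    virtual ~Combiner() = default;
};

struct Multiplier : Combiner {
    void combine(int& acc, int x) const override { acc *= x; }
};

int accumulate(const int* xs, int n, int start, const Combiner& c) {
    int acc = start;
    for (int i = 0; i < n; ++i) c.combine(acc, xs[i]);
    return acc;
}

int main() {
    int xs[] = {3, 5};
    std::cout << accumulate(xs, 2, 1, Multiplier{}) << '\n';   // prints 15
}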
Sophisticated type systems, checked parametric polymorphism
While inheritance-based polymorphism obviously works, I can't help saying it's a bit limiting3. If you want to use just one particular operation that happens not to be implemented as a class method, you need to make an entire derived class. Even if you just want to vary an operation in some way, you need to derive and override a slightly different version of the method.
Let's revisit our C code. On the face of it, we notice it should be perfectly possible to make it type-safe, without any method-bundling nonsense. We just need to make sure no type information is lost – not during compile-time, at least. Imagine (Read ∀T as "for all types T")
∀T: {
typedef struct{
T* contents;
int size;
} vector<T>;
}
∀T: {
vector<T> wrap1elem(T* elem) {
vector<T> v;
v.contents = malloc(sizeof(T));
v.contents[0] = *elem;
v.size = 1;
return v;
}
}
∀T: {
void accum_contents(vector<T> v, void (*combine)(T*, const T*), T* acc) {
int i;
for(i=0; i<v.size; ++i) {
combine(acc, &v.contents[i]);
}
}
}
Observe how, even though the signatures look a lot like the C++ template thing at the top of this post (which, as I said, really is just auto-generated monomorphic code), the implementation actually is pretty much just plain C. There are no T values in there, just pointers to them. No need to compile multiple versions of the code: at runtime, the type information isn't needed; we just handle generic pointers. At compile time, we do know the types and can use the function head to make sure they match. I.e., if you wrote
void evil_sumon (int* acc, double* x) { *acc += *x; }
and tried to do
vector<float> v; char acc;
accum_contents(v, evil_sumon, &acc);
the compiler would complain because the types don't match: in the declaration of accum_contents it says the type may vary, but all occurrences of T do need to resolve to the same type.
And that is exactly how parametric polymorphism works in languages of the ML family as well as Haskell: the functions really don't know anything about the polymorphic data they're dealing with. But they are given the specialised operators which have this knowledge, as arguments.
In a language like Java (prior to lambdas), parametric polymorphism doesn't gain you much: since the compiler makes it deliberately hard to define "just a simple helper function" in favour of having only class methods, you can simply go the derive-from-class way right away. But in functional languages, defining small helper functions is the easiest thing imaginable: lambdas!
And so you can write incredibly terse code in Haskell:
Prelude> foldr (+) 0 [1,4,6]
11
Prelude> foldr (\x y -> x+y+1) 0 [1,4,6]
14
Prelude> let f start = foldr (\_ (xl,xr) -> (xr, xl)) start
Prelude> :t f
f :: (t, t) -> [a] -> (t, t)
Prelude> f ("left", "right") [1]
("right","left")
Prelude> f ("left", "right") [1, 2]
("left","right")
Note how, in the lambda I defined as a helper for f, I didn't have any clue about the type of xl and xr; I merely wanted to swap a tuple of these elements, which requires the types to be the same. So that would be a polymorphic lambda, with the type
\_ (xl, xr) -> (xr, xl) :: ∀ a t. a -> (t,t) -> (t,t)
1Apart from the weird explicit malloc stuff, type safety etc.: code like that is extremely hard to work with in languages without a garbage collector, because somebody always needs to clean up memory once it's not needed anymore, while watching out that nobody else still holds a reference to the data and might in fact still need it. That's nothing you have to worry about in Java, Lisp, Haskell...
2There is a completely different approach to this: the one dynamic languages choose. In those languages, every operation needs to make sure it works with any type (or, if that's not possible, raise a well-defined error). Then you can arbitrarily compose polymorphic operations, which is on one hand "nicely trouble-free" (not as trouble-free as with a really clever type system like Haskell's, though) but OTOH incurs quite a heavy overhead, since even primitive operations need type-decisions and safeguards around them.
3I'm of course being unfair here. The OO paradigm has more to it than just type-safe polymorphism; it enables many things that e.g. old ML with its Hindley-Milner type system couldn't do (ad-hoc polymorphism: Haskell has type classes for that, SML has modules), and even some things that are pretty hard in Haskell (mainly, storing values of different types in a variable-sized container). But the more you get accustomed to functional programming, the less need you will feel for such stuff.
In C++, a polymorphic (or generic) lambda, available starting from C++14, is a lambda that can take any type as an argument. Basically, it's a lambda that has an auto parameter type:
auto lambda = [](auto){};
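For instance, a quick sketch of one generic lambda used with several argument types:

#include <iostream>
#include <string>

int main() {
    // One lambda; the compiler instantiates its call operator once per
    // argument type, just as it would for a function template.
    auto twice = [](auto x) { return x + x; };

    std::cout << twice(21) << '\n';                 // 42
    std::cout << twice(1.5) << '\n';                // 3
    std::cout << twice(std::string("ab")) << '\n';  // abab
}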
Is there a context in which you've heard the term "polymorphic lambda"? We might be able to be more specific.
The simplest way that a lambda can be polymorphic is to accept arguments whose type is (partly-)irrelevant to the final result.
e.g. the lambda
\(head:tail) -> tail
has the type [a] -> [a] -- e.g. it's fully-polymorphic in the inner type of the list.
Other simple examples are the likes of
\_ -> 5 :: Num n => a -> n
\x f -> f x :: a -> (a -> b) -> b
\n -> n + 1 :: Num n => n -> n
etc.
(Notice the Num n examples which involve typeclass dispatch)
Hi everyone, I have a question about templates in C++.
I would like to explain what I'm wondering about via an example. Let max() be our template function:
template <typename Type>
Type max(Type tX, Type tY)
{
return (tX > tY) ? tX : tY;
}
Now, when I call this max in my main, does the compiler generate the function for each call, replacing the template type with the actual types? I mean:
int main()
{
int result1,result2;
float result3;
result1=max(3,5);
result2=max(10,12);
result3=max(4.5,12.2);
return 0;
}
Here, will max be copied 3 times with its template parameter replaced by the actual types, or does something else happen? Is there anyone who can help me? Thanks in advance.
My understanding is that a compiler typically instantiates a template once per data type per compilation unit, and the linker does clever stuff to stop code bloat: multiple copies of the same instantiation across compilation units are condensed into one. Early Microsoft C++ linkers didn't bother doing any such thing, and the generated code was large.
In your example I would expect two instantiations to be generated: one for int, shared by the first two calls, and one for double, since the literals 4.5 and 12.2 are doubles (the result of the third call is then converted to float for the assignment).
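One way to see this for yourself is a sketch using typeid (the names it prints are implementation-defined):

#include <iostream>
#include <typeinfo>

template <typename Type>
Type max(Type tX, Type tY)
{
    // Report which instantiation is running.
    std::cout << "max<" << typeid(Type).name() << "> called\n";
    return (tX > tY) ? tX : tY;
}

int main()
{
    max(3, 5);       // instantiates max<int>
    max(10, 12);     // reuses max<int>
    max(4.5, 12.2);  // instantiates max<double>
    return 0;
}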
How is this possible, what is going on there?
Is there a name for this?
What other languages have this same behavior?
Any without the strong typing system?
This behaviour is really simple and intuitive if you look at the types. To avoid the complications of infix operators like +, I'm going to use the function plus instead. I'm also going to specialise plus to work only on Int, to reduce the typeclass line noise.
Say we have a function plus, of type Int -> Int -> Int. One way to read that is "a function of two Ints that returns an Int". But that notation is a little clumsy for that reading, isn't it? The return type isn't singled out specially anywhere. Why would we write function type signatures this way? Because the -> is right associative, an equivalent type would be Int -> (Int -> Int). This looks much more like it's saying "a function from an Int to (a function from an Int to an Int)". But those two types are in fact exactly the same, and the latter interpretation is the key to understanding how this behaviour works.
Haskell views all functions as being from a single argument to a single result. There may be computations you have in mind where the result of the computation depends on two or more inputs (such as plus). Haskell says that the function plus is a function that takes a single input, and produces an output which is another function. This second function takes a single input and produces an output which is a number. Because the second function was computed by first (and will be different for different inputs to the first function), the "final" output can depend on both the inputs, so we can implement computations with multiple inputs with these functions that take only single inputs.
I promised this would be really easy to understand if you looked at the types. Here's some example expressions with their types explicitly annotated:
plus :: Int -> Int -> Int
plus 2 :: Int -> Int
plus 2 3 :: Int
If something is a function and you apply it to an argument, to get the type of the result of that application all you need to do is remove everything up to the first arrow from the function's type. If that leaves a type that has more arrows, then you still have a function! As you add arguments to the right of an expression, you remove parameter types from the left of its type. The type makes it immediately clear what the types of all the intermediate results are, and why plus 2 is a function which can be further applied (its type has an arrow) and plus 2 3 is not (its type doesn't have an arrow).
"Currying" is the process of turning a function of two arguments into a function of one argument that returns a function of another argument that returns whatever the original function returned. It's also used to refer to the property of languages like Haskell that automatically have all functions work this way; people will say that Haskell "is a curried language" or "has currying", or "has curried functions".
Note that this works particularly elegantly because Haskell's syntax for function application is simple token adjacency. You are free to read plus 2 3 as the application of plus to 2 arguments, or the application of plus to 2 and then the application of the result to 3; you can mentally model it whichever way most fits what you're doing at the time.
In languages with C-like function application by parenthesised argument list, this breaks down a bit. plus(2, 3) is very different from plus(2)(3), and in languages with this syntax the two versions of plus involved would probably have different types. So languages with that kind of syntax tend not to have all functions be curried all the time, or even to have automatic currying of any function you like. But such languages have historically also tended not to have functions as first class values, which makes the lack of currying a moot point.
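To make that concrete, a hedged C++ sketch of manual currying with lambdas (plus here is a hypothetical curried function, not a library facility):

#include <iostream>

int main() {
    // plus takes one argument and returns a function awaiting the second:
    // the C++ spelling of Int -> (Int -> Int).
    auto plus = [](int a) {
        return [a](int b) { return a + b; };
    };

    auto plus2 = plus(2);             // like Haskell's (plus 2)
    std::cout << plus2(3) << '\n';    // 5
    std::cout << plus(2)(3) << '\n';  // 5 -- note the two argument lists
}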
In Haskell, all functions take exactly 1 input, and produce exactly 1 output. Sometimes, the input to or output of a function can be another function. The input to or output of a function can also be a tuple. You can simulate a function with multiple inputs in one of two ways:
Use a tuple as input
(in1, in2) -> out
Use a function as output*
in1 -> (in2 -> out)
Likewise, you can simulate a function with multiple outputs in one of two ways:
Use a tuple as output*
in -> (out1, out2)
Use a function as a "second input" (a la function-as-output)
in -> ((out1 -> (out2 -> a)) -> a)
*this way is typically favored by Haskellers
The (+) function simulates taking 2 inputs in the typical Haskell way of producing a function as output. (Specializing to Int for ease of communication:)
(+) :: Int -> (Int -> Int)
For the sake of convenience, -> is right-associative, so the type signature for (+) can also be written
(+) :: Int -> Int -> Int
(+) is a function that takes in a number, and produces another function from number to number.
(+) 5 is the result of applying (+) to the argument 5, therefore, it is a function from number to number.
(5 +) is another way to write (+) 5
2 + 3 is another way of writing (+) 2 3. Function application is left-associative, so this is another way of writing (((+) 2) 3). In other words: Apply the function (+) to the input 2. The result will be a function. Take that function, and apply it to the input 3. The result of that is a number.
Therefore, (+) is a function, (5 +) is a function, and (+) 2 3 is a number.
In Haskell, you can take a function of two arguments, apply it to one argument, and get a function of one argument. In fact, strictly speaking, + isn't a function of two arguments, it's a function of one argument that returns a function of one argument.
In layman's terms, + is the actual function, and it waits to receive its parameters (in this case, 2) before returning a result. If you don't give it both parameters, it remains a function waiting for the remaining parameter.
It's called Currying
Lots of functional languages (Scala, Scheme, etc.).
Most functional languages are strongly typed, but this is good in the end because it reduces errors, which works well in enterprise or critical systems.
As a side note, the language Haskell is named after Haskell Curry, after whom currying is also named; he rediscovered the technique (earlier described by Moses Schönfinkel) while working on combinatory logic.
Languages like Haskell or OCaml have a syntax that lends itself particularly to currying, but you can do it in other languages with dynamic typing, like currying in Scheme.
Basically, I wonder if a language exists where this code will be invalid because even though counter and distance are both int under the hood, they represent incompatible types in the real world:
#include <stdio.h>
typedef int counter;
typedef int distance;
int main() {
counter pies = 1;
distance lengthOfBiscuit = 4;
printf("total pies: %d\n", pies + lengthOfBiscuit);
return 0;
}
That compiles with no warnings with "gcc -pedantic -Wall" and all other languages where I've tried it. It seems like it would be a good idea to disallow accidentally adding a counter and a distance, so where is the language support?
(Incidentally, the real-life example that prompted this question was web dev work in PHP and Python -- I was trying to make "HTML-escaped string", "SQL-escaped string" and "raw dangerous user input" incompatible, but the best I can seem to get is Apps Hungarian notation as suggested here --> http://www.joelonsoftware.com/articles/Wrong.html <-- and that still relies on human checking ("wrong code looks wrong") rather than compiler support ("wrong code is wrong"))
Haskell can do this, with GeneralizedNewtypeDeriving you can treat wrapped values as the underlying thing, whilst only exposing what you need:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
newtype Counter = Counter Int deriving Num
newtype Distance = Distance Int deriving Num
main :: IO ()
main = print $ Counter 1 + Distance 2
Now you get the error:
Add.hs:6:28:
Couldn't match expected type ‘Counter’ with actual type ‘Distance’
In the second argument of ‘(+)’, namely ‘Distance 2’
In the second argument of ‘($)’, namely ‘Counter 1 + Distance 2’
You can still "force" the underlying data type with "coerce", or by unwrapping the Ints explicitly.
I should add that any language with "real" types should be able to do this.
In Ada you can have types that use the same representation but are still distinct types. This is what a "strong typedef" would be (if it existed) in C or C++.
In your case, you could do
type counter is new Integer;
type distance is new Integer;
to create two new types that behave like integers, but cannot be mixed.
Derived types and sub types in Ada
You could create an object wrapping the underlying type in a member variable and define operations (even in the form of functions) that make sense on that type (e.g. a Length type would define "plus" to allow addition to another Length, but not to an Angle); a sketch in C++ follows below.
A drawback of this approach is that you have to create a wrapper for each underlying type you care about and define the appropriate operations for each sensible combination, which might be tedious and possibly error-prone.
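A minimal C++ sketch of such wrappers (hypothetical Counter and Distance types mirroring the question):

#include <cstdio>

// Each wrapper admits addition only with its own kind.
struct Counter {
    int value;
    Counter operator+(Counter other) const { return {value + other.value}; }
};

struct Distance {
    int value;
    Distance operator+(Distance other) const { return {value + other.value}; }
};

int main() {
    Counter pies{1};
    Distance lengthOfBiscuit{4};

    Counter twoPies = pies + pies;  // fine: Counter + Counter
    std::printf("total pies: %d\n", twoPies.value);

    // pies + lengthOfBiscuit;      // compile error: no operator+ taking
    //                              // (Counter, Distance)
    return 0;
}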
In C++, you could check out Boost.Units, Boost's support for dimensions. The examples given are designed primarily for physical dimensions, but I think you could adapt the approach to many others as well.
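A rough sketch of the flavour, assuming Boost.Units is installed (see its documentation for the exact headers):

#include <boost/units/quantity.hpp>
#include <boost/units/systems/si.hpp>

namespace si = boost::units::si;

int main() {
    boost::units::quantity<si::length> d = 2.0 * si::meter;
    boost::units::quantity<si::time>   t = 1.0 * si::second;

    // d + t;  // compile error: cannot add a length to a time
    boost::units::quantity<si::velocity> v = d / t;  // fine: metres per second
    (void)v;
    return 0;
}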