Why aren't unary functions usable in postfix notation?

I see two main advantages of postfix over prefix notation for unary functions:
nice formatting, with no nesting
it maps better onto the Input -> Function -> Output way of thinking about data processing.
Here is an example:
def plus2(v: Int) = v + 2
def double(v: Int) = v * 2
def square(v: Int) = v * v
// prefix
square(
    double(
        plus2(1)
    )
)
// postfix
1.plus2
.double
.square
This can be emulated, as in the Java Stream API, with method chaining or other user-level techniques. But I'm not familiar with any programming language that offers first-class support for postfix function application.
What are the programming language design reasons not to offer first-class postfix notation support for unary functions? It seems trivial to support.

Your example syntax is very similar to method chaining. In object-oriented languages, a method with no arguments is effectively a unary operator on whatever type the method is declared on. So here's your computation in Java, with the same declarations, in postfix order:
class Example {
    final int x;
    Example(int x) { this.x = x; }
    Example add2() { return new Example(x + 2); }
    Example mult2() { return new Example(x * 2); }
    Example square() { return new Example(x * x); }

    public static void main(String[] args) {
        Example result =
            new Example(1)
                .add2()
                .mult2()
                .square();
    }
}
You need the brackets () to call them, of course, but it's still postfix order. Unfortunately, you can't adapt this to use static methods instead of instance methods, at least not without abusing Optional or Stream like this:
Optional.of(1)
    .map(StaticExample::add2)
    .map(StaticExample::mult2)
    .map(StaticExample::square)
    .get()
I guess the reason OOP languages don't make it easier to use static methods this way is because it would be strange to have special syntax which privileges static methods over instance methods. The point of OOP is to do things with instances and polymorphism.
It's also possible in functional languages, using functions instead of methods: here's F#:
let add2 x = x + 2
let double x = x * 2
let square x = x * x

let result =
    1
    |> add2
    |> double
    |> square
Here the forward pipe operator |> serves a different semantic role to the . in Java, but the effect is the same: the functions are written in postfix order.
I guess the main reason the forward pipe operator tends to exist only in functional languages is that partial application allows you to write a pipeline-style computation using non-unary functions. For example:
nums
|> List.filter isEven
|> List.map square
Here, List.filter and List.map take two arguments, but if you call them with one argument then they return a unary function. Non-functional languages tend not to have partial application (at least not so easily), so a forward pipe operator would be less useful.
There is also the less-well-known concatenative programming paradigm, where everything is naturally done in postfix order, with no extra syntax required. Here's my own toy language named fffff:
(2 +) >!add2
(2 *) >!double
(# *) >!square
1 add2 double square >result
Here, even the assignments like >result are done in postfix order.

Related

Using special functions such as __add__ in Cython cdef classes

I want to create a Cython object that has convenient operations such as addition, multiplication, and comparisons. But when I compile such classes, they all seem to have a lot of Python overhead.
A simple example:
%%cython -a
cdef class Pair:
    cdef public:
        int a
        int b

    def __init__(self, int a, int b):
        self.a = a
        self.b = b

    def __add__(self, Pair other):
        return Pair(self.a + other.a, self.b + other.b)

p1 = Pair(1, 2)
p2 = Pair(3, 4)
p3 = p1 + p2
print(p3.a, p3.b)
But I end up getting quite large readouts from the annotated compiler.
It seems like the __add__ function is boxing the values as Python objects and doing a bunch of type checking. Am I doing something wrong?
There are likely a couple of issues:
I'm assuming that you're using Cython 0.29.x (and not the newer Cython 3 alpha). See https://cython.readthedocs.io/en/stable/src/userguide/special_methods.html#arithmetic-methods
This means that you can’t rely on the first parameter of these methods being “self” or being the right type, and you should test the types of both operands before deciding what to do
It is likely treating self as untyped and thus accessing a and b as Python attributes.
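A minimal sketch of a 0.29-compatible fix, testing both operands before touching the attributes (reusing the question's Pair class):

cdef class Pair:
    cdef public int a, b

    def __init__(self, int a, int b):
        self.a = a
        self.b = b

    def __add__(x, y):
        # Under Cython 0.29.x the first argument is not guaranteed
        # to be the Pair instance, so test both operands
        if isinstance(x, Pair) and isinstance(y, Pair):
            # casting lets the attribute access compile to C struct access
            return Pair((<Pair>x).a + (<Pair>y).a,
                        (<Pair>x).b + (<Pair>y).b)
        return NotImplemented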
The Cython 3 alpha treats special methods differently (see https://cython.readthedocs.io/en/latest/src/userguide/special_methods.html#arithmetic-methods) so you could also consider upgrading to that.
Although the call to __init__ has C-typed arguments, it's still a Python call, so you can't avoid boxing and unboxing the arguments as Python ints. You could avoid this call and do something like:
cdef Pair res = Pair.__new__(Pair)
res.a = ... # direct assignment to attribute

What terminology context do functions fall under?

Now I understand that Defining is to Types as Declaring is to Variables. But which one (Declare or Define) do functions/procedures/methods/subroutines fall under? Or do they have their own terminology?
In C and C++ you can declare a function (a function prototype) like this:
int function(int);
And then you can define it later, say, at the end of the file:
int function(int param) {
    printf("This is the param: %d", param);
    return 0;
}
So you can say that functions in C and C++ fit into the terminology of both types and variables. It depends on the language you're using too, but this is how I learned it.

Conversion of lambda expressions to Func

Given the following:
open System.Linq
let seqA = { 1..10 }
this works:
seqA.All (fun n -> n > 0)
However this doesn't:
let abc = fun n -> n > 0
seqA.All (abc)
Why does F# offer implicit conversion from lambda expressions to Funcs but not from functions? Pointers to the documentation where I can read up on what's going on here are welcome. :-)
This is covered in the (rather involved) section of the spec on Method Resolution and again in Type-directed Conversions at Member Invocations. Quoting from the latter:
As described in Method Application Resolution (see §14.4), two type-directed conversions are applied at method invocations.
The first type-directed conversion converts anonymous function expressions and other function-valued arguments to delegate types. Given:
A formal parameter of delegate type D
An actual argument farg of known type ty1 -> ... -> tyn -> rty
Precisely n arguments to the Invoke method of delegate type D
Then the parameter is interpreted as if it were written:
new D(fun arg1 ... argn -> farg arg1 ... argn)
It seems to suggest this conversion would be applied to any function value, but observation suggests it's applied only to anonymous function expressions. Eta-expanding the named function, as in seqA.All (fun n -> abc n), therefore compiles.

Function objects in C++ (C++11)

I am reading about boost::function and I am a bit confused about its use and its relation to other C++ constructs or terms I have found in the documentation, e.g. here.
In the context of C++ (C++11), what is the difference between an instance of boost::function, a function object, a functor, and a lambda expression? When should one use which construct? For example, when should I wrap a function object in a boost::function instead of using the object directly?
Are all the above C++ constructs different ways to implement what in functional languages is called a closure (a function, possibly containing captured variables, that can be passed around as a value and invoked by other functions)?
A function object and a functor are the same thing: an object that implements the function call operator, operator(). A lambda expression produces a function object. Objects whose type is some specialization of boost::function/std::function are also function objects.
Lambdas are special in that lambda expressions have an anonymous and unique type, and are a convenient way to create a functor inline.
boost::function/std::function is special in that it turns any callable entity into a functor with a type that depends only on the signature of the callable entity. For example, lambda expressions each have a unique type, so it's difficult to pass them around non-generic code. If you create an std::function from a lambda then you can easily pass around the wrapped lambda.
Both boost::function and the standard version std::function are wrappers provided by the library. They're potentially expensive and pretty heavy, and you should only use them if you actually need a collection of heterogeneous, callable entities. As long as you only need one callable entity at a time, you are much better off using auto or templates.
Here's an example:
std::vector<std::function<int(int, int)>> v;
v.push_back(some_free_function);                            // free function
v.push_back(std::bind(&Foo::mem_fun, &x, _1, _2));          // member function bound to an object (_1, _2 are std::placeholders)
v.push_back([&](int a, int b) -> int { return a + m[b]; }); // closure

int res = 0;
for (auto & f : v) { res += f(1, 2); }
Here's a counter-example:
template <typename F>
int apply(F && f)
{
    return std::forward<F>(f)(1, 2);
}
In this case, it would have been entirely gratuitous to declare apply like this:
int apply(std::function<int(int,int)>) // wasteful
The conversion is unnecessary, and the templated version can match the actual (often unknowable) type, for example of the bind expression or the lambda expression.
Function objects and functors are often described in terms of a concept. That means they describe a set of requirements on a type. A lot of things with respect to functors changed in C++11, and the new concept is called Callable. An object o of callable type is an object where (essentially) the expression o(ARGS) is valid. Examples of Callables:
int f() { return 23; }

struct FO {
    int operator()() const { return 23; }
};
Often some requirements on the return type of the Callable are added too. You use a Callable like this:
template<typename Callable>
int call(Callable c) {
    return c();
}

call(&f);
call(FO());
Constructs like the above require you to know the exact type at compile time. This is not always possible, and this is where std::function comes in.
std::function is such a Callable, but it allows you to erase the actual type you are calling (e.g. your function accepting a callable is not a template anymore). Calling a function still requires you to know its arguments and return type, so those have to be specified as template arguments to std::function.
You would use it like this:
int call(std::function<int()> c) {
    return c();
}

call(&f);
call(FO());
You need to remember that using std::function can have an impact on performance, and you should only use it when you are sure you need it. In almost all other cases a template solves your problem.

What is Map/Reduce?

I hear a lot about map/reduce, especially in the context of Google's massively parallel compute system. What exactly is it?
From the abstract of Google's MapReduce research publication page:
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
The advantage of MapReduce is that the processing can be performed in parallel on multiple processing nodes (multiple servers) so it is a system that can scale very well.
Since it's based on the functional programming model, the map and reduce steps each have no side effects (the state and results of each subsection of a map process do not depend on the others), so the data set being mapped and reduced can be split across multiple processing nodes.
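As a toy illustration of the model described in the quote, here's a word count in Python (the helper names are mine; a real MapReduce system distributes the phases rather than looping in one process):

from collections import defaultdict

def map_fn(document):
    # emit an intermediate (key, value) pair per word
    return [(word, 1) for word in document.split()]

def reduce_fn(word, counts):
    # merge all intermediate values that share the same key
    return (word, sum(counts))

documents = ["the cat", "the dog"]

intermediate = defaultdict(list)
for doc in documents:                 # the map phase, parallelizable per document
    for key, value in map_fn(doc):
        intermediate[key].append(value)   # the "shuffle" step

result = [reduce_fn(k, v) for k, v in intermediate.items()]
print(result)  # [('the', 2), ('cat', 1), ('dog', 1)]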
Joel's Can Your Programming Language Do This? piece discusses how understanding functional programming was essential in Google to come up with MapReduce, which powers its search engine. It's a very good read if you're unfamiliar with functional programming and how it allows scalable code.
See also: Wikipedia: MapReduce
Related question: Please explain mapreduce simply
Map is a function that applies another function to all the items on a list, to produce another list with all the return values on it. (Another way of saying "apply f to x" is "call f, passing it x". So sometimes it sounds nicer to say "apply" instead of "call".)
This is how map is probably written in C# (it's called Select and is in the standard library):
public static IEnumerable<R> Select<T, R>(this IEnumerable<T> list, Func<T, R> func)
{
    foreach (T item in list)
        yield return func(item);
}
As you're a Java dude, and Joel Spolsky likes to tell GROSSLY UNFAIR LIES about how crappy Java is (actually, he's not lying, it is crappy, but I'm trying to win you over), here's my very rough attempt at a Java version (I have no Java compiler, and I vaguely remember Java version 1.1!):
// represents a function that takes one arg and returns a result
public interface IFunctor
{
    Object invoke(Object arg);
}

public static Object[] map(Object[] list, IFunctor func)
{
    Object[] returnValues = new Object[list.length];
    for (int n = 0; n < list.length; n++)
        returnValues[n] = func.invoke(list[n]);
    return returnValues;
}
I'm sure this can be improved in a million ways. But it's the basic idea.
Reduce is a function that turns all the items on a list into a single value. To do this, it needs to be given another function func that turns two items into a single value. It would work by giving the first two items to func. Then the result of that along with the third item. Then the result of that with the fourth item, and so on until all the items have gone and we're left with one value.
In C# reduce is called Aggregate and is again in the standard library. I'll skip straight to a Java version:
// represents a function that takes two args and returns a result
public interface IBinaryFunctor
{
    Object invoke(Object arg1, Object arg2);
}

public static Object reduce(Object[] list, IBinaryFunctor func)
{
    if (list.length == 0)
        return null; // or throw something?
    if (list.length == 1)
        return list[0]; // just return the only item

    Object returnValue = func.invoke(list[0], list[1]);
    for (int n = 2; n < list.length; n++)
        returnValue = func.invoke(returnValue, list[n]);
    return returnValue;
}
These Java versions need generics adding to them, but I don't know how to do that in Java. But you should be able to pass them anonymous inner classes to provide the functors:
String[] names = getLotsOfNames();
String commaSeparatedNames = (String) reduce(names,
    new IBinaryFunctor() {
        public Object invoke(Object arg1, Object arg2)
        { return ((String) arg1) + ", " + ((String) arg2); }
    });
Hopefully generics would get rid of the casts. The typesafe equivalent in C# is:
string commaSeparatedNames = names.Aggregate((a, b) => a + ", " + b);
Why is this "cool"? Simple ways of breaking up larger calculations into smaller pieces, so they can be put back together in different ways, are always cool. The way Google applies this idea is to parallelization, because both map and reduce can be shared out over several computers.
But the key requirement is NOT that your language can treat functions as values. Any OO language can do that. The actual requirement for parallelization is that the little func functions you pass to map and reduce must not use or update any state. They must return a value that is dependent only on the argument(s) passed to them. Otherwise, the results will be completely screwed up when you try to run the whole thing in parallel.
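Here's a small Python sketch of that requirement (the names are mine): multiprocessing can only fan the work out across processes because the function depends solely on its argument:

from multiprocessing import Pool

def square(x):
    # pure: no shared state is read or written, so any worker
    # can compute any element independently, in any order
    return x * x

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(square, range(10)))  # [0, 1, 4, ..., 81]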
After getting frustrated with either very long, waffly or very short, vague blog posts, I eventually discovered this very good, rigorous, concise paper.
Then I went ahead and made it more concise by translating it into Scala, providing the simplest case, where the user simply specifies the map and reduce parts of the application. In Hadoop/Spark, strictly speaking, a more complex programming model is employed that requires the user to explicitly specify 4 more functions, outlined here: http://en.wikipedia.org/wiki/MapReduce#Dataflow
import scalaz.syntax.id._

trait MapReduceModel {
  type MultiSet[T] = Iterable[T]

  // `map` must be a pure function
  def mapPhase[K1, K2, V1, V2](map: ((K1, V1)) => MultiSet[(K2, V2)])
                              (data: MultiSet[(K1, V1)]): MultiSet[(K2, V2)] =
    data.flatMap(map)

  def shufflePhase[K2, V2](mappedData: MultiSet[(K2, V2)]): Map[K2, MultiSet[V2]] =
    mappedData.groupBy(_._1).mapValues(_.map(_._2))

  // `reduce` must be a monoid
  def reducePhase[K2, V2, V3](reduce: ((K2, MultiSet[V2])) => MultiSet[(K2, V3)])
                             (shuffledData: Map[K2, MultiSet[V2]]): MultiSet[V3] =
    shuffledData.flatMap(reduce).map(_._2)

  def mapReduce[K1, K2, V1, V2, V3](data: MultiSet[(K1, V1)])
                                   (map: ((K1, V1)) => MultiSet[(K2, V2)])
                                   (reduce: ((K2, MultiSet[V2])) => MultiSet[(K2, V3)]): MultiSet[V3] =
    mapPhase(map)(data) |> shufflePhase |> reducePhase(reduce)
}
// Kinda how MapReduce works in Hadoop and Spark, except `.par` would ensure one element gets a process/thread on a cluster.
// Furthermore, the splitting here won't enforce any kind of balance and is quite unnecessary anyway, as one would expect
// the data to already be split on HDFS - i.e. the filename would constitute K1.
// The shuffle phase will also be parallelized, and will use the same partition as the map phase.
abstract class ParMapReduce(mapParNum: Int, reduceParNum: Int) extends MapReduceModel {
  def split[T](splitNum: Int)(data: MultiSet[T]): Set[MultiSet[T]]

  override def mapPhase[K1, K2, V1, V2](map: ((K1, V1)) => MultiSet[(K2, V2)])
                                       (data: MultiSet[(K1, V1)]): MultiSet[(K2, V2)] = {
    val groupedByKey = data.groupBy(_._1).map(_._2)
    groupedByKey.flatMap(split(mapParNum / groupedByKey.size + 1))
      .par.flatMap(_.map(map)).flatten.toList
  }

  override def reducePhase[K2, V2, V3](reduce: ((K2, MultiSet[V2])) => MultiSet[(K2, V3)])
                                      (shuffledData: Map[K2, MultiSet[V2]]): MultiSet[V3] =
    shuffledData.map(g => split(reduceParNum / shuffledData.size + 1)(g._2).map((g._1, _)))
      .par.flatMap(_.map(reduce))
      .flatten.map(_._2).toList
}
Map is a native JS method that can be applied to an array. It creates a new array as a result of some function mapped to every element in the original array. So if you mapped a function(element) { return element * 2;}, it would return a new array with every element doubled. The original array would go unmodified.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map
Reduce is a native JS method that can also be applied to an array. It applies a function across the array, carrying an output value called an accumulator. It loops through each element in the array, applies the function, and reduces the elements to a single value (which begins as the accumulator). It is useful because you can produce any output you want; you just have to start with that type of accumulator. So if I wanted to reduce something into an object, I would start with an accumulator of {}.
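As a rough Python analogue of reducing into an object-like accumulator (the names are my own), here's grouping words by their first letter:

from functools import reduce

def add_word(acc, word):
    # the accumulator is a dict, analogous to starting with {} in JS
    acc.setdefault(word[0], []).append(word)
    return acc

words = ["apple", "banana", "avocado"]
grouped = reduce(add_word, words, {})
print(grouped)  # {'a': ['apple', 'avocado'], 'b': ['banana']}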
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/reduce