How to use NSString as a key in an Objective-C++ std::map

I'm starting work on an Objective-C++ project, getting a feel for how the synthesis of the two languages works before I do any heavy-duty design. I am very intrigued by how Automatic Reference Counting has been integrated with C++: we get the equivalent of smart pointers for NSObjects that handle retain/release properly in STL containers (cf. David Chisnall's article at http://www.informit.com/articles/article.aspx?p=1745876&seqNum=3).
I want to use an STL map as a typesafe mapping from NSStrings to C++ values. I can declare a mapping as
std::map<NSString*, MyType> mapping
With ARC, this mapping handles the memory management properly. But it doesn't follow NSString value semantics properly, because it's using pointer comparisons instead of -[NSString compare:].
What's the best way to get an STL map to use string comparison instead of pointer comparison?
Should I try to specialize std::less<NSString*>?
Should I declare an explicit comparator like std::map<NSString*, MyType, MyCompare>?
Should I wrap the NSString* keys in a smart pointer that implements operator<?

You'd want a custom comparison object that calls NSString's compare function, something like this:
#import <Foundation/Foundation.h>
#include <functional>
#include <map>

// Orders NSString* keys by value, with nil sorting before any non-nil string.
// (Deriving from std::binary_function is unnecessary; it was deprecated in
// C++11 and removed in C++17.)
struct CompareNSString {
    bool operator()(NSString* lhs, NSString* rhs) const {
        if (rhs != nil)
            return (lhs == nil) || ([lhs compare: rhs] == NSOrderedAscending);
        else
            return false;
    }
};

std::map<NSString*, MyType, CompareNSString> mapping;
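With the comparator in place, two distinct NSString instances with equal contents hit the same entry. A minimal usage sketch (using int as the mapped type, purely for illustration):
std::map<NSString*, int, CompareNSString> counts;
counts[@"apple"] = 1;
// A distinct instance with the same contents:
NSString* key = [NSString stringWithFormat:@"%@", @"apple"];
counts[key] += 1;  // finds the existing entry: keys compare by value, not by pointer
// counts.size() == 1 and counts[@"apple"] == 2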

Pointer comparisons of unmanaged NSString instances are fine if they are all NSString literals. In other words, this would work under MRC under those conditions, provided of course there are no duplicate string values and the value is what is being compared.
If not, see Ross's generally more useful answer (+1).


Why aren't unary functions usable in postfix notation?

I see two main advantages to postfix over prefix notation for unary functions:
nice formatting, no nesting
it maps better to the Input -> Function -> Output way of thinking about data processing.
Here is an example:
def plus2(v: Int) = v + 2
def double(v: Int) = v * 2
def square(v: Int) = v * v

// prefix
square(
  double(
    plus2(1)
  )
)

// postfix
1.plus2
 .double
 .square
This can be emulated, as in the Java Stream API, with method chaining or similar techniques. But I'm not familiar with any programming language that offers first-class support for postfix function application.
What are the programming language design reasons not to offer first-class postfix notation support for unary functions? It seems trivial to support.
Your example syntax is very similar to method chaining. In object-oriented languages, a method with no arguments is effectively a unary operator on whatever type the method is declared on. So here's your computation in Java, with the same declarations, in postfix order:
class Example {
    final int x;
    Example(int x) { this.x = x; }

    Example add2()   { return new Example(x + 2); }
    Example mult2()  { return new Example(x * 2); }
    Example square() { return new Example(x * x); }

    public static void main(String[] args) {
        Example result =
            new Example(1)
                .add2()
                .mult2()
                .square();
    }
}
You need the brackets () to call them, of course, but it's still postfix order. Unfortunately, you can't adapt this to use static methods instead of instance methods, at least not without abusing Optional or Stream like this:
Optional.of(1)
    .map(StaticExample::add2)
    .map(StaticExample::mult2)
    .map(StaticExample::square)
    .get()
I guess the reason OOP languages don't make it easier to use static methods this way is because it would be strange to have special syntax which privileges static methods over instance methods. The point of OOP is to do things with instances and polymorphism.
It's also possible in functional languages, using functions instead of methods: here's F#:
let add2 x = x + 2
let double x = x * 2
let square x = x * x
let result =
    1
    |> add2
    |> double
    |> square
Here the forward pipe operator |> serves a different semantic role to the . in Java, but the effect is the same: the functions are written in postfix order.
I guess the main reason that the forward pipe operator only tends to exist in functional languages is because partial application allows you to write a pipeline-style computation using non-unary functions. For example:
nums
|> List.filter isEven
|> List.map square
Here, List.filter and List.map take two arguments, but if you call them with one argument then they return a unary function. Non-functional languages tend not to have partial application (at least not so easily), so a forward pipe operator would be less useful.
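For what it's worth, a language with operator overloading can get close without first-class support. Here's a minimal C++ sketch of my own (the pipe overload is an illustration, not a standard facility) that emulates F#'s |> for unary callables:

#include <iostream>
#include <utility>

// A pipe operator: value | f  means  f(value), giving postfix application order.
template <typename T, typename F>
auto operator|(T&& value, F&& f) {
    return std::forward<F>(f)(std::forward<T>(value));
}

int main() {
    auto plus2  = [](int v) { return v + 2; };
    auto dbl    = [](int v) { return v * 2; };
    auto square = [](int v) { return v * v; };

    int result = 1 | plus2 | dbl | square;  // reads left to right
    std::cout << result << '\n';            // prints 36
}

A production version would constrain the template (e.g. with a concept) so it only participates in overload resolution when the right-hand side is actually callable.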
There is also the less-well-known concatenative programming paradigm, where everything is naturally done in postfix order, with no extra syntax required. Here's my own toy language named fffff:
(2 +) >!add2
(2 *) >!double
(# *) >!square
1 add2 double square >result
Here, even the assignments like >result are done in postfix order.

In what terminology context do functions fall under?

Now I understand that Defining is to Types as Declaring is to Variables. But which one (Declare or Define) do functions/procedures/methods/subroutines fall under? Or do they have their own terminology?
In C and C++ you can declare a function (a function prototype) like this:
int function(int);
And then you can define it later, say, at the end of the file:
#include <stdio.h>

int function(int param) {
    printf("This is the param: %d", param);
    return 0;
}
So you can say that functions in C and C++ fit the terminology of both: you declare them with a prototype and define them with an implementation. It depends on the language you're using too, but this is how I learned it.

Function objects in C++ (C++11)

I am reading about boost::function and I am a bit confused about its use and its relation to other C++ constructs or terms I have found in the documentation, e.g. here.
In the context of C++ (C++11), what is the difference between an instance of boost::function, a function object, a functor, and a lambda expression? When should one use which construct? For example, when should I wrap a function object in a boost::function instead of using the object directly?
Are all the above C++ constructs different ways to implement what in functional languages is called a closure (a function, possibly containing captured variables, that can be passed around as a value and invoked by other functions)?
A function object and a functor are the same thing: an object that implements the function call operator, operator(). A lambda expression produces a function object. Objects with the type of some specialization of boost::function/std::function are also function objects.
Lambdas are special in that lambda expressions have an anonymous and unique type, and are a convenient way to create a functor inline.
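Roughly speaking (the struct name below is invented for illustration), the compiler desugars a lambda into a functor:

auto lam = [](int x) { return x + 1; };

// is approximately equivalent to:
struct __lambda_type {                // in reality the type is unnamed
    int operator()(int x) const { return x + 1; }
};
__lambda_type lam2;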
boost::function/std::function is special in that it turns any callable entity into a functor with a type that depends only on the signature of the callable entity. For example, lambda expressions each have a unique type, so it's difficult to pass them to non-generic code. If you create an std::function from a lambda, you can easily pass the wrapped lambda around.
Both boost::function and the standard version std::function are wrappers provided by the library. They're potentially expensive and pretty heavy, and you should only use them if you actually need a collection of heterogeneous callable entities. As long as you only need one callable entity at a time, you are much better off using auto or templates.
Here's an example:
#include <functional>
#include <vector>
using namespace std::placeholders;

std::vector<std::function<int(int, int)>> v;
v.push_back(some_free_function);                             // free function
v.push_back(std::bind(&Foo::mem_fun, &x, _1, _2));           // member function bound to an object
v.push_back([&](int a, int b) -> int { return a + m[b]; });  // closure capturing m
int res = 0;
for (auto& f : v) { res += f(1, 2); }
Here's a counter-example:
#include <utility>

template <typename F>
int apply(F&& f)
{
    return std::forward<F>(f)(1, 2);
}
In this case, it would have been entirely gratuitous to declare apply like this:
int apply(std::function<int(int,int)>) // wasteful
The conversion is unnecessary, and the templated version can match the actual (often unknowable) type, for example of the bind expression or the lambda expression.
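For instance, the templated apply above can be called directly with a lambda or a standard functor (a quick sketch; std::plus lives in <functional>):

int r1 = apply([](int a, int b) { return a + b; });  // r1 == 3
int r2 = apply(std::plus<int>{});                    // r2 == 3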
Function objects and functors are often described in terms of a concept. That means they describe a set of requirements on a type. A lot of things with respect to functors changed in C++11, and the new concept is called Callable. An object o of callable type is an object where (essentially) the expression o(ARGS) is well-formed. Examples of Callables:
int f() { return 23; }

struct FO {
    int operator()() const { return 23; }
};
Often some requirements on the return type of the Callable are added too. You use a Callable like this:
template <typename Callable>
int call(Callable c) {
    return c();
}

call(&f);
call(FO());
Constructs like the above require you to know the exact type at compile-time. This is not always possible, and this is where std::function comes in.
std::function is such a Callable, but it allows you to erase the actual type you are calling (e.g. your function accepting a callable is not a template anymore). Calling a function still requires you to know its arguments and return type, so those have to be specified as template arguments to std::function.
You would use it like this:
int call(std::function<int()> c) {
    return c();
}

call(&f);
call(FO());
You need to remember that using std::function can have an impact on performance, and you should only use it when you are sure you need it. In almost all other cases a template solves your problem.

What is Map/Reduce?

I hear a lot about map/reduce, especially in the context of Google's massively parallel compute system. What exactly is it?
From the abstract of Google's MapReduce research publication page:
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
The advantage of MapReduce is that the processing can be performed in parallel on multiple processing nodes (multiple servers) so it is a system that can scale very well.
Since it's based on the functional programming model, the map and reduce steps each have no side-effects (the state and results of each subsection of a map process do not depend on the others), so the data set being mapped and reduced can be partitioned over multiple processing nodes.
Joel's Can Your Programming Language Do This? piece discusses how understanding functional programming was essential at Google in coming up with MapReduce, which powers its search engine. It's a very good read if you're unfamiliar with functional programming and how it enables scalable code.
See also: Wikipedia: MapReduce
Related question: Please explain mapreduce simply
Map is a function that applies another function to all the items on a list, to produce another list with all the return values on it. (Another way of saying "apply f to x" is "call f, passing it x". So sometimes it sounds nicer to say "apply" instead of "call".)
This is how map is probably written in C# (it's called Select and is in the standard library):
public static IEnumerable<R> Select<T, R>(this IEnumerable<T> list, Func<T, R> func)
{
    foreach (T item in list)
        yield return func(item);
}
As you're a Java dude, and Joel Spolsky likes to tell GROSSLY UNFAIR LIES about how crappy Java is (actually, he's not lying, it is crappy, but I'm trying to win you over), here's my very rough attempt at a Java version (I have no Java compiler, and I vaguely remember Java version 1.1!):
// represents a function that takes one arg and returns a result
public interface IFunctor
{
    Object invoke(Object arg);
}

public static Object[] map(Object[] list, IFunctor func)
{
    Object[] returnValues = new Object[list.length];
    for (int n = 0; n < list.length; n++)
        returnValues[n] = func.invoke(list[n]);
    return returnValues;
}
I'm sure this can be improved in a million ways. But it's the basic idea.
Reduce is a function that turns all the items on a list into a single value. To do this, it needs to be given another function func that turns two items into a single value. It works by giving the first two items to func, then the result of that along with the third item, then the result of that with the fourth item, and so on until all the items have been used and we're left with one value.
In C# reduce is called Aggregate and is again in the standard library. I'll skip straight to a Java version:
// represents a function that takes two args and returns a result
public interface IBinaryFunctor
{
    Object invoke(Object arg1, Object arg2);
}

public static Object reduce(Object[] list, IBinaryFunctor func)
{
    if (list.length == 0)
        return null; // or throw something?
    if (list.length == 1)
        return list[0]; // just return the only item

    Object returnValue = func.invoke(list[0], list[1]);
    for (int n = 2; n < list.length; n++) // start at 2: items 0 and 1 are already combined
        returnValue = func.invoke(returnValue, list[n]);
    return returnValue;
}
These Java versions need generics added to them, but I don't know how to do that in Java. But you should be able to pass them anonymous inner classes to provide the functors:
String[] names = getLotsOfNames();
String commaSeparatedNames = (String) reduce(names,
    new IBinaryFunctor() {
        public Object invoke(Object arg1, Object arg2)
            { return ((String) arg1) + ", " + ((String) arg2); }
    });
Hopefully generics would get rid of the casts. The typesafe equivalent in C# is:
string commaSeparatedNames = names.Aggregate((a, b) => a + ", " + b);
Why is this "cool"? Simple ways of breaking up larger calculations into smaller pieces, so they can be put back together in different ways, are always cool. The way Google applies this idea is parallelization, because both map and reduce can be shared out over several computers.
But the key requirement is NOT that your language can treat functions as values. Any OO language can do that. The actual requirement for parallelization is that the little func functions you pass to map and reduce must not use or update any state. They must return a value that is dependent only on the argument(s) passed to them. Otherwise, the results will be completely screwed up when you try to run the whole thing in parallel.
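To make that purity requirement concrete, here is a minimal C++17 sketch of my own (not from the answers above) using the standard parallel algorithms; the lambdas are stateless, which is exactly what makes the parallel execution policy safe:

#include <algorithm>
#include <execution>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> nums{1, 2, 3, 4};
    std::vector<int> doubled(nums.size());

    // map: a pure per-element function, safe to run in parallel
    std::transform(std::execution::par, nums.begin(), nums.end(),
                   doubled.begin(), [](int x) { return x * 2; });

    // reduce: addition is associative and stateless, also safe in parallel
    int sum = std::reduce(std::execution::par, doubled.begin(), doubled.end(), 0);

    std::cout << sum << '\n'; // prints 20
}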
After getting frustrated with either very long waffly or very short vague blog posts, I eventually discovered this very good, rigorous, concise paper.
Then I went ahead and made it more concise by translating it into Scala, where I've provided the simplest case, in which the user simply specifies the map and reduce parts of the application. In Hadoop/Spark, strictly speaking, a more complex programming model is employed that requires the user to explicitly specify 4 more functions, outlined here: http://en.wikipedia.org/wiki/MapReduce#Dataflow
import scalaz.syntax.id._

trait MapReduceModel {
  type MultiSet[T] = Iterable[T]

  // `map` must be a pure function
  def mapPhase[K1, K2, V1, V2](map: ((K1, V1)) => MultiSet[(K2, V2)])
                              (data: MultiSet[(K1, V1)]): MultiSet[(K2, V2)] =
    data.flatMap(map)

  def shufflePhase[K2, V2](mappedData: MultiSet[(K2, V2)]): Map[K2, MultiSet[V2]] =
    mappedData.groupBy(_._1).mapValues(_.map(_._2))

  // `reduce` must be a monoid
  def reducePhase[K2, V2, V3](reduce: ((K2, MultiSet[V2])) => MultiSet[(K2, V3)])
                             (shuffledData: Map[K2, MultiSet[V2]]): MultiSet[V3] =
    shuffledData.flatMap(reduce).map(_._2)

  def mapReduce[K1, K2, V1, V2, V3](data: MultiSet[(K1, V1)])
                                   (map: ((K1, V1)) => MultiSet[(K2, V2)])
                                   (reduce: ((K2, MultiSet[V2])) => MultiSet[(K2, V3)]): MultiSet[V3] =
    mapPhase(map)(data) |> shufflePhase |> reducePhase(reduce)
}
// Kinda how MapReduce works in Hadoop and Spark, except `.par` would ensure each element gets a process/thread on a cluster.
// Furthermore, the splitting here won't enforce any kind of balance and is quite unnecessary anyway, as one would expect
// the data to already be split on HDFS - i.e. the filename would constitute K1.
// The shuffle phase will also be parallelized, and use the same partition as the map phase.
abstract class ParMapReduce(mapParNum: Int, reduceParNum: Int) extends MapReduceModel {
  def split[T](splitNum: Int)(data: MultiSet[T]): Set[MultiSet[T]]

  override def mapPhase[K1, K2, V1, V2](map: ((K1, V1)) => MultiSet[(K2, V2)])
                                       (data: MultiSet[(K1, V1)]): MultiSet[(K2, V2)] = {
    val groupedByKey = data.groupBy(_._1).map(_._2)
    groupedByKey.flatMap(split(mapParNum / groupedByKey.size + 1))
      .par.flatMap(_.map(map)).flatten.toList
  }

  override def reducePhase[K2, V2, V3](reduce: ((K2, MultiSet[V2])) => MultiSet[(K2, V3)])
                                      (shuffledData: Map[K2, MultiSet[V2]]): MultiSet[V3] =
    shuffledData.map(g => split(reduceParNum / shuffledData.size + 1)(g._2).map((g._1, _)))
      .par.flatMap(_.map(reduce))
      .flatten.map(_._2).toList
}
Map is a native JS method that can be applied to an array. It creates a new array as the result of some function applied to every element of the original array. So if you mapped function(element) { return element * 2; }, it would return a new array with every element doubled. The original array goes unmodified.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map
Reduce is a native JS method that can also be applied to an array. It applies a function to an array and has an initial output value called an accumulator. It loops through each element in the array, applies the function, and reduces the result to a single value (which begins as the accumulator). It is useful because you can produce any output you want; you just have to start with that type of accumulator. So if I wanted to reduce something into an object, I would start with an accumulator of {}.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/Reduce?v=a
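The same accumulator idea carries over to other languages; here is a rough C++ sketch of mine folding a list into a map with std::accumulate, starting from an empty map as the accumulator (it copies the accumulator on each step, which is fine for illustration):

#include <iostream>
#include <map>
#include <numeric>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> words{"a", "bb", "a"};

    // Fold the list into a word-count map, starting from an empty accumulator
    auto counts = std::accumulate(
        words.begin(), words.end(), std::map<std::string, int>{},
        [](std::map<std::string, int> acc, const std::string& w) {
            ++acc[w];   // combine one element into the accumulator
            return acc;
        });

    std::cout << counts["a"] << '\n'; // prints 2
}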

What Does "Overloaded"/"Overload"/"Overloading" Mean?

What does "Overloaded"/"Overload" mean in regards to programming?
It means that you are providing a function (method or operator) with the same name, but with a different signature.
For example:
void doSomething();
int doSomething(string x);
int doSomething(int a, int b, int c);
Basic Concept
Overloading, or "method overloading", is the name of the concept of having more than one method with the same name but with different parameters.
For example, the System.DateTime class in C# has more than one ToString method. The parameterless ToString uses the system's default culture to convert the DateTime to a string:
new DateTime(2008, 11, 14).ToString(); // e.g. "11/14/2008" with a US culture
while another overload of the same method allows the user to customize the format:
new DateTime(2008, 11, 14).ToString("dd MMM yyyy"); // returns "14 Nov 2008"
Sometimes the method name may be the same but the parameter types differ:
Convert.ToInt32(123m);
converts a decimal to int while
Convert.ToInt32("123");
converts a string to int.
Overload Resolution
To find the best overload to call, the compiler performs an operation called "overload resolution". For the first example, the compiler can find the best method simply by matching the argument count. For the second, it automatically calls the decimal version of Convert.ToInt32 if you pass a decimal argument, and the string version if you pass a string. If the compiler cannot find a suitable candidate among the possibilities, you will get a compiler error like "The best overload does not match the parameters...".
You can find lots of information on how different compilers perform overload resolution.
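As a small illustration (a hypothetical C++ example of mine, not taken from the answer above), overload resolution can also fail with an ambiguity rather than a missing match:

#include <iostream>

void convert(int)    { std::cout << "int overload\n"; }
void convert(double) { std::cout << "double overload\n"; }

int main() {
    convert(123);      // exact match: convert(int)
    convert(1.5);      // exact match: convert(double)
    // convert(123L);  // error: ambiguous - long converts equally well to int and to double
}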
A function is overloaded when it has more than one signature. This means that you can call it with different argument types. For instance, you may have a function for printing a variable on screen, and you can define it for different argument types:
void print(int i);
void print(char i);
void print(UserDefinedType t);
In this case, the function print() would have three overloads.
It means having different versions of the same function which take different types of parameters. Such a function is "overloaded". For example, take the following function:
#include <iostream>
#include <string>

void Print(std::string str) {
    std::cout << str << std::endl;
}
You can use this function to print a string to the screen. However, this function cannot be used when you want to print an integer; you can then make a second version of the function, like this:
void Print(int i) {
    std::cout << i << std::endl;
}
Now the function is overloaded, and which version of the function will be called depends on the parameters you give it.
Others have answered what an overload is. When you are starting out, it is easy to confuse overloading with overriding.
As opposed to overloading, overriding is defining a method with the same signature in a subclass (child class), which overrides the parent class's implementation. Some languages require an explicit directive, such as a virtual member function in C++ or override in Delphi and C#.
using System;

public class DrawingObject
{
    public virtual void Draw()
    {
        Console.WriteLine("I'm just a generic drawing object.");
    }
}

public class Line : DrawingObject
{
    public override void Draw()
    {
        Console.WriteLine("I'm a Line.");
    }
}
An overloaded method is one with several options for the number and type of parameters. For instance:
foo(foo)
foo(foo, bar)
Both would do relatively the same thing, but one has a second parameter for more options.
You can also have the same method take different types:
int Convert(int i)
int Convert(double i)
int Convert(float i)
Just like in common usage, it refers to something (in this case, a method name), doing more than one job.
Overloading is the poor man's version of multimethods from CLOS and other languages. It's the confusing one.
Overriding is the usual OO one. It goes with inheritance; we also call it redefinition (e.g. in https://stackoverflow.com/users/3827/eed3si9n's answer, Line provides a specialized definition of Draw()).