C++0x bind and function without copy-construction - function

I'm currently trying out a few of the new C++0x features, namely std::function and std::bind. These two facilities seem rather suitable for an event-delegate system for C++ that works like the one in C#. I've tried to create something like delegates myself before, but the hacks I would have needed for member-function pointers were too much for me…
During my tests I noticed that std::bind copies every object you bind. While that surely enhances safety - you can't delete a still-registered event handler :) - it's also a problem with stateful objects. Is there a way to deactivate the copying - or at least a way to obtain the encapsulated object from the std::function again?
PS: Is there a reference for the features that are going to be included in C++0x (hopefully C++11!)? In the end it's major parts of TR1 and a few additions…
I tried cppreference.org, but their documentation is still at an early stage; cplusplus.com, on the other hand, seems not to have even started covering C++0x.

If you want to avoid copying, use std::ref and/or std::cref. They wrap the object in a std::reference_wrapper, a pseudo-reference that the binder stores in place of a copy.

It isn't quite right that:
"I noticed that std::bind copies every object you bind."
At least that isn't the intended specification. You should be able to move a non-copyable object into a bind:
std::bind(f, std::unique_ptr<int>(new int(3)))
However, now that the move-only object is stored in the binder, it is an lvalue. Therefore you can only call it if f accepts an lvalue move-only object (say by lvalue reference). If this is not acceptable, and if the source object outlives the binder, then use of std::ref is another good solution (as mentioned by Armen).
If you need to copy the bound object, then all of its bound arguments must be copyable. But if you only move construct the bound object, then it will only move construct its bound arguments.
The best reference is N3242. There isn't a good and comprehensive tutorial that I'm aware of yet. I might start with the boost documentation with the understanding that std::bind has been adapted to work with rvalue-refs as much as possible.

I have created a move-compatible version of bind. There are still lots of problems with it, such as the binder's constructor and a few buggy lines here and there, but it seems to work.
check it out here
http://code-slim-jim.blogspot.jp/2012/11/perfect-forwarding-bind-compatable-with.html

How are classes implemented in dynamic languages?

How are classes implemented in dynamic languages?
I know that JavaScript uses a prototype pattern (there is 'somewhere' a container of unbound JS functions, which are bound when they are called through an object), but I have no idea how it works in other languages.
I'm curious about this, because I can't think of an efficient way to have native bound methods without wasting memory and/or cpu by copying members for each instance.
(By bound method, I mean that the following code should work :)
class Foo { function bar() : return 42; };
var test = new Foo();
var method = test.bar;
method() == 42;
This highly depends on the language and the implementation. I'll tell you what I know about CPython and PyPy.
The general idea, which is also what CPython does for the most part, goes like this:
Every object has a class, specifically a reference to that class object.
Apart from instance members, which are obviously stored in the individual object, the class also has members. This includes methods, so methods don't have a per-object cost.
A class has a method resolution order (MRO) determined by the inheritance relationships, wherein each base class occurs exactly once. If we didn't have multiple inheritance, this would simply be a reference to the base class, but this way the MRO is hard to figure out on the fly (you'd have to start from the most derived class every time).
(Classes are also objects and have classes themselves, but we'll gloss over that for now.)
If attribute lookup on an object fails, the same attribute is looked up on the classes in the MRO, in the order specified by the MRO. (This is the default behavior, which can be changed by defining magic methods like __getattr__ and __getattribute__.)
So far so simple, and not really an explanation for bound methods. I just wanted to make sure we're talking about the same thing. The missing piece is descriptors. The descriptor protocol is defined in the "deep magic" section of the language reference, but the short and simple story is that lookup on a class can be hijacked by the object it results in via a __get__ method. More importantly, this __get__ method is told whether the lookup started on an instance or on the "owner" (the class).
In Python 2, we have an ugly and unnecessary UnboundMethod descriptor which (apart from the __get__ method) simply wraps the function to throw errors on Class.method(self) if self is not of an acceptable type. In Python 3, the __get__ is simply part of all function objects, and unbound methods are gone. In both cases, the __get__ method hands back the function when you look it up on a class (wrapped as an unbound method in Python 2), so you can use Class.method, which is useful in a few cases, and it returns a "bound method" object when you look it up on an object. This bound method object does nothing more than store the raw function and the instance, and pass the latter as the first argument to the former in its __call__ (the special method overriding the function call syntax).
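The mechanism described above can be observed directly in Python 3. The following is a small sketch, not CPython's actual implementation: Foo and bar come from the question, while BoundMethod is a hypothetical pure-Python stand-in written only to illustrate what the real bound method object stores.

```python
class Foo:
    def bar(self):
        return 42


test = Foo()

# Attribute lookup on the instance goes through the function's __get__
# and yields a bound method holding the function and the instance.
method = test.bar
assert method() == 42
assert method.__self__ is test
assert method.__func__ is Foo.__dict__["bar"]

# The same thing done by hand: call the descriptor protocol explicitly.
manual = Foo.__dict__["bar"].__get__(test, Foo)
assert manual() == 42


class BoundMethod:
    """Hypothetical stand-in for the built-in bound method type."""

    def __init__(self, func, instance):
        self.func = func          # reference 1: the raw function
        self.instance = instance  # reference 2: the instance

    def __call__(self, *args, **kwargs):
        # Pass the stored instance as the first argument.
        return self.func(self.instance, *args, **kwargs)


emulated = BoundMethod(Foo.__dict__["bar"], test)
assert emulated() == 42
```

The function itself lives once in the class dictionary; only the small wrapper is created per lookup, which is why there is no per-instance copy of the method.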
So, for CPython: While there is a cost to bound methods, it's smaller than you might think. Only two references are needed space-wise, and the CPU cost is limited to a small memory allocation, and an extra indirection when calling. Note though that this cost applies to all method calls, not just those which actually make use of bound method features. a.f() has to call the descriptor and use its return value, because in a dynamic language we don't know if it's monkey-patched to do something different.
In PyPy, things are more interesting. As it's an implementation which doesn't compromise on correctness, the above model is still correct for reasoning about semantics. However, it's actually faster. Apart from the fact that the JIT compiler inlines and then eliminates the entire mess described above in most cases, they also tackle the problem on bytecode level. There are two new bytecode instructions, which preserve the semantics but omit the allocation of the bound method object in the case of a.f(). There is also a method cache which can simplify the lookup process, but requires some additional bookkeeping (though some of that bookkeeping is already done for the JIT).

Code optimization - Unused methods

How can I tell if a method will never be used ?
I know that for dll files and libraries you can't really know if someone else (another project) will ever use the code.
In general I assume that anything public might be used somewhere else.
But what about private methods ? Is it safe to assume that if I don't see an explicit call to that method, it won't be used ?
I assume that for private methods it's easier to decide. But is it safe to decide it ONLY for private methods ?
Depends on the language, but commonly, a name that occurs once in the program and is not public/exported is not used. There are exceptions, such as constructors and destructors, operator overloads (in C++ and Python, where the name at the point of definition does not match the name at the call site) and various other methods.
For example, in Python, to allow indexing (foo[x]) to work, you define a method __getitem__ in the class to which foo belongs. But hardly ever would you call __getitem__ explicitly.
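A minimal sketch of that case, with Squares as a made-up class name: a text search for "__getitem__" would find only the definition, yet the method is clearly in use.

```python
class Squares:
    # Never called by name anywhere, but used by the foo[x] syntax.
    def __getitem__(self, index):
        return index * index


foo = Squares()
result = foo[4]  # Python translates this into a __getitem__ call
assert result == 16
```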
What you need to know is the (or all possible) entry point(s) to your code:
For a simple command line program, this is the "main" method or, in the most simple case, the top of your script.
For libraries, in fact, it is everything visible from outside.
The situation becomes more complicated if methods can be referenced from outside by means of introspection. This is language specific and requires detailed knowledge of the techniques used.
What you need to do is follow all references from all entry points recursively to mark up all used methods. Whatever remains unmarked can safely - and should - be removed.
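The marking step can be sketched as a plain graph traversal. This is only an illustration under strong assumptions: the call graph is given as a dictionary (real tools have to build it from source or bytecode), and find_unused, call_graph and the method names are all invented for the example. Calls made through introspection would not show up in such a graph.

```python
def find_unused(call_graph, entry_points):
    """Return the methods not reachable from any entry point."""
    marked = set()
    stack = list(entry_points)
    while stack:
        name = stack.pop()
        if name in marked:
            continue
        marked.add(name)
        # Follow every reference made by this method.
        stack.extend(call_graph.get(name, ()))
    return set(call_graph) - marked


# Hypothetical program: old_helper calls helper, but nothing calls old_helper.
call_graph = {
    "main": ["parse", "run"],
    "parse": [],
    "run": ["helper"],
    "helper": [],
    "old_helper": ["helper"],
}

unused = find_unused(call_graph, ["main"])
assert unused == {"old_helper"}
```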
Since this is a diligent but routine piece of work, there are tools available which do that for various programming languages. Examples include ReSharper for C# or ProGuard for Java.

Strategy for handling parameter validation in class library

I got a rather big class library that contains a lot of code.
I am looking at how to optimize the performance of some of the code, and for some rather simple utility methods I've found that the parameter validation occupies a rather large portion of the runtime for some core methods.
Let me give a typical example:
A.MethodA1 runs a loop, iterating over a collection, calling B.MethodB1 for each element
B.MethodB1 processes the element and returns the result. It's a rather basic calculation, but since it is used in many places, it has been put into its own method instead of being copied and pasted where needed
A.MethodA1 calls C.MethodC1 with the results of B.MethodB1, and puts the result into a list that is returned at the end of the loop
In the case I've found now, B.MethodB1 does rudimentary parameter validation. Since the method calls other internal methods, I'd like to avoid having NullReferenceExceptions several layers deep into the code, and rather fail early, hence B.MethodB1 validates the parameters, like checking for null and some basic range checks on another parameter.
However, in this particular call scenario, it is impossible (due to other program logic) for these parameters to ever have the wrong values. If they had, from the program standpoint, B.MethodB1 would never be called at all for those values, A.MethodA1 would fail before the call to B.MethodB1.
So I was considering removing the parameter validation in B.MethodB1, since it occupies roughly 65% of the method runtime (and this is part of some heavily used code.)
However, B.MethodB1 is a public method, and can thus be called from the program, in which case I want the parameter validation.
So how would you solve this dilemma?
1. Keep the parameter validation, and take the performance hit
2. Remove the parameter validation, and have potentially fail-late problems in the method
3. Split the method into two, one internal that doesn't have parameter validation, called by the "safe" path, and one public that has the parameter validation + a call to the internal version.
The latter one would give me the benefits of having no parameter validation, while still exposing a public entrypoint which does have parameter validation, but for some reason it doesn't sit right with me.
Opinions?
I would go with option 3. I tend to use assertions for private and internal methods and do all the validation in public methods.
By the way, is the performance hit really that big?
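As a rough sketch of the split (shown in Python rather than the question's .NET code, and with invented names, parameters and range limits), the public function validates and delegates, while the internal one only asserts:

```python
def _method_b1_unchecked(element, scale):
    # Internal version: callers guarantee valid input. The assertion
    # documents the contract and is skipped when Python runs with -O.
    assert element is not None
    return element * scale


def method_b1(element, scale):
    """Public entry point: validate, then delegate."""
    if element is None:
        raise ValueError("element must not be None")
    if not 0 < scale <= 100:
        raise ValueError("scale must be in the range (0, 100]")
    return _method_b1_unchecked(element, scale)


def method_a1(elements, scale):
    # The "safe" path: validate once up front, then use the
    # unchecked version inside the loop.
    if not 0 < scale <= 100:
        raise ValueError("scale must be in the range (0, 100]")
    return [_method_b1_unchecked(e, scale) for e in elements]


assert method_b1(2, 3) == 6
assert method_a1([1, 2, 3], 10) == [10, 20, 30]
```

Whether this is worth it still depends on measuring the validation cost, as the question above asks.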
That's an interesting question.
Hmmm, makes me think ... "code contracts" .. It would seem like it might be technically possible to statically (at compile time) have certain code contracts be proven to be fulfilled. If this were the case and you had such a compilation validation option you could state these contracts without ever having to validate the conditions at runtime.
It would require that the client code itself be validated against the code contracts.
And, of course it would inevitably be highly dependent on the type of conditions you'd want to write, and it would probably only be feasible to prove these contracts to a certain point (how far up the possible call graph would you go?). Beyond this point the validator might have to beg off, and insist that you place a runtime check (or maybe a validation warning suppression?).
All just idle speculation. Does make me wonder a bit more about C# 4.0 code contracts. I wonder if these have support for static analysis. Have you checked them out? I've been meaning to, but learning F# is having to take priority at the moment!
Update:
Having read up a little on it, it appears that C# 4.0 does indeed have a 'static checker' as well as a binary rewriter (which takes care of altering the output binary so that pre- and post-condition checks are in the appropriate location).
What's not clear from my extremely quick read, is whether you can opt out of the binary rewriting - what I'm thinking here is that what you'd really be looking for is to use the code contracts, have the metadata (or code) for the contracts maintained within the various assemblies but use only the static checker for at least a selected subset of contracts, so that you in theory get proven safety without any runtime hit.
Here's a link to an article on the code contracts

How to save and load different types of objects?

During coding I frequently encounter this situation:
I have several objects (ConcreteType1, ConcreteType2, ...) with the same base type AbstractType, which has abstract methods save and load . Each object can (and has to) save some specific kind of data, by overriding the save method.
I have a list of AbstractType objects which contains various ConcreteTypeX objects.
I walk the list and call the save method for each object.
At this point I think it's a good OO design. (Or am I wrong?) The problems start when I want to reload the data:
Each object can load its own data, but I have to know the concrete type in advance, so I can instantiate the right ConcreteTypeX and call the load method. So the loading method has to know a great deal about the concrete types. I usually "solved" this problem by writing some kind of marker before calling save, which is used by the loader to determine the right ConcreteTypeX.
I always had/have a bad feeling about this. It feels like some kind of anti-pattern...
Are there better ways?
EDIT:
I'm sorry for the confusion, I re-wrote some of the text.
I'm aware of serialization and perhaps there is some next-to-perfect solution in Java/.NET/yourFavoriteLanguage, but I'm searching for a general solution, which might be better and more "OOP-ish" compared to my concept.
Is this either .NET or Java? If so, why aren't you using serialisation?
If you can't simply use serialization, then I would still definitely pull the object loading logic out of the base class. Your instinct is correct, leading you to correctly identify a code smell. The base class shouldn't need to change when you change or add derived classes.
The problem is, something has to load the data and instantiate those objects. This sounds like a job for the Abstract Factory pattern.
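One way to move that knowledge out of the base class is a small factory keyed by the type marker. The sketch below is a registry-based factory, a simplification rather than the full Abstract Factory pattern; the register decorator, the JSON layout and the fields of the concrete types are assumptions made for the example. The marker is still written, but the loader no longer needs to be edited when a new concrete type is added.

```python
import json

_REGISTRY = {}  # maps type marker -> concrete class


def register(cls):
    """Class decorator: make cls known to the loader."""
    _REGISTRY[cls.__name__] = cls
    return cls


class AbstractType:
    def save(self):
        raise NotImplementedError

    @classmethod
    def load(cls, data):
        raise NotImplementedError


@register
class ConcreteType1(AbstractType):
    def __init__(self, number):
        self.number = number

    def save(self):
        return {"number": self.number}

    @classmethod
    def load(cls, data):
        return cls(data["number"])


@register
class ConcreteType2(AbstractType):
    def __init__(self, text):
        self.text = text

    def save(self):
        return {"text": self.text}

    @classmethod
    def load(cls, data):
        return cls(data["text"])


def save_all(objects):
    # Write the type marker next to each object's own data.
    return json.dumps(
        [{"type": type(o).__name__, "data": o.save()} for o in objects]
    )


def load_all(text):
    # The registry picks the right class; no if/else over concrete types.
    return [_REGISTRY[e["type"]].load(e["data"]) for e in json.loads(text)]


restored = load_all(save_all([ConcreteType1(7), ConcreteType2("hello")]))
assert isinstance(restored[0], ConcreteType1) and restored[0].number == 7
assert isinstance(restored[1], ConcreteType2) and restored[1].text == "hello"
```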
There are better ways, but let's take a step back and look at it conceptually. What are all objects doing? Loading and saving. When you get the object from memory, you really don't have to care whether it gets its information from a file, a database, or the Windows registry. You just want the object loaded. That's important to remember because later on, your maintenance programmer will look at the LoadFromFile() method and wonder, "Why is it called that since it really doesn't load anything from a file?"
Secondly, you're running into the issue that we all run into, and it's based in dividing work. You want a level that handles getting data from a physical source; you want a level that manipulates this data, and you want a level that displays this data. This is the crux of N-Tier Development. I've linked to an article that discusses your problem in great detail, and details how to create a Data Access Layer to resolve your issue. There are also numerous code projects here and here.
If it's Java you seek, simply substitute 'java' for .NET and search for 'Java N-Tier development'. However, besides syntactical differences, the design structure is the same.

DoSomethingToThing(Thing n) vs Thing.DoSomething()

What factors determine which approach is more appropriate?
I think both have their places.
You shouldn't simply use DoSomethingToThing(Thing n) just because you think "Functional programming is good". Likewise you shouldn't simply use Thing.DoSomething() because "Object Oriented programming is good".
I think it comes down to what you are trying to convey. Stop thinking about your code as a series of instructions, and start thinking about it like a paragraph or sentence of a story. Think about which parts are the most important from the point of view of the task at hand.
For example, if the part of the 'sentence' you would like to stress is the object, you should use the OO style.
Example:
fileHandle.close();
Most of the time when you're passing around file handles, the main thing you are thinking about is keeping track of the file it represents.
CounterExample:
string x = "Hello World";
submitHttpRequest( x );
In this case submitting the HTTP request is far more important than the string which is the body, so submitHttpRequest(x) is preferable to x.submitViaHttp().
Needless to say, these are not mutually exclusive. You'll probably actually have
networkConnection.submitHttpRequest(x)
in which you mix them both. The important thing is that you think about what parts are emphasized, and what you will be conveying to the future reader of the code.
To be object-oriented, tell, don't ask : http://www.pragmaticprogrammer.com/articles/tell-dont-ask.
So, Thing.DoSomething() rather than DoSomethingToThing(Thing n).
If you're dealing with internal state of a thing, Thing.DoSomething() makes more sense, because even if you change the internal representation of Thing, or how it works, the code talking to it doesn't have to change. If you're dealing with a collection of Things, or writing some utility methods, procedural-style DoSomethingToThing() might make more sense or be more straight-forward; but still, can usually be represented as a method on the object representing that collection: for instance
GetTotalPriceofThings();
vs
Cart.getTotal();
It really depends on how object oriented your code is.
Thing.DoSomething is appropriate if Thing is the subject of your sentence.
DoSomethingToThing(Thing n) is appropriate if Thing is the object of your sentence.
ThingA.DoSomethingToThingB(ThingB m) is an unavoidable combination, since in all the languages I can think of, functions belong to one class and are not mutually owned. But this makes sense because you can have a subject and an object.
Active voice is more straightforward than passive voice, so make sure your sentence has a subject that isn't just "the computer". This means, use form 1 and form 3 frequently, and use form 2 rarely.
For clarity:
// Form 1: "File handle, close."
fileHandle.close();
// Form 2: "(Computer,) close the file handle."
close(fileHandle);
// Form 3: "File handle, write the contents of another file handle."
fileHandle.writeContentsOf(anotherFileHandle);
I agree with Orion, but I'm going to rephrase the decision process.
You have a noun and a verb / an object and an action.
If many objects of this type will use this action, try to make the action part of the object.
Otherwise, try to group the action separately, but with related actions.
I like the File / string examples. There are many string operations, such as "SendAsHTTPReply", which won't happen for your average string, but do happen often in a certain setting. However, you basically will always close a File (hopefully), so it makes perfect sense to put the Close action in the class interface.
Another way to think of this is as buying part of an entertainment system. It makes sense to bundle a TV remote with a TV, because you always use them together. But it would be strange to bundle a power cable for a specific VCR with a TV, since many customers will never use this. The key idea is how often will this action be used on this object?
Not nearly enough information here. It depends on whether your language even supports the construct "Thing.something" or an equivalent (i.e. it's an OO language). If so, it's far more appropriate because that's the OO paradigm (members should be associated with the object they act on). In a procedural style, of course, DoSomethingToThing() is your only choice... or ThingDoSomething().
DoSomethingToThing(Thing n) would be more of a functional approach whereas Thing.DoSomething() would be more of an object oriented approach.
That is the Object Oriented versus Procedural Programming choice :)
I think the well documented OO advantages apply to the Thing.DoSomething()
This has been asked before: "Design question: does the Phone dial the PhoneNumber, or does the PhoneNumber dial itself on the Phone?"
Here are a couple of factors to consider:
Can you modify or extend the Thing class? If not, use the former.
Can Thing be instantiated? If not, use the latter as a static method.
If Thing actually gets modified (i.e. has properties that change), prefer the latter. If Thing is not modified, the former is just as acceptable.
Otherwise, as objects are meant to map onto real-world objects, choose the method that seems more grounded in reality.
Even if you aren't working in an OO language, where you would have Thing.DoSomething(), for the overall readability of your code, having a set of functions like:
ThingDoSomething()
ThingDoAnotherTask()
ThingWeDoSomethingElse()
then
AnotherThingDoSomething()
and so on is far better.
All the code that works on "Thing" is in one location. Of course, the "DoSomething" and other tasks should be named consistently - so you have a ThingOneRead(), a ThingTwoRead()... by now you should get the point. When you go back to work on the code in twelve months' time, you will appreciate having taken the time to make things logical.
In general, if "something" is an action that "thing" naturally knows how to do, then you should use thing.doSomething(). That's good OO encapsulation, because otherwise DoSomethingToThing(thing) would have to access potential internal information of "thing".
For example invoice.getTotal()
If "something" is not naturally part of "thing's" domain model, then one option is to use a helper method.
For example: Logger.log(invoice)
If doing something to an object is likely to produce a different result in another scenario, then I'd suggest oneThing.DoSomethingToThing(anotherThing).
For example, you may have two ways of saving a thing in your program, so adopting DatabaseObject.Save(thing) and SessionObject.Save(thing) would be more advantageous than thing.Save() or thing.SaveToDatabase or thing.SaveToSession().
I rarely pass no parameters to a class, unless I'm retrieving public properties.
To add to Aeon's answer, it depends on the thing and what you want to do to it. So if you are writing Thing, and DoSomething alters the internal state of Thing, then the best approach is Thing.DoSomething. However, if the action does more than change the internal state, then DoSomething(Thing) makes more sense. For example:
Collection.Add(Thing)
is better than
Thing.AddSelfToCollection(Collection)
And if you didn't write Thing, and cannot create a derived class, then you have no choice but to use DoSomething(Thing).
Even in object oriented programming it might be useful to use a function call instead of a method (or for that matter calling a method of an object other than the one we call it on). Imagine a simple database persistence framework where you'd like to just call save() on an object. Instead of including an SQL statement in every class you'd like to have saved, thus complicating code, spreading SQL all across the code and making changing the storage engine a PITA, you could create an Interface defining save(Class1), save(Class2) etc. and its implementation. Then you'd actually be calling databaseSaver.save(class1) and have everything in one place.
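A rough Python sketch of such a saver follows. Python has no method overloading, so functools.singledispatchmethod stands in for the save(Class1), save(Class2) overloads; Invoice, Customer and InMemorySaver are invented names, and a list stands in for the real storage engine.

```python
from functools import singledispatchmethod


class Invoice:
    def __init__(self, total):
        self.total = total


class Customer:
    def __init__(self, name):
        self.name = name


class InMemorySaver:
    """All persistence logic in one place; the domain classes stay clean."""

    def __init__(self):
        self.rows = []  # stand-in for a database table

    @singledispatchmethod
    def save(self, obj):
        raise TypeError(f"don't know how to save {type(obj).__name__}")

    @save.register(Invoice)
    def _(self, obj):
        self.rows.append(("invoice", obj.total))

    @save.register(Customer)
    def _(self, obj):
        self.rows.append(("customer", obj.name))


database_saver = InMemorySaver()
database_saver.save(Invoice(99))
database_saver.save(Customer("Ada"))
assert database_saver.rows == [("invoice", 99), ("customer", "Ada")]
```

Swapping the storage engine then means writing another saver class with the same save method, without touching Invoice or Customer.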
I have to agree with Kevin Conner
Also keep in mind the caller of either of the 2 forms. The caller is probably a method of some other object that definitely does something to your Thing :)