Why are copy constructors unnecessary for immutable objects? - language-agnostic

Why are copy constructors unnecessary for immutable objects? Please explain this for me.

Because the value cannot change, it's every bit as good to reference the same object in all cases; there's no need to have an "extra copy", so to speak.
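For instance, here's a minimal C++ sketch (the ImmutablePoint class is made up for illustration) of why sharing is as good as copying when nothing can mutate the object:

#include <memory>

// an immutable value: all state is fixed at construction
class ImmutablePoint {
public:
    ImmutablePoint(double x, double y) : x_(x), y_(y) {}
    double x() const { return x_; }
    double y() const { return y_; }
private:
    const double x_, y_;
};

int main() {
    auto p = std::make_shared<const ImmutablePoint>(1.0, 2.0);
    auto alias = p;   // "copying" is just taking another reference; since the
                      // point can never change, this is indistinguishable from
                      // a real copy to every observer
    return alias->x() == p->x() ? 0 : 1;
}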

This is a language-dependent question, especially with respect to lifetime. For a moment, let's forget about lifetime issues.
Copy constructors are valuable in that they allow you to take one object and create a completely independent copy of it. You can then modify the second object independently of the first, or a component can create a private copy to protect itself from other components changing the object out from under it.
Immutable objects are unchangeable. There is no value in creating a copy of an object that won't change.
Now let's think about lifetime again. In languages like C++, copy constructors also let you work around memory and lifetime issues. For example, suppose I'm writing an API which takes a SomeType* and I want to keep the object around longer than the lifetime of my method. In C++, the most reliable way to do this is to create a copy of the object via a copy constructor.
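Something like this sketch, for example (the Consumer class and remember method are just illustrative names):

#include <memory>
#include <string>

struct SomeType { std::string data; };

// the API copy-constructs its own SomeType, so the stored object can
// outlive the pointer the caller passed in
class Consumer {
public:
    void remember(const SomeType* p) {
        stored_.reset(new SomeType(*p));   // copy constructor: an independent
    }                                      // object whose lifetime we control
private:
    std::unique_ptr<SomeType> stored_;
};

int main() {
    Consumer c;
    {
        SomeType local{"hello"};
        c.remember(&local);
    }   // local is gone here, but the Consumer's copy lives on
}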

This is somewhat language-dependent.
Many languages do, however, require a copy constructor: if you don't provide one, the language will implicitly generate one.
With an immutable object this is typically fine, since the default copy constructor (typically) does a shallow copy of all values. With a mutable data type (i.e., one containing internal references to other objects), shallow copying is typically a poor choice, since the copy only copies the reference/pointer encapsulated within it.
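A quick C++ sketch of that shallow-copy pitfall for mutable types (MutableBox is just an illustrative name):

#include <iostream>

// a mutable type holding a pointer: the implicitly generated copy
// constructor copies only the pointer (a shallow copy)
struct MutableBox { int* value; };

int main() {
    int shared = 1;
    MutableBox a{&shared};
    MutableBox b = a;               // implicit copy: b.value == a.value
    *b.value = 2;                   // mutating through b also changes what a sees
    std::cout << *a.value << '\n';  // prints 2, not 1
}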

It's natural because the value of an immutable object can't be changed.


How are classes implemented in dynamic languages?

How are classes implemented in dynamic languages?
I know that JavaScript uses a prototype pattern (there is 'somewhere' a container of unbound JS functions, which are bound when called through an object), but I have no idea how it works in other languages.
I'm curious about this, because I can't think of an efficient way to have native bound methods without wasting memory and/or CPU by copying members for each instance.
(By bound method, I mean that the following code should work:)
class Foo { function bar() { return 42; } };
var test = new Foo();
var method = test.bar;
method() == 42;
This highly depends on the language and the implementation. I'll tell you what I know about CPython and PyPy.
The general idea, which is also what CPython does for the most part, goes like this:
Every object has a class, specifically a reference to that class object.
Apart from instance members, which are obviously stored in the individual object, the class also has members. This includes methods, so methods don't have a per-object cost.
A class has a method resolution order (MRO) determined by the inheritance relationships, wherein each base class occurs exactly once. If we didn't have multiple inheritance, this would simply be a reference to the base class, but with multiple inheritance the MRO is hard to figure out on the fly (you'd have to start from the most derived class every time).
(Classes are also objects and have classes themselves, but we'll gloss over that for now.)
If attribute lookup on an object fails, the same attribute is looked up on the classes in the MRO, in the order specified by the MRO. (This is the default behavior, which can be changed by defining magic methods like __getattr__ and __getattribute__.)
So far so simple, and not really an explanation for bound methods. I just wanted to make sure we're talking about the same thing. The missing piece is descriptors. The descriptor protocol is defined in the "deep magic" section of the language reference, but the short and simple story is that lookup on a class can be hijacked by the object it results in via a __get__ method. More importantly, this __get__ method is told whether the lookup started on an instance or on the "owner" (the class).
In Python 2, we have an ugly and unnecessary UnboundMethod descriptor which (apart from the __get__ method) simply wraps the function to throw errors on Class.method(self) if self is not of an acceptable type. In Python 3, the __get__ is simply part of all function objects, and unbound methods are gone. In both cases, the __get__ method returns itself when you look it up on a class (so you can use Class.method, which is useful in a few cases) and a "bound method" object when you look it up on an object. This bound method object does nothing more than storing the raw function and the instance, and passing the latter as first argument to the former in its __call__ (special method overriding the function call syntax).
So, for CPython: While there is a cost to bound methods, it's smaller than you might think. Only two references are needed space-wise, and the CPU cost is limited to a small memory allocation, and an extra indirection when calling. Note though that this cost applies to all method calls, not just those which actually make use of bound method features. a.f() has to call the descriptor and use its return value, because in a dynamic language we don't know if it's monkey-patched to do something different.
In PyPy, things are more interesting. As it's an implementation which doesn't compromise on correctness, the above model is still correct for reasoning about semantics. However, it's actually faster. Apart from the fact that the JIT compiler inlines and then eliminates the entire mess described above in most cases, they also tackle the problem on bytecode level. There are two new bytecode instructions, which preserve the semantics but omit the allocation of the bound method object in the case of a.f(). There is also a method cache which can simplify the lookup process, but requires some additional bookkeeping (though some of that bookkeeping is already done for the JIT).
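As a rough cross-language sketch (C++ here, not CPython internals), the bound-method object described above boils down to two references plus a call operator that supplies the instance as the implicit first argument, mirroring the `method = test.bar; method() == 42` example from the question:

struct Foo {
    int bar() { return 42; }
};

struct BoundMethod {
    Foo* self;              // the instance
    int (Foo::*func)();     // the raw member function
    int operator()() const { return (self->*func)(); }   // forward the call
};

int main() {
    Foo test;
    BoundMethod method{&test, &Foo::bar};
    return method() == 42 ? 0 : 1;
}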

C++0x bind and function without copy-construction

I'm currently trying out a few of the new C++0x features, namely std::function and std::bind. These two facilities seem rather suitable for an event/delegate system for C++ that works like the one in C#. I've tried to create something like delegates myself before, but the hacks I would have needed for member function pointers were too much for me…
During my tests I noticed that std::bind copies every object you bind. While that surely enhances safety - you can't delete a still-registered event handler :) - it's also a problem with stateful objects. Is there a way to deactivate the copying, or at least a way to obtain the encapsulated object from the std::function again?
PS: Is there a reference for the features that are going to be included in C++0x (hopefully C++11!)? In the end it's largely TR1 plus a few additions…
I tried cppreference.org, but they are still at an early stage of documentation; cplusplus.com, on the other hand, seems to not even have started covering C++0x.
If you want to avoid copying, use std::ref and/or std::cref. They wrap the object in a pseudo-reference.
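A minimal sketch of what that looks like (the Counter class here is just for illustration):

#include <functional>
#include <iostream>

struct Counter {
    int value = 0;
    void increment() { ++value; }
};

int main() {
    Counter c;
    // std::ref wraps c in a reference_wrapper, so the binder stores a
    // reference instead of a copy and calls act on the original object
    auto bump = std::bind(&Counter::increment, std::ref(c));
    bump();
    bump();
    std::cout << c.value << '\n';   // prints 2; binding a plain `c` would leave it at 0
}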
It isn't quite right that:
I noticed that std::bind copies every object you bind.
At least that isn't the intended specification. You should be able to move a non-copyable object into a bind:
std::bind(f, std::unique_ptr<int>(new int(3)))
However, now that the move-only object is stored in the binder, it is an lvalue. Therefore you can only call it if f accepts an lvalue move-only object (say by lvalue reference). If this is not acceptable, and if the source object outlives the binder, then use of std::ref is another good solution (as mentioned by Armen).
If you need to copy the bound object, then all of its bound arguments must be copyable. But if you only move construct the bound object, then it will only move construct its bound arguments.
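Here's a small sketch of the move-only case (assuming a free function f that takes the pointer by const reference):

#include <functional>
#include <iostream>
#include <memory>
#include <utility>

// the stored unique_ptr is an lvalue inside the binder, so the target
// function has to accept it by (const) lvalue reference
void f(const std::unique_ptr<int>& p) { std::cout << *p << '\n'; }

int main() {
    auto bound = std::bind(f, std::unique_ptr<int>(new int(3)));
    bound();                        // prints 3
    // auto copy = bound;           // ill-formed: unique_ptr isn't copyable
    auto moved = std::move(bound);  // move-constructing the binder moves its
    moved();                        // bound arguments along with it
}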
The best reference is N3242. There isn't a good and comprehensive tutorial that I'm aware of yet. I might start with the boost documentation with the understanding that std::bind has been adapted to work with rvalue-refs as much as possible.
I have created a move-compatible version of bind. There are still lots of problems with it, like the binder's constructor and a few buggy lines here and there, but it seems to work.
Check it out here:
http://code-slim-jim.blogspot.jp/2012/11/perfect-forwarding-bind-compatable-with.html

Does a deep copy operation recursively copies subvariables which it doesn't own?

Given an object that has a variable which it doesn't own (that is, the variable is held by aggregation rather than composition): will a deep copy operation copy the variable or only the link to it?
I like the distinction that you are making here between the role of composition and aggregation in the context of a deep copy.
I am going to go against the other answer and say: no, an object should not deep-copy another object that it doesn't own.
One would expect a deep copy of an object to be (at least initially) identical to the original. If a deep copy were made of a reference that the original didn't own, then this leaves open the question of what owns the new copy. If the clone owns it, then it would not be identical to the original object. It would be an object like the original, except it owns the reference to one of its aggregated members. This would surely lead to chaos. If the clone doesn't own it, then who does?
This problem of ownership is especially important in non-garbage-collected languages, but it also creates problems even with a garbage collector. For example, if the clone is made to allow uncommitted changes to an object, are changes to be allowed on this other object that it references? If changes are not allowed, then there was no reason to deep-copy it. If changes are allowed, then how are those changes to be committed, since the object being modified doesn't control this referenced object? Sure, a mechanism for this could be contrived, but it would surely mean that the cloned object is overstepping its responsibilities, and the program would be a maintenance nightmare.
A deep copy operation that includes unowned objects also leads to problems of infinite (or at least excessive) copy operations. Suppose an object is part of a collection, and further suppose the object requires a reference to the collection. A naive deep-copy operation on that object would then create a new copy of the collection and each of its members. Even assuming that we avoid the problem of infinite recursion, and keep all the references consistent among this new set of objects, it is still excessive for most purposes, and for those cases where a new collection is desired, wouldn't it make more sense to deep-copy the collection itself, rather than one of its members, for this purpose?
I think a deep-copy that only includes owned objects, as you suggest, is the only sane approach for most purposes.
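In C++ terms, a copy constructor along these lines is what such a deep copy looks like (Car, Engine, and Garage are hypothetical names; the Engine is owned, the Garage is not):

#include <memory>

struct Engine { int power; };
struct Garage { };                    // exists independently of any car

struct Car {
    std::unique_ptr<Engine> engine;   // owned (composition): deep-copied
    Garage* garage;                   // not owned (aggregation): only the link is copied

    explicit Car(Garage* g) : engine(new Engine()), garage(g) {}
    Car(const Car& other)
        : engine(new Engine(*other.engine)),   // an independent Engine
          garage(other.garage) {}              // the same Garage as the original
};

int main() {
    Garage g;
    Car original(&g);
    Car clone(original);              // deep-copies the owned parts only
    return clone.garage == original.garage ? 0 : 1;
}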
A deep copy, as opposed to a shallow one, should copy the whole object recursively, down to the ground, making a completely new copy of the object and all contained objects.
So yes, it should copy the variables, not only the links.

Am I overdoing it with my Factory Method?

Part of our core product is a website CMS which makes use of various page widgets. These widgets are responsible for displaying content, listing products, handling event registration, etc. Each widget is represented by a class which derives from the base widget class. When rendering a page, the server grabs the page's widget from the database and then creates an instance of the correct class. The Factory Method pattern, right?
Private Function WidgetFactory(typeId)
    Dim oWidget
    Select Case typeId
        Case widgetType.ContentBlock
            Set oWidget = New ContentWidget
        Case widgetType.Registration
            Set oWidget = New RegistrationWidget
        Case widgetType.DocumentList
            Set oWidget = New DocumentListWidget
        Case widgetType.DocumentDisplay
    End Select
    Set WidgetFactory = oWidget
End Function
Anyway, this is all fine, but as time has gone on, the number of widget types has increased to around 50, meaning the factory method is rather long. Every time I create a new type of widget, I add another couple of lines to the method, and a little alarm rings in my head that maybe this isn't the best way to do things. I tend to just ignore that alarm, but it's getting louder.
So, am I doing it wrong? Is there a better way to handle this scenario?
I think the question you should ask yourself is: Why am I using a Factory method here?
If the answer is "because of A", and A is a good reason, then continue doing it, even if it means some extra code. If the answer is "I don't know; because I've heard that you are supposed to do it this way?" then you should reconsider.
Let's go over the standard reasons for using factories. Here's what Wikipedia says about the Factory method pattern:
[...], it deals with the problem of creating objects (products) without specifying the exact class of object that will be created. The factory method design pattern handles this problem by defining a separate method for creating the objects, whose subclasses can then override to specify the derived type of product that will be created.
Since your WidgetFactory is Private, this is obviously not the reason why you use this pattern. What about the "Factory pattern" itself (independent of whether you implement it using a Factory method or an abstract class)? Again, Wikipedia says:
Use the factory pattern when:
The creation of the object precludes reuse without significantly duplicating code.
The creation of the object requires access to information or resources not appropriate to contain within the composing object.
The lifetime management of created objects needs to be centralised to ensure consistent behavior.
From your sample code, it does not look like any of this matches your need. So, the question (which only you can answer) is: (1) How likely is it that you will need the features of a centralized Factory for your widgets in the future and (2) how costly is it to change everything back to a Factory approach if you need it in the future? If both are low, you can safely drop the Factory method for the time being.
EDIT: Let me get back to your special case after this generic elaboration: Usually, it's a = new XyzWidget() vs. a = WidgetFactory.Create(WidgetType.Xyz). In your case, however, you have some (numeric?) typeId from a database. As Mark correctly wrote, you need to have this typeId -> className map somewhere.
So, in that case, the good reason for using a factory method could be: "I need some kind of huge ConvertWidgetTypeIdToClassName select-case-statement anyway, so using a factory method takes no additional code plus it provides the factory method advantages for free, if I should ever need them."
As an alternative, you could store the class name of the widget in the database (you probably already have some WidgetType table with primary key typeId anyway, right?) and create the class using reflection (if your language allows for this type of thing). This has a lot of advantages (e.g. you could drop in DLLs with new widgets and don't have to change your core CMS code) but also disadvantages (e.g. "magic string" in your database which is not checked at compile time; possible code injection, depending on who has access to that table).
The WidgetFactory method is really a mapping from a typeId enumeration to concrete classes. In general it's best if you can avoid enumerations entirely, but sometimes (particularly in web applications) you need to round-trip to an environment (e.g. the browser) that doesn't understand polymorphism and you need such measures.
Refactoring contains a pretty good explanation of why switch/select case statements are code smells, but that mainly addresses the case where you have many similar switches.
If your WidgetFactory method is the only place where you switch on that particular enum, I would say that you don't have to worry. You need to have that map somewhere.
As an alternative, you could define the map as a dictionary, but the amount of code lines wouldn't decrease significantly - you may be able to cut the lines of code in half, but the degree of complexity would stay equivalent.
Your application of the factory pattern is correct. You have information which dictates which of N types is created. A factory is what knows how to do that. (It is a little odd as a private method. I would expect it to be on an IWidgetFactory interface.)
Your implementation, though, tightly couples the implementation to the concrete types. If you instead mapped typeId -> widgetType, you could use Activator.CreateInstance(widgetType) to make the factory understand any widget type.
Now, you can define the mappings however you want: a simple dictionary, discovery (attributes/reflection), in the configuration file, etc. You have to know all the types in one place somewhere, but you also have the option to compose multiple sources.
The classic way of implementing a factory is not to use a giant switch or if-ladder, but instead to use a map which maps object type name to an object creation function. Apart from anything else, this allows the factory to be modified at run-time.
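As a sketch of that idea in C++ (your CMS is in a different language, and the widget classes and type ids here are placeholders, but the shape is the same):

#include <functional>
#include <map>
#include <memory>

struct Widget { virtual ~Widget() {} };
struct ContentWidget : Widget {};
struct RegistrationWidget : Widget {};

using WidgetMaker = std::function<std::unique_ptr<Widget>()>;

// the factory is now data: adding a widget type means adding one entry,
// which could even be registered from the widget's own source file
std::map<int, WidgetMaker> makeRegistry() {
    std::map<int, WidgetMaker> registry;
    registry[1] = [] { return std::unique_ptr<Widget>(new ContentWidget()); };
    registry[2] = [] { return std::unique_ptr<Widget>(new RegistrationWidget()); };
    return registry;
}

std::unique_ptr<Widget> widgetFactory(int typeId) {
    static const std::map<int, WidgetMaker> registry = makeRegistry();
    auto it = registry.find(typeId);
    if (it == registry.end()) return nullptr;   // unknown type id
    return it->second();
}

int main() {
    auto w = widgetFactory(1);
    return w ? 0 : 1;
}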
Whether it's proper or not, I've always believed that the time to use a Factory is when the decision of what object type to create will be based upon information that is not available until run-time.
You indicated in a followup comment that the widget type is stored in a database. Since your code does not know what objects will be created until run-time, I think that this is a perfectly valid use of the Factory pattern. By having the factory, you enable your program to defer the decision of which object type to use until the time when the decision can actually be made.
It's been my experience that Factories grow so their dependencies don't have to. If you see this mapping duplicating itself in other places then you have cause for worry.
Try categorizing your widgets, maybe based on their functionality.
If a few of them logically depend on each other, create them together in a single construction step.

Should persistent objects validate data upon set?

If one has an object which can persist itself across executions (whether to a DB using an ORM, using something like Python's shelve module, etc.), should validation of that object's attributes be placed within the class representing it, or outside?
Or, rather: should the persistent object be dumb and expect whatever is setting its values to be benevolent, or should it be smart and validate the data being assigned to it?
I'm not talking about type validation or user input validation, but rather about things that affect the persistent object, such as ensuring that links/references to other objects exist, that numbers are unsigned, that dates aren't out of range, etc.
Validation is part of encapsulation: an object is responsible for its internal state, and validation is part of managing that state.
It's like asking "should I let an object do a function and set its own variables, or should I use getters to get them all, do the work in an external function, and then use setters to set them back?"
Of course you should use a library to do most of the validation: you don't want to implement the "check unsigned values" function in every model, so you implement it in one place and let each model use it in its own code as it sees fit.
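A minimal sketch of validation living in the setters (the Reservation class and its rules are made up for illustration):

#include <stdexcept>
#include <string>

// the persistent object validates assignments itself, so no caller can
// put it into an invalid state
class Reservation {
public:
    void setGuestCount(int count) {
        if (count <= 0)
            throw std::invalid_argument("guest count must be positive");
        guestCount_ = count;
    }
    void setDate(const std::string& isoDate) {
        if (isoDate.size() != 10)   // crude check; real code would parse the date
            throw std::invalid_argument("date must look like YYYY-MM-DD");
        date_ = isoDate;
    }
private:
    int guestCount_ = 1;
    std::string date_;
};

int main() {
    Reservation r;
    r.setGuestCount(4);           // fine
    try {
        r.setGuestCount(-1);      // rejected by the object itself
    } catch (const std::invalid_argument&) {}
}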
The object should validate the data input. Otherwise every part of the application which assigns data has to apply the same set of tests, and every part of the application which retrieves the persisted data will need to handle the possibility that some other module hasn't done their checks properly.
Incidentally I don't think this is an object-oriented thang. It applies to any data persistence construct which takes input. Basically, you're talking Design By Contract preconditions.
My policy is that, for the code as a whole to be robust, each object A should check as much as possible, as early as possible. But "as much as possible" needs explanation:
The internal coherence of each field B in A (type, range within the type, etc.) should be checked by the field's type B itself. If it is a primitive field, or a reused class, this is not possible, so the object A should check it.
The coherence of related fields (if that B field is null, then C must also be) is the typical responsibility of object A.
The coherence of a field B with code that is external to A is another matter. This is where the "POJO" approach (from Java, but applicable to any language) comes into play.
The POJO approach says that with all the responsibilities/concerns we have in modern software (persistence and validation are only two of them), domain models end up being messy and hard to understand. The problem is that these domain objects are central to understanding the whole application, to communicating with domain experts, and so on. Each time you read a domain object's code, you have to handle the complexity of all these concerns, while you might care about none or only one of them...
So, in the POJO approach, your domain objects must not carry code related to these concerns (which usually means an interface to implement, or a superclass to inherit).
All concerns except the domain one stay out of the object (though some simple information can still be provided, in Java usually via annotations, to parameterize generic external code that handles one concern).
Also, the domain objects relate only to other domain objects, not to framework classes related to one concern (such as validation or persistence). So the domain model, with all its classes, can be put in a separate "package" (project or whatever), without dependencies on technical or concern-related code. This makes it much easier to understand the heart of a complex application, without all the complexity of these secondary aspects.
Also, the domain objects relate only to other domain objects, not to some framework classes related to one concern (such as validation, or persistence). So the domain model, with all classes, can be put in a separate "package" (project or whatever), without dependencies on technical or concern-related codes. This make it much easier to understand the heart of a complex application, without all that complexity of these secondary aspects.