Does a deep copy operation recursively copy subvariables that it doesn't own? - language-agnostic

Given an object that holds a variable it doesn't own (that is, the variable is associated by aggregation rather than composition), will a deep copy operation copy the variable itself or only the link to it?

I like the distinction that you are making here between the role of composition and aggregation in the context of a deep copy.
I am going to go against the other answer and say: no, an object should not deep-copy another object that it doesn't own.
One would expect a deep copy of an object to be (at least initially) identical to the original. If a deep copy were made of a reference that the original didn't own, it would leave open the question of who owns the new copy. If the clone owns it, then it is no longer identical to the original object: it is an object like the original, except that it owns the reference to one of its aggregated members. That would surely lead to chaos. And if the clone doesn't own it, then who does?
This problem of ownership is especially important in non-garbage-collected languages, but it creates problems even with a garbage collector. For example, if the clone is made to allow uncommitted changes to an object, are changes also allowed on this other object that it references? If changes are not allowed, then there was no reason to deep-copy it. If changes are allowed, then how are those changes to be committed, since the object being modified doesn't control this referenced object? Sure, a mechanism for this could be contrived, but it would surely mean that the cloned object is overstepping its responsibilities, and the program would be a maintenance nightmare.
A deep copy operation that includes unowned objects also leads to problems of infinite (or at least excessive) copy operations. Suppose an object is part of a collection, and further suppose the object requires a reference to that collection. A naive deep-copy operation on that object would then create a new copy of the collection and each of its members. Even assuming we avoid the problem of infinite recursion and keep all the references consistent among this new set of objects, it is still excessive for most purposes; and for those cases where a new collection really is desired, wouldn't it make more sense to deep-copy the collection itself rather than one of its members?
I think a deep-copy that only includes owned objects, as you suggest, is the only sane approach for most purposes.
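To make that distinction concrete, here is a minimal C# sketch (the Car, Engine, and Manufacturer types are invented for this example): the composed member is copied recursively, while the aggregated member is shared by reference.

public class Engine
{
    public int HorsePower { get; set; }
    public Engine DeepCopy() => new Engine { HorsePower = HorsePower };
}

public class Manufacturer
{
    public string Name { get; set; }
}

public class Car
{
    public Engine Engine { get; set; }             // composed: the car owns it
    public Manufacturer Manufacturer { get; set; } // aggregated: shared, not owned

    public Car DeepCopy() => new Car
    {
        Engine = Engine.DeepCopy(),  // owned member: copied recursively
        Manufacturer = Manufacturer  // unowned member: only the reference is copied
    };
}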

A deep copy, as opposed to a shallow one, should copy the whole object recursively, all the way down, producing a completely new copy of the object and of all contained objects.
So yes, it should copy the variables, not only the links to them.

Related

Why are copy constructors unnecessary for immutable objects?

Why are copy constructors unnecessary for immutable objects? Please explain this for me.
Because the value cannot change, it's every bit as good to reference the same object in all cases; there's no need to have an "extra copy", so to speak.
This is a language-dependent question, especially with respect to lifetime, but let's set that aside for a moment.
Copy constructors are valuable in that they allow you to take one object and create a completely independent copy of it. You can then modify the second object independently of the first, or a component can create a private copy to protect itself from other components changing the object out from under it.
Immutable objects are unchangeable. There is no value in creating a copy of an object that won't change.
Now let's think about lifetime again. In languages like C++, copy constructors also allow you to work around memory and lifetime issues. For example, suppose I'm writing an API which takes a SomeType* and I want to keep it around longer than the lifetime of my method. In C++, the most reliable way to do this is to create a copy of the object via a copy constructor.
This is somewhat language-dependent:
Many languages require a copy constructor, and if you don't provide one, the language will implicitly generate one.
With an immutable object, however, this is typically fine, since the default copy constructor (typically) does a shallow copy of all values. With a mutable data type (i.e., one containing internal references to other objects), shallow copying is typically a poor choice, since the copy only copies the reference/pointer encapsulated within it.
It's natural because the value of an immutable object can't be changed.
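As a rough C# illustration of this point (the ImmutablePoint and Shape types are made up for the example), an immutable value can be shared freely where a mutable one would force a defensive copy:

public sealed class ImmutablePoint
{
    public int X { get; }
    public int Y { get; }

    public ImmutablePoint(int x, int y)
    {
        X = x;
        Y = y;
    }
}

public class Shape
{
    private readonly ImmutablePoint _origin;

    public Shape(ImmutablePoint origin)
    {
        // No defensive copy needed: the caller cannot mutate _origin later.
        _origin = origin;
    }

    // Returning the internal reference is also safe, for the same reason.
    public ImmutablePoint Origin => _origin;
}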

Should persistent objects validate data upon set?

If one has a object which can persist itself across executions (whether to a DB using ORM, using something like Python's shelve module, etc), should validation of that object's attributes be placed within the class representing it, or outside?
Or, rather: should the persistent object be dumb and expect whatever sets its values to be benevolent, or should it be smart and validate the data being assigned to it?
I'm not talking about type validation or user input validation, but rather things that affect the persistent object, such as ensuring that links/references to other objects exist, that numbers are unsigned, that dates aren't out of range, etc.
Validation is a part of encapsulation: an object is responsible for its internal state, and validation is part of maintaining that internal state.
It's like asking, "Should I let an object do a task and set its own variables, or should I use getters to get them all, do the work in an external function, and then use setters to set them back?"
Of course you should use a library to do most of the validation: you don't want to implement the "check unsigned values" function in every model, so you implement it in one place and let each model use it in its own code as it sees fit.
The object should validate the data input. Otherwise every part of the application which assigns data has to apply the same set of tests, and every part of the application which retrieves the persisted data will need to handle the possibility that some other module hasn't done their checks properly.
Incidentally, I don't think this is an object-oriented thing. It applies to any data persistence construct which takes input. Basically, you're talking about Design by Contract preconditions.
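As a rough C# sketch of that idea, here is a hypothetical Appointment object that refuses invalid state at assignment time, so no caller can persist nonsense:

using System;

public class Appointment
{
    private int _durationMinutes;
    private DateTime _scheduledFor;

    public int DurationMinutes
    {
        get => _durationMinutes;
        set
        {
            if (value <= 0)
                throw new ArgumentOutOfRangeException(nameof(value), "Duration must be positive.");
            _durationMinutes = value;
        }
    }

    public DateTime ScheduledFor
    {
        get => _scheduledFor;
        set
        {
            if (value < DateTime.UtcNow.Date)
                throw new ArgumentOutOfRangeException(nameof(value), "Date must not be in the past.");
            _scheduledFor = value;
        }
    }
}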
My policy is that, for the code as a whole to be robust, each object A should check as much as possible, as early as possible. But "as much as possible" needs explanation:
The internal coherence of each field B in A (type, range within that type, etc.) should be checked by the field's type B itself. If B is a primitive type or a reused class, that is not possible, so object A should check it.
The coherence of related fields (if that B field is null, then C must also be) is the typical responsibility of object A.
The coherence of a field B with code that is external to A is another matter. This is where the "POJO" approach (from Java, but applicable to any language) comes into play.
The POJO approach says that, with all the responsibilities and concerns we have in modern software (persistence and validation are only two of them), domain models end up being messy and hard to understand. The problem is that these domain objects are central to understanding the whole application and to communicating with domain experts. Every time you have to read a domain object's code, you have to handle the complexity of all these concerns, while you might care about none, or only one, of them...
So, in the POJO approach, your domain objects must not carry code related to these concerns (which usually means an interface to implement or a superclass to extend).
All concerns except the domain one are kept out of the object (though some simple information can still be provided, in Java usually via annotations, to parameterize the generic external code that handles each concern).
Also, the domain objects refer only to other domain objects, not to framework classes related to one concern (such as validation or persistence). So the domain model, with all its classes, can be put in a separate "package" (project or whatever), without dependencies on technical or concern-related code. This makes it much easier to understand the heart of a complex application, without all the complexity of these secondary aspects.
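For illustration, the C# analogue of the Java annotation approach is System.ComponentModel.DataAnnotations; in this sketch (the Customer type is invented), the domain class carries only declarative hints and a generic, external validator does the actual checking:

using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// The domain object: plain properties plus declarative validation hints.
public class Customer
{
    [Required]
    public string Name { get; set; }

    [Range(0, int.MaxValue)]
    public int LoyaltyPoints { get; set; }
}

// Generic external code that handles the validation concern for any domain object.
public static class DomainValidator
{
    public static IList<ValidationResult> Validate(object domainObject)
    {
        var results = new List<ValidationResult>();
        // true = validate all attributed properties, not just [Required] ones
        Validator.TryValidateObject(domainObject, new ValidationContext(domainObject), results, true);
        return results;
    }
}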

Doesn't Passing in Parameters that Should Be Known Implicitly Violate Encapsulation?

I often hear from test-driven development people around here that having a function get large amounts of information implicitly is a bad thing. I can see where this would be bad from a testing perspective, but isn't it sometimes necessary from an encapsulation perspective? The following question comes to mind:
Is using Random and OrderBy a good shuffle algorithm?
Basically, someone wanted to create a function in C# to randomly shuffle an array. Several people told him that the random number generator should be passed in as a parameter. This seems like an egregious violation of encapsulation to me, even if it does make testing easier. Isn't the fact that an array shuffling algorithm requires any state at all other than the array it's shuffling an implementation detail that the caller should not have to care about? Wouldn't the correct place to get this information be implicitly, possibly from a thread-local singleton?
I don't think it breaks encapsulation. The only state in the array is the data itself - and "a source of randomness" is essentially a service. Why should an array naturally have an associated source of randomness? Why should that have to be a singleton? What about different situations which have different requirements - e.g. speed vs cryptographically secure randomness? There's a reason why java.util.Random has a SecureRandom subclass :) Perhaps it doesn't matter whether the shuffle's results are predictable with a lot of effort and observation - or perhaps it does. That will depend on the context, and that's information that the shuffle algorithm shouldn't care about.
Once you start thinking of it as a service, it makes sense that it's passed in as a dependency.
Yes, you could get it from a thread-local singleton (and indeed I'm going to blog about exactly that in the next few days) but I would generally code it so that the caller gets to make that decision.
One benefit of the "randomness as a service" concept is that it makes for repeatability - if you've got a test which fails, you can pass in a Random with a specific seed and know you'll always get the same results, which makes debugging easier.
Of course, there's always the option of making the Random optional - use a thread-local singleton as a default if the caller doesn't provide their own.
Yes, that does break encapsulation. As with most software design decisions, this is a trade-off between two opposing forces. If you encapsulate the RNG then you make it difficult to change for a unit test. If you make it a parameter then you make it easy for a user to change the RNG (and potentially get it wrong).
My personal preference is to make it easy to test, then provide a default implementation (a default constructor that creates its own RNG, in this particular case) and good documentation for the end user. Adding a method with the signature
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
that creates a Random using the current system time as its seed would take care of most normal use cases of this method. The original method
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
could be used for testing (pass in a Random object with a known seed) and also in those rare cases where a user decides they need a cryptographically secure RNG. The one-parameter implementation should call this method.
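As an illustration of how the two overloads might fit together (this is a sketch, not necessarily the implementation from the linked question), here is a Fisher-Yates shuffle with the one-parameter overload delegating to the two-parameter one:

using System;
using System.Collections.Generic;
using System.Linq;

public static class ShuffleExtensions
{
    // Convenience overload: supplies its own source of randomness.
    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
    {
        return source.Shuffle(new Random());
    }

    // Testable overload: the caller controls the randomness (e.g. a seeded Random).
    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
    {
        T[] elements = source.ToArray();
        for (int i = elements.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);   // 0 <= j <= i
            T temp = elements[i];
            elements[i] = elements[j];
            elements[j] = temp;
        }
        foreach (T element in elements)
            yield return element;
    }
}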
I don't think this violates encapsulation.
Your Example
I would say that being able to provide an RNG is a feature of the class. I would obviously provide a method that doesn't require it, but I can see times where it may be useful to be able to duplicate the randomization.
What if the array shuffler were part of a game that used the RNG for level generation? If a user wanted to save the level and play it again later, it might be more efficient to store the RNG seed.
General Case
Simple classes that have a single task like this typically don't need to worry about divulging their inner workings. What they encapsulate is the logic of the task, not the elements required by that logic.

Undo/Redo with immutable objects

I read the following in an article
Immutable objects are particularly handy for implementing certain common idioms such as undo/redo and abortable transactions. Take undo for example. A common technique for implementing undo is to keep a stack of objects that somehow know how to run each command in reverse (the so-called "Command Pattern"). However, figuring out how to run a command in reverse can be tricky. A simpler technique is to maintain a stack of immutable objects representing the state of the system between successive commands. Then, to undo a command, you simply revert back to the previous system state (and probably store the current state on the redo stack).
However, the article does not show a good practical example of how immutable objects could be used to implement "undo" operations. For example... deleting 10 emails from a Gmail inbox. Once you do that, there is an undo option. How would an immutable object help in this regard?
The immutable objects would hold the entire state of the system, so in this case you'd have object A that contains the original inbox, and then object B that contains the inbox with ten e-mails deleted, and (in effect) a pointer back from B to A indicating that, if you do one "undo", then you stop using B as the state of the system and start using A instead.
However, Gmail inboxes are far too large to use this technique. You'd use it on documents that can actually be stored in a fairly small amount of memory, so that you can keep many of them around for multi-level undo.
If you want to keep ten levels of undo, you can potentially save memory by only keeping two immutable objects - one that is current, and one that is from ten "undos" ago - and a list of Commands that were applied between them.
To do an "undo", you re-execute all but the last Command object, use that as the new current object, and erase the last Command (or save it as a "Redo" object). Every time you do a new action, you update the current object, add the associated Command to the list, and then (if the list is more than ten Commands long) you execute the first Command on the object from the start of the undo list and throw away the first Command on the list.
You can do various other checkpointing systems as well, involving a variable number of complete representations of the system as well as a variable number of Commands between them. But it gets further and further from the original idea that you cited and becomes more and more like a typical mutable system. It does, however, avoid the problem of making Commands consistently reversible; you need only ever apply Commands to an object forward and not reverse.
SVN and other version control systems are effectively a disk- or network-based form of undo-and-redo.
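Here is a minimal C# sketch of the stack-of-immutable-snapshots approach described above; the Document type is a stand-in for "the state of the system":

using System.Collections.Generic;

public sealed class Document
{
    public string Text { get; }
    public Document(string text) => Text = text;
}

public class History
{
    private readonly Stack<Document> _undo = new Stack<Document>();
    private readonly Stack<Document> _redo = new Stack<Document>();

    public Document Current { get; private set; }

    public History(Document initial) => Current = initial;

    public void Apply(Document next)
    {
        // Each command produces a brand-new immutable snapshot; the old one
        // is pushed onto the undo stack untouched.
        _undo.Push(Current);
        Current = next;
        _redo.Clear();
    }

    public void Undo()
    {
        if (_undo.Count == 0) return;
        _redo.Push(Current);
        Current = _undo.Pop();
    }

    public void Redo()
    {
        if (_redo.Count == 0) return;
        _undo.Push(Current);
        Current = _redo.Pop();
    }
}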

Repository, Entity objects and Domain Objects

In my repositories, I am making assignments to my domain objects from the LINQ entity queries. I then have a service layer to act on these objects returned from the repositories.
Should my Domain objects be in the repository like this? Or should my repositories be restricted to the Entities and Data Access, and instead have my service layer make assignments to the domain objects?
Doing all the assignments in the repository seems easier, but then the distinction between my database and domain objects is not apparent. What is proper practice here?
IMO, if the app is relatively simple and you can't imagine ripping out the data access, go ahead and make the assignments in the repository. But if you think the app will get more complicated in the future, or that you may want to change the data access, keep this functionality out of the repositories.
I have done apps with the assignments in the repositories, others in the service layer, and in yet another one I had a separate conversion layer (it was not a one-to-one conversion and the objects were complex).
One thing to remember about best practices: they're there to help. If one makes things more difficult, then don't use it.
I used to not like this approach, but now I never look back. Basically, the thing is that if you need to change to an external data source that is structured differently, you can set up a new mapping along with the implementation of the repository code and be done with it.
It is about data mapping. Check this link: http://www.martinfowler.com/eaaCatalog/repository.html
Also check this related question: IRepository confusion on objects returned. I have used a similar mapper, but made it operate at the IQueryable level, which made it possible to do some pretty interesting things while working with the domain objects after the mapping.
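As a rough sketch of keeping the mapping inside the repository so the service layer only ever sees domain objects (CustomerEntity, Customer, and the IQueryable data source are hypothetical):

using System.Linq;

public class CustomerEntity            // persistence/LINQ entity
{
    public int Id { get; set; }
    public string FullName { get; set; }
}

public class Customer                  // domain object
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class CustomerRepository
{
    private readonly IQueryable<CustomerEntity> _entities;

    public CustomerRepository(IQueryable<CustomerEntity> entities)
    {
        _entities = entities;
    }

    public Customer GetById(int id)
    {
        CustomerEntity entity = _entities.Single(e => e.Id == id);
        return Map(entity);            // the mapping lives here, not in the service layer
    }

    private static Customer Map(CustomerEntity entity) =>
        new Customer { Id = entity.Id, Name = entity.FullName };
}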