Programming style: should you return early if a guard condition is not satisfied? - language-agnostic

One thing I've sometimes wondered is which is the better style out of the two shown below (if any)? Is it better to return immediately if a guard condition hasn't been satisfied, or should you only do the other stuff if the guard condition is satisfied?
For the sake of argument, please assume that the guard condition is a simple test that returns a boolean, such as checking to see if an element is in a collection, rather than something that might affect the control flow by throwing an exception. Also assume that methods/functions are short enough not to require editor scrolling.
// Style 1
public SomeType aMethod() {
    SomeType result = null;
    if (!guardCondition()) {
        return result;
    }
    doStuffToResult(result);
    doMoreStuffToResult(result);
    return result;
}
// Style 2
public SomeType aMethod() {
    SomeType result = null;
    if (guardCondition()) {
        doStuffToResult(result);
        doMoreStuffToResult(result);
    }
    return result;
}

I prefer the first style, except that I wouldn't create a variable when there is no need for it. I'd do this:
// Style 3
public SomeType aMethod() {
    if (!guardCondition()) {
        return null;
    }
    SomeType result = new SomeType();
    doStuffToResult(result);
    doMoreStuffToResult(result);
    return result;
}

Having been trained in Jackson Structured Programming in the late '80s, my ingrained philosophy was always "a function should have a single entry-point and a single exit-point"; this meant I wrote code according to Style 2.
In the last few years I have come to realise that code written in this style is often overcomplex and hard to read/maintain, and I have switched to Style 1.
Who says old dogs can't learn new tricks? ;)

Style 1 is what the Linux kernel indirectly recommends.
From https://www.kernel.org/doc/Documentation/process/coding-style.rst, chapter 1:
Now, some people will claim that having 8-character indentations makes
the code move too far to the right, and makes it hard to read on a
80-character terminal screen. The answer to that is that if you need
more than 3 levels of indentation, you're screwed anyway, and should fix
your program.
Style 2 adds levels of indentation, ergo, it is discouraged.
Personally, I like style 1 as well. Style 2 makes it harder to match up closing braces in functions that have several guard tests.

I don't know if guard is the right word here. Normally an unsatisfied guard results in an exception or assertion.
But besides this, I'd go with style 1, because it keeps the code cleaner in my opinion. You have a simple example with only one condition, but what happens with many conditions and style 2? It leads to a lot of nested ifs or huge if-conditions (with || and &&). I think it is better to return from a method as soon as you know that you can.
But this is certainly very subjective ^^

Martin Fowler refers to this refactoring as:
"Replace Nested Conditional with Guard Clauses"
If/else statements also bring cyclomatic complexity, and hence harder-to-cover test cases: to exercise all the if/else blocks you might need to supply a lot of different inputs.
Whereas if there are guard clauses, you can test them first, and deal with the real logic inside the if/else clauses in a clearer fashion.
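Roughly, the refactoring looks like this; a minimal sketch in Java with invented names (the separated/retired cases are stand-ins for whatever special cases your domain has):

class Payroll {
    // Hypothetical employee type, only here to make the sketch compile.
    record Employee(boolean separated, boolean retired, double salary) {}

    // Before: nested conditionals bury the normal case two levels deep.
    static double payAmountNested(Employee e) {
        double result;
        if (e.separated()) {
            result = 0;
        } else {
            if (e.retired()) {
                result = e.salary() * 0.5;
            } else {
                result = e.salary();
            }
        }
        return result;
    }

    // After: guard clauses exit early, so the normal case reads straight down
    // and each special case is checked and dismissed on its own line.
    static double payAmountGuarded(Employee e) {
        if (e.separated()) return 0;
        if (e.retired()) return e.salary() * 0.5;
        return e.salary();
    }
}

Each guard can be exercised with a single input, and the "real" logic at the bottom no longer sits inside any conditional.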

If you dig through the .NET Framework using .NET Reflector you will see that the .NET programmers use style 1 (or maybe style 3, already mentioned by unbeli).
The reasons are already covered in the answers above, and perhaps one more: it makes the code more readable, concise and clear.
This style is used most of all when checking input parameters; you always have to do this if you write any kind of framework/library/DLL:
first check all the input parameters, then work with them.
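A minimal sketch of that parameter-checking pattern in Java (class and method names are invented for illustration; the checks here throw, but a plain early return works the same way):

import java.util.List;
import java.util.Objects;

class ReportService {
    // Validate every argument up front, then do the real work on known-good input.
    public String buildReport(String title, List<String> rows, int maxRows) {
        Objects.requireNonNull(title, "title must not be null");
        Objects.requireNonNull(rows, "rows must not be null");
        if (maxRows <= 0) {
            throw new IllegalArgumentException("maxRows must be positive: " + maxRows);
        }

        // From here on, every input is known to be valid.
        StringBuilder sb = new StringBuilder(title).append('\n');
        rows.stream().limit(maxRows).forEach(r -> sb.append(r).append('\n'));
        return sb.toString();
    }
}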

It sometimes depends on the language and what kinds of "resources" that you are using (e.g. open file handles).
In C, Style 2 is definitely safer and more convenient because a function has to close and/or release any resources that it obtained during execution. This includes allocated memory blocks, file handles, handles to operating system resources such as threads or drawing contexts, locks on mutexes, and any number of other things. Delaying the return until the very end or otherwise restricting the number of exits from a function allows the programmer to more easily ensure that s/he properly cleans up, helping to prevent memory leaks, handle leaks, deadlock, and other problems.
In C++ using RAII-style programming, both styles are equally safe, so you can pick one that is more convenient. Personally I use Style 1 with RAII-style C++. C++ without RAII is like C, so, again, Style 2 is probably better in that case.
In languages like Java with garbage collection, the runtime helps smooth over the differences between the two styles because it cleans up after itself. However, there can be subtle issues with these languages, too, if you don't explicitly "close" some types of objects. For example, if you construct a new java.io.FileOutputStream and do not close it before returning, then the associated operating system handle will remain open until the runtime garbage collects the FileOutputStream instance that has fallen out of scope. This could mean that another process or thread that needs to open the file for writing may be unable to until the FileOutputStream instance is collected.
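One way to sidestep that in Java, whichever return style you prefer, is try-with-resources; a small sketch (the file path and method are made up) where the stream is closed on every exit path:

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class SafeWrite {
    // The stream is closed on every exit path (early return, normal return, or
    // exception), so the OS file handle is released without waiting for GC.
    static boolean writeGreeting(String path, String name) throws IOException {
        if (name == null || name.isEmpty()) {
            return false; // guard-style early exit; nothing has been opened yet
        }
        try (FileOutputStream out = new FileOutputStream(path)) {
            out.write(("Hello, " + name).getBytes(StandardCharsets.UTF_8));
            return true;  // returning from inside the try still closes 'out'
        }
    }
}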

Although it goes against the best practices I have been taught, I find it much better to reduce the nesting of if statements when I have a condition such as this. I think it is much easier to read, and although it exits in more than one place it is still very easy to debug.

I would say that Style 1 became more widely used because it is the best practice when combined with small methods.
Style 2 looks like a better solution when you have big methods: when you have them, you often have some common code that you want to execute no matter how you exit. But the proper solution is not to force a single exit point, it is to make the methods smaller.
For example, if you want to extract a sequence of code from a big method, and that method has two exit points, you start to have problems; it is hard to do automatically. When I have a big method written in Style 1, I usually transform it to Style 2, then extract methods, and then each of them should end up as Style 1 code again.
So Style 1 is best, but it depends on small methods.
Style 2 is not as good, but it is what I'd recommend if you have big methods that you don't want to, or don't have time to, split.
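To illustrate, a rough Java sketch (names invented): the common "always runs" code is hoisted into a small caller with a single exit, and the extracted helper is then free to use Style 1 guard clauses.

class OrderProcessor {
    record Order(java.util.List<String> items) {}

    void process(Order order) {
        boolean shipped = validateAndShip(order); // small method, early returns inside
        audit(order, shipped);                    // common code that must always run
    }

    private boolean validateAndShip(Order order) {
        if (order == null || order.items().isEmpty()) {
            return false; // Style 1 guard clause: nothing to ship
        }
        ship(order);
        return true;
    }

    private void ship(Order order) { /* ... */ }
    private void audit(Order order, boolean shipped) { /* ... */ }
}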

I prefer to use method #1 myself; it is logically easier to read and also logically more similar to what we are trying to do (if something bad happens, exit the function NOW; do not pass Go, do not collect $200).
Furthermore, most of the time you would want to return a value that is not a logically possible result (e.g. -1) to indicate to the caller that the function failed to execute properly, so it can take appropriate action. This lends itself better to method #1 as well.

I would say "It depends on..."
In situations where I have to perform a cleanup sequence with more than 2 or 3 lines before leaving a function/method I would prefer style 2 because the cleanup sequence has to be written and modified only once. That means maintainability is easier.
In all other cases I would prefer style 1.

Number 1 is typically the easy, lazy and sloppy way. Number 2 expresses the logic cleanly. What others have pointed out is that yes, it can become cumbersome, but that tendency has an important benefit: Style #1 can hide the fact that your function is probably doing too much. It doesn't visually demonstrate the complexity of what's going on very well, i.e. it prevents the code from saying to you "hey, this is getting a bit too complex for this one function". It also makes it a bit easier for other developers who don't know your code to miss those returns sprinkled here and there, at first glance anyway.
So let the code speak. When you see long conditions appearing or nested if statements it is saying that maybe it would be better to break this stuff up into multiple functions or that it needs to be rewritten more elegantly.

Related

Conditional and Loop return in the middle of the code is that correct? [duplicate]

This question already has answers here:
Should a function have only one return statement?
(50 answers)
I have sometimes been told that you cannot put a return in the middle of a conditional or a loop, because it breaks the flow. However, now I have been told that you can, and that it is better. I'm confused. This would usually happen inside a function.
Can you put a return there or not? Why? Or does it make no difference?
Example:
if (i == 0)
{
    //other code
    return true;
}
else
{
    //other code
    return false;
}

or

if (i == 0)
{
    //other code
    b = true;
}
else
{
    //other code
    b = false;
}
return b;
Your two examples are basically equivalent in functionality, and either will work. In fact, an optimizing compiler may easily turn your second example into your first.
Most programmers would likely prefer the first as the intent is clearer.
It's better to have a single return at the bottom. That way, you have only one point of entry and one point of exit. It is much easier to debug code when you don't have to worry about where it will exit. This is not a big deal with very short methods, but for long ones that go on for a few hundred lines, it is much cleaner.
I don't see any practical problem with returning in the middle of a loop. If you hear people saying you shouldn't, it must be on the basis of code readability. Having multiple exit points from a function can make some code ugly. Also, most of the time you have to do some cleanup before exiting the routine, so programmers generally tend to keep the cleanup in one place and always exit through that path; if you have multiple exit points, you have to add the cleanup in all of those places, which duplicates code and again hurts readability. I have seen code with returns spread all over the place that eventually failed to do the cleanup properly and caused memory leaks.
The bigger problem is that most of the time the code you write now lives for a long time and the maintainers keep changing, and at some point people no longer understand the whole intent of every line of code; that adds to the confusion.
All that said, I have seen a lot very beautifully written code with returns in the middle of loops.
This is a choice of style rather than a rule or a matter of performance. The second code example follows the "single entry, single exit" approach, where the code within the function only enters from the top and only exits from the bottom. The idea behind this is that it is more "safe" and makes the code flow easier to follow. The safety comes into play when you manage dynamic storage by hand: with a single point of return, you can ensure that you free all the memory. Of course, languages like Java and C# manage dynamic storage for you, so this isn't really an issue. Also, if you're exiting multiple times in the middle of a function (particularly if it's very long), it might be hard to keep track of what causes the function to return.
However, choosing to exit only at the bottom of a function can create its own problems, as you may sometimes need to keep track of more state by setting and checking flags.
As for your original question, it certainly does not break anything in modern programming languages; it's all up to you. Go with the way you find easier to follow.

Successive success checks

Most of you have probably bumped into a situation where multiple things must succeed, in a certain order, before the application can proceed; for example, the very simple case of creating a listening socket (socket, bind, listen, accept, etc.). There are at least two obvious ways (don't take this 100% verbatim):
if (1st_ok)
{
    if (2nd_ok)
    {
        ...

or

if (!1st_ok)
{
    return;
}
if (!2nd_ok)
{
    return;
}
...
Have you ever thought of anything smarter? Do you prefer one of the two above over the other, or do you (if the language provides for it) use exceptions?
I prefer the second technique. The main problem with the first one is that it increases the nesting depth of the code, which is a significant issue when you've got a substantial number of preconditions/resource-allocs to check since the business part of the function ends up deeply buried behind a wall of conditions (and frequently loops too). In the second case, you can simplify the conceptual logic to "we've got here and everything's OK", which is much easier to work with. Keeping the normal case as straight-line as possible is just easier to grok, especially when doing maintenance coding.
It depends on the language - e.g. in C++ you might well use exceptions, while in C you might use one of several strategies:
if/else blocks
goto (one of the few cases where a single goto label for "exception" handling might be justified)
use break within a do { ... } while (0) loop
Personally I don't like multiple return statements in a function - I prefer to have a common clean-up block at the end of the function followed by a single return statement.
This tends to be a matter of style. Some people only like returning at the end of a procedure, others prefer to do it wherever needed.
I'm a fan of the second method, as it allows for clean and concise code as well as ease of adding documentation on what it's doing.
// Checking for llama integration
if (!1st_ok)
{
    return;
}

// Llama found, loading spitting capacity
if (!2nd_ok)
{
    return;
}

// Etc.
I prefer the second version.
In the normal case, all code between the checks executes sequentially, so I like to see them at the same level. Normally none of the if branches are executed, so I want them to be as unobtrusive as possible.
I use the 2nd because I think it reads better and the logic is easier to follow. Also, they say exceptions should not be used for flow control, but only for exceptional and unexpected cases. I'd like to see what the pros say about this.
What about
if (1st_ok && 2nd_ok) { }
or if some work must be done, like in your example with sockets
if (1st_ok() && 2nd_ok()) { }
I avoid the first solution because of nesting.
I avoid the second solution because of corporate coding rules which forbid multiple return in a function body.
Of course coding rules also forbid goto.
My workaround is to use a local variable:
bool isFailed = false; // or whatever is available for bool/true/false
if (!check1) {
    log_error();
    try_recovery_action();
    isFailed = true;
}
if (!isFailed) {
    if (!check2) {
        log_error();
        try_recovery_action();
        isFailed = true;
    }
}
...
This is not as beautiful as I would like, but it is the best I've found to conform to my constraints and still write readable code.
For what it is worth, here are some of my thoughts and experiences on this question.
Personally, I tend to prefer the second case you outlined. I find it easier to follow (and debug) the code. That is, as the code progresses, it becomes "more correct". In my own experience, this has seemed to be the preferred method.
I don't know how common it is in the field, but I've also seen condition testing written as ...
error = foo1();
if ((error == OK) && (test1)) {
    error = foo2();
}
if ((error == OK) && (test2)) {
    error = foo3();
}
...
return (error);
Although readable (always a plus in my books) and avoiding deep nesting, it always struck me as using a lot of unnecessary testing to achieve those ends.
The first method I see used less frequently than the second. Of those times, the vast majority were because there was no nice way around it. For the remaining few instances, it was justified on the basis of extracting a little more performance in the success case. The argument was that the processor would predict a forward branch as not taken (corresponding to the else clause). This depended upon several factors including the architecture, compiler, language, need, .... Obviously most projects (and most aspects of the project) did not meet those requirements.
Hope this helps.

Is it bad practice to use temporary variables to avoid typing?

I sometimes use temporary variables to shorten the identifiers:
private function doSomething() {
    $db = $this->currentDatabase;
    $db->callMethod1();
    $db->callMethod2();
    $db->callMethod3();
    $db->...
}
Although this is a PHP example, I'm asking in general:
Is this bad practice? Are there any drawbacks?
This example is perfectly fine, since you are using it inside functions/methods.
The variable will go away right after the method/function ends, so there's no real memory concern.
Also, by doing this you have "sort of" implemented DRY - don't repeat yourself.
Why write $this->currentDatabase so many times when you can write $db? And what if you later have to change $this->currentDatabase to something else?
Actually, you're not trying to avoid typing (otherwise, you'd use a completion mechanism in your editor), but you're just making your function more readable (by using "abbreviations") which is a good thing.
Drawbacks will show up when you start doing this to avoid typing (and sacrifice readability)
It depends what the contract on $this->currentDatabase is. Can it change at any time, after any method call? If it changes, are you supposed to keep using the object you got when you made your first db call, or are you supposed to always use the current value? This dictates whether you must always use $this->currentDatabase, or whether you must store it in a variable before using it.
So, strictly speaking, this is not a style question at all.
But, assuming the member is never changed during function calls such as this, it makes no difference. I'd say storing it in a variable is slightly better, as it is easier to read and avoids a member access on an object at every operation. The compiler may optimize it away if it's good, but in many languages such optimizations are very difficult - and accessing a local variable is almost invariably faster than accessing a member of an object.
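To illustrate the difference (the question is PHP, but the point is general), a small Java sketch with invented names: the local variable is a snapshot, while re-reading the member picks up any reassignment made between the calls.

class Worker {
    interface Database { void callMethod1(); void callMethod2(); }

    // Stand-in for $this->currentDatabase; other code may reassign it at any time.
    private Database currentDatabase;

    Worker(Database initial) { this.currentDatabase = initial; }

    void doSomethingCached() {
        Database db = this.currentDatabase;  // snapshot: every call below uses the same object
        db.callMethod1();
        db.callMethod2();
    }

    void doSomethingLive() {
        this.currentDatabase.callMethod1();  // the field is re-read each time, so a swap
        this.currentDatabase.callMethod2();  // between the two calls would be picked up
    }
}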
In general:
Both $db and $this->currentDatabase point to exactly the same object.
The little space allocated for $db is freed (or eligible for garbage collection) when the function ends.
So I'd say: no, it's not bad practice.
I seem to remember that Steve McConnell recommends against using temporary variables in "Code Complete". At the risk of committing heresy, I have to disagree. I prefer the additional readability introduced. I also find myself adding them to aid single-step debugging, then seeing no reason to remove them.
I don't think there is a performance penalty if you use the original variable instead of skipping the first dereference ($this->currentDatabase).
However, as readability is much improved using the abbreviation, go for it!
Of course it also will depend on your team's coding conventions.
If you do this carefully it is absolutely fine. As long as you only use a few of these variables, in a small amount of code and inside small functions, I think this is OK.
If you have a lot of these variables and they are badly named, like i, j, l and f in the same function, the understandability of your code will suffer. If that is the case I would rather type a little bit more than have incomprehensible code. This is one reason a good IDE has automatic code completion.
No, I think this is OK. Often performance is not as critical as clean, readable code.
Also, you are trading a small allocation on the stack for faster method calls by avoiding the extra dereference each time.
A getter will solve your problem:
private function doSomething() {
    getDB()->callMethod1();
    getDB()->callMethod2();
    getDB()->callMethod3();
}

What's the best name for a non-mutating "add" method on an immutable collection? [closed]

Sorry for the waffly title - if I could come up with a concise title, I wouldn't have to ask the question.
Suppose I have an immutable list type. It has an operation Foo(x) which returns a new immutable list with the specified argument as an extra element at the end. So to build up a list of strings with values "Hello", "immutable", "world" you could write:
var empty = new ImmutableList<string>();
var list1 = empty.Foo("Hello");
var list2 = list1.Foo("immutable");
var list3 = list2.Foo("word");
(This is C# code, and I'm most interested in a C# suggestion if you feel the language is important. It's not fundamentally a language question, but the idioms of the language may be important.)
The important thing is that the existing lists are not altered by Foo - so empty.Count would still return 0.
Another (more idiomatic) way of getting to the end result would be:
var list = new ImmutableList<string>().Foo("Hello")
.Foo("immutable")
.Foo("word");
My question is: what's the best name for Foo?
EDIT 3: As I reveal later on, the name of the type might not actually be ImmutableList<T>, which makes the position clear. Imagine instead that it's TestSuite and that it's immutable because the whole of the framework it's a part of is immutable...
(End of edit 3)
Options I've come up with so far:
Add: common in .NET, but implies mutation of the original list
Cons: I believe this is the normal name in functional languages, but meaningless to those without experience in such languages
Plus: my favourite so far, it doesn't imply mutation to me. Apparently this is also used in Haskell but with slightly different expectations (a Haskell programmer might expect it to add two lists together rather than adding a single value to the other list).
With: consistent with some other immutable conventions, but doesn't have quite the same "additionness" to it IMO.
And: not very descriptive.
Operator overload for + : I really don't like this much; I generally think operators should only be applied to lower level types. I'm willing to be persuaded though!
The criteria I'm using for choosing are:
Gives the correct impression of the result of the method call (i.e. that it's the original list with an extra element)
Makes it as clear as possible that it doesn't mutate the existing list
Sounds reasonable when chained together as in the second example above
Please ask for more details if I'm not making myself clear enough...
EDIT 1: Here's my reasoning for preferring Plus to Add. Consider these two lines of code:
list.Add(foo);
list.Plus(foo);
In my view (and this is a personal thing) the latter is clearly buggy - it's like writing "x + 5;" as a statement on its own. The first line looks like it's okay, until you remember that it's immutable. In fact, the way that the plus operator on its own doesn't mutate its operands is another reason why Plus is my favourite. Without the slight ickiness of operator overloading, it still gives the same connotations, which include (for me) not mutating the operands (or method target in this case).
EDIT 2: Reasons for not liking Add.
Various answers are effectively: "Go with Add. That's what DateTime does, and String has Replace methods etc which don't make the immutability obvious." I agree - there's precedent here. However, I've seen plenty of people call DateTime.Add or String.Replace and expect mutation. There are loads of newsgroup questions (and probably SO ones if I dig around) which are answered by "You're ignoring the return value of String.Replace; strings are immutable, a new string gets returned."
Now, I should reveal a subtlety to the question - the type might not actually be an immutable list, but a different immutable type. In particular, I'm working on a benchmarking framework where you add tests to a suite, and that creates a new suite. It might be obvious that:
var list = new ImmutableList<string>();
list.Add("foo");
isn't going to accomplish anything, but it becomes a lot murkier when you change it to:
var suite = new TestSuite<string, int>();
suite.Add(x => x.Length);
That looks like it should be okay. Whereas this, to me, makes the mistake clearer:
var suite = new TestSuite<string, int>();
suite.Plus(x => x.Length);
That's just begging to be:
var suite = new TestSuite<string, int>().Plus(x => x.Length);
Ideally, I would like my users not to have to be told that the test suite is immutable. I want them to fall into the pit of success. This may not be possible, but I'd like to try.
I apologise for over-simplifying the original question by talking only about an immutable list type. Not all collections are quite as self-descriptive as ImmutableList<T> :)
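For concreteness, here is a bare-bones sketch of what the operation itself does (in Java rather than C#, and with "plus" standing in for whatever name wins): copy the existing elements, append the new one, and hand back a fresh instance.

import java.util.Arrays;

final class ImmutableList<T> {
    private final Object[] elements;

    ImmutableList() { this.elements = new Object[0]; }
    private ImmutableList(Object[] elements) { this.elements = elements; }

    // The method in question ("Foo" above): it never touches this list; it
    // copies the backing array and returns a brand-new list with the extra item.
    ImmutableList<T> plus(T item) {
        Object[] copy = Arrays.copyOf(elements, elements.length + 1);
        copy[elements.length] = item;
        return new ImmutableList<>(copy);
    }

    int count() { return elements.length; }
}

With this, new ImmutableList<String>().plus("Hello").plus("immutable").plus("world") builds the chained example, and the original empty list still reports count() == 0.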
In situations like that, I usually go with Concat. That usually implies to me that a new object is being created.
var p = listA.Concat(listB);
var k = listA.Concat(item);
I'd go with Cons, for one simple reason: it means exactly what you want it to.
I'm a huge fan of saying exactly what I mean, especially in source code. A newbie will have to look up the definition of Cons only once, but then read and use that a thousand times. I find that, in the long term, it's nicer to work with systems that make the common case easier, even if the up-front cost is a little bit higher.
The fact that it would be "meaningless" to people with no FP experience is actually a big advantage. As you pointed out, all of the other words you found already have some meaning, and that meaning is either slightly different or ambiguous. A new concept should have a new word (or in this case, an old one). I'd rather somebody have to look up the definition of Cons, than to assume incorrectly he knows what Add does.
Other operations borrowed from functional languages often keep their original names, with no apparent catastrophes. I haven't seen any push to come up with synonyms for "map" and "reduce" that sound more familiar to non-FPers, nor do I see any benefit from doing so.
(Full disclosure: I'm a Lisp programmer, so I already know what Cons means.)
Actually I like And, especially in the idiomatic way. I'd especially like it if you had a static readonly property for the Empty list, and perhaps make the constructor private so you always have to build from the empty list.
var list = ImmutableList<string>.Empty.And("Hello")
.And("Immutable")
.And("Word");
Whenever I'm in a jam with nomenclature, I hit up the interwebs.
thesaurus.com returns this for "add":
Definition: adjoin, increase; make further comment
Synonyms: affix, annex, ante, append, augment, beef up, boost, build up, charge up, continue, cue in, figure in, flesh out, heat up, hike, hike up, hitch on, hook on, hook up with, include, jack up, jazz up, join together, pad, parlay, piggyback, plug into, pour it on, reply, run up, say further, slap on, snowball, soup up, speed up, spike, step up, supplement, sweeten, tack on, tag
I like the sound of Adjoin, or more simply Join. That is what you're doing, right? The method could also apply to joining other ImmutableList<>'s.
Personally, I like .With(). If I was using the object, after reading the documentation or the code comments, it would be clear what it does, and it reads ok in the source code.
object.With("My new item as well");
Or, you add "Along" with it.. :)
object.AlongWith("this new item");
I ended up going with Add for all of my immutable collections in BclExtras. The reason is that it's an easy, predictable name. I'm not worried so much about people confusing Add with a mutating add, since the name of the type is prefixed with Immutable.
For a while I considered Cons and other functional-style names. Eventually I discounted them because they're not nearly as well known. Sure, functional programmers will understand, but they're not the majority of users.
Other Names: you mentioned:
Plus: I'm wishy-washy on this one. For me it doesn't distinguish the operation as non-mutating any more than Add does.
With: Will cause issues with VB (pun intended)
Operator overloading: Discoverability would be an issue
Options I considered:
Concat: String's are Immutable and use this. Unfortunately it's only really good for adding to the end
CopyAdd: Copy what? The source, the list?
AddToNewList: Maybe a good one for List. But what about a Collection, Stack, Queue, etc ...
Unfortunately there doesn't really seem to be a word that is
Definitely an immutable operation
Understandable to the majority of users
Representable in less than 4 words
It gets even more odd when you consider collections other than List. Take, for instance, Stack. Even first-year programmers can tell you that Stacks have a Push/Pop pair of methods. If you create an ImmutableStack and give it a completely different name, let's call it Foo/Fop, you've just added more work for them to use your collection.
Edit: Response to Plus Edit
I see where you're going with Plus. I think a stronger case would actually be Minus for remove. If I saw the following I would certainly wonder what in the world the programmer was thinking
list.Minus(obj);
The biggest problem I have with Plus/Minus or a new pairing is it feels like overkill. The collection itself already has a distinguishing name, the Immutable prefix. Why go further by adding vocabulary whose intent is to add the same distinction as the Immutable prefix already did.
I can see the call site argument. It makes it clearer from the standpoint of a single expression. But in the context of the entire function it seems unnecessary.
Edit 2
Agree that people have definitely been confused by String.Concat and DateTime.Add. I've seen several very bright programmers hit this problem.
However I think ImmutableList is a different argument. There is nothing about String or DateTime that establishes it as Immutable to a programmer. You must simply know that it's immutable via some other source. So the confusion is not unexpected.
ImmutableList does not have that problem, because the name defines its behavior. You could argue that people don't know what Immutable means, and I think that's also valid. I certainly didn't know it till about year 2 in college. But you have the same issue with whatever name you choose instead of Add.
Edit 3: What about types like TestSuite which are immutable but do not contain the word?
I think this drives home the idea that you shouldn't be inventing new method names. Namely because there is clearly a drive to make types immutable in order to facilitate parallel operations. If you focus on changing the names of methods for collections, the next step will be changing the mutating method names on every immutable type you use.
I think it would be a more valuable effort to instead focus on making types identifiable as Immutable. That way you can solve the problem without rethinking every mutating method pattern out there.
Now how can you identify TestSuite as immutable? In today's environment I think there are a few ways:
Prefix with Immutable: ImmutableTestSuite
Add an attribute which describes the level of immutability. This is certainly less discoverable.
Not much else.
My guess/hope is that development tools will start helping with this problem by making it easy to identify immutable types simply by sight (different color, stronger font, etc ...). But I think that's the answer, rather than changing all of the method names.
I think this may be one of those rare situations where it's acceptable to overload the + operator. In math terminology, we know that + doesn't append something to the end of something else. It always combines two values together and returns a new resulting value.
For example, it's intuitively obvious that when you say
x = 2 + 2;
the resulting value of x is 4, not 22.
Similarly,
var empty = new ImmutableList<string>();
var list1 = empty + "Hello";
var list2 = list1 + "immutable";
var list3 = list2 + "word";
should make clear what each variable is going to hold. It should be clear that list2 is not changed in the last line, but instead that list3 is assigned the result of appending "word" to list2.
Otherwise, I would just name the function Plus().
To be as clear as possible, you might want to go with the wordier CopyAndAdd, or something similar.
I would call it Extend() or maybe ExtendWith() if you feel like being really verbose.
Extend means adding something to something else without changing it. I think this is very relevant terminology in C#, since it is similar to the concept of extension methods - they "add" a new method to a class without "touching" the class itself.
Otherwise, if you really want to emphasize that you don't modify the original object at all, using some prefix like Get- seems unavoidable to me.
Added(), Appended()
I like to use the past tense for operations on immutable objects. It conveys the idea that you aren't changing the original object, and it's easy to recognize when you see it.
Also, because mutating method names are often present-tense verbs, it applies to most of the immutable-method-name-needed cases you run into. For example an immutable stack has the methods "pushed" and "popped".
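A rough sketch of that past-tense convention (in Java, with an invented linked-node implementation):

final class ImmutableStack<T> {
    private final T head;
    private final ImmutableStack<T> tail; // a null tail marks the empty stack

    ImmutableStack() { this(null, null); }
    private ImmutableStack(T head, ImmutableStack<T> tail) { this.head = head; this.tail = tail; }

    // Past-tense names read as "the stack as it would be after this operation".
    ImmutableStack<T> pushed(T item) { return new ImmutableStack<>(item, this); }
    ImmutableStack<T> popped()       { return tail == null ? this : tail; }
    T top()                          { return head; }
}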
I like mmyers suggestion of CopyAndAdd. In keeping with a "mutation" theme, maybe you could go with Bud (asexual reproduction), Grow, Replicate, or Evolve? =)
EDIT: To continue with my genetic theme, how about Procreate, implying that a new object is made which is based on the previous one, but with something new added.
This is probably a stretch, but in Ruby there is a commonly used notation for the distinction: add doesn't mutate; add! mutates. If this is a pervasive problem in your project, you could do that too (not necessarily with non-alphabetic characters, but by consistently using a notation to indicate mutating/non-mutating methods).
Join seems appropriate.
Maybe the confusion stems from the fact that you want two operations in one. Why not separate them? DSL style:
var list = new ImmutableList<string>("Hello");
var list2 = list.Copy().With("World!");
Copy would return an intermediate object, that's a mutable copy of the original list. With would return a new immutable list.
Update:
But, having an intermediate, mutable collection around is not a good approach. The intermediate object should be contained in the Copy operation:
var list1 = new ImmutableList<string>("Hello");
var list2 = list1.Copy(list => list.Add("World!"));
Now, the Copy operation takes a delegate, which receives a mutable list, so that it can control the copy outcome. It can do much more than appending an element, like removing elements or sorting the list. It can also be used in the ImmutableList constructor to assemble the initial list without intermediary immutable lists.
public ImmutableList<T> Copy(Action<IList<T>> mutate) {
    if (mutate == null) return this;
    var list = new List<T>(this);
    mutate(list);
    return new ImmutableList<T>(list);
}
Now there's no possibility of misinterpretation by the users, they will naturally fall into the pit of success.
Yet another update:
If you still don't like the mutable list mention, even now that it's contained, you can design a specification object, that will specify, or script, how the copy operation will transform its list. The usage will be the same:
var list1 = new ImmutableList<string>("Hello");
// rules is a specification object, that takes commands to run in the copied collection
var list2 = list1.Copy(rules => rules.Append("World!"));
Now you can be creative with the rules names and you can only expose the functionality that you want Copy to support, not the entire capabilities of an IList.
For the chaining usage, you can create a reasonable constructor (which will not use chaining, of course):
public ImmutableList(params T[] elements) ...
...
var list = new ImmutableList<string>("Hello", "immutable", "World");
Or use the same delegate in another constructor:
var list = new ImmutableList<string>(rules =>
    rules
        .Append("Hello")
        .Append("immutable")
        .Append("World")
);
This assumes that the rules.Append method returns this.
This is what it would look like with your latest example:
var suite = new TestSuite<string, int>(x => x.Length);
var otherSuite = suite.Copy(rules =>
    rules
        .Append(x => Int32.Parse(x))
        .Append(x => x.GetHashCode())
);
A few random thoughts:
ImmutableAdd()
Append()
ImmutableList<T>(ImmutableList<T> originalList, T newItem) Constructor
DateTime in C# uses Add. So why not use the same name? As long as the users of your class understand that the class is immutable.
I think the key thing you're trying to get at, which is hard to express, is the non-mutation, so maybe something with a generative word in it, something like CopyWith() or InstancePlus().
I don't think the English language will let you imply immutability in an unmistakable way while using a verb that means the same thing as "Add". "Plus" almost does it, but people can still make the mistake.
The only way you're going to prevent your users from mistaking the object for something mutable is by making it explicit, either through the name of the object itself or through the name of the method (as with the verbose options like "GetCopyWith" or "CopyAndAdd").
So just go with your favourite, "Plus."
First, an interesting starting point:
http://en.wikipedia.org/wiki/Naming_conventions_(programming) ...In particular, check the "See Also" links at the bottom.
I'm in favor of either Plus or And, effectively equally.
Plus and And are both math-based in etymology. As such, both connote a mathematical operation; both yield something that reads naturally as an expression which may resolve into a value, which fits with the method having a return value. And bears an additional logical connotation, but both words apply intuitively to lists. Add connotes an action performed on an object, which conflicts with the method's immutable semantics.
Both are short, which is especially important given the primitiveness of the operation. Simple, frequently-performed operations deserve shorter names.
Expressing immutable semantics is something I prefer to do via context. That is, I'd rather simply imply that this entire block of code has a functional feel; assume everything is immutable. That might just be me, however. I prefer immutability to be the rule; if it's done, it's done a lot in the same place; mutability is the exception.
How about Chain() or Attach()?
I prefer Plus (and Minus). They are easily understandable and map directly to operations involving well known immutable types (the numbers). 2+2 doesn't change the value of 2, it returns a new, equally immutable, value.
Some other possibilities:
Splice()
Graft()
Accrete()
How about mate, mateWith, or coitus, for those who abide. In terms of reproducing, mammals are generally considered immutable.
Going to throw Union out there too. Borrowed from SQL.
Apparently I'm the first Obj-C/Cocoa person to answer this question.
NSString *empty = [[NSString alloc] init];
NSString *list1 = [empty stringByAppendingString:@"Hello"];
NSString *list2 = [list1 stringByAppendingString:@"immutable"];
NSString *list3 = [list2 stringByAppendingString:@"world"];
Not going to win any code golf games with this.
I think "Add" or "Plus" sounds fine. The name of the list itself should be enough to convey the list's immutability.
Maybe there are some words which remind me more of making a copy and adding stuff to that, instead of mutating the instance (like "Concatenate"). But I think having some symmetry in those words for other actions would be good too. I don't know of a word for "Remove" of the same kind as "Concatenate". "Plus" sounds a little strange to me; I wouldn't expect it to be used in a non-numerical context. But that could as well come from my non-English background.
Maybe I would use this scheme:
AddToCopy
RemoveFromCopy
InsertIntoCopy
These have their own problems though, when I think about it. One could think they remove something from, or add something to, the given argument. Not sure about it at all. Those words do not play nicely in chaining either, I think. Too wordy to type.
Maybe I would just use plain "Add" and friends too. I like how it is used in math:
Add 1 to 2 and you get 3
Well, certainly, the 2 remains a 2 and you get a new number. This is about two numbers and not about a list and an element, but I think it has some analogy. In my opinion, add does not necessarily mean you mutate something. I certainly see your point that a lonely statement containing just an add, and not using the returned new object, does not look buggy. But I've now also thought for some time about the idea of using another name than "add", and I just can't come up with one without thinking "hmm, I would need to look at the documentation to know what it is about", because its name differs from what I would expect to be called "add". Just some weird thought about this from litb; not sure it makes sense at all :)
Looking at http://thesaurus.reference.com/browse/add and http://thesaurus.reference.com/browse/plus I found gain and affix but I'm not sure how much they imply non-mutation.
I think that Plus() and Minus() or, alternatively, Including(), Excluding() are reasonable at implying immutable behavior.
However, no naming choice will ever make it perfectly clear to everyone, so I personally believe that a good xml doc comment would go a very long way here. VS throws these right in your face when you write code in the IDE - they're hard to ignore.
Append - because, note that names of the System.String methods suggest that they mutate the instance, but they don't.
Or I quite like AfterAppending:
void test()
{
    Bar bar = new Bar();
    List list = bar.AfterAppending("foo");
}
list.CopyWith(element)
As does Smalltalk :)
And also list.copyWithout(element) that removes all occurrences of an element, which is most useful when used as list.copyWithout(null) to remove unset elements.
I would go for Add, because I can see the benefit of a better name, but the problem would be to find different names for every other immutable operation which might make the class quite unfamiliar if that makes sense.

Spartan Programming

I really enjoyed Jeff's post on Spartan Programming. I agree that code like that is a joy to read. Unfortunately, I'm not so sure it would necessarily be a joy to work with.
For years I have read about and adhered to the "one-expression-per-line" practice. I have fought the good fight and held my ground when many programming books countered this advice with example code like:
while (bytes = read(...))
{
    ...
}

while (GetMessage(...))
{
    ...
}
Recently, I've advocated one expression per line for more practical reasons - debugging and production support. Getting a log file from production that claims a NullPointerException at "line 65", which reads:
ObjectA a = getTheUser(session.getState().getAccount().getAccountNumber());
is frustrating and entirely avoidable. Short of grabbing an expert with the code that can choose the "most likely" object that was null ... this is a real practical pain.
One expression per line also helps out quite a bit while stepping through code. I practice this with the assumption that most modern compilers can optimize away all the superfluous temp objects I've just created ...
I try to be neat - but cluttering my code with explicit objects sure feels laborious at times. It does not generally make the code easier to browse - but it really has come in handy when tracing things down in production or stepping through my or someone else's code.
What style do you advocate and can you rationalize it in a practical sense?
In The Pragmatic Programmer, Hunt and Thomas talk about the Law of Demeter, which focuses on the coupling of functions to modules other than their own. By never allowing a function to reach into a 3rd level of coupling, you significantly reduce the number of errors and increase the maintainability of the code.
So:
ObjectA a = getTheUser(session.getState().getAccount().getAccountNumber());
Is close to a felony because we are 4 objects down the rat hole. That means to change something in one of those objects I have to know that you called this whole stack right here in this very method. What a pain.
Better:
Account.getUser();
Note this runs counter to the expressive forms of programming that are now really popular with mocking software. The trade off there is that you have a tightly coupled interface anyway, and the expressive syntax just makes it easier to use.
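Roughly, the delegation looks like this (a Java sketch with invented types, following the example above): the caller asks the session for what it actually needs, and the chain of intermediate objects stays hidden inside.

class Session {
    record Account(String number) {}
    record State(Account account) {}

    private final State state;

    Session(State state) { this.state = state; }

    // Callers stay at one level of coupling: getTheUser(session.accountNumber())
    String accountNumber() {
        return state.account().number();
    }
}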
I think the ideal solution is to find a balance between the extremes. There is no way to write a rule that will fit in all situations; it comes with experience. Declaring each intermediate variable on its own line will make reading the code more difficult, which will also contribute to the difficulty in maintenance. By the same token, debugging is much more difficult if you inline the intermediate values.
The 'sweet spot' is somewhere in the middle.
One expression per line.
There is no reason to obfuscate your code. The extra time you take typing the few extra terms, you save in debug time.
I tend to err on the side of readability, not necessarily debuggability. The examples you gave should definitely be avoided, but I feel that judicious use of multiple expressions can make the code more concise and comprehensible.
I'm usually in the "shorter is better" camp. Your example is good:
ObjectA a = getTheUser(session.getState().getAccount().getAccountNumber());
I would cringe if I saw that over four lines instead of one--I don't think it'd make it easier to read or understand. The way you presented it here, it's clear that you're digging for a single object. This isn't better:
obja State = session.getState();
objb Account = State.getAccount();
objc AccountNumber = Account.getAccountNumber();
ObjectA a = getTheUser(AccountNumber);
This is a compromise:
objb Account = session.getState().getAccount();
ObjectA a = getTheUser(Account.getAccountNumber());
but I still prefer the single line expression. Here's an anecdotal reason: it's difficult for me to reread and error-check the 4-liner right now for dumb typos; the single line doesn't have this problem because there are simply fewer characters.
ObjectA a = getTheUser(session.getState().getAccount().getAccountNumber());
This is a bad example, probably because you just wrote something from the top of your head.
You are assigning, to variable named a of type ObjectA, the return value of a function named getTheUser.
So let's assume you wrote this instead:
User u = getTheUser(session.getState().getAccount().getAccountNumber());
I would break this expression like so:
Account acc = session.getState().getAccount();
User user = getTheUser( acc.getAccountNumber() );
My reasoning is: how would I think about what I am doing with this code?
I would probably think: "first I need to get the account from the session and then I get the user using that account's number".
The code should read the way you think. Variables should refer to the main entities involved; not so much to their properties (so I wouldn't store the account number in a variable).
A second factor to have in mind is: will I ever need to refer to this entity again in this context?
If, say, I'm pulling more stuff out of the session state, I would introduce SessionState state = session.getState().
This all seems obvious, but I'm afraid I have some difficulty putting in words why it makes sense, not being a native English speaker and all.
Maintainability, and with it, readability, is king. Luckily, shorter very often means more readable.
Here are a few tips I enjoy using to slice and dice code:
Variable names: how would you describe this variable to someone else on your team? You would not say "the numberOfLinesSoFar integer". You would say "numLines" or something similar - comprehensible and short. Don't pretend like the maintainer doesn't know the code at all, but make sure you yourself could figure out what the variable is, even if you forgot your own act of writing it. Yes, this is kind of obvious, but it's worth more effort than I see many coders put into it, so I list it first.
Control flow: Avoid lots of closing clauses at once (a series of }'s in C++). Usually when you see this, there's a way to avoid it. A common case is something like:
if (things_are_ok) {
    // Do a lot of stuff.
    return true;
} else {
    ExpressDismay(error_str);
    return false;
}
can be replaced by
if (!things_are_ok) return ExpressDismay(error_str);
// Do a lot of stuff.
return true;
if we can get ExpressDismay (or a wrapper thereof) to return false.
Another case is:
Loop iterations: the more standard, the better. For shorter loops, it's good to use one-character iterators when the variable is never used except as an index into a single object.
The particular case I would argue here is against the "right" way to use an STL container:
for (vector<string>::iterator a_str = my_vec.begin(); a_str != my_vec.end(); ++a_str)
is a lot wordier, and requires overloaded pointer operators *a_str or a_str->size() in the loop. For containers that have fast random access, the following is a lot easier to read:
for (int i = 0; i < my_vec.size(); ++i)
with references to my_vec[i] in the loop body, which won't confuse anyone.
Finally, I often see coders take pride in their line counts. But it's not the number of lines that counts! I'm not sure of the best way to implement this, but if you have any influence over your coding culture, I'd try to shift the reward toward those with compact classes :)
Good explanation. I think this is a version of the general divide-and-conquer mentality.