Should I write more descriptive function names or add comments?

This is a language-agnostic question, but I'm wondering what people prefer in terms of readability and maintainability... My hypothetical situation is that I'm writing a function which, given a sequence, will return a copy with all duplicate elements removed and the order reversed.
/*
*This is an extremely well written function to return a sequence containing
*all the unique elements of OriginalSequence with their order reversed
*/
ReturnSequence SequenceFunction(OriginalSequence)
{...}
OR
UniqueAndReversedSequence MakeSequenceUniqueAndReversed(OriginalSequence)
{....}
The above is meant as a clear example of using comments in the first instance, or very verbose function names in the second, to describe what the function does.
Cheers,
Richard

I prefer the verbose function name, as it makes the call site more readable. Of course, some function names (like your example) can get really long.
Perhaps a better name for your example function would be ReverseAndDedupe. Uh oh, now it is a little more clear that we have a function with two responsibilities*. Perhaps it would be even better to split this out into two functions: Reverse and Dedupe.
Now the call-site becomes even more readable:
Reverse(Dedupe(someSequence))
*Note: My rule of thumb is that any function that contains "and" in its name has too many responsibilities and needs to be split up into separate functions.
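A minimal JavaScript sketch of that split, assuming sequences are plain arrays and that dedupe keeps the first occurrence of each element. The names reverse and dedupe are illustrative, not from any library; both return new arrays, so the input is never mutated:

```javascript
// Return a reversed copy; slice() copies first so the input array is untouched.
function reverse(sequence) {
  return sequence.slice().reverse();
}

// Return a copy with duplicates removed, keeping each element's first occurrence.
// A Set preserves insertion order, so survivors keep their original order.
// Equality is Set's (SameValueZero): fine for primitives, by reference for objects.
function dedupe(sequence) {
  return [...new Set(sequence)];
}

const someSequence = [1, 2, 2, 3, 1];
const result = reverse(dedupe(someSequence)); // [3, 2, 1]
```

Each function now has one job and a short name, and the call site reads the same as in the answer.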

Personally I prefer the second way - it's easy to see from the function name what it does - and because the code inside the function is well written anyway it'll be easy to work out exactly what happens inside it.
The problem I find with comments is they very quickly go out of date - there's no compile time check to ensure your comment is correct!
Also, you don't get access to the comment in the places where the function is actually called.
Very much a subjective question though!

Ideally you would do a combination of the two. Try to keep your method names concise but descriptive enough to give a good idea of what the method will do. If the method name could leave anything unclear, add comments to help the reader follow the logic.

Even with descriptive names you should still be concise. I think what you have in the example is overkill. I would have written
UniqueSequence Reverse(Sequence)

I comment where an explanation is needed that a descriptive name cannot adequately convey. If a peculiarity in a library forced me to do something that looks non-standard, or there's value in dropping a comment inline, I'll do that; otherwise I rely on well-named methods and don't comment much, except while I'm writing the code, and those comments are for myself. They typically get removed when it is done.
Generally speaking, function header comments are just more lines to maintain, and they require the reader to look at both the comment and the code and then decide which is correct if they disagree. Obviously the truth is always in the code. The comment may say X, but comments don't compile to machine code (typically), so...
Comment when necessary and make a habit of naming things well. That's what I do.

I'd probably do one of these:
Call it ReverseAndDedupe (or DedupeAndReverse, depending which one it is -- I'd expect Dedupe alone to keep the first occurrence and discard later ones, so the two operations do not commute). All functions make some postcondition true, so Make can certainly go in order to shorten a too-long name. Functions don't generally need to be named for the types they operate on, and if they are then it should be in a consistent format. So Sequence can probably be removed from your proposed name too, or if it can't then I'd probably call it Sequence_ReverseAndDedupe.
Not create this function at all, and make sure that callers can do either Reverse(Dedupe(x)) or Dedupe(Reverse(x)), depending on which they actually want. It's no more code for them to write, so it's only an issue of whether there's some cunning optimization that only applies when you do both at once. Avoiding an intermediate copy might qualify there, but the general point is that if you can't name your function concisely, make sure there's a good reason why it's doing so many different things.
Call it ReversedAndDeduped if it returns a copy of the original sequence - this is a trick I picked up from Python, where l.sort() sorts the list l in place, and sorted(l) doesn't modify a list l at all.
Give it a name specific to the domain it's used in, rather than trying to make it so generic. Why am I deduping and reversing this list? There might be some term of art that means a list in that state, or some function which can only be performed on such a list. So I could call it 'Renuberate' (because a reversed, deduped list is known as a list "in Renuberated form"), or 'MakeFrobbable' (because Frobbing requires this format).
I'd also comment it (or much better, document it), to explain what type of deduping it guarantees (if any - perhaps the implementation is left free to remove whichever dupes it likes so long as it gets them all).
I wouldn't comment it "extremely well written", although I might comment "highly optimized" to mean "this code is really hard to work with, but goes like the clappers, please don't touch it without running all the performance tests".
I don't think I'd want to go as far as 5-word function names, although I expect I have in the past.
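To make the ordering point concrete: with a dedupe that keeps the first occurrence, the two compositions can give different answers. A small JavaScript check, using hypothetical array helpers written just for this illustration:

```javascript
// Keep the first occurrence of each element (a Set preserves insertion order).
const dedupe = (xs) => [...new Set(xs)];
// Reversed copy; the input is not modified.
const reverse = (xs) => xs.slice().reverse();

const x = [1, 2, 1, 3];
const dedupeThenReverse = reverse(dedupe(x)); // dedupe gives [1, 2, 3], then [3, 2, 1]
const reverseThenDedupe = dedupe(reverse(x)); // reverse gives [3, 1, 2, 1], then [3, 1, 2]
```

The results differ, which is why a combined function's name has to say which order it applies.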

Related

Should functions be specific or generic [duplicate]

Closed 12 years ago.
Possible Duplicate:
Specific functions vs many Arguments vs context dependent
So I've been developing for 3-4 years now, know a wide range of languages, know some impressive (to the small minded :P ) stuff.
But something I've always wondered; when I make a function should it be for a specific purpose, or should it be moulded to be re-usable, even if I have no need for it to be?
E.G:
//JS, but could be any language really
//specific
function HAL(){
alert("I'm afraid I can't let you do that, " + document.getElementById("Name").value + ".");
}
//generic
function HAL(nme){
alert("I'm afraid I can't let you do that, " + nme + ".");
}
//more generic
function HAL(msg, nme){
alert(msg + " " + nme + ".");
}
Yes, it's a very simple example, but it conveys the point I want to make. Taking this example, would I ever use it in any way other than the first? Probably not, so I'd be tempted to write it that way. Common sense would (now) push me toward the second, yet I can't see any benefit in it if I know it's not going to be used any other way, i.e. it's always going to use the input's value (yes, I would normally put that into a global variable).
Is it just a case of whatever I feel makes the most sense at the time, or should I follow the 2nd pattern as best I can?
In that particular case, I would write the first function for now (YAGNI, right?), and probably never need to change it. Then, if it turned out I did need to support alternate names, I'd make the current behavior the default, but allow an optional parameter to specify a name. Likewise with the message.
# In Ruby, but like you say, could be in anything:
# specific
def hal
  puts "I'm afraid I can't let you do that, #{fetch_name}."
end
# genericized refactoring
def hal(name = fetch_name)
  puts "I'm afraid I can't let you do that, #{name}."
end
Typically, that's the approach I prefer to take: create functions at whatever is the most convenient degree of specificity for my current needs, but leave the door open for a more generalized approach later.
It helps that I use languages like Ruby that make this easy, but you can take the same approach to some extent even in Java or C. For example, in Java you might make a specific method with no parameters first, and then later refactor to a more generalized method with a "name" parameter plus a no-parameter wrapper that fills in the default name.
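JavaScript supports the same pattern, since a default parameter value is evaluated at call time, and only when the argument is omitted (or passed as undefined). A sketch, where fetchName is a hypothetical stand-in for reading the input field, and hal returns the message instead of calling alert so the result is easy to check:

```javascript
// Hypothetical stand-in for document.getElementById("Name").value.
function fetchName() {
  return "Dave";
}

// Specific behaviour by default; an explicit name generalizes it later
// without breaking existing zero-argument callers.
function hal(name = fetchName()) {
  return "I'm afraid I can't let you do that, " + name + ".";
}

hal();        // "I'm afraid I can't let you do that, Dave."
hal("Frank"); // "I'm afraid I can't let you do that, Frank."
```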
A rule of thumb is that a function should have minimal side effects.
So, really, it would look something like this:
//By the way - don't call functions nouns. functions are verbs. data are nouns
void HAL(string s)
{
voicetype_t vt = voice.type();
voice.type(VOICE_OF_DOOM);
voice.say(s);
voice.type(vt);
}
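A JavaScript sketch of the same save, change, restore idea, with a try/finally added so the previous voice type is restored even if say throws. The voice object and VOICE_OF_DOOM are hypothetical, mirroring the pseudocode above:

```javascript
const VOICE_OF_DOOM = "doom";

// Hypothetical voice device with a settable type.
const voice = {
  currentType: "normal",
  spoken: [],
  // Acts as a getter when called with no argument, a setter otherwise.
  type(newType) {
    if (newType === undefined) return this.currentType;
    this.currentType = newType;
  },
  say(text) {
    this.spoken.push(`[${this.currentType}] ${text}`);
  },
};

function hal(text) {
  const previous = voice.type();
  voice.type(VOICE_OF_DOOM);
  try {
    voice.say(text);
  } finally {
    // Undo the change to the shared voice state, even on error.
    voice.type(previous);
  }
}

hal("I'm afraid I can't let you do that.");
// voice.type() is back to "normal"; the message was spoken in the doom voice.
```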
A function shouldn't be just a series of statements to be called in some other context. It should be a unit of functionality that you want to abstract. Making a function specific is good, but making it context-sensitive is bad. What you should do is use the generic way (the last one) presented in your post, but provide the messages as constants. The language you use has some way to declare constants, right?
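A sketch of that suggestion in JavaScript: the generic function takes both the message and the name, and the message lives in a named constant. HAL_REFUSAL is an illustrative name, and like the other sketches here it returns the string rather than calling alert:

```javascript
// Messages as named constants instead of literals scattered through the code.
const HAL_REFUSAL = "I'm afraid I can't let you do that,";

// Generic: knows nothing about the DOM or about which message is being shown.
function hal(msg, name) {
  return msg + " " + name + ".";
}

hal(HAL_REFUSAL, "Dave"); // "I'm afraid I can't let you do that, Dave."
```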
In your example, I wouldn't make it generic. If some functionality can be used in many cases, make it generic so you can use it all the time without "copy, paste, make minor change, repeat". But telling the user he can't do that, and addressing him as [contents of a certain input field], is useful in only one case. Plus, the last version is pointless.
However, I generally prefer my code to be as generic as feasible. Well, as long as the odds are that I will need it one day... let's not violate YAGNI too hard. But if it can be generic without hassle, why not?
In my opinion, functions should be genericized only to the extent that their purposes need to be. In other words, you should concede to the fact that, although we want to think differently, not everything is reusable, and thus, you shouldn't go out of your way to implement everything to be like that. Programmers should be conscious of the scope (and possibly the future development) of the product, so ultimately one should use their intuition as to how far to take generalizations of functions.
As for your examples, #3 is completely worthless, as it only inserts a space between two strings and appends a period at the end. Why would someone do this with a special function? I know that's only an example, but if we're talking about how far to generalize a method, something like that is taking it too far, almost to the point where it's just wasted LOC, and that is never something to sacrifice for the sake of generalizing.

How do you work around the need for apostrophes in certain function names?

When I'm programming, I often find myself writing functions whose names should (to be proper English) contain apostrophes (too bad C started everyone thinking that an apostrophe was an appropriate delimiter). For example: get_user's_group() -> get_users_group(). What do you do about that forced, ambiguous bad English? Just ignore the apostrophe? Create a different phrasing?
In that case, I would do get_group_for_user().
So, yes, I would "create a different phrasing" :)
Either that, or user.get_group().
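Both phrasings side by side in JavaScript, assuming a hypothetical User class that simply stores its group name. The free function carries the context in its name; the method gets it from the object:

```javascript
// Hypothetical user type; "group" is just a string here.
class User {
  constructor(name, group) {
    this.name = name;
    this.group = group;
  }

  // The receiver supplies the "user's" part, so the name can stay short.
  getGroup() {
    return this.group;
  }
}

// Without an object to hang it on, reword so no possessive is needed.
function getGroupForUser(user) {
  return user.group;
}

const alice = new User("alice", "admins");
getGroupForUser(alice); // "admins"
alice.getGroup();       // "admins"
```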
getGroupForUser()
or
getGroupByUser()
My original answer of "Ignore it, move on!" is incomplete. You should ignore the fact that you can't use ' in your method/function names, but you should keep working on their names so that they better explain what they do. I think this is a worthwhile pursuit in programming.
Picking on JavaScript, you could use apostrophes if you wanted to:
const user = {
"get_user's_group": () => console.log("Naming things! Am I right?!")
}
user["get_user's_group"]()
But don't do that 😬
Taking it further, you could use a transpiler to take your grammatically correct name and transform it into something you never see.
Again with JavaScript as an example, maybe you could write a Babel transform.
But don't do that 😛
As others have said, if there is context available from an object, that's a nice option:
user.get_group()
Failing that, the context of the surrounding code should be enough to make this your choice:
get_users_group()
How about getGroupByUser?
Either get_user_ApostropheShouldBeHereButLanguageWillNotLetMe_s_group or just ignore it because it really doesn't matter.
I ignore the apostrophe: getGroupByUser and group_from_user are both perfectly understandable. Worrying about correct grammar in your function names is a waste of time and distracts from the real goal of having clear and understandable function names.
Insisting on proper English in function naming is a bit extreme...
I mean, why does the apostrophe bother you but the _ instead of a space doesn't?
Depending on the programming language you may be able to use Unicode variable names, this SO thread lists a few.
With Unicode identifiers you could use one of the Unicode apostrophes to give your variable name proper English formatting. Though this is only speculative, and it would be hard to maintain. Actually, now that I think about it, it sounds downright evil.
Two points: First, don't use a name that would otherwise require an apostrophe if you can avoid it. Second, you are right in being concerned about ambiguity. For example, you could have:
getUsersGroup: gets the group of a list of users. If you are using an object-oriented language, this could have more information than just a group ID string. You could also have something like createUsersGroup, which would create a group object from a list of users passed in.
getGroupOfUser: takes in some sort of user object; returns the name of the group of the user
getGroupByUserId: takes in the user's name or a unique ID associated with that user; returns the name of the group of the user
The best way to delineate the difference between all of these is to just use standard method comments that explain the method names. This would depend on what language you are working with and what style of method comments your organization conventionally uses.
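A JavaScript sketch of why those three names are not interchangeable: each one takes a different kind of argument. The groupsByUserId table and the user objects are hypothetical data for illustration only:

```javascript
// Hypothetical lookup from user ID to group name.
const groupsByUserId = new Map([
  [1, "admins"],
  [2, "editors"],
]);

// Takes a user object; returns that user's group name.
function getGroupOfUser(user) {
  return groupsByUserId.get(user.id);
}

// Takes a user ID; returns that user's group name.
function getGroupByUserId(id) {
  return groupsByUserId.get(id);
}

// Takes a list of users; builds a group object containing them.
function getUsersGroup(users) {
  return { members: users.map((u) => u.id) };
}

const alice = { id: 1 };
const bob = { id: 2 };
getGroupOfUser(alice);       // "admins"
getGroupByUserId(2);         // "editors"
getUsersGroup([alice, bob]); // { members: [1, 2] }
```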
Normally I just drop the apostrophe, but do back-ticks work? (get_user`s_group)
getGroupOfUser? getUserGroup?
It's a programming language, not literature...
It would be getBackgroundColour in proper English (rather than getBackgroundColor)
Personally I'd write get_user_group() rather than get_group_for_user() since it feels like it reads better to me. Of course, I use a programming language where apostrophes are allowed in names:
proc get_user's_group {id} {#...}
Although, some of the more prolific non-English-native European users use it as a word separator:
proc user'group {id} {#...}
To each their own, I guess.

What's the best name for a non-mutating "add" method on an immutable collection? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 11 months ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
Sorry for the waffly title - if I could come up with a concise title, I wouldn't have to ask the question.
Suppose I have an immutable list type. It has an operation Foo(x) which returns a new immutable list with the specified argument as an extra element at the end. So to build up a list of strings with values "Hello", "immutable", "world" you could write:
var empty = new ImmutableList<string>();
var list1 = empty.Foo("Hello");
var list2 = list1.Foo("immutable");
var list3 = list2.Foo("word");
(This is C# code, and I'm most interested in a C# suggestion if you feel the language is important. It's not fundamentally a language question, but the idioms of the language may be important.)
The important thing is that the existing lists are not altered by Foo - so empty.Count would still return 0.
Another (more idiomatic) way of getting to the end result would be:
var list = new ImmutableList<string>().Foo("Hello")
.Foo("immutable")
.Foo("word");
My question is: what's the best name for Foo?
EDIT 3: As I reveal later on, the type might not actually be named ImmutableList<T>, a name that makes its immutability clear. Imagine instead that it's TestSuite, and that it's immutable because the whole of the framework it's part of is immutable...
(End of edit 3)
Options I've come up with so far:
Add: common in .NET, but implies mutation of the original list
Cons: I believe this is the normal name in functional languages, but meaningless to those without experience in such languages
Plus: my favourite so far, it doesn't imply mutation to me. Apparently this is also used in Haskell but with slightly different expectations (a Haskell programmer might expect it to add two lists together rather than adding a single value to the other list).
With: consistent with some other immutable conventions, but doesn't have quite the same "additionness" to it IMO.
And: not very descriptive.
Operator overload for + : I really don't like this much; I generally think operators should only be applied to lower level types. I'm willing to be persuaded though!
The criteria I'm using for choosing are:
Gives the correct impression of the result of the method call (i.e. that it's the original list with an extra element)
Makes it as clear as possible that it doesn't mutate the existing list
Sounds reasonable when chained together as in the second example above
Please ask for more details if I'm not making myself clear enough...
EDIT 1: Here's my reasoning for preferring Plus to Add. Consider these two lines of code:
list.Add(foo);
list.Plus(foo);
In my view (and this is a personal thing) the latter is clearly buggy - it's like writing "x + 5;" as a statement on its own. The first line looks like it's okay, until you remember that it's immutable. In fact, the way that the plus operator on its own doesn't mutate its operands is another reason why Plus is my favourite. Without the slight ickiness of operator overloading, it still gives the same connotations, which include (for me) not mutating the operands (or method target in this case).
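A minimal JavaScript sketch of the behaviour being named, using plus (the asker's current favourite) as the method name. ImmutableList here is a hypothetical class backed by a frozen array; copying the whole array on every call is the simplest approach, not an efficient one:

```javascript
class ImmutableList {
  constructor(items = []) {
    // Frozen copy, so neither the caller's array nor later code can change it.
    this.items = Object.freeze([...items]);
    Object.freeze(this);
  }

  get count() {
    return this.items.length;
  }

  // Returns a new list with value appended; `this` is left unchanged.
  plus(value) {
    return new ImmutableList([...this.items, value]);
  }
}

const empty = new ImmutableList();
const list = empty.plus("Hello").plus("immutable").plus("word");

empty.plus("ignored"); // runs without error, but the new list is thrown away
empty.count;           // 0
list.count;            // 3
```

The discarded call on its own line is exactly the "x + 5;" situation from the edit above: nothing stops it, so the name has to make it look wrong.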
EDIT 2: Reasons for not liking Add.
Various answers are effectively: "Go with Add. That's what DateTime does, and String has Replace methods etc which don't make the immutability obvious." I agree, there's precedent here. However, I've seen plenty of people call DateTime.Add or String.Replace and expect mutation. There are loads of newsgroup questions (and probably SO ones if I dig around) which are answered by "You're ignoring the return value of String.Replace; strings are immutable, a new string gets returned."
Now, I should reveal a subtlety to the question - the type might not actually be an immutable list, but a different immutable type. In particular, I'm working on a benchmarking framework where you add tests to a suite, and that creates a new suite. It might be obvious that:
var list = new ImmutableList<string>();
list.Add("foo");
isn't going to accomplish anything, but it becomes a lot murkier when you change it to:
var suite = new TestSuite<string, int>();
suite.Add(x => x.Length);
That looks like it should be okay. Whereas this, to me, makes the mistake clearer:
var suite = new TestSuite<string, int>();
suite.Plus(x => x.Length);
That's just begging to be:
var suite = new TestSuite<string, int>().Plus(x => x.Length);
Ideally, I would like my users not to have to be told that the test suite is immutable. I want them to fall into the pit of success. This may not be possible, but I'd like to try.
I apologise for over-simplifying the original question by talking only about an immutable list type. Not all collections are quite as self-descriptive as ImmutableList<T> :)
In situations like that, I usually go with Concat. That usually implies to me that a new object is being created.
var p = listA.Concat(listB);
var k = listA.Concat(item);
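JavaScript's built-in arrays follow the same convention, which may be a useful precedent: Array.prototype.concat never modifies the receiver and returns a new array, whether it is given another array or a single item.

```javascript
const listA = ["a", "b"];
const listB = ["c"];

const p = listA.concat(listB); // ["a", "b", "c"]: array arguments are spread one level
const k = listA.concat("d");   // ["a", "b", "d"]: non-array arguments are appended as-is

// listA is still ["a", "b"]; concat returned new arrays instead of mutating it.
```

One caveat of overloading a single name for both cases: because array arguments are spread, adding an array as one element needs an extra wrapper, e.g. listA.concat([["x"]]).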
I'd go with Cons, for one simple reason: it means exactly what you want it to.
I'm a huge fan of saying exactly what I mean, especially in source code. A newbie will have to look up the definition of Cons only once, but then read and use that a thousand times. I find that, in the long term, it's nicer to work with systems that make the common case easier, even if the up-front cost is a little bit higher.
The fact that it would be "meaningless" to people with no FP experience is actually a big advantage. As you pointed out, all of the other words you found already have some meaning, and that meaning is either slightly different or ambiguous. A new concept should have a new word (or in this case, an old one). I'd rather somebody have to look up the definition of Cons, than to assume incorrectly he knows what Add does.
Other operations borrowed from functional languages often keep their original names, with no apparent catastrophes. I haven't seen any push to come up with synonyms for "map" and "reduce" that sound more familiar to non-FPers, nor do I see any benefit from doing so.
(Full disclosure: I'm a Lisp programmer, so I already know what Cons means.)
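For readers without the Lisp background, a minimal JavaScript sketch of what cons does on a persistent singly linked list (cons, EMPTY and toArray are illustrative helpers, not a library API). Note that cons conventionally adds to the front, whereas the question's Foo appends to the end; the old list is shared rather than copied:

```javascript
// The empty list.
const EMPTY = null;

// Build a new cell whose tail is the existing list; nothing is copied or mutated.
function cons(head, tail) {
  return Object.freeze({ head, tail });
}

// Walk the cells from front to back.
function toArray(list) {
  const out = [];
  for (let cell = list; cell !== EMPTY; cell = cell.tail) {
    out.push(cell.head);
  }
  return out;
}

const list1 = cons("world", EMPTY);
const list2 = cons("Hello", list1);

toArray(list1);       // ["world"]
toArray(list2);       // ["Hello", "world"]
list2.tail === list1; // true: list2 reuses list1 as its tail
```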
Actually I like And, especially in the idiomatic way. I'd especially like it if you had a static readonly property for the Empty list, and perhaps make the constructor private so you always have to build from the empty list.
var list = ImmutableList<string>.Empty.And("Hello")
.And("Immutable")
.And("Word");
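A rough JavaScript analogue of that design, with a hypothetical ImmutableList class. JavaScript has no private constructors, so this sketch only covers the shared read-only Empty instance; Object.defineProperty defaults to non-writable and non-configurable, which plays the role of "static readonly":

```javascript
class ImmutableList {
  constructor(items = []) {
    // Frozen copy so later changes to the caller's array can't leak in.
    this.items = Object.freeze([...items]);
    Object.freeze(this);
  }

  // Returns a new list with value appended; the receiver is unchanged.
  and(value) {
    return new ImmutableList([...this.items, value]);
  }
}

// A single shared empty instance that cannot be reassigned.
Object.defineProperty(ImmutableList, "EMPTY", {
  value: new ImmutableList(),
  enumerable: true,
});

const list = ImmutableList.EMPTY.and("Hello").and("Immutable").and("Word");
// list.items is ["Hello", "Immutable", "Word"]; ImmutableList.EMPTY.items is still []
```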
Whenever I'm in a jam with nomenclature, I hit up the interwebs.
thesaurus.com returns this for "add":
Definition: adjoin, increase; make further comment
Synonyms: affix, annex, ante, append, augment, beef up, boost, build up, charge up, continue, cue in, figure in, flesh out, heat up, hike, hike up, hitch on, hook on, hook up with, include, jack up, jazz up, join together, pad, parlay, piggyback, plug into, pour it on, reply, run up, say further, slap on, snowball, soup up, speed up, spike, step up, supplement, sweeten, tack on, tag
I like the sound of Adjoin, or more simply Join. That is what you're doing, right? The method could also apply to joining other ImmutableList<>'s.
Personally, I like .With(). If I was using the object, after reading the documentation or the code comments, it would be clear what it does, and it reads ok in the source code.
object.With("My new item as well");
Or, you add "Along" with it.. :)
object.AlongWith("this new item");
I ended up going with Add for all of my Immutable Collections in BclExtras. The reason is that it's an easy, predictable name. I'm not worried so much about people confusing Add with a mutating add, since the name of the type is prefixed with Immutable.
For a while I considered Cons and other functional-style names. Eventually I discounted them because they're not nearly as well known. Sure, functional programmers will understand, but they're not the majority of users.
Other names you mentioned:
Plus: I'm wishy-washy on this one. For me this doesn't distinguish it as a non-mutating operation any more than Add does
With: Will cause issues with VB (pun intended)
Operator overloading: Discoverability would be an issue
Options I considered:
Concat: Strings are immutable and use this. Unfortunately it's only really good for adding to the end
CopyAdd: Copy what? The source, the list?
AddToNewList: Maybe a good one for List. But what about a Collection, Stack, Queue, etc ...
Unfortunately there doesn't really seem to be a word that is
Definitely an immutable operation
Understandable to the majority of users
Representable in less than 4 words
It gets even odder when you consider collections other than List. Take for instance Stack. Even first-year programmers can tell you that Stacks have a Push/Pop pair of methods. If you create an ImmutableStack and give it a completely different name, let's call it Foo/Fop, you've just added more work for them to use your collection.
Edit: Response to Plus Edit
I see where you're going with Plus. I think a stronger case would actually be Minus for remove. If I saw the following I would certainly wonder what in the world the programmer was thinking
list.Minus(obj);
The biggest problem I have with Plus/Minus or a new pairing is it feels like overkill. The collection itself already has a distinguishing name, the Immutable prefix. Why go further by adding vocabulary whose intent is to add the same distinction as the Immutable prefix already did.
I can see the call site argument. It makes it clearer from the standpoint of a single expression. But in the context of the entire function it seems unnecessary.
Edit 2
Agree that people have definitely been confused by String.Concat and DateTime.Add. I've seen several very bright programmers hit this problem.
However I think ImmutableList is a different argument. There is nothing about String or DateTime that establishes it as Immutable to a programmer. You must simply know that it's immutable via some other source. So the confusion is not unexpected.
ImmutableList does not have that problem because the name defines its behavior. You could argue that people don't know what Immutable means, and I think that's also valid. I certainly didn't know it until about year 2 in college. But you have the same issue with whatever name you choose instead of Add.
Edit 3: What about types like TestSuite which are immutable but do not contain the word?
I think this drives home the idea that you shouldn't be inventing new method names. Namely because there is clearly a drive to make types immutable in order to facilitate parallel operations. If you focus on changing the name of methods for collections, the next step will be the mutating method names on every type you use that is immutable.
I think it would be a more valuable effort to instead focus on making types identifiable as Immutable. That way you can solve the problem without rethinking every mutating method pattern out there.
Now how can you identify TestSuite as Immutable? In today's environment I think there are a few ways:
Prefix with Immutable: ImmutableTestSuite
Add an Attribute which describes the level of immutability. This is certainly less discoverable
Not much else.
My guess/hope is that development tools will start helping with this problem by making it easy to identify immutable types simply by sight (different color, stronger font, etc ...). But I think that's the answer, rather than changing all of the method names.
I think this may be one of those rare situations where it's acceptable to overload the + operator. In math terminology, we know that + doesn't append something to the end of something else. It always combines two values together and returns a new resulting value.
For example, it's intuitively obvious that when you say
x = 2 + 2;
the resulting value of x is 4, not 22.
Similarly,
var empty = new ImmutableList<string>();
var list1 = empty + "Hello";
var list2 = list1 + "immutable";
var list3 = list2 + "word";
should make clear what each variable is going to hold. It should be clear that list2 is not changed in the last line, but instead that list3 is assigned the result of appending "word" to list2.
Otherwise, I would just name the function Plus().
To be as clear as possible, you might want to go with the wordier CopyAndAdd, or something similar.
I would call it Extend() or maybe ExtendWith() if you feel like really verbose.
Extends means adding something to something else without changing it. I think this is very relevant terminology in C# since this is similar to the concept of extension methods - they "add" a new method to a class without "touching" the class itself.
Otherwise, if you really want to emphasize that you don't modify the original object at all, using some prefix like Get- looks unavoidable to me.
Added(), Appended()
I like to use the past tense for operations on immutable objects. It conveys the idea that you aren't changing the original object, and it's easy to recognize when you see it.
Also, because mutating method names are often present-tense verbs, it applies to most of the immutable-method-name-needed cases you run into. For example an immutable stack has the methods "pushed" and "popped".
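A minimal JavaScript sketch of the past-tense convention applied to an immutable stack. ImmutableStack, pushed, popped and peek are illustrative names; the stack is a persistent linked list, so pushed shares the existing stack instead of copying it:

```javascript
class ImmutableStack {
  constructor(top, rest) {
    this.top = top;   // undefined for the empty stack
    this.rest = rest; // null for the empty stack
    Object.freeze(this);
  }

  get isEmpty() {
    return this.rest === null;
  }

  // "pushed": the stack as it would be after a push; the receiver is unchanged.
  pushed(value) {
    return new ImmutableStack(value, this);
  }

  // "popped": the stack as it would be after a pop.
  popped() {
    if (this.isEmpty) throw new Error("popped() on an empty stack");
    return this.rest;
  }

  peek() {
    if (this.isEmpty) throw new Error("peek() on an empty stack");
    return this.top;
  }
}

const emptyStack = new ImmutableStack(undefined, null);
const s1 = emptyStack.pushed(1);
const s2 = s1.pushed(2);

s2.peek();          // 2
s2.popped().peek(); // 1
s1.peek();          // 1: s1 was not changed by s1.pushed(2)
```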
I like mmyers suggestion of CopyAndAdd. In keeping with a "mutation" theme, maybe you could go with Bud (asexual reproduction), Grow, Replicate, or Evolve? =)
EDIT: To continue with my genetic theme, how about Procreate, implying that a new object is made which is based on the previous one, but with something new added.
This is probably a stretch, but in Ruby there is a commonly used notation for the distinction: add doesn't mutate; add! mutates. If this is a pervasive problem in your project, you could do that too (not necessarily with non-alphabetic characters, but consistently using a notation to indicate mutating/non-mutating methods).
Join seems appropriate.
Maybe the confusion stems from the fact that you want two operations in one. Why not separate them? DSL style:
var list = new ImmutableList<string>("Hello");
var list2 = list.Copy().With("World!");
Copy would return an intermediate object, that's a mutable copy of the original list. With would return a new immutable list.
Update:
But, having an intermediate, mutable collection around is not a good approach. The intermediate object should be contained in the Copy operation:
var list1 = new ImmutableList<string>("Hello");
var list2 = list1.Copy(list => list.Add("World!"));
Now, the Copy operation takes a delegate, which receives a mutable list, so that it can control the copy outcome. It can do much more than appending an element, like removing elements or sorting the list. It can also be used in the ImmutableList constructor to assemble the initial list without intermediary immutable lists.
public ImmutableList<T> Copy(Action<IList<T>> mutate) {
if (mutate == null) return this;
var list = new List<T>(this);
mutate(list);
return new ImmutableList<T>(list);
}
Now there's no possibility of misinterpretation by the users, they will naturally fall into the pit of success.
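The delegate-based Copy translates fairly directly to JavaScript. A sketch, assuming a hypothetical ImmutableList backed by a frozen array: the callback receives a plain mutable scratch array, and the constructor copies and freezes the result, so even if the callback keeps a reference to the scratch array, later changes to it can't reach the new list:

```javascript
class ImmutableList {
  constructor(items = []) {
    this.items = Object.freeze([...items]);
    Object.freeze(this);
  }

  // Copy the items into a scratch array, let the caller mutate it,
  // then wrap the result in a new immutable list.
  copy(mutate) {
    if (mutate == null) return this; // no changes requested: share this instance
    const scratch = [...this.items];
    mutate(scratch);
    return new ImmutableList(scratch);
  }
}

const list1 = new ImmutableList(["Hello"]);
const list2 = list1.copy((list) => list.push("World!"));
const list3 = list2.copy((list) => list.splice(0, 1)); // removes the first element
// list1.items: ["Hello"]; list2.items: ["Hello", "World!"]; list3.items: ["World!"]
```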
Yet another update:
If you still don't like the mutable list mention, even now that it's contained, you can design a specification object, that will specify, or script, how the copy operation will transform its list. The usage will be the same:
var list1 = new ImmutableList<string>("Hello");
// rules is a specification object, that takes commands to run in the copied collection
var list2 = list1.Copy(rules => rules.Append("World!"));
Now you can be creative with the rules names and you can only expose the functionality that you want Copy to support, not the entire capabilities of an IList.
For the chaining usage, you can create a reasonable constructor (which will not use chaining, of course):
public ImmutableList(params T[] elements) ...
...
var list = new ImmutableList<string>("Hello", "immutable", "World");
Or use the same delegate in another constructor:
var list = new ImmutableList<string>(rules =>
rules
.Append("Hello")
.Append("immutable")
.Append("World")
);
This assumes that the rules.Append method returns this.
This is what it would look like with your latest example:
var suite = new TestSuite<string, int>(x => x.Length);
var otherSuite = suite.Copy(rules =>
rules
.Append(x => Int32.Parse(x))
.Append(x => x.GetHashCode())
);
A few random thoughts:
ImmutableAdd()
Append()
ImmutableList<T>(ImmutableList<T> originalList, T newItem) Constructor
DateTime in C# uses Add. So why not use the same name, as long as the users of your class understand that the class is immutable?
I think the key thing you're trying to get at, which is hard to express, is the non-mutation, so maybe something with a generative word in it, like CopyWith() or InstancePlus().
I don't think the English language will let you imply immutability in an unmistakable way while using a verb that means the same thing as "Add". "Plus" almost does it, but people can still make the mistake.
The only way you're going to prevent your users from mistaking the object for something mutable is by making it explicit, either through the name of the object itself or through the name of the method (as with the verbose options like "GetCopyWith" or "CopyAndAdd").
So just go with your favourite, "Plus."
First, an interesting starting point:
http://en.wikipedia.org/wiki/Naming_conventions_(programming) ...In particular, check the "See Also" links at the bottom.
I'm in favor of either Plus or And, effectively equally.
Plus and And are both math-based in etymology. As such, both connote a mathematical operation, and both yield expressions that read naturally as resolving to a value, which fits with the method having a return value. And carries an additional logical connotation, but both words apply intuitively to lists. Add connotes an action performed on an object, which conflicts with the method's immutable semantics.
Both are short, which is especially important given the primitiveness of the operation. Simple, frequently-performed operations deserve shorter names.
Expressing immutable semantics is something I prefer to do via context. That is, I'd rather simply imply that this entire block of code has a functional feel; assume everything is immutable. That might just be me, however. I prefer immutability to be the rule; if it's done, it's done a lot in the same place; mutability is the exception.
How about Chain() or Attach()?
I prefer Plus (and Minus). They are easily understandable and map directly to operations involving well known immutable types (the numbers). 2+2 doesn't change the value of 2, it returns a new, equally immutable, value.
Some other possibilities:
Splice()
Graft()
Accrete()
How about mate, mateWith, or coitus, for those who abide? In terms of reproduction, mammals are generally considered immutable.
Going to throw Union out there too. Borrowed from SQL.
Apparently I'm the first Obj-C/Cocoa person to answer this question.
NSString *empty = [[NSString alloc] init];
NSString *list1 = [empty stringByAppendingString:@"Hello"];
NSString *list2 = [list1 stringByAppendingString:@"immutable"];
NSString *list3 = [list2 stringByAppendingString:@"word"];
Not going to win any code golf games with this.
I think "Add" or "Plus" sounds fine. The name of the list itself should be enough to convey the list's immutability.
Maybe there are some words that remind me more of making a copy and adding stuff to that instead of mutating the instance (like "Concatenate"). But I think having some symmetry with words for the other actions would be good too. I don't know of a word for "Remove" that feels the way "Concatenate" does. "Plus" sounds a little strange to me; I wouldn't expect it to be used in a non-numerical context. But that could just as well come from my non-English background.
Maybe I would use this scheme:
AddToCopy
RemoveFromCopy
InsertIntoCopy
These have their own problems, though, when I think about it. One could think they remove something from, or add something to, a given argument. I'm not sure about it at all. Those words do not play nicely with chaining either, I think, and they are too wordy to type.
Maybe I would just use plain "Add" and friends too. I like how it is used in math:
Add 1 to 2 and you get 3
Well, certainly, a 2 remains a 2 and you get a new number. This is about two numbers and not about a list and an element, but I think it is analogous. In my opinion, add does not necessarily mean you mutate something. I certainly see your point that a lonely statement containing just an add, with the returned new object unused, does not look buggy. But I've now also thought for some time about the idea of using a name other than "add", and I just can't come up with one that doesn't make me think "hmm, I would need to look at the documentation to know what it is about", because its name differs from what I would expect to be called "add". Just some weird thoughts about this from litb; not sure it makes sense at all :)
Looking at http://thesaurus.reference.com/browse/add and http://thesaurus.reference.com/browse/plus I found gain and affix but I'm not sure how much they imply non-mutation.
I think that Plus() and Minus() or, alternatively, Including(), Excluding() are reasonable at implying immutable behavior.
However, no naming choice will ever make it perfectly clear to everyone, so I personally believe that a good XML doc comment would go a very long way here. VS throws these right in your face when you write code in the IDE; they're hard to ignore.
Append. Note that the names of the System.String methods suggest that they mutate the instance, but they don't, so there is precedent.
Or I quite like AfterAppending:
void test()
{
Bar bar = new Bar();
List list = bar.AfterAppending("foo");
}
list.CopyWith(element)
As does Smalltalk :)
And also list.copyWithout(element) that removes all occurrences of an element, which is most useful when used as list.copyWithout(null) to remove unset elements.
I would go for Add. I can see the benefit of a better name, but the problem would be finding different names for every other immutable operation, which might make the class feel quite unfamiliar, if that makes sense.

Is hard-coding literals ever acceptable?

The code base I'm currently working on is littered with hard-coded values.
I view all hard coded values as a code smell and I try to eliminate them where possible...however there are some cases that I am unsure about.
Here are two examples that I can think of that make me wonder what the best practice is:
1. MyTextBox.Text = someCondition ? "Yes" : "No"
2. double myPercentage = myValue / 100;
In the first case, is the best thing to do to create a class that allows me to do MyHelper.Yes and MyHelper.No or perhaps something similar in a config file (though it isn't likely to change and who knows if there might ever be a case where its usage would be case sensitive).
In the second case, finding a percentage by dividing by 100 isn't likely to ever change unless the laws of mathematics change...but I still wonder if there is a better way.
Can anyone suggest an appropriate way to deal with this sort of hard coding? And can anyone think of any places where hard coding is an acceptable practice?
And can anyone think of any places where hard coding is an acceptable practice?
Small apps
Single man projects
Throw aways
Short living projects
In short, anything that won't be maintained by others.
Gee, I've just realized how much being a maintenance coder has hurt me in the past :)
The real question isn't about hard coding, but rather about repetition. Take the excellent advice found in "The Pragmatic Programmer": simply Don't Repeat Yourself (DRY).
Taking the principle of DRY, it is fine to hardcode something at any point. However, once you use that particular value again, refactor so this value is only hardcoded once.
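A small C++ sketch of that rule, using a hypothetical retry limit (kMaxRetries, shouldRetry and remainingRetries are illustrative names only): a literal is fine while it appears once, and it moves into a single named constant as soon as a second function needs the same value.

```cpp
// Hypothetical example: the retry limit started life as an inline literal.
// When a second function needed the same value, it was refactored into one
// named constant so the two uses cannot drift apart.
constexpr int kMaxRetries = 3;

bool shouldRetry(int attemptsSoFar) {
    return attemptsSoFar < kMaxRetries;
}

int remainingRetries(int attemptsSoFar) {
    int left = kMaxRetries - attemptsSoFar;
    return left > 0 ? left : 0;  // never report a negative count
}
```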
Of course hard-coding is sometimes acceptable. Following dogma is rarely as useful a practice as using your brain.
(For an example of this, perhaps it's interesting to go back to the goto wars. How many programmers do you know that will swear by all things holy that goto is evil? Why then does Steve McConnell devote a dozen pages to a measured discussion of the subject in Code Complete?)
Sure, there's a lot of hard-gained experience that tells us that small throw-away applications often mutate into production code, but that's no reason for zealotry. The agilists tell us we should do the simplest thing that could possibly work and refactor when needed.
That's not to say that the "simplest thing" shouldn't be readable code. It may make perfect sense, even in a throw-away spike to write:
const MAX_CACHE_RECORDS = 50
foo = GetNewCache(MAX_CACHE_RECORDS)
This is regardless of the fact that in three iterations' time someone might ask for the number of cache records to be configurable, and you might end up refactoring the constant away.
Just remember, if you go to the extremes of stuff like
const ONE_HUNDRED = 100
const ONE_HUNDRED_AND_ONE = 101
we'll all come to The Daily WTF and laugh at you. :-)
Think! That's all.
It's never good and you just proved it...
double myPercentage = myValue / 100;
This is NOT a percentage. Assuming 100 is the maximum value, what you wanted to write is:
double myPercentage = (myValue / 100) * 100;
Or, more correctly:
double myPercentage = (myValue / myMaxValue) * 100;
But this hard coded 100 messed with your mind... So go for the getPercentage method that Colen suggested :)
double getpercentage(double myValue, double maxValue)
{
return (myValue / maxValue) * 100;
}
Also, as ctacke suggested, in the first case you will be in a world of pain if you ever need to localize these literals. It's never too much trouble to add a couple more variables and/or functions.
The first case will kill you if you ever need to localize. Moving it to some static or constant that is app-wide would at least make localizing it a little easier.
Case 1: When should you hard-code stuff: when you have no reason to think that it will ever change. That said, you should NEVER hard code stuff in-line. Take the time to make static variables or global variables or whatever your language gives you. Do them in the class in question, and if you notice that two classes or areas of your code share the same value FOR THE SAME REASON (meaning it's not just coincidence), point them to the same place.
Case 2: For case 2, you're correct: the laws of "percentage" will not change (being reasonable, here), so you can hard code inline.
Case 3: The third case is where you think the thing could change but you don't want to/have time to bother loading ResourceBundles or XML or whatever. In that case, you use whatever centralizing mechanism you can -- the hated Singleton class is a good one -- and go with that until you actually have need to deal with the problem.
The third case is tricky, though: it's extraordinarily hard to internationalize an application without really doing it... so you will want to hard-code stuff and just hope that, when the i18n guys come knocking, your code is not the worst-tasting code around :)
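For the third case, one possible C++ shape for such a centralizing class is a function-local static ("Meyers") singleton. AppSettings and its members are hypothetical names, and the stored strings are simply the Yes/No literals from the question.

```cpp
#include <string>

// Hypothetical central holder for values that might change later.
// Nothing is loaded from resource bundles or XML yet; the point is that
// every caller already goes through one place.
class AppSettings {
public:
    static AppSettings& instance() {
        // Constructed on first use; initialization of a function-local
        // static is thread-safe since C++11.
        static AppSettings settings;
        return settings;
    }

    const std::string& yesText() const { return yesText_; }
    const std::string& noText() const { return noText_; }

    AppSettings(const AppSettings&) = delete;
    AppSettings& operator=(const AppSettings&) = delete;

private:
    AppSettings() = default;
    std::string yesText_ = "Yes";
    std::string noText_ = "No";
};
```

A call site would then read MyTextBox.Text = someCondition ? AppSettings::instance().yesText() : AppSettings::instance().noText(), and swapping in a real resource lookup later touches only the class body.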
Edit: Let me mention that I've just finished a refactoring project in which the prior developer had placed the MySQL connection strings in 100+ places in the code (PHP). Sometimes they were uppercase, sometimes lowercase, etc., so they were hard to search and replace (though NetBeans and PDT did help a lot). There are reasons why he/she did this (a project called POG basically forces this stupidity), but there is just nothing that seems less like good code than repeating the same thing in a million places.
The better way for your second example would be to define an inline function:
double getpercentage(double myValue)
{
return(myValue / 100);
}
...
double myPercentage = getpercentage(myValue);
That way it's a lot more obvious what you're doing.
Hardcoded literals should appear in unit tests for the test values, unless there is so much reuse of a value within a single test class that a local constant is useful.
The unit tests are a description of expected values without any abstraction or redirection.
Imagine yourself reading the test - you want the information literally in front of you.
The only time I use constants for test values is when many tests repeat a value (itself a bit suspicious) and the value may be subject to change.
I do use constants for things like names of test files to compare.
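A sketch of what that looks like, assuming a hypothetical percentOf function and a bare assert-based test rather than any particular test framework: the expected values sit literally next to the calls that should produce them.

```cpp
#include <cassert>

// Hypothetical function under test.
double percentOf(double value, double maxValue) {
    return (value / maxValue) * 100.0;
}

// Test values are written as literals so the reader sees the expected
// numbers right where they are checked, with no indirection. These
// particular inputs divide exactly in binary floating point.
void testPercentOf() {
    assert(percentOf(50.0, 200.0) == 25.0);
    assert(percentOf(0.0, 80.0) == 0.0);
    assert(percentOf(30.0, 30.0) == 100.0);
}
```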
I don't think that your second example is really an example of hardcoding. That's like having a Halve() method that takes in a value to use to divide by; it doesn't make sense.
Beyond that, in example 1, if you want to change the language for your app, you don't want to have to change the class, so the text should absolutely be in a config.
Hard coding should be avoided like Dracula avoids the sun. It'll come back to bite you in the ass eventually.
"hardcoding" is the wrong thing to worry about. The point is not whether special values are in code or in config files, the point is:
If the value could ever change, how much work is that and how hard is it to find? Putting it in one place and referring to that place elsewhere is not much work and therefore a way to play it safe.
Will maintenance programmers definitely understand why the value is what it is? If there is any doubt whatsoever, use a named constant that explains the meaning.
Both of these goals can be achieved without any need for config files; in fact I'd avoid those if possible. "putting stuff in config files means it's easier to change" is a myth, unless either
you actually want to support customers changing the values themselves
no value that could possibly be put in the config file can cause a bug (buffer overflow, anyone?)
your build and deployment process sucks
The text for the conditions should be in a resource file; that's what it's there for.
Not normally (Are hard-coding literals acceptable)
Another way of looking at this is that a good naming convention for constants used in place of hard-coded literals provides additional documentation in the program.
Even if the number is used only once, it can still be hard to recognize and may even be hard to find for future changes.
IMHO, making programs easier to read should be second nature to a seasoned software professional. Raw numbers rarely communicate meaningfully.
The extra time taken to use a well-named constant will improve the code's readability (making it easy to recall to mind) and its usefulness for future re-mining (code reuse).
I tend to view it in terms of the project's scope and size.
Some simple projects that I am a solo dev on? Sure, I hard code lots of things. Tools I write that only I will ever use? Sure, if it gets the job done.
But, in working on larger, team projects? I agree, they are suspect and usually the product of laziness. Tag them for review and see if you can spot a pattern where they can be abstracted away.
In your example, the text box should be localizable, so why not a class that handles that?
Remember that you WILL forget the meaning of any non-obvious hard-coded value.
So be certain to put a short comment after each to remind you.
A Delphi example:
Length := Length * 0.3048; { 0.3048 converts feet to meters }
no.
What is a simple throw away app today will be driving your entire enterprise tomorrow. Always use best practices or you'll regret it.
Code always evolves. When you initially write stuff hard coding is the easiest way to go. Later when a need arrives to change the value it can be improved. In some cases the need never comes.
The need can arrive in many forms:
The value is used in many places and it needs to be changed by a programmer. In this case a constant is clearly needed.
User needs to be able to change the value.
I don't see the need to avoid hard coding. I do see the need to change things when there is a clear need.
A totally separate issue is that, of course, the code needs to be readable, which might mean adding a comment for the hard-coded value.
For the first value, it really depends. If you don't anticipate any kind of wide-spread adoption of your application and internationalization will never be an issue, I think it's mostly fine. However, if you are writing some kind of open source software or something with a larger audience consider the fact that it may one day need to be translated. In that case, you may be better off using string resources.
It's okay as long as you don't do refactoring, unit-testing, peer code reviews. And, you don't want repeat customers. Who cares?
I once had a boss who refused to stop hardcoding something because in his mind it gave him full control over the software and the items related to it. Problem was, when the hardware that ran the software died, the server got renamed... meaning he had to find his code. That took a while. I simply found a hex editor and hacked around it instead of waiting.
I normally add a set of helper methods for strings and numbers.
For example, when I have strings such as 'yes' and 'no', I have a function called __, so I call __('yes'). It starts out in the project by just returning the first parameter, but when I need to do more complex stuff (such as internationalization) it's already there and the parameter can be used as a key.
Another example is VAT (a form of UK tax) in online shops; recently it changed from 17.5% to 15%. Anyone who hard-coded VAT by doing:
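A rough C++ equivalent of that helper. Names containing a double underscore are reserved in C++, so this sketch calls it tr; both tr and yesNo are hypothetical. The body is an identity function today, and only that body would change when real translation lookup arrives.

```cpp
#include <string>

// Hypothetical translation hook: for now it just returns its argument.
// Later, the key can be looked up in a translation table here without
// touching any call site.
std::string tr(const std::string& key) {
    return key;
}

// Call sites route user-visible text through the hook from day one.
std::string yesNo(bool condition) {
    return condition ? tr("Yes") : tr("No");
}
```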
$vat = $price * 0.175;
had to then go through all the references and change them to 0.15; the super useful way of doing it would instead have been to have a function or variable for VAT.
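The same idea applied to the VAT case, sketched in C++ with hypothetical names (kVatRate, vatFor, priceIncludingVat) and plain doubles for brevity; real money code would more likely use integer minor units (pence) to avoid rounding surprises.

```cpp
// Hypothetical shop code: the rate lives in exactly one place, so a change
// from 17.5% to 15% is a one-line edit instead of a search through every
// price calculation.
constexpr double kVatRate = 0.15;

double vatFor(double price) {
    return price * kVatRate;
}

double priceIncludingVat(double price) {
    return price + vatFor(price);
}
```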
In my opinion anything that could change should be written in a changeable way. If I find myself doing the same thing more than 5 times in the same day then it becomes a function or a config var.
Hard coding should be banned forever. Although, in your very simple examples, I don't see anything wrong with using them in any kind of project.
In my opinion hard coding is when you believe that a variable/value/define etc. will never change and create all your code based on that belief.
An example of such hard coding is the book Teach Yourself C in 24 Hours, which everybody should avoid.

Spartan Programming

I really enjoyed Jeff's post on Spartan Programming. I agree that code like that is a joy to read. Unfortunately, I'm not so sure it would necessarily be a joy to work with.
For years I have read about and adhered to the "one-expression-per-line" practice. I have fought the good fight and held my ground when many programming books countered this advice with example code like:
while (bytes = read(...))
{
...
}
while (GetMessage(...))
{
...
}
Recently, I've advocated one expression per line for more practical reasons - debugging and production support. Getting a log file from production that claims a NullPointer exception at "line 65" which reads:
ObjectA a = getTheUser(session.getState().getAccount().getAccountNumber());
is frustrating and entirely avoidable. Short of grabbing an expert on the code who can choose the "most likely" object that was null... this is a real practical pain.
One expression per line also helps out quite a bit while stepping through code. I practice this with the assumption that most modern compilers can optimize away all the superfluous temp objects I've just created ...
I try to be neat - but cluttering my code with explicit objects sure feels laborious at times. It does not generally make the code easier to browse - but it really has come in handy when tracing things down in production or stepping through my or someone else's code.
What style do you advocate and can you rationalize it in a practical sense?
In The Pragmatic Programmer, Hunt and Thomas talk about a principle they call the Law of Demeter, which focuses on the coupling of functions to modules other than their own. By never allowing a function to reach a third level in its coupling, you significantly reduce the number of errors and increase the maintainability of the code.
So:
ObjectA a = getTheUser(session.getState().getAccount().getAccountNumber());
Is close to a felony because we are 4 objects down the rat hole. That means to change something in one of those objects I have to know that you called this whole stack right here in this very method. What a pain.
Better:
Account.getUser();
Note this runs counter to the expressive forms of programming that are now really popular with mocking software. The trade off there is that you have a tightly coupled interface anyway, and the expressive syntax just makes it easier to use.
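One way to apply that advice, sketched in C++ with hypothetical classes that mirror the question's chain: each object delegates to its immediate neighbour, so the caller writes session.accountNumber() and never needs to know that a SessionState and an Account sit in between.

```cpp
#include <string>
#include <utility>

// Hypothetical classes; each one only talks to the object it directly owns.
class Account {
public:
    explicit Account(std::string number) : number_(std::move(number)) {}
    const std::string& accountNumber() const { return number_; }
private:
    std::string number_;
};

class SessionState {
public:
    explicit SessionState(Account account) : account_(std::move(account)) {}
    // Forwards the request instead of handing out the Account itself.
    const std::string& accountNumber() const { return account_.accountNumber(); }
private:
    Account account_;
};

class Session {
public:
    explicit Session(SessionState state) : state_(std::move(state)) {}
    // Callers stop here rather than walking getState().getAccount()...
    const std::string& accountNumber() const { return state_.accountNumber(); }
private:
    SessionState state_;
};
```

The price is a forwarding method at each level, which is the trade-off against expressive, chained interfaces described just above.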
I think the ideal solution is to find a balance between the extremes. There is no way to write a rule that will fit in all situations; it comes with experience. Declaring each intermediate variable on its own line will make reading the code more difficult, which will also contribute to the difficulty in maintenance. By the same token, debugging is much more difficult if you inline the intermediate values.
The 'sweet spot' is somewhere in the middle.
One expression per line.
There is no reason to obfuscate your code. The extra time you take typing the few extra terms, you save in debug time.
I tend to err on the side of readability, not necessarily debuggability. The examples you gave should definitely be avoided, but I feel that judicious use of multiple expressions can make the code more concise and comprehensible.
I'm usually in the "shorter is better" camp. Your example is good:
ObjectA a = getTheUser(session.getState().getAccount().getAccountNumber());
I would cringe if I saw that over four lines instead of one--I don't think it'd make it easier to read or understand. The way you presented it here, it's clear that you're digging for a single object. This isn't better:
obja State = session.getState();
objb Account = State.getAccount();
objc AccountNumber = Account.getAccountNumber();
ObjectA a = getTheUser(AccountNumber);
This is a compromise:
objb Account = session.getState().getAccount();
ObjectA a = getTheUser(Account.getAccountNumber());
but I still prefer the single line expression. Here's an anecdotal reason: it's difficult for me to reread and error-check the 4-liner right now for dumb typos; the single line doesn't have this problem because there are simply fewer characters.
ObjectA a = getTheUser(session.getState().getAccount().getAccountNumber());
This is a bad example, probably because you just wrote something from the top of your head.
You are assigning, to variable named a of type ObjectA, the return value of a function named getTheUser.
So let's assume you wrote this instead:
User u = getTheUser(session.getState().getAccount().getAccountNumber());
I would break this expression like so:
Account acc = session.getState().getAccount();
User user = getTheUser( acc.getAccountNumber() );
My reasoning is: how would I think about what I am doing with this code?
I would probably think: "first I need to get the account from the session and then I get the user using that account's number".
The code should read the way you think. Variables should refer to the main entities involved; not so much to their properties (so I wouldn't store the account number in a variable).
A second factor to have in mind is: will I ever need to refer to this entity again in this context?
If, say, I'm pulling more stuff out of the session state, I would introduce SessionState state = session.getState().
This all seems obvious, but I'm afraid I have some difficulty putting in words why it makes sense, not being a native English speaker and all.
Maintainability, and with it, readability, is king. Luckily, shorter very often means more readable.
Here are a few tips I enjoy using to slice and dice code:
Variable names: how would you describe this variable to someone else on your team? You would not say "the numberOfLinesSoFar integer". You would say "numLines" or something similar - comprehensible and short. Don't pretend like the maintainer doesn't know the code at all, but make sure you yourself could figure out what the variable is, even if you forgot your own act of writing it. Yes, this is kind of obvious, but it's worth more effort than I see many coders put into it, so I list it first.
Control flow: Avoid lots of closing clauses at once (a series of }'s in C++). Usually when you see this, there's a way to avoid it. A common case is something like:
if (things_are_ok) {
// Do a lot of stuff.
return true;
} else {
ExpressDismay(error_str);
return false;
}
can be replaced by
if (!things_are_ok) return ExpressDismay(error_str);
// Do a lot of stuff.
return true;
if we can get ExpressDismay (or a wrapper thereof) to return false.
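A compilable version of that guard-clause pattern, assuming ExpressDismay is something you control and can make return false; the logging to std::cerr and the Process wrapper are illustrative only.

```cpp
#include <iostream>
#include <string>

// Hypothetical error reporter: returns false so it can be used directly
// in a return statement.
bool ExpressDismay(const std::string& error_str) {
    std::cerr << "error: " << error_str << '\n';
    return false;
}

// Guard clause first: the failure path exits immediately, so the main
// work is not nested inside an if/else ending in a pile of closing braces.
bool Process(bool things_are_ok, const std::string& error_str) {
    if (!things_are_ok) return ExpressDismay(error_str);
    // Do a lot of stuff.
    return true;
}
```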
Another case is:
Loop iterations: the more standard, the better. For shorter loops, it's good to use one-character iterators when the variable is never used except as an index into a single object.
The particular case I would argue here is against the "right" way to use an STL container:
for (vector<string>::iterator a_str = my_vec.begin(); a_str != my_vec.end(); ++a_str)
is a lot wordier, and requires overloaded pointer operators *a_str or a_str->size() in the loop. For containers that have fast random access, the following is a lot easier to read:
for (int i = 0; i < my_vec.size(); ++i)
with references to my_vec[i] in the loop body, which won't confuse anyone.
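Both loop styles side by side, as a hedged C++ sketch with hypothetical function names; they compute the same total. The index version uses std::size_t instead of int, which avoids the signed/unsigned comparison warning many compilers give for i < my_vec.size(), and the iterator version takes a const reference, so it needs a const_iterator.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Iterator style: the "by the book" STL loop.
std::size_t totalLengthIter(const std::vector<std::string>& my_vec) {
    std::size_t total = 0;
    for (std::vector<std::string>::const_iterator a_str = my_vec.begin();
         a_str != my_vec.end(); ++a_str) {
        total += a_str->size();
    }
    return total;
}

// Index style: works because std::vector has fast random access.
std::size_t totalLengthIndex(const std::vector<std::string>& my_vec) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < my_vec.size(); ++i) {
        total += my_vec[i].size();
    }
    return total;
}
```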
Finally, I often see coders take pride in their line number counts. But it's not the line numbers that count! I'm not sure of the best way to implement this, but if you have any influence over your coding culture, I'd try to shift the reward toward those with compact classes :)
Good explanation. I think this is a version of the general Divide and Conquer mentality.