How should substring() work? - language-agnostic

I do not understand why Java's [String.substring() method](http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#substring(int,%20int%29) is specified the way it is. I can't tell it to start at a numbered-position and return a specified number of characters; I have to compute the end position myself. And if I specify an end position beyond the end of the String, instead of just returning the rest of the String for me, Java throws an Exception.
I'm used to languages where substring() (or substr()) takes two parameters: a start position, and a length. Is this objectively better than the way Java does it, and if so, can you prove it? What's the best language specification for substring() that you have seen, and when if ever would it be a good idea for a language to do things differently? Is that IndexOutOfBoundsException that Java throws a good design idea, or not? Does all this just come down to personal preference?

There are times when the second parameter being a length is more convenient, and there are times when the second parameter being the "offset to stop before" is more convenient. Likewise there are times when "if I give you something that's too big, just go to the end of the string" is convenient, and there are times when it indicates a bug and should really throw an exception.
The second parameter being a length is useful if you've got a fixed length of field. For instance:
// C#
String guid = fullString.Substring(offset, 36);
The second parameter being an offset is useful if you're going up to another delimited:
// Java
int nextColon = fullString.indexOf(':', start);
if (start == -1)
{
// Handle error
}
else
{
String value = fullString.substring(start, nextColon);
}
Typically, the one you want to use is the opposite to the one that's provided on your current platform, in my experience :)

I'm used to languages where
substring() (or substr()) takes two
parameters: a start position, and a
length. Is this objectively better
than the way Java does it, and if so,
can you prove it?
No, it's not objectively better. It all depends on the context in which you want to use it. If you want to extract a substring of a specific length, it's bad, but if you want to extract a substring that ends at, say, the first occurrence of "." in the string, it's better than if you first had to compute a length. The question is: which requirement is more common? I'd say the latter. Of course, the best solution would be to have both versions in the API, but if you need the length-based one all the time, using a static utility method isn't that horrible.
As for the exception, yeah, that's definitely good design. You asked for something specific, and when you can't get that specific thing, the API should not try to guess what you might have wanted instead - that way, bugs become apparent more quickly.
Also, Java DOES have an alternative substring() method that returns the substring from a start index until the end of the string.

second parameter should be optional, first parameter should accept negative values..

If you leave off the 2nd parameter it will go to the end of the string for you without you having to compute it.

Having gotten some feedback, I see when the second-parameter-as-index scenario is useful, but so far all of those scenarios seem to be working around other language/API limitations. For example, the API doesn't provide a convenient routine to give me the Strings before and after the first colon in the input String, so instead I get that String's index and call substring(). (And this explains why the second position parameter in substr() overshoots the desired index by 1, IMO.)
It seems to me that with a more comprehensive set of string-processing functions in the language's toolkit, the second-parameter-as-index scenario loses out to second-parameter-as-length. But somebody please post me a counterexample. :)

If you store this away, the problem should stop plaguing your dreams and you'll finally achieve a good night's rest:
public String skipsSubstring(String s, int index, int length) {
return s.subString(index, index+length);
}

Related

What's a good naming convention for methods that take conditional action?

Let's say I have a method, Foo(). There are only certain times when Foo() is appropriate, as determined by the method ShouldFooNow(). However, there are many times when the program must consider if Foo() is appropriate at this time. So instead of writing:
if ShouldFooNow():
Foo()
everywhere, I just make that into one function:
def __name():
if ShouldFooNow():
Foo()
What would be a good name for this method? I'm having a hard time coming up with a good convention. IfNecessaryFoo() is awkward, particularly if Foo() has a longer name. DoFooIfShould()? Even more awkward.
What would be a better name style?
I think you're pretty close. Put the action/intent at the head of the method name, for easier alphabetic searching. If I were writing something like that, I'd consider
FooIfNecessary()
FooIfRequired()
Say, for instance,
ElevatePermissionsIfNecessary()
You could use EnsureFoo().
For example, the method EnsurePermissions() will take the appropriate action if needed. If the permissions are already correct, the method won't do anything.
I've recently started using the convention:
FooIf(args, bool);
Where args are any arguments that the method takes and bool is either expecting a boolean value or a Func of some kind that resolves to a boolean. Then, within that method, I check the bool and run the logic. Keeps such assertions down to one line and looks clean to me.
Example in my C# code for logging:
public void WarnIf<T>(T value, string message, Func<T, bool> isTrue)
{
if (isTrue(value)) _log.Warn(message);
}
Then I would call it with something like:
WarnIf(someObject, "This is a warning message to be logged.", s => s.SomeCondition == true);
(That caller may not be correct, but you get the idea... I don't have the code in front of me right now.)
Michael Petrotta's answer (IfNecessary or IfRequired suffix) is good, but I prefer a shorter alternative: IfNeeded.
ElevatePermissionsIfNeeded()
And if you want something even shorter I would consider a prefix like May or Might:
MayElevatePermissions()
MightElevatePermissions()
I don't see what's wrong with the original code:
if shouldFoo():
Foo();
is perfectly clear IMHO.
Not just that, but it clearly separates concerns of deciding about doing the action, vs the action itself.
Another option for a similar question with a slightly different approach to avoiding the postfix:
https://softwareengineering.stackexchange.com/a/161754/262009

Should I always/ever/never initialize object fields to default values?

Code styling question here.
I looked at this question which asks if the .NET CLR will really always initialize field values. (The answer is yes.) But it strikes me that I'm not sure that it's always a good idea to have it do this. My thinking is that if I see a declaration like this:
int myBlorgleCount = 0;
I have a pretty good idea that the programmer expects the count to start at zero, and is okay with that, at least for the immediate future. On the other hand, if I just see:
int myBlorgleCount;
I have no real immediate idea if 0 is a legal or reasonable value. And if the programmer just starts reading and modifying it, I don't know whether the programmer meant to start using it before they set a value to it, or if they were expecting it to be zero, etc.
On the other hand, some fairly smart people, and the Visual Studio code cleanup utility, tell me to remove these redundant declarations. What is the general consensus on this? (Is there a consensus?)
I marked this as language agnostic, but if there is an odd case out there where it's specifically a good idea to go against the grain for a particular language, that's probably worth pointing out.
EDIT: While I did put that this question was language agnostic, it obviously doesn't apply to languages like C, where no value initialization is done.
EDIT: I appreciate John's answer, but it is exactly what I'm not looking for. I understand that .NET (or Java or whatever) will do the job and initialize the values consistently and correctly. What I'm saying is that if I see code that is modifying a value that hasn't been previously explicitly set in code, I, as a code maintainer, don't know if the original coder meant it to be the default value, or just forgot to set the value, or was expecting it to be set somewhere else, etc.
Think long term maintenance.
Keep the code as explicit as possible.
Don't rely on language specific ways to initialize if you don't have to. Maybe a newer version of the language will work differently?
Future programmers will thank you.
Management will thank you.
Why obfuscate things even the slightest?
Update: Future maintainers may come from a different background. It really isn't about what is "right" it is more what will be easiest in the long run.
You are always safe in assuming the platform works the way the platform works. The .NET platform initializes all fields to default values. If you see a field that is not initialized by the code, it means the field is initialized by the CLR, not that it is uninitialized.
This concern is valid for platforms which do not guarantee initialization, but not here. In .NET, is more often indicates ignorance from the developer, thinking initialization is necessary.
Another unnecessary hangover from the past is the following:
string foo = null;
foo = MethodCall();
I've seen that from people who should know better.
I think that it makes sense to initialize the values if it clarifies the developer's intent.
In C#, there's no overhead as the values are all initialized anyway. In C/C++, uninitialized values will contain garbage/unknown values (whatever was in the memory location), so initialization was more important.
I think it should be done if it really helps to make the code more understandable.
But I think this is a general problem with all language features. My opinion on that is: If it is an official feature of the language, you can use it. (Of course there are some anti-features which should be used with caution or avoided at all, like a missing option explicit in Visual Basic or diamond inheritance in C++)
There was I time when I was very paranoid and added all kinds of unnecessary initializations, explicit casts, über-paranoid try-finally blocks, ... I once even thought about ignoring auto-boxing and replacing all occurrences with explicit type conversions, just "to be on the safe side".
The problem is: There is no end. You can avoid almost all language features, because you do not want to trust them.
Remember: It's only magic until you understand it :)
I agree with you; it may be verbose, but I like to see:
int myBlorgleCount = 0;
Now, I always initial strings though:
string myString = string.Empty;
(I just hate null strings.)
In the case where I cannot immediately set it to something useful
int myValue = SomeMethod();
I will set it to 0. That is more to avoid having to think about what the value would be otherwise. For me, the fact that integers are always set to 0 is not on the tip of my fingers, so when I see
int myValue;
it will take me a second to pull up that fact and remember what it will be set to, disrupting my thought process.
For someone who has that knowledge readily available, they will encounter
int myValue = 0;
and wonder why the hell is that person setting it to zero, when the compiler would just do it for them. This thought would interrupt their thought process.
So do which ever makes the most sense for both you and the team you are working in. If the common practice is to set it, then set it, otherwise don't.
In my experience I've found that explicitly initializing local variables (in .NET) adds more clutter than clarity.
Class-wide variables, on the other hand should always be initialized. In the past we defined system-wide custom "null" values for common variable types. This way we could always know what was uninitialized by error and what was initialized on purpose.
I always initialize fields explicitly in the constructor. For me, it's THE place to do it.
I think a lot of that comes down to past experiences.
In older and unamanged languages, the expectation is that the value is unknown. This expectation is retained by programmers coming from these languages.
Almost all modern or managed languages have defined values for recently created variables, whether that's from class constructors or language features.
For now, I think it's perfectly fine to initialize a value; what was once implicit becomes explicit. In the long run, say, in the next 10 to 20 years, people may start learning that a default value is possible, expected, and known - especially if they stay consistent across languages (eg, empty string for strings, 0 for numerics).
You Should do it, there is no need to, but it is better if you do so, because you never know if the language you are using initialize the values. By doing it yourself, you ensure your values are both initialized and with standard predefined values set.
There is nothing wrong on doing it except perhaps a bit of 'time wasted'. I would recommend it strongly. While the commend by John is quite informative, on general use it is better to go the safe path.
I usually do it for strings and in some cases collections where I don't want nulls floating around.
The general consensus where I work is "Not to do it explicitly for value types."
I wouldn't do it. C# initializes an int to zero anyways, so the two lines are functionally equivalent. One is just longer and redundant, although more descriptive to a programmer who doesn't know C#.
This is tagged as language-agnostic but most of the answers are regarding C#.
In C and C++, the best practice is to always initialize your values. There are some cases where this will be done for you such as static globals, but there shouldn't be a performance hit of any kind for redundantly initializing these values with most compilers.
I wouldn't initialise them. If you keep the declaration as close as possible to the first use, then there shouldn't be any confusion.
Another thing to remember is, if you are gonna use automatic properties, you have to rely on implicit values, like:
public int Count { get; set; }
http://www.geekherocomic.com/2009/07/27/common-pitfalls-initialize-your-variables/
If a field will often have new values stored into it without regard for what was there previously, and if it should behave as though a zero was stored there initially but there's nothing "special" about zero, then the value should be stored explicitly.
If the field represents a count or total which will never have a non-zero value written to it directly, but will instead always have other amounts added or subtracted, then zero should be considered an "empty" value, and thus need not be explicitly stated.
To use a crude analogy, consider the following two conditions:
`if (xposition != 0) ...
`if ((flags & WoozleModes.deluxe) != 0) ...
In the former scenario, comparison to the literal zero makes sense because it is checking for a position which is semantically no different from any other. In the second scenario, however, I would suggest that the comparison to the literal zero adds nothing to readability because code isn't really interested in whether the value of the expression (flags & WoozleModes.deluxe) happens to be a number other than zero, but rather whether it's "non-empty".
I don't know of any programming languages that provide separate ways of distinguishing numeric values for "zero" and "empty", other than by not requiring the use of literal zeros when indicating emptiness.

What's the best name for a non-mutating "add" method on an immutable collection? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 11 months ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
Sorry for the waffly title - if I could come up with a concise title, I wouldn't have to ask the question.
Suppose I have an immutable list type. It has an operation Foo(x) which returns a new immutable list with the specified argument as an extra element at the end. So to build up a list of strings with values "Hello", "immutable", "world" you could write:
var empty = new ImmutableList<string>();
var list1 = empty.Foo("Hello");
var list2 = list1.Foo("immutable");
var list3 = list2.Foo("word");
(This is C# code, and I'm most interested in a C# suggestion if you feel the language is important. It's not fundamentally a language question, but the idioms of the language may be important.)
The important thing is that the existing lists are not altered by Foo - so empty.Count would still return 0.
Another (more idiomatic) way of getting to the end result would be:
var list = new ImmutableList<string>().Foo("Hello")
.Foo("immutable")
.Foo("word");
My question is: what's the best name for Foo?
EDIT 3: As I reveal later on, the name of the type might not actually be ImmutableList<T>, which makes the position clear. Imagine instead that it's TestSuite and that it's immutable because the whole of the framework it's a part of is immutable...
(End of edit 3)
Options I've come up with so far:
Add: common in .NET, but implies mutation of the original list
Cons: I believe this is the normal name in functional languages, but meaningless to those without experience in such languages
Plus: my favourite so far, it doesn't imply mutation to me. Apparently this is also used in Haskell but with slightly different expectations (a Haskell programmer might expect it to add two lists together rather than adding a single value to the other list).
With: consistent with some other immutable conventions, but doesn't have quite the same "additionness" to it IMO.
And: not very descriptive.
Operator overload for + : I really don't like this much; I generally think operators should only be applied to lower level types. I'm willing to be persuaded though!
The criteria I'm using for choosing are:
Gives the correct impression of the result of the method call (i.e. that it's the original list with an extra element)
Makes it as clear as possible that it doesn't mutate the existing list
Sounds reasonable when chained together as in the second example above
Please ask for more details if I'm not making myself clear enough...
EDIT 1: Here's my reasoning for preferring Plus to Add. Consider these two lines of code:
list.Add(foo);
list.Plus(foo);
In my view (and this is a personal thing) the latter is clearly buggy - it's like writing "x + 5;" as a statement on its own. The first line looks like it's okay, until you remember that it's immutable. In fact, the way that the plus operator on its own doesn't mutate its operands is another reason why Plus is my favourite. Without the slight ickiness of operator overloading, it still gives the same connotations, which include (for me) not mutating the operands (or method target in this case).
EDIT 2: Reasons for not liking Add.
Various answers are effectively: "Go with Add. That's what DateTime does, and String has Replace methods etc which don't make the immutability obvious." I agree - there's precedence here. However, I've seen plenty of people call DateTime.Add or String.Replace and expect mutation. There are loads of newsgroup questions (and probably SO ones if I dig around) which are answered by "You're ignoring the return value of String.Replace; strings are immutable, a new string gets returned."
Now, I should reveal a subtlety to the question - the type might not actually be an immutable list, but a different immutable type. In particular, I'm working on a benchmarking framework where you add tests to a suite, and that creates a new suite. It might be obvious that:
var list = new ImmutableList<string>();
list.Add("foo");
isn't going to accomplish anything, but it becomes a lot murkier when you change it to:
var suite = new TestSuite<string, int>();
suite.Add(x => x.Length);
That looks like it should be okay. Whereas this, to me, makes the mistake clearer:
var suite = new TestSuite<string, int>();
suite.Plus(x => x.Length);
That's just begging to be:
var suite = new TestSuite<string, int>().Plus(x => x.Length);
Ideally, I would like my users not to have to be told that the test suite is immutable. I want them to fall into the pit of success. This may not be possible, but I'd like to try.
I apologise for over-simplifying the original question by talking only about an immutable list type. Not all collections are quite as self-descriptive as ImmutableList<T> :)
In situations like that, I usually go with Concat. That usually implies to me that a new object is being created.
var p = listA.Concat(listB);
var k = listA.Concat(item);
I'd go with Cons, for one simple reason: it means exactly what you want it to.
I'm a huge fan of saying exactly what I mean, especially in source code. A newbie will have to look up the definition of Cons only once, but then read and use that a thousand times. I find that, in the long term, it's nicer to work with systems that make the common case easier, even if the up-front cost is a little bit higher.
The fact that it would be "meaningless" to people with no FP experience is actually a big advantage. As you pointed out, all of the other words you found already have some meaning, and that meaning is either slightly different or ambiguous. A new concept should have a new word (or in this case, an old one). I'd rather somebody have to look up the definition of Cons, than to assume incorrectly he knows what Add does.
Other operations borrowed from functional languages often keep their original names, with no apparent catastrophes. I haven't seen any push to come up with synonyms for "map" and "reduce" that sound more familiar to non-FPers, nor do I see any benefit from doing so.
(Full disclosure: I'm a Lisp programmer, so I already know what Cons means.)
Actually I like And, especially in the idiomatic way. I'd especially like it if you had a static readonly property for the Empty list, and perhaps make the constructor private so you always have to build from the empty list.
var list = ImmutableList<string>.Empty.And("Hello")
.And("Immutable")
.And("Word");
Whenever I'm in a jam with nomenclature, I hit up the interwebs.
thesaurus.com returns this for "add":
Definition: adjoin, increase; make
further comment
Synonyms: affix,
annex, ante, append, augment, beef
up, boost, build up, charge up,
continue, cue in, figure in, flesh
out, heat up, hike, hike up, hitch on,
hook on, hook up with, include, jack
up, jazz up, join together, pad,
parlay, piggyback, plug into, pour it
on, reply, run up, say further, slap
on, snowball, soup up, speed up,
spike, step up, supplement, sweeten,
tack on, tag
I like the sound of Adjoin, or more simply Join. That is what you're doing, right? The method could also apply to joining other ImmutableList<>'s.
Personally, I like .With(). If I was using the object, after reading the documentation or the code comments, it would be clear what it does, and it reads ok in the source code.
object.With("My new item as well");
Or, you add "Along" with it.. :)
object.AlongWith("this new item");
I ended up going with Add for all of my Immutable Collections in BclExtras. The reason being is that it's an easy predictable name. I'm not worried so much about people confusing Add with a mutating add since the name of the type is prefixed with Immutable.
For awhile I considered Cons and other functional style names. Eventually I discounted them because they're not nearly as well known. Sure functional programmers will understand but they're not the majority of users.
Other Names: you mentioned:
Plus: I'm wishy/washing on this one. For me this doesn't distinguish it as being a non-mutating operation anymore than Add does
With: Will cause issues with VB (pun intended)
Operator overloading: Discoverability would be an issue
Options I considered:
Concat: String's are Immutable and use this. Unfortunately it's only really good for adding to the end
CopyAdd: Copy what? The source, the list?
AddToNewList: Maybe a good one for List. But what about a Collection, Stack, Queue, etc ...
Unfortunately there doesn't really seem to be a word that is
Definitely an immutable operation
Understandable to the majority of users
Representable in less than 4 words
It gets even more odd when you consider collections other than List. Take for instance Stack. Even first year programmers can tell you that Stacks have a Push/Pop pair of methods. If you create an ImmutableStack and give it a completely different name, lets call it Foo/Fop, you've just added more work for them to use your collection.
Edit: Response to Plus Edit
I see where you're going with Plus. I think a stronger case would actually be Minus for remove. If I saw the following I would certainly wonder what in the world the programmer was thinking
list.Minus(obj);
The biggest problem I have with Plus/Minus or a new pairing is it feels like overkill. The collection itself already has a distinguishing name, the Immutable prefix. Why go further by adding vocabulary whose intent is to add the same distinction as the Immutable prefix already did.
I can see the call site argument. It makes it clearer from the standpoint of a single expression. But in the context of the entire function it seems unnecessary.
Edit 2
Agree that people have definitely been confused by String.Concat and DateTime.Add. I've seen several very bright programmers hit this problem.
However I think ImmutableList is a different argument. There is nothing about String or DateTime that establishes it as Immutable to a programmer. You must simply know that it's immutable via some other source. So the confusion is not unexpected.
ImmutableList does not have that problem because the name defines it's behavior. You could argue that people don't know what Immutable is and I think that's also valid. I certainly didn't know it till about year 2 in college. But you have the same issue with whatever name you choose instead of Add.
Edit 3: What about types like TestSuite which are immutable but do not contain the word?
I think this drives home the idea that you shouldn't be inventing new method names. Namely because there is clearly a drive to make types immutable in order to facilitate parallel operations. If you focus on changing the name of methods for collections, the next step will be the mutating method names on every type you use that is immutable.
I think it would be a more valuable effort to instead focus on making types identifiable as Immutable. That way you can solve the problem without rethinking every mutating method pattern out there.
Now how can you identify TestSuite as Immutable? In todays environment I think there are a few ways
Prefix with Immutable: ImmutableTestSuite
Add an Attribute which describes the level of Immutablitiy. This is certainly less discoverable
Not much else.
My guess/hope is development tools will start helping this problem by making it easy to identify immutable types simply by sight (different color, stronger font, etc ...). But I think that's the answer though over changing all of the method names.
I think this may be one of those rare situations where it's acceptable to overload the + operator. In math terminology, we know that + doesn't append something to the end of something else. It always combines two values together and returns a new resulting value.
For example, it's intuitively obvious that when you say
x = 2 + 2;
the resulting value of x is 4, not 22.
Similarly,
var empty = new ImmutableList<string>();
var list1 = empty + "Hello";
var list2 = list1 + "immutable";
var list3 = list2 + "word";
should make clear what each variable is going to hold. It should be clear that list2 is not changed in the last line, but instead that list3 is assigned the result of appending "word" to list2.
Otherwise, I would just name the function Plus().
To be as clear as possible, you might want to go with the wordier CopyAndAdd, or something similar.
I would call it Extend() or maybe ExtendWith() if you feel like really verbose.
Extends means adding something to something else without changing it. I think this is very relevant terminology in C# since this is similar to the concept of extension methods - they "add" a new method to a class without "touching" the class itself.
Otherwise, if you really want to emphasize that you don't modify the original object at all, using some prefix like Get- looks like unavoidable to me.
Added(), Appended()
I like to use the past tense for operations on immutable objects. It conveys the idea that you aren't changing the original object, and it's easy to recognize when you see it.
Also, because mutating method names are often present-tense verbs, it applies to most of the immutable-method-name-needed cases you run into. For example an immutable stack has the methods "pushed" and "popped".
I like mmyers suggestion of CopyAndAdd. In keeping with a "mutation" theme, maybe you could go with Bud (asexual reproduction), Grow, Replicate, or Evolve? =)
EDIT: To continue with my genetic theme, how about Procreate, implying that a new object is made which is based on the previous one, but with something new added.
This is probably a stretch, but in Ruby there is a commonly used notation for the distinction: add doesn't mutate; add! mutates. If this is an pervasive problem in your project, you could do that too (not necessarily with non-alphabetic characters, but consistently using a notation to indicate mutating/non-mutating methods).
Join seems appropriate.
Maybe the confusion stems from the fact that you want two operations in one. Why not separate them? DSL style:
var list = new ImmutableList<string>("Hello");
var list2 = list.Copy().With("World!");
Copy would return an intermediate object, that's a mutable copy of the original list. With would return a new immutable list.
Update:
But, having an intermediate, mutable collection around is not a good approach. The intermediate object should be contained in the Copy operation:
var list1 = new ImmutableList<string>("Hello");
var list2 = list1.Copy(list => list.Add("World!"));
Now, the Copy operation takes a delegate, which receives a mutable list, so that it can control the copy outcome. It can do much more than appending an element, like removing elements or sorting the list. It can also be used in the ImmutableList constructor to assemble the initial list without intermediary immutable lists.
public ImmutableList<T> Copy(Action<IList<T>> mutate) {
if (mutate == null) return this;
var list = new List<T>(this);
mutate(list);
return new ImmutableList<T>(list);
}
Now there's no possibility of misinterpretation by the users, they will naturally fall into the pit of success.
Yet another update:
If you still don't like the mutable list mention, even now that it's contained, you can design a specification object, that will specify, or script, how the copy operation will transform its list. The usage will be the same:
var list1 = new ImmutableList<string>("Hello");
// rules is a specification object, that takes commands to run in the copied collection
var list2 = list1.Copy(rules => rules.Append("World!"));
Now you can be creative with the rules names and you can only expose the functionality that you want Copy to support, not the entire capabilities of an IList.
For the chaining usage, you can create a reasonable constructor (which will not use chaining, of course):
public ImmutableList(params T[] elements) ...
...
var list = new ImmutableList<string>("Hello", "immutable", "World");
Or use the same delegate in another constructor:
var list = new ImmutableList<string>(rules =>
rules
.Append("Hello")
.Append("immutable")
.Append("World")
);
This assumes that the rules.Append method returns this.
This is what it would look like with your latest example:
var suite = new TestSuite<string, int>(x => x.Length);
var otherSuite = suite.Copy(rules =>
rules
.Append(x => Int32.Parse(x))
.Append(x => x.GetHashCode())
);
A few random thoughts:
ImmutableAdd()
Append()
ImmutableList<T>(ImmutableList<T> originalList, T newItem) Constructor
DateTime in C# uses Add. So why not use the same name? As long the users of your class understand the class is immutable.
I think the key thing you're trying to get at that's hard to express is the nonpermutation, so maybe something with a generative word in it, something like CopyWith() or InstancePlus().
I don't think the English language will let you imply immutability in an unmistakable way while using a verb that means the same thing as "Add". "Plus" almost does it, but people can still make the mistake.
The only way you're going to prevent your users from mistaking the object for something mutable is by making it explicit, either through the name of the object itself or through the name of the method (as with the verbose options like "GetCopyWith" or "CopyAndAdd").
So just go with your favourite, "Plus."
First, an interesting starting point:
http://en.wikipedia.org/wiki/Naming_conventions_(programming) ...In particular, check the "See Also" links at the bottom.
I'm in favor of either Plus or And, effectively equally.
Plus and And are both math-based in etymology. As such, both connote mathematical operation; both yield an expression which reads naturally as expressions which may resolve into a value, which fits with the method having a return value. And bears additional logic connotation, but both words apply intuitively to lists. Add connotes action performed on an object, which conflicts with the method's immutable semantics.
Both are short, which is especially important given the primitiveness of the operation. Simple, frequently-performed operations deserve shorter names.
Expressing immutable semantics is something I prefer to do via context. That is, I'd rather simply imply that this entire block of code has a functional feel; assume everything is immutable. That might just be me, however. I prefer immutability to be the rule; if it's done, it's done a lot in the same place; mutability is the exception.
How about Chain() or Attach()?
I prefer Plus (and Minus). They are easily understandable and map directly to operations involving well known immutable types (the numbers). 2+2 doesn't change the value of 2, it returns a new, equally immutable, value.
Some other possibilities:
Splice()
Graft()
Accrete()
How about mate, mateWith, or coitus, for those who abide. In terms of reproducing mammals are generally considered immutable.
Going to throw Union out there too. Borrowed from SQL.
Apparently I'm the first Obj-C/Cocoa person to answer this question.
NNString *empty = [[NSString alloc] init];
NSString *list1 = [empty stringByAppendingString:#"Hello"];
NSString *list2 = [list1 stringByAppendingString:#"immutable"];
NSString *list3 = [list2 stringByAppendingString:#"word"];
Not going to win any code golf games with this.
I think "Add" or "Plus" sounds fine. The name of the list itself should be enough to convey the list's immutability.
Maybe there are some words which remember me more of making a copy and add stuff to that instead of mutating the instance (like "Concatenate"). But i think having some symmetry for those words for other actions would be good to have too. I don't know of a similar word for "Remove" that i think of the same kind like "Concatenate". "Plus" sounds little strange to me. I wouldn't expect it being used in a non-numerical context. But that could aswell come from my non-english background.
Maybe i would use this scheme
AddToCopy
RemoveFromCopy
InsertIntoCopy
These have their own problems though, when i think about it. One could think they remove something or add something to an argument given. Not sure about it at all. Those words do not play nice in chaining either, i think. Too wordy to type.
Maybe i would just use plain "Add" and friends too. I like how it is used in math
Add 1 to 2 and you get 3
Well, certainly, a 2 remains a 2 and you get a new number. This is about two numbers and not about a list and an element, but i think it has some analogy. In my opinion, add does not necessarily mean you mutate something. I certainly see your point that having a lonely statement containing just an add and not using the returned new object does not look buggy. But I've now also thought some time about that idea of using another name than "add" but i just can't come up with another name, without making me think "hmm, i would need to look at the documentation to know what it is about" because its name differs from what I would expect to be called "add". Just some weird thought about this from litb, not sure it makes sense at all :)
Looking at http://thesaurus.reference.com/browse/add and http://thesaurus.reference.com/browse/plus I found gain and affix but I'm not sure how much they imply non-mutation.
I think that Plus() and Minus() or, alternatively, Including(), Excluding() are reasonable at implying immutable behavior.
However, no naming choice will ever make it perfectly clear to everyone, so I personally believe that a good xml doc comment would go a very long way here. VS throws these right in your face when you write code in the IDE - they're hard to ignore.
Append - because, note that names of the System.String methods suggest that they mutate the instance, but they don't.
Or I quite like AfterAppending:
void test()
{
Bar bar = new Bar();
List list = bar.AfterAppending("foo");
}
list.CopyWith(element)
As does Smalltalk :)
And also list.copyWithout(element) that removes all occurrences of an element, which is most useful when used as list.copyWithout(null) to remove unset elements.
I would go for Add, because I can see the benefit of a better name, but the problem would be to find different names for every other immutable operation which might make the class quite unfamiliar if that makes sense.

Are hard-coded STRINGS ever acceptable?

Similar to Is hard-coding literals ever acceptable?, but I'm specifically thinking of "magic strings" here.
On a large project, we have a table of configuration options like these:
Name Value
---- -----
FOO_ENABLED Y
BAR_ENABLED N
...
(Hundreds of them).
The common practice is to call a generic function to test an option like this:
if (config_options.value('FOO_ENABLED') == 'Y') ...
(Of course, this same option may need to be checked in many places in the system code.)
When adding a new option, I was considering adding a function to hide the "magic string" like this:
if (config_options.foo_enabled()) ...
However, colleagues thought I'd gone overboard and objected to doing this, preferring the hard-coding because:
That's what we normally do
It makes it easier to see what's going on when debugging the code
The trouble is, I can see their point! Realistically, we are never going to rename the options for any reason, so about the only advantage I can think of for my function is that the compiler would catch any typo like fo_enabled(), but not 'FO_ENABLED'.
What do you think? Have I missed any other advantages/disadvantages?
If I use a string once in the code, I don't generally worry about making it a constant somewhere.
If I use a string twice in the code, I'll consider making it a constant.
If I use a string three times in the code, I'll almost certainly make it a constant.
if (config_options.isTrue('FOO_ENABLED')) {...
}
Restrict your hard coded Y check to one place, even if it means writing a wrapper class for your Map.
if (config_options.isFooEnabled()) {...
}
Might seem okay until you have 100 configuration options and 100 methods (so here you can make a judgement about future application growth and needs before deciding on your implementation). Otherwise it is better to have a class of static strings for parameter names.
if (config_options.isTrue(ConfigKeys.FOO_ENABLED)) {...
}
I realise the question is old, but it came up on my margin.
AFAIC, the issue here has not been identified accurately, either in the question, or the answers. Forget about 'harcoding strings" or not, for a moment.
The database has a Reference table, containing config_options. The PK is a string.
There are two types of PKs:
Meaningful Identifiers, that the users (and developers) see and use. These PKs are supposed to be stable, they can be relied upon.
Meaningless Id columns which the users should never see, that the developers have to be aware of, and code around. These cannot be relied upon.
It is ordinary, normal, to write code using the absolute value of a meaningful PK IF CustomerCode = "IBM" ... or IF CountryCode = "AUS" etc.
referencing the absolute value of a meaningless PK is not acceptable (due to auto-increment; gaps being changed; values being replaced wholesale).
.
Your reference table uses meaningful PKs. Referencing those literal strings in code is unavoidable. Hiding the value will make maintenance more difficult; the code is no longer literal; your colleagues are right. Plus there is the additional redundant function that chews cycles. If there is a typo in the literal, you will soon find that out during Dev testing, long before UAT.
hundreds of functions for hundreds of literals is absurd. If you do implement a function, then Normalise your code, and provide a single function that can be used for any of the hundreds of literals. In which case, we are back to a naked literal, and the function can be dispensed with.
the point is, the attempt to hide the literal has no value.
.
It cannot be construed as "hardcoding", that is something quite different. I think that is where your issue is, identifying these constructs as "hardcoded". It is just referencing a Meaningfull PK literally.
Now from the perspective of any code segment only, if you use the same value a few times, you can improve the code by capturing the literal string in a variable, and then using the variable in the rest of the code block. Certainly not a function. But that is an efficiency and good practice issue. Even that does not change the effect IF CountryCode = #cc_aus
I really should use constants and no hard coded literals.
You can say they won't be changed, but you may never know. And it is best to make it a habit. To use symbolic constants.
In my experience, this kind of issue is masking a deeper problem: failure to do actual OOP and to follow the DRY principle.
In a nutshell, capture the decision at startup time by an appropriate definition for each action inside the if statements, and then throw away both the config_options and the run-time tests.
Details below.
The sample usage was:
if (config_options.value('FOO_ENABLED') == 'Y') ...
which raises the obvious question, "What's going on in the ellipsis?", especially given the following statement:
(Of course, this same option may need to be checked in many places in the system code.)
Let's assume that each of these config_option values really does correspond to a single problem domain (or implementation strategy) concept.
Instead of doing this (repeatedly, in various places throughout the code):
Take a string (tag),
Find its corresponding other string (value),
Test that value as a boolean-equivalent,
Based on that test, decide whether to perform some action.
I suggest encapsulating the concept of a "configurable action".
Let's take as an example (obviously just as hypthetical as FOO_ENABLED ... ;-) that your code has to work in either English units or metric units. If METRIC_ENABLED is "true", convert user-entered data from metric to English for internal computation, and convert back prior to displaying results.
Define an interface:
public interface MetricConverter {
double toInches(double length);
double toCentimeters(double length);
double toPounds(double weight);
double toKilograms(double weight);
}
which identifies in one place all the behavior associated with the concept of METRIC_ENABLED.
Then write concrete implementations of all the ways those behaviors are to be carried out:
public class NullConv implements MetricConverter {
double toInches(double length) {return length;}
double toCentimeters(double length) {return length;}
double toPounds(double weight) {return weight;}
double toKilograms(double weight) {return weight;}
}
and
// lame implementation, just for illustration!!!!
public class MetricConv implements MetricConverter {
public static final double LBS_PER_KG = 2.2D;
public static final double CM_PER_IN = 2.54D
double toInches(double length) {return length * CM_PER_IN;}
double toCentimeters(double length) {return length / CM_PER_IN;}
double toPounds(double weight) {return weight * LBS_PER_KG;}
double toKilograms(double weight) {return weight / LBS_PER_KG;}
}
At startup time, instead of loading a bunch of config_options values, initialize a set of configurable actions, as in:
MetricConverter converter = (metricOption()) ? new MetricConv() : new NullConv();
(where the expression metricOption() above is a stand-in for whatever one-time-only check you need to make, including looking at the value of METRIC_ENABLED ;-)
Then, wherever the code would have said:
double length = getLengthFromGui();
if (config_options.value('METRIC_ENABLED') == 'Y') {
length = length / 2.54D;
}
// do some computation to produce result
// ...
if (config_options.value('METRIC_ENABLED') == 'Y') {
result = result * 2.54D;
}
displayResultingLengthOnGui(result);
rewrite it as:
double length = converter.toInches(getLengthFromGui());
// do some computation to produce result
// ...
displayResultingLengthOnGui(converter.toCentimeters(result));
Because all of the implementation details related to that one concept are now packaged cleanly, all future maintenance related to METRIC_ENABLED can be done in one place. In addition, the run-time trade-off is a win; the "overhead" of invoking a method is trivial compared with the overhead of fetching a String value from a Map and performing String#equals.
I believe that the two reasons you have mentioned, Possible misspelling in string, that cannot be detected until run time and the possibility (although slim) of a name change would justify your idea.
On top of that you can get typed functions, now it seems you only store booleans, what if you need to store an int, a string etc. I would rather use get_foo() with a type, than get_string("FOO") or get_int("FOO").
I think there are two different issues here:
In the current project, the convention of using hard-coded strings is already well established, so all the developers working on the project are familiar with it. It might be a sub-optimal convention for all the reasons that have been listed, but everybody familiar with the code can look at it and instinctively knows what the code is supposed to do. Changing the code so that in certain parts, it uses the "new" functionality will make the code slightly harder to read (because people will have to think and remember what the new convention does) and thus a little harder to maintain. But I would guess that changing over the whole project to the new convention would potentially be prohibitively expensive unless you can quickly script the conversion.
On a new project, symbolic constants are the way IMO, for all the reasons listed. Especially because anything that makes the compiler catch errors at compile time that would otherwise be caught by a human at run time is a very useful convention to establish.
Another thing to consider is intent. If you are on a project that requires localization hard coded strings can be ambiguous. Consider the following:
const string HELLO_WORLD = "Hello world!";
print(HELLO_WORLD);
The programmer's intent is clear. Using a constant implies that this string does not need to be localized. Now look at this example:
print("Hello world!");
Here we aren't so sure. Did the programmer really not want this string to be localized or did the programmer forget about localization while he was writing this code?
I too prefer a strongly-typed configuration class if it is used through-out the code. With properly named methods you don't lose any readability. If you need to do conversions from strings to another data type (decimal/float/int), you don't need to repeat the code that does the conversion in multiple places and can cache the result so the conversion only takes place once. You've already got the basis of this in place already so I don't think it would take much to get used to the new way of doing things.

What is the term for "catching" a return value

I was training a new developer the other day and realized I don't know the actual term for "catching" a return value in a variable. For example, consider this pseudocoded method:
String updateString(newPart) {
string += newPart;
return string;
}
Assume this is being called to simply update the string - the return value is not needed:
updateString("add this");
Now, assume we want to do something with the returned value. We want to change the call so that we can use the newly updated string. I found myself saying "catch the return value", meaning I wanted to see:
String returnedString = updateString("add this");
So, if you were trying to ask someone to make this change, what terminology would you use? Is it different in different languages (since technically, you may be calling either a function or a method, depending on the language)?
assign the return value to a variable?
Returned values can be assigned or discarded/ignored/not used/[insert synonym here].
There isn't really a technical term for it.
I would say "returnedString is to be initialised with the return value of updateString".
"Catch" makes me think of exceptions, which is a bit misleading. How about something like "use" or "store" or "assign"?
Common ones that I know:
You assign a value to a variable.
You store a value into a variable.
check the function's return value, do not ignore return values
In the example, you're simply assigning the return value of the function to a new variable.
When describing the behavior of that single line of code, it doesn't really matter that the return value is not essential to the use of the function. However, in a broader context, it is very important to know what purpose this "Interesting Return Value" serves.
As others have said there isn't really a word for what you describe. However, here's a bit of terminology for you to chew on: the example you give looks like it could be a Fluent Interface.
I suggest "cache", meaning store it for later.
Maybe there's a subliminal reason you're saying "catch".
It's better too state the purpose rather than the implementation details (because actual implementation can be different in different programming langugages).
Generally speaking:
- Save the return value of the call.
If you know the return value is a result of something:
- Save the result of the call.
If you know the return value is to signify a status (such as error):
- Save the status of the call.
By using the word "save", you can use that same statement across the board, regardless of the mechanism used in that particular language to save the return value.