When you're in a situation where you need to return two things in a single method, what is the best approach?
I understand the philosophy that a method should do one thing only, but say you have a method that runs a database select and you need to pull two columns. I'm assuming you only want to traverse through the database result set once, but you want to return two columns worth of data.
The options I have come up with:
1. Use global variables to hold returns. I personally try to avoid globals where I can.
2. Pass in two empty variables as parameters, then assign the variables inside the method, which is now void. I don't like the idea of methods that have side effects.
3. Return a collection that contains two variables. This can lead to confusing code.
4. Build a container class to hold the double return. This is more self-documenting than a collection containing other collections, but it seems like it might be confusing to create a class just for the purpose of a return.
This is not entirely language-agnostic: in Lisp, you can actually return any number of values from a function, including (but not limited to) none, one, two, ...
(defun returns-two-values ()
  (values 1 2))
The same thing holds for Scheme and Dylan. In Python, I would actually use a tuple containing 2 values like
def returns_two_values():
    return (1, 2)
As others have pointed out, you can return multiple values using the out parameters in C#. In C++, you would use references.
void
returns_two_values(int& v1, int& v2)
{
v1 = 1; v2 = 2;
}
In C, your method would take pointers to locations, where your function should store the result values.
void
returns_two_values(int* v1, int* v2)
{
*v1 = 1; *v2 = 2;
}
For Java, I usually use either a dedicated class, or a pretty generic little helper (currently, there are two in my private "commons" library: Pair<F,S> and Triple<F,S,T>, both nothing more than simple immutable containers for 2 resp. 3 values)
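A generic pair helper like the one described above could look roughly like this in Python. This is only a sketch: the names Pair and min_and_max are made up for illustration and are not from any library.

```python
from dataclasses import dataclass
from typing import Generic, List, TypeVar

F = TypeVar("F")
S = TypeVar("S")


@dataclass(frozen=True)
class Pair(Generic[F, S]):
    """Immutable container for two values (hypothetical helper)."""
    first: F
    second: S


def min_and_max(values: List[int]) -> Pair[int, int]:
    # Both results travel back to the caller in one immutable object.
    return Pair(min(values), max(values))


result = min_and_max([3, 1, 4, 1, 5])
print(result.first, result.second)
```

Because the dataclass is frozen, assigning to result.first after construction raises an error, which matches the "simple immutable container" idea.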
I would create data transfer objects. If it is a group of information (first and last name) I would make a Name class and return that. #4 is the way to go. It seems like more work up front (which it is), but makes it up in clarity later.
If it is a list of records (rows in a database) I would return a Collection of some sort.
I would never use globals unless the app is trivial.
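To make the data transfer object idea concrete, here is a minimal sketch in Python using an in-memory SQLite database. The customers table, its columns, and the fetch_names function are assumptions invented for the example.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Name:
    """Groups two related columns into one returned object."""
    first: str
    last: str


def fetch_names(conn: sqlite3.Connection) -> List[Name]:
    # The result set is traversed once; each row becomes one Name.
    cursor = conn.execute(
        "SELECT first_name, last_name FROM customers ORDER BY id"
    )
    return [Name(first, last) for first, last in cursor]


# Hypothetical schema and data, only here to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO customers (first_name, last_name) VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Alan", "Turing")],
)
names = fetch_names(conn)
print(names[0].first, names[0].last)
```

The caller gets a collection of records, and each record names its fields, so nobody has to remember which index held which column.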
Not my own thoughts (Uncle Bob's):
If there's cohesion between those two variables - I've heard him say, you're missing a class where those two are fields. (He said the same thing about functions with long parameter lists.)
On the other hand, if there is no cohesion, then the function does more than one thing.
I think the most preferred approach is to build a container (may it be a class or a struct - if you don't want to create a separate class for this, struct is the way to go) that will hold all the parameters to be returned.
In the C/C++ world it would actually be quite common to pass two variables by reference (an example, your no. 2).
I think it all depends on the scenario.
Thinking from a C# mentality:
1: I would avoid globals as a solution to this problem, as it is accepted as bad practice.
4: If the two return values are uniquely tied together in some way or form that it could exist as its own object, then you can return a single object that holds the two values. If this object is only being designed and used for this method's return type, then it likely isn't the best solution.
3: A collection is a great option if the returned values are the same type and can be thought of as a collection. However, if the specific example needs 2 items, and each item is its 'own' thing -> maybe one represents the beginning of something, and the other represents the end, and the returned items are not being used interchangeably, then this may not be the best option.
2: I like this option the best, if 4, and 3 do not make sense for your scenario. As stated in 3, if you wanted to get two objects that represent the beginning and end items of something. Then I would use parameters by reference (or out parameters, again, depending on how it's all being used). This way your parameters can explicitly define their purpose: MethodCall(ref object StartObject, ref object EndObject)
Personally I try to use languages that allow functions to return something more than a simple integer value.
First, you should distinguish what you want: an arbitrary-length return or fixed-length return.
If you want your method to return an arbitrary number of arguments, you should stick to collection returns. Because the collections--whatever your language is--are specifically tied to fulfill such a task.
But sometimes you just need to return two values. How does returning two values--when you're sure it's always two values--differ from returning one value? It doesn't differ at all, I say! And modern languages, including Perl, Ruby, C++, Python, OCaml, etc., allow functions to return tuples, either built in or as third-party syntactic sugar (yes, I'm talking about boost::tuple). It looks like this:
tuple<int, int, double> add_multiply_divide(int a, int b) {
    return make_tuple(a+b, a*b, double(a)/double(b));
}
Specifying an "out parameter", in my opinion, is overused due to the limitations of older languages and paradigms learned those days. But there still are many cases when it's usable (if your method needs to modify an object passed as parameter, that object being not the class that contains a method).
The conclusion is that there's no generic answer--each situation has its own solution. But there is one common thing: it's not a violation of any paradigm for a function to return several items. That's a language limitation that was later somehow transferred to the human mind.
Python (like Lisp) also allows you to return any number of values from a function, including (but not limited to) none, one, two
def quadcube(x):
    return x**2, x**3
a, b = quadcube(3)
Some languages make doing #3 native and easy. Example: Perl. "return ($a, $b);". Ditto Lisp.
Barring that, check if your language has a collection suited to the task, ala pair/tuple in C++
Barring that, create a pair/tuple class and/or collection and re-use it, especially if your language supports templating.
If your function has return value(s), it's presumably returning it/them for assignment to either a variable or an implied variable (to perform operations on, for instance.) Anything you can usefully express as a variable (or a testable value) should be fair game, and should dictate what you return.
Your example mentions a row or a set of rows from a SQL query. Then you reasonably should be ready to deal with those as objects or arrays, which suggests an appropriate answer to your question.
When you're in a situation where you need to return two things in a single method, what is the best approach?
It depends on WHY you are returning two things.
Basically, as everyone here seems to agree, #2 and #4 are the two best answers...
I understand the philosophy that a method should do one thing only, but say you have a method that runs a database select and you need to pull two columns. I'm assuming you only want to traverse through the database result set once, but you want to return two columns worth of data.
If the two pieces of data from the database are related, such as a customer's First Name and Last Name, I would indeed still consider this to be doing "one thing."
On the other hand, suppose you have come up with a strange SELECT statement that returns your company's gross sales total for a given date, and also reads the name of the customer that placed the first sale for today's date. Here you're doing two unrelated things!
If it's really true that performance of this strange SELECT statement is much better than doing two SELECT statements for the two different pieces of data, and both pieces of data really are needed on a frequent basis (so that the entire application would be slower if you didn't do it that way), then using this strange SELECT might be a good idea - but you better be prepared to demonstrate why your way really makes a difference in perceived response time.
The options I have come up with:
1 Use global variables to hold returns. I personally try and avoid globals where I can.
There are some situations where creating a global is the right thing to do. But "returning two things from a function" is not one of those situations. Doing it for this purpose is just a Bad Idea.
2 Pass in two empty variables as parameters then assign the variables inside the method, which now is a void.
Yes, that's usually the best idea. This is exactly why "by reference" (or "output", depending on which language you're using) parameters exist.
I don't like the idea of methods that have a side effects.
Good theory, but you can take it too far. What would be the point of calling SaveCustomer() if that method didn't have a side-effect of saving the customer's data?
By Reference parameters are understood to be parameters that contain returned data.
3 Return a collection that contains two variables. This can lead to confusing code.
True. It wouldn't make sense, for instance, to return an array where element 0 was the first name and element 1 was the last name. This would be a Bad Idea.
4 Build a container class to hold the double return. This is more self-documenting than a collection containing other collections, but it seems like it might be confusing to create a class just for the purpose of a return.
Yes and no. As you say, I wouldn't want to create an object called FirstAndLastNames just to be used by one method. But if there was already an object which had basically this information, then it would make perfect sense to use it here.
If I was returning two of the exact same thing, a collection might be appropriate, but in general I would usually build a specialized class to hold exactly what I needed.
And if you are returning two things today from those two columns, tomorrow you might want a third. Maintaining a custom object is going to be a lot easier than any of the other options.
Use var/out parameters or pass variables by reference, not by value. In Delphi:
function ReturnTwoValues(out Param1: Integer):Integer;
begin
Param1 := 10;
Result := 20;
end;
If you use var instead of out, you can pre-initialize the parameter.
With databases, you could have an out parameter per column and the result of the function would be a boolean indicating if the record is retrieved correctly or not. (Although I would use a single record class to hold the column values.)
As much as it pains me to do it, I find the most readable way to return multiple values in PHP (which is what I work with, mostly) is using a (multi-dimensional) array, like this:
function doStuff($someThing)
{
// do stuff
$status = 1;
$message = 'it worked, good job';
return array('status' => $status, 'message' => $message);
}
Not pretty, but it works and it's not terribly difficult to figure out what's going on.
I generally use tuples. I mainly work in C#, and it's very easy to design generic tuple constructs. I assume it would be very similar for most languages which have generics. As an aside, 1 is a terrible idea, and 3 only works when you are getting two returns that are the same type unless you work in a language where everything derives from the same basic type (i.e. object). 2 and 4 are also good choices. 2 doesn't introduce any side effects a priori, it's just unwieldy.
Use std::vector, QList, or some managed library container to hold however many X you want to return:
QList<X> getMultipleItems()
{
QList<X> returnValue;
for (int i = 0; i < countOfItems; ++i)
{
returnValue.push_back(<your data here>);
}
return returnValue;
}
For the situation you described, pulling two fields from a single table, the appropriate answer is #4 given that two properties (fields) of the same entity (table) will exhibit strong cohesion.
Your concern that "it might be confusing to create a class just for the purpose of a return" is probably not that realistic. If your application is non-trivial you are likely going to need to re-use that class/object elsewhere anyway.
You should also consider whether the design of your method is primarily returning a single value, and you are getting another value for reference along with it, or if you really have a single returnable thing like first name - last name.
For instance, you might have an inventory module that queries the number of widgets you have in inventory. The return value you want to give is the actual number of widgets. However, you may also want to record how often someone is querying inventory and return the number of queries so far. In that case it can be tempting to return both values together. However, remember that you have class vars available for storing data, so you can store an internal query count, and not return it every time, then use a second method call to retrieve the related value. Only group the two values together if they are truly related. If they are not, use separate methods to retrieve them separately.
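The inventory case can be sketched like this in Python. The class and method names are hypothetical; the point is only that the query count lives inside the object and is fetched by its own method.

```python
class Inventory:
    """Hypothetical inventory module that tracks how often it is queried."""

    def __init__(self, widget_count: int) -> None:
        self._widget_count = widget_count
        self._query_count = 0

    def widget_count(self) -> int:
        # Returns only the value the caller asked for; the bookkeeping
        # is stored internally instead of being returned alongside it.
        self._query_count += 1
        return self._widget_count

    def query_count(self) -> int:
        # The unrelated value gets its own method.
        return self._query_count


inventory = Inventory(widget_count=42)
inventory.widget_count()
inventory.widget_count()
print(inventory.query_count())  # prints 2
```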
Haskell also allows multiple return values using built in tuples:
sumAndDifference :: Int -> Int -> (Int, Int)
sumAndDifference x y = (x + y, x - y)
> let (s, d) = sumAndDifference 3 5 in s * d
-16
Being a pure language, options 1 and 2 are not allowed.
Even using a state monad, the return value contains (at least conceptually) a bag of all relevant state, including any changes the function just made. It's just a fancy convention for passing that state through a sequence of operations.
I will usually opt for approach #4, as I prefer the clarity of knowing that what the function produces or calculates is its return value (rather than byref parameters). It also lends itself to a rather "functional" style in program flow.
The disadvantage of option #4 with generic tuple classes is it isn't much better than returning a collection (the only gain is type safety).
public IList CalculateStuffCollection(int arg1, int arg2)
public Tuple<int, int> CalculateStuffTuple(int arg1, int arg2)
var resultCollection = CalculateStuffCollection(1,2);
var resultTuple = CalculateStuffTuple(1,2);
resultCollection[0] // Was it index 0 or 1 I wanted?
resultTuple.A // Was it A or B I wanted?
I would like a language that allowed me to return an immutable tuple of named variables (similar to a dictionary, but immutable, typesafe and statically checked). But, sadly, such an option isn't available to me in the world of VB.NET, it may be elsewhere.
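For what it's worth, Python's typing.NamedTuple comes fairly close to this wish. It is immutable and its fields are named, although the static checking only happens if you run an external type checker such as mypy. A small sketch, with StuffResult and calculate_stuff as made-up names:

```python
from typing import NamedTuple


class StuffResult(NamedTuple):
    """Immutable, named return value (hypothetical example type)."""
    total: int
    product: int


def calculate_stuff(arg1: int, arg2: int) -> StuffResult:
    return StuffResult(total=arg1 + arg2, product=arg1 * arg2)


result = calculate_stuff(1, 2)
print(result.total, result.product)  # no guessing between index 0 and 1

# It still unpacks like a plain tuple when that is more convenient.
total, product = result
```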
I dislike option #2 because it breaks that "functional" style and forces you back into a procedural world (when often I don't want to do that just to call a simple method like TryParse).
I have sometimes used continuation-passing style to work around this, passing a function value as an argument, and returning that function call passing the multiple values.
In languages without first-class functions, objects can be used in place of function values.
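As a rough illustration of the continuation-passing workaround, here is a Python sketch. The function div_mod and its parameter k are names chosen for the example: instead of returning two values, the function hands both to a callback and returns whatever the callback produces.

```python
from typing import Callable, TypeVar

R = TypeVar("R")


def div_mod(a: int, b: int, k: Callable[[int, int], R]) -> R:
    # Pass both results to the continuation k rather than returning a tuple.
    return k(a // b, a % b)


text = div_mod(17, 5, lambda q, r: f"{q} remainder {r}")
print(text)  # prints "3 remainder 2"
```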
My choice is #4. Define a reference parameter in your function. That reference points to a Value Object.
In PHP:
class TwoValuesVO {
public $expectedOne;
public $expectedTwo;
}
/* parameter $_vo is a reference to a TwoValuesVO instance */
function twoValues( & $_vo ) {
    $_vo->expectedOne = 1;
    $_vo->expectedTwo = 2;
}
In Java:
class TwoValuesVO {
public int expectedOne;
public int expectedTwo;
}
class TwoValuesTest {
void twoValues( TwoValuesVO vo ) {
vo.expectedOne = 1;
vo.expectedTwo = 2;
}
}
The following code illustrates a pattern I sometimes see, whereby an object is transformed implicitly as it is passed as a parameter across a number of method calls.
var o = new MyReferenceType();
DoSomeWorkAndPossiblyModifyO(o);
DoYetMoreWorkAndPossiblyFurtherModifyO(o);
//now use o...
This feels wrong to me (it hardly feels object oriented). Is it acceptable?
Based on your method names, I would argue that there is nothing implicit in the transformation. This pattern would be acceptable. If, on the other hand your methods had names like printO(o) or compareTo(o), but actually modified the Object o, the design would be bad.
It is acceptable but usually bad style.
The usual "good" approach is:
DoSomeWorkAndModify(&o); // explicit reference means we will be accepting changes
o = DoSomeWorkAndReturnModified(o); // much more elastic because you often want to keep original.
The approach you presented makes sense when o is huge, and making a copy of it in memory is out of question, or if it's a function you (and nobody else = private) use very frequently and don't want to bother with the & syntax. Otherwise it's laziness that results in some really difficult to detect bugs.
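The two styles can be put side by side in a short Python sketch. Counter and both functions are hypothetical; dataclasses.replace is used to build the modified copy.

```python
from dataclasses import dataclass, replace


@dataclass
class Counter:
    value: int = 0


def increment_in_place(counter: Counter) -> None:
    # Mutating style: the caller's object changes.
    counter.value += 1


def incremented(counter: Counter) -> Counter:
    # Copying style: the original is left alone and a new object comes back.
    return replace(counter, value=counter.value + 1)


original = Counter(value=1)
copy = incremented(original)
print(original.value, copy.value)  # prints "1 2"

increment_in_place(original)
print(original.value)  # prints "2"
```

With the copying style you can keep the original around; with the in-place style the only hint that the object changed is the method name.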
It depends entirely on what the methods actually do, besides modifying that object.
For instance, an object primarily related to keeping some state in memory might for instance not have anything related to persisting that state anywhere.
The methods could for instance load data from a database, and update the object with that information.
However! Since I program mostly in C# and thus .NET, which is a wholly object-oriented language, I would actually write your code like this:
var o = new MyReferenceType();
SomeOtherClass.DoSomeWorkAndPossiblyModifyO(o);
SomeOtherClass.DoYetMoreWorkAndPossiblyFurtherModifyO(o);
//now use o...
In which case the actual name of that other class (or those other classes if there's 2 involved) would give me a big clue as to what is actually happening and/or the context.
Example:
Person p = new Person();
DatabaseContext.FetchAllLazilyLoadedProperties(p);
DatabaseContext.Save(p); // updates primary key property with new ID
I am curious to know about this.
Whenever I write a function that has to return multiple values, I either have to pass parameters by reference or create an array, store the values in it, and return that.
Why are functions in object-oriented languages not allowed to return multiple values, the same way we pass multiple parameters as input? Is there something built into the structure of these languages that prevents this?
Don't you think it would be fun and easy if we were allowed to do so?
It's not true that all Object-Oriented languages follow this paradigm.
For example, in Python:
def quadcube(x):
    return x**2, x**3
a, b = quadcube(3)
a will be 9 and b will be 27.
The difference between the traditional
OutTypeA SomeFunction(out OutTypeB, TypeC someOtherInputParam)
and your
{ OutTypeA, OutTypeB } SomeFunction(TypeC someOtherInputParam)
is just syntactic sugar. Also, the tradition of returning a single value lets you write in the easily readable, natural form result = SomeFunction(...). It's just convenience and ease of use.
And yes, as others said, you have tuples in some languages.
This is likely because of the way processors have been designed and hence carried over to modern languages such as Java or C#. The processor can load multiple things (pointers) into parameter registers but only has one return value register that holds a pointer.
I do agree that not all OOP languages only support returning one value, but for the ones that "apparently" do, this I think is the reason why.
Also for returning a tuple, pair or struct for that matter in C/C++, essentially, the compiler is returning a pointer to that object.
First answer: They don't. Many OOP languages allow you to return a tuple. This is true, for instance, in Python; in C++ you have pair<>, and a fully fledged tuple<> is in TR1 and C++0x.
Second answer: Because that's the way it should be. A method should be short and do only one thing and thus can be argued, only need to return one thing.
In PHP, it is like that because the only way you can receive a value is by assigning the function to a variable (or putting it in place of a variable). Although I know array_map allows you to do return something & something;
To return multiple values, you return a single object that contains both of them.
public MyResult GetResult(double x)
{
return new MyResult { Squared = Math.Pow(x,2), Cubed = Math.Pow(x,3) };
}
For some languages you can create anonymous types on the fly. For others you have to specify a return object as a concrete class. One observation with OO is you do end up with a lot of little classes.
The syntactic niceties of Python (see @Cowan's answer) are up to the language designer. The compiler / runtime could create an anonymous class to hold the result for you, even in a strongly typed environment like the .NET CLR.
Yes, it can be easier to read in some circumstances, and yes, it would be nice. However, if you read Eric Lippert's blog, you'll often read dialogues and hear him explain that there are many nice features that could be implemented, but a lot of effort goes into every feature, and some things just don't make the cut because in the end they can't be justified.
It's not a restriction, it is just the architecture of the Object Oriented and Structured programming paradigms. I don't know if it would be more fun if functions returned more than one value, but it would surely be messier and more complicated. I think the designers of the above programming paradigms thought about it, and they probably had good reasons not to implement that "feature": it is unnecessary, since you can already return multiple values by packing them in some kind of collection. Programming languages are designed to be compact, so unnecessary features are usually not implemented.
Sorry for the waffly title - if I could come up with a concise title, I wouldn't have to ask the question.
Suppose I have an immutable list type. It has an operation Foo(x) which returns a new immutable list with the specified argument as an extra element at the end. So to build up a list of strings with values "Hello", "immutable", "world" you could write:
var empty = new ImmutableList<string>();
var list1 = empty.Foo("Hello");
var list2 = list1.Foo("immutable");
var list3 = list2.Foo("world");
(This is C# code, and I'm most interested in a C# suggestion if you feel the language is important. It's not fundamentally a language question, but the idioms of the language may be important.)
The important thing is that the existing lists are not altered by Foo - so empty.Count would still return 0.
Another (more idiomatic) way of getting to the end result would be:
var list = new ImmutableList<string>().Foo("Hello")
.Foo("immutable")
.Foo("world");
My question is: what's the best name for Foo?
EDIT 3: As I reveal later on, the name of the type might not actually be ImmutableList<T>, which makes the position clear. Imagine instead that it's TestSuite and that it's immutable because the whole of the framework it's a part of is immutable...
(End of edit 3)
Options I've come up with so far:
Add: common in .NET, but implies mutation of the original list
Cons: I believe this is the normal name in functional languages, but meaningless to those without experience in such languages
Plus: my favourite so far, it doesn't imply mutation to me. Apparently this is also used in Haskell but with slightly different expectations (a Haskell programmer might expect it to add two lists together rather than adding a single value to the other list).
With: consistent with some other immutable conventions, but doesn't have quite the same "additionness" to it IMO.
And: not very descriptive.
Operator overload for + : I really don't like this much; I generally think operators should only be applied to lower level types. I'm willing to be persuaded though!
The criteria I'm using for choosing are:
Gives the correct impression of the result of the method call (i.e. that it's the original list with an extra element)
Makes it as clear as possible that it doesn't mutate the existing list
Sounds reasonable when chained together as in the second example above
Please ask for more details if I'm not making myself clear enough...
EDIT 1: Here's my reasoning for preferring Plus to Add. Consider these two lines of code:
list.Add(foo);
list.Plus(foo);
In my view (and this is a personal thing) the latter is clearly buggy - it's like writing "x + 5;" as a statement on its own. The first line looks like it's okay, until you remember that it's immutable. In fact, the way that the plus operator on its own doesn't mutate its operands is another reason why Plus is my favourite. Without the slight ickiness of operator overloading, it still gives the same connotations, which include (for me) not mutating the operands (or method target in this case).
EDIT 2: Reasons for not liking Add.
Various answers are effectively: "Go with Add. That's what DateTime does, and String has Replace methods etc which don't make the immutability obvious." I agree - there's precedent here. However, I've seen plenty of people call DateTime.Add or String.Replace and expect mutation. There are loads of newsgroup questions (and probably SO ones if I dig around) which are answered by "You're ignoring the return value of String.Replace; strings are immutable, a new string gets returned."
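The same trap exists outside .NET. As an illustration only (this is Python's built-in str.replace, not the C# API under discussion), ignoring the return value of an operation on an immutable type silently does nothing:

```python
greeting = "Hello, world"

# The new string is created and immediately discarded;
# greeting itself is never changed.
greeting.replace("world", "reader")
print(greeting)  # prints "Hello, world"

# The result has to be captured to have any effect.
updated = greeting.replace("world", "reader")
print(updated)  # prints "Hello, reader"
```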
Now, I should reveal a subtlety to the question - the type might not actually be an immutable list, but a different immutable type. In particular, I'm working on a benchmarking framework where you add tests to a suite, and that creates a new suite. It might be obvious that:
var list = new ImmutableList<string>();
list.Add("foo");
isn't going to accomplish anything, but it becomes a lot murkier when you change it to:
var suite = new TestSuite<string, int>();
suite.Add(x => x.Length);
That looks like it should be okay. Whereas this, to me, makes the mistake clearer:
var suite = new TestSuite<string, int>();
suite.Plus(x => x.Length);
That's just begging to be:
var suite = new TestSuite<string, int>().Plus(x => x.Length);
Ideally, I would like my users not to have to be told that the test suite is immutable. I want them to fall into the pit of success. This may not be possible, but I'd like to try.
I apologise for over-simplifying the original question by talking only about an immutable list type. Not all collections are quite as self-descriptive as ImmutableList<T> :)
In situations like that, I usually go with Concat. That usually implies to me that a new object is being created.
var p = listA.Concat(listB);
var k = listA.Concat(item);
I'd go with Cons, for one simple reason: it means exactly what you want it to.
I'm a huge fan of saying exactly what I mean, especially in source code. A newbie will have to look up the definition of Cons only once, but then read and use that a thousand times. I find that, in the long term, it's nicer to work with systems that make the common case easier, even if the up-front cost is a little bit higher.
The fact that it would be "meaningless" to people with no FP experience is actually a big advantage. As you pointed out, all of the other words you found already have some meaning, and that meaning is either slightly different or ambiguous. A new concept should have a new word (or in this case, an old one). I'd rather somebody have to look up the definition of Cons, than to assume incorrectly he knows what Add does.
Other operations borrowed from functional languages often keep their original names, with no apparent catastrophes. I haven't seen any push to come up with synonyms for "map" and "reduce" that sound more familiar to non-FPers, nor do I see any benefit from doing so.
(Full disclosure: I'm a Lisp programmer, so I already know what Cons means.)
Actually I like And, especially in the idiomatic way. I'd especially like it if you had a static readonly property for the Empty list, and perhaps make the constructor private so you always have to build from the empty list.
var list = ImmutableList<string>.Empty.And("Hello")
.And("Immutable")
.And("Word");
Whenever I'm in a jam with nomenclature, I hit up the interwebs.
thesaurus.com returns this for "add":
Definition: adjoin, increase; make further comment
Synonyms: affix, annex, ante, append, augment, beef up, boost, build up, charge up, continue, cue in, figure in, flesh out, heat up, hike, hike up, hitch on, hook on, hook up with, include, jack up, jazz up, join together, pad, parlay, piggyback, plug into, pour it on, reply, run up, say further, slap on, snowball, soup up, speed up, spike, step up, supplement, sweeten, tack on, tag
I like the sound of Adjoin, or more simply Join. That is what you're doing, right? The method could also apply to joining other ImmutableList<>'s.
Personally, I like .With(). If I was using the object, after reading the documentation or the code comments, it would be clear what it does, and it reads ok in the source code.
object.With("My new item as well");
Or, you add "Along" with it.. :)
object.AlongWith("this new item");
I ended up going with Add for all of my immutable collections in BclExtras. The reason is that it's an easy, predictable name. I'm not so worried about people confusing Add with a mutating add, since the name of the type is prefixed with Immutable.
For a while I considered Cons and other functional-style names. Eventually I discounted them because they're not nearly as well known. Sure, functional programmers will understand, but they're not the majority of users.
Other Names: you mentioned:
Plus: I'm wishy-washy on this one. For me this doesn't distinguish it as being a non-mutating operation any more than Add does
With: Will cause issues with VB (pun intended)
Operator overloading: Discoverability would be an issue
Options I considered:
Concat: Strings are immutable and use this. Unfortunately it's only really good for adding to the end
CopyAdd: Copy what? The source, the list?
AddToNewList: Maybe a good one for List. But what about a Collection, Stack, Queue, etc ...
Unfortunately there doesn't really seem to be a word that is
Definitely an immutable operation
Understandable to the majority of users
Representable in less than 4 words
It gets even more odd when you consider collections other than List. Take for instance Stack. Even first-year programmers can tell you that Stacks have a Push/Pop pair of methods. If you create an ImmutableStack and give its methods completely different names, let's call them Foo/Fop, you've just added more work for them to use your collection.
Edit: Response to Plus Edit
I see where you're going with Plus. I think a stronger case would actually be Minus for remove. If I saw the following I would certainly wonder what in the world the programmer was thinking
list.Minus(obj);
The biggest problem I have with Plus/Minus or a new pairing is it feels like overkill. The collection itself already has a distinguishing name, the Immutable prefix. Why go further by adding vocabulary whose intent is to make the same distinction the Immutable prefix already made?
I can see the call site argument. It makes it clearer from the standpoint of a single expression. But in the context of the entire function it seems unnecessary.
Edit 2
Agree that people have definitely been confused by String.Concat and DateTime.Add. I've seen several very bright programmers hit this problem.
However I think ImmutableList is a different argument. There is nothing about String or DateTime that establishes it as Immutable to a programmer. You must simply know that it's immutable via some other source. So the confusion is not unexpected.
ImmutableList does not have that problem because the name defines its behavior. You could argue that people don't know what Immutable is and I think that's also valid. I certainly didn't know it till about year 2 in college. But you have the same issue with whatever name you choose instead of Add.
Edit 3: What about types like TestSuite which are immutable but do not contain the word?
I think this drives home the idea that you shouldn't be inventing new method names. Namely because there is clearly a drive to make types immutable in order to facilitate parallel operations. If you focus on changing the name of methods for collections, the next step will be the mutating method names on every type you use that is immutable.
I think it would be a more valuable effort to instead focus on making types identifiable as Immutable. That way you can solve the problem without rethinking every mutating method pattern out there.
Now how can you identify TestSuite as Immutable? In today's environment I think there are a few ways:
Prefix with Immutable: ImmutableTestSuite
Add an Attribute which describes the level of Immutability. This is certainly less discoverable
Not much else.
My guess/hope is development tools will start helping with this problem by making it easy to identify immutable types simply by sight (different color, stronger font, etc ...). But I think that's the answer, rather than changing all of the method names.
I think this may be one of those rare situations where it's acceptable to overload the + operator. In math terminology, we know that + doesn't append something to the end of something else. It always combines two values together and returns a new resulting value.
For example, it's intuitively obvious that when you say
x = 2 + 2;
the resulting value of x is 4, not 22.
Similarly,
var empty = new ImmutableList<string>();
var list1 = empty + "Hello";
var list2 = list1 + "immutable";
var list3 = list2 + "word";
should make clear what each variable is going to hold. It should be clear that list2 is not changed in the last line, but instead that list3 is assigned the result of appending "word" to list2.
Otherwise, I would just name the function Plus().
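The same idea can be sketched outside C#. Below is a minimal Python sketch, assuming a hypothetical ImmutableList class (not a real library type) that stores its items in a tuple and overloads + to return a new list each time:

```python
class ImmutableList:
    """Hypothetical stand-in for the ImmutableList discussed above."""

    def __init__(self, items=()):
        # A tuple cannot be modified after construction.
        self._items = tuple(items)

    def __add__(self, item):
        # Build and return a new list; self is left untouched.
        return ImmutableList(self._items + (item,))

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


empty = ImmutableList()
list1 = empty + "Hello"
list2 = list1 + "immutable"
list3 = list2 + "word"
```

Each + produces a fresh object, so list2 still holds two elements after list3 is created.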
To be as clear as possible, you might want to go with the wordier CopyAndAdd, or something similar.
I would call it Extend() or maybe ExtendWith() if you feel like being really verbose.
Extend means adding something to something else without changing it. I think this is very relevant terminology in C# since this is similar to the concept of extension methods - they "add" a new method to a class without "touching" the class itself.
Otherwise, if you really want to emphasize that you don't modify the original object at all, using some prefix like Get- looks unavoidable to me.
Added(), Appended()
I like to use the past tense for operations on immutable objects. It conveys the idea that you aren't changing the original object, and it's easy to recognize when you see it.
Also, because mutating method names are often present-tense verbs, it applies to most of the immutable-method-name-needed cases you run into. For example an immutable stack has the methods "pushed" and "popped".
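As a rough illustration of the past-tense convention, here is a minimal Python sketch of an immutable stack. The class name ImmutableStack and the peek helper are assumptions made for the example; only pushed and popped come from the suggestion above:

```python
class ImmutableStack:
    """Hypothetical immutable stack using past-tense method names."""

    def __init__(self, items=()):
        self._items = tuple(items)

    def pushed(self, item):
        # Reads as "the stack with item pushed"; returns a new stack.
        return ImmutableStack(self._items + (item,))

    def popped(self):
        # Reads as "the stack with its top popped"; returns a new stack.
        if not self._items:
            raise IndexError("popped() called on an empty stack")
        return ImmutableStack(self._items[:-1])

    def peek(self):
        if not self._items:
            raise IndexError("peek() called on an empty stack")
        return self._items[-1]

    def __len__(self):
        return len(self._items)


base = ImmutableStack()
one = base.pushed("a")
two = one.pushed("b")
back_to_one = two.popped()
```

A bare statement such as two.popped() with the result discarded reads oddly, which is the hint the past tense is meant to give.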
I like mmyers' suggestion of CopyAndAdd. In keeping with a "mutation" theme, maybe you could go with Bud (asexual reproduction), Grow, Replicate, or Evolve? =)
EDIT: To continue with my genetic theme, how about Procreate, implying that a new object is made which is based on the previous one, but with something new added.
This is probably a stretch, but in Ruby there is a commonly used notation for the distinction: add doesn't mutate; add! mutates. If this is a pervasive problem in your project, you could do that too (not necessarily with non-alphabetic characters, but consistently using a notation to indicate mutating/non-mutating methods).
Join seems appropriate.
Maybe the confusion stems from the fact that you want two operations in one. Why not separate them? DSL style:
var list = new ImmutableList<string>("Hello");
var list2 = list.Copy().With("World!");
Copy would return an intermediate object, that's a mutable copy of the original list. With would return a new immutable list.
Update:
But, having an intermediate, mutable collection around is not a good approach. The intermediate object should be contained in the Copy operation:
var list1 = new ImmutableList<string>("Hello");
var list2 = list1.Copy(list => list.Add("World!"));
Now, the Copy operation takes a delegate, which receives a mutable list, so that it can control the copy outcome. It can do much more than appending an element, like removing elements or sorting the list. It can also be used in the ImmutableList constructor to assemble the initial list without intermediary immutable lists.
public ImmutableList<T> Copy(Action<IList<T>> mutate) {
if (mutate == null) return this;
var list = new List<T>(this);
mutate(list);
return new ImmutableList<T>(list);
}
Now there's no possibility of misinterpretation by the users, they will naturally fall into the pit of success.
Yet another update:
If you still don't like the mutable list mention, even now that it's contained, you can design a specification object, that will specify, or script, how the copy operation will transform its list. The usage will be the same:
var list1 = new ImmutableList<string>("Hello");
// rules is a specification object, that takes commands to run in the copied collection
var list2 = list1.Copy(rules => rules.Append("World!"));
Now you can be creative with the rules names and you can only expose the functionality that you want Copy to support, not the entire capabilities of an IList.
For the chaining usage, you can create a reasonable constructor (which will not use chaining, of course):
public ImmutableList(params T[] elements) ...
...
var list = new ImmutableList<string>("Hello", "immutable", "World");
Or use the same delegate in another constructor:
var list = new ImmutableList<string>(rules =>
rules
.Append("Hello")
.Append("immutable")
.Append("World")
);
This assumes that the rules.Append method returns this.
This is what it would look like with your latest example:
var suite = new TestSuite<string, int>(x => x.Length);
var otherSuite = suite.Copy(rules =>
rules
.Append(x => Int32.Parse(x))
.Append(x => x.GetHashCode())
);
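The specification-object variant is only described above, not implemented. The following is a minimal Python sketch of how it might work, assuming hypothetical names (CopyRules, apply_to) and supporting only an append command:

```python
class CopyRules:
    """Hypothetical specification object: records commands for a copy."""

    def __init__(self):
        self._appended = []

    def append(self, item):
        self._appended.append(item)
        return self  # returning self is what makes chaining possible

    def apply_to(self, items):
        # Produce the contents of the new list; the input is not modified.
        return tuple(items) + tuple(self._appended)


class ImmutableList:
    """Hypothetical immutable list whose copy() takes a rules script."""

    def __init__(self, *elements):
        self._items = tuple(elements)

    def copy(self, script=None):
        if script is None:
            return self  # nothing to change, so sharing the instance is safe
        rules = CopyRules()
        script(rules)
        return ImmutableList(*rules.apply_to(self._items))

    def __iter__(self):
        return iter(self._items)


list1 = ImmutableList("Hello")
list2 = list1.copy(lambda rules: rules.append("immutable").append("World!"))
```

The caller never sees a mutable list; it can only issue the commands that CopyRules chooses to expose.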
A few random thoughts:
ImmutableAdd()
Append()
ImmutableList<T>(ImmutableList<T> originalList, T newItem) Constructor
DateTime in C# uses Add. So why not use the same name? As long as the users of your class understand the class is immutable.
I think the key thing you're trying to get at that's hard to express is the non-mutation, so maybe something with a generative word in it, something like CopyWith() or InstancePlus().
I don't think the English language will let you imply immutability in an unmistakable way while using a verb that means the same thing as "Add". "Plus" almost does it, but people can still make the mistake.
The only way you're going to prevent your users from mistaking the object for something mutable is by making it explicit, either through the name of the object itself or through the name of the method (as with the verbose options like "GetCopyWith" or "CopyAndAdd").
So just go with your favourite, "Plus."
First, an interesting starting point:
http://en.wikipedia.org/wiki/Naming_conventions_(programming) ...In particular, check the "See Also" links at the bottom.
I'm in favor of either Plus or And, effectively equally.
Plus and And are both math-based in etymology. As such, both connote mathematical operation; both yield an expression which reads naturally as expressions which may resolve into a value, which fits with the method having a return value. And bears additional logic connotation, but both words apply intuitively to lists. Add connotes action performed on an object, which conflicts with the method's immutable semantics.
Both are short, which is especially important given the primitiveness of the operation. Simple, frequently-performed operations deserve shorter names.
Expressing immutable semantics is something I prefer to do via context. That is, I'd rather simply imply that this entire block of code has a functional feel; assume everything is immutable. That might just be me, however. I prefer immutability to be the rule; if it's done, it's done a lot in the same place; mutability is the exception.
How about Chain() or Attach()?
I prefer Plus (and Minus). They are easily understandable and map directly to operations involving well known immutable types (the numbers). 2+2 doesn't change the value of 2, it returns a new, equally immutable, value.
Some other possibilities:
Splice()
Graft()
Accrete()
How about mate, mateWith, or coitus, for those who abide. In terms of reproducing mammals are generally considered immutable.
Going to throw Union out there too. Borrowed from SQL.
Apparently I'm the first Obj-C/Cocoa person to answer this question.
NSString *empty = [[NSString alloc] init];
NSString *list1 = [empty stringByAppendingString:@"Hello"];
NSString *list2 = [list1 stringByAppendingString:@"immutable"];
NSString *list3 = [list2 stringByAppendingString:@"word"];
Not going to win any code golf games with this.
I think "Add" or "Plus" sounds fine. The name of the list itself should be enough to convey the list's immutability.
Maybe there are some words which remind me more of making a copy and adding stuff to that instead of mutating the instance (like "Concatenate"). But I think having some symmetry for those words for other actions would be good to have too. I don't know of a similar word for "Remove" that I think of as the same kind as "Concatenate". "Plus" sounds a little strange to me. I wouldn't expect it to be used in a non-numerical context. But that could as well come from my non-English background.
Maybe i would use this scheme
AddToCopy
RemoveFromCopy
InsertIntoCopy
These have their own problems though, when i think about it. One could think they remove something or add something to an argument given. Not sure about it at all. Those words do not play nice in chaining either, i think. Too wordy to type.
Maybe i would just use plain "Add" and friends too. I like how it is used in math
Add 1 to 2 and you get 3
Well, certainly, a 2 remains a 2 and you get a new number. This is about two numbers and not about a list and an element, but i think it has some analogy. In my opinion, add does not necessarily mean you mutate something. I certainly see your point that having a lonely statement containing just an add and not using the returned new object does not look buggy. But I've now also thought some time about that idea of using another name than "add" but i just can't come up with another name, without making me think "hmm, i would need to look at the documentation to know what it is about" because its name differs from what I would expect to be called "add". Just some weird thought about this from litb, not sure it makes sense at all :)
Looking at http://thesaurus.reference.com/browse/add and http://thesaurus.reference.com/browse/plus I found gain and affix but I'm not sure how much they imply non-mutation.
I think that Plus() and Minus() or, alternatively, Including(), Excluding() are reasonable at implying immutable behavior.
However, no naming choice will ever make it perfectly clear to everyone, so I personally believe that a good xml doc comment would go a very long way here. VS throws these right in your face when you write code in the IDE - they're hard to ignore.
Append - because, note that names of the System.String methods suggest that they mutate the instance, but they don't.
Or I quite like AfterAppending:
void test()
{
Bar bar = new Bar();
List list = bar.AfterAppending("foo");
}
list.CopyWith(element)
As does Smalltalk :)
And also list.copyWithout(element) that removes all occurrences of an element, which is most useful when used as list.copyWithout(null) to remove unset elements.
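For readers who don't know the Smalltalk pair, here is a minimal Python sketch of the same behavior on tuples. The function names copy_with and copy_without are assumptions that simply mirror the names mentioned above:

```python
def copy_with(items, element):
    # Return a new tuple with element added at the end.
    return tuple(items) + (element,)


def copy_without(items, element):
    # Return a new tuple with every occurrence of element removed.
    return tuple(x for x in items if x != element)


original = ("a", None, "b", None)
extended = copy_with(original, "c")
cleaned = copy_without(original, None)
```

Here cleaned holds only "a" and "b", while original keeps its None entries.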
I would go for Add, because I can see the benefit of a better name, but the problem would be to find different names for every other immutable operation which might make the class quite unfamiliar if that makes sense.
What rules do you use to name your variables?
Where are single letter vars allowed?
How much info do you put in the name?
How about for example code?
What are your preferred meaningless variable names? (after foo & bar)
Why are they spelled "foo" and "bar" rather than FUBAR
function startEditing(){
if (user.canEdit(currentDocument)){
editorControl.setEditMode(true);
setButtonDown(btnStartEditing);
}
}
Should read like a narrative work.
One rule I always follow is this: if a variable encodes a value that is in some particular units, then those units have to be part of the variable name. Example:
int postalCodeDistanceMiles;
decimal reactorCoreTemperatureKelvin;
decimal altitudeMsl;
int userExperienceWongBakerPainScale
I will NOT be responsible for crashing any Mars landers (or the equivalent failure in my boring CRUD business applications).
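A small Python sketch of why the unit suffix helps. The function and variable names are made up for illustration; the conversion factor is the standard international mile:

```python
KM_PER_MILE = 1.609344  # international mile expressed in kilometres


def miles_to_km(distance_miles):
    # The parameter name states the unit the caller must supply.
    return distance_miles * KM_PER_MILE


postal_code_distance_miles = 10
postal_code_distance_km = miles_to_km(postal_code_distance_miles)
```

With the units in the names, a call such as miles_to_km(postal_code_distance_km) looks wrong at a glance, without having to trace where the value came from.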
Well it all depends on the language you are developing in. As I am currently using C# I tend to use the following.
camelCase for variables.
camelCase for parameters.
PascalCase for properties.
m_PascalCase for member variables.
Where are single letter vars allowed?
I tend to do this in for loops but feel a bit guilty whenever I do so. But with foreach and lambda expressions for loops are not really that common now.
How much info do you put in the name?
If the code is a bit difficult to understand, write a comment. Don't turn a variable name into a comment, i.e.
int theTotalAccountValueIsStoredHere
is not required.
what are your preferred meaningless variable names? (after foo & bar)
i or x. foo and bar are a bit too university text book example for me.
why are they spelled "foo" and "bar" rather than FUBAR?
Tradition
These are all C# conventions.
Variable-name casing
Case indicates scope. Pascal-cased variables are fields of the owning class. Camel-cased variables are local to the current method.
I have only one prefix-character convention. Backing fields for class properties are Pascal-cased and prefixed with an underscore:
private int _Foo;
public int Foo { get { return _Foo; } set { _Foo = value; } }
There's some C# variable-naming convention I've seen out there - I'm pretty sure it was a Microsoft document - that inveighs against using an underscore prefix. That seems crazy to me. If I look in my code and see something like
_Foo = GetResult();
the very first thing that I ask myself is, "Did I have a good reason not to use a property accessor to update that field?" The answer is often "Yes, and you'd better know what that is before you start monkeying around with this code."
Single-letter (and short) variable names
While I tend to agree with the dictum that variable names should be meaningful, in practice there are lots of circumstances under which making their names meaningful adds nothing to the code's readability or maintainability.
Loop iterators and array indices are the obvious places to use short and arbitrary variable names. Less obvious, but no less appropriate in my book, are nonce usages, e.g.:
XmlWriterSettings xws = new XmlWriterSettings();
xws.Indent = true;
XmlWriter xw = XmlWriter.Create(outputStream, xws);
That's from C# 2.0 code; if I wrote it today, of course, I wouldn't need the nonce variable:
XmlWriter xw = XmlWriter.Create(
outputStream,
new XmlWriterSettings() { Indent = true });
But there are still plenty of places in C# code where I have to create an object that I'm just going to pass elsewhere and then throw away.
A lot of developers would use a name like xwsTemp in those circumstances. I find that the Temp suffix is redundant. The fact that I named the variable xws in its declaration (and I'm only using it within visual range of that declaration; that's important) tells me that it's a temporary variable.
Another place I'll use short variable names is in a method that's making heavy use of a single object. Here's a piece of production code:
internal void WriteXml(XmlWriter xw)
{
if (!Active)
{
return;
}
xw.WriteStartElement(Row.Table.TableName);
xw.WriteAttributeString("ID", Row["ID"].ToString());
xw.WriteAttributeString("RowState", Row.RowState.ToString());
for (int i = 0; i < ColumnManagers.Length; i++)
{
ColumnManagers[i].Value = Row.ItemArray[i];
xw.WriteElementString(ColumnManagers[i].ColumnName, ColumnManagers[i].ToXmlString());
}
...
There's no way in the world that code would be easier to read (or safer to modify) if I gave the XmlWriter a longer name.
Oh, how do I know that xw isn't a temporary variable? Because I can't see its declaration. I only use temporary variables within 4 or 5 lines of their declaration. If I'm going to need one for more code than that, I either give it a meaningful name or refactor the code using it into a method that - hey, what a coincidence - takes the short variable as an argument.
How much info do you put in the name?
Enough.
That turns out to be something of a black art. There's plenty of information I don't have to put into the name. I know when a variable's the backing field of a property accessor, or temporary, or an argument to the current method, because my naming conventions tell me that. So my names don't.
Here's why it's not that important.
In practice, I don't need to spend much energy figuring out variable names. I put all of that cognitive effort into naming types, properties and methods. This is a much bigger deal than naming variables, because these names are very often public in scope (or at least visible throughout the namespace). Names within a namespace need to convey meaning the same way.
There's only one variable in this block of code:
RowManager r = (RowManager)sender;
// if the settings allow adding a new row, add one if the context row
// is the last sibling, and it is now active.
if (Settings.AllowAdds && r.IsLastSibling && r.Active)
{
r.ParentRowManager.AddNewChildRow(r.RecordTypeRow, false);
}
The property names almost make the comment redundant. (Almost. There's actually a reason why the property is called AllowAdds and not AllowAddingNewRows that a lot of thought went into, but it doesn't apply to this particular piece of code, which is why there's a comment.) The variable name? Who cares?
Pretty much every modern language that has wide use has its own coding standards. These are a great starting point. If all else fails, just use whatever is recommended. There are exceptions of course, but these are general guidelines. If your team prefers certain variations, as long as you agree with them, then that's fine as well.
But at the end of the day it's not necessarily what standards you use, but the fact that you have them in the first place and that they are adhered to.
I only use single character variables for loop control or very short functions.
for(int i = 0; i< endPoint; i++) {...}
int max( int a, int b) {
if (a > b)
return a;
return b;
}
The amount of information depends on the scope of the variable, the more places it could be used, the more information I want to have the name to keep track of its purpose.
When I write example code, I try to use variable names as I would in real code (although functions might get useless names like foo or bar).
See Etymology of "Foo"
What rules do you use to name your variables?
Typically, as I am a C# developer, I follow the variable naming conventions as specified by the IDesign C# Coding Standard for two reasons
1) I like it, and find it easy to read.
2) It is the default that comes with the Code Style Enforcer AddIn for Visual Studio 2005 / 2008 which I use extensively these days.
Where are single letter vars allowed?
There are a few places where I will allow single letter variables. Usually these are simple loop indexers, OR mathematical concepts like X,Y,Z coordinates. Other than that, never! (Everywhere else I have used them, I have typically been bitten by them when rereading the code).
How much info do you put in the name?
Enough to know PRECISELY what the variable is being used for. As Robert Martin says:
"The name of a variable, function, or class, should answer all the big questions. It should tell you why it exists, what it does, and how it is used. If a name requires a comment, then the name does not reveal its intent."
From Clean Code - A Handbook of Agile Software Craftsmanship
I never use meaningless variable names like foo or bar, unless, of course, the code is truly throw-away.
For loop variables, I double up the letter so that it's easier to search for the variable within the file. For example,
for (int ii=0; ii < array.length; ii++)
{
int element = array[ii];
printf("%d", element);
}
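The searchability argument can be checked with a plain substring search. This is a minimal Python sketch; the two source strings are just the loop above written both ways:

```python
single = "for (int i=0; i < array.length; i++) { int element = array[i]; }"
doubled = "for (int ii=0; ii < array.length; ii++) { int element = array[ii]; }"

# Searching for "i" also hits the letter inside "int".
hits_single = single.count("i")

# Searching for "ii" only hits the loop variable.
hits_doubled = doubled.count("ii")
```

In this snippet the search for "ii" finds exactly the four uses of the loop variable, while the search for "i" returns extra matches from the keyword int.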
What rules do you use to name your variables? I've switched between underscore between words (load_vars), camel casing (loadVars) and no spaces (loadvars). Classes are always CamelCase, capitalized.
Where are single letter vars allowed? Loops, mostly. Temporary vars in throwaway code.
How much info do you put in the name? Enough to remind me what it is while I'm coding. (Yes this can lead to problems later!)
what are your preferred meaningless variable names? (after foo & bar) temp, res, r. I actually don't use foo and bar much.
What rules do you use to name your variables?
I need to be able to understand it in a year's time. Should also conform with preexisting style.
Where are single letter vars allowed?
Ultra-obvious things. E.g. char c; c = getc(); Loop indices (i, j, k).
How much info do you put in the name?
Plenty and lots.
how about for example code?
Same as above.
what are your preferred meaningless variable names? (after foo & bar)
I don't like having meaningless variable names. If a variable doesn't mean anything, why is it in my code?
why are they spelled "foo" and "bar" rather than FUBAR
Tradition.
The rules I adhere to are;
Does the name fully and accurately describe what the variable represents?
Does the name refer to the real-world problem rather than the programming language solution?
Is the name long enough that you don't have to puzzle it out?
Are computed value qualifiers, if any, at the end of the name?
Are they specifically instantiated only at the point once required?
What rules do you use to name your variables?
camelCase for all important variables, CamelCase for all classes
Where are single letter vars allowed?
In loop constructs and in mathematical functions where the single letter var name is consistent with the mathematical definition.
How much info do you put in the name?
You should be able to read the code like a book. Function names should tell you what the function does (scalarProd(), addCustomer(), etc)
How about for example code?
what are your preferred meaningless variable names? (after foo & bar)
temp, tmp, input, I never really use foo and bar.
I would say try to name them as clearly as possible. Never use single letter variables and only use 'foo' and 'bar' if you're just testing something out (e.g., in interactive mode) and won't use it in production.
I like to prefix my variables with what they're going to be: str = String, int = Integer, bool = Boolean, etc.
Using a single letter is quick and easy in Loops: For i = 0 to 4...Loop
Variables are made to be a short but descriptive substitute for what you're using. If the variable is too short, you might not understand what it's for. If it's too long, you'll be typing forever for a variable that represents 5.
Foo & Bar are used for example code to show how the code works. You can use just about any different nonsensical characters to use instead. I usually just use i, x, & y.
My personal opinion of foo bar vs. fu bar is that it's too obvious and no one likes 2-character variables, 3 is much better!
In DSLs and other fluent interfaces, variable and method names taken together often form a lexical entity. For example, I personally like the (admittedly heretic) naming pattern where the verb is put into the variable name rather than the method name. @see 6th Rule of Variable Naming
Also, I like the spartan use of $ as variable name for the main variable of a piece of code. For example, a class that pretty prints a tree structure can use $ for the StringBuffer inst var. @see This is Verbose!
Otherwise I refer to the Programmer's Phrasebook by Einar Hoest. @see http://www.nr.no/~einarwh/phrasebook/
I always use single letter variables in for loops, it's just nicer-looking and easier to read.
A lot of it depends on the language you're programming in too, I don't name variables the same in C++ as I do in Java (Java lends itself better to the excessively long variable names imo, but this could just be a personal preference. Or it may have something to do with how Java built-ins are named...).
locals: fooBar;
members/types/functions FooBar
interfaces: IFooBar
As for me, single letters are only valid if the name is classic; i/j/k only for local loop indexes, x,y,z for vector parts.
vars have names that convey meaning but are short enough to not wrap lines
foo,bar,baz. Pickle is also a favorite.
I learned not to ever use single-letter variable names back in my VB3 days. The problem is that if you want to search everywhere that a variable is used, it's kinda hard to search on a single letter!
The newer versions of Visual Studio have intelligent variable searching functions that avoid this problem, but old habits and all that. Anyway, I prefer to err on the side of ridiculous.
for (int firstStageRocketEngineIndex = 0; firstStageRocketEngineIndex < firstStageRocketEngines.Length; firstStageRocketEngineIndex++)
{
firstStageRocketEngines[firstStageRocketEngineIndex].Ignite();
Thread.Sleep(100); // Don't start them all at once. That would be bad.
}
It's pretty much unimportant how you name variables. You really don't need any rules, other than those specified by the language, or at minimum, those enforced by your compiler.
It's considered polite to pick names you think your teammates can figure out, but style rules don't really help with that as much as people think.
Since I work as a contractor, moving among different companies and projects, I prefer to avoid custom naming conventions. They make it more difficult for a new developer, or a maintenance developer, to become acquainted with (and follow) the standard being used.
So, while one can find points in them to disagree with, I look to the official Microsoft Net guidelines for a consistent set of naming conventions.
With some exceptions (Hungarian notation), I think consistent usage may be more useful than any arbitrary set of rules. That is, do it the same way every time.
I work in MathCAD and I'm happy because MathCAD gives me incredible possibilities in naming and I use them a lot. And I can't understand how to program without this.
To differ one var from another I have to include a lot of information in the name,for example:
1.On the first place - that is it -N for quantity,F for force and so on
2.On the second - additional indices - for direction of force for example
3.On the third - indexation inside vector or matrix var, for convenience I put var name in {} or [] brackets to show its dimensions.
So,as conclusion my var name is like
N.dirs / Fx i.row / {F}.w.(i,j.k) / {F}.w.(k,i.j).
Sometimes I have to add name of coordinate system for vector values
{F}.{GCS}.w.(i,j.k) / {F}.{LCS}.w.(i,j.k)
And as final step I add name of the external module in BOLD at the end of external function or var like Row.MTX.f([M]) because MathCAD doesn't have help string for function.
Use variable names that clearly describe what the variable contains. If the class is going to get big, or if it is in the public scope, the variable name needs to be more descriptive. Of course good naming makes you and other people understand the code better.
for example: use "employeeNumber" instead of just "number".
use Btn or Button at the end of the name of variables referring to buttons, str for strings and so on.
Start variables with lower case, start classes with uppercase.
example of class "MyBigClass", example of variable "myStringVariable"
Use upper case to indicate a new word for better readability. Don't use "_", because it looks uglier and takes longer time to write.
for example: use "employeeName".
Only use single character variables in loops.
Updated
First off, naming depends on existing conventions, whether from language, framework, library, or project. (When in Rome...) Example: Use the jQuery style for jQuery plugins, use the Apple style for iOS apps. The former example requires more vigilance (since JavaScript can get messy and isn't automatically checked), while the latter example is simpler since the standard has been well-enforced and followed. YMMV depending on the leaders, the community, and especially the tools.
I will set aside all my naming habits to follow any existing conventions.
In general, I follow these principles, all of which center around programming being another form of interpersonal communication through written language.
Readability - important parts should have solid names; but these names should not be a replacement for proper documentation of intent. The test for code readability is if you can come back to it months later and still be understanding enough to not toss the entire thing upon first impression. This means avoiding abbreviation; see the case against Hungarian notation.
Writeability - common areas and boilerplate should be kept simple (esp. if there's no IDE), so code is easier and more fun to write. This is a bit inspired by Rob Pike's style.
Maintainability - if I add the type to my name like arrItems, then it would suck if I changed that property to be an instance of a CustomSet class that extends Array. Type notes should be kept in documentation, and only if appropriate (for APIs and such).
Standard, common naming - For dumb environments (text editors): Classes should be in ProperCase, variables should be short and if needed be in snake_case and functions should be in camelCase.
For JavaScript, it's a classic case of the restraints of the language and the tools affecting naming. It helps to distinguish variables from functions through different naming, since there's no IDE to hold your hand while this and prototype and other boilerplate obscure your vision and confuse your differentiation skills. It's also not uncommon to see all the unimportant or globally-derived vars in a scope be abbreviated. The language has no import [path] as [alias];, so local vars become aliases. And then there's the slew of different whitespacing conventions. The only solution here (and anywhere, really) is proper documentation of intent (and identity).
Also, the language itself is based around function level scope and closures, so that amount of flexibility can make blocks with variables in 2+ scope levels feel very messy, so I've seen naming where _ is prepended for each level in the scope chain to the vars in that scope.
I do a lot of PHP nowadays. It was not always like that, though, and I have learned a couple of tricks when it comes to variable naming.
//this is my string variable
$strVar = "";
//this would represent an array
$arrCards = array();
//this is for an integer
$intTotal = NULL;
//object
$objDB = new database_class();
//boolean
$blValid = true;