I often find myself trying to come up with good names for complementary pairs of variables, where two variables denote opposing concepts, two participants in some sort of duologue, and so on.
This might be better explained by a counter-example - I maintain an app that prints two graphics as part of a print advertisement. They're stored in the database as TopLogo and LowerLogo, which I have to stop and double-check every time I use them because I'm expecting top to complement bottom, and lower should complement upper.
There are some obvious examples that I think work well:
client / server
source / target for copying/moving data or files from one variable to another
minimum / maximum
but there are some concepts that just don't lend themselves to such neat naming schemes. For example, when paging through records, does 'last' mean 'final' or 'previous'? I recently saw some code that used firstPage, previousPage, nextPage and finalPage to avoid the ambiguous lastPage completely, which I thought was very neat, hence this question.
Do you have any particularly neat variable name pairs you'd care to share with us? (Bonus points if they're the same length, which makes the code so much neater in monospaced fonts.)
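For what it's worth, the four paging names mentioned above might look something like this in practice. This is only a rough sketch in Python: the Pager class, its fields and the 1-based page numbering are assumptions made up for illustration, not taken from the code I saw.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pager:
    """Hypothetical pager; page numbers are assumed to be 1-based."""

    current_page: int
    page_count: int

    @property
    def first_page(self) -> int:
        return 1

    @property
    def previous_page(self) -> int:
        # Clamp so that the first page is its own "previous" page.
        return max(self.current_page - 1, self.first_page)

    @property
    def next_page(self) -> int:
        # Clamp so that the final page is its own "next" page.
        return min(self.current_page + 1, self.final_page)

    @property
    def final_page(self) -> int:
        # "final" rather than "last", so it cannot be read as "previous".
        return self.page_count
```

Each name then has exactly one reading, and first/final and previous/next pair up cleanly.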
Like with all kinds of code style conventions, consistency is what you should strive for.
I would have the development team agree on "standard" pairs of prefixes for common scenarios like "source/destination" or "from/to" and then stick with them for the whole project. As long as every developer is aware of what is meant with a particular prefix in the codebase, it is easier to avoid misunderstandings.
Exceptions to the rule should be clarified in the documentation if the variable is part of a public API, or in comments within the code if its visibility is restricted to a single class or method.
In my databases you'll find many valid-state temporal ("history") tables containing a pair of columns named start_date and end_date. No bonus points for me, then, because I'd rather use the commonly used 'end' than try to come up with an intuitive alternative with the same number of characters as the word 'start'.
I tend to prefer these generic terms even when more context-specific terms may be viable, e.g. preferring employee_start_date over employee_hire_date (what if their employment started for a reason other than being formally hired, e.g. their company was the subject of an acquisition?). That said, I'd prefer person_birth_date over person_start_date :)
While one does try to be semantically coherent in obvious cases -- e.g., maximum goes with minimum, and not "lowest" -- in well-structured OO code (which isn't all code, I know) the problem disappears with a good IDE. Classes are short, methods are short, and variables are few in each method. So it doesn't matter what you call the variable pairs so long as they're clear. Your code might not look professional, but real quality is in the code, not in the look of your code.
The problem further disappears if there is good JavaDoc or whatever the documentation system is, and if you have good class names that go with them. For instance, if you have an instance of a Connection class and it has a method called setDestination, that's okay, but if you know that setDestination takes one parameter called destination and it's of the Server class, you're cool... even though you might prefer to call it target, aimHere, placeToSendTheData, or whatever (and the corresponding names, source, comingFromHere, and placeToGetTheDataFrom). Plus the doc system says what the thing is for, and that is priceless.
This next thing might sound stupid and I'm sure I'll get voted down here on StackOverflow, but unique non-professional sounding variable names have a great advantage: I know that my variables have names like placeWeWantTheDataToGo (and the IDE takes care of typing it), but the "serious" guys who do the JDK would never use such silly names. So I know immediately that the variable is one of mine. Incidentally, when I worked with developers in Spain and Italy, they wrote code with Spanish variable names (not always, but usually). This causes the same effect: we can quickly see that the Conexion class is ours, but the Connection class is not.
[Also, instead of typing your variable names, assign them a constant String somewhere in your code and use that, so if they called it lower or downer instead of low, you're still okay.]
Yes, I do try to name complementary sets of variables systematically so that the symmetry is clear. It is not always easy; sometimes, not even possible. Well, not possible using the rules I lay down for myself - which means I usually try to have the names the same length. The 'top' and 'lower' example would drive me batty (assuming I'm not batty already, which is far from certain); I'd probably use 'upper' and 'lower' because those are the same length; 'top' and 'bottom' would frustrate me too because of the difference in length.
I have a GUI tool that manages state sequences. One component is a class that contains a set of states, your typical DFA state machine. For now, I'll call this a StateSet (I have a more specific name in mind for the actual class that makes sense, but this name I think will suffice for the purpose of this question.)
However, I have another class that has a collection (possibly partially unordered) of those state sets, and lists them in a particular order, and I'm trying to come up with a good name for it - not just for internal code, but for customers to refer to it.
The role of this particular second collection is to encapsulate the entire currently used/available collection of StateSets that the user has created. All of the StateSets will be used eventually in the application. A good analogy would be a hand of cards versus the entire table: The 'table' contains all of the currently available hands, while the 'hand' contains a particular collection of cards.
I've got these as starter ideas I could throw out for the class name; I'm not comfortable with either at the moment:
Sequence (maybe...with something else tacked on to the name)
StateSetSet (reasonable for code, but not for customers)
And as ewernli mentions, these are really technical terms, which don't really convey the idea well. Any other suggestions or ideas?
Sequence - Definitely NOT. It's too generic, and doesn't have any real semantic meaning.
StateSetSet - While more semantically correct, this is confusing. You have a sequence, which implies order, which is different from a set, which does not.
That being said, the best option, IMO, is StateSetSequence, as it implies you have a sequence of StateSet instances.
What is the role/function of your StateSetSet?
StateSetSet or Sequence are technical terms.
Prefer a term that conveys the role/function of the class.
That could well be something like History, Timeline, WorldSnapshot,...
EDIT
According to your updated description, StateSet looks to me like StateSpace (the space of all possible states). If the user can then interactively create something, it might be appropriate to speak of a Workspace. If the user creates various state spaces of interest, I would then go for StateSpaceWorkspace. Isn't that a cool name :)
"StateSets" may be sufficient.
Others:
StateSetList
StateSetLister
StateSetListing
StateSetSequencer
I like StateSetArrangement, implying an ordering without implying anything about the underlying storage mechanisms.
I always wondered: Are there any hard facts which would indicate that either shorter or longer identifiers are better?
Example:
clrscr()
opposed to
ClearScreen()
Short identifiers should be faster to read because there are fewer characters but longer identifiers often better resemble natural language and therefore also should be faster to read.
Are there other aspects which suggest either a short or a verbose style?
EDIT: Just to clarify: I didn't ask: "What would you do in this case?". I asked for reasons to prefer one over the other, i.e. this is not a poll question.
Please, if you can, add some reason on why one would prefer one style over the other.
I'd go for clarity over verbosity or brevity.
ClearScreen()
is easier to parse than
clrscr()
in my opinion, but
ClearScreenBeforeRerenderingPage()
is just noise.
Abbreviations put a much greater burden on the reader. They are ambiguous; they are indirect; they are harder to discriminate. They burden the writer, too, for s/he must always be asking, "was that Cmd for Command, or Cmnd... or Cm?". They clash - a given abbreviation rule could produce the same abbreviation for two (or more) different words.
Because they are ambiguous, the reader must take time to think about what word is intended; if the word itself is present, the reader need only think about its meaning.
Because they are indirect, they are analogous to a pointer - just as there's a little processing time consumed by every pointer dereference, there's a little (human) processing time consumed, and additional memory occupied, by every abbreviation.
Certainly .NET developers should be following the .NET Naming Guidelines.
This suggests that the full names should be used, not abbreviations:
Do not use abbreviations or contractions as parts of identifier names. For example, use GetWindow instead of GetWin.
Personally I like to try to follow the wisdom of Clean Code by Uncle Bob.
To me it suggests that code should read like prose. By using descriptive names and ensuring that methods have a single responsibility, we can write code that accurately describes the programmer's intent, obviating the need for comments (in most cases).
He goes on to make the observation that when we write code, we often spend 90% of the time reading the surrounding code and dependent code. Therefore by writing code that is inherently readable we can be far more productive in our writing of code.
I remember a talk from Larry Wall some time ago where he talked about the verbosity of identifiers when you have non-native English speakers in your team.
ClearScreenBeforeRerenderingPage()
parses fine if you're a native English speaker. However he suggests (and experience shows) that:
Clear_Screen_Before_Rerendering_Page()
is much better.
The effect is exacerbated when you don't use both upper and lower case.
My basic rule is that every line of code is written only once, but read 10, 100, or 1000 times. According to this, the effort of typing is totally irrelevant. All that counts is the effort to read something.
Of course, this alone still leaves enough room for subjective opinions (is clrscrn readable enough?), but at least it narrows the field somewhat.
Please go directly to the relevant chapter of Steve McConnell's "Code Complete" (sanitised Amazon link).
Specifically, chapter 11, "The Power of Variable Names".
My personal preference is to have verbose public identifiers and short private ones.
By public I mean class names, method names, global variables and constants, packages, namespaces - in short anything that can be accessed from a large number of places and by large number of people.
By private I mean local variables, private members, sometimes parameters - stuff that is only accessed from inside limited local scope and by single developer only.
Also consider where it's going to be used, ClearScreen() is likely to appear on its own.
However, you cringe when new programmers who have learned that identifiers must be easily readable produce code like:
screenCoordinateVertical = gradientOfLine * screenCoordinateHorizontal + screenCoordinateOrigin;
instead of
y = m*x + C;
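One way to get both readings is to keep the descriptive words at the interface and the textbook letters inside the body. Here is a rough sketch in Python; the function name and parameter names are made up for illustration.

```python
def screen_y(x: float, gradient: float, origin: float) -> float:
    """Vertical screen coordinate of the line at horizontal position x."""
    # Inside this tiny scope the short names mirror the familiar
    # formula y = m*x + c, so the expression reads at a glance.
    m, c = gradient, origin
    return m * x + c
```

A caller sees screen_y(x, gradient=..., origin=...), while anyone reading the body sees the formula they already know.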
Every developer should know how to touch type. Adding a few extra characters is not a productivity issue unless you're a hunt and peck typist.
With that said, I agree with the previous comments about balance. As with so many answers here, "it depends". But I favor verbosity and clarity. Taking out vowels is for old DBAs.
Always using full words in identifiers also helps to maintain consistency.
With abbreviations there is always the question whether the word was abbreviated, and if yes how.
For example, right now I'm looking at code which has index abbreviated as ndx or idx in various places.
For very local context it is OK to abbreviate, but then I'd use only the first letter of each word to guarantee consistency. E.g. sb for StringBuilder.
As a programmer I do much more thinking while programming than typing, so the extra time spent typing a longer identifier is of no relevance. And today my IDE supports me: I only have to type a few letters and the IDE lets me choose from the legal identifiers. So the productivity argument against longer identifiers is even weaker today than it was a few years ago.
On the other side, you gain readability if you choose more meaningful identifiers. Since you will read source code more often than you write it, this is very important. Another point is that abbreviations are often ambiguous. Do you abbreviate number as no, or as num? That makes errors more probable, as you may choose the wrong identifier.
I think you'll find precious few hard facts, but a lot of opinion on this subject.
The Wikipedia page on this topic links to a paper on a cost/benefit analysis of identifier naming issues (External Links section), but no language I know of bases its official or accepted naming conventions on a "scientific" study.
Looking at code in a social context, you should follow the naming conventions imposed by:
The project
The company
The programming language
.. in that order.
It's really all about finding a balance between the two, that is easy enough to read while at the same time not overly verbose. Many people have a personal dislike for Java or Win32's elaborate function/class names, but many others dislike very short identifiers as well.
Most modern IDEs (and even older ones) have an auto-complete feature, so it doesn't really take more time to type a long identifier (once it is declared, of course). So I'd go for clarity over brevity; it makes the program much easier to read and more self-explanatory.
Nothing wrong with a short identifier as long as it's obvious what it means.
Personally I'd lean toward the latter because I prefer to be verbose (though I try not to be so verbose as MS and their CoMarshallInterthreadInterfaceInStream function), but as long as your Clear Screen function is not called "F()" I don't see a problem :)
Naming conventions and coding style are often discussed topics.
That said, the naming conventions are always very subjective -- to people and platform.
Bottom line is always -- let things make sense (yes, very subjective).
Wikibook search -- Naming identifiers
Also one has to think of where the identifier is, the well known i as iteration counter is a valid example:
for (int i = 0; i < 10; i++) {
    doSomething();
}
If the context is simple the identifier should reflect this accordingly.
No one has mentioned the negative impact on readability of identifiers that are too long. Once you start making identifiers that are 20, 30, 40 or more characters long you cannot write a reasonable statement on one line of text that is readable. Lines of code should be limited to about 80 characters. Anything longer is impossible to read. That is why newspapers are printed in columns. Even this webpage keeps the text column narrow so that it can be read without having to scan back and forth.
I will expand here on a comment I made to When a method has too many parameters? where the OP was having minor problems with someone else's function which had 97 parameters.
I am a great believer in writing maintainable code (and it is often easier to write than to read, hence Steve McConnell(praise be upon his name)'s phrase "write only code").
Since statistics show that most car accidents happen at junctions and my experience (ymmv) shows that most "anomalies" occur at interfaces, I will list some things that I do to attempt to avoid misunderstandings at interfaces and invite your comments if I am going badly wrong.
But, more importantly, I invite your suggestions for making things even more prophylactic (see, there is a question after all - how to improve things?).
Adequate documentation, in the form of (up to date) Doxygen format comments describing the nature and purpose of each parameter.
absolutely NO back-door shenanigans with global variables as hidden parameters.
try to limit parameters to six or eight. If more, pass related parameters as a structure; if they are not related then seriously reconsider the function. If it needs so much information, is it too complex to maintain? Can it be broken down into several smaller functions?
use const as often as possible and meaningful.
a coding standard that says that input parameters come first, then output only, and finally input/output, which are modified by the function.
I also #define some empty macros to make declarations even easier to read:
#define INPUT
#define OUTPUT
#define MODIFY
bool DoSomething(INPUT int howOften, MODIFY Widget *myWidget, OUTPUT WidgetPtr * const nextWidget);
Just a few ideas. How can I improve on these? Thanks.
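To make the "pass related parameters as a structure" point concrete, here is a rough sketch in Python rather than C. SerialSettings and describe_port are hypothetical names invented for this example, and the frozen dataclass is only a loose stand-in for the const idea: the function cannot modify the settings it is handed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the callee cannot modify its input
class SerialSettings:
    """Hypothetical group of related parameters that travel together."""

    baud: int
    data_bits: int = 8
    parity: str = "N"
    stop_bits: int = 1


def describe_port(name: str, settings: SerialSettings) -> str:
    # Two parameters instead of five, and the related ones share a name.
    return (
        f"{name}: {settings.baud} "
        f"{settings.data_bits}{settings.parity}{settings.stop_bits}"
    )
```

The call site then reads describe_port("COM1", SerialSettings(baud=9600)), and adding a sixth setting later does not change the function's signature.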
Addressing your points in order:
Well-designed types usually render Doxygen format comments a waste of time.
While true as stated ("shenanigans" are bad by definition), not all use of globals is really as bad as many people imply. If you have to pass a parameter more than about four times before it's really used, chances are that a global will be less error prone.
Eight or even six parameters is usually excessive. Any more than two or three starts to indicate that the function is doing more than one thing. One obvious exception is a constructor that aggregates a number of other items into an object (e.g. an address object that takes a street name, number, city, country, postal code, etc., as inputs).
Better stated as "write const-correct code."
Given C++'s default parameter capability, it's generally best to sort in ascending order of likelihood to use a default value.
Don't. Just don't! If it's not obvious what are inputs and what are outputs, that pretty much proves that the basic design is fatally flawed.
As for ideas I think are actually good:
As implied in the first point, concentrate on types. Once you get them right, most of the other problems just disappear.
Use a few (even just one) central theme(s). For Lisp, everything is a list. For Unix, everything is a file (and files are all simple streams of bytes). Emulate this simplicity.
Edit: replying to comments:
While you do have something of a point, my experience still indicates that documentation produced with Doxygen (and similar such as javadoc) is almost universally useless. In theory the tool doesn't prevent decent documentation, but in fact it's rare at best.
Globals certainly can cause problems -- but I'm old enough to have used Fortran back before it provided much alternative, and with some care it really wasn't nearly as bad as many people imply. A lot of the stories seem to be at least third hand, with a bit of extra "spice" added each time they're re-told. I've seen one story that sounds a lot like an exaggerated version of one I told a couple decades ago or so...
Hm...Markdown formatting doesn't seem to approve of my skipping numbers.
And again...
My comment was specific to C++, but quite a few other languages also support default parameters and/or overloading, and it can apply about as well to most of them. Even without it, a call like f(param1, param2, 0,0,0); is pretty easy to see as having default parameters. To an extent, ordering by usage is handy, but when you do, the order you pick doesn't matter nearly as much as simply being consistent.
True, a void * parameter doesn't tell you much -- but a MODIFY void * is little better. A real type and consistent use of const provides far more information and gets checked by the compiler. Other languages may not have/use const, but they probably don't have macros either. OTOH, some directly support what you want -- e.g., Ada has in, out and inout specifiers.
I am not sure we will end at a single point of agreement about this; everyone will come up with different ideas (good or bad from each other's perspective). Having said that, I find Code Complete to be a good place to go when I am stuck with this sort of problem.
A big peeve of mine is control coupling between functions. (Control coupling is when one module controls the execution flow of another, by passing flags telling the called function what to do.)
For example (cut & paste from code I just had to work on):
void UartEnable(bool enable, int baud);
as opposed to:
void UartEnable(int baud);
void UartDisable(void);
Put another way -- parameters are for passing "data", not "control".
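The same split can be sketched in Python. The Uart class below is a hypothetical stand-in that only records state, not a real driver API; the point is just that each method does one thing and no flag steers the callee.

```python
from typing import Optional


class Uart:
    """Hypothetical UART stand-in; it only records its own state."""

    def __init__(self) -> None:
        self.enabled = False
        self.baud: Optional[int] = None

    def enable(self, baud: int) -> None:
        # Only data (the baud rate) is passed in, never a control flag.
        self.enabled = True
        self.baud = baud

    def disable(self) -> None:
        # No meaningless baud argument is needed to switch the port off.
        self.enabled = False
        self.baud = None
```

With the flag version, a call such as UartEnable(false, 9600) leaves the reader wondering what the baud rate is for; here that call simply cannot be written.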
I'd use the 'rule' put forward by Uncle Bob in his book Clean Code.
These are the ones I think I remember:
2 parameters are ok, 3 are bad, more need refactoring
Comments are a sign of bad names. So there should be none, and the purpose of the function and the parameters should be clear from the names
make the method short. Aim for below 10 lines of code.
While filling in The Object Oriented Concepts Survey (To provide some academic researchers with real-life data on software design), I came upon this question:
What is the limit N of maximum methods you allow in your classes?
The survey then goes on asking if you refactor your classes once you reach this limit N.
I've honestly never thought about such a limit while designing my applications and wonder what the reasoning behind this is. Why would I want to impose on myself an arbitrary number which is probably very dependent on the class's functionality?
You don't have to set a maximum N. But you do have to follow the 'High Cohesion' principle, and not create all-can-do-whatever-it-is classes.
I suppose there is some N after which you should start worrying. But it really depends on the class itself and its primary goal.
The idea that there's a magic number that we can base a Rule on is the usual squeamishness from those whose desire to impose order on the universe outweighs their sense.
That said, if you have more than 20 or so methods in a class, there's a good chance it's doing too much and violating the SRP.
I wouldn't put an arbitrary limit on things either, but I would say that once a class has more than somewhere in the 10-20 public methods range I'd take a serious look at what that class is doing. Back in my J2EE days, we called them Enterprise Java Melons.
Same rule applies for the length of individual methods. I've seen classes that had only one or two methods, but each of those methods was hundreds of lines of code.
Since I started breaking classes down to a single responsibility, I don't usually approach a place where it gets questionable.
Also, a well-designed class may have 30 methods, and a poorly designed one may have 3 (Umm, 30 is pushing it, but the point is--this isn't necessarily even a good metric, kind of like counting kloc)
Your framework / language can necessitate a lot of methods without business logic too.
Counting the number of non-trivial methods with business logic in them might be interesting--I'd say around 4 or 5 would be appropriate.
I was surprised how many methods the JDK classes actually have in them when I was looking at the source code, but they are so well broken down, so small and so easily read that it wasn't a problem at all to have 20.
Like others have pointed out there generally isn't some arbitrary number of methods at which point I'll say "That's too many methods!" Sometimes the opposite can be just as bad, such as when an object has a monolithic do-everything method that spans hundreds of lines.
That being said, if I open up a source file I haven't looked at before and see more than 10-20 methods I will probably scan through it to see if it can't be re-factored in some way.
I was just writing a function that took in several values and it got me thinking. When is the number of arguments to a function / method too many? When (if) does it signal a flawed design? Do you design / refactor the function to take in structs, arrays, pointers, etc. to decrease the number of arguments? Do you refactor the data coming in just to decrease the number of arguments? It seems that this could be a little less applicable in OOP designs, though. Just curious to see how others view the issue.
EDIT: For reference the function I just wrote took in 5 parameters. I use the definition of several that my AP Econ teacher gave me. More than 2; less than 7.
I don't know, but I know it when I see it.
According to Steve McConnell in Code Complete, you should
Limit the number of a routine's parameters to about seven
If you have to ask then that's probably too many.
I generally believe that if the parameters are functionally related (e.g., coordinates or color components), they should be encapsulated as a class for good measures.
Not that I always follow this myself ;)
Robert C. Martin (Uncle Bob) recommends 3 as a maximum in Clean Code: A Handbook of Agile Software Craftsmanship
I don't have the book with me at the moment but his reasoning has to do with one, two and, to a lesser extent, three argument functions reading well and clearly showing the purpose of the function.
This of course goes hand in hand with his recommendation of very short, well named functions that adhere to the Single Responsibility Principle.
Quick answer: When you have to stop and ask that question, you've got too many.
Personally I like to keep the number under six. If more is needed, then the solution depends on the problem. One approach is to use "setter" functions to give the values to an object that will eventually perform the function you desire. Another option is to use a struct, as you mentioned. Either way, you can't really go wrong.
Well it would most certainly depend on what your function is doing as far as how many would be considered "too many". Having said that, it is certainly possible to have a function with a lot of different parameters that are options on how to handle certain cases inside the function, and having overloads to those functions with sane default values for those options.
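In a language with default arguments, the "overloads with sane defaults" idea collapses into a single definition. Here is a rough sketch in Python; export_report and its options are hypothetical names invented for this example.

```python
def export_report(rows, delimiter=",", include_header=True, line_ending="\n"):
    """Render a list of dicts as delimited text.

    Only rows is required; each option has a default, so the common
    call stays short and the rare call spells out what it changes.
    """
    lines = []
    if include_header and rows:
        # The header comes from the keys of the first row.
        lines.append(delimiter.join(rows[0].keys()))
    for row in rows:
        lines.append(delimiter.join(str(value) for value in row.values()))
    return line_ending.join(lines)
```

Most callers would write export_report(rows); the occasional one writes export_report(rows, delimiter=";", include_header=False). The options are still parameters, but nobody has to think about them until they need one.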
With the pervasiveness of Intellisense (or equivalent in other IDEs) and tooltips showing the comments from the XML Documentation in Visual Studio, I don't really think that there's a firm answer to this question.
Too many parameters is a "Code Smell".
You can divide the function into multiple methods, or use a class to group variables that have something in common.
Putting a number on "too many" is very subjective and depends on your organization and the language you use. A rule of thumb is that if you can't read the signature of your method and get an idea of what it is doing, then you might have too much information. Personally, I try not to go over 5 parameters.
For me it's 5.
It is hard to manage (remembering names, order, etc.) beyond that. Plus, if I get that far, I add versions with default values that call this one.
Depends on the Function as well, if your function requires heavy user intervention or variables, I wouldn't go past 7-8 range. As far as average number of parameters to go with, 5-6 is the sweet spot in my opinion. If you are using more than that you might want to consider class objects as parameters or other smaller functions.
It varies from person to person. Personally, when I have trouble immediately understanding what a function call is doing by reading the invocation in code, it is time to refactor to take the strain off of my gray cells.
I've heard that 7 figure as well, but I somehow feel that it stems from a time when all you could pass were primitive values.
Nowadays you can pass a reference to an object that encapsulates some complex state (and behaviour). Using 7 of those would definitely be too much.
My personal goal is to avoid using more than 4.
It depends strongly on the types of the arguments. If they are all integers then 2 can be too many. (how do I remember which order?) If any argument accepts null, then the number drops drastically.
The real answer comes from asking yourself:
how easy is it to understand calls when I'm reading code?
how easy is it to remember the correct arguments and argument order when writing code?
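Where the language allows it, one way to take the ordering problem off the table for same-typed arguments is to force callers to name them. Here is a rough sketch in Python; make_box is a hypothetical function invented for this example.

```python
def make_box(*, left: int, top: int, width: int, height: int) -> tuple:
    """Return (left, top, right, bottom).

    The bare * makes every parameter keyword-only, so four
    interchangeable integers cannot be passed in the wrong order
    without it being visible at the call site.
    """
    return (left, top, left + width, top + height)
```

A call then reads make_box(left=10, top=20, width=30, height=40), and make_box(10, 20, 30, 40) is rejected with a TypeError instead of silently mixing up the values.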
And it depends on the programming language. In C, it's really not rare to see functions with 7 parameters. However, in C#, I have rarely seen more than 5 parameters and I personally use fewer than 3 usually.
// In C
draw_dot(x, y, size, red, green, blue, alpha);
// In C#
Point point = new Point(x, y);
Color color = new Color(red, green, blue, alpha);
Tool.DrawDot(point, color);
I would say maximum 4. Anything above that, I think, should be placed within a class.