Is `cd` being faster than `addpath` a motivation to use it? - cd

One of my answers was recently downvoted for suggesting use of cd(path_to_toolbox) rather than one of the path tools, such as addpath or rmpath. Given the fervent criticism I received I must imagine that there are very good reasons for using the path tools, presumably they are in some way more robust, especially when code is distributed to other systems.
Then I decided to clock the performance of cd versus addpath and was surprised to find the following result. Prior to each trial I cleared the workspace and created a string array with alternating paths:
clear
clc
p1 = 'c:\MATLAB7\toolbox\symbolic\#sym\';
p2 = matlabroot;
newpath = repmat(' ',100,100);
for ii=1:2:99
newpath(ii,1:length(p1)) = p1;
newpath(ii+1,1:length(p2)) = p2;
end
Then I ran either addpath or cd as follows:
tic
for ii=1:100
addpath(newpath(ii,:))
end
toc
Elapsed time is 13.437000 seconds.
tic
for ii=1:100
cd(newpath(ii,:))
end
toc
Elapsed time is 1.078000 seconds.
Any comments on whether there are conditions under which use of cd might be justified, for instance to set the path to a function (toolbox or otherwise), are appreciated. While it may be considered sloppy, I have used cd for many years and while the slowdown can be appreciable if used repeatedly, I find that if it is not used in highly iterated parts of a program the slowdown is worth the simplicity it brings to coding. Notably, addpath is not more complicated to use, but now I seem to have a real reason to prefer cd: it's actually faster.
Edit
As a postscript to this post I plead mea culpa to perverse use of cd (and in this example, addpath). There should however be room for such usage in what is a language that is frequently used for quick-and-dirty scripting. It should be kept in mind that there is a gradation of expertise among the users of matlab, and in some cases less "advanced" and seemingly sloppy programming techniques can in fact be construed as advantageous in the short term (if not the long term, or where version and directory structure management might become problematic).
As an appendix I include some links to posts on SO and beyond that address built-in function overriding, shadowing, and the like, where addpath (and I would argue cd too) can be used:
How to unhide an overriden function?
How to get a handle to an overriden built-in function?
How to wrap an already existing function with a new function of the same name
http://www.mathworks.in/matlabcentral/newsreader/view_thread/264354

Obviously as the path gets longer, there would be more locations MATLAB has to search to look for functions, scripts, classes, etc.. So I imagine it would have a negative impact on performance if you have a really long path
On the other hand, the current directory is just one location that has to be searched (respecting the order of precedence of course).
Plus it is not fair to compare the two, unless it is ok for you to put all your files in a single folder.
Just a note about your coding style: you could use a cellarray of strings rather than a char-matrix to store newpath:
newpath = cell(100,1);
for i=1:100
newpath{i} = '...';
end

I think that if you use cd to add something to the path like this, most of the disadvantages should be avoided:
function addpathwithcd(pathToAdd)
currentPath = pwd;
cd(pathToAdd);
cd(currentPath);
However, after doing a (very small) test, this does not seem to be faster for me than simply using addpath(pathToAdd).
Actually this is a bit of a suprise for me as you record a factor 13 speed difference while I only use CD twice, thus i expected a factor 6 speed difference or so.

Related

Explain the difference between Docstring and Comment with an appropriate example in python? [duplicate]

I'm a bit confused over the difference between docstrings and comments in python.
In my class my teacher introduced something known as a 'design recipe', a set of steps that will supposedly help us students plot and organize our coding better in Python. From what I understand, the below is an example of the steps we follow - this so call design recipe (the stuff in the quotations):
def term_work_mark(a0_mark, a1_mark, a2_mark, ex_mark, midterm_mark):
''' (float, float, float, float, float) -> float
Takes your marks on a0_mark, a1_mark, a2_mark, ex_mark and midterm_mark,
calculates their respective weight contributions and sums these
contributions to deliver your overall term mark out of a maximum of 55 (This
is because the exam mark is not taken account of in this function)
>>>term_work_mark(5, 5, 5, 5, 5)
11.8
>>>term_work_mark(0, 0, 0, 0, 0)
0.0
'''
a0_component = contribution(a0_mark, a0_max_mark, a0_weight)
a1_component = contribution(a1_mark, a1_max_mark, a1_weight)
a2_component = contribution(a2_mark, a2_max_mark, a2_weight)
ex_component = contribution(ex_mark, exercises_max_mark,exercises_weight)
mid_component = contribution(midterm_mark, midterm_max_mark, midterm_weight)
return (a0_component + a1_component + a2_component + ex_component +
mid_component)
As far as I understand this is basically a docstring, and in our version of a docstring it must include three things: a description, examples of what your function should do if you enter it in the python shell, and a 'type contract', a section that shows you what types you enter and what types the function will return.
Now this is all good and done, but our assignments require us to also have comments which explain the nature of our functions, using the token '#' symbol.
So, my question is, haven't I already explained what my function will do in the description section of the docstring? What's the point of adding comments if I'll essentially be telling the reader the exact same thing?
It appears your teacher is a fan of How to Design Programs ;)
I'd tackle this as writing for two different audiences who won't always overlap.
First there are the docstrings; these are for people who are going to be using your code without needing or wanting to know how it works. Docstrings can be turned into actual documentation. Consider the official Python documentation - What's available in each library and how to use it, no implementation details (Unless they directly relate to use)
Secondly there are in-code comments; these are to explain what is going on to people (generally you!) who want to extend the code. These will not normally be turned into documentation as they are really about the code itself rather than usage. Now there are about as many opinions on what makes for good comments (or lack thereof) as there are programmers. My personal rules of thumb for adding comments are to explain:
Parts of the code that are necessarily complex. (Optimisation comes to mind)
Workarounds for code you don't have control over, that may otherwise appear illogical
I'll admit to TODOs as well, though I try to keep that to a minimum
Where I've made a choice of a simpler algorithm where a better performing (but more complex) option can go if performance in that section later becomes critical
Since you're coding in an academic setting, and it sounds like your lecturer is going for verbose, I'd say just roll with it. Use code comments to explain how you are doing what you say you are doing in the design recipe.
I believe that it's worth to mention what PEP8 says, I mean, the pure concept.
Docstrings
Conventions for writing good documentation strings (a.k.a. "docstrings") are immortalized in PEP 257.
Write docstrings for all public modules, functions, classes, and methods. Docstrings are not necessary for non-public methods, but you should have a comment that describes what the method does. This comment should appear after the def line.
PEP 257 describes good docstring conventions. Note that most importantly, the """ that ends a multiline docstring should be on a line by itself, e.g.:
"""Return a foobang
Optional plotz says to frobnicate the bizbaz first.
"""
For one liner docstrings, please keep the closing """ on the same line.
Comments
Block comments
Generally apply to some (or all) code that follows them, and are indented to the same level as that code. Each line of a block comment starts with a # and a single space (unless it is indented text inside the comment).
Paragraphs inside a block comment are separated by a line containing a single #.
Inline Comments
Use inline comments sparingly.
An inline comment is a comment on the same line as a statement. Inline comments should be separated by at least two spaces from the statement. They should start with a # and a single space.
Inline comments are unnecessary and in fact distracting if they state the obvious.
Don't do this:
x = x + 1 # Increment x
But sometimes, this is useful:
x = x + 1 # Compensate for border
Reference
https://www.python.org/dev/peps/pep-0008/#documentation-strings
https://www.python.org/dev/peps/pep-0008/#inline-comments
https://www.python.org/dev/peps/pep-0008/#block-comments
https://www.python.org/dev/peps/pep-0257/
First of all, for formatting your posts you can use the help options above the text area you type your post.
And about comments and doc strings, the doc string is there to explain the overall use and basic information of the methods. On the other hand comments are meant to give specific information on blocks or lines, #TODO is used to remind you what you want to do in future, definition of variables and so on. By the way, in IDLE the doc string is shown as a tool tip when you hover over the method's name.
Quoting from this page http://www.pythonforbeginners.com/basics/python-docstrings/
Python documentation strings (or docstrings) provide a convenient way
of associating documentation with Python modules, functions, classes,
and methods.
An object's docsting is defined by including a string constant as the
first statement in the object's definition.
It's specified in source code that is used, like a comment, to
document a specific segment of code.
Unlike conventional source code comments the docstring should describe
what the function does, not how.
All functions should have a docstring
This allows the program to inspect these comments at run time, for
instance as an interactive help system, or as metadata.
Docstrings can be accessed by the __doc__ attribute on objects.
Docstrings can be accessed through a program (__doc__) where as inline comments cannot be accessed.
Interactive help systems like in bpython and IPython can use docstrings to display the docsting during the development. So that you dont have to visit the program everytime.

Having Multiple Commands for Calling a Specific Programming Language: To Provide a Delimiter-less Option or Not?

After re-reading the off/on topic lists, I'm still not certain if this question is best posted to this site, so apologies in advance, if it is not.
Overview:
I am working on a project that mixes several programming languages and we are trying to determine important considerations for the command used to call one in particular.
For definiteness, I will list the specific languages; however, I think the principles ought to be general, so familiarity with these specific languages is not really essential.
Specific Context
Specifically, we are using: Maxima, KaTeX, Markdown and HTML). While building the prototype, we have used the following (I believe, standard) conventions:
KaTeX delimited by $ $ or $$ $$;
HTML delimited by < > </ > pairs;
Markdown works anywhere in the body, except within KaTeX or Maxima environments;
The only non-standard convention we used during this design phase was to call on Maxima using \comp{<Maxima commands>}. This command works within all the other environments (which is desired).
Now that we are ready to start using the platform, it has become apparent that this temporary command for calling Maxima is cumbersome for our users. The vast majority of use cases involve simply calling a single variable or function, e.g.
As such, we have $\eval{function-name()}(\eval{variable-name})$
as opposed to actually using Maxima for computation, e.g.
Here, it is clear that $\eval{a} + \eval{b} = \eval{a+b}$
(where \eval{a+b} would return the actual sum, as calculated by Maxima).
As such, our users would prefer a delimiter-less command option for invoking a single variable or function, e.g. \#<variable-name-in-Maxima> and \#<function-name>(<argument>) (where # is some reserved character not used in the other languages), while also having a delimited alternative for the (much less frequent) cases where they actually want to use Maxima for computation; perhaps something like \#{a+b}.
However, we have a general sense that this is not a best practice, even though we can't foresee any specific issue.
"Research" / Comparisons:
Indeed, there is precedence for delimit-less expressions for single arguments like x^2 (on any calculator) or Knuth's a \over b in TeX (which persists in LaTeX with \frac12 being parsed as \frac{1}{2}.
IIRC Knuth's point was that this delimit-less notation was more semantic (and so, in his view, preferable), and because delimiters can be added, ambiguity can be avoided, whenever the need arises: e.g. x^{22}, {a+b}\over{c+d} and \frac{12}{3}.
The Question, Proper:
Can anyone point to or explain actual shortcomings / risks associated with a dual solution like:
\#<var>, \#<function>(<arg>) and,
\#[<extended expression>],
(where # is a reserved (& escapable) character), for calling one language amongst others, as opposed to only using a delimited command?
Any alternative suggestions for how to achieve the ease-of-use and more semantic code enabled by the above solution, while keeping the code unambiguous would be very much welcome and appreciated.

should I write more descriptive function names or add comments?

This is a language agnostic question, but I'm wandering what people prefer in terms of readability and maintainability... My hypothetical situation is that I'm writing a function which given a sequence will return a copy with all duplicate element removed and the order reversed.
/*
*This is an extremely well written function to return a sequence containing
*all the unique elements of OriginalSequence with their order reversed
*/
ReturnSequence SequenceFunction(OriginalSequence)
{...}
OR
UniqueAndReversedSequence MakeSequenceUniqueAndReversed(OriginalSequence)
{....}
The above is supposed to be a lucid example of using comments in the first instance or using very verbose function names in the second to describe the actions of the function.
Cheers,
Richard
I prefer the verbose function name as it make the call-site more readable. Of course, some function names (like your example) can get really long.
Perhaps a better name for your example function would be ReverseAndDedupe. Uh oh, now it is a little more clear that we have a function with two responsibilities*. Perhaps it would be even better to split this out into two functions: Reverse and Dedupe.
Now the call-site becomes even more readable:
Reverse(Dedupe(someSequence))
*Note: My rule of thumb is that any function that contains "and" in the name has too many responsibilities and needs to be split up in to separate functions.
Personally I prefer the second way - it's easy to see from the function name what it does - and because the code inside the function is well written anyway it'll be easy to work out exactly what happens inside it.
The problem I find with comments is they very quickly go out of date - there's no compile time check to ensure your comment is correct!
Also, you don't get access to the comment in the places where the function is actually called.
Very much a subjective question though!
Ideally you would do a combination of the two. Try to keep your method names concise but descriptive enough to get a good idea of what it's going to do. If there is any possibility of lack of clarity in the method name, you should have comments to assist the reader in the logic.
Even with descriptive names you should still be concise. I think what you have in the example is overkill. I would have written
UniqueSequence Reverse(Sequence)
I comment where there's an explanation in order that a descriptive name cannot adequately convey. If there's a peculiarity with a library that forced me to do something that appears non-standard or value in dropping a comment inline, I'll do that but otherwise I rely upon well-named methods and don't comment things a lot - except while I'm writing the code, and those are for myself. They get removed when it is done, typically.
Generally speaking, function header comments are just more lines to maintain and require the reader to look at both the comment and the code and then decide which is correct if they aren't in correspondence. Obviously the truth is always in the code. The comment may say X but comments don't compile to machine code (typically) so...
Comment when necessary and make a habit of naming things well. That's what I do.
I'd probably do one of these:
Call it ReverseAndDedupe (or DedupeAndReverse, depending which one it is -- I'd expect Dedupe alone to keep the first occurrence and discard later ones, so the two operations do not commute). All functions make some postcondition true, so Make can certainly go in order to shorten a too-long name. Functions don't generally need to be named for the types they operate on, and if they are then it should be in a consistent format. So Sequence can probably be removed from your proposed name too, or if it can't then I'd probably call it Sequence_ReverseAndDedupe.
Not create this function at all, make sure that callers can either do Reverse(Dedupe(x)) or Dedupe(Reverse(x)), depending which they actually want. It's no more code for them to write, so only an issue of whether there's some cunning optimization that only applies when you do both at once. Avoiding an intermediate copy might qualify there, but the general point is that if you can't name your function concisely, make sure there's a good reason why it's doing so many different things.
Call it ReversedAndDeduped if it returns a copy of the original sequence - this is a trick I picked up from Python, where l.sort() sorts the list l in place, and sorted(l) doesn't modify a list l at all.
Give it a name specific to the domain it's used in, rather than trying to make it so generic. Why am I deduping and reversing this list? There might be some term of art that means a list in that state, or some function which can only be performed on such a list. So I could call it 'Renuberate' (because a reversed, deduped list is known as a list "in Renuberated form", or 'MakeFrobbable' (because Frobbing requires this format).
I'd also comment it (or much better, document it), to explain what type of deduping it guarantees (if any - perhaps the implementation is left free to remove whichever dupes it likes so long as it gets them all).
I wouldn't comment it "extremely well written", although I might comment "highly optimized" to mean "this code is really hard to work with, but goes like the clappers, please don't touch it without running all the performance tests".
I don't think I'd want to go as far as 5-word function names, although I expect I have in the past.

Is hard-coding literals ever acceptable?

The code base I'm currently working on is littered with hard-coded values.
I view all hard coded values as a code smell and I try to eliminate them where possible...however there are some cases that I am unsure about.
Here are two examples that I can think of that make me wonder what the best practice is:
1. MyTextBox.Text = someCondition ? "Yes" : "No"
2. double myPercentage = myValue / 100;
In the first case, is the best thing to do to create a class that allows me to do MyHelper.Yes and MyHelper.No or perhaps something similar in a config file (though it isn't likely to change and who knows if there might ever be a case where its usage would be case sensitive).
In the second case, finding a percentage by dividing by 100 isn't likely to ever change unless the laws of mathematics change...but I still wonder if there is a better way.
Can anyone suggest an appropriate way to deal with this sort of hard coding? And can anyone think of any places where hard coding is an acceptable practice?
And can anyone think of any places where hard coding is an acceptable practice?
Small apps
Single man projects
Throw aways
Short living projects
For short anything that won't be maintained by others.
Gee I've just realized how much being maintainer coder hurt me in the past :)
The real question isn't about hard coding, but rather repetition. If you take the excellent advice found in "The Pragmatic Programmer", simply Don't Repeat Yourself (DRY).
Taking the principle of DRY, it is fine to hardcode something at any point. However, once you use that particular value again, refactor so this value is only hardcoded once.
Of course hard-coding is sometimes acceptable. Following dogma is rarely as useful a practice as using your brain.
(For an example of this, perhaps it's interesting to go back to the goto wars. How many programmers do you know that will swear by all things holy that goto is evil? Why then does Steve McConnell devote a dozen pages to a measured discussion of the subject in Code Complete?)
Sure, there's a lot of hard-gained experience that tells us that small throw-away applications often mutate into production code, but that's no reason for zealotry. The agilists tell us we should do the simplest thing that could possibly work and refactor when needed.
That's not to say that the "simplest thing" shouldn't be readable code. It may make perfect sense, even in a throw-away spike to write:
const MAX_CACHE_RECORDS = 50
foo = GetNewCache(MAX_CACHE_RECORDS)
This is regardless of the fact that in three iterations time, someone might ask for the number of cache records to be configurable, and you might end up refactoring the constant away.
Just remember, if you go to the extremes of stuff like
const ONE_HUNDRED = 100
const ONE_HUNDRED_AND_ONE = 101
we'll all come to The Daily WTF and laugh at you. :-)
Think! That's all.
It's never good and you just proved it...
double myPercentage = myValue / 100;
This is NOT percentage. What you wanted to write is :
double myPercentage = (myValue / 100) * 100;
Or more correctly :
double myPercentage = (myValue / myMaxValue) * 100;
But this hard coded 100 messed with your mind... So go for the getPercentage method that Colen suggested :)
double getpercentage(double myValue, double maxValue)
{
return (myValue / maxValue) * 100;
}
Also as ctacke suggested, in the first case you will be in a world of pain if you ever need to localize these literals. It's never too much trouble to add a couple more variables and/or functions
The first case will kill you if you ever need to localize. Moving it to some static or constant that is app-wide would at least make localizing it a little easier.
Case 1: When should you hard-code stuff: when you have no reason to think that it will ever change. That said, you should NEVER hard code stuff in-line. Take the time to make static variables or global variables or whatever your language gives you. Do them in the class in question, and if you notice that two classes or areas of your code share the same value FOR THE SAME REASON (meaning it's not just coincidence), point them to the same place.
Case 2: For case case 2, you're correct: the laws of "percentage" will not change (being reasonable, here), so you can hard code inline.
Case 3: The third case is where you think the thing could change but you don't want to/have time to bother loading ResourceBundles or XML or whatever. In that case, you use whatever centralizing mechanism you can -- the hated Singleton class is a good one -- and go with that until you actually have need to deal with the problem.
The third case is tricky, though: it's extraordinarily hard to internationalize an application without really doing it... so you will want to hard-code stuff and just hope that, when the i18n guys come knocking, your code is not the worst-tasting code around :)
Edit: Let me mention that I've just finished a refactoring project in which the prior developer had placed the MySql connect strings in 100+ places in the code (PHP). Sometimes they were uppercase, sometimes they were lower case, etc., so they were hard to search and replace (though Netbeans and PDT did help a lot). There are reasons why he/she did this (a project called POG basically forces this stupidity), but there is just nothing that seems less like good code than repeating the same thing in a million places.
The better way for your second example would be to define an inline function:
double getpercentage(double myValue)
{
return(myValue / 100);
}
...
double myPercentage = getpercentage(myValue);
That way it's a lot more obvious what you're doing.
Hardcoded literals should appear in unit tests for the test values, unless there is so much reuse of a value within a single test class that a local constant is useful.
The unit tests are a description of expected values without any abstraction or redirection.
Imagine yourself reading the test - you want the information literally in front of you.
The only time I use constants for test values is when many tests repeat a value (itself a bit suspicious) and the value may be subject to change.
I do use constants for things like names of test files to compare.
I don't think that your second is really an example of hardcoding. That's like having a Halve() method that takes in a value to use to divide by; doesn't make sense.
Beyond that, example 1, if you want to change the language for your app, you don't want to have to change the class, so it should absolutely be in a config.
Hard coding should be avoided like Dracula avoids the sun. It'll come back to bite you in the ass eventually.
"hardcoding" is the wrong thing to worry about. The point is not whether special values are in code or in config files, the point is:
If the value could ever change, how much work is that and how hard is it to find? Putting it in one place and referring to that place elsewhere is not much work and therefore a way to play it safe.
Will maintainance programmers definitely understand why the value is what it is? If there is any doubt whatsoever, use a named constant that explains the meaning.
Both of these goals can be achieved without any need for config files; in fact I'd avoid those if possible. "putting stuff in config files means it's easier to change" is a myth, unless either
you actually want to support customers changing the values themselves
no value that could possibly be put in the config file can cause a bug (buffer overflow, anyone?)
your build and deployment process sucks
The text for the conditions should be in a resource file; that's what it's there for.
Not normally (Are hard-coding literals acceptable)
Another way at looking at this is how using a good naming convention
for constants used in-place of hard coded literals provides additional
documentation in the program.
Even if the number is used only once, it can still be hard to recognized
and may even be hard to find for future changes.
IMHO, making programs easier to read should be second nature to a
seasoned software professional. Raw numbers rarely communicate
meaningfully.
The extra time taken to use a well named constant will make the
code readability (easy to recall to the mind) and useful for future
re-mining (code re-use).
I tend to view it in terms of the project's scope and size.
Some simple projects that I am a solo dev on? Sure, I hard code lots of things. Tools I write that only I will ever use? Sure, if it gets the job done.
But, in working on larger, team projects? I agree, they are suspect and usually the product of laziness. Tag them for review and see if you can spot a pattern where they can be abstracted away.
In your example, the text box should be localizable, so why not a class that handles that?
Remember that you WILL forget the meaning of any non-obvious hard-coded value.
So be certain to put a short comment after each to remind you.
A Delphi example:
Length := Length * 0.3048; { 0.3048 converts feet to meters }
no.
What is a simple throw away app today will be driving your entire enterprise tomorrow. Always use best practices or you'll regret it.
Code always evolves. When you initially write stuff hard coding is the easiest way to go. Later when a need arrives to change the value it can be improved. In some cases the need never comes.
The need can arrive in many forms:
The value is used in many places and it needs to be changed by a programmer. In this case a constant is clearly needed.
User needs to be able to change the value.
I don't see the need to avoid hard coding. I do see the need to change things when there is a clear need.
Totally separate issue is that of course the code needs to be readable and this means that there might be a need for a comment for the hard coded value.
For the first value, it really depends. If you don't anticipate any kind of wide-spread adoption of your application and internationalization will never be an issue, I think it's mostly fine. However, if you are writing some kind of open source software or something with a larger audience consider the fact that it may one day need to be translated. In that case, you may be better off using string resources.
It's okay as long as you don't do refactoring, unit-testing, peer code reviews. And, you don't want repeat customers. Who cares?
I once had a boss who refused to not hardcode something because in his mind it gave him full control over the software and the items related to the software. Problem was, when the hardware died that ran the software the server got renamed... meaning he had to find his code. That took a while. I simply found a hex editor and hacked around it instead of waiting.
I normally add a set of helper methods for strings and numbers.
For example when I have strings such as 'yes' and 'no' I have a function called __ so I call __('yes'); which starts out in the project by just returning the first parameter but when I need to do more complex stuff (such as internationaizaton) it's already there and the param can be used a key.
Another example is VAT (form of UK tax) in online shops, recently it changed from 17.5% to 15%. Any one who hard coded VAT by doing:
$vat = $price * 0.175;
had to then go through all references and change it to 0.15, instead the super usefull way of doing it would be to have a function or variable for VAT.
In my opinion anything that could change should be written in a changeable way. If I find myself doing the same thing more than 5 times in the same day then it becomes a function or a config var.
Hard coding should be banned forever. Althought in you very simple examples i don't see anything wrong using them in any kind of project.
In my opinion hard coding is when you believe that a variable/value/define etc. will never change and create all your code based on that belief.
Example of such hard coding is the book Teach Yourself C in 24 Hours that everybody should avoid.

Spartan Programming

I really enjoyed Jeff's post on Spartan Programming. I agree that code like that is a joy to read. Unfortunately, I'm not so sure it would necessarily be a joy to work with.
For years I have read about and adhered to the "one-expression-per-line" practice. I have fought the good fight and held my ground when many programming books countered this advice with example code like:
while (bytes = read(...))
{
...
}
while (GetMessage(...))
{
...
}
Recently, I've advocated one expression per line for more practical reasons - debugging and production support. Getting a log file from production that claims a NullPointer exception at "line 65" which reads:
ObjectA a = getTheUser(session.getState().getAccount().getAccountNumber());
is frustrating and entirely avoidable. Short of grabbing an expert with the code that can choose the "most likely" object that was null ... this is a real practical pain.
One expression per line also helps out quite a bit while stepping through code. I practice this with the assumption that most modern compilers can optimize away all the superfluous temp objects I've just created ...
I try to be neat - but cluttering my code with explicit objects sure feels laborious at times. It does not generally make the code easier to browse - but it really has come in handy when tracing things down in production or stepping through my or someone else's code.
What style do you advocate and can you rationalize it in a practical sense?
In The Pragmatic Programmer Hunt and Thomas talk about a study they term the Law of Demeter and it focuses on the coupling of functions to modules other than there own. By allowing a function to never reach a 3rd level in it's coupling you significantly reduce the number of errors and increase the maintainability of the code.
So:
ObjectA a = getTheUser(session.getState().getAccount().getAccountNumber());
Is close to a felony because we are 4 objects down the rat hole. That means to change something in one of those objects I have to know that you called this whole stack right here in this very method. What a pain.
Better:
Account.getUser();
Note this runs counter to the expressive forms of programming that are now really popular with mocking software. The trade off there is that you have a tightly coupled interface anyway, and the expressive syntax just makes it easier to use.
I think the ideal solution is to find a balance between the extremes. There is no way to write a rule that will fit in all situations; it comes with experience. Declaring each intermediate variable on its own line will make reading the code more difficult, which will also contribute to the difficulty in maintenance. By the same token, debugging is much more difficult if you inline the intermediate values.
The 'sweet spot' is somewhere in the middle.
One expression per line.
There is no reason to obfuscate your code. The extra time you take typing the few extra terms, you save in debug time.
I tend to err on the side of readability, not necessarily debuggability. The examples you gave should definitely be avoided, but I feel that judicious use of multiple expressions can make the code more concise and comprehensible.
I'm usually in the "shorter is better" camp. Your example is good:
ObjectA a = getTheUser(session.getState().getAccount().getAccountNumber());
I would cringe if I saw that over four lines instead of one--I don't think it'd make it easier to read or understand. The way you presented it here, it's clear that you're digging for a single object. This isn't better:
obja State = session.getState();
objb Account = State.getAccount();
objc AccountNumber = Account.getAccountNumber();
ObjectA a = getTheUser(AccountNumber);
This is a compromise:
objb Account = session.getState().getAccount();
ObjectA a = getTheUser(Account.getAccountNumber());
but I still prefer the single line expression. Here's an anecdotal reason: it's difficult for me to reread and error-check the 4-liner right now for dumb typos; the single line doesn't have this problem because there are simply fewer characters.
ObjectA a = getTheUser(session.getState().getAccount().getAccountNumber());
This is a bad example, probably because you just wrote something from the top of your head.
You are assigning, to variable named a of type ObjectA, the return value of a function named getTheUser.
So let's assume you wrote this instead:
User u = getTheUser(session.getState().getAccount().getAccountNumber());
I would break this expression like so:
Account acc = session.getState().getAccount();
User user = getTheUser( acc.getAccountNumber() );
My reasoning is: how would I think about what I am doing with this code?
I would probably think: "first I need to get the account from the session and then I get the user using that account's number".
The code should read the way you think. Variables should refer to the main entities involved; not so much to their properties (so I wouldn't store the account number in a variable).
A second factor to have in mind is: will I ever need to refer to this entity again in this context?
If, say, I'm pulling more stuff out of the session state, I would introduce SessionState state = session.getState().
This all seems obvious, but I'm afraid I have some difficulty putting in words why it makes sense, not being a native English speaker and all.
Maintainability, and with it, readability, is king. Luckily, shorter very often means more readable.
Here are a few tips I enjoy using to slice and dice code:
Variable names: how would you describe this variable to someone else on your team? You would not say "the numberOfLinesSoFar integer". You would say "numLines" or something similar - comprehensible and short. Don't pretend like the maintainer doesn't know the code at all, but make sure you yourself could figure out what the variable is, even if you forgot your own act of writing it. Yes, this is kind of obvious, but it's worth more effort than I see many coders put into it, so I list it first.
Control flow: Avoid lots of closing clauses at once (a series of }'s in C++). Usually when you see this, there's a way to avoid it. A common case is something like
:
if (things_are_ok) {
// Do a lot of stuff.
return true;
} else {
ExpressDismay(error_str);
return false;
}
can be replaced by
if (!things_are_ok) return ExpressDismay(error_str);
// Do a lot of stuff.
return true;
if we can get ExpressDismay (or a wrapper thereof) to return false.
Another case is:
Loop iterations: the more standard, the better. For shorter loops, it's good to use one-character iterators when the variable is never used except as an index into a single object.
The particular case I would argue here is against the "right" way to use an STL container:
for (vector<string>::iterator a_str = my_vec.begin(); a_str != my_vec.end(); ++a_str)
is a lot wordier, and requires overloaded pointer operators *a_str or a_str->size() in the loop. For containers that have fast random access, the following is a lot easier to read:
for (int i = 0; i < my_vec.size(); ++i)
with references to my_vec[i] in the loop body, which won't confuse anyone.
Finally, I often see coders take pride in their line number counts. But it's not the line numbers that count! I'm not sure of the best way to implement this, but if you have any influence over your coding culture, I'd try to shift the reward toward those with compact classes :)
Good explanation. I think this is version of the general Divide and Conquer mentality.