GOTO still considered harmful? [closed] - language-agnostic

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
Everyone is aware of Dijkstra's Letters to the editor: go to statement considered harmful (also here .html transcript and here .pdf) and there has been a formidable push since that time to eschew the goto statement whenever possible. While it's possible to use goto to produce unmaintainable, sprawling code, it nevertheless remains in modern programming languages. Even the advanced continuation control structure in Scheme can be described as a sophisticated goto.
What circumstances warrant the use of goto? When is it best to avoid?
As a follow-up question: C provides a pair of functions, setjmp() and longjmp(), that provide the ability to goto not just within the current stack frame but within any of the calling frames. Should these be considered as dangerous as goto? More dangerous?
Dijkstra himself regretted that title, for which he was not responsible. At the end of EWD1308 (also here .pdf) he wrote:
Finally a short story for the record.
In 1968, the Communications of the ACM
published a text of mine under the
title "The goto statement considered
harmful", which in later years would
be most frequently referenced,
regrettably, however, often by authors
who had seen no more of it than its
title, which became a cornerstone of
my fame by becoming a template: we
would see all sorts of articles under
the title "X considered harmful" for
almost any X, including one titled
"Dijkstra considered harmful". But
what had happened? I had submitted a
paper under the title "A case against
the goto statement", which, in order
to speed up its publication, the
editor had changed into a "letter to
the Editor", and in the process he had
given it a new title of his own
invention! The editor was Niklaus
Wirth.
A well thought out classic paper about this topic, to be matched to that of Dijkstra, is Structured Programming with go to Statements, by Donald E. Knuth. Reading both helps to reestablish context and a non-dogmatic understanding of the subject. In this paper, Dijkstra's opinion on this case is reported and is even more strong:
Donald E. Knuth: I believe that by presenting such a
view I am not in fact disagreeing
sharply with Dijkstra's ideas, since
he recently wrote the following:
"Please don't fall into the trap of
believing that I am terribly
dogmatical about [the go to
statement]. I have the uncomfortable
feeling that others are making a
religion out of it, as if the
conceptual problems of programming
could be solved by a single trick, by
a simple form of coding discipline!"

A coworker of mine said the only reason to use a GOTO is if you programmed yourself so far into a corner that it is the only way out. In other words, proper design ahead of time and you won't need to use a GOTO later.
I thought this comic illustrates that beautifully "I could restructure the program's flow, or use one little 'GOTO' instead." A GOTO is a weak way out when you have weak design. Velociraptors prey on the weak.

The following statements are generalizations; while it is always possible to plead exception, it usually (in my experience and humble opinion) isn't worth the risks.
Unconstrained use of memory addresses (either GOTO or raw pointers) provides too many opportunities to make easily avoidable mistakes.
The more ways there are to arrive at a particular "location" in the code, the less confident one can be about what the state of the system is at that point. (See below.)
Structured programming IMHO is less about "avoiding GOTOs" and more about making the structure of the code match the structure of the data. For example, a repeating data structure (e.g. array, sequential file, etc.) is naturally processed by a repeated unit of code. Having built-in structures (e.g. while, for, until, for-each, etc.) allows the programmer to avoid the tedium of repeating the same cliched code patterns.
Even if GOTO is low-level implementation detail (not always the case!) it's below the level that the programmer should be thinking. How many programmers balance their personal checkbooks in raw binary? How many programmers worry about which sector on the disk contains a particular record, instead of just providing a key to a database engine (and how many ways could things go wrong if we really wrote all of our programs in terms of physical disk sectors)?
Footnotes to the above:
Regarding point 2, consider the following code:
a = b + 1
/* do something with a */
At the "do something" point in the code, we can state with high confidence that a is greater than b. (Yes, I'm ignoring the possibility of untrapped integer overflow. Let's not bog down a simple example.)
On the other hand, if the code had read this way:
...
goto 10
...
a = b + 1
10: /* do something with a */
...
goto 10
...
The multiplicity of ways to get to label 10 means that we have to work much harder to be confident about the relationships between a and b at that point. (In fact, in the general case it's undecideable!)
Regarding point 4, the whole notion of "going someplace" in the code is just a metaphor. Nothing is really "going" anywhere inside the CPU except electrons and photons (for the waste heat). Sometimes we give up a metaphor for another, more useful, one. I recall encountering (a few decades ago!) a language where
if (some condition) {
action-1
} else {
action-2
}
was implemented on a virtual machine by compiling action-1 and action-2 as out-of-line parameterless routines, then using a single two-argument VM opcode which used the boolean value of the condition to invoke one or the other. The concept was simply "choose what to invoke now" rather than "go here or go there". Again, just a change of metaphor.

Sometimes it is valid to use GOTO as an alternative to exception handling within a single function:
if (f() == false) goto err_cleanup;
if (g() == false) goto err_cleanup;
if (h() == false) goto err_cleanup;
return;
err_cleanup:
...
COM code seems to fall into this pattern fairly often.

I can only recall using a goto once. I had a series of five nested counted loops and I needed to be able to break out of the entire structure from the inside early based on certain conditions:
for{
for{
for{
for{
for{
if(stuff){
GOTO ENDOFLOOPS;
}
}
}
}
}
}
ENDOFLOOPS:
I could just have easily declared a boolean break variable and used it as part of the conditional for each loop, but in this instance I decided a GOTO was just as practical and just as readable.
No velociraptors attacked me.

Goto is extremely low on my list of things to include in a program just for the sake of it. That doesn't mean it's unacceptable.
Goto can be nice for state machines. A switch statement in a loop is (in order of typical importance): (a) not actually representative of the control flow, (b) ugly, (c) potentially inefficient depending on language and compiler. So you end up writing one function per state, and doing things like "return NEXT_STATE;" which even look like goto.
Granted, it is difficult to code state machines in a way which make them easy to understand. However, none of that difficulty is to do with using goto, and none of it can be reduced by using alternative control structures. Unless your language has a 'state machine' construct. Mine doesn't.
On those rare occasions when your algorithm really is most comprehensible in terms of a path through a sequence of nodes (states) connected by a limited set of permissible transitions (gotos), rather than by any more specific control flow (loops, conditionals, whatnot), then that should be explicit in the code. And you ought to draw a pretty diagram.
setjmp/longjmp can be nice for implementing exceptions or exception-like behaviour. While not universally praised, exceptions are generally considered a "valid" control structure.
setjmp/longjmp are 'more dangerous' than goto in the sense that they're harder to use correctly, never mind comprehensibly.
There never has been, nor will there
ever be, any language in which it is
the least bit difficult to write bad
code. -- Donald Knuth.
Taking goto out of C would not make it any easier to write good code in C. In fact, it would rather miss the point that C is supposed to be capable of acting as a glorified assembler language.
Next it'll be "pointers considered harmful", then "duck typing considered harmful". Then who will be left to defend you when they come to take away your unsafe programming construct? Eh?

We already had this discussion and I stand by my point.
Furthermore, I'm fed up with people describing higher-level language structures as “goto in disguise” because they clearly haven't got the point at all. For example:
Even the advanced continuation control structure in Scheme can be described as a sophisticated goto.
That is complete nonsense. Every control structure can be implemented in terms of goto but this observation is utterly trivial and useless. goto isn't considered harmful because of its positive effects but because of its negative consequences and these have been eliminated by structured programming.
Similarly, saying “GOTO is a tool, and as all tools, it can be used and abused” is completely off the mark. No modern construction worker would use a rock and claim it “is a tool.” Rocks have been replaced by hammers. goto has been replaced by control structures. If the construction worker were stranded in the wild without a hammer, of course he would use a rock instead. If a programmer has to use an inferior programming language that doesn't have feature X, well, of course she may have to use goto instead. But if she uses it anywhere else instead of the appropriate language feature she clearly hasn't understood the language properly and uses it wrongly. It's really as simple as that.

In Linux: Using goto In Kernel Code on Kernel Trap, there's a discussion with Linus Torvalds and a "new guy" about the use of GOTOs in Linux code. There are some very good points there and Linus dressed in that usual arrogance :)
Some passages:
Linus: "No, you've been brainwashed by
CS people who thought that Niklaus
Wirth actually knew what he was
talking about. He didn't. He doesn't
have a frigging clue."
-
Linus: "I think goto's are fine, and
they are often more readable than
large amounts of indentation."
-
Linus: "Of course, in stupid languages
like Pascal, where labels cannot be
descriptive, goto's can be bad."

In C, goto only works within the scope of the current function, which tends to localise any potential bugs. setjmp and longjmp are far more dangerous, being non-local, complicated and implementation-dependent. In practice however, they're too obscure and uncommon to cause many problems.
I believe that the danger of goto in C is greatly exaggerated. Remember that the original goto arguments took place back in the days of languages like old-fashioned BASIC, where beginners would write spaghetti code like this:
3420 IF A > 2 THEN GOTO 1430
Here Linus describes an appropriate use of goto: http://www.kernel.org/doc/Documentation/CodingStyle (chapter 7).

Today, it's hard to see the big deal about the GOTO statement because the "structured programming" people mostly won the debate and today's languages have sufficient control flow structures to avoid GOTO.
Count the number of gotos in a modern C program. Now add the number of break, continue, and return statements. Furthermore, add the number of times you use if, else, while, switch or case. That's about how many GOTOs your program would have had if you were writing in FORTRAN or BASIC in 1968 when Dijkstra wrote his letter.
Programming languages at the time were lacking in control flow. For example, in the original Dartmouth BASIC:
IF statements had no ELSE. If you wanted one, you had to write:
100 IF NOT condition THEN GOTO 200
...stuff to do if condition is true...
190 GOTO 300
200 REM else
...stuff to do if condition is false...
300 REM end if
Even if your IF statement didn't need an ELSE, it was still limited to a single line, which usually consisted of a GOTO.
There was no DO...LOOP statement. For non-FOR loops, you had to end the loop with an explicit GOTO or IF...GOTO back to the beginning.
There was no SELECT CASE. You had to use ON...GOTO.
So, you ended up with a lot of GOTOs in your program. And you couldn't depend on the restriction of GOTOs to within a single subroutine (because GOSUB...RETURN was such a weak concept of subroutines), so these GOTOs could go anywhere. Obviously, this made control flow hard to follow.
This is where the anti-GOTO movement came from.

Go To can provide a sort of stand-in for "real" exception handling in certain cases. Consider:
ptr = malloc(size);
if (!ptr) goto label_fail;
bytes_in = read(f_in,ptr,size);
if (bytes_in=<0) goto label_fail;
bytes_out = write(f_out,ptr,bytes_in);
if (bytes_out != bytes_in) goto label_fail;
Obviously this code was simplified to take up less space, so don't get too hung up on the details. But consider an alternative I've seen all too many times in production code by coders going to absurd lengths to avoid using goto:
success=false;
do {
ptr = malloc(size);
if (!ptr) break;
bytes_in = read(f_in,ptr,size);
if (count=<0) break;
bytes_out = write(f_out,ptr,bytes_in);
if (bytes_out != bytes_in) break;
success = true;
} while (false);
Now functionally this code does the exact same thing. In fact, the code generated by the compiler is nearly identical. However, in the programmer's zeal to appease Nogoto (the dreaded god of academic rebuke), this programmer has completely broken the underlying idiom that the while loop represents, and did a real number on the readability of the code. This is not better.
So, the moral of the story is, if you find yourself resorting to something really stupid in order to avoid using goto, then don't.

Donald E. Knuth answered this question in the book "Literate Programming", 1992 CSLI. On p. 17 there is an essay "Structured Programming with goto Statements" (PDF). I think the article might have been published in other books as well.
The article describes Dijkstra's suggestion and describes the circumstances where this is valid. But he also gives a number of counter examples (problems and algorithms) which cannot be easily reproduced using structured loops only.
The article contains a complete description of the problem, the history, examples and counter examples.

Goto considered helpful.
I started programming in 1975. To 1970s-era programmers, the words "goto considered harmful" said more or less that new programming languages with modern control structures were worth trying. We did try the new languages. We quickly converted. We never went back.
We never went back, but, if you are younger, then you have never been there in the first place.
Now, a background in ancient programming languages may not be very useful except as an indicator of the programmer's age. Notwithstanding, younger programmers lack this background, so they no longer understand the message the slogan "goto considered harmful" conveyed to its intended audience at the time it was introduced.
Slogans one does not understand are not very illuminating. It is probably best to forget such slogans. Such slogans do not help.
This particular slogan however, "Goto considered harmful," has taken on an undead life of its own.
Can goto not be abused? Answer: sure, but so what? Practically every programming element can be abused. The humble bool for example is abused more often than some of us would like to believe.
By contrast, I cannot remember meeting a single, actual instance of goto abuse since 1990.
The biggest problem with goto is probably not technical but social. Programmers who do not know very much sometimes seem to feel that deprecating goto makes them sound smart. You might have to satisfy such programmers from time to time. Such is life.
The worst thing about goto today is that it is not used enough.

Attracted by Jay Ballou adding an answer, I'll add my £0.02. If Bruno Ranschaert had not already done so, I'd have mentioned Knuth's "Structured Programming with GOTO Statements" article.
One thing that I've not seen discussed is the sort of code that, while not exactly common, was taught in Fortran text books. Things like the extended range of a DO loop and open-coded subroutines (remember, this would be Fortran II, or Fortran IV, or Fortran 66 - not Fortran 77 or 90). There's at least a chance that the syntactic details are inexact, but the concepts should be accurate enough. The snippets in each case are inside a single function.
Note that the excellent but dated (and out of print) book 'The Elements of Programming Style, 2nd Edn' by Kernighan & Plauger includes some real-life examples of abuse of GOTO from programming text books of its era (late-70s). The material below is not from that book, however.
Extended range for a DO loop
do 10 i = 1,30
...blah...
...blah...
if (k.gt.4) goto 37
91 ...blah...
...blah...
10 continue
...blah...
return
37 ...some computation...
goto 91
One reason for such nonsense was the good old-fashioned punch-card. You might notice that the labels (nicely out of sequence because that was canonical style!) are in column 1 (actually, they had to be in columns 1-5) and the code is in columns 7-72 (column 6 was the continuation marker column). Columns 73-80 would be given a sequence number, and there were machines that would sort punch card decks into sequence number order. If you had your program on sequenced cards and needed to add a few cards (lines) into the middle of a loop, you'd have to repunch everything after those extra lines. However, if you replaced one card with the GOTO stuff, you could avoid resequencing all the cards - you just tucked the new cards at the end of the routine with new sequence numbers. Consider it to be the first attempt at 'green computing' - a saving of punch cards (or, more specifically, a saving of retyping labour - and a saving of consequential rekeying errors).
Oh, you might also note that I'm cheating and not shouting - Fortran IV was written in all upper-case normally.
Open-coded subroutine
...blah...
i = 1
goto 76
123 ...blah...
...blah...
i = 2
goto 76
79 ...blah...
...blah...
goto 54
...blah...
12 continue
return
76 ...calculate something...
...blah...
goto (123, 79) i
54 ...more calculation...
goto 12
The GOTO between labels 76 and 54 is a version of computed goto. If the variable i has the value 1, goto the first label in the list (123); if it has the value 2, goto the second, and so on. The fragment from 76 to the computed goto is the open-coded subroutine. It was a piece of code executed rather like a subroutine, but written out in the body of a function. (Fortran also had statement functions - which were embedded functions that fitted on a single line.)
There were worse constructs than the computed goto - you could assign labels to variables and then use an assigned goto. Googling assigned goto tells me it was deleted from Fortran 95. Chalk one up for the structured programming revolution which could fairly be said to have started in public with Dijkstra's "GOTO Considered Harmful" letter or article.
Without some knowledge of the sorts of things that were done in Fortran (and in other languages, most of which have rightly fallen by the wayside), it is hard for us newcomers to understand the scope of the problem which Dijkstra was dealing with. Heck, I didn't start programming until ten years after that letter was published (but I did have the misfortune to program in Fortran IV for a while).

There is no such things as GOTO considered harmful.
GOTO is a tool, and as all tools, it can be used and abused.
There are, however, many tools in the programming world that have a tendency to be abused more than being used, and GOTO is one of them. the WITH statement of Delphi is another.
Personally I don't use either in typical code, but I've had the odd usage of both GOTO and WITH that were warranted, and an alternative solution would've contained more code.
The best solution would be for the compiler to just warn you that the keyword was tainted, and you'd have to stuff a couple of pragma directives around the statement to get rid of the warnings.
It's like telling your kids to not run with scissors. Scissors are not bad, but some usage of them are perhaps not the best way to keep your health.

Since I began doing a few things in the linux kernel, gotos don't bother me so much as they once did. At first I was sort of horrified to see they (kernel guys) added gotos into my code. I've since become accustomed to the use of gotos, in some limited contexts, and will now occasionally use them myself. Typically, it's a goto that jumps to the end of a function to do some kind of cleanup and bail out, rather than duplicating that same cleanup and bailout in several places in the function. And typically, it's not something large enough to hand off to another function -- e.g. freeing some locally (k)malloc'ed variables is a typical case.
I've written code that used setjmp/longjmp only once. It was in a MIDI drum sequencer program. Playback happened in a separate process from all user interaction, and the playback process used shared memory with the UI process to get the limited info it needed to do the playback. When the user wanted to stop playback, the playback process just did a longjmp "back to the beginning" to start over, rather than some complicated unwinding of wherever it happened to be executing when the user wanted it to stop. It worked great, was simple, and I never had any problems or bugs related to it in that instance.
setjmp/longjmp have their place -- but that place is one you'll not likely visit but once in a very long while.
Edit: I just looked at the code. It was actually siglongjmp() that I used, not longjmp (not that it's a big deal, but I had forgotten that siglongjmp even existed.)

It never was, as long as you were able to think for yourself.

Because goto can be used for confusing metaprogramming
Goto is both a high-level and a low-level control expression, and as a result it just doesn't have a appropriate design pattern suitable for most problems.
It's low-level in the sense that a goto is a primitive operation that implements something higher like while or foreach or something.
It's high-level in the sense that when used in certain ways it takes code that executes in a clear sequence, in an uninterrupted fashion, except for structured loops, and it changes it into pieces of logic that are, with enough gotos, a grab-bag of logic being dynamically reassembled.
So, there is a prosaic and an evil side to goto.
The prosaic side is that an upward pointing goto can implement a perfectly reasonable loop and a downward-pointing goto can do a perfectly reasonable break or return. Of course, an actual while, break, or return would be a lot more readable, as the poor human wouldn't have to simulate the effect of the goto in order to get the big picture. So, a bad idea in general.
The evil side involves a routine not using goto for while, break, or return, but using it for what's called spaghetti logic. In this case the goto-happy developer is constructing pieces of code out of a maze of goto's, and the only way to understand it is to simulate it mentally as a whole, a terribly tiring task when there are many goto's. I mean, imagine the trouble of evaluating code where the else is not precisely an inverse of the if, where nested ifs might allow in some things that were rejected by the outer if, etc, etc.
Finally, to really cover the subject, we should note that essentially all early languages except Algol initially made only single statements subject to their versions of if-then-else. So, the only way to do a conditional block was to goto around it using an inverse conditional. Insane, I know, but I've read some old specs. Remember that the first computers were programmed in binary machine code so I suppose any kind of an HLL was a lifesaver; I guess they weren't too picky about exactly what HLL features they got.
Having said all that I used to stick one goto into every program I wrote "just to annoy the purists".

If you're writing a VM in C, it turns out that using (gcc's) computed gotos like this:
char run(char *pc) {
void *opcodes[3] = {&&op_inc, &&op_lda_direct, &&op_hlt};
#define NEXT_INSTR(stride) goto *(opcodes[*(pc += stride)])
NEXT_INSTR(0);
op_inc:
++acc;
NEXT_INSTR(1);
op_lda_direct:
acc = ram[++pc];
NEXT_INSTR(1);
op_hlt:
return acc;
}
works much faster than the conventional switch inside a loop.

Denying the use of the GOTO statement to programmers is like telling a carpenter not to use a hammer as it Might damage the wall while he is hammering in a nail. A real programmer Knows How and When to use a GOTO. I’ve followed behind some of these so-called ‘Structured Programs’ I’ve see such Horrid code just to avoid using a GOTO, that I could shoot the programmer. Ok, In defense of the other side, I’ve seen some real spaghetti code too and again, those programmers should be shot too.
Here is just one small example of code I’ve found.
YORN = ''
LOOP
UNTIL YORN = 'Y' OR YORN = 'N' DO
CRT 'Is this correct? (Y/N) : ':
INPUT YORN
REPEAT
IF YORN = 'N' THEN
CRT 'Aborted!'
STOP
END
-----------------------OR----------------------
10: CRT 'Is this Correct (Y)es/(N)o ':
INPUT YORN
IF YORN='N' THEN
CRT 'Aborted!'
STOP
ENDIF
IF YORN<>'Y' THEN GOTO 10

"In this link http://kerneltrap.org/node/553/2131"
Ironically, eliminating the goto introduced a bug: the spinlock call was omitted.

The original paper should be thought of as "Unconditional GOTO Considered Harmful". It was in particular advocating a form of programming based on conditional (if) and iterative (while) constructs, rather than the test-and-jump common to early code. goto is still useful in some languages or circumstances, where no appropriate control structure exists.

About the only place I agree Goto could be used is when you need to deal with errors, and each particular point an error occurs requires special handling.
For instance, if you're grabbing resources and using semaphores or mutexes, you have to grab them in order and you should always release them in the opposite manner.
Some code requires a very odd pattern of grabbing these resources, and you can't just write an easily maintained and understood control structure to correctly handle both the grabbing and releasing of these resources to avoid deadlock.
It's always possible to do it right without goto, but in this case and a few others Goto is actually the better solution primarily for readability and maintainability.
-Adam

One modern GOTO usage is by the C# compiler to create state machines for enumerators defined by yield return.
GOTO is something that should be used by compilers and not programmers.

Until C and C++ (amongst other culprits) have labelled breaks and continues, goto will continue to have a role.

If GOTO itself were evil, compilers would be evil, because they generate JMPs. If jumping into a block of code, especially following a pointer, were inherently evil, the RETurn instruction would be evil. Rather, the evil is in the potential for abuse.
At times I have had to write apps that had to keep track of a number of objects where each object had to follow an intricate sequence of states in response to events, but the whole thing was definitely single-thread. A typical sequence of states, if represented in pseudo-code would be:
request something
wait for it to be done
while some condition
request something
wait for it
if one response
while another condition
request something
wait for it
do something
endwhile
request one more thing
wait for it
else if some other response
... some other similar sequence ...
... etc, etc.
endwhile
I'm sure this is not new, but the way I handled it in C(++) was to define some macros:
#define WAIT(n) do{state=(n); enque(this); return; L##n:;}while(0)
#define DONE state = -1
#define DISPATCH0 if state < 0) return;
#define DISPATCH1 if(state==1) goto L1; DISPATCH0
#define DISPATCH2 if(state==2) goto L2; DISPATCH1
#define DISPATCH3 if(state==3) goto L3; DISPATCH2
#define DISPATCH4 if(state==4) goto L4; DISPATCH3
... as needed ...
Then (assuming state is initially 0) the structured state machine above turns into the structured code:
{
DISPATCH4; // or as high a number as needed
request something;
WAIT(1); // each WAIT has a different number
while (some condition){
request something;
WAIT(2);
if (one response){
while (another condition){
request something;
WAIT(3);
do something;
}
request one more thing;
WAIT(4);
}
else if (some other response){
... some other similar sequence ...
}
... etc, etc.
}
DONE;
}
With a variation on this, there can be CALL and RETURN, so some state machines can act like subroutines of other state machines.
Is it unusual? Yes. Does it take some learning on the part of the maintainer? Yes. Does that learning pay off? I think so. Could it be done without GOTOs that jump into blocks? Nope.

I actually found myself forced to use a goto, because I literally couldn't think of a better (faster) way to write this code:
I had a complex object, and I needed to do some operation on it. If the object was in one state, then I could do a quick version of the operation, otherwise I had to do a slow version of the operation. The thing was that in some cases, in the middle of the slow operation, it was possible to realise that this could have been done with the fast operation.
SomeObject someObject;
if (someObject.IsComplex()) // this test is trivial
{
// begin slow calculations here
if (result of calculations)
{
// just discovered that I could use the fast calculation !
goto Fast_Calculations;
}
// do the rest of the slow calculations here
return;
}
if (someObject.IsmediumComplex()) // this test is slightly less trivial
{
Fast_Calculations:
// Do fast calculations
return;
}
// object is simple, no calculations needed.
This was in a speed critical piece of realtime UI code, so I honestly think that a GOTO was justified here.
Hugo

One thing I've not seen from any of the answers here is that a 'goto' solution is often more efficient than one of the structured programming solutions often mentioned.
Consider the many-nested-loops case, where using 'goto' instead of a bunch of if(breakVariable) sections is obviously more efficient. The solution "Put your loops in a function and use return" is often totally unreasonable. In the likely case that the loops are using local variables, you now have to pass them all through function parameters, potentially handling loads of extra headaches that arise from that.
Now consider the cleanup case, which I've used myself quite often, and is so common as to have presumably been responsible for the try{} catch {} structure not available in many languages. The number of checks and extra variables that are required to accomplish the same thing are far worse than the one or two instructions to make the jump, and again, the additional function solution is not a solution at all. You can't tell me that's more manageable or more readable.
Now code space, stack usage, and execution time may not matter enough in many situations to many programmers, but when you're in an embedded environment with only 2KB of code space to work with, 50 bytes of extra instructions to avoid one clearly defined 'goto' is just laughable, and this is not as rare a situation as many high-level programmers believe.
The statement that 'goto is harmful' was very helpful in moving towards structured programming, even if it was always an over-generalization. At this point, we've all heard it enough to be wary of using it (as we should). When it's obviously the right tool for the job, we don't need to be scared of it.

I avoid it since a coworker/manager will undoubtedly question its use either in a code review or when they stumble across it. While I think it has uses (the error handling case for example) - you'll run afoul of some other developer who will have some type of problem with it.
It’s not worth it.

Almost all situations where a goto can be used, you can do the same using other constructs. Goto is used by the compiler anyway.
I personally never use it explicitly, don't ever need to.

You can use it for breaking from a deeply nested loop, but most of the time your code can be refactored to be cleaner without deeply nested loops.

Related

Why is it bad to use goto? [duplicate]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
Everyone is aware of Dijkstra's Letters to the editor: go to statement considered harmful (also here .html transcript and here .pdf) and there has been a formidable push since that time to eschew the goto statement whenever possible. While it's possible to use goto to produce unmaintainable, sprawling code, it nevertheless remains in modern programming languages. Even the advanced continuation control structure in Scheme can be described as a sophisticated goto.
What circumstances warrant the use of goto? When is it best to avoid?
As a follow-up question: C provides a pair of functions, setjmp() and longjmp(), that provide the ability to goto not just within the current stack frame but within any of the calling frames. Should these be considered as dangerous as goto? More dangerous?
Dijkstra himself regretted that title, for which he was not responsible. At the end of EWD1308 (also here .pdf) he wrote:
Finally a short story for the record.
In 1968, the Communications of the ACM
published a text of mine under the
title "The goto statement considered
harmful", which in later years would
be most frequently referenced,
regrettably, however, often by authors
who had seen no more of it than its
title, which became a cornerstone of
my fame by becoming a template: we
would see all sorts of articles under
the title "X considered harmful" for
almost any X, including one titled
"Dijkstra considered harmful". But
what had happened? I had submitted a
paper under the title "A case against
the goto statement", which, in order
to speed up its publication, the
editor had changed into a "letter to
the Editor", and in the process he had
given it a new title of his own
invention! The editor was Niklaus
Wirth.
A well thought out classic paper about this topic, to be matched to that of Dijkstra, is Structured Programming with go to Statements, by Donald E. Knuth. Reading both helps to reestablish context and a non-dogmatic understanding of the subject. In this paper, Dijkstra's opinion on this case is reported and is even more strong:
Donald E. Knuth: I believe that by presenting such a
view I am not in fact disagreeing
sharply with Dijkstra's ideas, since
he recently wrote the following:
"Please don't fall into the trap of
believing that I am terribly
dogmatical about [the go to
statement]. I have the uncomfortable
feeling that others are making a
religion out of it, as if the
conceptual problems of programming
could be solved by a single trick, by
a simple form of coding discipline!"
A coworker of mine said the only reason to use a GOTO is if you programmed yourself so far into a corner that it is the only way out. In other words, proper design ahead of time and you won't need to use a GOTO later.
I thought this comic illustrates that beautifully "I could restructure the program's flow, or use one little 'GOTO' instead." A GOTO is a weak way out when you have weak design. Velociraptors prey on the weak.
The following statements are generalizations; while it is always possible to plead exception, it usually (in my experience and humble opinion) isn't worth the risks.
Unconstrained use of memory addresses (either GOTO or raw pointers) provides too many opportunities to make easily avoidable mistakes.
The more ways there are to arrive at a particular "location" in the code, the less confident one can be about what the state of the system is at that point. (See below.)
Structured programming IMHO is less about "avoiding GOTOs" and more about making the structure of the code match the structure of the data. For example, a repeating data structure (e.g. array, sequential file, etc.) is naturally processed by a repeated unit of code. Having built-in structures (e.g. while, for, until, for-each, etc.) allows the programmer to avoid the tedium of repeating the same cliched code patterns.
Even if GOTO is low-level implementation detail (not always the case!) it's below the level that the programmer should be thinking. How many programmers balance their personal checkbooks in raw binary? How many programmers worry about which sector on the disk contains a particular record, instead of just providing a key to a database engine (and how many ways could things go wrong if we really wrote all of our programs in terms of physical disk sectors)?
Footnotes to the above:
Regarding point 2, consider the following code:
a = b + 1
/* do something with a */
At the "do something" point in the code, we can state with high confidence that a is greater than b. (Yes, I'm ignoring the possibility of untrapped integer overflow. Let's not bog down a simple example.)
On the other hand, if the code had read this way:
...
goto 10
...
a = b + 1
10: /* do something with a */
...
goto 10
...
The multiplicity of ways to get to label 10 means that we have to work much harder to be confident about the relationships between a and b at that point. (In fact, in the general case it's undecideable!)
Regarding point 4, the whole notion of "going someplace" in the code is just a metaphor. Nothing is really "going" anywhere inside the CPU except electrons and photons (for the waste heat). Sometimes we give up a metaphor for another, more useful, one. I recall encountering (a few decades ago!) a language where
if (some condition) {
action-1
} else {
action-2
}
was implemented on a virtual machine by compiling action-1 and action-2 as out-of-line parameterless routines, then using a single two-argument VM opcode which used the boolean value of the condition to invoke one or the other. The concept was simply "choose what to invoke now" rather than "go here or go there". Again, just a change of metaphor.
Sometimes it is valid to use GOTO as an alternative to exception handling within a single function:
if (f() == false) goto err_cleanup;
if (g() == false) goto err_cleanup;
if (h() == false) goto err_cleanup;
return;
err_cleanup:
...
COM code seems to fall into this pattern fairly often.
I can only recall using a goto once. I had a series of five nested counted loops and I needed to be able to break out of the entire structure from the inside early based on certain conditions:
for{
for{
for{
for{
for{
if(stuff){
GOTO ENDOFLOOPS;
}
}
}
}
}
}
ENDOFLOOPS:
I could just have easily declared a boolean break variable and used it as part of the conditional for each loop, but in this instance I decided a GOTO was just as practical and just as readable.
No velociraptors attacked me.
Goto is extremely low on my list of things to include in a program just for the sake of it. That doesn't mean it's unacceptable.
Goto can be nice for state machines. A switch statement in a loop is (in order of typical importance): (a) not actually representative of the control flow, (b) ugly, (c) potentially inefficient depending on language and compiler. So you end up writing one function per state, and doing things like "return NEXT_STATE;" which even look like goto.
Granted, it is difficult to code state machines in a way which make them easy to understand. However, none of that difficulty is to do with using goto, and none of it can be reduced by using alternative control structures. Unless your language has a 'state machine' construct. Mine doesn't.
On those rare occasions when your algorithm really is most comprehensible in terms of a path through a sequence of nodes (states) connected by a limited set of permissible transitions (gotos), rather than by any more specific control flow (loops, conditionals, whatnot), then that should be explicit in the code. And you ought to draw a pretty diagram.
setjmp/longjmp can be nice for implementing exceptions or exception-like behaviour. While not universally praised, exceptions are generally considered a "valid" control structure.
setjmp/longjmp are 'more dangerous' than goto in the sense that they're harder to use correctly, never mind comprehensibly.
There never has been, nor will there
ever be, any language in which it is
the least bit difficult to write bad
code. -- Donald Knuth.
Taking goto out of C would not make it any easier to write good code in C. In fact, it would rather miss the point that C is supposed to be capable of acting as a glorified assembler language.
Next it'll be "pointers considered harmful", then "duck typing considered harmful". Then who will be left to defend you when they come to take away your unsafe programming construct? Eh?
We already had this discussion and I stand by my point.
Furthermore, I'm fed up with people describing higher-level language structures as “goto in disguise” because they clearly haven't got the point at all. For example:
Even the advanced continuation control structure in Scheme can be described as a sophisticated goto.
That is complete nonsense. Every control structure can be implemented in terms of goto but this observation is utterly trivial and useless. goto isn't considered harmful because of its positive effects but because of its negative consequences and these have been eliminated by structured programming.
Similarly, saying “GOTO is a tool, and as all tools, it can be used and abused” is completely off the mark. No modern construction worker would use a rock and claim it “is a tool.” Rocks have been replaced by hammers. goto has been replaced by control structures. If the construction worker were stranded in the wild without a hammer, of course he would use a rock instead. If a programmer has to use an inferior programming language that doesn't have feature X, well, of course she may have to use goto instead. But if she uses it anywhere else instead of the appropriate language feature she clearly hasn't understood the language properly and uses it wrongly. It's really as simple as that.
In Linux: Using goto In Kernel Code on Kernel Trap, there's a discussion with Linus Torvalds and a "new guy" about the use of GOTOs in Linux code. There are some very good points there and Linus dressed in that usual arrogance :)
Some passages:
Linus: "No, you've been brainwashed by
CS people who thought that Niklaus
Wirth actually knew what he was
talking about. He didn't. He doesn't
have a frigging clue."
-
Linus: "I think goto's are fine, and
they are often more readable than
large amounts of indentation."
-
Linus: "Of course, in stupid languages
like Pascal, where labels cannot be
descriptive, goto's can be bad."
In C, goto only works within the scope of the current function, which tends to localise any potential bugs. setjmp and longjmp are far more dangerous, being non-local, complicated and implementation-dependent. In practice however, they're too obscure and uncommon to cause many problems.
I believe that the danger of goto in C is greatly exaggerated. Remember that the original goto arguments took place back in the days of languages like old-fashioned BASIC, where beginners would write spaghetti code like this:
3420 IF A > 2 THEN GOTO 1430
Here Linus describes an appropriate use of goto: http://www.kernel.org/doc/Documentation/CodingStyle (chapter 7).
Today, it's hard to see the big deal about the GOTO statement because the "structured programming" people mostly won the debate and today's languages have sufficient control flow structures to avoid GOTO.
Count the number of gotos in a modern C program. Now add the number of break, continue, and return statements. Furthermore, add the number of times you use if, else, while, switch or case. That's about how many GOTOs your program would have had if you were writing in FORTRAN or BASIC in 1968 when Dijkstra wrote his letter.
Programming languages at the time were lacking in control flow. For example, in the original Dartmouth BASIC:
IF statements had no ELSE. If you wanted one, you had to write:
100 IF NOT condition THEN GOTO 200
...stuff to do if condition is true...
190 GOTO 300
200 REM else
...stuff to do if condition is false...
300 REM end if
Even if your IF statement didn't need an ELSE, it was still limited to a single line, which usually consisted of a GOTO.
There was no DO...LOOP statement. For non-FOR loops, you had to end the loop with an explicit GOTO or IF...GOTO back to the beginning.
There was no SELECT CASE. You had to use ON...GOTO.
So, you ended up with a lot of GOTOs in your program. And you couldn't depend on the restriction of GOTOs to within a single subroutine (because GOSUB...RETURN was such a weak concept of subroutines), so these GOTOs could go anywhere. Obviously, this made control flow hard to follow.
This is where the anti-GOTO movement came from.
Go To can provide a sort of stand-in for "real" exception handling in certain cases. Consider:
ptr = malloc(size);
if (!ptr) goto label_fail;
bytes_in = read(f_in,ptr,size);
if (bytes_in=<0) goto label_fail;
bytes_out = write(f_out,ptr,bytes_in);
if (bytes_out != bytes_in) goto label_fail;
Obviously this code was simplified to take up less space, so don't get too hung up on the details. But consider an alternative I've seen all too many times in production code by coders going to absurd lengths to avoid using goto:
success=false;
do {
ptr = malloc(size);
if (!ptr) break;
bytes_in = read(f_in,ptr,size);
if (count=<0) break;
bytes_out = write(f_out,ptr,bytes_in);
if (bytes_out != bytes_in) break;
success = true;
} while (false);
Now functionally this code does the exact same thing. In fact, the code generated by the compiler is nearly identical. However, in the programmer's zeal to appease Nogoto (the dreaded god of academic rebuke), this programmer has completely broken the underlying idiom that the while loop represents, and did a real number on the readability of the code. This is not better.
So, the moral of the story is, if you find yourself resorting to something really stupid in order to avoid using goto, then don't.
Donald E. Knuth answered this question in the book "Literate Programming", 1992 CSLI. On p. 17 there is an essay "Structured Programming with goto Statements" (PDF). I think the article might have been published in other books as well.
The article describes Dijkstra's suggestion and describes the circumstances where this is valid. But he also gives a number of counter examples (problems and algorithms) which cannot be easily reproduced using structured loops only.
The article contains a complete description of the problem, the history, examples and counter examples.
Goto considered helpful.
I started programming in 1975. To 1970s-era programmers, the words "goto considered harmful" said more or less that new programming languages with modern control structures were worth trying. We did try the new languages. We quickly converted. We never went back.
We never went back, but, if you are younger, then you have never been there in the first place.
Now, a background in ancient programming languages may not be very useful except as an indicator of the programmer's age. Notwithstanding, younger programmers lack this background, so they no longer understand the message the slogan "goto considered harmful" conveyed to its intended audience at the time it was introduced.
Slogans one does not understand are not very illuminating. It is probably best to forget such slogans. Such slogans do not help.
This particular slogan however, "Goto considered harmful," has taken on an undead life of its own.
Can goto not be abused? Answer: sure, but so what? Practically every programming element can be abused. The humble bool for example is abused more often than some of us would like to believe.
By contrast, I cannot remember meeting a single, actual instance of goto abuse since 1990.
The biggest problem with goto is probably not technical but social. Programmers who do not know very much sometimes seem to feel that deprecating goto makes them sound smart. You might have to satisfy such programmers from time to time. Such is life.
The worst thing about goto today is that it is not used enough.
Attracted by Jay Ballou adding an answer, I'll add my £0.02. If Bruno Ranschaert had not already done so, I'd have mentioned Knuth's "Structured Programming with GOTO Statements" article.
One thing that I've not seen discussed is the sort of code that, while not exactly common, was taught in Fortran text books. Things like the extended range of a DO loop and open-coded subroutines (remember, this would be Fortran II, or Fortran IV, or Fortran 66 - not Fortran 77 or 90). There's at least a chance that the syntactic details are inexact, but the concepts should be accurate enough. The snippets in each case are inside a single function.
Note that the excellent but dated (and out of print) book 'The Elements of Programming Style, 2nd Edn' by Kernighan & Plauger includes some real-life examples of abuse of GOTO from programming text books of its era (late-70s). The material below is not from that book, however.
Extended range for a DO loop
do 10 i = 1,30
...blah...
...blah...
if (k.gt.4) goto 37
91 ...blah...
...blah...
10 continue
...blah...
return
37 ...some computation...
goto 91
One reason for such nonsense was the good old-fashioned punch-card. You might notice that the labels (nicely out of sequence because that was canonical style!) are in column 1 (actually, they had to be in columns 1-5) and the code is in columns 7-72 (column 6 was the continuation marker column). Columns 73-80 would be given a sequence number, and there were machines that would sort punch card decks into sequence number order. If you had your program on sequenced cards and needed to add a few cards (lines) into the middle of a loop, you'd have to repunch everything after those extra lines. However, if you replaced one card with the GOTO stuff, you could avoid resequencing all the cards - you just tucked the new cards at the end of the routine with new sequence numbers. Consider it to be the first attempt at 'green computing' - a saving of punch cards (or, more specifically, a saving of retyping labour - and a saving of consequential rekeying errors).
Oh, you might also note that I'm cheating and not shouting - Fortran IV was written in all upper-case normally.
Open-coded subroutine
...blah...
i = 1
goto 76
123 ...blah...
...blah...
i = 2
goto 76
79 ...blah...
...blah...
goto 54
...blah...
12 continue
return
76 ...calculate something...
...blah...
goto (123, 79) i
54 ...more calculation...
goto 12
The GOTO between labels 76 and 54 is a version of computed goto. If the variable i has the value 1, goto the first label in the list (123); if it has the value 2, goto the second, and so on. The fragment from 76 to the computed goto is the open-coded subroutine. It was a piece of code executed rather like a subroutine, but written out in the body of a function. (Fortran also had statement functions - which were embedded functions that fitted on a single line.)
There were worse constructs than the computed goto - you could assign labels to variables and then use an assigned goto. Googling assigned goto tells me it was deleted from Fortran 95. Chalk one up for the structured programming revolution which could fairly be said to have started in public with Dijkstra's "GOTO Considered Harmful" letter or article.
Without some knowledge of the sorts of things that were done in Fortran (and in other languages, most of which have rightly fallen by the wayside), it is hard for us newcomers to understand the scope of the problem which Dijkstra was dealing with. Heck, I didn't start programming until ten years after that letter was published (but I did have the misfortune to program in Fortran IV for a while).
There is no such things as GOTO considered harmful.
GOTO is a tool, and as all tools, it can be used and abused.
There are, however, many tools in the programming world that have a tendency to be abused more than being used, and GOTO is one of them. the WITH statement of Delphi is another.
Personally I don't use either in typical code, but I've had the odd usage of both GOTO and WITH that were warranted, and an alternative solution would've contained more code.
The best solution would be for the compiler to just warn you that the keyword was tainted, and you'd have to stuff a couple of pragma directives around the statement to get rid of the warnings.
It's like telling your kids to not run with scissors. Scissors are not bad, but some usage of them are perhaps not the best way to keep your health.
Since I began doing a few things in the linux kernel, gotos don't bother me so much as they once did. At first I was sort of horrified to see they (kernel guys) added gotos into my code. I've since become accustomed to the use of gotos, in some limited contexts, and will now occasionally use them myself. Typically, it's a goto that jumps to the end of a function to do some kind of cleanup and bail out, rather than duplicating that same cleanup and bailout in several places in the function. And typically, it's not something large enough to hand off to another function -- e.g. freeing some locally (k)malloc'ed variables is a typical case.
I've written code that used setjmp/longjmp only once. It was in a MIDI drum sequencer program. Playback happened in a separate process from all user interaction, and the playback process used shared memory with the UI process to get the limited info it needed to do the playback. When the user wanted to stop playback, the playback process just did a longjmp "back to the beginning" to start over, rather than some complicated unwinding of wherever it happened to be executing when the user wanted it to stop. It worked great, was simple, and I never had any problems or bugs related to it in that instance.
setjmp/longjmp have their place -- but that place is one you'll not likely visit but once in a very long while.
Edit: I just looked at the code. It was actually siglongjmp() that I used, not longjmp (not that it's a big deal, but I had forgotten that siglongjmp even existed.)
It never was, as long as you were able to think for yourself.
Because goto can be used for confusing metaprogramming
Goto is both a high-level and a low-level control expression, and as a result it just doesn't have a appropriate design pattern suitable for most problems.
It's low-level in the sense that a goto is a primitive operation that implements something higher like while or foreach or something.
It's high-level in the sense that when used in certain ways it takes code that executes in a clear sequence, in an uninterrupted fashion, except for structured loops, and it changes it into pieces of logic that are, with enough gotos, a grab-bag of logic being dynamically reassembled.
So, there is a prosaic and an evil side to goto.
The prosaic side is that an upward pointing goto can implement a perfectly reasonable loop and a downward-pointing goto can do a perfectly reasonable break or return. Of course, an actual while, break, or return would be a lot more readable, as the poor human wouldn't have to simulate the effect of the goto in order to get the big picture. So, a bad idea in general.
The evil side involves a routine not using goto for while, break, or return, but using it for what's called spaghetti logic. In this case the goto-happy developer is constructing pieces of code out of a maze of goto's, and the only way to understand it is to simulate it mentally as a whole, a terribly tiring task when there are many goto's. I mean, imagine the trouble of evaluating code where the else is not precisely an inverse of the if, where nested ifs might allow in some things that were rejected by the outer if, etc, etc.
Finally, to really cover the subject, we should note that essentially all early languages except Algol initially made only single statements subject to their versions of if-then-else. So, the only way to do a conditional block was to goto around it using an inverse conditional. Insane, I know, but I've read some old specs. Remember that the first computers were programmed in binary machine code so I suppose any kind of an HLL was a lifesaver; I guess they weren't too picky about exactly what HLL features they got.
Having said all that I used to stick one goto into every program I wrote "just to annoy the purists".
If you're writing a VM in C, it turns out that using (gcc's) computed gotos like this:
char run(char *pc) {
void *opcodes[3] = {&&op_inc, &&op_lda_direct, &&op_hlt};
#define NEXT_INSTR(stride) goto *(opcodes[*(pc += stride)])
NEXT_INSTR(0);
op_inc:
++acc;
NEXT_INSTR(1);
op_lda_direct:
acc = ram[++pc];
NEXT_INSTR(1);
op_hlt:
return acc;
}
works much faster than the conventional switch inside a loop.
Denying the use of the GOTO statement to programmers is like telling a carpenter not to use a hammer as it Might damage the wall while he is hammering in a nail. A real programmer Knows How and When to use a GOTO. I’ve followed behind some of these so-called ‘Structured Programs’ I’ve see such Horrid code just to avoid using a GOTO, that I could shoot the programmer. Ok, In defense of the other side, I’ve seen some real spaghetti code too and again, those programmers should be shot too.
Here is just one small example of code I’ve found.
YORN = ''
LOOP
UNTIL YORN = 'Y' OR YORN = 'N' DO
CRT 'Is this correct? (Y/N) : ':
INPUT YORN
REPEAT
IF YORN = 'N' THEN
CRT 'Aborted!'
STOP
END
-----------------------OR----------------------
10: CRT 'Is this Correct (Y)es/(N)o ':
INPUT YORN
IF YORN='N' THEN
CRT 'Aborted!'
STOP
ENDIF
IF YORN<>'Y' THEN GOTO 10
"In this link http://kerneltrap.org/node/553/2131"
Ironically, eliminating the goto introduced a bug: the spinlock call was omitted.
The original paper should be thought of as "Unconditional GOTO Considered Harmful". It was in particular advocating a form of programming based on conditional (if) and iterative (while) constructs, rather than the test-and-jump common to early code. goto is still useful in some languages or circumstances, where no appropriate control structure exists.
About the only place I agree Goto could be used is when you need to deal with errors, and each particular point an error occurs requires special handling.
For instance, if you're grabbing resources and using semaphores or mutexes, you have to grab them in order and you should always release them in the opposite manner.
Some code requires a very odd pattern of grabbing these resources, and you can't just write an easily maintained and understood control structure to correctly handle both the grabbing and releasing of these resources to avoid deadlock.
It's always possible to do it right without goto, but in this case and a few others Goto is actually the better solution primarily for readability and maintainability.
-Adam
One modern GOTO usage is by the C# compiler to create state machines for enumerators defined by yield return.
GOTO is something that should be used by compilers and not programmers.
Until C and C++ (amongst other culprits) have labelled breaks and continues, goto will continue to have a role.
If GOTO itself were evil, compilers would be evil, because they generate JMPs. If jumping into a block of code, especially following a pointer, were inherently evil, the RETurn instruction would be evil. Rather, the evil is in the potential for abuse.
At times I have had to write apps that had to keep track of a number of objects where each object had to follow an intricate sequence of states in response to events, but the whole thing was definitely single-thread. A typical sequence of states, if represented in pseudo-code would be:
request something
wait for it to be done
while some condition
request something
wait for it
if one response
while another condition
request something
wait for it
do something
endwhile
request one more thing
wait for it
else if some other response
... some other similar sequence ...
... etc, etc.
endwhile
I'm sure this is not new, but the way I handled it in C(++) was to define some macros:
#define WAIT(n) do{state=(n); enque(this); return; L##n:;}while(0)
#define DONE state = -1
#define DISPATCH0 if state < 0) return;
#define DISPATCH1 if(state==1) goto L1; DISPATCH0
#define DISPATCH2 if(state==2) goto L2; DISPATCH1
#define DISPATCH3 if(state==3) goto L3; DISPATCH2
#define DISPATCH4 if(state==4) goto L4; DISPATCH3
... as needed ...
Then (assuming state is initially 0) the structured state machine above turns into the structured code:
{
DISPATCH4; // or as high a number as needed
request something;
WAIT(1); // each WAIT has a different number
while (some condition){
request something;
WAIT(2);
if (one response){
while (another condition){
request something;
WAIT(3);
do something;
}
request one more thing;
WAIT(4);
}
else if (some other response){
... some other similar sequence ...
}
... etc, etc.
}
DONE;
}
With a variation on this, there can be CALL and RETURN, so some state machines can act like subroutines of other state machines.
Is it unusual? Yes. Does it take some learning on the part of the maintainer? Yes. Does that learning pay off? I think so. Could it be done without GOTOs that jump into blocks? Nope.
I actually found myself forced to use a goto, because I literally couldn't think of a better (faster) way to write this code:
I had a complex object, and I needed to do some operation on it. If the object was in one state, then I could do a quick version of the operation, otherwise I had to do a slow version of the operation. The thing was that in some cases, in the middle of the slow operation, it was possible to realise that this could have been done with the fast operation.
SomeObject someObject;
if (someObject.IsComplex()) // this test is trivial
{
// begin slow calculations here
if (result of calculations)
{
// just discovered that I could use the fast calculation !
goto Fast_Calculations;
}
// do the rest of the slow calculations here
return;
}
if (someObject.IsmediumComplex()) // this test is slightly less trivial
{
Fast_Calculations:
// Do fast calculations
return;
}
// object is simple, no calculations needed.
This was in a speed critical piece of realtime UI code, so I honestly think that a GOTO was justified here.
Hugo
One thing I've not seen from any of the answers here is that a 'goto' solution is often more efficient than one of the structured programming solutions often mentioned.
Consider the many-nested-loops case, where using 'goto' instead of a bunch of if(breakVariable) sections is obviously more efficient. The solution "Put your loops in a function and use return" is often totally unreasonable. In the likely case that the loops are using local variables, you now have to pass them all through function parameters, potentially handling loads of extra headaches that arise from that.
Now consider the cleanup case, which I've used myself quite often, and is so common as to have presumably been responsible for the try{} catch {} structure not available in many languages. The number of checks and extra variables that are required to accomplish the same thing are far worse than the one or two instructions to make the jump, and again, the additional function solution is not a solution at all. You can't tell me that's more manageable or more readable.
Now code space, stack usage, and execution time may not matter enough in many situations to many programmers, but when you're in an embedded environment with only 2KB of code space to work with, 50 bytes of extra instructions to avoid one clearly defined 'goto' is just laughable, and this is not as rare a situation as many high-level programmers believe.
The statement that 'goto is harmful' was very helpful in moving towards structured programming, even if it was always an over-generalization. At this point, we've all heard it enough to be wary of using it (as we should). When it's obviously the right tool for the job, we don't need to be scared of it.
I avoid it since a coworker/manager will undoubtedly question its use either in a code review or when they stumble across it. While I think it has uses (the error handling case for example) - you'll run afoul of some other developer who will have some type of problem with it.
It’s not worth it.
Almost all situations where a goto can be used, you can do the same using other constructs. Goto is used by the compiler anyway.
I personally never use it explicitly, don't ever need to.
You can use it for breaking from a deeply nested loop, but most of the time your code can be refactored to be cleaner without deeply nested loops.

Can every recursion be converted into iteration?

A reddit thread brought up an apparently interesting question:
Tail recursive functions can trivially be converted into iterative functions. Other ones, can be transformed by using an explicit stack. Can every recursion be transformed into iteration?
The (counter?)example in the post is the pair:
(define (num-ways x y)
(case ((= x 0) 1)
((= y 0) 1)
(num-ways2 x y) ))
(define (num-ways2 x y)
(+ (num-ways (- x 1) y)
(num-ways x (- y 1))
Can you always turn a recursive function into an iterative one? Yes, absolutely, and the Church-Turing thesis proves it if memory serves. In lay terms, it states that what is computable by recursive functions is computable by an iterative model (such as the Turing machine) and vice versa. The thesis does not tell you precisely how to do the conversion, but it does say that it's definitely possible.
In many cases, converting a recursive function is easy. Knuth offers several techniques in "The Art of Computer Programming". And often, a thing computed recursively can be computed by a completely different approach in less time and space. The classic example of this is Fibonacci numbers or sequences thereof. You've surely met this problem in your degree plan.
On the flip side of this coin, we can certainly imagine a programming system so advanced as to treat a recursive definition of a formula as an invitation to memoize prior results, thus offering the speed benefit without the hassle of telling the computer exactly which steps to follow in the computation of a formula with a recursive definition. Dijkstra almost certainly did imagine such a system. He spent a long time trying to separate the implementation from the semantics of a programming language. Then again, his non-deterministic and multiprocessing programming languages are in a league above the practicing professional programmer.
In the final analysis, many functions are just plain easier to understand, read, and write in recursive form. Unless there's a compelling reason, you probably shouldn't (manually) convert these functions to an explicitly iterative algorithm. Your computer will handle that job correctly.
I can see one compelling reason. Suppose you've a prototype system in a super-high level language like [donning asbestos underwear] Scheme, Lisp, Haskell, OCaml, Perl, or Pascal. Suppose conditions are such that you need an implementation in C or Java. (Perhaps it's politics.) Then you could certainly have some functions written recursively but which, translated literally, would explode your runtime system. For example, infinite tail recursion is possible in Scheme, but the same idiom causes a problem for existing C environments. Another example is the use of lexically nested functions and static scope, which Pascal supports but C doesn't.
In these circumstances, you might try to overcome political resistance to the original language. You might find yourself reimplementing Lisp badly, as in Greenspun's (tongue-in-cheek) tenth law. Or you might just find a completely different approach to solution. But in any event, there is surely a way.
Is it always possible to write a non-recursive form for every recursive function?
Yes. A simple formal proof is to show that both µ recursion and a non-recursive calculus such as GOTO are both Turing complete. Since all Turing complete calculi are strictly equivalent in their expressive power, all recursive functions can be implemented by the non-recursive Turing-complete calculus.
Unfortunately, I’m unable to find a good, formal definition of GOTO online so here’s one:
A GOTO program is a sequence of commands P executed on a register machine such that P is one of the following:
HALT, which halts execution
r = r + 1 where r is any register
r = r – 1 where r is any register
GOTO x where x is a label
IF r ≠ 0 GOTO x where r is any register and x is a label
A label, followed by any of the above commands.
However, the conversions between recursive and non-recursive functions isn’t always trivial (except by mindless manual re-implementation of the call stack).
For further information see this answer.
Recursion is implemented as stacks or similar constructs in the actual interpreters or compilers. So you certainly can convert a recursive function to an iterative counterpart because that's how it's always done (if automatically). You'll just be duplicating the compiler's work in an ad-hoc and probably in a very ugly and inefficient manner.
Basically yes, in essence what you end up having to do is replace method calls (which implicitly push state onto the stack) into explicit stack pushes to remember where the 'previous call' had gotten up to, and then execute the 'called method' instead.
I'd imagine that the combination of a loop, a stack and a state-machine could be used for all scenarios by basically simulating the method calls. Whether or not this is going to be 'better' (either faster, or more efficient in some sense) is not really possible to say in general.
Recursive function execution flow can be represented as a tree.
The same logic can be done by a loop, which uses a data-structure to traverse that tree.
Depth-first traversal can be done using a stack, breadth-first traversal can be done using a queue.
So, the answer is: yes. Why: https://stackoverflow.com/a/531721/2128327.
Can any recursion be done in a single loop? Yes, because
a Turing machine does everything it does by executing a single loop:
fetch an instruction,
evaluate it,
goto 1.
Yes, using explicitly a stack (but recursion is far more pleasant to read, IMHO).
Yes, it's always possible to write a non-recursive version. The trivial solution is to use a stack data structure and simulate the recursive execution.
In principle it is always possible to remove recursion and replace it with iteration in a language that has infinite state both for data structures and for the call stack. This is a basic consequence of the Church-Turing thesis.
Given an actual programming language, the answer is not as obvious. The problem is that it is quite possible to have a language where the amount of memory that can be allocated in the program is limited but where the amount of call stack that can be used is unbounded (32-bit C where the address of stack variables is not accessible). In this case, recursion is more powerful simply because it has more memory it can use; there is not enough explicitly allocatable memory to emulate the call stack. For a detailed discussion on this, see this discussion.
All computable functions can be computed by Turing Machines and hence the recursive systems and Turing machines (iterative systems) are equivalent.
Sometimes replacing recursion is much easier than that. Recursion used to be the fashionable thing taught in CS in the 1990's, and so a lot of average developers from that time figured if you solved something with recursion, it was a better solution. So they would use recursion instead of looping backwards to reverse order, or silly things like that. So sometimes removing recursion is a simple "duh, that was obvious" type of exercise.
This is less of a problem now, as the fashion has shifted towards other technologies.
Recursion is nothing just calling the same function on the stack and once function dies out it is removed from the stack. So one can always use an explicit stack to manage this calling of the same operation using iteration.
So, yes all-recursive code can be converted to iteration.
Removing recursion is a complex problem and is feasible under well defined circumstances.
The below cases are among the easy:
tail recursion
direct linear recursion
Appart from the explicit stack, another pattern for converting recursion into iteration is with the use of a trampoline.
Here, the functions either return the final result, or a closure of the function call that it would otherwise have performed. Then, the initiating (trampolining) function keep invoking the closures returned until the final result is reached.
This approach works for mutually recursive functions, but I'm afraid it only works for tail-calls.
http://en.wikipedia.org/wiki/Trampoline_(computers)
I'd say yes - a function call is nothing but a goto and a stack operation (roughly speaking). All you need to do is imitate the stack that's built while invoking functions and do something similar as a goto (you may imitate gotos with languages that don't explicitly have this keyword too).
Have a look at the following entries on wikipedia, you can use them as a starting point to find a complete answer to your question.
Recursion in computer science
Recurrence relation
Follows a paragraph that may give you some hint on where to start:
Solving a recurrence relation means obtaining a closed-form solution: a non-recursive function of n.
Also have a look at the last paragraph of this entry.
It is possible to convert any recursive algorithm to a non-recursive
one, but often the logic is much more complex and doing so requires
the use of a stack. In fact, recursion itself uses a stack: the
function stack.
More Details: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Functions
tazzego, recursion means that a function will call itself whether you like it or not. When people are talking about whether or not things can be done without recursion, they mean this and you cannot say "no, that is not true, because I do not agree with the definition of recursion" as a valid statement.
With that in mind, just about everything else you say is nonsense. The only other thing that you say that is not nonsense is the idea that you cannot imagine programming without a callstack. That is something that had been done for decades until using a callstack became popular. Old versions of FORTRAN lacked a callstack and they worked just fine.
By the way, there exist Turing-complete languages that only implement recursion (e.g. SML) as a means of looping. There also exist Turing-complete languages that only implement iteration as a means of looping (e.g. FORTRAN IV). The Church-Turing thesis proves that anything possible in a recursion-only languages can be done in a non-recursive language and vica-versa by the fact that they both have the property of turing-completeness.
Here is an iterative algorithm:
def howmany(x,y)
a = {}
for n in (0..x+y)
for m in (0..n)
a[[m,n-m]] = if m==0 or n-m==0 then 1 else a[[m-1,n-m]] + a[[m,n-m-1]] end
end
end
return a[[x,y]]
end

One line functions in C?

What do you think about one line functions? Is it bad?
One advantage I can think of is that it makes the code more comprehensive (if you choose a good name for it). For example:
void addint(Set *S, int n)
{
(*S)[n/CHAR_SIZE] |= (unsigned char) pow(2, (CHAR_SIZE - 1) - (n % CHAR_SIZE));
}
One disadvantage I can think of is that it slows the code (pushing parameters to stack, jumping to a function, popping the parameters, doing the operation, jumping back to the code - and only for one line?)
is it better to put such lines in functions or to just put them in the code? Even if we use them only once?
BTW, I haven't found any question about that, so forgive me if such question had been asked before.
Don't be scared of 1-line functions!
A lot of programmers seem to have a mental block about 1-line functions, you shouldn't.
If it makes the code clearer and cleaner, extract the line into a function.
Performance probably won't be affected.
Any decent compiler made in the last decade (and perhaps further) will automatically inline a simple 1-line function. Also, 1-line of C can easily correspond to many lines of machine code. You shouldn't assume that even in the theoretical case where you incur the full overhead of a function call that this overhead is significant compared to your "one little line". Let alone significant to the overall performance of your application.
Abstraction Leads to Better Design. (Even for single lines of code)
Functions are the primary building blocks of abstract, componentized code, they should not be neglected. If encapsulating a single line of code behind a function call makes the code more readable, do it. Even in the case where the function is called once. If you find it important to comment one particular line of code, that's a good code smell that it might be helpful to move the code into a well-named function.
Sure, that code may be 1-line today, but how many different ways of performing the same function are there? Encapsulating code inside a function can make it easier to see all the design options available to you. Maybe your 1-line of code expands into a call to a webservice, maybe it becomes a database query, maybe it becomes configurable (using the strategy pattern, for example), maybe you want to switch to caching the value computed by your 1-line. All of these options are easier to implement and more readily thought of when you've extracted your 1-line of code into its own function.
Maybe Your 1-Line Should Be More Lines.
If you have a big block of code it can be tempting to cram a lot of functionality onto a single line, just to save on screen real estate. When you migrate this code to a function, you reduce these pressures, which might make you more inclined to expand your complex 1-liner into more straightforward code taking up several lines (which would likely improve its readability and maintainability).
I am not a fan of having all sort of logic and functionality banged into one line. The example you have shown is a mess and could be broken down into several lines, using meaningful variable names and performing one operation after another.
I strongly recommend, in every question of this kind, to have a look (buy it, borrow it, (don't) download it (for free)) at this book: Robert C. Martin - Clean Code. It is a book every developer should have a look at.
It will not make you a good coder right away and it will not stop you from writing ugly code in the future, it will however make you realise it when you are writing ugly code. It will force you to look at your code with a more critical eye and to make your code readable like a newspaper story.
If used more than once, definitely make it a function, and let the compiler do the inlining (possibly adding "inline" to the function definition). (<Usual advice about premature optimization goes here>)
Since your example appears to be using a C(++) syntax you may want to read up on inline functions which eliminate the overhead of calling a simple function. This keyword is only recommendation to the compiler though and it may not inline all functions that you mark, and may choose to inline unmarked functions.
In .NET the JIT will inline methods that it feels is appropiate, but you have no control over why or when it does this, though (as I understand it) debug builds will never inline since that would stop the source code matching the compiled application.
What language? If you mean C, I'd also use the inline qualifier. In C++, I have the option of inline, boost.lamda or and moving forward C++0x native support for lamdas.
There is nothing wrong with one line functions. As mentioned it is possible for the compiler to inline the functions which will remove any performance penalty.
Functions should also be preferred over macros as they are easier to debug, modify, read and less likely to have unintended side effects.
If it is used only once then the answer is less obvious. Moving it to a function can make the calling function simpler & clearer by moving some of the complexity into the new function.
If you use the code within that function 3 times or more, then I would recommend to put that in a function. Only for maintainability.
Sometimes it's not a bad idea to use the preprocessor:
#define addint(S, n) (*S)[n/CHAR_SIZE] |= (unsigned char) pow(2, (CHAR_SIZE - 1) - (n % CHAR_SIZE));
Granted, you don't get any kind of type checking, but in some cases this can be useful. Macros have their disadvantages and their advantages, and in a few cases their disadvantages can become advantages. I'm a fan of macros in appropriate places, but it's up to you to decide when is appropriate. In this case, I'm going to go out on a limb and say that, whatever you end up doing, that one line of code is quite a bit.
#define addint(S, n) do { \
unsigned char c = pow(2, (CHAR_SIZE -1) - (n % CHAR_SIZE)); \
(*S)[n/CHAR_SIZE] |= c \
} while(0)

What exactly is the danger of using magic debug values (such as 0xDEADBEEF) as literals?

It goes without saying that using hard-coded, hex literal pointers is a disaster:
int *i = 0xDEADBEEF;
// god knows if that location is available
However, what exactly is the danger in using hex literals as variable values?
int i = 0xDEADBEEF;
// what can go wrong?
If these values are indeed "dangerous" due to their use in various debugging scenarios, then this means that even if I do not use these literals, any program that during runtime happens to stumble upon one of these values might crash.
Anyone care to explain the real dangers of using hex literals?
Edit: just to clarify, I am not referring to the general use of constants in source code. I am specifically talking about debug-scenario issues that might come up to the use of hex values, with the specific example of 0xDEADBEEF.
There's no more danger in using a hex literal than any other kind of literal.
If your debugging session ends up executing data as code without you intending it to, you're in a world of pain anyway.
Of course, there's the normal "magic value" vs "well-named constant" code smell/cleanliness issue, but that's not really the sort of danger I think you're talking about.
With few exceptions, nothing is "constant".
We prefer to call them "slow variables" -- their value changes so slowly that we don't mind recompiling to change them.
However, we don't want to have many instances of 0x07 all through an application or a test script, where each instance has a different meaning.
We want to put a label on each constant that makes it totally unambiguous what it means.
if( x == 7 )
What does "7" mean in the above statement? Is it the same thing as
d = y / 7;
Is that the same meaning of "7"?
Test Cases are a slightly different problem. We don't need extensive, careful management of each instance of a numeric literal. Instead, we need documentation.
We can -- to an extent -- explain where "7" comes from by including a tiny bit of a hint in the code.
assertEquals( 7, someFunction(3,4), "Expected 7, see paragraph 7 of use case 7" );
A "constant" should be stated -- and named -- exactly once.
A "result" in a unit test isn't the same thing as a constant, and requires a little care in explaining where it came from.
A hex literal is no different than a decimal literal like 1. Any special significance of a value is due to the context of a particular program.
I believe the concern raised in the IP address formatting question earlier today was not related to the use of hex literals in general, but the specific use of 0xDEADBEEF. At least, that's the way I read it.
There is a concern with using 0xDEADBEEF in particular, though in my opinion it is a small one. The problem is that many debuggers and runtime systems have already co-opted this particular value as a marker value to indicate unallocated heap, bad pointers on the stack, etc.
I don't recall off the top of my head just which debugging and runtime systems use this particular value, but I have seen it used this way several times over the years. If you are debugging in one of these environments, the existence of the 0xDEADBEEF constant in your code will be indistinguishable from the values in unallocated RAM or whatever, so at best you will not have as useful RAM dumps, and at worst you will get warnings from the debugger.
Anyhow, that's what I think the original commenter meant when he told you it was bad for "use in various debugging scenarios."
There's no reason why you shouldn't assign 0xdeadbeef to a variable.
But woe betide the programmer who tries to assign decimal 3735928559, or octal 33653337357, or worst of all: binary 11011110101011011011111011101111.
Big Endian or Little Endian?
One danger is when constants are assigned to an array or structure with different sized members; the endian-ness of the compiler or machine (including JVM vs CLR) will affect the ordering of the bytes.
This issue is true of non-constant values, too, of course.
Here's an, admittedly contrived, example. What is the value of buffer[0] after the last line?
const int TEST[] = { 0x01BADA55, 0xDEADBEEF };
char buffer[BUFSZ];
memcpy( buffer, (void*)TEST, sizeof(TEST));
I don't see any problem with using it as a value. Its just a number after all.
There's no danger in using a hard-coded hex value for a pointer (like your first example) in the right context. In particular, when doing very low-level hardware development, this is the way you access memory-mapped registers. (Though it's best to give them names with a #define, for example.) But at the application level you shouldn't ever need to do an assignment like that.
I use CAFEBABE
I haven't seen it used by any debuggers before.
int *i = 0xDEADBEEF;
// god knows if that location is available
int i = 0xDEADBEEF;
// what can go wrong?
The danger that I see is the same in both cases: you've created a flag value that has no immediate context. There's nothing about i in either case that will let me know 100, 1000 or 10000 lines that there is a potentially critical flag value associated with it. What you've planted is a landmine bug that, if I don't remember to check for it in every possible use, I could be faced with a terrible debugging problem. Every use of i will now have to look like this:
if (i != 0xDEADBEEF) { // Curse the original designer to oblivion
// Actual useful work goes here
}
Repeat the above for all of the 7000 instances where you need to use i in your code.
Now, why is the above worse than this?
if (isIProperlyInitialized()) { // Which could just be a boolean
// Actual useful work goes here
}
At a minimum, I can spot several critical issues:
Spelling: I'm a terrible typist. How easily will you spot 0xDAEDBEEF in a code review? Or 0xDEADBEFF? On the other hand, I know that my compile will barf immediately on isIProperlyInitialised() (insert the obligatory s vs. z debate here).
Exposure of meaning. Rather than trying to hide your flags in the code, you've intentionally created a method that the rest of the code can see.
Opportunities for coupling. It's entirely possible that a pointer or reference is connected to a loosely defined cache. An initialization check could be overloaded to check first if the value is in cache, then to try to bring it back into cache and, if all that fails, return false.
In short, it's just as easy to write the code you really need as it is to create a mysterious magic value. The code-maintainer of the future (who quite likely will be you) will thank you.

When is a function too long? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
35 lines, 55 lines, 100 lines, 300 lines? When you should start to break it apart? I'm asking because I have a function with 60 lines (including comments) and was thinking about breaking it apart.
long_function(){ ... }
into:
small_function_1(){...}
small_function_2(){...}
small_function_3(){...}
The functions are not going to be used outside of the long_function, making smaller functions means more function calls, etc.
When would you break apart a function into smaller ones? Why?
Methods should do only one logical thing (think about functionality)
You should be able to explain the method in a single sentence
It should fit into the height of your display
Avoid unnecessary overhead (comments that point out the obvious...)
Unit testing is easier for small logical functions
Check if part of the function can be reused by other classes or methods
Avoid excessive inter-class coupling
Avoid deeply nested control structures
Thanks everyone for the answers, edit the list and vote for the correct answer I'll choose that one ;)
I am refactoring now with those ideas in mind :)
Here is a list of red-flags (in no particular order) that could indicate that a function is too long:
Deeply nested control structures: e.g. for-loops 3 levels deep or even just 2 levels deep with nested if-statements that have complex conditions.
Too many state-defining parameters: By state-defining parameter, I mean a function parameter that guarantees a particular execution path through the function. Get too many of these type of parameters and you have a combinatorial explosion of execution paths (this usually happens in tandem with #1).
Logic that is duplicated in other methods: poor code re-use is a huge contributor to monolithic procedural code. A lot of such logic duplication can be very subtle, but once re-factored, the end result can be a far more elegant design.
Excessive inter-class coupling: this lack of proper encapsulation results in functions being concerned with intimate characteristics of other classes, hence lengthening them.
Unnecessary overhead: Comments that point out the obvious, deeply nested classes, superfluous getters and setters for private nested class variables, and unusually long function/variable names can all create syntactic noise within related functions that will ultimately increase their length.
Your massive developer-grade display isn't big enough to display it: Actually, displays of today are big enough that a function that is anywhere close to its height is probably way too long. But, if it is larger, this is a smoking gun that something is wrong.
You can't immediately determine the function's purpose: Furthermore, once you actually do determine its purpose, if you can't summarize this purpose in a single sentence or happen to have a tremendous headache, this should be a clue.
In conclusion, monolithic functions can have far-reaching consequences and are often a symptom of major design deficiencies. Whenever I encounter code that is an absolute joy to read, it's elegance is immediately apparent. And guess what: the functions are often very short in length.
There aren't any real hard or fast rule for it. I generally like my methods to just "do one thing". So if it's grabbing data, then doing something with that data, then writing it to disk then I'd split out the grabbing and writing into separate methods so my "main" method just contains that "doing something".
That "doing something" could still be quite a few lines though, so I'm not sure if the "number of lines" metric is the right one to use :)
Edit: This is a single line of code I mailed around work last week (to prove a point.. it's definitely not something I make a habit of :)) - I certainly wouldn't want 50-60 of these bad boys in my method :D
return level4 != null ? GetResources().Where(r => (r.Level2 == (int)level2) && (r.Level3 == (int)level3) && (r.Level4 == (int)level4)).ToList() : level3 != null ? GetResources().Where(r => (r.Level2 == (int)level2) && (r.Level3 == (int)level3)).ToList() : level2 != null ? GetResources().Where(r => (r.Level2 == (int)level2)).ToList() : GetAllResourceList();
I think there is a huge caveat to the "do only one thing" mantra on this page. Sometimes doing one thing juggles lots of variables. Don't break up a long function into a bunch of smaller functions if the smaller functions end up having long parameter lists. Doing that just turns a single function into a set of highly coupled functions with no real individual value.
I agree a function should only do one thing, but at what level is that one thing.
If your 60 lines is accomplishing one thing (from your programs perspective) and the pieces that make up those 60 lines aren't going to be used by anything else then 60 lines is fine.
There is no real benefit to breaking it up, unless you can break it up into concrete pieces that stand on their own. The metric to use is functionality and not lines of code.
I have worked on many programs where the authors took the only one thing to an extreme level and all it ended up doing was to make it look like someone took a grenade to a function/method and blew it up into dozens of unconnected pieces that are hard to follow.
When pulling out pieces of that function you also need to consider if you will be adding any unnecessary overhead and avoid passing large amounts of data.
I believe the key point is to look for reuseability in that long function and pull those parts out. What you are left with is the function, whether it is 10, 20, or 60 lines long.
A function should do only one thing. If you are doing many small things in a function, make each small thing a function and call those functions from the long function.
What you really don't want to do is copy and paste every 10 lines of your long function into short functions (as your example suggests).
60 lines is large but not too long for a function. If it fits on one screen in an editor you can see it all at once. It really depends on what the functions is doing.
Why I may break up a function:
It is too long
It makes the code more maintainable by breaking it up and using meaningful names for the new function
The function is not cohesive
Parts of the function are useful in themselves.
When it is difficult to come up with a meaningful name for the function (It is probably doing too much)
My personal heuristic is that it's too long if I can't see the whole thing without scrolling.
Take a peek at McCabe's cyclomatic, in which he breaks up his code into a graph where, "Each node in the graph corresponds to a block of code in the program where the flow is sequential and the arcs correspond to branches taken in the program."
Now imagine your code has no functions/methods; its just one huge sprawl of code in the form of a graph.
You want to break this sprawl into methods. Consider that, when you do, there will be a certain number of blocks in each method. Only one block of each method will be visible to all other methods: the first block (we're presuming that you will be able to jump into a method at only one point: the first block). All the other blocks in each method will be information hidden within that method, but each block within a method may potentially jump to any other block within that method.
To determine what size your methods should be in terms of number of blocks per method, one question you might ask yourself is: how many methods should I have to minimise the maximum potential number of dependencies (MPE) between all blocks?
That answer is given by an equation. If r is the number of methods that minimises the MPE of the system, and n is the number of blocks in the system, then the equation is:
r = sqrt(n)
And it can be shown that this gives the number of blocks per method to be, also, sqrt(n).
Size approx you screen size (so go get a big pivot widescreen and turn it)... :-)
Joke aside, one logical thing per function.
And the positive thing is that unit testing is really much easier to do with small logical functions that do 1 thing. Big functions that do many things are harder to verify!
/Johan
Rule of thumb: If a function contains code blocks that do something, that is somewhat detached from the rest of code, put it in a seperate function. Example:
function build_address_list_for_zip($zip) {
$query = "SELECT * FROM ADDRESS WHERE zip = $zip";
$results = perform_query($query);
$addresses = array();
while ($address = fetch_query_result($results)) {
$addresses[] = $address;
}
// now create a nice looking list of
// addresses for the user
return $html_content;
}
much nicer:
function fetch_addresses_for_zip($zip) {
$query = "SELECT * FROM ADDRESS WHERE zip = $zip";
$results = perform_query($query);
$addresses = array();
while ($address = fetch_query_result($results)) {
$addresses[] = $address;
}
return $addresses;
}
function build_address_list_for_zip($zip) {
$addresses = fetch_addresses_for_zip($zip);
// now create a nice looking list of
// addresses for the user
return $html_content;
}
This approach has two advantages:
Whenever you need to fetch addresses for a certain zip code you can use the readily available function.
When you ever need to read the function build_address_list_for_zip() again you know what the first code block is going to do (it fetches addresses for a certain zip code, at least thats what you can derive from the function name). If you would have left the query code inline, you would first need to analyze that code.
[On the other hand (I will deny I told you this, even under torture): If you read a lot about PHP optimization, you could get the idea to keep the number of functions as small a possible, because function call are very, very expensive in PHP. I don't know about that since I never did any benchmarks. If thats case you would probably be better of not following any of the answers to your question if you application is very "performance sensitive" ;-) ]
Bear in mind that you can end up re-factoring just for re-factoring's sake, potentially making the code more unreadable than it was in the first place.
A former colleague of mine had a bizarre rule that a function/method must only contain 4 lines of code! He tried to stick to this so rigidly that his method names often became repetitive & meaningless plus the calls became deeply nested & confusing.
So my own mantra has become: if you can't think of a decent function/method name for the chunk of code you are re-factoring, don't bother.
The main reason I usually break a function up is either because bits and pieces of it are also ingredients in another nearby function I'm writing, so the common parts get factored out. Also, if it's using a lot of fields or properties out of some other class, there's a good chance that the relevant chunk can be lifted out wholesale and if possible moved into the other class.
If you have a block of code with a comment at the top, consider pulling it out into a function, with the function and argument names illustrating its purpose, and reserving the comment for the code's rationale.
Are you sure there are no pieces in there that would be useful elsewhere? What sort of function is it?
I usually break functions up by the need to place comments describing the next code block. What previously went into the comments now goes into the new function name. This is no hard rule, but (for me) a nice rule of thumb. I like code speaking for itself better than one that needs comments (as I've learned that comments usually lie)
In my opinion the answer is: when it does too much things.
Your function should perform only the actions you expect from the name of the function itself.
Another thing to consider is if you want to reuse some parts of your functions in others; in this case it could be useful to split it.
This is partly a matter of taste, but how I determine this is I try to keep my functions roughly only as long as will fit on my screen at one time (at a maximum). The reason being that it's easier to understand what's happening if you can see the whole thing at once.
When I code, it's a mix of writing long functions, then refactoring to pull out bits that could be reused by other functions -and- writing small functions that do discrete tasks as I go.
I don't know that there is any right or wrong answer to this (e.g., you may settle on 67 lines as your max, but there may be times when it makes sense to add a few more).
There has been some thorough research done on this very topic, if you want the fewest bugs, your code shouldn't be too long. But it also shouldn't be too short.
I disagree that a method should fit on your display in one, but if you are scrolling down by more than a page then the method is too long.
See
The Optimal Class Size for
Object-Oriented Software for further discussion.
I have written 500 line functions before, however these were just big switch statements for decoding and responding to messages. When the code for a single message got more complex than a single if-then-else, I extracted it out.
In essence, although the function was 500 lines, the independently maintained regions averaged 5 lines.
I normally uses a test driven approach to writing code. In this approach the function size is often related to the granularity of your tests.
If your test is sufficiently focused then it will lead you to write a small focused function to make the test pass.
This also works in the other direction. Functions need to be small enough to test effectively. So when working with legacy code I often find that I break down larger functions in-order to test the different parts of them.
I usually ask myself "what is the responsibility of this function" and if I can't state the responsibility in a clear concise sentence, and then translate that into a small focused test, I wonder if the function is too big.
If it has more than three branches, generally this means that a function or method should be broken apart, to encapsulate the branching logic in different methods.
Each for loop, if statement, etc is then not seen as a branch in the calling method.
Cobertura for Java code (and I'm sure there are other tools for other languages) calculates the number of if, etc. in a function for each function and sums it for the "average cyclomatic complexity".
If a function/method only has three branches, it will get a three on that metric, which is very good.
Sometimes it is difficult to follow this guideline, namely for validating user input. Nevertheless putting branches in different methods aids not only development and maintenance but testing as well, since the inputs to the methods that perform the branching can be analyzed easily to see what inputs need to be added to the test cases in order to cover the branches that were not covered.
If all the branches were inside a single method, the inputs would have to be tracked since the start of the method, which hinders testability.
I suspect you'll find a lot of answers on this.
I would probably break it up based on the logical tasks that were being performed within the function. If it looks to you that your short story is turning into a novel, I would suggest find and extract distinct steps.
For example, if you have a function that handles some kind of string input and returns a string result, you might break the function up based on the logic to split your string into parts, the logic to add extra characters and the logic to put it all back together again as a formatted result.
In short, whatever makes your code clean and easy to read (whether that's by simply ensuring your function has good commenting or breaking it up) is the best approach.
assuming that you are doing one thing, the length will depend on:
what you are doing
what language you are using
how many levels of abstraction you need to deal with in the code
60 lines might be too long or it might be just right. i suspect that it may be too long though.
One thing (and that thing should be obvious from the function name), but no more than a screenful of code, regardless. And feel free to increase your font size. And if in doubt, refactor it into two or more functions.
Extending the spirit of a tweet from Uncle Bob a while back, you know a function is getting too long when you feel the need to put a blank line between two lines of code. The idea being that if you need a blank line to separate the code, its responsibility and scope are separating at that point.
My idea is that if I have to ask myself if it is too long, it is probably too long. It helps making smaller functions, in this area, because it could help later in the application's life cycle.