Looking over my Raku code, I've realized that I pretty much never use CATCH blocks to actually catch/handle errors. Instead, I handle errors with try blocks and testing for undefined values; the only thing I use CATCH blocks for is to log errors differently. I don't seem to be alone in this habit – looking at the CATCH blocks in the Raku docs, pretty much none of them handle the error in any sense beyond printing a message. (The same is true of most of the CATCH blocks in Rakudo.)
Nevertheless, I'd like to better understand how to use CATCH blocks. Let me work through a few example functions, all of which are based on the following basic idea:
sub might-die($n) { $n %% 2 ?? 'lives' !! die 418 }
Now, as I've said, I'd normally use this function with something like
say try { might-die(3) } // 'default';
But I'd like to avoid that here and use CATCH blocks inside the function. My first instinct is to write
sub might-die1($n) {
    $n %% 2 ?? 'lives' !! die 418
    CATCH { default { 'default' }}
}
But this not only doesn't work, it also (very helpfully!) doesn't even compile. Apparently, the CATCH block is not removed from the control flow (as I would have thought). Thus, that block, rather than the ternary expression, is the last statement in the function. Ok, fair enough. How about this:
sub might-die2($n) {
    ln1: CATCH { default { 'default' }}
    ln2: $n %% 2 ?? 'lives' !! die 418
}
(Those line numbers are labels. Yes, it's valid Raku and, yes, they're useless here. But SO doesn't give line numbers, and I wanted some.)
This at least compiles, but it doesn't do what I mean.
say might-die2(3); # OUTPUT: «Nil»
To DWIM, I can change this to
sub might-die3($n) {
    ln1: CATCH { default { return 'default' }}
    ln2: $n %% 2 ?? 'lives' !! die 418
}
say might-die3(3); # OUTPUT: «'default'»
What these two reveal is that the result of the CATCH block is not, as I'd hoped, being inserted into the control flow where the exception occurred. Instead, the exception is causing control flow to jump to the CATCH block for the enclosing scope. It's as though we'd written (in an alternate universe where Raku has a GOTO operator [EDIT: or maybe not that alternate a universe, since we apparently have a NYI goto method. Learn something new every day…]):
sub might-die4($n) {
    ln0: GOTO ln2;
    ln1: return 'default';
    ln2: $n %% 2 ?? 'lives' !! GOTO ln1;
}
I realize that some critics of exceptions say that they can reduce to GOTO statements, but this seems to be carrying things a bit far.
I could (mostly) avoid emulating GOTO with the .resume method, but I can't do it the way I'd like to. Specifically, I can't write:
sub might-die5($n) {
    ln1: CATCH { default { .resume('default') }}
    ln2: $n %% 2 ?? 'lives' !! die 418
}
Because .resume doesn't take an argument. I can write
sub might-die6($n) {
    ln1: CATCH { default { .resume }}
    ln2: $n %% 2 ?? 'lives' !! do { die 418; 'default' }
}
say might-die6 3; # OUTPUT: «'default'»
This works, at least in this particular example. But I can't help feeling that it's more of a hack than an actual solution and that it wouldn't generalize well. Indeed, I can't help feeling that I'm missing some larger insight behind error handling in Raku that would make all of this fit together better. (Maybe because I've spent too much time programming in languages that handle errors without exceptions?) I would appreciate any insight into how to write the above code in idiomatic Raku. Is one of the approaches above basically correct? Is there a different approach I haven't considered? And is there a larger insight about error handling that I'm missing in all of this?
"Larger insight about error handling"
Is one of the approaches [in my question] basically correct?
Yes. In the general case, use features like try and if, not CATCH.
Is there a different approach I haven't considered?
Here's a brand new one: catch. I invented the first version of it a few weeks ago, and now your question has prompted me to reimagine it. I'm pretty happy with how it's now settled; I'd appreciate readers' feedback about it.
is there a larger insight about error handling that I'm missing in all of this?
I'll discuss some of my thoughts at the end of this answer.
But let's now go through your points in the order you wrote them.
KISS
I pretty much never use CATCH blocks to actually catch/handle errors.
Me neither.
Instead, I handle errors with try blocks and testing for undefined values
That's more like it.
Logging errors with a catchall CATCH
the only thing I use CATCH blocks for is to log errors differently.
Right. A judiciously located catchall. This is a use case for which I'd say CATCH is a good fit.
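For instance, here's a minimal sketch of that pattern, with a single catchall near the top of the call stack whose only job is to log and then rethrow (log-error and do-the-work are hypothetical stand-ins, not routines from the question or any module):

sub MAIN() {
    CATCH {
        default {
            log-error(.message);   # hypothetical logging routine
            .rethrow;              # keep the error propagating after logging it
        }
    }
    do-the-work();                 # hypothetical program body
}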
The doc
looking at the CATCH blocks in the Raku docs, pretty much none of them handle the error in any sense beyond printing a message.
If the doc is misleading about:
The limits of the capabilities and applicability of CATCH / CONTROL blocks; and/or
The alternatives; and/or
What's idiomatic (which imo is not using CATCH for code where try, or now perhaps my new catch function, is more appropriate).
then that would be unfortunate.
CATCH blocks in the Rakudo compiler source
(The same is true of most of the CATCH blocks in Rakudo.)
At a guess those will be judiciously placed catchalls. Placing one just before the callstack runs out, to specify default exception handling (as either a warning plus .resume, or a die or similar), seems reasonable to me. Is that what they all are?
Why are phasers statements?
sub might-die1($n) {
    $n %% 2 ?? 'lives' !! die 418
    CATCH { default { 'default' }}
}
this not only doesn't work, it also (very helpfully!) doesn't even compile.
.oO ( Well that's because you forgot a semi-colon at the end of the first statement )
(I would have thought ... the CATCH block [would have been] removed from the control flow)
Join the club. Others have expressed related sentiments in filed bugs, and SO Q's and A's. I used to think the current situation was wrong in the same way you express. I think I could now easily be persuaded by either side of the argument -- but jnthn's view would be decisive for me.
Quoting the doc:
A phaser block is just a trait of the closure containing it, and is automatically called at the appropriate moment.
That suggests that a phaser is not a statement, at least not in an ordinary sense and would, one might presume, be removed from ordinary control flow.
But returning to the doc:
Phasers [may] have a runtime value, and if evaluated [in a] surrounding expression, they simply save their result for use in the expression ... when the rest of the expression is evaluated.
That suggests that they can have a value in an ordinary control flow sense.
Perhaps the rationale for not removing phasers from holding their place in ordinary control flow, and instead evaluating to Nil if they don't otherwise return a value, is something like:
Phasers like INIT do return values. The compiler could insist that one assign their result to a variable and then explicitly return that variable. But that would be very un-Raku-ish.
Raku philosophy is that, in general, the dev tells the compiler what to do or not do, not the other way around. A phaser is a statement. If you put a statement at the end, then you want it to be the value returned by its enclosing block. (Even if it's Nil.)
Still, overall, I'm with you in the following sense:
It seems natural to think that ordinary control flow does not include phasers that do not return a value. Why should it?
It seems IWBNI the compiler at least warned if it saw a non-value-returning phaser used as the last statement of a block that contains other value-returning statements.
Why don't CATCH blocks return/inject a value?
Ok, fair enough. How about this:
sub might-die2($n) {
    ln1: CATCH { default { 'default' }}
    ln2: $n %% 2 ?? 'lives' !! die 418
}
say might-die2(3); # OUTPUT: «Nil»
As discussed above, many phasers, including the exception handling ones, are statements that do not return values.
I think one could reasonably have expected that:
CATCH phasers would return a value. But they don't. I vaguely recall jnthn already explaining why here on SO; I'll leave hunting that down as an exercise for readers. Or, conversely:
The compiler would warn that a phaser that did not return a value was placed somewhere a returned value was probably intended.
It's as though we'd written ... a GOTO operator
Raku(do) isn't just doing an unstructured jump.
(Otherwise .resume wouldn't work.)
this seems to be carrying things a bit far
I agree, you are carrying things a bit too far. :P
.resume
Resumable exceptions certainly aren't something I've found myself reaching for in Raku. I don't think I've used them in "userspace" code at all yet.
(from jnthn's answer to When would I want to resume a Raku exception?.)
.resume doesn't take an argument
Right. It just resumes execution at the statement after the one that led to an exception being thrown. .resume does not alter the result of the failed statement.
Even if a CATCH block tries to intervene, it won't be able to do so in a simple, self-contained fashion, by setting the value of a variable whose assignment has thrown an exception, and then .resumeing. cf Should this Raku CATCH block be able to change variables in the lexical scope?.
(I tried several CATCH related approaches before concluding that just using try was the way to go for the body of the catch function I linked at the start. If you haven't already looked at the catch code, I recommend you do.)
Further tidbits about CATCH blocks
They're a bit fraught for a couple reasons. One is what seems to be deliberate limits of their intended capability and applicability. Another is bugs. Consider, for example:
My answer to SO CATCH and throw in custom exception
Rakudo issue: Missing return value from do when calling .resume and CATCH is the last statement in a block
Rakudo issue: return-ing out of a block and LEAVE phaser (“identity”‽)
Larger insight about error handling
is there a larger insight about error handling that I'm missing in all of this?
Perhaps. I think you already know most of it well, but:
KISS #1 You've handled errors without exceptions in other PLs. It worked. You've done it in Raku. It works. Use exceptions only when you need or want to use them. For most code, you won't.
KISS #2 Ignoring some native type use cases, almost all results can be expressed as valid or not valid, without leading to the semi-predicate problem, using simple combinations of the following Raku truth-value features, which provide ergonomic ways to discern between non-error values and errors (a short sketch follows this list):
Conditionals: if, while, try, //, et al
Predicates: .so, .defined, .DEFINITE, et al
Values/types: Nil, Failures, zero length composite data structures, :D vs :U type constraints, et al
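For example, here's a minimal sketch of that value-based style, using a made-up load-config routine and config path:

sub load-config($path) {
    $path.IO.e ?? $path.IO.slurp !! die "no config at $path"
}

# try turns a thrown exception into an undefined result...
my $config = try load-config('app.conf');

# ...which the usual predicates and conditionals treat as just another value
say $config.defined
    ?? "loaded {$config.chars} characters"
    !! 'falling back to defaults';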
Sticking with error exceptions, some points I think worth considering:
One of the use cases for Raku error exceptions is to cover the same ground as exceptions in, say, Haskell. These are scenarios in which handling them as values isn't the right solution (or, in Raku, might not be).
Other PLs support exceptions. One of Raku's superpowers is being able to interoperate with all other PLs. Ergo it supports exceptions if for no other reason than to enable correct interoperation.
Raku includes the notion of a Failure, a delayed exception. The idea is you can get the best of both worlds. Handled with due care, a Failure is just an error value. Handled carelessly, it blows up like a regular exception.
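Here's a tiny sketch of that dual nature (may-fail is made up for illustration):

sub may-fail($n) {
    $n %% 2 ?? 'fine' !! fail "odd number: $n"
}

# handled with due care, the Failure is just an undefined value
say may-fail(3) // 'fallback';   # OUTPUT: «fallback»

# handled carelessly, it throws the exception it wraps
my $careless = may-fail(3);
say $careless + 1;               # dies with: odd number: 3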
More generally, all of Raku's features are designed to work together to provide convenient but high quality error handling that supports all of the following coding scenarios:
Fast coding. Prototyping, exploratory code, one-offs, etc.
Control of robustness. Gradually narrowing or broadening error handling.
Diverse options. What errors should be signalled? When? By which code? What if consuming code wants to signal that producing code should be more strict? Or more relaxed? What if it's the other way around -- producing code wants to signal that consuming code should be more careful or can relax? What can be done if producing and consuming code have conflicting philosophies? What if producing code cannot be altered (eg it's a library, or written in another language)?
Interoperation between languages / codebases. The only way that can work well is if Raku provides both high levels of control and diverse options.
Convenient refactoring between these scenarios.
All of these factors, and more, underlie Raku's approach to error handling.
CATCH is a really old feature of the language.
It used to only exist inside of a try block.
(Which is not very Rakuish.)
It is also a very rarely used part of Raku.
Which means that not a lot of people have come up with “pain points” of the feature.
So then very rarely has anyone done any work to make it more Rakuish.
Both of those combined make it so that CATCH is a rather featureless part of the language.
If you look at the test file for the feature, you will note that most of it was written in 2009 when the test suite was still a part of the Pugs project.
(And most of the rest are tests for bugs that have been found over the years.)
There is a very good reason that few people have tried to add new behaviours to CATCH: there are plenty of other features that are much nicer to work with.
If you want to replace a result in the event of an exception
sub may-die () {
    if Bool.pick {
        return 'normal'
    } else {
        die
    }
}

my $result;

{
    CATCH { default { $result = 'replacement' }}
    $result = may-die();
}
It is much easier to just use try without CATCH, along with defined‑or // to get something that works very similarly.
my $result = try { may-die } // 'replacement';
It is even easier if you are dealing with soft failures instead of hard exceptions, because you can just use defined‑or by itself.
sub may-fail () {
    if Bool.pick {
        return 'normal'
    } else {
        fail
    }
}
my $result = may-fail() // 'replacement';
In fact the only way to use CATCH with a soft failure is to combine it with try
my $result;

try {
    CATCH { default { $result = 'replacement' }}
    $result = may-fail();
}
If your soft failure is Nil, the base of all failure objects, you can use either // or is default
my $result = may-return-nil // 'replacement';
my $result is default<replacement> = may-return-nil;
But Nil won't just work with CATCH no matter how much you try.
Really the only time I would normally use CATCH is when I want to handle several different errors in different ways.
{
    CATCH {
        when X::Something { … }
        when X::This { … }
        when X::That { … }
        default { … }
    }

    # some code that may throw X::This
    …

    # some code that may throw X::NotSpecified (default)
    …

    # some code that may throw X::Something
    …

    # some code that may throw X::This or X::That
    …

    # some code that may fail instead of throw
    # (sunk so that it will throw immediately)
    sink may-fail;
}
Or if I wanted to show how you could write this [terrible] Visual Basic line
On Error Resume Next
In Raku
CATCH { default { .resume } }
That of course doesn't really answer your question in the slightest.
You say that you expected CATCH to be removed from the control flow.
The whole point of CATCH is to insert itself into the exceptional control flow.
Actually that's not accurate. It doesn't so much insert itself into the control flow as end the control flow, doing some processing before moving on to the caller/outside block. Presumably that's because the data of the current block is in an erroneous state and should no longer be trusted.
That still doesn't explain why your code fails to compile.
You expected CATCH to have its own special syntax rule when it comes to the semicolon ending a statement.
If it worked the way you expected, it would violate one of the important [syntax] rules in Raku, "there should be as few special cases as possible". Its syntax is not special in any way, unlike what you seem to expect.
CATCH is just one of many phasers, with one important extra bit of functionality: it stops the exception from propagating further down the call stack.
What you seem to be asking is for it to instead alter the result of an expression that may throw.
That doesn't seem like a good idea.
$a + may-die() + $b
You want to be able to replace the exception from may-die with a value.
$a + 42 + $b
Basically you are asking for the ability to add action‑at‑a‑distance as a feature.
There is also a problem: what if you actually wanted $a + may-die() to be replaced instead?
42 + $b
There is no way in your idea for you to specify that.
Even worse, there is a way that could accidentally happen. What if may-die() started returning a Failure instead of throwing an exception? Then it would only cause an exception when you tried to use it, for example by adding it to $a.
If some code throws an exception, the block is in an unrecoverable state and it needs to halt execution. This far, no farther.
If an expression throws an exception, the result of executing the statement it is in, is suspect.
Other statements may rely on that broken statement, so then the whole block is also suspect.
I do not think it would be that good of an idea if it instead allowed the code to continue but with a different result for the current expression. Especially if that value can be far removed from the expression somewhere else inside of the block. (action‑at‑a‑distance)
If you could come up with some code that would be vastly improved with .resume(value), then maybe it could be added.
(I personally think that leave(value) would be more useful in such a circumstance.)
I will grant that .resume(value) seems like it may be useful for control exceptions.
(Caught with CONTROL instead of CATCH.)
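For completeness, here's a minimal sketch of a CONTROL block, which looks just like CATCH but intercepts control exceptions such as the one thrown by warn (the noisy and demo subs are just illustrations):

sub noisy() {
    warn 'minor problem';
    'still produces a value'
}

sub demo() {
    CONTROL {
        when CX::Warn {
            note "logged: {.message}";
            .resume;            # carry on after the warn
        }
    }
    say noisy();                # OUTPUT: «still produces a value»
}

demo();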
I've long been under the impression that goto should never be used if possible.
However, while perusing libavcodec (which is written in C) the other day, I was surprised to notice multiple uses of it.
Is it ever advantageous to use goto in a language that supports loops and functions? If so, why? Please provide a concrete example that clearly justifies the use of a goto.
Everybody who is anti-goto cites, directly or indirectly, Edsger Dijkstra's GoTo Considered Harmful article to substantiate their position. Too bad Dijkstra's article has virtually nothing to do with the way goto statements are used these days and thus what the article says has little to no applicability to the modern programming scene. The goto-less meme verges now on a religion, right down to its scriptures dictated from on high, its high priests and the shunning (or worse) of perceived heretics.
Let's put Dijkstra's paper into context to shed a little light on the subject.
When Dijkstra wrote his paper the popular languages of the time were unstructured procedural ones like BASIC, FORTRAN (the earlier dialects) and various assembly languages. It was quite common for people using the higher-level languages to jump all over their code base in twisted, contorted threads of execution that gave rise to the term "spaghetti code". You can see this by hopping on over to the classic Trek game written by Mike Mayfield and trying to figure out how things work. Take a few moments to look that over.
THIS is "the unbridled use of the go to statement" that Dijkstra was railing against in his paper in 1968. THIS is the environment he lived in that led him to write that paper. The ability to jump anywhere you like in your code at any point you liked was what he was criticising and demanding be stopped. Comparing that to the anaemic powers of goto in C or other such more modern languages is simply risible.
I can already hear the raised chants of the cultists as they face the heretic. "But," they will chant, "you can make code very difficult to read with goto in C." Oh yeah? You can make code very difficult to read without goto as well. Like this one:
#define _ -F<00||--F-OO--;
int F=00,OO=00;main(){F_OO();printf("%1.3f\n",4.*-F/OO/OO);}F_OO()
{
_-_-_-_
_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_
_-_-_-_
}
Not a goto in sight, so it must be easy to read, right? Or how about this one:
a[900]; b;c;d=1 ;e=1;f; g;h;O; main(k,
l)char* *l;{g= atoi(* ++l); for(k=
0;k*k< g;b=k ++>>1) ;for(h= 0;h*h<=
g;++h); --h;c=( (h+=g>h *(h+1)) -1)>>1;
while(d <=g){ ++O;for (f=0;f< O&&d<=g
;++f)a[ b<<5|c] =d++,b+= e;for( f=0;f<O
&&d<=g; ++f)a[b <<5|c]= d++,c+= e;e= -e
;}for(c =0;c<h; ++c){ for(b=0 ;b<k;++
b){if(b <k/2)a[ b<<5|c] ^=a[(k -(b+1))
<<5|c]^= a[b<<5 |c]^=a[ (k-(b+1 ))<<5|c]
;printf( a[b<<5|c ]?"%-4d" :" " ,a[b<<5
|c]);} putchar( '\n');}} /*Mike Laman*/
No goto there either. It must therefore be readable.
What's my point with these examples? It's not language features that make unreadable, unmaintainable code. It's not syntax that does it. It's bad programmers that cause this. And bad programmers, as you can see in that above item, can make any language feature unreadable and unusable. Like the for loops up there. (You can see them, right?)
Now to be fair, some language constructs are easier to abuse than others. If you're a C programmer, however, I'd peer far more closely at about 50% of the uses of #define long before I'd go on a crusade against goto!
So, for those who've bothered to read this far, there are several key points to note.
Dijkstra's paper on goto statements was written for a programming environment where goto was a lot more potentially damaging than it is in most modern languages that aren't an assembler.
Automatically throwing away all uses of goto because of this is about as rational as saying "I tried to have fun once but didn't like it so now I'm against it".
There are legitimate uses of the modern (anaemic) goto statements in code that cannot be adequately replaced by other constructs.
There are, of course, illegitimate uses of the same statements.
There are, too, illegitimate uses of the modern control statements like the "godo" abomination where an always-false do loop is broken out of using break in place of a goto. These are often worse than judicious use of goto.
There are a few reasons for using the "goto" statement that I'm aware of (some have spoken to this already):
Cleanly exiting a function
Often in a function, you may allocate resources and need to exit in multiple places. Programmers can simplify their code by putting the resource cleanup code at the end of the function, and all "exit points" of the function would goto the cleanup label. This way, you don't have to write cleanup code at every "exit point" of the function.
Exiting nested loops
If you're in a nested loop and need to break out of all loops, a goto can make this much cleaner and simpler than break statements and if-checks.
Low-level performance improvements
This is only valid in perf-critical code, but goto statements execute very quickly and can give you a boost when moving through a function. This is a double-edged sword, however, because a compiler typically cannot optimize code that contains gotos.
Note that in all these examples, gotos are restricted to the scope of a single function.
Obeying best practices blindly is not a best practice. The idea of avoiding goto statements as one's primary form of flow control is to avoid producing unreadable spaghetti code. If used sparingly in the right places, they can sometimes be the simplest, clearest way of expressing an idea. Walter Bright, the creator of the Zortech C++ compiler and the D programming language, uses them frequently, but judiciously. Even with the goto statements, his code is still perfectly readable.
Bottom line: Avoiding goto for the sake of avoiding goto is pointless. What you really want to avoid is producing unreadable code. If your goto-laden code is readable, then there's nothing wrong with it.
Well, there's one thing that's always worse than gotos: strange use of other program-flow operators to avoid a goto:
Examples:
// 1
try {
    ...
    throw NoErrorException();
    ...
} catch (const NoErrorException& noe) {
    // This is the worst
}

// 2
do {
    ...break;
    ...break;
} while (false);

// 3
for (int i = 0; ...) {
    bool restartOuter = false;
    for (int j = 0; ...) {
        if (...)
            restartOuter = true;
        if (restartOuter) {
            i = -1;
        }
    }
}
etc
etc
Since goto makes reasoning about program flow hard [1] (aka "spaghetti code"), goto is generally only used to compensate for missing features: the use of goto may actually be acceptable, but only if the language doesn't offer a more structured construct to achieve the same goal. Take Doubt's example:
The rule with goto that we use is that goto is okay for jumping forward to a single exit cleanup point in a function.
This is true – but only if the language doesn't allow structured exception handling with cleanup code (such as RAII or finally), which does the same job better (as it is specially built for doing it), or when there's a good reason not to employ structured exception handling (but you will never have this case except at a very low level).
In most other languages, the only acceptable use of goto is to exit nested loops. And even there it is almost always better to lift the outer loop into its own method and use return instead.
Other than that, goto is a sign that not enough thought has gone into the particular piece of code.
[1] Modern languages which support goto implement some restrictions (e.g. goto may not jump into or out of functions) but the problem fundamentally remains the same.
Incidentally, the same is of course also true for other language features, most notably exceptions. And there are usually strict rules in place to only use these features where indicated, such as the rule not to use exceptions to control non-exceptional program flow.
In C#, the switch statement does not allow fall-through. So goto is used to transfer control to a specific switch-case label or the default label.
For example:
switch(value)
{
case 0:
Console.WriteLine("In case 0");
goto case 1;
case 1:
Console.WriteLine("In case 1");
goto case 2;
case 2:
Console.WriteLine("In case 2");
goto default;
default:
Console.WriteLine("In default");
break;
}
Edit: There is one exception to the "no fall-through" rule. Fall-through is allowed if a case statement has no code.
I've written more than a few lines of assembly language over the years. Ultimately, every high level language compiles down to gotos. Okay, call them "branches" or "jumps" or whatever else, but they're gotos. Can anyone write goto-less assembler?
Now sure, you can point out to a Fortran, C or BASIC programmer that to run riot with gotos is a recipe for spaghetti bolognaise. The answer however is not to avoid them, but to use them carefully.
A knife can be used to prepare food, free someone, or kill someone. Do we do without knives through fear of the latter? Similarly the goto: used carelessly it hinders, used carefully it helps.
#ifdef TONGUE_IN_CHEEK
Perl has a goto that allows you to implement poor-man's tail calls. :-P
sub factorial {
    my ($n, $acc) = (@_, 1);
    return $acc if $n < 1;
    @_ = ($n - 1, $acc * $n);
    goto &factorial;
}
#endif
Okay, so that has nothing to do with C's goto. More seriously, I agree with the other comments about using goto for cleanups, or for implementing Duff's device, or the like. It's all about using, not abusing.
(The same comment can apply to longjmp, exceptions, call/cc, and the like---they have legitimate uses, but can easily be abused. For example, throwing an exception purely to escape a deeply-nested control structure, under completely non-exceptional circumstances.)
Take a look at When To Use Goto When Programming in C:
Although the use of goto is almost always bad programming practice (surely you can find a better way of doing XYZ), there are times when it really isn't a bad choice. Some might even argue that, when it is useful, it's the best choice.
Most of what I have to say about goto really only applies to C. If you're using C++, there's no sound reason to use goto in place of exceptions. In C, however, you don't have the power of an exception handling mechanism, so if you want to separate out error handling from the rest of your program logic, and you want to avoid rewriting clean up code multiple times throughout your code, then goto can be a good choice.
What do I mean? You might have some code that looks like this:
int big_function()
{
/* do some work */
if([error])
{
/* clean up*/
return [error];
}
/* do some more work */
if([error])
{
/* clean up*/
return [error];
}
/* do some more work */
if([error])
{
/* clean up*/
return [error];
}
/* do some more work */
if([error])
{
/* clean up*/
return [error];
}
/* clean up*/
return [success];
}
This is fine until you realize that you need to change your cleanup code. Then you have to go through and make 4 changes. Now, you might decide that you can just encapsulate all of the cleanup into a single function; that's not a bad idea. But it does mean that you'll need to be careful with pointers -- if you plan to free a pointer in your cleanup function, there's no way to set it to then point to NULL unless you pass in a pointer to a pointer. In a lot of cases, you won't be using that pointer again anyway, so that may not be a major concern. On the other hand, if you add in a new pointer, file handle, or other thing that needs cleanup, then you'll need to change your cleanup function again; and then you'll need to change the arguments to that function.
By using goto, it will be
int big_function()
{
int ret_val = [success];
/* do some work */
if([error])
{
ret_val = [error];
goto end;
}
/* do some more work */
if([error])
{
ret_val = [error];
goto end;
}
/* do some more work */
if([error])
{
ret_val = [error];
goto end;
}
/* do some more work */
if([error])
{
ret_val = [error];
goto end;
}
end:
/* clean up*/
return ret_val;
}
The benefit here is that your code following end has access to everything it will need to perform cleanup, and you've managed to reduce the number of change points considerably. Another benefit is that you've gone from having multiple exit points for your function to just one; there's no chance you'll accidentally return from the function without cleaning up.
Moreover, since goto is only being used to jump to a single point, it's not as though you're creating a mass of spaghetti code jumping back and forth in an attempt to simulate function calls. Rather, goto actually helps write more structured code.
In a word, goto should always be used sparingly, and as a last resort -- but there is a time and a place for it. The question should be not "do you have to use it" but "is it the best choice" to use it.
The rule with goto that we use is that goto is okay for jumping forward to a single exit cleanup point in a function. In really complex functions we relax that rule to allow other forward jumps. In both cases we are avoiding deeply nested if statements that often occur with error code checking, which helps readability and maintenance.
The most thoughtful and thorough discussion of goto statements, their legitimate uses, and alternative constructs that can be used in place of "virtuous goto statements" but can be abused as easily as goto statements, is Donald Knuth's article "Structured Programming with goto Statements", in the December 1974 Computing Surveys (volume 6, no. 4. pp. 261 - 301).
Not surprisingly, some aspects of this 39-year old paper are dated: Orders-of-magnitude increases in processing power make some of Knuth's performance improvements unnoticeable for moderately sized problems, and new programming-language constructs have been invented since then. (For example, try-catch blocks subsume Zahn's Construct, although they are rarely used in that way.) But Knuth covers all sides of the argument, and should be required reading before anyone rehashes the issue yet again.
I find it funny that some people will go as far as to give a list of cases where goto is acceptable, saying that all other uses are unacceptable. Do you really think that you know every case where goto is the best choice for expressing an algorithm?
To illustrate, I'll give you an example that no one here has shown yet:
Today I was writing code for inserting an element in a hash table. The hash table is a cache of previous calculations which can be overwritten at will (affecting performance but not correctness).
Each bucket of the hash table has 4 slots, and I have a bunch of criteria to decide which element to overwrite when a bucket is full. Right now this means making up to three passes through a bucket, like this:
// Overwrite an element with same hash key if it exists
for (add_index=0; add_index < ELEMENTS_PER_BUCKET; add_index++)
if (slot_p[add_index].hash_key == hash_key)
goto add;
// Otherwise, find first empty element
for (add_index=0; add_index < ELEMENTS_PER_BUCKET; add_index++)
if (slot_p[add_index].type == TT_ELEMENT_EMPTY)
goto add;
// Additional passes go here...
add:
// element is written to the hash table here
Now if I didn't use goto, what would this code look like?
Something like this:
// Overwrite an element with same hash key if it exists
for (add_index=0; add_index < ELEMENTS_PER_BUCKET; add_index++)
if (slot_p[add_index].hash_key == hash_key)
break;
if (add_index >= ELEMENTS_PER_BUCKET) {
// Otherwise, find first empty element
for (add_index=0; add_index < ELEMENTS_PER_BUCKET; add_index++)
if (slot_p[add_index].type == TT_ELEMENT_EMPTY)
break;
if (add_index >= ELEMENTS_PER_BUCKET)
// Additional passes go here (nested further)...
}
// element is written to the hash table here
It would look worse and worse if more passes are added, while the version with goto keeps the same indentation level at all times and avoids the use of spurious if statements whose result is implied by the execution of the previous loop.
So there's another case where goto makes the code cleaner and easier to write and understand... I'm sure there are many more, so don't pretend to know all the cases where goto is useful, dissing any good ones that you couldn't think of.
One of the reasons goto is bad, besides coding style is that you can use it to create overlapping, but non-nested loops:
loop1:
a
loop2:
b
if(cond1) goto loop1
c
if(cond2) goto loop2
This would create the bizarre, but possibly legal flow-of-control structure where a sequence like (a, b, c, b, a, b, a, b, ...) is possible, which makes compiler hackers unhappy. Apparently there are a number of clever optimization tricks that rely on this type of structure not occurring. (I should check my copy of the dragon book...) The result of this might (using some compilers) be that other optimizations aren't done for code that contains gotos.
It might be useful if you know it just, "oh, by the way", happens to persuade the compiler to emit faster code. Personally, I'd prefer to try to explain to the compiler about what's probable and what's not before using a trick like goto, but arguably, I might also try goto before hacking assembler.
Some say there is no reason for goto in C++. Some say that in 99% of cases there are better alternatives. This is not reasoning, just irrational impressions. Here's a solid example where goto leads to nice code, something like an enhanced do-while loop:
int i;
PROMPT_INSERT_NUMBER:
std::cout << "insert number: ";
std::cin >> i;
if(std::cin.fail()) {
std::cin.clear();
std::cin.ignore(1000,'\n');
goto PROMPT_INSERT_NUMBER;
}
std::cout << "your number is " << i;
Compare it to goto-free code:
int i;
bool loop;
do {
loop = false;
std::cout << "insert number: ";
std::cin >> i;
if(std::cin.fail()) {
std::cin.clear();
std::cin.ignore(1000,'\n');
loop = true;
}
} while(loop);
std::cout << "your number is " << i;
I see these differences:
nested {} block is needed (albeit do {...} while looks more familiar)
extra loop variable is needed, used in four places
it takes longer time to read and understand the work with the loop
the loop does not hold any data, it just controls the flow of the execution, which is less comprehensible than simple label
There is another example
void sort(int* array, int length) {
SORT:
for(int i=0; i<length-1; ++i) if(array[i]>array[i+1]) {
swap(array[i], array[i+1]);
goto SORT; // it is very easy to understand this code, right?
}
}
Now let's get rid of the "evil" goto:
void sort(int* array, int length) {
bool seemslegit;
do {
seemslegit = true;
for(int i=0; i<length-1; ++i) if(array[i]>array[i+1]) {
swap(array[i], array[i+1]);
seemslegit = false;
}
} while(!seemslegit);
}
You see it is the same type of goto use; it is a well-structured pattern, and it is not a forward goto, which many promote as the only recommended way. Surely you want to avoid "smart" code like this:
void sort(int* array, int length) {
for(int i=0; i<length-1; ++i) if(array[i]>array[i+1]) {
swap(array[i], array[i+1]);
i = -1; // it works, but WTF on the first glance
}
}
The point is that goto can be easily misused, but goto itself is not to blame. Note that a label has function scope in C++, so it does not pollute the global scope as it would in pure assembly, in which overlapping loops have their place and are very common - like in the following code for the 8051, where a 7-segment display is connected to P1. The program cycles the lit segment around:
; P1 states loops
; 11111110 <-
; 11111101 |
; 11111011 |
; 11110111 |
; 11101111 |
; 11011111 |
; |_________|
init_roll_state:
MOV P1,#11111110b
ACALL delay
next_roll_state:
MOV A,P1
RL A
MOV P1,A
ACALL delay
JNB P1.5, init_roll_state
SJMP next_roll_state
There is another advantage: goto can serve as named loops, conditions and other flows:
if(valid) {
do { // while(loop)
// more than one page of code here
// so it is better to comment the meaning
// of the corresponding curly bracket
} while(loop);
} // if(valid)
Or you can use an equivalent goto with indentation, so you don't need a comment if you choose the label name wisely:
if(!valid) goto NOTVALID;
LOOPBACK:
// more than one page of code here
if(loop) goto LOOPBACK;
NOTVALID:;
I have come across a situation where a goto was a good solution, and I have not seen this example here or anywhere.
I had a switch case with a few cases which all needed to call the same function in the end. I had other cases which all needed to call a different function in the end.
This looked a bit like this:
switch( x ) {
case 1: case1() ; doStuffFor123() ; break ;
case 2: case2() ; doStuffFor123() ; break ;
case 3: case3() ; doStuffFor123() ; break ;
case 4: case4() ; doStuffFor456() ; break ;
case 5: case5() ; doStuffFor456() ; break ;
case 6: case6() ; doStuffFor456() ; break ;
case 7: case7() ; doStuffFor789() ; break ;
case 8: case8() ; doStuffFor789() ; break ;
case 9: case9() ; doStuffFor789() ; break ;
}
Instead of giving every case a function call, I replaced the break by a goto. The goto jumps to a label which is also inside the switch case.
switch( x ) {
case 1: case1() ; goto stuff123 ;
case 2: case2() ; goto stuff123 ;
case 3: case3() ; goto stuff123 ;
case 4: case4() ; goto stuff456 ;
case 5: case5() ; goto stuff456 ;
case 6: case6() ; goto stuff456 ;
case 7: case7() ; goto stuff789 ;
case 8: case8() ; goto stuff789 ;
case 9: case9() ; goto stuff789 ;
stuff123: doStuffFor123() ; break ;
stuff456: doStuffFor456() ; break ;
stuff789: doStuffFor789() ; break ;
}
Cases 1 through 3 all must call doStuffFor123(), and similarly cases 4 through 6 must call doStuffFor456(), etc.
In my opinion, gotos are perfectly fine if you use them correctly. In the end, any code is as clear as people write it. With gotos one can make spaghetti code, but that does not mean that gotos are the cause of the spaghetti code. That cause is us; programmers. I can also create spaghetti code with functions if I want to. The same goes for macros as well.
In a Perl module, you occasionally want to create subroutines or closures on the fly. The thing is, once you have created the subroutine, how do you get to it? You could just call it, but then if the subroutine uses caller() it won't be as helpful as it could be. That is where the goto &subroutine variation can be helpful.
Here is a quick example:
sub AUTOLOAD {
    my($self) = @_;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;

    *{$name} = my($sub) = sub {
        # the body of the closure
    };

    goto $sub;
    # nothing after the goto will ever be executed.
}
You can also use this form of goto to provide a rudimentary form of tail-call optimization.
sub factorial($){
    my($n,$tally) = (@_,1);

    return $tally if $n <= 1;

    $tally *= $n--;
    @_ = ($n,$tally);

    goto &factorial;
}
( In Perl 5 version 16 that would be better written as goto __SUB__; )
There is a module that will import a tail modifier and one that will import recur if you don't like using this form of goto.
use Sub::Call::Tail;
sub AUTOLOAD {
...
tail &$sub( @_ );
}
use Sub::Call::Recur;
sub factorial($){
my($n,$tally) = (@_,1);
return $tally if $n <= 1;
recur( $n-1, $tally * $n );
}
Most of the other reasons to use goto are better done with other keywords.
Like redoing a bit of code:
LABEL: ;
...
goto LABEL if $x;
{
...
redo if $x;
}
Or going to the last of a bit of code from multiple places:
goto LABEL if $x;
...
goto LABEL if $y;
...
LABEL: ;
{
last if $x;
...
last if $y
...
}
1) The most common use of goto that I know of is emulating exception handling in languages that don't offer it, namely in C. (The code given by Nuclear above is just that.) Look at the Linux source code and you'll see a bazillion gotos used that way; there were about 100,000 gotos in Linux code according to a quick survey conducted in 2013: http://blog.regehr.org/archives/894. Goto usage is even mentioned in the Linux coding style guide: https://www.kernel.org/doc/Documentation/CodingStyle. Just like object-oriented programming is emulated using structs populated with function pointers, goto has its place in C programming. So who is right: Dijkstra or Linus (and all Linux kernel coders)? It's theory vs. practice basically.
There is however the usual gotcha for not having compiler-level support and checks for common constructs/patterns: it's easier to use them wrong and introduce bugs without compile-time checks. Windows and Visual C++, even in C mode, offer exception handling via SEH/VEH for this very reason: exceptions are useful even outside OOP languages, i.e. in a procedural language. But the compiler can't always save your bacon, even if it offers syntactic support for exceptions in the language. Consider as an example of the latter case the famous Apple SSL "goto fail" bug, which just duplicated one goto with disastrous consequences (https://www.imperialviolet.org/2014/02/22/applebug.html):
if (something())
    goto fail;
    goto fail; // copypasta bug
printf("Never reached\n");
fail:
    // control jumps here
You can have exactly the same bug using compiler-supported exceptions, e.g. in C++:
struct Fail {};
try {
if (something())
throw Fail();
throw Fail(); // copypasta bug
printf("Never reached\n");
}
catch (Fail&) {
// control jumps here
}
But both variants of the bug can be avoided if the compiler analyzes and warns you about unreachable code. For example compiling with Visual C++ at the /W4 warning level finds the bug in both cases. Java for instance forbids unreachable code (where it can find it!) for a pretty good reason: it's likely to be a bug in the average Joe's code. As long as the goto construct doesn't allow targets that the compiler can't easily figure out, like gotos to computed addresses(**), it's not any harder for the compiler to find unreachable code inside a function with gotos than using Dijkstra-approved code.
(**) Footnote: Gotos to computed line numbers are possible in some versions of Basic, e.g. GOTO 10*x where x is a variable. Rather confusingly, in Fortran "computed goto" refers to a construct that is equivalent to a switch statement in C. Standard C doesn't allow computed gotos in the language, but only gotos to statically/syntactically declared labels. GNU C however has an extension to get the address of a label (the unary, prefix && operator) and also allows a goto to a variable of type void*. See https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html for more on this obscure sub-topic. The rest of this post isn't concerned with that obscure GNU C feature.
Standard C (i.e. not computed) gotos are not usually the reason why unreachable code can't be found at compile time. The usual reason is logic code like the following. Given
int computation1() {
return 1;
}
int computation2() {
return computation1();
}
It's just as hard for a compiler to find unreachable code in any of the following 3 constructs:
void tough1() {
if (computation1() != computation2())
printf("Unreachable\n");
}
void tough2() {
if (computation1() == computation2())
goto out;
printf("Unreachable\n");
out:;
}
struct Out{};
void tough3() {
try {
if (computation1() == computation2())
throw Out();
printf("Unreachable\n");
}
catch (Out&) {
}
}
(Excuse my brace-related coding style, but I tried to keep the examples as compact as possible.)
Visual C++ /W4 (even with /Ox) fails to find unreachable code in any of these, and as you probably know the problem of finding unreachable code is undecidable in general. (If you don't believe me about that: https://www.cl.cam.ac.uk/teaching/2006/OptComp/slides/lecture02.pdf)
As a related issue, the C goto can be used to emulate exceptions only inside the body of a function. The standard C library offers a setjmp() and longjmp() pair of functions for emulating non-local exits/exceptions, but those have some serious drawbacks compared to what other languages offer. The Wikipedia article http://en.wikipedia.org/wiki/Setjmp.h explains fairly well this latter issue. This function pair also works on Windows (http://msdn.microsoft.com/en-us/library/yz2ez4as.aspx), but hardly anyone uses them there because SEH/VEH is superior. Even on Unix, I think setjmp and longjmp are very seldom used.
2) I think the second most common use of goto in C is implementing multi-level break or multi-level continue, which is also a fairly uncontroversial use case. Recall that Java doesn't allow goto label, but allows break label or continue label. According to http://www.oracle.com/technetwork/java/simple-142616.html, this is actually the most common use case of gotos in C (90% they say), but in my subjective experience, system code tends to use gotos for error handling more often. Perhaps in scientific code or where the OS offers exception handling (Windows) then multi-level exits are the dominant use case. They don't really give any details as to the context of their survey.
Edited to add: it turns out these two use patterns are found in the C book of Kernighan and Ritchie, around page 60 (depending on edition). Another thing of note is that both use cases involve only forward gotos. And it turns out that MISRA C 2012 edition (unlike the 2004 edition) now permits gotos, as long as they are only forward ones.
If so, why?
C has no multi-level/labelled break, and not all control flows can be easily modelled with C's iteration and decision primitives. gotos go a long way towards redressing these flaws.
Sometimes it's clearer to use a flag variable of some kind to effect a kind of pseudo-multi-level break, but it's not always superior to the goto (at least a goto allows one to easily determine where control goes to, unlike a flag variable), and sometimes you simply don't want to pay the performance price of flags/other contortions to avoid the goto.
libavcodec is a performance-sensitive piece of code. Direct expression of the control flow is probably a priority, because it'll tend to run better.
I find the do{} while(false) usage utterly revolting. It is conceivable that someone might convince me it is necessary in some odd case, but never that it is clean, sensible code.
If you must do some such loop, why not make the dependence on the flag variable explicit?
for (stepfailed=0 ; ! stepfailed ; /*empty*/)
GOTO can be used, of course, but there is one thing more important than code style, or whether the code is readable or not, that you must keep in mind when you use it: the code it jumps to may not be as robust as you think.
For instance, look at the following two code snippets:
If A <> 0 Then A = 0 EndIf
Write("Value of A:" + A)
An equivalent code with GOTO
If A == 0 Then GOTO FINAL EndIf
A = 0
FINAL:
Write("Value of A:" + A)
The first thing we think is that the result of both bits of code will be that "Value of A: 0" (we suppose an execution without parallelism, of course)
That's not correct: in the first sample, A will always be 0, but in the second sample (with the GOTO statement) A might not be 0. Why?
The reason is because from another point of the program I can insert a GOTO FINAL without controlling the value of A.
This example is very obvious, but as programs get more complicated, the difficulty of seeing those kind of things increases.
Related material can be found in the famous article from Mr. Dijkstra, "A Case against the GO TO Statement".
It comes in handy for character-wise string processing from time to time.
Imagine something like this printf-esque example:
for cur_char, next_char in sliding_window(input_string) {
if cur_char == '%' {
if next_char == '%' {
cur_char_index += 1
goto handle_literal
}
# Some additional logic
if chars_should_be_handled_literally() {
goto handle_literal
}
# Handle the format
}
# some other control characters
else {
handle_literal:
# Complicated logic here
# Maybe it's writing to an array for some OpenGL calls later or something,
# all while modifying a bunch of local variables declared outside the loop
}
}
You could refactor that goto handle_literal to a function call, but if it's modifying several different local variables, you'd have to pass references to each unless your language supports mutable closures. You'd still have to use a continue statement (which is arguably a form of goto) after the call to get the same semantics if your logic makes an else case not work.
I have also used gotos judiciously in lexers, typically for similar cases. You don't need them most of the time, but they're nice to have for those weird cases.
In Perl, you can use a label to "goto" out of a loop, using a "last" statement, which is similar to break.
This allows better control over nested loops.
The traditional goto label is supported too, but I'm not sure there are too many instances where this is the only way to achieve what you want - subroutines and loops should suffice for most cases.
I use goto in the following case:
when I need to return from a function at different places, and some uninitialization needs to be done before returning:
non-goto version:
int doSomething (struct my_complicated_stuff *ctx)
{
db_conn *conn;
RSA *key;
char *temp_data;
conn = db_connect();
if (ctx->smth->needs_alloc) {
temp_data=malloc(ctx->some_size);
if (!temp_data) {
db_disconnect(conn);
return -1;
}
}
...
if (!ctx->smth->needs_to_be_processed) {
free(temp_data);
db_disconnect(conn);
return -2;
}
pthread_mutex_lock(ctx->mutex);
if (ctx->some_other_thing->error) {
pthread_mutex_unlock(ctx->mutex);
free(temp_data);
db_disconnect(conn);
return -3;
}
...
key=rsa_load_key(....);
...
if (ctx->something_else->error) {
rsa_free(key);
pthread_mutex_unlock(ctx->mutex);
free(temp_data);
db_disconnect(conn);
return -4;
}
if (ctx->something_else->additional_check) {
rsa_free(key);
pthread_mutex_unlock(ctx->mutex);
free(temp_data);
db_disconnect(conn);
return -5;
}
pthread_mutex_unlock(ctx->mutex);
free(temp_data);
db_disconnect(conn);
return 0;
}
goto version:
int doSomething_goto (struct my_complicated_stuff *ctx)
{
int ret=0;
db_conn *conn;
RSA *key;
char *temp_data;
conn = db_connect();
if (ctx->smth->needs_alloc) {
temp_data=malloc(ctx->some_size);
if (!temp_data) {
ret=-1;
goto exit_db;
}
}
...
if (!ctx->smth->needs_to_be_processed) {
ret=-2;
goto exit_freetmp;
}
pthread_mutex_lock(ctx->mutex);
if (ctx->some_other_thing->error) {
ret=-3;
goto exit;
}
...
key=rsa_load_key(....);
...
if (ctx->something_else->error) {
ret=-4;
goto exit_freekey;
}
if (ctx->something_else->additional_check) {
ret=-5;
goto exit_freekey;
}
exit_freekey:
rsa_free(key);
exit:
pthread_mutex_unlock(ctx->mutex);
exit_freetmp:
free(temp_data);
exit_db:
db_disconnect(conn);
return ret;
}
The second version makes it easier when you need to change something in the deallocation statements (each is used once in the code), and reduces the chance of skipping any of them when adding a new branch. Moving them into a function will not help here, because the deallocation can be done at different "levels".
Use "goto" wherever it makes your code more readable or run faster. Just don't let it turn your code into spaghetti.
The problem with 'goto', and the most important argument of the 'goto-less programming' movement, is that if you use it too frequently your code, although it might behave correctly, becomes unreadable, unmaintainable, unreviewable, etc. In 99.99% of the cases 'goto' leads to spaghetti code. Personally, I cannot think of any good reason why I would use 'goto'.
Edsger Dijkstra, a computer scientist that had major contributions on the field, was also famous for criticizing the use of GoTo.
There's a short article about his argument on Wikipedia.