Should exception logging subsystem have limited throughput ? If yes, how? - exception

We had a case when exceptions had gone in some kind of infinite loop.
Stack traces were very big and we log all of them.
That flood our Oracle database and when redo logs reached their size limit db stopped.
EDIT: Of course that the most important thing is to find the cause of infinite loop an correct the bug in the system. We already did that and that is not the question here.
The system could have more bugs like that (it's an windows service and it's running constantly) and in that case one app broke the whole DB, meaning all applications on that Oracle DB.
I'm mostly interested in your experiences, architecturally. And that from other logging frameworks like log4net, log4j and others. How do they handle flood of exceptions ? Just handle them like all other exceptions ?

I think your situation illustrates that there should definitely be some mechanism in place to prevent exception logs from causing a denial-of-service anywhere, as this has done.
If you use the Windows event logs, this can be handled for you automatically, as old records can automatically be wiped out when the log is full. You could code a DB-based system to do the same thing, as well.
Of course, you want to do everything you can to eliminate such errors in the first place where ever possible, too!
Another option may be to detect and ignore multiple, consecutive errors of the same time... perhaps simply updating a count property/field instead.

I'd worry more about the root cause of the infinite loop then I would about limiting logging.
I'd check your code for methods that catch an exception, log the stack trace, and re-throw. I'd argue that catching and re-throwing is not exception handling. If a class truly can't handle the exception, it's better to let it simply bubble up until it reaches a single point where someone can deal with it.
Redo logs? How often do you flush those? Surely you don't have one big transaction, do you?

Can you do the logging to a different database with no redo logs? That will protect the production database.
In our applications whe have a central exceptionhandler where all execeptions go through
void OnExceptionOccurs(Exception ex,
string enduserFriendlyContextDescription,
string tecnicalContextDescription,
ILogger loggerBelongingToProcess)
that handler can decide how to log and you have a central location for breakpoint when debugging

Related

Biztalk exception- self healing orchestartion

We have main orchestration that has multiple sub orchestration. All root orchestration is of transaction type:none, hence all the sub are also of same nature. Now any exception is caught in a parent scope of main orchestration and we have some steps like logging. The orchestration is activated with a message from App SQL. So every time an exception occurs, say due to something intermittent, like unable to connect to web service. We later go manually re-trigger.
I'm looking at modifying the orch to be self healing, say from exception catch block it reinitialize the messages based on conditions that tell, the issue was intermittent. Something like app issue-null reference, we would not want to resend message, because, the orch is never going to work.
There is a concept called compensation, but that is for transaction based orch- do n steps if any 1 fails, do m other steps(which would do alternate action or cleanup).
The only idea I have is do a look-up based on keywords in exception and decide to resend messages. But I want some1 to challenge this or suggest a better approach
I have always thought that it's better to handle failures offline. So if the orchestration fails, terminate it. But before you terminate, send a message out. This message will contain all the information necessary to recover the message processing if it turns out that there was a temporary problem which caused the failure. The message can be consumed by a "caretaker" process which is responsible for recovery.
This is similar to how the Erlang OTP framework approaches high availability. Processes fail quickly and caretaker processes make sure recovery happens.

Which exceptions should I handle?

I am designing a WinForms application.
At the moment, all my exceptions are being logged at the UI level.
However, for none of them, do I do anything other than logging. Is this indicative of a bad design?
Furthermore, in one method (.NET's method execute a command on a windows service), it can throw exceptions of type Win32Exception and InvalidOperationException.
With an exception like FileNotFound, I could prompt the user to provide another file (although .NET has methods built-in to check for the file's existence), but with exceptions like the above, they are down to low-level problems with the machine, so these can only be logged really.
Is this the right way to go with deciding which exceptions to catch? Also, should I catch or throw ArgumentNullException? It indicates a problem with the code, right?
(I'll use Eric Lippert's taxonomy of exceptions throughout the answer.)
If there is nothing you can do about it, then just log and bail out of the current operation, screen or the entire application, depending on the seriousness of the error. Just don't try to proceed in the face of fatal exceptions. In some extreme cases (like an AccessViolationException), just logging or even letting your finally blocks run may not be a good idea because you don't know what will happen if you run code in a corrupt process.
FileNotFoundException and other exogenous exceptions you should handle anyways. Even though you can check if a file exists beforehand, nothing prevents it from becoming inaccessible in between the check and its use. These exceptions depend on external conditions that you have no control over, so you should be prepared to handle them.
You should never catch ArgumentNullException or any other boneheaded exceptions. If it hurts when you do that, don't do it. If you pass a null argument when you shouldn't, then don't pass it. Fix the code so that it deals with the null reference beforehand.

what is the gist of exception handling

please verify me if I am right: when a program encounters a exception we should write code to handle that exception, the handler should do some recovery job, like rerun the program, but is not just telling us where we went wrong in real world application.
When you throw an exception you're saying:
"Something happened and I can't handle it here. I'm passing the buck!"
When you catch you say:
"Ok I can do something. Perhaps loop around and try again; maybe log an error message".
You can even choose to ignore the error but that's usually discouraged.
In general the thrower captures the state of the failure and the catcher acts on the failure.
In real life, exceptions don't always make error handling easier; but it helps keeps errors out of the main line code path. It's an alternate execution flow. You often hear this: "Exceptions should be used only for exceptional things."
This is a controversial topic, so I expect some to disagree with what I'm going to say.
Exceptions are for exceptional circumstances, namely, the following two classes of problems:
Program Bug
External Problems
In the former case, your program has gotten into a state it shouldn't be in. So you throw an exception and catch it at a level high enough where the program can gracefully continue. In general, this should be fairly high in your program. The reality is, if there's a bug in the middle of an operation, there's not much you can do to recover (after all, if you knew there was a bug, you'd fix it!). Best is to log it, let the user know and move on, if possible. Terminate the current operation, dialog, whatever, or even the whole program.
In the latter case, you are dealing with capricious and fickle universe. Things might go wrong through no fault of your own. In this case, you should try to detect errors as close to the source as you can and deal with them as best you can. If you sending an email to a flaky server results in an exception, it might be reasonable to try again (warning the user). If the database connection goes down, you could try again, but it might be better to give up and kill the current operation. It depends on how well you understand the external problems that might arise and what can actually be done about them.
If you have known error conditions, such as errors in user input or other data sources (e.g., XML parse error, user picked wrong choice on a form, etc.), it's probably best not to throw an exception, but instead gather and report the errors in a more structured fashion. In one project of mine, I have an error reporter class that can gather errors without interrupting program flow. Then those errors can be reported to the user, or logged.
Often times, I think the best approach is not to catch the error, especially if you don't have a specific response for the error. In general, I think the approach of "catch and try again" is flawed. The cause should be identified and corrected. You shouldn't just keep ramming into a brick wall.
Exceptions should be thrown when, and only when, a method/property/whatever is unable to fulfill its contract. The only time an exception should be caught without either rethrowing it or wrapping it in a new exception and throwing that is when the method that caught the exception can fulfill its contract despite the inner method's failure to fulfill its contract. While it may be hard to determine the optimal contract for a routine, once the contract is in place, deciding whether to throw an exception will be easy: do what the contract says.
It really depends on the error and how you want to handle it. In a lot of my automation systems, if something goes wrong, I want the program to send an email out with a specific error and then terminate. Other times I want to catch the error and run a different process to back out data that I had previously entered.
Sounds like you have the general idea down.

Rethrowing exception question

I read several posts on exception handling/rethrowing exceptions on here (by looking at the highest voted threads), but I am slightly confused:
-Why would you not want the immediate catch block to handle an exception but rather something above it?
-Also, I read quite frequently that you should only handle exceptions which you can "handle". Does that mean actually doing something about it, such as retrying the operation?
You might want to catch an exception (e.g. file not found) and do some processing - e.g. if you open two files and the second file is missing, you will want to close the first file again before you continue, so that it isn't left open.
You might then want to tell the caller that an error occurred, so you re-throw the same exception or throw a new exception, describing the problem.
In some cases, if you get an exception, your code has no way of knowing if it is an error or not (e.g. if you are asked to load an XML file, but you get a File Not Found exception, is that an error, or should you return a blank XMl result?). In these cases you either want to re-throw the exception, or not handle it all all, and let the calling code decide how to deal with the problem.
Your second point is the answer to the first. Sometimes the lower-level functionality does not know enough about the context of the application to know what the right action should be. For example, if opening a file for reading fails because there is no file of that name, then the application might want to ask for a different file, or abort the whole operation, or whatever. At some level, some part of the application will take the responsibility to do the right thing, unless of course just having the program crash is an acceptable action to take.
Answering to your second question - you need to handle the exception in the immediate block only if can do anything about it: for example close connection to db, close streams, retry or retry with different params, log exception (if there will not be an exception generic handler on the higher levels). Probably only immediate block of code knows such details and can handle them. Calling blocks need to know that the error occurred they might know better what to do with exception.
For example immediate block works with a file. A caller might try to open a file from different locations(In the process of "probing") and ignore several errors as long as at least one succeeds. Another part of code might consider the very first failed attempt as an error. Caller block might chose to notify the user that an error is occurred, probably let her/him know some helpful info on how to fix the problem. Also it is nice to provide the means to notify support of the problem – some kind of dialog allowing user to ask for help, describe problem and send a message. In this message you might attach logs, some info about the environment like OS, versions of frameworks, programs, browser capabilities whatever you need to diagnose the problem (if user permits you to do so).
An exception is "handled" if the method which caught it can satisfy its construct. For example, the contract for a routine OpenRecentDocument which is called when the user selects an item from the "recent files" menu might specify that it must either (1) successfully open a document window, or (2) try unsuccessfully to open a document window, roll back any side-effects resulting from the attempt, and notify the user of the what happened. If OpenRecentDocument catches an exception while trying to open the file, but it is able to roll back any side effects from the attempt and notify the user, the routine will have satisfied its contract and should thus return without rethrowing the exception.
One unfortunate "gotcha" in all this is that there isn't any standard means by which routines which throw an exception can indicate whether their attempted operation has resulted in side-effects which could not be rolled back. There is no inherent way, for example, of distinguishing an InvalidOperationException which occurs unexpectedly while updating a shared data structure (which would imply that other open documents may have been corrupted), from an InvalidOperationException which occurs while updating the data associated with the document being loaded, even if one has anticipated the latter possibility and provided for it. The best one can do is either try to catch any InvalidOperationException which might occur in the latter situation near the spot that it occurs, encapsulate that exception in some other exception type, and throw that, or else have data structures maintain an "object corrupted" flag and ensure that if a data structure is found to be corrupt, all future operations on it will fail as cleanly as possible. Neither approach is at all elegant. The more common approach, which could probably be described as "hope for the best", usually works.

How to implement top level exception handling?

I recently had to develop an additional module for an existing service developed by a colleague. He had placed a try/catch block in the main working function for catching all unhadled exceptions that bubbled up to this level, logging them together with stack trace info etc:
try
{
// do main work
}
catch(Exception ex)
{
// log exception info
}
While this makes the program very stable (as in 'unlikely to crash'), I hate it because when I am testing my code, I do not see the exceptions caused by it. Of course I can look at the exception log and see if there are new entries, but I very much prefer the direct feedback of getting the exception the moment it is thrown (with the cursor on the right line in the code, please).
I removed this top level try/catch at least while i was still coding and testing. But now my task is finished and I have to decide wether to put it back in for the release, or not.
I think that I should do it, because it makes the service more stable, and the whole point of it is that it runs in the background without needing any supervision. On the other hand I have read that one should only call specific exceptions (as in IoException), not generically Exception.
What is your advice on this issue?
By the way, the project is written in C#, but I am also interested in answers for non .NET languages.
Put it back.
The exception should be only relevant while testing. Otherwise it doesn't make sense pop it to the user.
Logging is fine.
You may additionally use the DEBUG symbol defined by Visual Studio for debug builds as a flag.
...
} catch( Exception e ) {
#if DEBUG
throw;
#else
log as usual
#endif
}
So next time a modification is needed the debug flag should be set to true and the exception will pop up.
In any Java application, you just about always want to define an exception handler for uncaught exceptions with something like this:
Thread.setDefaultUncaughtExceptionHandler( ... );
where the object that will catch these uncaught exceptions will at least log the failure so you have a chance to know about it. Otherwise, there is no guarantee that you'll even be notified that a Thread took an Exception -- not unless it's your main Thread.
In addition to doing this, most of my threads have a try/catch where I'll catch RunnableException (but not Error) and log it ... some Threads will die when this happens, others will log and ignore, others will display a complaint to the user and let the user decide, depending on the needs of the Application. This try/catch is at the very root of the Thread, in the Runnable.run() method or equivalent. Since the try/catch is at the root of the Thread, there's no need to sometimes disable this catch.
When I write in C#, I code similarly. But this all depends on the need of the application. Is the Exception one that will corrupt data? Well then, don't catch and ignore it. ALWAYS log it, but then Let the application die. Most Exceptions are not of this sort, however.
Ideally you want to handle an exception as close to where it occured as possible but that doesn't mean a global exception handler is a bad idea. Especially for a service which must remain running at all costs. I would continue what you have been doing. Disable it while debugging but leave it in place for production.
Keep in mind it should be used as a safety net. Still try to catch all exceptions before they elevate that far.
The urge to catch all exceptions and make the program 'stable' is very strong and the idea seems very enticing indeed to everyone. The problem as you point out is that this is just a ruse and the program may very well be buggy and worse, w/o indications of failures. No one monitors the logs regularly.
My advice would be to try and convince the other developer to do extensive testing and then deploy it in production w/o the outer catch.
If you want to see the exception when it occurs, in Visual Studio you can go to the DEBUG menu, select EXCEPTIONS and you can tell the debugger to Break as soon as an Exception is thrown. You even get to pick what type of Exceptions. :)