Should I Log or not? - language-agnostic

I am sure lots of you had this debate: What to write or not to the application log file.
I am not talking about the trivial error exception which we surely log inside the catch clauses.
Let's say we have a standart application which is connecting to database doing some selects.
we have a Dao object which each method in it wrapping a select query.
I would like to have your suggesations. Should I log every entrance and exit before I execute any selection? Should I log the result?
what about logging the error stacktrace? I find it very messy and overloading the log file.
could anyone recommend me on a good article in this subject(not necessary about logging database executions but generally)?
Thanks,
ray.

Logging means exacly that: taking notes when something happens. So you need to understand your needs as developer, and the needs of your customers. In both cases, try to figure out what do you need to accomplish your task.
As a developer, you should decide what level of confidence do you have with your software: if it is fully tested and debugged, then you could not log anything at all and just try to trace crashes. If on the other hand you are doing debug, you could need more detail. And in general, you should leave the possibility to turn logging off when confidence increases, and turning it on when thngs start to fail, possibly through a configuration setting. When you need to decide what to log, ask yourself: if it crashed, will this information help me identify the problem or will it be just noise?
For you customers, it depends. On a shared system for example, it's good to know who did what, so it happens to me to log actions that customers do. You should agree that with your customer.

don't log more than necessary.
more detailed explanation here http://www.codinghorror.com/blog/2008/12/the-problem-with-logging.html
cheers

that's why you have various log levels. the purely informational stuff you log at LOG_INFO and the debugging stuff you log at LOG_DEBUG. what actually gets logged is up to the user.

Related

Resumable exceptions: any real-life scenario?

As the topic states, I can hardly imagine, where and when to use resumable exceptions in a real life example and which effective advantage we might get by the usage of them.
What I can imagine is, a subsystem is connected, let's say via RFC to a session, which is held open. The subsystem has to pass some shopflor-data to sap, let us say, in an usual way, in the frequency of any weight/piece/liter, which is processed.
Somehow something fails.
I can get all of this done, without the use of a resumable exceptions, so, besides that this exception seems to keep track of the entire context( what does not seem to be a NEW feature), does anybody have a clue, what this is really all about ?
A non-resumable exception is an error of the kind "Something went wrong here, and I can't continue running the program as desired any more. TILT." The caller just has to deal with that.
A resumable exception still tells the caller that something went wrong, but it defers the decision if the program can be continued to the caller. I expect there to be only a few scenarios where this might be useful. Mass updates might be one scenario: "You wanted me to update both the material price and the text; I've changed the price, but the text in language ZH doesn't exist. I don't know whether you'd like to abort the operation entirely (RETURN) or keep the updated price and disregard the missing text (RESUME)."

Juding whether an exception is exceptional

It's a pretty popular and well known phrase that you should "only catch/throw exceptions which are exceptional". However, how is an "exceptional" exception determined?
For example, a bad password is very routine in logging into a service, so this is not exceptional. Statistics for a web app would probably show something like one bad login attempt for every 5 attempts (from no specific user). Likewise, with attempting to go to a checkout with a basket in an online store, this could be very commmon (especially for new users). However, a file not found could go either way. I usually work along the lines that if a method is missing something to do its work, throw an exception, but then it gets a little confusing here. In some cases, a file not found could be common (e.g. a file share used by many users with no tight controls), compared to a very locked down production environment missing a file, which would be exceptional.
Is this the right way to deduce between whether an exception is exceptional or not? I can easily filter things like no network connection etc as exceptional, but some cases are hard to judge. Is it subjective?
Thanks
I think it's pretty subjective, honestly, so I prefer to avoid that method of figuring out when I should use exceptions.
Instead, I prefer to consider three things:
Is it likely that I might want to let the call stack unwind more than one level?
Is there another way? (Return null or an error code, etc.) If so, do I have even the slightest performance concern?
If neither of those lead to a clear decision, which is easier to read by someone who has to maintain the code?
If #1 is true, and I don't have a MAJOR performance concern, I will probably opt to use exceptions because it will speed up my development time not to have to code return codes (and manually code the logic to have them propagate up the call stack if needed). When you use exceptions, call stack unwinding is free of charge for development time.
If #2 is true, and either I'm not going more than one frame (maybe two?) up the call stack or I have a serious performance concern (in a tight loop, for example), then I'll try really hard to find another way that doesn't involve exceptions.
Exceptions are only a tool for programmers in a language which supports them. I don't believe they have to have any intrinsic value as to what is "exceptional" or not. Instead, I say use them when they are the best tool for the job.

How much information in error messages to regular users?

I'm want to get an idea how I should handle end-user visible error messages in my web application.
How much information do you give in
error messages?
Do you redirect all errors,
regardless of type, to a common error
page, or do you have a small set of pages (404, 403, all others)?
Do you give error codes that the user
could reference/give to you that only
you understand?
Do you give any technical details?
As I stated, my users are non-technical regular Joe folks.
Display a nice error to the user, Log a detailed error for yourself.
I try to do the following:
make sure you never run the risk of passwords or connection strings appearing in error messages.
Make sure the errors get logged to a persistable medium. I prefer a database so that I can query by time range and other paramaters. I don't log 404s.
If the application is an internal app that does not need to be pretty, it may be ok to have the error info on the page. Even if you are logging this stuff, it is nice to be able to have your users email you a screen shot or copy/paste.
If 3 seems distasteful, have some error info written as HTML comments. Then you can at least see the info by viewing source.
In general I try to give users as much information needed to help them solve their problems themselves. For example, in the case of a 404, you might want to let them know to double check that the URL they are looking for is correct.
They obviously wont need stack traces, and the like, but it will make sense for you to log that level of detail somewhere for diagnostics and debugging.
for fatal errors, keep them short, so they can repeat them over the phone or e-mail: can't connect to database, etc.
for non-fatal errors, describe the condition fully: Error, can not save the invoice without an invoice date.
I also always log everything, the parameters to the function and any internal values that may be of use.
I try to show users enough information that they know it's an issue they need to tell someone about, but try to avoid showing them so much it scares them!
If possible the error message should tell them what just failed e.g did their save just fail, or has it saved fine, but the refresh of the screen afterwards had an issue. Extra error information (e.g. stack traces) should be logged somewhere where you can get at it without the user having to send it to you.
When it comes to displaying errors for the end user, I find it a good practise to display a errorcode (so me and administrators know what error it is) and a typical "ops something went wrong, please contact an administrator"
It can be good to give a bit more information for common errors that could be the cause of the users actions. But usually too much information can scare or confuse the user.
None, just show give a reference number so user can give it to you, and you can check the details from the application logs (obviously you need to keep a copy of error logs).
Your web application's error messages should always (at
least) be the answers these 3 questions (in that order):
What happened?
Why did it happen?
What can be done about it?
I have used it for many years, originally from Apple's
"Human Interface Guidelines: The Apple Desktop Interface". Newer version.
Microsoft has similar guidelines.
This also makes it easy to write them - this structured
approach makes it faster to write them as one can just
answer the questions.
The error messages should also be specific. Any information
that the web application know about and that the user may
need to resolve the problem should be in the error message.
The (infamous) error message "An error happend." is simply
not acceptable.
Optional: more technical information that the user may not
understand can be placed at the end. But it should be marked
as such.

To what extent should code try to explain fatal exceptions?

I suspect that all non-trivial software is likely to experience situations where it hits an external problem it cannot work around and thus needs to fail. This might be due to bad configuration, an external server being down, disk full, etc.
In these situations, especially if the software is running in non-interactive mode, I expect that all one can really do is log an error and wait for the admin to read the logs and fix the problem. If someone happens to interact with the software in the meantime, e.g. a request comes in to a server that failed to initialize properly, then perhaps an appropriate hint can be given to check the logs and maybe even the error can be echoed (depending on whether you can tell if they're a technical guy as opposed to a business user). For the moment though let's not think too hard about this part.
My question is, to what extent should the software be responsible for trying to explain the meaning of the fatal error? In general, how much competence/knowledge are you allowed to presume on administrators of the software, and how much should you include troubleshooting information and potential resolution steps when logging fatal errors? Of course if there's something that's unique to the runtime context this should definitely be logged; but lets assume your software needs to talk to Active Directory via LDAP and gets back an error "[LDAP: error code 49 - 80090308: LdapErr: DSID-0C090334, comment: AcceptSecurityContext error, data 525, vece]". Is it reasonable to assume that the maintainers will be able to Google the error code and work out what it means, or should the software try to parse the error code and log that this is caused by an incorrect user DN in the LDAP config?
I don't know if there is a definitive best-practices answer for this, so I'm keen to hear a variety of views.
The approach I tend to agree with is that you should explain as much as possible if the fatal error is caused by some code in your own responsibility (i.e. not third party). Otherwise if the error is caused "further down", for example at the database level, then the administrators should be passed up the error returned without adding much further information. So if the database server dies, then your connector with throw some exception, and you would log the error code in the exception.
The administrator or support staff should then have sufficient knowledge to resolve the issue with the provided information.
When you do provide too much details on errors which are not caused by your own code you run the risk of having error details NOT matching the cause of the actual error, especially if the error codes stop matching between versions.
Of course, there are exceptions. We have worked with open source libraries that were so poorly documented that we ended up writing wrappers around the libraries just to provide decent logging of what actually is going on.
Just my 2c
The answer, as for all broad questions, is "it depends."
If you're looking at a configuration error, then by all means you should try to explain what was wrong (in the logs). If it's an out-of-memory error, there's not much you can do -- and you may not even be able to write a log message.
One thing you said concerns me:
If someone happens to interact with
the software in the meantime, e.g. a
request comes in to a server that
failed to initialize properly, then
perhaps an appropriate hint can be
given to check the logs
If this is truly a fatal error, the server should not be running, and therefore any incoming request should fail with absolutely no warning or explanation.
You should at least provide the message from the exception and a stack trace so you can find out where in the code it occurred. If possible, you should also explain what you were attempting to do and what you think may have happened depending on the exception type.
I guess it depends on how much time you have before delivering the software to your customers.
Yes, it would be nice to parse the error and give a more explicit message but, in this day and age, Google is not always very far.
So unless, you have time to create the code to parse errors, I would leave them as is.
IMHO you can never provide too much information in these case.
In the real world it comes down to cost-benefit analysis, though. What's the impact of the error to you, your app, your business, etc. How much time is it worth spending on it.
In a business critical app my first point applies. Everything else is a sliding scale.
I think it depends on who is using the application.
If the application is used by tech savvy people then show more technical details, so they will be able to troubleshoot the problem if they want. I've had some users go to great lengths to solve issues. It can be very helpful, especially for issues that are specific to certain configurations.
If your user base is more of the average Joe then technical details will confuse them in most cases. You should show them a simple error message, and try to offer some solutions if possible.
You could also merge the two techniques. Show a simple error message by default and allow the user to view more detailed error information if they want.
You just don't want to overwhelm the user with too much information that they don't understand. It just frustrates and confuses them in the majority of cases.
There are two aspects I think all errors and exceptions should have:
1) Enough information in the error to help debug the problem. Stacktrace, class/method name, type of exception etc fall in this category.
2) A human understandable message, ideally clear enough for say Ops team or Sysadmins engineer to know who to call or forward that error message. Typically it is of the form "so and so module failed" or "network call failed" etc. Something that will come as close to you explaining the problem to customer, in non technical jargon.
Now with all the time constraints etc it may not be possible to have both messages programmed in. Then I would go out on a limb and say we should have the second type of error message. Remember, the sysadmin would probably be able to call you and since you helped write the code you can maybe pinpoint the error. But if the customer is on phone asking about the error, the sysadmin better be able to explain the possible cause :)
On a different note, all products need a clear exception/error handling mechanism decided at architecture level. And the exceptions NEED to adhere to that design. There are few things more frustrating than trying to debug an error based on a design only to find out a day later that its a one of a kind error message based on completely different design.
See https://meta.stackexchange.com/questions/3122/formatting-sandbox

How can I support the support department better?

With the best will in the world, whatever software you (and me) write will have some kind of defect in it.
What can I do, as a developer, to make things easier for the support department (first line, through to third line, and development) to diagnose, workaround and fix problems that the user encounters.
Notes
I'm expecting answers which are predominantly technical in nature, but I expect other answers to exist.
"Don't release bugs in your software" is a good answer, but I know that already.
Log as much detail about the environment in which you're executing as possible (probably on startup).
Give exceptions meaningful names and messages. They may only appear in a stack trace, but that's still incredibly helpful.
Allocate some time to writing tools for the support team. They will almost certainly have needs beyond either your users or the developers.
Sit with the support team for half a day to see what kind of thing they're having to do. Watch any repetitive tasks - they may not even consciously notice the repetition any more.
Meet up with the support team regularly - make sure they never resent you.
If you have at least a part of your application running on your server, make sure you monitor logs for errors.
When we first implemented daily script which greps for ERROR/Exception/FATAL and sends results per email, I was surprised how many issues (mostly tiny) we haven't noticed before.
This will help in a way, that you notice some problems yourself before they are reported to support team.
Technical features:
In the error dialogue for a desktop app, include a clickable button that opens up and email, and attaches the stacktrace, and log, including system properties.
On an error screen in a webapp, report a timestamp including nano-seconds and error code, pid, etc so server logs can be searched.
Allow log levels to be dynamically changed at runtime. Having to restart your server to do this is a pain.
Log as much detail about the environment in which you're executing as possible (probably on startup).
Non-technical:
Provide a known issues section in your documentation. If this is a web page, then this correspond to a triaged bug list from your bug tracker.
Depending on your audience, expose some kind of interface to your issue tracking.
Again, depending on audience, provide some forum for the users to help each other.
Usability solves problems before they are a problem. Sensible, non-scary error messages often allow a user to find the solution to their own problem.
Process:
watch your logs. For a server side product, regular reviews of logs will be a good early warning sign for impending trouble. Make sure support knows when you think there is trouble ahead.
allow time to write tools for the support department. These may start off as debugging tools for devs, become a window onto the internal state of the app for support, and even become power tools for future releases.
allow some time for devs to spend with the support team; listening to customers on a support call, go out on site, etc. Make sure that the devs are not allowed to promise anything. Debrief the dev after doing this - there maybe feature ideas there.
where appropriate provide user training. An impedence mismatch can cause the user to perceive problems with the software, rather than the user's mental model of the software.
Make sure your application can be deployed with automatic updates. One of the headaches of a support group is upgrading customers to the latest and greatest so that they can take advantage of bug fixes, new features, etc. If the upgrade process is seamless, stress can be relieved from the support group.
Similar to a combination of jamesh's answers, we do this for web apps
Supply a "report a bug" link so that users can report bugs even when they don't generate error screens.
That link opens up a small dialog which in turn submits via Ajax to a processor on the server.
The processor associates the submission to the script being reported on and its PID, so that we can find the right log files (we organize ours by script/pid), and then sends e-mail to our bug tracking system.
Provide a know issues document
Give training on the application so they know how it should work
Provide simple concise log lines that they will understand or create error codes with a corresponding document that describes the error
Some thoughts:
Do your best to validate user input immediately.
Check for errors or exceptions as early and as often as possible. It's easier to trace and fix a problem just after it occurs, before it generates "ricochet" effects.
Whenever possible, describe how to correct the problem in your error message. The user isn't interested in what went wrong, only how to continue working:
BAD: Floating-point exception in vogon.c, line 42
BETTER: Please enter a dollar amount greater than 0.
If you can't suggest a correction for the problem, tell the user what to do (or not to do) before calling tech support, such as: "Click Help->About to find the version/license number," or "Please leave this error message on the screen."
Talk to your support staff. Ask about common problems and pet peeves. Have them answer this question!
If you have a web site with a support section, provide a hyperlink or URL in the error message.
Indicate whether the error is due to a temporary or permanent condition, so the user will know whether to try again.
Put your cell phone number in every error message, and identify yourself as the developer.
Ok, the last item probably isn't practical, but wouldn't it encourage better coding practices?
Provide a mechanism for capturing what the user was doing when the problem happened, a logging or tracing capability that can help provide you and your colleagues with data (what exception was thrown, stack traces, program state, what the user had been doing, etc.) so that you can recreate the issue.
If you don't already incorporate developer automated testing in your product development, consider doing so.
Have a mindset for improving things. Whenever you fix something, ask:
How can I avoid a similar problem in the future?
Then try to find a way of solving that problem.