How to nicely inform the user that an unknown error has happened? - usability

There are several guidelines for error reporting, usually based on giving the user useful information when he or she does something wrong, but to give this kind of information you need to be handling the error and know that it can happen. There are also tons of articles about designing 404 error pages. But what can you do when it's a new, unhandled error provoked by a failure in the software?
Are there any guidelines about how to nicely report totally unexpected errors on a web site, such as an unexpected error 500?
What header message should be shown in that case? Would something like "Sorry, an unexpected error has occurred" be enough?
What information should be given?
Should it have mechanisms to help to report the failure to developers? Which ones?

The first thing to keep in mind when an error happens: do not frighten or confuse your users.
Your error pages should integrate into your site in a seamless way: similar layout, same colors. It should be obvious that this page is part of your site and is a consequence of a previous action.
As Matthew Wilson and codymanix said, you should not be too technical. Error messages must be clear and intelligible. If you do not know where the error comes from, just say it.
If the error happened during a transaction of any kind, you should inform the user about the consequences. If an error interrupts an online order, you should be able to tell if the transaction has been carried out or not.
Now that you have told your users something bad happened, it is time to inform the developers about it. There are plenty of solutions around, including:
Custom error log
OS-provided error log (such as Windows' Event Log)
E-mail alerts
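As a rough illustration of the first and third options, here is a minimal sketch using Python's standard `logging` module. The function name `build_error_logger`, the logger name and the mail parameters are all hypothetical; a real site would wire this into whatever framework it uses.

```python
import logging
import logging.handlers


def build_error_logger(log_path, mail_host=None, mail_from=None, mail_to=None):
    """Return a logger that writes errors to a file and, optionally, e-mails them."""
    logger = logging.getLogger("site.errors")  # hypothetical logger name
    logger.setLevel(logging.ERROR)
    logger.handlers.clear()  # avoid duplicate handlers if called twice

    # Custom error log: one line per error, followed by the traceback.
    file_handler = logging.FileHandler(log_path)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(file_handler)

    if mail_host and mail_from and mail_to:
        # E-mail alert: nothing is sent until an error is actually logged.
        logger.addHandler(
            logging.handlers.SMTPHandler(
                mailhost=mail_host,
                fromaddr=mail_from,
                toaddrs=[mail_to],
                subject="Unexpected error on the site",
            )
        )
    return logger
```

Calling `logger.exception("...")` inside an `except` block then records the message together with the stack trace, while the user only sees the friendly error page.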

"Sorry, an unexpected error has ocurred" followed by possible tips on how to recover from it (refresh the page/clear cookies/restart browser etc). A link to your bug repository would be helpful to you, but frustrated users need not follow it.
If you plan to automatically report the failure back to your servers, make sure that users are aware of it and they agree with it.

What header message should be shown in that case? Would something like "Sorry, an unexpected error has occurred" be enough?
The text should read like "An error occurred and it's your fault!"
What information should be given?
What information could you give about an unknown error?
Should it have mechanisms to help to report the failure to developers? Which ones?
No! Never report errors nor do logging! It could lead to the ability to fix the error!
x-D

When running the DAML sandbox an error occurs

The following error occurs when running the sandbox:
io.grpc.netty.NettyServerHandler onStreamError
WARNING: Stream Error
io.netty.handler.codec.http2.Http2Exception$HeaderListSizeException: Header size exceeded max allowed size (8192)
What could the cause of this be?
I have seen this error numerous times, and it is a consequence of having a transaction failure in a complex DAML model/transaction when running on the Sandbox. When you experience a transaction failure (fetch/exercise an inactive contract, lookupByKey returned a stale cid, head [], divide-by-zero, etc) the sandbox helpfully tries to provide transaction trace information in the error result.
This is normally fine for relatively simple models. With more complex models this trace can exceed the maximum header size producing the error you see. When this happens I have found the trace in the sandbox.log file, sometimes along with other errors that help explain what is going on.
The trace is an unformatted dump, so it can take a bit of effort to decode manually, but I have done it many times myself and the information I needed to identify the issue has always been there. To be honest, just knowing the choice I was exercising plus the specific class of error is normally enough to point me in the right direction.
I believe there is some tooling being built to help with this sort of diagnosis; however, I don't know how advanced the work on that is.

What is the general consensus on user-error correction for web apps?

I'm building a RoR site, and today I got the pagination done. Upon showing it to my coworker, his first question was "what happens if you set the query string to ?page=-1?" It died with a runtime exception (error 500). He suggested that this should definitely be fixed before the site goes anywhere near live.
I happen to disagree with him (hear me out). Now, I've been in the web dev business for all of four months, so I very well could be wrong. But I would think that this isn't a big deal. I would think that, so long as said errors do not constitute a security risk, things like this shouldn't be a priority. The only way to cause this error is if you manually edit the query string, and, well, garbage in garbage out. If you're smart enough to know that you even can edit the querystring, you should be smart enough to not give it a negative number.
What is the general consensus on things like this? Do you completely idiot proof the site, so that no matter what the query string is, you never generate an error? Do you let things slide so long as it works the way it's supposed to (and doesn't expose a security risk)? Somewhere in the middle?
EDIT: Somehow my question didn't really come out completely as I intended it. The crux of my question was where to draw the line between proactively correcting for things versus not doing so. If there's invalid input in the GET string, for instance, would it be better practice to display a tasteful error as suggested in the posted replies, or to try to figure out what the user was doing, and do that? Or, as a more concrete example: if a user sets page=-1 in the GET string, would it be better to silently assume they meant page=0, or to display some kind of tasteful error page saying something like "invalid page specified"?
You should be error checking anything that comes in from the query string. If you get an invalid page number, you should have an error message that's a little more graceful than the Error 500 page. Maybe a "Sorry, bad request. Try this: <possible suggestions>". It's just plain sloppy and unprofessional to knowingly and deliberately leave an easily accessible error like that on a live site.
You say you're new to web apps, but if your previous dev experience was other GUI apps being used by the "general public" (non-developers, non-techies), would it have been OK to have stack traces thrown into the user's face as the app falls apart around them? In my experience, this is never really acceptable.
You make some good points, but an incorrect query string can have many reasons. For example, a link to a record that has since been deleted. Or a Google result pointing to a page that doesn't exist in the current result set any more.
In these cases, you should show the user something a bit more verbose than a 500 error.
If you have an error-page that looks nice, and gives a polite message, I'd say it's fine. Though I might consider responding with a 404 instead. Garbage in should preferably not produce an error.
I don't think a 500 error page is very meaningful to your average user. At least tell him something is wrong with your page and guide him back on the right track by providing a link to get back to your site.
Sometimes I redirect users to a page that is likely to be what they wanted. So when a page number goes below zero and this is not permitted, redirect your user to ?page=0 and maybe display a message on top of that page. I think you should prefer this method because it is a better approach in terms of user experience to not use modal windows.
I agree with you that error messages are necessary and useful, but you should try to differentiate, e.g. give a 404 where the user requested a page that doesn't exist.
It varies from project to project. How many users do you expect? If it's below 10K visitors a day it might not be so bad. What percentage of users do you expect will hit the problem? I don't expect very many, but you would know best.
The goal should be to ship the product and roll out improvements regularly. Hopefully the product is sound overall.
Regarding a solution, if it's a page not found, a 4xx error should be thrown instead of a 5xx. 5xx errors typically warrant a deeper look, and while it's hard to write an air-tight application directly on launch, you should try to have a generic handler for 4xx and 5xx errors.
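The two options weighed in this thread (answering with a proper 4xx, or quietly falling back to a valid page) could be sketched roughly as follows. This is framework-agnostic Python rather than Rails, and `parse_page`, `clamp_page` and `last_page` are hypothetical names; page 0 is taken as the first page, as in the question.

```python
def parse_page(raw, last_page):
    """Return (status, page) for a raw ?page= value.

    Non-numeric input is a client error (400); a page outside the
    valid range is treated as not found (404). Neither becomes a 500.
    """
    try:
        page = int(raw)
    except (TypeError, ValueError):
        return 400, None
    if page < 0 or page > last_page:
        return 404, None
    return 200, page


def clamp_page(raw, last_page):
    """Alternative: silently fall back to the nearest valid page."""
    try:
        page = int(raw)
    except (TypeError, ValueError):
        return 0
    return max(0, min(page, last_page))
```

With ten pages (`last_page = 9`), `parse_page("-1", 9)` gives `(404, None)` while `clamp_page("-1", 9)` gives `0`. Which behaviour is preferable is exactly the judgment call discussed above.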
In the PCI game (Credit Card Verification / Validation) the rule is validate everything and allow for no idiots. So the answer depends on your application.

How much information in error messages to regular users?

I want to get an idea of how I should handle end-user visible error messages in my web application.
How much information do you give in error messages?
Do you redirect all errors, regardless of type, to a common error page, or do you have a small set of pages (404, 403, all others)?
Do you give error codes that the user could reference/give to you that only you understand?
Do you give any technical details?
As I stated, my users are non-technical regular Joe folks.
Display a nice error to the user; log a detailed error for yourself.
I try to do the following:
1) Make sure you never run the risk of passwords or connection strings appearing in error messages.
2) Make sure the errors get logged to a persistent medium. I prefer a database so that I can query by time range and other parameters. I don't log 404s.
3) If the application is an internal app that does not need to be pretty, it may be OK to have the error info on the page. Even if you are logging this stuff, it is nice to be able to have your users email you a screenshot or copy/paste.
4) If 3 seems distasteful, have some error info written as HTML comments. Then you can at least see the info by viewing source.
In general I try to give users as much information needed to help them solve their problems themselves. For example, in the case of a 404, you might want to let them know to double check that the URL they are looking for is correct.
They obviously won't need stack traces and the like, but it will make sense for you to log that level of detail somewhere for diagnostics and debugging.
For fatal errors, keep them short, so users can repeat them over the phone or by e-mail: "can't connect to database", etc.
For non-fatal errors, describe the condition fully: "Error, cannot save the invoice without an invoice date."
I also always log everything, the parameters to the function and any internal values that may be of use.
I try to show users enough information that they know it's an issue they need to tell someone about, but try to avoid showing them so much it scares them!
If possible the error message should tell them what just failed e.g did their save just fail, or has it saved fine, but the refresh of the screen afterwards had an issue. Extra error information (e.g. stack traces) should be logged somewhere where you can get at it without the user having to send it to you.
When it comes to displaying errors for the end user, I find it a good practise to display an error code (so the administrators and I know what error it is) and a typical "Oops, something went wrong, please contact an administrator".
It can be good to give a bit more information for common errors that could be caused by the user's actions. But usually too much information can scare or confuse the user.
None; just give a reference number so the user can give it to you, and you can check the details in the application logs (obviously you need to keep a copy of the error logs).
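The reference-number approach could look something like the sketch below. It assumes Python's standard `logging` and `uuid` modules; `report_unexpected_error`, `user_message` and the logger name are made-up names for illustration.

```python
import logging
import uuid

logger = logging.getLogger("app.errors")  # hypothetical logger name


def report_unexpected_error(exc: BaseException) -> str:
    """Log full details for developers and return a short reference for the user."""
    reference = uuid.uuid4().hex[:8]
    # exc_info attaches the traceback to the log record; the user never sees it.
    logger.error("Unhandled error, reference %s", reference, exc_info=exc)
    return reference


def user_message(reference: str) -> str:
    """Build the only text the end user is shown."""
    return (
        "Sorry, an unexpected error has occurred. "
        f"If you contact support, please quote reference {reference}."
    )
```

The same reference appears in the log entry and on the error page, so support staff can look up the stack trace without asking the user to send anything technical.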
Your web application's error messages should always (at least) answer these 3 questions (in that order):
What happened?
Why did it happen?
What can be done about it?
I have used it for many years; it is originally from Apple's "Human Interface Guidelines: The Apple Desktop Interface". A newer version is also available.
Microsoft has similar guidelines.
This structured approach also makes the messages faster to write, as one can just answer the questions.
The error messages should also be specific. Any information that the web application knows about and that the user may need to resolve the problem should be in the error message.
The (infamous) error message "An error happened." is simply not acceptable.
Optional: more technical information that the user may not understand can be placed at the end. But it should be marked as such.
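One way to make the three questions hard to skip is to give the message a fixed shape. The following is only a sketch in Python; the `ErrorMessage` class and its field names are invented for illustration and are not taken from the Apple or Microsoft guidelines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorMessage:
    what_happened: str
    why: str
    what_to_do: str
    technical_details: Optional[str] = None  # optional, shown last and labelled

    def render(self) -> str:
        """Answer the three questions in order, then any marked technical part."""
        lines = [self.what_happened, self.why, self.what_to_do]
        if self.technical_details:
            lines.append(f"Technical details: {self.technical_details}")
        return "\n".join(lines)


# Hypothetical example message.
message = ErrorMessage(
    what_happened="The invoice could not be saved.",
    why="It has no invoice date.",
    what_to_do="Enter an invoice date and save again.",
)
```

Because all three fields are required, a message that only says that something failed cannot be constructed by accident.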

Displaying friendly error messages

I'm curious if anyone has given some thought to the wording in desktop application error messages. As a developer I always put on my programmer hat and display it in a dialect that looks like a robot is speaking to the user.
For example:
Failed to open file ___
Unable to retrieve settings file
Error occurred updating the database
Cannot set ____
Unknown error occured
None of these say "friendly application". Does anyone know any resources or ways of phrasing errors in less robotic language, for common errors like IO problems, database issues, null references and so on?
There's quite an extensive article on error messages in the Windows Desktop Design guide.
Apple have something to say about writing good alert messages in their human interface guidelines.
be brief and specific
tell us what we can do about the problem
don't be oblique, tell us it's gone wrong if it has
avoid making the error feel like it's our fault
The messages you've posted as examples are meant more for developers than end users.
One thing I find annoying about any kind of error thrown at me as a user is when I don't have a clue why it happened. That's why such error messages should contain some information about the issue that non-programmers can understand. For example, if opening a file failed, one could check whether the file exists, whether the permissions are OK, or whether the path given is on a network.
There's also a great blog post by Jeff Atwood about funny error messages.
Not a lot of info here, but some links that you should know:
http://blogs.msdn.com/brada/archive/2004/01/28/64255.aspx
http://msdn.microsoft.com/en-us/library/ms229056.aspx
In general, it is good to word things so that it is clear
what is the cause of the problem
what can be done to fix/remedy it
My opinion: I think the most common 'bad' thing in error messages is to forget the second bullet. The second most common mistake is to provide insufficient info for the first bullet (e.g. 'file not found' - which file?!?)

To what extent should code try to explain fatal exceptions?

I suspect that all non-trivial software is likely to experience situations where it hits an external problem it cannot work around and thus needs to fail. This might be due to bad configuration, an external server being down, disk full, etc.
In these situations, especially if the software is running in non-interactive mode, I expect that all one can really do is log an error and wait for the admin to read the logs and fix the problem. If someone happens to interact with the software in the meantime, e.g. a request comes in to a server that failed to initialize properly, then perhaps an appropriate hint can be given to check the logs and maybe even the error can be echoed (depending on whether you can tell if they're a technical guy as opposed to a business user). For the moment though let's not think too hard about this part.
My question is, to what extent should the software be responsible for trying to explain the meaning of the fatal error? In general, how much competence/knowledge are you allowed to presume of administrators of the software, and how much troubleshooting information and how many potential resolution steps should you include when logging fatal errors? Of course if there's something that's unique to the runtime context this should definitely be logged; but let's assume your software needs to talk to Active Directory via LDAP and gets back an error "[LDAP: error code 49 - 80090308: LdapErr: DSID-0C090334, comment: AcceptSecurityContext error, data 525, vece]". Is it reasonable to assume that the maintainers will be able to Google the error code and work out what it means, or should the software try to parse the error code and log that this is caused by an incorrect user DN in the LDAP config?
I don't know if there is a definitive best-practices answer for this, so I'm keen to hear a variety of views.
The approach I tend to agree with is that you should explain as much as possible if the fatal error is caused by some code in your own responsibility (i.e. not third party). Otherwise, if the error is caused "further down", for example at the database level, then the error returned should be passed up to the administrators without adding much further information. So if the database server dies, then your connector will throw some exception, and you would log the error code in the exception.
The administrator or support staff should then have sufficient knowledge to resolve the issue with the provided information.
When you provide too much detail on errors which are not caused by your own code, you run the risk of the error details NOT matching the cause of the actual error, especially if the error codes stop matching between versions.
Of course, there are exceptions. We have worked with open source libraries that were so poorly documented that we ended up writing wrappers around the libraries just to provide decent logging of what actually is going on.
Just my 2c
The answer, as for all broad questions, is "it depends."
If you're looking at a configuration error, then by all means you should try to explain what was wrong (in the logs). If it's an out-of-memory error, there's not much you can do -- and you may not even be able to write a log message.
One thing you said concerns me:
"If someone happens to interact with the software in the meantime, e.g. a request comes in to a server that failed to initialize properly, then perhaps an appropriate hint can be given to check the logs"
If this is truly a fatal error, the server should not be running, and therefore any incoming request should fail with absolutely no warning or explanation.
You should at least provide the message from the exception and a stack trace so you can find out where in the code it occurred. If possible, you should also explain what you were attempting to do and what you think may have happened depending on the exception type.
I guess it depends on how much time you have before delivering the software to your customers.
Yes, it would be nice to parse the error and give a more explicit message but, in this day and age, Google is never very far away.
So unless you have time to create the code to parse errors, I would leave them as is.
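For a sense of how much code such parsing involves, here is a small sketch for the LDAP example from the question. It always keeps the original message and only appends a hint when it recognises the sub-code. The 525 hint mirrors the question ("incorrect user DN"); the 52e entry is an extra illustration, and `AD_BIND_HINTS` and `explain_ldap_error` are hypothetical names, not part of any LDAP library.

```python
import re

# Sub-codes Active Directory puts after "data" in an LDAP error 49 message.
# Only 525 comes from the discussion above; 52e is an added illustration.
AD_BIND_HINTS = {
    "525": "user not found: check the user DN in the LDAP configuration",
    "52e": "invalid credentials: check the bind password",
}

_DATA_CODE = re.compile(r"error code 49\b.*?\bdata ([0-9a-fA-F]+)")


def explain_ldap_error(message: str) -> str:
    """Return the raw message, plus a hint when the sub-code is recognised."""
    match = _DATA_CODE.search(message)
    hint = AD_BIND_HINTS.get(match.group(1).lower()) if match else None
    if hint is None:
        return message  # unknown code: pass the original through unchanged
    return f"{message} (likely cause: {hint})"
```

Keeping the raw text intact addresses the concern raised earlier that a translated explanation may stop matching the real cause when error codes change between versions.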
IMHO you can never provide too much information in these cases.
In the real world it comes down to cost-benefit analysis, though. What's the impact of the error on you, your app, your business, etc.? How much time is it worth spending on it?
In a business critical app my first point applies. Everything else is a sliding scale.
I think it depends on who is using the application.
If the application is used by tech savvy people then show more technical details, so they will be able to troubleshoot the problem if they want. I've had some users go to great lengths to solve issues. It can be very helpful, especially for issues that are specific to certain configurations.
If your user base is more of the average Joe then technical details will confuse them in most cases. You should show them a simple error message, and try to offer some solutions if possible.
You could also merge the two techniques. Show a simple error message by default and allow the user to view more detailed error information if they want.
You just don't want to overwhelm the user with too much information that they don't understand. It just frustrates and confuses them in the majority of cases.
There are two aspects I think all errors and exceptions should have:
1) Enough information in the error to help debug the problem. Stacktrace, class/method name, type of exception etc fall in this category.
2) A human-understandable message, ideally clear enough for, say, an Ops team or a sysadmin to know who to call or where to forward the error message. Typically it is of the form "so and so module failed" or "network call failed" etc. Something that comes as close as possible to explaining the problem to a customer in non-technical language.
Now, with all the time constraints etc., it may not be possible to have both messages programmed in. Then I would go out on a limb and say we should have the second type of error message. Remember, the sysadmin would probably be able to call you, and since you helped write the code you can maybe pinpoint the error. But if the customer is on the phone asking about the error, the sysadmin had better be able to explain the possible cause :)
On a different note, all products need a clear exception/error handling mechanism decided at architecture level. And the exceptions NEED to adhere to that design. There are few things more frustrating than trying to debug an error based on a design only to find out a day later that it's a one-of-a-kind error message based on a completely different design.
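As one possible shape for such a mechanism, the sketch below keeps the two kinds of information from the list above on the same exception: a plain-language summary for operators and the usual diagnostics for developers. `AppError` and `describe` are hypothetical names chosen for this example.

```python
import traceback


class AppError(Exception):
    """Hypothetical base error carrying a plain-language summary for operators."""

    def __init__(self, summary: str, detail: str = ""):
        super().__init__(detail or summary)
        self.summary = summary  # e.g. "network call failed"


def describe(exc: Exception) -> dict:
    """Split an exception into an operator summary and developer diagnostics."""
    return {
        # Errors that do not follow the design still get a generic summary.
        "summary": getattr(exc, "summary", "unexpected internal error"),
        "type": type(exc).__name__,
        "trace": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    }
```

If every module raises subclasses of one base class like this, the log format stays uniform, which is the point of deciding the design up front.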