How to determine what log level to use? [closed] - language-agnostic

The log levels WARN, ERROR and FATAL are pretty clear. But when is something DEBUG, and when INFO?
I've seen some projects that are annoyingly verbose on the INFO level, but I've also seen code that favors the DEBUG level too much. In both cases, useful information is hidden in the noise.
What are the criteria for determining log levels?

I don't think there are any hard-and-fast rules; using the log4j-type levels, my 'rules of thumb' are something like:
FATAL: the app (or at the very least a thread) is about to die horribly. This is where the info explaining why that's happening goes.
ERROR: something the app is doing that it shouldn't. This isn't a user error ('invalid search query'); it's an assertion failure, a network problem, etc., probably one that is going to abort the current operation
WARN: something that's concerning but not causing the operation to abort; # of connections in the DB pool getting low, an unusual-but-expected timeout in an operation, etc. I often think of 'WARN' as something that's useful in aggregate; e.g. grep, group, and count them to get a picture of what's affecting the system health
INFO: Normal logging that's part of the normal operation of the app; diagnostic stuff so you can go back and say 'how often did this broad-level operation happen?', or 'how did the user's data get into this state?'
DEBUG: Off by default, able to be turned on for debugging specific unexpected problems. This is where you might log detailed information about key method parameters or other information that is useful for finding likely problems in specific 'problematic' areas of the code.
TRACE: "Seriously, WTF is going on here?!?! I need to log every single statement I execute to find this ##$#ing memory corruption bug before I go insane"
Not set in stone, but a rough idea of how I think of it.
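To make those thresholds concrete, here's a minimal sketch using the SLF4J API (the class, messages, and values are all invented for illustration):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LevelDemo {
    private static final Logger log = LoggerFactory.getLogger(LevelDemo.class);

    public static void main(String[] args) {
        int idleConnections = 1; // hypothetical DB pool metric

        log.info("Starting nightly import"); // normal operation, useful in hindsight
        if (idleConnections < 2) {
            // concerning but not aborting anything; useful in aggregate
            log.warn("DB pool running low: {} idle connections", idleConnections);
        }
        // off by default; turned on when chasing a specific problem
        log.debug("Import parameters: batchSize={}, retries={}", 500, 3);
        // every-single-statement detail
        log.trace("Processing raw row: {}", "id=42,name=...");
        try {
            throw new IllegalStateException("import source unreachable");
        } catch (IllegalStateException e) {
            // the operation is aborting, so log at ERROR with the stack trace
            log.error("Import aborted", e);
        }
    }
}
```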

Informally, I use this sort of hierarchy:
DEBUG - actual trace values
INFO - Something just happened - nothing important, just a flag
WARN - everything's working, but something isn't quite what was expected
ERROR - something has happened that will need to be fixed, but we can carry on and do other (independent) activities
FATAL - a serious enough problem that we shouldn't even carry on
I'll generally release with INFO being logged, but only if I know that log files are actually reviewed (and size isn't an issue), otherwise it's WARN.

Think about who needs to use each level.
In my code I keep DEBUG reserved for developer output, i.e. output that would only help a developer.
VERBOSE is used for a normal user when a lot of info is needed.
INFO I use to show major events (e.g. sending a webpage, checking something important).
And FAIL and WARN are pretty self-explanatory.

The convention in my team is to use debug if something is calculated in the message, whereas info is used for plain text. So in effect info will show you what's happening and debug will show the values of the things that are happening.
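A quick sketch of how that convention might look in practice (the class and values are invented; SLF4J is assumed):

```java
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class InvoiceJob {
    private static final Logger log = LoggerFactory.getLogger(InvoiceJob.class);

    void run(List<Double> lineItems) {
        // INFO: plain text describing what's happening
        log.info("Invoice generation started");
        double total = lineItems.stream().mapToDouble(Double::doubleValue).sum();
        // DEBUG: the calculated values behind what's happening
        log.debug("Computed total={} from {} line items", total, lineItems.size());
    }
}
```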

I tend to target INFO towards the user to give them messages that aren't even warnings. DEBUG tends to be for developer use where I output messages to help trace the flow through the code (with values of variables as well).
I also like another level of DEBUG (DEBUG2?) which gives absolute bucketloads of debug information such as hex dumps of all buffers and so on.

There's no need for a DEBUG2 level. That's what 'TRACE' is for. TRACE is intended to be the absolute lowest level of logging, outputting every possible piece of information you might want to see.
To avoid a deluge of information, it is generally not recommended that you enable trace-level logging across an entire project. Instead use 'DEBUG' to find out general information about the bug and where it occurs (hence the name), and then enable TRACE only for that component if you still can't figure it out.
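For example, with a log4j 1.x properties file that might look like the following (the package name is hypothetical): DEBUG everywhere, TRACE only for the component under suspicion.

```properties
# Root logger at DEBUG: general information about where the bug occurs.
log4j.rootLogger=DEBUG, stdout

# TRACE only for the component still under investigation.
log4j.logger.com.example.payments=TRACE

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %-5p %c - %m%n
```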


Difference between bug and failure [closed]

I want to ask about the difference between a bug, a failure, and an error. I've read that an error is a mistake made by people, but I'm confused about the difference between a bug and a failure; I can't pin down the distinction exactly. Can anyone help, and give a simple code snippet that represents the difference?
Thanks a lot.
A bug is a programming error - not checking array bounds, ignoring error codes, multiple deletions, memory leaks, etc. fall under this general category. Errors like this require code changes to fix (there may be work-arounds that do not require code changes, though)
A failure is a system error - disconnection of storage, lack of network connectivity, and hardware failures are in this category. Fixing failures usually requires configuring other parts of the system, not the program itself.
User errors are mistakes made by users - entering values incorrectly or providing incomplete data are in this category. Errors like that are fixed by the user who uses the program without anyone else's involvement.
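Since the question asked for a snippet, here's a small illustrative sketch of all three (the file path and values are made up):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class BugFailureUserError {
    // BUG: a programming error; the loop walks one past the last index.
    static int sum(int[] values) {
        int total = 0;
        for (int i = 0; i <= values.length; i++) { // should be i < values.length
            total += values[i];                    // throws ArrayIndexOutOfBoundsException
        }
        return total;
    }

    // FAILURE: the code is fine, but the environment fails underneath it,
    // e.g. the storage is disconnected, so the read throws IOException.
    static String readConfig() throws IOException {
        return new String(Files.readAllBytes(Paths.get("/mnt/shared/config.txt")));
    }

    // USER ERROR: the user supplies bad input; "abc" throws NumberFormatException,
    // and only the user can fix the value they entered.
    static int parseAge(String input) {
        return Integer.parseInt(input);
    }
}
```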
By my definition I would say
An error is about my behaviour or my actions, so I make errors.
A bug is the result of my error in the program code.
The failure is the malfunction of my buggy software.
but others may interpret this differently.
A fault or Bug is a defect within the system (Somewhere hidden in the code and maybe never detected!).
An error is a deviation from the required operation of the system or subsystem (the fault is detected during execution, but no harm is done).
A failure occurs when the system fails to perform its required function. (System crash)
An Error is a manifestation of a fault in a system, which could lead to system failure.
(Singhal/Shivaratri)
Example:
If you multiply x by 4 instead of 2 in your code, but it doesn't affect any functionality and isn't visible, this is a bug or fault.
If the user can see it, let's say as wrong text in the subject of an email, then this is an error, but the system still worked and no harmful event happened.
But if your system withdraws the wrong amount of money from a user's bank account, or your robot cuts off the lady's head instead of cutting the cake for her, then this is a failure :)
Instead of code snippets, I give examples below. I hope the examples help you understand the terms better.
A bug is a term used among testers to refer to faults in software.
An error is a value, state, or operation that varies from the expected value, state, or operation. For example, the programmer makes a mistake like missing a semicolon or calling the wrong function name.
Result from system != Expected result from system
A fault is an error brought into a system during the design or implementation stage that is capable of causing system failure. Imagine some company X gives a discount to its loyal customers, where a loyal customer is someone who shops 10 times in a month. In the software, the programmer enters 20 instead of 10. This is a mistake introduced by the programmer, called an error; it then becomes a fault. In tester language, it is a bug.
System failure is the inability of a system to do what is required of it. For example, if a user tries to sign up for an account on a social networking site and the site fails to register them, that is a system failure.
Technically,
Error -----> Fault -----> Failure
The root cause of any failure is error.
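The loyal-customer example above, as code (names and values invented): the typo is the error, the wrong constant sitting in the code is the fault, and the wrong behaviour the user observes is the failure.

```java
public class LoyaltyDiscount {
    // ERROR: the programmer typed 20 where the spec said 10.
    // FAULT: the wrong constant now lies latent in the code.
    private static final int LOYALTY_THRESHOLD = 20; // spec says 10

    static boolean isLoyal(int purchasesThisMonth) {
        return purchasesThisMonth >= LOYALTY_THRESHOLD;
    }

    public static void main(String[] args) {
        // FAILURE: the fault is exercised and the system fails to do what is
        // required: a customer with 12 purchases is denied the promised discount.
        System.out.println(isLoyal(12)); // prints false; the spec expects true
    }
}
```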

Should I Log or not?

I am sure lots of you have had this debate: what to write or not to write to the application log file.
I am not talking about the trivial error exceptions, which we surely log inside the catch clauses.
Let's say we have a standard application which connects to a database and does some selects.
We have a DAO object in which each method wraps a select query.
I would like to have your suggestions. Should I log every entry and exit before I execute any select? Should I log the result?
What about logging the error stack trace? I find it very messy, and it overloads the log file.
Could anyone recommend a good article on this subject (not necessarily about logging database executions, but logging in general)?
Thanks,
ray.
Logging means exactly that: taking notes when something happens. So you need to understand your needs as a developer, and the needs of your customers. In both cases, try to figure out what you need to accomplish your task.
As a developer, you should decide what level of confidence you have in your software: if it is fully tested and debugged, then you could log nothing at all and just try to trace crashes. If, on the other hand, you are debugging, you may need more detail. In general, you should leave the possibility to turn logging off when confidence increases and back on when things start to fail, possibly through a configuration setting. When you need to decide what to log, ask yourself: if it crashed, would this information help me identify the problem, or would it just be noise?
For your customers, it depends. On a shared system, for example, it's good to know who did what, so I sometimes log actions that customers perform. You should agree on that with your customer.
Don't log more than necessary.
There's a more detailed explanation here: http://www.codinghorror.com/blog/2008/12/the-problem-with-logging.html
Cheers
That's why you have various log levels. The purely informational stuff you log at LOG_INFO, and the debugging stuff you log at LOG_DEBUG. What actually gets logged is up to the user.
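As a sketch of how that plays out for the DAO case in the question: entry/exit and results go to DEBUG (off in production), and the stack trace is logged once, at ERROR, where the exception is handled. This assumes SLF4J and plain JDBC; the table and method names are invented.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CustomerDao {
    private static final Logger log = LoggerFactory.getLogger(CustomerDao.class);
    private final Connection connection;

    public CustomerDao(Connection connection) {
        this.connection = connection;
    }

    public String findName(long id) {
        log.debug("findName({}) - enter", id); // entry/exit only visible at DEBUG
        try (PreparedStatement ps = connection.prepareStatement(
                "SELECT name FROM customer WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                String name = rs.next() ? rs.getString(1) : null;
                log.debug("findName({}) - exit: {}", id, name); // result at DEBUG too
                return name;
            }
        } catch (SQLException e) {
            // log the stack trace exactly once, where the failure is handled
            log.error("findName({}) failed", id, e);
            throw new IllegalStateException("Query failed for customer " + id, e);
        }
    }
}
```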

Has using an acknowledged anti-pattern ever been proven to actually solve a problem, or be beneficial in any other way? [closed]

Has using an acknowledged anti-pattern ever been proven to actually work in a certain specific case? Did you ever solve a problem or gain any kind of benefit in one of your projects by using an anti-pattern?
My understanding of the "anti-pattern" concept is that it encompasses solutions that have drawbacks that only reveal themselves over the long term. Indeed, the primary danger associated with a lot of them---like writing spaghetti code with loads of global variables and gotos every which way, or tossing exceptions into the black hole of an empty catch block---is that they're seductive because they provide an expedient solution to an immediate problem.
EDIT to add: Because of that, sometimes you do derive benefit from these anti-patterns. Sometimes your calculation that you're writing throwaway code that no one will touch again is dead wrong and you wind up with maintenance programmers slandering your heritage and sexual hygiene, but other times you're right and that crummy shell script that's held together with baling wire and spit does the job you intended it to do and is then blessedly forgotten, saving you the considerable time and effort of putting together something decent.
Anti-patterns are still so widely around just because they solve a particular problem (while creating 10 new ones). Also known as a workaround. But as the saying goes: nothing lasts longer than a makeshift.
In fact I believe we'd all be jobless if things had been done right from the beginning.
The biggest problem that it has solved in my experience is launching a new application.
When the dev team has scoped the new application thoroughly, the timeline to implement the correct solution is usually too much for management to bear. Therefore, oftentimes you code to meet the timeline rather than for "correctness" of the solution, in order to get to the launch date (but have others coding the "correct" solution for the next rev), making it essentially "throw-away" code.
One software anti-pattern is Softcoding, also defined at the daily WTF. Softcoding happens when programmers put material that "should be" inside code into external resources.
I'm working with software that some might say is suffering from softcoding. External files drive the software. Those external files are a micro-language: they must be compiled to XML before the software can use them. This micro-language has its own tools.
But softcoding is always in the mind of the beholder.
Having the material in a micro-language with its own parser has made my life easier. One data source can generate many different outputs: In addition to the version that the main program uses, I am able to extract information into HTML, .csv, and other formats that our customers want. Other programs can generate code in the micro-language, making automation easier.
In our case, softcoding has been a useful pattern, not an anti-pattern.
There is a reason for calling it a pattern rather than a law.
I would surmise that almost everyone has at least one example of a place in code where exactly the wrong thing was done, and it turned out better in the long term than the "right" thing would have.
And a far longer list of examples of anti-patterns causing trouble.
I have used magic pushbuttons a number of times, out of ignorance or laziness, and sometimes it actually worked out just fine, and it turned out that I did not need the extra abstraction of proper MVC.
Duff's Device utilizes the Loop-Switch Sequence (AKA For-Case Paradigm) anti-pattern.
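For readers who haven't met it: Duff's Device itself interleaves a loop and a switch in C, but the general shape of the Loop-Switch Sequence looks roughly like this (a made-up Java sketch): a fixed series of steps disguised as a loop, which would be clearer as three plain statements.

```java
public class LoopSwitchSequence {
    public static void main(String[] args) {
        // Anti-pattern: sequential steps driven through a loop and a switch.
        for (int step = 0; step < 3; step++) {
            switch (step) {
                case 0: System.out.println("open connection");  break;
                case 1: System.out.println("send request");     break;
                case 2: System.out.println("close connection"); break;
            }
        }
        // The straightforward version is just the three statements in order.
    }
}
```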

To what extent should code try to explain fatal exceptions?

I suspect that all non-trivial software is likely to experience situations where it hits an external problem it cannot work around and thus needs to fail. This might be due to bad configuration, an external server being down, disk full, etc.
In these situations, especially if the software is running in non-interactive mode, I expect that all one can really do is log an error and wait for the admin to read the logs and fix the problem. If someone happens to interact with the software in the meantime, e.g. a request comes in to a server that failed to initialize properly, then perhaps an appropriate hint can be given to check the logs and maybe even the error can be echoed (depending on whether you can tell if they're a technical guy as opposed to a business user). For the moment though let's not think too hard about this part.
My question is, to what extent should the software be responsible for trying to explain the meaning of the fatal error? In general, how much competence/knowledge are you allowed to presume on administrators of the software, and how much should you include troubleshooting information and potential resolution steps when logging fatal errors? Of course if there's something that's unique to the runtime context this should definitely be logged; but lets assume your software needs to talk to Active Directory via LDAP and gets back an error "[LDAP: error code 49 - 80090308: LdapErr: DSID-0C090334, comment: AcceptSecurityContext error, data 525, vece]". Is it reasonable to assume that the maintainers will be able to Google the error code and work out what it means, or should the software try to parse the error code and log that this is caused by an incorrect user DN in the LDAP config?
I don't know if there is a definitive best-practices answer for this, so I'm keen to hear a variety of views.
The approach I tend to agree with is that you should explain as much as possible if the fatal error is caused by code in your own responsibility (i.e. not third party). Otherwise, if the error is caused "further down", for example at the database level, then the error returned should be passed up to the administrators without adding much further information. So if the database server dies, your connector will throw some exception, and you would log the error code from the exception.
The administrator or support staff should then have sufficient knowledge to resolve the issue with the provided information.
When you provide too much detail on errors that are not caused by your own code, you run the risk of the error details NOT matching the cause of the actual error, especially if the error codes stop matching between versions.
Of course, there are exceptions. We have worked with open source libraries that were so poorly documented that we ended up writing wrappers around the libraries just to provide decent logging of what actually is going on.
Just my 2c
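A rough sketch of what such a wrapper can look like; the third-party client is imagined here, but the pattern is just "add the context the library's own message lacks before logging and rethrowing":

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Stand-in for a poorly documented third-party library.
interface ThirdPartyGeoClient {
    double[] resolve(String address);
}

public class GeoClientWrapper {
    private static final Logger log = LoggerFactory.getLogger(GeoClientWrapper.class);
    private final ThirdPartyGeoClient client;

    public GeoClientWrapper(ThirdPartyGeoClient client) {
        this.client = client;
    }

    public double[] lookup(String address) {
        log.debug("Geo lookup: {}", address);
        try {
            return client.resolve(address);
        } catch (RuntimeException e) {
            // Attach the context the library omits, then pass the cause along.
            log.error("Geo lookup failed for address '{}'", address, e);
            throw new IllegalStateException("Geocoding failed for: " + address, e);
        }
    }
}
```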
The answer, as for all broad questions, is "it depends."
If you're looking at a configuration error, then by all means you should try to explain what was wrong (in the logs). If it's an out-of-memory error, there's not much you can do -- and you may not even be able to write a log message.
One thing you said concerns me:
"If someone happens to interact with the software in the meantime, e.g. a request comes in to a server that failed to initialize properly, then perhaps an appropriate hint can be given to check the logs"
If this is truly a fatal error, the server should not be running, and therefore any incoming request should fail with absolutely no warning or explanation.
You should at least provide the message from the exception and a stack trace so you can find out where in the code it occurred. If possible, you should also explain what you were attempting to do and what you think may have happened depending on the exception type.
I guess it depends on how much time you have before delivering the software to your customers.
Yes, it would be nice to parse the error and give a more explicit message but, in this day and age, Google is not always very far.
So unless you have time to write the code to parse errors, I would leave them as is.
IMHO you can never provide too much information in these cases.
In the real world it comes down to cost-benefit analysis, though. What's the impact of the error to you, your app, your business, etc. How much time is it worth spending on it.
In a business critical app my first point applies. Everything else is a sliding scale.
I think it depends on who is using the application.
If the application is used by tech savvy people then show more technical details, so they will be able to troubleshoot the problem if they want. I've had some users go to great lengths to solve issues. It can be very helpful, especially for issues that are specific to certain configurations.
If your user base is more of the average Joe then technical details will confuse them in most cases. You should show them a simple error message, and try to offer some solutions if possible.
You could also merge the two techniques. Show a simple error message by default and allow the user to view more detailed error information if they want.
You just don't want to overwhelm the user with too much information that they don't understand. It just frustrates and confuses them in the majority of cases.
There are two aspects I think all errors and exceptions should have:
1) Enough information in the error to help debug the problem. Stack trace, class/method name, type of exception, etc. fall into this category.
2) A human-understandable message, ideally clear enough for, say, an Ops team or a sysadmin to know whom to call or where to forward the error message. Typically it is of the form "so-and-so module failed" or "network call failed". Something that comes as close as possible to you explaining the problem to the customer, in non-technical language (see the sketch after this answer).
Now with all the time constraints etc it may not be possible to have both messages programmed in. Then I would go out on a limb and say we should have the second type of error message. Remember, the sysadmin would probably be able to call you and since you helped write the code you can maybe pinpoint the error. But if the customer is on phone asking about the error, the sysadmin better be able to explain the possible cause :)
On a different note, all products need a clear exception/error handling mechanism decided at the architecture level, and the exceptions NEED to adhere to that design. There are few things more frustrating than trying to debug an error based on a design, only to find out a day later that it's a one-of-a-kind error message based on a completely different design.
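One way to bake both aspects into the design (a sketch, not a standard API): carry the technical detail as the exception message and the operator-facing summary as a separate field.

```java
// Hypothetical base exception for a product-wide error-handling design.
public class ModuleException extends RuntimeException {
    private final String operatorMessage;

    public ModuleException(String technicalDetail, String operatorMessage, Throwable cause) {
        super(technicalDetail, cause);          // for the developer: full detail plus stack trace
        this.operatorMessage = operatorMessage; // for ops: who to call, what likely broke
    }

    public String getOperatorMessage() {
        return operatorMessage;
    }
}
```

Callers would then throw something like new ModuleException("LDAP error 49, data 525", "Authentication module failed: check the service account in the LDAP config", cause), and the log layer can print the operator message up front with the technical detail below it.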

How to Report Bugs the Smart Way [closed]

I want to write (or find) a guide to effective bug reporting in a style similar to ESR's How To Ask Questions The Smart Way
What are your top tips for effective bug reports?
Step-by-step instructions on how to recreate the bug
Make sure you've attempted to isolate the bug to what you are actually writing a bug against, instead of something else that could be the cause.
List attempts to isolate the bug to something other than the software you are writing a bug against
Make yourself available to answer questions and be available to help troubleshoot/recreate the bug
The bottom line is you have to engage some level of critical thinking when the bug is encountered. Once you've exhausted all possibilities that it could be your fault, write up a bug. If you find out it's your fault, but the software you are using/testing could have done something more usable to indicate it's your fault, still write a bug.
Also, to be a truly great bug reporter, you must make yourself available to those testing the bug to help them recreate it. It's likely you've just "got the knack" for recreating that bug, and there may be steps you are not conscious of. You can't just complain and walk away; participate in the process and help the team out by testing, recreating, and troubleshooting.
Report the observable facts and then your interpretation of those facts.
Sometimes the best bug reports include something that is a gut feel of an understanding of the problem. Facts-only bug reporting discounts this valuable human resource.
Procedure used to re-create the bug including what was being done, what area of the application was being used and what event was happening at the time.
Statement of reproducibility (reliable or not) - helps the developer know how hard it will be to reproduce, so they don't give up too quickly
Screenshots or documentation of the error message / stack trace produced
Criticality/priority of the bug (can it be avoided, avoidance steps, is it catastrophic, does it have a business impact, what's the business risk, etc.)
Environment - which environment was the bug found in. Remote, local, etc.
Too often, our QA people think they can just put in a ticket saying "here's my exception" without any supporting documentation. It's near impossible to reproduce, let alone fix, the issue without more information.
Don't assume the reader of your bug report knows the software as well as you do. Even the person who wrote the software may not know what you are talking about if enough time has passed since they wrote it. Write it so that anyone can understand and reproduce the problem.
Recommend this article: How to Report Bugs Effectively
For all the people who won't look at something without steps to reproduce:
At my first programming co-op job, I was assigned a bug that was essentially a random race condition that was making the system unstable. It happened at any point in the system execution, and all we had were a few stack traces pointing to a section of code that was pretty obviously fine. Somewhere another thread was mucking about with data it shouldn't have been, and if this thread was at the right point it would crash. Our QA got crashes about once a month. It took two weeks of combing through the system to find the culprit (yup, unchecked access to shared resources, about a 2-line fix) and fix it. There never were decent steps to reproduce, because there was no general way to reproduce it (save shoving a bunch of yield()'s in the right spot). If you're going to work on a multithreaded system, you'd better be ready to deal with bugs that can't be reproduced reliably and may not have stable steps to reproduce, and not just whine to QA because you couldn't reproduce the bug.
Note that the above is no excuse for QA to not include as much detail as they can when possible, just pointing out that it isn't always possible on modern software.
Write the steps to reproduce the bug. If you can't reproduce it, it won't get fixed.
Always report version number of software under test
Always report versions of any other software (browser, OS, etc.)
Always list all hardware
Steps to reproduce
Symptoms of bug
Screenshots, traces, logs, other attachments (if any)
How critical -- crash, UI, etc.
Report whether reproducible
Anything else tried, that did or did not reproduce the bug