What are logging libraries for? - language-agnostic

This may be a stupid question, as most of my programming consists of one-man scientific computing research prototypes and developing relatively low-level libraries. I've never programmed in the large in an enterprise environment before. I've always wondered: what are the main things that logging libraries make substantially easier than good old-fashioned print statements or file output, simple programming logic, and a few global variables to determine how verbosely things get logged? How do you know when a few print statements or some basic file output ain't gonna cut it and you need a real logging library?

Logging helps you debug problems, especially once you move to production and problems occur on machines you can't control. Best-laid plans never survive contact with the enemy, and logging helps you track how that battle went when your code meets real-world data.
Off-the-shelf logging libraries are easy to plug in and start using in less than five minutes.
Log libraries allow various levels of logging per statement (FATAL, ERROR, WARN, INFO, DEBUG, etc.).
And you can turn logging up or down to get more or less information at runtime.
In highly threaded systems, they help sort out which thread was doing what. Log libraries can record thread and timestamp information that ordinary print statements can't.
Most allow you to turn on only portions of the logging to get more detail, so one subsystem can log debug information while another logs only fatal errors.
Logging libraries allow you to configure logging through an external file, so it's easy to turn it on or off in production without having to recompile, deploy, etc.
Third-party libraries usually log through the same frameworks, so you can control their output just like the rest of your system.
Most libraries allow you to send some or all of your log statements to one or many destinations based on criteria, so you can log to both the console and a log file.
Log libraries let you rotate logs, keeping several log files based on different criteria. Say, after a log reaches 20 MB, roll over to a new file and keep 10 files around so the log data never takes more than about 200 MB.
Some log statements can be compiled in or out (language dependent).
Log libraries can be extended to add new features.
You'll want to start using a logging library when you start wanting some of these features. If you find yourself changing your program to get them, it's time to look at a good log library. They are easy to learn, set up, and use, and they are ubiquitous.
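For a concrete feel, here is a minimal sketch using java.util.logging (just one option among many; the class, method, and message names are invented for illustration):

import java.util.logging.Level;
import java.util.logging.Logger;

public class PaymentService {
    // One logger per class/subsystem; its level can be changed via an external config file.
    private static final Logger log = Logger.getLogger(PaymentService.class.getName());

    public void charge(String customerId, double amount) {
        log.fine("Charging customer " + customerId + " amount " + amount); // detail, usually off in production
        try {
            // ... actual work ...
        } catch (RuntimeException e) {
            log.log(Level.SEVERE, "Charge failed for customer " + customerId, e); // always logged, with stack trace
        }
    }
}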

They are used in environments where the requirements for logging may change, but the cost of changing or deploying a new executable is high. (Even when you have the source code, adding a one-line logging change to a program can be infeasible because of internal bureaucracy.)
The logging libraries provide a framework that the program uses to emit a wide variety of messages. These can be described by source (e.g. the logger object they are first sent to, often corresponding to the class the event occurred in), severity, etc.
At runtime, the actual delivery of the messages is controlled by an "easily" edited config file. In normal situations most messages may be suppressed altogether, but if the situation changes it is a simple fix to enable more messages, without needing to deploy a new program.
The above describes the ideal logging framework as I understand the intention; in practice I have used them in Java and Python and in neither case have I found them worth the added complexity. :-(

They're for logging things.
Or more seriously, for saving you having to write it yourself, giving you flexible options on where logs are stored (database, event log, text file, CSV, sent to a remote web service, delivered by pixies on a velvet cushion) and on what is logged at runtime, rather than having to redefine a global variable and then recompile.
If you're only writing for yourself then it's unlikely you need one, and it may introduce an external dependency you don't want; but once your libraries start to be used by others, having a logging framework in place may well help your users, and you, track down problems.

I know that a logging library is useful when I have more than one subsystem with "verbose logging," but where I only want to see that verbose data from one of them.
Certainly this can be achieved by having a global log level per subsystem, but for me it's easier to use a "system" of some sort for that.
I generally have a 2D logging environment too: "Info/Warning/Error" (etc.) on one axis and "AI/UI/Simulation/Networking" (etc.) on the other. With this I can easily specify the logging level I care about seeing for each subsystem. It's not actually that complicated once it's in place, and it's a lot cleaner than if my_logging_level == DEBUG then print("An error occurred"). Plus, the logging system can stuff file/line info into the messages, and, getting totally fancy, you can redirect them to multiple targets pretty easily (file, TTY, debugger, network socket...).
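A rough sketch of that two-axis idea using plain java.util.logging loggers (the subsystem names are invented; a real project would pick its own, and a handler still has to allow the finer levels through):

import java.util.logging.Level;
import java.util.logging.Logger;

public class Subsystems {
    public static void main(String[] args) {
        Logger ai  = Logger.getLogger("game.ai");
        Logger net = Logger.getLogger("game.networking");

        ai.setLevel(Level.FINE);     // verbose: show debug detail from the AI subsystem
        net.setLevel(Level.WARNING); // quiet: only warnings and errors from networking

        ai.fine("path recalculated");      // printed only if a handler also allows FINE
        net.fine("packet resent");         // suppressed by the WARNING level
        net.warning("connection dropped"); // printed
    }
}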

Related

Mercurial command-line "API" reference?

I'm working on a Mercurial GUI client that interacts with hg.exe through the command line (the preferred high-level API, as I understand it).
However, I am having trouble determining the possible outputs of each command. I can see several outputs by simulating situations, but I was wondering if there is a complete reference of the possible outputs for each command.
For instance, for the command hg fetch, some possible outputs are:
pulling from https://User@server.com/Repo
searching for changes
no changes found
if there are no changes, or:
abort: outstanding uncommitted changes
or one of several other messages, depending on the situation.
I would like to structure my program to handle as many of these cases as possible, but it's hard for me to know in advance what they all are.
Is there a documented reference for the command-line? I have not been able to find one with The Google.
Look through the translation strings file. Then you'll know you have every message handled and be able to see what parts of it vary.
Also, fetch is just a convenience wrapper around pull/update/merge. If you're invoking Mercurial programmatically you probably want to keep those three very different concepts separate when you run them, so you know which part failed. In your example above it's the 'update' failing, so the 'pull' would have succeeded, and knowing that the 'update' failed would let you give the user a better message.
(fetch is an abomination, which is part of why it's disabled by default)
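For instance, a hedged sketch of running pull and update as separate steps from Java (the repository path, messages, and error handling here are simplified assumptions, not hg's documented behaviour):

import java.io.IOException;

public class HgRunner {
    // Run a single hg command in the given repository and return its exit code.
    static int runHg(String repoPath, String... args) throws IOException, InterruptedException {
        String[] cmd = new String[args.length + 3];
        cmd[0] = "hg";
        cmd[1] = "-R";
        cmd[2] = repoPath;
        System.arraycopy(args, 0, cmd, 3, args.length);
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        String repo = ".";
        if (runHg(repo, "pull") != 0) {
            System.err.println("pull failed - could not reach the remote repository");
        } else if (runHg(repo, "update") != 0) {
            System.err.println("update failed - probably outstanding uncommitted changes");
        }
    }
}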
Is this what you were looking for: https://www.mercurial-scm.org/wiki/MercurialBook ?
Mercurial 1.9 brings a command server: a stable (in the sense that the API doesn't change much) and low-overhead interface (there is no need to start an hg process for every command). Communication is done via a pipe.

What is instrumentation?

I've heard this term used a lot in the same context as logging, but I can't seem to find a clear definition of what it actually is.
Is it simply a more general class of logging/monitoring tools and activities?
Please provide sample code/scenarios when/how instrumentation should be used.
I write tools that perform instrumentation. So here is what I think it is.
DLL rewriting. This is what tools like Purify and Quantify do. A previous reply to this question said that they instrument post-compile/link. That is not correct. Purify and Quantify instrument the DLL the first time it is executed after a compile/link cycle, then cache the result so that it can be used more quickly next time around. For large applications, profiling the DLLs can be very time consuming. It is also problematic - at a company I worked at between 1998 and 2000, we had a large 2 million line app that would take 4 hours to instrument, and 2 of the DLLs would randomly crash during instrumentation; if either failed you would have to delete both of them, then start over.
In place instrumentation. This is similar to DLL rewriting, except that the DLL is not modified and the image on the disk remains untouched. The DLL functions are hooked appropriately to the task required when the DLL is first loaded (either during startup or after a call to LoadLibrary/LoadLibraryEx). You can see techniques similar to this in the Microsoft Detours library.
On-the-fly instrumentation. Similar to in-place but only actually instruments a method the first time the method is executed. This is more complex than in-place and delays the instrumentation penalty until the first time the method is encountered. Depending on what you are doing, that could be a good thing or a bad thing.
Intermediate language instrumentation. This is what is often done with Java and .Net languages (C#, VB.Net, F#, etc). The language is compiled to an intermediate language which is then executed by a virtual machine. The virtual machine provides an interface (JVMTI for Java, ICorProfiler(2) for .Net) which allows you to monitor what the virtual machine is doing. Some of these options allow you to modify the intermediate language just before it gets compiled to executable instructions.
Intermediate language instrumentation via reflection. Java and .Net both provide reflection APIs that allow the discovery of metadata about methods. Using this data you can create new methods on the fly and instrument existing methods just as with the previously mentioned Intermediate language instrumentation.
Compile time instrumentation. This technique inserts the appropriate instructions into the application during compilation. Not often used; Visual Studio's profiler offers it as an option. Requires a full rebuild and link.
Source code instrumentation. This technique is used to modify source code to insert appropriate code (usually conditionally compiled so you can turn it off).
Link time instrumentation. This technique is only really useful for replacing the default memory allocators with tracing allocators. An early example of this was the Sentinel memory leak detector on Solaris/HP in the early 1990s.
The various in-place and on-the-fly instrumentation methods are fraught with danger, as it is very hard to stop all threads safely and modify the code without risking a call into an API that needs a lock held by a thread you've just paused - do that and you'll get a deadlock. You also have to check whether any other threads are executing that method, because if they are you can't modify it.
The virtual machine based instrumentation methods are much easier to use as the virtual machine guarantees that you can safely modify the code at that point.
(EDIT - this item was added later) IAT hooking instrumentation. This involves modifying the import address table for functions linked against in other DLLs/shared libraries. This type of instrumentation is probably the simplest to get working: you do not need to know how to disassemble and modify existing binaries, or do the same with virtual machine opcodes. You just patch the import table with your own function's address and call the real function from your hook. Used in many commercial and open source tools.
I think I've covered them all, hope that helps.
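As one small, hedged illustration of the reflection-based flavour, the sketch below uses a standard java.lang.reflect dynamic proxy to time every call made through an interface. This is interface-level interception rather than bytecode rewriting, and the Greeter interface is invented for the example, but it shows the general shape: intercept the call, record something, then delegate to the real code.

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class TimingProxy {
    public interface Greeter {            // a toy interface standing in for real application code
        String greet(String name);
    }

    // Wrap any interface implementation so that every call is timed and reported.
    @SuppressWarnings("unchecked")
    static <T> T instrument(Class<T> iface, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } finally {
                long micros = (System.nanoTime() - start) / 1_000;
                System.out.println(method.getName() + " took " + micros + " us");
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] { iface }, handler);
    }

    public static void main(String[] args) {
        Greeter plain = name -> "Hello, " + name;
        Greeter timed = instrument(Greeter.class, plain);
        System.out.println(timed.greet("world")); // prints the timing line, then "Hello, world"
    }
}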
Instrumentation is usually used in dynamic code analysis.
It differs from logging in that instrumentation is usually inserted automatically by tools, while logging requires human intelligence to insert the logging code.
It's a general term for doing something to your code necessary for some further analysis.
Especially for languages like C or C++, there are tools like Purify or Quantify that profile memory usage, performance statistics, and the like. To make those profiling programs work correctly, an "instrumenting" step is necessary to insert the counters, array-boundary checks, etc. that are used by the profiling programs. Note that in the Purify/Quantify scenario, the instrumentation is done automatically as a post-compilation step (actually, it's an added step in the linking process) and you don't touch your source code.
Some of that is less necessary with dynamic or VM code (e.g. profiling tools like OptimizeIt are available for Java and do a lot of what Quantify does, but no special linking is required), but that doesn't negate the concept.
An excerpt from the Wikipedia article:
In the context of computer programming, instrumentation refers to the ability to monitor or measure the level of a product's performance, to diagnose errors, and to write trace information. Programmers implement instrumentation in the form of code instructions that monitor specific components in a system (for example, instructions may output logging information to appear on screen). When an application contains instrumentation code, it can be managed using a management tool. Instrumentation is necessary to review the performance of the application. Instrumentation approaches can be of two types: source instrumentation and binary instrumentation.
Whatever Wikipedia says, there is no standard, widely agreed definition of code instrumentation in the IT industry.
Consider that instrumentation is a noun derived from instrument, which has a very broad meaning.
"Code" is also everything in IT - I mean data, services, everything.
Hence, code instrumentation covers a set of applications so wide ... it's not worth giving it a separate name ;-).
That's probably why this Wikipedia article is only a stub.

Is it bad practice to use the system() function when library functions could be used instead? Why?

Say there is some functionality needed for an application under development which could be achieved by making a system call to either a command line program or utilizing a library. Assuming efficiency is not an issue, is it bad practice to simply make a system call to a program instead of utilizing a library? What are the disadvantages of doing this?
To make things more concrete, an example of this scenario would be an application which needs to download a file from a web server, either the cURL program or the libcURL library could be used for this.
Unless you are writing code for only one OS, there is no way of knowing if your system call will even work. What happens when there is a system update or an OS upgrade?
Never use a system call if there is a library to do the same function.
I prefer libraries because of the dependency issue, namely the executable might not be there when you call it, but the library will be (assuming external library references get taken care of when the process starts on your platform). In other words, using libraries would seem to guarantee a more stable, predictable outcome in more environments than system calls would.
There are several factors to take into account. One key one is the reliability of whether the external program will be present on all systems where your software is installed. If there is a possibility that it will be missing, then maybe it is better to do it inside your program.
Weighing against that, you might consider that the extra code loaded into your program is prohibitive - you don't need the code bloat for such a seldom-used part of your application.
The system() function is convenient but dangerous, not least because it usually invokes a shell. You may be better off calling the program more directly - on Unix, via the fork() and exec() system calls. [Note that a system call is very different from calling the system() function, incidentally!] OTOH, you may need to worry about ensuring that all open file descriptors in your program are closed - especially if your program is some sort of daemon running on behalf of other users; that is less of a problem if you are not using special privileges, but it is still a good idea not to give the invoked program access to anything you did not intend. You may need to look at the fcntl() system call and the FD_CLOEXEC flag.
Generally, it is easier to keep control of things if you build the functionality into your program, but it is not a trivial decision.
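To make the cURL example from the question concrete, here is a hedged Java sketch of the two approaches side by side (the URL and file names are placeholders, and error handling is deliberately minimal):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class Download {
    public static void main(String[] args) throws Exception {
        String url = "https://example.com/file.txt";

        // Library approach: stays in-process, errors surface as exceptions/status codes.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        HttpResponse<Path> response =
                client.send(request, HttpResponse.BodyHandlers.ofFile(Path.of("file.txt")));
        System.out.println("library download status: " + response.statusCode());

        // External-program approach: depends on curl being installed, and on interpreting its exit code.
        Process curl = new ProcessBuilder("curl", "-sS", "-o", "file-via-curl.txt", url)
                .inheritIO()
                .start();
        System.out.println("curl exit code: " + curl.waitFor());
    }
}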
Security is one concern. A malicious cURL could cause havoc in your program. It depends if this is a personal program where coding speed is your main focus, or a commercial application where things like security play a factor.
System calls are much harder to make safely.
All sorts of funny characters need to be correctly encoded to pass arguments in, and the types of encoding may vary by platform or even version of the command. So making a system call that contains any user data at all requires a lot of sanity-checking and it's easy to make a mistake.
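As a small illustration (a Java sketch; the hostile file name is invented): passing each argument separately avoids the shell entirely, so none of those funny characters need encoding at all.

public class SafeInvocation {
    public static void main(String[] args) throws Exception {
        String userSuppliedName = "report; rm -rf ~.txt"; // hostile input from a user

        // Risky: the string goes through a shell, so ';' is treated as a command separator.
        // Runtime.getRuntime().exec(new String[] {"sh", "-c", "ls -l " + userSuppliedName});

        // Safer: each argument is passed to the program as-is, with no shell in between.
        Process p = new ProcessBuilder("ls", "-l", userSuppliedName)
                .inheritIO()
                .start();
        System.out.println("exit code: " + p.waitFor());
    }
}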
Yeah, as mentioned above, keep in mind the difference between system calls (like fcntl() and open()) and system() calls. :)
In the early stages of prototyping a C program, I often make external calls to programs like grep and sed for manipulation of files using popen(). It's not safe, it's not secure, and it's certainly not portable, but it can let you get going quickly. That's valuable to me. It lets me focus on the really important core of the program, usually the reason I used C in the first place.
In high level languages, you'd better have a pretty good reason. :)
Instead of doing either, I'd Unix it up and build a script framework around your app, using the command line arguments and stdin.
Others have mentioned good points (reliability, security, safety, portability, etc.) - but I'll throw out another: performance. Generally it is many times faster to call a library function, or even to spawn a new thread, than it is to start an entire new process (and then you still have to correctly check/verify its execution and parse its output!).

Sensible defaults for configuration

I've recently started to play with Ruby on Rails which favours convention over configuration and relies on sensible defaults to tie various aspects of the application together.
I was thinking that if this concept of sensible default configuration were applied to general configuration in various frameworks, it might save some development headaches.
For example, in a .NET app I usually want to log an exception to the Windows event log using the Enterprise Library Exception Handling block, but if I don't explicitly state the behaviour I want in a config file then EL will complain. I think that instead, if it can't find custom configuration, it should revert to a sensible default, like logging my exception to the event log.
Would this be a good or bad concept for frameworks to adopt for their configuration?
I work a lot with a framework that does this exact thing. My trouble with this way of working is that:
the framework grew to have an excessive number of configuration keys that are never actually used/set in a configuration file.
behavior of the software sometimes becomes implicit; I want to explicitly set the system to behave a certain way instead of having it fall back on some other code path due to a 'default'.
a typo in a configuration key may result in a very long diagnostic session before figuring out what is going on.
When I forget to set a configuration value, I'd rather have the software tell me, instead of assuming some form of behavior that I might not be after at all.
I'd prefer a 'template' configuration file in which I change what I want, with the unchanged settings serving as the defaults.
Figuring out which convention the software picked when debugging can also waste a lot of time.
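A minimal sketch of that 'template plus fail-fast' preference (the file and property names are invented for illustration):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class Config {
    public static void main(String[] args) throws IOException {
        // Template defaults shipped with the application.
        Properties defaults = new Properties();
        defaults.setProperty("log.target", "eventlog");
        defaults.setProperty("retry.count", "3");

        // User configuration overrides the template where present.
        Properties config = new Properties(defaults);
        try (FileInputStream in = new FileInputStream("app.properties")) {
            config.load(in);
        }

        // Explicitly required keys fail fast instead of silently falling back.
        String connectionString = config.getProperty("db.connection");
        if (connectionString == null) {
            throw new IllegalStateException("Missing required configuration key: db.connection");
        }

        System.out.println("log.target = " + config.getProperty("log.target"));
    }
}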

What are the best practices to log an error?

Many times I have seen errors logged like this:
System.out.println("Method aMethod with parameters a:"+a+" b: "+b);
print("Error in line 88");
so.. What are the best practices to log an error?
EDIT:
This is Java, but it could be C/C++, Basic, etc.
Logging directly to the console is horrendous and, frankly, the mark of an inexperienced developer. The only reasons to do this sort of thing are that 1) the developer is unaware of other approaches, and/or 2) the developer has not thought one bit about what will happen when the code is deployed to a production site, and how the application will be maintained at that point. Dealing with an application that logs 1 GB/day or more of completely unneeded debug output is maddening.
The generally accepted best practice is to use a Logging framework that has concepts of:
Different log objects - Different classes/modules/etc can log to different loggers, so you can choose to apply different log configurations to different portions of the application.
Different log levels - so you can tweak the logging configuration to only log errors in production, to log all sorts of debug and trace info in a development environment, etc.
Different log outputs - the framework should allow you to configure where the log output is sent to without requiring any changes in the codebase. Some examples of different places you might want to send log output to are files, files that roll over based on date/size, databases, email, remoting sinks, etc.
The log framework should never, never, never throw any exceptions or errors from the logging code. Your application should not fail to load or fail to start because the log framework cannot create its log file or obtain a lock on the file (unless this is a critical requirement, maybe for legal reasons, for your app).
The eventual log framework you will use will of course depend on your platform. Some common options:
Java:
Apache Commons Logging
log4j
logback
Built-in java.util.logging
.NET:
log4net
C++:
log4cxx
Apache Commons Logging is not intended for an application's general logging. It's intended to be used by libraries or APIs that don't want to force a logging implementation on the API's user.
There are also classloading issues with Commons Logging.
Pick one of the [many] logging APIs, the most widely used probably being log4j or the Java Logging API.
If you want implementation independence, you might want to consider SLF4J, by the original author of log4j.
Having picked an implementation, then use the logging levels/severity within that implementation consistently, so that searching/filtering logs is easier.
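For example, a minimal SLF4J sketch (the class and messages are arbitrary; the parameterized form also avoids building strings when the level is disabled):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OrderService {
    private static final Logger log = LoggerFactory.getLogger(OrderService.class);

    public void process(String orderId) {
        log.debug("Processing order {}", orderId); // message only built if DEBUG is enabled
        try {
            // ... actual work ...
        } catch (RuntimeException e) {
            log.error("Failed to process order {}", orderId, e); // the exception's stack trace is logged too
        }
    }
}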
The easiest way to log errors in a consistent format is to use a logging framework such as Log4j (assuming you're using Java). It is useful to include a logging section in your code standards to make sure all developers know what needs to be logged. The nice thing about most logging frameworks is they have different logging levels so you can control how verbose the logging is between development, test, and production.
A best practice is to use the java.util.logging framework
Then you can obtain a logger and log messages in either of these formats:
Logger log = Logger.getLogger(MyClass.class.getName()); // MyClass stands in for your own class
log.warning("..");
log.fine("..");
log.finer("..");
log.finest("..");
Or
log.log(Level.WARNING, "blah blah blah", e);
Then you can use a logging.properties (example below) to switch between levels of logging, and do all sorts of clever stuff like logging to files, with rotation etc.
handlers = java.util.logging.ConsoleHandler
.level = WARNING
java.util.logging.ConsoleHandler.level = ALL
com.example.blah.level = FINE
com.example.testcomponents.level = FINEST
Frameworks like log4j and others should be avoided, in my opinion; Java already has everything you need.
EDIT
This can apply as a general practice for any programming language. Being able to control all levels of logging from a single property file is often very important in enterprise applications.
Some suggested best-practices
Use a logging framework. This will allow you to:
Easily change the destination of your log messages
Filter log messages based on severity
Support internationalised log messages
If you are using java, then slf4j is now preferred to Jakarta commons logging as the logging facade.
As stated, slf4j is a facade, and you then have to pick an underlying implementation: log4j, java.util.logging, or 'simple'.
Follow your framework's advice for ensuring that expensive logging operations are not needlessly carried out.
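On that last point, the usual idiom looks like this (an SLF4J-flavoured sketch; expensiveDump() is a made-up stand-in for costly message construction):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CacheReporter {
    private static final Logger log = LoggerFactory.getLogger(CacheReporter.class);

    void report(int count, long elapsedMillis) {
        // Parameterized messages defer string concatenation until the level check passes.
        log.debug("Loaded {} records in {} ms", count, elapsedMillis);

        // Guard explicitly when building the message itself is expensive.
        if (log.isDebugEnabled()) {
            log.debug("Full cache state: {}", expensiveDump());
        }
    }

    private String expensiveDump() {
        // Placeholder for a costly serialization of internal state.
        return "...";
    }
}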
The Apache Commons Logging API mentioned above is a great resource. Referring back to Java, there is also a standard error output stream (System.err).
Directly from the Java API:
This stream is already open and ready to accept output data. Typically this stream corresponds to display output or another output destination specified by the host environment or user. By convention, this output stream is used to display error messages or other information that should come to the immediate attention of a user even if the principal output stream, the value of the variable out, has been redirected to a file or other destination that is typically not continuously monitored.
Aside from the technical considerations in other answers, it is advisable to log a meaningful message and perhaps some steps to avoid the error in the future. Depending on the error, of course.
You get more out of an I/O error when the message states something like "Could not read from file X; you don't have the appropriate permission."
See more examples on SO or search the web.
There really is no best practice for logging an error. It basically just needs to follow a consistent pattern (within the software/company/etc.) that provides enough information to track the problem down. For example, you might want to keep track of the time, the method, its parameters, the calling method, etc.
So long as you don't just print "Error in ...".
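As one hedged example of such a consistent pattern (names and format invented; the logging framework adds the timestamp and can record the calling method for you):

import java.util.logging.Level;
import java.util.logging.Logger;

public class TransferService {
    private static final Logger log = Logger.getLogger(TransferService.class.getName());

    public void transfer(String fromAccount, String toAccount, double amount) {
        try {
            // ... actual work ...
        } catch (RuntimeException e) {
            // One consistent pattern: class.method, the parameters, and the exception with its stack trace.
            log.log(Level.SEVERE,
                    "TransferService.transfer failed: from=" + fromAccount
                            + " to=" + toAccount + " amount=" + amount, e);
        }
    }
}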