CONTROL phasers from a trait

Is it possible to add a CONTROL phaser from a trait?
Following the example from the docs, it's simple to add a custom control exception in runtime code:
class CX::Oops does X::Control {};
sub f { CONTROL { when CX::Oops { note 'Oops!'; .resume } }
        CX::Oops.new.throw; }
f; # OUTPUT: «Oops!»
However, my attempts to do so from a trait haven't worked:
sub trait_mod:<is>(Sub $fn, :$oopsable) {
    $fn.add_phaser: 'CONTROL', { when CX::Oops { note 'Oops!'; .resume } } }
sub g is oopsable { CX::Oops.new.throw; }
g; # OUTPUT: «control exception without handler»
From the .has_phasers and .fire_phasers (fun name!) methods, I can tell that this is adding the control phaser. Do I need to do something to register it as a handler, or is there something else I'm missing?

It's an interesting question whether CATCH and CONTROL really are phasers. They fit insofar as they are written in capital letters and react to something that happens at a certain time. However, they are parsed in a different part of the grammar, as statement controls, and so are restricted to occurring at statement level. The compiler doesn't handle them with a call to add_phaser, either. An exception handler implies some code generation, and it is the generated code in the body of the routine that actually results in the exception being handled.
One could reasonably ask whether the compiler should check if a trait called add_phaser with CATCH or CONTROL and generate code accordingly. However, since in the current compiler implementation the handlers are part of the routine body, and all work on the routine body (except optimization) is completed before trait handlers are called, it's simply too late for the trait to have an effect.
Additionally, the body of a CATCH or CONTROL block is not compiled simply as an ordinary block: code is also generated to handle the when smartmatching, and to rethrow the exception if none of the handlers match; at least this final step would need to be done manually. There's also something about updating $!, if I remember correctly.
The one bit of good news is that the forthcoming RakuAST-based compiler frontend will:
Provide an API to the AST of the routine in the trait handler. This can then be used to add the CATCH/CONTROL handler at AST level, and thus will get all of the correct semantics and perform the same as one written literally into the code.
Delay the code generation until much later, so that the timing works out too.

Related

How do I know when a function body ends in assembly

I want to know when a function body ends in assembly. For example, in C you have these brackets {} that tell you where the function body starts and where it ends, but how do I know this in assembly?
Is there a parser that can extract all the functions from assembly, along with the start line and end line of each body?
There's no foolproof way, and there might not even be a well-defined correct answer in hand-written asm.
Usually (e.g. in compiler-generated code) you know a function ends when you see the next global symbol; that's how objdump decides when to print a new "banner". But without all function-start symbols being visible, there's no unambiguous way. That's why some object file formats have room for size metadata associated with a symbol, like .size foo, . - foo in GAS syntax.
It's not as easy as looking for a ret; some functions end with a jmp tail-call to another function. Some call a noreturn function like abort or __stack_chk_fail (not a tailcall, because they want to push a return address for a backtrace). Others just fall off into whatever's next, because that path had undefined behaviour in the source, so the compiler assumed it wasn't reachable and stopped generating instructions for it, e.g. a C++ non-void function where execution can/does fall off the end without a return.
In general, assembly can blur the lines of what a function is.
Asm has features you can use to implement the high-level concept of a function, but you're not restricted to that.
e.g. multiple asm functions could all return by jumping to a common block of code that pops some registers before a ret. Is that shared tail a separate function that's tail-called with a special calling convention?
Compilers don't usually do that, but humans could.
As for function entry points, usually some other code somewhere in the program will contain a call to it. But not necessarily; it might only be reachable via a table of function pointers, and you don't know that a block of .rodata holds function pointers until you find some code loading from it and calling or jumping.
But that doesn't work if the lowest-address instruction of the function isn't its entry point. See "Does a function with instructions before the entry-point label cause problems for anything (linking)?" for an example.
Compilers don't generate code like that, but humans can. (It's a handy trick sometimes for https://codegolf.stackexchange.com/ questions.)
Or in the general case, a function might have multiple entry points. Or you could describe that as multiple functions with overlapping implementations. Sometimes it's as simple as one function tail-calling another by falling into it without needing a jmp, i.e. it starts a few instructions before the other.
I want to know when a function body ends in assembly, [...]
There are mainly four ways that the execution of a stream of (userspace) instructions can "end":
An unconditional jump like jmp, or a conditional one like Jcc (je, jnz, jg, ...)
A ret instruction (meaning the end of a subroutine), which probably comes closest to the intent of your question (including the ExitProcess "ret" command)
A call to another "function"
An exception: not a C-style exception, but rather a machine exception like "invalid instruction" or "division by 0" which terminates the user-space program
[...] for example in C you have these brackets {} that tell you where the function body starts and ends, but how do I know this in assembly?
Simple answer: you don't. On the machine level, every address can (theoretically) be an entry point to a "function", so there is no unique entry point to a "function" other than whatever is defined as one, and you can define anything.
On a tangent, this relates to self-modifying code and viruses, but it need not. The exit/end is as described in the first part above.
Is there a parser that can extract all the functions from assembly, along with the start line and end line of each body?
Disassemblers create some kind of "functions" with entry and exit points. But these are merely assumptions; there is no way to know whether they are correct, and this may cause problems.
The usual approach is to use a disassembler; the work of splitting the stream of instructions back into separate "functions" falls to the person who took on the task (i.e., you). Some tools claim to simplify this, but I cannot judge their efficacy.
From the perspective of a high-level language, there are decompilers that try to reverse the transformation from (for example) C to assembly/machine code and automate that task; they work more or less well, depending on the case.

What is a simple do-once technique?

What is a simple technique to perform some action just once, no matter how many times the function is executed? Do any programming languages have specific ways built-in to handle this somewhat common problem?
Example: initialize() shall set global_variable to true on ONLY its first execution.
A C++ example (looking for alternatives out of curiosity, not necessity):
init.h:
int global_variable;
void initialize(void);
init.c:
#include <stdbool.h>
#include "init.h"
static bool already_initialized = false;
void initialize(void)
{
    if (!already_initialized)
    {
        already_initialized = true;
        global_variable = true;
    }
}
Apart from the global variable technique that's available in any language there are several other ways to do this.
In languages that have static variables, using a static variable instead of a global is preferable, to prevent variable-name collisions in the global scope.
In some languages you can redefine/redeclare functions at runtime so you can do something like this:
function initialize (void) {
    // do some stuff...
    function initialize (void) {}; // redefine function here to do nothing
}
In some languages you can't quite redeclare functions within functions due to scope issues (inner functions), but you can still reassign an existing function name to another function. So you can do something like this:
function initialize (void) {
    // do some stuff ...
    initialize = function (void) {}; // assign a no-op anonymous function
                                     // to this function's name
}
Some languages (especially declarative languages) actually have a "latch" functionality built in that executes just once. Sometimes there is even a reset functionality. So you can actually do something like this:
function do_once initialize (void) {
    // do some stuff
}
If the language allows it you can reset the do_once directive if you really want to re-execute the function:
reset initialize;
initialize();
Note: the C-like syntax above is obviously pseudocode and doesn't represent any real language, but the features described do exist in real languages. Also, programmers rarely encounter declarative languages apart from HTML, XML and CSS, but Turing-complete declarative languages do exist and are typically used for hardware design, where the "do_once" feature typically compiles down to a D flip-flop or latch.
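As a concrete, real-language version of the flag technique, here is a minimal Java sketch (the class and field names are invented for illustration); AtomicBoolean.compareAndSet makes the do-once safe even if several threads race to initialize:
import java.util.concurrent.atomic.AtomicBoolean;

class Init {
    static boolean globalVariable;
    private static final AtomicBoolean alreadyInitialized = new AtomicBoolean(false);

    static void initialize() {
        // compareAndSet flips false -> true exactly once, even under contention,
        // so the body below runs on the first call only.
        if (alreadyInitialized.compareAndSet(false, true)) {
            globalVariable = true;
        }
    }
}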
Eiffel has a built-in notion of once routines. A once routine is executed only the first time it is called; on subsequent calls it is not executed. If the routine is a function, i.e. returns a result, the result of the first execution is returned for all subsequent calls. If the first call terminates with an exception, the same exception is raised for all subsequent calls.
The declaration of the once function foo looks like:
foo: RETURN_TYPE
    once
        ... -- Some code to initialize Result.
    end
In a multithreaded environment it might be desirable to distinguish objects used by different threads. This is accomplished by adding the key "THREAD" to the declaration (it is actually the default):
foo: RETURN_TYPE
    once ("THREAD")
        ...
    end
If the same object has to be shared by all the threads, the key "PROCESS" is used instead:
foo: RETURN_TYPE
    once ("PROCESS")
        ...
    end
The same syntax, though without a return type, is used for procedures.
Process-wide once routines are guaranteed to be executed just once for the whole process. Because race conditions are possible, the Eiffel run-time makes sure that at most one thread may trigger evaluation of a given once routine at a time. Other threads are suspended until the primary execution completes, so that they can use the single result or be sure the action is performed only once.
In other respects once routines are no different from regular routines, in the sense that they follow the same rules of object-oriented programming, like inheritance and redeclaration (overriding). Because a once routine is a normal routine, it can call other routines, directly or indirectly involving itself. When such a recursive call occurs, the once routine is not executed again; the last known value of the result is returned instead.
Yes, some languages (e.g. Scala) do support it (using lazy), but usually this functionality is provided by frameworks, because there are trade-offs: sometimes you need thread-level blocking synchronization; sometimes spinlocks are enough; sometimes you don't need synchronization at all because a simple single-threaded cache is enough; sometimes you need to remember many calculated values and are willing to forget the least recently used ones; and so on. That's probably why languages generally don't support this pattern directly: it's a framework's job.
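For instance, the "single-value cache with blocking synchronization" variant can be hand-rolled in a few lines of Java; a minimal sketch (Once is an invented name, not a standard library class):
import java.util.function.Supplier;

final class Once<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private boolean computed;
    private T value;

    Once(Supplier<T> delegate) { this.delegate = delegate; }

    @Override
    public synchronized T get() {
        // Thread-level blocking synchronization: the first caller computes,
        // later and concurrent callers get the cached value.
        if (!computed) {
            value = delegate.get();
            computed = true;
        }
        return value;
    }
}

// usage: Supplier<String> config = new Once<>(() -> expensiveLoad());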

When is passing a subprogram as a parameter necessary

I've been reading Concepts of Programming Languages by Robert W. Sebesta, and in chapter 9 there is a brief section on passing a subprogram to a function as a parameter. The section is extremely brief, about 1.5 pages, and the only explanation of its application is:
When a subprogram must sample some mathematical function, such as a subprogram that does numerical integration by estimating the area under the graph of a function by sampling the function at a number of different points. Such a subprogram should be usable everywhere.
This is completely different from anything I have ever learned. If I were to approach this problem my own way, I would create a function object and write a function that accomplishes the above and accepts function objects.
I have no clue why this is a design issue for languages, because I have no idea where I would ever use this. A quick search hasn't made it any clearer for me.
Apparently you can accomplish this in C and C++ by using pointers. Languages that allow nested subprograms, such as JavaScript, allow you to do this in three separate ways:
function sub1() {
    var x;

    function sub2() {
        alert(x); // Creates a dialog box with the value of x
    }

    function sub3() {
        var x;
        x = 3;
        sub4(sub2); // *shallow binding*: the environment of the
                    // call statement that enacts the passed
                    // subprogram
    }

    function sub4(subx) {
        var x;
        x = 4;
        subx();
    }

    x = 1;
    sub3();
}
I'd appreciate any insight offered.
Being able to pass "methods" is very useful for a variety of reasons. Among them:
Code which is performing a complicated operation might wish to provide a means of either notifying a user of its progress or allowing the user to cancel it. Having the code for the complicated operation do those actions itself will both add complexity to it and also cause ugliness if it's invoked from code which uses a different style of progress bar or "Cancel" button. By contrast, having the caller supply an UpdateStatusAndCheckCancel() method means that the caller can supply a method which will update whatever style of progress bar and cancellation method the caller wants to use.
Being able to store methods within a table can greatly simplify code that needs to export objects to a file and later import them again. Rather than needing to have code say
if (ObjectType == "Square")
AddObject(new Square(ObjectParams));
else if (ObjectType == "Circle")
AddObject(new Circle(ObjectParams));`
etc. for every kind of object
code can say something like
if (ObjectCreators.TryGetValue(ObjectType, out factory))
    AddObject(factory(ObjectParams));
to handle all kinds of objects whose creation methods have been added to ObjectCreators.
Sometimes it's desirable to be able to handle events that may occur at some unknown time in the future; the author of code which knows when those events occur might have no clue about what things are supposed to happen then. Allowing the person who wants the action to happen to give a method to the code which will know when it happens allows for that code to perform the action at the right time without having to know what it should do.
The first situation represents a special case of a callback, where the function which is given the method is expected to use it only before it returns. The second situation is an example of what's sometimes referred to as a "factory pattern" or "dependency injection" (though those terms are also used in broader contexts). The third case is commonly handled using constructs which frameworks refer to as events, or with an "observer" pattern (the observer asks the observable object to notify it when something happens).
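As a concrete sketch of the second situation in Java (Shape, Square, Circle, and the single String parameter are invented for the example), constructor references stored in a map replace the if/else chain:
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

interface Shape {}
record Square(String params) implements Shape {}
record Circle(String params) implements Shape {}

class ShapeRegistry {
    // The "table of methods": type name -> constructor reference.
    private static final Map<String, Function<String, Shape>> creators = new HashMap<>();
    static {
        creators.put("Square", Square::new);
        creators.put("Circle", Circle::new);
    }

    static Shape create(String type, String params) {
        Function<String, Shape> factory = creators.get(type);
        if (factory == null)
            throw new IllegalArgumentException("Unknown type: " + type);
        return factory.apply(params);
    }
}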

Should you check for wrong parameter values in the constructor?

Do you check for data validity in every constructor, or do you just assume the data is correct and throw exceptions in the specific function that has a problem with the parameter?
A constructor is a function too - why differentiate?
Creating an object implies that all the integrity checks have been done. It's perfectly reasonable to check parameters in a constructor and throw an exception once an illegal value has been detected.
Among other things, this simplifies debugging. When your program throws an exception in a constructor, you can inspect the stack trace and often immediately see the cause. If you delay the check, you'll have to do more investigation to find which earlier event caused the current error.
It's always better to have a fully-constructed object with all its invariants satisfied from the very beginning. Beware, however, of throwing exceptions from constructors in non-managed languages, since that may result in a memory leak.
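For instance, a minimal Java sketch of this fail-fast style (Percentage is an invented example class):
class Percentage {
    private final double value;

    Percentage(double value) {
        // Reject illegal arguments before an invalid object can exist.
        if (value < 0.0 || value > 100.0)
            throw new IllegalArgumentException("value must be in [0, 100]: " + value);
        this.value = value;
    }

    double value() { return value; }
}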
I always coerce values in constructors. If the users can't be bothered to follow the rules, I just silently enforce them without telling them.
So, if they pass a value of 107%, I'll set it to 100%. I just make it clear in the documentation that that's what happens.
Only if there's no obvious logical coercion do I throw an exception back to them. I like to call this the "principle of most astonishment to those too lazy or stupid to read the documentation".
If you throw in the constructor, the stack trace is more likely to show where the wrong values are coming from.
I agree with sharptooth in the general case, but there are sometimes objects that can have states in which some functions are valid and some are not. In these situations it's better to defer checks to the functions with these particular dependencies.
Usually you'll want to validate in a constructor, if only because that means your objects will always be in a valid state. But some kinds of objects need function-specific checks, and that's OK too.
This is a difficult question. I have changed my habits related to it several times over the last years, so here are some thoughts:
Checking the arguments in the constructor is like checking the arguments for a traditional method... so I would not differentiate here...
Checking the arguments of a method of course does help to ensure the parameters passed are correct, but it also introduces a lot more code you have to write, which doesn't always pay off in my opinion... Especially when you write short methods which delegate quite frequently to other methods, I guess you would end up with 50% of your code just doing parameter checks...
Furthermore, consider that very often when you write a unit test for your class, you don't want to have to pass in valid parameters for all methods... Quite often it makes sense to pass only those parameters important for the current test case, so having checks would slow you down in writing unit tests...
I think this really depends on the situation... What I ended up doing is writing parameter checks only when I know the parameters come from an external system (like user input, databases, web services, or something like that...). If data comes from my own system, I don't write checks most of the time.
I always try to fail as early as possible. So I definitely check the parameters in the constructor.
In theory, the calling code should always ensure that preconditions are met, before calling a function. The same goes for constructors.
In practice, programmers are lazy, and to be sure it's best to still check preconditions. Asserts come in handy there.
Example. Excuse my non-curly brace syntax:
// precondition: b <> 0
function divide(a, b: double): double;
begin
    assert(b <> 0); // in case of a programming error
    result := a / b;
end;

// calling code should be:
if foo <> 0 then
    bar := divide(1, foo)
else
    // do whatever you need to do when foo equals 0
Alternatively, you can always change the preconditions. In case of a constructor this is not really handy.
// no preconditions... still need to check the result
function divide(a, b: double; out hasResult: boolean): double;
begin
    hasResult := b <> 0;
    if not hasResult then
        Exit;
    result := a / b;
end;

// calling code:
bar := divide(1, 0, hasResult);
if not hasResult then
    // do whatever you need to do when the division failed
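In a language with a richer standard library, the same "no preconditions, check the result" shape is often expressed with an optional return value instead of an out parameter. A minimal Java sketch (SafeMath and the handler names in the comments are invented):
import java.util.Optional;

class SafeMath {
    // No precondition on b: a failed division is encoded in the return type.
    static Optional<Double> divide(double a, double b) {
        return b == 0.0 ? Optional.empty() : Optional.of(a / b);
    }
}

// calling code:
// SafeMath.divide(1, 0).ifPresentOrElse(
//     bar -> use(bar),                  // division succeeded
//     ()  -> handleDivisionByZero());   // do whatever you need on failure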
Always fail as soon as possible. A good example of this practice is exhibited by the runtime:
* if you try to use an invalid index for an array, you get an exception;
* a cast fails immediately when attempted, not at some later stage.
Imagine what a disaster code would be if a bad cast remained in a reference and every time you tried to use it you got an exception. The only sensible approach is to fail as soon as possible and try to avoid bad/invalid state as soon as possible.

Conditional logging with minimal cyclomatic complexity

After reading "What's your/a good limit for cyclomatic complexity?", I realize many of my colleagues were quite annoyed with this new QA policy on our project: no more than 10 cyclomatic complexity per function.
Meaning: no more than 10 'if', 'else', 'try', 'catch' and other code-workflow branching statements. Right. As I explained in 'Do you test private method?', such a policy has many good side-effects.
But: at the beginning of our (200-person, 7-year-long) project, we were happily logging (and no, we cannot easily delegate that to some kind of 'aspect-oriented programming' approach for logs):
myLogger.info("A String");
myLogger.fine("A more complicated String");
...
And when the first versions of our system went live, we experienced huge memory problems, not because of the logging (which was at one point turned off), but because of the log parameters (the strings), which were always calculated and then passed to the 'info()' or 'fine()' functions, only for the code to discover that the level of logging was 'OFF' and that no logging was taking place!
So QA came back and urged our programmers to do conditional logging. Always.
if (myLogger.isLoggable(Level.INFO)) { myLogger.info("A String"); }
if (myLogger.isLoggable(Level.FINE)) { myLogger.fine("A more complicated String"); }
...
But now, with that 'can-not-be-moved' limit of 10 cyclomatic complexity per function, they argue that the various logs they put in their functions feel like a burden, because each "if(isLoggable())" is counted as +1 cyclomatic complexity!
So if a function has 8 'if'/'else' branches in one tightly-coupled, not-easily-shareable algorithm, plus 3 critical log actions... it breaches the limit even though the conditional logs may not really be part of the actual complexity of that function...
How would you address this situation?
I have seen a couple of interesting coding evolution (due to that 'conflict') in my project, but I just want to get your thoughts first.
Thank you for all the answers.
I must insist that the problem is not 'formatting'-related, but 'argument evaluation'-related (evaluation that can be very costly to do just before calling a method which will do nothing).
So when I wrote "A String" above, I actually meant aFunction(), with aFunction() returning a String, and being a call to a complicated method collecting and computing all kinds of log data to be displayed by the logger... or not (hence the issue, and the obligation to use conditional logging, hence the actual issue of the artificial increase in 'cyclomatic complexity'...).
I now get the 'variadic function' point advanced by some of you (thank you John).
Note: a quick test in Java 6 shows that my varargs function does evaluate its arguments before being called, so the technique cannot be applied to the function call itself, but rather to a 'log retriever object' (or 'function wrapper') on which toString() will only be called if needed. Got it.
I have now posted my experience on this topic.
I will leave it there until next Tuesday for voting, then I will select one of your answers.
Again, thank you for all the suggestions :)
With current logging frameworks, the question is moot
Current logging frameworks like slf4j or log4j 2 don't require guard statements in most cases. They use a parameterized log statement so that an event can be logged unconditionally, but message formatting only occurs if the event is enabled. Message construction is performed as needed by the logger, rather than pre-emptively by the application.
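For example, with slf4j (the class name below is invented for illustration), the guard disappears entirely; the {} placeholders are substituted, and the arguments' toString() methods called, only if DEBUG is actually enabled:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class ConnectExample {
    private static final Logger log = LoggerFactory.getLogger(ConnectExample.class);

    void attempt(Object dongle, Object widget) {
        // No isDebugEnabled() guard needed: formatting is deferred by the framework.
        log.debug("Attempting connection of dongle {} to widget {}", dongle, widget);
    }
}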
If you have to use an antique logging library, you can read on to get more background and a way to retrofit the old library with parameterized messages.
Are guard statements really adding complexity?
Consider excluding logging guards statements from the cyclomatic complexity calculation.
It could be argued that, due to their predictable form, conditional logging checks really don't contribute to the complexity of the code.
Inflexible metrics can make an otherwise good programmer turn bad. Be careful!
Assuming that your tools for calculating complexity can't be tailored to that degree, the following approach may offer a work-around.
The need for conditional logging
I assume that your guard statements were introduced because you had code like this:
private static final Logger log = Logger.getLogger(MyClass.class);

Connection connect(Widget w, Dongle d, Dongle alt)
    throws ConnectionException
{
    log.debug("Attempting connection of dongle " + d + " to widget " + w);

    Connection c;
    try {
        c = w.connect(d);
    } catch(ConnectionException ex) {
        log.warn("Connection failed; attempting alternate dongle " + d, ex);
        c = w.connect(alt);
    }

    log.debug("Connection succeeded: " + c);
    return c;
}
In Java, each of the log statements creates a new StringBuilder, and invokes the toString() method on each object concatenated to the string. These toString() methods, in turn, are likely to create StringBuilder instances of their own, and invoke the toString() methods of their members, and so on, across a potentially large object graph. (Before Java 5, it was even more expensive, since StringBuffer was used, and all of its operations are synchronized.)
This can be relatively costly, especially if the log statement is in some heavily-executed code path. And, written as above, that expensive message formatting occurs even if the logger is bound to discard the result because the log level is too high.
This leads to the introduction of guard statements of the form:
if (log.isDebugEnabled())
log.debug("Attempting connection of dongle " + d + " to widget " + w);
With this guard, the evaluation of arguments d and w and the string concatenation is performed only when necessary.
A solution for simple, efficient logging
However, if the logger (or a wrapper that you write around your chosen logging package) takes a formatter and arguments for the formatter, the message construction can be delayed until it is certain that it will be used, while eliminating the guard statements and their cyclomatic complexity.
public final class FormatLogger
{
    private final Logger log;

    public FormatLogger(Logger log)
    {
        this.log = log;
    }

    public void debug(String formatter, Object... args)
    {
        log(Level.DEBUG, formatter, args);
    }

    // … &c. for info, warn; also add overloads to log an exception …

    public void log(Level level, String formatter, Object... args)
    {
        if (log.isEnabled(level)) {
            /*
             * Only now is the message constructed, and each "arg"
             * evaluated by having its toString() method invoked.
             */
            log.log(level, String.format(formatter, args));
        }
    }
}
class MyClass
{
    private static final FormatLogger log =
        new FormatLogger(Logger.getLogger(MyClass.class));

    Connection connect(Widget w, Dongle d, Dongle alt)
        throws ConnectionException
    {
        log.debug("Attempting connection of dongle %s to widget %s.", d, w);

        Connection c;
        try {
            c = w.connect(d);
        } catch(ConnectionException ex) {
            log.warn("Connection failed; attempting alternate dongle %s.", d);
            c = w.connect(alt);
        }

        log.debug("Connection succeeded: %s", c);
        return c;
    }
}
Now, none of the cascading toString() calls with their buffer allocations will occur unless they are necessary! This effectively eliminates the performance hit that led to the guard statements. One small penalty, in Java, would be auto-boxing of any primitive type arguments you pass to the logger.
The code doing the logging is arguably even cleaner than ever, since untidy string concatenation is gone. It can be even cleaner if the format strings are externalized (using a ResourceBundle), which could also assist in maintenance or localization of the software.
Further enhancements
Also note that, in Java, a MessageFormat object could be used in place of a "format" String, which gives you additional capabilities such as a choice format to handle cardinal numbers more neatly. Another alternative would be to implement your own formatting capability that invokes some interface that you define for "evaluation", rather than the basic toString() method.
In Python you pass the formatted values as parameters to the logging function. String formatting is only applied if logging is enabled. There's still the overhead of a function call, but that's minuscule compared to formatting.
log.info ("a = %s, b = %s", a, b)
You can do something like this for any language with variadic arguments (C/C++, C#/Java, etc).
This isn't really intended for when the arguments are difficult to retrieve, but for when formatting them to strings is expensive. For example, if your code already has a list of numbers in it, you might want to log that list for debugging. Executing mylist.toString() will take a while to no benefit, as the result will be thrown away. So you pass mylist as a parameter to the logging function, and let it handle string formatting. That way, formatting will only be performed if needed.
Since the OP's question specifically mentions Java, here's how the above can be used:
I must insist that the problem is not 'formatting' related, but 'argument evaluation' related (evaluation that can be very costly to do, just before calling a method which will do nothing)
The trick is to have objects that will not perform expensive computations until absolutely needed. This is easy in languages like Smalltalk or Python that support lambdas and closures, but is still doable in Java with a bit of imagination.
Say you have a function get_everything(). It will retrieve every object from your database into a list. You don't want to call this if the result will be discarded, obviously. So instead of using a call to that function directly, you define an inner class called LazyGetEverything:
public class MainClass {
    private class LazyGetEverything {
        @Override
        public String toString() {
            return getEverything().toString();
        }
    }

    private Object getEverything() {
        /* returns what you want to .toString() in the inner class */
    }

    public void logEverything() {
        log.info(new LazyGetEverything());
    }
}
In this code, the call to getEverything() is wrapped so that it won't actually be executed until it's needed. The logging function will execute toString() on its parameters only if debugging is enabled. That way, your code will suffer only the overhead of a function call instead of the full getEverything() call.
In languages supporting lambda expressions or code blocks as parameters, one solution would be to pass just that to the logging method. The method could evaluate the configuration and, only if needed, actually call/execute the provided lambda/code block.
I have not tried it yet, though.
Theoretically this is possible. I would not like to use it in production due to the performance issues I expect with such heavy use of lambdas/code blocks for logging.
But as always: if in doubt, test it and measure the impact on CPU load and memory.
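For what it's worth, this approach has since become built into Java itself: java.util.logging gained Supplier-based overloads in Java 8, so the lambda body, and hence the expensive call, is evaluated only if the level is enabled. A minimal sketch (class and method names invented):
import java.util.logging.Logger;

class LambdaLogging {
    private static final Logger log = Logger.getLogger(LambdaLogging.class.getName());

    void logEverything() {
        // Logger.fine(Supplier<String>): the lambda runs only when FINE is enabled,
        // so getEverything() is never called otherwise.
        log.fine(() -> "Everything: " + getEverything());
    }

    private Object getEverything() {
        return "..."; // stands in for the expensive database call
    }
}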
Thank you for all your answers! You guys rock :)
Now my feedback is not as straightforward as yours:
Yes, for one project (as in 'one program deployed and running on its own on a single production platform'), I suppose you can go all technical on me:
dedicated 'Log Retriever' objects, which can be passed to a Logger wrapper that only calls toString() when necessary,
used in conjunction with a variadic logging function (or a plain Object[] array!),
and there you have it, as explained by @John Millikin and @erickson.
However, this issue forced us to think a little about why exactly we were logging in the first place.
Our project is actually 30 different projects (5 to 10 people each) deployed on various production platforms, with asynchronous communication needs and a central bus architecture.
The simple logging described in the question was fine for each project at the beginning (5 years ago), but since then, we had to step up. Enter the KPI.
Instead of asking a logger to log anything, we ask an automatically created object (called a KPI) to register an event. It is a simple call (myKPI.I_am_signaling_myself_to_you()), and does not need to be conditional (which solves the 'artificial increase of cyclomatic complexity' issue).
That KPI object knows who calls it, and since it runs from the beginning of the application, it is able to retrieve lots of data we were previously computing on the spot when we were logging.
Plus, that KPI object can be monitored independently, and it can compute and publish its information on demand on a single, separate publication bus.
That way, each client can ask for the information it actually wants (like, 'has my process begun, and if yes, since when?'), instead of looking for the correct log file and grepping for a cryptic String...
Indeed, the question 'Why exactly were we logging in the first place?' made us realize we were not logging just for the programmer and his unit or integration tests, but for a much broader community, including some of the final clients themselves. Our 'reporting' mechanism had to be centralized, asynchronous, and available 24/7.
The specifics of that KPI mechanism are way outside the scope of this question. Suffice it to say that its proper calibration is by far, hands down, the single most complicated non-functional issue we are facing. It still brings the system to its knees from time to time! Properly calibrated, however, it is a life-saver.
Again, thank you for all the suggestions. We will consider them for some parts of our system when simple logging is still in place.
But the other point of this question was to illustrate a specific problem in a much larger and more complicated context.
Hope you liked it. I might ask a question about KPIs (which, believe it or not, are not covered in any question on SO so far!) later next week.
I will leave this answer up for voting until next Tuesday, then I will select an answer (not this one obviously ;) )
Maybe this is too simple, but what about using the "extract method" refactoring around the guard clause? Your example code of this:
public void Example()
{
    if (myLogger.isLoggable(Level.INFO))
        myLogger.info("A String");
    if (myLogger.isLoggable(Level.FINE))
        myLogger.fine("A more complicated String");
    // +1 for each test and log message
}
Becomes this:
public void Example()
{
    _LogInfo();
    _LogFine();
    // +0 for each test and log message
}

private void _LogInfo()
{
    if (!myLogger.isLoggable(Level.INFO))
        return;

    // Do your complex argument calculations/evaluations only when needed.
}

private void _LogFine() { /* Ditto... */ }
In C or C++ I'd use the preprocessor instead of the if statements for the conditional logging.
Pass the log level to the logger and let it decide whether or not to write the log statement:
// if (myLogger.isLoggable(Level.INFO)) { myLogger.info("A String"); }
myLogger.info(Level.INFO, "A String");
UPDATE: Ah, I see that you want to conditionally create the log string without a conditional statement. Presumably at runtime rather than compile time.
I'll just say that the way we've solved this is to put the formatting code in the logger class so that the formatting only takes place if the level passes. Very similar to a built-in sprintf. For example:
myLogger.info(Level.INFO, "A String %d", some_number);
That should meet your criteria.
Conditional logging is evil. It adds unnecessary clutter to your code.
You should always send in the objects you have to the logger:
Logger logger = ...
logger.log(Level.DEBUG,"The foo is {0} and the bar is {1}",new Object[]{foo, bar});
and then have a java.util.logging.Formatter that uses MessageFormat to flatten foo and bar into the string to be output. It will only be called if the logger and handler will log at that level.
For added pleasure you could have some kind of expression language to be able to get fine control over how to format the logged objects (toString may not always be useful).
Scala has an annotation, @elidable, that allows you to remove methods with a compiler flag.
With the Scala REPL:

C:\>scala
Welcome to Scala version 2.8.0.final (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_16).
Type in expressions to have them evaluated.
Type :help for more information.

scala> import scala.annotation.elidable
import scala.annotation.elidable

scala> import scala.annotation.elidable._
import scala.annotation.elidable._

scala> @elidable(FINE) def logDebug(arg: String) = println(arg)
logDebug: (arg: String)Unit

scala> logDebug("testing")

scala>
With -Xelide-below set:
C:\>scala -Xelide-below 0
Welcome to Scala version 2.8.0.final (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_16).
Type in expressions to have them evaluated.
Type :help for more information.

scala> import scala.annotation.elidable
import scala.annotation.elidable

scala> import scala.annotation.elidable._
import scala.annotation.elidable._

scala> @elidable(FINE) def logDebug(arg: String) = println(arg)
logDebug: (arg: String)Unit

scala> logDebug("testing")
testing

scala>
See also Scala assert definition
As much as I hate macros in C/C++, at work we have #defines for the if part, which if false ignores (does not evaluate) the following expressions, but if true returns a stream into which stuff can be piped using the '<<' operator.
Like this:
LOGGER(LEVEL_INFO) << "A String";
I assume this would eliminate the extra 'complexity' that your tool sees, and also eliminates any calculating of the string, or any expressions to be logged if the level was not reached.
Here is an elegant solution using a ternary expression:
logger.info(logger.isInfoEnabled() ? "Log Statement goes here..." : null);
Consider a logging util function ...
void debugUtil(String s, Object... args) {
    if (LOG.isDebugEnabled())
        LOG.debug(s, args);
}
Then make the call with a "closure" around the expensive evaluation that you want to avoid:
debugUtil("We got a %s", new Object() {
    @Override
    public String toString() {
        // only evaluated if the debug statement is executed
        return expensiveCallToGetSomeValue().toString();
    }
});