What should I do in these cases to follow FP? - functions

I'm reading about FP and I have two basic questions:
FP says a function should take one input and give a single output. So what should I do with void methods? They don't return anything, right?
FP says a function should have a single responsibility, so how do we handle log statements inside a method? Doesn't that violate the rule?
I'd like to know how these things are handled in Scala and Haskell.
Thanks in advance.

I'm assuming you're reading a book called "Functional Programming", although it would help to know who the author is as well. In any case, these questions are relatively easy to answer and I'll give my answers with respect to Haskell because I don't know Scala.
So what should I do with void methods? They don't return anything, right?
There are no void methods in a pure functional language like Haskell. A pure function has no side effects, so a pure function without a return value is meaningless, something like
f :: Int -> ()
f x = let y = x * x + 3 in ()
won't do any computation: y is never calculated, and every input returns the same value. However, if you have an impure function, such as one that writes a file or prints something to the screen, then it must exist in a monadic context. If you don't understand monads yet, don't worry; they take a bit to get used to, but they're a very powerful and useful abstraction that can make a lot of problems easier.
A monad is something like IO, and in Haskell it takes a type parameter to indicate the kind of value that can be stored inside this context. So you can have something like
putStrLn :: String -> IO ()
Or
-- FYI: FilePath is an alias for String
writeFile :: FilePath -> String -> IO ()
These have side effects, denoted by the IO in the return type, and the () means that there is no meaningful result from the operation. In Python 3, for example, the print function returns None because there isn't anything meaningful to return after printing a value to the screen. The type parameter can also indicate a meaningful value inside the monadic context, as in readFile or getLine:
getLine :: IO String
readFile :: FilePath -> IO String
When writing your main function, you could do something like
main = do
  putStrLn "Enter a filename:"
  fname <- getLine -- fname has type String
  writeFile fname "This text will be in a file"
  contents <- readFile fname
  putStrLn "I wrote the following text to the file:"
  putStrLn contents
FP says a function should have a single responsibility, so how do we handle log statements inside a method? Doesn't that violate the rule?
Most functions don't need logging inside them. I know that sounds weird, but it's true. In Haskell and most other functional languages, you'll write a lot of small, easily testable functions that each do one step. It's very common to have lots of 1 or 2 line functions in your application.
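To make that concrete, here is a minimal sketch (the functions and the vowel-counting task are invented for illustration): each function does one small step, and the larger behavior is just their composition.

import Data.Char (isAlpha, toLower)

-- each step is a tiny, pure, independently testable function
normalize :: String -> String
normalize = map toLower . filter isAlpha

countVowels :: String -> Int
countVowels = length . filter (`elem` "aeiou")

-- the "real" function is just the composition of the steps
vowelsInWord :: String -> Int
vowelsInWord = countVowels . normalize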
When you actually do need logging, say you're building a web server, there are a couple of different approaches you can take. There is a monad called Writer that lets you aggregate values as you perform operations; these operations don't have to be impure and do IO, they can be entirely pure.
However, a full logging framework of the kind one might use for a web server or large application would likely come as its own library, so that you can set up logging to the screen, to files, network locations, email, and more. Such a logging monad will wrap the IO monad so that it can perform these side effects. A more advanced one would probably be built on libraries like monad transformers or extensible effects, which let you "combine" different monads so you can use the utilities of both at the same time. You might see code like
type MyApp a = LogT IO a

-- log :: Monad m => LogLevel -> String -> LogT m ()

getConnection :: Socket -> MyApp Connection
getConnection sock = do
  log DEBUG "Waiting for next connection"
  conn <- liftIO $ acceptConnection sock
  log INFO $ "Accepted connection from IP: " ++ show (connectionIP conn)
  return conn
I'm not expecting you to understand this code fully, but I hope you can see that it has logging and network operations mixed together. The liftIO function is a common one with monad transformers; it "lifts" an IO operation into a new monad that wraps IO.
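To make the transformer idea concrete, here is a minimal runnable sketch. Note that LogT above is hypothetical; this sketch uses WriterT from the real transformers package instead, with liftIO bringing IO actions into the stack.

import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Writer (WriterT, runWriterT, tell)

-- pure log lines accumulate alongside real IO actions
appStep :: WriterT [String] IO ()
appStep = do
  tell ["waiting for input"]     -- pure logging, no IO
  line <- liftIO getLine         -- an IO action lifted into the stack
  tell ["got: " ++ line]

main :: IO ()
main = do
  ((), logs) <- runWriterT appStep
  mapM_ putStrLn logs            -- the log is printed only at the end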
This may sound pretty confusing, and it can be at first if you're used to languages like Python, Java, or C++. I certainly was confused! But after I got used to thinking about problems in this different way, I found myself wishing I had these constructs in OOP languages all the time.

I can answer from a Haskell perspective.
FP says a function should take one input and give a single output. So what should I do with void methods? They don't return anything, right?
Because that's what functions actually are! In mathematics, every function takes some input and gives you some output. You cannot expect some output without giving any input. The void methods you see in other languages don't make sense mathematically. But in reality, void methods in other languages do some kind of IO operation, which is abstracted as the IO monad in Haskell.
how do we handle log statements inside the method
You can use a monad transformer stack and lift your IO logging operations into it. In fact, the Writer monad can do logging purely, without any IO at all.
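As a minimal sketch of that pure approach (an invented example using Writer from the mtl package): the log is just an accumulated list of strings, and no IO happens until the caller decides to print it.

import Control.Monad.Writer (Writer, runWriter, tell)

-- a pure computation that logs as it goes; no IO anywhere
step :: Int -> Writer [String] Int
step x = do
  tell ["squaring " ++ show x]
  return (x * x)

main :: IO ()
main = do
  let (result, logs) = runWriter (step 3 >>= step)
  mapM_ putStrLn logs   -- "squaring 3", "squaring 9"
  print result          -- 81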

Related

'Auxiliary' function in Haskell

My lecturer at the moment has a strange habit I've not seen before, and I'm wondering if this is a Haskell standard or a quirk of his programming style.
Basically, he'll often do thing such as this:
functionEx :: String -> Int
functionEx s = functionExA s 0
functionExA :: String -> Int -> Int
functionExA s n = --function code
He calls these 'auxiliary' functions, and for the most part the only advantage I can see to these is to make a function callable with fewer supplied arguments. But most of these are hidden away in code anyway and in my view adding the argument to the original call is much more readable.
As I said, I'm not suggesting my view is correct, I've just not seen it done like this before and would like to know if it's common in Haskell.
Yes, this is commonplace, and not only in functional programming. It's good practice to separate the interface of your code (in this case, the function signature: what arguments you have to pass) from the details of the implementation (the need for a counter or similar in recursive code).
In real-world programming, one manifestation of this is having default arguments or multiple overloads of one function. Another common way of doing this is returning or taking an instance of an interface instead of a particular class that implements that interface. In Java, this might mean returning a List from a method instead of ArrayList, even when you know that the code actually uses an ArrayList (where ArrayList implements the List interface). In Haskell, typeclasses often serve the same function.
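A small Haskell sketch of the same principle (an invented example): the function below depends only on the Foldable and Num interfaces rather than on a concrete container type, so callers can pass any container that implements them.

-- works for lists, Maybe, trees, ... anything Foldable
total :: (Foldable t, Num a) => t a -> a
total = foldr (+) 0

-- total [1, 2, 3]  == 6
-- total (Just 5)   == 5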
The "one argument which always should be zero at the start" pattern happens occasionally in the real world, but it's especially common in functional programming teaching, because you want to show how to write the same function in a recursive style vs. tail-recursive. The wrapper function is also important to demonstrate that both the implementations actually have the same result.
In Haskell, it's more common to use where as follows:
functionEx :: String -> Int
functionEx s = functionExA s 0 where
  functionExA s n = --function code
This way, even the existence of the "real" function is hidden from the external interface. There's no reason to expose the fact that this function is (say) tail-recursive with a count argument.
If the special case definition is used frequently, it can be an advantage to do this. For example, the sum function is just a special case of the fold function. So why don't we just use foldr (+) 0 [1, 2, 3] each time instead of sum [1,2,3]? Because sum is much more readable.
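In code, that relationship is just the following (a sketch; the actual Prelude definition differs in details):

sum' :: Num a => [a] -> a
sum' = foldr (+) 0

-- sum' [1, 2, 3] and foldr (+) 0 [1, 2, 3] both give 6,
-- but the named special case reads much better at call sites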

Practical difference between def f(x: Int) = x+1 and val f = (x: Int) => x+1 in Scala

I'm new to Scala and I'm having a problem understanding this. Why are there two syntaxes for the same concept, with neither being more efficient or shorter (merely from a typing standpoint; maybe they differ in behavior, which is what I'm asking)?
In Go the analogues have a practical difference - you can't forward-reference the lambda assigned to a variable, but you can reference a named function from anywhere. Scala blends these two if I understand it correctly: you can forward-reference any variable (please correct me if I'm wrong).
Please note that this question is not a duplicate of What is the difference between “def” and “val” to define a function.
I know that def evaluates the expression after = each time it is referenced/called, and val only once. But this is different because the expression in the val definition evaluates to a function.
It is also not a duplicate of Functions vs methods in Scala.
This question concerns the syntax of Scala, and is not asking about the difference between functions and methods directly. Even though the answers may be similar in content, it's still valuable to have this exact point cleared up on this site.
There are three main differences (that I know of):
1. Internal Representation
Function expressions (aka anonymous functions or lambdas) are represented in the generated bytecode as instances of one of the FunctionN traits (Function0, Function1, and so on), which means that function expressions are also objects. Method definitions, on the other hand, are native constructs on the JVM and have a special bytecode representation. How this impacts performance is hard to tell without profiling.
2. Reference Syntax
References to functions and methods have different syntaxes. You can't just say foo when you want to send the reference of a method as an argument to some other part of your code. You'll have to say foo _. With functions you can just say foo and things will work as intended. The syntax foo _ is effectively wrapping the call to foo inside an anonymous function.
3. Generics Support
Methods support type parametrization, functions do not. For example, there's no way to express the following using a function value:
def identity[A](a: A): A = a
The closest would be this, but it loses the type information:
val identity = (a: Any) => a
As an extension to Ionut's first point, it may be worth taking a quick look at http://www.scala-lang.org/api/current/#scala.Function1.
From my understanding, a function value as you described (i.e. val f = (x: Int) => x + 1) is an instance of the Function1 trait. The implication is that a function value consumes more memory than a method definition: methods are native to the JVM, so they can be resolved at compile time. The obvious cost of a Function is its memory consumption, but with it come added benefits such as composition with other Function objects.
If I understand correctly, the reason defs and lambdas can work together is that the Function1 trait has a self-type (T1) ⇒ R, which is implied by its apply() method: https://github.com/scala/scala/blob/v2.11.8/src/library/scala/Function1.scala#L36. (At least I THINK that's what's going on; please correct me if I'm wrong.) This is all just my own speculation, however. There's certain to be some extra compiler magic taking place underneath to allow method and function interoperability.

Concrete (code-) examples for benefits of dynamic programming languages

I am currently working on the design of a controlled experiment in which I hope to measure a benefit of dynamically typed programming languages compared to statically typed ones.
I am not looking for another "which one is better" debate here, as there are enough discussions on this topic (e.g. Dynamic type languages versus static type languages or What do people find so appealing about dynamic languages?). I also asked the same question on LtU, which ended up in another discussion. :-)
All these discussions mention some good points, but nearly all are missing some concrete code(!) examples that prove their point.
So my question is: can any of you give me some example code which directly reveals some benefits of a dynamically typed language, or shows a situation where a static type system is more an obstacle than a helpful tool?
Thanks in advance.
By definition: duck typing. But you want a more specific example, so let's say: building objects. In statically typed languages you need to declare a type or use built-in methods for runtime code generation; in dynamic languages you can just build objects ad hoc. For example, to create a mock in Groovy you can just write the one method you need to mock and pass it as an object: http://groovy.codehaus.org/Developer+Testing+using+Closures+instead+of+Mocks
In static languages people write big and complex libraries to make this easier, and it's still verbose and often has limitations.
There was a time, not long ago, when these were the kinds of choices we had to make: do I want it to be fast and easy to write the program, but sometimes have more runtime bugs? Or do I want to be protected from more of those bugs at compile time, but at the cost of wasting more time writing more boilerplate?
That's over now. Dynamic languages were extremely exciting back when they were the only way to avoid adding a lot more bureaucracy to your code, but dynamic typing is now being obsoleted by type inference. So you're actually right -- dynamic typing doesn't add value anymore, now that more powerful strong type systems are catching on.
Huh? Where? How?
Languages like Haskell, the MLs, Scala, Rust, and some of the .NET family combine strong typing with type inference so you get the best of both: everything has a strong, enforced type, but you don't have to actually write it down unless the compiler can't figure it out, which it usually can.
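A tiny sketch of what that looks like in practice (Haskell; the comments show the types GHC infers without any annotations):

-- no type signatures written, yet everything is fully, statically typed
double x = x * 2            -- inferred: double :: Num a => a -> a
greet name = "Hi " ++ name  -- inferred: greet :: [Char] -> [Char]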
Don't ignore them for being academic. Things have changed: those kinds of languages with powerful type systems have been steadily getting much more popular over the last few years. And I find it more and more rare to hear about a brand-new language these days that doesn't sport a combination of strong typing for safety with type inference for convenience, at least to some degree. Today's brand-new esoteric languages are next decade's "enterprise standards", so I think everyone will be using this best-of-both-worlds approach eventually.
Even old man Java is moving slowly in that direction! (examples: diamond operator; inferring the functional interface for lambda expressions)
Learning dynamic languages for the first time now is too late; it's as if, throughout the 1990s, you kept putting off replacing your tape deck with a CD player and then finally got around to buying one in 2002.
EDIT: I reread your question. I sympathize with your desire for concrete code. Here's a Rosetta stone: a generic procedure to check whether a list is sorted or not, and code that calls it. The code that calls it does the right thing every day and the wrong thing on leap days, as an example of how compile-time errors win big over runtime errors when you have a bug in rarely-executed code. The code is untested, just a sketch. In particular, I'm still learning Haskell (hence the zeal of the newly converted... :P) and I haven't used Python as heavily as I used to for a couple of years, so forgive any mistakes.
Traditional static typing:
// Java 7 - strong typing but no type inference
// Notice that you get safety, because the wrong usage is caught at compile time
// But you have to type a lot :-(
static interface Comparable<T> { boolean compare(T other); }
...
static <T extends Comparable<T>> boolean isListSorted(List<T> xs) {
    T prev = null;
    for (T x : xs) {
        if (prev != null && !prev.compare(x)) return false;
        prev = x;
    }
    return true;
}
...
public final class String implements Comparable<String> { ... }
public final class Animal /* definitely does not implement Comparable! :-) */ { ... }
...
// proper usage
List<String> names = Lists.newArrayList("Red", "Spinelli", "Ankh", "Morporkh", "Sam");
boolean areTheNamesSorted = isListSorted(names);
if (todayIsALeapDay()) {
    // bad usage -- the compiler catches it, so you don't have to worry about this happening
    // in production when a user enters data that sends your app down a rarely-touched codepath
    List<Animal> pets = Lists.newArrayList(Animal.getCat(), Animal.getDog(), Animal.getBird());
    boolean areThePetsSorted = isListSorted(pets); // compile-time error!
}
Dynamic typing:
class Animal:
    ...

# Python 2.6 -- dynamic typing
# notice how few keystrokes are needed
# but the bug is now caught only at runtime (see below)
def isListSorted(xs):
    # to keep it more similar to the Java 7 version, zip is avoided
    prev = None
    for x in xs:
        if prev is not None and not prev.compare(x): return False
        prev = x
    return True

...

# usage -- beautiful, isn't it?
names = ["Raph", "Argh", "Marge"]
areNamesSorted = isListSorted(names)

if isLeapDayToday():
    # ...but here comes trouble in paradise!
    animals = [Animal.getCat(), Animal.getDog()]
    # raises a runtime exception. I hope your unit tests catch it, but unlike type
    # systems it's hard to make absolutely sure you'll be alerted if you forget a test.
    # besides, then you've just traded the Evil of Having to Write Lots of Type Signatures
    # for the Evil of Having to Write Lots of Unit Tests. What kind of terrible deal is that?
    areAnimalsSorted = isListSorted(animals)
Powerful typing:
-- Haskell - full safety of strong typing, but easy on the fingers and eyes
import Prelude hiding (compare)

class Comparable a where
  compare :: a -> a -> Bool

instance Comparable String where
  ...

data Animal = ...

isListSorted []         = True
isListSorted [x]        = True
isListSorted (x1:x2:xs) = compare x1 x2 && isListSorted (x2:xs)

names = ["Ralph", "Argh", "Blarge"]
areNamesSorted = isListSorted names

animals = [Cat, Dog, Parrot]
-- compile time error because [Animal] is not compatible with Comparable a => [a] since
-- Animal is not an instance of Comparable
areAnimalsSorted = isListSorted animals
Verbose powerful typing:
-- Same Haskell program as before, but we explicitly write down the types
-- Just for educational purposes. Like comments, you can omit them if you don't think
-- the reader needs them because it's obvious. Unlike comments, the compiler verifies their truth!
import Prelude hiding (compare)

class Comparable a where
  compare :: a -> a -> Bool

instance Comparable String where
  ...

data Animal = Cat | Dog | Parrot

isListSorted :: Comparable a => [a] -> Bool
isListSorted []         = True
isListSorted [x]        = True
isListSorted (x1:x2:xs) = compare x1 x2 && isListSorted (x2:xs)

names :: [String]
names = ["Ralph", "Argh", "Blarge"]
areNamesSorted = isListSorted names

animals :: [Animal]
animals = [Cat, Dog, Parrot]
-- compile time error because [Animal] is not compatible with Comparable a => [a] since
-- Animal is not an instance of Comparable
areAnimalsSorted = isListSorted animals

How to override a structure constructor in fortran

Is it currently possible to override the structure constructor in Fortran? I have seen proposed examples like this (such as in the Fortran 2003 spec):
module mymod
  type mytype
    integer :: x
    ! Other stuff
  end type

  interface mytype
    module procedure init_mytype
  end interface

contains

  type(mytype) function init_mytype(i)
    integer, intent(in) :: i
    if (i > 0) then
      init_mytype%x = 1
    else
      init_mytype%x = 2
    end if
  end function
end

program test
  use mymod
  type(mytype) :: x
  x = mytype(0)
end program
This basically generates a heap of errors due to redundant variable names (e.g. Error: DERIVED attribute of 'mytype' conflicts with PROCEDURE attribute at (1)). A verbatim copy of the fortran 2003 example generates similar errors. I've tried this in gfortran 4.4, ifort 10.1 and 11.1 and they all produce the same errors.
My question: is this just an unimplemented feature of fortran 2003? Or am I implementing this incorrectly?
Edit: I've come across a bug report and an announced patch to gfortran regarding this issue. However, I've tried using a November build of gcc46 with no luck and similar errors.
Edit 2: The above code appears to work using Intel Fortran 12.1.0.
Is it currently possible to override the structure constructor in Fortran?
No. And anyway, your approach is not really constructor overriding. The main reason is that a structure constructor is not the same thing as an OOP constructor. There is some similarity, but they are different ideas.
You cannot use your non-intrinsic function in an initialization expression. You can use only constants, array or structure constructors, intrinsic functions, ... For more information take a look at section 7.1.7, Initialization expression, in the Fortran 2003 draft.
Taking that fact into account, I completely fail to see the real difference between
type(mytype) :: x
x = mytype(0)
and
type(mytype) :: x
x = init_mytype(0)
and what the point is of using an INTERFACE block inside the mymod MODULE.
Well, honestly speaking, there is a difference, and a huge one: the first way is misleading. This function is not a constructor (because there are no OOP constructors at all in Fortran); it is an initializer.
In mainstream OOP constructor is responsible for sequentially doing two things:
Memory allocation.
Member initialization.
Let's take a look at some examples of instantiating classes in different languages.
In Java:
MyType mt = new MyType(1);
a very important fact is hidden - the fact that the object is actually a pointer to a variable of a class type. The equivalent in C++ would be allocation on the heap using:
MyType* mt = new MyType(1);
But in both languages one can see that two constructor duties are reflected even at syntax level. It consists of two parts: keyword new (allocation) and constructor name (initialization). In Objective-C syntax this fact is even more emphasized:
MyType* mt = [[MyType alloc] init:1];
Many times, however, you can see some other form of constructor invocation. For allocation on the stack, C++ uses a special (very poor) syntax construction
MyType mt(1);
which is actually so misleading that we can just not consider it.
In Python
mt = MyType(1)
both the fact that the object is actually a pointer and the fact that allocation takes place first are hidden (at the syntax level). And this method is called ... __init__! O_O So misleading. C++ stack allocation fades in comparison with that one. =)
Anyway, the idea of having constructors in a language implies the ability to do allocation and initialization in one statement using some special kind of method. And if you think that this is the "true OOP" way, I have bad news for you. Even Smalltalk doesn't have constructors; it is just a convention to have a new method on the classes themselves (they are singleton objects of metaclasses). The Factory design pattern is used in many other languages to achieve the same goal.
I read somewhere that the concept of modules in Fortran was inspired by Modula-2. And it seems to me that the OOP features were inspired by Oberon-2. There are no constructors in Oberon-2 either. But there is of course pure allocation with the predeclared procedure NEW (like ALLOCATE in Fortran, except that ALLOCATE is a statement). After allocation you can (and in practice should) call some initializer, which is just an ordinary method. Nothing special there.
So you can use some sort of factory to initialize objects. That is what you actually did, using modules instead of singleton objects. Or it's better to say that they (Java/C#/... programmers) use singleton-object methods instead of ordinary functions due to the lack of the latter (no modules means no way to have ordinary functions, only methods).
You can also use a type-bound SUBROUTINE instead.
MODULE mymod
  TYPE mytype
    PRIVATE
    INTEGER :: x
  CONTAINS
    PROCEDURE, PASS :: init
  END TYPE
CONTAINS
  SUBROUTINE init(this, i)
    CLASS(mytype), INTENT(OUT) :: this
    INTEGER, INTENT(IN) :: i
    IF (i > 0) THEN
      this%x = 1
    ELSE
      this%x = 2
    END IF
  END SUBROUTINE init
END

PROGRAM test
  USE mymod
  TYPE(mytype) :: x
  CALL x%init(1)
END PROGRAM
INTENT(OUT) for the this argument of the init SUBROUTINE seems fine, because we expect this method to be called only once, right after allocation. It might be a good idea to verify that this assumption is not violated: add a boolean flag LOGICAL :: inited to mytype, check that it is .false. and set it to .true. upon first initialization, and do something else on any attempt at re-initialization. I definitely remember some thread about this in Google Groups... I cannot find it.
I consulted my copy of the Fortran 2008 standard. That does allow you to define a generic interface with the same name as a derived type. My compiler (Intel Fortran 11.1) won't compile the code though so I'm left suspecting (without a copy of the 2003 standard to hand) that this is an as-yet-unimplemented feature of the Fortran 2003 standard.
Besides that, there is an error in your program. Your function declaration:
type(mytype) function init_mytype
integer, intent(in) :: i
specifies the existence and intent of an argument which is not present in the function specification, which should perhaps be rewritten as:
type(mytype) function init_mytype(i)

Conditional logging with minimal cyclomatic complexity

After reading "What’s your/a good limit for cyclomatic complexity?", I realized many of my colleagues were quite annoyed with a new QA policy on our project: no more than 10 cyclomatic complexity per function.
Meaning: no more than 10 'if', 'else', 'try', 'catch', and other code-workflow branching statements. Right. As I explained in 'Do you test private methods?', such a policy has many good side effects.
But: at the beginning of our (200-person, 7-year) project, we were happily logging (and no, we cannot easily delegate that to some kind of 'aspect-oriented programming' approach for logs).
myLogger.info("A String");
myLogger.fine("A more complicated String");
...
And when the first versions of our system went live, we experienced huge memory problems, not because of the logging (which was at one point turned off), but because of the log parameters (the strings), which were always calculated and then passed to the 'info()' or 'fine()' functions, only to discover that the level of logging was 'OFF' and that no logging was taking place!
So QA came back and urged our programmers to do conditional logging. Always.
if (myLogger.isLoggable(Level.INFO)) { myLogger.info("A String"); }
if (myLogger.isLoggable(Level.FINE)) { myLogger.fine("A more complicated String"); }
...
But now, with that 'cannot-be-moved' limit of 10 cyclomatic complexity per function, they argue that the various logs they put in their functions are felt as a burden, because each "if(isLoggable())" counts as +1 cyclomatic complexity!
So if a function has 8 'if', 'else' and so on, in one tightly-coupled not-easily-shareable algorithm, and 3 critical log actions... they breach the limit even though the conditional logs may not be really part of said complexity of that function...
How would you address this situation?
I have seen a couple of interesting coding evolutions (due to that 'conflict') in my project, but I just want to get your thoughts first.
Thank you for all the answers.
I must insist that the problem is not 'formatting' related, but 'argument evaluation' related (evaluation that can be very costly to do, just before calling a method which will do nothing)
So when I wrote above "A String", I actually meant aFunction(), with aFunction() returning a String and being a call to a complicated method collecting and computing all kinds of log data to be displayed by the logger... or not (hence the issue, and the obligation to use conditional logging, hence the actual issue of the artificial increase in 'cyclomatic complexity'...)
I now get the 'variadic function' point advanced by some of you (thank you John).
Note: a quick test in Java 6 shows that my varargs function does evaluate its arguments before being called, so this cannot be applied to the function call itself, but rather to a 'log retriever object' (or 'function wrapper') on which toString() will only be called if needed. Got it.
I have now posted my experience on this topic.
I will leave it there until next Tuesday for voting, then I will select one of your answers.
Again, thank you for all the suggestions :)
With current logging frameworks, the question is moot
Current logging frameworks like slf4j or log4j 2 don't require guard statements in most cases. They use a parameterized log statement so that an event can be logged unconditionally, but message formatting only occurs if the event is enabled. Message construction is performed as needed by the logger, rather than pre-emptively by the application.
If you have to use an antique logging library, you can read on to get more background and a way to retrofit the old library with parameterized messages.
Are guard statements really adding complexity?
Consider excluding logging guard statements from the cyclomatic complexity calculation.
It could be argued that, due to their predictable form, conditional logging checks really don't contribute to the complexity of the code.
Inflexible metrics can make an otherwise good programmer turn bad. Be careful!
Assuming that your tools for calculating complexity can't be tailored to that degree, the following approach may offer a work-around.
The need for conditional logging
I assume that your guard statements were introduced because you had code like this:
private static final Logger log = Logger.getLogger(MyClass.class);

Connection connect(Widget w, Dongle d, Dongle alt)
    throws ConnectionException
{
    log.debug("Attempting connection of dongle " + d + " to widget " + w);
    Connection c;
    try {
        c = w.connect(d);
    } catch (ConnectionException ex) {
        log.warn("Connection failed; attempting alternate dongle " + alt, ex);
        c = w.connect(alt);
    }
    log.debug("Connection succeeded: " + c);
    return c;
}
In Java, each of the log statements creates a new StringBuilder, and invokes the toString() method on each object concatenated to the string. These toString() methods, in turn, are likely to create StringBuilder instances of their own, and invoke the toString() methods of their members, and so on, across a potentially large object graph. (Before Java 5, it was even more expensive, since StringBuffer was used, and all of its operations are synchronized.)
This can be relatively costly, especially if the log statement is in some heavily-executed code path. And, written as above, that expensive message formatting occurs even if the logger is bound to discard the result because the log level is too high.
This leads to the introduction of guard statements of the form:
if (log.isDebugEnabled())
    log.debug("Attempting connection of dongle " + d + " to widget " + w);
With this guard, the evaluation of arguments d and w and the string concatenation is performed only when necessary.
A solution for simple, efficient logging
However, if the logger (or a wrapper that you write around your chosen logging package) takes a formatter and arguments for the formatter, the message construction can be delayed until it is certain that it will be used, while eliminating the guard statements and their cyclomatic complexity.
public final class FormatLogger
{
    private final Logger log;

    public FormatLogger(Logger log)
    {
        this.log = log;
    }

    public void debug(String formatter, Object... args)
    {
        log(Level.DEBUG, formatter, args);
    }

    // … &c. for info, warn; also add overloads to log an exception …

    public void log(Level level, String formatter, Object... args)
    {
        if (log.isEnabled(level)) {
            /*
             * Only now is the message constructed, and each "arg"
             * evaluated by having its toString() method invoked.
             */
            log.log(level, String.format(formatter, args));
        }
    }
}
class MyClass
{
    private static final FormatLogger log =
        new FormatLogger(Logger.getLogger(MyClass.class));

    Connection connect(Widget w, Dongle d, Dongle alt)
        throws ConnectionException
    {
        log.debug("Attempting connection of dongle %s to widget %s.", d, w);
        Connection c;
        try {
            c = w.connect(d);
        } catch (ConnectionException ex) {
            log.warn("Connection failed; attempting alternate dongle %s.", alt);
            c = w.connect(alt);
        }
        log.debug("Connection succeeded: %s", c);
        return c;
    }
}
Now, none of the cascading toString() calls with their buffer allocations will occur unless they are necessary! This effectively eliminates the performance hit that led to the guard statements. One small penalty, in Java, would be auto-boxing of any primitive type arguments you pass to the logger.
The code doing the logging is arguably even cleaner than ever, since untidy string concatenation is gone. It can be even cleaner if the format strings are externalized (using a ResourceBundle), which could also assist in maintenance or localization of the software.
Further enhancements
Also note that, in Java, a MessageFormat object could be used in place of a "format" String, which gives you additional capabilities such as a choice format to handle cardinal numbers more neatly. Another alternative would be to implement your own formatting capability that invokes some interface that you define for "evaluation", rather than the basic toString() method.
In Python you pass the formatted values as parameters to the logging function. String formatting is only applied if logging is enabled. There's still the overhead of a function call, but that's minuscule compared to formatting.
log.info("a = %s, b = %s", a, b)
You can do something like this for any language with variadic arguments (C/C++, C#/Java, etc).
This isn't really intended for when the arguments are difficult to retrieve, but for when formatting them to strings is expensive. For example, if your code already has a list of numbers in it, you might want to log that list for debugging. Executing mylist.toString() will take a while to no benefit, as the result will be thrown away. So you pass mylist as a parameter to the logging function, and let it handle string formatting. That way, formatting will only be performed if needed.
Since the OP's question specifically mentions Java, here's how the above can be used:
I must insist that the problem is not 'formatting' related, but 'argument evaluation' related (evaluation that can be very costly to do, just before calling a method which will do nothing)
The trick is to have objects that will not perform expensive computations until absolutely needed. This is easy in languages like Smalltalk or Python that support lambdas and closures, but is still doable in Java with a bit of imagination.
Say you have a function get_everything(). It will retrieve every object from your database into a list. You don't want to call this if the result will be discarded, obviously. So instead of using a call to that function directly, you define an inner class called LazyGetEverything:
public class MainClass {

    private class LazyGetEverything {
        @Override
        public String toString() {
            return getEverything().toString();
        }
    }

    private Object getEverything() {
        /* returns what you want to .toString() in the inner class */
    }

    public void logEverything() {
        log.info(new LazyGetEverything());
    }
}
In this code, the call to getEverything() is wrapped so that it won't actually be executed until it's needed. The logging function will execute toString() on its parameters only if debugging is enabled. That way, your code will suffer only the overhead of a function call instead of the full getEverything() call.
In languages supporting lambda expressions or code blocks as parameters, one solution would be to pass just that to the logging method. The logger could evaluate the configuration and, only if needed, actually call/execute the provided lambda/code block.
Did not try it yet, though.
Theoretically this is possible. I would not like to use it in production due to the performance issues I expect with heavy use of lambdas/code blocks for logging.
But as always: if in doubt, test it and measure the impact on cpu load and memory.
Thank you for all your answers! You guys rock :)
Now my feedback is not as straight-forward as yours:
Yes, for one project (as in 'one program deployed and running on its own on a single production platform'), I suppose you can go all technical on me:
dedicated 'log retriever' objects, which can be passed to a Logger wrapper that only calls toString() when necessary,
used in conjunction with a logging variadic function (or a plain Object[] array!),
and there you have it, as explained by @John Millikin and @erickson.
However, this issue forced us to think a little about 'Why exactly were we logging in the first place?'
Our project is actually 30 different projects (5 to 10 people each) deployed on various production platforms, with asynchronous communication needs and a central bus architecture.
The simple logging described in the question was fine for each project at the beginning (5 years ago), but since then we have had to step up. Enter the KPI.
Instead of asking a logger to log anything, we ask an automatically created object (called a KPI) to register an event. It is a simple call (myKPI.I_am_signaling_myself_to_you()), and does not need to be conditional (which solves the 'artificial increase of cyclomatic complexity' issue).
That KPI object knows who called it, and since it runs from the beginning of the application, it is able to retrieve lots of data that we previously computed on the spot when we were logging.
Plus, that KPI object can be monitored independently, and it can compute/publish its information on demand on a single, separate publication bus.
That way, each client can ask for the information it actually wants (like 'has my process begun, and if yes, since when?') instead of looking for the correct log file and grepping for a cryptic string...
Indeed, the question 'Why exactly were we logging in the first place?' made us realize we were not logging just for the programmer and his unit or integration tests, but for a much broader community, including some of the final clients themselves. Our 'reporting' mechanism had to be centralized, asynchronous, 24/7.
The specifics of that KPI mechanism are way out of the scope of this question. Suffice it to say that its proper calibration is by far, hands down, the single most complicated non-functional issue we are facing. It still does bring the system to its knees from time to time! Properly calibrated, however, it is a life-saver.
Again, thank you for all the suggestions. We will consider them for the parts of our system where simple logging is still in place.
But the other point of this question was to illustrate to you a specific problem in a much larger and more complicated context.
Hope you liked it. I might ask a question on KPIs (which, believe it or not, are not the subject of any question on SO so far!) later next week.
I will leave this answer up for voting until next Tuesday, then I will select an answer (not this one obviously ;) )
Maybe this is too simple, but what about using the "extract method" refactoring around the guard clause? Your example code of this:
public void Example()
{
    if (myLogger.isLoggable(Level.INFO))
        myLogger.info("A String");
    if (myLogger.isLoggable(Level.FINE))
        myLogger.fine("A more complicated String");
    // +1 for each test and log message
}
Becomes this:
public void Example()
{
    _LogInfo();
    _LogFine();
    // +0 for each test and log message
}

private void _LogInfo()
{
    if (!myLogger.isLoggable(Level.INFO))
        return;

    // Do your complex argument calculations/evaluations only when needed.
}

private void _LogFine() { /* Ditto ... */ }
In C or C++ I'd use the preprocessor instead of the if statements for the conditional logging.
Pass the log level to the logger and let it decide whether or not to write the log statement:
// if (myLogger.isLoggable(Level.INFO)) { myLogger.info("A String"); }
myLogger.info(Level.INFO, "A String");
UPDATE: Ah, I see that you want to conditionally create the log string without a conditional statement. Presumably at runtime rather than compile time.
I'll just say that the way we've solved this is to put the formatting code in the logger class so that the formatting only takes place if the level passes. Very similar to a built-in sprintf. For example:
myLogger.info(Level.INFO,"A String %d",some_number);
That should meet your criteria.
Conditional logging is evil. It adds unnecessary clutter to your code.
You should always send in the objects you have to the logger:
Logger logger = ...
logger.log(Level.DEBUG,"The foo is {0} and the bar is {1}",new Object[]{foo, bar});
and then have a java.util.logging.Formatter that uses MessageFormat to flatten foo and bar into the string to be output. It will only be called if the logger and handler will log at that level.
For added pleasure you could have some kind of expression language to be able to get fine control over how to format the logged objects (toString may not always be useful).
Scala has an annotation @elidable that allows you to remove methods with a compiler flag.
With the scala REPL:
C:>scala
Welcome to Scala version 2.8.0.final (Java HotSpot(TM) 64-Bit Server VM, Java 1.
6.0_16).
Type in expressions to have them evaluated.
Type :help for more information.
scala> import scala.annotation.elidable
import scala.annotation.elidable
scala> import scala.annotation.elidable._
import scala.annotation.elidable._
scala> @elidable(FINE) def logDebug(arg :String) = println(arg)
logDebug: (arg: String)Unit
scala> logDebug("testing")
scala>
With -Xelide-below set:
C:>scala -Xelide-below 0
Welcome to Scala version 2.8.0.final (Java HotSpot(TM) 64-Bit Server VM, Java 1.
6.0_16).
Type in expressions to have them evaluated.
Type :help for more information.
scala> import scala.annotation.elidable
import scala.annotation.elidable
scala> import scala.annotation.elidable._
import scala.annotation.elidable._
scala> @elidable(FINE) def logDebug(arg :String) = println(arg)
logDebug: (arg: String)Unit
scala> logDebug("testing")
testing
scala>
See also Scala assert definition
As much as I hate macros in C/C++, at work we have #defines for the if part, which if false ignores (does not evaluate) the following expressions, but if true returns a stream into which stuff can be piped using the '<<' operator.
Like this:
LOGGER(LEVEL_INFO) << "A String";
I assume this would eliminate the extra 'complexity' that your tool sees, and also eliminates any calculating of the string, or any expressions to be logged if the level was not reached.
Here is an elegant solution using a ternary expression:
logger.info(logger.isInfoEnabled() ? "Log Statement goes here..." : null);
Consider a logging util function ...
void debugUtil(String s, Object... args) {
    if (LOG.isDebugEnabled())
        LOG.debug(s, args);
}
Then make the call with a "closure" around the expensive evaluation that you want to avoid:
debugUtil("We got a %s", new Object() {
    @Override
    public String toString() {
        // only evaluated if the debug statement is executed
        return expensiveCallToGetSomeValue().toString();
    }
});