I would like to understand good code optimization methods and methodology.
How do I keep from doing premature optimization if I am thinking about performance already?
How do I find the bottlenecks in my code?
How do I make sure that over time my program does not become any slower?
What are some common performance errors to avoid? (For example, I know it is bad in some languages to return from inside the catch portion of a try{} catch{} block.)
For the kinds of optimizations you are suggesting, you should write your code for clarity and not optimize it until you have proof that it is a bottleneck.
One danger of attempting micro-optimizations like this is that you will likely make things slower, because the compiler is smarter than you are a lot of the time.
Take your "optimization":
const int windowPosX = (screenWidth * 0.5) - (windowWidth * 0.5);
There is no serious compiler in the world that doesn't know that the fastest way to divide by two is to shift right by one. Multiplying by floating-point 0.5 is actually more expensive, because it requires converting to floating-point and back, and doing two multiplies (which are more expensive than shifts).
But don't take my word for it. Look at what the compiler actually does. gcc 4.3.3 on 32-bit Ubuntu (-O3, -msse3, -fomit-frame-pointer) compiles this:
int posx(unsigned int screen_width, unsigned int window_width) {
return (screen_width / 2) - (window_width / 2);
}
to this:
00000040 <posx>:
40: 8b 44 24 04 mov eax,DWORD PTR [esp+0x4]
44: 8b 54 24 08 mov edx,DWORD PTR [esp+0x8]
48: d1 e8 shr eax,1
4a: d1 ea shr edx,1
4c: 29 d0 sub eax,edx
4e: c3 ret
Two shifts (using an immediate operand) and a subtract. Very cheap. On the other hand, it compiles this:
int posx(unsigned int screen_width, unsigned int window_width) {
return (screen_width * 0.5) - (window_width * 0.5);
}
to this:
00000000 <posx>:
0: 83 ec 04 sub esp,0x4
3: 31 d2 xor edx,edx
5: 8b 44 24 08 mov eax,DWORD PTR [esp+0x8]
9: 52 push edx
a: 31 d2 xor edx,edx
c: 50 push eax
d: df 2c 24 fild QWORD PTR [esp]
10: 83 c4 08 add esp,0x8
13: d8 0d 00 00 00 00 fmul DWORD PTR ds:0x0
15: R_386_32 .rodata.cst4
19: 8b 44 24 0c mov eax,DWORD PTR [esp+0xc]
1d: 52 push edx
1e: 50 push eax
1f: df 2c 24 fild QWORD PTR [esp]
22: d8 0d 04 00 00 00 fmul DWORD PTR ds:0x4
24: R_386_32 .rodata.cst4
28: de c1 faddp st(1),st
2a: db 4c 24 08 fisttp DWORD PTR [esp+0x8]
2e: 8b 44 24 08 mov eax,DWORD PTR [esp+0x8]
32: 83 c4 0c add esp,0xc
35: c3 ret
What you're seeing is conversion to floating-point, multiplication by a value from the data segment (which may or may not be in cache), and conversion back to integer.
Please think of this example when you're tempted to perform micro-optimizations like this. Not only is it premature, but it might not help at all (in this case it significantly hurt!)
Seriously: don't do it. I think a golden rule is never to do optimizations like this unless you routinely inspect your compiler's output as I have done here.
1. Don't think about performance, think about clarity and correctness.
2. Use a profiler.
3. Keep using a profiler.
4. See 1.
EDIT This answer originally appeared in another question (that has been merged) where the OP listed some possible optimization techniques that he just assumed must work. All of them relied heavily on assumptions (such as x << 1 always being faster than x * 2). Below I am trying to point out the danger of such assumptions.
Since all of your points are probably wrong this shows the danger of such premature and trivial optimizations. Leave such decisions to the compiler, unless you know very well what you are doing and that it matters.
Otherwise it – just – doesn’t – matter.
Much more important (and not at all premature) are optimizations in the general program structure. For example, it is probably very bad to re-generate the same large batch of data over and over because it’s needed in many places. Instead, some thought has to be put into the design to allow sharing this data and thus only calculating it once.
It’s also very important to know the domain you’re working in. I come from a bioinformatics background and do a lot of hardcore algorithm work in C++. I often deal with huge amounts of data. At the moment, though, I’m creating an application in Java, and I cringe every time I create a copy of a container because I’m conditioned to avoid such operations at all costs. But for my fancy Java GUI those operations are completely trivial and not one bit noticeable to the user. Oh well, I’ve just got to get over myself.
By the way:
Initialise constants when you declare them (if you can) …
Well, in many languages (e.g. C++) constants (i.e. identifiers marked as const) have to be initialized upon definition, so you don’t actually have a choice in the matter. However, it is a good idea to follow that rule, not only for constants but in general. The reason isn’t necessarily performance, though. It’s just much cleaner code because it clearly binds each identifier to a purpose instead of letting it fly around.
The rules of Optimization Club:
1. The first rule of Optimization Club is, you do not Optimize.
2. The second rule of Optimization Club is, you do not Optimize without measuring.
3. If your app is running faster than the underlying transport protocol, the optimization is over.
4. One factor at a time.
5. No marketroids, no marketroid schedules.
6. Testing will go on as long as it has to.
7. If this is your first night at Optimization Club, you have to write a test case.
http://xoa.petdance.com/Rules_of_Optimization_Club
Rule #3 is the one that trips most people up. It doesn't matter how fast your calculations are if your program sits waiting for disk writes or network transfer.
Rules #6 and #7: Always have tests. If you're optimizing, you're refactoring, and you don't want to be refactoring without having a solid test suite.
It's always good to remember what things "cost". Some C# examples:
String concatenation always creates a new string, as strings are immutable. Therefore, for repeated concatenations, a StringBuilder is more efficient (a quick sketch follows below).
Repeated or large memory allocations are generally something you should watch for.
Exceptions are very expensive to throw. This is one of the reasons why exceptions should only be used for exceptional situations.
Most things beyond this are premature optimization. Use a profiler if speed matters.
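To make the first point concrete, here's a minimal sketch in Java rather than C# (Java strings are also immutable, so the same reasoning applies); the loop count is arbitrary:

public class ConcatDemo {
    public static void main(String[] args) {
        // Repeated += creates a brand-new String on every pass, copying everything
        // appended so far, so the total work grows roughly quadratically.
        String slow = "";
        for (int i = 0; i < 10_000; i++) {
            slow += i;
        }

        // A StringBuilder appends into one growable buffer, keeping the work roughly linear.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append(i);
        }
        String fast = sb.toString();

        System.out.println(slow.length() + " == " + fast.length());
    }
}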
Regarding your "optimizations":
I doubt that floating point arithmetic (* 0.5) is faster than an integer division (/ 2).
If you need an array of size 300 you should initialize an array of size 300. There's nothing "magic" about powers of 2 that make arrays of size 256 more efficient.
"requires 2 calls in the code" is wrong.
Make sure you have clearly defined performance goals and tests that measure against those goals so you can quickly find out if you even have a problem.
Think about performance more from a design perspective than from a coding perspective - optimizing a poor-performing design just results in faster slow code
When you do have a performance problem, use a tool such as a profiler to identify the problem - you can guess where your bottlenecks are and usually guess wrong.
Fix performance problems early in development rather than putting them off - as time goes on and features make it into the product fixing perf issues will only become more and more difficult.
We had a subcontractor write us a non-trivial amount of code for a non-trivial program. Apparently they always tested with a trivial amount of data. So...
Use a profiler.
Use a non-trivial amount of data when testing. If possible, make that a humongous amount of data.
Use sane algorithms when things become tight.
Use a profiler to check that whatever "optimization" you've done actually behaves as intended; see, for example, the recent "java jar" fiasco where an O(1) operation was implemented as O(n).
The number one rule I use is, DRY (Don't Repeat Yourself). I find that this rule does a good job of highlighting problem areas that can be fixed without hurting the clarity of the program. It also makes it easier to fix bottlenecks once you discover them.
Early optimization isn't always premature - it's bad only if you hurt other interests (readability, maintenance, time to implement, code size, ...) without justification.
On Stack Overflow, early optimization is the new goto; don't get discouraged by that. Any decision that goes wrong early is hard to fix later. Optimization is special only because experience shows it can often be fixed locally, whereas sucky code requires large-scale changes.
Sorry for the rant, now for your actual question:
Know your environment!
This includes all the low-level details, e.g. non-linear memory access costs, what the compiler can and cannot optimize, and so on. There's a lot here you shouldn't fret about too much; the trick is simply to be aware of it.
Measure measure measure!
The results of actual optimization attempts are often surprising, especially if you vary seemingly unrelated factors. It is also the best way to develop a relaxed attitude towards performance - most of the time it really doesn't matter.
Think about algorithms before you think about implementation details.
Most low level optimizations give you a factor of 1.1, a different algorithm can give you a factor of 10. A good (!) caching strategy can give you a factor of 100. Figuring out that you really don't need to make the call gives you Warp 10.
This usually leads me to thinking about how to organize the data: what are frequent operations that are potential bottlenecks, or scalability issues?
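To illustrate the caching point with a hedged sketch (the class and method names are made up, and the factor-of-100 figure is of course workload-dependent): memoize an expensive, repeatedly requested result so it is computed only once.

import java.util.HashMap;
import java.util.Map;

public class ResultCache {
    private final Map<String, double[]> cache = new HashMap<>();

    // Returns the cached result if we have already computed it for this key,
    // otherwise computes it once and remembers it.
    double[] get(String key) {
        return cache.computeIfAbsent(key, this::expensiveComputation);
    }

    // Stand-in for a "large batch of data" that would otherwise be regenerated
    // every time some part of the program asks for it.
    private double[] expensiveComputation(String key) {
        double[] data = new double[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = Math.sin(i + key.hashCode());
        }
        return data;
    }

    public static void main(String[] args) {
        ResultCache cache = new ResultCache();
        double[] first = cache.get("reportA");   // computed
        double[] again = cache.get("reportA");   // served from the cache
        System.out.println(first == again);      // true: same array instance, no recomputation
    }
}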
Write readable code showing intent. Do not microoptimize - you cannot outsmart the JIT.
Learn to use a profiler, e.g. jvisualvm in the Sun JDK, and use it.
I would say that little optimisations you can make on the go are exactly those that do not make much sense. If you want to optimise, write the code first, then profile and only then optimise the parts that take too much time. And even then it’s usually algorithms that need optimization, not the actual code.
I'd even say for integral types, instead of multiplying by 0.5 you could just shift them one bit to the right (not forgetting signed/unsigned shifting).
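For what it's worth, a small Java sketch of that, with the caveat that >> is an arithmetic (sign-preserving) shift, so it rounds differently from integer division for negative values, and a decent JIT will do this strength reduction for you anyway:

public class HalfDemo {
    public static void main(String[] args) {
        int screenWidth = 1920;
        int windowWidth = 800;

        // Plain integer division: clear, and the JIT can turn it into a shift itself.
        int posA = (screenWidth / 2) - (windowWidth / 2);

        // Explicit shift: same result for non-negative values.
        int posB = (screenWidth >> 1) - (windowWidth >> 1);

        System.out.println(posA + " " + posB);          // 560 560

        // The signed/unsigned caveat: division truncates toward zero,
        // while an arithmetic shift rounds toward negative infinity.
        System.out.println((-3 / 2) + " " + (-3 >> 1)); // -1 -2
    }
}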
I am quite sure that, at least in the case of C#, the compiler will optimize a lot.
for example:
Guid x = Guid.NewGuid();
as well as
Guid x; x = Guid.NewGuid();
both translate to the following CIL:
call System.Guid.NewGuid
stloc.0
Constant expressions such as (4 + 5) are precalculated, as are string concatenations like "Hello " + "World".
I would primarily concentrate on readable and maintainable code though. It's highly unlikely that such microoptimizations make a big difference except for special edge cases.
In my case, the most gain in (sometimes perceived) performance were things like the following:
If you fetch data from a database or the internet, use a separate thread
If you load large XML Files, do it on a separate thread.
Of course, that's only my experience from C#, but input/output operations are common bottlenecks (see the sketch below).
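Here is a rough Java sketch of the same idea (the file name is a placeholder, and in a real GUI you would marshal the result back onto the UI thread):

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundLoad {
    public static void main(String[] args) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();

        // Kick off the slow I/O on a worker thread so the main/UI thread stays responsive.
        Future<String> xml = worker.submit(() -> Files.readString(Path.of("large-file.xml")));

        // ... keep the UI responsive here: paint, show a progress indicator, etc. ...

        String content = xml.get();   // block only at the point the result is actually needed
        System.out.println("Loaded " + content.length() + " characters");
        worker.shutdown();
    }
}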
This takes experience, I'm afraid. When you conceive of solutions to a problem, you may think in terms of class hierarchies, or you may think in terms of what information goes in, what comes out, how long does it need to be persistent in between. I recommend the latter.
In any case, what people have said is mostly good advice - keep it clean and simple, and get rid of performance problems as they come in, because they will come in.
Where I part company is I don't find measurement very helpful for locating performance problems compared to this method.
But whatever method you use, hopefully experience will teach you what NOT to do in developing software. I've been solving performance problems for a long time, and these days, the single most popular performance killer is galloping generality. Nobody likes to hear their favorite beliefs questioned, but time after time, especially in big software, what is killing performance is using bazookas to swat flies.
Oddly enough, the reason often given for this over-design is guess what? Performance.
In whatever venue you may have learned to program, chances are you've learned all about academic things like sophisticated data structures, abstract class hierarchies, tricky compiler optimization techniques - all the latest stuff that's fun and interesting to know, and that I like as much as anybody. What they didn't teach you is when to use it, which is almost never.
So what I recommend you do is: Get the experience. It is the best teacher.
There are certainly optimizations you can do as you go, such as passing large objects by const reference in C++ (C++ Optimization Strategies and Techniques). The first suggestion ("don't divide by 2") might fall under a "strategy that bombs" - assuming certain operations are faster than others.
One premature (?) optimization I'm guilty of is moving declarations of expensive objects out of loops. E.g if a loop would require a fresh vector for each iteration, I often do:
std::vector<T> vec;
while (x) {
vec.clear(); //instead of declaring the vector here
...
}
I once made a vector used locally in a member function static as an optimization (to reduce memory allocations - this function was called very often), but that bombed when I decided I'd gain real performance from using more than one of these objects in multiple threads.
I don't know if you will buy it, but early optimization is not the root of all evil
Profile, profile, profile. Use valgrind if you can (along with the kcachegrind visualizer), otherwise use the venerable gprof.
My top performance hits:
Allocating memory without freeing it. Possible only using C and C++.
Allocating memory.
Calls to really small procedures, functions, or methods that your compiler somehow fails to inline.
Memory traffic.
Everything else is in the noise.
You may want to look into what compilers DO optimize away - many compilers can optimize away things like tail recursion, and most other minor optimizations are trivial by comparison. My methodology is to write things so that they're as readable/manageable as possible, and then, if I need to, look to see if the generated assembly code needs optimization of some sort. This way no time needs to be spent optimizing things that don't need to be optimized.
1, 2, and 3 have the same answer: Profile. Get a good profiler and run it on your app, both in intrusive and in sampling modes. This will show you where your bottlenecks are, how severe they are, and doing it on a regular basis will show you where perf has gotten worse from week to week.
You can also simply put stopwatches into your app so that it tells you, say, exactly how many seconds it takes to load a file; you'll notice if that number gets bigger, especially if you log it.
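Something as simple as this is often enough (the file name is a placeholder), as long as you log the number somewhere you will notice it growing:

import java.nio.file.Files;
import java.nio.file.Path;

public class LoadTimer {
    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();

        byte[] data = Files.readAllBytes(Path.of("big-input.dat"));  // the operation being watched

        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Loaded " + data.length + " bytes in " + elapsedMs + " ms");
    }
}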
4 is a big, big question that ranges all the way from high-level algorithm design down to tiny details of a particular CPU's pipeline. There's lots of resources out there, but always start at the high level -- take your outer loop from O(N^2) to O(N log N) before you start to worry about integer opcode latency and the like.
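As a hedged illustration of that kind of high-level win (duplicate detection, with made-up data): the pairwise version is O(N^2), while a hash set makes it roughly linear, and sorting first would give O(N log N).

import java.util.HashSet;
import java.util.Set;

public class DuplicateCheck {
    // O(N^2): compares every pair of elements.
    static boolean hasDuplicateSlow(int[] values) {
        for (int i = 0; i < values.length; i++)
            for (int j = i + 1; j < values.length; j++)
                if (values[i] == values[j]) return true;
        return false;
    }

    // Roughly O(N): a set remembers what we've already seen.
    static boolean hasDuplicateFast(int[] values) {
        Set<Integer> seen = new HashSet<>();
        for (int v : values)
            if (!seen.add(v)) return true;   // add() returns false if the value was already present
        return false;
    }

    public static void main(String[] args) {
        int[] sample = {3, 1, 4, 1, 5};
        System.out.println(hasDuplicateSlow(sample) + " " + hasDuplicateFast(sample));
    }
}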
Only optimize when you have performance issues.
Only optimize the slow parts, as measured!
Finding a better algorithm can save you orders of magnitude, rather than a few percent.
It's mentioned above, but it's worth talking about more: measure! You must measure to make sure you're optimizing the right thing. You must measure to know if you've improved, or improved enough, and by how much. Record your measurements!
Also, often you will identify a routine as taking, say, >75% of the total time. It's worth taking the time to profile at a finer grain... often you will find most of the time within that routine is spent in a very small part of the code.
How do I keep from doing premature optimization if I am thinking about performance already.
How important is the optimization, and how will it affect readability and maintainability?
How do I find the bottlenecks in my code?
Walk through its execution flow in your mind. Understand the costs of the steps taken. As you go, evaluate the existing implementation in this context. You can also walk through with a debugger for another perspective. Consider and actually try alternative solutions.
To contradict the popular approach: profiling only after the program is written, or once there is a problem, is naive -- it's like adding sauce to a poorly prepared meal to mask its flaws. It can also be likened to the person who always asks for solutions rather than actually determining the reason (and learning why in the process). If you have implemented a program, then spent time profiling, then made the easy fixes, and made it 20% faster in the process... that's usually not a "good" implementation if performance and resource utilization are important, because all the small issues that have accumulated will be high noise in the profiler's output. It's not unusual for a good implementation to be 5, 10, or even 25 times better than a casually constructed one.
How do I make sure that over time my program does not become any slower?
That depends on many things. One approach would involve continuous integration and actually executing the program. However, that may be a moving target even in strictly controlled environments. Minimize changes by focusing on creating a good implementation the first time… :)
What are some common performance errors to avoid (e.g.; I know it is bad in some languages to return while inside the catch portion of a try{} catch{} block
I'll add: multithreading is often used for the wrong reasons. Too often it merely exacerbates a poor implementation, when the real fix is to locate the existing problems and weaknesses in the design.
A summary of code optimization methods (language-independent) is available on GitHub (P.S. I'm the author).
Outline:
General Principles (e.g. caching, contiguous blocks, lazy loading, etc.)
Low-level (binary formats, arithmetic operations, data allocations, etc.)
Language-independent optimization (re-arranging expressions, code simplifications, loop optimisations, etc.)
Databases (lazy loading, efficient queries, redundancy, etc.)
Web (minimal transactions, compactification, etc.)
References
Related
I've seen quite a few examples where "binary" numbers are used in code, like 32, 64, 128 and so on (a very well known example being Minecraft).
I want to ask: does using such numbers in high-level languages like Java / C++ help anything?
I know assembly, and there you would always rather use such numbers, because in a low-level language things get complicated once you go above the register limit.
Will programs run any faster or save more memory if you use these numbers?
As with most things, "it depends".
In compiled languages, the better compilers will deduce that slow machine instructions can sometimes be done with different faster machine instructions (but only for special values, such as powers of two). Sometimes coders know this and program accordingly. (e.g. multiplying by a power of two is cheap)
Other times, algorithms are suited towards representations involving powers of two (e.g. many divide and conquer algorithms like the Fast Fourier Transform or a merge sort).
Yet other times, it's the most compact way to represent boolean values (like a bitmask).
And on top of that, other times it's more efficient for memory purposes (typically because it's so fast to do multiply and divide logic with powers of two, and because the OS/hardware/etc. use cache line and page sizes that are powers of two, so you'd do well to have nice power-of-two sizes for your important data structures).
And then, on top of that, other times.. programmers are just so used to using powers of two that they simply do it because it seems like a nice number.
There are some benefits of using powers of two numbers in your programs. Bitmasks are one application of this, mainly because bitwise operators (&, |, <<, >>, etc) are incredibly fast.
In C++ and Java, this is done a fair bit, especially with GUI applications. You could have a field of 32 different menu options (such as resizable, removable, editable, etc.) and apply each one without having to go through convoluted addition of values.
In terms of raw speedup or any performance improvement, that really depends on the application itself. GUI packages can be huge, so getting any speedup out of those when applying menu/interface options is a big win.
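A minimal Java sketch of that bitmask pattern (the flag names are invented for illustration):

public class WindowFlags {
    // One bit per option, so 32 of them fit into a single int.
    static final int RESIZABLE = 1 << 0;
    static final int REMOVABLE = 1 << 1;
    static final int EDITABLE  = 1 << 2;

    public static void main(String[] args) {
        int flags = RESIZABLE | EDITABLE;            // combine options with OR

        boolean canEdit = (flags & EDITABLE) != 0;   // test one option with AND
        flags &= ~REMOVABLE;                         // clear an option (a no-op here, it wasn't set)

        System.out.println("editable=" + canEdit + ", flags=" + Integer.toBinaryString(flags));
    }
}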
From the title of your question, it sounds like you mean, "Does it make your program more efficient if you write constants in binary?" If that's what you meant, the answer is emphatically, No. The compiler translates all your constants to binary at compile time, so by the time the program runs, it makes no difference. I don't know if the compiler can interpret binary constants faster than decimal, but the difference would surely be trivial.
But the body of your question seems to indicate that you mean "use constants that are round numbers in binary" rather than necessarily expressing them in binary digits.
For most purposes, the answer would be no. If, say, the computer has to add two numbers together, adding a number that happens to be a round number in binary is not going to be any faster than adding a not-round number.
It might be slightly faster for multiplication. Some compilers are smart enough to turn multiplication by powers of 2 into a bit shift operation rather than a hardware multiply, and bit shifts are usually faster than multiplies.
Back in my assembly-language days I often made elements in arrays have sizes that were powers of 2 so I could index into the array with a bit shift rather than a multiply. But in a high-level language that would be hard to do, as you'd have to do some research to find out just how much space your primitives take in memory, whether the compiler adds padding bytes between them, etc. And if you did add some bytes to an array element to pad it out to a power of 2, the entire array is now bigger, and so you might generate an extra page fault, i.e. the operating system runs out of memory and has to write a chunk of your data to the hard drive and then read it back when it needs it. One extra hard drive write takes more time than 1000 multiplications.
In practice, (a) the difference is so trivial that it would almost never be worth worrying about; and (b) you don't normally know everything happening at the low level, so it would often be hard to predict whether a change, with its attendant ramifications, would help or hurt.
In short: Don't bother. Use the constant values that are natural to the problem.
The reason they're used is probably different - e.g. bitmasks.
If you see them in array sizes, they don't really increase performance, but memory is usually allocated in power-of-2 chunks anyway. E.g. if you wrote char x[100], you'd probably get 128 bytes allocated.
No, your code will run the same way, no matter what number you use.
If by binary numbers you mean numbers that are powers of 2, like 2, 4, 8, 16, 1024..., they are common mostly as a space optimization. For example, an 8-bit pointer can address 256 locations (a power of 2), so if you use fewer than 256 of them you are wasting part of the pointer's range... so normally you allocate a 256-entry buffer. The same goes for all other powers of 2.
In most cases the answer is almost always no, there is no noticeable performance difference.
However, there are certain cases (very few) when NOT using binary numbers for array/structure sizes/lengths will give noticeable performance benefits. These are cases where you're looping over a structure that fills the cache in such a way that you get cache collisions every time you loop through your array/structure. This case is very rare, and shouldn't be pre-optimized unless your code is performing much more slowly than theoretical limits say it should. Also, this case is very hardware-dependent and will change from system to system.
I would like to know how many machine cycles it takes to compare two integers, how many it takes to add them, and which one is easier.
Basically, I'm looking to see which one is more expensive in general.
Also, I need an answer from a C, C++, and Java perspective.
Help is appreciated, thanks!
The answer is yes. And no. And maybe.
There are machines that can compare two values in their spare time between cycles, and others that need several cycles. On the old PDP8 you first had to negate one operand, do an add, and then test the result to do a compare.
But other machines can compare much faster than add, because no register needs to be modified.
But on still other machines the operations take the same time, but it takes several cycles for the result of the compare to make it to a place where one can test it, so, if you can use those cycles the compare is almost free, but fairly expensive if you have no other operations to shove into those cycles.
The simple answer is one cycle, both operations are equally easy.
A totally generic answer is difficult to give, since processor architectures are amazingly complex when you get down into the details.
All modern processors are pipelined. That is, there are no instructions where the operands go in on cycle c, and the result is available on cycle c+1. Instead, the instruction is broken down into multiple steps.
The instructions are read into the front end of the processor, which decodes the instruction. This may include breaking it down into multiple micro-ops. The operands are then read into registers, and then the execution units handle the actual operation. Eventually the answer is returned back to a register.
The instructions go through one pipeline stage each cycle, and modern CPUs have 10-20 pipeline stages. So it could be up to 20 processor cycles to add or compare two numbers. However, once one instruction has been through one stage of the pipeline, another instruction can be read into that stage. The ideal is that each clock cycle, one instruction goes into the front end, while one set of results comes out the other end.
There is massive complexity involved in getting all this to work. If you want to do a+b+c, you need to add a+b before you can add c. So a lot of the work in the front end of the processor involves scheduling. Modern processors employ out-of-order execution, so that the processor will examine the incoming instructions, and re-order them such that it does a+b, then gets on with some other work, and then does result+c once the result is available.
Which all brings us back to the original question of which is easier. Because usually, if you're comparing two integers, it is to make a decision on what to do next. Which means you won't know your next instruction until you've got the result of the last one. Because the instructions are pipelined, this means you can lose 20 clock cycles of work if you wait.
So modern CPUs have a branch predictor which makes a guess at what the result will be, and continues executing the instructions. If it guesses wrong, the pipeline has to be thrown out, and work restarted on the other branch. The branch predictor helps enormously, but still, if the comparison is a decision point in the code, that is far more difficult for the CPU to deal with than the addition.
Comparison is done via subtraction, which is almost the same as addition, except that the carry and subtrahend are complemented, so a - b - c becomes a + ~b + ~c. This is already accounted for in the CPU and basically takes the same amount of time either way.
I have a series of functions that are all designed to do the same thing. The same inputs produce the same outputs, but the time that it takes to do them varies by function. I want to determine which one is 'fastest', and I want to have some confidence that my measurement is 'statistically significant'.
Perusing Wikipedia and the interwebs tells me that statistical significance means that a measurement or group of measurements is different from a null hypothesis by a p-value threshold. How would that apply here? What is the null hypothesis when testing whether function A is faster than function B?
Once I've got that whole setup defined, how do I figure out when to stop measuring? I'll typically see that a benchmark is run three times, and then the average is reported; why three times and not five or seven? According to this page on Statistical Significance (which I freely admit I do not understand fully), Fisher used 8 as the number of samples that he needed to measure something with 98% confidence; why 8?
I would not bother applying statistics principles to benchmarking results. In general, the term "statistical significance" refers to the likelihood that your results were achieved accidentally, and do not represent an accurate assessment of the true values. In statistics, as a result of simple probability, the likelihood of a result being achieved by chance decreases as the number of measurements increases. In the benchmarking of computer code, it is a trivial matter to increase the number of trials (the "n" in statistics) so that the likelihood of an accidental result is below any arbitrary threshold you care to define (the "alpha" or level of statistical significance).
To simplify: benchmark by running your code a huge number of times, and don't worry about statistical measurements.
Note to potential down-voters of this answer: this answer is somewhat of a simplification of the matter, designed to illustrate the concepts in an accessible way. Comments like "you clearly don't understand statistics" will result in a savage beat-down. Remember to be polite.
You are asking two questions:
How do you perform a test of statistical significance that the mean time of function A is greater than the mean time of function B?
If you want a certain confidence in your answer, how many samples should you take?
The most common answer to the first question is that you either want to compute a confidence interval or perform a t-test. It's not different than any other scientific experiment with random variation. To compute the 95% confidence interval of the mean response time for function A simply take the mean and add 1.96 times the standard error to either side. The standard error is the square root of the variance divided by N. That is,
95% CI = mean +/- 1.96 * sqrt(sigma^2 / N)
where sigma^2 is the variance of the run time for function A and N is the number of runs you used to calculate the mean and variance.
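A small sketch of that calculation (the timing numbers are invented, and this uses 1.96 from the normal approximation rather than a t value for small samples):

public class ConfidenceInterval {
    public static void main(String[] args) {
        // Hypothetical run times for function A, in milliseconds.
        double[] times = {12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 12.0};
        int n = times.length;

        double mean = 0;
        for (double t : times) mean += t;
        mean /= n;

        double variance = 0;                        // sample variance (divide by n - 1)
        for (double t : times) variance += (t - mean) * (t - mean);
        variance /= (n - 1);

        double stdError = Math.sqrt(variance / n);
        double lower = mean - 1.96 * stdError;      // 95% CI = mean +/- 1.96 * standard error
        double upper = mean + 1.96 * stdError;

        System.out.printf("mean = %.2f ms, 95%% CI = [%.2f, %.2f]%n", mean, lower, upper);
    }
}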
Your second question relates to statistical power analysis and the design of experiments. You describe a sequential setup where you are asking whether to continue sampling. The design of sequential experiments is actually a very tricky problem in statistics, since in general you are not allowed to calculate confidence intervals or p-values and then draw additional samples conditional on not reaching your desired significance. If you wish to do this, it would be wiser to set up a Bayesian model and calculate your posterior probability that speed A is greater than speed B. This, however, is massive overkill.
In a computing environment it is generally pretty trivial to achieve a very small confidence interval both because drawing large N is easy and because the variance is generally small -- one function obviously wins.
Given that Wikipedia and most online sources are still horrible when it comes to statistics, I recommend buying Introductory Statistics with R. You will learn both the statistics and the tools to apply what you learn.
The research you cite sounds more like a highly controlled environment. This is purely a practical answer that has proven itself time and again to be effective for performance testing.
If you are benchmarking code in a modern, multi-tasking, multi-core, computing environment, the number of iterations required to achieve a useful benchmark goes up as the length of time of the operation to be measured goes down.
So, if you have an operation that takes ~5 seconds, you'll want, typically, 10 to 20 iterations. As long as the deviation across the iterations remains fairly constant, then your data is sound enough to draw conclusions. You'll often want to throw out the first iteration or two because the system is typically warming up caches, etc...
If you are testing something in the millisecond range, you'll want 10s of thousands of iterations. This will eliminate noise caused by other processes, etc, firing up.
Once you hit the sub-millisecond range -- 10s of nanoseconds -- you'll want millions of iterations.
Not exactly scientific, but neither is testing "in the real world" on a modern computing system.
When comparing the results, consider the difference in execution speed as percentage, not absolute. Anything less than about 5% difference is pretty close to noise.
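A rough sketch of that kind of harness (the iteration counts and the operation under test are placeholders; for serious work a tool like JMH handles these pitfalls for you):

public class SimpleBench {
    // Placeholder for the operation being measured.
    static long workUnderTest(int i) {
        return Integer.toBinaryString(i).length();
    }

    public static void main(String[] args) {
        final int warmupIterations = 10_000;     // let the JIT and caches warm up; results discarded
        final int iterations = 1_000_000;        // sub-millisecond work => very many iterations

        long sink = 0;
        for (int i = 0; i < warmupIterations; i++) sink += workUnderTest(i);

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) sink += workUnderTest(i);
        long elapsed = System.nanoTime() - start;

        // Printing 'sink' keeps the JIT from optimizing the measured loop away entirely.
        System.out.println("avg " + (elapsed / (double) iterations) + " ns/op (sink=" + sink + ")");
    }
}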
Do you really care about statistical significance or plain old significance? Ultimately you're likely to have to form a judgement about readability vs performance - and statistical significance isn't really going to help you there.
A couple of rules of thumb I use:
Where possible, test for enough time to make you confident that little blips (like something else interrupting your test for a short time) won't make much difference. Usually I reckon 30 seconds is enough for this, although it depends on your app. The longer you test for, the more reliable the test will be - but obviously your results will be delayed :)
Running a test multiple times can be useful, but if you're timing for long enough then it's not as important IMO. It would alleviate other forms of error which made a whole test take longer than it should. If a test result looks suspicious, certainly run it again. If you see significantly different results for different runs, run it several more times and try to spot a pattern.
The fundamental question you're trying to answer is: how likely is it that what you observe could have happened by chance? Is this coin fair? Throw it once: HEADS. No, it's not fair, it always comes down heads. Bad conclusion! Throw it 10 times and get 7 heads, now what do you conclude? 1000 times and 700 heads?
For simple cases we can imagine how to figure out when to stop testing. But you have a slightly different situation - are you really doing a statistical analysis?
How much control do you have over your tests? Does repeating them add any value? Your computer is deterministic (maybe). Einstein's definition of insanity is to repeat something and expect a different outcome. So when you run your tests, do you get repeatable answers? I'm not sure that statistical analyses help if you are doing good enough tests.
For what you're doing I would say that the first key thing is to make sure that you really are measuring what you think. Run every test for long enough that any startup or shutdown effects are hidden. Useful performance tests tend to run for quite extended periods for that reason. Make sure that you are not actually measuring the time in your test harness rather than the time in your code.
You have two primary variables: how many iterations of your method to run in one test? How many tests to run?
Wikipedia says this:
"In addition to expressing the variability of a population, standard deviation is commonly used to measure confidence in statistical conclusions. For example, the margin of error in polling data is determined by calculating the expected standard deviation in the results if the same poll were to be conducted multiple times. The reported margin of error is typically about twice the standard deviation."
Hence if your objective is to be sure that one function is faster than another you could run a number of tests of each, compute the means and standard deviations. My expectation is that if your number of iterations within any one test is high then the standard deviation is going to be low.
If we accept that definition of margin of error, you can see whether the two means are further apart than their combined margins of error.
I'm writing a compiler, and I'm looking for resources on optimization. I'm compiling to machine code, so anything at runtime is out of the question.
What I've been looking for lately is less code optimization and more semantic/high-level optimization. For example:
free(malloc(400)); // should be completely optimized away
Even if these functions were completely inlined, they could eventually call OS memory functions which can never be inlined. I'd love to be able to eliminate that statement completely without building special-case rules into the compiler (after all, malloc is just another function).
Another example:
string Parenthesize(string str) {
StringBuilder b; // similar to C#'s class of the same name
foreach (s : ["(", str, ")"])
b.Append(s);
return b.Render();
}
In this situation I'd love to be able to initialize b's capacity to str.Length + 2 (enough to exactly hold the result, without wasting memory).
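In other words, the hand-optimized version I'd like the compiler to derive on its own looks roughly like this (sketched with Java's StringBuilder standing in for the class above):

public class ParenthesizeOptimized {
    // The builder is sized to the exact final length up front, so no
    // intermediate buffer growth or copying is needed.
    static String parenthesize(String str) {
        StringBuilder b = new StringBuilder(str.length() + 2);
        b.append('(').append(str).append(')');
        return b.toString();
    }

    public static void main(String[] args) {
        System.out.println(parenthesize("hello"));   // prints (hello)
    }
}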
To be completely honest, I have no idea where to begin in tackling this problem, so I was hoping for somewhere to get started. Has there been any work done in similar areas? Are there any compilers that have implemented anything like this in a general sense?
To do an optimization across 2 or more operations, you have to understand the algebraic relationship of those two operations. If you view operations in their problem domain, they often have such relationships.
Your free(malloc(400)) is possible because free and malloc are inverses in the storage allocation domain.
Lots of operations have inverses, and teaching the compiler that they are inverses, and demonstrating that the results of one flow unconditionally into the other, is what is needed. You have to make sure that your inverses really are inverses and there isn't a surprise somewhere; a/x*x looks like just the value a, but if x is zero you get a trap. If you don't care about the trap, it is an inverse; if you do care about the trap then the optimization is more complex:
(if (x==0) then trap() else a)
which is still a good optimization if you think divide is expensive.
Other "algebraic" relationships are possible. For instance, there are many idempotent operations: zeroing a variable (setting anything to the same value repeatedly), etc. There are operations where one operand acts like an identity element; X + 0 ==> X for any X. If X and 0 are matrices, this is still true and a big time savings.
Other optimizations can occur when you can reason abstractly about what the code is doing. "Abstract interpretation" is a set of techniques for reasoning about values by classifying results into various interesting bins (e.g., this integer is unknown, zero, negative, or positive). To do this you need to decide what bins are helpful, and then compute the abstract value at each point. This is useful when there are tests on categories (e.g., "if (x<0) { ...") and you know abstractly that x is less than zero; you can then optimize away the conditional.
Another way is to define what a computation is doing symbolically, and simulate the computation to see the outcome. That is how you computed the effective size of the required buffer: you computed the buffer size symbolically before the loop started, and simulated the effect of executing the loop for all iterations. For this you need to be able to construct symbolic formulas representing program properties, compose such formulas, and often simplify such formulas when they get unusably complex (this kind of fades into the abstract interpretation scheme). You also want such symbolic computation to take into account the algebraic properties I described above. Tools that do this well are good at constructing formulas, and program transformation systems are often good foundations for this. One source-to-source program transformation system that can be used to do this is the DMS Software Reengineering Toolkit.
What's hard is to decide which optimizations are worth doing, because you can end up keeping track of vast amounts of stuff, which may not pay off. Computer cycles are getting cheaper, and so it makes sense to track more properties of the code in the compiler.
The Broadway framework might be in the vein of what you're looking for. Papers on "source-to-source transformation" will probably also be enlightening.
Recently our company has started measuring the cyclomatic complexity (CC) of the functions in our code on a weekly basis, and reporting which functions have improved or worsened. So we have started paying a lot more attention to the CC of functions.
I've read that CC could be informally calculated as 1 + the number of decision points in a function (e.g. if statement, for loop, select etc), or also the number of paths through a function...
I understand that the easiest way of reducing CC is to use the Extract Method refactoring repeatedly...
There are somethings I am unsure about, e.g. what is the CC of the following code fragments?
1)
for (int i = 0; i < 3; i++)
Console.WriteLine("Hello");
And
Console.WriteLine("Hello");
Console.WriteLine("Hello");
Console.WriteLine("Hello");
They both do the same thing, but does the first version have a higher CC because of the for statement?
2)
if (condition1)
if (condition2)
if (condition 3)
Console.WriteLine("wibble");
And
if (condition1 && condition2 && condition3)
Console.WriteLine("wibble");
Assuming the language does short-circuit evaluation, such as C#, then these two code fragments have the same effect... but is the CC of the first fragment higher because it has 3 decision points/if statements?
3)
if (condition1)
{
Console.WriteLine("one");
if (condition2)
Console.WriteLine("one and two");
}
And
if (condition3)
Console.WriteLine("fizz");
if (condition4)
Console.WriteLine("buzz");
These two code fragments do different things, but do they have the same CC? Or does the nested if statement in the first fragment have a higher CC? i.e. nested if statements are mentally more complex to understand, but is that reflected in the CC?
Yes. Your first example has a decision point and your second does not, so the first has a higher CC.
Yes-maybe, your first example has multiple decision points and thus a higher CC. (See below for explanation.)
Yes-maybe. Obviously they have the same number of decision points, but there are different ways to calculate CC, which means ...
... if your company is measuring CC in a specific way, then you need to become familiar with that method (hopefully they are using tools to do this). There are different ways to calculate CC for different situations (case statements, Boolean operators, etc.), but you should get the same kind of information from the metric no matter what convention you use.
The bigger problem is what others have mentioned, that your company seems to be focusing more on CC than on the code behind it. In general, sure, below 5 is great, below 10 is good, below 20 is okay, 21 to 50 should be a warning sign, and above 50 should be a big warning sign, but those are guides, not absolute rules. You should probably examine the code in a procedure that has a CC above 50 to ensure it isn't just a huge heap of code, but maybe there is a specific reason why the procedure is written that way, and it's not feasible (for any number of reasons) to refactor it.
If you use tools to refactor your code to reduce CC, make sure you understand what the tools are doing, and that they're not simply shifting one problem to another place. Ultimately, you want your code to have few defects, to work properly, and to be relatively easy to maintain. If that code also has a low CC, good for it. If your code meets these criteria and has a CC above 10, maybe it's time to sit down with whatever management you can and defend your code (and perhaps get them to examine their policy).
After browsing through the Wikipedia entry and Thomas J. McCabe's original paper, it seems that the items you mentioned above are known problems with the metric.
However, most metrics do have pros and cons. I suppose in a large enough program the CC value could point to possibly complex parts of your code. But a higher CC does not necessarily mean the code is complex.
Like all software metrics, CC is not perfect. Used on a big enough code base, it can give you an idea of where might be a problematic zone.
There are two things to keep in mind here:
Big enough code base: In any non trivial project you will have functions that have a really high CC value. So high that it does not matter if in one of your examples, the CC would be 2 or 3. A function with a CC of let's say over 300 is definitely something to analyse. Doesn't matter if the CC is 301 or 302.
Don't forget to use your head. There are methods that need many decision points. Often they can be refactored somehow to have fewer, but sometimes they can't. Do not go with a rule like "Refactor all methods with a CC > xy". Have a look at them and use your brain to decide what to do.
I like the idea of a weekly analysis. In quality control, trend analysis is a very effective tool for identifying problems during their creation. This is so much better than having to wait until they get so big that they become obvious (see SPC for some details).
CC is not a panacea for measuring quality. Clearly a repeated statement is not "better" than a loop, even if a loop has a bigger CC. The reason the loop has a bigger CC is that sometimes it might get executed and sometimes it might not, which leads to two different "cases" which should both be tested. In your case the loop will always be executed three times because you use a constant, but CC is not clever enough to detect this.
Same with the chained ifs in example 2 - this structure allows you to have a statement which would be executed if only condition1 and condition2 are true. This is a special case which is not possible in the version using &&. So the if-chain has a bigger potential for special cases even if you don't utilize this in your code.
This is the danger of applying any metric blindly. The CC metric certainly has a lot of merit, but as with any other technique for improving code it can't be evaluated divorced from context. Point your management at Capers Jones's discussion of the Lines of Code measurement (wish I could find a link for you). He points out that if Lines of Code is a good measure of productivity then assembler language developers are the most productive developers on earth. Of course they're no more productive than other developers; it just takes them a lot more code to accomplish what higher level languages do with less source code. I mention this, as I say, so you can show your managers how dumb it is to blindly apply metrics without intelligent review of what the metric is telling you.
I would suggest that if they're not, that your management would be wise to use the CC measure as a way of spotting potential hot spots in the code that should be reviewed further. Blindly aiming for the goal of lower CC without any reference to code maintainability or other measures of good coding is just foolish.
Cyclomatic complexity is analogous to temperature. They are both measurements, and in most cases meaningless without context. If I said the temperature outside was 72 degrees that doesn’t mean much; but if I added the fact that I was at the North Pole, the number 72 becomes significant. If someone told me a method has a cyclomatic complexity of 10, I can’t determine if that is good or bad without its context.
When I code review an existing application, I find cyclomatic complexity a useful “starting point” metric. The first thing I check for are methods with a CC > 10. These “>10” methods are not necessarily bad. They just provide me a starting point for reviewing the code.
General rules when considering a CC number:
The relationship between the CC number and the number of tests should be CC# <= # of tests.
Refactor for CC# only if it increases maintainability.
A CC above 10 often indicates one or more code smells.
[Off topic] If you favor readability over a good score in the metrics (was it Joel Spolsky who said "what's measured gets done"? - meaning that metrics are abused more often than not, I suppose), it is often better to use a well-named boolean to replace your complex conditional statement.
then
if (condition1 && condition2 && condition3)
Console.WriteLine("wibble");
becomes
bool/boolean theWeatherIsFine = condition1 && condition2 && condition3;
if (theWeatherIsFine)
Console.WriteLine("wibble");
I'm no expert at this subject, but I thought I would give my two cents. And maybe that's all this is worth.
Cyclomatic Complexity seems to be just a particular automated shortcut to finding potentially (but not definitely) problematic code snippets. But isn't the real problem to be solved one of testing? How many test cases does the code require? If CC is higher, but number of test cases is the same and code is cleaner, don't worry about CC.
1.) There is no decision point there. There is one and only one path through the program there, only one possible result with either of the two versions. The first is more concise and better, Cyclomatic Complexity be damned.
1 test case for both
2.) In both cases, you either write "wibble" or you don't.
2 test cases for both
3.) First one could result in nothing, "one", or "one" and "one and two". 3 paths. 2nd one could result in nothing, either of the two, or both of them. 4 paths.
3 test cases for the first
4 test cases for the second