I've seen quite a few examples where binary numbers are used in code, like 32, 64, 128 and so on (Minecraft is a very well-known example).
I want to ask: does using binary numbers in high-level languages such as Java or C++ help with anything?
I know assembly, and there you would always prefer such numbers, because in a low-level language things get complicated once you go above the register limit.
Will programs run any faster or save more memory if you use binary numbers?
As with most things, "it depends".
In compiled languages, the better compilers will deduce that slow machine instructions can sometimes be done with different faster machine instructions (but only for special values, such as powers of two). Sometimes coders know this and program accordingly. (e.g. multiplying by a power of two is cheap)
Other times, algorithms are suited towards representations involving powers of two (e.g. many divide and conquer algorithms like the Fast Fourier Transform or a merge sort).
Yet other times, it's the most compact way to represent boolean values (like a bitmask).
And on top of that, other times it's more efficient for memory purposes (typically because it's so fast to do multiply and divide logic with powers of two, and because the OS/hardware/etc. will use cache line / page sizes that are powers of two, so you'd do well to give your important data structures nice power-of-two sizes).
And then, on top of that, other times... programmers are just so used to using powers of two that they simply do it because it seems like a nice number.
There are some benefits of using powers of two numbers in your programs. Bitmasks are one application of this, mainly because bitwise operators (&, |, <<, >>, etc) are incredibly fast.
In C++ and Java, this is done a fair bit, especially with GUI applications. You could have a field of 32 different menu options (such as resizable, removable, editable, etc.) and apply each one without having to go through convoluted addition of values.
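To make that concrete, here is a minimal sketch of the flag-bit style in C++ (the flag names and values are invented for the example):
#include <cstdint>
#include <iostream>

// Hypothetical window-option flags: each is a distinct power of two,
// so each occupies its own bit and they can be combined with |.
enum WindowOption : std::uint32_t {
    Resizable = 1u << 0,  // 1
    Removable = 1u << 1,  // 2
    Editable  = 1u << 2,  // 4
    Visible   = 1u << 3   // 8
};

int main() {
    std::uint32_t options = Resizable | Editable;  // set two flags at once
    options |= Visible;                            // switch another one on
    options &= ~Editable;                          // switch one off again

    if (options & Resizable)                       // test a single bit
        std::cout << "window is resizable\n";
    return 0;
}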
In terms of raw speedup or any performance improvement, that really depends on the application itself. GUI packages can be huge, so getting any speedup out of those when applying menu/interface options is a big win.
From the title of your question, it sounds like you mean, "Does it make your program more efficient if you write constants in binary?" If that's what you meant, the answer is emphatically, No. The compiler translates all your constants to binary at compile time, so by the time the program runs, it makes no difference. I don't know if the compiler can interpret binary constants faster than decimal, but the difference would surely be trivial.
But the body of your question seems to indicate that you mean "use constants that are round numbers in binary" rather than necessarily expressing them in binary digits.
For most purposes, the answer would be no. If, say, the computer has to add two numbers together, adding a number that happens to be a round number in binary is not going to be any faster than adding a not-round number.
It might be slightly faster for multiplication. Some compilers are smart enough to turn multiplication by powers of 2 into a bit shift operation rather than a hardware multiply, and bit shifts are usually faster than multiplies.
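As a rough sketch of the rewrite such a compiler might perform (the exact output depends on the compiler and target; unsigned values are used here to keep the shift forms exactly equivalent):
// Hand-written multiply by a power of two:
unsigned scale_mul(unsigned n) { return n * 8; }

// The shift form an optimizing compiler will typically emit instead,
// since 8 == 2^3 and n << 3 computes the same value:
unsigned scale_shift(unsigned n) { return n << 3; }

// Division by a power of two gets the same treatment for unsigned values
// (signed division needs a small fix-up for negative numbers, which costs
// a couple of extra instructions):
unsigned quarter(unsigned n) { return n >> 2; }  // same as n / 4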
Back in my assembly-language days I often made elements in arrays have sizes that were powers of 2 so I could index into the array with a bit shift rather than a multiply. But in a high-level language that would be hard to do, as you'd have to do some research to find out just how much space your primitives take in memory, whether the compiler adds padding bytes between them, and so on. And if you did add some bytes to an array element to pad it out to a power of 2, the entire array is now bigger, and so you might generate an extra page fault, i.e. the operating system runs out of memory and has to write a chunk of your data to the hard drive and then read it back when it needs it. One extra hard drive write takes more time than 1000 multiplications.
In practice, (a) the difference is so trivial that it would almost never be worth worrying about; and (b) you don't normally know everything happening at the low level, so it would often be hard to predict whether a change and its attendant ramifications would help or hurt.
In short: Don't bother. Use the constant values that are natural to the problem.
The reason they're used is probably different - e.g. bitmasks.
If you see them in array sizes, it doesn't really increase performance, but usually memory is allocated by power of 2. E.g. if you wrote char x[100], you'd probably get 128 allocated bytes.
No, your code will run the same way, no matter what number you use.
If by binary numbers you mean numbers that are powers of 2, like 2, 4, 8, 16, 1024..., they are common mostly because of optimization of space. For example, an 8-bit pointer can point to 256 addresses (which is a power of 2), so if you use fewer than 256 you are wasting part of your pointer's range... so normally you allocate a 256-byte buffer. The same goes for all the other powers of 2.
In most cases the answer is almost always no, there is no noticeable performance difference.
However, there are certain cases (very few) when NOT using binary numbers for array/structure sizes/lengths will give noticeable performance benefits. These are cases where you're looping over a structure that fills the cache in such a way that you get cache collisions on every pass through your array/structure. This case is very rare, and shouldn't be pre-optimized unless your code is performing much more slowly than theoretical limits say it should. Also, this case is very hardware dependent and will change from system to system.
Introduction
I know I'm going to lose a lot of reputation for this question, and I also know it will be flagged as inappropriate, but I'm really curious about this, so I'm not giving up if there's any chance of getting at least an answer.
Question
Today I woke up thinking:
Hey, how could random functions be really random if they are created by an algorithm?
Think about it. How could you create a function that simulates randomness without the concept of random already built in? I began to think:
Hey, I'd take an array of ints, then I'd do [thing], then [thing], then [thing] again, then I'd choose only odd numbers... etc.
But that seems more like a function that makes the choice harder to predict than like real randomness.
Is it possible to create randomness? How are functions that return random ints (such as rand() in PHP) created? How can they simulate randomness?
Functions that algorithmically produce so-called random numbers are pseudorandom number generators. If you know the seed used to generate the sequence, then the numbers are predictable. The sequence itself is a statistically random distribution but not truly random.
There are true random number generators that typically involve some hardware that samples randomness from the physical world, e.g., radioactivity or acoustic noise. A naive implementation would be to sample hard disk access and mouse movements. See random.org for a real RNG.
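To make the "predictable if you know the seed" point concrete, here is a toy linear congruential generator in C++ (the multiplier and increment are the well-known Numerical Recipes constants; a real library would use something stronger, such as the Mersenne Twister):
#include <cstdint>
#include <iostream>

// A toy PRNG: next_state = a * state + c (mod 2^32).
// Given the same seed it always produces the same sequence, which is
// exactly what makes it pseudorandom rather than truly random.
class Lcg {
public:
    explicit Lcg(std::uint32_t seed) : state_(seed) {}
    std::uint32_t next() {
        state_ = 1664525u * state_ + 1013904223u;  // wraps modulo 2^32
        return state_;
    }
private:
    std::uint32_t state_;
};

int main() {
    Lcg a(42), b(42);                // same seed...
    for (int i = 0; i < 3; ++i)      // ...same "random" numbers
        std::cout << a.next() << " == " << b.next() << "\n";
    return 0;
}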
Obligatory xkcd strip:
There's a reason they're called pseudorandom numbers; they're not truly random. From Wikipedia:
A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers that approximates the properties of random numbers. The sequence is not truly random in that it is completely determined by a relatively small set of initial values, called the PRNG's state.
Read volume 2, chapter 3 of Knuth's The Art of Computer Programming if you want the maths behind it. You can buy it to look impressive on your bookshelf. (Just keep in mind that most people who buy it wind up never actually reading it -- for a good reason. It's VERY dense and VERY difficult reading.) The short answer that doesn't involve massive tomes of difficult text is that "random" numbers generated purely algorithmically are pseudorandom, which is to say that they are "random enough".
You might want to look into Wikipedia's article on PRNGs, which is (pretty much) what all the random number generators we have on PCs are.
About the closest you can get to random, which I think is done in some places, is to use temperatures in the CPU or some other sensor reading as a seed for one of these. If the seed is random (the temperature is unlikely to ever be exactly the same), the sequence is about as close to random as possible.
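In C++, for instance, the standard library exposes exactly this split: std::random_device may draw on an OS or hardware entropy source, and it is commonly used only to seed a fast deterministic engine. A minimal sketch:
#include <iostream>
#include <random>

int main() {
    std::random_device entropy;          // may read a hardware/OS entropy source
    std::mt19937 engine(entropy());      // deterministic engine, random seed
    std::uniform_int_distribution<int> die(1, 6);

    for (int i = 0; i < 5; ++i)
        std::cout << die(engine) << ' '; // only the seed was "really" random
    std::cout << '\n';
    return 0;
}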
I usually "get Milliseconds" and divide it to a pseudorandom number. This makes it even more random and unpredictable.
I'm writing a compiler, and I'm looking for resources on optimization. I'm compiling to machine code, so anything at runtime is out of the question.
What I've been looking for lately is less code optimization and more semantic/high-level optimization. For example:
free(malloc(400)); // should be completely optimized away
Even if these functions were completely inlined, they could eventually call OS memory functions which can never be inlined. I'd love to be able to eliminate that statement completely without building special-case rules into the compiler (after all, malloc is just another function).
Another example:
string Parenthesize(string str) {
    StringBuilder b; // similar to C#'s class of the same name
    foreach(str : ["(", str, ")"])
        b.Append(str);
    return b.Render();
}
In this situation I'd love to be able to initialize b's capacity to str.Length + 2 (enough to exactly hold the result, without wasting memory).
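For comparison, here is roughly what the hand-optimized version looks like if written directly in C++, using std::string::reserve as the capacity hint I would like the compiler to derive on its own:
#include <string>

std::string Parenthesize(const std::string& str) {
    std::string b;
    b.reserve(str.size() + 2);  // exactly enough for '(' + str + ')'
    b += '(';
    b += str;
    b += ')';
    return b;
}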
To be completely honest, I have no idea where to begin in tackling this problem, so I was hoping for somewhere to get started. Has there been any work done in similar areas? Are there any compilers that have implemented anything like this in a general sense?
To do an optimization across 2 or more operations, you have to understand the algebraic relationship of those two operations. If you view operations in their problem domain, they often have such relationships.
Your free(malloc(400)) is possible because free and malloc are inverses in the storage allocation domain.
Lots of operations have inverses, and teaching the compiler that they are inverses, and demonstrating that the result of one flows unconditionally into the other, is what is needed. You have to make sure that your inverses really are inverses and there isn't a surprise somewhere; a/x*x looks like just the value a, but if x is zero you get a trap. If you don't care about the trap, it is an inverse; if you do care about the trap then the optimization is more complex:
(if (x==0) then trap() else a)
which is still a good optimization if you think divide is expensive.
Other "algebraic" relationships are possible. For instance, there are
may idempotent operations: zeroing a variable (setting anything to the same
value repeatedly), etc. There are operations where one operand acts
like an identity element; X+0 ==> X for any 0. If X and 0 are matrices,
this is still true and a big time savings.
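A minimal sketch of how a couple of these rules might be expressed over an expression tree (the Expr representation and names here are invented purely for illustration):
#include <memory>

// A deliberately tiny expression tree: constants, variables, binary ops.
struct Expr {
    enum Kind { Const, Var, Add, Mul, Div } kind;
    long value = 0;                    // used when kind == Const
    std::shared_ptr<Expr> lhs, rhs;    // used for Add / Mul / Div
};
using ExprPtr = std::shared_ptr<Expr>;

static bool isConstZero(const ExprPtr& e) {
    return e && e->kind == Expr::Const && e->value == 0;
}

// Apply two of the algebraic rules discussed above, bottom-up.
ExprPtr simplify(ExprPtr e) {
    if (!e || !e->lhs) return e;       // constants and variables: nothing to do
    e->lhs = simplify(e->lhs);
    e->rhs = simplify(e->rhs);

    // Identity element: X + 0  ==>  X
    if (e->kind == Expr::Add && isConstZero(e->rhs))
        return e->lhs;

    // Inverse operations: (A / X) * X  ==>  A.  As in the text, this is only
    // valid if we may ignore the divide-by-zero trap (and assuming exact
    // division; a truncating integer divide would need its own justification).
    // The two X's must be the very same node, so they provably hold the same value.
    if (e->kind == Expr::Mul && e->lhs->kind == Expr::Div && e->lhs->rhs == e->rhs)
        return e->lhs->lhs;

    return e;
}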
Other optimizations can occur when you can reason abstractly about what the code is doing. "Abstract interpretation" is a set of techniques for reasoning about values by classifying results into various interesting bins (e.g., this integer is unknown, zero, negative, or positive). To do this you need to decide what bins are helpful, and then compute the abstract value at each point. This is useful when there are tests on categories (e.g., "if (x<0) { ...") and you know abstractly that x is less than zero; you can then optimize away the conditional.
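A small sketch of the sign-analysis flavour of this (the lattice and transfer functions below are made up for illustration, and overflow is ignored to keep it short):
// Abstract values: all we track about an integer is its sign.
enum class Sign { Negative, Zero, Positive, Unknown };

// Abstract version of '+': combine what we know about the operands.
Sign addSign(Sign a, Sign b) {
    if (a == Sign::Zero) return b;
    if (b == Sign::Zero) return a;
    if (a == b) return a;              // neg+neg stays neg, pos+pos stays pos
    return Sign::Unknown;              // pos+neg could be anything
}

// Abstract version of '*'.
Sign mulSign(Sign a, Sign b) {
    if (a == Sign::Zero || b == Sign::Zero) return Sign::Zero;
    if (a == Sign::Unknown || b == Sign::Unknown) return Sign::Unknown;
    return (a == b) ? Sign::Positive : Sign::Negative;
}

// With these, a compiler that knows x and y are both Positive can conclude
// that x*y + x is Positive and delete a branch like "if (x*y + x < 0) ...".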
Another way is to define what a computation is doing symbolically, and simulate the computation to see the outcome. That is how you computed the effective size of the required buffer: you computed the buffer size symbolically before the loop started, and simulated the effect of executing the loop for all iterations.
For this you need to be able to construct symbolic formulas representing program properties, compose such formulas, and often simplify such formulas when they get unusably complex (this kind of fades into the abstract interpretation scheme). You also want such symbolic computation to take into account the algebraic properties I described above. Tools that do this well are good at constructing formulas, and program transformation systems are often good foundations for this. One source-to-source program transformation system that can be used to do this is the DMS Software Reengineering Toolkit.
What's hard is to decide which optimizations are worth doing, because you can end up keeping track of vast amounts of stuff, which may not pay off. Computer cycles are getting cheaper, and so it makes sense to track more properties of the code in the compiler.
The Broadway framework might be in the vein of what you're looking for. Papers on "source-to-source transformation" will probably also be enlightening.
I will expand here on a comment I made to When a method has too many parameters? where the OP was having minor problems with someone else's function which had 97 parameters.
I am a great believer in writing maintainable code (and code is often easier to write than to read, hence Steve McConnell (praise be upon his name)'s phrase "write-only code").
Since statistics show that most car accidents happen at junctions, and my experience (YMMV) shows that most "anomalies" occur at interfaces, I will list some things that I do to attempt to avoid misunderstandings at interfaces, and invite your comments if I am going badly wrong.
But, more importantly, I invite your suggestions for making things even more prophylactic (see, there is a question after all - how to improve things?).
Adequate documentation, in the form of (up to date) DoxyGen format comments describing the nature and porpoise of each parameter.
absolutely NO back-door shenanigans with global variables as hidden parameters.
try to limit parameters to six or eight. If more, pass related parameters as a structure (see the sketch after this list); if they are not related then seriously reconsider the function. If it needs so much information, is it too complex to maintain? Can it be broken down into several smaller functions?
use the CONST as often as possible and meaningful.
a coding standard that says that input parameters come first, then output only, and finally input/output, which are modified by the function.
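To illustrate the "related parameters as a structure" point above, a small hypothetical sketch:
// Before: related values travel as loose parameters, and call sites can
// silently swap two ints without the compiler noticing.
bool DrawLabel(int x, int y, int width, int height,
               const char *text, int fontSize, bool bold);

// After: the related values are grouped, so the signature is shorter and
// adding a field later does not ripple through every declaration.
struct Rect { int x, y, width, height; };
struct Font { int size; bool bold; };

bool DrawLabel(const Rect &area, const char *text, const Font &font);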
I also #define some empty macros to make declarations even easier to read:
#define INPUT
#define OUTPUT
#define MODIFY
bool DoSomething(INPUT int howOften, MODIFY Widget *myWidget, OUTPUT WidgetPtr * const nextWidget)
Just a few ideas. How can I improve on these? Thanks.
Addressing your points in order:
Well-designed types usually render Doxygen format comments a waste of time.
While true as stated ("shenanigans" are bad by definition), not all use of globals is really as bad as many people imply. If you have to pass a parameter more than about four times before it's really used, chances are that a global will be less error prone.
Eight or even six parameters is usually excessive. Any more than two or three starts to indicate that the function is doing more than one thing. One obvious exception is a constructor that aggregates a number of other items into an object (e.g. an address object that takes a street name, number, city, country, postal code, etc., as inputs).
Better stated as "write const-correct code."
Given C++'s default parameter capability, it's generally best to sort in ascending order of likelihood to use a default value.
Don't. Just don't! If it's not obvious what are inputs and what are outputs, that pretty much proves that the basic design is fatally flawed.
As for ideas I think are actually good:
As implied in the first point, concentrate on types. Once you get them right, most of the other problems just disappear.
Use a few (even just one) central theme(s). For Lisp, everything is a list. For Unix, everything is a file (and files are all simple streams of bytes). Emulate this simplicity.
Edit: replying to comments:
While you do have something of a point, my experience still indicates that documentation produced with Doxygen (and similar such as javadoc) is almost universally useless. In theory the tool doesn't prevent decent documentation, but in fact it's rare at best.
Globals certainly can cause problems -- but I'm old enough to have used Fortran back before it provided much alternative, and with some care it really wasn't nearly as bad as many people imply. A lot of the stories seem to be at least third hand, with a bit of extra "spice" added each time they're re-told. I've seen one story that sounds a lot like an exaggerated version of one I told a couple decades ago or so...
Hm...Markdown formatting doesn't seem to approve of my skipping numbers.
And again...
My comment was specific to C++, but quite a few other languages also support default parameters and/or overloading, and it can apply about as well to most of them. Even without it, a call like f(param1, param2, 0,0,0); is pretty easy to see as having default parameters. To an extent, ordering by usage is handy, but when you do the order you pick doesn't matter nearly as much as simply being consistent.
True, a void * parameter doesn't tell you much -- but a MODIFY void * is little better. A real type and consistent use of const provides far more information and gets checked by the compiler. Other languages may not have/use const, but they probably don't have macros either. OTOH, some directly support what you want -- e.g., Ada has in, out and inout specifiers.
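For what it's worth, here is the sort of signature I mean, where the parameter types themselves say what is input and what is modified, and the compiler enforces it (Widget is a hypothetical type for the example):
class Widget { /* ... */ };

// const reference  -> input only, cannot be modified
// plain reference  -> read and modified by the function
// return value     -> the output
Widget DoSomething(int howOften,              // input, passed by value
                   const Widget &blueprint,   // input only
                   Widget &workArea);         // modified in place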
I am not sure we will end at a single point of agreement about this; everyone will come up with different ideas (good or bad in each other's perspective). Having said that, I find Code Complete to be a good place to go when I am stuck with this sort of problem.
A big peeve of mine is control coupling between functions. (Control coupling is when one module controls the execution flow of another, by passing flags telling the called function what to do.)
For example (cut & paste from code I just had to work on):
void UartEnable(bool enable, int baud);
as opposed to:
void UartEnable(int baud);
void UartDisable(void);
Put another way -- parameters are for passing "data", not "control".
I'd use the 'rule' put forward by Uncle Bob in his book Clean Code.
These are the ones I think I remember:
2 parameters are OK, 3 are bad, more need refactoring
Comments are a sign of bad names. So there should be none, and the purpose of the function and the parameters should be clear from the names
Make the method short. Aim for fewer than 10 lines of code.
Recently our company has started measuring the cyclomatic complexity (CC) of the functions in our code on a weekly basis, and reporting which functions have improved or worsened. So we have started paying a lot more attention to the CC of functions.
I've read that CC could be informally calculated as 1 + the number of decision points in a function (e.g. if statement, for loop, select etc), or also the number of paths through a function...
I understand that the easiest way of reducing CC is to use the Extract Method refactoring repeatedly...
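For example (an invented before/after sketch, in C++ rather than C# purely for illustration), extracting a decision-heavy block into a helper moves those decision points, and their share of the CC, out of the original function:
struct Order {                        // hypothetical type for the example
    bool   isInternational() const;
    bool   isExpress() const;
    double total() const;
};

// Before: every decision point sits in one method, so its CC adds up.
void ProcessOrderBefore(const Order &order) {
    if (order.isInternational()) {
        if (order.total() > 1000) { /* add surcharge */ }
    } else {
        if (order.isExpress())    { /* expedite */ }
    }
}

// After: Extract Method moves one cluster of decisions into a helper.
// The behaviour is unchanged, but each individual function now has fewer
// decision points and therefore a lower CC.
void HandleInternational(const Order &order) {
    if (order.total() > 1000) { /* add surcharge */ }
}

void ProcessOrderAfter(const Order &order) {
    if (order.isInternational()) HandleInternational(order);
    else if (order.isExpress())  { /* expedite */ }
}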
There are some things I am unsure about, e.g. what is the CC of the following code fragments?
1)
for (int i = 0; i < 3; i++)
    Console.WriteLine("Hello");
And
Console.WriteLine("Hello");
Console.WriteLine("Hello");
Console.WriteLine("Hello");
They both do the same thing, but does the first version have a higher CC because of the for statement?
2)
if (condition1)
    if (condition2)
        if (condition3)
            Console.WriteLine("wibble");
And
if (condition1 && condition2 && condition3)
    Console.WriteLine("wibble");
Assuming the language does short-circuit evaluation, such as C#, then these two code fragments have the same effect... but is the CC of the first fragment higher because it has 3 decision points/if statements?
3)
if (condition1)
{
    Console.WriteLine("one");
    if (condition2)
        Console.WriteLine("one and two");
}
And
if (condition3)
    Console.WriteLine("fizz");
if (condition4)
    Console.WriteLine("buzz");
These two code fragments do different things, but do they have the same CC? Or does the nested if statement in the first fragment have a higher CC? i.e. nested if statements are mentally more complex to understand, but is that reflected in the CC?
Yes. Your first example has a decision point and your second does not, so the first has a higher CC.
Yes-maybe, your first example has multiple decision points and thus a higher CC. (See below for explanation.)
Yes-maybe. Obviously they have the same number of decision points, but there are different ways to calculate CC, which means ...
... if your company is measuring CC in a specific way, then you need to become familiar with that method (hopefully they are using tools to do this). There are different ways to calculate CC for different situations (case statements, Boolean operators, etc.), but you should get the same kind of information from the metric no matter what convention you use.
The bigger problem is what others have mentioned, that your company seems to be focusing more on CC than on the code behind it. In general, sure, below 5 is great, below 10 is good, below 20 is okay, 21 to 50 should be a warning sign, and above 50 should be a big warning sign, but those are guides, not absolute rules. You should probably examine the code in a procedure that has a CC above 50 to ensure it isn't just a huge heap of code, but maybe there is a specific reason why the procedure is written that way, and it's not feasible (for any number of reasons) to refactor it.
If you use tools to refactor your code to reduce CC, make sure you understand what the tools are doing, and that they're not simply shifting one problem to another place. Ultimately, you want your code to have few defects, to work properly, and to be relatively easy to maintain. If that code also has a low CC, good for it. If your code meets these criteria and has a CC above 10, maybe it's time to sit down with whatever management you can and defend your code (and perhaps get them to examine their policy).
After browsing through the Wikipedia entry and Thomas J. McCabe's original paper, it seems that the items you mentioned above are known problems with the metric.
However, most metrics do have pros and cons. I suppose in a large enough program the CC value could point to possibly complex parts of your code. But that higher CC does not necessarily mean complex.
Like all software metrics, CC is not perfect. Used on a big enough code base, it can give you an idea of where a problematic zone might be.
There are two things to keep in mind here:
Big enough code base: In any non-trivial project you will have functions that have a really high CC value. So high that it does not matter whether, in one of your examples, the CC would be 2 or 3. A function with a CC of, let's say, over 300 is definitely something to analyse. It doesn't matter if the CC is 301 or 302.
Don't forget to use your head. There are methods that need many decision points. Often they can be refactored somehow to have fewer, but sometimes they can't. Do not go with a rule like "Refactor all methods with a CC > xy". Have a look at them and use your brain to decide what to do.
I like the idea of a weekly analysis. In quality control, trend analysis is a very effective tool for identifying problems during their creation. This is so much better than having to wait until they get so big that they become obvious (see SPC for some details).
CC is not a panacea for measuring quality. Clearly a repeated statement is not "better" than a loop, even if a loop has a bigger CC. The reason the loop has a bigger CC is that sometimes it might get executed and sometimes it might not, which leads to two different "cases" which should both be tested. In your case the loop will always be executed three times because you use a constant, but CC is not clever enough to detect this.
Same with the chained ifs in example 2: this structure allows you to have a statement which would be executed if only condition1 and condition2 are true (but not condition3). This is a special case which is not possible when using &&. So the if-chain has a bigger potential for special cases, even if you don't make use of this in your code.
This is the danger of applying any metric blindly. The CC metric certainly has a lot of merit, but as with any other technique for improving code it can't be evaluated divorced from context. Point your management at Capers Jones's discussion of the Lines of Code measurement (wish I could find a link for you). He points out that if Lines of Code is a good measure of productivity then assembler language developers are the most productive developers on earth. Of course they're no more productive than other developers; it just takes them a lot more code to accomplish what higher-level languages do with less source code. I mention this, as I say, so you can show your managers how dumb it is to blindly apply metrics without intelligent review of what the metric is telling you.
I would suggest that, if they're not already, your management would be wise to use the CC measure as a way of spotting potential hot spots in the code that should be reviewed further. Blindly aiming for the goal of lower CC without any reference to code maintainability or other measures of good coding is just foolish.
Cyclomatic complexity is analogous to temperature. Both are measurements, and in most cases meaningless without context. If I said the temperature outside was 72 degrees, that doesn't mean much; but if I added the fact that I was at the North Pole, the number 72 becomes significant. If someone told me a method has a cyclomatic complexity of 10, I can't determine whether that is good or bad without its context.
When I code review an existing application, I find cyclomatic complexity a useful “starting point” metric. The first thing I check for are methods with a CC > 10. These “>10” methods are not necessarily bad. They just provide me a starting point for reviewing the code.
General rules when considering a CC number:
The relationship between the CC # and the # of tests should be: CC # <= # of tests
Refactor for CC # only if it increases maintainability
A CC above 10 often indicates one or more code smells
[Off topic] If you favor readability over a good score in the metrics (was it J. Spolsky who said "what's measured gets done"? - meaning that metrics are abused more often than not, I suppose), it is often better to use a well-named boolean to replace your complex conditional statement.
Then
if (condition1 && condition2 && condition3)
    Console.WriteLine("wibble");
becomes
bool/boolean theWeatherIsFine = condition1 && condition2 && condition3;
if (theWeatherIsFine)
    Console.WriteLine("wibble");
I'm no expert at this subject, but I thought I would give my two cents. And maybe that's all this is worth.
Cyclomatic Complexity seems to be just a particular automated shortcut to finding potentially (but not definitely) problematic code snippets. But isn't the real problem to be solved one of testing? How many test cases does the code require? If CC is higher, but number of test cases is the same and code is cleaner, don't worry about CC.
1.) There is no decision point there. There is one and only one path through the program there, only one possible result with either of the two versions. The first is more concise and better, Cyclomatic Complexity be damned.
1 test case for both
2.) In both cases, you either write "wibble" or you don't.
2 test cases for both
3.) First one could result in nothing, "one", or "one" and "one and two". 3 paths. 2nd one could result in nothing, either of the two, or both of them. 4 paths.
3 test cases for the first
4 test cases for the second