When to use unsigned values over signed ones? - language-agnostic

When is it appropriate to use an unsigned variable over a signed one? What about in a for loop?
I hear a lot of opinions about this and I wanted to see if there was anything resembling a consensus.
for (unsigned int i = 0; i < someThing.length(); i++) {
SomeThing var = someThing.at(i);
// You get the idea.
}
I know Java doesn't have unsigned values, and that must have been a concious decision on Sun Microsystems' part.

I was glad to find a good conversation on this subject, as I hadn't really given it much thought before.
In summary, signed is a good general choice - even when you're dead sure all the numbers are positive - if you're going to do arithmetic on the variable (like in a typical for loop case).
unsigned starts to make more sense when:
You're going to do bitwise things like masks, or
You're desperate to to take advantage of the sign bit for that extra positive range .
Personally, I like signed because I don't trust myself to stay consistent and avoid mixing the two types (like the article warns against).

In your example above, when 'i' will always be positive and a higher range would be beneficial, unsigned would be useful. Like if you're using 'declare' statements, such as:
#declare BIT1 (unsigned int 1)
#declare BIT32 (unsigned int reallybignumber)
Especially when these values will never change.
However, if you're doing an accounting program where the people are irresponsible with their money and are constantly in the red, you will most definitely want to use 'signed'.
I do agree with saint though that a good rule of thumb is to use signed, which C actually defaults to, so you're covered.

I would think that if your business case dictates that a negative number is invalid, you would want to have an error shown or thrown.
With that in mind, I only just recently found out about unsigned integers while working on a project processing data in a binary file and storing the data into a database. I was purposely "corrupting" the binary data, and ended up getting negative values instead of an expected error. I found that even though the value converted, the value was not valid for my business case.
My program did not error, and I ended up getting wrong data into the database. It would have been better if I had used uint and had the program fail.

C and C++ compilers will generate a warning when you compare signed and unsigned types; in your example code, you couldn't make your loop variable unsigned and have the compiler generate code without warnings (assuming said warnings were turned on).
Naturally, you're compiling with warnings turned all the way up, right?
And, have you considered compiling with "treat warnings as errors" to take it that one step further?
The downside with using signed numbers is that there's a temptation to overload them so that, for example, the values 0->n are the menu selection, and -1 means nothing's selected - rather than creating a class that has two variables, one to indicate if something is selected and another to store what that selection is. Before you know it, you're testing for negative one all over the place and the compiler is complaining about how you're wanting to compare the menu selection against the number of menu selections you have - but that's dangerous because they're different types. So don't do that.

size_t is often a good choice for this, or size_type if you're using an STL class.

Related

How do interpreters load their values?

I mean, interpreters work on a list of instructions, which seem to be composed more or less by sequences of bytes, usually stored as integers. Opcodes are retrieved from these integers, by doing bit-wise operations, for use in a big switch statement where all operations are located.
My specific question is: How do the object values get stored/retrieved?
For example, let's (non-realistically) assume:
Our instructions are unsigned 32 bit integers.
We've reserved the first 4 bits of the integer for opcodes.
If I wanted to store data in the same integer as my opcode, I'm limited to a 24 bit integer. If I wanted to store it in the next instruction, I'm limited to a 32 bit value.
Values like Strings require lots more storage than this. How do most interpreters get away with this in an efficient manner?
I'm going to start by assuming that you're interested primarily (if not exclusively) in a byte-code interpreter or something similar (since your question seems to assume that). An interpreter that works directly from source code (in raw or tokenized form) is a fair amount different.
For a typical byte-code interpreter, you basically design some idealized machine. Stack-based (or at least stack-oriented) designs are pretty common for this purpose, so let's assume that.
So, first let's consider the choice of 4 bits for op-codes. A lot here will depend on how many data formats we want to support, and whether we're including that in the 4 bits for the op code. Just for the sake of argument, let's assume that the basic data types supported by the virtual machine proper are 8-bit and 64-bit integers (which can also be used for addressing), and 32-bit and 64-bit floating point.
For integers we pretty much need to support at least: add, subtract, multiply, divide, and, or, xor, not, negate, compare, test, left/right shift/rotate (right shifts in both logical and arithmetic varieties), load, and store. Floating point will support the same arithmetic operations, but remove the logical/bitwise operations. We'll also need some branch/jump operations (unconditional jump, jump if zero, jump if not zero, etc.) For a stack machine, we probably also want at least a few stack oriented instructions (push, pop, dupe, possibly rotate, etc.)
That gives us a two-bit field for the data type, and at least 5 (quite possibly 6) bits for the op-code field. Instead of conditional jumps being special instructions, we might want to have just one jump instruction, and a few bits to specify conditional execution that can be applied to any instruction. We also pretty much need to specify at least a few addressing modes:
Optional: small immediate (N bits of data in the instruction itself)
large immediate (data in the 64-bit word following the instruction)
implied (operand(s) on top of stack)
Absolute (address specified in 64 bits following instruction)
relative (offset specified in or following instruction)
I've done my best to keep everything about as minimal as is at all reasonable here -- you might well want more to improve efficiency.
Anyway, in a model like this, an object's value is just some locations in memory. Likewise, a string is just some sequence of 8-bit integers in memory. Nearly all manipulation of objects/strings is done via the stack. For example, let's assume you had some classes A and B defined like:
class A {
int x;
int y;
};
class B {
int a;
int b;
};
...and some code like:
A a {1, 2};
B b {3, 4};
a.x += b.a;
The initialization would mean values in the executable file loaded into the memory locations assigned to a and b. The addition could then produce code something like this:
push immediate a.x // put &a.x on top of stack
dupe // copy address to next lower stack position
load // load value from a.x
push immediate b.a // put &b.a on top of stack
load // load value from b.a
add // add two values
store // store back to a.x using address placed on stack with `dupe`
Assuming one byte for each instruction proper, we end up around 23 bytes for the sequence as a whole, 16 bytes of which are addresses. If we use 32-bit addressing instead of 64-bit, we can reduce that by 8 bytes (i.e., a total of 15 bytes).
The most obvious thing to keep in mind is that the virtual machine implemented by a typical byte-code interpreter (or similar) isn't all that different from a "real" machine implemented in hardware. You might add some instructions that are important to the model you're trying to implement (e.g., the JVM includes instructions to directly support its security model), or you might leave out a few if you only want to support languages that don't include them (e.g., I suppose you could leave out a few like xor if you really wanted to). You also need to decide what sort of virtual machine you're going to support. What I've portrayed above is stack-oriented, but you can certainly do a register-oriented machine if you prefer.
Either way, most of object access, string storage, etc., comes down to them being locations in memory. The machine will retrieve data from those locations into the stack/registers, manipulate as appropriate, and store back to the locations of the destination object(s).
Bytecode interpreters that I'm familiar with do this using constant tables. When the compiler is generating bytecode for a chunk of source, it is also generating a little constant table that rides along with that bytecode. (For example, if the bytecode gets stuffed into some kind of "function" object, the constant table will go in there too.)
Any time the compiler encounters a literal like a string or a number, it creates an actual runtime object for the value that the interpreter can work with. It adds that to the constant table and gets the index where the value was added. Then it emits something like a LOAD_CONSTANT instruction that has an argument whose value is the index in the constant table.
Here's an example:
static void string(Compiler* compiler, int allowAssignment)
{
// Define a constant for the literal.
int constant = addConstant(compiler, wrenNewString(compiler->parser->vm,
compiler->parser->currentString, compiler->parser->currentStringLength));
// Compile the code to load the constant.
emit(compiler, CODE_CONSTANT);
emit(compiler, constant);
}
At runtime, to implement a LOAD_CONSTANT instruction, you just decode the argument, and pull the object out of the constant table.
Here's an example:
CASE_CODE(CONSTANT):
PUSH(frame->fn->constants[READ_ARG()]);
DISPATCH();
For things like small numbers and frequently used values like true and null, you may devote dedicated instructions to them, but that's just an optimization.

What are "magic numbers" in computer programming?

When people talk about the use of "magic numbers" in computer programming, what do they mean?
Magic numbers are any number in your code that isn't immediately obvious to someone with very little knowledge.
For example, the following piece of code:
sz = sz + 729;
has a magic number in it and would be far better written as:
sz = sz + CAPACITY_INCREMENT;
Some extreme views state that you should never have any numbers in your code except -1, 0 and 1 but I prefer a somewhat less dogmatic view since I would instantly recognise 24, 1440, 86400, 3.1415, 2.71828 and 1.414 - it all depends on your knowledge.
However, even though I know there are 1440 minutes in a day, I would probably still use a MINS_PER_DAY identifier since it makes searching for them that much easier. Whose to say that the capacity increment mentioned above wouldn't also be 1440 and you end up changing the wrong value? This is especially true for the low numbers: the chance of dual use of 37197 is relatively low, the chance of using 5 for multiple things is pretty high.
Use of an identifier means that you wouldn't have to go through all your 700 source files and change 729 to 730 when the capacity increment changed. You could just change the one line:
#define CAPACITY_INCREMENT 729
to:
#define CAPACITY_INCREMENT 730
and recompile the lot.
Contrast this with magic constants which are the result of naive people thinking that just because they remove the actual numbers from their code, they can change:
x = x + 4;
to:
#define FOUR 4
x = x + FOUR;
That adds absolutely zero extra information to your code and is a total waste of time.
"magic numbers" are numbers that appear in statements like
if days == 365
Assuming you didn't know there were 365 days in a year, you'd find this statement meaningless. Thus, it's good practice to assign all "magic" numbers (numbers that have some kind of significance in your program) to a constant,
DAYS_IN_A_YEAR = 365
And from then on, compare to that instead. It's easier to read, and if the earth ever gets knocked out of alignment, and we gain an extra day... you can easily change it (other numbers might be more likely to change).
There's more than one meaning. The one given by most answers already (an arbitrary unnamed number) is a very common one, and the only thing I'll say about that is that some people go to the extreme of defining...
#define ZERO 0
#define ONE 1
If you do this, I will hunt you down and show no mercy.
Another kind of magic number, though, is used in file formats. It's just a value included as typically the first thing in the file which helps identify the file format, the version of the file format and/or the endian-ness of the particular file.
For example, you might have a magic number of 0x12345678. If you see that magic number, it's a fair guess you're seeing a file of the correct format. If you see, on the other hand, 0x78563412, it's a fair guess that you're seeing an endian-swapped version of the same file format.
The term "magic number" gets abused a bit, though, referring to almost anything that identifies a file format - including quite long ASCII strings in the header.
http://en.wikipedia.org/wiki/File_format#Magic_number
Wikipedia is your friend (Magic Number article)
Most of the answers so far have described a magic number as a constant that isn't self describing. Being a little bit of an "old-school" programmer myself, back in the day we described magic numbers as being any constant that is being assigned some special purpose that influences the behaviour of the code. For example, the number 999999 or MAX_INT or something else completely arbitrary.
The big problem with magic numbers is that their purpose can easily be forgotten, or the value used in another perfectly reasonable context.
As a crude and terribly contrived example:
while (int i != 99999)
{
DoSomeCleverCalculationBasedOnTheValueOf(i);
if (escapeConditionReached)
{
i = 99999;
}
}
The fact that a constant is used or not named isn't really the issue. In the case of my awful example, the value influences behaviour, but what if we need to change the value of "i" while looping?
Clearly in the example above, you don't NEED a magic number to exit the loop. You could replace it with a break statement, and that is the real issue with magic numbers, that they are a lazy approach to coding, and without fail can always be replaced by something less prone to either failure, or to losing meaning over time.
Anything that doesn't have a readily apparent meaning to anyone but the application itself.
if (foo == 3) {
// do something
} else if (foo == 4) {
// delete all users
}
Magic numbers are special value of certain variables which causes the program to behave in an special manner.
For example, a communication library might take a Timeout parameter and it can define the magic number "-1" for indicating infinite timeout.
The term magic number is usually used to describe some numeric constant in code. The number appears without any further description and thus its meaning is esoteric.
The use of magic numbers can be avoided by using named constants.
Using numbers in calculations other than 0 or 1 that aren't defined by some identifier or variable (which not only makes the number easy to change in several places by changing it in one place, but also makes it clear to the reader what the number is for).
In simple and true words, a magic number is a three-digit number, whose sum of the squares of the first two digits is equal to the third one.
Ex-202,
as, 2*2 + 0*0 = 2*2.
Now, WAP in java to accept an integer and print whether is a magic number or not.
It may seem a bit banal, but there IS at least one real magic number in every programming language.
0
I argue that it is THE magic wand to rule them all in virtually every programmer's quiver of magic wands.
FALSE is inevitably 0
TRUE is not(FALSE), but not necessarily 1! Could be -1 (0xFFFF)
NULL is inevitably 0 (the pointer)
And most compilers allow it unless their typechecking is utterly rabid.
0 is the base index of array elements, except in languages that are so antiquated that the base index is '1'. One can then conveniently code for(i = 0; i < 32; i++), and expect that 'i' will start at the base (0), and increment to, and stop at 32-1... the 32nd member of an array, or whatever.
0 is the end of many programming language strings. The "stop here" value.
0 is likewise built into the X86 instructions to 'move strings efficiently'. Saves many microseconds.
0 is often used by programmers to indicate that "nothing went wrong" in a routine's execution. It is the "not-an-exception" code value. One can use it to indicate the lack of thrown exceptions.
Zero is the answer most often given by programmers to the amount of work it would take to do something completely trivial, like change the color of the active cell to purple instead of bright pink. "Zero, man, just like zero!"
0 is the count of bugs in a program that we aspire to achieve. 0 exceptions unaccounted for, 0 loops unterminated, 0 recursion pathways that cannot be actually taken. 0 is the asymptote that we're trying to achieve in programming labor, girlfriend (or boyfriend) "issues", lousy restaurant experiences and general idiosyncracies of one's car.
Yes, 0 is a magic number indeed. FAR more magic than any other value. Nothing ... ahem, comes close.
rlynch#datalyser.com

Any reason to use hex notation for null pointers?

I'm currently improving the part of our COM component that logs all external calls into a file. For pointers we write something like (IInterface*)0x12345678 with the value being equal to the actual address.
Currently no difference is made for null pointers - they are displayed as 0x0 which IMO is suboptimal and inelegant. Changing this behaviour is not a problem at all. But first I'd like to know - is there any real advantage in representing null pointers in hex?
In C or C++, you should be able to use the standard %p formatting code, which will then make your pointers look like everybody else's.
I'm not sure how null pointers are formatted in Win32 by %p, on Linux I think you get "null" or something similar.
Using the notation 0x0 (IMO) makes it clearer that it's referring to an address (even if it's not the internal representation of the null pointer). (In actual code, I prefer would using the NULL macro, though, but it sounds like you're talking specifically about debugging spew.)
It gives some context, just like I prefer using '\0' for the NUL-terminator.
It's a stylistic preference, though, so do what appeals to you (and to your colleagues).
Personally, I'd print 0x0 to the log file[*]. Some day when someone comes to parse the file automatically, the more uniform the data is the better. I don't find 0x0 difficult to read, so it seems silly to have a special case in the writer code, and another special case in the reader code, for no benefit that I can think of.
0x0 is preferable to 0 for grepping the log for NULLs, too: saves you having to figure out that you should be grepping for )0 or something funny.
I wouldn't write 0x0 for a null pointer constant in C or C++, though. I write non-null addresses so unbelievably rarely that there's nothing for the nulls to be uniform with. I guess if I was defining a bunch of constants to represent the memory map of some device, and the zero address was significant in that memory map, then I might write it 0x0 in that context.
[*] Or perhaps 0x00000000. I like 32-bit pointers to be printed 8 chars long, because when I read/remember a pointer I start out in pairs from the left. If it turns out to have 7 chars, I get horribly confused at the end ;-). 64-bit pointers it doesn't matter, because I can't remember a number that long anyway...
It's all positive zero in the end.
There is: You can always convert them back to a number (0), with no additional effort. And the only disadvantage is readability.
There is no reason to prefer (SomeType*)0x0 to (SomeType*)0.
As an aside: In C, the null pointer constant is a somewhat strange construct; the compiler recognizes (SomeType*)0 as "the null pointer", even if the internal representation on some machine might differ from the numerical value 0. It is more like NULL in SQL -- not a "real" pointer value. In practice, all machines I know of model the null pointer as the "0" address.
I am pretty sure the hex notation is a result of the layout of memory. Memory is word aligned, where a word is 32 bits if you are on a 32 bit processor. These words are segmented into pages, which are arranged in page tables, etc. etc. Hex notation is the only way to make sense of this arrangements (unless you really like using your calculator).
My opinion, is for readability, think about it, if you were to look at 0, what does that mean, does that mean its a unsigned integer, or if it was 0x0, then instinctively, it has something to do with binary notation, more likely platform dependent.
Since the tag is language agnostic, and the word 'null pointer', in Delphi/Object Pascal, it is 'nil', in C#, it is 'null', in C/C++ it is 'NULL'.
Look at for example in the C-FAQ, in Section 5 on NULL pointers, specifically, 5.4, 5.5, 5.6 and 5.7 to give you an insight into this.
In a nutshell, the usage and notation of null pointers is dependent on
What language is used?
Semantics and syntax of the language specifications.
What type of compiler?
Type of platform, in terms of how memory is accessed, the processor, bits...
Hope this helps,
Best regards,
Tom.

What exactly is the danger of using magic debug values (such as 0xDEADBEEF) as literals?

It goes without saying that using hard-coded, hex literal pointers is a disaster:
int *i = 0xDEADBEEF;
// god knows if that location is available
However, what exactly is the danger in using hex literals as variable values?
int i = 0xDEADBEEF;
// what can go wrong?
If these values are indeed "dangerous" due to their use in various debugging scenarios, then this means that even if I do not use these literals, any program that during runtime happens to stumble upon one of these values might crash.
Anyone care to explain the real dangers of using hex literals?
Edit: just to clarify, I am not referring to the general use of constants in source code. I am specifically talking about debug-scenario issues that might come up to the use of hex values, with the specific example of 0xDEADBEEF.
There's no more danger in using a hex literal than any other kind of literal.
If your debugging session ends up executing data as code without you intending it to, you're in a world of pain anyway.
Of course, there's the normal "magic value" vs "well-named constant" code smell/cleanliness issue, but that's not really the sort of danger I think you're talking about.
With few exceptions, nothing is "constant".
We prefer to call them "slow variables" -- their value changes so slowly that we don't mind recompiling to change them.
However, we don't want to have many instances of 0x07 all through an application or a test script, where each instance has a different meaning.
We want to put a label on each constant that makes it totally unambiguous what it means.
if( x == 7 )
What does "7" mean in the above statement? Is it the same thing as
d = y / 7;
Is that the same meaning of "7"?
Test Cases are a slightly different problem. We don't need extensive, careful management of each instance of a numeric literal. Instead, we need documentation.
We can -- to an extent -- explain where "7" comes from by including a tiny bit of a hint in the code.
assertEquals( 7, someFunction(3,4), "Expected 7, see paragraph 7 of use case 7" );
A "constant" should be stated -- and named -- exactly once.
A "result" in a unit test isn't the same thing as a constant, and requires a little care in explaining where it came from.
A hex literal is no different than a decimal literal like 1. Any special significance of a value is due to the context of a particular program.
I believe the concern raised in the IP address formatting question earlier today was not related to the use of hex literals in general, but the specific use of 0xDEADBEEF. At least, that's the way I read it.
There is a concern with using 0xDEADBEEF in particular, though in my opinion it is a small one. The problem is that many debuggers and runtime systems have already co-opted this particular value as a marker value to indicate unallocated heap, bad pointers on the stack, etc.
I don't recall off the top of my head just which debugging and runtime systems use this particular value, but I have seen it used this way several times over the years. If you are debugging in one of these environments, the existence of the 0xDEADBEEF constant in your code will be indistinguishable from the values in unallocated RAM or whatever, so at best you will not have as useful RAM dumps, and at worst you will get warnings from the debugger.
Anyhow, that's what I think the original commenter meant when he told you it was bad for "use in various debugging scenarios."
There's no reason why you shouldn't assign 0xdeadbeef to a variable.
But woe betide the programmer who tries to assign decimal 3735928559, or octal 33653337357, or worst of all: binary 11011110101011011011111011101111.
Big Endian or Little Endian?
One danger is when constants are assigned to an array or structure with different sized members; the endian-ness of the compiler or machine (including JVM vs CLR) will affect the ordering of the bytes.
This issue is true of non-constant values, too, of course.
Here's an, admittedly contrived, example. What is the value of buffer[0] after the last line?
const int TEST[] = { 0x01BADA55, 0xDEADBEEF };
char buffer[BUFSZ];
memcpy( buffer, (void*)TEST, sizeof(TEST));
I don't see any problem with using it as a value. Its just a number after all.
There's no danger in using a hard-coded hex value for a pointer (like your first example) in the right context. In particular, when doing very low-level hardware development, this is the way you access memory-mapped registers. (Though it's best to give them names with a #define, for example.) But at the application level you shouldn't ever need to do an assignment like that.
I use CAFEBABE
I haven't seen it used by any debuggers before.
int *i = 0xDEADBEEF;
// god knows if that location is available
int i = 0xDEADBEEF;
// what can go wrong?
The danger that I see is the same in both cases: you've created a flag value that has no immediate context. There's nothing about i in either case that will let me know 100, 1000 or 10000 lines that there is a potentially critical flag value associated with it. What you've planted is a landmine bug that, if I don't remember to check for it in every possible use, I could be faced with a terrible debugging problem. Every use of i will now have to look like this:
if (i != 0xDEADBEEF) { // Curse the original designer to oblivion
// Actual useful work goes here
}
Repeat the above for all of the 7000 instances where you need to use i in your code.
Now, why is the above worse than this?
if (isIProperlyInitialized()) { // Which could just be a boolean
// Actual useful work goes here
}
At a minimum, I can spot several critical issues:
Spelling: I'm a terrible typist. How easily will you spot 0xDAEDBEEF in a code review? Or 0xDEADBEFF? On the other hand, I know that my compile will barf immediately on isIProperlyInitialised() (insert the obligatory s vs. z debate here).
Exposure of meaning. Rather than trying to hide your flags in the code, you've intentionally created a method that the rest of the code can see.
Opportunities for coupling. It's entirely possible that a pointer or reference is connected to a loosely defined cache. An initialization check could be overloaded to check first if the value is in cache, then to try to bring it back into cache and, if all that fails, return false.
In short, it's just as easy to write the code you really need as it is to create a mysterious magic value. The code-maintainer of the future (who quite likely will be you) will thank you.

Why don't languages raise errors on integer overflow by default?

In several modern programming languages (including C++, Java, and C#), the language allows integer overflow to occur at runtime without raising any kind of error condition.
For example, consider this (contrived) C# method, which does not account for the possibility of overflow/underflow. (For brevity, the method also doesn't handle the case where the specified list is a null reference.)
//Returns the sum of the values in the specified list.
private static int sumList(List<int> list)
{
int sum = 0;
foreach (int listItem in list)
{
sum += listItem;
}
return sum;
}
If this method is called as follows:
List<int> list = new List<int>();
list.Add(2000000000);
list.Add(2000000000);
int sum = sumList(list);
An overflow will occur in the sumList() method (because the int type in C# is a 32-bit signed integer, and the sum of the values in the list exceeds the value of the maximum 32-bit signed integer). The sum variable will have a value of -294967296 (not a value of 4000000000); this most likely is not what the (hypothetical) developer of the sumList method intended.
Obviously, there are various techniques that can be used by developers to avoid the possibility of integer overflow, such as using a type like Java's BigInteger, or the checked keyword and /checked compiler switch in C#.
However, the question that I'm interested in is why these languages were designed to by default allow integer overflows to happen in the first place, instead of, for example, raising an exception when an operation is performed at runtime that would result in an overflow. It seems like such behavior would help avoid bugs in cases where a developer neglects to account for the possibility of overflow when writing code that performs an arithmetic operation that could result in overflow. (These languages could have included something like an "unchecked" keyword that could designate a block where integer overflow is permitted to occur without an exception being raised, in those cases where that behavior is explicitly intended by the developer; C# actually does have this.)
Does the answer simply boil down to performance -- the language designers didn't want their respective languages to default to having "slow" arithmetic integer operations where the runtime would need to do extra work to check whether an overflow occurred, on every applicable arithmetic operation -- and this performance consideration outweighed the value of avoiding "silent" failures in the case that an inadvertent overflow occurs?
Are there other reasons for this language design decision as well, other than performance considerations?
In C#, it was a question of performance. Specifically, out-of-box benchmarking.
When C# was new, Microsoft was hoping a lot of C++ developers would switch to it. They knew that many C++ folks thought of C++ as being fast, especially faster than languages that "wasted" time on automatic memory management and the like.
Both potential adopters and magazine reviewers are likely to get a copy of the new C#, install it, build a trivial app that no one would ever write in the real world, run it in a tight loop, and measure how long it took. Then they'd make a decision for their company or publish an article based on that result.
The fact that their test showed C# to be slower than natively compiled C++ is the kind of thing that would turn people off C# quickly. The fact that your C# app is going to catch overflow/underflow automatically is the kind of thing that they might miss. So, it's off by default.
I think it's obvious that 99% of the time we want /checked to be on. It's an unfortunate compromise.
I think performance is a pretty good reason. If you consider every instruction in a typical program that increments an integer, and if instead of the simple op to add 1, it had to check every time if adding 1 would overflow the type, then the cost in extra cycles would be pretty severe.
You work under the assumption that integer overflow is always undesired behavior.
Sometimes integer overflow is desired behavior. One example I've seen is representation of an absolute heading value as a fixed point number. Given an unsigned int, 0 is 0 or 360 degrees and the max 32 bit unsigned integer (0xffffffff) is the biggest value just below 360 degrees.
int main()
{
uint32_t shipsHeadingInDegrees= 0;
// Rotate by a bunch of degrees
shipsHeadingInDegrees += 0x80000000; // 180 degrees
shipsHeadingInDegrees += 0x80000000; // another 180 degrees, overflows
shipsHeadingInDegrees += 0x80000000; // another 180 degrees
// Ships heading now will be 180 degrees
cout << "Ships Heading Is" << (double(shipsHeadingInDegrees) / double(0xffffffff)) * 360.0 << std::endl;
}
There are probably other situations where overflow is acceptable, similar to this example.
C/C++ never mandate trap behaviour. Even the obvious division by 0 is undefined behaviour in C++, not a specified kind of trap.
The C language doesn't have any concept of trapping, unless you count signals.
C++ has a design principle that it doesn't introduce overhead not present in C unless you ask for it. So Stroustrup would not have wanted to mandate that integers behave in a way which requires any explicit checking.
Some early compilers, and lightweight implementations for restricted hardware, don't support exceptions at all, and exceptions can often be disabled with compiler options. Mandating exceptions for language built-ins would be problematic.
Even if C++ had made integers checked, 99% of programmers in the early days would have turned if off for the performance boost...
Because checking for overflow takes time. Each primitive mathematical operation, which normally translates into a single assembly instruction would have to include a check for overflow, resulting in multiple assembly instructions, potentially resulting in a program that is several times slower.
It is likely 99% performance. On x86 would have to check the overflow flag on every operation which would be a huge performance hit.
The other 1% would cover those cases where people are doing fancy bit manipulations or being 'imprecise' in mixing signed and unsigned operations and want the overflow semantics.
Backwards compatibility is a big one. With C, it was assumed that you were paying enough attention to the size of your datatypes that if an over/underflow occurred, that that was what you wanted. Then with C++, C# and Java, very little changed with how the "built-in" data types worked.
If integer overflow is defined as immediately raising a signal, throwing an exception, or otherwise deflecting program execution, then any computations which might overflow will need to be performed in the specified sequence. Even on platforms where integer overflow checking wouldn't cost anything directly, the requirement that integer overflow be trapped at exactly the right point in a program's execution sequence would severely impede many useful optimizations.
If a language were to specify that integer overflows would instead set a latching error flag, were to limit how actions on that flag within a function could affect its value within calling code, and were to provide that the flag need not be set in circumstances where an overflow could not result in erroneous output or behavior, then compilers could generate more efficient code than any kind of manual overflow-checking programmers could use. As a simple example, if one had a function in C that would multiply two numbers and return a result, setting an error flag in case of overflow, a compiler would be required to perform the multiplication whether or not the caller would ever use the result. In a language with looser rules like I described, however, a compiler that determined that nothing ever uses the result of the multiply could infer that overflow could not affect a program's output, and skip the multiply altogether.
From a practical standpoint, most programs don't care about precisely when overflows occur, so much as they need to guarantee that they don't produce erroneous results as a consequence of overflow. Unfortunately, programming languages' integer-overflow-detection semantics have not caught up with what would be necessary to let compilers produce efficient code.
My understanding of why errors would not be raised by default at runtime boils down to the legacy of desiring to create programming languages with ACID-like behavior. Specifically, the tenet that anything that you code it to do (or don't code), it will do (or not do). If you didn't code some error handler, then the machine will "assume" by virtue of no error handler, that you really want to do the ridiculous, crash-prone thing you're telling it to do.
(ACID reference: http://en.wikipedia.org/wiki/ACID)