I was looking at this question. Basically, having a leading zero causes the number to be interpreted as octal. I've run into this problem numerous times in multiple languages.
Why doesn't the language explicitly require you to specify octal with a function call or a type (in strong typed languages) like:
oct variable = 2;
I can understand why hexadecimal (0x0234) has this format. Hex is pretty useful. An integer from the database will never have an x in it.
But octal numbers like 0123 look like ordinary ints and are a pain to deal with. I've never used octal for anything.
Can anyone explain the rationale behind this usage? Is it just a bit of historical cruft?
It's largely historic. The best solution I've seen is in the new version of Python, where octal is indicated with a special prefix character "o", much like hexadecimal's "x" prefix:
0o10 == 0x8 == 8
99.9% of the reason it exists is to support chmod() calls, e.g. chmod(path, 0755).
It does rather seem like a format more like hex's would be superior.
It exists since working with 3-bit segments is almost as useful as working with 4-bit segments. This was more true in the past (e.g., seven-segment LEDs, chmod, etc.).
The real question is why haven't more languages adopted octal and binary notations in a more regular fashion:
10 == 0b1010 == 0o12 == 0x0A
I know that Python finally adopted the 0o notation... not sure if they have adopted the binary one as well. I guess a better question is: why does this still trip people up?
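For what it's worth, Python 3 does accept all three prefixes; here is a quick sketch (assuming any recent Python 3):

# Binary, octal, and hex literals all denote the same value as the decimal literal.
assert 10 == 0b1010 == 0o12 == 0x0A
print(bin(10), oct(10), hex(10))   # 0b1010 0o12 0xa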
I hate this too, and I don't know why it's been carried forward into so many modern languages. I once knew someone who had a zip code like "09827" when he lived in NYC. Sometimes he had to input his zip code as "9827", because the leading zero would lead to error messages (since 8 and 9 are not valid octal digits).
Yes, it's historical. C uses this notation for octal literals, and it was possibly used somewhere before that.
I've experienced it in JavaScript, where parsing dates stops working in August. Up to July it works, since '07' parsed as octal is still seven, but '08' is not a valid octal number... (The solution is to specify the number base in the parseInt call.)
In C# there are no binary or octal literals; perhaps the reasoning is that you shouldn't be doing so much bit fiddling that the language needs them...
Personally, I blame the programmer in this case. Why are you formatting an integer by zero padding? Zero padding is for strings, not numeric types.
Related
I understand how two's-complement works. I also understand how signed magnitude and one's-complement work, and the advantages that two's-complement has over the other encoding methods.
What I can't figure out, is if I'm asked to convert a signed hex number to dec, e.g. 0xF3C645AC, how do I figure out which encoding method it's using?
You can't.
Those interpretation schemes are not encoded themselves in the data.
Machines don't usually implement two kinds of integer representation at the hardware level, though, so you can safely assume the number is represented the same way as all other integers in the context.
...in case it's some exercise/homework, well, interpret it under all the possibilities; the teacher will be glad :)
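For a concrete feel, here is a minimal Python sketch (mine, not from the answer) that decodes the question's bit pattern under each scheme; the 32-bit width is an assumption:

# Decode the same 32-bit pattern under the common signed-integer schemes.
# Nothing in the bits themselves says which interpretation applies.
raw = 0xF3C645AC
BITS = 32
SIGN = 1 << (BITS - 1)        # sign bit
MASK = (1 << BITS) - 1

unsigned = raw
twos_complement = raw - (1 << BITS) if raw & SIGN else raw
ones_complement = -((~raw) & MASK) if raw & SIGN else raw
sign_magnitude = -(raw & (SIGN - 1)) if raw & SIGN else raw

print(unsigned, twos_complement, ones_complement, sign_magnitude)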
I've always wondered why leading zeroes (0) are used to represent octal numbers, instead of, for example, 0o. The use of 0o would be just as helpful, but would not cause as many problems as a leading 0 does (e.g. parseInt('08'); in JavaScript). What are the reason(s) behind this design choice?
All modern languages import this convention from C, which imported it from B, which imported it from BCPL.
Except that BCPL used #1234 for octal and #x1234 for hexadecimal. B departed from this convention because # was a unary operator in B (integer to floating point conversion), so #1234 could not be used, and # as a base indicator was replaced with 0.
The designers of B tried to make the syntax very compact. I guess this is the reason they did not use a two-character prefix.
Worth noting that in Python 3.0, they decided that octal literals must be prefixed with '0o' and the old '0' prefix became a SyntaxError, for the exact reasons you mention in your question:
https://www.python.org/dev/peps/pep-3127/#removal-of-old-octal-syntax
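For illustration, a tiny sketch of the new behaviour in any Python 3 interpreter:

x = 0o755          # fine: 493 in decimal
# y = 0755         # SyntaxError: the old leading-zero octal form is rejected
print(x)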
"0b" is often used for binary rather than for octal. The leading "0" is, I suspect for "O -ctal".
If you know your input may have a leading zero but should be read as decimal, pass the radix explicitly: parseInt('08', 10) makes it treat the number as base ten.
As (hopefully) most of you know, floating point arithmetic is different from real number arithmetic. For starters, it's imprecise. Many numbers, especially decimal fractions (0.1, 0.3), cannot be represented exactly, leading to problems like this. A more thorough list can be found here.
Are there any general purpose languages that have built-in support for something closer to real number arithmetic? If not, what are good libraries that support this?
EDIT: Arbitrary precision decimal datatypes are not what I am looking for. I want to be able to represent numbers like 1/3, sqrt(3), or 1 + 2i as well.
Though I hate to say it, Fortran. It has extensive support for arbitrary-precision arithmetic and tons of support for big-number calculations. It's ancient and gross, but it gets the job done.
All the numbers used in your examples are algebraic numbers, and can be represented finitely as roots of polynomials with integer coefficients.
The same cannot be said of real numbers in general, which is easily seen when one considers that the reals are uncountable, but the set of computer programs is countable. Therefore most reals will not have a finite representation in code.
What you are looking for is symbolic calculation (MATLAB and other tools used in math and engineering are good at it).
If you want a general-purpose language, I think expression trees in C# are a good point to start with. In essence, the ability to store an expression (instead of evaluating it into concrete values) is the key to being able to perform symbolic calculation. Note that an expression tree does not provide symbolic calculation by itself; it just provides the data structure that supports symbolic calculation.
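As a rough Python analogue of that idea (the class names here are purely illustrative, not any real library's API):

# Store expressions as trees instead of evaluating them to floats.
from dataclasses import dataclass

@dataclass
class Num:
    value: int                  # exact integer leaf

@dataclass
class Div:
    left: object
    right: object               # 1/3 stays Div(Num(1), Num(3)), never 0.333...

@dataclass
class Sqrt:
    arg: object                 # sqrt(3) stays Sqrt(Num(3))

one_third = Div(Num(1), Num(3))
root_three = Sqrt(Num(3))
# Simplification and equality-testing rules would walk these trees; the tree
# by itself is only the data structure, not the symbolic engine.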
This question is interesting, but raises some issues. First, you will never be able to represent all the real numbers using a computer (even a theoretically infinite one), for cardinality reasons.
What you are looking for is a "symbolic numbers" datatype. You can imagine some sort of expression tree, with predefined constants, arithmetical operations, and perhaps algebraic (roots of polynomials) and transcendental (exp, sin, cos, log, etc.) functions.
Now the fun part of the story: you cannot find an algorithm which tells whether two such trees represent the same number (or equivalently, which tests whether such a tree is zero). I won't state anything precise, but as a hint, this is similar to the Halting Problem (for computer scientists) or the Gödel Incompleteness Theorem (for mathematicians).
This renders such a class pretty useless.
For some subfields of the reals, you have canonical forms, like a/b for the rationals, or finite algebraic extensions of the rationals (a/b + i c/d for complex rationals, a/b + c/d * sqrt(2) for Q[sqrt(2)], etc.). These can be used to represent some particular sets of algebraic numbers.
In practice, this is the most complicated thing you will need. If you have a particular necessity, like ranges of floating point numbers (to prove some result is within a specified interval, this is probably the closest you can get to real numbers), or arbitrary precision numbers, you have freely available classes everywhere. Google boost::range for the former, and gmp for the latter.
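As a concrete example of the a/b canonical form, Python's standard-library fractions module does exact rational arithmetic (a small sketch):

from fractions import Fraction

print(0.1 + 0.2 == 0.3)                                       # False with binary floats
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))   # True: kept as exact a/b
print(Fraction(1, 3) * 3)                                     # 1, not 0.9999...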
There are several languages with support for rational and complex numbers. Scheme, for instance, has support built in for arbitrarily precise rational numbers, and complex numbers with either rational, floating point, or integral coefficients:
> (+ 1/2 1/3)
5/6
> (* 3 1+1/2i)
3+3/2i
> (+ 1/2 .5)
1.0
If you want to go beyond rational numbers or complex numbers with rational coefficients, to algebraic numbers such as sqrt(2) or closed-form numbers like e, you will probably have to look beyond general purpose programming languages, and use a special purpose mathematical language like Mathematica or Maxima.
To cover the real numbers with any flair you'll need a symbolic package.
Boost, the C++ project, has a Rational library, but that's only part of the story.
You have irrational numbers in all sorts of forms (pi, base of the natural logarithm, square and cube roots, the Champernowne constant, to name only a few). The only way I know of to handle arithmetic operations is a symbolic package with smarts as to the relationship amongst all of these numbers. Assuming you could express e^pi, how would you add one to it? Or take the square root of it?
Mathematica might handle these cases.
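As a sketch of what such a package does (this assumes the third-party SymPy library; it is not built into any general-purpose language):

# SymPy keeps e**pi as an expression and only produces digits on request.
import sympy

x = sympy.exp(sympy.pi)       # stored symbolically, not as a float
print(x + 1)                  # exp(pi) + 1
print(sympy.sqrt(x))          # a symbolic square root of exp(pi)
print(sympy.N(x + 1, 30))     # 30-digit numeric approximation, on demand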
Java: java.math.BigDecimal
C#: decimal
A lot of languages have support for that: Java has BigDecimal, Perl has Math::BigFloat and Math::BigRat, Haskell has Integer, and a lot more libraries and languages are listed on Wikipedia.
Ada natively supports fixed-point math as well as floating-point. Fixed-point can be much more exact than floating-point, as long as the number's exponents remain in range.
If you need floating-points, but more precision than IEEE gives, there are bignum packages around for just about every language.
I think that's about the best you can do. Neither scheme can exactly represent repeating decimals (like 1/3). It would probably be possible to come up with a scheme that does, but I know of no language that supports such a thing with a built-in type. Even that won't help you with irrational numbers (like pi and e). I believe there's even a theorem that says there will always be unrepresentable numbers, no matter what scheme you come up with.
EDIT: Arbitrary precision decimal datatypes are not what I am looking for. I want to be able to represent numbers like 1/3, sqrt(3), or 1 + 2i as well.
Ruby has a Rational class, so 1/3 can be expressed exactly as Rational(1,3). It also has a Complex class.
Scheme defines rationals, bignums, floating point and complex numbers. An implementation is not required to support them all, but if they are present, you can mix them and they will do "the right thing".
While it's not "built-in", I think C++ (or maybe C#) is your best bet. There are classes out there that have been written for this purpose.
http://www.oonumerics.org/oon/
Why do most computer programming languages not allow binary numbers to be used like decimal or hexadecimal?
In VB.NET you could write a hexadecimal number like &H4
In C you could write a hexadecimal number like 0x04
Why not allow binary numbers?
&B010101
0y1010
Bonus Points!... What languages do allow binary numbers?
Edit
Wow! - So the majority think it's because of brevity and poor old "waves" thinks it's due to the technical aspects of the binary representation.
Because hexadecimal (and rarely octal) literals are more compact, and people who use them can usually convert between hexadecimal and binary faster than they can decipher a binary number.
Python 2.6+ allows binary literals, and so do Ruby and Java 7, where you can use the underscore to make byte boundaries obvious. For example, the hexadecimal value 0x1b2a can now be written as 0b00011011_00101010.
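Python later picked up the underscore idea as well (PEP 515, Python 3.6+), so the same value can be written like this:

# Underscores in a binary literal, grouping the bits per byte.
assert 0b0001_1011_0010_1010 == 0x1b2a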
In C++0x, binary numbers will be supported via user-defined literals. I'm not sure whether it will be part of the standard, but at worst you'll be able to enable it yourself:
unsigned long long operator "" _B(const char *digits);   // parses the digit string as binary
assert( 1010_B == 10 );
In order for a bit representation to be meaningful, you need to know how to interpret it.
You would need to specify what type of binary number you're using (signed/unsigned, two's complement, one's complement, signed magnitude).
The only languages I've ever used that properly support binary numbers are hardware description languages (Verilog, VHDL, and the like). They all have strict (and often confusing) definitions of how numbers entered in binary are treated.
See perldoc perlnumber:
NAME
perlnumber - semantics of numbers and numeric operations in Perl
SYNOPSIS
$n = 1234; # decimal integer
$n = 0b1110011; # binary integer
$n = 01234; # octal integer
$n = 0x1234; # hexadecimal integer
$n = 12.34e-56; # exponential notation
$n = "-12.34e56"; # number specified as a string
$n = "1234"; # number specified as a string
Slightly off-topic, but newer versions of GCC added a C extension that allows binary literals. So if you only ever compile with GCC, you can use them. Documentation is here.
Common Lisp allows binary numbers, using #b... (bits going from highest-to-lowest power of 2). Most of the time, it's at least as convenient to use hexadecimal numbers, though (by using #x...), as it's fairly easy to convert between hexadecimal and binary numbers in your head.
Hex and octal are just shorter ways to write binary. Would you really want a 64-character long constant defined in your code?
Common wisdom holds that long strings of binary digits, eg 32 bits for an int, are too difficult for people to conveniently parse and manipulate. Hex is generally considered easier, though I've not used either enough to have developed a preference.
Ruby, as already mentioned, attempts to resolve this by allowing _ to be liberally inserted in the literal, allowing, for example:
irb(main):005:0> 0b1111_0111_1111_1111_0011_1100
=> 16252732
D supports binary literals using the syntax 0[bB][01]+, e.g. 0b1001. It also allows embedded _ characters in numeric literals to allow them to be read more easily.
Java 7 now has support for binary literals. So you can simply write 0b110101. There is not much documentation on this feature. The only reference I could find is here.
While C only has native support for bases 8, 10, and 16, it is actually not that hard to write a pre-processor macro that makes writing 8-bit binary numbers quite simple and readable:
#define BIN(d7,d6,d5,d4, d3,d2,d1,d0) \
( \
((d7)<<7) + ((d6)<<6) + ((d5)<<5) + ((d4)<<4) + \
((d3)<<3) + ((d2)<<2) + ((d1)<<1) + ((d0)<<0) \
)
int my_mask = BIN(1,1,1,0, 0,0,0,0);
This can also be used for C++.
For the record, and to answer this:
Bonus Points!... What languages do allow binary numbers?
Specman (aka e) allows binary numbers. Though to be honest, it's not quite a general purpose language.
Every language should support binary literals. I go nuts not having them!
Bonus Points!... What languages do allow binary numbers?
Icon allows literals in any base from 2 to 16, and possibly up to 36 (my memory grows dim).
It seems that, from a readability and usability standpoint, the hex representation is a better way of defining binary numbers. The fact that they don't add it is probably more a matter of user need than a technology limitation.
I expect that the language designers just didn't see enough of a need to add binary numbers. The average coder can parse hex just as well as binary when handling flags or bit masks. It's great that some languages support binary as a representation, but I think on average it would be little used. Although binary, if it were available in C, C++, Java, or C#, would probably be used more than octal!
In Smalltalk it's like 2r1010. You can use any base up to 36 or so.
Hex is just less verbose, and can express anything a binary number can.
Ruby has nice support for binary numbers, if you really want it. 0b11011, etc.
In Pop-11 you can use a prefix made of number (2 to 32) + colon to indicate the base, e.g.
2:11111111 = 255
3:11111111 = 3280
16:11111111 = 286331153
31:11111111 = 28429701248
32:11111111 = 35468117025
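The same values can be cross-checked with Python's int(), which accepts any base from 2 to 36 (a quick sketch):

# Cross-check of the Pop-11 values above.
for base in (2, 3, 16, 31, 32):
    print(base, int("11111111", base))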
Forth has always allowed numbers of any base to be used (up to the size limit of the CPU, of course). Want to use binary? 2 BASE ! Octal? 8 BASE ! etc. Want to work with time? 60 BASE ! These examples all assume the base is currently set to 10 (decimal). To change base you must express the desired base in the current number base: if you are in binary and want to switch back to decimal, then 1010 BASE ! will work. Most Forth implementations have 'words' to shift to common bases, e.g. DECIMAL, HEX, OCTAL, and BINARY.
Although it's not direct, most languages can also parse a string. Java can convert "10101000" into an int with Integer.parseInt(s, 2), for example.
Not that this is efficient or anything... Just saying it's there. If it were done in a static initialization block, it might even be done at compile time depending on the compiler.
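The same idea in Python, for comparison (a trivial sketch):

# Parse a binary digit string at run time.
print(int("10101000", 2))   # 168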
If you're any good at binary, even with a short number it's pretty straightforward to see 0x3c as 4 ones followed by 2 zeros, whereas even that short a number in binary would be 0b111100, which might make your eyes hurt before you were certain of the number of ones.
0xff9f is exactly 4+4+1 ones, 2 zeros and 5 ones (on sight the bitmask is obvious). Trying to count out 0b1111111110011111 is much more irritating.
I think the issue may be that language designers are always heavily invested in hex/octal/binary/whatever and just think this way. If you are less experienced, I can totally see how these conversions wouldn't be as obvious.
Hey, that reminds me of something I came up with while thinking about base conversions. A sequence--I didn't think anyone could figure out the "Next Number", but one guy actually did, so it is solvable. Give it a try:
10
11
12
13
14
15
16
21
23
31
111
?
Edit:
By the way, this sequence can be created by feeding sequential numbers into a single built-in function in most languages (Java for sure).
This question is language agnostic but is inspired by these C/C++ questions.
How to convert a single char into an int
Char to int conversion in C
Is it safe to assume that the characters for digits (0123456789) appear contiguously in all text encodings?
i.e. is it safe to assume that
'9'-'8' = 1
'9'-'7' = 2
...
'9'-'0' = 9
in all encodings?
I'm looking forward to a definitive answer to this one :)
Thanks,
Update: OK, let me limit "all encodings" to mean anything as old as ASCII and/or EBCDIC and afterwards. Sanskrit I'm not so worried about . . .
I don't know about all encodings, but at least in ASCII and <shudder> EBCDIC, the digits 0-9 all come consecutively and in increasing numeric order. Which means that all ASCII- and EBCDIC-based encodings should also have their digits in order. So for pretty much anything you'll encounter, barring Morse code or worse, I'm going to say yes.
You're going to find it hard to prove a negative. Nobody can possibly know every text encoding ever invented.
All encodings in common use today (except EBCDIC, is it still in common use?) are supersets of ASCII. I'd say you're more likely to win the lottery than you are to find a practical environment where the strict ordering of '0' to '9' doesn't hold.
Both the C++ Standard and the C standard require that this be so, for C++ and C program text.
According to K&R ANSI C it is.
Excerpt:
..."This particular program relies on the properties of the character representation of the digits. For example, the test
if (c >= '0' && c <= '9') ...
determines whether the character in c is a digit. If it is, the numeric value of that digit is
c - '0'
This works only if '0', '1', ..., '9' have consecutive increasing values. Fortunately, this is true for all character sets...."
All text encodings I know of typically order each set of digit characters sequentially. However, your question becomes a lot broader when you include all of the other representations of digits in other encodings, such as the fullwidth digits １２３４５６７８９０ used in Japanese text. Notice how the characters for the numbers are different? Well, they are actually different code points. So, I really think the answer to your question is a hard maybe, since there are so many encodings out there and they have multiple representations of digits in them.
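A quick Python check of both points, using the fullwidth digits as the example:

# ASCII digits are contiguous, and so are the fullwidth digits, but the two
# ranges are entirely different code points.
assert ord('9') - ord('0') == 9       # ASCII digits, U+0030..U+0039
assert ord('９') - ord('０') == 9       # fullwidth digits, U+FF10..U+FF19
assert ord('０') != ord('0')           # not the same characters at all
print(int('１２３'))                    # Python's int() accepts them anyway: 123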
A better question is to ask yourself, why do I need to count on digits to be in sequential code points in the first place?