How to determine signed binary number encoding?

I understand how two's-complement works. I also understand how signed magnitude and one's-complement work, and the advantages that two's-complement has over the other encoding methods.
What I can't figure out is this: if I'm asked to convert a signed hex number to decimal, e.g. 0xF3C645AC, how do I figure out which encoding method it's using?

You can't.
Those interpretation schemes are not encoded themselves in the data.
Machines don't usually implement two kinds of integer representation at the hardware level, though, so you can safely assume the number is represented the same way as all the other integers in its context.
...and if it's an exercise or homework, well, interpret it under all the possibilities; the teacher will be glad :)
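For instance, here is a small Python sketch that interprets the same bit pattern under each scheme (the 32-bit width is itself an assumption -- the pattern doesn't tell you that either):

value = 0xF3C645AC   # the raw 32-bit pattern
bits = 32
negative = bool(value & (1 << (bits - 1)))   # top bit set -> negative in all three schemes

unsigned = value
twos_complement = value - (1 << bits) if negative else value
ones_complement = value - ((1 << bits) - 1) if negative else value
sign_magnitude = -(value & ((1 << (bits - 1)) - 1)) if negative else value

print(unsigned, twos_complement, ones_complement, sign_magnitude)
# 4089857452 -205109844 -205109843 -1942373804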

Related

What exactly is a datatype?

I understand what a datatype is (intuitively), but I need the formal definition. I don't understand whether it is a set, or whether it's just the names 'int', 'float', etc. The formal definition found on Wikipedia is confusing.
In computer programming, a data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored.
Can anyone help me with that?
Yep. What that's saying is that a data type has three pieces:
The various possible values. So, for example, an eight-bit signed integer might have -128..127. Think of that as a set of values V.
The operations: so an 8-bit signed integer might have +, -, * (multiply), and / (divide). The full definition would define those as functions from V × V into V, or possibly as a function into float for division.
The way it's stored -- I sort of gave it away when I said "eight bit signed integer". The other detail is that I'm assuming a specific representation by the way I showed the range of values.
You might, if you're into object oriented programming, notice that this is very much like the definition of a class, which is defined by the storage used by each object, and the methods of the class. Providing those parts for some arbitrary thing, but not inheritance rules, gives you what's called an abstract data type.
Update
@Appy, there's some room for differences in the formalities. I was a little subtle because it was late and I was suddenly uncertain whether I'd assumed one's complement or two's complement -- of course it's two's complement. So interpretation is included in my description. Abstractly, though, you'd say it is an algebraic structure T=(V,O), where V is a set of values and O a set of functions on V into some arbitrary type -- remember that '==', for example, will be a function eq: V × V → {0,1}, so you can't expect every operation to map back into V.
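If it helps, here is a toy Python sketch of that (V, O) view -- purely illustrative, modelling an 8-bit signed integer by hand:

V = range(-128, 128)   # the set of possible values

def add(a, b):         # an operation in O, wrapping the way two's complement hardware does
    return (a + b + 128) % 256 - 128

def eq(a, b):          # eq : V × V → {0, 1}; not every operation maps back into V
    return int(a == b)

print(add(120, 10))    # -126: the wrap-around is part of the type's behaviour
print(eq(3, 3))        # 1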
I can define it as a classification of a particular type of information. It is easy for humans to distinguish between different types of data. We can usually tell at a glance whether a number is a percentage, a time, or an amount of money. We do this through special symbols %, :, and $.
Basically it's a concept that I am sure you grok. For computers, however, a data type is defined and has various associated attributes: a size, a definition keyword (sometimes), the values it can take (numbers or characters, for example), and the operations that can be done on it (add and subtract for numbers, append for strings, compare for characters, and so on). These differ from language to language and even from environment to environment (16- vs. 32-bit ints, 32- vs. 64-bit environments, etc.).
If there is anything I am missing or needs refining please ask as this is fairly open ended.

Real number arithmetic in a general purpose language?

As (hopefully) most of you know, floating point arithmetic is different from real number arithmetic. For starters, it's imprecise. Many numbers, especially decimal fractions such as 0.1 and 0.3, cannot be represented exactly, leading to problems like this. A more thorough list can be found here.
Are there any general purpose languages that have built-in support for something closer to real number arithmetic? If not, what are good libraries that support this?
EDIT: Arbitrary precision decimal datatypes are not what I am looking for. I want to be able to represent numbers like 1/3, sqrt(3), or 1 + 2i as well.
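To make the imprecision concrete, a quick Python illustration:

print(0.1 + 0.2 == 0.3)      # False: neither 0.1 nor 0.2 has an exact binary representation
print(f"{0.1 + 0.2:.20f}")   # 0.30000000000000004441
print(1 / 3)                 # 0.3333333333333333 -- a rounded approximation, not the number 1/3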
Though I hate to say it, Fortran. It has extensive support for arbitrary-precision arithmetic and tons of support for big-number calculations. It's ancient and gross, but it gets the job done.
All the numbers used in your examples are algebraic numbers, and can be represented finitely as roots of polynomials with integer coefficients.
The same cannot be said of real numbers in general, which is easily seen when one considers that the reals are uncountable, but the set of computer programs is countable. Therefore most reals will not have a finite representation in code.
What you are looking for is symbolic calculation (MATLAB and other tools used in math and engineering are good at it).
If you want a general purpose language, I think expression trees in C# are a good place to start. In essence, the ability to store an expression (instead of evaluating it into a numeric value) is the key to performing symbolic calculation. Note that an expression tree does not provide symbolic calculation by itself; it just provides the data structure that supports it.
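As a rough sketch of what "storing the expression" means -- in Python rather than C#, and without any of the simplification rules a real symbolic engine would need:

class Num:                        # a literal value
    def __init__(self, v): self.v = v

class Div:                        # left / right, kept as a tree instead of evaluated to a float
    def __init__(self, l, r): self.l, self.r = l, r

class Sqrt:                       # sqrt(arg), never rounded
    def __init__(self, arg): self.arg = arg

one_third = Div(Num(1), Num(3))   # exactly 1/3
root_three = Sqrt(Num(3))         # exactly sqrt(3)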
This question is interesting, but raises some issues. First, you will never be able to represent all the real numbers using a computer (even a theoretically infinite one), for cardinality reasons.
What you are looking for is a "symbolic numbers" datatype. You can imagine some sort of expression tree, with predefined constants, arithmetical operations, and perhaps algebraic (roots of polynomials) and transcendental (exp, sin, cos, log, etc.) functions.
Now the fun part of the story: you cannot find an algorithm which tells whether two such trees represent the same number (or, equivalently, which tests whether such a tree is zero). I won't state anything precise, but as a hint, this is similar to the Halting Problem (for computer scientists) or the Gödel Incompleteness Theorem (for mathematicians).
This renders such a class pretty useless.
For some subfields of the reals, you have canonical forms, like a/b for the rationals, or finite algebraic extensions of the rationals (a/b + i c/d for complex rationals, a/b + sqrt(2) * c/d for Q[sqrt(2)], etc). These can be used to represent some particular sets of algebraic numbers.
In practice, this is the most complicated thing you will need. If you have a particular necessity, like ranges of floating point numbers (to prove some result is within a specified interval -- this is probably the closest you can get to real numbers), or arbitrary precision numbers, you have freely available classes everywhere. Google Boost.Interval for the former, and GMP for the latter.
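For the rational case, many languages already give you that canonical a/b form. In Python, for instance, the standard fractions module (a quick sketch):

from fractions import Fraction

x = Fraction(1, 3) + Fraction(1, 6)   # exact rational arithmetic, reduced to lowest terms
print(x)                              # 1/2
print(Fraction(1, 3) * 3 == 1)        # True -- no rounding error anywhere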
There are several languages with support for rational and complex numbers. Scheme, for instance, has support built in for arbitrarily precise rational numbers, and complex numbers with either rational, floating point, or integral coefficients:
> (+ 1/2 1/3)
5/6
> (* 3 1+1/2i)
3+3/2i
> (+ 1/2 .5)
1.0
If you want to go beyond rational numbers or complex numbers with rational coefficients, to algebraic numbers such as sqrt(2) or closed-form numbers like e, you will probably have to look beyond general purpose programming languages, and use a special purpose mathematical language like Mathematica or Maxima.
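That said, if you want to stay inside a general purpose language, a symbolic library gets you much of the way. For example, SymPy in Python (a sketch, assuming SymPy is installed):

import sympy

x = sympy.Rational(1, 3) + sympy.sqrt(3)     # stored symbolically, no decimal approximation
print(x)                                     # the exact expression, not a float
print(sympy.sqrt(3) ** 2)                    # 3, exactly
print(sympy.expand((1 + 2 * sympy.I) ** 2))  # -3 + 4*I -- exact complex arithmetic too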
To cover the real numbers with any flair you'll need a symbolic package.
Boost, the C++ project, has a Rational library, but that's only part of the story.
You have irrational numbers in all sorts of forms (pi, the base of the natural logarithm, square and cube roots, the Champernowne constant, to name only a few). The only way I know of to handle arithmetic operations is a symbolic package with smarts as to the relationships amongst all of these numbers. Assuming you could express e^pi, how would you add one to it? Or take the square root of it?
Mathematica might handle these cases.
Java: java.math.BigDecimal
C#: decimal
A lot of languages have support for that: Java has BigDecimal, Perl has Math::BigFloat and Math::BigRat, Haskell has Integer, and a lot of libraries and languages are listed on Wikipedia.
Ada natively supports fixed-point math as well as floating-point. Fixed-point can be much more exact than floating-point, as long as the values stay within the declared range.
If you need floating-points, but more precision than IEEE gives, there are bignum packages around for just about every language.
I think that's about the best you can do. Neither scheme can exactly represent repeating decimals (like 1/3). It would probably be possible to come up with a scheme that does, but I know of no language that supports such a thing with a built-in type. Even that won't help you with irrational numbers (like pi and e). I believe there's even a theorem that says there will always be unrepresentable numbers, no matter what scheme you come up with.
EDIT: Arbitrary precision decimal datatypes are not what I am looking for. I want to be able to represent numbers like 1/3, sqrt(3), or 1 + 2i as well.
Ruby has a Rational class, so 1/3 can be expressed exactly as Rational(1,3). It also has a Complex class.
Scheme defines rationals, bignums, floating point and complex numbers. An implementation is not required to support them all, but if they are present, you can mix them and they will do "the right thing".
While it's not "built-in", I think C++ (maybe C#) is your best bet. There are classes out there that have been written for this purpose.
http://www.oonumerics.org/oon/

Fortran: Binary Subtraction (is there a binary type?)

I have a homework question regarding operator precedence in Fortran. In order to understand the question I need to know how to use binary numbers in Fortran. Can someone give me an example of how to use binary numbers in Fortran (specifically, with subtraction)?
You need to be a bit clearer about what you mean by 'binary numbers in Fortran'. In one sense (not a terribly useful one), all Fortran numbers are binary, as indeed most numbers in most programming languages are binary once they get onto the computer.
Fortran, in the standard at least, does not have the concept of a binary intrinsic data type; it has integers, reals, complex numbers, logicals and characters. Of course, your compiler might implement other types as well, but you don't tell us what that compiler is.
Standard Fortran does have the concept of binary input and output formats -- look for the 'B edit descriptor' in your documentation. This can be used on input and output to read and write binary representations of integers. But the numbers are, to Fortran, integers. So, if you were to read a, b as binary numbers, you would subtract them with the statement a-b.
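(Fortran aside, the underlying point -- that the "binary" part is only a way of reading and writing ordinary integers, while the subtraction itself is just a-b -- holds in any language. A Python illustration, purely by analogy:)

a = int("1101", 2)          # read binary text -> the ordinary integer 13
b = int("0110", 2)          # 6
print(a - b)                # 7, plain integer subtraction
print(format(a - b, "b"))   # "111", written back out in binary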
Fortran does have a set of bit intrinsic procedures, which go by the names iand, ibclr, ieor and so forth but these are really for bit-twiddling.
If you can clarify your questions, I, or some other SOer, might be able to clarify an answer.
Finally, I think it's rather odd that you think you need to know about Fortran 'binary' numbers in order to understand operator precedence. Perhaps you could explain a bit more.

Should implicit octal encoding be removed or changed in programming languages?

I was looking at this question. Basically, having a leading zero causes the number to be interpreted as octal. I've run into this problem numerous times in multiple languages.
Why doesn't the language explicitly require you to specify octal with a function call or a type (in strongly typed languages) like:
oct variable = 2;
I can understand why hexadecimal (0x0234) has this format. Hex is pretty useful. An integer from the database will never have an x in it.
But octal numbers 0123 look like ints and are a pain to deal with. I've never used octal for anything.
Can anyone explain the rationale behind this usage? Is it just a bit of historical cruft?
It's largely historic. The best solution I've seen is in the new version of Python, where octal is indicated with a special prefix character "o", much like hexadecimal's "x" prefix:
0o10 == 0x8 == 8
99.9% of the reason it exists is to support chmod() calls, e.g. chmod(path, 0755).
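In Python, where the prefix is now 0o, the same idiom looks like this (a sketch; it assumes a file named example.txt exists):

import os

os.chmod("example.txt", 0o755)   # one octal digit per 3-bit field: owner rwx, group r-x, other r-x
print(oct(os.stat("example.txt").st_mode & 0o777))   # 0o755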
It does rather seem like a format more like hex's would be superior.
It exists since working with 3-bit segments is almost as useful as working with 4-bit segments. This was more true in the past (e.g., seven-segment LEDs, chmod, etc.).
The real question is why haven't more languages adopted octal and binary notations in a more regular fashion:
10 == 0b1010 == 0o12 == 0x0A
I know that Python finally adopted the 0o prefix notation... not sure if they have adopted the binary one as well. I guess a better question is: why does this still trip people up?
I hate this too; I don't know why it's been carried forward into so many modern languages. I once knew someone who had a zip code like "09827" when he lived in NYC. Sometimes he had to input his zip code as "9827", because the leading zero would lead to error messages (since 9 and 8 are not valid octal digits).
Yes, it's historical. C uses this way to specify literals in octal, and possibly it was used somewhere before that.
I've experienced it in JavaScript, where parsing dates stops working in August. Up to July it works, as '07' parsed as octal is still seven, but '08' is not a valid number... (The solution is to specify the number base in the parseInt call.)
In C# there are no binary or octal literals; perhaps the reasoning is that you shouldn't be doing so much bit fiddling that the language needs them...
Personally, I blame the programmer in this case. Why are you formatting an integer by zero padding? Zero padding is for strings, not numeric types.

Why do most languages not allow binary numbers?

Why do most computer programming languages not allow binary numbers to be used like decimal or hexadecimal?
In VB.NET you could write a hexadecimal number like &H4
In C you could write a hexadecimal number like 0x04
Why not allow binary numbers?
&B010101
0y1010
Bonus Points!... What languages do allow binary numbers?
Edit
Wow! - So the majority think it's because of brevity and poor old "waves" thinks it's due to the technical aspects of the binary representation.
Because hexadecimal (and rarely octal) literals are more compact, and people using them can usually convert between hexadecimal and binary faster than they can decipher a binary number.
Python 2.6+ allows binary literals, and so do Ruby and Java 7, where you can use the underscore to make byte boundaries obvious. For example, the hexadecimal value 0x1b2a can now be written as 0b00011011_00101010.
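That equivalence is easy to check in Python (underscores in literals need Python 3.6 or later):

print(0b00011011_00101010 == 0x1b2a)   # True
print(0b00011011_00101010)             # 6954 -- same value, the underscores are purely visual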
In C++0x, with user-defined literals, binary numbers will be supported. I'm not sure if it will be part of the standard, but at worst you'll be able to enable it yourself:
// a sketch: treat the decimal digits of the literal as bits
constexpr int operator"" _B(unsigned long long n)
{ return n == 0 ? 0 : int(n % 10) + 2 * operator"" _B(n / 10); }
static_assert(1010_B == 10, "1010 read as binary is ten");
In order for a bit representation to be meaningful, you need to know how to interpret it.
You would need to specify what type of binary number you're using (signed/unsigned, two's complement, one's complement, signed magnitude).
The only languages I've ever used that properly support binary numbers are hardware description languages (Verilog, VHDL, and the like). They all have strict (and often confusing) definitions of how numbers entered in binary are treated.
See perldoc perlnumber:
NAME
perlnumber - semantics of numbers and numeric operations in Perl
SYNOPSIS
$n = 1234; # decimal integer
$n = 0b1110011; # binary integer
$n = 01234; # octal integer
$n = 0x1234; # hexadecimal integer
$n = 12.34e-56; # exponential notation
$n = "-12.34e56"; # number specified as a string
$n = "1234"; # number specified as a string
Slightly off-topic, but newer versions of GCC added a C extension that allows binary literals. So if you only ever compile with GCC, you can use them. Documentation is here.
Common Lisp allows binary numbers, using #b... (bits going from highest-to-lowest power of 2). Most of the time, it's at least as convenient to use hexadecimal numbers, though (by using #x...), as it's fairly easy to convert between hexadecimal and binary numbers in your head.
Hex and octal are just shorter ways to write binary. Would you really want a 64-character long constant defined in your code?
Common wisdom holds that long strings of binary digits, e.g. 32 bits for an int, are too difficult for people to conveniently parse and manipulate. Hex is generally considered easier, though I've not used either enough to have developed a preference.
Ruby, as already mentioned, attempts to resolve this by allowing _ to be liberally inserted in the literal, allowing, for example:
irb(main):005:0> 0b1111_0111_1111_1111_0011_1100
=> 16252732
D supports binary literals using the syntax 0[bB][01]+, e.g. 0b1001. It also allows embedded _ characters in numeric literals to allow them to be read more easily.
Java 7 now has support for binary literals. So you can simply write 0b110101. There is not much documentation on this feature. The only reference I could find is here.
While C only has native support for bases 8, 10 and 16, it is actually not that hard to write a pre-processor macro that makes writing 8-bit binary numbers quite simple and readable:
#define BIN(d7,d6,d5,d4, d3,d2,d1,d0) \
( \
((d7)<<7) + ((d6)<<6) + ((d5)<<5) + ((d4)<<4) + \
((d3)<<3) + ((d2)<<2) + ((d1)<<1) + ((d0)<<0) \
)
int my_mask = BIN(1,1,1,0, 0,0,0,0);
This can also be used for C++.
For the record, and to answer this:
Bonus Points!... What languages do allow binary numbers?
Specman (aka e) allows binary numbers. Though to be honest, it's not quite a general purpose language.
Every language should support binary literals. I go nuts not having them!
Bonus Points!... What languages do allow binary numbers?
Icon allows literals in any base from 2 to 16, and possibly up to 36 (my memory grows dim).
It seems that, from a readability and usability standpoint, the hex representation is a better way of defining binary numbers. The fact that they don't add it is probably more a matter of user need than a technology limitation.
I expect that the language designers just didn't see enough of a need to add binary numbers. The average coder can parse hex just as well as binary when handling flags or bit masks. It's great that some languages support binary as a representation, but I think on average it would be little used. Although binary -- if available in C, C++, Java, or C# -- would probably be used more than octal!
In Smalltalk it's like 2r1010. You can use any base up to 36 or so.
Hex is just less verbose, and can express anything a binary number can.
Ruby has nice support for binary numbers, if you really want it. 0b11011, etc.
In Pop-11 you can use a prefix made of a number (2 to 32) plus a colon to indicate the base, e.g.
2:11111111 = 255
3:11111111 = 3280
16:11111111 = 286331153
31:11111111 = 28429701248
32:11111111 = 35468117025
Forth has always allowed numbers of any base to be used (up to the size limit of the CPU, of course). Want to use binary: 2 BASE ! Octal: 8 BASE ! etc. Want to work with time? 60 BASE ! These examples are all entered with the base set to 10 (decimal). To change base you must express the desired base in the current number base. If you are in binary and want to switch back to decimal, then 1010 BASE ! will work. Most Forth implementations have 'words' to shift to common bases, e.g. DECIMAL, HEX, OCTAL, and BINARY.
Although it's not direct, most languages can also parse a string. Java can convert "10101000" into an int with Integer.parseInt("10101000", 2).
Not that this is efficient or anything... Just saying it's there. If it were done in a static initialization block, it might even be done at compile time depending on the compiler.
If you're any good at binary, even with a short number it's pretty straightforward to see 0x3c as 4 ones followed by 2 zeros, whereas even that short a number in binary would be 0b111100, which might make your eyes hurt before you were certain of the number of ones.
0xff9f is exactly 4+4+1 ones, 2 zeros and 5 ones (on sight the bitmask is obvious). Trying to count out 0b1111111110011111 is much more irritating.
I think the issue may be that language designers are always heavily invested in hex/octal/binary/whatever and just think this way. If you are less experienced, I can totally see how these conversions wouldn't be as obvious.
Hey, that reminds me of something I came up with while thinking about base conversions. A sequence--I didn't think anyone could figure out the "Next Number", but one guy actually did, so it is solvable. Give it a try:
10
11
12
13
14
15
16
21
23
31
111
?
Edit:
By the way, this sequence can be created by feeding sequential numbers into a single built-in function in most languages (Java for sure).