I'm reading Concepts of Programming Languages (Sebesta 10th edition) and within the text it defines Orthogonality as meaning "that a relatively small set of primitive constructs can be combined in a relatively small number of ways to build the control and data structures of the language". But I'm confused as to what makes a certain language feature orthogonal. Because I probably won't explain the text in all its glory :), I've included below the appropriate text.
We can illustrate the use of orthogonality as a design concept by comparing
one aspect of the assembly languages of the IBM mainframe computers
and the VAX series of minicomputers. We consider a single simple situation:
adding two 32-bit integer values that reside in either memory or registers and
replacing one of the two values with the sum. The IBM mainframes have two
instructions for this purpose, which have the forms
A Reg1, memory_cell
AR Reg1, Reg2
where Reg1 and Reg2 represent registers. The semantics of these are:
Reg1 ← contents(Reg1) + contents(memory_cell)
Reg1 ← contents(Reg1) + contents(Reg2)
The VAX addition instruction for 32-bit integer values is
ADDL operand_1, operand_2
whose semantics is
operand_2 ← contents(operand_1) + contents(operand_2)
In this case, either operand can be a register or a memory cell.
The VAX instruction design is orthogonal in that a single instruction can
use either registers or memory cells as the operands. There are two ways to
specify operands, which can be combined in all possible ways. The IBM design
is not orthogonal. Only two out of four operand combinations possibilities are
legal, and the two require different instructions, A and AR.>
So my main question is why is it that the VAX instruction is orthogonal. Is it just because it's construct takes less instructions and everything can be completed in a single line? What if the VAX series computers took two instructions and the IBM took three, would the VAX language be more orthogonal? Any help would be appreciated.
Well. Text does not decide if it's about orhogonal languages or instruction sets. With instruction sets distinct feature is addressing modes you can use (or can not use). If you can use all addressing mode wih all (or most) instructions the instruction set is said to be orthogonal (see http://en.m.wikipedia.org/wiki/Orthogonal_instruction_set#section_1).
Related
This Flash game has a lot of players including me and some friends. We noticed the same thing can run differently for different people. The math in the simulation is definitely to blame. Whether the cause is in hardware, OS, browser, 32-bit/64-bit, etc. is not really known. But with the combinations we have to test with, we've gotten 5 distinct end results from the same simulation starting conditions, and can likely get more.
This makes me wonder, does Actionscript have a floating point math specification? If so, what does it say about the accuracy and determinism of the computations?
I compare to Java, which differentiates between regular floating point math with the Math class and deterministic floating point with the StrictMath class and strictfp keyword. Both are always within 1 ulp of the exact result, this also implies the regular math and strict math always give results within 1 ulp of each other for a single operation or function call. The docs are very clear about this. I'd expect other respectable languages to have something similar, saying how accurate their floating point computations are and if they give the same results everywhere.
Update since some people have been saying the game is dishonest:
Some others have taken apart the swf and even made mods for it, they've seen the game engine and can confirm there is no randomness. Box2d is used for its physics. If a design ever does run differently on subsequent runs, it has actually changed due to some bug, usually this is a visible difference, but if not, you can check the raw data with this tool and see it is different. Different starting conditions as expected get different end results.
As for what we know so far, this is results on a test level:
For example, if I am running 32-bit Chrome on my desktop (AMD A10-5700 as CPU), I will always get that result of "946 ticks". But if I run on Firefox or Internet Explorer instead I always get the result of "794 ticks".
Actionscript doesn't really have a math specification in that sense. This is the closest you'll get:
https://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/Math.html
It says at the bottom of the top section:
The Math functions acos, asin, atan, atan2, cos, exp, log, pow, sin, and sqrt may result in slightly different values depending on the algorithms used by the CPU or operating system. Flash runtimes call on the CPU (or operating system if the CPU doesn't support floating point calculations) when performing the calculations for the listed functions, and results have shown slight variations depending upon the CPU or operating system in use.
So to answer our two questions:
What does it say about accuracy? Nothing, actually. At no point does it mention a limit to how inaccurate a result can be.
What does it say about determinism? Hardware and operating system are definitely factors, so it is platform-dependent. No confirmation for other factors.
If you want to look any deeper, you're on your own.
According to the docs, Actionscript has a catch-all Number data type in addition to int and uint types:
The Number data type uses the 64-bit double-precision format as specified by the IEEE Standard for Binary Floating-Point Arithmetic (IEEE-754). This standard dictates how floating-point numbers are stored using the 64 available bits. One bit is used to designate whether the number is positive or negative. Eleven bits are used for the exponent, which is stored as base 2. The remaining 52 bits are used to store the significand (also called mantissa), the number that is raised to the power indicated by the exponent.
By using some of its bits to store an exponent, the Number data type can store floating-point numbers significantly larger than if it used all of its bits for the significand. For example, if the Number data type used all 64 bits to store the significand, it could store a number as large as 265 – 1. By using 11 bits to store an exponent, the Number data type can raise its significand to a power of 21023.
Although this range of numbers is enormous, it comes at the cost of precision. Because the Number data type uses 52 bits to store the significand, numbers that require more than 52 bits for accurate representation, such as the fraction 1/3, are only approximations. If your application requires absolute precision with decimal numbers, use software that implements decimal floating-point arithmetic as opposed to binary floating-point arithmetic.
This could account for the varying results you're seeing.
I was wondering why a computer would need binary code converters to convert from BCD to Excess-3 for example. Why is this necessary can't computers just use one form of binary code.
Some older forms of binary representation persist even after a newer, "better" form comes into use. For example, legacy hardware that is still in use running legacy code that would be too costly to rewrite. Word lengths were not standardized in the early years of computing, so machines with words varying from 5 to 12 bits in length naturally will require different schemes for representing the same numbers.
In some cases, a company might persist in using a particular representation for self-compatibility (i.e., with the company's older products) reasons, or because it's an ingrained habit or "the company way." For example, the use of big-endian representation in Motorola and PowerPC chips vs. little-endian representation in Intel chips. (Though note that many PowerPC processors support both types of endian-ness, even if manufacturers typically only use one in a product.)
The previous paragraph only really touches upon byte ordering, but that can still be an issue for data interchange.
Even for BCD, there are many ways to store it (e.g., 1 BCD digit per word, or 2 BCD digits packed per byte). IBM has a clever representation called zoned decimal where they store a value in the high-order nybble which, combined with the BCD value in the low-order nybble, forms an EBCDIC character representing the value. This is pretty useful if you're married to the concept of representing characters using EBCDIC instead of ASCII (and using BCD instead of 2's complement or unsigned binary).
Tangentially related: IBM mainframes from the 1960s apparently converted BCD into an intermediate form called qui-binary before performing an arithmetic operation, then converted the result back to BCD. This is sort of a Rube Goldberg contraption, but according to the linked article, the intermediate form gives some error detection benefits.
The IBM System/360 (and probably a bunch of newer machines) supported both packed BCD and pure binary representations, though you have to watch out for IBM nomenclature — I have heard an old IBMer refer to BCD as "binary," and pure binary (unsigned, 2's complement, whatever) as "binary coded hex." This provides a lot of flexibility; some data may naturally be best represented in one format, some in the other, and the machine provides instructions to convert between forms conveniently.
In the case of floating point arithmetic, there are some values that cannot be represented exactly in binary floating point, but can be with BCD or a similar representation. For example, the number 0.1 has no exact binary floating point equivalent. This is why BCD and fixed-point arithmetic are preferred for things like representing amounts of currency, where you need to exactly represent things like $3.51 and can't allow floating point error to creep in when adding.
Intended application is important. Arbitrary precision arithmetic will require a different representation strategy compared to the fixed-width registers in your CPU (e.g., Java's BigDecimal class). Many languages support arbitrary precision (e.g., Scheme, Haskell), though the underlying implementation of arbitrary precision numbers varies. I'm honestly not sure what is preferable for arbitrary precision, a BCD-type scheme or a denser pure binary representation. In the case of Java's BigDecimal, conversion from binary floating point to BigDecimal is best done by first converting to a String — this makes such conversions potentially inefficient, so you really need to know ahead of time whether float or double is good enough, or whether you really need arbitrary precision, and when.
Another tangent: Groovy, a JVM language, quietly treats all floating point numeric literals in code as BigDecimal values, and uses BigDecimal arithmetic in preference to float or double. That's one reason Groovy is very popular with the insurance industry.
tl;dr There is no one-size-fits-all numeric data type, and as long as that remains the case (probably the heat death of the universe), you'll need to convert between representations.
I've seen quite a few examples where binary numbers are being used in code, like 32,64,128 and so on (for instance, very well known example - minecraft)
I want to ask, does using binary numbers in such high level languages as Java / C++ help anything?
I know assembly and that you would always rather use these because in low level language it overcomplicates things if you go above register limit.
Will programs run any faster/save up more memory if you use binary numbers?
As with most things, "it depends".
In compiled languages, the better compilers will deduce that slow machine instructions can sometimes be done with different faster machine instructions (but only for special values, such as powers of two). Sometimes coders know this and program accordingly. (e.g. multiplying by a power of two is cheap)
Other times, algorithms are suited towards representations involving powers of two (e.g. many divide and conquer algorithms like the Fast Fourier Transform or a merge sort).
Yet other times, it's the most compact way to represent boolean values (like a bitmask).
And on top of that, other times it's more efficiency for memory purposes (typically because it's so fast do to multiply and divide logic with powers of two, the OS/hardware/etc will use cache line / page sizes / etc that are powers of two, so you'd do well to have nice power of two sizes for your important data structures).
And then, on top of that, other times.. programmers are just so used to using powers of two that they simply do it because it seems like a nice number.
There are some benefits of using powers of two numbers in your programs. Bitmasks are one application of this, mainly because bitwise operators (&, |, <<, >>, etc) are incredibly fast.
In C++ and Java, this is done a fair bit- especially with GUI applications. You could have a field of 32 different menu options (such as resizable, removable, editable, etc), and apply each one without having to go through convoluted addition of values.
In terms of raw speedup or any performance improvement, that really depends on the application itself. GUI packages can be huge, so getting any speedup out of those when applying menu/interface options is a big win.
From the title of your question, it sounds like you mean, "Does it make your program more efficient if you write constants in binary?" If that's what you meant, the answer is emphatically, No. The compiler translates all your constants to binary at compile time, so by the time the program runs, it makes no difference. I don't know if the compiler can interpret binary constants faster than decimal, but the difference would surely be trivial.
But the body of your question seems to indicate that you mean, "use constants that are round number in binary" rather than necessarily expressing them in binary digits.
For most purposes, the answer would be no. If, say, the computer has to add two numbers together, adding a number that happens to be a round number in binary is not going to be any faster than adding a not-round number.
It might be slightly faster for multiplication. Some compilers are smart enough to turn multiplication by powers of 2 into a bit shift operation rather than a hardware multiply, and bit shifts are usually faster than multiplies.
Back in my assembly-language days I often made elements in arrays have sizes that were powers of 2 so I could index into the array with a bit-shift rather than a multiply. But in a high-level language that would be hard to do, as you'd have to do some research to find out just how much space your primitives take in memory, whether the compiler adds padding bytes between them, etc etc. And if you did add some bytes to an array element to pad it out to a power of 2, the entire array is now bigger, and so you might generate an extra page fault, i.e. the operating system runs out of memory and has to write a chunck of your data to the hard drive and then read it back when it needs it. One extra hard drive right takes more time than 1000 multiplications.
In practice, (a) the difference is so trivial that it would almost never be worth worrying about; and (b) you don't normally know everything happenning at the low level, so it would often be hard to predict whether a change with its intendent ramifications would help or hurt.
In short: Don't bother. Use the constant values that are natural to the problem.
The reason they're used is probably different - e.g. bitmasks.
If you see them in array sizes, it doesn't really increase performance, but usually memory is allocated by power of 2. E.g. if you wrote char x[100], you'd probably get 128 allocated bytes.
No, your code will ran the same way, no matter what is the number you use.
If by binary numbers you mean numbers that are power of 2, like: 2, 4, 8, 16, 1024.... they are common due to optimization of space, normally. Example, if you have a 8 bit pointer it is capable of point to 256 (that is a power of 2), addresses, so if you use less than 256 you are wasting your pointer.... so normally you allocate a 256 buffer... this same works for all other power of 2 numbers....
In most cases the answer is almost always no, there is no noticeable performance difference.
However, there are certain cases (very few) when NOT using binary numbers for array/structure sizes/length will give noticeable performance benefits. These are cases when you're filling the cache and because you're looping over a structure that fills the cache in a such a way that you have cache collisions every time you loop through your array/structure. This case is very rare, and shouldn't be preoptimized unless you're having problems with your code performing much more slowly than theoretical limits say it should. Also, this case is very hardware dependent and will change from system to system.
I am doing some programming on MIPS which has a bunch of 32 bit registers but I also know that you can store 64 bit integers, how does this work? Does the integer take up two registers? If so how does the system know to combine the two registers into one long string of binary
According to Wikipedia, the 32-bit MIPS instruction set includes "Load Double Word" and "Store Double Word" instructions that load/store a pair of consecutive registers from/to memory.
For the actual arithmetic, it looks like you typically have to use multiple instructions.
You need to check the documentation for your platform, since it may vary. For example, for MIPS 32-bits, check something like this quick reference (see the "C calling convention" part).
For more details, though, you'd need a more complete reference (the quick one doesn't list any 64-bit arithmetic instructions that I could see, so if they don't exist, you'd have to implement them yourself, and then you can use your own convention for how to store the values).
I'm trying to understand the differences between these two systems and their impact on C programming.
From what I've learned from Wikipedia:
both systems are used to represent negative numbers
one's complement applies bitwise NOT to negative number (the system has +0 and -0)
two's complement does as in step 2 and adds 1 (eliminates +/-0)
Am I missing something else?
My questions:
which architectures support which system? What is the most common these days (1's or 2's complement)?
in what sense should we consider these systems when programming in C? Does it mainly make sense only in embedded world?
Thanks in advance!
Most systems nowadays use two's complement, since it lets the computer do the same exact operation for addition/subtraction without caring about the particular sign of the number.
When you're programming, the arithmetic works regardless of the system used -- the range of the data types are defined by the language, so if it says a type will work in the range -2^31 to +2^31 - 1, then it'll work regardless of the notation. You need to be careful when working with individual bits or bit shifts, though -- those won't behave like power-of-two arithmetic in non-two's complement systems (although you're not too likely to encounter such systems, and probably never will, if you're just working with PCs).
The only advantage of ones'-complement notation for integers is that it allows conversions to and from sign-magnitude form to be performed without a carry chain. Building a computer with a set of blinkenlights that show each register's value in sign-magnitude form will be much more convenient if the registers use ones'-complement form than if they use two's-complement form. If one wanted to use separate storage latches for the blinkenlights and the CPU's registers, the easiest way to accommodate things would be to have one circuit which translates two's-complement to one's-complement or sign-magnitude form, and then have each register write simultaneously store the two's-complement value in the register while updating the blinkenlight latches with the sign-magnitude value. Latching circuitry is sufficiently expensive, however, that if registers are being built out of discrete latches anyway, adding some circuitry to the ALU to make it use ones'-complement, and then feeding the lights from the CPU's "real" registers, may be cheaper than including an extra set of latches for the lights.
Over the last few decades, of course, the relative costs of different circuit elements have shifted to the point that it would be absurd to have lights wired to directly report the state of a CPU's registers. Consequently, the practical advantages that ones'-complement designs might have had in the past are no longer applicable.