How is a 64-bit decimal stored in 32-bit addresses? - binary

I am doing some programming on MIPS, which has a bunch of 32-bit registers, but I also know that you can store 64-bit integers. How does this work? Does the integer take up two registers? If so, how does the system know to combine the two registers into one long string of binary?

According to Wikipedia, the 32-bit MIPS instruction set includes "Load Double Word" and "Store Double Word" instructions that load/store a pair of consecutive registers from/to memory.
For the actual arithmetic, it looks like you typically have to use multiple instructions.
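To make "multiple instructions" concrete, here is a minimal C sketch (the helper name and the (hi, lo) convention are just for illustration, not anything MIPS mandates) of how 64-bit addition can be built from 32-bit halves: add the low words, detect the carry with an unsigned comparison (which is what MIPS's sltu is used for), and fold the carry into the sum of the high words.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative helper: add two 64-bit values represented as (hi, lo) pairs
 * of 32-bit words, the way a 32-bit CPU holds them in two registers. */
static void add64(uint32_t a_hi, uint32_t a_lo,
                  uint32_t b_hi, uint32_t b_lo,
                  uint32_t *r_hi, uint32_t *r_lo)
{
    uint32_t lo = a_lo + b_lo;     /* add low words (wraps modulo 2^32) */
    uint32_t carry = (lo < a_lo);  /* carry out of the low word; on MIPS this
                                      comparison is done with sltu */
    *r_lo = lo;
    *r_hi = a_hi + b_hi + carry;   /* add high words plus the carry */
}

int main(void)
{
    uint32_t hi, lo;
    /* 0x00000001FFFFFFFF + 1 should give 0x0000000200000000 */
    add64(0x00000001u, 0xFFFFFFFFu, 0x00000000u, 0x00000001u, &hi, &lo);
    printf("0x%08X%08X\n", hi, lo);
    return 0;
}
```

The "how does the system know" part is purely convention: the compiler, the ABI, or you in hand-written assembly decide which register holds the high word and which holds the low word, and every piece of code that touches the value agrees on that.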

You need to check the documentation for your platform, since it may vary. For example, for 32-bit MIPS, check something like this quick reference (see the "C calling convention" part).
For more details, though, you'd need a more complete reference (the quick one doesn't list any 64-bit arithmetic instructions that I could see, so if they don't exist, you'd have to implement them yourself, and then you can use your own convention for how to store the values).

Related

Does Actionscript have a math specification?

This Flash game has a lot of players including me and some friends. We noticed the same thing can run differently for different people. The math in the simulation is definitely to blame. Whether the cause is in hardware, OS, browser, 32-bit/64-bit, etc. is not really known. But with the combinations we have to test with, we've gotten 5 distinct end results from the same simulation starting conditions, and can likely get more.
This makes me wonder, does Actionscript have a floating point math specification? If so, what does it say about the accuracy and determinism of the computations?
I compare to Java, which differentiates between regular floating point math with the Math class and deterministic floating point with the StrictMath class and strictfp keyword. Both are always within 1 ulp of the exact result; this also implies that regular math and strict math always give results within 1 ulp of each other for a single operation or function call. The docs are very clear about this. I'd expect other respectable languages to have something similar, saying how accurate their floating point computations are and whether they give the same results everywhere.
Update since some people have been saying the game is dishonest:
Some others have taken apart the swf and even made mods for it; they've seen the game engine and can confirm there is no randomness. Box2D is used for its physics. If a design ever does run differently on subsequent runs, it has actually changed due to some bug. Usually this is a visible difference, but if not, you can check the raw data with this tool and see that it is different. Different starting conditions, as expected, give different end results.
As for what we know so far, these are the results on a test level:
For example, if I am running 32-bit Chrome on my desktop (AMD A10-5700 as CPU), I will always get that result of "946 ticks". But if I run on Firefox or Internet Explorer instead I always get the result of "794 ticks".
Actionscript doesn't really have a math specification in that sense. This is the closest you'll get:
https://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/Math.html
It says at the bottom of the top section:
The Math functions acos, asin, atan, atan2, cos, exp, log, pow, sin, and sqrt may result in slightly different values depending on the algorithms used by the CPU or operating system. Flash runtimes call on the CPU (or operating system if the CPU doesn't support floating point calculations) when performing the calculations for the listed functions, and results have shown slight variations depending upon the CPU or operating system in use.
So to answer our two questions:
What does it say about accuracy? Nothing, actually. At no point does it mention a limit to how inaccurate a result can be.
What does it say about determinism? Hardware and operating system are definitely factors, so it is platform-dependent. No confirmation for other factors.
If you want to look any deeper, you're on your own.
According to the docs, Actionscript has a catch-all Number data type in addition to int and uint types:
The Number data type uses the 64-bit double-precision format as specified by the IEEE Standard for Binary Floating-Point Arithmetic (IEEE-754). This standard dictates how floating-point numbers are stored using the 64 available bits. One bit is used to designate whether the number is positive or negative. Eleven bits are used for the exponent, which is stored as base 2. The remaining 52 bits are used to store the significand (also called mantissa), the number that is raised to the power indicated by the exponent.
By using some of its bits to store an exponent, the Number data type can store floating-point numbers significantly larger than if it used all of its bits for the significand. For example, if the Number data type used all 64 bits to store the significand, it could store a number as large as 2^65 – 1. By using 11 bits to store an exponent, the Number data type can raise its significand to a power of 2^1023.
Although this range of numbers is enormous, it comes at the cost of precision. Because the Number data type uses 52 bits to store the significand, numbers that require more than 52 bits for accurate representation, such as the fraction 1/3, are only approximations. If your application requires absolute precision with decimal numbers, use software that implements decimal floating-point arithmetic as opposed to binary floating-point arithmetic.
This could account for the varying results you're seeing.
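For a concrete look at the layout described in that quote, here is a small C sketch (C rather than ActionScript, purely for illustration) that extracts the sign, exponent, and significand fields from a 64-bit IEEE-754 double:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    double x = -0.3333333333333333;   /* an approximation of -1/3 */
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the 64 bits of the double */

    uint64_t sign        = bits >> 63;                 /*  1 bit             */
    uint64_t exponent    = (bits >> 52) & 0x7FF;       /* 11 bits, bias 1023 */
    uint64_t significand = bits & ((1ULL << 52) - 1);  /* 52 bits            */

    printf("sign=%llu exponent=%llu (unbiased %lld) significand=0x%013llX\n",
           (unsigned long long)sign,
           (unsigned long long)exponent,
           (long long)exponent - 1023,
           (unsigned long long)significand);
    return 0;
}
```

Values such as 1/3 cannot be represented exactly in 52 significand bits, so the stored number is already an approximation before any of the CPU- or OS-dependent library functions get involved.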

How can I implement BCD in Fortran?

Not sure if the title of my question makes sense, so bear with me. I'd like to find a system for representing single digit numbers with as few bits as possible. There is a method called "Densely packed decimal" (https://en.wikipedia.org/wiki/Densely_packed_decimal) which would be my ideal solution, but I wouldn't even know if that's possible or how I could implement it without further research or guidance from a guru.
The next best thing would be to be able to use a 4-bit addressing system to represent digits, but once again I'm not sure if that is even possible.
So! Barring implementations of the above methods/systems, I could settle for a 1-byte data type which I could use to represent pairs of two integers. Is there a 1-byte data-type in Fortran, or does it not allow for that level of control?
There is a 1-byte data type in (almost) every programming language: the character. In fact, that is essentially the definition of a byte: the unit of storage that can hold a default character.
There is also a 1-byte (strictly speaking 1-octet) integer type in Fortran, accessible as integer(int8) where int8 is a constant from the iso_fortran_env module (Fortran 2008).
Both can be used to implement such things. Whether you use division and modulo, XOR-ing, or the Fortran bit-manipulation intrinsic functions https://www.nsc.liu.se/~boein/f77to90/a5.html#section10 (probably the best option) is up to you.
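For illustration, here is a sketch of the plain packed-BCD idea in C (two digits per byte); the same masking and shifting translates directly to Fortran's IAND, IOR, and ISHFT intrinsics. Densely packed decimal, which squeezes three digits into 10 bits, is more involved, but this is the usual starting point.

```c
#include <stdint.h>
#include <stdio.h>

/* Pack two decimal digits (0-9 each) into one byte:
 * high nibble = first digit, low nibble = second digit. */
static uint8_t pack_bcd(unsigned hi, unsigned lo)
{
    return (uint8_t)((hi << 4) | (lo & 0x0F));
}

/* Unpack a byte back into its two decimal digits. */
static void unpack_bcd(uint8_t b, unsigned *hi, unsigned *lo)
{
    *hi = b >> 4;
    *lo = b & 0x0F;
}

int main(void)
{
    uint8_t b = pack_bcd(4, 7);   /* the pair (4,7) stored in a single byte */
    unsigned hi, lo;
    unpack_bcd(b, &hi, &lo);
    printf("byte=0x%02X -> digits %u and %u\n", b, hi, lo);
    return 0;
}
```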

Are floats guaranteed to be 4 bytes with CUDA?

I want to use cudaMemcpy to create a ones vector (1,...,1) so that I can do things like sum the rows/columns of a matrix or add a vector to a matrix with CUBLAS. The code will be run on different platforms, so
how can I guarantee that I'm always going to be working with 4-byte floats?
is there a sizeof function for data types on the GPU, or does the device always use the same data type specifications as the host?
Answering your second question first, the device always uses the same data type specification as the host compiler. So you can use sizeof(...) on the host to determine relevant sizes. Many things would be broken if this were not the case.
To answer your first question, then, we need only ask: amongst the supported host-side compilers for CUDA, is the float representation always 32 bits? The answer is yes.
As an aside, note that this is generally true for most platforms. Finding a system which has other than 32-bit floats is difficult. But as far as I know, there is no general C or C++ requirement that float be 32 bits. Someone else may prove me wrong.
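As a sketch of how you might combine the two answers in practice (the buffer size and variable names are placeholders, and error checking is omitted), host code can assert the float properties it relies on and then upload the ones vector with the standard runtime calls; compile with nvcc and the CUDA toolkit:

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>
#include <cuda_runtime.h>

enum { N = 1024 };   /* placeholder vector length */

int main(void)
{
    /* The device uses the host's type layout, so checking on the host suffices. */
    assert(sizeof(float) == 4);   /* 4 bytes per float...                  */
    assert(FLT_MANT_DIG == 24);   /* ...with IEEE-754 binary32 precision   */

    float ones[N];
    for (int i = 0; i < N; ++i)
        ones[i] = 1.0f;

    float *d_ones = NULL;
    cudaMalloc((void **)&d_ones, N * sizeof(float));
    cudaMemcpy(d_ones, ones, N * sizeof(float), cudaMemcpyHostToDevice);

    /* d_ones can now be handed to CUBLAS, e.g. as the vector operand of a
       gemv call that sums the rows or columns of a matrix. */

    cudaFree(d_ones);
    printf("each element occupies %zu bytes\n", sizeof(float));
    return 0;
}
```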

How can you handle absurdly large numbers?

There are some scenarios where programmers need or want to find grossly large numbers. These are often so large that they defy the programmer's comprehension. I'm talking about things like the largest known prime number (with 12,978,189 digits) and the recently calculated 10 trillion digits of pi.
How can you create a program that handles these? This far exceeds an integer, a long, a double, a BigInteger, a BigDecimal, or anything of the sort. How do these kinds of programs for discovering these numbers get created? How can you even store them in memory when no appropriate datatypes exist, and they would likely consume gigabytes of data each?
To address your specific examples:
A 12-million-digit integer isn't terribly large for a typical "large integer" class to handle; it fits comfortably in memory (a few megabytes in binary form).
To store 10 trillion digits of π, you could use a disk file and memory-map it. You'll need a 64-bit OS and application, but you can simply create a 10-terabyte file on disk (you'll probably need a few disks and a filesystem like ZFS that can spread it across them), and map it into CPU address space. The algorithms that calculate π (such as BBP) conveniently produce one hex digit at a time, which fits neatly into half a byte of memory.
The (abstract) answer is to write algorithms using the machine's native types that produce the results you want. For instance, when you add two very large integers by hand on paper, the biggest single calculation you need is 9 + 9 + 1 (nine plus nine plus one for the carry). Of course you need paper large enough to write down the two numbers and the answer in the first place. So as long as the two numbers and the answer can be stored on a computer's hard disk (the paper), an algorithm can be written that does it with variables that only ever need to hold values up to 19; even a char variable is more than capable of that, let alone an int.
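Here is a minimal C sketch of that pen-and-paper idea, adding two non-negative numbers held as decimal digit strings (the function name and buffer size are just for illustration); no intermediate value ever exceeds 19, however long the operands are:

```c
#include <stdio.h>
#include <string.h>

/* Add two non-negative decimal numbers given as digit strings, exactly the
 * way you would do it by hand. The largest intermediate value is 9+9+1 = 19. */
static void add_decimal(const char *a, const char *b, char *out, size_t outsize)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t n = (la > lb ? la : lb) + 1;   /* result has at most one extra digit */
    if (n + 1 > outsize)
        return;                           /* output buffer too small */

    int carry = 0;
    for (size_t i = 0; i < n; ++i) {
        int da = (i < la) ? a[la - 1 - i] - '0' : 0;  /* digit of a, or 0 */
        int db = (i < lb) ? b[lb - 1 - i] - '0' : 0;  /* digit of b, or 0 */
        int sum = da + db + carry;                    /* at most 19 */
        out[n - 1 - i] = (char)('0' + sum % 10);
        carry = sum / 10;
    }
    out[n] = '\0';

    /* Strip the leading zero if there was no final carry. */
    if (out[0] == '0' && n > 1)
        memmove(out, out + 1, n);
}

int main(void)
{
    char result[128];
    add_decimal("99999999999999999999", "1", result, sizeof result);
    printf("%s\n", result);   /* 100000000000000000000 */
    return 0;
}
```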
The (concrete) answer is that really good programmers have already done this, and there are even FOSS libraries for it. One good one is the GNU Project's GMP library, which has loads of functions to handle arbitrary-size integer arithmetic and arbitrary-precision floating-point arithmetic. So as long as your computer can store the information needed during the calculation, it can be done. You'll need to invest the time to read the documentation, of course.
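For the concrete route, a minimal sketch using GMP's arbitrary-precision integer type looks like the following (link with -lgmp); the operands here are tiny placeholders, but the same calls work on numbers with millions of digits:

```c
#include <gmp.h>
#include <stdio.h>

int main(void)
{
    mpz_t a, b, sum;

    /* Initialise arbitrary-precision integers from decimal strings. */
    mpz_init_set_str(a, "123456789012345678901234567890", 10);
    mpz_init_set_str(b, "987654321098765432109876543210", 10);
    mpz_init(sum);

    mpz_add(sum, a, b);        /* sum = a + b */
    gmp_printf("%Zd\n", sum);  /* print the full result */

    mpz_clear(a);
    mpz_clear(b);
    mpz_clear(sum);
    return 0;
}
```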

1's and 2's complement systems

I'm trying to understand the differences between these two systems and their impact on C programming.
From what I've learned from Wikipedia:
both systems are used to represent negative numbers
one's complement represents a negative number by applying bitwise NOT to its magnitude (so the system has both +0 and -0)
two's complement applies the same bitwise NOT and then adds 1 (which eliminates +/-0)
Am I missing something else?
My questions:
which architectures support which system? What is the most common these days (1's or 2's complement)?
in what sense should we consider these systems when programming in C? Does it mainly matter only in the embedded world?
Thanks in advance!
Most systems nowadays use two's complement, since it lets the computer use exactly the same operation for addition and subtraction without caring about the signs of the numbers involved.
When you're programming, the arithmetic works regardless of the system used -- the range of each data type is defined by the language, so if it says a type will work in the range -2^31 to +2^31 - 1, then it'll work regardless of the notation. You need to be careful when working with individual bits or bit shifts, though -- on non-two's-complement systems those won't behave like multiplication and division by powers of two for negative values (although you're not too likely to encounter such systems, and probably never will if you're just working with PCs).
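To see the two's-complement encoding most machines use today, a short C sketch can print the bit patterns involved (this just illustrates the representation on a typical platform with 32-bit two's-complement int):

```c
#include <stdint.h>
#include <stdio.h>

/* Print the 32 bits of a value, most significant bit first. */
static void print_bits(uint32_t v)
{
    for (int i = 31; i >= 0; --i)
        putchar((v >> i) & 1u ? '1' : '0');
    putchar('\n');
}

int main(void)
{
    int32_t x = 5;
    int32_t y = -5;

    /* On a two's-complement machine, -5 is the bitwise NOT of 5, plus 1. */
    print_bits((uint32_t)x);        /* 00000000000000000000000000000101 */
    print_bits((uint32_t)y);        /* 11111111111111111111111111111011 */
    print_bits(~(uint32_t)x + 1u);  /* same bit pattern as -5 */

    /* The same adder works for signed and unsigned values. */
    uint32_t sum = (uint32_t)y + 7u;  /* -5 + 7 done as plain unsigned addition */
    print_bits(sum);                  /* 00000000000000000000000000000010 = 2  */
    return 0;
}
```

The last line shows why the hardware doesn't care about sign: adding the raw bit patterns of -5 and 7 modulo 2^32 yields the bit pattern of 2, the correct signed result.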
The only advantage of ones'-complement notation for integers is that it allows conversions to and from sign-magnitude form to be performed without a carry chain. Building a computer with a set of blinkenlights that show each register's value in sign-magnitude form will be much more convenient if the registers use ones'-complement form than if they use two's-complement form.

If one wanted to use separate storage latches for the blinkenlights and the CPU's registers, the easiest way to accommodate things would be to have one circuit which translates two's-complement to ones'-complement or sign-magnitude form, and then have each register write simultaneously store the two's-complement value in the register while updating the blinkenlight latches with the sign-magnitude value. Latching circuitry is sufficiently expensive, however, that if registers are being built out of discrete latches anyway, adding some circuitry to the ALU to make it use ones'-complement, and then feeding the lights from the CPU's "real" registers, may be cheaper than including an extra set of latches for the lights.
Over the last few decades, of course, the relative costs of different circuit elements have shifted to the point that it would be absurd to have lights wired to directly report the state of a CPU's registers. Consequently, the practical advantages that ones'-complement designs might have had in the past are no longer applicable.