How to store a very small float type number in MIPS? (e.g. 1x2^-18)

I want to store 1.23211e-18 to a float variable. Here is how I would do it for a value that can be written without an exponent:
.data
number: .float 2.43221
But how do I do it for a very small value like 1.23211e-18?

Chapter 4 of the MIPS Assembly Language Programmer's Guide (a web search will find many copies of a PDF of that book if you don't already have a copy) describes the format of floating-point constants. It says you should be able to write that value exactly as you did in the question:
.float 1.23211e-18

Related

How does a computer turn a string of ASCII into a signed or unsigned number?

For example if I type:
-6
Through what mechanism is that turned into:
1010
Would it be hardware based or somewhere in the kernel?
Usually no and no.
The kernel in a mainstream OS like Linux will usually just pass along bytes of text to user-space.
So a user-space program gets a string, i.e. a sequence of characters. (In simple cases, e.g. the ASCII subset of UTF-8, each character is a single byte.) A program would typically use a function like atoi() to convert a sequence of characters (representing ASCII codes for digits) to a binary integer. It's a standard library function because many programs need to deal with strings that represent integers, but it's a software function just like any other.
A simple implementation would have a loop like
int parse_digits(const std::string& digits)   // digits holds ASCII '0'..'9'
{
    int sum = 0;
    for (char d : digits) {            // look at digits in MSB-first order
        sum = 10 * sum + (d - '0');    // subtract '0' to get the numeric value
    }
    return sum;
}
// The first digit ends up being multiplied by 10 n times, the 2nd by 10 n-1 times,
// and so on. Each digit is multiplied by its place value.
This C++ source would be compiled to multiple asm instructions that implement it. Handling an optional leading - by negating the result is also just another instruction. There's typically a neg instruction of some sort, or a way to subtract from zero, to get the 2's complement inverse. (Assuming 2's complement hardware.)
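For example, a sign-aware variant might look like this (just a sketch; a real atoi() also skips leading whitespace and handles overflow):
int parse_signed(const char* s)
{
    bool negative = (*s == '-');        // remember an optional leading sign
    if (negative || *s == '+') ++s;
    int sum = 0;
    while (*s >= '0' && *s <= '9')
        sum = 10 * sum + (*s++ - '0');  // same place-value loop as above
    return negative ? -sum : sum;       // one extra negate instruction at the end
}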
You can speed this up by using fancier instructions that do more work per instruction / per clock cycle. On x86 for example you can convert a multi-digit string of digits to a binary integer with a few SIMD instructions, but that's still just using multiply and add instructions. See How to implement atoi using SIMD? for a nice use of pmaddwd to multiply by a vector of place-values and horizontally add. Also Fastest way to get IPv4 address from string is a cool example of what you can do with packed-compare and looking up a pshufb shuffle-control vector from a table based on that compare result.
A function like scanf("%d", &num) that reads input as a number is implemented in user-space, but under the hood it uses a system call like read() to get data. (If the C stdio input buffer was empty.)
Some "toy" / teaching systems like the MARS and SPIM MIPS simulators have system calls that get or print integers (with the input or result in an integer register). In that case, yes, the kernel does it in software.
Or depending on the implementation, there isn't actually a kernel at all, and the syscall instruction escapes to the emulator / simulator's input/output function, so from the POV of software running inside this virtual simulated machine, there really is hardware support for integer conversion. But no real hardware does the entire thing in microcode or actual hardware, at least not any mainstream architectures.

How do interpreters load their values?

I mean, interpreters work on a list of instructions, which seem to be composed more or less of sequences of bytes, usually stored as integers. Opcodes are retrieved from these integers by doing bit-wise operations, for use in a big switch statement where all operations are located.
My specific question is: How do the object values get stored/retrieved?
For example, let's (non-realistically) assume:
Our instructions are unsigned 32 bit integers.
We've reserved the first 4 bits of the integer for opcodes.
If I wanted to store data in the same integer as my opcode, I'm limited to a 28-bit integer. If I wanted to store it in the next instruction, I'm limited to a 32-bit value.
Values like Strings require lots more storage than this. How do most interpreters get away with this in an efficient manner?
I'm going to start by assuming that you're interested primarily (if not exclusively) in a byte-code interpreter or something similar (since your question seems to assume that). An interpreter that works directly from source code (in raw or tokenized form) is a fair amount different.
For a typical byte-code interpreter, you basically design some idealized machine. Stack-based (or at least stack-oriented) designs are pretty common for this purpose, so let's assume that.
So, first let's consider the choice of 4 bits for op-codes. A lot here will depend on how many data formats we want to support, and whether we're including that in the 4 bits for the op code. Just for the sake of argument, let's assume that the basic data types supported by the virtual machine proper are 8-bit and 64-bit integers (which can also be used for addressing), and 32-bit and 64-bit floating point.
For integers we pretty much need to support at least: add, subtract, multiply, divide, and, or, xor, not, negate, compare, test, left/right shift/rotate (right shifts in both logical and arithmetic varieties), load, and store. Floating point will support the same arithmetic operations, but remove the logical/bitwise operations. We'll also need some branch/jump operations (unconditional jump, jump if zero, jump if not zero, etc.) For a stack machine, we probably also want at least a few stack oriented instructions (push, pop, dupe, possibly rotate, etc.)
That gives us a two-bit field for the data type, and at least 5 (quite possibly 6) bits for the op-code field. Instead of conditional jumps being special instructions, we might want to have just one jump instruction, and a few bits to specify conditional execution that can be applied to any instruction. We also pretty much need to specify at least a few addressing modes:
small immediate (N bits of data in the instruction itself) -- optional
large immediate (data in the 64-bit word following the instruction)
implied (operand(s) on top of stack)
absolute (address specified in the 64 bits following the instruction)
relative (offset specified in or following the instruction)
I've done my best to keep everything about as minimal as is at all reasonable here -- you might well want more to improve efficiency.
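To make the field layout concrete, here is a purely illustrative C++ sketch of packing and unpacking such an instruction word; the exact bit positions and widths are assumptions made for the example, not part of the design above.
#include <cstdint>

// Hypothetical layout: 6-bit op-code, 2-bit data type, 3-bit addressing mode,
// 3-bit condition field, and the remaining 18 bits as a small immediate.
struct Insn {
    std::uint32_t word;

    std::uint32_t opcode() const { return  word        & 0x3F; }  // bits 0..5
    std::uint32_t type()   const { return (word >> 6)  & 0x03; }  // bits 6..7
    std::uint32_t mode()   const { return (word >> 8)  & 0x07; }  // bits 8..10
    std::uint32_t cond()   const { return (word >> 11) & 0x07; }  // bits 11..13
    std::int32_t  imm()    const {                                 // bits 14..31, sign-extended
        return static_cast<std::int32_t>(word) >> 14;
    }
};

// Encoding helper for the same hypothetical layout.
constexpr std::uint32_t encode(std::uint32_t op, std::uint32_t type,
                               std::uint32_t mode, std::uint32_t cond,
                               std::int32_t imm)
{
    return (op & 0x3F) | ((type & 3) << 6) | ((mode & 7) << 8) |
           ((cond & 7) << 11) | (static_cast<std::uint32_t>(imm) << 14);
}
A real design would pin these widths down differently depending on how many op-codes, modes and conditions it actually needs.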
Anyway, in a model like this, an object's value is just some locations in memory. Likewise, a string is just some sequence of 8-bit integers in memory. Nearly all manipulation of objects/strings is done via the stack. For example, let's assume you had some classes A and B defined like:
class A {
    int x;
    int y;
};

class B {
    int a;
    int b;
};
...and some code like:
A a {1, 2};
B b {3, 4};
a.x += b.a;
The initialization would mean values in the executable file loaded into the memory locations assigned to a and b. The addition could then produce code something like this:
push immediate a.x // put &a.x on top of stack
dupe // copy address to next lower stack position
load // load value from a.x
push immediate b.a // put &b.a on top of stack
load // load value from b.a
add // add two values
store // store back to a.x using address placed on stack with `dupe`
Assuming one byte for each instruction proper, we end up around 23 bytes for the sequence as a whole, 16 bytes of which are addresses. If we use 32-bit addressing instead of 64-bit, we can reduce that by 8 bytes (i.e., a total of 15 bytes).
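For illustration, here is a toy C++ dispatch loop that could execute a sequence like the one above; the op-code names and the 8-byte little-endian immediate encoding are assumptions made for this sketch, not a fixed design:
#include <cstdint>
#include <vector>

enum Op : std::uint8_t { PUSH_IMM, DUPE, LOAD, ADD, STORE, HALT };

void run(const std::vector<std::uint8_t>& code, std::vector<std::uint64_t>& mem)
{
    std::vector<std::uint64_t> stack;
    std::size_t pc = 0;
    for (;;) {
        switch (code[pc++]) {
        case PUSH_IMM: {                        // push the 64-bit operand that follows
            std::uint64_t v = 0;
            for (int i = 0; i < 8; ++i) v |= std::uint64_t(code[pc++]) << (8 * i);
            stack.push_back(v);
            break;
        }
        case DUPE:  stack.push_back(stack.back()); break;
        case LOAD:  stack.back() = mem[stack.back()]; break;   // replace address with value
        case ADD: {                             // pop two values, push their sum
            std::uint64_t b = stack.back(); stack.pop_back();
            stack.back() += b;
            break;
        }
        case STORE: {                           // value on top, address beneath it
            std::uint64_t v = stack.back(); stack.pop_back();
            mem[stack.back()] = v; stack.pop_back();
            break;
        }
        case HALT: return;
        }
    }
}
Feeding it the seven-instruction sequence above (with the two 8-byte addresses) reproduces the 23-byte encoding counted earlier.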
The most obvious thing to keep in mind is that the virtual machine implemented by a typical byte-code interpreter (or similar) isn't all that different from a "real" machine implemented in hardware. You might add some instructions that are important to the model you're trying to implement (e.g., the JVM includes instructions to directly support its security model), or you might leave out a few if you only want to support languages that don't include them (e.g., I suppose you could leave out a few like xor if you really wanted to). You also need to decide what sort of virtual machine you're going to support. What I've portrayed above is stack-oriented, but you can certainly do a register-oriented machine if you prefer.
Either way, most object access, string storage, etc., comes down to objects being locations in memory. The machine will retrieve data from those locations into the stack/registers, manipulate it as appropriate, and store it back to the locations of the destination object(s).
Bytecode interpreters that I'm familiar with do this using constant tables. When the compiler is generating bytecode for a chunk of source, it is also generating a little constant table that rides along with that bytecode. (For example, if the bytecode gets stuffed into some kind of "function" object, the constant table will go in there too.)
Any time the compiler encounters a literal like a string or a number, it creates an actual runtime object for the value that the interpreter can work with. It adds that to the constant table and gets the index where the value was added. Then it emits something like a LOAD_CONSTANT instruction that has an argument whose value is the index in the constant table.
Here's an example:
static void string(Compiler* compiler, int allowAssignment)
{
    // Define a constant for the literal.
    int constant = addConstant(compiler, wrenNewString(compiler->parser->vm,
        compiler->parser->currentString, compiler->parser->currentStringLength));

    // Compile the code to load the constant.
    emit(compiler, CODE_CONSTANT);
    emit(compiler, constant);
}
At runtime, to implement a LOAD_CONSTANT instruction, you just decode the argument, and pull the object out of the constant table.
Here's an example:
CASE_CODE(CONSTANT):
    PUSH(frame->fn->constants[READ_ARG()]);
    DISPATCH();
For things like small numbers and frequently used values like true and null, you may devote dedicated instructions to them, but that's just an optimization.

Parsing base 2^32 numbers to decimal (for theoretically unlimited numbers)

I am working on a C++ problem where I have to print my class.
My class stores and does arithmetic and logic operations on theoretically unlimited-length numbers. It has an array of unsigned ints to hold the number. For example:
If the number is {a*(2^32) + b}, the class stores it as {array[0]=b, array[1]=a}.
So it is like a number in base 2^32. The problem is how do I convert this number to decimal so I can print it? Simply computing {a*(2^32) + b} will not do because it doesn't fit into an unsigned int. I do not have to store the decimal number, just print it.
What I have got so far
I have thought of first converting the number to binary (which is an easy task) and then printing it. But the same problem arises, because there is still no big enough variable to hold the multiplication.
Wild thought
I wonder if I can use my own class to hold the multiplication and do the printing with some iterative method?
I also wonder if this can be solved with some use of logarithms?
Note: I am not allowed to use other libraries or other long types like double and longer.
Although I say this is for theoretically unlimited numbers, it would help if I could just find a way to print an array of size 2. Then I can think about longer numbers.
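A common approach along the lines of the "wild thought" above is to repeatedly divide the limb array itself by 10 and collect the remainders as decimal digits. Here is a minimal C++ sketch (an illustration, not taken from an accepted answer), assuming the limbs are stored least-significant first as in the question and that unsigned int is 32 bits; each limb is divided in 16-bit halves so no intermediate wider than unsigned int is needed:
#include <algorithm>
#include <string>
#include <vector>

std::string to_decimal(std::vector<unsigned int> limbs)   // limbs[0] is least significant
{
    while (!limbs.empty() && limbs.back() == 0) limbs.pop_back();  // drop leading zero limbs
    std::string digits;
    while (!limbs.empty()) {
        unsigned int rem = 0;                     // remainder carried between limbs, always < 10
        for (auto i = limbs.size(); i-- > 0; ) {  // divide the whole array by 10, MSB first
            unsigned int hi = (rem << 16) | (limbs[i] >> 16);           // upper 16 bits
            unsigned int lo = ((hi % 10) << 16) | (limbs[i] & 0xFFFF);  // then lower 16 bits
            limbs[i] = ((hi / 10) << 16) | (lo / 10);
            rem = lo % 10;
        }
        digits.push_back(static_cast<char>('0' + rem));             // next decimal digit
        if (!limbs.empty() && limbs.back() == 0) limbs.pop_back();  // top limb may have shrunk to 0
    }
    if (digits.empty()) digits = "0";
    std::reverse(digits.begin(), digits.end());   // remainders came out least significant first
    return digits;
}
If 64-bit intermediates were allowed, a usual speed-up is to divide by 10^9 per pass and peel off nine digits at a time.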

What exactly is a datatype?

I understand what a datatype is (intuitively). But I need the formal definition. I don't understand whether it is a set, or whether it's the names 'int', 'float', etc. The formal definition found on Wikipedia is confusing.
In computer programming, a data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored.
Can anyone help me with that?
Yep. What that's saying is that a data type has three pieces:
The various possible values. So, for example, an eight-bit signed integer might have -128..127. Think of that as a set of values V.
The operations: so an 8-bit signed integer might have +, -, * (multiply), and / (divide). The full definition would define those as functions from V into V, or possibly as a function from V into float for division.
The way it's stored -- I sort of gave it away when I said "eight bit signed integer". The other detail is that I'm assuming a specific representation by the way I showed the range of values.
You might, if you're into object oriented programming, notice that this is very much like the definition of a class, which is defined by the storage used by each object and the methods of the class. Providing those parts for some arbitrary thing, but not inheritance rules, gives you what's called an abstract data type.
Update
@Appy, there's some room for differences in the formalities. I was a little subtle because it was late and I was suddenly uncertain if I'd assumed one's complement or two's complement -- of course it's two's complement. So interpretation is included in my description. Abstractly, though, you'd say it is an algebraic structure T=(V,O) where V is a set of values and O is a set of functions from V into some arbitrary type -- remember '==' for example will be a function eq: V × V → {0,1}, so you can't expect every operation to be into V.
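To make that concrete, here is a small illustrative C++ sketch (the names are made up for the example) treating an 8-bit signed integer as a set of values plus operations over that set:
#include <cstdint>

// A data type as "values + operations": V is the 256 states of an int8_t,
// and the operations are functions whose domain is V (or V x V).
struct Int8Type {
    using V = std::int8_t;                                     // the set of possible values

    static V add(V a, V b)   { return static_cast<V>(a + b); } // +  : V x V -> V
    static V negate(V a)     { return static_cast<V>(-a); }    // neg: V -> V
    static bool eq(V a, V b) { return a == b; }                // == : V x V -> {0,1}
};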
I can define it as a classification of a particular type of information. It is easy for humans to distinguish between different types of data. We can usually tell at a glance whether a number is a percentage, a time, or an amount of money. We do this through special symbols %, :, and $.
Basically it's the concept that I am sure you grok. For computers, however, a data type is defined and has various associated attributes, like size, a definition keyword (sometimes), the values it can take (numbers or characters, for example) and the operations that can be done on it, like add/subtract for numbers, append on strings or compare on characters, etc. These differ from language to language and even from environment to environment (16- vs 32-bit ints, 32- vs 64-bit environments, etc.).
If there is anything I am missing or needs refining please ask as this is fairly open ended.

Emulating IBM floating point multiplication/addition in VBA

I am attempting to emulate a (no longer existing) mainframe report generator in an Access 2003 or Access 2010 environment. The data it generates must match exactly with paper reports from the early 70s. Unfortunately, the earliest years data were run on hardware that used IBM floating point representation instead of IEEE. With the help of Google, I've found a library of VBA functions that will convert a float from decimal to the IEEE 754 32bit binary format. I had to modify the library to accept either 32bit or 64bit floats, so I have a modest working knowledge of floating point formats, however, I'm having trouble making the conversion from IEEE to IBM binary format, as well as trouble multiplying and adding either the IBM or the IEEE numbers.
I haven't turned up any other libraries for performing this conversion and arithmetic operations in VBA - is there an easier way to go about this, or an existing library that I'm not finding? Failing that, a clear and straightforward explanation of the relevant algorithms?
Thanks in advance.
To be honest you'd probably do better to start by looking at the Hercules emulator.
http://www.hercules-390.org/ Other than that, in theory with VBA you can use the Decimal type to get good results (note you have to use CDec to create these); it uses a 12-byte (96-bit) integer with a variable power-of-ten scale factor.
A quick Google search shows this post from the Hercules group, which confirms Albert's point about needing to know the hardware:
---Snip--
In theory, but rather less so in practice. S/360 and S/370 had a
choice of Scientific or Commercial instruction sets. The former added
the FP instructions and registers to the base; the latter the decimal
instructions, including Edit and Edit & Mark. But larger 360 (iirc /65
and up) and 370 (/155 and up) models had the union of the two, called
the Universal instruction set, and at some point the S/370 dropped the
option.
---snip---
I have to say that, having looked at the Hercules source code, you'll probably need to figure out exactly which floating point operation codes (in terms of precision: single, long, extended) are being performed.
The problem here is that you're confusing the Decimal type in Access with the Single and Double floating point types available in Access.
If you use the Currency data type in Access, this is a scaled integer, and will not produce rounding (that is what most of us use for financial calculations and reports). You can also use Decimal values in Access, and again they don't round at all as they are scaled decimals.
However, both the Single and Double values available inside of Access are standard formats that conform to the IEEE floating point standard.
For an Access Single variable, this is a 32-bit number, and the range is:
-3.402823E38 to -1.401298E-45 for negative values
1.401298E-45 to 3.402823E38 for positive values
That looks to be the same to me as the IEEE 754 standard.
So, if you add up values in Access as Singles, you should get the same results, rounding included.
So, on Intel-based hardware, Access Singles and Doubles are, I believe, the same as this IEEE standard.
The only real issue here is the format of the original data you're pulling into Access, and what kind of text, string, or conversion process is occurring when that data is pulled in and stored.
Access can convert numbers. Try typing these values at the Access command line prompt (debug window):
? hex(255)
Above will show FF
? csng(&hFF)
Above will show 255
Edit:
Ah, OK, I see now I had this reversed; my mistake here. The problem is that, assuming you convert a number to the older IBM format (excess-64 exponent?), you will THEN have to get your hands on the code they used for adding those numbers. In fact, even back then, different IBM models, depending on what you purchased, actually produced different results (more money = more precision).
So, not only do you need conversion routines to convert to the internal representation, you THEN need the routines that add/subtract/multiply those numbers. So, just having conversion routines is not going to get you very far, since you also have to duplicate their exact routines that do math. Those types of routines are likely not all created equal in terms of how they round numbers etc.
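For reference, here is a rough C++ sketch of just the representation conversion, assuming the classic IBM System/360 single-precision hex float layout (1 sign bit, 7-bit excess-64 base-16 exponent, 24-bit fraction, no hidden bit); it ignores zero, rounding, and exponent range checks, and it says nothing about reproducing IBM's arithmetic behaviour:
#include <cmath>
#include <cstdint>

// Pack a finite, non-zero value into IBM single-precision hex-float bits.
std::uint32_t to_ibm_single(double x)
{
    std::uint32_t sign = (x < 0) ? 0x80000000u : 0;
    x = std::fabs(x);

    int e2;
    double m = std::frexp(x, &e2);        // x = m * 2^e2, with m in [0.5, 1)

    // Base 16 needs the binary exponent rounded up to a multiple of 4;
    // shift the mantissa down to compensate, so m ends up in [1/16, 1).
    int e16 = static_cast<int>(std::ceil(e2 / 4.0));
    m = std::ldexp(m, e2 - 4 * e16);

    std::uint32_t frac = static_cast<std::uint32_t>(m * (1 << 24));  // 24-bit fraction, truncated
    std::uint32_t exp  = static_cast<std::uint32_t>(e16 + 64);       // excess-64 exponent

    return sign | (exp << 24) | frac;
}
Reproducing the arithmetic itself (how a given model truncated or rounded intermediate results) is still the part that determines whether your totals match the original reports, as noted above.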