Reading a CSV file of varying precision in Fortran

I am using an external program to run a simulation which returns a CSV file containing output data. I need to read the data from this file into my Fortran program, which analyses and optimizes the input conditions to rerun the external program.
The CSV file has, say, 20 columns and 70 rows. Each column contains output data for a specific parameter. Since that program is not written by me, I cannot control the precision of the output values, and in many cases the external program truncates digits after the decimal point if they are zero. So it is possible that a certain field has 3 digits after the decimal point in run number 1, but only 2 digits after the decimal point in run number 2.
What am I supposed to do about this? I cannot use a plain formatted READ, since that requires me to specify in advance the number of digits my program has to read.
I basically need a way for my program to identify the data between commas and read a value of varying precision from each field.

For input, the decimal part of a format specifier is only used if the input field does not contain a decimal point.
For the last few decades (since the demise of punched cards), users typically expect that a numeric value that doesn't contain a decimal point is an integer value. Consequently, for input, format specifications for real numbers should always have .0 for their decimal part.
For example, after:
CHARACTER(4) :: input
REAL :: a, b
input = '1 '
READ (input, "(F4.0)") a
READ (input, "(F4.1)") b
a will have the value 1.0, and b will have the value 0.1.
(For input, it doesn't particularly matter which particular real data descriptor is used (F, E, D, or G) - they all behave the same regardless of the nature of the input.)
So, for input, all you have to worry about is getting the field width right. Once you have read a record into a string, this is easy enough to do using the INDEX intrinsic to locate the commas and hence the field widths.

Encoder number of outputs for opcode within a MIPS machine instruction

If I have an encoder with 8 data inputs, what is its maximum number of outputs?
I know that an encoder is a combinational circuit that performs the reverse operation of a decoder. It has a maximum of 2^n input lines and ‘n’ output lines, hence it encodes the information from 2^n inputs into an n-bit code. Since I have 8 data inputs, the number of outputs will be 3, since 2^3 = 8. Is that a correct assumption?
Let's try to tease apart the concepts of one hot (decoded) lines and an encoding using a number of bits. Both of these concepts are ways to represent information, but their form and typical usage are different.
One hot is a technique wherein at most one line is 1/true and all the other lines are 0/false. These one hot lines are not considered digits in a number, but rather individual signals or conditions (only one of which can be true at any given time). This form is particularly useful in certain circuits, as each of the one hot lines can activate some other hardware. (A hardware lookup table (LUT), a RAM, or a ROM may use one-hot form within its internal array indexing.)
Encoding is a technique where we use N lines as digits in an N-bit number, as would be found in a CPU register holding a number, or as we might write normal binary numbers in text.  By contrast, in this form any of the N bits can be 1 (or 0).
Simple encoders & decoders translate between encoded form (N-bit numbers) and one hot form (2^N lines).
... encoder ... has a maximum of 2^n input lines and ‘n’ output lines
In your statement, the 2^n input lines are in one hot form, while the output lines are normal numbers in binary (i.e. encoded).
Both the inputs (2^n lines) and the outputs (n lines) are capable of representing exactly 2^n different values!  As a result, decode/encode is a 1:1 mapping, back & forth.  (It would be an error to have multiple hots on the input side of such a decoder, and bad things would happen in a system that allowed that.)
In the formulas you're speaking to: 2^N = V, and N = log2(V), where N stands for the number of bits (a bit is a binary digit), and V stands for the number of values that can be represented in N bits.
(The 2's in these formulas are there because the digits are binary; substitute 10 for 2 and the same relationships hold between the number of decimal digits and the number of values that many digits can represent/store/communicate.)
In one hot form we need V number of lines, whereas in encoded form we need N lines (as bits/digits) to represent the same information (one of V different values).
Consider whether a number you're looking at is a digit count (as with N) or a value count (as with V).
And bear in mind that in one hot form, we need one line for each possible value, V (whereas in encoded form we need N bits for V possible values).
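To make that 1:1 mapping concrete, here is a small software sketch in C++ (my own illustration with made-up names; real encoders and decoders are of course combinational hardware, not loops) of an 8-to-3 encoder and its matching 3-to-8 decoder:
#include <cassert>
#include <cstdint>

// Decode: a 3-bit value (0..7) -> one-hot form, exactly one of 8 lines set.
std::uint8_t decode3to8(unsigned value) {
    return static_cast<std::uint8_t>(1u << (value & 7u));
}

// Encode: a one-hot input (exactly one of 8 lines set) -> a 3-bit value.
unsigned encode8to3(std::uint8_t oneHot) {
    for (unsigned n = 0; n < 8; ++n)
        if (oneHot == (1u << n))
            return n;
    assert(false && "input is not one-hot");  // multiple (or zero) hot lines: undefined for a simple encoder
    return 0;
}

int main() {
    for (unsigned v = 0; v < 8; ++v)
        assert(encode8to3(decode3to8(v)) == v);  // the round trip is 1:1, back & forth
}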
A MIPS processor will feed the 6-bit opcode field into a lookup table of some sort, in order to determine which set of control signals to activate for any given instruction. (The opcode field is not one hot, but rather a bit field of N=6 bits.)
These control signals are (also) not one hot, and the MIPS instruction decoder is not using a simple decoder, but rather a mapper that goes between encoded opcode values and effectively encoded control signals — this mapping is accomplished by lookup in a table.
Taken as a set, these control signals are individual boolean values rather than a one-hot pattern or an encoded number. One hot may be used internally in the indexing of this mapping. The mapping is basically an array lookup where the index is the opcode and each array element holds all the individual control signal values appropriate to its index.
(R-Type instructions all share a common opcode value, so when the R-Type opcode value is present, then additional lookup/mapping is done on the func bit field to generate the proper control signals.)

Is there error propagation when serializing floating point values to strings?

Say I have a float (or double) in my favorite language. Say that in memory this value is stored according to IEEE 754, and say that I serialize this value to XML or JSON or plain text using base 10. When serializing and de-serializing this value, will I lose precision of my number? When should I care about this precision loss?
Would converting the number to base64 prevent the loss of precision?
It depends on the binary-to-decimal conversion function that you use. Assuming this function is not botched (it has no reason to be):
Either it converts to a fixed precision. Old-fashioned languages such as C offer this kind of conversion to decimal. In this case, you should use a format with 17 significant decimal digits. A common format is D.DDDDDDDDDDDDDDDDEXXX where D and X are decimal digits, and there are 16 digits after the dot. This would be specified as %.16e in C-like languages. Converting back such a decimal value to the nearest double produces the same double that was originally printed.
Or it converts to the shortest decimal representation that converts back to the same double. This is what some modern programming languages (e.g. Java) offer by default as the printing function. In this case, the property that parsing the decimal representation back will return the original double is automatic.
In either case loss of accuracy should not happen. This is not because you get the exact decimal representation of the original binary64 number with either method 1. or 2. above: in the general case, you don't. Such an exact representation always exists (because 10 is a multiple of 2), but can be up to ~750 digits long for a binary64 number.
What you get with method 1. or 2. above is a decimal number that is closer to the original binary64 number than to any other binary64 number. This means that the opposite conversion, from decimal to binary64, will “round back” to the original.
This is where the “non-botched” assumption is necessary: in order for the successive conversions to return to the original number they must respectively produce the closest decimal to the binary64 number passed and the closest binary64 to the decimal number passed. In these conditions, and with the appropriate number of decimal digits for the first conversion, the round-trip is lossless.
I should point out that (non-botched) conversions to and from decimal are expensive operations. Unless human-readability of the result is important for you, you should consider a simpler format to convert to. The C99-style hexadecimal representation for floating-point numbers is a good compromise between conversion cost and readability. It is not the most compact but it contains only printable characters.
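To illustrate both points in C-like terms, here is a minimal sketch, assuming a non-botched (correctly rounded) snprintf/strtod pair from the C standard library; the variable names and the 0.1 + 0.2 value are just examples of mine:
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main() {
    double original = 0.1 + 0.2;  // 0.30000000000000004..., not exactly representable in binary64

    // Method 1: fixed precision, 17 significant digits (%.16e = 1 digit before the dot + 16 after).
    char decimal[64];
    std::snprintf(decimal, sizeof decimal, "%.16e", original);
    double restored = std::strtod(decimal, nullptr);
    assert(std::memcmp(&restored, &original, sizeof original) == 0);  // bit-for-bit identical

    // C99-style hexadecimal floating point: cheap to produce, exact, but less human-readable.
    char hex[64];
    std::snprintf(hex, sizeof hex, "%a", original);
    double fromHex = std::strtod(hex, nullptr);
    assert(std::memcmp(&fromHex, &original, sizeof original) == 0);

    std::printf("decimal: %s\nhex:     %s\n", decimal, hex);
}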
The approach of converting to the shortest form which converts back the same is dangerous (the "round-trip" string formatting mode in .NET uses such an approach, and is buggy as a result). There is probably no reason for a decimal-to-binary conversion method ever to yield a result which is more than 0.75 lsb from the exact specified numerical value, but guaranteeing that a conversion will always yield a perfectly-rounded numerical value is expensive and in most cases not particularly helpful. It would be better to ensure that the precise arithmetic value of the decimal expression will be less than 0.25 lsb from the double value to be represented. If a string that's less than 0.25 lsb away from a double is fed to a routine which returns a double within 0.75 lsb of it, the latter routine can be guaranteed to yield the same double as was given to the former.
The approach of simply finding the shortest form that yields the same double also assumes that any string representation will always be parsed the same way, even if the value represented falls almost exactly halfway between two adjacent double values. Since obtaining a perfectly-rounded result could require reading an arbitrary number of digits (e.g. 1125899906842624.125000...1 should round up to 1125899906842624.25), few implementations are apt to bother; and if an implementation is going to ignore digits beyond a certain point, even when that might yield a result that is e.g. more than .056 lsb away from the correct one, it shouldn't be trusted to be accurate to 0.50000 lsb in any case.

How is JSON number encoded?

How is a number represented internally in JSON, and how many bytes of data does it take to store a JSON number?
I can't find any info specifying this internal detail.
According to the ECMA-404 standard, §8:
A number is represented in base 10 with no superfluous leading zero. It may have a preceding minus sign (U+002D). It may have a fractional part prefixed by a decimal point (U+002E). It may have an exponent of ten, prefixed by e (U+0065) or E (U+0045) and optionally followed by + (U+002B) or - (U+002D). The digits are the code points U+0030 through U+0039.
So, pretty much text, except that (later on the page) NaN and Infinity aren't acceptable values.
BSON, however, has int32, int64, and double types that are a bit more traditional.
JSON is a data interchange format. It is just text. There is no "internal" representation of JSON, unless you are referring to how your particular system encodes and stores text data.
The number of bytes it takes to store a JSON number would be the length of the number, in characters, multiplied by the number of bytes required to store a character in your particular system.
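As a quick illustration of that last point, here is a hypothetical sketch in C++ (the 0.1 value and the %.17g format are just examples of mine, not anything mandated by JSON):
#include <cstdio>
#include <cstring>

int main() {
    // A JSON number is just text, so its storage cost is simply the length of
    // its textual form, times the bytes per character of your text encoding.
    char json[64];
    std::snprintf(json, sizeof json, "%.17g", 0.1);   // e.g. "0.10000000000000001"
    std::printf("%s needs %zu bytes in UTF-8/ASCII\n", json, std::strlen(json));
}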

Parsing base 2^32 numbers to decimal (for theoretically unlimited numbers)

I am working on a C++ problem where I have to print my class.
My class stores theoretically unlimited long numbers and does arithmetic and logic operations on them. It has an array of unsigned ints to hold the number. For example:
If the number is {a*(2^32) + b} , the class stores it as {array[0]=b , array[1]=a}.
So it is like a number in base 2^32. The problem is: how do I convert this number to decimal so I can print it? Simply computing {a*(2^32) + b} will not do, because the result doesn't fit into an unsigned int. I do not have to store the decimal number, just print it.
What I have got so far
I have thought of first converting the number to binary (which is an easy task) and then printing it. But the same problem arises, because there is still no variable big enough to hold the multiplication.
Wild thought
I wonder if I can use my own class to hold the result of the multiplication and do the printing with some iterative method?
I also wonder if this can be solved with some use of logarithmics?
Note: I am not allowed to use other libraries or other long types like double and longer.
Although I say this is for theoretically unlimited numbers, it would help if I could just find a way to print an array of size 2. Then I can think about longer numbers.
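For what it's worth, one common iterative approach is to repeatedly divide the whole limb array by 10 and collect the remainders as decimal digits, which needs nothing wider than unsigned 32-bit arithmetic if each limb is processed in 16-bit halves. This is only a sketch under my own assumptions (little-endian limb order as described above; the names divmod10 and toDecimal are made up):
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Divide the whole base-2^32 number by 10 in place and return the remainder
// (0..9). Each limb is split into 16-bit halves so that no intermediate value
// ever exceeds 32 bits.
std::uint32_t divmod10(std::vector<std::uint32_t>& limbs) {
    std::uint32_t rem = 0;
    for (std::size_t i = limbs.size(); i-- > 0; ) {  // most significant limb first
        std::uint32_t hi = limbs[i] >> 16;
        std::uint32_t lo = limbs[i] & 0xFFFFu;
        std::uint32_t curHi = (rem << 16) | hi;      // rem < 10, so this fits in 32 bits
        std::uint32_t qHi = curHi / 10;
        rem = curHi % 10;
        std::uint32_t curLo = (rem << 16) | lo;
        std::uint32_t qLo = curLo / 10;
        rem = curLo % 10;
        limbs[i] = (qHi << 16) | qLo;
    }
    return rem;
}

// Convert {array[0]=b, array[1]=a, ...} (least significant limb first) to a
// decimal string by peeling off one decimal digit per division.
std::string toDecimal(std::vector<std::uint32_t> limbs) {
    std::string digits;
    do {
        digits.push_back(static_cast<char>('0' + divmod10(limbs)));
    } while (std::any_of(limbs.begin(), limbs.end(),
                         [](std::uint32_t v) { return v != 0; }));
    return std::string(digits.rbegin(), digits.rend());
}

int main() {
    std::vector<std::uint32_t> n = {0u, 1u};  // a*(2^32) + b with a = 1, b = 0
    std::cout << toDecimal(n) << '\n';        // prints 4294967296
}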

MySQL FLOAT & decimals

The datatype of the field in the DB is FLOAT and the value is 18.7. I'd like to store and display this on the page as 18.70. Whenever I enter the extra 0, it still only stores 18.7.
How can I store the extra 0? I can change the data type of the field.
In a FLOAT column, what MySQL stores for 18.7 is actually:
01000001 10010101 10011001 10011010
which, being retrieved from the DB and converted back into your display format, is 18.7.
In reality, the stored value is a binary fraction represented by the decimal number 18.70000076293945 which you can see by issuing this query:
CREATE TABLE t_f (value FLOAT);
INSERT
INTO t_f
VALUES (18.7);
SELECT CAST(value AS DECIMAL(30, 16))
FROM t_f;
The IEEE-754 representation stores numbers as binary fractions, so a value like 0.1 can only be represented by an infinitely repeating binary fraction and hence is not exact.
DECIMAL, on the other hand, stores decimal digits, packing 9 digits into 4 bytes.
Floating-point types do not store insignificant zeros, whether to the left of the number before the decimal point or to the right of the number after the decimal point. You'll need to use a string-based type (or store the precision in a separate field) if you want to store the exact numeric string entered by the user and be able to distinguish 12.7 from 12.70. You can, however, round the values you display to two digits in your application, as sketched below.
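For example, rounding at display time in the application layer could look like this (a hypothetical sketch in C++ terms; the same idea applies in whatever language renders your page):
#include <cstdio>

int main() {
    float stored = 18.7f;            // approximately what the FLOAT column actually holds
    std::printf("%.2f\n", stored);   // formats for display as "18.70"
}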
If two decimal places are needed, use:
DECIMAL(n,2), where n >= 2
The DECIMAL data type will preserve the decimal-place formatting and gives more exact results than the FLOAT and DOUBLE data types.
Are you attempting to store a currency as a float? If so, please use a DECIMAL with more than 2 decimal digits.
You really want fixed-point arithmetic on currencies.
This is just a very broad rule of thumb and my own observation, but in regular business logic as serialized in a database, you almost never want floating point. I know there are lots of exceptions, but I'm suspicious whenever I see a float-typed column in a table because of this. I'd be interested in what others have found.