What is this floating point format? - binary

I am trying to figure out how to read a historical binary data file. I believe it came from an older 32-bit Solaris system. I am looking at a section of the file that I believe contains 32-bit floating-point numbers (not IEEE floats). The format appears to be (as a hex dump):
xx41 xxxx
xx42 xxxx
The 41 and 42 in those positions appear consistently through the floating point numbers. I'm afraid that I do not have any additional information to add to this. So the first part of my question is, what format is this? If the first part can not be answered directly, a list of likely possibilities would be great. Lastly, how would you suggest going about determining what format this is? Thank you for your input.

Could this be PDP-11 format? The giveaway for me is that the second byte is mostly constant, which suggests that the exponent of the floating-point format is ending up in the second byte rather than the first (as you'd expect for a big-endian machine) or the last (for a little-endian machine). The PDP-11 is notorious for its funny byte order for floats and integers; see the material near the bottom of this Floating-Point Formats page.
The values of 41 and 42 would appear to be consistent with positive values of roughly unit-magnitude: the exponent bias for the PDP-11 format appears to be 128, so with the unusual byte-order I'd expect the 2nd byte that you list to contain the sign and the topmost 7 bits of the exponent; that would make the unbiased exponent for a second byte of 41 be either 2 or 3 depending on the 8th exponent bit (which should appear as the MSB of the first byte).
See also this page for a brief description of the PDP-11 format.
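The bit arithmetic in the paragraph above can be checked with a short Python sketch. The helper name `exponent_candidates` is made up for illustration, and it assumes, as described, that the byte holds the sign bit followed by the top seven exponent bits and that the bias is 128.

```python
def exponent_candidates(second_byte, bias=128):
    """Return (sign, [possible unbiased exponents]) for the sign/exponent byte."""
    sign = second_byte >> 7        # most significant bit is the sign
    top7 = second_byte & 0x7F      # remaining 7 bits are the high exponent bits
    # The lowest exponent bit lives in another byte, so try both values (assumption).
    return sign, [((top7 << 1) | low) - bias for low in (0, 1)]

print(exponent_candidates(0x41))  # (0, [2, 3])
print(exponent_candidates(0x42))  # (0, [4, 5])
```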
[EDIT] Here's some Python 3 code to convert from a 4-byte bytes object in the form you describe to a Python float, assuming that the 4 bytes represent a float in PDP-11 format.
import struct
def pdp_to_float(xs):
    """Convert a 4-byte PDP-11 single-precision float to a Python float."""
    # Swap the two bytes within each 16-bit word so the sign/exponent byte comes first.
    ordered_bytes = bytes(xs[i] for i in [1, 0, 3, 2])
    n = struct.unpack('>I', ordered_bytes)[0]
    fraction = n & 0x007fffff            # low 23 bits
    exponent = (n & 0x7f800000) >> 23    # next 8 bits, biased by 128
    sign = (n & 0x80000000) >> 31        # top bit
    hidden = 1 if exponent != 0 else 0   # implicit leading bit, absent for zero
    return (-1)**sign * 2**(exponent-128) * (hidden + fraction / 2.0**23)
Example:
>>> pdp_to_float(b'\x00\x00\x00\x00')
0.0
>>> pdp_to_float(b'\x23\x41\x01\x00')
5.093750476837158
>>> pdp_to_float(b'\x00\x42\x00\x00')
16.0

The data described is consistent with the usual IEEE 754 format, stored in big-endian order, then displayed by a little-endian dump program two bytes at a time.
32-bit floats in the interval [8, 128) have first bytes of 0x41 or 0x42. Consider such a number, perhaps 0x41010203. Stored big end first, it would appear in memory as the four bytes 0x41, 0x01, 0x02, and 0x03. When the dump program reads 16-bit integers, little end first, it will read and display 0x0141 and 0x0302.
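This hypothesis is easy to try out. The following is only a sketch in Python: it packs a float as big-endian IEEE 754 and then reads the same bytes back as little-endian 16-bit words, the way such a dump program would. The function name `dump_as_le_words` is invented for the example.

```python
import struct

def dump_as_le_words(value):
    """Pack a float as big-endian IEEE 754, then show it as little-endian 16-bit words."""
    raw = struct.pack('>f', value)       # bytes as a big-endian machine stores them
    words = struct.unpack('<2H', raw)    # what a little-endian dump reads, 2 bytes at a time
    return ' '.join(format(w, '04x') for w in words)

print(dump_as_le_words(8.0))    # 0041 0000
print(dump_as_le_words(32.0))   # 0042 0000

# The 0x41010203 example from the text:
words = struct.unpack('<2H', bytes.fromhex('41010203'))
print(' '.join(format(w, '04x') for w in words))  # 0141 0302
```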

How computers convert decimal to binary integers

This is surely a duplicate, but I was not able to find an answer to the following question.
Let's consider the decimal integer 14. We can obtain its binary representation, 1110, using e.g. the divide-by-2 method (% represents the modulus operator; the remainders are read from bottom to top):
14 % 2 = 0
7 % 2 = 1
3 % 2 = 1
1 % 2 = 1
but how do computers convert decimal to binary integers?
The above method would require the computer to perform arithmetic and, as far as I understand, because arithmetic is performed on binary numbers, it seems we would be back dealing with the same issue.
I suppose that any other algorithmic method would suffer the same problem. How do computers convert decimal to binary integers?
Update: Following a discussion with Code-Apprentice (see comments under his answer), here is a reformulation of the question in two cases of interest:
a) How is the conversion to binary performed when the user types integers on a keyboard?
b) Given a mathematical operation in a programming language, say 12 / 3, how is the conversion from decimal to binary done when running the program, so that the computer can do the arithmetic?
There is only binary
The computer stores all data as binary. It does not convert from decimal to binary since binary is its native language. When the computer displays a number it will convert from the binary representation to any base, which by default is decimal.
A key concept to understand here is the difference between the computer's internal storage and the representation as characters on your monitor. If you want to display a number as binary, you can write an algorithm in code to do the exact steps that you performed by hand. You then print out the characters 1 and 0 as calculated by the algorithm.
Indeed, as you mention in one of your comments, if the compiler has a small look-up table that associates decimal integers with binary integers, then the conversion can be done with simple binary multiplications and additions.
The look-up table has to contain the binary values of the single decimal digits and of decimal ten, hundred, thousand, etc.
Decimal 14 can be transformed to binary by multiplying binary 1 by the binary value of ten (1010) and adding the binary value of four (100).
Decimal 149 would be binary 1 multiplied by the binary value of a hundred (1100100), plus the binary value of four (100) multiplied by the binary value of ten (1010), plus the binary value of nine (1001).
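As a rough illustration of the look-up-table idea, here is a minimal Python sketch. The tables `DIGIT` and `POWER_OF_TEN` and the function `decimal_text_to_int` are hypothetical names, not how any particular compiler does it; the point is only that every step is ordinary integer multiplication and addition.

```python
# Hypothetical look-up tables: each decimal digit and each power of ten,
# already available as (binary) integers.
DIGIT = {str(d): d for d in range(10)}
POWER_OF_TEN = [1, 10, 100, 1000, 10000]

def decimal_text_to_int(text):
    """Sum digit * power-of-ten for each position, rightmost digit first."""
    total = 0
    for position, ch in enumerate(reversed(text)):
        total += DIGIT[ch] * POWER_OF_TEN[position]
    return total

print(bin(decimal_text_to_int("14")))   # 0b1110
print(bin(decimal_text_to_int("149")))  # 0b10010101
```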
Decimal numbers are misunderstood in a program
Let's take an example from the C language:
int x = 14;
Here 14 is not a decimal number; in the source file it is two characters, 1 and 4, written together to look like 14.
We know that characters are just representations of some binary value:
1 for 00110001
4 for 00110100
The full ASCII table for characters can be seen here.
So 14 in character form is actually written as binary 00110001 00110100.
00110001 00110100 => this binary is made to look like 14 on the computer screen (so we think of it as decimal)
We know the number 14 eventually should become 14 = 1110
or we can pad it with zeros to be
14 = 00001110
For this to happen, the computer/processor only needs to do a binary-to-binary conversion, i.e.
00110001 00110100 to 00001110
and we are all set.
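That binary-to-binary step can be sketched in a few lines of Python. This is an illustration under the assumption that the input is plain ASCII digits; `ascii_digits_to_int` is a made-up helper name.

```python
def ascii_digits_to_int(raw):
    """Turn ASCII digit codes into the integer they spell, one digit at a time."""
    value = 0
    for code in raw:              # iterating over bytes yields ASCII codes, e.g. 0x31 for '1'
        digit = code - 0x30       # '0' is 0x30, so subtracting it leaves the digit value
        value = value * 10 + digit
    return value

raw = b"14"
print(' '.join(format(c, '08b') for c in raw))   # 00110001 00110100
print(format(ascii_digits_to_int(raw), '08b'))   # 00001110
```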

JSON.parse big numbers result in incorrect numbers [duplicate]

See this code:
var jsonString = '{"id":714341252076979033,"type":"FUZZY"}';
var jsonParsed = JSON.parse(jsonString);
console.log(jsonString, jsonParsed);
When I see my console in Firefox 3.5, the value of jsonParsed is the number rounded:
Object id=714341252076979100 type=FUZZY
Tried different values, the same outcome (number rounded).
I also don't get its rounding rules. 714341252076979136 is rounded to 714341252076979200, whereas 714341252076979135 is rounded to 714341252076979100.
Why is this happening?
You're overflowing the capacity of JavaScript's number type, see §8.5 of the spec for details. Those IDs will need to be strings.
IEEE-754 double-precision floating point (the kind of number JavaScript uses) can't precisely represent all numbers (of course). Famously, 0.1 + 0.2 == 0.3 is false. That can affect whole numbers just like it affects fractional numbers; it starts once you get above 9,007,199,254,740,991 (Number.MAX_SAFE_INTEGER).
Beyond Number.MAX_SAFE_INTEGER + 1 (9007199254740992), the IEEE-754 floating-point format can no longer represent every consecutive integer. 9007199254740991 + 1 is 9007199254740992, but 9007199254740992 + 1 is also 9007199254740992 because 9007199254740993 cannot be represented in the format. The next that can be is 9007199254740994. Then 9007199254740995 can't be, but 9007199254740996 can.
The reason is we've run out of bits, so we no longer have a 1s bit; the lowest-order bit now represents multiples of 2. Eventually, if we keep going, we lose that bit and only work in multiples of 4. And so on.
Your values are well above that threshold, and so they get rounded to the nearest representable value.
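The same effect can be reproduced outside JavaScript. The sketch below uses Python, on the assumption (true for CPython on mainstream platforms) that its `float` is the same IEEE-754 double that JavaScript uses for numbers.

```python
limit = 2 ** 53                         # Number.MAX_SAFE_INTEGER + 1

# 2**53 + 1 is not representable, so the addition rounds back to 2**53.
print(float(limit) + 1.0 == float(limit))   # True

# 2**53 + 2 is representable again.
print(int(float(limit) + 2.0))              # 9007199254740994
```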
As of ES2020, you can use BigInt for integers that are arbitrarily large, but there is no JSON representation for them. You could use strings and a reviver function:
const jsonString = '{"id":"714341252076979033","type":"FUZZY"}';
// Note it's a string −−−−^−−−−−−−−−−−−−−−−−−^
const obj = JSON.parse(jsonString, (key, value) => {
    if (key === "id" && typeof value === "string" && value.match(/^\d+$/)) {
        return BigInt(value);
    }
    return value;
});
console.log(obj);
If you're curious about the bits, here's what happens: An IEEE-754 binary double-precision floating-point number has a sign bit, 11 bits of exponent (which defines the overall scale of the number, as a power of 2 [because this is a binary format]), and 52 bits of significand (but the format is so clever it gets 53 bits of precision out of those 52 bits). How the exponent is used is complicated (described here), but in very vague terms, if we add one to the exponent, the value of the significand is doubled, since the exponent is used for powers of 2 (again, caveat there, it's not direct, there's cleverness in there).
So let's look at the value 9007199254740991 (aka, Number.MAX_SAFE_INTEGER):
+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− sign bit
/ +−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− exponent
/ / | +−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+− significand
/ / | / |
0 10000110011 1111111111111111111111111111111111111111111111111111
= 9007199254740991 (Number.MAX_SAFE_INTEGER)
That exponent value, 10000110011, means that every time we add one to the significand, the number represented goes up by 1 (the whole number 1, we lost the ability to represent fractional numbers much earlier).
But now that significand is full. To go past that number, we have to increase the exponent, which means that if we add one to the significand, the value of the number represented goes up by 2, not 1 (because the exponent is applied to 2, the base of this binary floating point number):
+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− sign bit
/ +−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− exponent
/ / | +−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+− significand
/ / | / |
0 10000110100 0000000000000000000000000000000000000000000000000000
= 9007199254740992 (Number.MAX_SAFE_INTEGER + 1)
Well, that's okay, because 9007199254740991 + 1 is 9007199254740992 anyway. But! We can't represent 9007199254740993. We've run out of bits. If we add just 1 to the significand, it adds 2 to the value:
+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− sign bit
/ +−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− exponent
/ / | +−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+− significand
/ / | / |
0 10000110100 0000000000000000000000000000000000000000000000000001
= 9007199254740994 (Number.MAX_SAFE_INTEGER + 3)
The format just cannot represent odd numbers anymore as we increase the value, the exponent is too big.
Eventually, we run out of significand bits again and have to increase the exponent, so we end up only being able to represent multiples of 4. Then multiples of 8. Then multiples of 16. And so on.
What you're seeing here is actually the effect of two roundings. Numbers in ECMAScript are internally represented as double-precision floating-point values. When id is set to 714341252076979033 (0x9e9d9958274c359 in hex), it is actually assigned the nearest representable double-precision value, which is 714341252076979072 (0x9e9d9958274c380). When you print out the value, it is rounded again, to the fewest significant decimal digits that still identify that double (16 here), which gives 714341252076979100.
It is not caused by the JSON parser. Just try entering 714341252076979033 in Firebug's console. You'll see the same 714341252076979100.
See this blog post for details:
http://www.exploringbinary.com/print-precision-of-floating-point-integers-varies-too
JavaScript uses double precision floating point values, ie a total precision of 53 bits, but you need
ceil(lb 714341252076979033) = 60
bits to exactly represent the value.
The nearest exactly representable number is 714341252076979072 (write the original number in binary, replace the last 7 digits with 0 and round up because the highest replaced digit was 1).
You'll get 714341252076979100 instead of this number because ToString() as described by ECMA-262, §9.8.1 works with powers of ten and in 53 bit precision all these numbers are equal.
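The "replace the last 7 digits and round" recipe above can be followed step by step. This is a sketch in Python using plain integers; it assumes Python's `float` is an IEEE-754 double and uses round-half-to-even for ties, which is the default rounding mode.

```python
n = 714341252076979033

bits = n.bit_length()                 # bits needed to write n exactly
drop = bits - 53                      # low bits that do not fit in a double
low = n & ((1 << drop) - 1)           # the bits being discarded
rounded = (n >> drop) << drop         # replace them with zeros
half = 1 << (drop - 1)
if low > half or (low == half and (n >> drop) & 1):
    rounded += 1 << drop              # round up (ties go to the even neighbour)

print(bits, drop)                     # 60 7
print(rounded)                        # 714341252076979072
print(rounded == int(float(n)))       # True
```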
The problem is that your number requires a greater precision than JavaScript has.
Can you send the number as a string? Separated in two parts?
JavaScript can only handle exact whole numbers up to about 9000 million million (that's 9 with 15 zeros). Higher than that and you get garbage. Work around this by using strings to hold the numbers. If you need to do math with these numbers, write your own functions or see if you can find a library for them: I suggest the former as I don't like the libraries I've seen. To get you started, see two of my functions at another answer.

What is the difference between 65 and the letter A in binary?

What is the difference between 65 and the letter A in binary as both represent same bit level information?
Basically, a computer only understands numbers, and not all numbers: it only understands numbers represented in binary, i.e. using only two different states (for example, 0 and 1, 0V and 5V, open and closed, true and false, etc.).
Unfortunately, we poor humans don't really like reading zeros and ones... So we have created codes that let us use numbers as if they were characters: one of them is called ASCII (American Standard Code for Information Interchange), but there are others, such as Unicode. The principle is simple: all the program has to do is manipulate numbers, which any CPU does very well, but when it comes to displaying the data, the display represents them as real characters, such as 'A', '4', '#', or even a space or a newline.
Now, as soon as you are using ASCII, the number 65 will represent the letter 'A'. It is all a question of representation: for example, the binary number 0b00001111, the hexadecimal 0x0F, the octal 017 and the decimal 15 all represent the same number. It's the same for the letter 'A': think of ASCII as another way of displaying a number, but instead of writing it out in base 2 (binary), 8 (octal), 10 (decimal) or 16 (hexadecimal), it maps the number to a character.
To answer your question: ASCII 'A' is hexadecimal 0x41 is decimal 65 is octal 0101 is binary 0b01000001.
Every character is represented by a number. The mapping between numbers and characters is called encoding. Many encodings use for the letter A the number 65. Since in memory there are no special cells for characters or numbers, they are represented the same way, but the interpretation in any program could be very different.
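A quick way to see that the bits are the same and only the interpretation differs is to ask a language for both views. A small Python sketch:

```python
print(ord('A'))                        # 65
print(chr(65))                         # A
print(bin(65), oct(65), hex(65))       # 0b1000001 0o101 0x41

# The single byte with value 65 and the ASCII encoding of 'A' are the same byte.
print(bytes([65]) == 'A'.encode('ascii'))   # True
```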
I may be misunderstanding the question, and if so I apologise for getting it wrong.
But if I'm right, I believe you're asking what the difference is between a char and an int in the binary representation of the value 65, which is the ASCII decimal value for the letter A (in capital form).
First off, we need to appreciate data types, which reserve blocks of memory in RAM.
An integer is usually 16 bits or more, depending on the declared type (in C# this is stated as Int16, UInt16, Int32, UInt32, and so on).
A character in the ASCII case is an 8-bit memory block (a C char, for example; a C# char is 16 bits).
Therefore the binary would appear as follows:
A byte (8 bits) - char
Decimal place values: 128, 64, 32, 16, 8, 4, 2, 1
Binary: 01000001
2 bytes (16 bits) - int16
Binary: 0000000001000001
It's all down to the size of the memory block reserved, based on the data type in the variable declaration.
I'd have done the decimal place values for the 2-byte case too, but I'm on the bus at the moment.
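The two layouts above can be reproduced with Python's struct module. This is just an illustration: `'>B'` and `'>h'` are struct format codes for one unsigned byte and a big-endian 16-bit integer, chosen here so the bits print in reading order.

```python
import struct

as_char = struct.pack('>B', 65)     # one byte, like an 8-bit char
as_int16 = struct.pack('>h', 65)    # two bytes, like a 16-bit integer

print(''.join(format(b, '08b') for b in as_char))    # 01000001
print(''.join(format(b, '08b') for b in as_int16))   # 0000000001000001
```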
First of all, the difference can be in the size of the memory used (8 bits, 16 bits or 32 bits). See this question: bytes of a string in java
Secondly, to store the letter 'A' you can have different encodings and different interpretations of memory. The ASCII character 'A' in C occupies exactly one byte (7 bits plus one unused high bit) and has exactly the same binary value as 65 stored in a char. But the bitwise interpretation of numbers and characters is not always the same: consider that you can also store signed values in 8 bits. See this question: what is an unsigned char

Verilog floating point to binary conversion

I am trying to convert a signed floating-point number in Verilog to a signed 24-bit binary value. For example:
-0.0065 would become: 24'b100000000110101001111110
0.0901 would become: 24'b000001011100010000110010
Is this correct?
Thank you
Taking the fractional decimal 0.0901 and converting to a fixed-point value with 2 integer bits and 22 fractional bits (24 bit total):
Ruby syntax is used for the maths (you can cut and paste it into irb, the interactive Ruby command-line tool):
i = (0.0901 * 2**22) # 377906.7904
# truncate to integer value
i = i.to_i # 377906
# Convert to string using binary (base 2)
i.to_s(2) # "1011100010000110010"
To add the leading zeros (24 bit length), right justify and pad with 0's
i.to_s(2).rjust(24, '0') # "000001011100010000110010"
# Convert to Hex (base 16)
i.to_s(16) # "5c432"
Signed numbers are a bit more problematic; the easiest way is to calculate the positive value and then take the two's complement:
(0.0065 * 2**22).to_i.to_s(2).rjust(24, '0')
=> "000000000110101001111110"
Twos complement
"000000000110101001111110"
"111111111001010110000001" # Ones complement (bit invert)
"111111111001010110000001" + 1
"111111111001010110000010" #Twos complement
You had 24'b100000000110101001111110 which is just the positive number with the MSB set to 1 which is not how signed numbers normally work. The format you have used is Sign Magnitude, but you can not just feed that into a multiplier (as per your previous question).
NB: I have also skipped over the quantisation effect of converting to fixed point. Your coefficient, when scaled by 2**22 for the fractional bits, was 377906.7904, but we just take the integer part, giving an error of 0.7904 (in units of the least significant bit), which may affect your filter performance.
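For comparison, the whole conversion, including the two's complement step for negative values, can be sketched as one Python function. The name `to_q2_22` and its parameters are invented for this example; like the steps above, it truncates rather than rounds.

```python
def to_q2_22(value, total_bits=24, frac_bits=22):
    """Return value as a two's complement fixed-point bit string (truncating)."""
    scaled = int(value * 2 ** frac_bits)     # int() truncates toward zero
    mask = (1 << total_bits) - 1
    # Masking a negative Python int yields its two's complement bit pattern.
    return format(scaled & mask, '0{}b'.format(total_bits))

print(to_q2_22(0.0901))    # 000001011100010000110010
print(to_q2_22(-0.0065))   # 111111111001010110000010
```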

What transformations are used by little-endian systems to convert data to network order?

What are the underlying transformations that are necessary to convert data in a little-endian system into network byte order? For 2 byte and 4 byte data there are well-known functions (such as htons, ntohl, etc.) to encapsulate the changes, what happens for strings of 1 byte data (if anything)?
Also, Wikipedia implies that little-endian is the mirror image of big-endian, but if that were true why would we need specific handling for 2 and 4 byte data?
The essay "On Holy Wars and a Plea for Peace" seems to imply that there are many different flavors of little-endian -- it's an old essay -- does that still apply? Are byte order markers like the ones found at the beginning of Java class files still necessary?
And finally, is 4-byte alignment necessary for network-byte order?
Let's say you have the ASCII text "BigE" in an array b of bytes.
b[0] == 'B'
b[1] == 'i'
b[2] == 'g'
b[3] == 'E'
This is network order for the string as well.
If it was treated as a 32 bit integer, it would be
'B' + ('i' << 8) + ('g' << 16) + ('E' << 24)
on a little endian platform and
'E' + ('g' << 8) + ('i' << 16) + ('B' << 24)
on a big endian platform.
If you convert each 16-bit word separately, you'd get neither of these:
'i' + ('B' << 8) + ('E' << 16) + ('g' << 24)
which is why ntohl and ntohs are both required.
In other words, ntohs swaps bytes within a 16-bit short, and ntohl reverses the order of the four bytes of its 32-bit word.
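The three interpretations of "BigE" can be checked with Python's struct module, used here only as a convenient stand-in for the C functions; `'<'` and `'>'` select little- and big-endian interpretation.

```python
import struct

data = b'BigE'
little = struct.unpack('<I', data)[0]   # 'B' is the least significant byte
big = struct.unpack('>I', data)[0]      # 'B' is the most significant byte

# Swap the bytes within each 16-bit half, then read the result as one
# little-endian 32-bit integer: this matches neither value above.
halves_swapped = struct.pack('<2H', *struct.unpack('>2H', data))
mixed = struct.unpack('<I', halves_swapped)[0]

print(hex(little))                  # 0x45676942
print(hex(big))                     # 0x42696745
print(halves_swapped, hex(mixed))   # b'iBEg' 0x67454269
```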
Specific handling functions for 2 and 4 byte data take advantage of the fact that there are processor instructions that operate on specific data sizes. Running a 1-byte reversing function four times is certainly less efficient than using wider instructions to perform the same (albeit increased in scale) operations on all four bytes at once.
1 byte data doesn't require any conversion between endians (it's an advantage of UTF-8 over UTF-16 and UTF-32 for string encoding).
is 4-byte alignment necessary for network-byte order?
No specific alignment is necessary for bytes going over a network. Your processor may demand a certain alignment in memory, but it's up to you to resolve the discrepancy. The x86 family usually doesn't make such demands.
The basic idea is that all multi-byte types have to have the order of their bytes reversed. A four byte integer would have bytes 0 and 3 swapped, and bytes 1 and 2 swapped. A two byte integer would have bytes 0 and 1 swapped. A one byte character does not get swapped.
There are two very important implications of this that non-practitioners and novices don't always realise:
(ASCII) Character strings are not touched.
There is no possible blind algorithm to byte swap generic "data". You have to know the type of all your data and swap each item in the manner required for its type.
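Both implications show up in a small example. The record layout below (a 16-bit id, a 32-bit length and a 4-character tag) is hypothetical; the sketch uses Python's struct module, where the format string carries exactly the type knowledge that a blind swap lacks.

```python
import struct

# Hypothetical record: 16-bit id, 32-bit length, 4-character ASCII tag.
fields = (0x1234, 0xDEADBEEF, b'BigE')

host_little = struct.pack('<HI4s', *fields)   # little-endian host layout
network = struct.pack('>HI4s', *fields)       # big-endian, i.e. network order

print(host_little.hex())   # 3412efbeadde42696745
print(network.hex())       # 1234deadbeef42696745
```

Each integer field is reversed according to its own width, while the tag bytes are identical in both layouts.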