I've got a MySQL database with a large amount of 2048-bit binary strings (e.g '0111001...0101'). One calculation I'll need is the Hamming Distance (the total count of 1's in the XOR'd result) of these strings compared to some externally generated bitstring. In order to get an idea of how to write this query, I tried writing it for smaller bitstrings. Here's an example:
select BIT_COUNT(bin((b'0011100000') ^ (b'1111111111')))
The inner portion that computes the XOR works correctly, but BIT_COUNT returns strange results. This example returns 14, which is longer than the string itself.
So I have a few questions:
First, why is BIT_COUNT returning such strange results. Is it operating on a string rather than the binary string I'd like it to operate on? If so, how do I deal with this?
Second, notice that I'm casting (is that the right word here?) the strings as binary by prepending with a b. How would I do this with column names and variables? Clearly I can't simply prepend a b to a variable name, and I can't insert a space between. Any ideas?
Thanks,
EDIT:
So here's a solution to the first problem:
select BIT_COUNT(b'0011100000' ^ b'1111111111')
There seems to be a problem when using this for larger strings (2048 bits). I tried:
select BIT_COUNT(b'001110...00011')
and it gives me results like 28, when the actual bitcount should be around 1024. If I remove the b, then it appears to max-out at 64. Any ideas on how to resolve this problem?
Just remove bin function. With it BIN_COUNT treats its argument as a chars string, not as a set of bits. So
select BIT_COUNT(b'0011100000' ^ b'1111111111')
will do the work
Related
I recently read this article.
The problem is about finding the smallest binary digit multiple of a given number x and the solution given there uses a binary tree starting from '1'. At each stage, it considers to topmost string and if it is not a multiple of x, it generates two more strings by concatenating a '0' and a '1' to the end of the string.
However, to optimize the solution the each possible solution s is checked for int(s) mod x.
If the remainder has occurred previously then that path is abandoned. I cannot figure out why this is happening.
I am inserting data from one table into another in a MariaDB database, where the column in the first table is FLOAT, and in the second it's DOUBLE. The data can have values of any size, precision and decimal places.
Here is what happens to the values when I do a straight-forward copy:
INSERT INTO data2 (value) SELECT value FROM data1
The values are given random extra significant figures:
FLOAT in data1 DOUBLE in data2
-0.000000000000454747 -0.0000000000004547473508864641
-122.319 -122.31932830810547
14864199700 14864220160
CAST(value AS DECIMAL(65,30)) generates exactly the same values as col 2 above, except I see trailing zeroes.
Yet when I just do
UPDATE data2 SET value = 14867199700 WHERE id = 133025046;
the DOUBLE value is accepted.
Do I have to export all the value to an SQL script and re-import them? Isn't there a better way?
Despite hours trying to experimenting with the issue, I'm not much closer to a solution, despite its limited nature. I can see this is problem that besets all technologies, not just MariaDB or databases, so I have probably just missed the answer somewhere. Stackoverflow is desperately trying to guide to a solution with new suggestion features I hadn't seen before, but unfortunately they are no help, like the other suggested answers.
Your test case is flawed. You are feeding in decimal digits, and not testing just the transfer of FLOAT to DOUBLE.
UPDATE tbl SET double_col = float_col will always copy exactly the same value. This because the DOUBLE representation is a superset of the FLOAT representation (53 vs 24 bits of precision; etc).
Literal, with decimal places: UPDATE tbl SET double_col = 123.456 will mangle the number because of rounding from decimal to DOUBLE. Ditto for float_col. Furthermore, the mangled results will be different!
Hole number literal: UPDATE tbl SET double_col = 14867199700 will be stored exactly. But if you put that same literal into a FLOAT, it will be rounded to 24 bits, so it cannot be stored exactly. You lose exactness at about 7 significant digits for FLOAT and about 16 for DOUBLE. The literal in this example has 9 significant digits (after ignoring trailing zeros).
That's just a sampling of the nightmares you can get into.
You must consider FLOAT and DOUBLE to be approximate. You should never compare for equality; you don't know what might have messed with the last bit of the value.
Also, you should not try to guess when MySQL will perform expressions in DECIMAL instead of DOUBLE.
And, keep in mind that division is usually imprecise due to rounding to some number of bits or decimals.
The "mantissa" of 14864199700 is
1.10111010111111001101100 (binary of FLOAT : 24 bits including 'hidden' leading bit)
1.1011101011111100110110000000101000000000000000000000 (binary of DOUBLE)
^ ^ (lost in FLOAT)
Each of those is multiplied by the same power of 2. The DOUBLE gets exactly 14864199700. The FLOAT lost the bits pointed to.
You can play around with such at https://gregstoll.dyndns.org/~gregstoll/floattohex/
Believe it or not, things used to be worse. People would be billed for $0.00 -- due to rounding errors. Or results of what should have been 1+1 showed as 1.99999999.
I have had to work in a project where we have an identifier in HEX.
Example, B900001752F10001, is received in a parser developed in JAVA in a SIGNED LONG variable. We store that variable in a SIGNED BIGINT variable in MySQL DB.
Every time we need the HEX Chain we use HEX(code) function and we get what is expected.
But when we have to provision the master table, we need to input valid codes, to achieve that we used something like:
Update employee set code=0xB900001752F10001 where main_employee_id=1002;
it worked in the past producing code to be stored in DB as
13330654997192441857
but now we are using the same exact instruction and we are getting code stored in DB as
-5116089076517109759
So Comparing those two numbers by using HEX function, those provide the same HEX NUMBER.
select HEX(-5116089076517109759), HEX(13330654997192441857)
0xB900001752F10001, 0xB900001752F10001
Could someone please provide ideas why this is happening? How we should handle this from the provisioning perspective we need to assure storing as 13330654997192441857 so when an authentication event happen codes match.
I have run without any other idea, I appreciate any help.
I think you have overflowed the datatype.
According to MySQL manual, signed bigint is in the range of
-9,223,372,036,854,775,808
to
9,223,372,036,854,775,807
Your number
18,446,744,073,709,551,615
has exceeded the above positive bound so it overflows and is
interpreted as a negative number.
Having that said, I think you may still be okay with your command -- it is only when you try to interpret the hex pattern as a number the result looks confusing.
Update employee set code=0xB900001752F10001 where main_employee_id=1002;
64bite machineļ¼top digit is sign bit,so the biggest num is 9,223,372,036,854,775,807,but if ur num more than it,the top digit will transform to be 1,so the num will be negative and its overflowed.
so ur 13330654997192441857 will become to 5116089076517109759.
I have a query
SELECT MAX(CAST(user_name as SIGNED)) as max_id FROM (`users`)
it returns
2.01303045556E+12
but actually the maximum value is 2013030455555
Anybody know how it happens??
That is correct.
2.01303045556E+12 actually IS 2013030455555.
x E+12 means x*10 ^ 12
2*10^12=2000000000000 (2 followed by 12 zeros).
This is expotential (usually floating point) number representation. See Scientific notation at wikipedia (scroll down to "E notation").
To get rid of it you may cast that data to decimal or integer, instead of float. Maybe there are better methods, but I dont know them.
Example:
-- example for 16 digits
SELECT MAX(CAST(user_name as DECIMAL(16,0)) as max_id FROM (`users`)
Another solution: change format of the number in SQL or maybe PHP if you are using it.
I have the following sql query in mysql:
SELECT *
FROM _t_test
WHERE pret NOT
IN ( 2.6700, 2.6560, 1.8200 )
I would expect the rows with the value 1.8200 not to be shown, yet I still get them.
Am I missing something?
The field "pret" is double(16,4).
This is a rounding error. A double is not an exact value, so 1.8200 isn't represented exactly, so the values are not exactly the same.
For MYSQL floating points, see http://dev.mysql.com/doc/refman/5.0/en/problems-with-float.html
The correct way to do floating-point number comparison is to first
decide on an acceptable tolerance for differences between the numbers
and then do the comparison against the tolerance value. For example,
if we agree that floating-point numbers should be regarded the same if
they are same within a precision of one in ten thousand (0.0001), the
comparison should be written to find differences larger than the
tolerance value
See http://en.wikipedia.org/wiki/Double_precision_floating-point_format