Best datatype to store a long number made of 0 and 1 - mysql

I want to know what's the best datatype to store these:
null
0
/* the length of other numbers is always 7 digits */
0000000
0000001
0000010
0000011
/* and so on */
1111111
I have tested it, and INT works. But since all my numbers are made only of the digits 0 and 1, is there a better datatype?

What you are showing are binary numbers
0000000 = 0
0000001 = 2^0 = 1
0000010 = 2^1 = 2
0000011 = 2^0 + 2^1 = 3
So simply store these numbers in an integer data type (which is internally stored with bits as shown of course). You could use BIGINT for this, as recommended in the docs for bitwise operations (http://dev.mysql.com/doc/refman/5.7/en/bit-functions.html).
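As a quick check of the correspondence above, MySQL's CONV() and BIN() can translate between the 0/1 text form and the integer. This is only a sketch; the 7-digit padding is taken from the question, and the column aliases are made up for illustration.

```sql
-- Text of 0s and 1s -> integer (base 2 to base 10)
SELECT CAST(CONV('0000011', 2, 10) AS UNSIGNED) AS as_integer;   -- 3

-- Integer -> 7-digit text of 0s and 1s
SELECT LPAD(BIN(3), 7, '0') AS as_bits;                          -- 0000011
```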
Here is how to set the bitmask to flag n only (overwriting any flags already stored):
UPDATE mytable
SET bitmask = POW(2, n-1)
WHERE id = 12345;
Here is how to add a flag:
UPDATE mytable
SET bitmask = bitmask | POW(2, n-1)
WHERE id = 12345;
Here is how to check a flag:
SELECT *
FROM mytable
WHERE bitmask & POW(2, n-1)
But as mentioned in the comments: In a relational database you usually use columns and tables to show attributes and relations rather than an encoded flag list.
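POW() returns a floating-point value, so an integer shift is a possible alternative for building the mask. The sketch below reuses the mytable, bitmask and id names from the answer and picks n = 3 as an arbitrary example; it also shows clearing a flag, which the answer does not cover.

```sql
-- Add flag 3 (binary 0000100) without touching the other flags
UPDATE mytable SET bitmask = bitmask | (1 << (3 - 1)) WHERE id = 12345;

-- Clear flag 3 again, leaving the other flags as they were
UPDATE mytable SET bitmask = bitmask & ~(1 << (3 - 1)) WHERE id = 12345;

-- Rows that have flag 3 set
SELECT * FROM mytable WHERE bitmask & (1 << (3 - 1));

-- Show the stored flags as a 7-digit string of 0s and 1s
SELECT id, LPAD(BIN(bitmask), 7, '0') AS flags FROM mytable WHERE id = 12345;
```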

As you've said in a comment, the values 01 and 1 should not be treated as equivalent (which rules out binary where they would be), so you could just store as a string.
It actually might be more efficient than storing as a byte + offset, since that would take up 9 characters, whereas you need a maximum of 7 characters.
Simply store it as a varchar(7) or whatever the equivalent is in MySQL. No need to be clever about it, especially since you are interested in extracting positional values.
Don't forget to bear in mind that this takes up a lot more storage than storing as a bit(7), since you are essentially storing 7 bytes (or whatever the storage unit is for each level of precision in a varchar), not 7 bits.
If that's not an issue then no need to over-engineer it.
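A minimal sketch of the string approach, assuming a hypothetical table named flags_text_demo (not from the question). It keeps NULL, '0' and '0000000' distinct and reads single positions with SUBSTRING() or LIKE.

```sql
-- flags_text_demo is a made-up table used only for illustration
CREATE TABLE flags_text_demo (id INT PRIMARY KEY, flags VARCHAR(7) NULL);
INSERT INTO flags_text_demo VALUES (1, NULL), (2, '0'), (3, '0000010');

-- Positions are counted from the left, starting at 1
SELECT id, SUBSTRING(flags, 6, 1) AS sixth_digit FROM flags_text_demo;

-- Rows whose sixth digit is 1 (each underscore matches exactly one character)
SELECT id FROM flags_text_demo WHERE flags LIKE '_____1_';
```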

You could convert the binary number to a string, with an additional byte to specify the number of leading zeros.
Example - the representation of 010:
The numeric value in hex is 0x02.
There is one leading zero, so the first byte is 0x01.
The result string is 0x01,0x02.
With the same method, 1010010 should be represented as 0x00,0x52.
This seems pretty efficient to me.
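The encoding described above could be sketched in MySQL roughly as follows. The derived table samples and its column s are hypothetical names; the two inputs are the ones from the example.

```sql
SELECT s,
       HEX(CONCAT(
           -- first byte: number of leading zeros
           CHAR(LENGTH(s) - LENGTH(TRIM(LEADING '0' FROM s))),
           -- second byte: numeric value of the binary digits
           CHAR(CAST(CONV(s, 2, 10) AS UNSIGNED))
       )) AS encoded
FROM (SELECT '010' AS s UNION ALL SELECT '1010010') AS samples;
-- '010'     -> 0102
-- '1010010' -> 0052
```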

Not sure if it is the best datatype, but you may want to try BIT:
MySQL, PostgreSQL
There are also some useful bit functions in MySQL.
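A minimal sketch of the BIT suggestion, assuming a hypothetical table flags_bit_demo. Note that a BIT(7) column stores only the numeric bit pattern, so it cannot tell a single '0' apart from '0000000'.

```sql
-- flags_bit_demo is a made-up table used only for illustration
CREATE TABLE flags_bit_demo (id INT PRIMARY KEY, flags BIT(7) NULL);
INSERT INTO flags_bit_demo VALUES (1, NULL), (2, b'0000011'), (3, b'1111111');

-- Adding 0 forces a numeric reading; BIN() plus LPAD() restores the 7-digit text
SELECT id,
       flags + 0                AS as_integer,
       LPAD(BIN(flags), 7, '0') AS as_bits
FROM flags_bit_demo;
```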

Related

How can I make my select statement deterministically match only 1/n of my dataset?

I'm processing data from a MySQL table where each row has a UUID associated with it. EDIT: the "UUID" is in fact an MD5 hash (VARCHAR) of the job text.
My select query looks something like:
SELECT * FROM jobs ORDER BY priority DESC LIMIT 1
I am only running one worker node right now, but would like to scale it out to several nodes without altering my schema.
The issue is that the jobs take some time, and scaling out beyond one right now would introduce a race condition where several nodes are working on the same job before it completes and the row is updated.
Is there an elegant way to effectively "shard" the data on the client-side, by specifying some modifier config value per worker node? My first thought was to use the MOD function like this:
SELECT * FROM jobs WHERE UUID MOD 2 = 0 ORDER BY priority DESC LIMIT 1
and SELECT * FROM jobs WHERE UUID MOD 2 = 1 ORDER BY priority DESC LIMIT 1
In this case I would have two workers configured as "0" and "1". But this isn't giving me an even distribution (not sure why) and feels clunky. Is there a better way?
The problem is you're storing the ID as a hex string like acbd18db4cc2f85cedef654fccc4a4d8. MySQL will not convert the hex for you. Instead, if it starts with a letter you get 0. If it starts with a number, you get the starting numbers.
select '123abc' + 0; -- returns 123
select 'abc123' + 0; -- returns 0
6 out of 16 will start with a letter so they will all be 0 and 0 mod anything is 0. The remaining 10 of 16 will be some number so will be distributed properly, 5 of 16 will be 0, 5 of 16 will be 1. 6/16 + 5/16 = 69% will be 0 which is very close to your observed 72%.
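The same implicit conversion can be seen directly with MOD. The two 32-character strings below are sample values only (the first is the one quoted above); the column aliases are made up.

```sql
-- Leading letter: the string converts to 0, so the result is 0
SELECT 'acbd18db4cc2f85cedef654fccc4a4d8' MOD 2 AS starts_with_letter;

-- Leading digits "37": the string converts to 37, so the result is 1
SELECT '37b51d194a7513e45b56f6524f2d51f2' MOD 2 AS starts_with_digits;
```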
To do this right we need to convert the 128-bit hex string into a 64-bit unsigned integer.
Slice off 64 bits with either left(uuid, 16) or right(uuid, 16).
Convert the hex (base 16) into decimal (base 10) using conv.
Cast the result to an unsigned bigint. If we skip this step MySQL appears to use a float, which loses accuracy.
select cast(conv(right(uuid, 16), 16, 10) as unsigned) mod 2
Beautiful.
That will only use 64 bits of the 128 bit checksum, but for this purpose that should be fine.
Note this technique works with an MD5 checksum because it is pseudorandom. It will not work with the default MySQL uuid() function which is a UUID version 1. UUIDv1 is a timestamp + a fixed ID and will always mod the same.
UUIDv4, which is a random number, will work.
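Put together, a worker query might look like the sketch below. The worker count of 4 and the worker index 1 are assumed configuration values, not something from the question; each worker would substitute its own index from 0 to 3.

```sql
-- Worker 1 of 4: only rows whose hash falls into bucket 1 are considered
SELECT *
FROM jobs
WHERE CAST(CONV(RIGHT(uuid, 16), 16, 10) AS UNSIGNED) MOD 4 = 1
ORDER BY priority DESC
LIMIT 1;
```

Because the filter applies a function to the column, an ordinary index on uuid is unlikely to help with this condition.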
Convert the hex string to decimal before modding:
where CONV(substring(uuid, 1, 8), 16, 10) mod 2 = 1
A reasonable hashing function should distribute evenly enough for this purpose.
Use substring to convert only a small part, so that conv does not overflow its numeric range and possibly misbehave. Any subset of the bits should also be well distributed.
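One way to check how evenly the rows are spread is to count them per bucket. This is a sketch against the jobs table from the question; the bucket and jobs_in_bucket aliases are made up.

```sql
SELECT CAST(CONV(SUBSTRING(uuid, 1, 8), 16, 10) AS UNSIGNED) MOD 2 AS bucket,
       COUNT(*) AS jobs_in_bucket
FROM jobs
GROUP BY bucket;
```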

Convert string to int in MySQL

I have a column with names like:
Ernest Hemingway
Jackson Pollock
I want to convert them to numbers and store them in an INT field. Maybe getting the position of each letter in the alphabet or something like this, resulting in a number:
23764283456
23984623746
Is there any function to do something like this? I don't mind the length of the INT or if the result is one number or another. The important thing is that every time I apply the function to a name, the result is the same.
Thanks!
Try this:
crc32('Ernest Hemingway');
will always give you 2479642411
As @Gordon_Linoff said in the comments, a large number can't be stored in a field of type INT,
but I will show you how to convert the string to the ASCII codes of its characters.
You can use HEX:
SELECT HEX('test')
+-------------+
| HEX('test') |
+-------------+
| 74657374 |
+-------------+
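CRC32() returns a 32-bit unsigned value, so an INT UNSIGNED column is wide enough to hold it, whereas a signed INT is not. A minimal sketch, assuming a hypothetical table authors_demo:

```sql
-- authors_demo is a made-up table used only for illustration
CREATE TABLE authors_demo (name VARCHAR(100), name_hash INT UNSIGNED);
INSERT INTO authors_demo (name) VALUES ('Ernest Hemingway'), ('Jackson Pollock');

-- The same name always produces the same number
UPDATE authors_demo SET name_hash = CRC32(name);
SELECT name, name_hash FROM authors_demo;
```

Different names can still collide on the same value, so this is a checksum rather than a unique identifier.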
This is a one-way hash, but with an important concern: the integer should be representable on the platform.
PHP code, assuming 32-bit compatibility is desired:
$hash = sha1('Ernest Hemingway');
// last 6 characters, represent 3 bytes
$hash = substr($hash, -6);
$result = hexdec($hash); // integer: 1331016
Keep in mind this has a very low entropy: 2^24 = 16777216 possibilities
4 bytes is too large, because signed/unsigned integer discrepancies would lead to a float with some inputs, and floats really can't be cast to integers with perfect determinism.
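The same 3-byte truncation can be sketched directly in MySQL, which may be convenient if the value is to be stored by the database itself. The name_hash alias is made up; the result is at most 16777215, so it fits in a signed INT.

```sql
-- Last 6 hex characters of the SHA-1 digest, read as a base-16 number
SELECT CAST(CONV(RIGHT(SHA1('Ernest Hemingway'), 6), 16, 10) AS UNSIGNED) AS name_hash;
```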
SELECT field,CONVERT(SUBSTRING_INDEX(field,'-',-1),UNSIGNED INTEGER) AS num
FROM `table`
ORDER BY num;

MySQL: compare a mixed field containing letters and numbers

I have a field in the mysql database that contains data like the following:
Q16
Q32
L16
Q4
L32
L64
Q64
Q8
L1
L4
Q1
And so forth. What I'm trying to do is pull out, let's say, all the values that start with Q which is easy:
field_name LIKE 'Q%'
But then I want to filter, let's say, all the values that have a number higher than 32. As a result I'm supposed to get only 'Q64'. However, I also get Q4, Q8 and so forth, because I'm comparing them as strings: only the 3 and the respective digit are compared, and the numbers are in general taken as single digits, not as integers.
While this makes perfect sense, I'm struggling to find a solution on how to perform this operation without pulling all the data out of the database, stripping out the Qs and parsing it all to integers.
I did play around with the CAST operator, however, it only works if the value is stored as a string AND it contains only digits. The parsing fails if there's another character in there.
Extract the number from the string and convert it to a number, either by multiplying by 1 or with CAST:
select * from your_table
where substring(field_name, 1, 1) = 'Q'
and substring(field_name, 2) * 1 > 32
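The CAST variant mentioned above could look like the sketch below. The table codes_demo is hypothetical and holds a subset of the values from the question.

```sql
-- codes_demo is a made-up table used only for illustration
CREATE TABLE codes_demo (field_name VARCHAR(10));
INSERT INTO codes_demo VALUES ('Q16'), ('Q32'), ('L16'), ('Q4'), ('L64'), ('Q64'), ('Q8');

-- Filter on the letter, then compare the remainder as a number
SELECT field_name
FROM codes_demo
WHERE field_name LIKE 'Q%'
  AND CAST(SUBSTRING(field_name, 2) AS UNSIGNED) > 32;
-- returns Q64
```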

MySQL bitwise AND 256-bit binary values

I'm intending on storing a 256-bit long binary value in a MySQL table column.
Which column type should I be using (blob?) such that I can run bitwise operations against it (example of an AND would be ideal).
I don't think you can perform bitwise operations on 256-bit values at the SQL level, as the documentation (for MySQL 5.5, linked below) clearly states that:
MySQL uses BIGINT (64-bit) arithmetic for bit operations, so these operators have a maximum range of 64 bits.
http://dev.mysql.com/doc/refman/5.5/en/bit-functions.html#operator_bitwise-and
As for storing those values, TINYBLOB is possible, but my personal preference would go to simply BINARY(32) (a binary string of 32 bytes -- 256-bits).
While writing this, one trick came to mind. If we are limited to 64-bit values (BIGINT UNSIGNED), why not store the 256-bit value as 4 words of 64 bits each? Not very elegant, but it would work, especially here since you only need bitwise operations:
ABCD32 & WXYZ32 == A8 & W8, B8 & X8, C8 & Y8, D8 & Z8
Very basically:
create table t (a bigint unsigned,
b bigint unsigned,
c bigint unsigned,
d bigint unsigned);
While inserting, 256-bit values have to be "split" into 4 words:
-- Here I use hexadecimal notation for conciseness. you may use b'010....000' if you want
insert into t values (0xFFFFFFFF,
0xFFFF0000,
0xFF00FF00,
0xF0F0F0F0);
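You could easily query the value back. Note that the example words inserted above are only 32 bits wide, hence the padding to 8 hex digits; full 64-bit words would need padding to 16: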
mysql> select CONCAT(LPAD(HEX(a),8,'0'),
LPAD(HEX(b),8,'0'),
LPAD(HEX(c),8,'0'),
LPAD(HEX(d),8,'0')) from t;
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| CONCAT(LPAD(HEX(a),8,'0'),
LPAD(HEX(b),8,'0'),
LPAD(HEX(c),8,'0'),
LPAD(HEX(d),8,'0')) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| FFFFFFFFFFFF0000FF00FF00F0F0F0F0 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
I used hexadecimal here again, but you could display the value as binary by replacing HEX() with BIN().
And last but not least, you could perform bitwise operations on them. Once again, you just have to "split" the operand. Assuming I want to apply the mask 0xFFFFFFFFFFFFFFFF0000000000000000 (128 bits as written, matching the 32-bit example words) to all values in the table:
update t set a = a & 0xFFFFFFFF,
b = b & 0xFFFFFFFF,
c = c & 0x00000000,
d = d & 0x00000000;
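For words that really use all 64 bits, the same idea needs 16 hex digits of padding per word. The sketch below assumes a hypothetical table t64_demo with the same four-column layout; the inserted values are arbitrary.

```sql
-- t64_demo is a made-up table used only for illustration
CREATE TABLE t64_demo (a BIGINT UNSIGNED, b BIGINT UNSIGNED,
                       c BIGINT UNSIGNED, d BIGINT UNSIGNED);
INSERT INTO t64_demo VALUES (0xFFFFFFFFFFFFFFFF, 0x00000000FFFF0000,
                             0xFF00FF00FF00FF00, 0x0000000000000001);

-- Reassemble the 256-bit value as 64 hex digits
SELECT CONCAT(LPAD(HEX(a), 16, '0'), LPAD(HEX(b), 16, '0'),
              LPAD(HEX(c), 16, '0'), LPAD(HEX(d), 16, '0')) AS value_hex
FROM t64_demo;

-- Apply a 256-bit mask word by word without modifying the stored rows
-- (keeps the upper 128 bits, zeroes the lower 128 bits)
SELECT a & 0xFFFFFFFFFFFFFFFF AS a_masked,
       b & 0xFFFFFFFFFFFFFFFF AS b_masked,
       c & 0x0000000000000000 AS c_masked,
       d & 0x0000000000000000 AS d_masked
FROM t64_demo;
```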
Looks like blob works with a query like this for the bitwise and:
select id,bin(label & b'01000000010000001000000000000000000') from projects;
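On MySQL 8.0 and later, bit operators also accept binary string arguments, so a 256-bit AND can be written against a BINARY(32) column directly. The sketch below assumes such a server version and uses a hypothetical table bits256_demo; both operands need to have the same length.

```sql
-- Assumes MySQL 8.0 or later; bits256_demo is a made-up table
CREATE TABLE bits256_demo (id INT PRIMARY KEY, bits BINARY(32));
INSERT INTO bits256_demo VALUES (1, UNHEX(REPEAT('FF', 32)));

-- AND the stored 32 bytes with a 32-byte mask and show the result in hex
SELECT id, HEX(bits & UNHEX(REPEAT('0F', 32))) AS masked
FROM bits256_demo;
```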

Single quotes affecting the calculations in Select query

SELECT COUNT(*) FROM area
WHERE ROUND(SQRT(POWER(('71' - coords_x), 2) +
POWER(('97' - coords_y), 2))) <= 17
==> 51
SELECT COUNT(*) FROM area
WHERE ROUND(SQRT(POWER((71 - coords_x), 2) +
POWER((97 - coords_y), 2))) <= 17
==> 22
coords_x and coords_y are both TINYINT fields containing values in the range [1, 150]. Usually MySQL doesn't care if numbers are quoted or not, but apparently it does in this case.
The question is just: Why?
MySQL always cares about data types. What happens is that your code relies on automatic type casting and performs math on strings (which may or may not hold a number). This can lead to all sorts of unpredictable results:
SELECT POW('Hello', 'World') -- This returns 1
To sum up: you need to learn and use the different data types MySQL offers. Otherwise, your application will never do reliable calculations.
Update:
One more hint:
TINYINT[(M)] [UNSIGNED] [ZEROFILL]
A very small integer. The signed range
is -128 to 127. The unsigned range is
0 to 255.
URL:
http://dev.mysql.com/doc/refman/5.1/en/numeric-type-overview.html
I hope you are not trying to store 150 in a signed tinyint column.
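One plausible explanation for the differing counts, offered only as a sketch: if the columns are TINYINT UNSIGNED (an assumption, the question does not say), then the quoted and unquoted forms subtract in different numeric types. The table area_demo below is hypothetical.

```sql
-- Assumption: the coordinate column is UNSIGNED; area_demo is a made-up table
CREATE TABLE area_demo (coords_x TINYINT UNSIGNED);
INSERT INTO area_demo VALUES (100);

-- String operand: the subtraction is done in floating point,
-- so a negative difference is representable
SELECT '71' - coords_x FROM area_demo;

-- Integer operand with an UNSIGNED column: the subtraction is unsigned.
-- Depending on server version and sql_mode, a negative difference either
-- raises an out-of-range error or wraps around to a very large number.
SELECT 71 - coords_x FROM area_demo;

-- Casting to SIGNED first keeps the arithmetic signed
SELECT 71 - CAST(coords_x AS SIGNED) FROM area_demo;
```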