MySQL Order By doesn't work on Concat(enum) - mysql

We currently have an interesting problem with the sort order of an enum field in MySQL. The field's enum entries are already listed in the order we want. Just to be safe, we wrapped the column in CONCAT so that it would be cast to char and sorted alphabetically, as suggested by the MySQL reference (MySQL Reference - Enum):
Make sure that the column is sorted lexically rather than by index number by coding ORDER BY CAST(col AS CHAR) or ORDER BY CONCAT(col).
But that didn't produce the expected results, so we investigated further. It seems that ORDER BY doesn't work on a combination of an enum and the CONCAT function. I wrote the following sample script to illustrate the point:
CREATE TABLE test (
`col1` enum('a','b','c') COLLATE utf8_bin DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
INSERT INTO test
VALUES ('b'), ('c'), ('a');
SELECT * FROM test; -- b, c, a
SELECT * FROM test ORDER BY col1 ASC; -- a, b, c
SELECT * FROM test ORDER BY CAST(col1 AS CHAR) ASC; -- a, b, c
SELECT * FROM test ORDER BY CAST(col1 AS BINARY) ASC; -- a, b, c
SELECT * FROM test ORDER BY CONCAT(col1) ASC; -- b, c, a - This goes wrong
I currently suspect some kind of problem with the collation/encoding, but I'm not sure. My database's default encoding is also utf8. The MySQL version is 5.6.12, but it seems to be reproducible with MySQL 5.1. The storage engine is MyISAM, but it also occurs with the MEMORY engine.
Any help would be appreciated.
Update:
It seems the problem occurs only in MySQL 5.6 and is caused by the collation of the column. With the first CREATE TABLE statement, the queries work fine.
CREATE TABLE test (
`col1` enum('a','b','c') COLLATE utf8_general_ci DEFAULT NULL
)
With the second they don't.
CREATE TABLE test (
`col1` enum('a','b','c') COLLATE utf8_bin DEFAULT NULL
)
The collation of the table and/or database doesn't seem to affect the queries. The queries can be tested in this SQL Fiddle.

Strange, it works in this fiddle. Do you have a trigger or something?
http://sqlfiddle.com/#!2/0976a/2
But in 5.6 it goes haywire:
http://sqlfiddle.com/#!9/0976a/1
Probably a MySQL bug.
What's more, if you list the values in the enum in the "proper" order, it works:
http://sqlfiddle.com/#!9/a3784/1
From the docs:
ENUM values are sorted based on their index numbers, which depend on
the order in which the enumeration members were listed in the column
specification. For example, 'b' sorts before 'a' for ENUM('b', 'a').
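The difference between index order and lexical order can be modelled outside the database. The following is a minimal Python sketch, not MySQL itself: the list `members` stands in for the quoted ENUM('b', 'a') definition, and the sample `rows` are made up for illustration.

```python
# Model of an ENUM column: every member gets a 1-based index from its
# position in the column definition (here ENUM('b', 'a'), as in the docs).
members = ["b", "a"]
index_of = {value: i for i, value in enumerate(members, start=1)}

# Hypothetical stored values, in insertion order.
rows = ["a", "b", "a"]

# Sorting on the enum itself compares index numbers.
by_index = sorted(rows, key=lambda value: index_of[value])

# Sorting on the character value compares the strings.
lexical = sorted(rows)

print(by_index)  # ['b', 'a', 'a']
print(lexical)   # ['a', 'a', 'b']
```

If the members are declared in alphabetical order, both sorts agree, which is why the fiddle with the "proper" order hides the problem.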

As per the document:
Under the Handling of Enumeration Literals section, it states that:
If you store a number into an ENUM column, the number is treated as
the index into the possible values, and the value stored is the
enumeration member with that index. (However, this does not work with
LOAD DATA, which treats all input as strings.) If the numeric value is
quoted, it is still interpreted as an index if there is no matching
string in the list of enumeration values. For these reasons, it is not
advisable to define an ENUM column with enumeration values that look
like numbers, because this can easily become confusing.
For example, the following column has enumeration members with string values of '0', '1', and '2', but numeric index values of 1, 2, and 3:
numbers ENUM('0','1','2')
If you store 2, it is interpreted as an
index value, and becomes '1' (the value with index 2). If you store
'2', it matches an enumeration value, so it is stored as '2'. If you
store '3', it does not match any enumeration value, so it is treated
as an index and becomes '2' (the value with index 3).
mysql> INSERT INTO t (numbers) VALUES(2),('2'),('3');
mysql> SELECT * FROM t;
+---------+
| numbers |
+---------+
| 1 |
| 2 |
| 2 |
+---------+
In your case:
INSERT INTO test
VALUES ('2'), ('3'), ('1');
Index value of '2' is 2, '3' is 3 and '1' is 1.
So the output is 2,3,1

Related

search comma separated values from column contains comma separated string mysql

I am trying to search for comma-separated values in a database table column that itself contains a comma-separated string.
MY DB
id interest status
------------------------
1 1,2,3 1
2 4 1
My search combination can be 1,2 or 3,2 or 3 or 1,4 etc. Any combination can occur.
I want to show every id whose interest column contains any value from the comma-separated search combination.
For example, search for 1,4 should return
id
--
1
2
For example, search for 3,2 should return
id
--
1
I have tried using IN and FIND_IN_SET, but neither gave me the result I want. Is there any other option?
SELECT * FROM `tbl_test` WHERE interest IN (3)
The above code returns an empty set.
As Jens pointed out in the comments, it is highly recommended that you normalize your schema.
If you wish to keep storing comma-separated values in a string, you will need complex regex matching (which I leave to you to explore).
However, one more alternative is to convert your interest column to the JSON data type. MySQL 5.7 and above support this data type.
CREATE TABLE IF NOT EXISTS `tbl` (
`id` int(6) unsigned NOT NULL,
`interest` JSON DEFAULT NULL,
`status` int(1) NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `tbl` (`id`, `interest`,`status`) VALUES
(1, '[1,2,3,4]',1),
(2, '[1,2]',1),
(3, '[3]',1);
And then query it as follows :
select id from tbl where JSON_CONTAINS( interest ,'[1,2]');
select id from tbl where JSON_CONTAINS( interest ,'[3,4]');
...
You can see it in action in this sql fiddle.
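One thing to check before adopting the JSON approach: the question asks for rows that contain any of the searched values, whereas JSON_CONTAINS with an array candidate only matches when all candidate elements are present. The Python sketch below only models the two semantics on the question's sample data; the function names are made up for illustration. Newer MySQL 8.0 releases also provide JSON_OVERLAPS(), which may be closer to the "any" behaviour, but that is an assumption to verify against your server version.

```python
# Sample data from the question: id -> comma-separated interest string.
rows = {1: "1,2,3", 2: "4"}


def ids_matching_any(search, table):
    """Ids whose interest list shares at least one value with the search."""
    wanted = {item.strip() for item in search.split(",")}
    return [row_id for row_id, interest in table.items()
            if wanted & set(interest.split(","))]


def ids_matching_all(search, table):
    """Ids whose interest list contains every searched value."""
    wanted = {item.strip() for item in search.split(",")}
    return [row_id for row_id, interest in table.items()
            if wanted <= set(interest.split(","))]


print(ids_matching_any("1,4", rows))  # [1, 2]  (what the question asks for)
print(ids_matching_any("3,2", rows))  # [1]
print(ids_matching_all("1,4", rows))  # []      (the "contains all" semantics)
```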

How can I calculate the difference between two hashes in a MySQL query?

I'm attempting to calculate the Hamming distance between an input hash and database-stored hashes. These are perceptual hashes, so the Hamming distance between them is important to me and tells me how similar two different images are (see http://en.wikipedia.org/wiki/Perceptual_hashing, http://jenssegers.com/61/perceptual-image-hashes, http://stackoverflow.com/questions/21037578/). Hashes are 16 hexadecimal characters long, and look like this:
b1d0c44a4eb5b5a9
1f69f25228ed4a31
751a0b19f0c2783f
My database looks like this:
CREATE TABLE `hashes` (
`id` int(11) NOT NULL,
`hash` binary(8) NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;
INSERT INTO `hashes` (`id`, `hash`) VALUES
(1, 0xb1d0c44a4eb5b5a9),
(2, 0x1f69f25228ed4a31),
(3, 0x751a0b19f0c2783f);
Now, I know I can query for a Hamming distance like so:
SELECT BIT_COUNT(0xb1d0c44a4eb5b5a9 ^ 0x751a0b19f0c2783f)
Which will output 38, as expected. However, I can't seem to reference a column name for this comparison. The following does not work as expected.
SELECT BIT_COUNT(hash ^ 0x751a0b19f0c2783f) FROM hashes
Does anyone know how I can calculate a Hamming distance like in my first SELECT query above using the columns in my database? I've tried a myriad of scenarios using hex(), unhex(), conv(), and cast() in different ways. This is in MySQL.
Update: My query above appears to work as expected when running in MySQL v8 (thanks to @LukStorms for pointing this out). You can use my fiddle below and change the version in the top left. My question now is: how can I ensure the behavior works in all versions of MySQL?
Fiddle: https://www.db-fiddle.com/f/mpqsUpZ1sv2kmvRwJrK5xL/0
The problem seems to be related to your choice of datatype which is a string type. Using a numeric datatype works in MySQL 5.7 as well as 8.0:
CREATE TABLE `hashes` (
`id` int(11) NOT NULL,
`hash` bigint unsigned NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;
INSERT INTO `hashes` (`id`, `hash`) VALUES
(1, 0xb1d0c44a4eb5b5a9),
(2, 0x1f69f25228ed4a31),
(3, 0x751a0b19f0c2783f);
SELECT id, HEX(hash), BIT_COUNT(hash ^ 0x751a0b19f0c2783f)
FROM hashes;
Output:
id HEX(hash) BIT_COUNT(hash ^ 0x751a0b19f0c2783f)
1 B1D0C44A4EB5B5A9 38
2 1F69F25228ED4A31 34
3 751A0B19F0C2783F 0
Demo on dbfiddle
The difference in treatment between MySQL 5.7 and 8.0 of using a string type can be seen with this query:
SELECT id, hash, HEX(hash), HEX(hash ^ 0x751a0b19f0c2783f)
FROM hashes;
MySQL 5.7:
id hash HEX(hash) HEX(hash ^ 0x751a0b19f0c2783f)
1 {"type":"Buffer","data":[177,208,196,74,78,181,181,169]} B1D0C44A4EB5B5A9 751A0B19F0C2783F
2 {"type":"Buffer","data":[31,105,242,82,40,237,74,49]} 1F69F25228ED4A31 751A0B19F0C2783F
3 {"type":"Buffer","data":[117,26,11,25,240,194,120,63]} 751A0B19F0C2783F 751A0B19F0C2783F
MySQL 8.0
id hash HEX(hash) HEX(hash ^ 0x751a0b19f0c2783f)
1 {"type":"Buffer","data":[177,208,196,74,78,181,181,169]} B1D0C44A4EB5B5A9 C4CACF53BE77CD96
2 {"type":"Buffer","data":[31,105,242,82,40,237,74,49]} 1F69F25228ED4A31 6A73F94BD82F320E
3 {"type":"Buffer","data":[117,26,11,25,240,194,120,63]} 751A0B19F0C2783F 0000000000000000
MySQL 8.0 is performing the XOR correctly, returning a variable, while MySQL 5.7 is returning the value being XOR'ed, indicating that it is treating the BINARY string as 0 in a numeric context.
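The numbers in the tables above can be cross-checked outside MySQL. This is a small Python sketch that treats each hash as a 64-bit integer, which is what the bigint unsigned column does; the last part only models the 5.7 behaviour described above (the BINARY value evaluated as 0) and is not a claim about MySQL internals.

```python
TARGET = 0x751A0B19F0C2783F

# The three hashes from the question, keyed by id.
hashes = {
    1: 0xB1D0C44A4EB5B5A9,
    2: 0x1F69F25228ED4A31,
    3: 0x751A0B19F0C2783F,
}


def hamming(a, b):
    """Number of differing bits, i.e. BIT_COUNT(a ^ b)."""
    return bin(a ^ b).count("1")


distances = {row_id: hamming(value, TARGET) for row_id, value in hashes.items()}
print(distances)  # {1: 38, 2: 34, 3: 0}

# XOR results as 16 hex digits, matching the HEX(hash ^ ...) column for 8.0.
xor_hex = {row_id: format(value ^ TARGET, "016X") for row_id, value in hashes.items()}
print(xor_hex[1])  # C4CACF53BE77CD96

# Model of the 5.7 result: if the column is evaluated as 0, the XOR simply
# returns the constant, which is what the 5.7 table shows for every row.
print(format(0 ^ TARGET, "016X"))  # 751A0B19F0C2783F
```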
This is not a number, so it can't be used for mathematical calculations:
`hash` binary(8) NOT NULL
Use bigint instead:
`hash` bigint unsigned NOT NULL
Try this:
SELECT id, HEX(hash), CAST(CONV(HEX(hash),16,10) AS UNSIGNED), BIT_COUNT(CAST(CONV(HEX(hash),16,10) AS UNSIGNED) ^ 0x751a0b19f0c2783f) FROM hashes;

find_in_set return different value

I have a table with this declaration:
CREATE TABLE foobar (
id int(11) NOT NULL AUTO_INCREMENT,
dow set('q','w','e','r','t','y', 'u') NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT;
With those values inside:
id, dow
'1', '1,3,6'
'1', '2,4,7'
These two queries return different values.
SELECT dow, FIND_IN_SET('4', dow) FROM (SELECT * from pippo.pluto) as B;
SELECT dow, FIND_IN_SET('4', dow) FROM pippo.pluto as B;
The first query returns these results:
'1,3,6', '0'
'2,4,7', '2'
The second query returns these results:
'1,3,6', '0'
'2,4,7', '4'
Why?
Tested against MySQL 5.6 and 5.7.
Edit:
This behaviour remains the same if I use MySQL views.
CREATE VIEW selectInner AS SELECT dow, FIND_IN_SET('r', dow) FROM (SELECT * from pippo.foobar) as B;
CREATE VIEW selectDirect AS SELECT dow, FIND_IN_SET('r', dow) FROM pippo.foobar as B;
There are two things interacting here:
"If the first argument is a constant string and the second is a column of type SET, the FIND_IN_SET() function is optimized to use bit arithmetic." per FIND_IN_SET docs
One of the two queries operates directly on the table, and the second on a derived table.
The behaviour demonstrates this:
When operating directly on the table (where dow refers to the defined column in that table's metadata), FIND_IN_SET returns the index of the entry in the column's definition
When operating on a derived table (where dow refers to a derived column), FIND_IN_SET returns the index of the entry in the derived value
This is clear if you search for e.g. y and u in a column containing q,e,y,u: you'd get 6 and 7 when querying the table directly, where dow is a SET containing q,w,e,r,t,y,u and FIND_IN_SET uses bitwise optimizations; but 3 and 4 when searching the derived table, where dow is the string containing q,e,y,u for that row.
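The two behaviours can be modelled in a few lines. The Python sketch below is only an illustration of the explanation above, not MySQL's implementation: `DEFINITION` mirrors the set('q','w','e','r','t','y','u') declaration, and the function names are hypothetical.

```python
# Members of the SET column, in declaration order.
DEFINITION = ["q", "w", "e", "r", "t", "y", "u"]


def to_bits(value):
    """Store a SET value as a bitmask, one bit per declared member."""
    bits = 0
    for member in value.split(","):
        bits |= 1 << DEFINITION.index(member)
    return bits


def find_in_set_bitwise(needle, bits):
    """Direct table access: answer comes from the bit position in the definition."""
    if needle not in DEFINITION:
        return 0
    position = DEFINITION.index(needle)
    return position + 1 if bits & (1 << position) else 0


def find_in_set_string(needle, value):
    """Derived table: answer comes from the position in the row's string."""
    items = value.split(",")
    return items.index(needle) + 1 if needle in items else 0


row = "q,e,y,u"
print(find_in_set_bitwise("y", to_bits(row)), find_in_set_bitwise("u", to_bits(row)))  # 6 7
print(find_in_set_string("y", row), find_in_set_string("u", row))                      # 3 4
```

Both models return 0 for a member that is not present in the row, so the discrepancy only shows up in the position that is reported.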

INSTR(str,substr) does not work when str contains 'é' or 'ë' and substr only 'e'

In another post on stackoverflow, I read that INSTR could be used to order results by relevance.
My understanding was that col LIKE '%str%' and INSTR(col, 'str') behave the same. However, there seems to be a difference in how collations are handled.
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO users (name)
VALUES ('Joël'), ('René');
SELECT * FROM users WHERE name LIKE '%joel%'; -- 1 record returned
SELECT * FROM users WHERE name LIKE '%rene%'; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'joel') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'rene') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'joël') > 0; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'rené') > 0; -- 1 record returned
INSTR does apply some conversion, though: it finds ë in é.
SELECT INSTR('é', 'ë'), INSTR('é', 'e'), INSTR('e', 'ë');
-- returns 1, 0, 0
Am I missing something?
http://sqlfiddle.com/#!2/9bf21/6 (using mysql-version: 5.5.22)
This is due to bug 70767 on LOCATE() and INSTR(), which has been verified.
Though the INSTR() documentation states that it can be used for multi-byte strings, it doesn't seem to work, as you note, with collations like utf8_general_ci, which should be case and accent insensitive:
This function is multi-byte safe, and is case sensitive only if at least one argument is a binary string.
The bug report states that MySQL does match according to the collation, but only when the number of bytes is also identical:
However, you can easily observe that they do not (completely) respect collations when looking for one string inside another one. It seems that what's happening is that MySQL looks for a substring which is collation-equal to the target which has exactly the same length in bytes as the target. This is only rarely true.
To adapt the report's example, if you create the following table:
create table t ( needle varchar(10), haystack varchar(10)
) COLLATE=utf8_general_ci;
insert into t values ("A", "a"), ("A", "XaX");
insert into t values ("A", "á"), ("A", "XáX");
insert into t values ("Á", "a"), ("Á", "XaX");
insert into t values ("Å", "á"), ("Å", "XáX");
then run this query, you can see the same behaviour demonstrated:
select needle
, haystack
, needle=haystack as `=`
, haystack LIKE CONCAT('%',needle,'%') as `like`
, instr(haystack, needle) as `instr`
from t;
SQL Fiddle
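The hypothesis quoted from the bug report (a match requires a collation-equal substring with exactly the same byte length as the search string) can be modelled as follows. This Python sketch is only a model of that hypothesis, not MySQL's code: `fold` is a crude stand-in for an accent- and case-insensitive collation, and `instr_bytewise` is a hypothetical name.

```python
import unicodedata


def fold(text):
    """Crude stand-in for an accent- and case-insensitive collation."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def instr_bytewise(haystack, needle):
    """1-based character position of the first match, or 0 if there is none.

    A window only matches if it has the same UTF-8 byte length as the
    needle and is equal to it after folding.
    """
    haystack = unicodedata.normalize("NFC", haystack)
    needle = unicodedata.normalize("NFC", needle)
    data = haystack.encode("utf-8")
    needle_len = len(needle.encode("utf-8"))
    target = fold(needle)
    start = 0  # byte offset of the current character
    for position, ch in enumerate(haystack, start=1):
        window = data[start:start + needle_len]
        start += len(ch.encode("utf-8"))
        if len(window) != needle_len:
            continue  # not enough bytes left
        try:
            candidate = window.decode("utf-8")
        except UnicodeDecodeError:
            continue  # window ends in the middle of a character
        if fold(candidate) == target:
            return position
    return 0


print(instr_bytewise("é", "ë"), instr_bytewise("é", "e"), instr_bytewise("e", "ë"))  # 1 0 0
print(instr_bytewise("Joël", "joel"), instr_bytewise("Joël", "joël"))                # 0 1
```

Under these assumptions the model reproduces the results reported in the question, which is consistent with the byte-length explanation but does not prove it.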

Strange behavior when querying a varchar field

I came across this strange behavior while hunting for a bug in a system. Consider the following.
We have a MySQL table which has a varchar(100) column. See the following SQL script.
create table user(`id` bigint(20) NOT NULL AUTO_INCREMENT,`user_id` varchar(100) NOT NULL,`username` varchar(255) DEFAULT NULL,PRIMARY KEY (`id`),UNIQUE KEY `user_id` (`user_id`)) ENGINE=InnoDB AUTO_INCREMENT=129 DEFAULT CHARSET=latin1;
insert into user(user_id, username) values('20120723145614834', 'user1');
insert into user(user_id, username) values('20120723151128642', 'user1');
When I execute the following query, I receive 0 results.
select * from user where user_id=20120723145614834;
But when I execute the following, I get the result (note the single quotes).
select * from user where user_id='20120723145614834';
This is expected, since the user_id field is a varchar. The strange thing is that both of the following queries yield a result.
select * from user where user_id=20120723151128642;
select * from user where user_id='20120723151128642';
Can anybody explain the reason for this strange behavior? My MySQL version is 5.1.63-0ubuntu0.11.10.1.
Check the MySQL documentation, section 12.2, Type Conversion in Expression Evaluation:
Comparisons that use floating-point numbers (or values that are
converted to floating-point numbers) are approximate because such
numbers are inexact. This might lead to results that appear
inconsistent:
mysql> SELECT '18015376320243458' = 18015376320243458;
-> 1
mysql> SELECT '18015376320243459' = 18015376320243459;
-> 0
So it is best to always use the right data type in SQL: compare a varchar column with a quoted string.
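The underlying limitation can be seen with any double-precision float. This short Python sketch only shows that the two user_id values from the question cannot be represented exactly as doubles; which of the two comparisons happens to succeed in a given MySQL version depends on how each side is converted, which this sketch does not model.

```python
# The two user_id values from the question.
ids = [20120723145614834, 20120723151128642]

for n in ids:
    as_double = float(n)          # what a floating-point comparison works with
    round_trip = int(as_double)   # exact integer value of that double
    print(n, round_trip, round_trip == n)

# Doubles have a 53-bit significand, so above 2**53 not every integer
# is representable and distinct integers can collapse to the same double.
print(float(2**53) == float(2**53 + 1))  # True
```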