Exact string comparison in a MySQL query - mysql

I created a table like this in MySQL:
DROP TABLE IF EXISTS `barcode`;
CREATE TABLE `barcode` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`code` varchar(40) COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
INSERT INTO `barcode` VALUES ('1', 'abc');
INSERT INTO `barcode` VALUES ('2', 'abc ');
Then I query data from table barcode:
SELECT * FROM barcode WHERE `code` = 'abc ';
The result is:
+----+------+
| id | code |
+----+------+
|  1 | abc  |
|  2 | abc  |
+----+------+
But I want the result set to contain only one record. I worked around it with:
SELECT * FROM barcode WHERE `code` = binary 'abc ';
The result is 1 record. But I'm using NHibernate with MySQL to generate queries from the table mapping, so how can I resolve this case?

There is no other fix for it. Either you mark a single comparison as binary or you set the whole database connection to binary (with SET NAMES binary, which may have other side effects!).
Basically, that 'lazy' comparison is a hard-coded feature of MySQL. To disable it (on demand!), you can use a binary comparison, which you apparently already do. This is not a 'workaround' but the real fix.
From the MySQL manual:
All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces
Of course there are plenty of other possibilities to achieve the same result from a user's perspective, e.g.:
WHERE field = 'abc ' AND CHAR_LENGTH(field) = CHAR_LENGTH('abc ')
WHERE field REGEXP 'abc[[:space:]]'
The problem with these is that they effectively disable fast index lookups, so your query always results in a full table scan. With huge datasets that makes a big difference.
Again: PADSPACE is the default for MySQL's [VAR]CHAR comparisons. You can (and should) disable it by using BINARY. This is the intended way of doing this.
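The difference between the two comparison modes can be modelled outside the database. The following Python sketch only illustrates the semantics described above; it is not MySQL's implementation, and the function names are made up for this example.

```python
# Toy model of the two comparison modes; this is not MySQL code.

def pad_space_equal(a: str, b: str) -> bool:
    """Compare the way a PAD SPACE collation does: trailing spaces are ignored."""
    return a.rstrip(" ") == b.rstrip(" ")


def binary_equal(a: str, b: str) -> bool:
    """Compare byte for byte, so a trailing space makes the values differ."""
    return a.encode("utf-8") == b.encode("utf-8")


rows = [(1, "abc"), (2, "abc ")]  # the two rows from the question

padded = [row_id for row_id, code in rows if pad_space_equal(code, "abc ")]
exact = [row_id for row_id, code in rows if binary_equal(code, "abc ")]

print(padded)  # [1, 2]
print(exact)   # [2]
```

In this model the padded comparison returns both rows, as the plain = query does, while the byte-wise comparison returns only the row that really ends in a space.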

You can try regular expression matching:
SELECT * FROM barcode WHERE `code` REGEXP 'abc[[:space:]]'

I was just working on a case like this, where using LIKE with a wildcard (%) gave an unexpected result. While searching I also found STRCMP(text1, text2) among MySQL's string comparison functions, which compares two strings. However, using BINARY with LIKE solved the problem for me.
SELECT * FROM barcode WHERE `code` LIKE BINARY 'abc ';

You could do this:
SELECT * FROM barcode WHERE `code` = 'abc '
AND CHAR_LENGTH(`code`)=CHAR_LENGTH('abc ');

I am assuming you only want one result, so you could use LIMIT:
SELECT * FROM barcode WHERE `code` = 'abc ' LIMIT 1;
To do exact string matching you could specify a collation:
SELECT *
FROM barcode
WHERE code COLLATE utf8_bin = 'abc';

The sentence right after the one quoted by Kaii basically says "use LIKE":
“Comparison” in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant
and the example below it shows that 'Monty' = 'Monty ' is true, but 'Monty' LIKE 'Monty ' is not.
However, if you use LIKE, beware of literal strings containing the '%', '_' or '\' characters: '%' and '_' are wildcard characters, and '\' is the escape character.
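If the search value comes from application code, the special characters can be escaped before the value reaches the query. The helper below is a minimal Python sketch of that idea; the name escape_like is hypothetical, and it assumes the default backslash escape character for LIKE.

```python
def escape_like(literal: str) -> str:
    """Escape the characters LIKE treats specially, backslash first."""
    return (
        literal.replace("\\", "\\\\")  # must come first, or later escapes get doubled
        .replace("%", "\\%")
        .replace("_", "\\_")
    )


print(escape_like("abc "))          # unchanged: nothing special in it
print(escape_like("50%_off\\now"))  # 50\%\_off\\now
```

This assumes the escaped value is then bound as a query parameter. If it is pasted into an SQL string literal instead, the backslashes would need a second round of escaping for the literal itself.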

Related

How to make text column comparisons binary (case sensitive, no trimming) by default

Sorry if this is a duplicate, but I don't know how to search for this question.
Hi, this is my table:
CREATE TABLE `log_Valor` (
`idLog_Valor` int(11) NOT NULL AUTO_INCREMENT,
`Valor` text binary NOT NULL,
PRIMARY KEY (`idLog_Valor`)
)
ENGINE=InnoDB;
INSERT INTO `log_Valor` (Valor) VALUES ('teste');
INSERT INTO `log_Valor` (Valor) VALUES ('teste ');
I have 2 rows:
1 | 'teste'
2 | 'teste '
When I run:
SELECT * FROM log_Valor where valor = 'teste'
It returns the two rows.
How do I make the default comparison case sensitive and not trimmed, without having to specify BINARY in the query?
Use LIKE instead of =.
SELECT * FROM log_Valor WHERE valor LIKE 'teste';
From the documentation
In particular, trailing spaces are significant, which is not true for CHAR or VARCHAR comparisons performed with the = operator

MySQL, Search for data containing hex character

I've got a table my_table with a varchar column col1 (utf8).
If I was looking for all rows containing the letter a in col1 (balloon, aardvark, etc) then I'd do:
select col1
from my_table
where col1 like "%a%" -- But how search for special hex character?
But what should I put instead of "%a%" if I'm looking for a special hex character, in this case 0xFFFC?
(This is the character: http://www.fileformat.info/info/unicode/char/fffc/index.htm)
Note that I am looking for a way to specify this character in the WHERE clause. I've seen https://dev.mysql.com/doc/refman/5.7/en/hexadecimal-literals.html as well as Stack Overflow questions and answers that use hex characters in the SELECT part, but I need it in the WHERE clause. I have also seen "How to find certain Hex values and Char() Values in a MySQL SELECT", but that uses char(128), and I haven't got an equivalent char number in my case.
use: 0x61 == 'a'
select col1
from my_table
where col1 LIKE concat('%',0x61,'%');
Here is a sample:
CREATE TABLE `tmptable` (
`image` varchar(250) DEFAULT NULL,
UNIQUE KEY `d` (`image`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `tmptable` (`image`)
VALUES
('äöüß`´\'');
> SELECT image,hex(image) FROM tmptable WHERE image LIKE concat('%',0xC39F,'%');
+---------+--------------------------+
| image   | hex(image)               |
+---------+--------------------------+
| äöüß`´' | C3A4C3B6C3BCC39F60C2B427 |
+---------+--------------------------+
1 row in set (0.00 sec)
You can write your select like this, using the UTF-8 byte sequence of U+FFFC (EF BF BC) because the column is utf8:
select col1
from my_table
where col1 LIKE CONCAT('%',X'EFBFBC','%');
The documentation at https://dev.mysql.com/doc/refman/5.7/en/hexadecimal-literals.html describes hexadecimal literals, as you say, and CONCAT then builds the pattern around the resulting character.
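For a utf8 column, the hexadecimal literal has to contain the character's UTF-8 bytes, not its code point number, which is why the ß in the earlier sample is written as 0xC39F. The Python sketch below computes those bytes; the helper name utf8_hex_literal is made up for illustration.

```python
def utf8_hex_literal(ch: str) -> str:
    """Return an X'..' hexadecimal literal holding the UTF-8 bytes of ch."""
    return "X'" + ch.encode("utf-8").hex().upper() + "'"


print(utf8_hex_literal("\ufffc"))  # X'EFBFBC' (U+FFFC, the character asked about)
print(utf8_hex_literal("\u00df"))  # X'C39F'   (ß)
print(utf8_hex_literal("a"))       # X'61'
```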

SELECT LIKE with binary column returns no results when hex is too long for a specific uuid

I have a table with a UUID column (code below, shortened for the sake of the demo). When I was demoing my application yesterday, I had an issue where an item I had created in the table with a uuid4 value would not come up when searching for the full UUID, but eventually came up when I tried shortening the UUID I was searching for. When I created a second item with a new uuid4, I was able to search for that one just fine.
I'm not sure what the issue might be nor how to even debug it on the MySQL side. HEXing the column and doing a string comparison is not an option as I need to be able to use the index on the uuid column.
Since using UUIDs is the primary and only method of looking items up (business requirements dictate not using the integer PK), I need to determine a solution that works 100% of the time.
CREATE TABLE `ItemTable` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`uuid` binary(16) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `uuid` (`uuid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `ItemTable` (`uuid`)
VALUES (UNHEX('0BFADD4EEFC14C05A7CA83245C37EEB7'));
INSERT INTO `ItemTable` (`uuid`)
VALUES (UNHEX('6C52ACF3864E49BCBC4E7A7B2CBB90C5'));
SELECT * FROM `ItemTable` where uuid LIKE CONCAT(UNHEX('0BFADD4EEFC14C05A7CA83245C37EEB7'), '%'); -- No Results
SELECT * FROM `ItemTable` where uuid LIKE CONCAT(UNHEX('0BFADD4EEFC14C05A7CA83245C37EE'), '%'); -- No Results
SELECT * FROM `ItemTable` where uuid LIKE CONCAT(UNHEX('0BFADD4EEFC14C05A7CA83245C37'), '%'); -- No Results
SELECT * FROM `ItemTable` where uuid LIKE CONCAT(UNHEX('0BFADD4EEFC14C05A7CA83245C'), '%'); -- No Results
SELECT * FROM `ItemTable` where uuid LIKE CONCAT(UNHEX('0BFADD4EEFC14C05A7CA8324'), '%'); -- There's the first one!!
SELECT * FROM `ItemTable` where uuid LIKE CONCAT(UNHEX('6C52ACF3864E49BCBC4E7A7B2CBB90C5'), '%'); -- Works right off the bat with the full UUID.
Here are a few notes to aid you in debugging this. Somewhere below may be an "answer" to a "question" you might have had, but didn't ask. (I'm assuming that you meant to ask a question, and weren't just giving a status report.)
x'5C' evaluates to a backslash character '\'. And the backslash is the standard character that MySQL uses for escape sequences.
As an example, '\n' is not interpreted as two separate characters (a backslash and an n). It's evaluated as a single newline character. To get a backslash character returned, we normally have to escape the backslash itself with another backslash. As a demonstration, consider:
SELECT HEX('\n') --> '0A'
SELECT HEX('\\n') --> '5C6E'
The UNHEX function returns a binary string. CONCAT returns a binary string when any of its arguments is a binary string, and a nonbinary string otherwise (since MySQL 5.5, numeric arguments are converted to nonbinary strings). I'd expect the LIKE comparison to work with the BINARY datatype (that's my expectation, but I haven't tested that).
But consider this: the effect of "doubling up" the backslash characters...
SELECT v
, v LIKE UNHEX('335C37')
, v LIKE UNHEX('335C5C37')
, v LIKE REPLACE(UNHEX('335C37'),'\\','\\\\')
FROM (
SELECT CAST(UNHEX('335C37') AS BINARY) AS v
) t
Returns:
v    v LIKE UNHEX('335C37')  v LIKE UNHEX('335C5C37')  v LIKE REPLACE(UNHEX('335C37'),'\\','\\\\')
---  ----------------------  ------------------------  -------------------------------------------
3\7  0                       1                         1
NOTE: perform the REPLACE after the UNHEX operation, because a 5C sequence in the hex digits can straddle two bytes, as in '35C4'. Doing the replacement on the hex digits (without respecting the byte boundary) would introduce a spurious C5 byte.
DON'T DO THIS:
UNHEX(REPLACE('35C4','5C','5C5C')) --> x'35C5C4' (wrong!)
DO THIS:
REPLACE(UNHEX('35C4'),'\\','\\\\') --> x'35C4' (right!)
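The same byte-boundary pitfall can be reproduced outside MySQL. This Python sketch mirrors the two expressions above, with bytes.fromhex standing in for UNHEX; it only illustrates the ordering issue and is not meant as production code.

```python
raw_hex = "35C4"  # two bytes, 0x35 and 0xC4; there is no 0x5C byte in here

# Wrong: replacing on the hex digits matches "5C" across the byte boundary.
wrong = bytes.fromhex(raw_hex.replace("5C", "5C5C"))

# Right: decode to bytes first, then double only real backslash bytes.
right = bytes.fromhex(raw_hex).replace(b"\\", b"\\\\")

print(wrong.hex().upper())  # 35C5C4
print(right.hex().upper())  # 35C4
```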
Are there characters other than the backslash that present potential issues?
Consider x'5F', and x'25', which are the underscore and percent characters, respectively. Those have special meanings in a string on the right side of a LIKE operator. (Likely, you'll want to escape those as well.)
Escape the backslashes first, then the other characters. I'm thinking you are going to need an expression like this:
REPLACE(REPLACE(REPLACE( x ,'\\','\\\\'),'_','\\_'),'%','\\%')
And do that before you concatenate on the final '%' character.
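To see why only one of the two UUIDs from the question misbehaves, it helps to look for LIKE-special bytes in each value. The Python sketch below does that and then builds an escaped prefix pattern in the order described above (backslash first, wildcard appended last). The helper names are hypothetical, and the pattern is assumed to be passed to the query as a bound binary parameter.

```python
LIKE_SPECIAL = {0x5C: "backslash", 0x25: "percent", 0x5F: "underscore"}


def special_bytes(hex_uuid: str) -> list:
    """List (byte offset, name) for every LIKE-special byte in the value."""
    data = bytes.fromhex(hex_uuid)
    return [(i, LIKE_SPECIAL[b]) for i, b in enumerate(data) if b in LIKE_SPECIAL]


def like_prefix_pattern(prefix: bytes) -> bytes:
    """Escape backslash first, then _ and %, then append the wildcard."""
    escaped = (
        prefix.replace(b"\\", b"\\\\")
        .replace(b"_", b"\\_")
        .replace(b"%", b"\\%")
    )
    return escaped + b"%"


first = "0BFADD4EEFC14C05A7CA83245C37EEB7"
second = "6C52ACF3864E49BCBC4E7A7B2CBB90C5"

print(special_bytes(first))   # [(12, 'backslash')]
print(special_bytes(second))  # []
print(like_prefix_pattern(bytes.fromhex(first)).hex().upper())
```

The first UUID has a backslash byte at offset 12, which matches the observation in the question that a 12-byte prefix was found while every longer prefix was not. The second UUID contains no special bytes, so it needed no escaping.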

INSTR(str,substr) does not work when str contains 'é' or 'ë' and substr only 'e'

In another post on stackoverflow, I read that INSTR could be used to order results by relevance.
My understanding of col LIKE '%str%' and INSTR(col, 'str') is that they both behave the same. There seems to be a difference in how collations are handled.
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO users (name)
VALUES ('Joël'), ('René');
SELECT * FROM users WHERE name LIKE '%joel%'; -- 1 record returned
SELECT * FROM users WHERE name LIKE '%rene%'; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'joel') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'rene') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'joël') > 0; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'rené') > 0; -- 1 record returned
INSTR does do some conversion, though: it finds ë in é.
SELECT INSTR('é', 'ë'), INSTR('é', 'e'), INSTR('e', 'ë');
-- returns 1, 0, 0
Am I missing something?
http://sqlfiddle.com/#!2/9bf21/6 (using mysql-version: 5.5.22)
This is due to bug 70767 on LOCATE() and INSTR(), which has been verified.
Though the INSTR() documentation states that it can be used for multi-byte strings, it doesn't seem to work, as you note, with collations like utf8_general_ci, which should be case and accent insensitive:
This function is multi-byte safe, and is case sensitive only if at least one argument is a binary string.
The bug report states that although MySQL does this correctly it only does so when the number of bytes is also identical:
However, you can easily observe that they do not (completely) respect collations when looking for one string inside another one. It seems that what's happening is that MySQL looks for a substring which is collation-equal to the target which has exactly the same length in bytes as the target. This is only rarely true.
To adapt the report's example, if you create the following table:
create table t ( needle varchar(10), haystack varchar(10)
) COLLATE=utf8_general_ci;
insert into t values ("A", "a"), ("A", "XaX");
insert into t values ("A", "á"), ("A", "XáX");
insert into t values ("Á", "a"), ("Á", "XaX");
insert into t values ("Å", "á"), ("Å", "XáX");
then run this query, you can see the same behaviour demonstrated:
select needle
, haystack
, needle=haystack as `=`
, haystack LIKE CONCAT('%',needle,'%') as `like`
, instr(haystack, needle) as `instr`
from t;
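The behaviour described in the bug report can be approximated in a few lines of Python. This is only a rough model: fold() is a crude stand-in for a case- and accent-insensitive collation, UTF-8 is assumed as the encoding, and the function names are invented for this sketch.

```python
import unicodedata


def fold(text: str) -> str:
    """Crude stand-in for a case- and accent-insensitive collation."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.casefold()


def byte_length_instr(haystack: str, needle: str) -> int:
    """Model of the reported behaviour: a candidate substring only counts
    when it is collation-equal AND has the same UTF-8 byte length."""
    if not needle:
        return 1
    want = fold(needle)
    want_bytes = len(needle.encode("utf-8"))
    for start in range(len(haystack)):
        for end in range(start + 1, len(haystack) + 1):
            candidate = haystack[start:end]
            if len(candidate.encode("utf-8")) == want_bytes and fold(candidate) == want:
                return start + 1  # 1-based position, like INSTR
    return 0


print(byte_length_instr("\u00e9", "\u00eb"))        # 1: é and ë are both two bytes
print(byte_length_instr("\u00e9", "e"))             # 0: two bytes versus one
print(byte_length_instr("Jo\u00ebl", "joel"))       # 0: no 4-byte substring folds to "joel"
print(byte_length_instr("Jo\u00ebl", "jo\u00ebl"))  # 1
```

Under these assumptions the model gives the same answers as the INSTR calls in the question: accented letters match each other, but never their unaccented one-byte counterpart.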

Strange behavior when querying a varchar field

I came across this strange behavior while hunting for a bug in a system. Consider the following.
We have a MySQL table which has a varchar(100) column. See the following SQL script.
create table user(`id` bigint(20) NOT NULL AUTO_INCREMENT,`user_id` varchar(100) NOT NULL,`username` varchar(255) DEFAULT NULL,PRIMARY KEY (`id`),UNIQUE KEY `user_id` (`user_id`)) ENGINE=InnoDB AUTO_INCREMENT=129 DEFAULT CHARSET=latin1;
insert into user(user_id, username) values('20120723145614834', 'user1');
insert into user(user_id, username) values('20120723151128642', 'user1');
When I execute the following query I get 0 results.
select * from user where user_id=20120723145614834;
But when I execute the following I get the result (note the single quotes).
select * from user where user_id='20120723145614834';
This is expected since the user_id field is a varchar. The strange thing is that both of the following queries yield a result.
select * from user where user_id=20120723151128642;
select * from user where user_id='20120723151128642';
Can anybody explain the reason for this strange behavior? My MySQL version is 5.1.63-0ubuntu0.11.10.1.
Check the MySQL documentation, section 12.2, Type Conversion in Expression Evaluation:
Comparisons that use floating-point numbers (or values that are
converted to floating-point numbers) are approximate because such
numbers are inexact. This might lead to results that appear
inconsistent:
mysql> SELECT '18015376320243458' = 18015376320243458;
-> 1
mysql> SELECT '18015376320243459' = 18015376320243459;
-> 0
So it is better to always use the right data type in SQL.
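The underlying limit is that a 64-bit floating-point number holds integers exactly only up to 2^53. The Python sketch below checks whether the two user_id values from the question survive a round trip through a double. Python's conversion is correctly rounded and is not the same as MySQL 5.1's string-to-number conversion, so this only shows that neither value is exactly representable, not which query will match.

```python
def exact_as_double(n: int) -> bool:
    """True if the integer survives a round trip through a 64-bit float."""
    return int(float(n)) == n


print(exact_as_double(2**53))      # True
print(exact_as_double(2**53 + 1))  # False

for user_id in (20120723145614834, 20120723151128642):
    # Both values are above 2**53 and not exactly representable, so the result
    # of a float comparison depends on how each side happens to be rounded.
    print(user_id, exact_as_double(user_id), int(float(user_id)))
```

Since neither ID is exact as a double, whether the unquoted comparison matches depends on rounding details, which fits the seemingly inconsistent results in the question.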