I have a MySQL query:
SELECT concat_ws(title,description) as concatenated HAVING concatenated LIKE '%SearchTerm%';
And my table is encoded utf8_general_ci with MyISAM.
Searches seem to be case sensitive.
I can't figure out how to fix it. What's going wrong and/or how do I fix it?
A much better solution in terms of performance:
SELECT .... FROM .... WHERE `concatenated` LIKE BINARY '%SearchTerm%';
String comparision is case-sensitive when any of the operands is a binary string.
Another alternative is to use COLLATE,
SELECT ....
FROM ....
WHERE `concatenated` like '%SearchTerm%' COLLATE utf8_bin;
Try this:
SELECT LOWER(CONCAT_WS(title,description)) AS concatenated
WHERE concatenated LIKE '%searchterm%'
or (to let you see the difference)
SELECT LOWER(CONCAT_WS(title,description)) AS concatenated
WHERE concatenated LIKE LOWER('%SearchTerm%')
In this method, you do not have to select the searched field:
SELECT table.id
FROM table
WHERE LOWER(table.aTextField) LIKE LOWER('%SearchAnything%')
Check CHARSET mentioned in the table schema:
show create table xyz;
Based on CHARSET, you can try the following.
select name from xyz where name like '%Man%' COLLATE latin1_bin;
select name from xyz where name like '%Man%' COLLATE utf8_bin;
Following are the cases which worked for me, CHARSET=latin1, MySQL version = 5.6.
mysql> select installsrc from appuser where installsrc IS NOT NULL and installsrc like 'Promo%' collate latin1_bin limit 1;
+-----------------------+
| installsrc |
+-----------------------+
| PromoBalance_SMS,null |
+-----------------------+
1 row in set (0.01 sec)
mysql>
mysql> select installsrc from appuser where installsrc IS NOT NULL and installsrc like 'PROMO%' collate latin1_bin limit 1;
+---------------------------+
| installsrc |
+---------------------------+
| PROMO_SMS_MISSEDCALL,null |
+---------------------------+
1 row in set (0.00 sec)
mysql> select installsrc from appuser where installsrc IS NOT NULL and installsrc like 'PROMO%' limit 1;
+-----------------------+
| installsrc |
+-----------------------+
| PromoBalance_SMS,null |
+-----------------------+
1 row in set (0.01 sec)
Just for completion, in case it helps:
As stated on https://dev.mysql.com/doc/refman/5.7/en/case-sensitivity.html, for default character sets, nonbinary string comparisons are case insensitive by default.
Therefore, an easy way to perform case-insensitive comparisons is to cast the field to CHAR, VARCHAR or TEXT type.
Here is an example with a check against a single field:
SELECT * FROM table1 WHERE CAST(`field1` AS CHAR) LIKE '%needle%';
This problem is occurring in this case because of the collation used in the table. You have used utf8_general_ci as collation. If the collation is changed to utf8_general_ci then the searches will not be case sensitive.
So, one possible solution is to change the collation.
This is the working code:
SELECT title,description
FROM (
SELECT title,description, LOWER(CONCAT_WS(title,description)) AS concatenated
FROM table1
) AS Q
WHERE concatenated LIKE LOWER('%search%')
This works also:
SELECT LOWER(DisplayName) as DN
FROM Bidders
WHERE OrgID=45
HAVING DN like "cbbautos%"
LIMIT 10;
Related
I am using 5.5.65-MariaDB MariaDB Server.
I have a table with a column of type medium text, named "remoteData", where I store a json string.
String values in this json string are stored as escaped utf8 sequences, for example
"patientFirstName":"\u0395\u039b\u0395\u03a5\u0398\u0395\u03a1\u0399\u039f\u03a3"
The above value is the Greek Name "ΕΛΕΥΘΕΡΙΟΣ".
I am trying to search this column using the query
Select * from sync_details where remoteData like "%ΛΕΥΘΕΡ%"
but I get an empty set.
I assume this is because of the values being escaped but I don't know what to do.
EDIT: The query will run through php so we can use a solution that includes php functions.
Thank you in advance.
Christoforos
With a database defined to use CHARACTER SET utf8and a utf8_general_ci collation it should just work like this:
CREATE DATABASE IF NOT EXISTS `test` CHARACTER SET utf8 COLLATE utf8_general_ci;
CREATE TABLE `test`.`sync_details` (`remoteData` MEDIUMTEXT);
INSERT INTO `test`.`sync_details` (`remoteData`) VALUES ('{"patientFirstName":"\\u0395\\u039b\\u0395\\u03a5\\u0398\\u0395\\u03a1\\u0399\\u039f\\u03a3"}');
SELECT `remoteData` FROM `test`.`sync_details` WHERE `remoteData` LIKE '%ΛΕΥΘΕΡ%';
+----------------------------------------------+
| remoteData |
+----------------------------------------------+
| {"patientFirstName": "ΕΛΕΥΘΕΡΙΟΣ"} |
+----------------------------------------------+
1 row in set (0,00 sec)
You could also try JSON_EXTRACT to get structured data from the stored JSON object. I just tested it like this:
SELECT JSON_EXTRACT(`remoteData`, "$.patientFirstName")
FROM `test`.`sync_details`
WHERE JSON_EXTRACT(`remoteData`, "$.patientFirstName")
LIKE '%ΛΕΥΘΕΡ%';
+--------------------------------------------------+
| JSON_EXTRACT(`remoteData`, "$.patientFirstName") |
+--------------------------------------------------+
| "ΕΛΕΥΘΕΡΙΟΣ" |
+--------------------------------------------------+
1 row in set (0,00 sec)
To index data in the JSON object you could add a "Generated Column" to your table using the GENERATED ALWAYS syntax
ALTER TABLE `test`.`sync_details` ADD COLUMN `firstName` VARCHAR(100) GENERATED ALWAYS AS (`remoteData` ->> '$.patientFirstName');
CREATE INDEX `firstnames_idx` ON `test`.`sync_details`(`firstName`);
SELECT `firstName` FROM `test`.`sync_details` WHERE `firstName` LIKE '%ΛΕΥΘΕΡ%';
+----------------------+
| firstName |
+----------------------+
| ΕΛΕΥΘΕΡΙΟΣ |
+----------------------+
1 row in set (0,00 sec)
This will only work with MariaDB >= 10.2 and with a utf8 encoded db and a utf8_general_ci collation.
Desired result :
Have an accent sensitive primary key in MySQL.
I have a table of unique words, so I use the word itself as a primary key (by the way if someone can give me an advice about it, I have no idea if it's a good design/practice or not).
I need that field to be accent (and why not case) sensitive, because it must distinguish between, for instance, 'demandé' and 'demande', two different inflexions of the French verb "demander". I do not have any problem to store accented words in the database. I just can't insert two accented characters strings that are identical when unaccented.
Error :
When trying to create the 'demandé' row with the following query:
INSERT INTO `corpus`.`token` (`name_token`) VALUES ('demandé');
I got this error :
ERROR 1062: 1062: Duplicate entry 'demandé' for key 'PRIMARY'
Questions :
Where in the process should a make a modification in order to have two different unique primary keys for "demande" and "demandé" in that table ?
SOLUTION using 'collate utf8_general_ci' in table declaration
How can i make accent sensitive queries ? Is the following the right way :
SELECT * FROM corpus.token WHERE name_token = 'demandé' COLLATE utf8_bin
SOLUTION using 'collate utf8_bin' with WHERE statement
I found that i can achieve this point by using the BINARY Keyword (see this sqlFiddle). What is the difference between collate and binary?
Can I preserve other tables from any changes ? (I'll have to rebuild that table anyway, because it's kind of messy)
I'm not very comfortable with encoding in MySQL. I don't have any problem yet with encoding in that database (and I'm kind of lucky because my data might not always use the same encoding... and there is not much I can do about it). I have a feeling that any modification regarding to that "accent sensitive" issue might create some encoding issue with other queries or data integrity. Am I right to be concerned?
Step by step :
Database creation :
CREATE DATABASE corpus DEFAULT CHARACTER SET utf8;
Table of unique words :
CREATE TABLE token (name_token VARCHAR(50), freq INTEGER, CONSTRAINT pk_token PRIMARY KEY (name_token))
Queries
SELECT * FROM corpus.token WHERE name_token = 'demande';
SELECT * FROM corpus.token WHERE name_token = 'demandé';
both returns the same row:
demande
Collations. You have two choices, not three:
utf8_bin treats all of these as different: demandé and demande and Demandé.
utf8_..._ci (typically utf8_general_ci or utf8_unicode_ci) treats all of these as the same: demandé and demande and Demandé.
If you want only case sensitivity (demandé = demande, but neither match Demandé), you are out of luck.
If you want only accent sensitivity (demandé = Demandé, but neither match demande), you are out of luck.
Declaration. The best way to do whatever you pick:
CREATE TABLE (
name VARCHAR(...) CHARACTER SET utf8 COLLATE utf8_... NOT NULL,
...
PRIMARY KEY(name)
)
Don't change collation on the fly. This won't use the index (that is, will be slow) if the collation is different in name:
WHERE name = ... COLLATE ...
BINARY. The datatypes BINARY, VARBINARY and BLOB are very much like CHAR, VARCHAR, and TEXT with COLLATE ..._bin. Perhaps the only difference is that text will be checked for valid utf8 storing in a VARCHAR ... COLLATE ..._bin, but it will not be checked when storing into VARBINARY.... Comparisons (WHERE, ORDER BY, etc) will be the same; that is, simply compare the bits, don't do case folding or accent stripping, etc.
May be you need this
_ci in a collation name=case insensitive
If your searches on that field are always going to be case-sensitive, then declare the collation of the field as utf8_bin... that'll compare for equality the utf8-encoded bytes.
col_name varchar(10) collate utf8_bin
If searches are normally case-insensitive, but you want to make an exception for this search, try;
WHERE col_name = 'demandé' collate utf8_bin
More here
Try this
mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE t1
-> (c1 CHAR(1) CHARACTER SET UTF8 COLLATE utf8_general_ci);
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO t1 VALUES ('a'),('A'),('À'),('á');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> SELECT c1, HEX(c1), HEX(WEIGHT_STRING(c1)) FROM t1;
+------+---------+------------------------+
| c1 | HEX(c1) | HEX(WEIGHT_STRING(c1)) |
+------+---------+------------------------+
| a | 61 | 0041 |
| A | 41 | 0041 |
| À | C380 | 0041 |
| á | C3A1 | 0041 |
+------+---------+------------------------+
4 rows in set (0.00 sec)
There is a table with 2 records - u and ù:
CREATE TABLE `tbl` (`text` text NOT NULL) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `tbl` (`text`) VALUES ('u'), ('ù');
I want to select row with ù:
SELECT * FROM `tbl` WHERE `text` = 'ù';
The result is
+------+
| text |
+------+
| u |
| ù |
+------+
What is the problem here? How can I work with such characters?
This is to do with the collation used when mysql compares values. If you run the following query, you'll see which collation is in effect:
show collation where Charset = 'utf8';
One of those should have a Default value of yes. In my case it's utf8_general_ci. This collation uses Unicode ordering to equate characters with accents and those without.
If you run the following query:
SELECT * FROM `tbl` WHERE `text` = 'ù' collate utf8_bin;
Then you'll only get one row back. There's a lot more information in the MySQL documentation.
It's frustrated with MySQL's pattern escaping used in LIKE operator.
root#dev> create table foo(name varchar(255));
Query OK, 0 rows affected (0.02 sec)
root#dev> insert into foo values('with\\slash');
Query OK, 1 row affected (0.00 sec)
root#dev> insert into foo values('\\slash');
Query OK, 1 row affected (0.00 sec)
root#dev> select * from foo where name like '%\\\\%';
Empty set (0.01 sec)
root#dev> select * from foo;
+------------+
| name |
+------------+
| with\slash |
| \slash |
+------------+
2 rows in set (0.00 sec)
root#dev> select * from foo where name like '%\\\\%';
Empty set (0.00 sec)
root#dev> select * from foo where name like binary '%\\\\%';
+------------+
| name |
+------------+
| with\slash |
| \slash |
+------------+
2 rows in set (0.00 sec)
According to MySQL docs: http://dev.mysql.com/doc/refman/5.5/en/string-comparison-functions.html#operator_like
%\\\\% is the right operand, but why it yields no result?
EDIT:
The database I'm testing that in has character_set_database set to utf8. To further my investigation, I created the same setup in a database that has character_set_database set to latin1, and guess what, '%\\\\%' works!
EDIT:
The problem can be reproduced and it's the field collation problem. Details: http://bugs.mysql.com/bug.php?id=63829
In MySQL 5.6.10, with the text field collation utf8mb4_unicode_520_ci this can be achieved by using 5 backslash characters instead of 4, i.e:
select * from foo where name like binary '%\\\\\%';
Somehow, against all expectations, this properly finds all rows with backslashes.
At least this should work until the MySQL field collation bug above is fixed. Considering it's been more than 5 years since the bug is discovered, any app designed with this may outlive its usefulness before MySQL is even fixed - so should be a pretty reliable workaround.
With MySQL 5.0.12 dev on Windows 10 I got the following results when I changed the query from
SELECT * FROM `foo` WHERE `name` LIKE '%http:\/\/%'
to
SELECT * FROM `foo` WHERE `name` LIKE '%http:\\\\\\\%'
it works and yet the first string with forward slashes was the original field content. It seems to have interpreted forward slashes as backslashes.
It seems it has some relation to that MySQL bug: http://bugs.mysql.com/bug.php?id=46659
I think you connect to mysql not specifying correct --character-set-server option (which defaults to latin1 with collation latin1_swedish_ci), and having utf-8 as the current charset of the console. That causes incorrect char conversions and comparisons when you deal with data which supposed to be converted to the utf8 from the charset of --character-set-server.
How can I see what collation a table has? I.E. I want to see:
+-----------------------------+
| table | collation |
|-----------------------------|
| t_name | latin_general_ci |
+-----------------------------+
SHOW TABLE STATUS shows information about a table, including the collation.
For example SHOW TABLE STATUS where name like 'TABLE_NAME'
The above answer is great, but it doesn't actually provide an example that saves the user from having to look up the syntax:
show table status like 'test';
Where test is the table name.
(Corrected as per comments below.)
Checking the collation of a specific table
You can query INFORMATION_SCHEMA.TABLES and get the collation for a specific table:
SELECT TABLE_SCHEMA
, TABLE_NAME
, TABLE_COLLATION
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 't_name';
that gives a much more readable output in contrast to SHOW TABLE STATUS that contains a lot of irrelevant information.
Checking the collation of columns
Note that collation can also be applied to columns (which might have a different collation than the table itself). To fetch the columns' collation for a particular table, you can query INFORMATION_SCHEMA.COLUMNS:
SELECT TABLE_SCHEMA
, TABLE_NAME
, COLUMN_NAME
, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 't_name';
For more details you can refer to the article How to Check and Change the Collation of MySQL Tables
Use this query:
SHOW CREATE TABLE tablename
You will get all information related to table.
Check collation of the whole database
If someone is looking here also for a way to check collation on the whole database:
use mydatabase; (where mydatabase is the name of the database you're going to check)
SELECT ##character_set_database, ##collation_database;
You should see the result like:
+--------------------------+----------------------+
| ##character_set_database | ##collation_database |
+--------------------------+----------------------+
| utf8mb4 | utf8mb4_unicode_ci |
+--------------------------+----------------------+
1 row in set (0.00 sec)