I am using 5.5.65-MariaDB MariaDB Server.
I have a table with a MEDIUMTEXT column named "remoteData", where I store a JSON string.
String values in this JSON string are stored as escaped Unicode sequences (\uXXXX), for example
"patientFirstName":"\u0395\u039b\u0395\u03a5\u0398\u0395\u03a1\u0399\u039f\u03a3"
The above value is the Greek Name "ΕΛΕΥΘΕΡΙΟΣ".
I am trying to search this column using the query
Select * from sync_details where remoteData like "%ΛΕΥΘΕΡ%"
but I get an empty set.
I assume this is because of the values being escaped but I don't know what to do.
EDIT: The query will run through php so we can use a solution that includes php functions.
Thank you in advance.
Christoforos
With a database defined to use CHARACTER SET utf8 and the utf8_general_ci collation, it should just work like this:
CREATE DATABASE IF NOT EXISTS `test` CHARACTER SET utf8 COLLATE utf8_general_ci;
CREATE TABLE `test`.`sync_details` (`remoteData` MEDIUMTEXT);
INSERT INTO `test`.`sync_details` (`remoteData`) VALUES ('{"patientFirstName":"\\u0395\\u039b\\u0395\\u03a5\\u0398\\u0395\\u03a1\\u0399\\u039f\\u03a3"}');
SELECT `remoteData` FROM `test`.`sync_details` WHERE `remoteData` LIKE '%ΛΕΥΘΕΡ%';
+----------------------------------------------+
| remoteData |
+----------------------------------------------+
| {"patientFirstName": "ΕΛΕΥΘΕΡΙΟΣ"} |
+----------------------------------------------+
1 row in set (0,00 sec)
You could also try JSON_EXTRACT to get structured data from the stored JSON object. I just tested it like this:
SELECT JSON_EXTRACT(`remoteData`, "$.patientFirstName")
FROM `test`.`sync_details`
WHERE JSON_EXTRACT(`remoteData`, "$.patientFirstName")
LIKE '%ΛΕΥΘΕΡ%';
+--------------------------------------------------+
| JSON_EXTRACT(`remoteData`, "$.patientFirstName") |
+--------------------------------------------------+
| "ΕΛΕΥΘΕΡΙΟΣ" |
+--------------------------------------------------+
1 row in set (0,00 sec)
To index data in the JSON object you could add a "Generated Column" to your table using the GENERATED ALWAYS syntax
ALTER TABLE `test`.`sync_details` ADD COLUMN `firstName` VARCHAR(100) GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(`remoteData`, '$.patientFirstName')));
CREATE INDEX `firstnames_idx` ON `test`.`sync_details`(`firstName`);
SELECT `firstName` FROM `test`.`sync_details` WHERE `firstName` LIKE '%ΛΕΥΘΕΡ%';
+----------------------+
| firstName |
+----------------------+
| ΕΛΕΥΘΕΡΙΟΣ |
+----------------------+
1 row in set (0,00 sec)
This only works with MariaDB 10.2 or later, and with a utf8-encoded database using the utf8_general_ci collation.
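If the column really holds the raw escaped text (as on the asker's 5.5 server, which has no JSON functions), another option is to escape the search term the same way in application code before running the LIKE. The sketch below uses Python rather than PHP purely for illustration; the helper names json_escape and like_pattern are hypothetical, and it assumes the stored escapes use lowercase hex digits, as in the example value from the question.

```python
import json


def json_escape(term: str) -> str:
    """Return term as it would appear inside an ASCII-only JSON string."""
    # json.dumps emits lowercase \uXXXX escapes for non-ASCII characters;
    # the slice drops the surrounding double quotes.
    return json.dumps(term, ensure_ascii=True)[1:-1]


def like_pattern(term: str) -> str:
    """Build a LIKE pattern matching the escaped form anywhere in the column."""
    escaped = json_escape(term)
    # Backslash is LIKE's escape character, so it is doubled first;
    # then the wildcards % and _ are neutralised.
    for ch in ("\\", "%", "_"):
        escaped = escaped.replace(ch, "\\" + ch)
    return "%" + escaped + "%"


# Example value copied from the question (raw string: backslashes are literal).
stored = r'{"patientFirstName":"\u0395\u039b\u0395\u03a5\u0398\u0395\u03a1\u0399\u039f\u03a3"}'
needle = json_escape("ΛΕΥΘΕΡ")
print(needle in stored)          # the escaped term occurs in the stored text
print(like_pattern("ΛΕΥΘΕΡ"))    # pattern to pass as a bound parameter
```

The pattern is meant to be passed as a bound parameter (for example WHERE remoteData LIKE ?), so that no further string-literal escaping is needed. If some rows were written with uppercase hex digits, the comparison would have to account for that as well.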
Related
I have a table with data already in it. I would like to change the character encoding for one of the columns. Currently the column seems to have two encodings. Even after changing it, I see the same results.
Current Encoding
mysql> SELECT character_set_name FROM information_schema.`COLUMNS`
-> WHERE table_name = "mytable"
-> AND column_name = "my_col";
+--------------------+
| character_set_name |
+--------------------+
| latin1 |
| utf8 |
+--------------------+
2 rows in set (0.02 sec)
Changing the encoding (0 rows are affected)
mysql> ALTER TABLE mytable MODIFY my_col LONGTEXT CHARACTER SET utf8;
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0
You probably get 2 rows because there are two tables with that name in two different databases.
Run SELECT * ... instead of SELECT character_set_name ... to see which row belongs to which database.
ALTER TABLE mytable MODIFY my_col LONGTEXT CHARACTER SET utf8; is safe only if there are no values in mytable.my_col yet.
A table declared to be latin1, and containing latin1 bytes can be converted to utf8 via
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8;
Desired result :
Have an accent sensitive primary key in MySQL.
I have a table of unique words, so I use the word itself as a primary key (by the way if someone can give me an advice about it, I have no idea if it's a good design/practice or not).
I need that field to be accent (and why not case) sensitive, because it must distinguish between, for instance, 'demandé' and 'demande', two different inflexions of the French verb "demander". I do not have any problem to store accented words in the database. I just can't insert two accented characters strings that are identical when unaccented.
Error :
When trying to create the 'demandé' row with the following query:
INSERT INTO `corpus`.`token` (`name_token`) VALUES ('demandé');
I got this error :
ERROR 1062: 1062: Duplicate entry 'demandé' for key 'PRIMARY'
Questions :
Where in the process should I make a modification in order to have two different unique primary keys for "demande" and "demandé" in that table?
SOLUTION using 'collate utf8_general_ci' in table declaration
How can I make accent-sensitive queries? Is the following the right way:
SELECT * FROM corpus.token WHERE name_token = 'demandé' COLLATE utf8_bin
SOLUTION using 'collate utf8_bin' with WHERE statement
I found that I can achieve this by using the BINARY keyword (see this sqlFiddle). What is the difference between COLLATE and BINARY?
Can I preserve other tables from any changes ? (I'll have to rebuild that table anyway, because it's kind of messy)
I'm not very comfortable with encoding in MySQL. I don't have any problem yet with encoding in that database (and I'm kind of lucky because my data might not always use the same encoding... and there is not much I can do about it). I have a feeling that any modification regarding to that "accent sensitive" issue might create some encoding issue with other queries or data integrity. Am I right to be concerned?
Step by step :
Database creation :
CREATE DATABASE corpus DEFAULT CHARACTER SET utf8;
Table of unique words :
CREATE TABLE token (name_token VARCHAR(50), freq INTEGER, CONSTRAINT pk_token PRIMARY KEY (name_token))
Queries
SELECT * FROM corpus.token WHERE name_token = 'demande';
SELECT * FROM corpus.token WHERE name_token = 'demandé';
both return the same row:
demande
Collations. You have two choices, not three:
utf8_bin treats all of these as different: demandé and demande and Demandé.
utf8_..._ci (typically utf8_general_ci or utf8_unicode_ci) treats all of these as the same: demandé and demande and Demandé.
If you want only case sensitivity (demandé = demande, but neither matches Demandé), you are out of luck.
If you want only accent sensitivity (demandé = Demandé, but neither matches demande), you are out of luck.
Declaration. The best way to do whatever you pick:
CREATE TABLE (
name VARCHAR(...) CHARACTER SET utf8 COLLATE utf8_... NOT NULL,
...
PRIMARY KEY(name)
)
Don't change collation on the fly. The following won't use the index (that is, it will be slow) if the collation in the query differs from the one declared for name:
WHERE name = ... COLLATE ...
BINARY. The datatypes BINARY, VARBINARY and BLOB are very much like CHAR, VARCHAR, and TEXT with COLLATE ..._bin. Perhaps the only difference is that text is checked for valid utf8 when stored into a VARCHAR ... COLLATE ..._bin column, but is not checked when stored into VARBINARY.... Comparisons (WHERE, ORDER BY, etc) will be the same; that is, they simply compare the bits and don't do case folding, accent stripping, etc.
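To make the two choices concrete, here is a rough Python sketch of the difference between comparing the bits and comparing after case folding and accent stripping. It is only an approximation for illustration: the fold function is hypothetical and uses Unicode decomposition, not MySQL's actual collation weight tables.

```python
import unicodedata


def fold(text: str) -> str:
    """Approximate a _ci collation: strip accents, then fold case."""
    decomposed = unicodedata.normalize("NFD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.casefold()


words = ["demandé", "demande", "Demandé"]

# "_bin"-style comparison: the encoded bytes are all different.
distinct_bytes = {w.encode("utf-8") for w in words}
print(len(distinct_bytes))

# "_ci"-style comparison: all three collapse to the same key.
distinct_folded = {fold(w) for w in words}
print(len(distinct_folded))
```

With a primary key, the first behaviour allows all three words as separate rows, while the second rejects the later ones as duplicates, which is the error reported in the question.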
Maybe you need this:
_ci in a collation name means case-insensitive.
If your searches on that field are always going to be case-sensitive, then declare the collation of the field as utf8_bin... that'll compare for equality the utf8-encoded bytes.
col_name varchar(10) collate utf8_bin
If searches are normally case-insensitive, but you want to make an exception for this search, try;
WHERE col_name = 'demandé' collate utf8_bin
Try this
mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE t1
-> (c1 CHAR(1) CHARACTER SET UTF8 COLLATE utf8_general_ci);
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO t1 VALUES ('a'),('A'),('À'),('á');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> SELECT c1, HEX(c1), HEX(WEIGHT_STRING(c1)) FROM t1;
+------+---------+------------------------+
| c1 | HEX(c1) | HEX(WEIGHT_STRING(c1)) |
+------+---------+------------------------+
| a | 61 | 0041 |
| A | 41 | 0041 |
| À | C380 | 0041 |
| á | C3A1 | 0041 |
+------+---------+------------------------+
4 rows in set (0.00 sec)
I have the following script
set username utf8;
insert into tables values ('Active','活跃')
However after the script run, the inserted value for the chinese character is
活跃
What did I miss here ?
You have to change the collation to a Chinese one.
CREATE TABLE big5 (BIG5 CHAR(1) CHARACTER SET BIG5);
mysql> INSERT INTO big5 VALUES (0xf9dc);
mysql> SELECT * FROM big5;
+------+
| big5 |
+------+
| 嫺 |
+------+
MySQL Big5 Chinese character set
In the database, also set the collation of the fields to utf8_general_ci. If you can set the collation of the database and tables to utf8_general_ci as well, that is even better.
I'm frustrated with MySQL's pattern escaping in the LIKE operator.
root#dev> create table foo(name varchar(255));
Query OK, 0 rows affected (0.02 sec)
root#dev> insert into foo values('with\\slash');
Query OK, 1 row affected (0.00 sec)
root#dev> insert into foo values('\\slash');
Query OK, 1 row affected (0.00 sec)
root#dev> select * from foo where name like '%\\\\%';
Empty set (0.01 sec)
root#dev> select * from foo;
+------------+
| name |
+------------+
| with\slash |
| \slash |
+------------+
2 rows in set (0.00 sec)
root#dev> select * from foo where name like '%\\\\%';
Empty set (0.00 sec)
root#dev> select * from foo where name like binary '%\\\\%';
+------------+
| name |
+------------+
| with\slash |
| \slash |
+------------+
2 rows in set (0.00 sec)
According to MySQL docs: http://dev.mysql.com/doc/refman/5.5/en/string-comparison-functions.html#operator_like
%\\\\% is the right operand, so why does it yield no result?
EDIT:
The database I'm testing that in has character_set_database set to utf8. To further my investigation, I created the same setup in a database that has character_set_database set to latin1, and guess what, '%\\\\%' works!
EDIT:
The problem can be reproduced and it's the field collation problem. Details: http://bugs.mysql.com/bug.php?id=63829
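For reference, the reason four backslashes are the documented form is that the pattern is unescaped twice: once by the SQL string-literal parser and once by LIKE itself. The Python sketch below models only that documented default behaviour (backslash escapes enabled); it does not model the collation-dependent bug, and both function names are hypothetical.

```python
import re


def parse_string_literal(body: str) -> str:
    """Layer 1: what the SQL parser makes of the text between the quotes."""
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # \% and \_ keep their backslash so LIKE can handle them later.
            # Simplification: escapes such as \n or \t are not modelled.
            out.append("\\" + nxt if nxt in "%_" else nxt)
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def like_to_regex(pattern: str) -> re.Pattern:
    """Layer 2: LIKE treats backslash as its own escape character."""
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
        elif ch == "%":
            out.append(".*")
            i += 1
        elif ch == "_":
            out.append(".")
            i += 1
        else:
            out.append(re.escape(ch))
            i += 1
    return re.compile("".join(out), re.DOTALL)


row = "with\\slash"  # the stored value: with\slash

four = like_to_regex(parse_string_literal(r"%\\\\%"))
two = like_to_regex(parse_string_literal(r"%\\%"))
print(bool(four.fullmatch(row)))  # four typed backslashes -> one literal backslash
print(bool(two.fullmatch(row)))   # two typed backslashes -> an escaped literal %
```

With two typed backslashes, the single backslash that survives the first layer ends up escaping the trailing %, so the pattern looks for a literal percent sign instead of a backslash.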
In MySQL 5.6.10, with the text field collation utf8mb4_unicode_520_ci, this can be achieved by using 5 backslash characters instead of 4, i.e.:
select * from foo where name like binary '%\\\\\%';
Somehow, against all expectations, this properly finds all rows with backslashes.
At least this should work until the MySQL field collation bug above is fixed. Considering it has been more than 5 years since the bug was discovered, any app designed around this may outlive its usefulness before MySQL is fixed, so it should be a pretty reliable workaround.
With MySQL 5.0.12 dev on Windows 10, I got the following results. When I changed the query from
SELECT * FROM `foo` WHERE `name` LIKE '%http:\/\/%'
to
SELECT * FROM `foo` WHERE `name` LIKE '%http:\\\\\\\%'
it works, even though the first string, with forward slashes, was the original field content. It seems to have interpreted forward slashes as backslashes.
It seems it has some relation to that MySQL bug: http://bugs.mysql.com/bug.php?id=46659
I think you are connecting to MySQL without specifying the correct --character-set-server option (which defaults to latin1 with the latin1_swedish_ci collation), while the console's current charset is utf-8. That causes incorrect character conversions and comparisons when you work with data that is supposed to be converted to utf8 from the --character-set-server charset.
I have a MySQL query:
SELECT concat_ws(title,description) as concatenated HAVING concatenated LIKE '%SearchTerm%';
And my table is encoded utf8_general_ci with MyISAM.
Searches seem to be case sensitive.
I can't figure out how to fix it. What's going wrong and/or how do I fix it?
A much better solution in terms of performance:
SELECT .... FROM .... WHERE `concatenated` LIKE BINARY '%SearchTerm%';
String comparison is case-sensitive when any of the operands is a binary string.
Another alternative is to use COLLATE,
SELECT ....
FROM ....
WHERE `concatenated` like '%SearchTerm%' COLLATE utf8_bin;
Try this:
SELECT LOWER(CONCAT_WS(title,description)) AS concatenated
WHERE concatenated LIKE '%searchterm%'
or (to let you see the difference)
SELECT LOWER(CONCAT_WS(title,description)) AS concatenated
WHERE concatenated LIKE LOWER('%SearchTerm%')
In this method, you do not have to select the searched field:
SELECT table.id
FROM table
WHERE LOWER(table.aTextField) LIKE LOWER('%SearchAnything%')
Check CHARSET mentioned in the table schema:
show create table xyz;
Based on CHARSET, you can try the following.
select name from xyz where name like '%Man%' COLLATE latin1_bin;
select name from xyz where name like '%Man%' COLLATE utf8_bin;
Following are the cases which worked for me, CHARSET=latin1, MySQL version = 5.6.
mysql> select installsrc from appuser where installsrc IS NOT NULL and installsrc like 'Promo%' collate latin1_bin limit 1;
+-----------------------+
| installsrc |
+-----------------------+
| PromoBalance_SMS,null |
+-----------------------+
1 row in set (0.01 sec)
mysql>
mysql> select installsrc from appuser where installsrc IS NOT NULL and installsrc like 'PROMO%' collate latin1_bin limit 1;
+---------------------------+
| installsrc |
+---------------------------+
| PROMO_SMS_MISSEDCALL,null |
+---------------------------+
1 row in set (0.00 sec)
mysql> select installsrc from appuser where installsrc IS NOT NULL and installsrc like 'PROMO%' limit 1;
+-----------------------+
| installsrc |
+-----------------------+
| PromoBalance_SMS,null |
+-----------------------+
1 row in set (0.01 sec)
Just for completion, in case it helps:
As stated on https://dev.mysql.com/doc/refman/5.7/en/case-sensitivity.html, for default character sets, nonbinary string comparisons are case insensitive by default.
Therefore, an easy way to perform case-insensitive comparisons is to cast the field to CHAR, VARCHAR or TEXT type.
Here is an example with a check against a single field:
SELECT * FROM table1 WHERE CAST(`field1` AS CHAR) LIKE '%needle%';
Whether this search is case sensitive depends on the collation in effect for the comparison. You say the table uses utf8_general_ci; when a _ci collation like that is actually in effect on the columns being compared, the searches will not be case sensitive.
So, one possible solution is to change the collation.
This is the working code:
SELECT title,description
FROM (
SELECT title,description, LOWER(CONCAT_WS(title,description)) AS concatenated
FROM table1
) AS Q
WHERE concatenated LIKE LOWER('%search%')
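One detail worth checking in the queries in this thread: the first argument of CONCAT_WS is the separator, so CONCAT_WS(title,description) uses the title as a separator and, with only one value after it, returns just the description. The Python sketch below mimics that behaviour under the usual rules (NULL values are skipped, a NULL separator yields NULL); the concat_ws function and the sample values are hypothetical stand-ins.

```python
from typing import Optional


def concat_ws(separator: Optional[str], *values: Optional[str]) -> Optional[str]:
    """Mimic SQL CONCAT_WS: join the non-NULL values with the separator."""
    if separator is None:
        return None
    return separator.join(v for v in values if v is not None)


title, description = "Blue Widget", "A small widget"

# Title passed as the separator: it never appears in the result.
as_written = concat_ws(title, description)
print(as_written)

# Explicit separator: both columns are searchable.
with_separator = concat_ws(" ", title, description)
print(with_separator)
```

If the intent is to search both columns, passing an explicit separator such as a space as the first argument is probably what is wanted.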
This works also:
SELECT LOWER(DisplayName) as DN
FROM Bidders
WHERE OrgID=45
HAVING DN like "cbbautos%"
LIMIT 10;