Select characters like ù - mysql

There is a table with 2 records - u and ù:
CREATE TABLE `tbl` (`text` text NOT NULL) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `tbl` (`text`) VALUES ('u'), ('ù');
I want to select row with ù:
SELECT * FROM `tbl` WHERE `text` = 'ù';
The result is
+------+
| text |
+------+
| u |
| ù |
+------+
What is the problem here? How can I work with such characters?

This is to do with the collation used when mysql compares values. If you run the following query, you'll see which collation is in effect:
show collation where Charset = 'utf8';
One of those should have a Default value of yes. In my case it's utf8_general_ci. This collation uses Unicode ordering to equate characters with accents and those without.
If you run the following query:
SELECT * FROM `tbl` WHERE `text` = 'ù' collate utf8_bin;
Then you'll only get one row back. There's a lot more information in the MySQL documentation.

Related

Mariadb query utf8 escaped string

I am using 5.5.65-MariaDB MariaDB Server.
I have a table with a column of type medium text, named "remoteData", where I store a json string.
String values in this json string are stored as escaped utf8 sequences, for example
"patientFirstName":"\u0395\u039b\u0395\u03a5\u0398\u0395\u03a1\u0399\u039f\u03a3"
The above value is the Greek Name "ΕΛΕΥΘΕΡΙΟΣ".
I am trying to search this column using the query
Select * from sync_details where remoteData like "%ΛΕΥΘΕΡ%"
but I get an empty set.
I assume this is because of the values being escaped but I don't know what to do.
EDIT: The query will run through php so we can use a solution that includes php functions.
Thank you in advance.
Christoforos
With a database defined to use CHARACTER SET utf8and a utf8_general_ci collation it should just work like this:
CREATE DATABASE IF NOT EXISTS `test` CHARACTER SET utf8 COLLATE utf8_general_ci;
CREATE TABLE `test`.`sync_details` (`remoteData` MEDIUMTEXT);
INSERT INTO `test`.`sync_details` (`remoteData`) VALUES ('{"patientFirstName":"\\u0395\\u039b\\u0395\\u03a5\\u0398\\u0395\\u03a1\\u0399\\u039f\\u03a3"}');
SELECT `remoteData` FROM `test`.`sync_details` WHERE `remoteData` LIKE '%ΛΕΥΘΕΡ%';
+----------------------------------------------+
| remoteData |
+----------------------------------------------+
| {"patientFirstName": "ΕΛΕΥΘΕΡΙΟΣ"} |
+----------------------------------------------+
1 row in set (0,00 sec)
You could also try JSON_EXTRACT to get structured data from the stored JSON object. I just tested it like this:
SELECT JSON_EXTRACT(`remoteData`, "$.patientFirstName")
FROM `test`.`sync_details`
WHERE JSON_EXTRACT(`remoteData`, "$.patientFirstName")
LIKE '%ΛΕΥΘΕΡ%';
+--------------------------------------------------+
| JSON_EXTRACT(`remoteData`, "$.patientFirstName") |
+--------------------------------------------------+
| "ΕΛΕΥΘΕΡΙΟΣ" |
+--------------------------------------------------+
1 row in set (0,00 sec)
To index data in the JSON object you could add a "Generated Column" to your table using the GENERATED ALWAYS syntax
ALTER TABLE `test`.`sync_details` ADD COLUMN `firstName` VARCHAR(100) GENERATED ALWAYS AS (`remoteData` ->> '$.patientFirstName');
CREATE INDEX `firstnames_idx` ON `test`.`sync_details`(`firstName`);
SELECT `firstName` FROM `test`.`sync_details` WHERE `firstName` LIKE '%ΛΕΥΘΕΡ%';
+----------------------+
| firstName |
+----------------------+
| ΕΛΕΥΘΕΡΙΟΣ |
+----------------------+
1 row in set (0,00 sec)
This will only work with MariaDB >= 10.2 and with a utf8 encoded db and a utf8_general_ci collation.

Use accent senstive primary key in MySQL

Desired result :
Have an accent sensitive primary key in MySQL.
I have a table of unique words, so I use the word itself as a primary key (by the way if someone can give me an advice about it, I have no idea if it's a good design/practice or not).
I need that field to be accent (and why not case) sensitive, because it must distinguish between, for instance, 'demandé' and 'demande', two different inflexions of the French verb "demander". I do not have any problem to store accented words in the database. I just can't insert two accented characters strings that are identical when unaccented.
Error :
When trying to create the 'demandé' row with the following query:
INSERT INTO `corpus`.`token` (`name_token`) VALUES ('demandé');
I got this error :
ERROR 1062: 1062: Duplicate entry 'demandé' for key 'PRIMARY'
Questions :
Where in the process should a make a modification in order to have two different unique primary keys for "demande" and "demandé" in that table ?
SOLUTION using 'collate utf8_general_ci' in table declaration
How can i make accent sensitive queries ? Is the following the right way :
SELECT * FROM corpus.token WHERE name_token = 'demandé' COLLATE utf8_bin
SOLUTION using 'collate utf8_bin' with WHERE statement
I found that i can achieve this point by using the BINARY Keyword (see this sqlFiddle). What is the difference between collate and binary?
Can I preserve other tables from any changes ? (I'll have to rebuild that table anyway, because it's kind of messy)
I'm not very comfortable with encoding in MySQL. I don't have any problem yet with encoding in that database (and I'm kind of lucky because my data might not always use the same encoding... and there is not much I can do about it). I have a feeling that any modification regarding to that "accent sensitive" issue might create some encoding issue with other queries or data integrity. Am I right to be concerned?
Step by step :
Database creation :
CREATE DATABASE corpus DEFAULT CHARACTER SET utf8;
Table of unique words :
CREATE TABLE token (name_token VARCHAR(50), freq INTEGER, CONSTRAINT pk_token PRIMARY KEY (name_token))
Queries
SELECT * FROM corpus.token WHERE name_token = 'demande';
SELECT * FROM corpus.token WHERE name_token = 'demandé';
both returns the same row:
demande
Collations. You have two choices, not three:
utf8_bin treats all of these as different: demandé and demande and Demandé.
utf8_..._ci (typically utf8_general_ci or utf8_unicode_ci) treats all of these as the same: demandé and demande and Demandé.
If you want only case sensitivity (demandé = demande, but neither match Demandé), you are out of luck.
If you want only accent sensitivity (demandé = Demandé, but neither match demande), you are out of luck.
Declaration. The best way to do whatever you pick:
CREATE TABLE (
name VARCHAR(...) CHARACTER SET utf8 COLLATE utf8_... NOT NULL,
...
PRIMARY KEY(name)
)
Don't change collation on the fly. This won't use the index (that is, will be slow) if the collation is different in name:
WHERE name = ... COLLATE ...
BINARY. The datatypes BINARY, VARBINARY and BLOB are very much like CHAR, VARCHAR, and TEXT with COLLATE ..._bin. Perhaps the only difference is that text will be checked for valid utf8 storing in a VARCHAR ... COLLATE ..._bin, but it will not be checked when storing into VARBINARY.... Comparisons (WHERE, ORDER BY, etc) will be the same; that is, simply compare the bits, don't do case folding or accent stripping, etc.
May be you need this
_ci in a collation name=case insensitive
If your searches on that field are always going to be case-sensitive, then declare the collation of the field as utf8_bin... that'll compare for equality the utf8-encoded bytes.
col_name varchar(10) collate utf8_bin
If searches are normally case-insensitive, but you want to make an exception for this search, try;
WHERE col_name = 'demandé' collate utf8_bin
More here
Try this
mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE t1
-> (c1 CHAR(1) CHARACTER SET UTF8 COLLATE utf8_general_ci);
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO t1 VALUES ('a'),('A'),('À'),('á');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> SELECT c1, HEX(c1), HEX(WEIGHT_STRING(c1)) FROM t1;
+------+---------+------------------------+
| c1 | HEX(c1) | HEX(WEIGHT_STRING(c1)) |
+------+---------+------------------------+
| a | 61 | 0041 |
| A | 41 | 0041 |
| À | C380 | 0041 |
| á | C3A1 | 0041 |
+------+---------+------------------------+
4 rows in set (0.00 sec)

How do I insert chinese characters into MySQL from a sript?

I have the following script
set username utf8;
insert into tables values ('Active','活跃')
However after the script run, the inserted value for the chinese character is
活跃
What did I miss here ?
so that you have to change Collation to Chinese.
You can change that by here(below image)
CREATE TABLE big5 (BIG5 CHAR(1) CHARACTER SET BIG5);
mysql> INSERT INTO big5 VALUES (0xf9dc);
mysql> SELECT * FROM big5;
+------+
| big5 |
+------+
| 嫺 |
+------+
MySQL Big5 Chinese character set
Read this as well
In database,Also set collation of fields to utf8_general_ci. If you can set collation of databas and tables to utg8_general_ci as well, that will even be better.

How to search in mysql so that accented character is same as non-accented?

I'd like to have:
piščanec = piscanec in mysql. I mean, I'd like to search for piscanec to find piščanec also.
So the č and c would be same, š and s etc...
I know it can be done using regexp, but this is slow :-( Any other way with LIKE? I am also using full text searches a lot.
UPDATE:
select CONVERT('čšćžđ' USING ascii) as text
does not work. Produces: ?????
Declare the column with the collation utf8_generic_ci. This collation considers š equal to s and č equal to c:
create temporary table t (t varchar(100) collate utf8_general_ci);
insert into t set t = 'piščanec';
insert into t set t = 'piscanec';
select * from t where t='piscanec';
+------------+
| t |
+------------+
| piščanec |
| piscanec |
+------------+
If you don't want to or can't use the utf8_generic_ci collation for the column--maybe you have a unique index on the column and want to consider piščanec and piscanec distinct?--you can use collation in the query only:
create temporary table t (t varchar(100) collate utf8_bin);
insert into t set t = 'piščanec';
insert into t set t = 'piscanec';
select * from t where t='piscanec';
+------------+
| t |
+------------+
| piscanec |
+------------+
select * from t where t='piscanec' collate utf8_general_ci;
+------------+
| t |
+------------+
| piščanec |
| piscanec |
+------------+
The FULLTEXT index is supposed to use the column collation directly; you don't need to define a new collation. Apparently the fulltext index can only be in the column's storage collation, so if you want to use utf8_general_ci for searches and utf8_slovenian_ci for sorting, you have to use use collate in the order by:
select * from tab order by col collate utf8_slovenian_ci;
It's not straightforward, but you'll probably best off creating your own collation for your fulltrext searches. Here is an example:
http://dev.mysql.com/doc/refman/5.5/en/full-text-adding-collation.html
with more info here:
http://dev.mysql.com/doc/refman/5.5/en/adding-collation.html
That way, you have your collation logic completely independent of your SQL and business logic, and you're not having to do any heavy-lifting yourself with SQL-workarounds.
EDIT: since collations are used for all string-matching operations, this may not be the best way to go: you will end up obfuscating differences between characters that are linguistically discrete.
If you want to suppress these differences for specific operations, then you might consider writing a function that takes a string and replaces - in a targetted way - characters which, for the purposes of the current operation, are to be considered identical.
You could define one table holding your base characters (š, č etc.) and another holding the equivalences. Then run a REPLACE over your string.
Another way is just to CAST your string to ASCII, thereby suppressing all non-ASCII characters.
e.g.
SELECT CONVERT('<your text here>' USING ascii) as as_ascii

Like Case Sensitive in MySQL

I have a MySQL query:
SELECT concat_ws(title,description) as concatenated HAVING concatenated LIKE '%SearchTerm%';
And my table is encoded utf8_general_ci with MyISAM.
Searches seem to be case sensitive.
I can't figure out how to fix it. What's going wrong and/or how do I fix it?
A much better solution in terms of performance:
SELECT .... FROM .... WHERE `concatenated` LIKE BINARY '%SearchTerm%';
String comparision is case-sensitive when any of the operands is a binary string.
Another alternative is to use COLLATE,
SELECT ....
FROM ....
WHERE `concatenated` like '%SearchTerm%' COLLATE utf8_bin;
Try this:
SELECT LOWER(CONCAT_WS(title,description)) AS concatenated
WHERE concatenated LIKE '%searchterm%'
or (to let you see the difference)
SELECT LOWER(CONCAT_WS(title,description)) AS concatenated
WHERE concatenated LIKE LOWER('%SearchTerm%')
In this method, you do not have to select the searched field:
SELECT table.id
FROM table
WHERE LOWER(table.aTextField) LIKE LOWER('%SearchAnything%')
Check CHARSET mentioned in the table schema:
show create table xyz;
Based on CHARSET, you can try the following.
select name from xyz where name like '%Man%' COLLATE latin1_bin;
select name from xyz where name like '%Man%' COLLATE utf8_bin;
Following are the cases which worked for me, CHARSET=latin1, MySQL version = 5.6.
mysql> select installsrc from appuser where installsrc IS NOT NULL and installsrc like 'Promo%' collate latin1_bin limit 1;
+-----------------------+
| installsrc |
+-----------------------+
| PromoBalance_SMS,null |
+-----------------------+
1 row in set (0.01 sec)
mysql>
mysql> select installsrc from appuser where installsrc IS NOT NULL and installsrc like 'PROMO%' collate latin1_bin limit 1;
+---------------------------+
| installsrc |
+---------------------------+
| PROMO_SMS_MISSEDCALL,null |
+---------------------------+
1 row in set (0.00 sec)
mysql> select installsrc from appuser where installsrc IS NOT NULL and installsrc like 'PROMO%' limit 1;
+-----------------------+
| installsrc |
+-----------------------+
| PromoBalance_SMS,null |
+-----------------------+
1 row in set (0.01 sec)
Just for completion, in case it helps:
As stated on https://dev.mysql.com/doc/refman/5.7/en/case-sensitivity.html, for default character sets, nonbinary string comparisons are case insensitive by default.
Therefore, an easy way to perform case-insensitive comparisons is to cast the field to CHAR, VARCHAR or TEXT type.
Here is an example with a check against a single field:
SELECT * FROM table1 WHERE CAST(`field1` AS CHAR) LIKE '%needle%';
This problem is occurring in this case because of the collation used in the table. You have used utf8_general_ci as collation. If the collation is changed to utf8_general_ci then the searches will not be case sensitive.
So, one possible solution is to change the collation.
This is the working code:
SELECT title,description
FROM (
SELECT title,description, LOWER(CONCAT_WS(title,description)) AS concatenated
FROM table1
) AS Q
WHERE concatenated LIKE LOWER('%search%')
This works also:
SELECT LOWER(DisplayName) as DN
FROM Bidders
WHERE OrgID=45
HAVING DN like "cbbautos%"
LIMIT 10;