MySQL Returning 'ずる' When Searching for 'する' (Japanese)

I have a database with Japanese words. I'm stumped because this query:
SELECT japanese
FROM my_table
where japanese = 'する'
returns two results:
ずる
する
I tried to look in the documentation but can't figure out what's going on or how to correct it. Here's some of the information about my setup using the queries recommended in the documentation:
SELECT CHARACTER_SET_NAME, DESCRIPTION
FROM INFORMATION_SCHEMA.CHARACTER_SETS
WHERE DESCRIPTION LIKE '%Japanese%'
ORDER BY CHARACTER_SET_NAME;
Returns:
'CHARACTER_SET_NAME','DESCRIPTION'
'cp932', 'SJIS for Windows Japanese'
'eucjpms', 'UJIS for Windows Japanese'
'sjis', 'Shift-JIS Japanese'
'ujis', 'EUC-JP Japanese'
And the following query:
SHOW VARIABLES LIKE 'char%'
Returns:
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database utf8mb4
character_set_filesystem binary
character_set_results utf8mb4
character_set_server utf8mb4
character_set_system utf8
And this really gets beyond my skill set. If someone can point me in the right direction that would be a big help.
Thank you.

You should use an accent-sensitive collation:
SELECT japanese
FROM my_table
where japanese = 'する' COLLATE utf8mb4_ja_0900_as_cs;
-- alternatively binary collation: COLLATE utf8mb4_bin
Introducing as_cs collations:
In MySQL 8.0.0 we improved our character set support with the addition of new accent and case insensitive (ai_ci) collations. In MySQL 8.0.1 the corresponding accent and case sensitive collations (as_cs) have also been added, as well as a Japanese collation
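To see the effect in isolation, the two strings can be compared directly under different collations. This is a minimal sketch assuming MySQL 8.0.1 or later and a utf8mb4 connection (as in the question's SHOW VARIABLES output). The VARCHAR(255) column type in the ALTER TABLE statement is an assumption, so substitute the column's real definition.

```sql
-- Which collation does the column currently use?
SHOW FULL COLUMNS FROM my_table LIKE 'japanese';

-- Compare the literals under three collations (1 = equal, 0 = not equal).
SELECT
  'する' = 'ずる' COLLATE utf8mb4_0900_ai_ci    AS accent_insensitive,
  'する' = 'ずる' COLLATE utf8mb4_ja_0900_as_cs AS accent_sensitive,
  'する' = 'ずる' COLLATE utf8mb4_bin           AS binary_compare;

-- To make every comparison on the column accent sensitive,
-- change the column's collation. VARCHAR(255) is assumed here.
ALTER TABLE my_table
  MODIFY japanese VARCHAR(255)
  CHARACTER SET utf8mb4 COLLATE utf8mb4_ja_0900_as_cs;
```

Changing the column's collation avoids repeating COLLATE in every query, and it typically lets an index on the column be used for the comparison.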

Related

Charset and collation priority in MySQL

In MySQL I can configure the charset and collation at the server, database, and table levels.
Do these levels take priority in that order, from the least to the most specific?
I couldn't find this in the docs.
e.g.:
SERVER level
SHOW VARIABLES LIKE "char%";
RESULTS IN:
character_set_client utf8
character_set_connection utf8
character_set_database latin1
character_set_filesystem binary
character_set_results utf8
character_set_server latin1
character_set_system utf8
DB level
SELECT * FROM information_schema.SCHEMATA;
RESULTS IN:
name character_set_name collation_name
my_database latin1 latin1_swedish_ci
TABLE level
SELECT T.table_name, CCSA.character_set_name
FROM information_schema.`TABLES` T,
information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA
WHERE CCSA.collation_name = T.table_collation
AND T.table_schema = "my_database";
RESULTS IN:
table_name character_set_name
my_table1 latin1
my_table2 utf8

The Database setting is the default when creating a table.
The Table setting is the default when creating a column.
The Column setting is the only thing that matters.
Well, that is not completely true. You need to specify what encoding (CHARACTER SET) and collation is used by the client. That's where character_set_client/connection/results comes into play. Those 3 are normally set the same as each other, but they do not need to match the column's CHARACTER SET.
If the column does not match those settings, MySQL will transcode the bytes on the fly as they go between client and server. Note that this lets you have different charsets for different columns in the same table.
The previous paragraph says nothing about the Table and Database settings -- because they are irrelevant. (Because they are only defaults.) Once a table has been CREATEd, each column's charset has been 'set in stone'.
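The cascade of defaults described above can be sketched as follows. The names demo_db and t are hypothetical and exist only for illustration. The final query shows the per-column values, which are the ones that actually apply to stored data.

```sql
-- Hypothetical names (demo_db, t) used only for illustration.
CREATE DATABASE demo_db CHARACTER SET latin1 COLLATE latin1_swedish_ci;

-- The table default overrides the database default;
-- column b overrides the table default.
CREATE TABLE demo_db.t (
  a VARCHAR(20),
  b VARCHAR(20) CHARACTER SET latin1 COLLATE latin1_general_ci
) DEFAULT CHARACTER SET utf8mb4;

-- Column a inherits the table default; column b keeps its explicit setting.
SELECT column_name, character_set_name, collation_name
FROM information_schema.COLUMNS
WHERE table_schema = 'demo_db' AND table_name = 't';
```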

You'll find the details of the different levels (server, database, table and column) in section 10.3, "Specifying Character Sets and Collations", of the manual and the pages that follow it. For the database, table and column levels, it states that the character set and collation of the previous level are used if you don't set them explicitly, which matches your notion of least to most specific.

Search result is case sensitive for German umlauts with utf8_general_ci collation and utf8 character-set-server

In MySQL 5.5, search results are case sensitive for German umlauts with the utf8_general_ci collation and utf8 as character-set-server. I have two values in my database table, say öder and Öder. When I search using the keyword 'Öder', only Öder is retrieved; the value starting with ö, i.e. 'öder', is not. If I change character-set-server to latin1 in the MySQL server's my.cnf file and the collation to latin1_general_ci (I made the same charset and collation change for the corresponding database and table), it works fine, i.e. both values are retrieved. But is there a way to achieve case-insensitive search for German umlauts using utf8 as the character-set-server? I'm facing this issue for other words starting with or containing other German umlauts, such as ü, Ü, ä, Ä, as well.

Use the mysql commandline tool to test things:
mysql> SELECT 'Ö' = 'ö' COLLATE utf8_general_ci;
+-------------------------------------+
| 'Ö' = 'ö' COLLATE utf8_general_ci |
+-------------------------------------+
| 1 |
+-------------------------------------+
1 means true; 0 means false.
Any of the utf8_..._ci collations will treat a bunch of accented / not-accented, uppercase / lowercase characters as equal. For example, utf8mb4_german2_ci says these are equal: O=o=º=Ò=Ó=Ô=Õ=ò=ó=ô=õ and sort before oe=Ö=ö=Œ=œ.
Here is a rundown on where most letters stand in most utf8 collations.
For most usage, _general_ci is not as good as _unicode_ci, which is not as good as _unicode_520_ci, which will be superseded by a Unicode 9.0 standard in MySQL 8.0.
I think german2 is aimed at "phone book" collation. But maybe not. In 8.0, utf8mb4_german2_ci collates D=d=Ď=ď < Dž=dz=dž < Ð=ð. But utf8mb4_de_pb_0900_ai_ci has D=d=Ð=ð=Ď=ď < Dž=dz=dž. More details here.
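Since the literal comparison above returns 1, one thing worth checking is whether the column itself really uses a _ci collation (rather than, say, utf8_bin). The sketch below uses a hypothetical table my_words with a column word, because the question does not name them.

```sql
-- Hypothetical table and column names.
-- The Collation field shows what the column actually uses.
SHOW FULL COLUMNS FROM my_words LIKE 'word';

-- Force a case-insensitive comparison regardless of the column's own collation
-- (valid only if the column's character set is utf8).
SELECT word
FROM my_words
WHERE word COLLATE utf8_general_ci = 'Öder';
```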

unicode strings: difference between Habo and Håbo

I'm working with Swedish geography data.
In Sweden there are two different places: Habo and Håbo.
If I run a query like SELECT * FROM g2_se_raw_zip WHERE province EQUALS 'Håbo' or SELECT * FROM g2_se_raw_zip WHERE province='Håbo', it returns Habo too.
I have the same issue with GROUP BY and other queries.
Why does it work like this, and how can I fix it?
Additional info:
character_set_client utf8,
character_set_connection utf8,
character_set_database utf8,
character_set_filesystem binary,
character_set_results utf8,
character_set_server utf8,
character_set_system utf8,

This is a collation issue. The Swedish alphabet treats å (a with ring) and a as distinct letters, whereas the international collation treats them as variants of the same letter.
This query should do the trick for you (MySQL has no EQUALS operator, so use =):
SELECT *
FROM g2_se_raw_zip
WHERE province COLLATE utf8_swedish_ci = 'Håbo'
You may wish to change the collation setting of columns in your database containing Swedish place names to the Swedish collation for the sake of index performance. But, if you're developing a pan-European application you may prefer to ask users to tell you their own national language in their user profiles so you can search in a way that meets their expectations.
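Changing the column's collation could look like the sketch below. VARCHAR(100) is an assumed type, since the question does not show the table definition; keep the column's real type and length.

```sql
-- VARCHAR(100) is an assumed type; keep the column's real definition.
ALTER TABLE g2_se_raw_zip
  MODIFY province VARCHAR(100)
  CHARACTER SET utf8 COLLATE utf8_swedish_ci;

-- After the change, plain comparisons and GROUP BY follow Swedish rules
-- without an explicit COLLATE clause.
SELECT province, COUNT(*) AS row_count
FROM g2_se_raw_zip
GROUP BY province;
```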

MySQL view - Illegal mix of collations

I'll be very clear: how can I create views in MySQL without getting the damned Illegal mix of collations error?
My SQL code looks like this (it contains some Portuguese words), and my database default collation is latin1_swedish_ci:
CREATE VIEW v_veiculos AS
SELECT
v.id,
v.marca_id,
v.modelo,
v.placa,
v.cor,
CASE v.combustivel
WHEN 'A' THEN 'Álcool'
WHEN 'O' THEN 'Óleo Diesel'
WHEN 'G' THEN 'Gasolina'
ELSE 'Não Informado'
END AS combustivel,
marcas.marca,
/* I think the CONCAT and COALESCE below cause this error; when I remove the next line, the view works fine */
CONCAT(marca, ' ', v.modelo, ' - Placa: ', v.placa, ' - Combustível: ', COALESCE(v.combustivel, 'Não informado')) AS info_completa
FROM veiculos v
LEFT JOIN
marcas on(marcas.id = v.marca_id);
I think the error is caused by my use of COALESCE and/or CONCAT, as the full error description says: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'coalesce'

You may also use CAST() to convert a string to a different character set. The syntax is:
CAST(character_string AS character_data_type CHARACTER SET charset_name)
eg:
SELECT CAST(_latin1'test' AS CHAR CHARACTER SET utf8);
Alternatively, use CONVERT(expr USING transcoding_name).
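Applied to the view in the question, one hedged option is to convert the column to the connection's character set before it meets the literal, so that both COALESCE arguments share one character set. This sketch assumes the connection uses utf8 (as the error message suggests) and has not been tested against the asker's schema. Other expressions in the view, such as the CONCAT, may need the same treatment.

```sql
-- Convert the latin1 column to utf8 so it matches the utf8 literal.
-- combustivel_txt is a hypothetical alias used only for this sketch.
SELECT
  COALESCE(CONVERT(v.combustivel USING utf8), 'Não informado') AS combustivel_txt
FROM veiculos v;
```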

This is kind of old, but here goes.
I had the same error.
As far as I know, views do not have a collation; tables do.
So if you get the "illegal mix..." error, it is because your view is linking (comparing, or whatever) two tables with different collations.
If you create a table, you can specify the collation, for instance:
CREATE TABLE IF NOT EXISTS `vwHotelCode_Terminal` (
`HOTELCODE` varchar(8)
,`TERMINALCODE` varchar(5)
,`DISTKM` varchar(6)
,`DISTMIN` varchar(3)
,`TERMINALNAME` varchar(50)
)ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_spanish_ci ;
But if you don't, the default collation is applied. For me the default collation is utf8_unicode_ci, so my tables were created with it, and I ended up with some tables using utf8_spanish_ci and the ones where I did not specify a collation using utf8_unicode_ci.
If you are exporting from one server to another and the default collation is different, you are probably going to get the "illegal mix" message.
If you have views, phpMyAdmin likes to create tables for all the views first and then the views. Those tables are created without a collation, so they take the default one. As a result, the view often ends up using different collations when it is created.

That is actually a bug in MySQL.
Maybe you can update to the latest version of MySQL?

After searching around for a while and taking information from this answer, I found a hack that could be useful.
Simply check the character set variables of your server with the command below:
SHOW VARIABLES LIKE "char%";
You'll see something like this:
mysql> SHOW VARIABLES LIKE "char%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 | <--
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
I changed character_set_server, the server's default character set, with the command below. (The arrow above marks character_set_system, but that variable is read-only and cannot be changed with SET.) Then I copied the CREATE code of the view and created a new view, and that's all.
The new view then uses the new default character set, which resolves the issue.
Use the command below to set the default character set:
SET character_set_server = 'latin2';
This worked in my case.
NOTE: Alternatively you can change the character set of that view. That would also do the trick but I wasn't able to find the solution so I used this hack.
REFERENCE: Read more on Illegal Collation Mix on MariaDB.
A CITATION FROM Illegal Collation Mix on MariaDB:
If you encounter this issue, set the character set in the view to force it to the value you want.
Read more about Collation and Character Sets here.
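MySQL records the session's character_set_client and collation_connection with each view when the view is created, so it can help to inspect those values before recreating a view. The following is a minimal sketch for doing that. Whether recreating the view under different session settings resolves the error depends on the tables involved.

```sql
-- Character set and collation recorded when each view was created.
SELECT table_name, character_set_client, collation_connection
FROM information_schema.VIEWS
WHERE table_schema = DATABASE();

-- The session setting that a newly created view would record.
SHOW VARIABLES LIKE 'collation_connection';
```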

character problem in sql

select * from table where key='çmyk'
When I run this query on a table that has a row whose value is 'cmyk', the query returns that row, even though the values are different: searching for 'çmyk' returns 'cmyk'.
What can I do?
MySQL charset: UTF-8 Unicode (utf8)
MySQL connection collation: utf8_unicode_ci
table collation: latin1_swedish_ci

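The problem is that the latin1_swedish_ci collation is not only case insensitive but also insensitive to some accents and diacritics, so, as your query shows, the following applies:
ç = c
Switching to a case-sensitive collation in the WHERE clause should work. Apply it to the column (the literal arrives in the connection's utf8 character set, which a latin1 collation cannot be attached to), and quote table and key with backticks if those are the real names, since both are reserved words:
select * from `table` where `key` collate latin1_general_cs = 'çmyk';
with the caveat that this is not good for performance.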
mySQL Reference: 9.1.7.8. Examples of the Effect of Collation
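To check how a given latin1 collation treats the two characters without touching the table, the literals can be converted to latin1 and compared directly. This is a sketch assuming a utf8 connection, as stated in the question. Each result column is 1 if the collation considers ç and c equal and 0 otherwise.

```sql
-- Convert the utf8 literals to latin1, then compare under each collation.
SELECT
  CONVERT('ç' USING latin1) COLLATE latin1_swedish_ci = CONVERT('c' USING latin1) AS swedish_ci,
  CONVERT('ç' USING latin1) COLLATE latin1_general_cs = CONVERT('c' USING latin1) AS general_cs,
  CONVERT('ç' USING latin1) COLLATE latin1_bin        = CONVERT('c' USING latin1) AS bin_compare;
```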

Try running the command SET NAMES latin1; and then running your query.
