Charset and collation priority in MySQL

In MySQL I can configure the charset and collation at the Server, Database and Table levels.
Is this the order of priority, from the least to the most specific?
I didn't manage to find it in the docs.
e.g.:
SERVER level
SHOW VARIABLES LIKE "char%";
RESULTS IN:
character_set_client utf8
character_set_connection utf8
character_set_database latin1
character_set_filesystem binary
character_set_results utf8
character_set_server latin1
character_set_system utf8
DB level
SELECT * FROM information_schema.SCHEMATA;
RESULTS IN:
name character_set_name collation_name
my_database latin1 latin1_swedish_ci
TABLE level
SELECT T.table_name, CCSA.character_set_name
FROM information_schema.`TABLES` T,
information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA
WHERE CCSA.collation_name = T.table_collation
AND T.table_schema = "my_database";
RESULTS IN:
table_name character_set_name
my_table1 latin1
my_table2 utf8

The Database setting is the default when creating a table.
The Table setting is the default when creating a column.
The Column setting is the only thing that matters.
Well, that is not completely true. You need to specify what encoding (CHARACTER SET) and collation is used by the client. That's where character_set_client/connection/results comes into play. Those 3 are normally set the same as each other, but they do not need to match the column's CHARACTER SET.
If the column does not match those settings, MySQL will transcode the bytes on the fly as they go between client and server. Note that this lets you have different charsets for different columns in the same table.
The previous paragraph says nothing about the Table and Database settings -- because they are irrelevant. (Because they are only defaults.) Once a table has been CREATEd, each column's charset has been 'set in stone'.
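To illustrate how the defaults cascade, here is a minimal sketch. The names demo_db and demo_tbl are hypothetical and not from the question, and the collation reported for column a will depend on the server version.

```sql
-- Hypothetical database: its charset is only a default for new tables.
CREATE DATABASE demo_db CHARACTER SET latin1 COLLATE latin1_swedish_ci;

CREATE TABLE demo_db.demo_tbl (
    a VARCHAR(20),                      -- no charset given: inherits the table default
    b VARCHAR(20) CHARACTER SET latin1  -- explicit column setting overrides the table default
) DEFAULT CHARACTER SET utf8mb4;        -- table default overrides the database default

-- Shows the charset and collation each column actually ended up with.
SELECT column_name, character_set_name, collation_name
FROM information_schema.COLUMNS
WHERE table_schema = 'demo_db'
  AND table_name = 'demo_tbl';
```

Changing the database or table default afterwards affects only columns created later; the existing columns keep what they were created with.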

You'll find the details of the different levels (server, database, table and column) in this section of the manual, 10.3 Specifying Character Sets and Collations, and subsequent pages. For the database, table and column levels it states that the character set and collation from the previous level are used if you don't explicitly set them, which matches your notion of going from the least to the most specific.

Related

MySQL Returning 'ずる' When Searching for 'する' (Japanese)

I have a database with Japanese words. I'm stumped because this query:
SELECT japanese
FROM my_table
where japanese = 'する'
returns two results:
ずる
する
I tried to look in the documentation but can't figure out what's going on or how to correct it. Here's some of the information about my setup using the queries recommended in the documentation:
SELECT CHARACTER_SET_NAME, DESCRIPTION
FROM INFORMATION_SCHEMA.CHARACTER_SETS
WHERE DESCRIPTION LIKE '%Japanese%'
ORDER BY CHARACTER_SET_NAME;
Returns:
'CHARACTER_SET_NAME','DESCRIPTION'
'cp932', 'SJIS for Windows Japanese'
'eucjpms', 'UJIS for Windows Japanese'
'sjis', 'Shift-JIS Japanese'
'ujis', 'EUC-JP Japanese'
And the following query:
SHOW VARIABLES LIKE 'char%'
Returns:
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database utf8mb4
character_set_filesystem binary
character_set_results utf8mb4
character_set_server utf8mb4
character_set_system utf8
And this really gets beyond my skill set. If someone can point me in the right direction that would be a big help.
Thank you.
You should use an accent-sensitive collation:
SELECT japanese
FROM my_table
where japanese = 'する' COLLATE utf8mb4_ja_0900_as_cs;
-- alternatively binary collation: COLLATE utf8mb4_bin
Introducing as_cs collations:
In MySQL 8.0.0 we improved our character set support with the addition of new accent and case insensitive (ai_ci) collations. In MySQL 8.0.1 the corresponding accent and case sensitive collations (as_cs) have also been added, as well as a Japanese collation
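A quick way to see the effect without touching the table is to compare the two literals directly under different collations. This is only a sketch: it assumes MySQL 8.0.1 or later and a utf8mb4 connection, and the column aliases are made up for illustration.

```sql
SET NAMES utf8mb4;

-- COLLATE binds tighter than =, so each comparison uses the collation named next to it.
SELECT
    'する' = 'ずる' COLLATE utf8mb4_0900_ai_ci    AS accent_insensitive,
    'する' = 'ずる' COLLATE utf8mb4_ja_0900_as_cs AS accent_sensitive,
    'する' = 'ずる' COLLATE utf8mb4_bin           AS binary_compare;
```

If the diagnosis above is right, the first comparison should report a match (which is what the question observed) and the other two should not.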

Can I convert user input language to default collation of database?

I want to search user input in my database. database collation is latin1_swedish_ci. I don't want to change that, instead can I change user input utf-8 to latin1_swedish_ci?
Edit:
I tried two methods.
Method 1: I imported and used default collation latin1_swedish_ci and character set latin1. Then I have
Here I can query like SELECT * FROM dict WHERE english_word = '$_value' and I get all the values of the columns, including malayalam_definition, in the browser as desired. But the problem is I can't query like SELECT * FROM dict WHERE malayalam_definition = '$_value'. It returns no result.
Method 2: I changed collation to utf8_unicode_ci and character set to utf8. Then in mysql I get desired values like
Here, when I query like SELECT * FROM dict WHERE english_word = '$_value' in the browser, I get question marks in malayalam_definition values like
Result of SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
+--------------------------+--------+
7 rows in set (0.00 sec)
Do I need to change character_set_server? If so, how do I do it?
First of all, the "database collation" is only a default. The real question is what is the CHARACTER SET of the columns that you are interested in.
Then, what are the bytes in your client? Are they encoded as latin1? Or utf8? In either case, tell MySQL that that is what is coming at it. This is preferably done in the connection parameters. (What is your client language?) Alternatively, use SET NAMES latin1 or SET NAMES utf8, according to the client encoding.
Now, what MySQL will do on INSERT and SELECT... It will convert the encoding from the client's encoding to the column's encoding as you do an INSERT. No further action is needed to achieve this.
Similarly, MySQL will convert the other way during a SELECT.
(Of course, if the column and the client are talking the same encoding, no "convert" is needed.)
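As a rough sketch of those checks, using the dict table and column names from the question (the choice of utf8mb4 for the client is an assumption; use latin1 instead if the client really sends latin1 bytes):

```sql
-- Declare the encoding of the bytes the client sends and expects back.
SET NAMES utf8mb4;

-- Check the character set and collation of the columns themselves,
-- since the database and table settings are only defaults.
SELECT column_name, character_set_name, collation_name
FROM information_schema.COLUMNS
WHERE table_schema = DATABASE()
  AND table_name = 'dict';

-- Look at the bytes actually stored, independent of any client-side conversion.
SELECT english_word, HEX(malayalam_definition) AS stored_bytes
FROM dict
LIMIT 5;
```

The HEX output is useful because it shows what is on disk regardless of how the connection or the browser renders it.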
Your question mentions "collation". So far, I have only talked about CHARACTER SETs, also known as "encoding". In contrast, the sorting and comparing of two strings is governed by the COLLATION.
For the CHARACTER SET latin1, the default COLLATION is latin1_swedish_ci.
For the CHARACTER SET utf8, the default COLLATION is utf8_general_ci.
There are several different "collations" to handle the quirks of German or Turkish or Spanish or (etc) orderings.
Please explain why you are trying to do what you stated. There are many ways you can do it wrong, so I do not want to give you an ALTER statement -- it may just make things worse for the real goal.
It is better to use utf8mb4 instead of utf8. The outside world refers to UTF-8; this is equivalent to MySQL's utf8mb4.
Edit (after OP's Edit)
The first screenshot shows "Mojibake". Another screenshot shows question marks. The causes of each are covered in Trouble with UTF-8 characters; what I see is not what I stored

unicode strings: difference between Habo and Håbo

Now I work with Swedish geography data
In Sweden there are two different places: Habo and Håbo
If I run a query like SELECT * FROM g2_se_raw_zip WHERE province EQUALS 'Håbo' or SELECT * FROM g2_se_raw_zip WHERE province='Håbo' it gives me Habo too.
I have the same issue with GROUP BY and other queries.
Why does it work like this, and how do I fix it?
Additional info:
character_set_client utf8,
character_set_connection utf8,
character_set_database utf8,
character_set_filesystem binary,
character_set_results utf8,
character_set_server utf8,
character_set_system utf8,
This is a collation issue. The Swedish alphabet treats a-ring (å) and a as distinct letters, whereas the international collation treats them as different variants of the same letter.
This query should do the trick for you (MySQL has no EQUALS operator, so only the = form is valid):
SELECT *
FROM g2_se_raw_zip
WHERE province COLLATE utf8_swedish_ci = 'Håbo';
You may wish to change the collation setting of columns in your database containing Swedish place names to the Swedish collation for the sake of index performance. But, if you're developing a pan-European application you may prefer to ask users to tell you their own national language in their user profiles so you can search in a way that meets their expectations.
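Changing the column's collation could look roughly like the sketch below. The VARCHAR(100) type is an assumption, since the question does not show the table definition; MODIFY replaces the whole column definition, so repeat the real type and any NOT NULL or DEFAULT attributes.

```sql
-- Assumed column type; check SHOW CREATE TABLE g2_se_raw_zip first.
ALTER TABLE g2_se_raw_zip
    MODIFY province VARCHAR(100)
    CHARACTER SET utf8 COLLATE utf8_swedish_ci;

-- After the change, a plain comparison uses the Swedish rules,
-- and an index on province can be used for it.
SELECT *
FROM g2_se_raw_zip
WHERE province = 'Håbo';
```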

MySQL view - Illegal mix of collations

I'll be very clear: what's the solution for creating views in MySQL without getting the damned Illegal mix of collations error?
My SQL code is like this (it has some Portuguese words), and my database default collation is latin1_swedish_ci:
CREATE VIEW v_veiculos AS
SELECT
v.id,
v.marca_id,
v.modelo,
v.placa,
v.cor,
CASE v.combustivel
WHEN 'A' THEN 'Álcool'
WHEN 'O' THEN 'Óleo Diesel'
WHEN 'G' THEN 'Gasolina'
ELSE 'Não Informado'
END AS combustivel,
marcas.marca,
/*I think that the CONCAT and COALESCE below cause this error; without the next line the view works fine*/
CONCAT(marca, ' ', v.modelo, ' - Placa: ', v.placa, ' - Combustível: ', COALESCE(v.combustivel, 'Não informado')) AS info_completa
FROM veiculos v
LEFT JOIN
marcas on(marcas.id = v.marca_id);
I think the error is caused by my use of COALESCE and/or CONCAT, as the full error description suggests: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'coalesce'
You may also use CAST() to convert a string to a different character set. The syntax is:
CAST(character_string AS character_data_type CHARACTER SET charset_name)
eg:
SELECT CAST(_latin1'test' AS CHAR CHARACTER SET utf8);
Alternatively, use CONVERT(expr USING transcoding_name).
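Applied to the view in the question, one possible approach is to convert the string literal to the column's character set so that both COALESCE arguments agree. This is a sketch only: it assumes v.combustivel is a latin1 column, as the error message suggests, and it has not been run against the original schema.

```sql
-- Convert the utf8 literal to latin1 so it matches the latin1 column.
SELECT COALESCE(
           v.combustivel,
           CONVERT('Não informado' USING latin1)
       ) AS combustivel
FROM veiculos v;
```

The same CONVERT(... USING latin1) wrapper could be put around the other literals inside the CONCAT call if they trigger the same error.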
This is kind of old, but well, I had this same error.
As far as I know, views do not have a collation; tables do.
So, if you get the "illegal mix..." error, it is because your view is linking (comparing, whatever) 2 tables with different collations.
The thing is, if you create a table you can specify the collation, for instance
CREATE TABLE IF NOT EXISTS `vwHotelCode_Terminal` (
`HOTELCODE` varchar(8)
,`TERMINALCODE` varchar(5)
,`DISTKM` varchar(6)
,`DISTMIN` varchar(3)
,`TERMINALNAME` varchar(50)
)ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_spanish_ci ;
But if you don't, the default collation will be applied. For me the default collation is utf8_unicode_ci, so my tables are created with this collation, and I ended up having some tables with utf8_spanish_ci and the ones where I did not specify it with utf8_unicode_ci.
If you are exporting from one server to another and the default collation is different, you are probably going to get the "illegal mix" message.
If you have views, phpMyAdmin likes to create placeholder tables for all the views first and then the views. The tables are created without a collation, so they take the default one. Then, many times, when the view is created it uses different collations.
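To find out which tables ended up with which collation, and to bring a stray one in line, something like the following sketch may help. It uses the table name from the example above and assumes utf8_spanish_ci is the collation you want everywhere.

```sql
-- List the collation of every base table in the current database.
SELECT table_name, table_collation
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_type = 'BASE TABLE'
ORDER BY table_collation, table_name;

-- Convert one table, including its existing character columns.
-- This rewrites the table, so try it on a copy first.
ALTER TABLE vwHotelCode_Terminal
    CONVERT TO CHARACTER SET utf8 COLLATE utf8_spanish_ci;
```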
That is actually a bug in MySQL.
Maybe you can update to the latest version of MySQL?
After searching around for a while and taking information from this answer, I found a hack that could be useful.
Simply check the default character sets of your server with the command below:
SHOW VARIABLES LIKE "char%";
You'll see something like this:
mysql> SHOW VARIABLES LIKE "char%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 | <--
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
I just set character_set_server, the server's default character set (character_set_system, marked above, is read-only and always utf8). Then I copied the CREATE code of the view and created a new view, and that's all.
What happens here is that the new view you create will use the new default character set that you defined. Hence resolving the issue.
Just use the command below to set the default character set:
SET character_set_server = 'latin2';
This worked in my case.
NOTE: Alternatively you can change the character set of that view. That would also do the trick but I wasn't able to find the solution so I used this hack.
REFERENCE: Read more on Illegal Collation Mix on MariaDB.
A CITATION FROM Illegal Collation Mix on MariaDB:
If you encounter this issue, set the character set in the view to force it to the value you want.
Read more about Collation and Character Sets here.
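For what it's worth, MySQL records the character_set_client and collation_connection that were in effect when each view was created, and you can inspect them. A minimal sketch, using the view name from the question:

```sql
-- Session settings stored with each view at creation time.
SELECT table_name, character_set_client, collation_connection
FROM information_schema.VIEWS
WHERE table_schema = DATABASE();

-- Shows the same two values alongside the view definition.
SHOW CREATE VIEW v_veiculos;
```

If these differ from the character set of the underlying columns, recreating the view from a session whose settings match them may be worth trying, although that is an assumption and not something confirmed in this thread.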

character problem in sql

select * from table where key='çmyk'
When I run this query on a table that has a row whose value is 'cmyk',
the query returns that row, even though the values are different. When I search for 'çmyk' it returns 'cmyk'.
So what can I do?
MySQL charset: UTF-8 Unicode (utf8)
MySQL connection collation: utf8_unicode_ci
table collation: latin1_swedish_ci
The problem is that the latin1_swedish_ci collation is not only case insensitive, it is umlaut insensitive as well, so the following applies:
Ä = A
Ö = O
etc.
Switching to a case-sensitive collation in the WHERE clause should work, like so:
select * from table where key='çmyk' collate latin1_general_cs;
with the caveat that this is not good for performance.
MySQL Reference: 9.1.7.8. Examples of the Effect of Collation
Try running the command SET NAMES latin1; and then running your query.