I've been following this tutorial on how to set up a MySQL server/database for Unicode, hoping to set the default character set to utf8mb4 and the collation to utf8mb4_unicode_ci.
As specified in the tutorial, I have the following settings in my .ini file, located at C:\ProgramData\MySQL\MySQL Server 5.7:
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
When running this query in MySQL Workbench:
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'
It has not been showing the system variables that I would expect, so initially I thought I had misconfigured the server:
MySQL Workbench output:
character_set_client utf8
character_set_connection utf8
character_set_database utf8mb4
character_set_filesystem binary
character_set_results utf8
character_set_server utf8mb4
character_set_system utf8
collation_connection utf8_general_ci
collation_database utf8mb4_unicode_ci
collation_server utf8mb4_unicode_ci
However, when using the native MySQL command line client, I'm seeing what I would expect:
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
Why would MySQL Workbench not respect the configuration settings the way the command line client does?
MySQL: Windows/5.7.21
MySQL Workbench: Windows/6.3.10
MySQL Workbench doesn't read the ini file and hence doesn't use any of the values in it. How could it, given that it can connect to many servers at the same time?
MySQL Workbench uses a fixed encoding (UTF-8) for the client and connection character sets.
Until version 8.0.11 RC, MySQL Workbench actually used the utf8mb3 charset. Starting with that version, it switched to utf8mb4.
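To illustrate why the utf8 (utf8mb3) versus utf8mb4 difference matters, here is a minimal Python sketch. It does not talk to MySQL; it only measures UTF-8 byte lengths. The helper names utf8_len and fits_in_utf8mb3 are made up for this example. utf8mb3 stores at most three bytes per character, so characters that need four bytes in UTF-8 (most emoji, for instance) do not fit.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes one character needs in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes (hypothetical helper)."""
    return all(utf8_len(ch) <= 3 for ch in text)


for ch in ["é", "€", "😀"]:
    # Prints the character, its UTF-8 byte length, and whether utf8mb3 could hold it.
    print(ch, utf8_len(ch), fits_in_utf8mb3(ch))
```

Text that fails this check needs a utf8mb4 connection as well as a utf8mb4 column to survive a round trip intact.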
Related
I'm running into an issue where I get differently ordered results when querying with PHP versus the command line. From my research, it appears that in some cases bad encoding can cause problems with the order of the results.
That said, all my DB tables are encoded as utf8mb4, with the collation utf8mb4_general_ci. However, it doesn't seem that the MySQL variables are set correctly.
I'm on MySQL 5.5.5-10.1.26-MariaDB.
Here are my CNF settings, but to be honest I don't know what I'm doing here:
[client]
default-character-set=utf8mb4
[mysql]
default-character-set=utf8mb4
[mariadb]
[mysqld]
character-set-server=utf8mb4
character_set_client=utf8mb4
collation-server=utf8mb4_general_ci
The variables output from mysql:
character_set_client utf8
character_set_connection utf8
character_set_database utf8mb4
character_set_filesystem binary
character_set_results utf8
character_set_server utf8mb4
character_set_system utf8
collation_connection utf8_general_ci
collation_database utf8mb4_unicode_ci
collation_server utf8mb4_general_ci
Update: Someone asked how I'm connecting to the database:
$this->connection = new PDO('mysql:host='.DB_SERVER.';dbname='.DB_NAME.';port='.DB_PORT, DB_USER, DB_PASS, $options);
Update: I've switched to utf8mb4_unicode_ci (as per suggestions in answers below).
You want to have character-set-client-handshake = FALSE as well.
With /etc/my.cnf.d/character-set.cnf
# https://scottlinux.com/2017/03/04/mysql-mariadb-set-character-set-and-collation-to-utf8/
# https://mariadb.com/kb/en/library/setting-character-sets-and-collations/
# https://medium.com/@adamhooper/in-mysql-never-use-utf8-use-utf8mb4-11761243e434
# https://stackoverflow.com/questions/47566730/force-mariadb-clients-to-use-utf8mb4
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
collation-server = utf8mb4_unicode_ci
init-connect = 'SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci'
character-set-server = utf8mb4
I get everything to be utf8mb4 [1]:
MariaDB [(none)]> show variables like 'char%'; show variables like 'collation%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+----------------------+--------------------+
3 rows in set (0.00 sec)
MariaDB [(none)]>
However, without the character-set-client-handshake line, some are still utf8:
MariaDB [(none)]> show variables like 'char%'; show variables like 'collation%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+----------------------+--------------------+
3 rows in set (0.01 sec)
MariaDB [(none)]>
[1] character_set_system is always utf8.
You should probably use utf8mb4_unicode_ci instead of utf8mb4_general_ci, as it's more accurate, unless you're running MariaDB on a system with an old or limited CPU and performance is a huge concern.
That being said, the solution is to set init_connect in your MariaDB configuration (or --init-connect on the command line):
init_connect = "SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci"
Either way is fine. I am not recommending one way over the other. Both are equally valid approaches.
Your MariaDB configuration may be in my.cnf or a file included by my.cnf, typically found under /etc/mysql. Check your system documentation for details. Because you are configuring a server variable, as indicated by the MariaDB documentation linked to above, you should set the variable in the server part of the configuration file. The server part of the configuration files is indicated by the INI section names ending in "d". An INI section is denoted by a keyword surrounded by square brackets, e.g. "[section]". The "d" stands for "daemon", which is standard UNIX nomenclature for a server process. You can set the variable in either the [mysqld] section or the [mariadb] section. Because the init_connect server variable is common to both MySQL and MariaDB, I would recommend you put it under [mysqld].
I see that you are setting character_set_client=utf8mb4 in your pasted configuration. You don't need to do this. You can delete the line or comment it out. Comments are lines starting with the pound symbol (#), also known as a hash mark, octothorpe, or number sign.
Any and all clients that connect to the server will execute these command(s) before any other commands are processed.
init_connect is not performed by anyone connecting as root, so it is not as universal as you would like.
SET NAMES utf8mb4 sets three things (character_set_client, character_set_connection, and character_set_results); experiment to see that. You need all three.
If you weren't as far back as 5.5, I would recommend utf8mb4_unicode_520_ci as being a better collation: "Unicode collation names now may include a version number to indicate the Unicode Collation Algorithm (UCA) version on which the collation is based. Initial collations thus created use version UCA 5.2.0. For example, utf8_unicode_520_ci is based on UCA 5.2.0. UCA-based Unicode collation names that do not include a version number are based on version 4.0.0."
MySQL 8.0 adds collations based on the Unicode 9.0 standard.
Back to the question: There is no perfect solution; the user can override whatever you do -- either through ignorance or through malice.
You could police the tables created, but that won't keep users from connecting incorrectly. Or correctly, but with a different charset. It is valid to do SET NAMES latin1 and then provide latin1-encoded bytes. MySQL will convert as it stores/fetches.
But if they have utf8-encoded bytes, but say SET NAMES latin1, you get "double encoding". This "bug" destroys any chance of collating correctly, but is otherwise (usually) transparent. That is, stuff is messed up as it is stored, then un-messed up as it is fetched.
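The "double encoding" described above can be reproduced without a database. The following Python sketch mimics the server's conversions with plain codecs. It assumes MySQL's latin1 behaves like Windows-1252 for these bytes, so Python's cp1252 codec stands in for it; the variable names are illustrative only.

```python
# Assumption: Python's "cp1252" codec stands in for MySQL's latin1 here.
original = "é"
utf8_bytes = original.encode("utf-8")  # what the client really sends

# The client said SET NAMES latin1, so the server reads each byte as one
# latin1 character and converts it to the column's utf8/utf8mb4 encoding.
misread = utf8_bytes.decode("cp1252")
stored = misread.encode("utf-8")

# On fetch, the server converts back to latin1, which restores the bytes.
fetched = stored.decode("utf-8").encode("cp1252")

print(utf8_bytes.hex())              # c3a9
print(stored.hex())                  # c383c2a9
print(len(original), len(misread))   # 1 2
print(fetched == utf8_bytes)         # True
```

The round trip looks transparent to the client, but the table holds two characters where there should be one, which is why lengths and collation order come out wrong.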
To fix this warning, edit /etc/my.cnf (my.ini on Windows).
Simply add or set the following in the file:
[client]
default-character-set=utf8mb4
[mysql]
default-character-set=utf8mb4
[mysqld]
collation-server=utf8mb4_unicode_ci
init-connect='SET NAMES utf8mb4'
character-set-server=utf8mb4
I'm losing my mind on this issue since yesterday.
I'm trying to convert my MySQL database from utf8 to utf8mb4. To do so, I followed these sites/threads: https://mathiasbynens.be/notes/mysql-utf8mb4#utf8-to-utf8mb4, "Cannot store emoji in database", "Change MySQL default character set to UTF-8 in my.cnf?", etc.
My database seems to have a utf8mb4_unicode_ci collation as expected, and so do all its tables.
Nevertheless, when I run SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; I get:
#With a Root access
Variable_name Value
character_set_client utf8 #expecting utf8mb4
character_set_connection utf8 #expecting utf8mb4
character_set_database utf8mb4 #good
character_set_filesystem binary #good
character_set_results utf8 #expecting utf8mb4
character_set_server utf8mb4 #good
character_set_system utf8 #expecting utf8mb4
collation_connection utf8_general_ci #expecting utf8mb4_unicode_ci
collation_database utf8mb4_unicode_ci #good
collation_server utf8mb4_unicode_ci #good
#With a standard user access
Variable_name Value
character_set_client utf8 #expecting utf8mb4
character_set_connection utf8mb4 #good
character_set_database utf8mb4 #good
character_set_filesystem binary #good
character_set_results utf8 #expecting utf8mb4
character_set_server utf8mb4 #good
character_set_system utf8 #expecting utf8mb4
collation_connection utf8mb4_unicode_ci #good
collation_database utf8mb4_unicode_ci #good
collation_server utf8mb4_unicode_ci #good
I set up a /etc/mysql/conf.d/90-my.cnf file like this:
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init-connect = 'SET collation_connection = utf8mb4_unicode_ci'
init-connect = 'SET NAMES utf8mb4'
My MySQL version is 5.5.54, running on Debian 7.
Does anyone have a clue that could help me?
Thanks for your help, and sorry for my bad English...
EDIT
Fun fact: when I check the variables from the command-line client, I get this:
mysql> show variables like "%character%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
Meanwhile, I still get utf8 for character_set_client and character_set_results when I check with phpMyAdmin (with non-superuser access).
When connecting as user root, init-connect is ignored.
Give your application its own login without SUPER privilege.
Then, to make extra sure, whenever establishing a connection from your app, do the language-specific method of providing the character set. (Or execute SET NAMES utf8mb4.) What app language are you using?
In case something else is going wrong, see "Best Practice" in Trouble with utf8 characters; what I see is not what I stored
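For the "make extra sure" step above, one option is to check the three session variables that SET NAMES controls right after connecting. The Python sketch below is a hypothetical helper, not part of any MySQL driver. It takes the SHOW VARIABLES rows as a plain dict, so it makes no assumption about which connector fetched them. The sample values are the "standard user access" output from the question.

```python
# The three session variables that SET NAMES sets.
CONNECTION_VARS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def mismatched(variables: dict, expected: str = "utf8mb4") -> dict:
    """Return the connection variables whose value is not `expected`."""
    return {
        name: variables.get(name)
        for name in CONNECTION_VARS
        if variables.get(name) != expected
    }


# Values copied from the question's non-root session.
session = {
    "character_set_client": "utf8",
    "character_set_connection": "utf8mb4",
    "character_set_database": "utf8mb4",
    "character_set_results": "utf8",
    "character_set_server": "utf8mb4",
    "character_set_system": "utf8",
}

print(mismatched(session))
```

An application could run SET NAMES utf8mb4 (or its driver's charset option) whenever this returns a non-empty dict. character_set_system is deliberately left out of the check.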
When I try to insert a simple 'é' into a table from the MySQL (v5.6) terminal on Windows Server 2008, I get Incorrect string value: '\x82' for column 'colum_name'
I've been searching on Stack Overflow for a day now. I think I am going crazy. All my collations are utf8mb4:
/*column*/
SHOW FULL COLUMNS FROM table_name;
utf8mb4_unicode_ci
/*database*/
show variables like "character_set_database";
utf8mb4
/*table*/
SHOW TABLE STATUS where name like 'table_name';
utf8mb4_unicode_ci
/*variables*/
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
Here's what I added to my my.ini
[client]
default-character-set=utf8mb4
[mysql]
default-character-set=utf8mb4
[mysqld]
collation-server = utf8mb4_unicode_ci
character-set-server = utf8mb4
init-connect='SET NAMES utf8'
I am stuck
"I get Incorrect string value: '\x82' for column 'colum_name'"
Please explain: show us the query and the output of SHOW CREATE TABLE.
Hex 82 in the popular latin1 character set is a comma-like low quotation mark: ‚. It is unrelated to e-acute (hex: latin1 E9, utf8 C3A9).
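One possible explanation for the 82 byte, offered as an assumption because the question does not say which code page the Windows console uses: in the OEM code pages 437 and 850, which Windows consoles often default to, 'é' is encoded as hex 82. The Python sketch below uses only the standard codecs to compare the encodings; it does not reproduce the MySQL error itself.

```python
ch = "é"

# Assumption: the Windows console uses an OEM code page such as 850.
print(ch.encode("cp850"))        # b'\x82'
print(ch.encode("latin-1"))      # b'\xe9'
print(ch.encode("utf-8"))        # b'\xc3\xa9'

# The same byte read as Windows-1252 is the comma-like low quotation mark.
print(b"\x82".decode("cp1252"))  # ‚

# A lone 0x82 is not valid UTF-8, so a server told to expect utf8mb4
# would have reason to reject it.
try:
    b"\x82".decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)                # False
```

If that is what is happening here, the client is announcing utf8mb4 while the terminal sends bytes in its own code page, and the two need to be made to agree.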
Like many others, I'm having some problems with the MySQL charset. Like many others, I want everything to be UTF-8, but MySQL was installed with latin-1, and no matter how I try/google/experiment with the MySQL config, there is still latin-1 lurking in the client settings.
OK, here is the setup. I have a (non-root) MySQL user 'usr' with the password 'pwd'. Whenever I access MySQL via the terminal (mysql -uusr -p) and ask him nicely about his charsets, he tells me that he is in love with utf8 (as he ought to be):
mysql> SHOW VARIABLES LIKE 'character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_unicode_ci |
| collation_server | utf8_unicode_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)
However, if I use PHP to access mysql (via the very same user):
$mysql_link=mysql_connect('localhost','usr','pwd');
$result1=mysql_query("show variables like 'character%'");
$result2=mysql_query("show variables like 'collation%'");
mysql_close($mysql_link);
And when I print_r $result1 and $result2, it magically falls back to latin-1:
character_set_client => latin1
character_set_connection => utf8
character_set_database => utf8
character_set_filesystem => binary
character_set_results => latin1
character_set_server => utf8
character_set_system => utf8
character_sets_dir => /usr/share/mysql/charsets/
collation_connection => utf8_unicode_ci
collation_database => utf8_general_ci
collation_server => utf8_unicode_ci
This happens regardless whether I invoke php via browser (as php-cgi) or via terminal (as php-cli).
A kind-of fix for that is to set the charset manually at each connection:
mysql_set_charset('utf8',$mysql_link);
That works. But I feel like there should be a way to do that via mysql config.
For reference, the MySQL config (my.cfg) includes:
[client]
default_character_set = utf8
[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
character-set-server = utf8
collation-server = utf8_unicode_ci
And PHP config (php.ini) includes
default_charset = "utf-8"
Thanks in advance! =)
P.S. I know that mysql_ functions are deprecated and should be replaced with mysqli_ ones. But hopefully that doesn't have anything to do with this exact problem =)
If you're like most people, you use the root account to get to MySQL. This little snippet from the docs might be your smoking gun.
It is still necessary for applications to configure their connection using SET NAMES or equivalent after they connect, as described previously. You might be tempted to start the server with the --init_connect="SET NAMES 'utf8'" option to cause SET NAMES to be executed automatically for each client that connects. However, this will yield inconsistent results because the init_connect value is not executed for users who have the SUPER privilege.
I am installing MySQL server on FreeBSD. My current port version is 5.5.15 on FreeBSD 7.2.
I am wondering how to get it installed with different than latin1 default charset and collation.
Currently when I install it with default Makefile I get this:
| character_set_client | latin1
| character_set_connection | latin1
| character_set_database | latin1
| character_set_filesystem | binary
| character_set_results | latin1
| character_set_server | latin1
| character_set_system | latin1
| character_sets_dir | /usr/local/share/mysql/charsets/
| collation_connection | latin1_swedish_ci
| collation_database | latin1_swedish_ci
| collation_server | latin1_swedish_ci
I can't understand why latin1 is the default charset in the first place, but well, probably not the best place to discuss it.
Anyway... I'd like to change the default charsets to utf8 and collation to utf8_unicode_ci.
I tried changing the Makefile and added the following lines to CMAKE_ARGS:
-DWITH_CHARSET="utf8" \
-DWITH_COLLATION="utf8_unicode_ci"
All that got changed was character_set_system to utf8.
How do I change all of those? It could be a compilation parameter (preferably) or a my.cnf setting.
I'd appreciate any help.
Go ahead and install with the wrong defaults, and later change the settings when creating a /etc/my.cnf file.
[mysqld]
collation_server = utf8_general_ci
character_set_server = utf8
The website below explains it quite well.
http://rentzsch.tumblr.com/post/9133498042/howto-use-utf-8-throughout-your-web-stack