MySQL change database + tables charset & collation from UTF8 to UTF8mb4

I currently have a MySQL database with the following settings:
character_set_client: utf8
character_set_connection: utf8
character_set_database: utf8
character_set_filesystem: binary
character_set_results: utf8
character_set_server: latin1
character_set_system: utf8
collation_connection: utf8_general_ci
collation_database: utf8_general_ci
collation_server: latin1_swedish_ci
I want to support emojis and other languages (like Chinese) in the database. Currently this is not working: those characters are automatically converted to a ?.
I created a test database with charset & collation utf8mb4(_general_ci) and a table with the same settings. Emojis work here. However, when I change the database settings to utf8(_general_ci) and leave the table as utf8mb4(_general_ci), emojis are still working, while this is not the case with my main database.
If I change my main database settings to charset + collation utf8mb4(_general_ci), and the tables as well, would that work?
And for database-access, will anything else have to be changed, such as character_set_connection or collation_connection?
I know that on my JavaScript server the connection is configured as utf8; I assume this has to be utf8mb4.
Will all current utf8(_general_ci) data be kept intact when changing to utf8mb4(_general_ci)?

Correctly stored utf8 characters will convert correctly to utf8mb4.
You should also specify that the connections are utf8mb4.
See this for discussion of 'question mark'.
To convert all the char/text columns to utf8mb4:
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4;
To convert one column:
ALTER TABLE tbl MODIFY COLUMN col ... CHARACTER SET utf8mb4;
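
If there are many tables, the CONVERT TO statement above can be generated rather than typed by hand. This is only a sketch: the helper name convert_statements, the table names, and the default collation utf8mb4_unicode_ci are assumptions for illustration, not something from the answer. It only builds the SQL text; running it against the server (after a backup) is left to you.

```python
def convert_statements(tables, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build one ALTER TABLE ... CONVERT TO statement per table name.

    The table list, charset and collation are supplied by the caller;
    the defaults here are illustrative assumptions.
    """
    statements = []
    for name in tables:
        # Quote the identifier with backticks; a literal backtick is doubled.
        quoted = "`" + name.replace("`", "``") + "`"
        statements.append(
            f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET {charset} "
            f"COLLATE {collation};"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical table names, for demonstration only.
    for sql in convert_statements(["users", "messages"]):
        print(sql)
```

Review the generated statements before running them, since CONVERT TO rewrites every char/text column in the table.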

Related

How to support emoji in Azure mysql database

I tried the following steps, but emoji are not getting inserted into the database. It works when I run "SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;", but only for that session. I need to set it globally so that a Java application can insert emoji characters into the Azure MySQL database.
SET NAMES utf8mb4; ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
When you create a database on MySQL Database on Azure, the database adopts the UTF8 character set by default unless otherwise specified. Because the UTF8 character set on MySQL supports at most 3-byte encoding, the emoji icons that are encoded by 4 bytes cannot be inserted into the table.
You will have to edit character set, and select utf8mb4 as the character set for the database.
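
The 3-byte limit is easy to check outside MySQL. The sketch below (the helper names utf8_len and fits_in_utf8mb3 are made up for illustration) counts UTF-8 bytes per character and tests whether a string stays within what MySQL's utf8 (utf8mb3) can hold, which is code points up to U+FFFF.

```python
def utf8_len(ch):
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text):
    """True if every character needs at most 3 bytes in UTF-8.

    That is the case exactly when every code point is <= U+FFFF,
    which is the range MySQL's 3-byte utf8 can store.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    for ch in ["e", "\u00e9", "\u4e2d", "\U0001F600"]:  # e, é, 中, 😀
        print(repr(ch), utf8_len(ch), "byte(s)")
    print(fits_in_utf8mb3("\u4e2d\u6587"))     # Chinese text in the BMP
    print(fits_in_utf8mb3("hi \U0001F600"))    # contains an emoji
```

Note that common Chinese characters take 3 bytes and fit in the old utf8, so if they also turn into ?, a latin1 setting somewhere between client and table is a likely suspect as well.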

Mysql server's encoding differs from the client's (latin1 vs utf8mb4). How bad is it?

I'm designing a PHP web app and have some difficulty understanding the meaning of the MySQL variables related to encoding and how they interact with each other. The encoding of the server is set to latin1 but the client's is utf8mb4.
Running the mysql query inside a database
SHOW VARIABLES
WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'
gives the following:
character_set_client = utf8mb4
character_set_connection = utf8mb4
character_set_database = latin1
character_set_filesystem = binary
character_set_results = utf8mb4
character_set_server = latin1
character_set_system = utf8
collation_connection = utf8mb4_unicode_ci
collation_database = latin1_swedish_ci
collation_server = latin1_swedish_ci
I'm afraid of running into issues with the older databases, which are in latin1, if I change the character set of the MySQL server to utf8mb4, but I certainly want to use utf8mb4 for the new databases I create. To correctly store and retrieve data, should the server's and client's encoding and collation always be the same? Any insight would be appreciated.
Some of those VARIABLES must agree with what encoding is used in the client.
CREATE TABLE ... specifies how they are to be stored in the tables.
If those two differ, then MySQL will convert "on the wire" between the client encoding and the table encoding.
If that means converting, say, Korean characters (encoding in utf8 or utf8mb4) to latin1 encoding, it will not be possible. On the other hand, all accented letters in Western Europe have encodings in both latin1 and utf8, so there is no problem.
Read this for common screwups:
Trouble with UTF-8 characters; what I see is not what I stored
See ALTER TABLE .. CONVERT TO .. for converting all character columns in one table to a different encoding (assuming it was correctly stored to begin with).
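
To make the "possible vs. not possible" point concrete, here is a small sketch. It uses Python's cp1252 codec as a rough stand-in for MySQL's latin1 (an approximation; the two differ on a few undefined bytes), and the helper name to_latin1 is hypothetical. Characters with no latin1 equivalent come out as ?, which is similar to what a lossy conversion leaves behind.

```python
def to_latin1(text):
    """Encode text to a latin1-like charset, replacing what cannot be encoded.

    "cp1252" approximates MySQL's latin1; errors="replace" substitutes "?"
    for every character that has no representation in that charset.
    """
    return text.encode("cp1252", errors="replace")


if __name__ == "__main__":
    print(to_latin1("caf\u00e9"))             # Western European accent: one byte
    print(to_latin1("\ud55c\uad6d\uc5b4"))    # Korean: no latin1 encoding exists
```

Once a character has been replaced by ?, the original is gone; no later ALTER TABLE can bring it back.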

Html Text-area: problems with accented letters

When I write words with accented letters in a text-area, the application stores the words in MySQL with some errors.
E.g. if I write può, in MySQL I have può
How can i solve it?
To change an existing table to use the UTF-8 charset:
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
To set the default charset of the database to UTF8 for tables you will create in the future:
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
You can use either utf8_general_ci or utf8_unicode_ci. It is explained at What's the difference between utf8_general_ci and utf8_unicode_ci that there is a difference between them in the speed and accuracy of the sorting, with utf8_unicode_ci being more accurate and the performance gain of using utf8_general_ci being very minimal.
(Also, be aware that when you run queries in the mysql console in the command prompt, the text may not display as UTF-8 even when it is stored properly. It's a limitation of the command prompt.)

MySQL UTF8 Data Not Being Displayed Properly

I want all the data in MySQL to be UTF8 encoded. I've set all the character sets and collations to be UTF8 for the database, tables and columns. Before anything is written to the database, I use mb_detect_encoding in PHP to check if it is UTF8. Thus, I believe all the data is UTF8 encoded.
However, here is the problem: take this word Ríkarðsdóttir, it shows up correctly when queried from the database and displayed through PHP on a UTF8 encoded webpage. If I query this same record through phpMyAdmin, I get Ríkarðsdóttir. The same is true if I use the MySQL command line.
Running SHOW VARIABLES returns to me:
character_set_client utf8,
character_set_connection utf8,
character_set_database utf8,
character_set_filesystem binary,
character_set_results utf8,
character_set_server latin1,
character_set_system utf8
Only the server is latin1, and I am on a shared hosting site and don't believe I can change that. Could that be the problem?
Here is what I do not understand: why does my UTF8 webpage correctly display Ríkarðsdóttir, but a UTF8 encoded phpMyAdmin webpage display it as Ríkarðsdóttir? Is the data not truly UTF8 encoded or does the database not believe it is? What needs to be done to correct this?
Try running this query right after you connect:
SET NAMES UTF8
Your database needs to store the data as UTF8, and your web page header should also have a UTF8 declaration, but your connection to the database also needs to use UTF8. You can run that on the command line and/or through PHPMyAdmin. All communication after that "query" will then be UTF8 encoded.
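
For intuition about what a mismatched connection does, the sketch below reproduces the typical garbling outside MySQL: UTF-8 bytes that get read as latin1. The helper names misread_as_latin1 and repair are made up for illustration, and this shows the general mechanism, not necessarily the exact state of the asker's data.

```python
def misread_as_latin1(text):
    """Encode as UTF-8, then decode those bytes as if they were latin1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled):
    """Undo the misreading: back to the raw bytes, then decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    bad = misread_as_latin1("pu\u00f2")   # "può"
    print(bad)                            # two junk characters replace the "ò"
    print(repair(bad))                    # original text again
```

Each accented letter turning into two characters (usually starting with Ã) is the usual sign that one side of the connection is treating UTF-8 bytes as latin1.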

How to store non-english characters?

Non-english characters are messed up in a text column. Arabic text looks like this:
نـجـم سـهـيـل
How to store non-english characters correctly?
You should consider using utf8 to store your text.
You can do this at the database creation:
CREATE DATABASE mydb
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci;
You can also configure mysql at installation or at startup to use utf8 (see Mysql manual)
The mysql manual pages cover all aspects of characterset and collations: http://dev.mysql.com/doc/refman/5.0/en/charset.html
The character set of the connection can be changed by
SET CHARACTER SET utf8
More details here and in the chapter Character set support
What OS are you using?
If it's Linux, then it's good to have the system locale set to utf8 as well, like "en_US.utf8".
And, to be sure, issue a "SET NAMES UTF8" command to MySQL just after connecting.
(db character set/collation must also be utf8)
The query below solved the issue.
ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;