MySQL UTF8 Data Not Being Displayed Properly - mysql

I want all the data in MySQL to be UTF8 encoded. I've set all the character sets and collations to be UTF8 for the database, tables and columns. Before anything is written to the database, I use mb_detect_encoding in PHP to check if it is UTF8. Thus, I believe all the data is UTF8 encoded.
However, here is the problem: take the word Ríkarðsdóttir. It shows up correctly when queried from the database and displayed through PHP on a UTF8 encoded webpage. If I query this same record through phpMyAdmin, the accented characters come out garbled. The same is true if I use the MySQL command line.
Running SHOW VARIABLES returns to me:
character_set_client utf8,
character_set_connection utf8,
character_set_database utf8,
character_set_filesystem binary,
character_set_results utf8,
character_set_server latin1,
character_set_system utf8
Only the server is latin1, and I am on a shared hosting site and don't believe I can change that. Could that be the problem?
Here is what I do not understand: why does my UTF8 webpage display Ríkarðsdóttir correctly, while a UTF8 encoded phpMyAdmin webpage displays it garbled? Is the data not truly UTF8 encoded, or does the database not believe it is? What needs to be done to correct this?

Try running this query right after you connect:
SET NAMES UTF8
Your database needs to store the data as UTF8, and your web page header should also have a UTF8 declaration, but your connection to the database also needs to use UTF8. You can run that on the command line and/or through PHPMyAdmin. All communication after that "query" will then be UTF8 encoded.
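The symptom in the question is consistent with UTF-8 bytes being read as latin1 somewhere between the server and the client. As a rough illustration only (plain Python string handling, not MySQL itself, with variable names made up for this sketch), the same bytes read correctly or garbled depending on which charset the reader assumes:

```python
# Illustration only: mimics a reader that assumes latin1 while the
# bytes it receives are actually UTF-8. No MySQL is involved.
word = "Ríkarðsdóttir"

utf8_bytes = word.encode("utf-8")         # bytes as a UTF-8 client would send them
as_utf8 = utf8_bytes.decode("utf-8")      # reader agrees on UTF-8
as_latin1 = utf8_bytes.decode("latin-1")  # reader wrongly assumes latin1

print(len(word), len(utf8_bytes))  # 13 characters, 16 bytes
print(as_utf8 == word)             # True
print(as_latin1 == word)           # False: each accented letter became two characters

# The garbling is reversible as long as the bytes themselves were not altered.
repaired = as_latin1.encode("latin-1").decode("utf-8")
print(repaired == word)            # True
```

If the data looks right in one client and wrong in another, a mismatch of this kind on the connection is a likely cause, which is what SET NAMES is meant to rule out.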

Related

Mysql server's encoding differs from the client's (latin1 vs utf8mb4). How bad is it?

I'm designing a PHP web app and have some difficulty understanding the meaning of the MySQL variables related to encoding and how they interact with each other. The encoding of the server is set to latin1, but the client's is utf8mb4.
Running the mysql query inside a database
SHOW VARIABLES
WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'
gives the following:
character_set_client = utf8mb4
character_set_connection = utf8mb4
character_set_database = latin1
character_set_filesystem = binary
character_set_results = utf8mb4
character_set_server = latin1
character_set_system = utf8
collation_connection = utf8mb4_unicode_ci
collation_database = latin1_swedish_ci
collation_server = latin1_swedish_ci
I'm afraid of running into issues with the older databases, which are in latin1, if I change the character set of the MySQL server to utf8mb4, but I certainly want to use utf8mb4 for the new databases I create. To correctly store and retrieve data, should the server's and the client's encoding and collation always be the same? Any insight would be appreciated.
Some of those VARIABLES must agree with the encoding used in the client.
CREATE TABLE ... specifies how the characters are to be stored in the tables.
If those two differ, then MySQL will convert "on the wire" between the client encoding and the table encoding.
If that means converting, say, Korean characters (encoded in utf8 or utf8mb4) to latin1, it will not be possible, because latin1 has no encoding for them. On the other hand, all accented letters in Western Europe have encodings in both latin1 and utf8, so there is no problem.
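To make the conversion point concrete, here is a small sketch in Python rather than MySQL. Python's "latin-1" codec is ISO-8859-1, whereas MySQL's latin1 is closer to cp1252, so treat this as an analogy under that assumption; `fits_latin1` is a name invented for this example. It checks whether a string can be represented in latin1 at all:

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character has a latin1 (ISO-8859-1) encoding."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_latin1("café"))    # True: Western European accents exist in latin1
print(fits_latin1("한국어"))   # False: Korean has no latin1 representation

# A lossy conversion substitutes a placeholder for each unconvertible character.
lossy = "한국어".encode("latin-1", errors="replace")
print(lossy)                  # b'???'
```

The placeholder bytes are similar in spirit to the question marks that can show up when a character cannot be converted to the target character set.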
Read this for common screwups:
Trouble with UTF-8 characters; what I see is not what I stored
See ALTER TABLE .. CONVERT TO .. for converting all character columns in one table to a different encoding (assuming it was correctly stored to begin with).

MySQL change database + tables charset & collation from UTF8 to UTF8mb4

I currently have a MySQL database with the following settings:
character_set_client: utf8
character_set_connection: utf8
character_set_database: utf8
character_set_filesystem: binary
character_set_results: utf8
character_set_server: latin1
character_set_system: utf8
collation_connection: utf8_general_ci
collation_database: utf8_general_ci
collation_server: latin1_swedish_ci
I want to support emojis and other languages (like Chinese) in the database. Currently this is not working: those characters are automatically converted to a ?.
I created a test database with charset & collation utf8mb4(_general_ci) and a table with the same settings. Emojis work here. However, when I change the database settings to utf8(_general_ci) and leave the table as utf8mb4(_general_ci), emojis are still working, while this is not the case with my main database.
If I change my main database settings to charset + collation utf8mb4(_general_ci), and the tables as well, would that work?
And for database-access, will anything else have to be changed, such as character_set_connection or collation_connection?
I know that on my JavaScript server the connection is configured as utf8; I assume this has to be utf8mb4.
Will all current utf8(_general_ci) data be kept intact when changing to utf8mb4(_general_ci)?
Correctly stored utf8 characters will convert correctly to utf8mb4.
You should also specify that the connections are utf8mb4.
See this for discussion of 'question mark'.
To convert all the char/text columns to utf8mb4:
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4;
To convert one column:
ALTER TABLE tbl MODIFY COLUMN col ... CHARACTER SET utf8mb4;
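One way to see why existing utf8 data survives the conversion while emojis need utf8mb4: MySQL's utf8 (utf8mb3) stores at most 3 bytes per character, and utf8mb4 allows up to 4. The sketch below is plain Python, not MySQL, and `needs_utf8mb4` is a hypothetical helper name. It flags text containing characters whose UTF-8 form is 4 bytes long:

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character takes 4 bytes in UTF-8 (code point above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)


for sample in ["Ríkarðsdóttir", "中文", "😀"]:
    # Widest character in the sample, measured in UTF-8 bytes.
    widest = max(len(ch.encode("utf-8")) for ch in sample)
    print(widest, needs_utf8mb4(sample))
# 2 False
# 3 False
# 4 True
```

Anything that reports False here is encoded identically in 3-byte utf8 and in utf8mb4, which is why correctly stored utf8 data is expected to convert without change.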

character_set_database not updating on mysql

so I followed the "tutorial" here:
http://outofcontrol.ca/blog/comments/change-mysql-5.5-default-character-set-to-utf8
And after doing so, the mysql 5.7 servers on linux updated properly and everything is showing as I would expect.
BUT, I did the same on my development machine, Macbook Pro, El Capitan, and I see the following:
character_set_client utf8
character_set_connection utf8
**character_set_database latin1**
character_set_filesystem binary
character_set_results utf8
character_set_server utf8
character_set_system utf8
character_sets_dir /usr/local/mysql-5.6.10-osx10.7-x86_64/share/charsets/
Why is this still latin1? The other entries updated to utf8 and this worked on linux.
character_set_database is of virtually no use. Don't worry about it.
When creating a table, explicitly state
CREATE TABLE (...) ... CHARACTER SET utf8;
That helps you avoid various changes to defaults that may occur now or in the future.
Actually, the future is likely to be utf8mb4 with collation utf8mb4_unicode_520_ci.
As long as you are going to 5.7, you may as well go to that combo. utf8mb4 lets you get to Emoji and the rest of the Chinese character set; the 520 collation is arguably "more correct".
The link you gave is a bit weak -- it does not explain how to convert any data you have.
Existing tables are not changed by changing the settings you mentioned.
So, what is the whole picture? Do you have existing data? In utf8 or latin1? Are you updating in place? Or dumping and reloading? Etc.

How to save a Chinese character 𥚃 in MySQL

I am unable to save the character 𥚃 on MySQL 5.5. I have tried the utf8mb4 and utf32 character sets. I have to store both Chinese and English characters in the same table.
I was able to save this character by using the utf8mb4 character set on the MySQL server. So the output of show variables like 'char%'; should be all utf8mb4, except perhaps for the system charset.
Try utf8 general, and also, don't forget to execute
SET NAMES utf8;
before the actual query, which is quite an important part.

How to store non-english characters?

Non-english characters are messed up in a text column. Arabic text looks like this:
نـجـم سـهـيـل
How to store non-english characters correctly?
You should consider using utf8 to store your text.
You can do this at the database creation:
CREATE DATABASE mydb
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci;
You can also configure MySQL at installation or at startup to use utf8 (see the MySQL manual).
The MySQL manual pages cover all aspects of character sets and collations: http://dev.mysql.com/doc/refman/5.0/en/charset.html
The character set of the connection can be changed by
SET CHARACTER SET utf8
More details here and in the chapter Character set support
What OS are you using?
If Linux then it's good to have a system locale set to utf8 also, like "en_US.utf8".
And, to be sure, issue a "SET NAMES UTF8" command to MySQL just after connecting.
(db character set/collation must also be utf8)
The query below solved the issue.
ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;