ISO_1 to UTF8 failed - mysql

I have a data file encoded as iso_1, and I converted it to UTF-8:
file -i test.txt:
... text/plain; charset=utf-8
and my MySQL character_set variables are:
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
My question is:
Why are the Chinese characters still garbled?
ºâÑô...

Which of these were you expecting?
big5           6  2  '算栠'
gb2312, gbk    6  2  '衡阳'
eucjpms, ujis  6  2  '財剩'
ºâÑô is "Mojibake" for one of those. See "Trouble with UTF-8 characters; what I see is not what I stored".
Some of the character_set_* settings reference the encoding in the client. It is quite OK for a column to be utf8mb4 while the client is using big5 or gb2312 (etc), but you must do SET NAMES big5 or the equivalent.
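For example, if the data in the file was actually GB-encoded, a session like the following lets the server do the conversion. This is just a sketch; the table and column here are hypothetical:
mysql> -- declare that this client sends gb2312 bytes; the server converts
mysql> -- them to each column's own charset on INSERT
mysql> SET NAMES gb2312;
mysql> CREATE TABLE demo (txt VARCHAR(100)) CHARACTER SET utf8mb4;
mysql> INSERT INTO demo VALUES ('衡阳');  -- must really arrive as gb2312 bytes
mysql> SET NAMES utf8mb4;                 -- switch back before reading in a utf8 terminal
mysql> SELECT txt FROM demo;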

Thanks guys, I found that converting from gb18030 to UTF-8 worked.
But I don't know why file -i showed the file charset as iso-8859-1.
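Two notes on that: file -i only guesses, and since every byte sequence is valid ISO-8859-1, a GB-encoded file is often reported as iso-8859-1 when nothing better matches. Also, if the file goes in via LOAD DATA INFILE, you can skip the external conversion and declare the file's real encoding in the statement itself. A sketch, with a hypothetical path and table name:
mysql> -- CHARACTER SET here describes the file; MySQL converts to the
mysql> -- column charset while loading (gb18030 is a superset of gbk/gb2312)
mysql> LOAD DATA INFILE '/tmp/test.txt'
    ->   INTO TABLE demo
    ->   CHARACTER SET gb18030
    ->   FIELDS TERMINATED BY '\t'
    ->   LINES TERMINATED BY '\n';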

Related

Does characterEncoding in the connection string set how values are stored?

I'm about to change the encoding for a database from latin1 to utf8mb4.
Due to privacy restrictions, I don't know what the database to be converted contains. I'm worried that by running the SQL below, existing data may be changed.
ALTER TABLE table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
However, the connection string from the Grails application contains useUnicode=true&characterEncoding=UTF-8. Does this mean that even though latin1_swedish_ci is used for a column, the actual value that has been saved is UTF-8 encoded?
And since this value is UTF-8 encoded, there is no risk that the data will be affected by the change from latin1 to utf8mb4?
+--------------------------+-------------------+
| Variable_name            | Value             |
+--------------------------+-------------------+
| character_set_client     | utf8              |
| character_set_connection | utf8              |
| character_set_database   | latin1            |
| character_set_filesystem | binary            |
| character_set_results    | utf8              |
| character_set_server     | latin1            |
| character_set_system     | utf8              |
| collation_connection     | utf8_general_ci   |
| collation_database       | latin1_swedish_ci |
| collation_server         | latin1_swedish_ci |
+--------------------------+-------------------+
That's Ώπα? That is its interpretation in UTF-8 (as the outside world calls it), utf8mb4 (MySQL's equivalent), or utf8 (MySQL's partial implementation of UTF-8).
It would not work well in latin1.
The encoding in the client and the encoding of a column in the database need not be the same. However, Greek in the client cannot be crammed into latin1 in the table, hence the error message.
What ALTER TABLE table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci; does is change all the text columns in that table to be utf8-encoded, converting from whatever encoding is currently used (presumably latin1). This is fine for Western European characters, all of which exist (with different encodings) in both latin1 and utf8.
To handle Emoji and some Chinese characters, you may as well go for utf8mb4:
ALTER TABLE table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
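As for whether the values already stored are really latin1 or are UTF-8 bytes in disguise, you can inspect the raw bytes before converting anything. A sketch, with hypothetical table and column names:
mysql> -- é stored as real latin1 is hex E9; UTF-8 bytes hiding in a latin1
mysql> -- column show up as C3A9 (and CHAR_LENGTH counts 2, not 1)
mysql> SELECT col, HEX(col), LENGTH(col), CHAR_LENGTH(col)
    ->   FROM some_table
    ->   WHERE col REGEXP '[^ -~]'  -- only rows with non-ASCII content
    ->   LIMIT 10;
If you do find UTF-8 bytes masquerading as latin1, a plain CONVERT TO would mangle them; that case needs the two-step fix through BINARY instead, so it is worth checking before running the ALTER.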

Saving emoji characters in mysql database

I am trying to save some data in a MySQL database. The input contains emoji characters like '\U0001f60a\U0001f48d', and I'm getting this error:
1366, "Incorrect string value: '\\xF0\\x9F\\x98\\x8A\\xF0\\x9F...' for column 'caption' at row 1"
I searched the net and read a lot of answers, including MySQL utf8mb4, Errors when saving Emojis, https://mathiasbynens.be/notes/mysql-utf8mb4#character-sets and http://www.java2s.com/Tutorial/MySQL/0080__Table/charactersetsystem.htm, but nothing worked!
I seem to have a different problem. Here is my DB info:
mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8               |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_general_ci |
| collation_database       | utf8mb4_general_ci |
| collation_server         | utf8_general_ci    |
+--------------------------+--------------------+
10 rows in set (0.00 sec)
I tried to change the character_set_server value to utf8mb4 with
mysql> SET character_set_server = utf8mb4;
Query OK, 0 rows affected (0.00 sec)
but when I restart mysqld, everything reverts!
I don't have any /etc/my.cnf file, so I edited /etc/mysql/my.cnf instead.
What should I do?
How can I save emoji in my database?
Put this on the 1st or 2nd line of your source code (so that literals in the code are utf8-encoded): # -*- coding: utf-8 -*-
Your columns/tables need to be CHARACTER SET utf8mb4.
The Python package "MySQL-python" needs to be at least version 1.2.5 in order to handle utf8mb4.
self.query('SET NAMES utf8mb4') may be necessary.
Django needs client_encoding: 'UTF8' -- I don't know if that should be 'utf8mb4'.
References:
https://code.djangoproject.com/ticket/18392
http://mysql.rjweb.org/doc.php/charcoll#python
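On the "everything reverts after restart" part: SET only changes the running server, so the value also has to go into the config file that your mysqld actually reads (here /etc/mysql/my.cnf). A sketch:
mysql> SET character_set_server = utf8mb4;         -- current session only
mysql> SET GLOBAL character_set_server = utf8mb4;  -- new connections, but lost on restart

-- To survive a restart, put this in the [mysqld] section of the config file:
--     [mysqld]
--     character-set-server = utf8mb4
--     collation-server     = utf8mb4_general_ci

mysql> SHOW VARIABLES LIKE 'character_set_server';  -- verify after restarting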

When does mysql throw an error instead of coercing text into the default column format?

I'm working with two MySQL servers, trying to understand why they behave differently.
I've created identical tables on each:
| Field          | Type       | Collation         |
+----------------+------------+-------------------+
| some_chars     | char(45)   | latin1_swedish_ci |
| some_text      | text       | latin1_swedish_ci |
and I've set identical character set variables:
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | utf8   |
| character_set_connection | utf8   |
| character_set_database   | latin1 |
| character_set_filesystem | binary |
| character_set_results    | utf8   |
| character_set_server     | latin1 |
| character_set_system     | utf8   |
+--------------------------+--------+
When I insert UTF-8 characters into the database on one server, I get an error:
DatabaseError: 1366 (HY000): Incorrect string value: '\xE7\xBE\x8E\xE5\x9B\xBD...'
The same insertion on the other server throws no error; the table just silently accepts the UTF-8 insertion and renders a bunch of ? marks where the UTF-8 characters should be.
Why is the behavior of the two servers different?
What command were you executing when you got the error?
Your data is obviously utf8 (good).
Your connection apparently is utf8 (good).
Your table/column is declared CHARACTER SET latin1? It should be utf8 (see the check sketched below).
That is 美 - Chinese, correct? Some Chinese characters need 4-byte utf8. So you should use utf8mb4 instead of utf8 in all 3 cases listed above.
Other notes:
There is no substantive difference in this area in 5.6 versus 5.7.
@@SQL_MODE is not relevant.
VARCHAR is usually advisable over CHAR.
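The table/column check is the one that is easy to get wrong, because every column can carry its own character set: a column-level setting overrides the table default, which overrides the database default. A sketch of how to check and fix it; the schema and table names are placeholders:
mysql> SELECT column_name, character_set_name, collation_name
    ->   FROM information_schema.columns
    ->   WHERE table_schema = 'mydb' AND table_name = 'mytable';

mysql> -- converts the existing columns as well as the table default:
mysql> ALTER TABLE mytable CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;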

Unicode characters from database not recognized

This stumps me. I'm upgrading a fairly large app (for me) from Rails 2.3 to Rails 3.0. I'm also running this app in Ruby 1.9.2 as opposed to 1.8.7 before. On top of that I've also switched to HTML5. There are therefore many variables in play.
In several pages, the text coming from the MySQL database just does not display right anymore. This can be as simple as the euro symbol (€) or as esoteric as some Sanskrit text: सर्वम् मंगलम्
While everything looked great on the old site, now I get garbage characters such as € instead of the euro sign, or the following:
सर्वम् मंगलम्
... instead of the sanskrit text.
The data in the database is unchanged. As far as I know everything is set up for utf-8 everywhere.
What gives?
Edit 1, following up on Roland's help:
Here is what I get on my ubuntu server's MySQL databases:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
but here is what I get from running the command on my local mac:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+------------------------------------------------------+
| Variable_name            | Value                                                |
+--------------------------+------------------------------------------------------+
| character_set_client     | utf8                                                 |
| character_set_connection | utf8                                                 |
| character_set_database   | utf8                                                 |
| character_set_filesystem | binary                                               |
| character_set_results    | utf8                                                 |
| character_set_server     | utf8                                                 |
| character_set_system     | utf8                                                 |
| character_sets_dir       | /usr/local/Cellar/mysql/5.5.14/share/mysql/charsets/ |
+--------------------------+------------------------------------------------------+
The second listing looks better to me (I don't understand encoding very well).
Should I modify my server databases' settings? Won't that mess up their existing data? If so, how do I go about changing the character set variables?
When you interpret the given (garbled) string as Unicode and convert it to MacRoman, you get the right bytes: they are exactly the UTF-8 encoding of the original string.
I did this (in a UTF-8 terminal):
$ echo 'सर्वम् मंगलम्' > in
$ iconv -f UTF-8 -t MacRoman < in
सर्वम् मंगलम्
So somewhere, the opposite conversion is being done to the data: the byte stream is interpreted as MacRoman and then converted to UTF-8 again.
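For the Ubuntu server in the first listing, the stored rows may well be fine; what needs fixing is the conversation with the server. A sketch of the per-connection fix; in Rails this is normally driven by encoding: utf8 in database.yml rather than done by hand:
mysql> -- sets character_set_client, character_set_connection and
mysql> -- character_set_results in one statement, without touching stored data
mysql> SET NAMES utf8;
mysql> SHOW VARIABLES LIKE 'character_set%';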

tomcat/jdbc/mysql: can insert ÿ (U+00FF) but not Ā (U+0100)

My setup:
MySQL 5.1
show variables:
| character_set_client     | utf8
| character_set_connection | utf8
| character_set_database   | utf8
| character_set_filesystem | binary
| character_set_results    | utf8
| character_set_server     | utf8
| character_set_system     | utf8
| character_sets_dir       | D:\Programme\MySQL\MySQL Server 5.1\share\charsets\
| collation_connection     | utf8_general_ci
| collation_database       | utf8_unicode_ci
| collation_server         | utf8_general_ci
and even
| init_connect             | SET collation_connection = utf8_general_ci; SET NAMES utf8;
The table in question has CHARACTER SET utf8.
Tomcat 6.0
The JDBC connector uses characterEncoding="utf8" useUnicode="true".
Now when I try
stmt.execute("UPDATE *table* SET *value*=\"ÿ\" WHERE ...")
it works, but for
stmt.execute("UPDATE *table* SET *value*=\"Ā\" WHERE ...")
I get
java.sql.SQLException: Incorrect string value: '\xC4\x80' for column 'value' at row 1
Furthermore, it works for all characters up to ÿ, which can be encoded with one byte, but as soon as two bytes are needed: bang!
Why is that so? And how can I get it to work?
After I added another two tables to check whether it was a MyISAM vs. InnoDB problem, it just worked on the new tables. Why?
In the new tables each column used the default charset, while in my existing tables the charset of each column was set to latin1. This was because I had copied the DB from a non-utf8 MySQL instance and manually changed the table charset to utf-8. BUT while copying, HeidiSQL had added a "CHARACTER SET latin1" to each column, which wasn't changed when I changed the table charset, AND it is not very easily visible in HeidiSQL that a column has an individual charset ...
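The trap is visible in SHOW CREATE TABLE, which prints any column-level charset that differs from the table default, and it matters which ALTER you run. A sketch against a hypothetical table:
mysql> SHOW CREATE TABLE mytable;
-- look for per-column clauses such as:
--   `value` varchar(100) CHARACTER SET latin1 DEFAULT NULL

mysql> -- changes only the table DEFAULT (used for columns added later);
mysql> -- existing latin1 columns keep rejecting two-byte characters like Ā:
mysql> ALTER TABLE mytable CHARACTER SET utf8;

mysql> -- converts the existing columns too:
mysql> ALTER TABLE mytable CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;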