Unicode characters from database not recognized

Unicode characters from database not recognized - mysql

This stumps me. I'm upgrading a fairly large app (for me) from Rails 2.3 to Rails 3.0. I'm also running this app in Ruby 1.9.2 as opposed to 1.8.7 before. On top of that I've also switched to HTML5. There are therefore many variables in play.
In several pages, the text coming from the MySQL database just does not display right anymore. This can be as simple as the euro symbol (€) or as esoteric as some Sanskrit text: सर्वम् मंगलम्
While everything looked great on the old site now I get some garbage characters such as ‚Ç¨ instead of the euro sign or the following:
‡§∏‡§∞‡•ç‡§µ‡§Æ‡•ç ‡§Æ‡§Ç‡§ó‡§≤‡§Æ‡•ç
... instead of the sanskrit text.
The data in the database is unchanged. As far as I know everything is set up for utf-8 everywhere.
What gives?
Edit 1 following up Roland's help:
Here is what I get on my ubuntu server's MySQL databases:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
but here is what I get from running the command on my local mac:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+------------------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/Cellar/mysql/5.5.14/share/mysql/charsets/ |
+--------------------------+------------------------------------------------------+
The second listing looks better to me (who doesn't understand encoding very much).
Should I modify my server databases' settings? Won't that mess up their existing data? If so how do I go about changing the char. set variables?

When you interpret the given string as Unicode, save it as UTF-8 to a byte stream and then convert the byte stream to MacRoman, you will get the right bytes. These are the UTF-8 encoded string.
I did this (in a UTF-8 terminal):
$ echo '‡§∏‡§∞‡•ç‡§µ‡§Æ‡•ç ‡§Æ‡§Ç‡§ó‡§≤‡§Æ‡•ç' > in
$ iconv -f UTF-8 -t MacRoman < in
सर्वम् मंगलम्
So somewhere, the opposite conversion is done to the data. The byte stream is interpreted as being in MacRoman, and it is then converted to UTF-8 again.

Related

ISO_1 to UTF8 failed

I have a datafile encoding by iso_1, and I changed it to UTF8:
file -i test.txt:
... text/plain; charset=utf-8
and mysql character_set is:
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/
My question is:
Why the chinese character is still messy code?
ºâÑô...

Which of these were you expecting?
big5 6 2 '算栠'
gb2312, gbk 6 2 '衡阳'
eucjpms, ujis 6 2 '財剩'
ºâÑô is "Mojibake" for one of those. See Trouble with UTF-8 characters; what I see is not what I stored
Some of the character_set_* settings reference the encoding in the client. It is quite OK for a column to be utf8mb4 while the client is using big5 or gb2312 (etc), but you must do SET NAMES big5 or the equivalent.

THANKS guys，I find use gb18030 covered to utf-8 worked.
But I dont know why the file -i showed the file charset is iso-8859-1.

Saving emoji characters in mysql database

I am trying to save some data on mysql database, input contains emoji characters like this : '\U0001f60a\U0001f48d' and I'm getting this error:
1366, "Incorrect string value: '\\xF0\\x9F\\x98\\x8A\\xF0\\x9F...' for column 'caption' at row 1"
I searched over net and read a lot of answers include these:
MySQL utf8mb4, Errors when saving Emojis or MySQL utf8mb4, Errors when saving Emojis or https://mathiasbynens.be/notes/mysql-utf8mb4#character-sets or http://www.java2s.com/Tutorial/MySQL/0080__Table/charactersetsystem.htm but nothing worked !!
I have different problems:
here is mydb info:
mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_general_ci |
| collation_database | utf8mb4_general_ci |
| collation_server | utf8_general_ci |
+--------------------------+--------------------+
10 rows in set (0.00 sec)
I tried to change character_set_server value to utf8mb4 by
mysql>SET character_set_server = utf8mb4
Query OK, 0 rows affected (0.00 sec)
But when restart mysqld everything revert !
I don't have any /etc/my.cnf file in also, and I edited /etc/mysql/my.cnf file instead.
What should I do?
How can I save emoji file in my database?

1st or 2nd line in source code (to have literals in the code utf8-encoded: # -- coding: utf-8 --
Your columns/tables need to be CHARACTER SET utf8mb4
The python package "MySQL-python" version needs to be at least 1.2.5 in order to handle utf8mb4.
self.query('SET NAMES utf8mb4') may be necessary.
Django needs client_encoding: 'UTF8' -- I don't know if that should be 'utf8mb4`.
References:
https://code.djangoproject.com/ticket/18392
http://mysql.rjweb.org/doc.php/charcoll#python

How do I convert the OpenShift MySQL 5.1 cartridge to UTF-8

The default MySQL 5.1 cartridge apparently creates all its tables with the latin1 character set. I have an application (Review Board, a python/Django application) that has some issues unless the DB is running as UTF-8. How do I change that? I can't just edit my.cnf because it will be wiped at the next cartridge restart.
mysql> show variables like 'character_set%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
I cannot change this setting in my.cnf, because to the best of my knowledge, there exists no OpenShift environment variable to set the character encoding. How do I persistently change this (ideally in my OpenShift hooks so this will persist into future deployments) and update my existing tables to UTF-8?

I found a solution but not a perfect one :
In openshift installing phpMyAdmin,
Find and change server settings, the relevant character variables changed from latin1 to utf8.
Hope that helps

Retrieving latin1 encoded results with JDBC

I am trying to retrieve result sets from a MySQL database sing JDBC which is then used to generate reports in BiRT. The connection string is set up in BiRT.
The database is latin1:
SHOW VARIABLES LIKE 'c%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
| collation_connection | latin1_swedish_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
| completion_type | 0 |
| concurrent_insert | 1 |
| connect_timeout | 5 |
+--------------------------+----------------------------+
So I have been trying to correct the strange looking encoding results that are returned (German characters). I thought it would make sense to us the "characterSetResults" property to retrieve the result set as "latin1" like this:
jdbc:mysql://localhost:3306/statistics?useUnicode=true&characterEncoding=latin1&characterSetResults=latin1
This connection string fails and by deduction I have discovered that it is the property:
characterSetResults=latin1
is causing the connection to fail. The error is a long java error which means little to me. It starts with:
org.eclipse.birt.report.data.oda.jdbc.JDBCException: There is an error in get connection, Communications link failure
Last packet sent to the server was 38 ms ago..
at org.eclipse.birt.report.data.oda.jdbc.JDBCDriverManager.doConnect(JDBCDriverManager.java:262)
at org.eclipse.birt.report.data.oda.jdbc.JDBCDriverManager.getConnection(JDBCDriverManager.java:186)
at org.eclipse.birt.report.data.oda.jdbc.JDBCDriverManager.tryCreateConnection(JDBCDriverManager.java:706)
at org.eclipse.birt.report.data.oda.jdbc.JDBCDriverManager.testConnection(JDBCDriverManager.java:634)
at org.eclipse.birt.report.data.oda.jdbc.ui.util.DriverLoader.testConnection(DriverLoader.java:120)
at org.eclipse.birt.report.data.oda.jdbc.ui.util.DriverLoader.testConnection(DriverLoader.java:133)
at org.eclipse.birt.report.data.oda.jdbc.ui.profile.JDBCSelectionPageHelper.testConnection(JDBCSelectionPageHelper.java:687)
at org.eclipse.birt.report.data.oda.jdbc.ui.profile.JDBCSelectionPageHelper.access$7(JDBCSelectionPageHelper.java:655)
at org.eclipse.birt.report.data.oda.jdbc.ui.profile.JDBCSelectionPageHelper$7.widgetSelected(JDBCSelectionPageHelper.java:578)
at org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:234)
If I change this to:
characterSetResults=utf8
the connection string connects without errors, but the encoding issue remains.
Does anyone know the correct way to retrieve latin1? And yes, I know UTF8 is the thing to use, but this is not my database....
Thank you for reading this,
Stephen

After some digging, have you tried characterSetResults=ISO8859_1? This is equivalent to latin1 and there is evidence MySQL handles this much better.
I do not have a DB to test this on, but it looks form what I read to be spot-on for what you need.

When specifying character encodings on the client side, use Java-style names(Mysql connector-j-reference-charsets).So it is supposed to work by using jdbc:mysql://localhost:3306/statistics?useUnicode=true&characterEncoding=utf-8&characterSetResults=Cp1252

MSSQL and MySQL as Linked Server

I have a MSSQL Server 2005 and MySQL Server as linked server.
I want to save particular data from MSSQL to MySQL.
And I have a huge problem related with encoding.
MS SQL
select SERVERPROPERTY ('collation')
Result: Cyrillic_General_CI_AS
MySQL
mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+
When I'm trying to retrive data from MySQL or to insert ones to MySQL
I have a wrong character set in text field,
something like that "???????????????"
How can I convert text data to UTF-8 encoding before inserting the data to linked server?
Or should I change some settings?
I don't want to change encoding of MySQL server on CP-1251, it's not convenient for me.

What is your Collation Compatible property for linked server? This might help.
Have you tried COLLATE?

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Unicode characters from database not recognized - mysql

Related

ISO_1 to UTF8 failed

Saving emoji characters in mysql database

How do I convert the OpenShift MySQL 5.1 cartridge to UTF-8

Retrieving latin1 encoded results with JDBC

MSSQL and MySQL as Linked Server

Categories

Resources