Retrieving latin1 encoded results with JDBC

I am trying to retrieve result sets from a MySQL database using JDBC, which are then used to generate reports in BIRT. The connection string is set up in BIRT.
The database is latin1:
SHOW VARIABLES LIKE 'c%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
| collation_connection | latin1_swedish_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
| completion_type | 0 |
| concurrent_insert | 1 |
| connect_timeout | 5 |
+--------------------------+----------------------------+
So I have been trying to correct the strange-looking characters in the results that are returned (German characters). I thought it would make sense to use the "characterSetResults" property to retrieve the result set as "latin1", like this:
jdbc:mysql://localhost:3306/statistics?useUnicode=true&characterEncoding=latin1&characterSetResults=latin1
This connection string fails, and by elimination I have found that the property
characterSetResults=latin1
is what causes the connection to fail. The error is a long Java stack trace that means little to me. It starts with:
org.eclipse.birt.report.data.oda.jdbc.JDBCException: There is an error in get connection, Communications link failure
Last packet sent to the server was 38 ms ago..
at org.eclipse.birt.report.data.oda.jdbc.JDBCDriverManager.doConnect(JDBCDriverManager.java:262)
at org.eclipse.birt.report.data.oda.jdbc.JDBCDriverManager.getConnection(JDBCDriverManager.java:186)
at org.eclipse.birt.report.data.oda.jdbc.JDBCDriverManager.tryCreateConnection(JDBCDriverManager.java:706)
at org.eclipse.birt.report.data.oda.jdbc.JDBCDriverManager.testConnection(JDBCDriverManager.java:634)
at org.eclipse.birt.report.data.oda.jdbc.ui.util.DriverLoader.testConnection(DriverLoader.java:120)
at org.eclipse.birt.report.data.oda.jdbc.ui.util.DriverLoader.testConnection(DriverLoader.java:133)
at org.eclipse.birt.report.data.oda.jdbc.ui.profile.JDBCSelectionPageHelper.testConnection(JDBCSelectionPageHelper.java:687)
at org.eclipse.birt.report.data.oda.jdbc.ui.profile.JDBCSelectionPageHelper.access$7(JDBCSelectionPageHelper.java:655)
at org.eclipse.birt.report.data.oda.jdbc.ui.profile.JDBCSelectionPageHelper$7.widgetSelected(JDBCSelectionPageHelper.java:578)
at org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:234)
If I change this to:
characterSetResults=utf8
the connection string connects without errors, but the encoding issue remains.
Does anyone know the correct way to retrieve latin1? And yes, I know UTF8 is the thing to use, but this is not my database....
Thank you for reading this,
Stephen

After some digging: have you tried characterSetResults=ISO8859_1? This is equivalent to latin1, and there is evidence that MySQL handles it much better.
I do not have a DB to test this on, but from what I have read it looks spot-on for what you need.

When specifying character encodings on the client side, use Java-style names (see the charsets section of the MySQL Connector/J reference). So it is supposed to work by using:
jdbc:mysql://localhost:3306/statistics?useUnicode=true&characterEncoding=utf-8&characterSetResults=Cp1252
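A side note on why both ISO8859_1 and Cp1252 are plausible candidates: MySQL documents its latin1 as "cp1252 West European", and cp1252 agrees with ISO 8859-1 on the German letters. The sketch below uses Python only because its codecs are easy to run standalone. It is not JDBC code, and the helper names are made up for illustration. It also reproduces the typical garbling seen when UTF-8 bytes are read as cp1252.

```python
# Illustrative only: compares the two encodings suggested in the answers above.
GERMAN = "äöüÄÖÜß"


def same_bytes(text, first="iso-8859-1", second="cp1252"):
    """Return True if both codecs encode `text` to identical bytes."""
    return text.encode(first) == text.encode(second)


def misread(text, stored_as="utf-8", read_as="cp1252"):
    """Encode `text` one way and decode it another, as a mismatched client would."""
    return text.encode(stored_as).decode(read_as)


if __name__ == "__main__":
    # German letters sit above 0xBF, where the two encodings are identical.
    print(same_bytes(GERMAN))
    # The encodings differ only in 0x80-0x9F, where cp1252 has printable characters.
    print(repr(bytes([0x80]).decode("cp1252")), repr(bytes([0x80]).decode("iso-8859-1")))
    # UTF-8 bytes read as cp1252 give the familiar two-character garbage.
    print(misread("Grüße"))
```

If the garbage in the report looks like the last line of output, the data is probably UTF-8 being read as a single-byte encoding somewhere along the way, rather than a latin1 problem as such.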

Related

Why does the column name change to a question mark after executing set names utf8mb4?

After executing set names utf8mb4, the column name changes to a question mark. See below:
mysql> show variables like 'character%' ;
+--------------------------+---------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+
mysql> select '\U+1F600';
+------+
| 😀 |
+------+
| 😀 |
+------+
mysql> set names utf8mb4;
mysql> select '\U+1F600';
+------+
| ? |
+------+
| 😀 |
+------+
As I understand it, utf8mb4 is designed to support these emoji characters. So why does the column name change to a question mark after switching to utf8mb4?
In addition, I copied the emoji character from a website (http://getemoji.com/) and pasted it into the terminal. If I instead type '\U+1F600' manually, I get this:
mysql> select '\U+1F600' ;
+---------+
| U+1F600 |
+---------+
| U+1F600 |
+---------+
So I guess something happens implicitly when I paste it into the terminal, and this implicit conversion (😀 --> '\U+1F600') might explain the phenomenon.

This would appear to be expected behaviour according to the MySQL documentation, which states that metadata is stored in utf8 (the non-4-byte version).
It is returned to the client in character_set_results (utf8mb4); however, your virtual column name is most likely being stored as utf8 to be compatible and comparable with all other metadata, and thus the 4-byte character is lost even though it is not in a real table.
See here:
https://dev.mysql.com/doc/refman/5.6/en/charset-metadata.html
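To make the "4-byte" point concrete, here is a small Python sketch (the function names are hypothetical) that measures how many bytes UTF-8 needs per character. It only illustrates the encoding width; it does not reproduce what the MySQL server does internally.

```python
def utf8_width(char):
    """Number of bytes UTF-8 needs for a single character."""
    return len(char.encode("utf-8"))


def fits_three_byte_utf8(text):
    """True if every character fits in at most 3 UTF-8 bytes (MySQL's plain utf8)."""
    return all(utf8_width(ch) <= 3 for ch in text)


if __name__ == "__main__":
    grinning_face = "\U0001F600"  # the emoji from the question
    print(utf8_width(grinning_face))
    print(fits_three_byte_utf8(grinning_face))
    print(fits_three_byte_utf8("€ä"))
```

The emoji needs four bytes, so it cannot be represented in the 3-byte utf8 used for metadata, while characters such as € and ä can.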

I found more information by using Wireshark. See below:
Before executing set names utf8mb4
After executing set names utf8mb4
In this case the server can't find a charset number, so the column name becomes a question mark. It seems that which charset number is used does not matter, as long as it is not Unknown. If I execute set names latin1, the response packet info is:

MySQL UTF8 Issue

Okay, I have tried to import a CSV file into MySQL for the past 24 hours but have failed miserably.
I have tried set names and set character set, and there is nothing left that I have not set to UTF8, not just for the database and tables but for the server as well. It is still not working.
I am importing directly into MySQL, so it is not a PHP issue. I would be grateful if anyone could point out where I am going wrong.
mysql> SHOW CREATE DATABASE `dict_2`;
+----------+-----------------------------------------------------------------------------------------+
| Database | Create Database |
+----------+-----------------------------------------------------------------------------------------+
| dict_2 | CREATE DATABASE `dict_2` /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci */ |
+----------+-----------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show variables like "%character%"; show variables like "%collation%";
+--------------------------+--------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | utf8 |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | C:\xampp\mysql\share\charsets\ |
+--------------------------+--------------------------------+
8 rows in set (0.00 sec)
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_unicode_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)

In its current form, this question is impossible to answer.
We're left guessing...
That you're using a MySQL LOAD DATA statement.
That you've verified that the character set encoding of the .csv file is not ucs2.
That you've verified that the character set encoding of the .csv file is utf8 (i.e. matches the character_set_database system variable), or that you've specified the appropriate character set in the CHARACTER SET clause of the LOAD DATA statement.
Beyond that, there's a whole slew of other things that might be wrong, but we're still just guessing.
Very frequently when something in MySQL "fails miserably", there's some sort of indication, like an error message, or some other behavior that we can observe and describe.
In the question, the description of the failure mode is beyond vague; it's entirely non-existent.
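One way to check the file-encoding guesses above is to inspect the raw bytes of the .csv file. The following is a rough Python sketch, not a complete detector: the function name is made up, it only recognises UTF-16/UCS-2 when a byte order mark is present, and a pure-ASCII file will also be reported as utf-8.

```python
import codecs


def guess_csv_encoding(raw):
    """Classify raw bytes as 'utf-16/ucs2 (BOM)', 'utf-8', or 'not utf-8'."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16/ucs2 (BOM)"
    try:
        raw.decode("utf-8")  # strict decoding: raises on any invalid sequence
    except UnicodeDecodeError:
        return "not utf-8"
    return "utf-8"


if __name__ == "__main__":
    # Sample inputs; for a real file, read it with open(path, "rb").read().
    print(guess_csv_encoding("Grüße,1\n".encode("utf-8")))
    print(guess_csv_encoding("Grüße,1\n".encode("latin-1")))
    print(guess_csv_encoding(codecs.BOM_UTF16_LE + "a,1\n".encode("utf-16-le")))
```

If the file turns out not to be UTF-8, that would point at the CHARACTER SET clause of LOAD DATA (or a re-encoding of the file) rather than at the server variables.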

How do I convert the OpenShift MySQL 5.1 cartridge to UTF-8

The default MySQL 5.1 cartridge apparently creates all its tables with the latin1 character set. I have an application (Review Board, a Python/Django application) that has some issues unless the DB is running as UTF-8. How do I change that? I can't just edit my.cnf because it will be wiped at the next cartridge restart.
mysql> show variables like 'character_set%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
I cannot change this setting in my.cnf, because to the best of my knowledge, there exists no OpenShift environment variable to set the character encoding. How do I persistently change this (ideally in my OpenShift hooks so this will persist into future deployments) and update my existing tables to UTF-8?

I found a solution, though not a perfect one:
Install phpMyAdmin on OpenShift.
Find the server settings and change the relevant character set variables from latin1 to utf8.
Hope that helps
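The question also asks about updating existing tables, which changing the server variables does not do. As a separate, untested sketch (not the method from the answer above), the Python below only builds the standard MySQL ALTER DATABASE and ALTER TABLE ... CONVERT TO CHARACTER SET statements as strings. The database and table names are placeholders, how the statements get executed (for example from a deployment hook) is left open, and a backup beforehand is assumed.

```python
def quote_identifier(name):
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def conversion_statements(database, tables, charset="utf8", collation="utf8_general_ci"):
    """Build ALTER statements that switch a database and its tables to `charset`."""
    statements = [
        f"ALTER DATABASE {quote_identifier(database)} "
        f"CHARACTER SET {charset} COLLATE {collation};"
    ]
    for table in tables:
        statements.append(
            f"ALTER TABLE {quote_identifier(table)} "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


if __name__ == "__main__":
    # "reviewboard" and "example_table" are hypothetical names.
    for statement in conversion_statements("reviewboard", ["example_table"]):
        print(statement)
```

CONVERT TO assumes the existing column data really is valid latin1; if UTF-8 bytes were previously stored in latin1 columns, this conversion would double-encode them.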

Unicode characters from database not recognized

This stumps me. I'm upgrading a fairly large app (for me) from Rails 2.3 to Rails 3.0. I'm also running this app in Ruby 1.9.2 as opposed to 1.8.7 before. On top of that I've also switched to HTML5. There are therefore many variables in play.
In several pages, the text coming from the MySQL database just does not display right anymore. This can be as simple as the euro symbol (€) or as esoteric as some Sanskrit text: सर्वम् मंगलम्
While everything looked great on the old site now I get some garbage characters such as ‚Ç¨ instead of the euro sign or the following:
‡§∏‡§∞‡•ç‡§µ‡§Æ‡•ç ‡§Æ‡§Ç‡§ó‡§≤‡§Æ‡•ç
... instead of the Sanskrit text.
The data in the database is unchanged. As far as I know everything is set up for utf-8 everywhere.
What gives?
Edit 1 following up Roland's help:
Here is what I get on my Ubuntu server's MySQL databases:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
but here is what I get from running the command on my local Mac:
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+------------------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/Cellar/mysql/5.5.14/share/mysql/charsets/ |
+--------------------------+------------------------------------------------------+
The second listing looks better to me (though I don't understand encoding very well).
Should I modify my server databases' settings? Won't that mess up their existing data? If so, how do I go about changing the character set variables?

If you interpret the garbled string as Unicode, save it as UTF-8 to a byte stream, and then convert that byte stream to MacRoman, you get the right bytes: they are the UTF-8 encoding of the original string.
I did this (in a UTF-8 terminal):
$ echo '‡§∏‡§∞‡•ç‡§µ‡§Æ‡•ç ‡§Æ‡§Ç‡§ó‡§≤‡§Æ‡•ç' > in
$ iconv -f UTF-8 -t MacRoman < in
सर्वम् मंगलम्
So somewhere, the opposite conversion is done to the data. The byte stream is interpreted as being in MacRoman, and it is then converted to UTF-8 again.
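The same round trip can be sketched in Python, whose standard library ships a mac_roman codec. The function names are made up for illustration; garble reproduces the faulty conversion and repair mirrors the iconv call above.

```python
def garble(text):
    """Reproduce the bug: UTF-8 bytes misread as MacRoman."""
    return text.encode("utf-8").decode("mac_roman")


def repair(garbled):
    """Undo it: map the characters back to MacRoman bytes, then decode as UTF-8."""
    return garbled.encode("mac_roman").decode("utf-8")


if __name__ == "__main__":
    print(garble("€"))
    print(garble("सर्वम् मंगलम्"))
    print(repair(garble("सर्वम् मंगलम्")))
```

Repairing output this way only treats the symptom; the real fix is to find the layer that assumes MacRoman (or some other non-UTF-8 default) and make it use UTF-8.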

MSSQL and MySQL as Linked Server

I have MSSQL Server 2005 with a MySQL server set up as a linked server.
I want to save particular data from MSSQL to MySQL.
And I have a huge problem related to encoding.
MS SQL
select SERVERPROPERTY ('collation')
Result: Cyrillic_General_CI_AS
MySQL
mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+
When I try to retrieve data from MySQL or insert data into MySQL,
I get the wrong character set in text fields,
something like this: "???????????????"
How can I convert text data to UTF-8 encoding before inserting it into the linked server?
Or should I change some settings?
I don't want to change the encoding of the MySQL server to CP-1251; that is not convenient for me.

What is your Collation Compatible property for linked server? This might help.
Have you tried COLLATE?
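For background on where a run of question marks usually comes from: Cyrillic text pushed through a character set that cannot represent it gets each character replaced by "?". The Python sketch below only illustrates that effect and the CP-1251 to UTF-8 byte conversion the question asks about. It is not linked-server configuration, and the function names are hypothetical.

```python
def lossy_round_trip(text, target="latin-1"):
    """Force text through a charset that may lack its characters, replacing misses with '?'."""
    return text.encode(target, errors="replace").decode(target)


def cp1251_to_utf8(raw):
    """Re-encode CP-1251 bytes (the code page behind Cyrillic_General) as UTF-8 bytes."""
    return raw.decode("cp1251").encode("utf-8")


if __name__ == "__main__":
    print(lossy_round_trip("Привет"))
    utf8_bytes = cp1251_to_utf8("Привет".encode("cp1251"))
    print(utf8_bytes.decode("utf-8"))
```

If the stored values are already question marks, the information is gone and no later conversion can recover it, so the fix has to happen at the point where the data crosses between the two servers.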
