Ruby - mysql2 driver changing encoding / various utf-8 issues - mysql

I have an API running on Sinatra. It queries a MySQL database and returns data in JSON or XML format. I'm having a problem with Unicode data. If I query the production database from the console, I get the data correctly:
persönlichen
However, in my API results (or if I were to query the database in irb using the mysql2 gem), I get this:
persÃ¶nlichen
Everything works swimmingly on my development box, which is confounding my efforts to solve the problem.
I have done everything I can to make sure that the database is UTF-8 only (encodings, collations, client and server character sets are all utf8). I'm using the mysql2 driver, which supposedly forces everything to UTF-8, and I'm setting :encoding => 'UTF8' on my ActiveRecord connection.
What am I missing?

I was able to narrow the problem down: the data wasn't encoded correctly in the database. I had populated the database from a SQL dump file; adding this to the top of the dump made everything work:
set names utf8;
create database if not exists `my_db_name` CHARACTER SET utf8 COLLATE utf8_general_ci;
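Why the set names utf8; line matters can be sketched with Python codecs: the server decodes the bytes it receives using the connection character set and then stores the resulting characters in the utf8 column. This is a minimal model, not MySQL code; import_dump is a hypothetical helper, and it assumes the connection defaulted to latin1 and that MySQL's latin1 behaves like Python's "latin-1" codec for these bytes (MySQL's latin1 is really closer to cp1252).

```python
def import_dump(dump_bytes: bytes, connection_charset: str) -> str:
    """Hypothetical model of loading a dump into a utf8 column.

    The server interprets the incoming bytes using the connection
    character set, then stores the resulting characters as utf8.
    """
    return dump_bytes.decode(connection_charset)


# The dump file itself was written as UTF-8.
dump = "persönlichen".encode("utf-8")

# Assumed default: no "set names utf8;", so the connection is latin1.
stored_without = import_dump(dump, "latin-1")
# With "set names utf8;" the bytes are read as what they really are.
stored_with = import_dump(dump, "utf-8")

print(stored_without)  # persÃ¶nlichen
print(stored_with)     # persönlichen
```

In this model the first value is what a utf8 client such as mysql2 would show. A console whose connection is latin1 converts the stored characters back to the original bytes on the way out, which would explain why the console displayed the word correctly while the API did not.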

Related

Convert Mysql stored data to correct utf8

I stored a large Arabic database in MySQL using Perl, in the wrong format. Here is what happened:
1) MySQL tables were created with the attributes:
ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci
2) My Perl scripts are saved in UTF-8 and I use "use utf8;" at the top.
3) I can read the data from the table and display it normally in Arabic on HTML pages using a charset utf8 meta tag.
The problem is that when I look at the data in the database itself, it is stored in an encoded, unreadable form. After searching about the Perl DBI module, I found that I should do this:
$dbh->{'mysql_enable_utf8'} = 1;
$dbh->do('SET NAMES utf8');
immediately after connecting to the database, which I had not done.
Now that I do this, newly stored data is shown correctly everywhere, even in the MySQL browser application on Windows.
The remaining problem is the data already stored in the table: it seems to be double UTF-8 encoded, or something like that. How can I fix data that was stored before these flags were set?
After days of searching, I did not find any way to fix the data directly in MySQL or programmatically, e.g. in Perl.
The only solution I found was to export the data from MySQL to text files the same way I put it in (which seems to be double UTF-8 encoded by Perl), but to UTF-8-decode it once in Perl first.
After that, the data is correctly saved to text files in UTF-8 and can be imported or handled directly as valid UTF-8 in Perl and MySQL.
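The fix above amounts to undoing one layer of encoding: turn each character back into the single byte it came from, then decode those bytes as UTF-8 once. Here is the same idea as a minimal Python sketch rather than Perl. fix_double_utf8 is a hypothetical name, and the sketch assumes the extra layer came from UTF-8 bytes being read as Latin-1; because MySQL's latin1 is cp1252-like, some real data may need "cp1252" instead.

```python
def fix_double_utf8(text: str) -> str:
    """Undo one accidental Latin-1 -> UTF-8 conversion, if the text allows it.

    Text that cannot be round-tripped (e.g. Arabic that is already stored
    correctly) is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


original = "مرحبا بالعالم"
# Simulate the damage: UTF-8 bytes misread as Latin-1 characters.
damaged = original.encode("utf-8").decode("latin-1")

print(fix_double_utf8(damaged) == original)   # True
print(fix_double_utf8(original) == original)  # True, correct text is left alone
```

This is a heuristic. Genuine Latin-1 text that happens to form valid UTF-8 (for example a literal "Ã¶") would also be "fixed", so it is worth checking the result on a copy of the data first.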

Querying mysql database with SQLDeveloper doesn't return correct values

I have a mysql database with a charset utf8 of all the tables.
I am using SQLDeveloper to access and query the database with the latest JConnector JDBC driver.
When executing a simple query containing Bulgarian text, such as SELECT 'Варна'; (equivalent to SELECT 'Варна' from DUAL;), SQLDeveloper returns '?????'. As a result, selects whose WHERE clauses contain Bulgarian text return NULL, because the Bulgarian characters in the WHERE clause no longer match the utf8 Bulgarian characters in the database. (When the select doesn't use Bulgarian at all, SQLDeveloper returns completely correct values and displays the Bulgarian text in the results correctly.)
The Preferences -> Environment -> Encoding in SQLDeveloper is set currently to UTF-8, but I have tried virtually every applicable encoding listed in there and even the simplest query SELECT 'Варна' from DUAL; still does not return back the correct value Варна.
I have looked into setting the variable NLS_LANG, thinking this may be the cause but to no avail. (Perhaps it is the key after all but I am unable to actually configure it properly).
Edit: In order to reproduce the problem and visualise it (as I realise I may have explained it poorly) just go in SQLDeveloper and connect to a mysql database and execute the query SELECT 'Варна' from DUAL;.
Edit2: Clarifications.
Edit3: As shown by the comment left by #tenhouse it appears that this may be a bug.
Edit4: As stated below as a comment, the above query SELECT 'Варна' from DUAL; works perfectly fine without any modifications and/or settings fiddling on MySQL Workbench.
Edit5: Please, feel free to correct the title and/or tags if you feel that something can be improved as there is still no answer to the problem.
Edit6: By now, can I assume that it really is a bug? Could anyone advise me where exactly to report it: is it a JConnector or a SQLDeveloper bug? I would think I have to report it as a SQLDeveloper bug, but I'd rather get confirmation before possibly wasting their time.
Edit7: Tried to clarify it even further in my hopes for an answer.
Edit8: (Important!) My current database is hosted on linux (Ubuntu 12.04, MySQL 5.5.28) server. If, however, I install MySQL on a fresh Windows machine and create a utf8 db there, querying through SQLDeveloper works as it is supposed to, SELECT 'Варна' from DUAL; actually returns Варна. Could someone please confirm this?
I didn't know this myself until I ran into this problem a few months back, but MySQL actually supports different encodings for clients, databases, and connections. MySQL converts requests and responses to and from a client into whatever encodings the client specifies, either on the connection or in its config file. So even though the database stores data as utf8, if the client is set to latin1, you're going to get latin1 as your result encoding. The easiest way to check this is to open a connection to MySQL and run the following query:
SHOW VARIABLES LIKE "%char%";
You should see a whole bunch of encodings for different connections/sources. From your description, I imagine most of these will not be utf8. MySQL's documentation describes what each of them means. You can test whether this is in fact the problem by running SET NAMES 'utf8'; (or charset utf8; inside the mysql command-line client) and then running your queries again to see if that fixes it.
A summary of what each of these guys does (since the docs leave some stuff out):
character_set_client: specifies how statements are encoded when sent from the client to the server. Anything connecting through MySQL's client API is a client (e.g. PHP's mysqli, most C/C++ wrapper libs)
character_set_database: the character set of the default (currently selected) database; new tables there use it unless told otherwise
character_set_filesystem: the character set used to interpret string literals that refer to file names, e.g. in LOAD DATA and SELECT ... INTO OUTFILE (default binary, i.e. no conversion)
character_set_results: the encoding in which MySQL returns query results (and error messages) to the client
character_set_server: the server's default character set, used e.g. when a database is created without specifying one
character_set_system: the character set the server uses to store identifiers; always utf8
character_sets_dir: the directory where the character set definitions are installed
Most of these guys can be specified by editing your my.cnf and sticking your defaults in there.
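The '?????' in the question is what you would expect when this conversion passes through a character set with no Cyrillic letters: each unrepresentable character is replaced with '?', so the literal in the WHERE clause can no longer match the stored utf8 value. A rough Python model, assuming the connection character set turns out to be latin1 (check with SHOW VARIABLES as above); send_literal is a hypothetical stand-in for the client/server conversion, not a driver API.

```python
def send_literal(literal: str, connection_charset: str) -> str:
    """Hypothetical model of a string literal travelling over a connection.

    Characters the connection character set cannot represent are
    replaced with '?', mirroring MySQL's behaviour on conversion.
    """
    encoded = literal.encode(connection_charset, errors="replace")
    return encoded.decode(connection_charset)


stored_value = "Варна"  # value held in a utf8 column

for charset in ("latin-1", "utf-8"):
    received = send_literal("Варна", charset)
    print(charset, repr(received), received == stored_value)
# latin-1 '?????' False
# utf-8 'Варна' True
```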
JConnector (MySQL Connector/J) is a pure-Java driver and does not go through MySQL's C API; its equivalent setting is the characterEncoding connection property. For clients that do use the C API, the syntax is the following, called before mysql_real_connect():
mysql_options( myLink, MYSQL_SET_CHARSET_NAME, "utf8" );
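EDIT: For MySQL 5.5
You can try a command like this:
ALTER DATABASE CHARACTER SET WE8ISO8859P5;
Please restart the DB after changing the character set.
For more details, refer to this link, which explains the encoding required for different languages:
http://www.csee.umbc.edu/portal/help/oracle8/server.815/a67789/ch3.htm
(Note that this ALTER DATABASE form and the WE8ISO8859P5 character set name are Oracle syntax, as is the linked documentation. In MySQL the closest equivalent is ALTER DATABASE database_name CHARACTER SET utf8;, which changes only the default for tables created afterwards and needs no restart.)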
After you connect with mysql_connect():
$dbcnx = mysql_connect($dbhost, $dbuser, $dbpass);
run this query:
mysql_query("SET
character_set_results = 'utf8',
character_set_client = 'utf8',
character_set_connection = 'utf8',
character_set_database = 'utf8',
character_set_server = 'utf8'",
$dbcnx);
This sets the encoding of what the client sends, what the server uses, and what is returned, so everything uses the same encoding.
In your subsequent queries, pass this connection ($dbcnx) as the link argument.
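SET NAMES 'utf8', suggested elsewhere in this thread, is narrower than the query above. Per the MySQL manual, it sets character_set_client, character_set_connection and character_set_results (plus collation_connection), but not character_set_database or character_set_server, which mostly affect defaults rather than how data travels over the connection. The Python sketch below is a toy model of the session variables that makes the difference visible. It is not a client library, and the latin1 starting values are assumed.

```python
# Hypothetical model of one MySQL session's character-set variables.
# The latin1 starting values are an assumption (a common old default).
SESSION = {
    "character_set_client": "latin1",
    "character_set_connection": "latin1",
    "character_set_results": "latin1",
    "character_set_database": "latin1",
    "character_set_server": "latin1",
}

# Variables changed by SET NAMES (it also sets collation_connection, omitted here).
SET_NAMES_VARS = ("character_set_client", "character_set_connection", "character_set_results")


def set_names(session: dict, charset: str) -> dict:
    """Return a copy of the session as it looks after SET NAMES <charset>."""
    updated = dict(session)
    for name in SET_NAMES_VARS:
        updated[name] = charset
    return updated


def set_everything(session: dict, charset: str) -> dict:
    """Return a copy of the session after the five-variable SET in the answer above."""
    return {name: charset for name in session}


after_names = set_names(SESSION, "utf8")
after_all = set_everything(SESSION, "utf8")

# Only the two non-connection variables differ between the two approaches.
print(sorted(k for k in SESSION if after_names[k] != after_all[k]))
# ['character_set_database', 'character_set_server']
```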
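Export your connections from SQLDeveloper, then add ?characterEncoding=utf8 to the JDBC URL in the exported file:
<StringRefAddr addrType="customUrl">
<Contents>jdbc:mysql://instance_host_name:3306/database_name?characterEncoding=utf8</Contents>
</StringRefAddr>
Then import the edited connections back into SQLDeveloper.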
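If there are many connections, the editing step can be scripted. The Python sketch below assumes only that the export contains StringRefAddr elements with addrType="customUrl" and a Contents child, shaped like the fragment above; the surrounding file structure and the add_character_encoding name are assumptions, not part of SQLDeveloper.

```python
import xml.etree.ElementTree as ET


def add_character_encoding(xml_text: str, encoding: str = "utf8") -> str:
    """Append characterEncoding=<encoding> to every customUrl JDBC URL.

    URLs that already carry a characterEncoding parameter are left alone.
    """
    root = ET.fromstring(xml_text)
    for ref in root.iter("StringRefAddr"):
        if ref.get("addrType") != "customUrl":
            continue
        contents = ref.find("Contents")
        if contents is None or not contents.text:
            continue
        url = contents.text
        if "characterEncoding=" in url:
            continue  # already set
        separator = "&" if "?" in url else "?"
        contents.text = f"{url}{separator}characterEncoding={encoding}"
    return ET.tostring(root, encoding="unicode")


sample = (
    '<StringRefAddr addrType="customUrl">'
    "<Contents>jdbc:mysql://instance_host_name:3306/database_name</Contents>"
    "</StringRefAddr>"
)
print(add_character_encoding(sample))
```

ElementTree drops the XML declaration and any comments when it re-serialises, so keep a backup of the original export before importing the edited version.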

Rails, MySQL, Unicode data and latin1 tables - Where to go from here?

I'm not 100% sure on the particulars, so I'd love someone straightening me out, but I'll forge ahead with what I think is going on...
When I first set up my database, I used the system's default character encoding without even thinking, and it was latin1. I never even thought about i18n/l10n. It just didn't occur to me. I just accepted the defaults and went with it.
Anyways, I've been using the database exclusively for a Rails app, and we've now got several GB of data, 100,000s of rows, and many international users. I've noticed that many of our foreign users are inserting data that seems to be Unicode / non-latin1. Here is an example:
What about crazy Unicode stuff? ☢ ☠ ☭
database.yml
Here is our database.yml file.
development:
  adapter: mysql
  database: XXX
  username: YYY
  password: ZZZ
  host: localhost
  encoding: utf8
As you can see, we're setting our character encoding to utf8. However, all our tables have a default character set of latin1. I'm sure of this.
Update: After looking closely, I found that our production database.yml does not specify an encoding, while my local copy specifies utf8. This was causing problems when I dumped the production database and imported it locally. It now seems that the import was working fine, but Rails was reading the data incorrectly.
mysql CLI tool
When I view the data via the mysql CLI tool, it displays all the Unicode characters correctly. However, the 'show create table' statement clearly shows that the tables are default charset latin1. This leads me to believe that MySQL is somehow smart enough to store non-latin1 data.
HTTP header
Our HTTP Content-Type header is set to utf-8, like so:
Content-Type: text/html; charset=utf-8
Conversion Attempts
I've played a little with converting our tables to utf-8 encoding, all with no success. Mainly I tried dumping the database, running iconv to convert, then re-importing with the tables set to utf-8. MySQL had no errors, but all the Unicode data was garbled.
What to do?
I'm kind of stuck as to what to do (if anything). I'm a strong believer in not fixing what isn't broken, but this whole situation worries me. We've never had any complaints from users about not being able to store their data, and everything seems to be working fine. I'd just like to know what exactly is going on, who/what is doing the conversion (MySQL? Ruby? Rails? MySQL connection?), and any tips on how to proceed.
Most likely the data stored in your tables is valid UTF-8, but MySQL thinks it's Latin-1 (because that's the datatype the column was declared with). It is also valid Latin-1, of course, since AFAIK any arbitrary sequence of bytes is valid Latin-1.
What happens when you convert to UTF-8 is that MySQL sees valid Latin-1 encoded data and converts that to the equivalent valid UTF-8. This means that you get data that's double-UTF-8-encoded, which is why it is garbled.
The way to get around this is to convert the column to a binary string and then to UTF-8 from there. MySQL does not convert the string when you do this (because you're converting it via a format that just says, "treat this string as a series of 0s and 1s").
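-- Step 1: relabel the bytes as binary; no conversion happens.
-- VARBINARY rather than BINARY avoids zero-byte padding of the CHAR values.
ALTER TABLE MyTable MODIFY MyColumn VARBINARY(100);
-- Step 2: reinterpret the same bytes as utf8.
ALTER TABLE MyTable MODIFY MyColumn CHAR(100) CHARACTER SET utf8;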
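The difference between converting the column directly and going through binary can be modelled with Python codecs. This is a sketch, not MySQL's implementation. It assumes MySQL's latin1 behaves like Python's "latin-1" for these bytes, and the two function names are hypothetical.

```python
# The bytes actually sitting in the latin1-declared column: valid UTF-8.
column_bytes = "☢ ☠ ☭".encode("utf-8")

# All 256 byte values are valid Latin-1, so MySQL never rejects such data.
print(len(bytes(range(256)).decode("latin-1")))  # 256


def convert_directly(raw: bytes) -> str:
    """latin1 -> utf8: every byte becomes one Latin-1 character (transcoded)."""
    return raw.decode("latin-1")


def convert_via_binary(raw: bytes) -> str:
    """latin1 -> binary -> utf8: bytes are kept and just relabelled as UTF-8."""
    return raw.decode("utf-8")


garbled = convert_directly(column_bytes)
fixed = convert_via_binary(column_bytes)

print(len(garbled), len(fixed))  # 11 5
print(fixed)                     # ☢ ☠ ☭
```

The direct path turns each of the 11 stored bytes into its own character, which is the double-encoded, garbled result described above. The binary path leaves the bytes alone and yields the 5 characters the users actually typed.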
What worked for me (and others) was to use the mysql2 adapter.
In your Gemfile:
gem "mysql2"
In config/database.yml:
adapter: mysql2
And, you should remember to set your database character set to UTF-8 as well, but as I understand, you also did this :-)
Hope this helps?

OpenJPA & MySQL persist wrong encoded characters

My MySQL database has character encoding utf8. In Query Browser I can see that special characters are correct. In the application, using OpenJPA, I also see the same values correctly.
But when I persist an object into the DB, the values are correct in the application but incorrect in the DB!
When I restart the application, the special characters in the application are incorrect (as they are read from the DB).
Everything is set to UTF-8 and the Java application works well. Reading data from the DB is correct, but when OpenJPA stores values in the DB, they turn into '?'.
Any ideas? Thanks
Check your encoding at the MySQL server configuration level (the my.cnf file), and also at the level of the specific database. I once had a similar issue when these two options were set to different values (encodings).

Managing Unicode Data in MySQL and VB.NET

I wish to develop a client-server application in VB.NET. I want to store some fields in Unicode. As per the MySQL documentation, I declared the fields as varchar with charset UTF-8 for storing Unicode data.
I could insert data using the MySQL Connector command object, but when I try to display the data in a DataGridView, some junk appears.
What am I missing?
I don't know VB.NET, but you should be able to set the encoding of the connection from your application to MySQL when setting up the connection. Is that part set to UTF-8 as well?
Alternatively you can try issuing the following MySQL command after you connect:
SET NAMES utf8