UTF8 MySQL problems on Rails - encoding issues with utf8_general_ci

I have a staging Rails site up that's running on MySQL 5.0.32-Debian.
On this particular site, all of my tables are using utf8 / utf8_general_ci encoding.
Inside that database, I have some data that looks like so:
mysql> select * from currency_types limit 1,10;
+------+-----------------+--------+
| code | name            | symbol |
+------+-----------------+--------+
| CAD  | Canadian Dollar | $      |
| CNY  | Chinese Yuan    | å…ƒ    |
| EUR  | Euro            | €      |
| GBP  | Pound           | £      |
| INR  | Indian Rupees   | ₨      |
| JPY  | Yen             | ¥      |
| MXN  | Mexican Peso    | $      |
| USD  | US Dollar       | $      |
| PHP  | Philippine Peso | ₱      |
| DKK  | Denmark Kroner  | kr     |
+------+-----------------+--------+
Here's the issue I'm having
On staging (with the db and Rails site running on the debian box), the characters for symbols are appearing correctly when displayed from Rails. For instance, the Chinese Yuan is appearing as 元 in my browser, not å…ƒ as it shows inside the database.
When I download that data to my local OS X development machine and run the db and Rails locally, I see the representation from inside the DB (å…ƒ) on my browser, not the character 元 as I see in staging.
Debugging I've done
I've ensured all headers for Content-Type are coming back as utf8 from each webserver (local, staging).
My local mysql server and the staging server are both set up to use utf8 as the default charset. I'm using "set names 'utf8'" before I make any calls.
I can even connect to my staging db from my OS X Rails host, and I still see the characters å…ƒ representing the yuan. My guess, then, is that there's an issue with my local mysql client, but I can't figure out what it is.
Perhaps this might lend a clue
To make it even more confusing, if I paste the character 元 into the db on my local machine, I see it in the web browser fine. --- YET if I paste that same character into my staging db, I get a ? in its place on the page from my staging Rails site.
Also, locally on my OS X Rails machine, if I use "set names 'latin1'" before my queries, the characters all come back properly. I did have these tables set as latin1 before - could this be the issue?
Someone please help me out here, I'm going crazy trying to figure out what's wrong!

AHA! Seems I had some table information encoded in latin1 before, and stupidly changed the databases to utf8 without converting.
Running the following fixed that currency_types table:
mysqldump -u root -p --opt --default-character-set=latin1 --skip-set-charset DBNAME > DBNAME.sql
mysql -u root -p --default-character-set=utf8 DBNAME < DBNAME.sql
Now I just have to ensure that the other content generated after the latin1 > utf8 switch isn't messed up by that :(
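One way to audit what's left: the LENGTH vs. CHAR_LENGTH trick from one of the answers further down flags rows containing multi-byte data, which are the ones worth re-checking after a latin1 > utf8 switch. A rough sketch from the rails console, using the currency_types table from the question (adapt the table and column names for your other content):
# Rows where the byte length differs from the character length contain
# multi-byte data; after a botched latin1 > utf8 switch these are the suspects.
ActiveRecord::Base.connection.select_all(
  "SELECT code, name, symbol FROM currency_types
   WHERE LENGTH(name) <> CHAR_LENGTH(name) OR LENGTH(symbol) <> CHAR_LENGTH(symbol)"
).each { |row| puts row.inspect }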

Do you have these two lines in your database.yml under the proper section?
encoding: utf8
collation: utf8_general_ci
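If you want to confirm what the connection actually negotiated once those are set, a quick check from the rails console is to run the same SHOW VARIABLES query that appears in other answers on this page (just a sketch):
# Prints the session character-set variables as seen by the Rails connection
ActiveRecord::Base.connection.select_rows("SHOW VARIABLES LIKE 'character_set%'").each do |name, value|
  puts "#{name}: #{value}"
end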

The problem could have been with your MySQL client on staging: perhaps it does not support UTF-8.
Your local OS X Ruby installation might also not have the proper configuration declared.
You should have "encoding: utf8" in "config/database.yml" for the MySQL database.
You should have "$KCODE = 'u'" in "config/environment.rb" for the Ruby environment.
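For reference, the two settings look like this (note that $KCODE only has an effect on Ruby 1.8; Ruby 1.9 and later ignore it):
# config/database.yml - under the environment you are running:
#   encoding: utf8

# config/environment.rb (Ruby 1.8.x):
$KCODE = 'u'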

Another simple approach is to set the character set with an SQL ALTER statement on each table. You can do this with the bash script below (the -N and -B flags keep the column header out of the table list):
for t in $(mysql -N -B --user=root --password=admin --database=DBNAME -e "show tables");do echo "Altering" $t;mysql --user=root --password=admin --database=DBNAME -e "ALTER TABLE $t CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;";done
prettified
for t in $(mysql -N -B --user=root --password=admin --database=DBNAME -e "show tables");
do
  echo "Altering" $t;
  mysql --user=root --password=admin --database=DBNAME -e "ALTER TABLE $t CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;";
done

My DB was already set by default to utf8, but I encountered the same problem.
Also, after adding the usual meta tag below, the problem was still there:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Then I created a dedicated connection.php to ensure all communication with MySQL uses the utf8 charset. Note that there is no hyphen in 'utf8' in mysqli_set_charset($bd, 'utf8')!
Here is my connection.php:
<?php
$mysql_hostname = "localhost";
$mysql_user     = "username";
$mysql_password = "password";
$mysql_database = "dbname";
$prefix         = "";

$bd = mysqli_connect($mysql_hostname, $mysql_user, $mysql_password) or die("Could not connect to database");
mysqli_select_db($bd, $mysql_database) or die("Could not select database");

// Force the connection charset to utf8 (no hyphen) so data travels as UTF-8.
if (!mysqli_set_charset($bd, 'utf8')) {
    exit("Could not set charset utf8");
}
?>
Another php file:
<?php
// Include database connection details
require_once('connection.php');
// Enter code here...
// Create query
$qry = "SELECT * FROM subject";
$result = mysqli_query($bd, $qry);
// Other stuff
?>

For Rails, run the following snippet in the rails console. It generates an ALTER statement for every table in db/schema.rb; copy the output, log in to mysql, and execute it to change the encoding of all tables.
# Read the table names out of db/schema.rb and emit one ALTER statement per table
schema = File.open('db/schema.rb', 'r').read
rows = schema.split("\n")
table_name = nil
rows.each do |row|
  if row =~ /create_table/
    table_name = row.match(/create_table "([^"]+)"/)[1]
    puts "ALTER TABLE `#{table_name}` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;"
  end
end
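Run against a schema containing the currency_types table from the question, for example, each generated line looks like:
ALTER TABLE `currency_types` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
Paste those statements into the mysql prompt to apply them.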

You can generate a migration, the Rails way, to change the collation type on your databases:
rails generate migration ChangeDatabaseCollation
Then you can edit the generated file and paste:
def change
  # For each table that should use the new collation, execute:
  execute "ALTER TABLE my_table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci"
end
And run the migration:
rake db:migrate
You can also enforce the new collation in your database.yml:
development:
  adapter: mysql2
  encoding: utf8
  collation: utf8_general_ci
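After the migration runs you can spot-check the result from the rails console; a rough sketch (SHOW TABLE STATUS reports each table's current collation):
# Lists every table with the collation it ended up with
ActiveRecord::Base.connection.select_all("SHOW TABLE STATUS").each do |table|
  puts "#{table['Name']}: #{table['Collation']}"
end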
For more information on Rails migrations:
http://edgeguides.rubyonrails.org/active_record_migrations.html
For more information on collation types:
http://collation-charts.org/

Related

Why can I input Chinese characters at the mysql command line locally but not on a remote server?

In my local machine's (Mac OS) iTerm2 terminal, I can log in to a remote mysql server and input Chinese characters successfully:
➜ ~ mysql -h remote_ip -u username -p foo --safe-updates
mysql> select '你好';
+--------+
| 你好 |
+--------+
| 你好 |
+--------+
1 row in set (0.01 sec)
Then I log in to a remote server by ssh
➜ ~ ssh root@remote_ip
and log in to the same mysql server
root@qa-web:~# mysql -h remote_ip -u username -p foo --safe-updates
mysql> select '
but this time I cannot input Chinese characters on the command line; right after I type them, they disappear.
Why is that?
Additional information
In the second case above:
mysql> status
...
Server characterset: utf8mb4
Db characterset: utf8
Client characterset: utf8mb4
Conn. characterset: utf8mb4
My machine Mac OS
➜ ~ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
Remote Server
root@hg:~# locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
root@hg:~# locale -a
C
C.UTF-8
en_US.utf8
POSIX
Copying this SQL (select 'hello','你好';) from Atom to the MySQL command line has this effect:
mysql> select 'hello','';
+-------+--+
| hello | |
+-------+--+
| hello | |
+-------+--+
1 row in set (0.00 sec)
Local
➜ ~ 你好
zsh: command not found: 你好
Remote
root@hg:~# 你好
-bash: $'\344\275\240\345\245\275': command not found
try starting mysql with mysql --default-character-set=utf8
This forces the character_set_client, character_set_connection and character_set_results variables to be UTF8.
Source: MySQL command line formatting with UTF8
Of course, if you are using a Chinese-specific charset, start it with that instead - the GB (gb18030) charsets, for example.
If you are using Windows with cmd, then...
The command "chcp" controls the "code page". chcp 65001 provides UTF-8, but it needs a special charset installed, too (see: some code pages).
To set the font in the console window: Right-click on the title of the window → Properties → Font → pick Lucida Console
On Unix systems you need to make sure your locale matches your terminal emulation. Programs like the MySQL shell use the locale value to determine what encoding to write to the terminal.
On Mac, the default locale is usually UTF-8 based, which matches the default iTerm profile.
When you remote shell, you also need to make sure the remote locale matches your terminal. If the remote server is non-Mac, it's more likely that the locale is not UTF-8 based.
When you remote shell, check which locale is in play:
locale
To work with the Mac defaults, it needs to be *.utf-8. For example:
en_gb.utf-8
If it's not then you must change it, at least temporarily. See what locales are available:
locale -a
Find an appropriate locale then set the LANG environment variable:
export LANG=en_gb.utf-8
Now you can run mysql and get the correct results.

Character set and collation in database

I am a beginner when it comes to databases, so please bear with me. I'm trying to set up a database and import some tables from a file tables.sql. Some of the columns in tables.sql have Swedish letters in them (Ä, Ö), and the problem is that I get the following:
Ä = ä
Ö = ö
First I begin to check the character set of the server:
mysql> show variables like 'character_set_server';
The server is configured with the character set 'Latin-1'. I should mention that I have no control over the server beyond creating a database. So I guess I have to create my database and specify the character set of the database.
This is how I proceed:
mysql> create database db;
mysql> alter database db character set utf8 collate utf8_swedish_ci;
I double-checked that my tables.sql has charset utf-8 by executing:
file -bi allsok_tables.sql
And then I load it into the database by:
$ mysql -u [username] -h [hostname] -P [port] -p db < tables.sql
When I create my tables in tables.sql I use engine = InnoDB (I don't know if this is relevant or not). However, if I now select everything from the table TableTest:
mysql> select * from TableTest
I get these weird characters instead of the Swedish characters. I appreciate any help right now.
Thanks in advance!
UPDATE:
If I insert a value manually into a table, it works, e.g.
mysql> insert into TableTest values ('åäö');
So the problem seems to be with the .sql-file. Right?
$ mysql ... --default-character-set=utf8 < tables.sql
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MySQL needs to know what encoding the data you're sending it is in. To do this, you need to set the connection encoding. When connecting to MySQL from a client, you usually run a SET NAMES query or use an equivalent call on your API of choice to do so. On the command line, the --default-character-set option does this. It needs to be set to whatever encoding your file is in.
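The same rule applies when you talk to MySQL from application code instead of the command line. Here is a minimal sketch with the Ruby mysql2 gem (host, credentials, database and table names are just the placeholders from this question), where the :encoding option plays the role of SET NAMES:
require 'mysql2'

# :encoding makes the driver negotiate the connection as utf8,
# i.e. the same effect as running SET NAMES utf8 by hand.
client = Mysql2::Client.new(
  :host     => "hostname",
  :username => "username",
  :password => "password",
  :database => "db",
  :encoding => "utf8"
)
client.query("SELECT * FROM TableTest").each do |row|
  puts row.inspect
end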

Changing the charset to utf8 for all tables and fields in mysql

After searching, I found (just read, but did not try) that I can use the command below to change the charset of all the fields and tables of a mysql database to utf8:
mysql --database=dbname -B -N -e "SHOW TABLES" \
| awk '{print "SET foreign_key_checks = 0; ALTER TABLE", $1, "CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci; SET foreign_key_checks = 1; "}' \
| mysql --database=dbname &
But I have some questions about it:
Where do I give the username and password to be able to log in to the mysql database?
Why use the &?
Should I type it as it is, without a username, and wait? How can I tell what's happening?
Where do I give the username and password to be able to log in to the mysql database?
You might need to provide them in the two commands if you haven't already put them in environment variables.
Why use the &? Should I type it as it is, without a username, and wait? How can I tell what's happening?
This runs the process in the background. You should get an error if it fails. You can run it without the ampersand to know when it's complete.

Encoding error with Rails 2.3 on Ruby 1.9.3

I'm in the process of upgrading an old legacy Rails 2.3 app to something more modern and running into an encoding issue. I've read all the existing answers I can find on this issue but I'm still running into problems.
Rails ver: 2.3.17
Ruby ver: 1.9.3p385
My MySQL tables are default charset: utf8, collation: utf8_general_ci. Prior to 1.9 I was using the original mysql gem without incident. After upgrading to 1.9, retrieving anything with utf8 characters in it would trigger this well-documented problem:
ActionView::TemplateError (incompatible character encodings: ASCII-8BIT and UTF-8)
I switched to the mysql2 gem for its superior handling, and I no longer see exceptions, but things are definitely not encoding correctly. For example, what appears in the DB as the string Repoussé is being rendered by Rails as Repoussé, “Boat” appears as “Boatâ€, etc.
A few more details:
I see the same results when I use the ruby-mysql gem as the driver.
I've added encoding: utf8 lines to each entry in my database.yml
I've also added the following to my environment.rb:
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8
It has occurred to me that I may have some mismatch where latin1 was being written by the old version of the app into the utf8 fields of the database or something, but all of the characters appear correctly when viewed in the mysql command line client.
Thanks in advance for any advice, much appreciated!
UPDATE: I now believe that the issue is that my utf8 data is being coerced through a binary conversion into latin1 on the way out of the db; I'm just not sure where.
mysql> SELECT CONVERT(CONVERT(name USING BINARY) USING latin1) AS latin1, CONVERT(CONVERT(name USING BINARY) USING utf8) AS utf8 FROM items WHERE id=myid;
+----------+-----------+
| latin1   | utf8      |
+----------+-----------+
| Repoussé | Repoussé |
+----------+-----------+
I have my encoding set to utf8 in database.yml, any other ideas where this could be coming from?
I finally figured out what my issue was. While my databases were encoded with utf8, the app with the original mysql gem was injecting latin1 text into the utf8 tables.
What threw me off was that the output from the mysql command line client looked correct. It is important to verify that your terminal, the database fields, and the MySQL client are all running in utf8.
MySQL's client runs in latin1 by default. You can discover what it is running in by issuing this query:
show variables like 'char%';
If set up properly for utf8 you should see:
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
If these don't look correct, make sure the following is set in the [client] section of your my.cnf config file:
default-character-set = utf8
And add the following to the [mysqld] section:
# use utf8 by default
character-set-server=utf8
collation-server=utf8_general_ci
Make sure to restart the mysql daemon before relaunching the client and then verify.
NOTE: This doesn't change the charset or collation of existing databases, just ensures that any new databases created will default into utf8 and that the client will display in utf8.
After I did this I saw characters in the mysql client that matched what I was getting from the mysql2 gem. I was also able to verify that this content was latin1 by switching to "encoding: latin1" temporarily in my database.conf.
One extremely handy query to find issues is using char length to find the rows with multi-byte characters:
SELECT id, name FROM items WHERE LENGTH(name) != CHAR_LENGTH(name);
There are a lot of scripts out there to convert latin1 contents to utf8, but what worked best for me was dumping all of the databases as latin1 and stuffing the contents back in as utf8:
mysqldump -u root -p --opt --default-character-set=latin1 --skip-set-charset DBNAME > DBNAME.sql
mysql -u root -p --default-character-set=utf8 DBNAME < DBNAME.sql
I backed up my primary db first, then dumped into a test database and verified like crazy before rolling over to the corrected DB.
My understanding is that MySQL's translation can leave some things to be desired with certain more complex characters but since most of my multibyte chars are fairly common things (accent marks, quotes, etc), this worked great for me.
Some resources that proved invaluable in sorting all of this out:
Derek Sivers' guide on transforming MySQL data from latin1-in-utf8 to utf8
Blue Box article on MySQL character set hell
Simple table conversion instructions on Stack Overflow
You say it all looks OK in the command line client, but perhaps your Terminal's character encoding isn't set to show UTF8? To check in OS X Terminal, click Terminal > Preferences > Settings > Advanced > Character Encoding. Also, check using a graphical tool like MySQL Query Browser at http://dev.mysql.com/downloads/gui-tools/5.0.html.

MySQL Table Converted to utf-8

I had an old table with the latin1 charset. Using phpMyAdmin, I converted it to utf-8.
After that, when I read the data with PHP, my data shows as ???? ????? question marks.
My page charset is utf-8 and there is no problem with my PHP. I also tried:
#mysqli_query("SET NAMES 'utf8'", $db);
#mysqli_query("SET CHARACTER SET 'utf8'", $db);
#mysqli_query("SET character_set_client = utf8 ",$db);
#mysqli_query("SET character_set_results = utf8 ",$db) ;
#mysqli_query("SET character_set_connection = utf8 ",$db);
before any query.
It doesn't seem to work; the data still shows as ???? ??????.
There is no problem with new records, but the old records are not readable.
They are stored in the db like: غلامی
Is there any way to retrieve that old data?
What I used to do is go to the command line:
mysqldump --default-character-set=latin1 ... > file.sql
then convert file.sql to UTF-8 (using iconv or any other option), then alter the table's charset and use:
mysql --default-character-set=utf8 < file.sql
You could also just use the export option from phpMyAdmin and convert the file like in the above example, change the charset on the table from phpMyAdmin, and then import it again.
I usually did a backup before doing this, because mistakes happen, but it's very simple.