Character set problem in MySQL

The MySQL environment is as follows:
character_set_database="big5"
When I send a SQL statement that contains Traditional Chinese (such as "select * from a where name = '中'") from JDBC to the MySQL database, it throws the following exception:
Illegal mix of collations (big5_chinese_ci,IMPLICIT), (latin1_swedish_ci,COERCIBLE), (latin1_swedish_ci,COERCIBLE) for operation ' IN ''
How can I solve this?
But we need to do this between Oracle and MySQL. When my program gets the data from Oracle (its encoding is ISO-8859-1) and passes it into the SQL statement over JDBC, the problem occurs, and I can't change the collation on the Oracle side. How can I solve this? Why can't JSP handle this automatically?
I have tried to convert the data, but Chinese characters cannot be stored in the Latin1 character set.
Might this be causing the problem?

Check your collations.
The database itself can have one collation and the tables another one totally different.
If you mix collations from two tables, you get this error.
Also, the Swedish collation (latin1_swedish_ci) seems to be the default for databases (I have no idea why).

What encoding are you using on the Java side? Did you make sure that it's Big5 too?

JSP can't solve the problem. How should JSP know what you want?
Do you do any encoding conversion in your JSP, or do you just put the Oracle data into the MySQL database?
In general, it is important to use the same encoding in your script, in your tables and, most importantly, in the connection to the database.
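To see why the conversion mentioned in the question fails, and what effectively happens when Big5 bytes are read as ISO-8859-1, here is a minimal Python sketch. It only illustrates the byte-level behaviour: it does not use JDBC, Oracle or MySQL, and all variable names are made up for the illustration.

```python
# Illustration only: byte-level view of a Big5 / ISO-8859-1 mismatch.
text = "中"

# Big5 stores this character in two bytes.
big5_bytes = text.encode("big5")

# ISO-8859-1 (Latin-1) has no code point for Chinese characters,
# so a direct conversion fails.
try:
    text.encode("iso-8859-1")
    latin1_can_store = True
except UnicodeEncodeError:
    latin1_can_store = False

# If a client wrongly reads the Big5 bytes as ISO-8859-1, it gets two
# unrelated Latin-1 characters instead of one Chinese character.
misread = big5_bytes.decode("iso-8859-1")

# ISO-8859-1 maps every byte to exactly one character, so the misread
# string can be turned back into the original bytes and decoded correctly.
recovered = misread.encode("iso-8859-1").decode("big5")

print(len(big5_bytes), latin1_can_store, len(misread), recovered == text)
```

The same idea should carry over to the Java side: the encoding used to decode the bytes has to match the one that produced them, and the connection has to declare the character set the tables actually use.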

Related

MySQL Workbench Connection Encoding

While testing some code I stumbled on the following MySQL error:
Error Code: 1267. Illegal mix of collations (utf8_general_ci,IMPLICIT) and ( utf8mb4_general_ci,COERCIBLE) for operation '='
I was using a WHERE statement on a standard MySQL UTF-8 collation column which contained a character using 4 bytes. Unless I misunderstood, while reading, I found the following information:
MySQL's original UTF-8 implementation was incomplete (supporting maximum 3 bytes)
The way to solve this is a newer character set called utf8mb4, which is by no means a new encoding but is only used by MySQL to patch their original mistake.
On my end I see no reason to use the original MySQL UTF-8 implementation, since it's incomplete. So I made a few server-side configuration changes to make sure all defaults point to utf8mb4. Everything seems fine now in my application: I can use 🐼 characters in my form without having to worry about MySQL.
My remaining problem is that when I connect with MySQL Workbench, the connection encoding seems to be forced to utf8. So even though my application works correctly, if I want to run tests directly in MySQL Workbench, I get the "Illegal mix of collations" error unless I run this fix (in Workbench) after starting the application:
SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_ci'
I found this old question (MySQL Workbench charset) where it seemed impossible to override the setting. Even after spending too much time searching for the config option, I cannot believe this is still the case.
For now, I'm afraid, you will have to live with that. There's a WL (worklog entry) for MySQL to rename that encoding to utf8 (throwing out the existing 3-byte variant). So it makes sense to keep utf8 in MySQL Workbench; otherwise we would have to use different settings for different servers, which makes things more complicated.
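The 3-byte versus 4-byte distinction discussed in this thread can be checked outside MySQL. The following Python sketch is only an illustration; the helper names `utf8_length` and `needs_utf8mb4` are hypothetical and not part of any MySQL tooling.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes the text occupies in standard UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the Basic Multilingual Plane,
    i.e. needs four bytes in UTF-8 and so does not fit a 3-byte utf8."""
    return any(ord(ch) > 0xFFFF for ch in text)


samples = ["a", "\u00e9", "中", "🐼"]
for s in samples:
    # Prints the character, its UTF-8 byte length, and whether it
    # requires a 4-byte-capable character set.
    print(repr(s), utf8_length(s), needs_utf8mb4(s))
```

A check like this could be used to find which input values would trigger the error against a column that still uses the 3-byte character set.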

Best solution for MySQL error in Rails 4 app 'Incorrect string value'?

I have a Rails application in which I am using the 'delayed_job_active_record' gem for running background jobs. When I call the '.delay' method on an object, I get the following MySQL error:
Incorrect string value: '\xE2\x9C\x93"\x0A ...' for column 'handler' at row 1
I already searched for the above error and found that it's caused by a difference in encoding between MySQL and Rails. The solution suggested by many programmers is to change the encoding of the MySQL database to utf8.
But I also read that MySQL's utf8 charset only partially implements proper UTF-8 encoding. It can only store UTF-8-encoded symbols that consist of one to three bytes; encoded symbols that take up four bytes aren't supported, which might cause trouble in some other cases. Also, when I tried to insert the value directly in MySQL, it worked like a charm, which suggests that the issue might lie elsewhere.
So, can anyone please suggest the right method to rectify this problem?
Today, I found and fixed a very similar bug.
You say:
"when I tried to insert the value directly in mysql, it worked like a charm"
... but it's not clear whether you're inserting the value into the model, or into the DelayedJob#handler column.
In my case, the problem was, certain columns in my (old, legacy) database had DEFAULT CHARSET=latin1 ... so, I needed to manually convert them to UTF8.
Specifically, the model that .delay was being called upon was UTF8, but the delayed_jobs table was latin1. So it was only when the app serialized the UTF8 model and attempted to insert it into the latin1 handler column of the delayed_jobs table, that the exception was raised. It's a little tricky.
Here is the core of the migration I wrote to convert the rando-latin1 tables to utf8:
# Convert each legacy latin1 table (and its character columns) to utf8.
%w( table1
    table2
    table3 ).each do |latin1_table_with_char_columns|
  execute("ALTER TABLE #{latin1_table_with_char_columns} CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;")
end
Here is a good, related StackOverflow post which is more generally about converting db columns to UTF8: How to convert an entire MySQL database characterset and collation to UTF-8?
Best of luck!
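To make the failure in this thread concrete, here is a small Python sketch that decodes the bytes quoted in the error message and checks which encodings can hold the result. It is an illustration only: `fits_in` is a hypothetical helper, and Python's cp1252 codec is used as an assumed stand-in for MySQL's latin1.

```python
# The first three bytes quoted in the error message.
raw = b"\xE2\x9C\x93"

# Decoded as UTF-8 they form a single character: U+2713 CHECK MARK.
check = raw.decode("utf-8")


def fits_in(text: str, codec: str) -> bool:
    """True if every character of text can be encoded with the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Assumption: cp1252 approximates MySQL's latin1 closely enough here.
print(check, fits_in(check, "cp1252"), fits_in(check, "utf-8"))
```

Since this character takes three bytes in UTF-8, it fits MySQL's 3-byte utf8 as well; utf8mb4 only becomes necessary once 4-byte characters appear in the serialized data.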

How to sort right to left language data in mysql?

I have a MySQL database that stores Persian data.
The data consists of names, and I want to sort the names alphabetically, but MySQL doesn't seem to know Persian, or some other right-to-left languages.
How can I sort them?
My other problem is with phpMyAdmin: it can't display Persian data and shows other characters instead.
About the first question: as fancyPants said, use the proper collation and you should be fine. Sorting is handled by collations, and there is a utf8 Persian collation available (utf8_persian_ci).
About your second problem:
Almost certainly what is happening is that you're improperly storing the data. As Sid M said, knowing what you've tried and how your system is running would be a big help, but these questions almost always end up being misconfigured or poorly written software. phpMyAdmin and MySQL can deal just fine with multiple character sets. Presumably, you'll want to use utf8.
Set up your database and tables properly, then make sure your client application is configured properly (likely using SET NAMES 'UTF8' or mysql_set_charset('utf8'), but read the links for more detail than is worth including here).
See https://wiki.phpmyadmin.net/pma/Garbled_data and "How to display UTF-8 characters in phpMyAdmin?" for starters, and "SQL injection that gets around mysql_real_escape_string()" for way more information than you probably wanted to learn :)
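As a rough illustration of why the collation matters for the sorting question, the Python sketch below compares plain code-point ordering with ordering by position in the Persian alphabet. It assumes the standard 32-letter alphabet order and ignores everything a real collation also handles (letter variants, diacritics, digits); `PERSIAN_ALPHABET`, `RANK` and `persian_key` are names invented for this example.

```python
# Assumption: the standard 32-letter Persian alphabet order.
PERSIAN_ALPHABET = "ابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی"
RANK = {letter: index for index, letter in enumerate(PERSIAN_ALPHABET)}


def persian_key(word: str) -> list:
    """Sort key that ranks letters by alphabet position, not code point.
    Characters outside the table sort after all listed letters."""
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word]


names = ["تهران", "پدر"]

# Code-point order: roughly what a binary collation would produce.
by_code_point = sorted(names)

# Alphabet order: what a Persian-aware collation aims for.
by_alphabet = sorted(names, key=persian_key)

print(by_code_point)
print(by_alphabet)
```

The two orderings differ because the letters Persian adds to the Arabic script have higher code points than the letters they follow in the alphabet. In MySQL, choosing a Persian collation for the column or the ORDER BY clause is what takes care of this.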

Accent insensitive search on a problematic database

I have a database that contains data in different languages. Some languages use accents (like áéíóú) and I need to search this data as if the accents didn't exist (a search for 'campeon' should return 'campeón' as a valid result).
The problem is that the tables in my database (utf8_unicode_ci) are not storing utf8 characters. If you look at the data through phpMyAdmin, the words with accents look like this: campeÃ³n
After some research, I found (in a StackOverflow question) that the problem is related to the absence of a SET NAMES [charset] statement. In fact, I ran some tests, and if I set names to utf8, everything works as expected.
Well, I have the solution, so what's the problem? The problem is that the database is in production, so there are thousands of strings in the database. If I change the character set the client uses, all existing strings will become invalid. The question is: is there any way to:
perform accent-insensitive searches in a database that uses a wrong charset like mine?
transform safely the data in the tables to the appropriate charset?
continue working with mixed charsets (latin1 and utf8) in the database, assuming that latin1 data will not be accent-insensitive?
If anybody has experience with any of the solutions I propose, or has a new one, I'd be very thankful if you shared it.
Since the problem is that the data was inserted using the wrong connection encoding, you can fix it by
Exporting the data using the wrong connection encoding, just like you have used it thus far, followed by
Importing the data using the correct utf8 connection encoding.
That will fix the encoding problem, after which search will work as expected.
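The export and re-import steps above work because the damage is reversible at the byte level. The following Python sketch shows the principle on a single string; it is only an illustration, with Python's latin-1 codec standing in (as an assumption) for the wrong connection encoding, and it does not replace an actual dump and restore.

```python
correct = "campe\u00f3n"  # "campeón"

# UTF-8 bytes sent over a connection that is treated as latin1:
# each byte is interpreted as one separate Latin-1 character.
garbled = correct.encode("utf-8").decode("latin-1")

# Undoing the same wrong step recovers the original bytes, which can then
# be read with the right encoding. This mirrors "export with the wrong
# encoding, import with the correct one".
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)
print(repaired)
```

On real data it is worth testing the round trip on a copy first, since rows that were written after the connection encoding was corrected would be damaged by a second conversion.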
What if you create a copy of the table at the beginning of your session, alter the copy's charset, perform all your queries from that, and then drop the table at the end of your session? I don't know how practical this would be - depends on how often you need to perform these queries and how big the table is.

MySQL update error when special characters are used

I was wondering if anyone had come across this one before. I have a customer who uses special characters in their product description field. Updating to a MySQL database works fine if we use their HTML equivalents but it fails if the character itself is used (copied from either character map or Word I would assume).
Has anyone seen this behaviour before? The character in question in this case is ø, and we can't seem to do a replace on it (in ASP at least), as the character comes through to the SQL string as a "?".
Any suggestions much appreciated - thanks!
This suggests a mismatched character set between your database (connection) and actual data.
Most likely, you're using ISO-8859-1 on your site, but MySQL thinks it should be getting UTF-8.
http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html describes what to check and how to change it. The simplest way is probably to run the query "SET NAMES latin1" when connecting to the database (assuming that's the character set you need).
Being a fan of Unicode, I'd suggest switching over to UTF-8 entirely, but I realize that this is not always a feasible option.
Edit: @markokocic: Collation only dictates the sorting order. Although this should of course match your character set, it does not affect the range of characters that can be stored in a field.
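The mismatch described in this answer can be seen by looking at how the character from the question is represented in each encoding. This Python sketch is an illustration under the assumption that the site sends ISO-8859-1 while the other side expects UTF-8; it does not involve ASP or MySQL.

```python
ch = "\u00f8"  # "ø"

as_latin1 = ch.encode("latin-1")  # one byte
as_utf8 = ch.encode("utf-8")      # two bytes

# The lone Latin-1 byte for this character is not valid UTF-8, so a
# receiver that expects UTF-8 cannot interpret it.
try:
    as_latin1.decode("utf-8")
    latin1_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_byte_is_valid_utf8 = False

# When a layer cannot represent the character at all, a common fallback is
# to substitute "?", which matches the symptom described in the question.
as_ascii = ch.encode("ascii", errors="replace")

print(as_latin1, as_utf8, latin1_byte_is_valid_utf8, as_ascii)
```

Whichever encoding is chosen, the page, the connection (for example via SET NAMES) and the column all have to agree on it for the character to arrive intact.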
Have you tried setting the collation for the table to UTF-8 or something other than latin1/ASCII?