Erlang Emysql encoding difference between prepared and regular Query - mysql

I previously wrote a question about Emysql encoding, which got a correct answer here.
That answer pinpointed another question...
I'm trying to store iPhone emojis in a database...
When I do:
Query = io_lib:format("UPDATE Users SET c=\"~s\" WHERE id=~B", [C, Id]),
emysql:execute(mydb, Query).
Everything works fine...
But with:
emysql:prepare(update_c, <<"UPDATE Users SET c=? WHERE id=?">>),
emysql:execute(mydb, update_c, [C, Id]).
I get back mojibake. (Edited to use the correct term.)
I'm connecting with:
emysql:add_pool(mydb, 3, "login", "password", "db.mydomain.com", 3306, "MyTable", latin1)
Unfortunately, I cannot use utf8 because of the previous software that used the database and stored emojis that way. If I do use utf8, it works with the new system, but not with rows inserted by the old one.
EDIT:
I would really like to use prepared statements, as they effectively prevent SQL injection.

Edit: this should be fixed in 253b7f94f9b04526e6868d7b693e6e9ee41de374. Thanks for the feedback.
https://github.com/Eonblast/Emysql/commit/253b7f94f9b04526e6868d7b693e6e9ee41de374
I believe it's an error in Emysql, and I think I fixed it. I'm still working out the unit tests so it all makes sense. I'll let you know when it's posted to GitHub.
I opened an issue for this: https://github.com/Eonblast/Emysql/issues/24
Essentially, you are tricking the driver and the database, because you open the connection with latin-1 while the database is utf-8. Then you trip over the automatic conversion.
Still, I think you are right that the driver should respect that you set the connection to latin-1, and not do the magic of automatic conversion to utf-8. If you read issue #14 at Eonblast/Emysql on GitHub, you'll find I always suspected automatic conversion was a bad idea.
However, just from the fact that the unit tests for the conversions have now blown up by a factor of four (and pose some rather uninteresting but mind-boggling fringe issues I can't get my head around), I think tricking the database the way you do is likewise a bad idea. If you can, you should clean this up rather than rely on the mechanics in between to hold. There are multiple levels in MySQL where conversions occur: as you know, you can set the connection, the database, and also the table to a character set. It's a great way to produce bugs. Can you describe why you could not clean it up? Is it because you have no control and must act blind to the encoding? I'd like to know if there is a real case where you can't live without this hack.
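For reference, you can inspect every one of those levels at once from the MySQL prompt:
-- connection, client, server, and database character sets in one list
SHOW VARIABLES LIKE 'character_set%';
-- plus the character set the table itself was created with
SHOW CREATE TABLE Users;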
Regardless, your complaint about the setting of the connection to latin-1 probably showed the way to eliminate all or most of the guessing in the character conversions in Emysql. That's very much appreciated and I hope I'll have a solution for you later today.
Henning

Just convert your table to UTF-8:
ALTER TABLE Users CONVERT TO CHARACTER SET utf8;
Then you can use UTF-8 for new data, and the old data will have been converted to UTF-8 as well.
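You can verify the result afterwards; the id used here is just a placeholder:
-- confirm the table's new character set
SHOW CREATE TABLE Users;
-- inspect the stored bytes of one row to check the emojis survived
SELECT id, c, HEX(c) FROM Users WHERE id = 42;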

Related

Possible to have only some fields use utf8mb4 with Laravel, Doctrine, and MySQL?

We have an application running Laravel 5.6 on MySQL 5.6, and we can't upgrade those yet. We're hoping to fix an issue with accepting "special characters" in a form, but without upgrading MySQL.
I've changed the collation and character set of the relevant columns, and also tried updating the whole table the same way (roughly as sketched below), though other columns are still MySQL's "utf8" (aka utf8mb3)... but the special characters are not persisting. We're getting mojibake ("garbled text that is the result of text being decoded using an unintended character encoding"); for example, when we should have 𝘈Ḇ𝖢𝕯٤ḞԍНǏ𝙅ƘԸⲘ𝙉০Ρ𝗤Ɍ𝓢ȚЦ𝒱Ѡ𝓧ƳȤѧᖯć𝗱ễ𝑓𝙜Ⴙ𝞲𝑗𝒌ļṃʼnо𝞎𝒒ᵲꜱ𝙩ừ𝗏ŵ𝒙𝒚ź, we instead get ????Ḇ????????٤ḞԍНǏ????ƘԸⲘ????০Ρ????Ɍ????ȚЦ????Ѡ.
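The per-column change was along these lines (the table and column names here are placeholders for ours):
-- convert only the columns that must hold 4-byte characters
ALTER TABLE form_entries
  MODIFY answer TEXT
  CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;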
The "special characters" (𝘈Ḇ𝖢𝕯٤Ḟ…) are being passed—intact—from the front-end to the backend, and then along in the backend—intact—as the stack sets the information as the value of a Doctrine Entity property—still intact. Then doctrine persists the info… and things get complicated as we move deeper into the lower levels of the ORM. I've yet to debug it that deep, but may have to because so far the database saves the mojibake, not the intact "special" characters.
I've also added character set and collation optional values to property declarations on the relevant Doctrine entity.
Laravel's database configuration has charset and collation settings too, but while I do need utf8mb4 for these few fields, the rest still use utf8mb3, so I'm unsure how setting the Laravel config values might affect things.
I've tried various permutations of the settings, but not all of them yet. As I'll be away for a couple of days, I wanted to post this question in the hope that somebody else has some helpful advice or links to information. I've found helpful information about converting a whole application or database to utf8mb4, but nothing about a "partial conversion" like the one I'm trying here.
So, my question is this: Has anybody here come across this before? Trying to set just some fields to use utf8mb4 but without upgrading everything?
I don't have an easy-to-replicate example of the failure, but I can produce one in several days if need be.
Otherwise, thanks for reading.

MySQL encoding weird characters

As many people already had, I have a problem in MySQL with the encoding of my data.
More specifically, the collation of the table seems to be utf8_general_ci. The data is inserted fine, but when a select is done, some characters come back translated badly:
Marie-Thérèse becomes Marie-ThÃ©rÃ¨se.
Is it possible to do a select and translate these characters back to the original value, or is that impossible? It's harder to change the original table in my case, so I'd rather solve it in my select query.
When using phpmyadmin (or the like) and looking at those entries, are those entries okay?
Update: if not, the inserts are probably already flawed, and the connection from the insertion script must be adapted.
If so, then it's not technically MySQL's fault but the software connecting to it. See for example: UTF-8 all the way through . You have to set some parameters on/after opening the connection.
By the way: the collation should be irrelevant here. http://dev.mysql.com/doc/refman/5.7/en/charset-general.html
The gist is: a collation tells you how to order/compare strings, which is mainly important for special characters like äöü in German or àéô in French, because their local/regional collation says that ä is, for ordering purposes, exactly like a (for example), while in another collation ä could sort distinctly after a or even after z.
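If the inserts were flawed in the classic way (UTF-8 bytes written over a latin1 connection), you can often repair the value inside the select itself. A sketch, assuming the affected column is called name (your names will differ):
SELECT CONVERT(CAST(CONVERT(name USING latin1) AS BINARY) USING utf8) AS fixed
FROM your_table;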
In the end, it seems the problem was with running it all through a cronjob.
We run a script through a cronjob that generates the insert statements. Apparently, when running the script manually, everything goes well, but when running the same script through a cronjob, the data got messed up. We solved it with the help of this article: http://www.logikdev.com/2010/02/02/locale-settings-for-your-cron-job/
We had to add a LANG variable (e.g. LANG=en_US.UTF-8) in the /etc/environment file.

Liferay does not display UTF8 characters anymore

I just did a database restore from MySQL Workbench and found out that Liferay no longer displays UTF-8 special characters, e.g. ÅÄÖ; these letters are displayed as question marks instead.
I wonder if anyone knows the solution to this issue? Do I have to specify a charset while importing the SQL files? And if so, how do I do that in MySQL Workbench?
To be honest I have no idea if the mysql restore has a direct effect on what happened, I'm just describing what I did before the issue occurred.
If you do a restore to a new database, make sure that this database defaults the character set to UTF-8:
create database lportal character set utf8;
Then import your data into that database.
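If the dump file does not already pin the connection character set, you can state it explicitly at the top of the import; the command-line equivalent is mysql's --default-character-set=utf8 option:
-- run as the first statement of the import session
SET NAMES utf8;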
Let me also use this opportunity to link my favourite site for generating great UTF-8 test data: http://www.fliptitle.com - great if you need test data for people who know only ASCII languages but still need immediate feedback on correct encoding with data they're able to interpret. You don't seem to be one of them, but I guess others in this group might stumble upon this later.

Reference for converting UTF8 in a MySql DB

Wonder if anyone can help.
I recently had an issue with UTF-8 in the database and pages of a bespoke CMS I inherited. Going forward that's all sorted now: the code and DB have been changed to cater for it and convert properly. However, the existing entries in the DB are obviously still sitting there in the old character format, and I need to convert all of those.
E.g. Ķ, ī
I was going to run a replace in the MySQL DB to replace all of these, but what I could do with is knowing what all these weird characters translate to, e.g. ó.
Can anyone recommend a good table/reference to look at? I have been searching but can't seem to come up with the right thing.
If I understand right, these are two-byte UTF-8 characters.
Thanks
Try running these values through utf8_decode().
It looks like they were valid UTF-8, then utf8_encode()'d on top of that.
If that's the case, try running a loop and updating those rows.
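If you would rather do the repair in MySQL than in a PHP loop, the usual latin1 round-trip works as an update too. A sketch with placeholder table and column names; test it on a copy of the data first:
UPDATE your_table
SET your_column = CONVERT(CAST(CONVERT(your_column USING latin1) AS BINARY) USING utf8);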

Is it OK to fix a character encoding error using SQL REPLACE?

I have a (Wordpress) blog and some of my older posts have a character encoding problem where £ displays as Â£ (i.e. a pound sign prepended with a capital 'A' with a hat on).
The problem is at the DB level, so I was going to run the following SQL statement:
update wp_posts set post_content = replace(post_content, 'Â£', '£');
Would this be foolish?
Background info (not required to read):
How did this problem happen? I don't know. The blog has been through various updates (including from Wordpress version 2.1.3, when the default table CHARSET changed from latin1 to utf8) and has been migrated to and from various machines, and I guess at some point Wordpress must have written UTF-8 encoded characters into a database that had a CHARSET of latin1, or vice versa. I know I should have been more careful (yes, I have read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)).
How have I ensured that this doesn't happen again? I have made sure my encodings are consistent. All MySQL tables use CHARSET utf-8 and the HEAD section of blog pages set <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
It should be OK. The best approach is the following:
Make a dump of your blog db
Load it to another db
Perform the replace on the temporary db
Check!
If all goes well, perform it on the production db as well.
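To gauge the scope before running the replace, you can count the rows that would be touched:
SELECT COUNT(*) FROM wp_posts WHERE post_content LIKE '%Â£%';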
Well, I would say that it would probably be the best "solution" to the problem.
As the data has been stored using the wrong encoding somewhere along the line, the original data is lost and there is no real solution. You just have to try to salvage what you can from the corrupt data that you have.
If it's only isolated to a single character, you are lucky. There may have been byte codes that didn't translate into any available character; if that happened anywhere, there would be no identifiable character combination left, just one character replaced by another, or a missing character. That would only be possible to spot manually.
Sure, you have data in one encoding and the table in another one. You can fix this within MySQL.
Check here
Don't do that!
Use a trigger on update/insert if you really need to.
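A minimal sketch of that idea, assuming the bad data only ever arrives via new inserts (the trigger name is arbitrary, and an analogous BEFORE UPDATE trigger would cover updates):
CREATE TRIGGER wp_fix_pound BEFORE INSERT ON wp_posts
FOR EACH ROW SET NEW.post_content = REPLACE(NEW.post_content, 'Â£', '£');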
EDIT: hmm, after reading your situation, I would suggest making a backup copy of the DB and trying what you said. I think it would work, as long as you're not planning to ever do it again (which seems to be the case)