I have my MySQL collation set to utf8_general_ci, and although my searches are generally diacritic-insensitive (i.e. LIKE 'test' returns 'tést'), some searches that I would like to work fail; in particular, LIKE 'host' will NOT return 'høst'.
Two questions: Is there a table that shows which characters are equivalent for a particular collation? And is there a way to set two characters as equivalents in MySQL as an override?
Thanks for the help.
To answer your first question, you can reference collation-charts.org. It's kind of a pain because you will need to search each collation by hand, but it'll show you how they stack up.
The relevant section in the MySQL manual can also be found here.
As for your second question, I'm not sure you can do an explicit override for a particular pair of characters; however, you can create your own custom collation if you wish.
You can read about creating custom collations from the MySQL manual.
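Short of defining a custom collation, a per-query COLLATE clause lets you compare under a different collation without redefining the column. A sketch, with hypothetical table/column names; whether a given pair such as o/ø actually matches depends on the weights of the collation you pick:

```sql
-- Force a different collation for one comparison only
-- (table "towns" and column "name" are assumed names).
SELECT *
FROM towns
WHERE name LIKE 'host' COLLATE utf8_unicode_ci;
-- utf8_general_ci and utf8_unicode_ci assign different weights to
-- letters like ø, so test the collation against your data first.
```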
Related
We have an application running Laravel 5.6 on MySQL 5.6. We can't upgrade those yet. We're hoping to fix an issue with accepting "special characters" in a form, but without upgrading MySQL yet.
I've changed the collation and character set of select relevant columns, and also tried updating the whole table that way, though other columns are still MySQL's "utf8" (a.k.a. utf8mb3)... but the special characters are not persisting. We're getting mojibake ("garbled text that is the result of text being decoded using an unintended character encoding"); for example, when we should have 𝘈Ḇ𝖢𝕯٤ḞԍНǏ𝙅ƘԸⲘ𝙉০Ρ𝗤Ɍ𝓢ȚЦ𝒱Ѡ𝓧ƳȤѧᖯć𝗱ễ𝑓𝙜Ⴙ𝞲𝑗𝒌ļṃʼnо𝞎𝒒ᵲꜱ𝙩ừ𝗏ŵ𝒙𝒚ź, we instead get ????Ḇ????????٤ḞԍНǏ????ƘԸⲘ????০Ρ????Ɍ????ȚЦ????Ѡ.
The "special characters" (𝘈Ḇ𝖢𝕯٤Ḟ…) are being passed intact from the front end to the back end, and remain intact as the stack sets them as the value of a Doctrine entity property. Then Doctrine persists the info, and things get complicated as we move deeper into the lower levels of the ORM. I've yet to debug it that deep, but I may have to, because so far the database saves the mojibake, not the intact "special" characters.
I've also added character set and collation optional values to property declarations on the relevant Doctrine entity.
Laravel's database configuration has charset and collation settings too, but while I do need utf8mb4 for these few fields, the rest are still using utf8mb3, so I'm unsure about how setting the laravel config values might affect things.
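For what it's worth, '?' substitution on the way in usually points at the connection character set rather than the column definitions: a utf8mb3 connection cannot carry 4-byte characters, so the server replaces them. A diagnostic sketch (the session-level SET NAMES is just for testing; normally the charset belongs in the driver/framework config):

```sql
-- Inspect what the server thinks this connection is using.
SHOW VARIABLES LIKE 'character_set%';

-- If character_set_client/connection/results are not utf8mb4,
-- 4-byte characters are replaced with '?' before they reach the column.
SET NAMES utf8mb4;
```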
I've tried various permutations of the settings, but not all yet. Since I'll be away for a couple of days, I wanted to post this question in the hope that somebody else has some helpful advice or links to information. I've found helpful information about converting an entire application or database to utf8mb4, but nothing about a "partial conversion" like the one I'm attempting here.
So, my question is this: Has anybody here come across this before? Trying to set just some fields to use utf8mb4 but without upgrading everything?
I don't have an easy-to-replicate example of the failure, but I can produce one in several days if need be.
Otherwise, thanks for reading.
As many people have before, I have a problem in MySQL with the encoding of my data.
More specifically, the collation of the table seems to be utf8_general_ci. The data is inserted correctly, but when a select is done, some characters come back garbled:
Marie-Thérèse becomes Marie-ThÃ©rÃ¨se.
Is it possible to do a select and translate these characters back to the original value, or is it impossible? It's harder to change the original table in my case, so I'd rather solve it in my select query.
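If the stored bytes are actually UTF-8 that got tagged as latin1 (the classic double-encoding), a conversion round trip inside the SELECT can sometimes undo it. A sketch with hypothetical table/column names; verify it on a few rows before relying on it:

```sql
-- Reinterpret the column's characters as their raw latin1 bytes,
-- then decode those bytes as UTF-8 again.
SELECT CONVERT(CAST(CONVERT(name USING latin1) AS BINARY) USING utf8mb4) AS fixed_name
FROM people;
```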
When using phpMyAdmin (or the like) and looking at those entries, do they look okay?
Update: if not, the inserts are probably flawed already, and the connection from the insertion script must be adapted.
If so, then it's not technically MySQL's fault but the software connecting to it. See for example: UTF-8 all the way through. You have to set some parameters on/after opening the connection.
BTW: the collation should be irrelevant here. http://dev.mysql.com/doc/refman/5.7/en/charset-general.html
The gist is: a collation tells you how to order and compare strings, which mainly matters for special characters like äöü in German or àéô in French, because one local/regional collation may say that ä is, for ordering purposes, exactly like a, while in another collation ä could sort distinctly after a or even after z.
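A quick illustration of the point above, assuming a utf8mb4 connection; the same pair of strings compares differently under different collations:

```sql
-- general_ci folds many accented letters onto their base letter,
-- while the binary collation compares raw code points.
SELECT 'a' = 'ä' COLLATE utf8mb4_general_ci AS general_ci_equal,
       'a' = 'ä' COLLATE utf8mb4_bin        AS bin_equal;
```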
In the end, it seems the problem was with running it all through a cronjob.
We run a script through a cronjob that generates the insert statements. Apparently, when running the script manually, everything goes well, but when running the same script through a cronjob, the data got messed up. We solved it with the help of this article: http://www.logikdev.com/2010/02/02/locale-settings-for-your-cron-job/
We had to add a LANG variable in the /etc/environment file.
I am getting an error like this:
COLLATION 'latin1_swedish_ci' is not valid for CHARACTER SET 'utf8'
Whenever I try to run a particular query. The problem in my case is that I need this query to be able to run, without modification, against two separate databases, which have different character sets (one is latin1, the other is utf8).
Since the strings I am trying to match are guaranteed to be basic letters (a-z), I was wondering if there was any way to force the comparison to work irrespective of the specific encoding?
I mean, an A is an A no matter how it is encoded; is there some way to tell MySQL to compare the content of the string as letters rather than as whatever binary representation it uses internally? I don't even understand why it can't auto-convert collations, since it is quite capable of doing so when explicitly told to.
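One workaround, sketched with hypothetical table/column names: explicitly convert one side of the comparison so both operands share a character set. Since the strings are guaranteed to be plain a-z, the conversion is lossless in either direction:

```sql
-- Convert the column side to the literal's character set before comparing;
-- this works whether the underlying column is latin1 or utf8.
SELECT *
FROM accounts
WHERE CONVERT(username USING utf8mb4) = 'admin';
```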
We have issues with utf8-string comparisons in MySQL 5, regarding case and accents :
From what I gathered, MySQL implements collations by considering that "groups of characters should be considered equal".
For example, in the utf8_unicode_ci collation, all the letters "EÉÈÊeéèê" are in the same box (together with other variants of "e").
So if you have a table containing ["video", "vidéo", "vidÉo", "vidÊo", "vidêo", "vidÈo", "vidèo", "vidEo"] (in a varchar column declared with utf8_general_ci collation):
when asking MySQL to sort the rows according to this column, the sorting is random (MySQL does not enforce a sorting rule between "é" and "É" for example),
when asking MySQL to add a Unique Key on this column, it raises an error because it considers all the values are equal.
What setting can we fiddle with to fix these two points ?
PS: on a related note, I do not see any case-sensitive collation for the utf8 charset. Did I miss something?
[edit] I think my initial question still holds some interest, and I will leave it as is (and maybe one day get a positive answer).
It turned out, however, that our problems with string comparisons regarding accents were not linked to the collation of our text columns. They were linked to a configuration problem with the character_set_client parameter when talking to MySQL, which defaulted to latin1.
Here is the article that explained it all to us and allowed us to fix the problem:
Getting out of MySQL character set hell
It is lengthy, but trust me, you need this length to explain both the problem and the fix.
Use a collation that considers these characters distinct, such as utf8_bin (it's case- and accent-sensitive, since it compares characters by their binary code points).
http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-sets.html
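Applied to the example above, a sketch (table and column names are assumed) that makes both the ordering and the unique constraint treat the accented variants as distinct:

```sql
-- Switch the column to the binary collation; after this, the UNIQUE key
-- no longer sees "video" and "vidéo" as duplicates, and ORDER BY becomes
-- deterministic (by code point).
ALTER TABLE videos
  MODIFY title VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_bin,
  ADD UNIQUE KEY uk_title (title);
```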
I was wondering if anyone had come across this one before. I have a customer who uses special characters in their product description field. Updating to a MySQL database works fine if we use their HTML equivalents but it fails if the character itself is used (copied from either character map or Word I would assume).
Has anyone seen this behaviour before? The character in question in this case is ø, and we can't seem to do a replace on it (in ASP at least) as the character comes through to the SQL string as a "?".
Any suggestions much appreciated - thanks!
This suggests a mismatched character set between your database (connection) and actual data.
Most likely, you're using ISO-8859-1 on your site, but MySQL thinks it should be getting UTF-8.
http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html describes what to check and how to change it. The simplest way is probably to run the query "SET NAMES latin1" when connecting to the database (assuming that's the character set you need).
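A minimal sketch of that suggestion, assuming the site really does send ISO-8859-1; run it right after connecting, before any queries that carry data:

```sql
-- Declare the character set the client actually sends and expects.
-- SET NAMES adjusts character_set_client, character_set_results,
-- and character_set_connection for this session.
SET NAMES latin1;
```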
Being a fan of Unicode, I'd suggest switching over to UTF-8 entirely, but I realize that this is not always a feasible option.
Edit: @markokocic: Collation only dictates the sorting order. Although it should of course match your character set, it does not affect the range of characters that can be stored in a field.
Have you tried setting the collation for the table to UTF-8 or something other than latin1/ASCII?