MySQL Collation: latin1_swedish_ci Vs utf8_general_ci - mysql

What should I set for Collation when creating tables in MySQL:
latin1_swedish_ci or utf8_general_ci
What is Collation anyway?
I have been using latin1_swedish_ci, would it cause any problems?

Whatever you do, don't try to use the default latin1_swedish_ci collation with utf8 (instead of latin1) in MySQL, or you'll get an error. A collation must be paired with its own charset to work. This SQL will fail because of the mismatch between charset and collation:
CREATE TABLE IF NOT EXISTS `db`.`events_user_preference` (
`user_id` INT(10) UNSIGNED NOT NULL ,
`email` VARCHAR(40) NULL DEFAULT NULL ,
PRIMARY KEY (`user_id`) )
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = latin1_swedish_ci
And @Blaisorblade pointed out that the way to fix this is to use the Swedish collation that goes with the utf8 character set:
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_swedish_ci
The SQL for the cal (calendar) module for the Yii PHP framework had something similar to the erroneous code above. Hopefully they've fixed it by now.
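The pairing rule above can be checked before the SQL ever reaches the server, because MySQL collation names begin with the name of the character set they belong to. The following Python sketch relies only on that naming convention. The function name is made up for illustration, and it deliberately ignores aliases (for example utf8 versus utf8mb3 on newer servers), so treat SHOW COLLATION on your own server as the authority.

```python
def collation_matches_charset(charset: str, collation: str) -> bool:
    """Return True if the collation name carries the charset name as its prefix.

    Assumption: this only compares names (e.g. utf8 + utf8_swedish_ci); it does
    not ask a server and does not resolve charset aliases.
    """
    charset = charset.lower()
    collation = collation.lower()
    # "binary" is both a charset and its only collation, hence the equality check.
    return collation == charset or collation.startswith(charset + "_")


# The failing pair from the CREATE TABLE above, then the corrected pair.
print(collation_matches_charset("utf8", "latin1_swedish_ci"))
print(collation_matches_charset("utf8", "utf8_swedish_ci"))
```

The trailing underscore in the prefix check matters: without it, utf8 would wrongly accept utf8mb4_unicode_ci.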

You can read about character sets and collations as of MySQL 5.5 here:
Character Sets and Collations in General
Character Sets and Collations in MySQL
Collation support is necessary to handle the many written languages of the world. For instance, in my language (Danish) we have a special character 'æ'. It sounds like the 'ä' of Swedish, German, Hungarian (and more). That character also appears in Danish in words imported from one of those languages. Thanks to collation support we can have both printed correctly and at the same time sorted (ORDER BY ...) as being identical. Without collation support that would not be possible.
The Swedish collation is the MySQL default for the latin1 charset. It works fine with English. English is so easy - it works with everything, because it has no special characters, accents etc. But if you often use another language (for instance Spanish) you could change the collation to a Spanish one, so that Spanish strings are sorted according to Spanish language rules.
A very special example of a collation is one of the German ones. It was created to allow for sorting as in German phone books. German phone books don't follow the general rules of the German language!
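To make the point about language-specific ordering concrete, here is a small Python sketch with three toy sort keys: German dictionary order (ä sorts like a), German phone-book order (ä sorts like ae) and Swedish order (å, ä, ö come after z). These keys are simplified illustrations written for this answer, not MySQL's actual collation tables, and the function names and word list are made up.

```python
def german_dictionary_key(word: str) -> str:
    # Dictionary order: umlauts sort like their base vowel.
    return word.lower().translate(str.maketrans("äöü", "aou"))


def german_phonebook_key(word: str) -> str:
    # Phone-book order: umlauts sort like the base vowel followed by "e".
    return word.lower().translate(str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"}))


def swedish_key(word: str) -> list:
    # The Swedish alphabet ends with ... x y z å ä ö.
    alphabet = "abcdefghijklmnopqrstuvwxyzåäö"
    order = {ch: i for i, ch in enumerate(alphabet)}
    # Unknown characters are placed after the alphabet (a simplification).
    return [order.get(ch, len(alphabet)) for ch in word.lower()]


words = ["Zahn", "Ähre", "Adler", "Afrika"]
print(sorted(words, key=german_dictionary_key))
print(sorted(words, key=german_phonebook_key))
print(sorted(words, key=swedish_key))
```

The same four words come out in three different orders, which is why the collation has to be chosen per language (or per column) rather than once for "Latin text".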
You can create your own collation if you like. Collations can be compiled or text-format.

In WampServer 2.5 you can change the collation by going into phpMyAdmin and selecting the database you need to change. This will give you another set of tabs. Select the tab called Operations. In that tab there is a section called Collation; pick the one you want in the drop-down and click Go.

Try these:
<?php
echo htmlspecialchars($string);
echo htmlentities($string);
?>
You can see more info from http://php.net/manual/en/function.htmlspecialchars.php. :D
Worked for me! No more diamonds :)

Related

What COLLATE should I set to use all kinds of possible languages?

I have a column called username. I want the user to be able to insert text in Japanese, Roman, Arabic, Korean, and everything else that is possible, including special chars [https://en.wiktionary.org/wiki/Index:All_languages]. What COLLATE should I set on my database and tables?
I'm using utf8_general_ci; I'm new, so I don't know if this is the best COLLATE for my needs. I need to choose the right COLLATE to avoid SQL errors, because I will not use preg_replace or a function to replace special chars; I will only use prepared statements to avoid SQL injection and protect my database.
First choice (MySQL 8.0): utf8mb4_0900_ai_ci
Second choice (as of 5.6): utf8mb4_unicode_520_ci
Third choice (5.5+): utf8mb4_unicode_ci
Before 5.5, you can't handle all of Chinese, nor Emoji: utf8_unicode_ci
The numbers refer to Unicode versions 9.0, 5.2.0, and (no number) 4.0.
No collation is good for sorting all languages at the same time. Spanish, German, Turkish, etc, have quirks that are incompatible. The collations above are the 'best' general purpose ones available.
utf8mb4 handles all characters yet specified by Unicode (including Cherokee, Klingon, Cuneiform, Byzantine, etc.)
If Portuguese is the focus:
See https://pt.stackoverflow.com/ and MySQL collation for Portuguese.
Study this for 8.0 or this for pre 8.0 to see which utf8/utf8mb4 collation comes closest to sorting Portuguese 'correctly'. Perhaps utf8mb4_danish_ci or utf8mb4_de_pb_0900_ai_ci would be best.
(Else go with the 'choices' listed above.)
If you are using MySQL 5.5.3 or higher, I would recommend the utf8mb4 character set (MySQL's full UTF-8 encoding) with the utf8mb4_unicode_ci collation. AFAIK it supports most, if not all, languages and implements the Unicode standard for sorting and comparison. As a second choice, have a look at utf8mb4_general_ci, which may be faster but is also less accurate.
See this excellent SO post for (many) more details, or check out the official MySQL doc.
Below 5.5.3, utf8_unicode_ci is your friend.
COLLATION refers to ordering (as in comparisons in WHERE and ORDER BY); you should really ask about CHARACTER SET:
Pre-5.5.3: utf8 (aka utf8mb3) handles all languages, except for a few Chinese characters and Emoji.
5.5.3 forward: utf8mb4 - Handles everything. Outside of MySQL, it is spelled "UTF-8".
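The difference between utf8 (utf8mb3) and utf8mb4 comes down to bytes per character: UTF-8 needs up to four bytes for some characters, while utf8mb3 stores at most three. A minimal Python sketch of that check follows; the function name and the sample strings are made up for illustration.

```python
def fits_in_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes when encoded as UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


samples = {
    "Spanish": "ma\u00f1ana",              # mañana
    "Japanese": "\u65e5\u672c\u8a9e",      # 日本語
    "Emoji": "\U0001F600",                 # grinning face
    "Rare Chinese": "\U00020000",          # a CJK Extension B character
}

for label, text in samples.items():
    widths = [len(ch.encode("utf-8")) for ch in text]
    print(label, widths, fits_in_utf8mb3(text))
```

Everything that needs four bytes (Emoji and the less common Chinese characters) is what utf8mb3 cannot hold and utf8mb4 can.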

Which collation to use so that `ş` and `s` are treated as unique values?

The issue is that ş and s are interpreted by MySQL as identical values.
I'm new to MySQL, so I have no idea which collations would view them as unique.
The collations that I've tried using which don't work are:
utf8_general_ci
utf8_unicode_520_ci
utf8mb4_unicode_ci
utf8mb4_unicode_520_ci
Does anybody know which collation to use?
P.S. I also really need the collation to interpret emojis and other non-Latin characters, and, to my knowledge of MySQL and collations, the only collation able to do this is unicode?
utf8_turkish_ci and utf8_romanian_ci -- as shown in http://mysql.rjweb.org/utf8_collations.html
(Plus, of course, utf8_bin.)
For your added question: You are looking for a "character set" (not a "collation") that can represent Emoji and other non-Latin characters -- UTF-8 is the one to use. In MySQL, it is utf8mb4. The "collations" that are associated with that are named utf8mb4_.... Collations control ordering and equality, as indicated in the first part of your question about s and ş.
MySQL's CHARACTER SET utf8 is a subset of utf8mb4. Either can handle all the "letters" in the world. But only utf8mb4 can handle Emoji and some Chinese characters.
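As a rough analogy for why some collations see ş and s as equal and others do not, the Python sketch below compares the two characters in an accent-insensitive way and in a byte-for-byte way. This is only an illustration of the idea: MySQL collations use their own weight tables rather than Unicode normalization, and the function names here are made up.

```python
import unicodedata


def strip_marks(text: str) -> str:
    # Decompose characters (ş becomes s + combining cedilla), then drop the marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_equal(a: str, b: str) -> bool:
    # Loosely comparable to what a general-purpose *_ci collation does.
    return strip_marks(a).casefold() == strip_marks(b).casefold()


def binary_equal(a: str, b: str) -> bool:
    # Loosely comparable to a *_bin collation: compare the encoded bytes.
    return a.encode("utf-8") == b.encode("utf-8")


print(accent_insensitive_equal("ş", "s"))
print(binary_equal("ş", "s"))
```

A language-specific collation such as the Turkish one sits between these two extremes: it still ignores case but keeps ş and s apart.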

How to solve collation error in Mysql?

I migrated a Microsoft SQL database to MySQL and I had some collation problems in the rows in MySQL. I tried to change the collation but the errors are still there. The data is going to be used in WordPress, so I tried the Database Collation Fix plugin, but it doesn't work.
The table affected is wp_posts, in post_title and post_content. All the characters that contain an accent, or 'ñ' in Spanish, are replaced by a random character.
I already tried utf8_spanish_ci and utf8mb4_spanish_ci.
Any suggestions?
Microsoft SQL database collation: Modern_Spanish_CI_AI
MySQL database collation: UTF8 Default Collation
Thanks
I don't know if this helps you, but the collating orders in MySQL's Modern Spanish utf8_spanish_ci and/or utf8mb4_spanish_ci collations are different from those in utf8_unicode_ci and/or utf8mb4_unicode_ci.
Modern Spanish collation handles N and Ñ as separate characters, with Ñ coming directly after N. Generic latin-language collation treats them as variants of the same character. So, if you want Spanish collation -- that is, if you're dealing with lots of proper names and so forth -- you'll need to use the Spanish collation for this data.
If ñ turned into ?, you have one type of problem.
If ñ turned into Ã±, you have "Mojibake".
If ñ turned into �, it's yet another problem.
Please be more specific, since the solutions are quite different.
Trouble with utf8 characters; what I see is not what I stored provides information on the common issues.
The "Collation" is not relevant to ñ being replaced by a 'random character'. Only the CHARACTER SET is relevant.
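The three symptoms listed above can be reproduced outside MySQL, which helps when deciding which one you are looking at. The Python sketch below only mimics the symptoms with plain encode/decode calls; where the mismatch actually happens in a given setup (client, connection, or column definition) still has to be tracked down separately, and the variable names are made up.

```python
original = "\u00f1"  # ñ

# Case 1: the target encoding has no ñ at all, so it is replaced by "?".
as_question_mark = original.encode("ascii", errors="replace").decode("ascii")

# Case 2 (Mojibake): UTF-8 bytes are read back as if they were cp1252/latin1.
as_mojibake = original.encode("utf-8").decode("cp1252")

# Case 3: a single latin1 byte is read back as if it were UTF-8.
as_black_diamond = original.encode("latin-1").decode("utf-8", errors="replace")

print(as_question_mark, as_mojibake, as_black_diamond)

# Mojibake is usually reversible as long as no byte was lost on the way.
repaired = as_mojibake.encode("cp1252").decode("utf-8")
print(repaired == original)
```

Case 1 has already lost the data, while case 2 usually has not, which is one reason the fixes differ so much.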
When you get into comparing or sorting strings, then the COLLATION becomes relevant. I think the only difference between ..._spanish_ci and ..._spanish2_ci is the handling of ch and ll.

How to handle multilingual MySQL queries?

I have a DB which stores usernames, passwords, and basic info. The 'info' however, only stores English characters.
How to store data in different languages, let's say English, French, Russian, Chinese, Japanese, Arabic, etc.? I realized that default collation doesn't support that.
What is the best solution, and how do you guys get around it?
Change the default collation of the whole database and also of the table(s) to utf8_general_ci. There is no reason to suffer (with this kind of free form data).
ALTER DATABASE db CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE tbl CHARACTER SET utf8 COLLATE utf8_general_ci;
Read about a few gotchas at the end of this page.

What is MySQL Collation, how to use it in practice?

Let's say I want to make a search engine that handles 4 languages, some of them a bit unusual:
English
Swedish
Hebrew
Arabic
How would I set the collations in MySQL ?
A collation determines:
The character set used to store the characters (UTF8, ISO8859, etc.), since every collation belongs to exactly one character set
The sorting and comparison rules
If you want to have different languages (where they cannot be sanely represented in the same collation, as you mention) you can have columns with different collations.
Of course you can set collation at database and table levels too, and even set collation to a string literal.
If you can find a single collation that handles all the languages you're interested in, that's best.
The collation determines how MySQL compares strings.
A list of all character sets and collations can be found with:
SHOW CHARACTER SET;
SHOW COLLATION;
To change the default collation for a table (existing columns keep their own collation unless you use CONVERT TO) use:
ALTER TABLE `my_table` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
http://dev.mysql.com/doc/refman/5.0/en/charset.html