How to handle multilingual MySQL queries? - mysql

I have a DB which stores usernames, passwords, and basic info. The 'info' however, only stores English characters.
How to store data in different languages, let's say English, French, Russian, Chinese, Japanese, Arabic, etc.? I realized that default collation doesn't support that.
What is the best solution, and how do you guys get around it?

Change the default collation of the whole database and also of the table(s) to utf8_general_ci. There is no reason to suffer (with this kind of free form data).
ALTER DATABASE db CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE tbl CHARACTER SET utf8 COLLATE utf8_general_ci;
Read about a few gotchas at the end of this page.
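To confirm that the conversion took effect, one option is to query information_schema. This is a minimal sketch that assumes the placeholder names db and tbl from the statements above:

```sql
-- List the character set and collation of each text column in db.tbl.
-- Non-text columns (INT, DATE, ...) have NULL here, so they are filtered out.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'db'
  AND TABLE_NAME = 'tbl'
  AND CHARACTER_SET_NAME IS NOT NULL;
```

On recent MySQL 8.0 releases the utf8 character set may be reported as utf8mb3.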

Related

UTF8 letters issue in latin mysql database

I have latin1 MySQL database and it is too late to convert it to utf8.
When I search for text that contains an accented letter (a French one, for example), I get the same result as with the plain English letter.
Example: when I search for "tést", I get "test" from MySQL.
How can I avoid this?
Thank you.
The collation latin1_general_ci will find test when searching for tést. Since that is probably what your collation is (let's see SHOW CREATE TABLE), there is no efficient way to avoid getting test.
If all you have is Western European characters, utf8 is not a critical goal.
To change the table foo to a binary collation such as latin1_bin, which compares bytes and so treats é and e as different:
ALTER TABLE foo CONVERT TO CHARACTER SET latin1 COLLATE latin1_bin;
Since it will involve copying the entire table, it will take some time.
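If altering the table is not an option, a per-query override might be enough. A sketch, assuming a hypothetical latin1 column called name in foo:

```sql
-- COLLATE on the column switches this one comparison to latin1_bin,
-- so 'tést' no longer matches 'test'. The stored data is unchanged.
SELECT *
FROM foo
WHERE name COLLATE latin1_bin = 'tést';
```

Note that latin1_bin is also case-sensitive, and an index on name that was built with a different collation will probably not be used for this comparison.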

Relationship between database's charset, table's charset and columns' charset? Do different charsets lead to any performance issues?

I am developing a website using ASP.NET, and my DB is MySQL. Users can submit articles there. The site is going international, so I don't want to restrict the language to English only.
So I have decided a few things. Please guide me if I made the wrong choice.
1) I chose utf8mb4 as the database charset, because it is an improved version of utf8 that can store more characters. Did I make the right choice? I have only a few tables that need utf8mb4, so should I use latin1 as the database charset instead?
2) I don't know which collation to use for the above charset. I decided to use utf8mb4_swedish_ci. Or should I use utf8mb4_general_ci or another one?
3) Most of my tables do not need the utf8mb4 charset; latin1_swedish_ci will do the work. Can I keep selected tables under a specific charset and collation even if the DB uses another charset and collation?
4) Can I use the utf8mb4 charset for a specific column in a table that has latin1_swedish_ci as its collation?
If those are possible, what is the relationship between the database charset, table charset, and column charsets?
Do different charsets lead to any performance issues?
Thank you very much.
The database charset is inherited by the table, unless you override it. (I recommend being specific at the table level.)
The table charset is inherited by the columns in the table. Since one usually has only one charset, this inheritance is fine. Also, it is pretty clear when you do SHOW CREATE TABLE what each column is set to -- without having to look at the database or system.
Go international -- use utf8 or utf8mb4. I agree that utf8mb4 is a better choice, especially for Chinese and some emoticons.
character_set_% -- Only _client, _connection, and _results are important. And these are the three that are set by SET NAMES utf8mb4. Leave the rest alone.
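A minimal sketch of setting and then checking those three variables for the current connection:

```sql
-- Tell the server that this connection sends and expects utf8mb4.
SET NAMES utf8mb4;

-- character_set_client, character_set_connection and character_set_results
-- should now all report utf8mb4; the other rows can be left as they are.
SHOW VARIABLES LIKE 'character_set_%';
```

Most client libraries also accept the charset as a connection option, which is usually preferable to issuing SET NAMES by hand.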
The default collation for utf8mb4 is utf8mb4_general_ci (MySQL 8.0 changed the default to utf8mb4_0900_ai_ci), which is possibly a good choice if you have multiple languages. The other choice is utf8mb4_unicode_ci. I talk more about "combining diacriticals" in http://mysql.rjweb.org/doc.php/charcoll#combining_diacriticals . This section gives examples of where those two collations differ: http://mysql.rjweb.org/doc.php/charcoll#utf8_collations_examples
See also the "Best Practice" section.
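One way to see the two collations disagree is to compare strings under each one explicitly. The sketch below assumes the client really sends UTF-8; the MySQL manual describes ß as equal to s under the general collation and equal to ss under the unicode one:

```sql
SET NAMES utf8mb4;

-- COLLATE picks the collation for each comparison explicitly.
SELECT 'ß' = 'ss' COLLATE utf8mb4_general_ci AS general_equal,
       'ß' = 'ss' COLLATE utf8mb4_unicode_ci AS unicode_equal;
```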
latin1 is smaller than utf8 for Western European text. MySQL will do the proper conversions when needed, so that is not a problem. But I prefer not to confuse the programmer by mixing character sets. Keep in mind that converting an existing table column from latin1 to utf8 takes some effort, possible downtime, and maybe risk.
4) Can I use the utf8mb4 charset for a specific column in a table that has latin1_swedish_ci as its collation?
Yes. Each column (but not each row) can have a different character set and/or collation.
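A sketch of such a mixed table; the table and column names are hypothetical:

```sql
-- The table default is latin1, and one column overrides it with utf8mb4.
CREATE TABLE article (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  slug VARCHAR(100) NOT NULL,  -- inherits latin1 / latin1_swedish_ci from the table
  body TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL
) ENGINE = InnoDB
  DEFAULT CHARACTER SET = latin1
  COLLATE = latin1_swedish_ci;
```

SHOW CREATE TABLE article then spells out the charset on body, while slug shows none because it matches the table default.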
The existence of different charsets is not a performance problem, per se. What could bite you is WHERE col1 = col2 (and other cases) when the two columns have a different character set and/or collation. MySQL will abandon an otherwise perfectly good index if it sees a difference that is not easy to handle.
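One possible remedy is to bring both columns to the same character set and collation. A sketch with hypothetical tables, assuming orders.customer_code is latin1 while customers.code is utf8mb4 with utf8mb4_unicode_ci:

```sql
-- Convert the latin1 column so both sides of the join match.
-- MODIFY needs the full column definition, so the type and NOT NULL are repeated.
ALTER TABLE orders
  MODIFY customer_code VARCHAR(32)
    CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL;

-- With matching definitions, this comparison no longer needs a conversion.
SELECT o.*
FROM orders AS o
JOIN customers AS c ON c.code = o.customer_code;
```

As with any ALTER TABLE that rewrites a column, this may take a while on a large table.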

Html Text-area: problems with accented letters

When I write words with accented letters in a text area, the application stores the words in MySQL with some errors.
E.g. if I write può, in MySQL I get something like puÃ².
How can I solve it?
To change an existing table to use the UTF-8 charset:
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
To set the default charset of the database to UTF8 for tables you will create in the future:
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
You can use either utf8_general_ci or utf8_unicode_ci. It is explained at What's the difference between utf8_general_ci and utf8_unicode_ci that there is a difference between them in the speed and accuracy of the sorting, with utf8_unicode_ci being more accurate and the performance gain of using utf8_general_ci being very minimal.
(Also, be aware that when you run queries in the mysql console in the command prompt, the text may not display as UTF-8 even when it is stored properly. It's a limitation of the command prompt.)
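Because the console may not render the text faithfully, looking at the stored bytes can be more reliable. A sketch, assuming a hypothetical utf8 column named note in tablename:

```sql
-- HEX() shows the raw bytes, independent of how the terminal draws them.
-- For the word può in a utf8 column:
--   7075C3B2      -> stored correctly (ò is the two bytes C3 B2)
--   7075C383C2B2  -> double-encoded (UTF-8 bytes were read as latin1, then re-encoded)
SELECT note, HEX(note)
FROM tablename
LIMIT 10;
```

If the bytes look double-encoded, the usual suspect is a connection character set that does not match what the application actually sends.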

Store text in database in some languages

Can anyone tell me how I can store, for example, Russian words in a database? Please write an example. I know that I must use Unicode, but I don't know how to use it with MySQL.
Thanks all for response.
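You should create a database with the utf8 character set and a collation that belongs to it and suits Russian.
List the available utf8 collations:
SHOW COLLATION LIKE 'utf8%';
Then create the database. The collation must belong to the chosen character set, so a latin1 collation cannot be combined with utf8:
CREATE DATABASE db_name CHARACTER SET utf8 COLLATE utf8_unicode_ci;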
You can also ALTER it if you already have one.
Read the manual here https://dev.mysql.com/doc/refman/5.7/en/charset-database.html
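Since the question asks for an example, here is a minimal sketch of storing and reading Russian words. The table name words and its columns are hypothetical:

```sql
-- The charset is set on the table itself, so the example does not depend
-- on the database default.
CREATE TABLE words (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  word VARCHAR(100) NOT NULL
) ENGINE = InnoDB DEFAULT CHARACTER SET = utf8 COLLATE = utf8_unicode_ci;

-- The connection must also speak UTF-8, otherwise the text is garbled on the way in.
SET NAMES utf8;

INSERT INTO words (word) VALUES ('привет'), ('мир');

SELECT word FROM words ORDER BY word;
```

If emoji or other four-byte characters are expected as well, utf8mb4 with a utf8mb4 collation would be the safer choice.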

MySQL Collation: latin1_swedish_ci Vs utf8_general_ci

What should I set for Collation when creating tables in MySQL:
latin1_swedish_ci or utf8_general_ci
What is Collation anyway?
I have been using latin1_swedish_ci, would it cause any problems?
Whatever you do, don't try to use the default swedish_ci collation with utf8 (instead of latin) in mysql, or you'll get an error. Collations must be paired with the right charset to work. This SQL will fail because of the mismatch in charset and collation:
CREATE TABLE IF NOT EXISTS `db`.`events_user_preference` (
`user_id` INT(10) UNSIGNED NOT NULL ,
`email` VARCHAR(40) NULL DEFAULT NULL ,
PRIMARY KEY (`user_id`) )
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = latin1_swedish_ci
And @Blaisorblade pointed out that the way to fix this is to use the Swedish collation that goes with the utf8 character set:
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_swedish_ci
The SQL for the cal (calendar) module for the Yii php framework had something similar to the above erroneous code. Hopefully they've fixed it by now.
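To check which collation goes with which character set before writing such a statement, something like this should help:

```sql
-- Lists every collation whose name contains "swedish"; the Charset column
-- of the result shows the character set each one belongs to.
SHOW COLLATION LIKE '%swedish%';
```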
You can read about character sets and collations as of MySQL 5.5 here:
Character Sets and Collations in General
Character Sets and Collations in MySQL
Collation support is necessary to support the many written languages of the world. For instance, in my language (Danish) we have a special character 'æ'. It sounds like the Swedish, German, Hungarian (and more) 'ä'. That character also appears in Danish in words imported from one of those languages. Thanks to collation support we can have both printed correctly and at the same time sorted (ORDER BY ...) as being identical. Without collation support that was not possible.
The Swedish collation is the MySQL default for latin1 charsets. It works fine with English. English is so easy - it works with everything, because it has no special characters, accents etc. But if you have another language that you use often (for instance Spanish) you could change the collation to a Spanish one, so sorting of Spanish strings would be correct according to Spanish language rules.
A very special example of a collation is one of the German ones. It was created to allow sorting as in German phone books. German phone books don't follow the general rules of the German language!
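The Spanish case could look like this. It is only a sketch, assuming a hypothetical table people with a utf8mb4 column name:

```sql
-- The collation applies to this ORDER BY only; the column definition is untouched.
SELECT name
FROM people
ORDER BY name COLLATE utf8mb4_spanish_ci;
```

There is also utf8mb4_spanish2_ci, which follows traditional Spanish ordering where ch and ll sort as separate letters.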
You can create your own collation if you like. Collations can be compiled or text-format.
In WampServer 2.5 you can change the collation by going into phpMyAdmin and selecting the database you need to change. This will give you another set of tabs. Select the tab called Operations. In that tab there is a section called Collation; pick the one you want in the drop-down and select Go.
Try these:
<?php
echo htmlspecialchars($string);
echo htmlentities($string);
?>
You can see more info from http://php.net/manual/en/function.htmlspecialchars.php. :D
Worked for me! No more diamonds :)