Difference between database, table, column collation - mysql

I understand that collations are a set of rules for making comparisons over a character set. MySQL / MariaDB has table and database collations in addition to column collation. I was wondering what was the difference between a collation on these three (database, table and column).
Thanks.

MySQL's character sets and collations can be interpreted as a top-down list of prioritized items. The topmost is least priority and the bottommost is most priority.
Order of precedence with topmost being least precedence:
Server collation
Connection-specific collation
Database collation
Table collation
Column collation
Query collation (using CAST or CONVERT)
The server collation is set by the server, which is set either inside of my.cnf or when the server was built from source code. By default, this will usually be latin1 or utf8, depending on your platform.
The connection-specific collation is set by the client using a query like SET NAMES 'utf8' COLLATE 'utf8_unicode_ci';. Most clients don't set a connection-specific collation, so the server will use its own default as explained above.
The database collation is set during database creation, or manually by updating it later. If you don't specify one, it will use the next higher-level collation, which would either be the connection-specific or the server collation.
The table collation is the same as the database collation, except if left blank, it will use the database as its default, then connection-specific, and then finally the server's collation.
The column collation uses the table's collation as its default, and if there is no collation set, it will then follow up the chain to find a collation to use, stopping at server if all of the others weren't set.
The query collation is specified in the query by using CAST or CONVERT, but otherwise will use the next available collation in the chain. There's no way to set this unless you use a function.
Please also refer to the manual page Character Set Support.

In short. When you set Server collation. to UTF-8. All Databases created without defining collation will inherit it from Server.
column iherits from table
table inherits from database
database inherits from server
However, you can overwrite default server at one of those points. Then everything will inherit from it.

Related

Connecting with wrong collation to a mysql server?

I have a Golang program that may connect to databases with different character sets or collation.
For example the default at the time of writing of the Golang MYSQL driver is utf8mb4_general_ci https://github.com/go-sql-driver/mysql#collation
However if I connect to a database configured like so:
CREATE DATABASE example character set utf8mb4 collate utf8mb4_unicode_ci;
Can I expect "bad things to happen"? Indexes not to work?
In most case, there are no problem. For example, column collation is used regardless connection collation when WHERE column=? is used.
See also: https://dev.mysql.com/doc/refman/5.6/en/charset-collation-coercibility.html
But I can't speak it's 100% safe. It's safe to use one collation all place.

default database collation not respected while importing

In my database, the collation was originally utf8_general_ci. However, I noticed that utf8_unicode_ci is necessary because of better sorting accuracy.
So I exported all database using phpmyadmin and checked that the word "COLLATION" does not appear in the exported sql file (except for only 2 times in one table where it is set to binary) so generally this script is collation agnostic and should not imply any specific collation when importing but use database default.
After dropping all tables, the database collation was changed to utf8_unicode_ci and then the import script was run from phpmyadmin. But as a result, all tables and all columns are shown again with utf8_general_ci collation (and sorting is incorrect). Why?? And what to do to change it?
P.S. The export/import script contains commented line at the beginning:
/*!40101 SET #OLD_COLLATION_CONNECTION=##COLLATION_CONNECTION */;
I don't know if it has any impact while importing, but after opening mysql console, the command show variables like 'collation_connection'shows COLLATION_CONNECTION as cp852_general_ci.
However, in phpmyadmin->variables the variable 'collation_connection' is set to utf8_general_ci. But there is no way to change it.
That happens because the database export is setting the character set on every table, and such a clause comes with a default collation that depends on the character set, not on the collation of your connection. utf8_general_ci is the default collation for utf8.
You'll have to convert your tables with something like ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci; or edit your database export if this is affordable.
As for the MySQL console: the command-line client is pretty much broken on Windows. It'll never support, display or read Unicode, and you're getting a per-connection collation for that client that matches your Windows so-called OEM character set for your locale. This is a Windows misfeature that's difficult to workaround in portable software. PHPMyAdmin uses a web server and doesn't suffer from this problem. I advise you to use a UNIX-like operating system like GNU/Linux for any serious work in any case, not just for this reason. As an added benefit, MySQL, Apache and your whole application stack perform better on Linux.

Illegal mix of collations for operation 'concat'

I'm trying to execute this concat query in mysql
SELECT CONCAT(if(fName,fName,''),Name)
From Student
Error:
#1271 - Illegal mix of collations for operation 'concat'
This is due to collections difference, you can solve by converting the two strings or columns
to one collection say UTF8
CONCAT(CAST(fName AS CHAR CHARACTER SET utf8),CAST('' AS CHAR CHARACTER SET utf8))
This will solve :)
you can check more about casting in MySQL here MySQL Casting
The charsets and/or collations you use in your connection do not match the charset/collation in your table.
There are 4 solutions:
1- Change the charset in your connection:
//find out the charset used in your table.
SHOW TABLES LIKE 'student'
//set the server charset to match
SET NAMES 'charset_name' [COLLATE 'collation_name']
2- Change the charset used in your table to match the server charset:
//find out the charset used in the server
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
//Change the charset used in the table
ALTER TABLE student ......
3- Change the default charset settings and restart MySQL
Edit My.ini and replace the character_set_* options, so they match your tables.
4- Change the charset settings for your connection
You client can override the charset and collation settings.
If it does not option 1 or 3 should fix your issue, but if the connection overrides these settings, you need to check the connection-string and edit the charset/collation settings to match your database.
Some advice:
Find a charset. I recommend UTF8 and a collation: I recommend utf8_general_ci. And use those consistantly everywhere.
A concatenation can only work if the collation of all used values matches OR you use a collation that all collations are a subset of (from a logical standpoint).
If you want to concatenate text, each text should be the same collation. Take a look at the collation the database uses, then take a look at the collation that your connection uses:
show variables like '%coll%'
The collation_connection should match the collation of the table you try to concatenate. If it doesn't, the error message will arise.
You can then change the connection collation to match the one of the table.
Look like you have a miss use on the if statement there because it will resulting an undefined data type so the concat operation will fail as it different in data type. Try change the query by use ifnull instead.
Try this query instead:
SELECT concat(ifnull(fName,''),Name) From Student
see the demo here http://www.sqlize.com/kfy85j8f1e
for another reference read also http://forums.mysql.com/read.php?10,225982,225982#msg-225982
It can also be an error with your client library being too old for the mysql server.
We had a similar problem with LIKE and the character "ő" and using PHP MySQL library version 5.1.52 but MySQL server version 5.5.22.
The problem has gone away upon upgrading the client library.

SQL Server localization

Does SQL Server take over the localization from the server it's installed on? Or can you define the locale for each instance/database?
Which setting is responsible for having comma or period when a double is saved to the database?
SQL Server has a server collation and each database can either use the server collation or can be set to a different collation.
The format of the datatype will be taken from the Database collation. Providing that a collation has not been explicitly set for the column.
SQL Server Collations
Remember that if you use different collation for columns that you are trying to compare, you will need to use COLLATE and that will cause the argument to be a "non searchable argument", that is indexes will not be used to satisfy that statement.

SSIS - Data from MySql to MsSql some characters are?

I just transferred some data from MySql to MsSql (2K5) in a text field, some of my characters, such as apostrophes, are now ? (question mark) to me this indicates some sort of collation or character set error, right?
To be honest, I don't know which one should I be using
The MySql db currect charset is utf8_general_ci and in ms sql is SQL_Latin1_General_CP1_CI_AS .
I have tried changing the charset of the mysql table to latin1_swedish_ci, however this doesnt help
Thanks for the input
Have you tried changing the target (SQL Server) column data type to NVARCHAR?
The utf8_general_ci collation on the MySQL column indicates a Unicode data type. If the source is Unicode, so should be the target - for the easiest transition.
Collations themselves play a minor role here. They just affect comparison and sorting.
You might also need to check the SSIS type of the columns in your dataflow. Remember, the data type and character set is set at the connection manager on the source (and that may involve a conversion from the original native character set). Also, any operations like derived columns or conversions will have a character set which can be altered and will persist down that column's lineage in the data flow. At the end when it gets to the destination, there could be additional character set coercion/conversion.