I've always been surprised that MySQL and related tools tend to specify both CHARACTER SET and COLLATION in CREATE TABLE AND ALTER TABLE statements:
SHOW CREATE TABLE test;
CREATE TABLE ...
ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
When I select a collation for a column in MySQL Workbench, I get:
ALTER TABLE ...
CHARACTER SET 'latin1' COLLATE 'latin1_general_ci' ...;
I've always supposed that specifying the character set and the collation was redundant, as the collation implies the character set.
Am I wrong?
When I try to mix a collation with another charset, I get an error:
CREATE TABLE ... DEFAULT CHARACTER SET utf8 collate latin1_bin;
COLLATION 'latin1_bin' is not valid for CHARACTER SET 'utf8'
Is it safe to only ever specify the collation?
If so, why do all these tools systematically include the charset in the statement, too?
You are not required to specify character set when creating table, it's automatically set on collate as it described in the MySQL Reference Manual:
MySQL chooses the table character set and collation in the following
manner:
If both CHARACTER SET charset_name and COLLATE collation_name are specified, character set charset_name and collation collation_name are used.
If CHARACTER SET charset_name is specified without COLLATE, character set charset_name and its default collation are used. To see the default collation for each character set, use the SHOW CHARACTER SET statement or query the INFORMATION_SCHEMA CHARACTER_SETS table.
If COLLATE collation_name is specified without CHARACTER SET, the character set associated with collation_name and collation collation_name are used.
Otherwise (neither CHARACTER SET nor COLLATE is specified), the database character set and collation are used.
The table character set and collation are used as default values
for column definitions if the column character set and collation are
not specified in individual column definitions. The table character
set and collation are MySQL extensions; there are no such things
in standard SQL.
So you can create a table using this query:
CREATE TABLE `tabletest` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 COLLATE=latin1_general_ci;
This query will create a table with CHARSET=latin1 COLLATE=latin1_general_ci, so it's safe to specify COLLATE only.
As for why are there both CHARSET and COLLATE, please read the following:
What is the difference between collation and character set?
Related
i tried to insert arabic word in MySql and it got inserted and when i displayed it it was there
but there was a warning 3720
for context here is the table:
create table SIGHTS ( S_no int not null PRIMARY KEY, S_name Nvarchar(30) not null);
and here is the inserted value:
insert into SIGHTS values (1,(N'كهف الهكبة'));
so? is it safe to ignore the warning or what should i do with it ?
it seems like you have used the default CHARACTER SET "utf8mb4" and the COLLATE utf8mb4_0900_ai_ci when you create the database. for more information, check this link out
when you create the database using the default "create" statement, which is
CREATE DATABASE _dbase;
it will execute the following statement adding the default CHARACTER SET
CREATE DATABASE _dbase CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
change the CHARACTER SET to "utf8" and the COLLATE to "utf8_general_ci" or "utf8_unicode_ci", you need to use the following statements :
CREATE DATABASE _dbase CHARACTER SET utf8 COLLATE utf8_general_ci;
or
CREATE DATABASE _dbase CHARACTER SET utf8 COLLATE utf8_unicode_ci;
When running SHOW CREATE TABLE `my_table`;, I notice that COLLATE utf8mb4_unicode_ci is shown for every char, varchar, and text column in the table. This seems a bit redundant since the collation is already declared in the table_option portion of the create statement.
mysql> SHOW CREATE TABLE `my_table`;
| Table | Create Table
| my_table | CREATE TABLE `my_table` (
...
`char_col_1` char(15) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`varchar_col_1` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`varchar_col_2` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`varchar_col_3` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`text_col_1` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
...
) ENGINE=InnoDB AUTO_INCREMENT=1816178 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
This behavior is noticeable in both MySQL 5.7 and MySQL 8.0 and therefore most likely in other versions as well.
Is this behavior normal and acceptable, or is it a symptom of something that is misconfigured either with the table, database, or MySQL instance?
On the other hand, since collation can be individually set for any specific column, perhaps it is better to explicitly display the collation for every column to avoid any ambiguity or assumptions, even in cases where the collation of the column matches the collation of the table?
You have touched only the tip of the iceberg.
I think the settings on the table are just defaults for columns that are defined without charset or collate.
Ditto for ALTER TABLE ADD COLUMN -- will inherit from the table defaults.
I think that the column settings are put into the information_schema.COLUMNS table and that won't change with an ALTER TABLE .. MODIFY COLUMN ..
Similarly, the table charset and collation inherit from the database definition, and will be frozen as the table is defined.
About defaults:
The old default charset was latin1
The current default is utf8mb4; this is unlikely to ever change in the future.
Every collation applies to exactly one charset, and the charset name is the beginning of the collation name.
Each charset has exactly one "default" collation: latin1_swedish_ci, utf8_unicode_ci, utf8mb4_0900_ai_ci, etc.
That default collation (for a given charset) has rarely, if ever, changed. Perhaps the only change has been for utf8mb4 between 5.7 and 8.0??
(The more I experiment, the less certain I am about all this.)
Best practice: Always explicitly set CHARSET and COLLATE for each string column.
Secondary considerations:
Use utf8mb4, if available, for most string (VARCHAR / TEXT).
Use the latest available collation (Unicode keeps improving it); currently utf8mb4_0900_ai_ci.
Use ascii for things that are clearly only ascii -- country-code, postal_code, hex, etc. Mostly these can use CHAR(..)
Use ascii_general_ci or ascii_bin, depending on whether you need case folding.
Yes, it is redundant to have CHARACTER SET and COLLATION the same in a table definition and a column definition.
Having explicit column definitions means that anyone changes the table definitions of CHARACTER SET or COLLATION the column will remain identical.
Here is quetions about adding comment to column for MySQL. Can this comment be utf-8? Also what encoding MySQL uses for these columns by default?
Default character set and collation is set when the database is created
CREATE DATABASE mydb
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci;
You can modify character set on a specific column like this
ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8;
I'm trying to convert some mysql tables from latin1 to utf8. I'm using the following command, which seems to mostly work.
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
However, on one table I get an error about a duplicate key entry. This is caused by a unique index on a "name" field. It seems when converting to utf8, any "special" characters are indexed as their straight english equivalent. For example, there is already a record with a name field value of "Dru". When converting to utf8, a record with "Drü" is considered a duplicate. The same with "Patrick" and "Påtrìçk".
Here is how to reproduce the issue:
CREATE TABLE `example` ( `name` char(20) CHARACTER SET latin1 NOT NULL,
PRIMARY KEY (`name`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO example (name) VALUES ('Drü'),('Dru'),('Patrick'),('Påtrìçk');
ALTER TABLE example convert to character set utf8 collate utf8_general_ci;
ERROR 1062 (23000): Duplicate entry 'Dru' for key 1
The reason why the strings 'Drü' and 'Dru' evaluate as the same is that in the utf8_general_ci collation, they count as "the same". The purpose of a collation for a character set is to provide a set of rules as to when strings are the same, when one sorts before the other, and so on.
If you want a different set of comparison rules, you need to choose a different collation. You can see the available collations for the utf8 character set by issuing SHOW COLLATION LIKE 'utf8%'. There are a bunch of collations intended for text that is mostly in a specific language; there is also the utf8_bin collation which compares all strings as binary strings (i.e. compares them as sequences of 0s and 1s).
UTF8_GENERAL_CI is accent insensitive.
Use UTF8_BIN or a language-specific collation.
In /etc/my.cnf the following has been added
character-set-server=utf8
collation-server=utf8_general_ci
But for the database and tables created before adding the above how to convert the database and tables to utf8 with collation settings
Well, the database character set and table character set are just defaults (they don't affect anything directly). You'd need to modify each column to the proper charset. PHPMyAdmin will do this for you (just edit the column, then change the character set). If you want to do raw SQL, you'll need to know the column definition (SHOW CREATE TABLE foo will show you the definition). Then, you can use ALTER TABLE to change the definition.
To change the default charset for a table:
ALTER TABLE `tablename` DEFAULT CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
To change the charset of a column with the definition `foo VARCHAR(128) CHARACTER SET 'foo' COLLATE 'foo'``:
ALTER TABLE `tablename` MODIFY
`foo` VARCHAR(128) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
https://serverfault.com/questions/65043/alter-charset-and-collation-in-all-columns-in-all-tables-in-mysql
And:
http://www.mysqlperformanceblog.com/2009/03/17/converting-character-sets/