I want to store Polish and German characters in a column that has a unique index.
When I alter the database:
alter database osa character set utf8 collate utf8_general_ci;
I have a problem with German characters.
sql> insert into company(uuid, name) VALUE ("1","IDE")
[2016-11-27 10:37:35] 1 row affected in 13ms
sql> insert into company(uuid, name) VALUE ("2","IDĘ")
[2016-11-27 10:37:37] 1 row affected in 9ms
sql> insert into company(uuid, name) VALUE ("3","Schuring")
[2016-11-27 10:37:38] 1 row affected in 13ms
sql> insert into company(uuid, name) VALUE ("4","Schüring")
[2016-11-27 10:37:39] [23000][1062] Duplicate entry 'Schüring' for key 'UK_niu8sfil2gxywcru9ah3r4ec5'
Which collation do I have to use?
Edit:
It also does not work with utf8_unicode_ci.
The _ci in the COLLATION name indicates "case insensitive". Unfortunately, in these collations it also means "accent insensitive". So to get E and Ę to be treated differently, you need a _bin collation: either utf8_bin or utf8mb4_bin.
utf8mb4 is needed for Emoji and some Chinese characters, plus some obscure things.
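The difference between a _ci and a _bin comparison can be sketched outside MySQL. This is only a rough Python illustration, not MySQL's actual collation algorithm: the hypothetical helper ci_ai_key approximates a case- and accent-insensitive key by stripping combining marks, and bin_key stands in for a binary collation by comparing the encoded bytes.

```python
import unicodedata


def ci_ai_key(text: str) -> str:
    """Rough stand-in for a case- and accent-insensitive collation key."""
    # Decompose characters such as "ü" into "u" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, then fold case.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def bin_key(text: str) -> bytes:
    """Stand-in for a _bin collation: compare the raw UTF-8 bytes."""
    return text.encode("utf-8")


same_ci = ci_ai_key("Schuring") == ci_ai_key("Schüring")
same_bin = bin_key("Schuring") == bin_key("Schüring")
print(same_ci, same_bin)
```

Under the insensitive key the two names collide, which is what a unique index would reject; under the byte comparison they stay distinct.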
Replace all occurrences of utf8_general_ci with utf8_unicode_ci instead. utf8_general_ci is broken, apparently; see "What are the differences between utf8_general_ci and utf8_unicode_ci?":
utf8_general_ci is a very simple — and on Unicode, very broken — collation, one that gives incorrect results on general Unicode text.
Maybe you should try utf8mb4_unicode_ci?
MySQL's utf8 charset cannot store all UTF-8 characters; it is limited to 3 bytes per character.
https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
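To see which characters fall outside the 3-byte limit, one can check their UTF-8 width. The following Python sketch is an illustration only; needs_utf8mb4 is a hypothetical helper name, and the sample strings are taken from the question plus one emoji.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character takes 4 bytes in UTF-8 (code point above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)


for sample in ("Schüring", "IDĘ", "😀"):
    # Width in bytes of each character when encoded as UTF-8.
    widths = [len(ch.encode("utf-8")) for ch in sample]
    print(sample, widths, needs_utf8mb4(sample))
```

The Polish and German letters need only 2 bytes each, so they fit in either charset; only 4-byte characters such as emoji require utf8mb4.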
alter database osa character set utf8mb4 COLLATE utf8mb4_bin;
Works for me. @Maciek Bryński, thank you for your hint.
Related
In my table, I have one column named description.
Because of the error I'm getting, I've tried many solutions from SO to change the collation type.
I've tried the collations below:
1) utf8mb4_unicode_ci
2) utf8_general_ci
Here, SHOW FULL COLUMNS FROM your_table;
Does anyone know the right collation for a string like '\xC3'?
To support the full UTF-8 range (for example emoji; in your case the character is À), you should use utf8mb4 with utf8mb4_unicode_ci. utf8 is outdated.
You can find a full explanation at https://mathiasbynens.be/notes/mysql-utf8mb4.
You can check the current collations of your table like this:
SHOW FULL COLUMNS FROM your_table;
I assume your description column has type TEXT; otherwise you might need to change the type.
To convert the table, including its default character set and its existing character columns, you can use:
ALTER TABLE your_table CONVERT TO CHARACTER SET utf8mb4;
But without a COLLATE clause this gives the columns the default collation of utf8mb4, which is not necessarily utf8mb4_unicode_ci.
To set the collation of your column explicitly you should use:
ALTER TABLE your_table MODIFY description TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Try this first:
ALTER TABLE your_database_name.your_table CONVERT TO CHARACTER SET utf8;
Or, if the above does not work, run the following after connecting to your database:
SET NAMES 'utf8';
SET CHARACTER SET utf8;
Here is the command I use:
ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci;
It works well. Now I need to set utf8mb4_unicode_ci for a column (since currently characters are shown as ???). Anyway here is my new command:
ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8 COLLATE utf8mb4_unicode_ci;
But sadly MySQL throws:
ERROR 1253 (42000): COLLATION 'utf8mb4_unicode_ci' is not valid for CHARACTER
Any idea?
The first part of the COLLATION name must match the CHARACTER SET name.
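The naming rule can be expressed as a simple prefix check. This Python sketch is a simplification under the assumption that collation names follow the charset_suffix pattern used in this thread; collation_matches_charset is a hypothetical helper, not a MySQL API.

```python
def collation_matches_charset(charset: str, collation: str) -> bool:
    """Check that the collation name starts with '<charset>_'."""
    return collation.lower().startswith(charset.lower() + "_")


# The failing combination from the question, then the working one.
print(collation_matches_charset("utf8", "utf8mb4_unicode_ci"))
print(collation_matches_charset("utf8mb4", "utf8mb4_unicode_ci"))
```

The underscore in the prefix matters: "utf8mb4_unicode_ci" begins with "utf8" but not with "utf8_", so it does not belong to the utf8 character set.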
CHARACTER SET utf8mb4 is needed for Emoji and some Chinese characters.
Let's back up to the 'real' problem -- of question marks.
COLLATION refers to the rules of ordering and sorting, not encoding.
CHARACTER SET refers to the encoding. This should be consistent at all stages. Question Marks come from inconsistencies.
"Trouble with UTF-8 characters; what I see is not what I stored" points out that these are the likely suspects for question marks:
The bytes to be stored are not encoded as utf8/utf8mb4. Fix this.
The column in the database is not CHARACTER SET utf8mb4. Fix this if you need 4-byte UTF-8. (Use SHOW CREATE TABLE.)
Also, check that the connection during reading is UTF-8. The details depend on the application doing the connecting.
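How a question mark can arise from a charset mismatch can be imitated in Python. This is an analogy rather than a trace of what MySQL does internally: it assumes a target encoding (latin-1 here) that cannot represent the character, with the unrepresentable character replaced on conversion.

```python
name = "IDĘ"

# Converting to an encoding that lacks "Ę" substitutes a question mark.
stored = name.encode("latin-1", errors="replace")
print(stored)

# Reading the bytes back cannot recover the original character.
round_trip = stored.decode("latin-1")
print(round_trip)
```

Once the substitution has happened the original character is gone, so the fix has to be made before the data is written again, not afterwards.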
This worked for me:
ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
I have a utf8_general_ci database that I'm interested in converting to utf8_unicode_ci.
I've tried the following commands
ALTER DATABASE dbname CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci; (for every single table)
But that seems to change the charset for future data but doesn't convert the actual existing data from utf8_general_ci to utf8_unicode_ci.
Is there any way to convert the existing data to utf8_unicode_ci?
Use SHOW CREATE TABLE to see whether the ALTERs really set the CHARACTER SET and COLLATION on the columns, not just the defaults.
What was the CHARACTER SET before the ALTERs?
Do SELECT col, HEX(col) ... for some field that should have utf8 in it. This will help us determine if you really have utf8 in the table. The encoding for characters is different based on CHARACTER SET; the HEX helps discover such.
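As a reference for reading such HEX output, this Python sketch prints the byte patterns one would expect for the single character "ü" under three situations: correct UTF-8, latin1, and the common "double encoding" case where UTF-8 bytes were misread as latin1 and encoded again. The variable names are illustrative only.

```python
text = "ü"

# Correctly stored UTF-8: two bytes.
utf8_hex = text.encode("utf-8").hex().upper()

# Stored in a latin1 column: one byte.
latin1_hex = text.encode("latin-1").hex().upper()

# Double encoding: UTF-8 bytes misread as latin1, then encoded as UTF-8 again.
double_hex = text.encode("utf-8").decode("latin-1").encode("utf-8").hex().upper()

print(utf8_hex, latin1_hex, double_hex)
```

Comparing the HEX(col) result against these patterns indicates which of the three situations the table is in.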
The ordering (WHERE, ORDER BY, etc) is controlled by COLLATION. The indexes probably had to be rebuilt based on your ALTER TABLE. Did big tables with indexes take a 'long' time to convert?
To actually see the difference between utf8_general_ci and utf8_unicode_ci, you need a "combining accent" or, more simply, the German ß versus ss:
mysql> SELECT 'ß' = 'ss' COLLATE utf8_general_ci,
              'ß' = 'ss' COLLATE utf8_unicode_ci;
+------------------------------------+------------------------------------+
| 'ß' = 'ss' COLLATE utf8_general_ci | 'ß' = 'ss' COLLATE utf8_unicode_ci |
+------------------------------------+------------------------------------+
|                                  0 |                                  1 |
+------------------------------------+------------------------------------+
However, to test that in your tables, you would need to store those values and use WHERE or GROUP_CONCAT or something else to determine the equality.
What 'proof' do you have that the ALTERs failed to achieve the collation change?
(Addressing other comments: REPAIR should be irrelevant. CONVERT TO tells the ALTER to actually modify the data, so it should have done the desired action.)
You have to change the collation of every field in every table. As you say, the collation of the table is only the default value for fields created later, and the collation of the database is only the default value for tables created later.
As Lorenz Meyer said, the collation of the table is only the default value for fields created later, so you need to set the collation of the existing columns explicitly too.
Such a change looks like:
ALTER TABLE mytable CHANGE mycolumn mycolumn varchar(15) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
I'm trying to convert some MySQL tables from latin1 to utf8. I'm using the following command, which seems to mostly work.
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
However, on one table I get an error about a duplicate key entry. This is caused by a unique index on a "name" field. It seems that when converting to utf8, any "special" characters are indexed as their plain English equivalents. For example, there is already a record with a name field value of "Dru". When converting to utf8, a record with "Drü" is considered a duplicate. The same happens with "Patrick" and "Påtrìçk".
Here is how to reproduce the issue:
CREATE TABLE `example` ( `name` char(20) CHARACTER SET latin1 NOT NULL,
PRIMARY KEY (`name`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO example (name) VALUES ('Drü'),('Dru'),('Patrick'),('Påtrìçk');
ALTER TABLE example convert to character set utf8 collate utf8_general_ci;
ERROR 1062 (23000): Duplicate entry 'Dru' for key 1
The reason why the strings 'Drü' and 'Dru' evaluate as the same is that in the utf8_general_ci collation, they count as "the same". The purpose of a collation for a character set is to provide a set of rules as to when strings are the same, when one sorts before the other, and so on.
If you want a different set of comparison rules, you need to choose a different collation. You can see the available collations for the utf8 character set by issuing SHOW COLLATION LIKE 'utf8%'. There are a bunch of collations intended for text that is mostly in a specific language; there is also the utf8_bin collation which compares all strings as binary strings (i.e. compares them as sequences of 0s and 1s).
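Before running such a conversion, it can help to find the values that would collide. The following Python sketch approximates an accent- and case-insensitive collation by stripping combining marks; fold and find_collisions are hypothetical helpers, and the real utf8_general_ci rules differ in detail, so treat the output as a hint rather than a guarantee.

```python
import unicodedata
from collections import defaultdict


def fold(text: str) -> str:
    """Approximate an accent- and case-insensitive comparison key."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def find_collisions(names):
    """Group names by folded key and keep only groups with several members."""
    groups = defaultdict(list)
    for name in names:
        groups[fold(name)].append(name)
    return {key: vals for key, vals in groups.items() if len(vals) > 1}


collisions = find_collisions(["Drü", "Dru", "Patrick", "Påtrìçk"])
print(collisions)
```

With the sample rows from the question, both pairs are reported, matching the duplicate key error the ALTER produced.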
UTF8_GENERAL_CI is accent insensitive.
Use UTF8_BIN or a language-specific collation.
I created a table and set the collation to utf8 in order to be able to add a unique index to a field. Now I need to do case-insensitive searches, but when I ran some queries with the COLLATE keyword I got:
mysql> select * from page where pageTitle="Something" Collate utf8_general_ci;
ERROR 1253 (42000): COLLATION 'utf8_general_ci' is not valid for
CHARACTER SET 'latin1'
mysql> select * from page where pageTitle="Something" Collate latin1_general_ci;
ERROR 1267 (HY000): Illegal mix of collations (utf8_bin,IMPLICIT) and
(latin1_general_ci,EXPLICIT) for operation '='
I am pretty new to SQL, so I was wondering if anyone could help.
A string in MySQL has a character set and a collation. utf8 is the character set, and utf8_bin is one of its collations. To compare your string literal to a utf8 column, convert it to utf8 by prefixing it with the _charset notation:
_utf8 'Something'
Now, a collation is only valid for the character set it belongs to. The case-sensitive collation for utf8 appears to be utf8_bin, which you can specify like:
_utf8 'Something' collate utf8_bin
With these conversions, the query should work:
select * from page where pageTitle = _utf8 'Something' collate utf8_bin
The _charset prefix works with string literals. To change the character set of a field, there is CONVERT ... USING. This is useful when you'd like to convert the pageTitle field to another character set, as in:
select * from page
where convert(pageTitle using latin1) collate latin1_general_cs = 'Something'
To see the character set and collation for a column named 'col' in a table called 'TAB', try:
select distinct collation(col), charset(col) from TAB
A list of all character sets and collations can be found with:
show character set
show collation
And all valid collations for utf8 can be found with:
show collation where charset = 'utf8'
Also note that forcing a collation in the query, as with "COLLATE utf8_general_ci" or "COLLATE latin1_general_ci", prevents MySQL from using the existing indexes on that column. This could become a performance bottleneck later.
Try this; it's working for me:
SELECT * FROM users WHERE UPPER(name) = UPPER('josé') COLLATE utf8_bin;
May I ask why you have a need to explicitly change the collation when you do a SELECT? Why not just collate in the way you want to retrieve the records when sorted?
The problem you are having with your searches being case sensitive is that you have a binary collation. Try instead to use the general collation. For more information about case sensitivity and collations, look here:
Case Sensitivity in String Searches