MySQL thinks "e" is the same as "é"? - mysql

I have a table with the following columns:
ID
name
Both are indexed, both must be unique.
One post has name "ussé".
I try to insert a new post with the name "usse" and get the following error:
[MySQL][ODBC 5.3(w) Driver][mysqld-5.6.16-log]Duplicate entry 'usse' for key 'name'.
What the...? MySQL can't tell those two strings apart?
Is there some way to fix this?
All my text fields have "utf8 - default collation" selected. Does this make a difference? If I need to change it to something else, is there a way to do it quickly for all tex fields in all databases? There are so many that I would like to avoid going through them manually if I can.
My connection string:
DRIVER={MySQL ODBC 5.3 Unicode Driver}; SERVER=localhost;DATABASE=example; UID=example; PASSWORD=example; OPTION=3;

As you can see here :
How to convert an entire MySQL database characterset and collation to UTF-8?
here's a list of commands :
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

Related

Incorrect string value error for unconventional characters

So I'm using a wrapper to fetch user data from instagram. I want to select the display names for users, and store them in a MYSQL database. I'm having issues inserting some of the display names, dealing with, specifically, an incorrect string value error:
Now, I've dealt with this issue before with accent marks, letters with umlauts, etc. The solution would be to change the collation to utf8_general_ci under the utf8 charset.
So as you can see, some of the display names I'm pulling have very unique characters that I'm not sure mySQL can recognize at all, i.e.:
ᛘ𝕰𝖆𝖗𝖙𝖍 𝕾𝖕𝖎𝖗𝖎𝖙𝖚𝖘𐂂®
So I receive:
Error Code: 1366. Incorrect string value: '\xF0\x9D\x99\x87\xF0\x9D...' for column 'dummy' at row 1
Here's my sql code
CREATE TABLE test_table(
id INT AUTO_INCREMENT,
dummy VARCHAR(255),
PRIMARY KEY(id)
);
INSERT INTO test_table (dummy)
VALUES ('ᛘ𝕰𝖆𝖗𝖙𝖍 𝕾𝖕𝖎𝖗𝖎𝖙𝖚𝖘𐂂®');
Any thoughts on a proper charset + collation pair that can handle characters like this? Not sure where to look for a solution, so I come here to see if anyone dealt with this.
P.S., I've tried utf8mb4 charset with utf8mb4_unicode_ci and utf8mb4_bin collations as well.
The characters you show require the column use the utf8mb4 encoding. Currently it seems your column is defined with the utf8mb3 encoding.
The way MySQL uses the name "utf8" is complicated, as described in https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb3.html:
Note
Historically, MySQL has used utf8 as an alias for utf8mb3;
beginning with MySQL 8.0.28, utf8mb3 is used exclusively in the output
of SHOW statements and in Information Schema tables when this
character set is meant.
At some point in the future utf8 is expected to become a reference to
utf8mb4. To avoid ambiguity about the meaning of utf8, consider
specifying utf8mb4 explicitly for character set references instead of
utf8.
You should also be aware that the utf8mb3 character set is deprecated
and you should expect it to be removed in a future MySQL release.
Please use utf8mb4 instead.
You may have tried to change your table in the following way:
ALTER TABLE test_table CHARSET=utf8mb4;
But that only changes the default character set, to be used if you add new columns to the table subsequently. It does not change any of the current columns. To do that:
ALTER TABLE test_table MODIFY COLUMN dummy VARCHAR(255) CHARACTER SET utf8mb4;
Or to convert all string or TEXT columns in a table in one statement:
ALTER TABLE test_table CONVERT TO CHARACTER SET utf8mb4;
That would be 𝙇 - L MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL L
It requires the utf8mb4 Character set to even represent it. "F0" is the clue; it is the first of 4 bytes in a 4-byte UTF-8 character. It cannot be represented in MySQL's "utf8". Collation is (mostly) irrelevant.
Most, not all, of the characters in ᛘ𝕰𝖆𝖗𝖙𝖍 𝕾𝖕𝖎𝖗𝖎𝖙𝖚𝖘𐂂® also need utf8mb4. They are "MATHEMATICAL BOLD FRAKTUR" letters.
(Meanwhile, Bill gives you more of an answer.)

Changing character set of MySQL tables

I have a MySQL table CHINESE with DEFAULT CHARSET=latin1 and it has a column NAME with CHARACTER SET latin1. I have huge amount to data stored in this table. Around a million rows. And, I want to execute the following commands on my database:
ALTER DATABASE <DATABASE_NAME> DEFAULT CHARACTER SET utf8
ALTER TABLE CHINESE DEFAULT CHARACTER SET utf8
ALTER TABLE CHINESE MODIFY NAME VARCHAR(30) CHARACTER SET utf8
Considering the fact that I have huge amount of data stored in this database. Should I run these commands on my database? Will these commands lock the database in any way?
I am using Java to query and insert values in database. Will appending ?useUnicode=yes&characterEncoding=UTF-8 in URI string help me?
It will take a long time,
I think is it is the best that you export a .sql out and don't the trans-code there replace latin1 to utf8 , then import back in a temp table. Finally swap the name of they.

Sense of command collate in create table sql

I understand function of command collate (a little). It is truth that I did not test if it is possible to have tables with various collation (or even various charset) inside one DB.
But I found that (at least in phpmyadmin) when I create any DB, I set its charset and collation - and if I miss this command in CREATE TABLE ..., then automatically will be set collation set in creation of DB.
So, my question is: What is sense of presence of command collate in sql of CREATE TABLE ... if it can be missing there - and is recommended to have collate in CREATE TABLE ... or is it irrelevant?
In SQL Server if you don't specify the COLLATE it is defaulted to what ever DB is set to. Thus there is no danger in not specifying.
In MySQL behavior is the same:
The table character set and collation are used as default values for
column definitions if the column character set and collation are not
specified in individual column definitions. MySQL Reference
Collate is only used when you want to specify to non-default value. If all you are using is English character set than you have nothing to worry about it. If you store data from multiple languages than you have specify specific collation to ensure what characters are stored correctly.

SQL latin1 to utf-8 converting

I have a very big database with data stored on latin1 charsets.
On database with phpmyadmin if i use lines like this:
alter table TABLE_NAME modify FIELD_NAME blob;
alter database DATABASE_NAME charset=utf8;
alter table TABLE_NAME modify FIELD_NAME varchar(255) character set utf8;
All goes on. Text was converting.
But i have a lot of fields, tables. How to make it on all tables, fields? Is there a good convertation php script?
Also i uses on iconv, but no changes from latin1 to utf-8
create new database in utf8
restore your database structure
run "ALTER TABLE the_latin_one CONVERT TO CHARACTER SET utf8;" for all of your tables
create php script that create sql.txt like this
file_put_contents(PATH2."sql.txt","insert into ". $TABLE[0]. " values(". (rtrim($q_col,",") ).");\n",FILE_APPEND);
really you must create all tables insert query. using "show FULL tables where Table_type<>'VIEW'" and "set character set 'latin1'" query may help you for do this. if you have any problem write a comment for me ;)
in command line mysql -u xxx -p yourDB< sql.txt
i do this and solve my problem after 3 years :))

phpmyadmin MySQL data stored as ascii values in certain table

my host has just updated phpmyadmin to version 4.0.3. I don't know if it is related to the following problem.
I have a table 'users' which stores user data for the site and all data is now being stored as numbers. Where I had a username of 'rich' it is now '72696368' which is it's ascii code.
Any ideas why this might have happened? I have a lot of tables and have checked them all, it is only the users table that has been modified. It is not critical as I can still log in and accept new users etc but I would like to know why this is happening.
Thanks a lot
EDIT The collation is utf8_general_c
try adding this snippet to the script you are running in the line in which you create the schema
DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci
for example :
CREATE SCHEMA IF NOT EXISTS `my_schema` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;