MySQL convert to UTF8 without structure change - mysql

I have a rather large database that I am trying to convert from charset and collation latin1/latin1_swedish_ci to utf8mb4/utf8mb4_unicode_ci. I am hoping to set up replication to a slave, run the conversion there, and then promote the slave when finished, so as to avoid downtime.
I noticed that when running the query...
ALTER TABLE `sometable` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
...MySQL automatically converts text to mediumtext or mediumtext to longtext, etc.
Is there a way to turn this feature off? It is nice that MySQL has this feature, but the problem is that it breaks replication because the structure of the tables on the slave is different from master.

As documented under ALTER TABLE Syntax:
For a column that has a data type of VARCHAR or one of the TEXT types, CONVERT TO CHARACTER SET will change the data type as necessary to ensure that the new column is long enough to store as many characters as the original column. For example, a TEXT column has two length bytes, which store the byte-length of values in the column, up to a maximum of 65,535. For a latin1 TEXT column, each character requires a single byte, so the column can store up to 65,535 characters. If the column is converted to utf8, each character might require up to three bytes, for a maximum possible length of 3 × 65,535 = 196,605 bytes. That length will not fit in a TEXT column's length bytes, so MySQL will convert the data type to MEDIUMTEXT, which is the smallest string type for which the length bytes can record a value of 196,605. Similarly, a VARCHAR column might be converted to MEDIUMTEXT.
To avoid data type changes of the type just described, do not use CONVERT TO CHARACTER SET. Instead, use MODIFY to change individual columns. For example:
ALTER TABLE t MODIFY latin1_text_col TEXT CHARACTER SET utf8;
ALTER TABLE t MODIFY latin1_varchar_col VARCHAR(M) CHARACTER SET utf8;
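The arithmetic in the quoted documentation can be checked with a small sketch. The Python below is only an illustration, not anything MySQL itself runs. The function name smallest_text_type and the two lookup tables are assumptions made for this example; the limits follow from the 1 to 4 length bytes of the TEXT types.

```python
# Maximum byte lengths of MySQL's TEXT types (1, 2, 3 and 4 length bytes).
TEXT_LIMITS = [
    ("TINYTEXT", 2**8 - 1),
    ("TEXT", 2**16 - 1),
    ("MEDIUMTEXT", 2**24 - 1),
    ("LONGTEXT", 2**32 - 1),
]

# Worst-case bytes per character for the character sets discussed here.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def smallest_text_type(max_chars, charset):
    """Return the smallest TEXT type whose byte limit can hold max_chars characters."""
    needed = max_chars * MAX_BYTES_PER_CHAR[charset]
    for name, limit in TEXT_LIMITS:
        if needed <= limit:
            return name
    raise ValueError("too long for any TEXT type")


print(3 * 65535)                                  # 196605
print(smallest_text_type(65535, "latin1"))        # TEXT
print(smallest_text_type(65535, "utf8"))          # MEDIUMTEXT
print(smallest_text_type(2**24 - 1, "utf8mb4"))   # LONGTEXT
```

This matches what the question observed: a full latin1 TEXT no longer fits in TEXT after conversion, and a full latin1 MEDIUMTEXT no longer fits in MEDIUMTEXT.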

(Not really an answer, but some illustrative examples.)
Case 1: Text is correctly stored as latin1 in latin1 column; use CONVERT TO
mysql> CREATE TABLE alters (
-> c VARCHAR(11) CHARACTER SET latin1 NOT NULL
-> );
mysql> INSERT INTO alters (c) VALUES ('aabc'), (UNHEX('61e06263')), (UNHEX('61e16263'));
mysql> SELECT c, HEX(c) from alters;
+-------+----------+
| c | HEX(c) |
+-------+----------+
| aabc | 61616263 |
| aàbc | 61E06263 |
| aábc | 61E16263 |
+-------+----------+
mysql> ALTER TABLE alters CONVERT TO CHARACTER SET utf8;
mysql> SELECT c, HEX(c) from alters;
+-------+------------+
| c | HEX(c) |
+-------+------------+
| aabc | 61616263 |
| aàbc | 61C3A06263 |
| aábc | 61C3A16263 |
+-------+------------+
mysql> -- Observation: text was correctly converted to utf8.
mysql> SHOW CREATE TABLE alters\G
Create Table: CREATE TABLE `alters` (
`c` varchar(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Case 2: Text is correctly stored as latin1 in latin1 column; use "Double ALTER"
mysql> CREATE TABLE alters (
-> c VARCHAR(11) CHARACTER SET latin1 NOT NULL
-> );
mysql> INSERT INTO alters (c) VALUES ('aabc'), (UNHEX('61e06263')), (UNHEX('61e16263'));
mysql> ALTER TABLE alters MODIFY c VARBINARY(11) NOT NULL;
mysql> ALTER TABLE alters MODIFY c VARCHAR(11) CHARACTER SET utf8 NOT NULL;
Query OK, 3 rows affected, 2 warnings (0.10 sec)
Records: 3 Duplicates: 0 Warnings: 2
mysql> SHOW WARNINGS;
+---------+------+----------------------------------------------------------+
| Level | Code | Message |
+---------+------+----------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xE0bc' for column 'c' at row 2 |
| Warning | 1366 | Incorrect string value: '\xE1bc' for column 'c' at row 3 |
+---------+------+----------------------------------------------------------+
mysql> SELECT c, HEX(c) from alters;
+------+----------+
| c | HEX(c) |
+------+----------+
| aabc | 61616263 |
| a | 61 |
| a | 61 |
+------+----------+
mysql> -- Observation: text was truncated ! BAD
mysql> SHOW CREATE TABLE alters\G
Create Table: CREATE TABLE `alters` (
`c` varchar(11) CHARACTER SET utf8 NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Case 3: Text was incorrectly stored as utf8 in a latin1 column; use the "Double ALTER" to fix it
mysql> CREATE TABLE alters (
-> c VARCHAR(11) CHARACTER SET latin1 NOT NULL
-> );
mysql> INSERT INTO alters (c) VALUES ('aabc'), (UNHEX('61c3a06263')), (UNHEX('61c3a16263'));
mysql> ALTER TABLE alters MODIFY c VARBINARY(11) NOT NULL;
mysql> ALTER TABLE alters MODIFY c VARCHAR(11) CHARACTER SET utf8 NOT NULL;
mysql> SELECT c, HEX(c) from alters;
+-------+------------+
| c | HEX(c) |
+-------+------------+
| aabc | 61616263 |
| aàbc | 61C3A06263 |
| aábc | 61C3A16263 |
+-------+------------+
mysql> SHOW CREATE TABLE alters\G
Create Table: CREATE TABLE `alters` (
`c` varchar(11) CHARACTER SET utf8 NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Case 4: Using ALTER ... MODIFY; note the LENGTH and CHAR_LENGTH
mysql> CREATE TABLE alters (
-> c VARCHAR(9) CHARACTER SET latin1 NOT NULL
-> );
mysql> INSERT INTO alters (c) VALUES ('aabc'), (UNHEX('61e06263')),
-> (UNHEX('61e16263')),
-> (UNHEX('61e162633536373839'));
mysql> SELECT c, HEX(c), LENGTH(c), CHAR_LENGTH(c) from alters;
+------------+--------------------+-----------+----------------+
| c | HEX(c) | LENGTH(c) | CHAR_LENGTH(c) |
+------------+--------------------+-----------+----------------+
| aabc | 61616263 | 4 | 4 |
| aàbc | 61E06263 | 4 | 4 |
| aábc | 61E16263 | 4 | 4 |
| aábc56789 | 61E162633536373839 | 9 | 9 |
+------------+--------------------+-----------+----------------+
mysql> ALTER TABLE alters MODIFY c VARCHAR(9) CHARACTER SET utf8 NOT NULL;
mysql> SELECT c, HEX(c), LENGTH(c), CHAR_LENGTH(c) from alters;
+------------+----------------------+-----------+----------------+
| c | HEX(c) | LENGTH(c) | CHAR_LENGTH(c) |
+------------+----------------------+-----------+----------------+
| aabc | 61616263 | 4 | 4 |
| aàbc | 61C3A06263 | 5 | 4 |
| aábc | 61C3A16263 | 5 | 4 |
| aábc56789 | 61C3A162633536373839 | 10 | 9 |
+------------+----------------------+-----------+----------------+
mysql> SHOW CREATE TABLE alters\G
Create Table: CREATE TABLE `alters` (
`c` varchar(9) CHARACTER SET utf8 NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
Notes:
No warnings were raised, except in the one case where I ran SHOW WARNINGS.
Default table CHARSET was not changed, but that is not a problem.
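The byte-level difference between Cases 1 to 3 can be reproduced outside MySQL. The following Python sketch is only an analogy: convert_to and double_alter are hypothetical names, Python's "latin-1" codec stands in for MySQL's latin1 (they agree on the bytes used here), and strict decoding raises an error where MySQL truncated the value with warning 1366.

```python
def convert_to(stored: bytes, old: str, new: str) -> bytes:
    """Like CONVERT TO / MODIFY: transcode the characters, so the bytes change."""
    return stored.decode(old).encode(new)


def double_alter(stored: bytes, new: str) -> bytes:
    """Like the VARBINARY round trip: keep the bytes, only relabel them.

    Raises UnicodeDecodeError if the bytes are not valid in the new charset.
    """
    stored.decode(new)  # validity check only; the bytes are returned unchanged
    return stored


latin1_bytes = bytes.fromhex("61e06263")    # 'aàbc' correctly stored as latin1
utf8_bytes = bytes.fromhex("61c3a06263")    # 'aàbc' as utf8 bytes in a latin1 column

# Case 1: genuine latin1 data, transcoded to utf8.
print(convert_to(latin1_bytes, "latin-1", "utf-8").hex())   # 61c3a06263

# Case 2: genuine latin1 data relabelled as utf8; E0 62 63 is not valid UTF-8.
try:
    double_alter(latin1_bytes, "utf-8")
except UnicodeDecodeError as exc:
    print("invalid as utf8:", exc.reason)

# Case 3: utf8 bytes mislabelled as latin1; relabelling leaves them intact.
print(double_alter(utf8_bytes, "utf-8").hex())               # 61c3a06263
```

The point of the sketch is that the right tool depends on what the bytes really are, which is why checking HEX(c) first matters.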

Related

Cyrillic encoding in MySQL

In my.ini I've changed properties from latin1 to cp1251 (then restarted the server)
[mysql]
default-character-set=cp1251
............................
[mysqld]
default-character-set=cp1251
I create database
CREATE DATABASE library DEFAULT CHARSET=cp1251;
Make request to check out the encoding:
SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| cp1251 | cp1251_general_ci |
+--------------------------+----------------------+
show variables like "char%";
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| character_set_client | cp1251 |
| character_set_connection | cp1251 |
| character_set_database | cp1251 |
| character_set_filesystem | binary |
| character_set_results | cp1251 |
| character_set_server | cp1251 |
| character_set_system | utf8 |
| character_sets_dir | C:\Program Files\MySQL\MySQL Server 5.1\share\charsets\ |
+--------------------------+---------------------------------------------------------+
Create a table
CREATE TABLE genres (g_id INT, g_name VARCHAR(150)) ENGINE=InnoDB DEFAULT CHARSET=cp1251;
As I try to insert cyrillic data, the Command Line window gets stuck:
mysql> INSERT INTO genres (g_id, g_name) VALUES (1, 'Поэзия');
'>
'>
'>
'>
Latin strings get inserted ok:
mysql> INSERT INTO genres (g_id, g_name) VALUES (1, 'Poetry');
Query OK, 1 row affected (0.06 sec)
Yesterday, after the whole day of trying and testing, I got it working well. Created some more tables and inserted some Cyrillic strings. But next morning and the whole day long I can't get it working again. The previously inserted data wouldn't display. After firing
set names utf8
the Cyrillic words appeared, but numeric columns didn't display correctly. What have I missed?
It's not just one change.
character_set_client/connection/results, but not the other two that you changed, specify the encoding of the client.
The column definitions in the database tables need to have a character set that can handle Cyrillic. One way is to do this to each table:
ALTER TABLE t CONVERT TO CHARACTER SET cp1251;
Have you already stored Cyrillic in latin1 columns?
Check by doing SELECT HEX(col) .... You may need the 2-step Alter as discussed in http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
It would be best to switch to utf8mb4; that way you could handle all character sets throughout the world.
See also Trouble with UTF-8 characters; what I see is not what I stored
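To interpret what SELECT HEX(col) returns, it helps to know what the same word looks like in each candidate encoding. This is a minimal Python sketch, not part of MySQL; hex_in is a hypothetical helper and the word is the one from the question.

```python
def hex_in(text: str, charset: str) -> str:
    """Hex of text encoded in charset, comparable to the output of HEX(col)."""
    return text.encode(charset).hex().upper()


word = "Поэзия"
for charset in ("cp1251", "cp866", "utf-8"):
    print(charset, hex_in(word, charset))

# latin1 has no Cyrillic letters at all, so a latin1 column cannot hold this
# text correctly; at best it holds bytes that belong to some other encoding.
try:
    word.encode("latin-1")
except UnicodeEncodeError:
    print("latin1 cannot represent", word)
```

Six bytes for six letters points to a single-byte encoding such as cp1251 or cp866; twelve bytes starting with D0 or D1 points to utf8.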
I have found a workaround. After starting cmd
C:\Users\nikol>chcp 866
Active code page: 866
Then after starting mysql
mysql> set names cp866;
Query OK, 0 rows affected (0.00 sec)
But when I select the data, there are multiple trailing spaces
mysql> SELECT * FROM genres;
+------+------------------+
| g_id | g_name |
+------+------------------+
| 1 | Поэзия |
| 2 | Программирование |
| 3 | Психология |
| 4 | Наука |
| 5 | Классика |
| 6 | Фантастика |
+------+------------------+
6 rows in set (0.00 sec)
I guess I'll have to TRIM

How to change encoding on fly in SELECT statement?

I have a table with a column, which has cp1251_general_ci collation. I don't want to change column collation, but I want to get data in utf8 encoding.
Is there a way to select the data so that it looks just like data with the utf8_general_ci collation?
I.e. I need something like this
SELECT CONVERT_TO_UTF8(weirdColumn) FROM weirdTable
Here's a demo table using the cp1251 encoding. I'll insert some Cyrillic characters into it.
mysql> CREATE TABLE weirdTable (weirdColumn text) ENGINE=InnoDB DEFAULT CHARSET=cp1251;
mysql> insert into weirdTable values ('ЂЃЉЌ');
mysql> select * from weirdTable;
+-------------+
| weirdColumn |
+-------------+
| ЂЃЉЌ |
+-------------+
Use MySQL's CONVERT() function to force the characters to a different encoding:
mysql> select convert(weirdColumn using utf8) as weirdColumnUtf8 from weirdTable;
+-----------------+
| weirdColumnUtf8 |
+-----------------+
| ЂЃЉЌ |
+-----------------+
Here's proof that the result has been converted to utf8. I create a table using metadata from the query result:
mysql> create table w2
as select convert(weirdColumn using utf8) as weirdColumnUtf8 from weirdTable;
Query OK, 1 row affected (0.07 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> show create table w2\G
*************************** 1. row ***************************
Table: w2
Create Table: CREATE TABLE `w2` (
`weirdColumnUtf8` longtext CHARACTER SET utf8
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)
mysql> select * from w2;
+-----------------+
| weirdColumnUtf8 |
+-----------------+
| ЂЃЉЌ |
+-----------------+
On my MySQL instance, utf8mb4 is the default character encoding. That's okay; it's a superset of utf8, and the utf8 encoding is enough to store these characters. However, if you use utf8, there is generally no reason not to use utf8mb4 instead.
If you change the character encoding, you cannot keep the cp1251 collation. Collations are specific to encodings. But you can use one of the collations associated with utf8 or utf8mb4. You can see the available collations for a given character encoding:
mysql> SHOW COLLATION WHERE Charset = 'utf8';
+--------------------------+---------+-----+---------+----------+---------+---------------+
| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |
+--------------------------+---------+-----+---------+----------+---------+---------------+
...
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 | PAD SPACE |
| utf8_general_mysql500_ci | utf8 | 223 | | Yes | 1 | PAD SPACE |
...

Alter table default character set modifies the rows in MySQL 5.6

I am using MySQL 5.6 and I want to modify the default encoding of one table (from latin1 to utf8) WITHOUT modifying the existing columns and rows.
Based on documentation I have tried the following command:
ALTER TABLE mytable DEFAULT CHARACTER SET utf8;
It modified the default character set encoding of my table and did NOT modify the collation of the columns, as expected, BUT I was really surprised to see:
Query OK, 32141 rows affected (6.31 sec)
Records: 32141 Duplicates: 0 Warnings: 0
Except "32141 rows affected", the results are as expected as you can see below:
MySQL> select count(*) from mytable;
+----------+
| count(*) |
+----------+
| 32141 |
+----------+
1 row in set (0.01 sec)
MySQL> show table status like 'mytable';
+-----------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+-----------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+
| mytable | InnoDB | 10 | Compact | 16723 | 20798 | 347815936 | 0 | 21561344 | 15728640 | NULL | NULL | NULL | NULL | utf8_general_ci | NULL | partitioned | |
+-----------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+
MySQL> show create table mytable;
CREATE TABLE `mytable` (
`ID` varchar(255) NOT NULL,
`COL1` double DEFAULT NULL,
`COL2` longtext CHARACTER SET latin1,
`COL3` datetime DEFAULT NULL,
`COL4` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`COL5` int(11) DEFAULT NULL,
`COL6` datetime DEFAULT NULL,
`COL7` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`COL8` datetime(3) NOT NULL,
`COL9` int(11) NOT NULL DEFAULT '-1',
`COL10` int(11) DEFAULT '0',
`COL11` double DEFAULT '0',
PRIMARY KEY (`ID`,`COL9`),
KEY `idx1` (`COL7`,`COL3`,`COL6`),
KEY `idx2` (`COL1`,`COL4`,`COL3`,`COL6`),
KEY `idx3` (`ID`,`COL3`,`COL6`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (`COL9`)
(PARTITION p0 VALUES LESS THAN (1) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (2) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (3) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (4) ENGINE = InnoDB,
PARTITION p4 VALUES LESS THAN (5) ENGINE = InnoDB,
PARTITION p5 VALUES LESS THAN (6) ENGINE = InnoDB,
PARTITION p6 VALUES LESS THAN (7) ENGINE = InnoDB,
PARTITION p7 VALUES LESS THAN (8) ENGINE = InnoDB,
PARTITION p8 VALUES LESS THAN (9) ENGINE = InnoDB,
PARTITION p9 VALUES LESS THAN (10) ENGINE = InnoDB,
PARTITION p10 VALUES LESS THAN (11) ENGINE = InnoDB,
PARTITION p11 VALUES LESS THAN (100) ENGINE = InnoDB,
PARTITION p12 VALUES LESS THAN (101) ENGINE = InnoDB,
PARTITION p13 VALUES LESS THAN (102) ENGINE = InnoDB,
PARTITION p14 VALUES LESS THAN (103) ENGINE = InnoDB,
PARTITION p15 VALUES LESS THAN (104) ENGINE = InnoDB,
PARTITION p16 VALUES LESS THAN (105) ENGINE = InnoDB,
PARTITION p17 VALUES LESS THAN (106) ENGINE = InnoDB,
PARTITION p18 VALUES LESS THAN (107) ENGINE = InnoDB,
PARTITION p19 VALUES LESS THAN (108) ENGINE = InnoDB,
PARTITION p20 VALUES LESS THAN (109) ENGINE = InnoDB,
PARTITION p21 VALUES LESS THAN (110) ENGINE = InnoDB,
PARTITION p22 VALUES LESS THAN (111) ENGINE = InnoDB,
PARTITION p23 VALUES LESS THAN (200) ENGINE = InnoDB,
PARTITION p24 VALUES LESS THAN (201) ENGINE = InnoDB,
PARTITION p25 VALUES LESS THAN (202) ENGINE = InnoDB,
PARTITION p26 VALUES LESS THAN (203) ENGINE = InnoDB,
PARTITION p27 VALUES LESS THAN (204) ENGINE = InnoDB,
PARTITION p28 VALUES LESS THAN (205) ENGINE = InnoDB,
PARTITION p29 VALUES LESS THAN (206) ENGINE = InnoDB,
PARTITION p30 VALUES LESS THAN (207) ENGINE = InnoDB,
PARTITION p31 VALUES LESS THAN (208) ENGINE = InnoDB,
PARTITION p32 VALUES LESS THAN (209) ENGINE = InnoDB,
PARTITION p33 VALUES LESS THAN (210) ENGINE = InnoDB,
PARTITION p34 VALUES LESS THAN (211) ENGINE = InnoDB,
PARTITION p35 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
MySQL> show full columns from mytable;
+--------------------------+--------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+--------------------------+--------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| ID | varchar(255) | latin1_swedish_ci | NO | PRI | NULL | | select,insert,update,references | |
| COL1 | double | NULL | YES | MUL | NULL | | select,insert,update,references | |
| COL2 | longtext | latin1_swedish_ci | YES | | NULL | | select,insert,update,references | |
| COL3 | datetime | NULL | YES | | NULL | | select,insert,update,references | |
| COL4 | varchar(255) | latin1_swedish_ci | YES | | NULL | | select,insert,update,references | |
| COL5 | int(11) | NULL | YES | | NULL | | select,insert,update,references | |
| COL6 | datetime | NULL | YES | | NULL | | select,insert,update,references | |
| COL7 | varchar(255) | latin1_swedish_ci | YES | MUL | NULL | | select,insert,update,references | |
| COL8 | datetime(3) | NULL | NO | | NULL | | select,insert,update,references | |
| COL9 | int(11) | NULL | NO | PRI | -1 | | select,insert,update,references | |
| COL10 | int(11) | NULL | YES | | 0 | | select,insert,update,references | |
| COL11 | double | NULL | YES | | 0 | | select,insert,update,references | |
+--------------------------+--------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
My connection parameters are as follows:
MySQL> show variables where variable_name like '%char%' or variable_name like '%collation%';
+--------------------------+--------------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_general_ci |
| collation_database | utf8mb4_general_ci |
| collation_server | utf8mb4_general_ci |
+--------------------------+--------------------------------------------------+
Note that:
data was created from a java application
at the time of data creation, the connection parameters were set to utf8
there are no FK linked with this table
When I try to reproduce with some newly created tables, it seems that the rows are not modified. See below "0 rows affected":
MySQL> select count(*) from mytesttable;
+----------+
| count(*) |
+----------+
| 3 |
+----------+
3 row in set (0.10 sec)
MySQL> alter table mytesttable character set utf8;
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0
I tried changing my connection parameters back to latin1 during the data creation, but it didn't change the result: still "0 rows affected".
So my questions:
Is my understanding of the command correct? (that it shouldn't modify the rows)
What could explain that the rows are affected in the 1st case?
EDIT
I have just found out that the problem doesn't happen if I remove the partition.
With partition I get "XXX affected rows"
Without partition I get "0 affected rows"
Is it expected?
EDIT 2 with SUMMARY
Initially:
The table was using latin1 as default encoding (same for the columns)
The connection was declared as utf8
What works:
Before any ALTER TABLE command, characters like "é" seem to be latin1 encoded (E9)
Running command ALTER TABLE mytable CHARACTER SET utf8mb4; does not modify the data (hex command still shows E9)
The column is still declared latin1.
Running command ALTER TABLE mytable MODIFY COL2 LONGTEXT CHARACTER SET utf8mb4 changes the column to utf8mb4 (C3A9)
So far so good.
Remaining questions:
How to make sure that all data present in the table is latin1? I have tried SELECT COL2 FROM mytable WHERE LENGTH(COL2) != CHAR_LENGTH(COL2) LIMIT 1 and I got 0 results. Is it enough?
Why does the command ALTER TABLE mytable CHARACTER SET utf8mb4; show "32141 rows affected" when it seems that the data is not modified? (This happens when the table has partitions and an index on the same column.)
Following the previous point, is it safe (needed?) to also change the default encoding of the table? Or shall I just stick to the modification of the columns?
Thanks a lot for your help
You had a mess, and the ALTER made the mess worse.
To start with, the table columns were declared latin1 and the connection declared that the client was using latin1 (via SET NAMES latin1). That would have been fine if é had actually been hex E9 in the client. But the data in the client was UTF-8. So é, which is the two bytes C3A9, was sent to the database and stored as 2 latin1 characters. The damage was not noticeable, because it was reversed when you SELECTed.
The later step messed things up by treating each of those bytes as latin1 and converting them to utf8, hence "double" encoding.
See "Mojibake" and "double encoding" in Trouble with UTF-8 characters; what I see is not what I stored . If you want to try to recover the data, see the appropriate case in http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
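The double-encoding sequence described above can be traced byte by byte. This is a sketch in Python rather than MySQL, under the assumption that Python's "latin-1" codec behaves like MySQL's latin1 for these particular bytes.

```python
original = "é"

# What the client actually sent: UTF-8 bytes.
sent = original.encode("utf-8")
print(sent.hex().upper())                    # C3A9

# The server was told the bytes were latin1, so it saw two characters.
seen_by_server = sent.decode("latin-1")
print(len(seen_by_server))                   # 2

# Transcoding those two "latin1" characters to utf8 gives double encoding.
double_encoded = seen_by_server.encode("utf-8")
print(double_encoded.hex().upper())          # C383C2A9

# Undoing it: reverse the bogus latin1 -> utf8 step, then read as UTF-8.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired == original)                  # True
```

Seeing C383C2A9 in HEX(col) where C3A9 or E9 was expected is the signature of this problem.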
Well, apparently ALTER TABLE mytable DEFAULT CHARACTER SET utf8; was not just changing the default, but was copying the table over, and in doing so, introducing the double encoding.
I have been chasing MySQL charset problems for over a decade. This is a new wrinkle that I had not yet observed.
I'm pretty sure that character_set_system is not involved in your problem. (But I could be wrong!)
Wrong SET NAMES
Test case:
CREATE TABLE mytest ( MYDATA longtext ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
SET NAMES latin1;
INSERT INTO mytest VALUES ( "é" );
SELECT MYDATA, HEX(MYDATA) FROM mytest;
Running that test case:
mysql> SET NAMES latin1;
mysql> SHOW CREATE TABLE mytest\G
*************************** 1. row ***************************
Table: mytest
Create Table: CREATE TABLE `mytest` (
`MYDATA` longtext
) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> INSERT INTO mytest VALUES ( "é" );
mysql> SELECT MYDATA, HEX(MYDATA), LENGTH(MYDATA),
CHAR_LENGTH(MYDATA) FROM mytest;
+--------+-------------+----------------+---------------------+
| MYDATA | HEX(MYDATA) | LENGTH(MYDATA) | CHAR_LENGTH(MYDATA) |
+--------+-------------+----------------+---------------------+
| é | C3A9 | 2 | 2 |
+--------+-------------+----------------+---------------------+
The character looks fine. But the HEX looks like UTF-8, not latin1. And the CHAR_LENGTH is "wrong".
The case is: CHARACTER SET latin1, but utf8 bytes are in it.
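The reasoning applied to the HEX output here can be written down as a rough heuristic. The Python sketch below is an assumption-laden illustration, not a MySQL feature: classify is a hypothetical helper that takes a hex string such as the one HEX(MYDATA) returns, and it can be fooled by genuine latin1 text that happens to form valid UTF-8 sequences.

```python
def classify(hex_value: str) -> str:
    """Guess what the bytes behind a HEX(col) value hold.

    Heuristic only: returns 'ascii', 'utf8' or 'not utf8'.
    """
    raw = bytes.fromhex(hex_value)
    if all(b < 0x80 for b in raw):
        return "ascii"          # identical bytes in latin1 and utf8
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf8"       # consistent with genuine latin1 data
    return "utf8"               # multi-byte sequences that decode cleanly


print(classify("61616263"))   # ascii
print(classify("E9"))         # not utf8
print(classify("C3A9"))       # utf8
```

A latin1 column whose values classify as 'utf8' is the situation shown in this test case.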
To fix the charset while leaving the bytes alone, convert the column in two steps:
ALTER TABLE tbl MODIFY COLUMN MYDATA LONGBLOB;
ALTER TABLE tbl MODIFY COLUMN MYDATA LONGTEXT CHARACTER SET utf8mb4;
This is the "2-step ALTER", as discussed in http://mysql.rjweb.org/doc.php/charcoll . (Be sure to keep all the other attributes that the column originally had, such as the data type and NOT NULL.)
Partition Test case:
DROP TABLE IF EXISTS ptest;
CREATE TABLE ptest (
nn INT NOT NULL,
ee LONGTEXT
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY RANGE (nn)
(PARTITION p0 VALUES LESS THAN (1),
PARTITION p1 VALUES LESS THAN MAXVALUE);
SET NAMES latin1;
INSERT INTO ptest (nn, ee) VALUES ( 0, "é" ), ( 1, "ü" );
SELECT nn, ee, HEX(ee), LENGTH(ee), CHAR_LENGTH(ee) FROM ptest;
ALTER TABLE ptest
DEFAULT CHARSET utf8;
SELECT nn, ee, HEX(ee), LENGTH(ee), CHAR_LENGTH(ee) FROM ptest;
SELECT @@version;
SHOW CREATE TABLE ptest\G
Partition results:
mysql> DROP TABLE IF EXISTS ptest;
Query OK, 0 rows affected (0.02 sec)
mysql> CREATE TABLE ptest (
-> nn INT NOT NULL,
-> ee LONGTEXT
-> ) ENGINE=InnoDB DEFAULT CHARSET=latin1
-> PARTITION BY RANGE (nn)
-> (PARTITION p0 VALUES LESS THAN (1),
-> PARTITION p1 VALUES LESS THAN MAXVALUE);
Query OK, 0 rows affected (0.03 sec)
mysql> SET NAMES latin1;
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO ptest (nn, ee) VALUES ( 0, "é" ), ( 1, "ü" );
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> SELECT nn, ee, HEX(ee), LENGTH(ee), CHAR_LENGTH(ee) FROM ptest;
+----+------+---------+------------+-----------------+
| nn | ee | HEX(ee) | LENGTH(ee) | CHAR_LENGTH(ee) |
+----+------+---------+------------+-----------------+
| 0 | é | C3A9 | 2 | 2 |
| 1 | ü | C3BC | 2 | 2 |
+----+------+---------+------------+-----------------+
2 rows in set (0.00 sec)
mysql> ALTER TABLE ptest
-> DEFAULT CHARSET utf8;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> SELECT nn, ee, HEX(ee), LENGTH(ee), CHAR_LENGTH(ee) FROM ptest;
+----+------+---------+------------+-----------------+
| nn | ee | HEX(ee) | LENGTH(ee) | CHAR_LENGTH(ee) |
+----+------+---------+------------+-----------------+
| 0 | é | C3A9 | 2 | 2 |
| 1 | ü | C3BC | 2 | 2 |
+----+------+---------+------------+-----------------+
2 rows in set (0.00 sec)
mysql> SELECT @@version;
+-----------------+
| @@version |
+-----------------+
| 5.6.22-71.0-log |
+-----------------+
1 row in set (0.00 sec)
mysql> SHOW CREATE TABLE ptest\G
*************************** 1. row ***************************
Table: ptest
Create Table: CREATE TABLE `ptest` (
`nn` int(11) NOT NULL,
`ee` longtext CHARACTER SET latin1
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (nn)
(PARTITION p0 VALUES LESS THAN (1) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
1 row in set (0.00 sec)
Hmmm... I don't see the ALTER problem. What version are you using? Do you see the problem with this test case?

Can't insert some special characters into mysql

The value is cut off after the character 💀.
Why is this happening?
create table tmp2(t1 varchar(100));
insert into tmp2 values('before💀after');
mysql> select * from tmp2;
+--------+
| t1 |
+--------+
| before |
+--------+
1 row in set (0.01 sec)
I ran the following commands, which returned some useful information:
mysql> SHOW FULL COLUMNS FROM tmp2;
+-------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| t1 | varchar(100) | utf8_general_ci | YES | | NULL | | select,insert,update,references | |
+-------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.00 sec)
and this,
mysql> SELECT character_set_name FROM information_schema.`COLUMNS` WHERE table_schema = "test" AND table_name = "tmp2" AND column_name = "t1";
+--------------------+
| character_set_name |
+--------------------+
| utf8 |
+--------------------+
1 row in set (0.00 sec)
I'm testing this on the Ubuntu/MySQL command line.
I found the solution here
I learnt that some characters are not included in utf8
There is a good article here
I needed to change the column from utf8 to utf8mb4, and it worked
alter table tmp2 modify t1 varchar(100) character set utf8mb4;
SET NAMES utf8mb4;
insert tmp2 values('before💀after');
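Why this character in particular fails can be seen from its encoded length: MySQL's utf8 stores at most three bytes per character, while 💀 needs four. The Python sketch below is an illustration only, and needs_utf8mb4 is a hypothetical helper name.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


value = "before💀after"
print(len("💀".encode("utf-8")))     # 4
print(needs_utf8mb4(value))          # True
print(needs_utf8mb4("aàbc"))         # False
```

A check like this could be run on application data before deciding whether existing utf8 columns need to be altered.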

What collation does MySQL use by default for ORDER BY?

These queries both give the result I expect:
SELECT sex
FROM ponies
ORDER BY sex COLLATE latin1_swedish_ci ASC
SELECT sex
FROM ponies
ORDER BY CONVERT(sex USING utf8) COLLATE utf8_general_ci ASC
| f |
| f |
| m |
| m |
+---+
But this query gives a different result:
SELECT sex FROM ponies ORDER BY sex ASC
| m |
| m |
| f |
| f |
+---+
Here's the configuration:
SHOW VARIABLES LIKE 'collation\_%'
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
The table collation is latin1_swedish_ci.
MySQL server is 5.5.16.
Table Collations
Collation defaults are stored on a table-by-table basis. There is a server-set default, but that is applied to the table at the time it is created.
To find the collation for a specific table, run this query:
SHOW TABLE STATUS LIKE 'ponies'\G
You should see output like this:
*************************** 1. row ***************************
Name: ponies
Engine: MyISAM
Version: 10
Row_format: Fixed
Rows: 8
Avg_row_length: 20
Data_length: 160
Max_data_length: 5629499534213119
Index_length: 1024
Data_free: 0
Auto_increment: NULL
Create_time: 2012-02-27 10:16:25
Update_time: 2012-02-27 10:17:40
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)
And you can see the Collation setting in that result.
Column collations
You can also override collation settings on particular columns within a table. A create table statement like this would create a latin1_swedish_ci table, with a utf8_polish_ci column:
CREATE TABLE ponies (
sex CHAR(1) COLLATE utf8_polish_ci
) CHARACTER SET latin1 COLLATE latin1_swedish_ci;
The best way to view the results of this is like this:
SHOW FULL COLUMNS FROM ponies;
Output:
+-------+---------+----------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------+---------+----------------+------+-----+---------+-------+---------------------------------+---------+
| sex | char(1) | utf8_polish_ci | YES | | NULL | | select,insert,update,references | |
+-------+---------+----------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.00 sec)
The documentation says it uses a case insensitive character comparison by default. I don't see why you are not getting that result though.
The documentation also suggests using the binary qualifier for case sensitive comparison. I wonder if that would affect your result?:
SELECT sex FROM ponies ORDER BY BINARY sex ASC
This behaviour can be observed when sex is an ENUM, in which case it is usually sorted by the numerical position in the ENUM definition. Only when a collation is explicitly given is it sorted in alphabetical order.
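The two orderings can be imitated with a short Python sketch. The definition order ENUM('m', 'f') is an assumption, since the question does not show the column definition; it is simply the order that would produce the output reported there.

```python
# Hypothetical definition order, as in: sex ENUM('m', 'f'). This is an assumption.
ENUM_VALUES = ["m", "f"]
rows = ["f", "m", "f", "m"]

# Plain ORDER BY on an ENUM: sort by 1-based position in the definition.
by_index = sorted(rows, key=lambda v: ENUM_VALUES.index(v) + 1)
print(by_index)      # ['m', 'm', 'f', 'f']

# ORDER BY with an explicit collation or CONVERT(): sort as strings.
by_string = sorted(rows)
print(by_string)     # ['f', 'f', 'm', 'm']
```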