I am using MySQL 5.6 and I want to modify the default encoding of one table (from latin1 to utf8) WITHOUT modifying the existing columns and rows.
Based on documentation I have tried the following command:
ALTER TABLE mytable DEFAULT CHARACTER SET utf8;
It modified the default character set encoding of my table and did NOT modify the collation of the columns, as expected, BUT I was really surprised to see:
Query OK, 32141 rows affected (6.31 sec)
Records: 32141 Duplicates: 0 Warnings: 0
Apart from "32141 rows affected", the results are as expected, as you can see below:
MySQL> select count(*) from mytable;
+----------+
| count(*) |
+----------+
| 32141 |
+----------+
1 row in set (0.01 sec)
MySQL> show table status like 'mytable';
+-----------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+-----------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+
| mytable | InnoDB | 10 | Compact | 16723 | 20798 | 347815936 | 0 | 21561344 | 15728640 | NULL | NULL | NULL | NULL | utf8_general_ci | NULL | partitioned | |
+-----------------------+--------+---------+------------+-------+----------------+-------------+-----------------+--------------+-----------+----------------+-------------+-------------+------------+-----------------+----------+----------------+---------+
MySQL> show create table mytable;
CREATE TABLE `mytable` (
`ID` varchar(255) NOT NULL,
`COL1` double DEFAULT NULL,
`COL2` longtext CHARACTER SET latin1,
`COL3` datetime DEFAULT NULL,
`COL4` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`COL5` int(11) DEFAULT NULL,
`COL6` datetime DEFAULT NULL,
`COL7` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`COL8` datetime(3) NOT NULL,
`COL9` int(11) NOT NULL DEFAULT '-1',
`COL10` int(11) DEFAULT '0',
`COL11` double DEFAULT '0',
PRIMARY KEY (`ID`,`COL9`),
KEY `idx1` (`COL7`,`COL3`,`COL6`),
KEY `idx2` (`COL1`,`COL4`,`COL3`,`COL6`),
KEY `idx3` (`ID`,`COL3`,`COL6`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (`COL9`)
(PARTITION p0 VALUES LESS THAN (1) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (2) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (3) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (4) ENGINE = InnoDB,
PARTITION p4 VALUES LESS THAN (5) ENGINE = InnoDB,
PARTITION p5 VALUES LESS THAN (6) ENGINE = InnoDB,
PARTITION p6 VALUES LESS THAN (7) ENGINE = InnoDB,
PARTITION p7 VALUES LESS THAN (8) ENGINE = InnoDB,
PARTITION p8 VALUES LESS THAN (9) ENGINE = InnoDB,
PARTITION p9 VALUES LESS THAN (10) ENGINE = InnoDB,
PARTITION p10 VALUES LESS THAN (11) ENGINE = InnoDB,
PARTITION p11 VALUES LESS THAN (100) ENGINE = InnoDB,
PARTITION p12 VALUES LESS THAN (101) ENGINE = InnoDB,
PARTITION p13 VALUES LESS THAN (102) ENGINE = InnoDB,
PARTITION p14 VALUES LESS THAN (103) ENGINE = InnoDB,
PARTITION p15 VALUES LESS THAN (104) ENGINE = InnoDB,
PARTITION p16 VALUES LESS THAN (105) ENGINE = InnoDB,
PARTITION p17 VALUES LESS THAN (106) ENGINE = InnoDB,
PARTITION p18 VALUES LESS THAN (107) ENGINE = InnoDB,
PARTITION p19 VALUES LESS THAN (108) ENGINE = InnoDB,
PARTITION p20 VALUES LESS THAN (109) ENGINE = InnoDB,
PARTITION p21 VALUES LESS THAN (110) ENGINE = InnoDB,
PARTITION p22 VALUES LESS THAN (111) ENGINE = InnoDB,
PARTITION p23 VALUES LESS THAN (200) ENGINE = InnoDB,
PARTITION p24 VALUES LESS THAN (201) ENGINE = InnoDB,
PARTITION p25 VALUES LESS THAN (202) ENGINE = InnoDB,
PARTITION p26 VALUES LESS THAN (203) ENGINE = InnoDB,
PARTITION p27 VALUES LESS THAN (204) ENGINE = InnoDB,
PARTITION p28 VALUES LESS THAN (205) ENGINE = InnoDB,
PARTITION p29 VALUES LESS THAN (206) ENGINE = InnoDB,
PARTITION p30 VALUES LESS THAN (207) ENGINE = InnoDB,
PARTITION p31 VALUES LESS THAN (208) ENGINE = InnoDB,
PARTITION p32 VALUES LESS THAN (209) ENGINE = InnoDB,
PARTITION p33 VALUES LESS THAN (210) ENGINE = InnoDB,
PARTITION p34 VALUES LESS THAN (211) ENGINE = InnoDB,
PARTITION p35 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
MySQL> show full columns from mytable;
+--------------------------+--------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+--------------------------+--------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| ID | varchar(255) | latin1_swedish_ci | NO | PRI | NULL | | select,insert,update,references | |
| COL1 | double | NULL | YES | MUL | NULL | | select,insert,update,references | |
| COL2 | longtext | latin1_swedish_ci | YES | | NULL | | select,insert,update,references | |
| COL3 | datetime | NULL | YES | | NULL | | select,insert,update,references | |
| COL4 | varchar(255) | latin1_swedish_ci | YES | | NULL | | select,insert,update,references | |
| COL5 | int(11) | NULL | YES | | NULL | | select,insert,update,references | |
| COL6 | datetime | NULL | YES | | NULL | | select,insert,update,references | |
| COL7 | varchar(255) | latin1_swedish_ci | YES | MUL | NULL | | select,insert,update,references | |
| COL8 | datetime(3) | NULL | NO | | NULL | | select,insert,update,references | |
| COL9 | int(11) | NULL | NO | PRI | -1 | | select,insert,update,references | |
| COL10 | int(11) | NULL | YES | | 0 | | select,insert,update,references | |
| COL11 | double | NULL | YES | | 0 | | select,insert,update,references | |
+--------------------------+--------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
My connection parameters are as follows:
MySQL> show variables where variable_name like '%char%' or variable_name like '%collation%';
+--------------------------+--------------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_general_ci |
| collation_database | utf8mb4_general_ci |
| collation_server | utf8mb4_general_ci |
+--------------------------+--------------------------------------------------+
Note that:
data was created from a java application
at the time of data creation, the connection parameters were set to utf8
there are no FK linked with this table
When I try to reproduce with some newly created tables, it seems that the rows are not modified. See below "0 rows affected":
MySQL> select count(*) from mytesttable;
+----------+
| count(*) |
+----------+
| 3 |
+----------+
3 row in set (0.10 sec)
MySQL> alter table mytesttable character set utf8;
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0
I tried changing my connection parameters back to latin1 during the data creation, but it didn't change the result: still "0 rows affected".
So my questions:
Is my understanding of the command correct? (that it shouldn't modify the rows)
What could explain that the rows are affected in the 1st case?
EDIT
I have just found out that the problem doesn't happen if I remove the partition.
With partition I get "XXX affected rows"
Without partition I get "0 affected rows"
Is it expected?
EDIT 2 with SUMMARY
Initially:
The table was using latin1 as default encoding (same for the columns)
The connection was declared as utf8
What works:
Before any ALTER TABLE command, characters like "é" seem to be latin1 encoded (E9)
Running command ALTER TABLE mytable CHARACTER SET utf8mb4; does not modify the data (hex command still shows E9)
The column is still declared latin1.
Running command ALTER TABLE mytable MODIFY COL2 LONGTEXT CHARACTER SET utf8mb4 changes the column to utf8mb4 (C3A9)
So far so good.
Remaining questions:
How to make sure that all data present in the table is latin1? I have tried SELECT COL2 FROM mytable WHERE LENGTH(COL2) != CHAR_LENGTH(COL2) LIMIT 1 and I got 0 results. Is it enough?
Why does the command ALTER TABLE mytable CHARACTER SET utf8mb4; show "32141 rows affected" when it seems that the data is not modified? (It happens when the table has partitions and an index on the same column.)
Following the previous point, is it safe (needed?) to also change the default encoding of the table? Or shall I just stick to the modification of the columns?
Thanks a lot for your help
You had a mess, and the ALTER made the mess worse.
To start with, the table columns were declared latin1 and the connection declared that the client was using latin1 (via SET NAMES latin1). That would have been fine if é had actually been hex E9 in the client. But the data in the client was UTF-8. So é, which is the two bytes C3A9, was sent to the database as 2 latin1 characters. The damage was not noticeable, because it was reversed when you SELECTed.
The later step messed things up by treating each of those bytes as latin1 and converting them to utf8, hence "double" encoding.
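The double-encoding step can be reproduced without touching any table. The following is a minimal sketch, assuming a server where utf8mb4 is available (MySQL 5.5 or later); the column aliases are made up for illustration. It takes the two UTF-8 bytes of é, treats them as latin1 characters, and converts them to utf8mb4, which is what a charset-converting step does to mis-declared data:

```sql
-- The bytes C3 A9 are the UTF-8 encoding of é.
-- Reading them as latin1 yields two characters (Ã and ©);
-- converting those two characters to utf8mb4 yields four bytes.
SELECT
    HEX(CONVERT(CONVERT(UNHEX('C3A9') USING latin1) USING utf8mb4)) AS double_encoded,       -- C383C2A9
    HEX(CONVERT(CONVERT(UNHEX('E9')   USING latin1) USING utf8mb4)) AS correctly_converted;  -- C3A9
```

If HEX() of a stored é shows C383C2A9, the value has been double-encoded. E9 in a latin1 column, or C3A9 in a utf8/utf8mb4 column, is what correctly stored data looks like.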
See "Mojibake" and "double encoding" in Trouble with UTF-8 characters; what I see is not what I stored. If you want to try to recover the data, see the appropriate case in http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
Well, apparently ALTER TABLE mytable DEFAULT CHARACTER SET utf8; was not just changing the default, but was copying the table over, and in doing so, introducing the double encoding.
I have been chasing MySQL charset problems for over a decade. This is a new wrinkle that I had not yet observed.
I'm pretty sure that character_set_system is not involved in your problem. (But I could be wrong!)
Wrong SET NAMES
Test case:
CREATE TABLE mytest ( MYDATA longtext ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
SET NAMES latin1;
INSERT INTO mytest VALUES ( "é" );
SELECT MYDATA, HEX(MYDATA) FROM mytest;
Running that test case:
mysql> SET NAMES latin1;
mysql> SHOW CREATE TABLE mytest\G
*************************** 1. row ***************************
Table: mytest
Create Table: CREATE TABLE `mytest` (
`MYDATA` longtext
) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> INSERT INTO mytest VALUES ( "é" );
mysql> SELECT MYDATA, HEX(MYDATA), LENGTH(MYDATA),
CHAR_LENGTH(MYDATA) FROM mytest;
+--------+-------------+----------------+---------------------+
| MYDATA | HEX(MYDATA) | LENGTH(MYDATA) | CHAR_LENGTH(MYDATA) |
+--------+-------------+----------------+---------------------+
| é | C3A9 | 2 | 2 |
+--------+-------------+----------------+---------------------+
The character looks fine. But the HEX looks like UTF-8, not latin1. And the CHAR_LENGTH is "wrong".
The case is: CHARACTER SET latin1, but utf8 bytes are in it.
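As for checking whether a latin1 column already holds UTF-8 bytes: comparing LENGTH() with CHAR_LENGTH() cannot tell, because in a latin1 column every byte counts as one character. A rough heuristic, offered only as a sketch, is to search the hex dump for a byte pair that forms a 2-byte UTF-8 sequence (lead byte C2..DF followed by a continuation byte 80..BF). It uses the mytest table from the test case above; the alias suspect_rows is made up, 3-byte sequences are not covered, and genuine latin1 text such as "Ã©" would be a false positive:

```sql
-- '^(..)*' keeps the match aligned on byte boundaries of the hex string.
-- (C[2-9A-F]|D[0-9A-F]) is a UTF-8 lead byte C2..DF;
-- [89AB][0-9A-F]        is a continuation byte 80..BF.
SELECT COUNT(*) AS suspect_rows
FROM mytest
WHERE HEX(MYDATA) REGEXP '^(..)*(C[2-9A-F]|D[0-9A-F])[89AB][0-9A-F]';
```

A non-zero count suggests that at least some rows contain UTF-8 bytes in a column declared latin1. On large LONGTEXT values this scan can be slow, so it is probably best tried on a sample first.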
To fix the charset declaration while leaving the bytes alone, convert the column in two steps:
ALTER TABLE tbl MODIFY COLUMN MYDATA LONGBLOB;
ALTER TABLE tbl MODIFY COLUMN MYDATA LONGTEXT CHARACTER SET utf8mb4;
(Be sure to keep all the attributes that the column originally had, such as VARCHAR, NOT NULL, etc.)
This is the "2-step ALTER", as discussed in http://mysql.rjweb.org/doc.php/charcoll .
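One way to check that the 2-step ALTER left the bytes alone is to compare HEX() before and after. This is a sketch using the mytest table from the test case above (try it on a copy first); the values in the comments assume the table holds only the single é row inserted there:

```sql
-- Before: the two UTF-8 bytes are counted as two latin1 characters.
SELECT HEX(MYDATA), CHAR_LENGTH(MYDATA) FROM mytest;   -- C3A9, 2

-- Step 1: drop the charset label; step 2: attach the correct one.
ALTER TABLE mytest MODIFY COLUMN MYDATA LONGBLOB;
ALTER TABLE mytest MODIFY COLUMN MYDATA LONGTEXT CHARACTER SET utf8mb4;

-- After: same bytes, now counted as one utf8mb4 character.
SELECT HEX(MYDATA), CHAR_LENGTH(MYDATA) FROM mytest;   -- C3A9, 1
```

If the hex changes between the two SELECTs, the bytes were converted rather than relabelled, and the wrong fix was applied for this case.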
Partition Test case:
DROP TABLE IF EXISTS ptest;
CREATE TABLE ptest (
nn INT NOT NULL,
ee LONGTEXT
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY RANGE (nn)
(PARTITION p0 VALUES LESS THAN (1),
PARTITION p1 VALUES LESS THAN MAXVALUE);
SET NAMES latin1;
INSERT INTO ptest (nn, ee) VALUES ( 0, "é" ), ( 1, "ü" );
SELECT nn, ee, HEX(ee), LENGTH(ee), CHAR_LENGTH(ee) FROM ptest;
ALTER TABLE ptest
DEFAULT CHARSET utf8;
SELECT nn, ee, HEX(ee), LENGTH(ee), CHAR_LENGTH(ee) FROM ptest;
SELECT @@version;
SHOW CREATE TABLE ptest\G
Partition results:
mysql> DROP TABLE IF EXISTS ptest;
Query OK, 0 rows affected (0.02 sec)
mysql> CREATE TABLE ptest (
-> nn INT NOT NULL,
-> ee LONGTEXT
-> ) ENGINE=InnoDB DEFAULT CHARSET=latin1
-> PARTITION BY RANGE (nn)
-> (PARTITION p0 VALUES LESS THAN (1),
-> PARTITION p1 VALUES LESS THAN MAXVALUE);
Query OK, 0 rows affected (0.03 sec)
mysql> SET NAMES latin1;
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO ptest (nn, ee) VALUES ( 0, "é" ), ( 1, "ü" );
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> SELECT nn, ee, HEX(ee), LENGTH(ee), CHAR_LENGTH(ee) FROM ptest;
+----+------+---------+------------+-----------------+
| nn | ee | HEX(ee) | LENGTH(ee) | CHAR_LENGTH(ee) |
+----+------+---------+------------+-----------------+
| 0 | é | C3A9 | 2 | 2 |
| 1 | ü | C3BC | 2 | 2 |
+----+------+---------+------------+-----------------+
2 rows in set (0.00 sec)
mysql> ALTER TABLE ptest
-> DEFAULT CHARSET utf8;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> SELECT nn, ee, HEX(ee), LENGTH(ee), CHAR_LENGTH(ee) FROM ptest;
+----+------+---------+------------+-----------------+
| nn | ee | HEX(ee) | LENGTH(ee) | CHAR_LENGTH(ee) |
+----+------+---------+------------+-----------------+
| 0 | é | C3A9 | 2 | 2 |
| 1 | ü | C3BC | 2 | 2 |
+----+------+---------+------------+-----------------+
2 rows in set (0.00 sec)
mysql> SELECT @@version;
+-----------------+
| @@version       |
+-----------------+
| 5.6.22-71.0-log |
+-----------------+
1 row in set (0.00 sec)
mysql> SHOW CREATE TABLE ptest\G
*************************** 1. row ***************************
Table: ptest
Create Table: CREATE TABLE `ptest` (
`nn` int(11) NOT NULL,
`ee` longtext CHARACTER SET latin1
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (nn)
(PARTITION p0 VALUES LESS THAN (1) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
1 row in set (0.00 sec)
Hmmm... I don't see the ALTER problem. What version are you using? Do you see the problem with this test case?
I initially selected part of the stock_hfq table, and then filtered it with ts_code.
I used the intermediate table stock_hfq_temp to filter the stock_hfq data. The query returns about 2M rows. It takes 2min 1s without EXISTS and 1min 5s with EXISTS. However, once I add the time to write the temporary table stock_hfq_temp and the time to create its ts_code index, the total time difference is only 4s.
Is there any other way to speed up my query?
ts_code is unique in the stock_hfq_temp table.
The relevant statements and results are as follows:
select * from stock_hfq t where t.trade_date>'20110302';
Wall time: 2min 12s
select ts_code from stock_hfq_temp b
Wall time: 8 ms
select * from stock_hfq t where t.trade_date>'20110302' and exists (select 1 from stock_hfq_temp b where b.ts_code=t.ts_code);
Wall time: 1min 5s
The analysis of the database is as follows:
mysql> select count(1) from stock_hfq;
+----------+
| count(1) |
+----------+
| 11546271 |
+----------+
1 row in set (3 min 31.64 sec)
mysql> select count(1) from (select distinct ts_code from stock_hfq b) t;
+----------+
| count(1) |
+----------+
| 4480 |
+----------+
1 row in set (1.26 sec)
mysql> select count(1) from stock_hfq_temp;
+----------+
| count(1) |
+----------+
| 1502 |
+----------+
1 row in set (0.18 sec)
Both tables are indexed.
mysql> show index from stock_hfq;
+-----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| stock_hfq | 1 | ts_code | 1 | ts_code | A | 16782 | NULL | NULL | YES | BTREE | | | YES | NULL |
| stock_hfq | 1 | trade_date | 1 | trade_date | A | 94773 | NULL | NULL | YES | BTREE | | | YES | NULL |
+-----------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
2 rows in set (0.00 sec)
mysql> show index from stock_hfq_temp;
+----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| stock_hfq_temp | 1 | ts_code | 1 | ts_code | A | 1502 | NULL | NULL | YES | BTREE | | | YES | NULL |
+----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
1 row in set (0.00 sec)
explain:
mysql> explain select * from stock_hfq t where exists (select 1 from stock_hfq_temp b where b.ts_code =t.ts_code);
+----+-------------+-------+------------+-------+---------------+---------+---------+----------------------+------+----------+-------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+----------------------+------+----------+-------------------------------------+
| 1 | SIMPLE | b | NULL | index | ts_code | ts_code | 83 | NULL | 1502 | 100.00 | Using where; Using index; LooseScan |
| 1 | SIMPLE | t | NULL | ref | ts_code | ts_code | 83 | quant_test.b.ts_code | 681 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+---------+---------+----------------------+------+----------+-------------------------------------+
2 rows in set, 2 warnings (0.00 sec)
mysql> show warnings;
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1276 | Field or reference 'quant_test.t.ts_code' of SELECT #2 was resolved in SELECT #1 |
| Note | 1003 | /* select#1 */ select `quant_test`.`t`.`ts_code` AS `ts_code`,`quant_test`.`t`.`trade_date` AS `trade_date`,`quant_test`.`t`.`open` AS `open`,`quant_test`.`t`.`high` AS `high`,`quant_test`.`t`.`low` AS `low`,`quant_test`.`t`.`close` AS `close`,`quant_test`.`t`.`pre_close` AS `pre_close`,`quant_test`.`t`.`change` AS `change`,`quant_test`.`t`.`pct_chg` AS `pct_chg`,`quant_test`.`t`.`vol` AS `vol`,`quant_test`.`t`.`amount` AS `amount`,`quant_test`.`t`.`adj_factor` AS `adj_factor` from `quant_test`.`stock_hfq` `t` semi join (`quant_test`.`stock_hfq_temp` `b`) where (`quant_test`.`t`.`ts_code` = `quant_test`.`b`.`ts_code`) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
CREATE TABLE statements for all relevant tables:
/*
Navicat Premium Data Transfer
Source Server : localhost_3306
Source Server Type : MySQL
Source Server Version : 80021
Source Host : localhost:3306
Source Schema : quant_test
Target Server Type : MySQL
Target Server Version : 80021
File Encoding : 65001
Date: 12/06/2021 13:55:28
*/
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for stock_hfq
-- ----------------------------
DROP TABLE IF EXISTS `stock_hfq`;
CREATE TABLE `stock_hfq` (
`ts_code` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`trade_date` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`open` double DEFAULT NULL,
`high` double DEFAULT NULL,
`low` double DEFAULT NULL,
`close` double DEFAULT NULL,
`pre_close` double DEFAULT NULL,
`change` double DEFAULT NULL,
`pct_chg` double DEFAULT NULL,
`vol` double DEFAULT NULL,
`amount` double DEFAULT NULL,
`adj_factor` double DEFAULT NULL,
INDEX `ts_code`(`ts_code`) USING BTREE,
INDEX `trade_date`(`trade_date`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci ROW_FORMAT = Dynamic;
SET FOREIGN_KEY_CHECKS = 1;
/*
Navicat Premium Data Transfer
Source Server : localhost_3306
Source Server Type : MySQL
Source Server Version : 80021
Source Host : localhost:3306
Source Schema : quant_test
Target Server Type : MySQL
Target Server Version : 80021
File Encoding : 65001
Date: 12/06/2021 13:55:38
*/
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for stock_hfq_temp
-- ----------------------------
DROP TABLE IF EXISTS `stock_hfq_temp`;
CREATE TABLE `stock_hfq_temp` (
`ts_code` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`name` text CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
`trade_date` text CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
`amount` double DEFAULT NULL,
`vol` double DEFAULT NULL,
`close` double DEFAULT NULL,
`h1` double DEFAULT NULL,
`mid` double DEFAULT NULL,
`l1` double DEFAULT NULL,
INDEX `ts_code`(`ts_code`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci ROW_FORMAT = Dynamic;
SET FOREIGN_KEY_CHECKS = 1;
I am trying to build an index in mysql to support a keyset pagination query. My query looks like this:
SELECT * FROM invoice
WHERE company_id = 'someguid'
AND id > 'lastguidfromlastpage'
ORDER BY id
LIMIT 10
Common knowledge on this says an index on company_id would contain the PRIMARY KEY of the table (id). Because of this, I would expect to be able to use rows directly from the index without the query needing to sort the results first. However, my explain plan shows a filesort and an index merge:
mysql> explain SELECT *
-> FROM invoice
-> WHERE company_id = '37687714-2e9d-4daa-aee6-f7d56962f903'
-> AND id > '525ae038-0cc3-4f9a-85e6-6f36d43fae40'
-> ORDER BY id
-> LIMIT 10;
+----+-------------+---------+------------+-------------+-----------------------------+-----------------------------+---------+------+------+----------+---------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------------+-----------------------------+-----------------------------+---------+------+------+----------+---------------------------------------------------------------------------+
| 1 | SIMPLE | invoice | NULL | index_merge | PRIMARY,invoice__company_id | invoice__company_id,PRIMARY | 76,38 | NULL | 48 | 100.00 | Using intersect(invoice__company_id,PRIMARY); Using where; Using filesort |
+----+-------------+---------+------------+-------------+-----------------------------+-----------------------------+---------+------+------+----------+---------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
If I explicitly add the id to the index then I get the explain plan I would expect:
mysql> explain SELECT *
-> FROM invoice
-> WHERE company_id = '37687714-2e9d-4daa-aee6-f7d56962f903'
-> AND id > '525ae038-0cc3-4f9a-85e6-6f36d43fae40'
-> ORDER BY id
-> LIMIT 10;
+----+-------------+---------+------------+-------+--------------------------------+--------------------------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+--------------------------------+--------------------------------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | invoice | NULL | range | PRIMARY,invoice__company_id_id | invoice__company_id_id,PRIMARY | 76 | NULL | 98 | 100.00 | Using index condition |
+----+-------------+---------+------------+-------+--------------------------------+--------------------------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)
SHOW CREATE TABLE:
CREATE TABLE `invoice` (
`id` varchar(36) NOT NULL,
`company_id` varchar(36) NOT NULL DEFAULT '0',
`invoice_number` varchar(36) NOT NULL DEFAULT '0',
`identifier` varchar(255) NOT NULL,
`created_on` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`created_by` varchar(36) DEFAULT NULL,
`data_source` varchar(36) NOT NULL,
`type` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `invoice__company_id_id` (`company_id`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
select @@optimizer_switch;
use_index_extensions=on
MySQL version:
version: 5.7.26-29-57-log
innodb_version: 5.7.26-29
version_comment: Percona XtraDB Cluster (GPL), Release rel29, Revision 03540a3, WSREP version 31.37, wsrep_31.37
SHOW VARIABLES LIKE 'char%';
character_set_client utf8
character_set_connection utf8
character_set_database latin1
character_set_filesystem binary
character_set_results utf8
character_set_server latin1
character_set_system utf8
character_sets_dir /usr/share/mysql/charsets/
There are a few sources explaining that the company_id index on its own should be sufficient for this:
https://stackoverflow.com/a/30152513/64023
https://dba.stackexchange.com/a/136029/166838
I've been unable to find official documentation about exactly what to expect. Is this related to the datatypes for the id? Is the common knowledge about mysql+innodb behavior incorrect?
I have encountered this problem before. Here is my analysis of it.
It occurs in MySQL 5.7 and 8.0, but apparently not in older versions and not in MariaDB.
The "solution" I prefer is to change the indexes thus:
INDEX(company_id) -- DROP this
INDEX(company_id, id) -- ADD this
Although the 2-column index is theoretically identical to the one-column index for InnoDB (assuming id is the PK), the Optimizer seems to ignore this fact in some situations.
Also, I like to explicitly add the PK when I see a need. This signals future readers of the schema (including myself) that some query benefits from the PK being appended.
I have yet to find a case where "index merge intersect" is faster than an equivalent composite index.
I dislike ever using index "hints" for fear that the data distribution will change in the future and my "hint" will make things worse.
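The index change above could be applied roughly as follows. This is a sketch rather than a tested migration: the index names invoice__company_id and invoice__company_id_id are taken from the EXPLAIN output in the question, and on a large table the rebuild may take a while:

```sql
-- Replace the single-column index with the composite one in one statement.
ALTER TABLE invoice
    DROP INDEX invoice__company_id,
    ADD INDEX invoice__company_id_id (company_id, id);

-- Re-check the plan for the pagination query from the question.
EXPLAIN
SELECT * FROM invoice
WHERE company_id = 'someguid'
  AND id > 'lastguidfromlastpage'
ORDER BY id
LIMIT 10;
```

Going by the second EXPLAIN in the question, the plan should then show type range on invoice__company_id_id, with no index merge and no filesort.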
This won't work.
For keyset pagination to take effect, you need to have an auto-increment integer as your primary key. Right now you are using VARCHAR and storing UIDs.
Your query won't select the "next" UID that is "larger than" the last one (... AND id > '525ae038-0cc3-4f9a-85e6-6f36d43fae40' ... ).
When you change the primary ID to be a number, this will work.
If you still have issues with indexes, you can try forcing mysql to use your index:
SELECT * FROM invoice USE INDEX (invoice__company_id_id)
WHERE company_id = 'someguid'
AND id > 12345
ORDER BY id
LIMIT 10
We have a MySQL table with utf8mb4 strings:
CREATE TABLE `test` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
`code` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `test_code_unique` (`code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
When inserting special characters, there appears to be a wrong conversion:
mysql> insert into `test` (`code`, `name`) values ('munster', 'Munster');
mysql> insert into `test` (`code`, `name`) values ('münster', 'Münster');
ERROR 1062 (23000): Duplicate entry 'münster' for key 'test_code_unique'
mysql> SELECT * FROM test WHERE code='münster';
+----+---------+---------+
| id | name | code |
+----+---------+---------+
| 1 | Munster | munster |
+----+---------+---------+
1 row in set (0.00 sec)
mysql> SELECT * FROM test WHERE code='munster';
+----+---------+---------+
| id | name | code |
+----+---------+---------+
| 1 | Munster | munster |
+----+---------+---------+
1 row in set (0.00 sec)
If the unique key is removed, the second insert works, but a search returns 2 rows even though the query is different:
mysql> drop table test;
CREATE TABLE `test` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
`code` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
mysql> insert into `test` (`code`, `name`) values ('munster', 'Munster');
mysql> insert into `test` (`code`, `name`) values ('münster', 'Münster');
mysql> SELECT * FROM test WHERE code='münster';
+----+----------+----------+
| id | name | code |
+----+----------+----------+
| 1 | Munster | munster |
| 2 | Münster | münster |
+----+----------+----------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM test WHERE code='munster';
+----+----------+----------+
| id | name | code |
+----+----------+----------+
| 1 | Munster | munster |
| 2 | Münster | münster |
+----+----------+----------+
2 rows in set (0.00 sec)
This has been tested on both MySQL 5.7 and MariaDB 10.2, and they both give the same results.
What could be going wrong?
The reason for this seemingly mysterious problem is that you're using utf8mb4_unicode_ci collation, and that collation intentionally ignores differences in accented characters vs non-accented characters. See: https://dev.mysql.com/doc/refman/5.7/en/charset-general.html
To resolve this, change the collation on the code column to utf8mb4_bin, which distinguishes accented characters from non-accented characters, and also upper case from lower case.
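The difference between the two collations can be seen directly, and the column changed, along these lines. This is a sketch that assumes the connection charset is utf8mb4 and uses the test table from the question; the aliases equal_unicode_ci and equal_bin are made up:

```sql
SET NAMES utf8mb4;

-- Under utf8mb4_unicode_ci the two strings compare equal; under utf8mb4_bin they do not.
SELECT 'münster' = 'munster' COLLATE utf8mb4_unicode_ci AS equal_unicode_ci,  -- 1
       'münster' = 'munster' COLLATE utf8mb4_bin        AS equal_bin;         -- 0

-- Switch only the code column; repeat the other attributes (length, NOT NULL) unchanged.
ALTER TABLE test
    MODIFY code VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;
```

After the ALTER, the unique key treats 'munster' and 'münster' as different values, and also 'Munster' and 'munster', since a binary collation compares code points.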
I have a rather large database that I am trying to convert from charset and collation latin1/latin1_swedish_ci to utf8mb4/utf8mb4_unicode_ci. I am hoping to set up replication to a slave, run the conversion, and then promote the slave when finished, so as to avoid downtime.
I noticed that when running the query...
ALTER TABLE `sometable` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
...MySQL automatically converts text to mediumtext or mediumtext to longtext, etc.
Is there a way to turn this feature off? It is nice that MySQL has this feature, but the problem is that it breaks replication because the structure of the tables on the slave is different from master.
As documented under ALTER TABLE Syntax:
For a column that has a data type of VARCHAR or one of the TEXT types, CONVERT TO CHARACTER SET will change the data type as necessary to ensure that the new column is long enough to store as many characters as the original column. For example, a TEXT column has two length bytes, which store the byte-length of values in the column, up to a maximum of 65,535. For a latin1 TEXT column, each character requires a single byte, so the column can store up to 65,535 characters. If the column is converted to utf8, each character might require up to three bytes, for a maximum possible length of 3 × 65,535 = 196,605 bytes. That length will not fit in a TEXT column's length bytes, so MySQL will convert the data type to MEDIUMTEXT, which is the smallest string type for which the length bytes can record a value of 196,605. Similarly, a VARCHAR column might be converted to MEDIUMTEXT.
To avoid data type changes of the type just described, do not use CONVERT TO CHARACTER SET. Instead, use MODIFY to change individual columns. For example:
ALTER TABLE t MODIFY latin1_text_col TEXT CHARACTER SET utf8;
ALTER TABLE t MODIFY latin1_varchar_col VARCHAR(M) CHARACTER SET utf8;
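Applied to the utf8mb4 target from the question, the same approach might look like the sketch below. The column name `body` is hypothetical, and the statement assumes a nullable TEXT column with no other attributes; MODIFY replaces the whole definition, so any NOT NULL, DEFAULT or COMMENT on the real column would have to be restated. Because the type stays TEXT, a converted value longer than 65,535 bytes no longer fits, so the first query counts such rows beforehand.

```sql
-- Hypothetical column `body` (TEXT, latin1) in `sometable`; adjust names and attributes.
-- Step 1: count rows whose converted value would exceed the TEXT limit of 65,535 bytes.
SELECT COUNT(*) AS too_long
FROM `sometable`
WHERE LENGTH(CONVERT(`body` USING utf8mb4)) > 65535;

-- Step 2: if the count is 0, convert the column while keeping the TEXT type.
ALTER TABLE `sometable`
  MODIFY `body` TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

If the count is not zero, those rows would be rejected or truncated depending on the SQL mode, so they need separate handling before the column is converted.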
(Not really an answer, but some illustrative examples.)
Case 1: Text is correctly stored as latin1 in latin1 column; use CONVERT TO
mysql> CREATE TABLE alters (
-> c VARCHAR(11) CHARACTER SET latin1 NOT NULL
-> );
mysql> INSERT INTO alters (c) VALUES ('aabc'), (UNHEX('61e06263')), (UNHEX('61e16263'));
mysql> SELECT c, HEX(c) from alters;
+-------+----------+
| c | HEX(c) |
+-------+----------+
| aabc | 61616263 |
| aàbc | 61E06263 |
| aábc | 61E16263 |
+-------+----------+
mysql> ALTER TABLE alters CONVERT TO CHARACTER SET utf8;
mysql> SELECT c, HEX(c) from alters;
+-------+------------+
| c | HEX(c) |
+-------+------------+
| aabc | 61616263 |
| aàbc | 61C3A06263 |
| aábc | 61C3A16263 |
+-------+------------+
mysql> -- Observation: text was correctly converted to utf8.
mysql> SHOW CREATE TABLE alters\G
Create Table: CREATE TABLE `alters` (
`c` varchar(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Case 2: Text is correctly stored as latin1 in latin1 column; use "Double ALTER"
mysql> CREATE TABLE alters (
-> c VARCHAR(11) CHARACTER SET latin1 NOT NULL
-> );
mysql> INSERT INTO alters (c) VALUES ('aabc'), (UNHEX('61e06263')), (UNHEX('61e16263'));
mysql> ALTER TABLE alters MODIFY c VARBINARY(11) NOT NULL;
mysql> ALTER TABLE alters MODIFY c VARCHAR(11) CHARACTER SET utf8 NOT NULL;
Query OK, 3 rows affected, 2 warnings (0.10 sec)
Records: 3 Duplicates: 0 Warnings: 2
mysql> SHOW WARNINGS;
+---------+------+----------------------------------------------------------+
| Level | Code | Message |
+---------+------+----------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xE0bc' for column 'c' at row 2 |
| Warning | 1366 | Incorrect string value: '\xE1bc' for column 'c' at row 3 |
+---------+------+----------------------------------------------------------+
mysql> SELECT c, HEX(c) from alters;
+------+----------+
| c | HEX(c) |
+------+----------+
| aabc | 61616263 |
| a | 61 |
| a | 61 |
+------+----------+
mysql> -- Observation: text was truncated ! BAD
mysql> SHOW CREATE TABLE alters\G
Create Table: CREATE TABLE `alters` (
`c` varchar(11) CHARACTER SET utf8 NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Case 3: Text was incorrectly stored as utf8 in a latin1 column; use the "Double ALTER" to fix it
mysql> CREATE TABLE alters (
-> c VARCHAR(11) CHARACTER SET latin1 NOT NULL
-> );
mysql> INSERT INTO alters (c) VALUES ('aabc'), (UNHEX('61c3a06263')), (UNHEX('61c3a16263'));
mysql> ALTER TABLE alters MODIFY c VARBINARY(11) NOT NULL;
mysql> ALTER TABLE alters MODIFY c VARCHAR(11) CHARACTER SET utf8 NOT NULL;
mysql> SELECT c, HEX(c) from alters;
+-------+------------+
| c | HEX(c) |
+-------+------------+
| aabc | 61616263 |
| aàbc | 61C3A06263 |
| aábc | 61C3A16263 |
+-------+------------+
mysql> SHOW CREATE TABLE alters\G
Create Table: CREATE TABLE `alters` (
`c` varchar(11) CHARACTER SET utf8 NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
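Cases 2 and 3 differ only in the bytes that are already stored, so it helps to inspect them before choosing a method. The query below is a heuristic sketch, not a definitive test: run against the latin1 column before any ALTER, it flags values containing the byte C3 followed by a byte in the range 80 to BF, which is how UTF-8 encodes many accented Latin letters. Genuine latin1 text can contain the same byte pair, so treat a match as a hint and look at the HEX output.

```sql
-- Run on the latin1 column BEFORE converting.
-- '^(..)*' keeps the match aligned to whole bytes in the hex string.
SELECT c, HEX(c)
FROM alters
WHERE HEX(c) REGEXP '^(..)*C3[89AB]';
```

With the Case 3 data this returns the two rows stored as 61C3A06263 and 61C3A16263, suggesting the "Double ALTER". With the Case 1 and Case 2 data (61E06263, 61E16263) it returns nothing, suggesting CONVERT TO or a plain MODIFY.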
Case 4: Using ALTER ... MODIFY; note the LENGTH and CHAR_LENGTH
mysql> CREATE TABLE alters (
-> c VARCHAR(9) CHARACTER SET latin1 NOT NULL
-> );
mysql> INSERT INTO alters (c) VALUES ('aabc'), (UNHEX('61e06263')),
-> (UNHEX('61e16263')),
-> (UNHEX('61e162633536373839'));
mysql> SELECT c, HEX(c), LENGTH(c), CHAR_LENGTH(c) from alters;
+------------+--------------------+-----------+----------------+
| c | HEX(c) | LENGTH(c) | CHAR_LENGTH(c) |
+------------+--------------------+-----------+----------------+
| aabc | 61616263 | 4 | 4 |
| aàbc | 61E06263 | 4 | 4 |
| aábc | 61E16263 | 4 | 4 |
| aábc56789 | 61E162633536373839 | 9 | 9 |
+------------+--------------------+-----------+----------------+
mysql> ALTER TABLE alters MODIFY c VARCHAR(9) CHARACTER SET utf8 NOT NULL;
mysql> SELECT c, HEX(c), LENGTH(c), CHAR_LENGTH(c) from alters;
+------------+----------------------+-----------+----------------+
| c | HEX(c) | LENGTH(c) | CHAR_LENGTH(c) |
+------------+----------------------+-----------+----------------+
| aabc | 61616263 | 4 | 4 |
| aàbc | 61C3A06263 | 5 | 4 |
| aábc | 61C3A16263 | 5 | 4 |
| aábc56789 | 61C3A162633536373839 | 10 | 9 |
+------------+----------------------+-----------+----------------+
mysql> SHOW CREATE TABLE alters\G
Create Table: CREATE TABLE `alters` (
`c` varchar(9) CHARACTER SET utf8 NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
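The point of Case 4 is that VARCHAR(9) limits characters, not bytes, which is why the 10-byte value still fits after conversion. A small follow-up query, sketched against the same `alters` table, lists the rows where the two measures diverge:

```sql
-- After the conversion to utf8, multi-byte rows have LENGTH > CHAR_LENGTH.
SELECT c, LENGTH(c) AS bytes, CHAR_LENGTH(c) AS chars
FROM alters
WHERE LENGTH(c) <> CHAR_LENGTH(c);
```

On the converted table above this picks out the three rows containing á or à; before the conversion it returns no rows, because every latin1 character occupies exactly one byte.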
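Notes:
No warnings were raised except in Case 2, where the SHOW WARNINGS output is included.
With MODIFY (Cases 2 to 4) the table's default CHARSET stays latin1, but that is not a problem: the default only applies to columns added later without an explicit character set.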
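If the leftover latin1 table default is unwanted, it can be changed separately. This is only a sketch using the `alters` table from the cases above; it sets the default for columns added in the future and does not convert existing columns or their data.

```sql
-- Change only the table-level default; existing columns keep their character set.
ALTER TABLE alters DEFAULT CHARACTER SET utf8;

-- Confirm the result: the table default and the column definition are reported together.
SHOW CREATE TABLE alters\G
```

Once the table default matches the column's character set, SHOW CREATE TABLE stops printing CHARACTER SET on the column line, which is a display change only.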
I see a weird behavior where my auto-increment column increases in steps of 2 instead of 1, so I end up with row ids such as 1, 3, 5, 7. I use MySQL 5.6 with InnoDB as the engine. Any idea what causes this?
mysql> select version();
+-----------------+
| version() |
+-----------------+
| 5.6.20-68.0-log |
+-----------------+
1 row in set (0.00 sec)
mysql> show create table temp_table;
| superset_version | CREATE TABLE `temp_table` (
`_id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'comm1',
`name` varchar(100) NOT NULL COMMENT 'comm2',
`start_time` bigint(20) NOT NULL COMMENT 'comm3',
`updated_at` bigint(20) NOT NULL COMMENT 'comm4',
`status` varchar(50) NOT NULL COMMENT 'comm5',
PRIMARY KEY (`_id`)
) ENGINE=InnoDB AUTO_INCREMENT=27 DEFAULT CHARSET=utf8 |
Notice that each insert increments the _id column by 2.
mysql> insert into superset_version(name, start_time, updated_at, status) value("TEMP ROW", -1, -1, "erro");
Query OK, 1 row affected, 1 warning (0.00 sec)
mysql> select * from superset_version order by _id desc limit 3;
+-----+----------+------------+------------+--------+
| _id | name | start_time | updated_at | status |
+-----+----------+------------+------------+--------+
| 33 | TEMP ROW | -1 | -1 | erro |
| 31 | TEMP ROW | -1 | -1 | erro |
| 29 | TEMP ROW | -1 | -1 | erro |
+-----+----------+------------+------------+--------+
3 rows in set (0.00 sec)
The auto_increment_increment setting is most likely set to 2, so MySQL increases auto-increment values by 2. Use the show variables like ... command to check the setting.
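One way to run that check, as a minimal sketch: the pattern below matches both auto_increment_increment and auto_increment_offset, and the second statement compares the server-wide value with the one in effect for the current connection.

```sql
-- List the step and the offset used for AUTO_INCREMENT columns.
SHOW VARIABLES LIKE 'auto_increment%';

-- Compare the global value with the value of the current session.
SELECT @@GLOBAL.auto_increment_increment  AS global_step,
       @@SESSION.auto_increment_increment AS session_step;
```

A step of 2 in either column would explain ids such as 29, 31, 33.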
To double-check, you can also look at the next value the table will assign:
SELECT AUTO_INCREMENT
FROM `information_schema`.`TABLES`
WHERE TABLE_NAME = '<<YOUR TABLE NAME HERE>>' AND
TABLE_SCHEMA = '<< YOUR DATABASE NAME HERE >>';
If the step really is 2, it can be changed.
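A sketch of changing the step, with caveats: a step of 2 is often set deliberately on servers that take part in multi-master replication, so it is worth confirming why it is there first. The statements may require elevated privileges, and a value set with SET GLOBAL is lost on restart unless the configuration file is updated as well.

```sql
-- Affects only the current connection.
SET SESSION auto_increment_increment = 1;

-- Affects connections opened after this statement; not persisted across restarts.
SET GLOBAL auto_increment_increment = 1;
```

Existing rows keep their ids; only values generated afterwards use the new step.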
Another way to check is to look at these settings in the MySQL configuration file (my.cnf / my.ini):
auto-increment-increment
auto-increment-offset