I have a large partitioned table (over 2 billion records); each partition contains roughly 500 million records. I recently moved from physical hardware to AWS, using mysqldump to back up and restore the MySQL data. I also recently created a new partition (p108). Queries against the old partitions (created on the old server) run as normal: very quick, returning data in seconds. However, queries against records in the newly created partition (p108) are very slow, taking minutes.
SHOW CREATE TABLE results:
CREATE TABLE `termusage`
(
`id` BIGINT(20) NOT NULL auto_increment,
`terminal` BIGINT(20) DEFAULT NULL,
`date` DATETIME DEFAULT NULL,
`dest` VARCHAR(255) DEFAULT NULL,
`feattrans` BIGINT(20) DEFAULT NULL,
`cost_type` TINYINT(4) DEFAULT NULL,
`cost` DECIMAL(16, 6) DEFAULT NULL,
`gprsup` BIGINT(20) DEFAULT NULL,
`gprsdown` BIGINT(20) DEFAULT NULL,
`duration` TIME DEFAULT NULL,
`file` BIGINT(20) DEFAULT NULL,
`custcost` DECIMAL(16, 6) DEFAULT '0.000000',
`invoice` BIGINT(20) NOT NULL DEFAULT '99999999',
`carriertrans` BIGINT(20) DEFAULT NULL,
`session_start` DATETIME DEFAULT NULL,
`session_end` DATETIME DEFAULT NULL,
`mt_mo` VARCHAR(4) DEFAULT NULL,
`grps_rounded` BIGINT(20) DEFAULT NULL,
`gprs_rounded` BIGINT(20) DEFAULT NULL,
`country` VARCHAR(25) DEFAULT NULL,
`network` VARCHAR(25) DEFAULT NULL,
`ctn` VARCHAR(20) DEFAULT NULL,
`pricetrans` BIGINT(20) DEFAULT NULL,
PRIMARY KEY (`id`, `invoice`),
KEY `idx_terminal` (`invoice`, `terminal`),
KEY `idx_feattrans` (`invoice`, `feattrans`),
KEY `idx_file` (`invoice`, `file`),
KEY `termusage_carriertrans_idx` (`carriertrans`),
KEY `idx_ctn` (`invoice`, `ctn`),
KEY `idx_pricetrans` (`invoice`, `pricetrans`)
)
engine=innodb
auto_increment=17449438880
DEFAULT charset=latin1
/*!50500 PARTITION BY RANGE COLUMNS(invoice)
(PARTITION p103 VALUES LESS THAN (621574) ENGINE = InnoDB,
PARTITION p104 VALUES LESS THAN (628214) ENGINE = InnoDB,
PARTITION p106 VALUES LESS THAN (634897) ENGINE = InnoDB,
PARTITION p107 VALUES LESS THAN (649249) ENGINE = InnoDB,
PARTITION p108 VALUES LESS THAN (662763) ENGINE = InnoDB,
PARTITION plast VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */
I created the partition p108 using the following query
ALTER TABLE termusage reorganize partition plast
INTO ( partition p108 VALUES less than (662763),
partition plast VALUES less than maxvalue )
I can see the file termusage#p#p108.ibd, which looks "normal", and the data is there, as I can get results from the query.
information_schema.PARTITIONS shows the following for the table, which indicates there is some kind of issue:
Name   Pos  Rows       Avg_row_length  Data_length  Method
p103   1    412249206  124             51124371456  RANGE COLUMNS
p104   2    453164890  133             60594061312  RANGE COLUMNS
p106   3    542767414  135             73562849280  RANGE COLUMNS
p107   4    587042147  129             76288098304  RANGE COLUMNS
p108   5    0          0               16384        RANGE COLUMNS
plast  6    0          0               16384        RANGE COLUMNS
How can I fix the partition?
Updated
Explain for good query
# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1, SIMPLE, t, p107, ref, idx_terminal,idx_feattrans,idx_file,idx_ctn,idx_pricetrans, idx_terminal, 17, const,const, 603, 100.00, Using index condition; Using temporary; Using filesort
Explain for poor query
# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1, SIMPLE, t, p108, ALL, idx_terminal,idx_feattrans,idx_file,idx_ctn,idx_pricetrans, , , , 1, 100.00, Using where; Using temporary; Using filesort
For future readers, the issue was resolved by running ALTER TABLE ... ANALYZE PARTITION p108.
The table and index statistics that guide the optimizer to choose the best way to read the table were out of date. It's common to use ANALYZE to make sure these statistics are updated after a significant data load or delete.
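A minimal sketch of that fix, using the table and partition names from the question. The follow-up query against information_schema.PARTITIONS is only a suggested way to confirm the statistics were refreshed, and it assumes the table lives in the current default database.

```sql
-- Refresh the optimizer statistics for the newly created partition only.
ALTER TABLE termusage ANALYZE PARTITION p108;

-- Optional check: the row estimate for p108 should no longer be 0.
-- Assumes termusage is in the current default schema.
SELECT PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'termusage';
```

Re-running the EXPLAIN for the slow query afterwards should show whether the optimizer now picks an index for p108 instead of a full scan.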
Related
I have partitioned a MySQL table containing 53 rows. Now when I query the number of records across all partitions, the total is almost 3 times what I expect. Even phpMyAdmin thinks there are 156 records.
Have I done something wrong in my table design and partitioning?
The picture below shows the count of records per partition:
phpMyAdmin:
Finally, this is my table:
CREATE TABLE cl_inbox (
id int(11) NOT NULL AUTO_INCREMENT,
user int(11) NOT NULL,
contact int(11) DEFAULT NULL,
sdate timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
body text NOT NULL,
userstatus tinyint(4) NOT NULL DEFAULT 1 COMMENT '0: new, 1:read, 2: deleted',
contactstatus tinyint(4) NOT NULL DEFAULT 0,
class tinyint(4) NOT NULL DEFAULT 0,
attachtype tinyint(4) NOT NULL DEFAULT 0,
attachsrc varchar(255) DEFAULT NULL,
PRIMARY KEY (id, user),
INDEX i_class (class),
INDEX i_contact_user (contact, user),
INDEX i_contactstatus (contactstatus),
INDEX i_user_contact (user, contact),
INDEX i_userstatus (userstatus)
)
ENGINE = INNODB
AUTO_INCREMENT = 69
AVG_ROW_LENGTH = 19972
CHARACTER SET utf8
COLLATE utf8_general_ci
ROW_FORMAT = DYNAMIC
PARTITION BY KEY (`user`)
(
PARTITION partition1 ENGINE = INNODB,
PARTITION partition2 ENGINE = INNODB,
PARTITION partition3 ENGINE = INNODB,
.....
PARTITION partition128 ENGINE = INNODB
);
Those numbers are approximations, just as with SHOW TABLE STATUS and EXPLAIN.
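To see the difference between the estimate and the real figure, something like the following can be run. It is a sketch that assumes cl_inbox is in the current default database: the first statement counts the rows exactly, while the second reads the per-partition estimates that InnoDB reports.

```sql
-- Exact row count: actually reads the rows (or an index).
SELECT COUNT(*) AS exact_rows FROM cl_inbox;

-- Estimated row counts per partition: for InnoDB these are approximations.
SELECT PARTITION_NAME, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'cl_inbox';
```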
Meanwhile, you will probably find that PARTITION BY KEY provides no performance improvement. If you find otherwise, I would be very interested to hear about it.
I'm trying to understand the massive difference in query time between the following two queries on my InnoDB table:
SELECT *
FROM db_telemetry.monitor_data
WHERE monitor_id = 6
AND created_at > '2019/11/14'
AND created_at < '2019/11/29';
4317 rows returned in 37.672s
SELECT *
FROM db_telemetry.monitor_data USE INDEX(ix_monitor_data_created_at)
WHERE monitor_id = 6
AND created_at > '2019/11/14'
AND created_at < '2019/11/29';
4317 rows returned in 0.110s
According to EXPLAIN the optimizer in the first (slow) query is choosing monitor_id for its index key. From what I've read this is strange because monitor_id has comparatively low cardinality (see below)
My table:
SHOW CREATE TABLE monitor_data
CREATE TABLE `monitor_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`monitor_id` int(11) NOT NULL DEFAULT '0',
`vbattery` float DEFAULT NULL,
`rssi` float DEFAULT NULL,
`ecio` float DEFAULT NULL,
`tboard` float DEFAULT NULL,
`txbytes` float DEFAULT NULL,
`rxbytes` float DEFAULT NULL,
`satelite_count` float DEFAULT NULL,
`gps_fix` float DEFAULT NULL,
`drive_space_remaining` float DEFAULT NULL,
`other` text,
`daq_reachable` tinyint(1) DEFAULT NULL,
`monitor_reachable` tinyint(1) DEFAULT NULL,
`clock_reset_flag` tinyint(1) DEFAULT NULL,
`site_key` varchar(50) DEFAULT NULL,
`internal_temp` float DEFAULT NULL,
`vin` float DEFAULT NULL,
`webrelay_reachable` tinyint(1) DEFAULT NULL,
`daq_current_time` datetime DEFAULT NULL,
`webrelay_current_time` datetime DEFAULT NULL,
`latitude` float DEFAULT NULL,
`longitude` float DEFAULT NULL,
`speed` float DEFAULT NULL,
PRIMARY KEY (`id`,`monitor_id`),
KEY `monitor_id` (`monitor_id`),
KEY `ix_monitor_data_site_key` (`site_key`),
KEY `ix_monitor_data_created_at` (`created_at`),
CONSTRAINT `monitor_data_ibfk_1` FOREIGN KEY (`monitor_id`) REFERENCES `monitors` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=10839466 DEFAULT CHARSET=latin1
Its indexes:
SHOW INDEX FROM monitor_data
Table Non_unique Key_name Seq_in_index Column_name Cardinality
------------------------------------------------------------------------------------------------
monitor_data 0 PRIMARY 1 id 11311240
monitor_data 0 PRIMARY 2 monitor_id 11311240
monitor_data 1 monitor_id 1 monitor_id 110
monitor_data 1 ix_monitor_data_site_key 1 site_key 28137
monitor_data 1 ix_monitor_data_created_at 1 created_at 11311240
Sub_part and Packed all NULL
Index_type all BTREE
Collation all 'A'
This is MySQL version 5.6.40 running on an AWS RDS t2.small instance with a 20GB general purpose SSD.
If I use only the monitor_id condition:
SELECT *
FROM db_telemetry.monitor_data
WHERE monitor_id = 6;
274324 rows returned in 0.078s
If I use only the created_at condition:
SELECT *
FROM db_telemetry.monitor_data
WHERE created_at > '2019/11/14'
AND created_at < '2019/11/29';
202976 rows returned in 0.109s
So, questions:
Why does the optimizer choose monitor_id by default for index, and is there a likely problem with my schema making USE INDEX() necessary?
Since in isolation both indexes reduce the dataset to a similar number of rows, why is the multi-condition query so much slower when using monitor_id as the index?
NOTE: I've observed for certain smaller date ranges that the optimizer flips over to picking ix_monitor_data_created_at
PRIMARY KEY (`id`,`monitor_id`),
does not make sense when id is the AUTO_INCREMENT. Perhaps the only difference with PRIMARY KEY(id) is that you are allowing duplicate values for id. (But you would have to explicitly set id to get a dup.) Either way, the PK is 'clustered' with the data, and the data is ordered by id.
For the query, you need this composite index:
INDEX(monitor_id, created_at)
Why did the Optimizer pick the 'wrong' index? A lot of possible reasons, but mostly because it does not have sufficient statistics. Another possible reason for the wide difference in timings is ...
What order were the rows inserted into the table? Presumably 'chronologically'? That is, the rows for that date range are "near" each other, making the use of that index "fast". Meanwhile, looking up by monitor_id implies jumping all over the table.
My composite index defeats all the issues by dipping into the BTree for the index at (6, '2019/11/14'), then scanning forward, until exactly all the 4317 index rows are found. Meanwhile, it reaches over into the data (via id) to get SELECT *.
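As a sketch, the composite index could be added and checked like this. The index name ix_monitor_id_created_at is only an illustrative choice; the table, columns, and query come from the question.

```sql
-- Add the composite index: equality column first, range column second.
ALTER TABLE db_telemetry.monitor_data
  ADD INDEX ix_monitor_id_created_at (monitor_id, created_at);

-- Without any USE INDEX hint, EXPLAIN should now be able to pick the new index.
EXPLAIN
SELECT *
FROM db_telemetry.monitor_data
WHERE monitor_id = 6
  AND created_at > '2019/11/14'
  AND created_at < '2019/11/29';
```

The column order matters: with monitor_id first, the index can be entered at the exact (monitor_id, created_at) starting point and scanned only across the requested date range.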
Another issue... You probably ran the 4 queries in the order shown, starting with a 'cold' cache (the buffer_pool). That is, the first query had the overhead of 4317 disk reads. (Note: at a rule-of-thumb 10 ms per read, that takes about 43.17 seconds on an HDD.) Then the other SELECTs found all that cached.
So... When running timing tests, run the query twice.
I apologize for the ambiguity of the column and table names.
My database has two tables, A and B. There is a many-to-many relationship between these tables.
Table A has around 200 records
Table A structure
Id   Definition
12   Def1
42   Def2
... etc.
Table B has around 5 billion records, e.g.:
Column 1   Associated Id (from table A)
abc        12
abc        21
pqr        42
I am trying to optimize the way data is stored in table B, as it has a lot of redundant data. The structure I am thinking of is as follows:
Column 1   Associated Ids
abc        12, 21
pqr        42
The "Associated Ids" column can have updates when new rows are added to table A.
Is this a good structure for this scenario? If yes, what should the column type be for "Associated Ids"? I am using a MySQL database.
Create table statements.
CREATE TABLE `A` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(100) DEFAULT NULL,
`name` varchar(100) DEFAULT NULL,
`creat_usr_id` varchar(20) NOT NULL,
`creat_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`modfd_usr_id` varchar(20) DEFAULT NULL,
`modfd_ts` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `A_ak1` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=277 DEFAULT CHARSET=utf8;
CREATE TABLE `B`(
`col1` varchar(128) NOT NULL,
`id` int(11) NOT NULL,
`added_dt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`creat_usr_id` varchar(20) NOT NULL,
`creat_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`col1`,`id`,`added_dt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (UNIX_TIMESTAMP(added_dt))
(PARTITION Lessthan_2016 VALUES LESS THAN (1451606400) ENGINE = InnoDB,
PARTITION Lessthan_201603 VALUES LESS THAN (1456790400) ENGINE = InnoDB,
PARTITION Lessthan_201605 VALUES LESS THAN (1462060800) ENGINE = InnoDB,
PARTITION Lessthan_201607 VALUES LESS THAN (1467331200) ENGINE = InnoDB,
PARTITION Lessthan_201609 VALUES LESS THAN (1472688000) ENGINE = InnoDB,
PARTITION Lessthan_201611 VALUES LESS THAN (1477958400) ENGINE = InnoDB,
PARTITION Lessthan_201701 VALUES LESS THAN (1483228800) ENGINE = InnoDB,
PARTITION pfuture VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
Indexes.
Table  Non_unique  Key_name  Seq_in_index  Column_name  Collation  Cardinality  Sub_part  Packed  Index_type  Comment  Index_comment
B      0           PRIMARY   1             col1         A          2            NULL      NULL    BTREE
B      0           PRIMARY   2             id           A          6            NULL      NULL    BTREE
B      0           PRIMARY   3             added_dt     A          6            NULL      NULL    BTREE
5 billion rows here. Let me walk through things:
col1 varchar(128) NOT NULL,
How often is this column repeated? That is, is it worth it to 'normalize' it?
id int(11) NOT NULL,
Cut the size of this column in half (4 bytes -> 2), since you have only 200 distinct ids:
a_id SMALLINT UNSIGNED NOT NULL
Range of values: 0..65535
added_dt timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
Please explain why this is part of the PK. That is a rather odd thing to do.
creat_usr_id varchar(20) NOT NULL,
creat_ts timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
Toss these as clutter, unless you can justify keeping track of 5 billion actions this way.
PRIMARY KEY (col1,id,added_dt)
I'll bet you will eventually get two rows in the same second. A PK is 'unique'. Perhaps you need only (col1, a_id)? Else, you are allowing a col1-a_id pair to be added multiple times. Or maybe you want IODKU (INSERT ... ON DUPLICATE KEY UPDATE) to add a new row versus update the timestamp?
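A minimal sketch of the IODKU idea. It assumes a hypothetical, non-partitioned table named B_revised whose primary key is just the (col1, a_id) pair; the table name and the reduced column list are illustrative only.

```sql
-- Hypothetical revised table: one row per (col1, a_id) pair.
CREATE TABLE B_revised (
  col1     VARCHAR(128) NOT NULL,
  a_id     SMALLINT UNSIGNED NOT NULL,
  added_dt TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (col1, a_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Inserts the pair if it is new; otherwise only refreshes the timestamp.
INSERT INTO B_revised (col1, a_id)
VALUES ('abc', 12)
ON DUPLICATE KEY UPDATE added_dt = CURRENT_TIMESTAMP;
```

Running the INSERT a second time leaves a single row for ('abc', 12) rather than adding a duplicate.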
PARTITION...
This is useful if (and probably only if) you intend to remove 'old' rows. Else please explain why you picked partitioning.
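For illustration, this is roughly what removing old rows looks like with the partitions defined in the question. The DELETE in the comment is shown only for comparison, and its date literal assumes the session time zone is UTC.

```sql
-- Drops every row in the oldest partition as a quick metadata-level operation.
ALTER TABLE B DROP PARTITION Lessthan_2016;

-- The non-partitioned equivalent would have to find and delete the rows one by one:
-- DELETE FROM B WHERE added_dt < '2016-01-01 00:00:00';
```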
It is hard to review a schema without seeing the main SELECTs. In the case of large tables, we should also review the INSERTs, UPDATEs, and DELETEs, since each of them could pose serious performance problems.
At 100 rows inserted per second, it will take more than a year to add 5B rows. How fast will the rows be coming in? This may be a significant performance issue, too.
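The arithmetic behind that estimate can be checked directly in MySQL. The 100 rows per second is the hypothetical rate from the sentence above, not a measured figure.

```sql
-- 5 billion rows at 100 rows/second, expressed in seconds and in days.
SELECT 5000000000 / 100         AS seconds_needed,
       5000000000 / 100 / 86400 AS days_needed;
```

That works out to 50 million seconds, or roughly 579 days.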
Hello, this query is producing the following EXPLAIN output, which is odd considering I have an index set up for both of the columns:
'1', 'SIMPLE', 'vtr_video_transactions', 'ALL', 'user_standard,user_date', NULL, NULL, NULL, '5', 'Using where; Using filesort'
CREATE TABLE `vtr_video_transactions` (
`vtr_id` int(11) NOT NULL AUTO_INCREMENT,
`vtr_transaction_id` int(11) unsigned DEFAULT NULL,
`vtr_user_id` int(11) unsigned DEFAULT NULL,
`vtr_standards_id` int(11) unsigned DEFAULT NULL,
`vtr_video_date` datetime DEFAULT NULL,
PRIMARY KEY (`vtr_id`),
KEY `user_standard` (`vtr_user_id`,`vtr_standards_id`),
KEY `user_date` (`vtr_user_id`,`vtr_video_date`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;
See user_date for the index. I have it set to DESC in MySQL Workbench, but I still get a filesort in the EXPLAIN. Not sure why. Cheers
Data for table
LOCK TABLES `vtr_video_transactions` WRITE;
/*!40000 ALTER TABLE `vtr_video_transactions` DISABLE KEYS */;
INSERT INTO `vtr_video_transactions` VALUES (1,1,1,2,'2015-09-05 17:18:59'),(2,2,1,3,'2015-08-27 19:04:12'),(3,2,1,4,'2015-08-27 18:55:53'),(4,10,1,119,'2015-08-27 19:04:12'),(5,11,1,10,'2015-08-27 19:04:12');
See the manual page here
Indexes are less important for queries on small tables, or big tables
where report queries process most or all of the rows. When a query
needs to access most of the rows, reading sequentially is faster than
working through an index. Sequential reads minimize disk seeks, even
if not all the rows are needed for the query.
I updated MySQL from version 5.0 to 5.5, and I am new to MySQL partitioning. First, I typed:
SHOW VARIABLES LIKE '%partition%'
Variable_name Value
have_partitioning YES
This confirms that the new version supports partitioning. I tried to partition my table into 10-minute ranges, and then INSERT, UPDATE, and query a large amount of data in this table as a test.
First, I need to create a new table, so I typed:
CREATE TABLE test (
`id` int unsigned NOT NULL auto_increment,
`words` varchar(100) collate utf8_unicode_ci NOT NULL,
`date` varchar(10) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `index` (`words`)
)
ENGINE=MyISAM
DEFAULT CHARSET=utf8
COLLATE=utf8_unicode_ci
AUTO_INCREMENT=0
PARTITION BY RANGE (MINUTE(`date`))
(
PARTITION p0 VALUES LESS THAN (1322644000),
PARTITION p1 VALUES LESS THAN (1322644600) ,
PARTITION p2 VALUES LESS THAN (1322641200) ,
PARTITION p3 VALUES LESS THAN (1322641800) ,
PARTITION p4 VALUES LESS THAN MAXVALUE
);
It returns the error: #1564 - This partition function is not allowed. What is the problem? Thanks.
UPDATE
I changed date to int NOT NULL, and PARTITION BY RANGE (MINUTE(date)) to PARTITION BY RANGE COLUMNS(date):
CREATE TABLE test (
`id` int unsigned NOT NULL auto_increment,
`words` varchar(100) collate utf8_unicode_ci NOT NULL,
`date` int NOT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `index` (`words`)
)
ENGINE=MyISAM
DEFAULT CHARSET=utf8
COLLATE=utf8_unicode_ci
AUTO_INCREMENT=0
PARTITION BY RANGE COLUMNS(`date`)
(
PARTITION p0 VALUES LESS THAN (1322644000),
PARTITION p1 VALUES LESS THAN (1322644600) ,
PARTITION p2 VALUES LESS THAN (1322641200) ,
PARTITION p3 VALUES LESS THAN (1322641800) ,
PARTITION p4 VALUES LESS THAN MAXVALUE
);
This then caused a new error: #1214 - The used table type doesn't support FULLTEXT indexes
I am sorry, but MySQL does not support FULLTEXT indexes and partitioning on the same table.
See partitioning limitations
FULLTEXT indexes. Partitioned tables do not support FULLTEXT indexes or searches. This includes partitioned tables employing the MyISAM storage engine.
One issue might be:
SELECT MINUTE('2008-10-10 56:56:98') returns NULL. The reason is that the MINUTE() function returns the minute from a TIME or DATETIME value, whereas in your case date is a VARCHAR.
MINUTE() expects a time or datetime expression. Also, a partitioning key must be either an integer column or an expression that resolves to an integer, but in your case it is a VARCHAR.
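Putting these points together, a table along the following lines should be accepted. This is only a sketch under several assumptions: the FULLTEXT index is dropped, the partitioning column is an integer Unix timestamp (the name created_ts is made up for illustration), the boundaries are rewritten in ascending 10-minute steps, and InnoDB is used in place of MyISAM.

```sql
-- Hypothetical table: integer partitioning key, no FULLTEXT index.
CREATE TABLE test_partitioned (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
  words      VARCHAR(100) COLLATE utf8_unicode_ci NOT NULL,
  created_ts INT UNSIGNED NOT NULL,  -- Unix timestamp (seconds)
  -- Every unique key must include the partitioning column.
  PRIMARY KEY (id, created_ts)
)
ENGINE=InnoDB
DEFAULT CHARSET=utf8
COLLATE=utf8_unicode_ci
PARTITION BY RANGE (created_ts)
(
  -- Boundaries must be strictly increasing; each step here is 600 seconds.
  PARTITION p0 VALUES LESS THAN (1322641200),
  PARTITION p1 VALUES LESS THAN (1322641800),
  PARTITION p2 VALUES LESS THAN (1322642400),
  PARTITION p3 VALUES LESS THAN (1322643000),
  PARTITION p4 VALUES LESS THAN MAXVALUE
);
```

If full-text search on words is still needed, it would have to live in a separate, non-partitioned table.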