I am having trouble partitioning a table using PARTITION BY RANGE on a datetime column.
The test query still scans every partition.
I have seen some posts online about this issue, but I am not sure whether there is any way to fix it or work around it.
MySQL server: Percona 5.5.24-55.
table:
id bigint(20) unsigned NOT NULL,
time datetime NOT NULL,
....
....
KEY id_time (id,time)
engine=InnoDB
partition statement:
alter table summary_201204
partition by range (day(time))
subpartition by key(id)
subpartitions 5 (
partition p0 values less than (6),
partition p1 values less than (11),
partition p2 values less than (16),
partition p3 values less than (21),
partition p4 values less than (26),
partition p5 values less than (MAXVALUE) );
check:
explain partitions select * from summary_201204 where time < '2012-07-21';
result: p0_p0sp0,p0_p0sp1,p0_p0sp2,p0_p0sp3,p0_p0sp4,p1_p1sp0,p1_p1sp1,p1_p1sp2,p1_p1sp3,p1_p1sp4,p2_p2sp0,p2_p2sp1,p2_p2sp2,p2_p2sp3,p2_p2sp4,p3_p3sp0,p3_p3sp1,p3_p3sp2,p3_p3sp3,p3_p3sp4,p4_p4sp0,p4_p4sp1,p4_p4sp2,p4_p4sp3,p4_p4sp4,p5_p5sp0,p5_p5sp1,p5_p5sp2,p5_p5sp3,p5_p5sp4.
I think I found the answer.
So, the documentation within the mysql official site is not clear enough about the data types required for partition. In this case, if the table data type is datetime, then we should use to_seconds, whilst if the data type is DATE then we can use YEA
Related
I have a table that contains a month and a year column.
I have a query which usually looks something like WHERE month=1 AND year=2022.
Given how large this table is, I would like to make it more efficient using partitions and subpartitions.
table 1
Querying the data I need took around 2 minutes and 30 seconds.
CREATE TABLE `table_1` (
`id` int NOT NULL AUTO_INCREMENT,
`entity_id` varchar(36) NOT NULL,
`entity_type` varchar(36) NOT NULL,
`score` decimal(4,3) NOT NULL,
`month` int NOT NULL DEFAULT '0',
`year` int NOT NULL DEFAULT '0',
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`deleted_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_month_year` (`month`,`year`, `entity_type`)
)
Partitioning by "month"
Querying the data I need took around 21 seconds (a big improvement).
CREATE TABLE `table_1` (
`id` int NOT NULL AUTO_INCREMENT,
`entity_id` varchar(36) NOT NULL,
`entity_type` varchar(36) NOT NULL,
`score` decimal(4,3) NOT NULL,
`month` int NOT NULL DEFAULT '0',
`year` int NOT NULL DEFAULT '0',
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`deleted_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`,`month`),
KEY `idx_month_year` (`month`,`year`, `entity_type`)
) ENGINE=InnoDB AUTO_INCREMENT=21000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
/*!50100 PARTITION BY LIST (`month`)
(PARTITION p0 VALUES IN (0) ENGINE = InnoDB,
PARTITION p1 VALUES IN (1) ENGINE = InnoDB,
PARTITION p2 VALUES IN (2) ENGINE = InnoDB,
PARTITION p3 VALUES IN (3) ENGINE = InnoDB,
PARTITION p4 VALUES IN (4) ENGINE = InnoDB,
PARTITION p5 VALUES IN (5) ENGINE = InnoDB,
PARTITION p6 VALUES IN (6) ENGINE = InnoDB,
PARTITION p7 VALUES IN (7) ENGINE = InnoDB,
PARTITION p8 VALUES IN (8) ENGINE = InnoDB,
PARTITION p9 VALUES IN (9) ENGINE = InnoDB,
PARTITION p10 VALUES IN (10) ENGINE = InnoDB,
PARTITION p11 VALUES IN (11) ENGINE = InnoDB,
PARTITION p12 VALUES IN (12) ENGINE = InnoDB) */
I would like to see if I can improve the performance even further by partitioning by year and then subpartitioning by month. How can I do that?
I'm not sure the question "Partition by year and sub-partition by month mysql" is relevant: it has no marked answers and looks to be specific to MySQL 5.* and PHP. I'm asking about MySQL 8. Have there been no changes since then regarding partitioning, subpartitioning, LIST COLUMNS, RANGE COLUMNS, etc. that could help me?
Broader query I'm making
SELECT
table_1.entity_id AS entity_id,
table_1.entity_type,
table_1.score
FROM table_1
WHERE table_1.month = 12 AND table_1.year = 2022
AND table_1.score > 0
AND table_1.entity_type IN ('type1', 'type2', 'type3', 'type4') # only ever 4 types usually all 4 are present in the query
To answer your question directly, below is example syntax that accomplishes the subpartitioning. Notice the PRIMARY KEY must include all columns used for partitioning or subpartitioning. Read the manual on subpartitioning for more information: https://dev.mysql.com/doc/refman/8.0/en/partitioning-subpartitions.html
Schema (MySQL v8.0)
CREATE TABLE `table_1` (
`id` int NOT NULL AUTO_INCREMENT,
`entity_id` varchar(36) NOT NULL,
`entity_type` varchar(36) NOT NULL,
`score` decimal(4,3) NOT NULL,
`month` int NOT NULL DEFAULT '0',
`year` int NOT NULL DEFAULT '0',
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`deleted_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`,`month`, `year`),
KEY `idx_month_year` (`month`,`year`, `score`, `entity_type`)
) ENGINE=InnoDB AUTO_INCREMENT=21000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
PARTITION BY LIST (`month`)
SUBPARTITION BY HASH(`year`)
SUBPARTITIONS 10 (
PARTITION p0 VALUES IN (0) ENGINE = InnoDB,
PARTITION p1 VALUES IN (1) ENGINE = InnoDB,
PARTITION p2 VALUES IN (2) ENGINE = InnoDB,
PARTITION p3 VALUES IN (3) ENGINE = InnoDB,
PARTITION p4 VALUES IN (4) ENGINE = InnoDB,
PARTITION p5 VALUES IN (5) ENGINE = InnoDB,
PARTITION p6 VALUES IN (6) ENGINE = InnoDB,
PARTITION p7 VALUES IN (7) ENGINE = InnoDB,
PARTITION p8 VALUES IN (8) ENGINE = InnoDB,
PARTITION p9 VALUES IN (9) ENGINE = InnoDB,
PARTITION p10 VALUES IN (10) ENGINE = InnoDB,
PARTITION p11 VALUES IN (11) ENGINE = InnoDB,
PARTITION p12 VALUES IN (12) ENGINE = InnoDB
);
Using EXPLAIN on your query reveals that the query references only one subpartition.
Query #1
EXPLAIN
SELECT
table_1.entity_id AS entity_id,
table_1.entity_type,
table_1.score
FROM table_1
WHERE table_1.month = 12
AND table_1.year = 2022
AND table_1.score > 0
AND table_1.entity_type IN ('type1', 'type2', 'type3', 'type4');
id: 1
select_type: SIMPLE
table: table_1
partitions: p12_p12sp2
type: range
possible_keys: idx_month_year
key: idx_month_year
key_len: 11
ref:
rows: 1
filtered: 100
Extra: Using index condition
The partitions field of the EXPLAIN output shows that the query accesses only p12_p12sp2. The year in the query, 2022, modulo the number of subpartitions, 10, is 2, so the rows are read from subpartition sp2.
In addition to the partitioning by month and year, it is also helpful to use an index. In this case, I added score to the index so it would filter out rows where score <= 0. The note in the EXPLAIN "Using index condition" shows that it is delegating further filtering on entity_type to the storage engine. Though in your example, you said there are only four values for entity type, and all four are selected, so that condition won't filter out any rows anyway.
Re your questions in comments below:
a little bit confused about SUBPARTITIONS 10, why 10?
It's just an example. You can choose a different number of subpartitions. Whatever you feel is required to reduce the search as much as you want.
To be honest, I've never encountered a situation that required subpartitioning at all, if the search is also optimized with indexes. So I have no guidance on what is an appropriate number of subpartitions.
It's your responsibility to test performance until you are satisfied.
also a bit confused about the partition name p12_p12sp2: how do I know it selected the partition with year 2022 from looking at that?
The query has a condition year = 2022.
There are 10 subpartitions in my example.
Hash partitioning just uses the integer value to be partitioned, modulus the number of partitions.
2022 modulus 10 is 2. Hence the partition ending in ...sp2 is the one used.
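The arithmetic above can be checked with a small sketch. This Python snippet only models the rule described here (integer value modulo the number of subpartitions); it is not MySQL code. The function names are hypothetical, and the pN_pNspM naming simply mirrors what EXPLAIN printed in this example, assuming LIST partition pN holds month N as in the schema above.

```python
def subpartition_index(value: int, num_subpartitions: int) -> int:
    """Model of HASH subpartitioning on an integer column: value MOD count."""
    return value % num_subpartitions


def subpartition_name(month: int, year: int, num_subpartitions: int = 10) -> str:
    """Build a name in the pN_pNspM style seen in the EXPLAIN output.

    Assumption: LIST partition pN holds month N, as in the example schema.
    """
    sp = subpartition_index(year, num_subpartitions)
    return f"p{month}_p{month}sp{sp}"


if __name__ == "__main__":
    # Both years share the same remainder, so they share a subpartition.
    print(subpartition_name(12, 2022))
    print(subpartition_name(12, 2032))
```

Note that 2022 and 2032 map to the same subpartition, so with 10 subpartitions one subpartition can eventually hold rows from more than one year.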
I also came across this anothermysqldba.blogspot.com/2014/12/… do you know how yours differs from what is shown here (bear in mind that blog is from 2014)
They chose to name the subpartitions. There's no need to do that.
would there be any performance difference in having a single date column, e.g. (2022-12-21), instead of separate month and year columns?
That depends on the query, and I'll leave it to you to test. Any predictions I make won't be accurate with your data on your server.
I can also see that you partition by month and subpartition by year, as opposed to partitioning by year and subpartitioning by month. Can you explain the reasoning?
Subpartitioning works only if the outer partitions are LIST or RANGE partitions, and the subpartitions are HASH or KEY partitions. This is in the manual page I linked to.
There are a finite number of months (12). This makes it easy to partition by LIST as you did. You won't ever need more partitions. If you had partitioned by YEAR as the outer partition, you would have needed to specify year values in the list, and this is a growing set, so you would periodically have to alter the table to extend the list or range to account for new years.
Whereas when partitioning by HASH for the subpartitioning, the new year values are mapped into the finite set of subpartitions, so it's okay that it's not a finite list. You won't have to alter table to repartition (unless you want to change the number of subpartitions).
Splitting a date into columns is usually counterproductive. It is much easier to split during SELECT.
PARTITIONing is usually useless for performance of any SELECT.
When partitioning (or unpartitioning), the indexes usually need changing.
For that query, I recommend a combined date column,
WHERE date >= '2022-01-01'
AND date < '2022-01-01' + INTERVAL 1 MONTH
and some INDEX starting with date.
(You probably have other queries; let's see some of them; they may need a different index.)
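The half-open range in that WHERE clause can be sketched in application code. This is a minimal Python illustration, assuming a combined column named date as suggested above; month_range and where_clause are hypothetical helper names. In a real application the two bounds would normally be passed as query parameters rather than formatted into the SQL string.

```python
from datetime import date


def month_range(year: int, month: int) -> tuple[date, date]:
    """Return (start, end) with start inclusive and end exclusive."""
    start = date(year, month, 1)
    # Roll over to January of the next year after December.
    if month == 12:
        end = date(year + 1, 1, 1)
    else:
        end = date(year, month + 1, 1)
    return start, end


def where_clause(year: int, month: int, column: str = "date") -> str:
    """Render the WHERE fragment for one month (illustration only)."""
    start, end = month_range(year, month)
    return (
        f"WHERE {column} >= '{start.isoformat()}' "
        f"AND {column} < '{end.isoformat()}'"
    )


if __name__ == "__main__":
    print(where_clause(2022, 1))
```

Because the column is compared directly against two constants, without being wrapped in a function, an index starting with that column remains usable.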
Covering index -- This is an index that contains all the columns found anywhere in the SELECT. It may be better (faster) than having only the columns needed for WHERE or WHERE + GROUP BY + ORDER BY. It depends on a lot of variables.
Order of columns in an index (or PK): The leftmost column(s) have priority. That is the order of the index rows on disk. PK(id, date) is useful if looking up by id (in the WHERE), but not if you are just searching by date.
Sargable -- Hiding a column in a function call disables the use of an index. That is, MONTH(date) cannot use INDEX(date).
Blogs -- Index Cookbook and Partition
Test plan
I recommend you time all your queries against a variety of Create Tables.
For the WHERE clause:
The order of ANDs does not matter.
When using IN, a single value is equivalent to = and optimizes better. Multiple values may optimize more poorly. As Bill hints at, when the IN list contains all the options, you should eliminate the clause since the Optimizer is not smart enough to do so itself. So, be sure to test with one and/or many items, so as to be realistic to your app.
For the table
Try Partition BY year + Subpartition by month.
Try Partition by a column that is the combination of year and month.
Try without partitioning.
For indexes
Order of the columns (in a composite index) does matter, so try different orderings.
When partitioning, be sure to tack onto the end of the PK the partition key(s).
A partitioned table needs different indexes than a non-partitioned table. That is, what works well for one may work poorly for the other.
Simply use something like this pattern to test various layouts:
CREATE TABLE (( a new layout with or without partitioning and with indexes ))
INSERT INTO test_table SELECT ... FROM real_table;
Change the "..." to adapt to any extra/missing columns in test_table
SELECT ...
Run various 'real' queries
Run each query twice (caching sometimes messes with the timing)
Report the results -- If you provide sufficient info (CREATE TABLE and SELECT), I may have suggestions on further speeding up the test (whether it is partitioned or not).
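The "run each query twice" step can be wrapped in a small timing helper. This is a rough Python sketch, not tied to any particular MySQL client: time_query is a hypothetical name, and run_query stands for whatever callable executes one of your real queries and fetches all of its rows.

```python
import time
from typing import Callable, List


def time_query(run_query: Callable[[], object], repeats: int = 2) -> List[float]:
    """Call run_query `repeats` times and return elapsed seconds per run."""
    timings = []
    for _ in range(repeats):
        started = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - started)
    return timings


if __name__ == "__main__":
    # Stand-in for a real query; replace with a call into your MySQL client.
    def fake_query() -> int:
        return sum(range(100_000))

    first, second = time_query(fake_query)
    print(f"first run: {first:.4f}s, second run: {second:.4f}s")
```

Keeping the first and second timings separate makes it easier to see how much of the difference between layouts is due to caching.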
I have created a table in MYSQL using following syntax:
CREATE TABLE `demo` (
`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'ID',
`date` datetime NOT NULL COMMENT 'date',
`desc` enum('error','audit','info') NOT NULL,
PRIMARY KEY (`id`,`date`)
)
PARTITION BY RANGE (MONTH(`date`))
(
PARTITION JAN VALUES LESS THAN (2),
PARTITION FEB VALUES LESS THAN (3),
PARTITION MAR VALUES LESS THAN (4),
PARTITION APR VALUES LESS THAN (5),
PARTITION MAY VALUES LESS THAN (6),
PARTITION JUN VALUES LESS THAN (7),
PARTITION JUL VALUES LESS THAN (8),
PARTITION AUG VALUES LESS THAN (9),
PARTITION SEP VALUES LESS THAN (10),
PARTITION OCT VALUES LESS THAN (11),
PARTITION NOV VALUES LESS THAN (12),
PARTITION `DEC` VALUES LESS THAN (MAXVALUE)
);
Here id and date form the combined primary key, and I have used date as the partitioning column. I am making the partitions based on the month in the date.
The table is created successfully and the data is getting inserted properly into it as per the partitions.
What will be the effect on the performance if I fire a query which needs to fetch records across multiple partitions?
Consider following query:
SELECT * FROM `demo` WHERE `date` BETWEEN '2015-02-01 00:00:00' AND '2015-05-31 00:00:00';
The query will need to look at ALL the partitions. The optimizer is not smart enough to understand the basic principles of date ranges when they are "wrapped" by the MONTH() function.
You can see this by doing EXPLAIN PARTITIONS SELECT ...;.
Even if it were smart enough to touch only 4 partitions, you would gain no performance benefit for that SELECT. You may as well get rid of partitions and add an index on date.
Since this table is called demo, I suspect it is not the final version. If you would like to talk about whether PARTITIONing is useful for your application, let's see the real schema and the important queries.
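One way to see why a date range is hard to map onto MONTH() partitions is that MONTH() wraps around at each year boundary, while a day count keeps increasing. The Python sketch below only illustrates that arithmetic; it does not show how the MySQL optimizer is implemented. Here d.month stands in for MONTH(date) and d.toordinal() stands in for TO_DAYS(date) (the absolute numbers differ from MySQL's, only the ordering matters).

```python
from datetime import date, timedelta


def is_non_decreasing(values):
    """True if each value is >= the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Four consecutive days spanning a year boundary.
days = [date(2014, 12, 30) + timedelta(days=n) for n in range(4)]

months = [d.month for d in days]             # stand-in for MONTH(date)
day_numbers = [d.toordinal() for d in days]  # stand-in for TO_DAYS(date)

if __name__ == "__main__":
    print(months, is_non_decreasing(months))
    print(is_non_decreasing(day_numbers))
```

Since the month values drop from 12 back to 1 while the dates keep increasing, a range of dates does not correspond to a single range of MONTH() values, whereas it does correspond to a single range of day numbers.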
I am exploring ways of partitioning a MySQL table by year and month. Can you please analyze my table creation below and see if this method of partitioning would end up putting data by month and year in these sub partitions? I'm using MySQL 5.5 and I can't use
SELECT * FROM points_log PARTITION (p0_p0sp0);
to validate if the partitioning is working. If there is a way to validate this in MySQL 5.5 please comment. I appreciate your feedback and criticisms on this table partitioning.
Here is my table creation:
CREATE TABLE `points_log` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`nick` char(25) NOT NULL,
`amount` decimal(7,4) NOT NULL,
`stream_online` tinyint(1) NOT NULL,
`modification_type` tinyint(3) unsigned NOT NULL,
`dt` datetime NOT NULL,
PRIMARY KEY (`id`,`dt`,`nick`),
KEY `nick_idx` (`nick`),
KEY `amount_idx` (`amount`),
KEY `modification_type_idx` (`modification_type`),
KEY `dt_idx` (`dt`),
KEY `stream_online_idx` (`stream_online`)
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=latin1
PARTITION BY RANGE( YEAR(dt) )
SUBPARTITION BY HASH( MONTH(dt) )
SUBPARTITIONS 12 (
PARTITION p0 VALUES LESS THAN (2014),
PARTITION p1 VALUES LESS THAN (2015),
PARTITION p2 VALUES LESS THAN (2016),
PARTITION p3 VALUES LESS THAN (2017),
PARTITION p4 VALUES LESS THAN (2018),
PARTITION p5 VALUES LESS THAN (2019),
PARTITION p6 VALUES LESS THAN (2020),
PARTITION p7 VALUES LESS THAN MAXVALUE
);
SUBPARTITIONs are probably useless. (That is, I have yet to find any advantage to their use. That especially applies to performance.)
Don't split the date; keep it as a single field.
Use BY RANGE(TO_DAYS(dt)) VALUES LESS THAN (TO_DAYS('2015-02-01'))
BY HASH is probably totally useless for performance.
WHERE dt BETWEEN .. AND .. cannot do partition pruning in the structure you have.
Do not use more than about 50 partitions (for performance reasons).
Do not create more than one 'future' partition; build them as needed. (This is a minor performance improvement.)
Do not use CHAR for variable length fields. Use VARCHAR.
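Writing the VALUES LESS THAN (TO_DAYS(...)) clauses by hand is error-prone, so they can be generated. This is a minimal Python sketch under a few assumptions: one partition per calendar month, a pYYYYMM naming scheme chosen only for illustration, and hypothetical helper names next_month and monthly_partitions. It only builds the text of the clauses; it does not talk to MySQL.

```python
from datetime import date


def next_month(d: date) -> date:
    """First day of the month after d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def monthly_partitions(first: date, count: int) -> list[str]:
    """Build `count` monthly PARTITION clauses starting with the month of `first`."""
    clauses = []
    start = date(first.year, first.month, 1)
    for _ in range(count):
        end = next_month(start)
        # Each partition holds rows with dt strictly before the next month.
        clauses.append(
            f"PARTITION p{start:%Y%m} VALUES LESS THAN "
            f"(TO_DAYS('{end.isoformat()}'))"
        )
        start = end
    return clauses


if __name__ == "__main__":
    print(",\n".join(monthly_partitions(date(2015, 1, 1), 3)))
```

Generating only a few months at a time fits the advice above about keeping the partition count low and adding future partitions as needed.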
I have a table with 30M rows and I want to partition it by date.
mysql > SHOW CREATE TABLE `parameters`
CREATE TABLE `parameters` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`add_time` datetime DEFAULT NULL,
...(etc)
) ENGINE=MyISAM AUTO_INCREMENT=28929477 DEFAULT CHARSET=utf8
The table stores data for the last 5 years and the row count is increasing dramatically. I want to partition it by year (2009, 2010, 2011, 2012, 2013).
ALTER TABLE parameters DROP PRIMARY KEY, ADD INDEX(id);
ALTER TABLE parameters PARTITION BY RANGE (TO_DAYS(id)) (
PARTITION y2009 VALUES LESS THAN (TO_DAYS('2010-01-01')),
PARTITION y2010 VALUES LESS THAN (TO_DAYS('2011-01-01')),
PARTITION y2011 VALUES LESS THAN (TO_DAYS('2012-03-01')),
PARTITION y2012 VALUES LESS THAN (TO_DAYS('2013-01-01')),
PARTITION y2013 VALUES LESS THAN MAXVALUE
);
Everything works on the dev server, but there is a problem on the production server.
The problem: almost all of the rows were moved to the first partition (y2009), even though the data is uniformly distributed across the years. Physically there is a large y2009.myd file in the DATA folder, and the other partitions are much smaller.
I also tried to reorganize the first partition in order to exclude NULL dates:
alter table raw
reorganize partition y2012 into (
PARTITION y0 VALUES LESS THAN (0),
PARTITION y2012 VALUES LESS THAN (TO_DAYS('2013-01-01')),
);
P.S.: production and dev servers have same version of MySQL 5.1.37
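You need to use the date column in the RANGE expression, not id.
I have changed TO_DAYS(id) to TO_DAYS(add_time). TO_DAYS(id) most likely evaluates to NULL for ids that are not valid dates, and RANGE partitioning puts NULL values into the lowest partition, which would explain why almost all rows ended up in y2009.
Try the following: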
ALTER TABLE parameters PARTITION BY RANGE (TO_DAYS(add_time)) (
PARTITION y0 VALUES LESS THAN (TO_DAYS('2009-01-01')),
PARTITION y2009 VALUES LESS THAN (TO_DAYS('2010-01-01')),
PARTITION y2010 VALUES LESS THAN (TO_DAYS('2011-01-01')),
PARTITION y2011 VALUES LESS THAN (TO_DAYS('2012-03-01')),
PARTITION y2012 VALUES LESS THAN (TO_DAYS('2013-01-01')),
PARTITION y2013 VALUES LESS THAN MAXVALUE
);
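To check which partition a given add_time should land in under this layout, the VALUES LESS THAN rule can be modelled outside the database. The Python sketch below copies the upper bounds from the statement above as-is; partition_for is a hypothetical helper, and date.toordinal() merely stands in for TO_DAYS() since only the ordering matters here.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional

# Upper bounds copied from the ALTER TABLE above; y2013 is the MAXVALUE partition.
BOUNDS = [
    ("y0", date(2009, 1, 1)),
    ("y2009", date(2010, 1, 1)),
    ("y2010", date(2011, 1, 1)),
    ("y2011", date(2012, 3, 1)),
    ("y2012", date(2013, 1, 1)),
]
CATCH_ALL = "y2013"


def partition_for(add_time: Optional[date]) -> str:
    """Pick the partition a row would land in under VALUES LESS THAN rules."""
    if add_time is None:
        # RANGE partitioning places NULL in the lowest partition.
        return BOUNDS[0][0]
    limits = [limit.toordinal() for _, limit in BOUNDS]
    # Index of the first partition whose upper bound is strictly greater.
    idx = bisect_right(limits, add_time.toordinal())
    return BOUNDS[idx][0] if idx < len(BOUNDS) else CATCH_ALL


if __name__ == "__main__":
    for sample in (None, date(2009, 6, 1), date(2010, 1, 1), date(2013, 5, 5)):
        print(sample, "->", partition_for(sample))
```

A value equal to a bound goes to the next partition, because each bound is exclusive.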
I'm trying to alter an existing table to add year and week subpartitions, like so:
CREATE TABLE test_table(
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
dtime DATETIME NOT NULL);
ALTER TABLE test_table
PARTITION BY RANGE ( YEAR(dtime) )
SUBPARTITION BY RANGE( WEEK(dtime) ) (
PARTITION y0 VALUES LESS THAN (2013) (
SUBPARTITION w0 VALUES LESS THAN (2),
...
SUBPARTITION w52 VALUES LESS THAN (54)
),
PARTITION y1 VALUES LESS THAN (2014) (
SUBPARTITION w0 VALUES LESS THAN (2),
...
SUBPARTITION w52 VALUES LESS THAN (54)
),
PARTITION y2 VALUES LESS THAN (2015) (
SUBPARTITION w0 VALUES LESS THAN (2),
...
SUBPARTITION w52 VALUES LESS THAN (54)
),
PARTITION y3 VALUES LESS THAN (2016) (
SUBPARTITION w0 VALUES LESS THAN (2),
...
SUBPARTITION w52 VALUES LESS THAN (54)
)
);
However, this gives me the vague and unhelpful response of:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'RANGE( WEEK(DTIME) ) (
PARTITION y0 VALUES LESS THAN (2013) (
SUBPARTITION ' at line 3
I've checked the docs: MySQL ALTER TABLE Partition operations and MySQL RANGE and LIST Partitions. However, neither of these describes how to alter a table to create subpartitions.
The second part of my question is for feedback on this partitioning scheme. The data that will go into this is sensor readings that are recorded every minute, and the most common query operation is for data in the last week. I think this should greatly speed up my queries, since a "WHERE dtime > date" is very common, without having to manually move data out of the table periodically into archive tables.
If you want to add a partition BY LIST to an already existing table, drop the primary key and create a composite primary key:
alter table test_table drop primary key, add primary key (id,<some other key>);
alter table test_table partition by list(<some other key>) (
partition p0 values IN (1),
partition p1 values IN (2),
partition p2 values IN (3),
partition p3 values IN (4),
partition p4 values IN (5),
partition p5 values IN (6),
partition p6 values IN (7),
partition p7 values IN (8),
partition p8 values IN (9),
partition p9 values IN (10)
);
After further investigation, I have discovered several problems with this approach.
It is impossible to RANGE partition directly on a DATETIME value (which dtime in the example is); the partitioning expression must evaluate to an integer, for example through TO_DAYS(dtime) or YEAR(dtime). http://dev.mysql.com/doc/refman/5.1/en/partitioning-limitations-functions.html
The table I was partitioning had a primary key on an auto-increment id column, and you cannot partition on a column that is not part of the primary key.
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function
See also http://blog.mclaughlinsoftware.com/2011/05/09/mysqls-real-partition-key/
http://dev.mysql.com/doc/refman/5.1/en/partitioning-limitations-partitioning-keys-unique-keys.html
WEEK() is not allowed as a partitioning function. http://dev.mysql.com/doc/refman/5.1/en/partitioning-limitations-functions.html
From what I now know, if you have a UNIQUE AUTO_INCREMENT id as the only primary key column, it is impossible to partition on anything except that value, unless the primary key is extended to include the partitioning column.
My queries all use the dtime column in the WHERE conditions, so it seems that unless I can partition somehow on dtime still, there is no benefit to partitioning this table (from a performance perspective).