I have a table whose primary key is action_time, a DATETIME column.
I am trying to partition it:
ALTER TABLE foo PARTITION BY RANGE (MONTH(action_time))
(
PARTITION p01 VALUES LESS THAN (02) ,
PARTITION p02 VALUES LESS THAN (03) ,
PARTITION p03 VALUES LESS THAN (04) ,
PARTITION p04 VALUES LESS THAN (05) ,
PARTITION p05 VALUES LESS THAN (06) ,
PARTITION p06 VALUES LESS THAN (07) ,
PARTITION p07 VALUES LESS THAN (08) ,
PARTITION p08 VALUES LESS THAN (09) ,
PARTITION p09 VALUES LESS THAN (10) ,
PARTITION p10 VALUES LESS THAN (11) ,
PARTITION p11 VALUES LESS THAN (12) ,
PARTITION p12 VALUES LESS THAN (13) ,
PARTITION pmaxval VALUES LESS THAN MAXVALUE
);
In phpMyAdmin I can see the partitions with rows in them,
but when I execute
explain partitions select * from foo where action_time between '2017-01-01 20:34:08' and '2017-01-21 20:34:08';
or
explain partitions select * from foo where action_time > '2017-01-01 20:34:08' && action_time < '2017-01-21 20:34:08'
it hits all partitions (p01,p02,p03,p04,p05,p06,p07,p08,p09,p10,p11,p12,pmaxval)
What am I doing wrong?
I also tried it this way, with the same result:
ALTER TABLE foo
PARTITION BY RANGE( YEAR(action_time) )
SUBPARTITION BY HASH( MONTH(action_time) )
SUBPARTITIONS 12 (
PARTITION p2015 VALUES LESS THAN (2016),
PARTITION p2016 VALUES LESS THAN (2017),
PARTITION p2017 VALUES LESS THAN (2018),
PARTITION p2018 VALUES LESS THAN (2019),
PARTITION p2019 VALUES LESS THAN (2020),
PARTITION p2020 VALUES LESS THAN (2021),
PARTITION p2021 VALUES LESS THAN (2022),
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026),
PARTITION p2026 VALUES LESS THAN (2027),
PARTITION p2027 VALUES LESS THAN (2028),
PARTITION p2028 VALUES LESS THAN (2029),
PARTITION p2029 VALUES LESS THAN (2030),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
I need to partition the table by year and month to improve SELECT time. When I select between dates, it shouldn't search the whole table; it should search only the relevant partitions. How can I do this?
You have found yet another reason why PARTITIONing is virtually useless.
Suppose you had specified BETWEEN '2015-11-05' AND '2017-02-02'. Which partitions would it need to hit? All of them.
Suppose you had specified BETWEEN '2015-11-05' AND '2016-02-02'. Which partitions would it need to hit? 4, but it is not smart enough to wrap around. So it will (I think) hit all.
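The wrap-around can be made concrete with a small Python sketch. This is not MySQL code, and the helper name months_touched is made up for illustration; it only lists the MONTH() values that a date range covers.

```python
from datetime import date

def months_touched(start: date, end: date) -> list[int]:
    """Return the MONTH() value of every calendar month from start to end, in order."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(month)
        # Step to the next calendar month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months

# Nov 2015 .. Feb 2016: the month numbers wrap from 12 back to 1.
print(months_touched(date(2015, 11, 5), date(2016, 2, 2)))
# Nov 2015 .. Feb 2017: every month number occurs at least once.
print(sorted(set(months_touched(date(2015, 11, 5), date(2017, 2, 2)))))
```

Because the month numbers are not monotonic in the date, the two endpoints of the range do not bound the set of partitions in between.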
There are a limited number of patterns (MONTH() is not one of them) where partitioning will "get it right".
To make BY RANGE( some date ) work, you are limited to BY RANGE(TO_DAYS(date)) (and a few others). But then you have to create a new partition every month (or however often). And, optionally, DROP the oldest partition.
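Writing one partition per month by hand is tedious, so here is a minimal Python sketch that generates the DDL text. The pYYYYMM naming scheme, the helper names, and the table and column names passed in are assumptions for illustration; the generated statement should be reviewed before running it.

```python
from datetime import date

def monthly_partition_clauses(first: date, count: int) -> list[str]:
    """Build `count` PARTITION clauses, one per month, starting at first's month.

    Partition pYYYYMM holds that month, so its exclusive upper bound is the
    first day of the following month.
    """
    clauses = []
    year, month = first.year, first.month
    for _ in range(count):
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        clauses.append(
            f"PARTITION p{year}{month:02d} VALUES LESS THAN "
            f"(TO_DAYS('{next_year}-{next_month:02d}-01'))"
        )
        year, month = next_year, next_month
    return clauses

def alter_statement(table: str, column: str, first: date, count: int) -> str:
    """Assemble the full ALTER TABLE, ending with a catch-all MAXVALUE partition."""
    clauses = monthly_partition_clauses(first, count)
    clauses.append("PARTITION pmaxval VALUES LESS THAN MAXVALUE")
    body = ",\n  ".join(clauses)
    return f"ALTER TABLE {table} PARTITION BY RANGE (TO_DAYS({column})) (\n  {body}\n);"

print(alter_statement("foo", "action_time", date(2017, 1, 1), 3))
```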
Now for another reason why your plan is probably useless. What benefit do you expect to get from partitioning? Perhaps performance? It probably won't give you any performance benefit. Let's see your queries so I can explain why.
A simple
SELECT ...
WHERE date >= '...'
AND date < '...' + INTERVAL 20 DAY
will work just as fast with INDEX(date) as with partitioning. Possibly even faster.
If there is something else in the WHERE, then that changes everything.
My PARTITION blog
Why PARTITIONing does not speed up simple queries
Let's say you have a simple SELECT that has a very good index, such as you specify the exact value for the PRIMARY KEY. (This is called a "point query".)
Case 1: Non-partitioned table. Indexes use a BTree structure. Locating a specific record in a million rows requires drilling down the BTree, which will be about 3 levels deep. For a billion rows, it might be 5 levels.
Case 2: Partitioned table. Partitioning splits the table into multiple sub-tables, each of which has its own indexes. Locating a specific row will first have to locate the particular partition (sub-table), then drill down the shallower BTree for that partition.
Think of it as (perhaps) removing one level from the BTree, but adding the extra effort of reaching for the partition. The performance difference is minuscule. And it is not clear whether you gain or lose. (Caching, data structures, etc., make this analysis complex.)
Conclusion: For Point Queries, Partitioning never helps, assuming you have a suitable index on the non-partitioned equivalent.
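The BTree depth figures can be checked with a rough back-of-the-envelope sketch in Python. The fanout of 100 entries per block and the 12 equal-sized partitions are assumptions chosen for illustration; real fanout depends on row and key sizes.

```python
def btree_depth(rows: int, fanout: int = 100) -> int:
    """Smallest number of levels such that fanout**levels >= rows."""
    depth, capacity = 0, 1
    while capacity < rows:
        capacity *= fanout
        depth += 1
    return depth

for rows in (1_000_000, 1_000_000_000):
    whole = btree_depth(rows)
    per_partition = btree_depth(rows // 12)  # assumes 12 equal partitions
    print(f"{rows:>13,} rows: {whole} levels unpartitioned, {per_partition} per partition")
```

Under these assumptions, partitioning saves at most one level, and for the smaller table it saves none.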
Your particular query is a simple "range" query: WHERE action_time BETWEEN ... AND ...
The optimal table structure (including partitioning and indexing) is
No partitions
INDEX(action_time)
Another note: If multiple partitions are involved, the SELECT will fetch rows (if any) from each partition (after pruning), put them together, and then might have to sort the results (depending on other clauses in the SELECT). Alas there is no parallelism in the execution of the query, so the partitioned variant is more involved, hence, probably slower.
MONTH() is not supported for partition pruning. Currently, only four functions are supported by MySQL 5.7/8.0.
In MySQL 8.0, partition pruning is supported for the TO_DAYS(),
TO_SECONDS(), YEAR(), and UNIX_TIMESTAMP() functions. See Chapter 5,
Partition Pruning, for more information.
You have to use TO_DAYS() instead, e.g.:
ALTER TABLE foo PARTITION BY RANGE (TO_DAYS(action_time))
(
PARTITION p01 VALUES LESS THAN (TO_DAYS('2017-02-01')) ,
PARTITION p02 VALUES LESS THAN (TO_DAYS('2017-03-01')) ,
PARTITION pmaxval VALUES LESS THAN MAXVALUE
);
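To illustrate why a monotonic day count can be pruned, here is a simplified Python model of the lookup for the three partitions above. It is not how the server is implemented. Python's date.toordinal() stands in for TO_DAYS() (both are day counts that only grow with the date), and the helper name is hypothetical. A real server may list extra partitions that this model does not.

```python
from bisect import bisect_right
from datetime import date

# Exclusive upper bounds of the finite partitions; pmaxval takes everything above.
NAMES = ["p01", "p02", "pmaxval"]
BOUNDS = [date(2017, 2, 1).toordinal(), date(2017, 3, 1).toordinal()]

def partitions_for_range(lo: date, hi: date) -> list[str]:
    """Partitions that can hold rows with lo <= action_time <= hi."""
    # bisect_right counts the bounds <= value, which is the partition index.
    first = bisect_right(BOUNDS, lo.toordinal())
    last = bisect_right(BOUNDS, hi.toordinal())
    return NAMES[first:last + 1]

print(partitions_for_range(date(2017, 1, 1), date(2017, 1, 21)))
print(partitions_for_range(date(2017, 1, 15), date(2017, 2, 10)))
```

Because the day count is monotonic, the two endpoints of the WHERE range are enough to bound the partitions that need to be read.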
Related
For my Table, I need to partition based on created timestamp field by Month.
I am evaluating the following two approaches:
RANGE
ALTER TABLE my_table
PARTITION BY RANGE ( MONTH(created) ) (
PARTITION p1 VALUES LESS THAN (2),
PARTITION p2 VALUES LESS THAN (3),
PARTITION p3 VALUES LESS THAN (4),
PARTITION p4 VALUES LESS THAN (5),
PARTITION p5 VALUES LESS THAN (6),
PARTITION p6 VALUES LESS THAN (7),
PARTITION p7 VALUES LESS THAN (8),
PARTITION p8 VALUES LESS THAN (9),
PARTITION p9 VALUES LESS THAN (10),
PARTITION p10 VALUES LESS THAN (11),
PARTITION p11 VALUES LESS THAN (12),
PARTITION p12 VALUES LESS THAN (13)
);
HASH
ALTER TABLE my_table
PARTITION BY HASH((YEAR(created) * 100) + MONTH(created))
PARTITIONS 13;
Use case:
My use case is that I want to archive by month, once a month is more than 1 year old. For example, if the current month is July 2020, then the partition corresponding to July 2019 would be archived. The secondary use case is partition pruning to improve performance, as most of the queries include this timestamp column.
Why 13 partitions in the HASH one?
As stated above, I will be archiving the 13th month from current month.
For this use case, which approach would suit better? As far as I understand, when I define it by RANGE, I have direct control over which data goes into which partition, and in the case of HASH, it would be defined by the MySQL HASH function (mod), which will make it difficult to identify the "over the year" partition and archive it specifically.
Or is there any totally different approach for this use case?
PARTITION BY HASH is useless. Period.
PARTITION BY RANGE can be useful if you want to purge "old" data. Details: http://mysql.rjweb.org/doc.php/partitionmaint
What will you do next January?
Show me your SELECTs and SHOW CREATE TABLE. I'll help you optimize the INDEXes for a non-partitioned version. It will run as fast as, or faster than, you expect from your partitioned schema.
More
BY HASH is useless when you have a "range". The Optimizer will always pick all partitions, thereby slowing down the query. (This flaw applies to most partitioning methods.)
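For the HASH expression in the question, a short Python sketch shows where each month of a 13-month window would land. It assumes plain (non-LINEAR) HASH partitioning, where the partition number is the expression modulo the number of partitions; the helper names are made up for illustration.

```python
def hash_partition(year: int, month: int, partitions: int = 13) -> int:
    """Partition number for HASH((YEAR(created) * 100) + MONTH(created))."""
    return (year * 100 + month) % partitions

def window(year: int, month: int, count: int) -> list[tuple[int, int]]:
    """`count` consecutive (year, month) pairs starting at the given month."""
    out = []
    for _ in range(count):
        out.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return out

placement: dict[int, list[str]] = {}
for y, m in window(2019, 7, 13):  # July 2019 .. July 2020
    placement.setdefault(hash_partition(y, m), []).append(f"{y}-{m:02d}")

for number in sorted(placement):
    print(number, placement[number])
```

The jump from 201912 to 202001 breaks the one-month-per-partition pattern, so some partitions hold two months and others stay empty, which makes archiving exactly one month by partition unreliable.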
If you always use WHERE month=constant, you may as well have the column month early in indexes. MONTH(date_col) = constant is a different matter. (I have not thought through all the implications. Let's see your queries.)
As a general rule, you can build an index on a non-partitioned table that will provide the same functionality as partition pruning. (The link lists only 4 exceptions to the rule. I've spent a decade looking for more use cases.) Corollary: When switching to/from partitioning, all the indexes, including the PRIMARY KEY, should be redesigned.
One of my use cases is to use "transportable tablespaces" to archive one whole partition. You might be able to use that with BY HASH; it's rather clear how to do it with BY RANGE.
The main focus of my blog is to explain DROPping (or 'transporting') the oldest of 13 monthly partitions and using REORGANIZE to get a new "month" (or other time range).
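A monthly rollover of that kind could be scripted roughly as follows. This is a sketch only: it assumes the table is already partitioned BY RANGE (TO_DAYS(...)) with partitions named pYYYYMM plus a catch-all named pfuture, and all of those names are hypothetical.

```python
from datetime import date

def add_months(year: int, month: int, delta: int) -> tuple[int, int]:
    """Shift a (year, month) pair by delta months (delta may be negative)."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1

def rollover_statements(table: str, today: date, keep: int = 13) -> list[str]:
    """DDL that adds next month's partition and drops the one `keep` months before it."""
    new_y, new_m = add_months(today.year, today.month, 1)
    # The new partition ends where the month after it begins.
    end_y, end_m = add_months(today.year, today.month, 2)
    old_y, old_m = add_months(new_y, new_m, -keep)
    return [
        f"ALTER TABLE {table} REORGANIZE PARTITION pfuture INTO ("
        f"PARTITION p{new_y}{new_m:02d} VALUES LESS THAN "
        f"(TO_DAYS('{end_y}-{end_m:02d}-01')), "
        f"PARTITION pfuture VALUES LESS THAN MAXVALUE);",
        f"ALTER TABLE {table} DROP PARTITION p{old_y}{old_m:02d};",
    ]

for statement in rollover_statements("my_table", date(2020, 7, 15)):
    print(statement)
```

To archive rather than discard, the old partition's rows would have to be copied or transported out before the DROP statement is run.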
I couldn't find an example like mine, so here's the thing:
I have a big data set that I need to aggregate on top of.
We're talking about ~500M rows with a date field ranging from 2 years ago until now.
My first instinct was to partition the table by this field (creating a partition on the date field), which leaves roughly 20M rows per partition.
Then I have indexes on the other fields I will aggregate/group by.
Here's my table definition (simplified for brevity's sake):
create table t1(
date_field datetime not null,
additional_id int not null,
category_id int not null,
value_field1 double,
value_field2 double,
primary key(additional_id,date_field)
)
ENGINE=InnoDB
PARTITION BY RANGE(YEAR(date_field)*100 + MONTH(date_field)) (
PARTITION p_201411 VALUES LESS THAN (201411),
PARTITION p_201412 VALUES LESS THAN (201412),
#all the partitions until the current month...
PARTITION p_201610 VALUES LESS THAN (201610),
PARTITION p_201611 VALUES LESS THAN (201611),
PARTITION p_catchall VALUES LESS THAN MAXVALUE );
If I execute a query that gets a date directly, only the partition for the month is used, based on the output of explain partitions on top of a query such as the following one:
select value_field1 from t1 where additional_id=x and date_field='2014-11-05'
However, if I use a date range (even if inside the same partition), all partitions are scanned
select value_field1 from t1 where additional_id=x and date_field> '2014-11-05' and date_field <'2014-11-10'
(Same result if I use between).
What am I missing here? Is this really the right way to partition this table?
Thanks in advance
Short answer: Do not use complex expressions for PARTITION BY RANGE.
Long answer: (Aside from criticizing the implementation of BY RANGE with range queries.)
Instead, do this:
PARTITION BY RANGE (TO_DAYS(date_field)) (
PARTITION p_201411 VALUES LESS THAN (TO_DAYS('2014-11-01')),
...
PARTITION p_catchall VALUES LESS THAN MAXVALUE ); -- unchanged
Newer versions of MySQL have slightly more friendly expressions you can use.
If this is your typical query:
additional_id=x and date_field> '2014-11-05'
and date_field <'2014-11-10'
then partitioning is no faster than the equivalent non-partitioned table. You even have the perfect index for the non-partitioned version.
If, on the other hand, you are DROPping old partitions when they 'expire', the PARTITIONing is excellent.
25 partitions is good.
More discussion.
A side note: additional_id int is limited to 2 billion, so you are 1/4 of the way to overflowing. INT UNSIGNED would get you to 4 billion; you might consider an ALTER. (Of course, I don't know whether additional_id is unique in this table; so maybe it is not an issue.)
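The headroom arithmetic behind that side note, as a quick Python check. The 500M figure is the row count from the question, and whether additional_id really grows with the row count is an assumption.

```python
INT_SIGNED_MAX = 2**31 - 1    # upper limit of a signed 4-byte INT
INT_UNSIGNED_MAX = 2**32 - 1  # upper limit of INT UNSIGNED
rows = 500_000_000            # the ~500M rows mentioned in the question

print(f"signed INT:   {rows / INT_SIGNED_MAX:.0%} of the range used")
print(f"INT UNSIGNED: {rows / INT_UNSIGNED_MAX:.0%} of the range used")
```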
MySQL throwing error while creating partitions on table.
Error Code : 1486
Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed.
I have tried following query :
alter table test.tbl1
partition by range(unix_timestamp(sys_time))
(
PARTITION p20151001 VALUES LESS THAN (unix_timestamp('2015-10-01')),
PARTITION p20151101 VALUES LESS THAN (unix_timestamp('2015-11-01')),
PARTITION p20151201 VALUES LESS THAN (unix_timestamp('2015-12-01')),
PARTITION p20160101 VALUES LESS THAN (unix_timestamp('2016-01-01')),
PARTITION p20160201 VALUES LESS THAN (unix_timestamp('2016-02-01')),
PARTITION p20160301 VALUES LESS THAN (unix_timestamp('2016-03-01'))
);
How can I get around this problem?
Thanks in Advance
Reading here, it is possible that you are using MySQL 5.1:
https://dev.mysql.com/tech-resources/articles/mysql_55_partitioning.html
Another pain point in MySQL 5.1 is the handling of date columns. You
can't use them directly, but you need to convert such columns using
either YEAR or TO_DAYS
If your column sys_time is a DATETIME, you don't need UNIX_TIMESTAMP to partition it; you just need TO_DAYS, since you're not partitioning by year:
alter table test.tbl1
partition by range (TO_DAYS(sys_time))
(
PARTITION p20151001 VALUES LESS THAN (TO_DAYS('2015-10-01')),
PARTITION p20151101 VALUES LESS THAN (TO_DAYS('2015-11-01')),
PARTITION p20151201 VALUES LESS THAN (TO_DAYS('2015-12-01')),
PARTITION p20160101 VALUES LESS THAN (TO_DAYS('2016-01-01')),
PARTITION p20160201 VALUES LESS THAN (TO_DAYS('2016-02-01')),
PARTITION p20160301 VALUES LESS THAN (TO_DAYS('2016-03-01'))
);
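As for why the original statement was rejected as "timezone-dependent": a DATETIME carries no time zone, so turning it into a Unix timestamp gives a different number depending on the session time zone. The Python sketch below illustrates this outside MySQL; the helper name and the two offsets are chosen only for the example.

```python
from datetime import datetime, timedelta, timezone

def unix_timestamp(naive: datetime, utc_offset_hours: int) -> int:
    """Epoch seconds for a zone-less datetime interpreted at the given UTC offset."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return int(naive.replace(tzinfo=zone).timestamp())

boundary = datetime(2015, 10, 1)  # like the DATETIME literal '2015-10-01'
at_utc = unix_timestamp(boundary, 0)
at_plus_two = unix_timestamp(boundary, 2)
print(at_utc, at_plus_two, at_utc - at_plus_two)
```

A partition boundary that moves with the session time zone cannot define a stable partition, whereas a day count such as TO_DAYS() does not depend on the time zone.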
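If sys_time is a TIMESTAMP, then UNIX_TIMESTAMP(sys_time) is the only expression MySQL permits on it for RANGE partitioning (a bare TIMESTAMP column is not an integer expression), so keep the conversion in the range parameter:
alter table test.tbl1
partition by range(unix_timestamp(sys_time))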
(
PARTITION p20151001 VALUES LESS THAN (unix_timestamp('2015-10-01')),
PARTITION p20151101 VALUES LESS THAN (unix_timestamp('2015-11-01')),
PARTITION p20151201 VALUES LESS THAN (unix_timestamp('2015-12-01')),
PARTITION p20160101 VALUES LESS THAN (unix_timestamp('2016-01-01')),
PARTITION p20160201 VALUES LESS THAN (unix_timestamp('2016-02-01')),
PARTITION p20160301 VALUES LESS THAN (unix_timestamp('2016-03-01'))
);
Can someone tell me the pros and cons of HASH PARTITION vs RANGE PARTITION on a DATETIME column?
Let's consider that we have a POS table with 20 million records and want to create partitions based on the transaction date's year, like
PARTITION BY HASH(YEAR(TRANSACTION_DATE)) PARTITIONS 4;
or
PARTITION BY RANGE(YEAR(TRANSACTION_DATE)) (
PARTITION p0 VALUES LESS THAN (2010),
PARTITION p1 VALUES LESS THAN (2012),
PARTITION p2 VALUES LESS THAN (2013),
PARTITION p4 VALUES LESS THAN MAXVALUE
);
to improve performance of queries with TRANSACTION_DATE BETWEEN '2013-03-01' AND '2013-09-29'
Which one better over the other? and why?
There are some significant differences. If you have a where clause that refers to a range of years, such as:
where year(transaction_date) between 2009 and 2011
then I don't think the hash partitioning will recognize this as hitting just one, two, or three partitions. The range partitioning should recognize this, reducing the I/O for such a query.
The more important difference has to do with managing the data. With range partitioning, once a partition has been created -- and the year has passed -- presumably the partition will not be touched again. That means that you only have to back up one partition, the current partition. And, next year, you'll only need to back up one partition.
A similar situation arises if you want to move data offline. Dropping a partition containing the oldest year of data is pretty easy, compared to deleting the rows one-by-one.
When the number of partitions is only four, these considerations may not make much of a difference. The key idea is that range partitioning assigns each row to a known partition. Hash partitioning also assigns each row to a partition, but which one is decided by a modulus rather than by a boundary you chose.
EDIT:
The particular optimization that reduces the reading of partitions is called "partition pruning". MySQL documents this pretty well here. In particular:
For tables that are partitioned by HASH or KEY, partition pruning is
also possible in cases in which the WHERE clause uses a simple =
relation against a column used in the partitioning expression.
It would appear that partition pruning for inequalities (and even IN) requires range partitioning.
I have a table named edr on mysql 5.1.6* version. I have partitioned the table using alter -
ALTER TABLE edr PARTITION BY RANGE (TO_DAYS(eventDate))
(
PARTITION apr25 VALUES LESS THAN (TO_DAYS('2014-04-26')),
PARTITION apr26_30 VALUES LESS THAN (TO_DAYS('2014-05-01')),
PARTITION may01_05 VALUES LESS THAN (TO_DAYS('2014-05-06')),
PARTITION may06_10 VALUES LESS THAN (TO_DAYS('2014-05-11')),
PARTITION may11_15 VALUES LESS THAN (TO_DAYS('2014-05-16')),
PARTITION may16_20 VALUES LESS THAN (TO_DAYS('2014-05-21')),
PARTITION may21_25 VALUES LESS THAN (TO_DAYS('2014-05-26')),
PARTITION may26_31 VALUES LESS THAN (TO_DAYS('2014-06-01')),
PARTITION june01_05 VALUES LESS THAN (TO_DAYS('2014-06-06')),
PARTITION june06_10 VALUES LESS THAN (TO_DAYS('2014-06-11')),
PARTITION june11_15 VALUES LESS THAN (TO_DAYS('2014-06-16')));
now when I am running any query for example:
explain partitions select count(*) from edr where eventdate > '2014-05-21';
it gives me output for partitions as - apr25, may21_25, may26_31, june01_05, june06_10, june11_15.
Here in partition apr25 there is no record for such where condition.
Please let me know whether anything is wrong in the above query or whether it is a partitioning problem.
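It is a MySQL bug, explained here. TO_DAYS() returns NULL for an invalid date, and RANGE partitioning stores NULL in the lowest partition, so the optimizer always includes the first partition.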
Try to create a first partition that contains values less than (0)
PARTITION unused VALUES LESS THAN (0);