I have searched a lot about automating MySQL partitioning,
but unfortunately nothing matches my problem.
I want to delete older partitions that are no longer needed and, at the same time, add partitions for upcoming data.
What I can do is drop an older partition every day and create new partitions using MySQL date functions such as MONTH(NOW() - INTERVAL 2 MONTH).
But that increases the cost of operation, because every night I need to recreate the partitions for new data.
I found that I can use partitioning by range, but all the examples are hardcoded, which suggests that I might need to repartition every time new data gets added.
Here is an example I found, though it is not very similar to my case:
ALTER TABLE t1 PARTITION BY
RANGE(TO_DAYS(FROM_UNIXTIME(transaction_date)))(
PARTITION JAN VALUES LESS THAN (TO_DAYS('2013-02-01')),
PARTITION FEB VALUES LESS THAN (TO_DAYS('2013-03-01')),
PARTITION MAR VALUES LESS THAN (TO_DAYS('2013-04-01')),
PARTITION APR VALUES LESS THAN (TO_DAYS('2013-05-01')),
PARTITION MAY VALUES LESS THAN (TO_DAYS('2013-06-01')),
PARTITION JUN VALUES LESS THAN (TO_DAYS('2013-07-01')),
PARTITION JUL VALUES LESS THAN (TO_DAYS('2013-08-01')),
PARTITION AUG VALUES LESS THAN (TO_DAYS('2013-09-01')),
PARTITION SEP VALUES LESS THAN (TO_DAYS('2013-10-01')),
PARTITION `OCT` VALUES LESS THAN (TO_DAYS('2013-11-01')),
PARTITION NOV VALUES LESS THAN (TO_DAYS('2013-12-01')),
PARTITION `DEC` VALUES LESS THAN (TO_DAYS('2014-01-01'))
);
Please suggest a proper way to do it.
There is no fully automated way -- You need to write code.
But first, let's fix an issue. Have another partition:
PARTITION future VALUES LESS THAN (MAXVALUE)
This will come in handy if you accidentally fail to roll the partitions some night.
And how about a bug: Your table essentially never has a full 12 months of data. Just after a sliding of the partitions, you will have only 11 months. Is that OK? If not, keep 13 months, not 12.
Now for some code to do the work, plus perhaps some more tips: http://mysql.rjweb.org/doc.php/partitionmaint
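As a rough illustration of the kind of code meant here, the following Python sketch builds the two statements for one monthly roll: drop the oldest month, then carve the next month out of the `future` partition. The partition naming scheme (`p201402`), the function names, the table name `t1`, and the choice of which month counts as "oldest" are all assumptions for illustration. The script only produces SQL strings; running them (from cron or a MySQL event) is left out.

```python
from datetime import date


def add_months(d: date, n: int) -> date:
    """Return the first day of the month n months after d's month."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


def roll_statements(table: str, today: date, keep_months: int = 13) -> list:
    """Build the SQL that drops the oldest month and splits next month out of `future`."""
    this_month = date(today.year, today.month, 1)
    oldest = add_months(this_month, -keep_months)   # assumed retention rule
    next_month = add_months(this_month, 1)
    after_next = add_months(this_month, 2)
    return [
        f"ALTER TABLE {table} DROP PARTITION p{oldest:%Y%m};",
        f"ALTER TABLE {table} REORGANIZE PARTITION future INTO ("
        f"PARTITION p{next_month:%Y%m} VALUES LESS THAN (TO_DAYS('{after_next:%Y-%m-%d}')), "
        f"PARTITION future VALUES LESS THAN (MAXVALUE));",
    ]


for stmt in roll_statements("t1", date(2014, 1, 15)):
    print(stmt)
```

Because `future` is normally empty when it is reorganized, the split should be cheap; if a roll was missed and rows have already landed in `future`, the same statement still works but has to move those rows.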
You can do it like this. This will automatically store the data in the corresponding partitions. Regarding the automation of truncating them, I too am exploring the creation of scheduled events.
PARTITION BY RANGE ( month(creationDate))
(PARTITION p0 VALUES LESS THAN (2) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (3) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (4) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (5) ENGINE = InnoDB,
PARTITION p4 VALUES LESS THAN (6) ENGINE = InnoDB,
PARTITION p5 VALUES LESS THAN (7) ENGINE = InnoDB,
PARTITION p6 VALUES LESS THAN (8) ENGINE = InnoDB,
PARTITION p7 VALUES LESS THAN (9) ENGINE = InnoDB,
PARTITION p8 VALUES LESS THAN (10) ENGINE = InnoDB,
PARTITION p9 VALUES LESS THAN (11) ENGINE = InnoDB,
PARTITION p10 VALUES LESS THAN (12) ENGINE = InnoDB,
PARTITION p11 VALUES LESS THAN (13) ENGINE = InnoDB,
PARTITION p12 VALUES LESS THAN MAXVALUE ENGINE = InnoDB);
This can be extended to creating partitions based on a week as well.
I need to partition my table by month, based on the created timestamp field.
I am evaluating the following two approaches:
RANGE
ALTER TABLE my_table
PARTITION BY RANGE ( MONTH(created) ) (
PARTITION p1 VALUES LESS THAN (2),
PARTITION p2 VALUES LESS THAN (3),
PARTITION p3 VALUES LESS THAN (4),
PARTITION p4 VALUES LESS THAN (5),
PARTITION p5 VALUES LESS THAN (6),
PARTITION p6 VALUES LESS THAN (7),
PARTITION p7 VALUES LESS THAN (8),
PARTITION p8 VALUES LESS THAN (9),
PARTITION p9 VALUES LESS THAN (10),
PARTITION p10 VALUES LESS THAN (11),
PARTITION p11 VALUES LESS THAN (12),
PARTITION p12 VALUES LESS THAN (13)
);
HASH
ALTER TABLE my_table
PARTITION BY HASH((YEAR(created) * 100) + MONTH(created))
PARTITIONS 13;
Use case:
My use case is that I want to archive by month, once a month is more than one year old. For example, if the current month is July 2020, then the partition corresponding to July 2019 would be archived. The secondary use case is partition pruning to improve performance, as most of the queries include this timestamp column.
Why 13 partitions in the HASH one?
As stated above, I will be archiving the 13th month from current month.
For this use case, which approach would suit better? As far as I understand, when I define it by RANGE, I have direct control over which data goes into which partition, whereas with HASH it is decided by the MySQL HASH function (mod), which makes it difficult to identify the "over the year" partition and archive it specifically.
Or is there any totally different approach for this use case?
PARTITION BY HASH is useless. Period.
PARTITION BY RANGE can be useful if you want to purge "old" data. Details: http://mysql.rjweb.org/doc.php/partitionmaint
What will you do next January?
Show me your SELECTs and SHOW CREATE TABLE. I'll help you optimize the INDEXes for a non-partitioned version. It will run as fast as, or faster than, you expect your partitioned schema to run.
More
BY HASH is useless when you have a "range". The Optimizer will always pick all partitions, thereby slowing down the query. (This flaw applies to most partitioning methods.)
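To see how the HASH variant from the question spreads months across partitions, here is a small Python sketch. It relies on the documented behaviour that HASH partitioning picks the partition as MOD(expr, number_of_partitions); the function names and the starting month are made up for illustration.

```python
def hash_partition(year: int, month: int, partitions: int = 13) -> int:
    """Partition number chosen by PARTITION BY HASH: MOD(expr, partitions)."""
    return (year * 100 + month) % partitions


def months_from(year: int, month: int, count: int):
    """Yield `count` consecutive (year, month) pairs starting at the given month."""
    for i in range(count):
        total = year * 12 + (month - 1) + i
        yield total // 12, total % 12 + 1


# Thirteen consecutive months, July 2019 through July 2020.
mapping = {ym: hash_partition(*ym) for ym in months_from(2019, 7, 13)}
for (y, m), p in mapping.items():
    print(f"{y}-{m:02d} -> p{p}")

print("distinct partitions used:", len(set(mapping.values())))
```

Since YEAR * 100 + MONTH jumps by 89 at each year boundary, 13 consecutive months do not land in 13 distinct partitions, so no single partition holds exactly "the month that just turned a year old".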
If you always use WHERE month=constant, you may as well have the column month early in indexes. MONTH(date_col) = constant is a different matter. (I have not thought through all the implications. Let's see your queries.)
As a general rule, you can build an index on a non-partitioned table that will provide functionality equivalent to partition pruning. (The link lists only 4 exceptions to the rule. I've spent a decade looking for more use cases.) Corollary: When switching to/from partitioning, all the indexes, including the PRIMARY KEY, should be redesigned.
One of my use cases is to use "transportable tablespaces" to archive one whole partition. You might be able to use that with BY HASH; it's rather clear how to do it with BY RANGE.
The main focus of my blog is to explain DROPping (or 'transporting') the oldest of 13 monthly partitions and using REORGANIZE to get a new "month" (or other time range).
Can someone explain the difference between below commands?
ALTER TABLE A DROP PARTITION p0;
and
ALTER TABLE A TRUNCATE PARTITION p0;
In which scenarios should we use DROP/TRUNCATE partition?
Both throw the data away. And neither is 'transactional', so you cannot recover the data with a ROLLBACK.
DROP PARTITION also removes the partition from the list of partitions.
TRUNCATE PARTITION leaves the partition in place, but empty.
A common usage of DROP PARTITION is to remove "old" rows. Think of a table of information that needs to be kept for only 90 days. Use PARTITION BY RANGE(TO_DAYS(...)) and have weekly partitions. Then, every week DROP the oldest and ADD a new partition. More discussion here.
I have not seen a need for TRUNCATE.
Be aware that there are very few use cases where you can get any benefit from PARTITIONing. So far, I have found uses only for PARTITION BY RANGE.
Truncating a partition is a good choice when you have LIST partitions on the table.
It will remove all rows which are part of LIST partition but will not remove the partition entry from the table structure.
Take a scenario where you want to store credit card transactions/orders placed etc., in a MySQL table. Since the data volume is huge, you might want to partition it. Say you have partitioned the table based on the month of transaction.
PARTITION BY RANGE ( month(transactionDate))
(PARTITION p0 VALUES LESS THAN (2) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (3) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (4) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (5) ENGINE = InnoDB,
PARTITION p4 VALUES LESS THAN (6) ENGINE = InnoDB,
PARTITION p5 VALUES LESS THAN (7) ENGINE = InnoDB,
PARTITION p6 VALUES LESS THAN (8) ENGINE = InnoDB,
PARTITION p7 VALUES LESS THAN (9) ENGINE = InnoDB,
PARTITION p8 VALUES LESS THAN (10) ENGINE = InnoDB,
PARTITION p9 VALUES LESS THAN (11) ENGINE = InnoDB,
PARTITION p10 VALUES LESS THAN (12) ENGINE = InnoDB,
PARTITION p11 VALUES LESS THAN (13) ENGINE = InnoDB,
PARTITION p12 VALUES LESS THAN MAXVALUE ENGINE = InnoDB);
You can also partition by week if your data volume is huge.
Now you have to clean out the old data from time to time. This is where the difference between DROP and TRUNCATE comes in. The list of partitions you have initially is p0,p1...p12.
When you drop a partition p1, the list becomes p0,p2,p3,p4...p12. So effectively the data for both Feb and March would go into p2.
But when you do a truncate, p1 is still intact but the data is evicted. So the list remains p0,p1...p12.
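The effect on row routing can be modelled with a few lines of Python. This is only a toy model of VALUES LESS THAN lookup, not how MySQL implements it; the `route` helper and the list representation are assumptions for illustration.

```python
import bisect


def route(month: int, partitions: list) -> str:
    """Return the name of the RANGE partition a month value lands in.

    `partitions` is an ordered list of (name, upper_bound) pairs that mirrors
    VALUES LESS THAN; float('inf') stands in for MAXVALUE.
    """
    bounds = [upper for _, upper in partitions]
    return partitions[bisect.bisect_right(bounds, month)][0]


# p0 < 2, p1 < 3, ..., p11 < 13, p12 < MAXVALUE, as in the definition above.
original = [(f"p{i}", i + 2) for i in range(12)] + [("p12", float("inf"))]

# DROP PARTITION p1 removes the entry, so p2 now covers every value below 4.
after_drop = [p for p in original if p[0] != "p1"]

# TRUNCATE PARTITION p1 empties the rows but leaves the list untouched.
after_truncate = list(original)

print("Feb after DROP:    ", route(2, after_drop))
print("Mar after DROP:    ", route(3, after_drop))
print("Feb after TRUNCATE:", route(2, after_truncate))
```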
I have a decent-sized table of transactions. I cannot redesign it or add more indexes; the data is as it is. Half of the business logic does a full scan anyway.
My idea is to move old/unused data to an 'archive table' to speed things up a little while still being able to scan all the data once in a while. Or the other way around: create a 'fast table' that holds only fresh data (the last month or so), alongside the normal table that holds all the data.
Is anyone aware of this kind of technique? Where can I read about it? Are there better approaches? How would I implement it in code?
For now we have around 50 million rows of data, with crazy partitions that do more harm than good:
PARTITION BY LIST (user_id%20)
SUBPARTITION BY KEY (user_role)
SUBPARTITIONS 4
(PARTITION p0 VALUES IN (0) ENGINE = InnoDB,
PARTITION p1 VALUES IN (1) ENGINE = InnoDB,
PARTITION p2 VALUES IN (2) ENGINE = InnoDB,
PARTITION p3 VALUES IN (3) ENGINE = InnoDB,
PARTITION p4 VALUES IN (4) ENGINE = InnoDB,
PARTITION p5 VALUES IN (5) ENGINE = InnoDB,
PARTITION p6 VALUES IN (6) ENGINE = InnoDB,
PARTITION p7 VALUES IN (7) ENGINE = InnoDB,
PARTITION p8 VALUES IN (8) ENGINE = InnoDB,
PARTITION p9 VALUES IN (9) ENGINE = InnoDB,
PARTITION p10 VALUES IN (10) ENGINE = InnoDB,
PARTITION p11 VALUES IN (11) ENGINE = InnoDB,
PARTITION p12 VALUES IN (12) ENGINE = InnoDB,
PARTITION p13 VALUES IN (13) ENGINE = InnoDB,
PARTITION p14 VALUES IN (14) ENGINE = InnoDB,
PARTITION p15 VALUES IN (15) ENGINE = InnoDB,
PARTITION p16 VALUES IN (16) ENGINE = InnoDB,
PARTITION p17 VALUES IN (17) ENGINE = InnoDB,
PARTITION p18 VALUES IN (18) ENGINE = InnoDB,
PARTITION p19 VALUES IN (19) ENGINE = InnoDB)
There are 3 different user_role values and around 20 rows per unique user_id. Do not ask me what the hell, it just is what it is....
Thank you.
PS: I absolutely understand that it is better to invest time in fully analyzing and redesigning the table itself; however, sometimes that is impossible because of politics and simply shitty people who are above you in the food chain.
You may want to look into Table Partitioning, specifically Range Partitioning:
If you partition your table by the date/timestamp column, MySQL will then scan only the appropriate partitions during the query:
You frequently run queries that depend directly on the column used for partitioning the table. For example, when executing a query such as EXPLAIN PARTITIONS SELECT COUNT(*) FROM employees WHERE separated BETWEEN '2000-01-01' AND '2000-12-31' GROUP BY store_id;, MySQL can quickly determine that only partition p2 needs to be scanned because the remaining partitions cannot contain any records satisfying the WHERE clause.
I have a table whose primary key is the field action_time, of type datetime.
I am trying to split it into partitions:
ALTER TABLE foo PARTITION BY RANGE (MONTH(action_time))
(
PARTITION p01 VALUES LESS THAN (02) ,
PARTITION p02 VALUES LESS THAN (03) ,
PARTITION p03 VALUES LESS THAN (04) ,
PARTITION p04 VALUES LESS THAN (05) ,
PARTITION p05 VALUES LESS THAN (06) ,
PARTITION p06 VALUES LESS THAN (07) ,
PARTITION p07 VALUES LESS THAN (08) ,
PARTITION p08 VALUES LESS THAN (09) ,
PARTITION p09 VALUES LESS THAN (10) ,
PARTITION p10 VALUES LESS THAN (11) ,
PARTITION p11 VALUES LESS THAN (12) ,
PARTITION p12 VALUES LESS THAN (13) ,
PARTITION pmaxval VALUES LESS THAN MAXVALUE
);
In phpMyAdmin I see the partitions with rows in them,
but when I execute
explain partitions select * from foo where action_time between '2017-01-01 20:34:08' and '2017-01-21 20:34:08';
or
explain partitions select * from foo where action_time > '2017-01-01 20:34:08' && action_time < '2017-01-21 20:34:08'
it hits all partitions (p01,p02,p03,p04,p05,p06,p07,p08,p09,p10,p11,p12,pmaxval).
What am I doing wrong?
I also tried it this way, with the same result:
ALTER TABLE foo
PARTITION BY RANGE( YEAR(action_time) )
SUBPARTITION BY HASH( MONTH(action_time) )
SUBPARTITIONS 12 (
PARTITION p2015 VALUES LESS THAN (2016),
PARTITION p2016 VALUES LESS THAN (2017),
PARTITION p2017 VALUES LESS THAN (2018),
PARTITION p2018 VALUES LESS THAN (2019),
PARTITION p2019 VALUES LESS THAN (2020),
PARTITION p2020 VALUES LESS THAN (2021),
PARTITION p2021 VALUES LESS THAN (2022),
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026),
PARTITION p2026 VALUES LESS THAN (2027),
PARTITION p2027 VALUES LESS THAN (2028),
PARTITION p2028 VALUES LESS THAN (2029),
PARTITION p2029 VALUES LESS THAN (2030),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
I need to split the table by year and month to improve select time. When I select between dates, it shouldn't search the whole table; it should search only the relevant partitions. How can I do this?
You have found yet another reason why PARTITIONing is virtually useless.
Suppose you had specified BETWEEN '2015-11-05' AND '2017-02-02'. Which partitions would it need to hit? All of them.
Suppose you had specified BETWEEN '2015-11-05' AND '2016-02-02'. Which partitions would it need to hit? 4, but it is not smart enough to wrap around. So it will (I think) hit all.
There are a limited number of patterns (MONTH() is not one of them) where partitioning will "get it right".
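The two examples above can be checked with a small Python sketch that lists which MONTH() values a date range covers. This only shows what an ideal pruner would have to work out; it is not something MySQL does for MONTH(), and the function name is made up for illustration.

```python
from datetime import date


def months_touched(start: date, end: date) -> set:
    """Return the MONTH() values (1-12) that occur between start and end, inclusive."""
    first = start.year * 12 + (start.month - 1)
    last = end.year * 12 + (end.month - 1)
    # Twelve consecutive months already cover every MONTH() value, so cap the walk there.
    return {m % 12 + 1 for m in range(first, min(last, first + 11) + 1)}


print(sorted(months_touched(date(2015, 11, 5), date(2017, 2, 2))))  # spans more than a year
print(sorted(months_touched(date(2015, 11, 5), date(2016, 2, 2))))  # wraps around December
```

The second range needs months 11, 12, 1 and 2, which is not a contiguous interval of MONTH() values; that is the wrap-around a range-based pruner cannot express.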
To make BY RANGE( some date ) work, you are limited to BY RANGE(TO_DAYS(date)) (and a few others). But then you have to create a new partition every month (or however often). And, optionally, DROP the oldest partition.
Now for another reason why your plan is probably useless. What benefit do you expect to get from partitioning? Perhaps performance? It probably won't give you any performance benefit. Let's see your queries so I can explain why.
A simple
SELECT ...
WHERE date >= '...'
AND date < '...' + INTERVAL 20 DAY
will work just as fast with INDEX(date) as with partitioning. Possibly even faster.
If there is something else in the WHERE, then that changes everything.
My PARTITION blog
Why PARTITIONing does not speed up simple queries
Let's say you have a simple SELECT that has a very good index, such as you specify the exact value for the PRIMARY KEY. (This is called a "point query".)
Case 1: Non-partitioned table. Indexes use a BTree structure. Locating a specific record in a million rows requires drilling down the BTree, which will be about 3 levels deep. For a billion rows, it might be 5 levels.
Case 2: Partitioned table. Partitioning splits the table into multiple tables, each of which has its own indexes. Locating a specific row will first have to locate the particular partition (sub-table), then drill down the shallower BTree for that partition.
Think of it as (perhaps) removing one level from the BTree, but adding the extra effort of reaching for the partition. The performance difference is minuscule. And it is not clear whether you gain or lose. (Caching, data structures, etc, make this analysis complex.)
Conclusion: For Point Queries, Partitioning never helps, assuming you have a suitable index on the non-partitioned equivalent.
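The level counts in Case 1 and Case 2 can be reproduced with a back-of-the-envelope calculation. The Python sketch below assumes a fanout of about 100 entries per BTree block; that figure is a rule of thumb chosen for illustration (real fanout depends on key and row size), and the 13-way split is likewise just an example.

```python
def btree_levels(rows: int, fanout: int = 100) -> int:
    """Estimate BTree depth: the smallest number of levels whose capacity covers `rows`."""
    levels, capacity = 0, 1
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


print("1 million rows:          ", btree_levels(10**6))
print("1 billion rows:          ", btree_levels(10**9))
print("1 billion / 13 partitions:", btree_levels(10**9 // 13))
```

Under these assumptions, splitting a billion rows 13 ways saves about one level per lookup, which is the "perhaps removing one level" mentioned above, and that saving is then offset by the work of picking the partition.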
Your particular query is a simple "range" query: WHERE action_time BETWEEN ... AND ...
The optimal table structure (including partitioning and indexing) is
No partitions
INDEX(action_time)
Another note: If multiple partitions are involved, the SELECT will fetch rows (if any) from each partition (after pruning), put them together, and then might have to sort the results (depending on other clauses in the SELECT). Alas there is no parallelism in the execution of the query, so the partitioned variant is more involved, hence, probably slower.
MONTH() is not supported for partition pruning. Currently, only four functions are supported by MySQL 5.7/8.0.
In MySQL 8.0, partition pruning is supported for the TO_DAYS(),
TO_SECONDS(), YEAR(), and UNIX_TIMESTAMP() functions. See Chapter 5,
Partition Pruning, for more information.
You have to use TO_DAYS() instead, e.g.:
ALTER TABLE foo PARTITION BY RANGE (TO_DAYS(action_time))
(
PARTITION p01 VALUES LESS THAN (TO_DAYS('2017-02-01')) ,
PARTITION p02 VALUES LESS THAN (TO_DAYS('2017-03-01')) ,
PARTITION pmaxval VALUES LESS THAN MAXVALUE
);
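Writing one partition per month by hand gets tedious, so the statement can be generated. Here is a minimal Python sketch that builds the same kind of TO_DAYS() definition for any number of months; the function name and the `pYYYYMM` naming are assumptions for illustration, and the script only prints the SQL.

```python
from datetime import date


def monthly_range_ddl(table: str, column: str, first: date, months: int) -> str:
    """Build ALTER TABLE ... PARTITION BY RANGE (TO_DAYS(column)), one partition per month."""
    lines = []
    for i in range(months):
        start = first.year * 12 + (first.month - 1) + i   # month index of this partition
        end = start + 1                                   # month index of its upper bound
        name = f"p{start // 12}{start % 12 + 1:02d}"
        bound = date(end // 12, end % 12 + 1, 1)
        lines.append(f"  PARTITION {name} VALUES LESS THAN (TO_DAYS('{bound.isoformat()}'))")
    lines.append("  PARTITION pmaxval VALUES LESS THAN MAXVALUE")
    return (f"ALTER TABLE {table} PARTITION BY RANGE (TO_DAYS({column})) (\n"
            + ",\n".join(lines) + "\n);")


print(monthly_range_ddl("foo", "action_time", date(2017, 1, 1), 3))
```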
Please could you tell me the problem with this query:
ALTER TABLE
`phar_bills`
PARTITION BY RANGE COLUMNS (YEAR(bill_date))
(
PARTITION p0 VALUES LESS THAN (2014),
PARTITION p1 VALUES LESS THAN (2015),
PARTITION p2 VALUES LESS THAN (2016),
PARTITION p3 VALUES LESS THAN (2017),
PARTITION p4 VALUES LESS THAN (2018),
PARTITION p5 VALUES LESS THAN (2019),
PARTITION p6 VALUES LESS THAN (2020),
PARTITION p7 VALUES LESS THAN (2021),
PARTITION p8 VALUES LESS THAN (2022),
PARTITION p9 VALUES LESS THAN (2023),
PARTITION p10 VALUES LESS THAN (2024),
PARTITION p11 VALUES LESS THAN (2025),
PARTITION p12 VALUES LESS THAN (2026),
PARTITION p13 VALUES LESS THAN (2027),
PARTITION p14 VALUES LESS THAN (2028),
PARTITION p15 VALUES LESS THAN (2029),
PARTITION p16 VALUES LESS THAN (2030)
)
SUBPARTITION BY LIST COLUMNS(pharmacy_code)
(
PARTITION phar1 VALUES IN('1'),
PARTITION phar2 VALUES IN('2'),
PARTITION phar3 VALUES IN ('3')
)
What is the purpose of the word COLUMNS between RANGE and the partitioning expression?
BY RANGE COLUMNS (YEAR(bill_date))
^^^^^^^
What the plastic is that doing there? I don't believe that's valid syntax, but maybe you're running a newer version of MySQL.
YEAR(bill_date) is an expression, not the name of a column.
According to the MySQL 5.5 Reference Manual:
"RANGE COLUMNS does not accept expressions, only names of columns."
Reference: http://dev.mysql.com/doc/refman/5.5/en/partitioning-columns-range.html
But if that's not valid syntax, we'd fully expect MySQL to throw an error, most likely, a "#1064 You have an error in your syntax".
Aside from that, there are all sorts of other possible issues... but we'd expect most of those to also throw an actual MySQL error message. "partitioning not enabled", "storage engine doesn't support partitioning", "foreign keys not supported on partitioned tables", or some such.