Dynamic MySQL partitioning based on UnixTime

My DB design includes multiple MyISAM tables with measurements collected online.
Each row contains an auto-incremented id, some data, and an integer representing Unix time.
I am designing an aging mechanism, and I would like to use MySQL partitioning to partition each such table dynamically based on the Unix time column.
Say that each partition should represent a single month of data, except the last partition, which should represent 2 months. If records arrive for the next month that is not yet represented, the partition that covered 2 months should be reorganized to cover a single month, and a new partition covering 2 months should be created (1 month taken from the previous last partition and 1 for future measurements).
Additionally, when a new partition is created, the oldest partition should be dropped.
What type of partitioning should I use (my Unix time column is not a unique key), and how would I use Unix time for partitioning purposes?
How would I design the partitioning to be fully dynamic, driven by new records added to the tables?
UPDATE 12.12.12
I have found an interesting link describing an approach similar to the one above: your-magical-range-partitioning-maintenance-query.

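Partitioning does not need to be based on a unique key. However, if the table has a primary key or any unique key, every such key must include all columns used in the partitioning expression. Use RANGE (not RANGE COLUMNS) when partitioning on an expression. To partition on the Unix time of a TIMESTAMP column (UNIX_TIMESTAMP() is only permitted on TIMESTAMP columns in a partitioning expression; an integer Unix time column can be used directly, as in RANGE (unixtime_column)):
ALTER TABLE MyTable
PARTITION BY RANGE (UNIX_TIMESTAMP(timestamp_column))
(
PARTITION p01 VALUES LESS THAN (UNIX_TIMESTAMP('2013-01-01 00:00:00')),
PARTITION p02 VALUES LESS THAN (UNIX_TIMESTAMP('2013-02-01 00:00:00')),
PARTITION p03 VALUES LESS THAN (UNIX_TIMESTAMP('2013-03-01 00:00:00')),
PARTITION p04 VALUES LESS THAN (MAXVALUE));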
Or you can partition on a DATE or DATETIME column directly in MySQL 5.5+:
ALTER TABLE MyTable
PARTITION BY RANGE COLUMNS (datetime_column)
(
PARTITION p01 VALUES LESS THAN ('2013-01-01'),
PARTITION p02 VALUES LESS THAN ('2013-02-01'),
PARTITION p03 VALUES LESS THAN ('2013-03-01'),
PARTITION p04 VALUES LESS THAN (MAXVALUE));
Fully automated version (it would keep every month in its own partition, 5 months of data held):
ALTER TABLE MyTable
PARTITION BY RANGE (YEAR(datetime_column)*100 + MONTH(datetime_column))
(
PARTITION p201301 VALUES LESS THAN (201301),
PARTITION p201302 VALUES LESS THAN (201302),
PARTITION p201303 VALUES LESS THAN (201303),
PARTITION p201304 VALUES LESS THAN (201304),
PARTITION p201305 VALUES LESS THAN (201305),
PARTITION p_MAXVALUE VALUES LESS THAN (MAXVALUE));
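DELIMITER //
-- rotate_MyTable_partitions is an illustrative name.
CREATE PROCEDURE rotate_MyTable_partitions()
BEGIN
  DECLARE Min_Part INT;
  DECLARE Last_Part INT;
  -- Rotate only once more than 5 distinct months are present.
  IF (SELECT COUNT(DISTINCT MONTH(datetime_column)) FROM MyTable) > 5 THEN
    SELECT MIN(YEAR(datetime_column)*100 + MONTH(datetime_column)),
           MAX(YEAR(datetime_column)*100 + MONTH(datetime_column))
      INTO Min_Part, Last_Part
      FROM MyTable;
    -- Split the catch-all partition; it must be listed again so the full range stays covered.
    -- PREPARE/EXECUTE is used here instead of common_schema.eval() to avoid the external dependency.
    SET @SQL = CONCAT('ALTER TABLE MyTable REORGANIZE PARTITION p_MAXVALUE INTO (',
      'PARTITION p', Last_Part, ' VALUES LESS THAN (', Last_Part, '), ',
      'PARTITION p_MAXVALUE VALUES LESS THAN (MAXVALUE))');
    PREPARE stmt FROM @SQL; EXECUTE stmt; DEALLOCATE PREPARE stmt;
    -- Drop the partition named after the oldest month present in the data.
    SET @SQL = CONCAT('ALTER TABLE MyTable DROP PARTITION p', Min_Part);
    PREPARE stmt FROM @SQL; EXECUTE stmt; DEALLOCATE PREPARE stmt;
  END IF;
END //
DELIMITER ;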
P.S. Apologies if SQL is not exactly correct - cannot parse it right now.
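For the integer unixtime column from the question, the same rotation can also be driven from a scheduled script outside the database. The following is only a minimal Python sketch under some assumptions that are not in the original: the table is partitioned with RANGE directly on the integer column, month boundaries are taken in UTC, partitions are named pYYYYMM after their exclusive upper bound, the catch-all partition is called p_MAXVALUE, and the helper names month_start_unixtime and rotation_statements are made up for illustration. It only builds the statements; running them is left to whatever client you use.

```python
import calendar
from typing import List


def month_start_unixtime(year: int, month: int) -> int:
    """Unix time (UTC) of the first second of the given month."""
    return calendar.timegm((year, month, 1, 0, 0, 0))


def rotation_statements(table: str, year: int, month: int,
                        oldest_partition: str) -> List[str]:
    """Build the two ALTER TABLE statements for one rotation step.

    The first second of (year, month) becomes the exclusive upper bound of
    the partition that is split off from the catch-all partition.
    """
    name = f"p{year:04d}{month:02d}"
    bound = month_start_unixtime(year, month)
    split = (
        f"ALTER TABLE {table} REORGANIZE PARTITION p_MAXVALUE INTO ("
        f"PARTITION {name} VALUES LESS THAN ({bound}), "
        f"PARTITION p_MAXVALUE VALUES LESS THAN MAXVALUE)"
    )
    drop = f"ALTER TABLE {table} DROP PARTITION {oldest_partition}"
    return [split, drop]


if __name__ == "__main__":
    for statement in rotation_statements("MyTable", 2013, 6, "p201301"):
        print(statement)
```

Here the call produces one REORGANIZE PARTITION statement that splits p201306 off the catch-all partition and one DROP PARTITION statement for p201301.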

Related

Mysql8 partition by month, partition pruning not working

I have a large table that I have partitioned by month
CREATE TABLE `log` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`logdate` date NOT NULL,
...
,
PRIMARY KEY (`id`,`logdate`)
) PARTITION BY RANGE (month(`logdate`))
(PARTITION part0 VALUES LESS THAN (2),
PARTITION part1 VALUES LESS THAN (3),
PARTITION part2 VALUES LESS THAN (4),
PARTITION part3 VALUES LESS THAN (5),
PARTITION part4 VALUES LESS THAN (6),
PARTITION part5 VALUES LESS THAN (7),
PARTITION part6 VALUES LESS THAN (8),
PARTITION part7 VALUES LESS THAN (9),
PARTITION part8 VALUES LESS THAN (10),
PARTITION part9 VALUES LESS THAN (11),
PARTITION part10 VALUES LESS THAN (12),
PARTITION part11 VALUES LESS THAN MAXVALUE);
I have inserted 3 months of data and can see that the rows have been put into their respective partitions.
When I query specifying the partition, the correct data is returned and the explain shows that it is selecting from the correct partition
select logdate, sum(total) from log partition(part10)
where logdate between '2020-11-01' and '2020-11-30' group by 1 order by 1 desc;
However when not specifying the partition no partition pruning is occurring for the below.
select logdate, sum(total) from log
where logdate between '2020-11-01' and '2020-11-30' group by 1 order by 1 desc;
select logdate, sum(total) from log
where month(logdate) = 11 group by 1 order by 1 desc;
According to mysql-8 documentation https://dev.mysql.com/doc/refman/8.0/en/partitioning-range.html
Partitioning schemes based on time intervals. If you wish to implement a partitioning scheme based on ranges or intervals of time in MySQL 8.0, you have two options:
Partition the table by RANGE, and for the partitioning expression, employ a function operating on a DATE, TIME, or DATETIME column and returning an integer value - as shown here in my code
Partition the table by RANGE COLUMNS, using a DATE or DATETIME column as the partitioning column.
What am I missing?
According to 24.6.3 Partitioning Limitations Relating to Functions:
In MySQL 8.0, partition pruning is supported for the TO_DAYS(), TO_SECONDS(), YEAR(), and UNIX_TIMESTAMP() functions. See Section 24.4, “Partition Pruning”, for more information.
Since you are partitioning by MONTH(), partition pruning won't work.
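To picture why a monotonic expression can be pruned while MONTH() cannot, here is a rough Python illustration. It is not how MySQL implements pruning, and partitions_hit is a made-up helper. With sorted exclusive upper bounds, a date interval maps to one contiguous run of partitions that a binary search can find. MONTH() wraps from 12 back to 1, so an interval of dates does not correspond to a single interval of MONTH() values.

```python
from bisect import bisect_right
from datetime import date
from typing import List


def partitions_hit(upper_bounds: List[date], first: date, last: date) -> List[int]:
    """Indexes of the range partitions that can hold dates in [first, last].

    upper_bounds are the sorted, exclusive VALUES LESS THAN bounds; index
    len(upper_bounds) stands for a trailing MAXVALUE partition.
    """
    # A value belongs to the first partition whose bound is strictly greater.
    lowest = bisect_right(upper_bounds, first)
    highest = bisect_right(upper_bounds, last)
    return list(range(lowest, highest + 1))


if __name__ == "__main__":
    bounds = [date(2020, 10, 1), date(2020, 11, 1), date(2020, 12, 1)]
    # All of November 2020 falls into the partition bounded by 2020-12-01.
    print(partitions_hit(bounds, date(2020, 11, 1), date(2020, 11, 30)))
```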

How to create a partition in MySql if it doesn't exist?

I have a partitioned table into which I insert data from a stored procedure.
The table is partitioned by a column named year.
The stored procedure inserts data into the partitioned table properly.
But now I have a case where inserts might happen for which no partition is present.
I need a way to find out whether a particular partition name exists for the table.
E.g. my table name is backups.
I have 3 partitions for now:
2018, 2019 and 2020.
But when the stored procedure runs in the year 2021,
there may not be a partition for that year.
So I want my stored procedure to handle checking for and creating the partition at run time.
Following is my table structure -
Partition creation query -
ALTER TABLE backups
partition by list columns(year)
(partition backup_2018 values IN (2018),
partition backup_2019 values IN (2019),
partition backup_2020 values IN (2020));
Following is my stored procedure -
CREATE DEFINER=`root`@`localhost` PROCEDURE `daily_backup`()
BEGIN
DECLARE backuptime INT;
#Need Partition checking and creation here
SET backuptime = UNIX_TIMESTAMP(CONCAT(DATE_SUB(DATE_FORMAT(NOW(),'%Y-%m-%d'), INTERVAL 1 DAY),' 23:59:59'));
INSERT into backups
(user_id, latest_transaction_id, balance, last_transaction_timestamp, last_transaction_date, snapshot_date, year)
SELECT
T2.user_id,
T2.transaction_id AS latest_transaction_id,
T2.new_balance AS balance,
T2.created_date AS last_transaction_timestamp,
DATE_FORMAT(FROM_UNIXTIME(T2.created_date), '%Y-%m-%d %I:%i:%S') AS last_transaction_date,
DATE_FORMAT(NOW(), '%Y-%m-%d') AS snapshot_date,
DATE_FORMAT(NOW(), '%Y') AS year
FROM
(SELECT
user_id, MAX(transaction_id) maxTransID
FROM
transaction
WHERE
created_date < backuptime
GROUP BY user_id) Tmp
JOIN
transaction T2 ON Tmp.MaxTransID = T2.Transaction_ID;
END
I suggest partitioning using RANGE COLUMNS instead of LIST COLUMNS. That way you have the option of adding a last partition for any years beyond the partitions you have defined so far.
ALTER TABLE backups
partition by range columns(year)
(partition backup_2018 values LESS THAN (2019),
partition backup_2019 values LESS THAN (2020),
partition backup_2020 values LESS THAN (2021),
partition backup_other values LESS THAN MAXVALUE);
As you get closer to the end of 2020, you'd use ALTER TABLE backups REORGANIZE PARTITION backup_other INTO ( ...new partitions... ) to split the last partition and make new partitions for subsequent years.
See https://dev.mysql.com/doc/refman/5.7/en/partitioning-management-range-list.html for more details.
If you forget, no harm done, your data will just fill up backup_other for a while until you remember to reorganize. It's to your advantage though to do it proactively, because reorganizing an empty partition is quick, and reorganizing a partition with data in it will take more time.
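If you want the check-and-create step scripted, the names of the existing partitions can be read from information_schema.partitions (column PARTITION_NAME) and compared with the year you are about to insert. A minimal Python sketch, assuming the backup_<year> / backup_other naming from the RANGE COLUMNS example and at least one existing backup_<year> partition; the function names are hypothetical and the code only builds the statements:

```python
import re
from typing import Iterable, List


def split_catchall(table: str, year: int) -> str:
    """Statement that carves one year's partition out of backup_other."""
    return (
        f"ALTER TABLE {table} REORGANIZE PARTITION backup_other INTO ("
        f"PARTITION backup_{year} VALUES LESS THAN ({year + 1}), "
        f"PARTITION backup_other VALUES LESS THAN (MAXVALUE))"
    )


def statements_through(table: str, existing: Iterable[str],
                       through_year: int) -> List[str]:
    """One statement per missing year, from the newest existing year onward."""
    years = []
    for name in existing:
        match = re.fullmatch(r"backup_(\d{4})", name)
        if match:
            years.append(int(match.group(1)))
    newest = max(years)  # assumes at least one backup_<year> partition exists
    return [split_catchall(table, y) for y in range(newest + 1, through_year + 1)]


if __name__ == "__main__":
    names = ["backup_2018", "backup_2019", "backup_2020", "backup_other"]
    for statement in statements_through("backups", names, 2021):
        print(statement)
```

When every year up to through_year already has a partition, the list is empty and there is nothing to run.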

mysql partitioning does not work

I have a table whose primary key is the field action_time, of type datetime.
I am trying to split it into partitions:
ALTER TABLE foo PARTITION BY RANGE (MONTH(action_time))
(
PARTITION p01 VALUES LESS THAN (02) ,
PARTITION p02 VALUES LESS THAN (03) ,
PARTITION p03 VALUES LESS THAN (04) ,
PARTITION p04 VALUES LESS THAN (05) ,
PARTITION p05 VALUES LESS THAN (06) ,
PARTITION p06 VALUES LESS THAN (07) ,
PARTITION p07 VALUES LESS THAN (08) ,
PARTITION p08 VALUES LESS THAN (09) ,
PARTITION p09 VALUES LESS THAN (10) ,
PARTITION p10 VALUES LESS THAN (11) ,
PARTITION p11 VALUES LESS THAN (12) ,
PARTITION p12 VALUES LESS THAN (13) ,
PARTITION pmaxval VALUES LESS THAN MAXVALUE
);
in phpmyadmin I see partitions with rows
but when I execute
explain partitions select * from foo where action_time between '2017-01-01 20:34:08' and '2017-01-21 20:34:08';
or
explain partitions select * from foo where action_time > '2017-01-01 20:34:08' && action_time < '2017-01-21 20:34:08'
it hits all partitions (p01,p02,p03,p04,p05,p06,p07,p08,p09,p10,p11,p12,pmaxval)
What am I doing wrong?
I also tried this way, with the same result:
ALTER TABLE foo
PARTITION BY RANGE( YEAR(action_time) )
SUBPARTITION BY HASH( MONTH(action_time) )
SUBPARTITIONS 12 (
PARTITION p2015 VALUES LESS THAN (2016),
PARTITION p2016 VALUES LESS THAN (2017),
PARTITION p2017 VALUES LESS THAN (2018),
PARTITION p2018 VALUES LESS THAN (2019),
PARTITION p2019 VALUES LESS THAN (2020),
PARTITION p2020 VALUES LESS THAN (2021),
PARTITION p2021 VALUES LESS THAN (2022),
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026),
PARTITION p2026 VALUES LESS THAN (2027),
PARTITION p2027 VALUES LESS THAN (2028),
PARTITION p2028 VALUES LESS THAN (2029),
PARTITION p2029 VALUES LESS THAN (2030),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
I need to split the table by year and month to improve select time. When I select between dates, it shouldn't search the whole table; it should search only the relevant partitions. How can I do this?
You have found yet another reason why PARTITIONing is virtually useless.
Suppose you had specified BETWEEN '2015-11-05' AND '2017-02-02'. Which partitions would it need to hit? All of them.
Suppose you had specified BETWEEN '2015-11-05' AND '2016-02-02'. Which partitions would it need to hit? 4, but it is not smart enough to wrap around. So it will (I think) hit all.
There are a limited number of patterns (MONTH() is not one of them) where partitioning will "get it right".
To make BY RANGE( some date ) work, you are limited to BY RANGE(TO_DAYS(date)) (and a few others). But then you have to create a new partition every month (or however often). And, optionally, DROP the oldest partition.
Now for another reason why your plan is probably useless. What benefit do you expect to get from partitioning? Perhaps performance? It probably won't give you any performance benefit. Let's see your queries so I can explain why.
A simple
SELECT ...
WHERE date >= '...'
AND date < '...' + INTERVAL 20 DAY
will work just as fast with INDEX(date) as with partitioning. Possibly even faster.
If there is something else in the WHERE, then that changes everything.
My PARTITION blog
Why PARTITIONing does not speed up simple queries
Let's say you have a simple SELECT that has a very good index, such as you specify the exact value for the PRIMARY KEY. (This is called a "point query".)
Case 1: Non-partitioned table. Indexes use a BTree structure. Locating a specific record in a million rows requires drilling down the BTree, which will be about 3 levels deep. For a billion rows, it might be 5 levels.
Case 2: Partitioned table. Partitioning splits the table into multiple tables, each of which has its own indexes. Locating a specific row will first have to locate the particular partition (sub-table), then drill down the shallower BTree for that partition.
Think of it as (perhaps) removing one level from the BTree, but adding the extra effort of reaching for the partition. The performance difference is minuscule. And it is not clear whether you gain or lose. (Caching, data structures, etc, make this analysis complex.)
Conclusion: For Point Queries, Partitioning never helps, assuming you have a suitable index on the non-partitioned equivalent.
Your particular query is a simple "range" query: WHERE action_time BETWEEN ... AND ...
The optimal table structure (including partitioning and indexing) is
No partitions
INDEX(action_time)
Another note: If multiple partitions are involved, the SELECT will fetch rows (if any) from each partition (after pruning), put them together, and then might have to sort the results (depending on other clauses in the SELECT). Alas there is no parallelism in the execution of the query, so the partitioned variant is more involved, hence, probably slower.
MONTH() is not supported for partition pruning. Currently, only four functions are supported by MySQL 5.7/8.0.
In MySQL 8.0, partition pruning is supported for the TO_DAYS(),
TO_SECONDS(), YEAR(), and UNIX_TIMESTAMP() functions. See Chapter 5,
Partition Pruning, for more information.
You have to use TO_DAYS() instead. e.g.
ALTER TABLE foo PARTITION BY RANGE (TO_DAYS(action_time))
(
PARTITION p01 VALUES LESS THAN (TO_DAYS('2017-02-01')) ,
PARTITION p02 VALUES LESS THAN (TO_DAYS('2017-03-01')) ,
PARTITION pmaxval VALUES LESS THAN MAXVALUE
);
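Writing the monthly bounds by hand gets tedious, so the statement can be generated. A small Python sketch, assuming partitions named pYYYYMM after the month they hold and a trailing pmaxval partition; monthly_to_days_ddl is a hypothetical helper, and the output should be reviewed before running it:

```python
def monthly_to_days_ddl(table: str, column: str, year: int, month: int,
                        months: int) -> str:
    """ALTER TABLE statement with one TO_DAYS() range partition per month."""
    parts = []
    for _ in range(months):
        # The exclusive upper bound is the first day of the following month.
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        parts.append(
            f"PARTITION p{year:04d}{month:02d} VALUES LESS THAN "
            f"(TO_DAYS('{next_year:04d}-{next_month:02d}-01'))"
        )
        year, month = next_year, next_month
    # Catch-all partition for anything later than the generated months.
    parts.append("PARTITION pmaxval VALUES LESS THAN MAXVALUE")
    body = ",\n".join(parts)
    return f"ALTER TABLE {table} PARTITION BY RANGE (TO_DAYS({column}))\n(\n{body}\n);"


if __name__ == "__main__":
    print(monthly_to_days_ddl("foo", "action_time", 2017, 1, 12))
```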

Mysql partitioning over the time

I have a table which will grow large over time, moreover I need only small amount of data say last 7 days.
I want to configure it such that the data of 7 days goes in one partition, and then in next. This way I would keep only two partitions and archive others.
I read about MySQL partitions here, but the article creates partitions by specifying all of them when the table is created.
I am not sure this is the best way to do it when the partitioning logic has to cover a long time span.
Any ideas?
Unfortunately, it'll be a fairly manual process. Your best bet is to create the partitions, week by week ahead of time, then have a job that runs periodically to archive the old data into the 'catchall' partition.
e.g. with:
PARTITION BY RANGE ( TO_DAYS(date) ) (
PARTITION pmin VALUES LESS THAN ( TO_DAYS('2016-10-02 00:00:00') ),
PARTITION p1 VALUES LESS THAN ( TO_DAYS('2016-10-09 00:00:00') ),
PARTITION p2 VALUES LESS THAN ( TO_DAYS('2016-10-16 00:00:00') ),
PARTITION p3 VALUES LESS THAN ( TO_DAYS('2016-10-23 00:00:00') ),
PARTITION pmax VALUES LESS THAN (MAXVALUE)
);
There's no real harm having a few empty partitions sitting there with higher dates then doing a 'shift' once a week. It'll be fast enough as long as when you change the partitioning definition, the data window shifts by the partition size.
Your job would do something like
ALTER TABLE x REORGANIZE PARTITION pmin,p1 INTO (
PARTITION pmin VALUES LESS THAN ( TO_DAYS('2016-10-09 00:00:00') )
);
ALTER TABLE x REORGANIZE PARTITION pmax INTO (
PARTITION px VALUES LESS THAN ( TO_DAYS('2016-10-30 00:00:00') ),
PARTITION pmax VALUES LESS THAN (MAXVALUE)
);
There is no "automatic" partition management in MySQL. We have to run some specific SQL statements to add and drop partitions from a partitioned table.
We automated the task with a cron job which runs a MySQL PROCEDURE we wrote to drop (swap out) old partitions, and another PROCEDURE to add new partitions. The procedures are specific to a particular table.
Our table is partitioned by RANGE on a TIMESTAMP column. The partition expression is like UNIX_TIMESTAMP(col).
To add a new partition, we reorganize the MAXVALUE partition, which is always (or should always be) empty, so the operation is very quick. We dynamically prepare and execute a statement of the form:
ALTER TABLE ourtable REORGANIZE PARTITION pmax
INTO ( PARTITION pn_name VALUES LESS THAN (UNIX_TIMESTAMP(pn_date))
, PARTITION pmax VALUES LESS THAN MAXVALUE)
To get a new date value for the new partition (pn_name), we take the partition_description value from the second-to-last partition (the last partition is the MAXVALUE partition) and add 7 days to it to get the pn_date string to use. We use that same value to generate the pn_name for the new partition. (We name the partitions following a pattern like p20161030, based on the date value in the partition_description, e.g. UNIX_TIMESTAMP('2016-10-30').)
(This information is obtained from a fairly involved query with a couple of references to the information_schema.partitions view.)
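The date arithmetic described above could look roughly like the following Python sketch. This is not the actual procedure (which is written in MySQL); it assumes the partition_description is a Unix timestamp falling on midnight UTC and that the session time zone is UTC, and next_weekly_partition is a made-up name:

```python
from datetime import datetime, timedelta, timezone


def next_weekly_partition(table: str, last_bound: int) -> str:
    """Statement that splits a new 7-day partition off the pmax partition.

    last_bound is the partition_description (Unix seconds) of the newest
    dated partition, i.e. the one just before the MAXVALUE partition.
    """
    new_date = datetime.fromtimestamp(last_bound, tz=timezone.utc) + timedelta(days=7)
    day = new_date.strftime("%Y-%m-%d")
    name = "p" + new_date.strftime("%Y%m%d")
    return (
        f"ALTER TABLE {table} REORGANIZE PARTITION pmax INTO ("
        f"PARTITION {name} VALUES LESS THAN (UNIX_TIMESTAMP('{day}')), "
        f"PARTITION pmax VALUES LESS THAN MAXVALUE)"
    )


if __name__ == "__main__":
    # 1477180800 is 2016-10-23 00:00:00 UTC.
    print(next_weekly_partition("ourtable", 1477180800))
```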
With the other procedure to drop old partitions, we actually "swap out" the old partition to an archive table. (The archive table is later backed up, and dropped by a different task.)
The procedure basically runs a series of statements like this:
DROP TABLE IF EXISTS `_et` ;
CREATE TABLE `_et` LIKE `ourtable` ;
ALTER TABLE `_et` REMOVE PARTITIONING ;
ALTER TABLE `ourtable` EXCHANGE PARTITION `oldest_partition` WITH TABLE `_et` ;
ALTER TABLE `ourtable` DROP PARTITION `oldest_partition` ;
RENAME TABLE `_et` TO `archive_oldest_partition` ;
(I wish there was a cleaner way to create a new un-partitioned table in a single statement, such as a CREATE TABLE ... LIKE ... WITHOUT PARTITIONING, but absent that, we settled on the two separate statements.)
Just dropping the oldest partition would be a simpler process.
To obtain information about the oldest partition, our query is probably overkill. But it's where most of the "magic" happens. Just to give you an idea of what that query looks like...
FROM information_schema.partitions p1
JOIN information_schema.partitions px
ON px.table_schema = 'ourdatabase'
AND px.table_name = 'ourtable'
AND px.partition_method = 'RANGE'
AND px.partition_expression = 'UNIX_TIMESTAMP(ourcol)'
AND px.partition_description = 'MAXVALUE'
WHERE p1.table_schema = 'ourdatabase'
AND p1.table_name = 'ourtable'
AND p1.partition_method = 'RANGE'
AND p1.partition_expression = 'UNIX_TIMESTAMP(ourcol)'
AND p1.partition_description <> 'MAXVALUE'
AND p1.partition_description + 0 <= UNIX_TIMESTAMP(DATE(NOW()) + INTERVAL -187 DAY)
AND p1.partition_ordinal_position = 1
You could probably get away with a simpler query. (Our query is designed to return the "oldest" partition only if all of the timestamp values in it are at least six months old, and only if there is a MAXVALUE partition defined.)
Each of the procedures uses the current date to see if "it's time" to add or drop a partition. (The amount of time forward and back is hardcoded into the queries in the procedure; the query returns 0 rows if it's not time yet.)
The procedures only need to be executed once per week, and we designed them so that any "extra" runs won't add or drop partitions outside of the specified time ranges.
We have the procedures scheduled to execute every day, and on most days, the procedure runs a query which returns zero rows, and exits. Only when the query returns a row is there any work to do.

MySQL Partitioning Error - Error Code : 1486

MySQL throwing error while creating partitions on table.
Error Code : 1486
Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed.
I have tried following query :
alter table test.tbl1
partition by range(unix_timestamp(sys_time))
(
PARTITION p20151001 VALUES LESS THAN (unix_timestamp('2015-10-01')),
PARTITION p20151101 VALUES LESS THAN (unix_timestamp('2015-11-01')),
PARTITION p20151201 VALUES LESS THAN (unix_timestamp('2015-12-01')),
PARTITION p20160101 VALUES LESS THAN (unix_timestamp('2016-01-01')),
PARTITION p20160201 VALUES LESS THAN (unix_timestamp('2016-02-01')),
PARTITION p20160301 VALUES LESS THAN (unix_timestamp('2016-03-01'))
);
How can I get around this problem?
Thanks in Advance
Reading here it may be possible that you are using MYSQL 5.1:
https://dev.mysql.com/tech-resources/articles/mysql_55_partitioning.html
Another pain point in MySQL 5.1 is the handling of date columns. You
can't use them directly, but you need to convert such columns using
either YEAR or TO_DAYS
If your column sys_time is a DATETIME, you don't need UNIX_TIMESTAMP() to partition it; you just need TO_DAYS(), since you're not partitioning by year:
alter table test.tbl1
partition by range (TO_DAYS(sys_time))
(
PARTITION p20151001 VALUES LESS THAN (TO_DAYS('2015-10-01')),
PARTITION p20151101 VALUES LESS THAN (TO_DAYS('2015-11-01')),
PARTITION p20151201 VALUES LESS THAN (TO_DAYS('2015-12-01')),
PARTITION p20160101 VALUES LESS THAN (TO_DAYS('2016-01-01')),
PARTITION p20160201 VALUES LESS THAN (TO_DAYS('2016-02-01')),
PARTITION p20160301 VALUES LESS THAN (TO_DAYS('2016-03-01'))
);
If sys_time is instead an integer column that already holds a Unix timestamp, you don't need to convert it again, so I have taken UNIX_TIMESTAMP() out of the range expression:
alter table test.tbl1
partition by range(sys_time)
(
PARTITION p20151001 VALUES LESS THAN (unix_timestamp('2015-10-01')),
PARTITION p20151101 VALUES LESS THAN (unix_timestamp('2015-11-01')),
PARTITION p20151201 VALUES LESS THAN (unix_timestamp('2015-12-01')),
PARTITION p20160101 VALUES LESS THAN (unix_timestamp('2016-01-01')),
PARTITION p20160201 VALUES LESS THAN (unix_timestamp('2016-02-01')),
PARTITION p20160301 VALUES LESS THAN (unix_timestamp('2016-03-01'))
);