I have date-range partitioning in a MySQL db, but I did not create enough partitions to hold the data. Eventually the MAXVALUE partition started filling up, and it now has 100M records.
How do I fix this and split it into weekly partitions?
You can use ALTER TABLE ... REORGANIZE PARTITION to rewrite one or more partitions into a new set of partitions.
Documentation and examples are here: https://dev.mysql.com/doc/refman/8.0/en/partitioning-management-range-list.html
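A minimal sketch of the split, assuming the table is partitioned by RANGE (TO_DAYS(created_at)) and the full catch-all partition is named pmax (table, partition names, and week boundaries are all illustrative). Note that this rewrites every row currently in pmax, so with 100M records expect significant time and I/O:
ALTER TABLE mytable REORGANIZE PARTITION pmax INTO (
PARTITION p2023w01 VALUES LESS THAN (TO_DAYS('2023-01-09')),
PARTITION p2023w02 VALUES LESS THAN (TO_DAYS('2023-01-16')),
PARTITION p2023w03 VALUES LESS THAN (TO_DAYS('2023-01-23')),
PARTITION pmax VALUES LESS THAN (MAXVALUE)
);
After the split, keep creating the weekly partitions ahead of time so pmax stays empty and future reorganizations are cheap.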
Question (MySQL 5.7):
Can I make a partition that catches every row that does not fit into any of the predefined partitions? Or is that against the purpose of the partitioning concept?
For example:
ALTER TABLE sample_table PARTITION BY LIST COLUMNS(GROUP_CODE) (
PARTITION A_001 VALUES IN ('A001'),
PARTITION B_101 VALUES IN ('B101'),
PARTITION B_102 VALUES IN ('B102'),
PARTITION B_202 VALUES IN ('B202'),
PARTITION C_101 VALUES IN ('C101'),
PARTITION C_201 VALUES IN ('C201'),
PARTITION D_000 VALUES IN ('D000')
);
If I make partitions like that, a row with 'G525' won't be allowed into the table. If the partitioning key were an integer I could go for RANGE, but is it possible with LIST?
I'm using MySQL 5.7 (Percona).
My current design uses naive day-by-day partitioning, which adds a new partition for the next time period on a regular basis.
CREATE TABLE `foo` (
...
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=DYNAMIC
PARTITION BY RANGE (UNIX_TIMESTAMP(`created_at`)) (
PARTITION `foo_1640995200` VALUES LESS THAN (1640995200) ENGINE = InnoDB, # 2022-01-01 00:00:00
PARTITION `foo_1641081600` VALUES LESS THAN (1641081600) ENGINE = InnoDB, # 2022-01-02 00:00:00
PARTITION `foo_1641168000` VALUES LESS THAN (1641168000) ENGINE = InnoDB # 2022-01-03 00:00:00
);
The issue with that approach is that my data distribution is uneven. Some partitions have 1M rows, some have 50M.
This leads to another issue: the number of open tables during long-range selects like SELECT * FROM foo WHERE created_at > NOW() - INTERVAL 1 YEAR.
I want to optimize this by simply extending the last partition when its row count is below some threshold, instead of creating a partition for the next day. Like:
SELECT `table_rows`
FROM `information_schema`.`partitions`
WHERE table_schema = DATABASE()
AND partition_name = 'foo_1641168000';
-- only 1M rows, no need for new partition, extend existing one:
ALTER TABLE `foo` REORGANIZE PARTITION `foo_1641168000` INTO (
PARTITION `foo_1641254400` VALUES LESS THAN (1641254400) ENGINE = InnoDB # 2022-01-04 00:00:00
);
However, despite being a simple range change, this operation completely rewrites the data of partition foo_1641168000, even though all rows from the existing partition fit the new definition.
That is a no-go due to table locks and excessive I/O usage.
Is there any way to achieve this without rewriting data?
BTW: my hacky idea was to add recent data to another table foo_recent and, when it grows to a certain size, install it as a partition of foo using EXCHANGE PARTITION ... WITHOUT VALIDATION. But this is dirty and worse both in terms of performance and syntax: queries must work on a union of the tables, or be run on the two tables independently with the results merged.
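A minimal sketch of that swap, with names assumed: suppose foo already has an empty partition foo_1641254400 carved out to cover foo_recent's time range (WITHOUT VALIDATION requires MySQL 5.7.5+):
# Metadata-only swap: foo_recent's rows become partition foo_1641254400,
# and the empty partition contents land in foo_recent, ready for reuse.
ALTER TABLE foo EXCHANGE PARTITION foo_1641254400
WITH TABLE foo_recent WITHOUT VALIDATION;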
REORGANIZE will read the 'from' partitions and write the 'to' partitions. Costly -- unless the 'froms' are empty.
Have a partition called 'future' that is LESS THAN MAXVALUE and is 'always' empty.
You are stuck with copying over lots of data.
Plan A:
Each night, before midnight, do this if the 'last' partition (before 'future') is getting "big":
REORGANIZE last, future
INTO last, soon, future;
Set the LESS THAN (for 'last') to end at midnight tonight. Set the LESS THAN for 'soon' to, say, a month from now. (This is the only big copy.)
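In concrete syntax, Plan A might look like this (partition names and epoch values are illustrative):
ALTER TABLE foo REORGANIZE PARTITION last, future INTO (
PARTITION last VALUES LESS THAN (1641254400),   # midnight tonight
PARTITION soon VALUES LESS THAN (1643673600),   # about a month out
PARTITION future VALUES LESS THAN MAXVALUE
);
Only 'last' contains rows, so it is the only partition whose data gets copied; 'future' is empty by construction.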
Plan B:
The following may be a viable alternative. (I just thought of it; I have not tried it.) Each night, see if the "last" (before "future") is "big enough". When it is, do these steps (just before each midnight):
Use "transportable tablespaces" to remove the big partition from the table. (Note: a partition is essentially a table, so this action is only touching "meta" information. I'm pretty sure no data is copied.)
Turn right around and again use "transportable tablespaces" to put it back into the partitioned table, but with a different LESS THAN -- set to midnight tonight.
REORGANIZE future INTO soon, future; -- Both of those are empty, so this is quite fast. (The LESS THAN for 'soon' is some time in the future. I hesitate to make it "MAXVALUE", but that might work and be even simpler.)
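One hedged way to realize Plan B is with EXCHANGE PARTITION as the metadata swap, since a partition is essentially a table (names assumed; untested, per the caveat above; WITHOUT VALIDATION requires 5.7.5+):
# Staging table with foo's structure, minus partitioning:
CREATE TABLE holder LIKE foo;
ALTER TABLE holder REMOVE PARTITIONING;
# Pull the big partition out -- a metadata swap, no rows copied:
ALTER TABLE foo EXCHANGE PARTITION last WITH TABLE holder;
# 'last' is now empty, so moving its LESS THAN to midnight tonight is fast:
ALTER TABLE foo REORGANIZE PARTITION last, future INTO (
PARTITION last VALUES LESS THAN (1641254400),
PARTITION future VALUES LESS THAN MAXVALUE
);
# Swap the data back in under the new boundary:
ALTER TABLE foo EXCHANGE PARTITION last WITH TABLE holder WITHOUT VALIDATION;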
If you try it and it works, let me know. I would like to add it to my Partition Maintenance blog.
Can someone tell me the pros and cons of HASH PARTITION vs RANGE PARTITION on a DATETIME column?
Let's consider a POS table with 20 million records, where we want to create partitions based on the transaction date's year, like
PARTITION BY HASH(YEAR(TRANSACTION_DATE)) PARTITIONS 4;
or
PARTITION BY RANGE(YEAR(TRANSACTION_DATE)) (
PARTITION p0 VALUES LESS THAN (2010),
PARTITION p1 VALUES LESS THAN (2012),
PARTITION p2 VALUES LESS THAN (2013),
PARTITION p4 VALUES LESS THAN MAXVALUE
);
to improve the performance of queries with TRANSACTION_DATE BETWEEN '2013-03-01' AND '2013-09-29'.
Which one is better, and why?
There are some significant differences. If you have a where clause that refers to a range of years, such as:
where year(transaction_date) between 2009 and 2011
then I don't think the hash partitioning will recognize this as hitting just one, two, or three partitions. The range partitioning should recognize this, reducing the I/O for such a query.
The more important difference has to do with managing the data. With range partitioning, once a partition has been created -- and the year has passed -- presumably the partition will not be touched again. That means that you only have to back up one partition, the current partition. And, next year, you'll only need to back up one partition.
A similar situation arises if you want to move data offline. Dropping a partition containing the oldest year of data is pretty easy, compared to deleting the rows one-by-one.
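For example, retiring the oldest year under the RANGE scheme above is a quick metadata operation; there is no equivalent for a hash partition, since its rows span all years:
ALTER TABLE POS DROP PARTITION p0;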
When the number of partitions is only four, these considerations may not make much of a difference. The key idea is that range partitioning assigns each row to a known partition. Hash partitioning assigns each row to a partition, but you don't know exactly which one.
EDIT:
The particular optimization that reduces the reading of partitions is called "partition pruning". MySQL documents this pretty well here. In particular:
For tables that are partitioned by HASH or KEY, partition pruning is also possible in cases in which the WHERE clause uses a simple = relation against a column used in the partitioning expression.
It would appear that partition pruning for inequalities (and even IN) requires range partitioning.
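You can check pruning yourself with EXPLAIN PARTITIONS (plain EXPLAIN shows a partitions column in MySQL 5.7+). With the RANGE scheme above, something like the query below should report only the partition covering 2013 (p4 here), while the HASH scheme would likely scan all four:
EXPLAIN PARTITIONS SELECT * FROM POS
WHERE TRANSACTION_DATE BETWEEN '2013-03-01' AND '2013-09-29';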
I've just started reading about MySQL partitions; they kind of look too good to be true. Please bear with me.
I have a table which I would like to partition (which I hope would bring better performance).
This is the case / question:
We have a column which stores Unix timestamp values. Is it possible to partition the table in such a way that, based on the Unix timestamp, each partition holds a single date? Or do I have to use range-based partitioning by defining the ranges beforehand?
Cheers
You can do whatever you feel like. See: http://dev.mysql.com/doc/refman/5.5/en/partitioning-types.html
An example of partitioning by Unix timestamp would be:
ALTER TABLE table1 PARTITION BY KEY (myINT11timestamp) PARTITIONS 1000;
-- or (DIV for integer division, since HASH needs an integer expression)
ALTER TABLE table1 PARTITION BY HASH (myINT11timestamp DIV 1000) PARTITIONS 10;
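If you want each partition to hold exactly one day, a sketch with RANGE over day boundaries (column name assumed, epoch values illustrative):
ALTER TABLE table1 PARTITION BY RANGE (myINT11timestamp) (
PARTITION p20110113 VALUES LESS THAN (1294963200), -- 2011-01-14 00:00:00 UTC
PARTITION p20110114 VALUES LESS THAN (1295049600), -- 2011-01-15 00:00:00 UTC
PARTITION pmax VALUES LESS THAN (MAXVALUE)
);
Note that with RANGE you do have to define the day boundaries yourself (or script their creation); KEY and HASH spread rows automatically, but not one-day-per-partition.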
Everything you wanted to know about partitions in MySQL 5.5: http://dev.mysql.com/tech-resources/articles/mysql_55_partitioning.html
I want to keep the last 45 days of log data in a MySQL table for statistical reporting purposes. Each day could be 20-30 million rows. I'm planning on creating a flat file and using LOAD DATA INFILE to get the data in there each day. Ideally I'd like to have each day in its own partition, without having to write a script to create a partition every day.
Is there a way in MySQL to just say each day gets its own partition automatically?
thanks
I would strongly suggest using Redis or Cassandra rather than MySQL to store high traffic data such as logs. Then you could stream it all day long rather than doing daily imports.
You can read more on those two (and more) in this comparison of "NoSQL" databases.
If you insist on MySQL, I think the easiest would be to create a new table per day, like logs_2011_01_13, and then load it all in there. It makes dropping older dates very easy, and you could also easily move different tables to different servers.
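A sketch of the daily cycle, assuming a template table logs_template and an illustrative file path:
CREATE TABLE logs_2011_01_13 LIKE logs_template;
LOAD DATA INFILE '/tmp/logs_2011_01_13.csv' INTO TABLE logs_2011_01_13;
-- retiring the oldest day of the 45-day window:
DROP TABLE logs_2010_11_29;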
er.., number them in Mod 45 with a composite key and cycle through them...
Seriously, 1 table per day was a valid suggestion, and since it is static data I would create packed MyISAM, depending upon my host's ability to sort.
Building queries to union some or all of them would be only moderately challenging.
1 table per day, and partition those to improve load performance.
Yes, you can partition MySQL tables by date:
CREATE TABLE ExampleTable (
id INT AUTO_INCREMENT,
d DATE,
PRIMARY KEY (id, d)
) PARTITION BY RANGE COLUMNS(d) (
PARTITION p1 VALUES LESS THAN ('2014-01-01'),
PARTITION p2 VALUES LESS THAN ('2014-01-02'),
PARTITION pN VALUES LESS THAN (MAXVALUE)
);
Later, when you get close to overflowing into partition pN, you can split it:
ALTER TABLE ExampleTable REORGANIZE PARTITION pN INTO (
PARTITION p3 VALUES LESS THAN ('2014-01-03'),
PARTITION pN VALUES LESS THAN (MAXVALUE)
);
This doesn't automatically partition by date, but you can reorganize when you need to. Best to reorganize before you fill the last partition, so the operation will be quick.
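And for the 45-day retention window, dropping the oldest day stays a cheap metadata operation:
ALTER TABLE ExampleTable DROP PARTITION p1;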
I have stumbled on this question while looking for something else and wanted to point out the MERGE storage engine (http://dev.mysql.com/doc/refman/5.7/en/merge-storage-engine.html).
The MERGE storage engine is more or less a simple pointer to multiple tables, and can be redone in seconds. For cycling logs, it can be very powerful! Here's what I'd do:
Create one table per day, and use LOAD DATA as the OP mentioned to fill it up. Once that is done, drop the MERGE table and recreate it, including the new table while omitting the oldest one. Once done, I could delete/archive the old table. This would allow me to rapidly query a specific day, or all of them, as both the original tables and the MERGE are valid.
CREATE TABLE logs_day_46 LIKE logs_day_45; -- inherits the MyISAM definition
DROP TABLE IF EXISTS logs;
CREATE TABLE logs LIKE logs_day_46;
ALTER TABLE logs ENGINE=MERGE UNION=(logs_day_2,[...],logs_day_46);
DROP TABLE logs_day_1;
Note that a MERGE table is not the same as a PARTITIONED one, and offers its own advantages and disadvantages. But do remember that if you are trying to aggregate across all tables, it will be slower than if all the data were in only one table (the same is true for partitions, as they are basically different tables under the hood). If you are going to query mostly specific days, you will need to choose the table yourself, but if partitions are keyed on the day values, MySQL will automatically pick the correct table(s), which might come out faster and easier to write.