I have a partitioned table into which a stored procedure inserts data. The table is partitioned by a column named year, and the stored procedure inserts into it without problems.
But now I have a case where inserts may target a year for which no partition exists yet, so I need a way to find out whether a particular partition name exists for the table.
E.g. my table name is backups and I have 3 partitions for now: 2018, 2019 and 2020.
But when the stored procedure runs in the year 2021, there may not be a partition for that year, so I would like the stored procedure to handle checking for and creating the partition at run time.
Following is my table structure -
Partition creation query -
ALTER TABLE backups
partition by list columns(year)
(partition backup_2018 values IN (2018),
partition backup_2019 values IN (2019),
partition backup_2020 values IN (2020));
Following is my stored procedure -
CREATE DEFINER=`root`@`localhost` PROCEDURE `daily_backup`()
BEGIN
DECLARE backuptime INT;
#Need Partition checking and creation here
SET backuptime = UNIX_TIMESTAMP(CONCAT(DATE_SUB(DATE_FORMAT(NOW(),'%Y-%m-%d'), INTERVAL 1 DAY),' 23:59:59'));
INSERT into backups
(user_id, latest_transaction_id, balance, last_transaction_timestamp, last_transaction_date, snapshot_date, year)
SELECT
T2.user_id,
T2.transaction_id AS latest_transaction_id,
T2.new_balance AS balance,
T2.created_date AS last_transaction_timestamp,
DATE_FORMAT(FROM_UNIXTIME(T2.created_date), '%Y-%m-%d %I:%i:%S') AS last_transaction_date,
DATE_FORMAT(NOW(), '%Y-%m-%d') AS snapshot_date,
DATE_FORMAT(NOW(), '%Y') AS year
FROM
(SELECT
user_id, MAX(transaction_id) maxTransID
FROM
transaction
WHERE
created_date < backuptime
GROUP BY user_id) Tmp
JOIN
transaction T2 ON Tmp.MaxTransID = T2.Transaction_ID;
END
I suggest partitioning using RANGE COLUMNS instead of LIST COLUMNS. That way you have the option of adding a final catch-all partition for any years beyond the partitions you have defined so far.
ALTER TABLE backups
partition by range columns(year)
(partition backup_2018 values LESS THAN (2019),
partition backup_2019 values LESS THAN (2020),
partition backup_2020 values LESS THAN (2021),
partition backup_other values LESS THAN (MAXVALUE));
As you get closer to the end of 2020, you'd use ALTER TABLE backups REORGANIZE PARTITION backup_other INTO ( ...new partitions... ) to split the last partition and make new partitions for subsequent years.
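For example, toward the end of 2020 the split might look like this (backup_2021 is just an illustrative name for the new partition):
ALTER TABLE backups REORGANIZE PARTITION backup_other INTO (
    PARTITION backup_2021 VALUES LESS THAN (2022),
    PARTITION backup_other VALUES LESS THAN (MAXVALUE)
);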
See https://dev.mysql.com/doc/refman/5.7/en/partitioning-management-range-list.html for more details.
If you forget, no harm done, your data will just fill up backup_other for a while until you remember to reorganize. It's to your advantage though to do it proactively, because reorganizing an empty partition is quick, and reorganizing a partition with data in it will take more time.
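If you also want the stored procedure itself to verify at run time that the current year's partition exists, here is a rough sketch (not tested; it assumes the backup_<year> naming from your DDL and the backup_other catch-all partition from above) that checks information_schema.partitions and reorganizes backup_other when needed:
-- inside daily_backup(), before the INSERT
SET @part_name = CONCAT('backup_', YEAR(NOW()));
IF NOT EXISTS (SELECT 1 FROM information_schema.partitions
               WHERE table_schema = DATABASE()
                 AND table_name = 'backups'
                 AND partition_name = @part_name) THEN
    SET @ddl = CONCAT('ALTER TABLE backups REORGANIZE PARTITION backup_other INTO (',
                      'PARTITION ', @part_name, ' VALUES LESS THAN (', YEAR(NOW()) + 1, '), ',
                      'PARTITION backup_other VALUES LESS THAN (MAXVALUE))');
    PREPARE stmt FROM @ddl;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
END IF;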
Related
My database is on AWS RDS and it is getting bigger day by day.
The reason is that we have several cron jobs that fetch data through various APIs and add it to our database. The growing data is affecting our SQL search (SELECT) operations.
I am thinking of archiving previous years' data so that WHERE clauses keep running without latency and do not have to traverse the complete record set (including the previous years' data).
I recently came across the MySQL partitioning concept: using RANGE partitioning, we can partition the data by year. My only concern is this: if I have columns in the table like
id, first_name, last_name, email, created_date
and the partitioning is done as:
PARTITION BY RANGE(YEAR(created_date)) (
PARTITION p0 VALUES LESS THAN (2019),
PARTITION p1 VALUES LESS THAN (2020),
PARTITION p2 VALUES LESS THAN MAXVALUE
)
If I run the SQL query as:
select * from table where email = "abc@....com"
Here the partitioning is on the created_date column, but the WHERE clause is applied to the email column, so which partition will the result be fetched from?
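One way to see this for yourself: EXPLAIN shows which partitions a query will read (the partitions column in MySQL 5.7+, or EXPLAIN PARTITIONS in 5.6). Since email is not part of the partitioning expression, no pruning is possible and you should expect every partition to be listed (the table name and address below are just placeholders):
EXPLAIN SELECT * FROM `table` WHERE email = 'abc@example.com';
-- MySQL 5.6 and older:
EXPLAIN PARTITIONS SELECT * FROM `table` WHERE email = 'abc@example.com';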
I have a huge table that stores many tracked events, such as a user click.
The table is already in the 10s of millions, and it's growing larger every day.
The queries are starting to get slower when I try to fetch events from a large timeframe, and after reading quite a bit on the subject I understand that partitioning the table may boost the performance.
What I want to do is partition the table on a per month basis.
I have only found guides that show how to partition manually each month. Is there a way to just tell MySQL to partition by month so that it does this automatically?
If not, what is the command to do it manually considering my partitioned by column is a datetime?
As explained by the manual: http://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html
This is easily possible by hash partitioning of the month output.
CREATE TABLE ti (id INT, amount DECIMAL(7,2), tr_date DATE)
ENGINE=INNODB
PARTITION BY HASH( MONTH(tr_date) )
PARTITIONS 6;
Do note that this only partitions by month and not by year; also, there are only 6 partitions (so 6 months) in this example.
And for partitioning an existing table (manual: https://dev.mysql.com/doc/refman/5.7/en/alter-table-partition-operations.html):
ALTER TABLE ti
PARTITION BY HASH( MONTH(tr_date) )
PARTITIONS 6;
Querying can be done both from the entire table:
SELECT * from ti;
Or from a specific partition, referenced by name:
SELECT * from ti PARTITION (p2);
CREATE TABLE `mytable` (
`post_id` int DEFAULT NULL,
`viewid` int DEFAULT NULL,
`user_id` int DEFAULT NULL,
`post_Date` datetime DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
PARTITION BY RANGE (extract(year_month from `post_Date`))
(PARTITION P0 VALUES LESS THAN (202012) ENGINE = InnoDB,
PARTITION P1 VALUES LESS THAN (202104) ENGINE = InnoDB,
PARTITION P2 VALUES LESS THAN (202108) ENGINE = InnoDB,
PARTITION P3 VALUES LESS THAN (202112) ENGINE = InnoDB,
PARTITION P4 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
Be aware of the overhead of partitioning by hash:
As the docs say:
You should also keep in mind that this expression is evaluated each time a row is inserted or updated (or possibly deleted); this means that very complex expressions may give rise to performance issues, particularly when performing operations (such as batch inserts) that affect a great many rows at one time.
The most efficient hashing function is one which operates upon a single table column and whose value increases or decreases consistently with the column value, as this allows for “pruning” on ranges of partitions. That is, the more closely that the expression varies with the value of the column on which it is based, the more efficiently MySQL can use the expression for hash partitioning.
For example, where date_col is a column of type DATE, then the expression TO_DAYS(date_col) is said to vary directly with the value of date_col, because for every change in the value of date_col, the value of the expression changes in a consistent manner. The variance of the expression YEAR(date_col) with respect to date_col is not quite as direct as that of TO_DAYS(date_col), because not every possible change in date_col produces an equivalent change in YEAR(date_col).
HASHing by month with 6 partitions means that two months a year will land in the same partition. What good is that?
Don't bother partitioning, index the table.
Assuming these are the only two queries you use:
SELECT * from ti;
SELECT * from ti PARTITION (p2);
then start the PRIMARY KEY with the_date.
The first query simply reads the entire table; no change between partitioned and not.
The second query, assuming you want a single month, not all the months that map into the same partition, would need to be
SELECT * FROM ti WHERE the_date >= '2019-03-01'
AND the_date < '2019-03-01' + INTERVAL 1 MONTH;
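A sketch of that indexing approach (the_date and id are the column names assumed in this discussion; adapt them to your schema):
-- InnoDB clusters rows by the PRIMARY KEY, so leading with the date
-- makes a one-month range scan read a contiguous slice of the table
ALTER TABLE ti
    ADD PRIMARY KEY (the_date, id);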
If you have other queries, let's see them.
(I have not found any performance justification for ever using PARTITION BY HASH.)
I have a table which will grow large over time; moreover, I only need a small amount of data, say the last 7 days.
I want to configure it such that 7 days of data goes into one partition and then into the next. This way I would keep only two partitions and archive the others.
I read about MySQL partitions here, but the approach in that article is to specify all partitions when creating the table.
I am not sure this is the best way to do it when the partitioning logic has to keep working for a long time.
Any ideas?
Unfortunately, it'll be a fairly manual process. Your best bet is to create the partitions, week by week ahead of time, then have a job that runs periodically to archive the old data into the 'catchall' partition.
e.g. with:
PARTITION BY RANGE ( TO_DAYS(date) ) (
PARTITION pmin VALUES LESS THAN ( TO_DAYS('2016-10-02 00:00:00') ),
PARTITION p1 VALUES LESS THAN ( TO_DAYS('2016-10-09 00:00:00') ),
PARTITION p2 VALUES LESS THAN ( TO_DAYS('2016-10-16 00:00:00') ),
PARTITION p3 VALUES LESS THAN ( TO_DAYS('2016-10-23 00:00:00') ),
PARTITION pmax VALUES LESS THAN (MAXVALUE)
);
There's no real harm in having a few empty partitions with higher dates sitting there and then doing a 'shift' once a week. It'll be fast enough as long as, when you change the partitioning definition, the data window shifts by the partition size.
Your job would do something like
ALTER TABLE x REORGANIZE PARTITION pmin,p1 INTO (
PARTITION pmin VALUES LESS THAN ( TO_DAYS('2016-10-09 00:00:00') )
);
ALTER TABLE x REORGANIZE PARTITION pmax INTO (
PARTITION p4 VALUES LESS THAN ( TO_DAYS('2016-10-30 00:00:00') ),
PARTITION pmax VALUES LESS THAN (MAXVALUE)
);
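If you prefer to keep the schedule inside MySQL rather than in an external cron job, the weekly shift can be driven by the event scheduler. A sketch (shift_week_partitions is a hypothetical procedure wrapping the two ALTER statements above, and the event scheduler must be enabled):
SET GLOBAL event_scheduler = ON;

CREATE EVENT weekly_partition_shift
    ON SCHEDULE EVERY 1 WEEK
    STARTS '2016-10-23 01:00:00'
    DO CALL shift_week_partitions();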
There is no "automatic" partition management in MySQL. We have to run some specific SQL statements to add and drop partitions from a partitioned table.
We automated the task with a cron job which runs a MySQL PROCEDURE we wrote to drop (swap out) old partitions, and another PROCEDURE to add new partitions. The procedures are specific to a particular table.
Our table is partitioned by RANGE on a TIMESTAMP column. The partition expression is like UNIX_TIMESTAMP(col).
To add a new partition, we reorganize the MAXVALUE partition, which is always (or should always be) empty, so the operation is very quick. We dynamically prepare and execute a statement of the form:
ALTER TABLE ourtable REORGANIZE PARTITION pmax
INTO ( PARTITION pn_name VALUES LESS THAN (UNIX_TIMESTAMP(pn_date))
, PARTITION pmax VALUES LESS THAN MAXVALUE)
To get the date value for the new partition, we take the partition_description value from the second-to-last partition (the last partition is the MAXVALUE partition) and add 7 days to it to get the pn_date string to use. We use that same value to generate the pn_name for the new partition. (We name the partitions following a pattern like p20161030, based on the date value in the partition_description, e.g. UNIX_TIMESTAMP('2016-10-30').)
(This information is obtained from a fairly involved query with a couple of references to the information_schema.partitions view.)
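A minimal sketch of that lookup (schema, table, and column naming as in the query further below; our real query does more checking):
-- boundary of the newest non-MAXVALUE partition, plus 7 days, becomes the next boundary
SELECT CONCAT('p', DATE_FORMAT(FROM_UNIXTIME(p.partition_description + 0)
                               + INTERVAL 7 DAY, '%Y%m%d'))                 AS pn_name,
       DATE(FROM_UNIXTIME(p.partition_description + 0) + INTERVAL 7 DAY)    AS pn_date
FROM information_schema.partitions p
WHERE p.table_schema = 'ourdatabase'
  AND p.table_name = 'ourtable'
  AND p.partition_description <> 'MAXVALUE'
ORDER BY p.partition_ordinal_position DESC
LIMIT 1;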
With the other procedure to drop old partitions, we actually "swap out" the old partition to an archive table. (The archive table is later backed up, and dropped by a different task.)
The procedure basically runs a series of statements like this:
DROP TABLE IF EXISTS `_et` ;
CREATE TABLE `_et` LIKE `ourtable` ;
ALTER TABLE `_et` REMOVE PARTITIONING ;
ALTER TABLE `ourtable` EXCHANGE PARTITION `oldest_partition` WITH TABLE `_et` ;
ALTER TABLE `ourtable` DROP PARTITION `oldest_partition` ;
RENAME TABLE `_et` TO `archive_oldest_partition` ;
(I wish there were a cleaner way to create a new un-partitioned table in a single statement, such as a CREATE TABLE ... LIKE ... WITHOUT PARTITIONING, but absent that, we settled on the two separate statements.)
Just dropping the oldest partition would be a simpler process.
To obtain information about the oldest partition, our query is probably overkill. But it's where most of the "magic" happens. Just to give you an idea of what that query looks like...
FROM information_schema.partitions p1
JOIN information_schema.partitions px
ON px.table_schema = 'ourdatabase'
AND px.table_name = 'ourtable'
AND px.partition_method = 'RANGE'
AND px.partition_expression = 'UNIX_TIMESTAMP(ourcol)'
AND px.partition_description = 'MAXVALUE'
WHERE p1.table_schema = 'ourdatabase'
AND p1.table_name = 'ourtable'
AND p1.partition_method = 'RANGE'
AND p1.partition_expression = 'UNIX_TIMESTAMP(ourcol)'
AND p1.partition_description <> 'MAXVALUE'
AND p1.partition_description + 0 <= UNIX_TIMESTAMP(DATE(NOW()) + INTERVAL -187 DAY)
AND p1.partition_ordinal_position = 1
You could probably get away with a simpler query. (Our query is designed to return the "oldest" partition only if all of the timestamp values in it are at least six months old, and only if there is a MAXVALUE partition defined.)
Each of the procedures uses the current date to see if "it's time" to add or drop a partition. (The amount of time forward and back is hardcoded into the queries in the procedure; the query returns 0 rows if it's not time yet.)
The procedures only need to be executed once per week, and we designed them so that any "extra" runs won't add or drop partitions outside of the specified time ranges.
We have the procedures scheduled to execute every day, and on most days, the procedure runs a query which returns zero rows, and exits. Only when the query returns a row is there any work to do.
My DB design includes multiple MyISAM tables with measurements collected online.
Each row contains an auto-incremented id, some data, and an integer representing unixtime.
I am designing an aging mechanism, and I am interested in using MySQL partitioning to partition each such table by unixtime dynamically.
Say that I want each partition to represent a single month of data, with the last partition representing 2 months. If records arrive for the next, not yet represented, month, the partition that represented 2 months should be reorganized to represent a single month, and a new partition should be created representing 2 months (1 taken from the last partition and 1 for future measurements).
Additionally, when a new partition is created, I want the oldest partition to be dropped.
What type of partitioning should I use (my unixtime is not a unique key), and how would I use unixtime for partitioning purposes?
How would I design the partitioning to be fully dynamic, based on new records added to the tables?
UPDATE 12.12.12
I have found an interesting link to an approach similar to the one I have described: your-magical-range-partitioning-maintenance-query.
Partitioning does not need to be based on a unique key. However, if the table has any unique keys (including the primary key), every column used in the partitioning expression must be part of each of them. To partition the table on the unixtime column (called unixtime_column below), do:
ALTER TABLE MyTable
PARTITION BY RANGE (unixtime_column)
(
PARTITION p01 VALUES LESS THAN (UNIX_TIMESTAMP('2013-01-01')),
PARTITION p02 VALUES LESS THAN (UNIX_TIMESTAMP('2013-02-01')),
PARTITION p03 VALUES LESS THAN (UNIX_TIMESTAMP('2013-03-01')),
PARTITION p04 VALUES LESS THAN (MAXVALUE));
Or you can partition on a datetime column straight away in MySQL 5.5+:
ALTER TABLE MyTable
PARTITION BY RANGE COLUMNS (datetime_column)
(
PARTITION p01 VALUES LESS THAN ('2013-01-01'),
PARTITION p02 VALUES LESS THAN ('2013-02-01'),
PARTITION p03 VALUES LESS THAN ('2013-03-01'),
PARTITION p04 VALUES LESS THAN (MAXVALUE));
Fully automated version (it would keep every month in its own partition, 5 months of data held):
ALTER TABLE MyTable
PARTITION BY RANGE (YEAR(datetime_column)*100 + MONTH(datetime_column))
(
PARTITION p201301 VALUES LESS THAN (201301),
PARTITION p201302 VALUES LESS THAN (201302),
PARTITION p201303 VALUES LESS THAN (201303),
PARTITION p201304 VALUES LESS THAN (201304),
PARTITION p201305 VALUES LESS THAN (201305),
PARTITION p_MAXVALUE VALUES LESS THAN (MAXVALUE));
-- Maintenance routine (a sketch; rotate_mytable_partitions is a placeholder name):
DELIMITER //
CREATE PROCEDURE rotate_mytable_partitions()
BEGIN
  DECLARE min_part INT;
  DECLARE last_part INT;
  IF (SELECT COUNT(DISTINCT YEAR(datetime_column)*100 + MONTH(datetime_column)) FROM MyTable) > 5 THEN
    SELECT MIN(YEAR(datetime_column)*100 + MONTH(datetime_column)),
           MAX(YEAR(datetime_column)*100 + MONTH(datetime_column))
      INTO min_part, last_part FROM MyTable;
    -- split the catch-all partition so the newest month gets its own partition
    SET @sql = CONCAT('ALTER TABLE MyTable REORGANIZE PARTITION p_MAXVALUE INTO (',
                      'PARTITION p', last_part, ' VALUES LESS THAN (', last_part, '), ',
                      'PARTITION p_MAXVALUE VALUES LESS THAN (MAXVALUE))');
    PREPARE stmt FROM @sql; EXECUTE stmt; DEALLOCATE PREPARE stmt;
    -- drop the oldest partition
    SET @sql = CONCAT('ALTER TABLE MyTable DROP PARTITION p', min_part);
    PREPARE stmt FROM @sql; EXECUTE stmt; DEALLOCATE PREPARE stmt;
  END IF;
END//
DELIMITER ;
P.S. Apologies if SQL is not exactly correct - cannot parse it right now.