I have created a table in MySQL using the following syntax:
CREATE TABLE `demo` (
`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'ID',
`date` datetime NOT NULL COMMENT 'date',
`desc` enum('error','audit','info') NOT NULL,
PRIMARY KEY (`id`,`date`)
)
PARTITION BY RANGE (MONTH(`date`))
(
PARTITION JAN VALUES LESS THAN (2),
PARTITION FEB VALUES LESS THAN (3),
PARTITION MAR VALUES LESS THAN (4),
PARTITION APR VALUES LESS THAN (5),
PARTITION MAY VALUES LESS THAN (6),
PARTITION JUN VALUES LESS THAN (7),
PARTITION JUL VALUES LESS THAN (8),
PARTITION AUG VALUES LESS THAN (9),
PARTITION SEP VALUES LESS THAN (10),
PARTITION OCT VALUES LESS THAN (11),
PARTITION NOV VALUES LESS THAN (12),
PARTITION `DEC` VALUES LESS THAN (MAXVALUE)
);
Here `id` and `date` together form the composite primary key, and I have used `date` as the partitioning column, with one partition per month.
The table is created successfully and rows are inserted into the correct partitions.
What will be the effect on performance if I run a query that needs to fetch records across multiple partitions?
Consider the following query:
SELECT * FROM `demo` WHERE `date` BETWEEN '2015-02-01 00:00:00' AND '2015-05-31 00:00:00';
The query will need to look at ALL the partitions. The optimizer is not smart enough to understand the basic principles of date ranges when they are "wrapped" by the MONTH() function.
You can see this by doing EXPLAIN PARTITIONS SELECT ...;.
Even if it were smart enough to touch only 4 partitions, you would gain no performance benefit for that SELECT. You may as well get rid of partitions and add an index on date.
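Dropping the partitioning is a small change; a sketch (`idx_date` is just an illustrative index name):

```sql
-- Remove the partitioning and index `date` directly instead
ALTER TABLE `demo` REMOVE PARTITIONING;
ALTER TABLE `demo` ADD INDEX `idx_date` (`date`);
```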
Since this table is called demo, I suspect it is not the final version. If you would like to talk about whether PARTITIONing is useful for your application, let's see the real schema and the important queries.
Related
I have a table that is constantly growing.
I want to delete rows that are older than 1 year, periodically (every 12 hours).
At first I thought of using an ordinary DELETE statement, but that is no good: there are many entries and the database would get stuck.
Then I read about another approach: copying the rows to keep into a new table, renaming it, and dropping the old table.
The approach I want to try (and am not sure how to do) is partitioning.
I want to split my created field into months, and then each month drop the data for the same month a year earlier.
Example: once a month, on 1.1.2016, delete all entries from Jan 2015.
I removed the primary key and added it as an index (because I got error 1503).
But I still can't figure out how to do it.
Can you please advise?
This is the table:
CREATE TABLE `myTable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`created` datetime NOT NULL,
`updated` datetime NOT NULL,
`file_name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Adding: I tried this:
ALTER TABLE myTable
PARTITION BY RANGE( YEAR(created) )
SUBPARTITION BY HASH( MONTH(created) )
SUBPARTITIONS 12 (
PARTITION january VALUES LESS THAN (2),
PARTITION february VALUES LESS THAN (3),
PARTITION march VALUES LESS THAN (4),
PARTITION april VALUES LESS THAN (5),
PARTITION may VALUES LESS THAN (6),
PARTITION june VALUES LESS THAN (7),
PARTITION july VALUES LESS THAN (8),
PARTITION august VALUES LESS THAN (9),
PARTITION september VALUES LESS THAN (10),
PARTITION october VALUES LESS THAN (11),
PARTITION november VALUES LESS THAN (12),
PARTITION december VALUES LESS THAN (13)
);
but I always get an error, "Table has no partition for value 2016", when trying to set created to 2016-01-26 15:37:22.
HASH partitioning does not do anything useful.
RANGE partitioning needs specific ranges.
To keep a year's worth of data but delete in 12-hour chunks would require 730 partitions; this is impractical.
Instead, I suggest PARTITION BY RANGE with 14 monthly (or 54 weekly) ranges, and DROP a whole month (or week) at a time. For example, it is now mid-January, so monthly partitions would be: Jan'15, Feb'15, ..., Jan'16, Future.
Near the end of Jan'16, REORGANIZE Future into Feb'16 and Future.
Early in Feb'16, DROP Jan'15.
Yes, you would have up to a month (or week) of data waiting to be deleted, but that probably is not a big deal. And it would be very efficient.
I would write a daily cron job to do "if it is time to drop, do so" and "if it is time to reorganize, do so".
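As a sketch of what that cron job would execute, assuming the table were repartitioned by TO_DAYS(created) with monthly partitions named like p201501 plus a catch-all future partition (all names here are hypothetical):

```sql
-- Near the end of Jan'16: split `future` so Feb'16 rows get their own partition
ALTER TABLE myTable REORGANIZE PARTITION future INTO (
    PARTITION p201602 VALUES LESS THAN (TO_DAYS('2016-03-01')),
    PARTITION future VALUES LESS THAN MAXVALUE
);

-- Early in Feb'16: drop the year-old month in one cheap operation
ALTER TABLE myTable DROP PARTITION p201501;
```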
I've a 30M rows table and I want to partition it by dates.
mysql > SHOW CREATE TABLE `parameters`
CREATE TABLE `parameters` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`add_time` datetime DEFAULT NULL,
...(etc)
) ENGINE=MyISAM AUTO_INCREMENT=28929477 DEFAULT CHARSET=utf8
The table stores data for the last 5 years and the row count is increasing dramatically. I want to partition it by year (2009, 2010, 2011, 2012, 2013).
ALTER TABLE parameters DROP PRIMARY KEY, ADD INDEX(id);
ALTER TABLE parameters PARTITION BY RANGE (TO_DAYS(id)) (
PARTITION y2009 VALUES LESS THAN (TO_DAYS('2010-01-01')),
PARTITION y2010 VALUES LESS THAN (TO_DAYS('2011-01-01')),
PARTITION y2011 VALUES LESS THAN (TO_DAYS('2012-01-01')),
PARTITION y2012 VALUES LESS THAN (TO_DAYS('2013-01-01')),
PARTITION y2013 VALUES LESS THAN MAXVALUE
);
Everything works on the dev server, but there is a problem on the production server.
The problem: almost all of the rows ended up in the first partition (y2009), even though the data is distributed uniformly across the years. Physically there is one large y2009.MYD file in the data folder, and the other partitions' files are much smaller.
I also tried to reorganize the first partition in order to split out the NULL dates:
alter table raw
reorganize partition y2012 into (
PARTITION y0 VALUES LESS THAN (0),
PARTITION y2012 VALUES LESS THAN (TO_DAYS('2013-01-01'))
);
P.S.: production and dev servers have same version of MySQL 5.1.37
You need to use the date column, not id, in the RANGE expression.
I have changed TO_DAYS(id) to TO_DAYS(add_time).
Try the following:
ALTER TABLE parameters PARTITION BY RANGE (TO_DAYS(add_time)) (
PARTITION y0 VALUES LESS THAN (TO_DAYS('2009-01-01')),
PARTITION y2009 VALUES LESS THAN (TO_DAYS('2010-01-01')),
PARTITION y2010 VALUES LESS THAN (TO_DAYS('2011-01-01')),
PARTITION y2011 VALUES LESS THAN (TO_DAYS('2012-01-01')),
PARTITION y2012 VALUES LESS THAN (TO_DAYS('2013-01-01')),
PARTITION y2013 VALUES LESS THAN MAXVALUE
);
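You can then check that pruning works with EXPLAIN PARTITIONS; a query bounded to one year should no longer touch every partition:

```sql
EXPLAIN PARTITIONS
SELECT COUNT(*) FROM parameters
WHERE add_time >= '2010-01-01' AND add_time < '2011-01-01';
-- the `partitions` column should list essentially just y2010
```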
I'm trying to alter an existing table to add year and week subpartitions, like so:
CREATE TABLE test_table(
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
dtime DATETIME NOT NULL);
ALTER TABLE test_table
PARTITION BY RANGE ( YEAR(dtime) )
SUBPARTITION BY RANGE( WEEK(dtime) ) (
PARTITION y0 VALUES LESS THAN (2013) (
SUBPARTITION w0 VALUES LESS THAN (2),
...
SUBPARTITION w52 VALUES LESS THAN (54)
),
PARTITION y1 VALUES LESS THAN (2014) (
SUBPARTITION w0 VALUES LESS THAN (2),
...
SUBPARTITION w52 VALUES LESS THAN (54)
),
PARTITION y2 VALUES LESS THAN (2015) (
SUBPARTITION w0 VALUES LESS THAN (2),
...
SUBPARTITION w52 VALUES LESS THAN (54)
),
PARTITION y3 VALUES LESS THAN (2016) (
SUBPARTITION w0 VALUES LESS THAN (2),
...
SUBPARTITION w52 VALUES LESS THAN (54)
)
);
However, this gives me the vague and unhelpful response of:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'RANGE( WEEK(DTIME) ) (
PARTITION y0 VALUES LESS THAN (2013) (
SUBPARTITION ' at line 3
I've checked the docs: MySQL ALTER TABLE Partition operations and MySQL RANGE and LIST Partitions. However, neither of these describe how to alter a table to create subpartitions.
The second part of my question is a request for feedback on this partitioning scheme. The data going into this table is sensor readings recorded every minute, and the most common query is for data from the last week. I think partitioning should greatly speed up my queries, since "WHERE dtime > date" is very common, and it would save me from periodically moving data out of the table into archive tables by hand.
If you want to add PARTITION BY LIST to an already existing table, drop the primary key and create a composite primary key that includes the partitioning column:
alter table test_table drop primary key, add primary key (id,<some other key>);
alter table test_table partition by list(<some other key>) (
partition p0 values IN (1),
partition p1 values IN (2),
partition p2 values IN (3),
partition p3 values IN (4),
partition p4 values IN (5),
partition p5 values IN (6),
partition p6 values IN (7),
partition p7 values IN (8),
partition p8 values IN (9),
partition p9 values IN (10)
);
After further investigation, I have discovered several problems with this approach.
SUBPARTITION BY RANGE is not valid syntax at all; subpartitions may only use HASH or KEY, which is why the ALTER fails with ERROR 1064. It is also not possible to range partition directly on a DATETIME value (which dtime in the example is); the partitioning expression must evaluate to an integer. http://dev.mysql.com/doc/refman/5.1/en/partitioning-limitations-functions.html
The table I was partitioning had a primary key on an AUTO_INCREMENT id column, and you cannot partition on a column that is not part of every unique key (including the primary key):
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function
See also http://blog.mclaughlinsoftware.com/2011/05/09/mysqls-real-partition-key/
http://dev.mysql.com/doc/refman/5.1/en/partitioning-limitations-partitioning-keys-unique-keys.html
WEEK() is not allowed as a partitioning function. http://dev.mysql.com/doc/refman/5.1/en/partitioning-limitations-functions.html
From what I now know, if a lone UNIQUE AUTO_INCREMENT id is the primary key, it is impossible to partition on anything except that value without changing the key.
My queries all use the dtime column in their WHERE conditions, so unless I can somehow still partition on dtime, there is no performance benefit to partitioning this table.
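One possible workaround (a sketch, not part of the original question): include dtime in the primary key, then partition with an allowed integer-valued function such as TO_DAYS(), which the optimizer can also prune on:

```sql
-- The partitioning column must appear in every unique key
ALTER TABLE test_table DROP PRIMARY KEY, ADD PRIMARY KEY (id, dtime);

ALTER TABLE test_table
PARTITION BY RANGE ( TO_DAYS(dtime) ) (
    PARTITION p2013 VALUES LESS THAN (TO_DAYS('2014-01-01')),
    PARTITION p2014 VALUES LESS THAN (TO_DAYS('2015-01-01')),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);
```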
I am having an issue partitioning a table using PARTITION BY RANGE on a datetime column.
A test query still scans all partitions.
I saw some posts about this issue, but I am not sure whether there is any way to fix it or bypass it.
MySQL server: Percona 5.5.24-55.
table:
id bigint(20) unsigned NOT NULL,
time datetime NOT NULL,
....
....
KEY id_time (id,time)
engine=InnoDB
partition statement:
alter table summary_201204
partition by range (day(time))
subpartition by key(id)
subpartitions 5 (
partition p0 values less than (6),
partition p1 values less than (11),
partition p2 values less than (16),
partition p3 values less than (21),
partition p4 values less than (26),
partition p5 values less than (MAXVALUE) );
check:
explain partitions select * from summary_201204 where time < '2012-07-21';
result: p0_p0sp0,p0_p0sp1,p0_p0sp2,p0_p0sp3,p0_p0sp4,p1_p1sp0,p1_p1sp1,p1_p1sp2,p1_p1sp3,p1_p1sp4,p2_p2sp0,p2_p2sp1,p2_p2sp2,p2_p2sp3,p2_p2sp4,p3_p3sp0,p3_p3sp1,p3_p3sp2,p3_p3sp3,p3_p3sp4,p4_p4sp0,p4_p4sp1,p4_p4sp2,p4_p4sp3,p4_p4sp4,p5_p5sp0,p5_p5sp1,p5_p5sp2,p5_p5sp3,p5_p5sp4.
I think this is the answer: the documentation on the official MySQL site is not clear enough about the data types required for partition pruning. In this case, if the column's data type is DATETIME, then we should use TO_SECONDS in the partitioning expression, whereas if the data type is DATE then we can use YEAR.
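Applied to the table above, a pruning-friendly rewrite might look like this (a sketch only; partition names and boundary dates are illustrative, and TO_SECONDS requires MySQL 5.5+):

```sql
ALTER TABLE summary_201204
PARTITION BY RANGE (TO_SECONDS(`time`))
SUBPARTITION BY KEY (id)
SUBPARTITIONS 5 (
    PARTITION p0 VALUES LESS THAN (TO_SECONDS('2012-04-08 00:00:00')),
    PARTITION p1 VALUES LESS THAN (TO_SECONDS('2012-04-15 00:00:00')),
    PARTITION p2 VALUES LESS THAN (TO_SECONDS('2012-04-22 00:00:00')),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);
```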
I have a log table that gets processed every night. Processing will be done on data that was logged yesterday. Once the processing is complete I want to delete the data for that day. At the same time, I have new data coming into the table for the current day. I partitioned the table based on day of week. My hope was that I could delete data and insert data at the same time without contention. There could be as many as 3 million rows of data a day being processed. I have searched for information but haven't found anything to confirm my assumption.
I don't want the hassle of writing a job that adds and drops partitions, as I have seen in other examples. I was hoping to implement a solution using seven fixed partitions, e.g.:
CREATE TABLE `professional_scoring_log` (
`professional_id` int(11) NOT NULL,
`score_date` date NOT NULL,
`scoring_category_attribute_id` int(11) NOT NULL,
`displayable_score` decimal(7,3) NOT NULL,
`created_at` datetime NOT NULL,
PRIMARY KEY (`professional_id`,`score_date`,`scoring_category_attribute_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (DAYOFWEEK(`score_date`))
(PARTITION Sun VALUES LESS THAN (2) ENGINE = InnoDB,
PARTITION Mon VALUES LESS THAN (3) ENGINE = InnoDB,
PARTITION Tue VALUES LESS THAN (4) ENGINE = InnoDB,
PARTITION Wed VALUES LESS THAN (5) ENGINE = InnoDB,
PARTITION Thu VALUES LESS THAN (6) ENGINE = InnoDB,
PARTITION Fri VALUES LESS THAN (7) ENGINE = InnoDB,
PARTITION Sat VALUES LESS THAN (8) ENGINE = InnoDB) */
When my job that processes yesterday's data is complete, it would delete all records where score_date = current_date-1. At any one time, I am likely only going to have data in one or two partitions, depending on time of day.
Are there any holes in my assumptions?
Charlie, I don't see any holes in your logic/assumptions.
My one comment would be: why not use the DROP/ADD PARTITION syntax? It has to be more efficient than DELETE FROM ... WHERE ..., and it is just two statements. Store "prototype" statements and substitute "Sun" and "2" as required for each day of the week; I often use sprintf for exactly that.
ALTER TABLE `professional_scoring_log` DROP PARTITION Sun;
-- ADD PARTITION can only append ranges above the highest existing one,
-- so re-create Sun by splitting the now-lowest partition instead:
ALTER TABLE `professional_scoring_log` REORGANIZE PARTITION Mon INTO (
PARTITION Sun VALUES LESS THAN (2),
PARTITION Mon VALUES LESS THAN (3)
);