I need to optimize a MySQL query that does an ORDER BY. No matter what I do, MySQL ends up doing a filesort instead of using the index.
Here's my table DDL. (Yes, in this case the DAYSTAMP and TIMESTAMP columns are exactly the same.)
CREATE TABLE DB_PROBE.TBL_PROBE_DAILY (
DAYSTAMP date NOT NULL,
TIMESTAMP date NOT NULL,
SOURCE_ADDR varchar(64) NOT NULL,
SOURCE_PORT int(10) NOT NULL,
DEST_ADDR varchar(64) NOT NULL,
DEST_PORT int(10) NOT NULL,
PACKET_COUNT int(20) NOT NULL,
BYTES int(20) NOT NULL,
UNIQUE KEY IDX_TBL_PROBE_DAILY_05 (DAYSTAMP,SOURCE_ADDR(16),SOURCE_PORT,
DEST_ADDR(16),DEST_PORT,TIMESTAMP),
KEY IDX_TBL_PROBE_DAILY_01 (SOURCE_ADDR(16),TIMESTAMP),
KEY IDX_TBL_PROBE_DAILY_02 (DEST_ADDR(16),TIMESTAMP),
KEY IDX_TBL_PROBE_DAILY_03 (SOURCE_PORT,TIMESTAMP),
KEY IDX_TBL_PROBE_DAILY_04 (DEST_PORT,TIMESTAMP),
KEY IDX_TBL_PROBE_DAILY_06 (DAYSTAMP,TIMESTAMP,BYTES)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (to_days(DAYSTAMP))
(PARTITION TBL_PROBE_DAILY_P20100303 VALUES LESS THAN (734200) ENGINE = InnoDB,
PARTITION TBL_PROBE_DAILY_P20100304 VALUES LESS THAN (734201) ENGINE = InnoDB,
PARTITION TBL_PROBE_DAILY_P20100305 VALUES LESS THAN (734202) ENGINE = InnoDB,
PARTITION TBL_PROBE_DAILY_P20100306 VALUES LESS THAN (734203) ENGINE = InnoDB) */;
The partitions are daily and I've added IDX_TBL_PROBE_DAILY_06 especially for the query I'm trying to get working, which is:
select SOURCE_ADDR as 'Source_IP',
SOURCE_PORT as 'Source_Port',
DEST_ADDR as 'Destination_IP',
DEST_PORT as 'Destination_Port',
BYTES
from TBL_PROBE_DAILY
where DAYSTAMP >= '2010-03-04' and DAYSTAMP <= '2010-03-04'
and TIMESTAMP >= FROM_UNIXTIME(1267653600) and TIMESTAMP <= FROM_UNIXTIME(1267687228)
order by bytes desc limit 20;
The explain plan is as follows:
+----+-------------+-----------------+---------------------------+-------+-----------------------------------------------+------------------------+---------+------+--------+-----------------------------+
| id | select_type | table           | partitions                | type  | possible_keys                                 | key                    | key_len | ref  | rows   | Extra                       |
+----+-------------+-----------------+---------------------------+-------+-----------------------------------------------+------------------------+---------+------+--------+-----------------------------+
|  1 | SIMPLE      | TBL_PROBE_DAILY | TBL_PROBE_DAILY_P20100304 | range | IDX_TBL_PROBE_DAILY_05,IDX_TBL_PROBE_DAILY_06 | IDX_TBL_PROBE_DAILY_05 | 3       | NULL | 216920 | Using where; Using filesort |
+----+-------------+-----------------+---------------------------+-------+-----------------------------------------------+------------------------+---------+------+--------+-----------------------------+
I've also tried FORCE INDEX (IDX_TBL_PROBE_DAILY_06), in which case it happily uses IDX_06 to satisfy the WHERE constraints, but still does a filesort :(
I can't imagine index sorting is impossible on partitioned tables? Does InnoDB behave differently from MyISAM in this regard? I would have thought InnoDB's index+data caching would be ideal for index sorting.
Any help will be much appreciated... I've been trying all week to optimize this query in different ways, without much success.
Ok. Looks like swapping the columns in the index did the trick.
I don't really know why... maybe someone else has an explanation?
Either way, if I add an index
create index IDX_TBL_PROBE_DAILY_07 on TBL_PROBE_DAILY(BYTES,DAYSTAMP)
then MySQL favors IDX_07 (even without the FORCE INDEX) and does an index sort instead of a filesort.
I suspect the problem is that your query contains two range predicates. In my experience, MySQL cannot optimize beyond the first range predicate it encounters, and so as far as it is concerned, any index beginning with DAYSTAMP is equivalent to any other.
The clue in the EXPLAIN is key_len: this shows how much of the index value actually gets used. It is probably the same value (3) even when you force it to use the index you want.
An open-ended inequality in WHERE always forces a filesort. Simply put, an open-ended < or > makes MySQL fetch the rows and order them in order to eliminate the ones not matching your query. If this query can logically be changed into a closed range (BETWEEN timestamp X AND timestamp Y), then MySQL can use those bookend values to get results directly from the index, and then filesort only if you still want the result sorted.
Swapping worked because of the following, from the MySQL manual:
To sort or group a table if the sorting or grouping is done on a leftmost prefix of a usable key (for example, ORDER BY key_part1, key_part2). If all key parts are followed by DESC, the key is read in reverse order. See Section 8.3.1.11, “ORDER BY Optimization”, and Section 8.3.1.12, “GROUP BY Optimization”.
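Applied to this case: in IDX_TBL_PROBE_DAILY_07 (BYTES, DAYSTAMP) the ORDER BY column is the leftmost key part, so MySQL can walk the index in reverse key order and stop after 20 entries. A simplified sketch of how to confirm this (the WHERE clause is reduced to the DAYSTAMP filter for illustration):

```sql
-- With (BYTES, DAYSTAMP) the sort column leads the index, so the
-- DESC order can fall out of a backward index scan and the LIMIT
-- stops the scan after 20 entries; EXPLAIN should no longer
-- report "Using filesort".
EXPLAIN
SELECT SOURCE_ADDR, BYTES
FROM   TBL_PROBE_DAILY
WHERE  DAYSTAMP = '2010-03-04'
ORDER  BY BYTES DESC
LIMIT  20;
```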
I created 8 KEY partitions, but the partitions' row counts are not even.
The row counts follow a pattern: partitions p0, p2, p4 and p6 hold 99.98% of the rows, while p1, p3, p5 and p7 hold 0.02%.
I want to fix this, so I wonder how MySQL determines the target partition when executing a SELECT statement.
Or is there any better solution that can flatten this distribution?
The MySQL version is 5.7.
Thanks.
Edit: I know KEY partitioning works with MD5() and mod, but I want to know how MySQL ACTUALLY calculates it.
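In the meantime, the actual per-partition distribution can be inspected from `information_schema` (a sketch; the table name is taken from the schema in this post):

```sql
-- Row counts per partition. TABLE_ROWS is an InnoDB estimate;
-- run ANALYZE TABLE first for fresher numbers.
SELECT PARTITION_NAME, TABLE_ROWS
FROM   information_schema.PARTITIONS
WHERE  TABLE_SCHEMA = DATABASE()
  AND  TABLE_NAME   = 'WD';
```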
Edit:
Schema
CREATE TABLE `WD` (
`dId` varchar(120) NOT NULL,
`wId` varchar(120) NOT NULL,
`createdAt` datetime NOT NULL,
`updatedAt` datetime NOT NULL,
PRIMARY KEY (`wId`,`dId`),
KEY `idx_WD_w_d` (`wId`,`dId`),
KEY `idx_WD_d_w` (`dId`,`wId`),
KEY `idx_WD_w_u` (`wId`,`updatedAt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY KEY (wId)
PARTITIONS 11 */
CREATE TABLE `DA` (
`id` varchar(120) NOT NULL,
`wId` varchar(120) NOT NULL,
`subject` varchar(180) NOT NULL,
`dId` varchar(120) NOT NULL,
`createdAt` datetime NOT NULL,
`updatedAt` datetime NOT NULL,
PRIMARY KEY (`id`,`wId`),
KEY `idx_DA_w_s_d` (`wId`,`subject`,`dId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY KEY (wId)
PARTITIONS 11 */
Explain:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE WD p1 ALL PRIMARY,idx_WD_w_d,idx_WD_d_w,idx_WD_w_u NULL NULL NULL 1 100.00 Using where; Using filesort
1 SIMPLE DA p1 ref idx_DA_w_s_d idx_DA_w_s_d 1266 const,const,DocumentService.WD.dId 1 100.00 Using index
The computation for picking the partition is quite lame, alas: a simple hash modulo the number of partitions.
Key partitioning, in my opinion, is useless. I know of no case where it helps performance, nor anything else.
Please provide the main queries that you will use with this table; I will explain how to make optimal indexes without using partitioning. Or, in the rare case that partitioning is useful, I will explain what to do instead of what you are trying.
A particular query
Reformulating that query this way may help with performance.
SELECT WD.*
FROM WD
JOIN DA ON WD.dId = DA.dId
WHERE WD.wId = '...'
  AND DA.wId = '...'
  AND DA.subject = '...'
ORDER BY WD.updatedAt DESC -- (per Comment)
LIMIT 50;
And have these composite indexes, most of which you already have:
WD: INDEX(wId, dId)
WD: INDEX(dId, wId)
WD: INDEX(wId, updatedAt)
DA: INDEX(wId, subject, dId)
Be aware that UUIDs do not scale well.
Meanwhile, I see no performance benefit in Partitioning since the indexes should work quite well.
One more thing. A LIMIT without an ORDER BY gives you some random set of rows. Note that adding an ORDER BY is likely to alter my advice on indexing.
You mentioned UUIDs -- does that mean you expect them to be Unique? If so, do you really need DA.id? (There may be a benefit to changing the PK of DA.)
I have a MySQL table structured like this:
CREATE TABLE `messages` (
`id` int NOT NULL AUTO_INCREMENT,
`author` varchar(250) COLLATE utf8mb4_unicode_ci NOT NULL,
`message` varchar(2000) COLLATE utf8mb4_unicode_ci NOT NULL,
`serverid` varchar(200) COLLATE utf8mb4_unicode_ci NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`guildname` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=27769461 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
I need to query this table for various statistics using date ranges for Grafana graphs, however all of those queries are extremely slow, despite the table being indexed using a composite key of id and date.
"id" is auto-incrementing and date is also always increasing.
The queries generated by Grafana look like this:
SELECT
UNIX_TIMESTAMP(date) DIV 120 * 120 AS "time",
count(DISTINCT(serverid)) AS "servercount"
FROM messages
WHERE
date BETWEEN FROM_UNIXTIME(1615930154) AND FROM_UNIXTIME(1616016554)
GROUP BY 1
ORDER BY UNIX_TIMESTAMP(date) DIV 120 * 120
This query takes over 30 seconds to complete with 27 million records in the table.
Explaining the query results in this output:
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
| 1 | SIMPLE | messages | NULL | ALL | PRIMARY | NULL | NULL | NULL | 26952821 | 11.11 | Using where; Using filesort |
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
This indicates that MySQL is indeed using the composite primary key I created for indexing the data, but still has to scan almost the entire table, which I do not understand. How can I optimize this table for date range queries?
Plan A:
PRIMARY KEY(date, id), -- to cluster by date
INDEX(id) -- needed to keep AUTO_INCREMENT happy
Assuming the table is quite big, having date at the beginning of the PK puts the rows in the given date range all next to each other. This minimizes (somewhat) the I/O.
Plan B:
PRIMARY KEY(id),
INDEX(date, serverid)
Now the secondary index is exactly what is needed for the one query you have provided. It is optimized for searching by date, and it is smaller than the whole table, hence even faster (I/O-wise) than Plan A.
But, if you have a lot of different queries like this, adding a lot more indexes gets impractical.
Plan C: There may be a still better way:
PRIMARY KEY(id),
INDEX(server_id, date)
In theory, it can hop through that secondary index checking each server_id. But I am not sure that such an optimization exists.
Plan D: Do you need id for anything other than providing a unique PRIMARY KEY? If not, there may be other options.
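Plan B as concrete DDL might look like this (a sketch; the index name is my own invention, and rebuilding the PK on a 27-million-row table will take time and locks):

```sql
-- Plan B: make id alone the PK and add a secondary index tailored
-- to the date-range query. One ALTER so the table is rebuilt once.
ALTER TABLE messages
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (id),
    ADD INDEX idx_date_server (date, serverid);
```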
The index on (id, date) doesn't help because the first key part is id, not date.
You can either
(a) drop the current index and index (date, id) instead -- with date in first position, the index can be used to filter on date regardless of the following columns -- or
(b) just create an additional index on (date) alone to support the query.
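Option (b) is the least invasive; a sketch (the index name is my own):

```sql
-- Leaves the existing PRIMARY KEY (id, date) untouched and adds a
-- secondary index that supports the BETWEEN filter on date.
ALTER TABLE messages ADD INDEX idx_date (date);
```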
I'm just experimenting a bit with partitions with some dummy data, and am not having any luck optimizing my queries so far.
I downloaded a dataset from the Internet, which consists of a single table of measurements:
CREATE TABLE `partitioned_measures` (
`measure_timestamp` datetime NOT NULL,
`station_name` varchar(255) DEFAULT NULL,
`wind_mtsperhour` int(11) NOT NULL,
`windgust_mtsperhour` int(11) NOT NULL,
`windangle` int(3) NOT NULL,
`rain_mm` decimal(5,2) DEFAULT NULL,
`temperature_dht11` int(5) DEFAULT NULL,
`humidity_dht11` int(5) DEFAULT NULL,
`barometric_pressure` decimal(10,2) NOT NULL,
`barometric_temperature` decimal(10,0) NOT NULL,
`lux` decimal(7,2) DEFAULT NULL,
`is_plugged` tinyint(1) DEFAULT NULL,
`battery_level` int(3) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (TO_DAYS(measure_timestamp))
(PARTITION `slow` VALUES LESS THAN (736634) ENGINE = InnoDB,
PARTITION `fast` VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
Just as a learning exercise I wanted to try to partition the measurements by measure_timestamp (without help of indexing). Specifically, I thought it would be interesting to try and put the most recent month in a partition by itself. (I understand that it's best to have equally-sized partitions, but I just wanted to experiment)
I used the following command to add the partition (Note that the dataset ends in Dec of 2016, and the vast majority of the datapoints are in prior months):
ALTER TABLE partitioned_measures
PARTITION BY RANGE(TO_DAYS(measure_timestamp)) (
PARTITION slow VALUES LESS THAN(TO_DAYS('2016-12-01')),
PARTITION fast VALUES LESS THAN (MAXVALUE)
);
To query, I'm looking at all entries from the 2nd and onward (just to be sure that I'm only looking in the latest partition):
select SQL_NO_CACHE COUNT(*) FROM partitioned_measures
WHERE measure_timestamp >= '2016-12-02'
AND DAYOFWEEK(measure_timestamp) = 1;
When I add an EXPLAIN to the front of that, I get the following:
+----+-------------+----------------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | partitioned_measures | slow,fast | ALL | NULL | NULL | NULL | NULL | 1835458 | 33.33 | Using where |
+----+-------------+----------------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
But the query time is about the same as it was before the partition (~1.6 seconds). I've never used partitions before so I feel like there's something conceptual that I'm missing.
Tricky, but I found a working solution, or should I say a workaround; it seems to be a MySQL bug?
ALTER TABLE partitioned_measures
PARTITION BY RANGE COLUMNS(measure_timestamp) (
PARTITION slow VALUES LESS THAN('2016-12-01'),
PARTITION fast VALUES LESS THAN(MAXVALUE)
);
See this demo, which does use partition pruning correctly; I noticed that syntax here.
I still find it weird that partition pruning does not work correctly with
ALTER TABLE partitioned_measures
PARTITION BY RANGE(TO_DAYS(measure_timestamp)) (
PARTITION slow VALUES LESS THAN(TO_DAYS('2016-12-01')),
PARTITION fast VALUES LESS THAN (MAXVALUE)
);
MySQL 5.7 should be able to do partition pruning with TO_DAYS() just fine:
Pruning can also be applied for tables partitioned on a DATE or
DATETIME column when the partitioning expression uses the YEAR() or
TO_DAYS() function.
source
See this demo, which does not use partition pruning correctly. I've tried a lot to get it working; all methods I could think of failed.
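For reference, pruning can be checked like this (a sketch; in MySQL 5.7, plain EXPLAIN includes the partitions column):

```sql
-- If pruning works, the partitions column should list only `fast`;
-- with the TO_DAYS() variant of the table it lists `slow,fast`.
EXPLAIN SELECT COUNT(*)
FROM   partitioned_measures
WHERE  measure_timestamp >= '2016-12-02';
```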
The explanation:
It did do the pruning you requested, but it added the first partition. Why? Because that is where bad dates are put.
The workaround is to have a bogus first partition:
/*!50100 PARTITION BY RANGE (TO_DAYS(measure_timestamp))
(PARTITION bogus VALUES LESS THAN (0) ENGINE = InnoDB, -- any small value
PARTITION `slow` VALUES LESS THAN (736634) ENGINE = InnoDB,
PARTITION `fast` VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
Reference is buried in https://dev.mysql.com/doc/refman/5.7/en/partitioning-handling-nulls.html
If you had more than a trivial number of partitions, it would have been more obvious that it picked the desired partition, plus always the first.
With rare exceptions, partitioning does not provide better performance than you can get from a non-partitioned table with a suitable index. In this case, INDEX(measure_timestamp). (Or a virtual column with INDEX(dow, measure_timestamp).)
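The virtual-column alternative could be sketched as follows (MySQL 5.7 generated-column syntax; the column and index names are my own):

```sql
-- A stored generated column makes DAYOFWEEK() indexable, so the
-- query can seek on (dow, measure_timestamp) instead of evaluating
-- DAYOFWEEK() for every row in range.
ALTER TABLE partitioned_measures
    ADD COLUMN dow TINYINT AS (DAYOFWEEK(measure_timestamp)) STORED,
    ADD INDEX idx_dow_ts (dow, measure_timestamp);

SELECT COUNT(*)
FROM   partitioned_measures
WHERE  dow = 1
  AND  measure_timestamp >= '2016-12-02';
```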
I want to partition a very large table. As the business is growing, partitioning by date isn't really that good because each year the partitions get bigger and bigger. What I'd really like is a partition for every 10 million records.
The Mysql manual show this simple example:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT NOT NULL,
store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
PARTITION p0 VALUES LESS THAN (6),
PARTITION p1 VALUES LESS THAN (11),
PARTITION p2 VALUES LESS THAN (16),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
But this means that every store_id of 16 and above gets thrown into the last partition. Is there a way to auto-generate a new partition at every interval (in my case, every 10 million records) so I won't have to keep modifying an active database? I am running MySQL 5.5.
Thanks!
EDIT: Here is my actual table
CREATE TABLE `my_table` (
`row_id` int(11) NOT NULL AUTO_INCREMENT,
`filename` varchar(50) DEFAULT NULL,
`timestamp` datetime DEFAULT NULL,
`unit_num` int(3) DEFAULT NULL,
`string` int(3) DEFAULT NULL,
`voltage` float(6,4) DEFAULT NULL,
`impedance` float(6,4) DEFAULT NULL,
`amb` float(6,2) DEFAULT NULL,
`ripple_v` float(8,6) DEFAULT NULL,
PRIMARY KEY (`row_id`),
UNIQUE KEY `timestamp` (`timestamp`,`filename`,`string`,`unit_num`),
KEY `index1` (`filename`),
KEY `index2` (`timestamp`),
KEY `index3` (`timestamp`,`filename`,`string`),
KEY `index4` (`filename`,`unit_num`)
) ENGINE=MyISAM AUTO_INCREMENT=690892041 DEFAULT CHARSET=latin1
and an example query for the graph is...
SELECT DATE_FORMAT(timestamp,'%Y/%m/%d %H:%i:%s') as mytime,voltage,impedance,amb,ripple_v,unit_num
FROM my_table WHERE timestamp >= DATE_SUB('2015-07-31 00:05:59', INTERVAL 90 DAY)
AND filename = 'dlrphx10s320upsab3' and unit_num='5' and string='2' ORDER BY timestamp asc;
Here is the explain for the query...
mysql> explain SELECT DATE_FORMAT(timestamp,'%Y/%m/%d %H:%i:%s') as mytime,voltage,impedance,amb,ripple_v,unit_num FROM my_table WHERE timestamp >= DATE_SUB('2015-07-31 00:05:59', INTERVAL 90 DAY) AND filename = 'dlrphx10s320upsab3' and unit_num='5' and string='2' ORDER BY timestamp asc;
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
| 1 | SIMPLE | my_table | ref | timestamp,index3,index4 | index4 | 58 | const,const | 13440 | Using index condition; Using where; Using filesort |
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
(This answer is directed at the schema and SELECT.)
Since you anticipate millions of rows, first I want to point out some improvements to the schema.
FLOAT(m,n) is usually the 'wrong' thing to do because it leads to two roundings. Either use plain FLOAT (which seems 'right' for metrics like voltage) or use DECIMAL(m,n). FLOAT is 4 bytes; in the cases given, DECIMAL would be 3 or 4 bytes.
When you have both INDEX(a) and INDEX(a,b), the former is unnecessary since the latter can cover for such. You have 3 unnecessary KEYs. This slows down INSERTs.
INT(3) -- Are you saying a "3-digit number"? If so consider TINYINT UNSIGNED (values 0..255) for 1 byte instead of INT for 4 bytes. This will save many MB of disk space, hence speed. (See also SMALLINT, etc, and SIGNED or UNSIGNED.)
If filename is repeated a lot, you may want to "normalize" it. This would save many MB.
Use NOT NULL unless you need NULL for something.
AUTO_INCREMENT=690892041 implies that you are about 1/3 of the way to disaster with row_id, which will top out at about 2 billion. Do you use row_id for anything? Getting rid of the column would avoid the issue; and change the UNIQUE KEY to a PRIMARY KEY. (If you do need row_id, let's talk further.)
ENGINE=MyISAM -- Switching to InnoDB has some ramifications, both favorable and unfavorable. The table would become 2-3 times as big. The 'right' choice of PRIMARY KEY would further speed up this SELECT significantly. (And may or may not slow down other SELECTs.)
A note on the SELECT: Since string and unit_num are constants in the query, the last two fields of ORDER BY timestamp asc, string asc, unit_num asc are unnecessary. If they are relevant for reasons not apparent in the SELECT, then my advice may be incomplete.
This
WHERE filename = 'foobar'
AND unit_num='40'
AND string='2'
AND timestamp >= ...
is optimally handled by INDEX(filename, unit_num, string, timestamp). The order of the columns is not important, except that timestamp needs to be last. Rearranging the current UNIQUE key will give you the optimal index. (Meanwhile, none of the existing indexes is very good for this SELECT.) Making it the PRIMARY KEY, with the table converted to InnoDB, would make it even faster.
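Rearranging the existing UNIQUE key could be sketched as (one table rebuild; time scales with table size):

```sql
-- Same four columns as the existing UNIQUE key `timestamp`,
-- reordered so the range column (timestamp) comes last.
ALTER TABLE my_table
    DROP KEY `timestamp`,
    ADD UNIQUE KEY (filename, unit_num, string, timestamp);
```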
Partitioning? No advantage. Not for performance; not for anything else you have mentioned. A common use for partitioning is for purging 'old'. If you intend to do such, let's talk further.
In huge tables it is best to look at all the important SELECTs simultaneously so that we don't speed up one while demolishing the speed of others. It may even turn out that partitioning helps in this kind of tradeoff.
First, I must ask what benefit Partitioning gives you? Is there some query that runs faster because of it?
There is no auto-partitioning.
Instead, you should have a job that runs every day and it counts the number of rows in the 'last active' partition to see if it is about 10M. If so, add another partition.
I recommend keeping the "last" partition (the one with MAXVALUE) empty. That way you can REORGANIZE PARTITION to split it into two empty partitions with essentially zero overhead. And I recommend that instead of ADD PARTITION because you might slip up and put something in the last partition.
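Assuming the table were partitioned by RANGE on row_id, splitting an empty last partition might look like this (partition names and the boundary are hypothetical):

```sql
-- Split the empty MAXVALUE partition into a new bounded partition
-- plus a fresh empty MAXVALUE partition; while it is empty this
-- costs essentially nothing.
ALTER TABLE my_table
    REORGANIZE PARTITION pmax INTO (
        PARTITION p700m VALUES LESS THAN (700000000),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );
```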
It is unclear what will trigger 10M rows. Are there multiple rows for each store_id? And are there new rows coming in for each store? If so, then partitioning on store_id is problematic, since all partitions will be growing all the time.
OK, so store_id was just a lame example from the reference manual. Please provide SHOW CREATE TABLE so we can talk concrete, not hand-waving. There are simply too many ways to take this task.
What is the activity?
If you mostly hit the "recent" partition(s), then an uneven distribution may be warrantied -- periodically add a new partition and combine an adjacent pair of old partitions. (I did this successfully in one system.)
If you will be purging "old" data, then clearly you need to use PARTITION BY RANGE(TO_DAYS(...)) and use DROP PARTITION plus REORGANIZE PARTITION.
And there are lots of other scenarios. But I know of only 4 scenarios where Partitioning provides any performance benefit. See my blog.
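A purge cycle under PARTITION BY RANGE(TO_DAYS(...)) might look like this (the table name, partition names, and dates are hypothetical):

```sql
-- Drop the oldest day (near-instant, compared with a bulk DELETE),
-- then carve a new empty day out of the MAXVALUE partition.
ALTER TABLE my_table DROP PARTITION p20150501;
ALTER TABLE my_table
    REORGANIZE PARTITION pmax INTO (
        PARTITION p20150801 VALUES LESS THAN (TO_DAYS('2015-08-02')),
        PARTITION pmax      VALUES LESS THAN MAXVALUE
    );
```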
I want to create a partitioned table which is going to be filled with hundreds of millions of records. Using partitioning how can I have a particular day's records go into one partition, then the next day's in another, etc.. Then after ninety odd days I can delete old data from the oldest partition.
I tried this declaration (the hash function uses a modulo against the number of partitions to calculate which partition gets the data), intending each day to use a different one of the 92 partitions; except it doesn't work.
CREATE TABLE records(
id INT NOT NULL AUTO_INCREMENT,
dt DATETIME,
PRIMARY KEY (id)
)
PARTITION BY HASH((MOD(DAYOFYEAR(dt), 92) + 92))
PARTITIONS 92;
The problem with the above snippet is that every unique key on the table (including the primary key) has to include the column used in the hash expression.
How can I fix this so that I have ninety(ish) rotating partitions based on each day's records?
If I simply add the dt column to primary key, it seems to hit all the partitions if a select a date range, which is not what I want.
Any ideas?
The reason is that to partition on a date field and query by range you must either use YEAR() or TO_DAYS() in the partition expression.
Partitioning like this works as expected:
CREATE TABLE `alert` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`eventId` int(10) unsigned NOT NULL,
`occurred` datetime NOT NULL,
KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
/*!50100 PARTITION BY RANGE (TO_DAYS(occurred))
(PARTITION 28_06 VALUES LESS THAN (735413) ENGINE = InnoDB,
PARTITION 29_06 VALUES LESS THAN (735414) ENGINE = InnoDB,
PARTITION 30_06 VALUES LESS THAN (735415) ENGINE = InnoDB,
PARTITION 01_07 VALUES LESS THAN (735416) ENGINE = InnoDB,
PARTITION 02_07 VALUES LESS THAN (735417) ENGINE = InnoDB,
PARTITION 03_07 VALUES LESS THAN (735418) ENGINE = InnoDB,
PARTITION 04_07 VALUES LESS THAN (735419) ENGINE = InnoDB,
PARTITION 05_07 VALUES LESS THAN (735420) ENGINE = InnoDB,
PARTITION 06_07 VALUES LESS THAN (735421) ENGINE = InnoDB,
PARTITION 07_07 VALUES LESS THAN (735422) ENGINE = InnoDB) */
mysql> explain partitions SELECT * FROM alert WHERE occurred >= '2013-07-02' and occurred <= '2013-07-04';
+----+-------------+-------+-------------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------------------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | alert | 02_07,03_07,04_07 | ALL | NULL | NULL | NULL | NULL | 3 | Using where |
+----+-------------+-------+-------------------+------+---------------+------+---------+------+------+-------------+
Then you need to manage dropping and creating the partitions yourself.
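Daily rotation for the alert table above might be scripted as (a sketch; the new TO_DAYS boundary continues the sequence in the DDL):

```sql
-- Drop the oldest day and append the next one. ADD PARTITION works
-- for RANGE partitioning as long as the new range is above all
-- existing ranges.
ALTER TABLE alert DROP PARTITION 28_06;
ALTER TABLE alert ADD PARTITION (
    PARTITION 08_07 VALUES LESS THAN (735423)
);
```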
Actually, the problem is that you can't define a PRIMARY or UNIQUE key on a partitioned table if that key does not include all the columns used in the partitioning function.
One possible "fix" would be to remove the "PRIMARY" keyword from the KEY definition.
The problem is that MySQL has to enforce uniqueness when you declare a key to be UNIQUE or PRIMARY. And in order to enforce that, MySQL needs to be able to check whether the key value already exists. Instead of checking every partition, MySQL uses the partitioning function to determine the partition where a particular key would be found.
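The suggested fix, dropping the PRIMARY keyword, makes the original HASH declaration legal (a sketch; note that uniqueness of id is then no longer enforced by MySQL):

```sql
-- A plain (non-unique) KEY is enough to satisfy AUTO_INCREMENT,
-- and the partition/unique-key rule no longer applies.
CREATE TABLE records (
    id INT NOT NULL AUTO_INCREMENT,
    dt DATETIME,
    KEY (id)
)
PARTITION BY HASH (MOD(DAYOFYEAR(dt), 92))
PARTITIONS 92;
```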