I have a general question about retrieving data from a partitioned table (MySQL database).
I have partitioned the table by year range, and I have also indexed the partitioning column.
PARTITION BY RANGE (YEAR(`RDATE`))
(PARTITION pp0 VALUES LESS THAN (0) ENGINE = MyISAM,
PARTITION p0 VALUES LESS THAN (2000) ENGINE = MyISAM,
PARTITION p1 VALUES LESS THAN (2005) ENGINE = MyISAM,
PARTITION p2 VALUES LESS THAN (2008) ENGINE = MyISAM,
PARTITION p3 VALUES LESS THAN (2010) ENGINE = MyISAM,
PARTITION p4 VALUES LESS THAN (2011) ENGINE = MyISAM,
PARTITION p5 VALUES LESS THAN (2012) ENGINE = MyISAM,
PARTITION p6 VALUES LESS THAN (2013) ENGINE = MyISAM,
PARTITION p7 VALUES LESS THAN (2014) ENGINE = MyISAM,
PARTITION p8 VALUES LESS THAN MAXVALUE ENGINE = MyISAM)
When I run the query below:
explain partitions select Products from salesdata where rdate between '2011-11-01' and LAST_DAY('2011-11-01')
the result is as follows (2502 rows scanned in total):
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | sldata | p5 | range | sls_ind | sls_ind | 3 | | 2502 | Using where |
But if I run this query:
explain partitions select Products from salesdata where rdate between '2011-12-01' and LAST_DAY('2011-12-01')
the result is (55181 rows scanned in total):
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | sldata | p5 | ALL | sls_ind | | | | 55181 | Using where |
Does anyone have an idea why the second query scans the whole partition, unlike the first?
I have a MySQL log table that grows daily by 5m of data,
and I have trouble collecting data from that table to compute some analysis counts.
The details of the problem are as follows.
This is my log table:
CREATE TABLE `details` (
`id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`provider` VARCHAR(25) NULL COLLATE 'utf8mb4_unicode_ci',
`DLR_Status` VARCHAR(30) NULL COLLATE 'utf8mb4_unicode_ci',
`source` VARCHAR(30) NULL COLLATE 'utf8mb4_bin',
`Destination` VARCHAR(30) NULL COLLATE 'utf8mb4_unicode_ci',
`msg` VARCHAR(1000) NULL COLLATE 'utf8mb4_unicode_ci',
`timestamp` TIMESTAMP NULL,
`msg_timestamp` INT NOT NULL,
`created_at` TIMESTAMP NULL,
`updated_at` TIMESTAMP NULL,
PRIMARY KEY (`id`, `msg_timestamp`) USING BTREE
)
COLLATE='utf8mb4_unicode_ci'
AUTO_INCREMENT=24169513
/*!50100 PARTITION BY RANGE (`msg_timestamp`)
(PARTITION p2016_02 VALUES LESS THAN (1456779600) ENGINE = InnoDB,
PARTITION p2016_03 VALUES LESS THAN (1459458000) ENGINE = InnoDB,
PARTITION p2016_04 VALUES LESS THAN (1462050000) ENGINE = InnoDB,
PARTITION p2016_05 VALUES LESS THAN (1464728400) ENGINE = InnoDB,
PARTITION p2016_06 VALUES LESS THAN (1467320400) ENGINE = InnoDB,
PARTITION p2016_07 VALUES LESS THAN (1469998800) ENGINE = InnoDB,
PARTITION p2016_08 VALUES LESS THAN (1472677200) ENGINE = InnoDB,
PARTITION p2016_09 VALUES LESS THAN (1475269200) ENGINE = InnoDB,
PARTITION p2016_10 VALUES LESS THAN (1477947600) ENGINE = InnoDB,
PARTITION p2016_11 VALUES LESS THAN (1480539600) ENGINE = InnoDB,
PARTITION p2016_12 VALUES LESS THAN (1483218000) ENGINE = InnoDB,
PARTITION p2017_01 VALUES LESS THAN (1485896400) ENGINE = InnoDB,
PARTITION p2017_02 VALUES LESS THAN (1488315600) ENGINE = InnoDB,
PARTITION p2017_03 VALUES LESS THAN (1490994000) ENGINE = InnoDB,
PARTITION p2017_04 VALUES LESS THAN (1493586000) ENGINE = InnoDB,
PARTITION p2017_05 VALUES LESS THAN (1496264400) ENGINE = InnoDB,
PARTITION p2017_06 VALUES LESS THAN (1498856400) ENGINE = InnoDB,
PARTITION p2017_07 VALUES LESS THAN (1501534800) ENGINE = InnoDB,
PARTITION p2017_08 VALUES LESS THAN (1504213200) ENGINE = InnoDB,
PARTITION p2017_09 VALUES LESS THAN (1506805200) ENGINE = InnoDB,
PARTITION p2017_10 VALUES LESS THAN (1509483600) ENGINE = InnoDB,
PARTITION p2017_11 VALUES LESS THAN (1512075600) ENGINE = InnoDB,
PARTITION p2017_12 VALUES LESS THAN (1514754000) ENGINE = InnoDB,
PARTITION p2018_01 VALUES LESS THAN (1517432400) ENGINE = InnoDB,
PARTITION p2018_02 VALUES LESS THAN (1519851600) ENGINE = InnoDB,
PARTITION p2018_03 VALUES LESS THAN (1522530000) ENGINE = InnoDB,
PARTITION p2018_04 VALUES LESS THAN (1525122000) ENGINE = InnoDB,
PARTITION p2018_05 VALUES LESS THAN (1527800400) ENGINE = InnoDB,
PARTITION p2018_06 VALUES LESS THAN (1530392400) ENGINE = InnoDB,
PARTITION p2018_07 VALUES LESS THAN (1533070800) ENGINE = InnoDB,
PARTITION p2018_08 VALUES LESS THAN (1535749200) ENGINE = InnoDB,
PARTITION p2018_09 VALUES LESS THAN (1538341200) ENGINE = InnoDB,
PARTITION p2018_10 VALUES LESS THAN (1541019600) ENGINE = InnoDB,
PARTITION p2018_11 VALUES LESS THAN (1543611600) ENGINE = InnoDB,
PARTITION p2018_12 VALUES LESS THAN (1546290000) ENGINE = InnoDB,
PARTITION p2019_01 VALUES LESS THAN (1548968400) ENGINE = InnoDB,
PARTITION p2019_02 VALUES LESS THAN (1551387600) ENGINE = InnoDB,
PARTITION p2019_03 VALUES LESS THAN (1554066000) ENGINE = InnoDB,
PARTITION p2019_04 VALUES LESS THAN (1556658000) ENGINE = InnoDB,
PARTITION p2019_05 VALUES LESS THAN (1559336400) ENGINE = InnoDB,
PARTITION p2019_06 VALUES LESS THAN (1561928400) ENGINE = InnoDB,
PARTITION p2019_07 VALUES LESS THAN (1564606800) ENGINE = InnoDB,
PARTITION p2019_08 VALUES LESS THAN (1567285200) ENGINE = InnoDB,
PARTITION p2019_09 VALUES LESS THAN (1569877200) ENGINE = InnoDB,
PARTITION p2019_10 VALUES LESS THAN (1572555600) ENGINE = InnoDB,
PARTITION p2019_11 VALUES LESS THAN (1575147600) ENGINE = InnoDB,
PARTITION p2019_12 VALUES LESS THAN (1577826000) ENGINE = InnoDB,
PARTITION p2020_01 VALUES LESS THAN (1580504400) ENGINE = InnoDB,
PARTITION p2020_02 VALUES LESS THAN (1583010000) ENGINE = InnoDB,
PARTITION p2020_03 VALUES LESS THAN (1585688400) ENGINE = InnoDB,
PARTITION p2020_04 VALUES LESS THAN (1588280400) ENGINE = InnoDB,
PARTITION p2020_05 VALUES LESS THAN (1590958800) ENGINE = InnoDB,
PARTITION p2020_06 VALUES LESS THAN (1593550800) ENGINE = InnoDB,
PARTITION p2020_07 VALUES LESS THAN (1596229200) ENGINE = InnoDB,
PARTITION p2020_08 VALUES LESS THAN (1598907600) ENGINE = InnoDB,
PARTITION p2020_09 VALUES LESS THAN (1601499600) ENGINE = InnoDB,
PARTITION p2020_10 VALUES LESS THAN (1604178000) ENGINE = InnoDB,
PARTITION p2020_11 VALUES LESS THAN (1606770000) ENGINE = InnoDB,
PARTITION p2020_12 VALUES LESS THAN (1609448400) ENGINE = InnoDB,
PARTITION p2021_01 VALUES LESS THAN (1612126800) ENGINE = InnoDB,
PARTITION p2021_02 VALUES LESS THAN (1614546000) ENGINE = InnoDB,
PARTITION p2021_03 VALUES LESS THAN (1617224400) ENGINE = InnoDB,
PARTITION p2021_04 VALUES LESS THAN (1619816400) ENGINE = InnoDB,
PARTITION p2021_05 VALUES LESS THAN (1622494800) ENGINE = InnoDB,
PARTITION p2021_06 VALUES LESS THAN (1625086800) ENGINE = InnoDB,
PARTITION p2021_07 VALUES LESS THAN (1627765200) ENGINE = InnoDB,
PARTITION p2021_08 VALUES LESS THAN (1630443600) ENGINE = InnoDB,
PARTITION p2021_09 VALUES LESS THAN (1633035600) ENGINE = InnoDB,
PARTITION p2021_10 VALUES LESS THAN (1635714000) ENGINE = InnoDB,
PARTITION p2021_11 VALUES LESS THAN (1638306000) ENGINE = InnoDB,
PARTITION p2021_12 VALUES LESS THAN (1640984400) ENGINE = InnoDB,
PARTITION p2022_01 VALUES LESS THAN (1643662800) ENGINE = InnoDB,
PARTITION p2022_02 VALUES LESS THAN (1646082000) ENGINE = InnoDB,
PARTITION p2022_03 VALUES LESS THAN (1648760400) ENGINE = InnoDB,
PARTITION p2022_04 VALUES LESS THAN (1651352400) ENGINE = InnoDB,
PARTITION p2022_05 VALUES LESS THAN (1654030800) ENGINE = InnoDB,
PARTITION p2022_06 VALUES LESS THAN (1656622800) ENGINE = InnoDB,
PARTITION p2022_07 VALUES LESS THAN (1659301200) ENGINE = InnoDB,
PARTITION p2022_08 VALUES LESS THAN (1661979600) ENGINE = InnoDB,
PARTITION p2022_09 VALUES LESS THAN (1664571600) ENGINE = InnoDB,
PARTITION p2022_10 VALUES LESS THAN (1667250000) ENGINE = InnoDB,
PARTITION p2022_11 VALUES LESS THAN (1669842000) ENGINE = InnoDB,
PARTITION p2022_12 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
It contains log data as follows:
| id | provider | DLR_Status | source | Destination | msg | timestamp | msg_timestamp |
| 1 | KDD | done | website | 01332456 | free delivery | 2019-12-01 12:00:13 | 1575201613 |
| 2 | KDD | done | By phone | 01322422 | with cost 300 | 2019-12-01 12:00:37 | 1575201637 |
| ... | ... | ... | ... | ... | ... | ... | ... |
The issue is that when I select counts from this table, for example:
SELECT SQL_CALC_FOUND_ROWS DLR_status,count(*) as c
FROM sms_details
group by DLR_status;
it takes a very long time to return results, and some queries fail with a 504 Gateway Time-out error, such as this one:
SELECT SQL_CALC_FOUND_ROWS Destination,count(*) as c
FROM sms_details
WHERE msg_timestamp >= UNIX_TIMESTAMP("2019-10-01") and msg_timestamp < UNIX_TIMESTAMP("2019-12-01") group by Destination;
I already use partitioning on this table,
and I tried indexing some columns, but that caused big problems with the daily growth of the data.
So what are the best practices for the following?
Speeding up execution time
Keeping inserts fast
Shrink schema -- Smaller --> less I/O --> faster.
timestamp vs msg_timestamp -- These seem to be the same, just in different formats. If so, toss one of them.
Normalization speeds up the inserts some -- by shrinking the amount of data. Most of the VARCHARs could be replaced by a 2-byte SMALLINT UNSIGNED or a 3-byte MEDIUMINT UNSIGNED.
Future partitions -- Do not have more than one such; the SELECTs will waste time opening them to find nothing.
Too many partitions -- At some limit (maybe 50), having lots of partitions slows things down.
Batching is the best speedup. See LOAD DATA or INSERT ... VALUES (...), (...), .... In the latter case, I recommend batches of 1000 rows. (Going beyond that is getting into diminishing returns and possibly some limits.) If the data is coming from multiple sources, explain; then we can talk further.
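As a minimal sketch of the multi-row INSERT form mentioned above: the table name sms_details is taken from the question's queries, and the two rows are the sample rows shown there. A real batch would carry on the order of 1000 rows in the same statement.

```sql
-- One statement, one round trip, many rows.
-- Only two rows are shown; a real batch would continue the VALUES list
-- up to roughly 1000 rows before sending the statement.
INSERT INTO sms_details
    (provider, DLR_Status, source, Destination, msg, `timestamp`, msg_timestamp)
VALUES
    ('KDD', 'done', 'website',  '01332456', 'free delivery', '2019-12-01 12:00:13', 1575201613),
    ('KDD', 'done', 'By phone', '01322422', 'with cost 300', '2019-12-01 12:00:37', 1575201637);
```

The gain comes from paying the per-statement overhead (parsing, network round trip, commit) once per batch rather than once per row.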
Partitioning is very useful for purging 'old' data because DROP PARTITION is a lot faster than DELETE. See http://mysql.rjweb.org/doc.php/partitionmaint
Toss created_at and updated_at; they are probably useless. (Again, smaller is faster.)
SQL_CALC_FOUND_ROWS is not needed when you don't have LIMIT; simply observe how many rows are returned. Rethink the user requirement for such. (Come back for more discussion, if desired.)
DLR_status counts will be a full index scan if you have INDEX(DLR_status). And consider making that column an ENUM so that it is only 1 byte. (If there are several values and/or a growing number of values, then "normalize".)
Query 2 needs INDEX(Destination, msg_timestamp).
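A sketch of what adding that index could look like, assuming the table is named sms_details as in the question's queries; the index name idx_dest_ts is made up for illustration. Running EXPLAIN afterwards is one way to check whether the optimizer actually picks it.

```sql
-- idx_dest_ts is a hypothetical name; any unused index name works.
ALTER TABLE sms_details
    ADD INDEX idx_dest_ts (Destination, msg_timestamp);

-- Query 2 from the question, minus SQL_CALC_FOUND_ROWS.
-- Check the `key` and `partitions` columns of the EXPLAIN output.
EXPLAIN
SELECT Destination, COUNT(*) AS c
FROM sms_details
WHERE msg_timestamp >= UNIX_TIMESTAMP('2019-10-01')
  AND msg_timestamp <  UNIX_TIMESTAMP('2019-12-01')
GROUP BY Destination;
```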
Is it big? 24M rows / 5 years --> less than 1 row per second. 100 rows/sec is where I start to worry about "high-speed ingestion". That is, I don't see a problem with inserts. Selects, on the other hand, may be a problem. You showed us two; let's see more. I don't want to recommend one index at a time; I would rather design a set of indexes to optimally handle all the likely queries. Especially since it may involve redesigning the partitioning.
Summary tables are an excellent way to do fast analysis in a "Data warehouse". See http://mysql.rjweb.org/doc.php/summarytables
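To make the summary-table idea concrete, here is a rough sketch for the Destination counts. The table sms_daily_summary and its columns are hypothetical, and the daily roll-up is assumed to run once per day (for example from cron or a scheduled EVENT) after the day has ended.

```sql
-- Hypothetical summary table: one row per day and Destination.
CREATE TABLE sms_daily_summary (
    day         DATE         NOT NULL,
    Destination VARCHAR(30)  NOT NULL,
    msg_count   INT UNSIGNED NOT NULL,
    PRIMARY KEY (day, Destination)
) ENGINE = InnoDB;

-- Roll up yesterday's rows. Rows with a NULL Destination are skipped here
-- because the summary column is NOT NULL; adjust if those need counting.
INSERT INTO sms_daily_summary (day, Destination, msg_count)
SELECT DATE(FROM_UNIXTIME(msg_timestamp)), Destination, COUNT(*)
FROM sms_details
WHERE msg_timestamp >= UNIX_TIMESTAMP(CURDATE() - INTERVAL 1 DAY)
  AND msg_timestamp <  UNIX_TIMESTAMP(CURDATE())
  AND Destination IS NOT NULL
GROUP BY DATE(FROM_UNIXTIME(msg_timestamp)), Destination
ON DUPLICATE KEY UPDATE msg_count = VALUES(msg_count);

-- The report then reads the small table instead of the big log.
SELECT Destination, SUM(msg_count) AS c
FROM sms_daily_summary
WHERE day >= '2019-10-01' AND day < '2019-12-01'
GROUP BY Destination;
```

The ON DUPLICATE KEY UPDATE clause makes the roll-up safe to re-run for the same day, since a second run overwrites the count instead of failing on the primary key.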
Can someone explain the difference between below commands?
ALTER TABLE A DROP PARTITION p0;
and
ALTER TABLE A TRUNCATE PARTITION p0;
In which scenarios should we use DROP/TRUNCATE partition?
Both throw the data away. And it is not 'transactional', so you cannot recover the data with a ROLLBACK.
DROP PARTITION also removes the partition from the list of partitions.
TRUNCATE PARTITION leaves the partition in place, but empty.
A common usage of DROP PARTITION is to remove "old" rows. Think of a table of information that needs to be kept for only 90 days. Use PARTITION BY RANGE(TO_DAYS(...)) and have weekly partitions. Then, every week DROP the oldest and ADD a new partition. More discussion here.
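The weekly rotation described above might look roughly like this. The table events_90d, its columns, and the dates are all illustrative, and only three weekly partitions are shown instead of the thirteen or so that 90 days would need.

```sql
-- Hypothetical table; names and dates are for illustration only.
CREATE TABLE events_90d (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    created DATE NOT NULL,
    payload VARCHAR(255),
    PRIMARY KEY (id, created)   -- the partitioning column must be part of every unique key
) ENGINE = InnoDB
PARTITION BY RANGE (TO_DAYS(created)) (
    PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-08')),
    PARTITION p20240108 VALUES LESS THAN (TO_DAYS('2024-01-15')),
    PARTITION p20240115 VALUES LESS THAN (TO_DAYS('2024-01-22'))
);

-- Weekly job: throw away the oldest week ...
ALTER TABLE events_90d DROP PARTITION p20240101;

-- ... and add a partition for the coming week.
ALTER TABLE events_90d ADD PARTITION (
    PARTITION p20240122 VALUES LESS THAN (TO_DAYS('2024-01-29'))
);
```

ADD PARTITION only works at the high end of the range, so it fits this layout; a table that ends in a MAXVALUE partition would need REORGANIZE PARTITION instead.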
I have not seen a need for TRUNCATE.
Be aware that there are very few use cases where you can get any benefit from PARTITIONing. So far, I have found uses only for PARTITION BY RANGE.
Truncating a partition is a good choice when you have LIST partitions on the table.
It removes all rows that belong to the LIST partition but does not remove the partition entry from the table structure.
Take a scenario where you want to store credit card transactions/orders placed etc., in a MySQL table. Since the data volume is huge, you might want to partition it. Say you have partitioned the table based on the month of transaction.
PARTITION BY RANGE ( month(transactionDate))
(PARTITION p0 VALUES LESS THAN (2) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (3) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (4) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (5) ENGINE = InnoDB,
PARTITION p4 VALUES LESS THAN (6) ENGINE = InnoDB,
PARTITION p5 VALUES LESS THAN (7) ENGINE = InnoDB,
PARTITION p6 VALUES LESS THAN (8) ENGINE = InnoDB,
PARTITION p7 VALUES LESS THAN (9) ENGINE = InnoDB,
PARTITION p8 VALUES LESS THAN (10) ENGINE = InnoDB,
PARTITION p9 VALUES LESS THAN (11) ENGINE = InnoDB,
PARTITION p10 VALUES LESS THAN (12) ENGINE = InnoDB,
PARTITION p11 VALUES LESS THAN (13) ENGINE = InnoDB,
PARTITION p12 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
You can also partition by week if your data volume is huge.
Now you have to clean the old data from time to time. This is where the difference between DROP and TRUNCATE comes in. The list of partitions you have initially is p0,p1...p12.
When you drop a partition p1, the list becomes p0,p2,p3,p4...p12. So effectively the data for both Feb and March would go into p2.
But when you do a truncate, p1 is still intact but the data is evicted. So the list remains p0,p1...p12.
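The difference can be observed directly. The sketch below uses a hypothetical table card_tx with a shortened version of the monthly scheme (four partitions instead of thirteen) and inspects the partition list after each operation.

```sql
-- Hypothetical, shortened version of the monthly scheme above.
CREATE TABLE card_tx (
    id              BIGINT UNSIGNED NOT NULL,
    transactionDate DATE NOT NULL,
    amount          DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (id, transactionDate)
) ENGINE = InnoDB
PARTITION BY RANGE (MONTH(transactionDate)) (
    PARTITION p0 VALUES LESS THAN (2),   -- January
    PARTITION p1 VALUES LESS THAN (3),   -- February
    PARTITION p2 VALUES LESS THAN (4),   -- March
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

-- TRUNCATE: p1 is emptied but stays in the list.
ALTER TABLE card_tx TRUNCATE PARTITION p1;

SELECT PARTITION_NAME, PARTITION_DESCRIPTION
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'card_tx'
ORDER BY PARTITION_ORDINAL_POSITION;

-- DROP: p1 disappears from the list, so p2 now covers everything below 4.
ALTER TABLE card_tx DROP PARTITION p1;

-- A February row inserted after the DROP is stored in p2.
INSERT INTO card_tx VALUES (1, '2024-02-10', 9.99);
SELECT * FROM card_tx PARTITION (p2);
```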
I have searched a lot for ways to automate MySQL partitioning,
but unfortunately nothing matches my problem.
I want to delete older partitions that are no longer needed while at the same time adding partitions for upcoming data.
What I could do is drop an old partition every day and create new partitions using MySQL expressions such as MONTH(NOW() - INTERVAL 2 MONTH).
But that would increase the cost of the operation, as every night I would need to recreate the partitions for new data.
I found that I can partition by range, but all the examples are hardcoded, which suggests I might need to repartition every time new data is added.
Here is an example I found, though it is not quite my case:
ALTER TABLE t1 PARTITION BY
RANGE(TO_DAYS(FROM_UNIXTIME(transaction_date)))(
PARTITION JAN VALUES LESS THAN (TO_DAYS('2013-02-01')),
PARTITION FEB VALUES LESS THAN (TO_DAYS('2013-03-01')),
PARTITION MAR VALUES LESS THAN (TO_DAYS('2013-04-01')),
PARTITION APR VALUES LESS THAN (TO_DAYS('2013-05-01')),
PARTITION MAY VALUES LESS THAN (TO_DAYS('2013-06-01')),
PARTITION JUN VALUES LESS THAN (TO_DAYS('2013-07-01')),
PARTITION JUL VALUES LESS THAN (TO_DAYS('2013-08-01')),
PARTITION AUG VALUES LESS THAN (TO_DAYS('2013-09-01')),
PARTITION SEP VALUES LESS THAN (TO_DAYS('2013-10-01')),
PARTITION `OCT` VALUES LESS THAN (TO_DAYS('2013-11-01')),
PARTITION NOV VALUES LESS THAN (TO_DAYS('2013-12-01')),
PARTITION `DEC` VALUES LESS THAN (TO_DAYS('2014-01-01'))
);
Please suggest a proper way to do it.
There is no fully automated way -- You need to write code.
But first, let's fix an issue. Have another partition:
PARTITION future VALUES LESS THAN (MAXVALUE)
This will come in handy if you accidentally fail to roll the partitions some night.
And how about a bug: Your table essentially never has a full 12 months of data. Just after a sliding of the partitions, you will have only 11 months. Is that OK? If not, keep 13 months, not 12.
Now for some code to do the work, plus perhaps some more tips: http://mysql.rjweb.org/doc.php/partitionmaint
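As a rough idea of what such code can look like, here is a sketch of a stored procedure plus a scheduled event. It is not the code from the linked page. It assumes a table t1 partitioned BY RANGE (TO_DAYS(...)) on a date column, with an empty last partition named future, and it assumes the event scheduler is enabled; the procedure, event, and new partition names are invented for the example.

```sql
DELIMITER //

CREATE PROCEDURE roll_t1_partitions()
BEGIN
    DECLARE v_oldest     VARCHAR(64);
    DECLARE v_last_bound DATE;
    DECLARE v_new_bound  DATE;

    -- Oldest partition = first one in ordinal position.
    SELECT PARTITION_NAME INTO v_oldest
    FROM information_schema.PARTITIONS
    WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 't1'
    ORDER BY PARTITION_ORDINAL_POSITION
    LIMIT 1;

    -- Highest explicit boundary; for RANGE (TO_DAYS(...)) the
    -- PARTITION_DESCRIPTION column holds the day number as text.
    SELECT FROM_DAYS(MAX(CAST(PARTITION_DESCRIPTION AS UNSIGNED))) INTO v_last_bound
    FROM information_schema.PARTITIONS
    WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 't1'
      AND PARTITION_DESCRIPTION <> 'MAXVALUE';

    SET v_new_bound = v_last_bound + INTERVAL 1 MONTH;

    -- Carve the next month out of the (empty) `future` partition.
    SET @ddl = CONCAT(
        'ALTER TABLE t1 REORGANIZE PARTITION future INTO (',
        'PARTITION p', DATE_FORMAT(v_last_bound, '%Y%m'),
        ' VALUES LESS THAN (', TO_DAYS(v_new_bound), '), ',
        'PARTITION future VALUES LESS THAN MAXVALUE)');
    PREPARE stmt FROM @ddl;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;

    -- Drop the oldest month (backticks guard names such as DEC).
    SET @ddl = CONCAT('ALTER TABLE t1 DROP PARTITION `', v_oldest, '`');
    PREPARE stmt FROM @ddl;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
END//

-- Requires the event scheduler (event_scheduler = ON).
CREATE EVENT roll_t1_monthly
    ON SCHEDULE EVERY 1 MONTH STARTS '2014-01-01 00:30:00'
    DO CALL roll_t1_partitions()//

DELIMITER ;
```

Adding the new partition before dropping the old one means that a failure halfway through leaves extra data rather than rows piling up in future.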
You can do it like this. This will automatically store the data in the corresponding partitions. Regarding the automation of truncating them, I too am exploring the creation of scheduled events.
PARTITION BY RANGE ( month(creationDate))
(PARTITION p0 VALUES LESS THAN (2) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (3) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (4) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (5) ENGINE = InnoDB,
PARTITION p4 VALUES LESS THAN (6) ENGINE = InnoDB,
PARTITION p5 VALUES LESS THAN (7) ENGINE = InnoDB,
PARTITION p6 VALUES LESS THAN (8) ENGINE = InnoDB,
PARTITION p7 VALUES LESS THAN (9) ENGINE = InnoDB,
PARTITION p8 VALUES LESS THAN (10) ENGINE = InnoDB,
PARTITION p9 VALUES LESS THAN (11) ENGINE = InnoDB,
PARTITION p10 VALUES LESS THAN (12) ENGINE = InnoDB,
PARTITION p11 VALUES LESS THAN (13) ENGINE = InnoDB,
PARTITION p12 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
This can be extended to creating partitions based on a week as well.
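For the scheduled-event part, one possible shape is sketched below. The table name orders and the routine and event names are hypothetical. The idea is that, with the scheme above, month m lives in partition p(m-1), so shortly before a month starts its partition is emptied for reuse. Note that this discards whatever that partition still holds from the same month a year earlier.

```sql
DELIMITER //

CREATE PROCEDURE truncate_next_month_partition()
BEGIN
    -- Partition that next month's rows will go into, e.g. 'p1' for February.
    SET @part = CONCAT('p', MONTH(CURDATE() + INTERVAL 1 MONTH) - 1);
    SET @ddl  = CONCAT('ALTER TABLE orders TRUNCATE PARTITION ', @part);
    PREPARE stmt FROM @ddl;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
END//

-- Runs on the 25th of every month; requires event_scheduler = ON.
CREATE EVENT truncate_next_month
    ON SCHEDULE EVERY 1 MONTH STARTS '2024-01-25 02:00:00'
    DO CALL truncate_next_month_partition()//

DELIMITER ;
```

The partition name has to be built at run time, which is why the ALTER TABLE goes through PREPARE and EXECUTE rather than being written out directly.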
So I have a decent-sized table of transactions. I cannot redesign it or add more indexes; the data is as is. Half of the business logic does a full scan anyway.
I had the idea of moving some old/unused data to an 'archive table' to speed things up a little while still being able to scan all the data once in a while. Or the other way around: create a 'fast table' holding only fresh data (the last month or so), plus the normal table holding all data.
Is anyone aware of this kind of technique? Where can I read about it? Are there better approaches? How would I implement it in code?
For now we have around 50 million rows of data with crazy partitions which do more harm than good.
PARTITION BY LIST (user_id%20)
SUBPARTITION BY KEY (user_role)
SUBPARTITIONS 4
(PARTITION p0 VALUES IN (0) ENGINE = InnoDB,
PARTITION p1 VALUES IN (1) ENGINE = InnoDB,
PARTITION p2 VALUES IN (2) ENGINE = InnoDB,
PARTITION p3 VALUES IN (3) ENGINE = InnoDB,
PARTITION p4 VALUES IN (4) ENGINE = InnoDB,
PARTITION p5 VALUES IN (5) ENGINE = InnoDB,
PARTITION p6 VALUES IN (6) ENGINE = InnoDB,
PARTITION p7 VALUES IN (7) ENGINE = InnoDB,
PARTITION p8 VALUES IN (8) ENGINE = InnoDB,
PARTITION p9 VALUES IN (9) ENGINE = InnoDB,
PARTITION p10 VALUES IN (10) ENGINE = InnoDB,
PARTITION p11 VALUES IN (11) ENGINE = InnoDB,
PARTITION p12 VALUES IN (12) ENGINE = InnoDB,
PARTITION p13 VALUES IN (13) ENGINE = InnoDB,
PARTITION p14 VALUES IN (14) ENGINE = InnoDB,
PARTITION p15 VALUES IN (15) ENGINE = InnoDB,
PARTITION p16 VALUES IN (16) ENGINE = InnoDB,
PARTITION p17 VALUES IN (17) ENGINE = InnoDB,
PARTITION p18 VALUES IN (18) ENGINE = InnoDB,
PARTITION p19 VALUES IN (19) ENGINE = InnoDB)
There are 3 different user_role values and around 20 rows per unique user_id. Do not ask me why; it just is what it is....
Thank you.
PS: I absolutely understand it is better to invest time in fully analyzing and redesigning the table itself; however, sometimes that's impossible 'cuz of politics and simple shitty people who are above you in the food chain.
You may want to look into Table Partitioning, specifically Range Partitioning.
If you partition your table by the date/timestamp column, MySQL will then scan only the appropriate partition during the query:
You frequently run queries that depend directly on the column used for partitioning the table. For example, when executing a query such as EXPLAIN PARTITIONS SELECT COUNT(*) FROM employees WHERE separated BETWEEN '2000-01-01' AND '2000-12-31' GROUP BY store_id;, MySQL can quickly determine that only partition p2 needs to be scanned because the remaining partitions cannot contain any records satisfying the WHERE clause.
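A sketch of a table for which the quoted query would behave that way is shown below. It is modeled on the employees example the quote refers to, but the exact column list and partition boundaries here should be treated as assumptions.

```sql
-- Assumed layout: range-partitioned on the year of `separated`.
CREATE TABLE employees (
    id        INT NOT NULL,
    fname     VARCHAR(30),
    lname     VARCHAR(30),
    hired     DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code  INT NOT NULL,
    store_id  INT NOT NULL
)
PARTITION BY RANGE (YEAR(separated)) (
    PARTITION p0 VALUES LESS THAN (1991),
    PARTITION p1 VALUES LESS THAN (1996),
    PARTITION p2 VALUES LESS THAN (2001),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

-- Year 2000 falls in p2 (1996 <= year < 2001), so the `partitions`
-- column of the plan should list only p2.
-- Newer servers show partitions with plain EXPLAIN; older ones need
-- EXPLAIN PARTITIONS as written in the quote.
EXPLAIN
SELECT COUNT(*)
FROM employees
WHERE separated BETWEEN '2000-01-01' AND '2000-12-31'
GROUP BY store_id;
```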
I have a table I want to partition using MySQL 5.7 Partitioning to mitigate issues I'm having with dropping old data quickly. (Also, it would be nice to have increased insert I/O performance by partitioning across something other than date, especially if I plan to shard across multiple volumes with subpartitions)
Here is a simplified version of the table:
CREATE TABLE `tbl` (
`date` date NOT NULL,
`sub_id` int(11) unsigned NOT NULL,
`cmd_id` int(11) NOT NULL,
`code` TINYINT DEFAULT NULL,
`rqst` VARCHAR(32) NOT NULL DEFAULT '',
UNIQUE KEY `uk1` (sub_id,cmd_id,date)
) ENGINE=InnoDB
(note that use of column 'date' in uk1 is only to allow partitioning on date)
(The true unique key is (sub_id,cmd_id))
Here are the SQL statements I make on that table:
1. INSERT INTO tbl VALUES (NOW(), ...)
2. UPDATE tbl SET code=$code WHERE sub_id=$sub_id AND cmd_id=$cmd_id
3. SELECT code,rqst FROM tbl WHERE sub_id=$sub_id AND cmd_id=$cmd_id
Here is the partitioning scheme I've devised so far:
PARTITION BY RANGE (TO_DAYS(date))
SUBPARTITION BY HASH(sub_id)
SUBPARTITIONS 4
(PARTITION d001 VALUES LESS THAN (736250) ENGINE = InnoDB,
PARTITION d002 VALUES LESS THAN (736260) ENGINE = InnoDB,
PARTITION d003 VALUES LESS THAN (736270) ENGINE = InnoDB,
PARTITION d004 VALUES LESS THAN (736280) ENGINE = InnoDB,
PARTITION d005 VALUES LESS THAN (736290) ENGINE = InnoDB,
PARTITION d006 VALUES LESS THAN (736300) ENGINE = InnoDB,
PARTITION d007 VALUES LESS THAN (736310) ENGINE = InnoDB,
PARTITION d008 VALUES LESS THAN (736320) ENGINE = InnoDB,
PARTITION d009 VALUES LESS THAN (736330) ENGINE = InnoDB,
PARTITION d010 VALUES LESS THAN (736340) ENGINE = InnoDB,
PARTITION d011 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
However, I believe this will hurt performance by requiring a read per partition every time I reference (sub_id,cmd_id):
EXPLAIN PARTITIONS SELECT * FROM tbl WHERE sub_id='107' AND cmd_id='2246806';
+----+-------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+-------------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+-------------+------+-------------+
| 1 | SIMPLE | optz | d001_d001sp1,d002_d002sp1,d003_d003sp1,d004_d004sp1,d005_d005sp1,d006_d006sp1,d007_d007sp1,d008_d008sp1,d009_d009sp1,d010_d010sp1,d011_d011sp1 | ref | uk1 | uk1 | 38 | const,const | 11 | Using where |
+----+-------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+-------------+------+-------------+
So the crux of my problem is:
If I partition by D dates, then it's D-1 extra lookups
If I partition by S sub_ids, then I can't easily DROP partitions by date
I don't see how I could use COLUMNS partitioning
Here are some notes/caveats:
INSERTing about 5-20 million rows/day
Equal distribution of read,write,insert - but always single row
Only need to keep past ~month of data
A replication system is in place
The hardware involved is expensive
I didn't want to include the date column in the unique key but then I couldn't partition on it, so the code ensures (sub_id,cmd_id) is unique across dates as it stands.
Thanks!
BY HASH is essentially useless, as are SUBPARTITIONs.
> mitigate issues I'm having with dropping old data quickly.
That is, you need to DROP PARTITION for old dates? Use PARTITION BY RANGE (TO_DAYS(date)) and don't bother with the subpartitioning.
For clarity, change UNIQUE KEY uk1 (sub_id,cmd_id,date) to PRIMARY KEY (sub_id,cmd_id,date).
[belated edit] Your three queries will work reasonably well with such. The SELECT and UPDATE will have to hit all partitions since date is not in the WHERE clause. The INSERT will hit only the latest partition (because of NOW()).
More discussion, including tips on doing the periodic purging: http://mysql.rjweb.org/doc.php/partitionmaint
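Put together, the suggestion amounts to something like the sketch below. It reuses the simplified table from the question with the unique key turned into the primary key; only three dated partitions are shown, and the future partition name and the d004 boundary are illustrative.

```sql
CREATE TABLE tbl (
    `date`  DATE NOT NULL,
    sub_id  INT UNSIGNED NOT NULL,
    cmd_id  INT NOT NULL,
    code    TINYINT DEFAULT NULL,
    rqst    VARCHAR(32) NOT NULL DEFAULT '',
    PRIMARY KEY (sub_id, cmd_id, `date`)
) ENGINE = InnoDB
PARTITION BY RANGE (TO_DAYS(`date`)) (
    PARTITION d001   VALUES LESS THAN (736250),
    PARTITION d002   VALUES LESS THAN (736260),
    PARTITION d003   VALUES LESS THAN (736270),
    PARTITION future VALUES LESS THAN MAXVALUE
);

-- Periodic purge: dropping the oldest partition is far cheaper than DELETE.
ALTER TABLE tbl DROP PARTITION d001;

-- Make room for new dates by splitting the (empty) future partition.
ALTER TABLE tbl REORGANIZE PARTITION future INTO (
    PARTITION d004   VALUES LESS THAN (736280),
    PARTITION future VALUES LESS THAN MAXVALUE
);
```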
> Only need to keep past ~month of data
Recommend about 32 partitions -- one pending DROP, one future; see the link.
> A replication system is in place
Doing the ALTER TABLE to add partitioning will stall the system, but I guess you understand the issue there.
> I didn't want to include the date column in the unique key but then I couldn't partition on it, so the code ensures (sub_id,cmd_id) is unique across dates as it stands.
Yeah, a necessary evil.
> 5-20 million rows/day
That's a max of a few hundred per second? If you have ingestion speed problems, see http://mysql.rjweb.org/doc.php/staging_table