I have a table in my database with 10+M rows.
Actually I never worked with so many records and it cames across me that I need a little help on indexing / partitioning the table.
The table looks like this:
CREATE TABLE `stock` (
`ID` bigint(20) NOT NULL AUTO_INCREMENT,
`data` date NOT NULL,
`cod_pdv` varchar(200) DEFAULT NULL,
`cod_art` varchar(200) DEFAULT NULL,
`xstock` int(11) DEFAULT NULL,
`sellout` int(11) DEFAULT NULL,
`backorder` int(11) DEFAULT NULL,
`id_insegna` int(11) DEFAULT NULL,
PRIMARY KEY (`ID`,`data`),
KEY `index_stock` (`cod_art`,`cod_pdv`,`data`),
KEY `index_data` (`data`),
KEY `index_trac_stock` (`cod_art`,`id_insegna`)
) ENGINE=InnoDB AUTO_INCREMENT=10120378 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (YEAR(data))
SUBPARTITION BY HASH (MONTH(data))
(PARTITION part0 VALUES LESS THAN (2015)
(SUBPARTITION subpart0 ENGINE = InnoDB,
SUBPARTITION subpart1 ENGINE = InnoDB,
SUBPARTITION subpart2 ENGINE = InnoDB,
SUBPARTITION subpart3 ENGINE = InnoDB,
SUBPARTITION subpart4 ENGINE = InnoDB,
SUBPARTITION subpart5 ENGINE = InnoDB,
SUBPARTITION subpart6 ENGINE = InnoDB,
SUBPARTITION subpart7 ENGINE = InnoDB,
SUBPARTITION subpart8 ENGINE = InnoDB,
SUBPARTITION subpart9 ENGINE = InnoDB,
SUBPARTITION subpart10 ENGINE = InnoDB,
SUBPARTITION subpart11 ENGINE = InnoDB,
SUBPARTITION subpart12 ENGINE = InnoDB),
PARTITION part1 VALUES LESS THAN (2016)
(SUBPARTITION subpart13 ENGINE = InnoDB,
SUBPARTITION subpart14 ENGINE = InnoDB,
SUBPARTITION subpart15 ENGINE = InnoDB,
SUBPARTITION subpart16 ENGINE = InnoDB,
SUBPARTITION subpart17 ENGINE = InnoDB,
SUBPARTITION subpart18 ENGINE = InnoDB,
SUBPARTITION subpart19 ENGINE = InnoDB,
SUBPARTITION subpart20 ENGINE = InnoDB,
SUBPARTITION subpart21 ENGINE = InnoDB,
SUBPARTITION subpart22 ENGINE = InnoDB,
SUBPARTITION subpart23 ENGINE = InnoDB,
SUBPARTITION subpart24 ENGINE = InnoDB,
SUBPARTITION subpart25 ENGINE = InnoDB),
PARTITION part2 VALUES LESS THAN (2017)
(SUBPARTITION subpart26 ENGINE = InnoDB,
SUBPARTITION subpart27 ENGINE = InnoDB,
SUBPARTITION subpart28 ENGINE = InnoDB,
SUBPARTITION subpart29 ENGINE = InnoDB,
SUBPARTITION subpart30 ENGINE = InnoDB,
SUBPARTITION subpart31 ENGINE = InnoDB,
SUBPARTITION subpart32 ENGINE = InnoDB,
SUBPARTITION subpart33 ENGINE = InnoDB,
SUBPARTITION subpart34 ENGINE = InnoDB,
SUBPARTITION subpart35 ENGINE = InnoDB,
SUBPARTITION subpart36 ENGINE = InnoDB,
SUBPARTITION subpart37 ENGINE = InnoDB,
SUBPARTITION subpart38 ENGINE = InnoDB));
most of the queries run on data, cod_art, cod_pdv and id_insegna.
An example of query could be:
explain
SELECT s.data, s.cod_art, s.giacenza, s.sellout, s.backorder
FROM stock AS s
WHERE s.cod_art IN ("103666","103672","20509","39730","5000016",
"7004009","7004010","7004055","7004064","7004065","7004105",
"7004133","7004161","7004163","7004178","7004213","7005932",
"7023139","7023142","7031974","7049009","7074201","7074204",
"7082052","7082058","7082062","7082067","7082072","7082077",
"7084113","7084127","7088599","7091092","7091094","7094124",
"7095505","7103663","7103678","7103681","7103684","7103687",
"7103690","7103691","7103748","7103766","7103814","7103832",
"7103834","7103835","7103840","7103860","7103902","7103903",
"7103905","7103906","7103907","7104915","7104916","7104936",
"7104957","7105357","7106936","7106937","7106938","7106943",
"7106945","7106946","7106950","7108714","7108716","7108719",
"7108770","7108771","7108778","7108779","7113920","7113921",
"7113925","7113936","7114837","7115099","7115711","7115712",
"7115713","7115714","7115715","7115716","7115717","7115719",
"7115720","7115722","7118620","7118660","7118663","7118664",
"7118665","7118666","7118667","7121650","7121826","7122100",
"7122101","7122102","7122104","7122105","7122106","7122108",
"7122112","7122113","7122115","7122119","7122120","7122123",
"7122124","7122125","7122130","7122141","7122154","7122157",
"7122158","7122159","7122162","7122224","7122238","7122239",
"7122242","7122245","7122246","7122249","7122251","7122252",
"7122256","7122257","7122262","7122266","7122272","7122273", "7122274","7122275","7122276","7122282","7122295","7122296",
"7122297","7122298","7122304","7122308","7122309","7122310", "7122311","7122312","7122314","7122315","7122318","7122319",
"7122326","7122327","7122370","7122371","7122372","7122374", "7122375","7122376","7122377","7122381","7122382","7122386",
"7122388","7122422","7122423","7122425","7122426","7122432", "7122434","7122435","7122436","7122459","7122460","7122469",
"7122470","7122474","7122475","7122479","7122480","7122483","7122484","7122486","7122489","7122496","7122498","7122504","7122505","7122680","7122682","7123119","7123141","7123151","7123152","7123368","7123900","7123945","7123949","7123950","7124025","7134016","7170052","7170055","7170058","7170062","7170067","7170072","7170077","7275297","7275298","7275299","7287682","7292630","7292631","7292644","7292645","7292659","7411238","7411627","7411628","7411629","7411630","7411631","7411632","7411633","7411678","7411679","7411933","7411944","7411955","7411961","7411995","7411996","7411997","7411998","7411999","7412007","7412008","7412009","7414118","7420171","7430455","7430722","7430724","7430728","7442570","7443160","7443178","7443179","7444126","7444220","7444225","7444733","7446020","7446060","7446080","7448499","7449594","7449645","7456561","7456607","7474229","7478302","7480817","7480834","7480836","7480865","7480868","7480869","7481360","7483186","7483199","7484430","7484431","7495863","7496040","7540619","7544450","7544452","7544459","7544460","7544490","7544491","7544842","7544843","7544854","7544855","7544856","7544930","7544931","7544934","7544935","7556801","7620180","7629100","7630592","7630598","7634033","7634035","7634694","7639626","7639628","7639629","7639658","7639661","7639664","7639669","7639670","7639694","7639709","7639894","7639895","7641894","7641895","7641896","7641898","7641933","7642087","7642089","7642098","7642099","7642141","7644215","7644219","7645021","7645022","7645023","7645027","7645040","7645047","7645058","7645059","7645073","7645074","7645076","7646558","7646741","7646742","7646743","7646745","7646746","7646764","7648910","7648912","7648913","7648925","7649244","7653052","7653056","7653325","7653750","7654141","7654500","7654505","7654507","7654514","7654518","7654542","7654554","7654556","7654560","7654565","7654566","7654567","7654568","7654569","7654587","7654588","7654734","7654736","7654745","7654750","7654770","7654777","7654779","7678400","7678419","7678420","7678421","7678426","7678470","7742625","7743642","7745754","7745762","7746585","7762401","7762409","7762417","7762423","7778939","7786921","7786999","7788416","7788473","7788512","7793723","7793731","7793779","7793780","7793783","H17103829","7108269","7108299","7135533","7135534","7135535","7135502","7135504","7135503","7135505","7135511","7135514","7135512","7135515","7135510","7135513","7123600","7123601","7123602","7123791","7123792","7123793","7123794","7123795","7123797","7123798","7123799","7654737","7250055","7286503","7250172","7250176","7250183","7250184","7250188","7208206","7286520","7451195","7017336","7017335","7495647","7495645","7495646","7451400","7451300","7451302","7451289","7451288","7451290","7451297","7451180","7451184","7444746","7444744","7444745","7451179","7451173","7451339","7101223","7101096","7101226","7101222","7101097","7101095","7101197","7101203","7101207","7101212","7113946","7113972"
)
AND s.id_insegna = '3'
AND s.data >= DATE_SUB(CURDATE(), INTERVAL 26 WEEK)
EXPLAIN:
id select_type table typ possible_keys key key_len ref rows Extra
1 SIMPLE s ALL index_stock,index_data,index_trac_stock NULL NULL NULL 4917092 Using where
However as you can see it doesn't use any index. That could be because I don't have a combined index of with data, cod_art and id_insegna but I've read that add many indexes it may not help.
Probably I've made some mistake on creating the table? For example primary indexing the data? I'm a bit lost. I need a real huge help :v
thanks in advance to everyone.
Plan A -- No PARTITION, just good INDEX
SUBPARTITION BY HASH is useless for performance.
In fact, no form of PARTITIONing will speed up the SELECT you proposed. And most forms will slow down the query.
The optimal index for that select should
Start with any = constant parts of the WHERE (id_insegna)
Continue with one 'range' field (code_art or data)
So, I would first recommend either of these. Or both, and let the optimizer pick between them:
INDEX(d_insegna, code_art)
INDEX(d_insegna, data)
Plan B -- better PARTITION, plus good INDEX
Now, I will partially retract a previous statement and say that, since you have two ranges in the WHERE, we can try to take advantage of PARTITION BY RANGE to help with one of them.
So, I think this would be best:
INDEX(d_insegna, code_art)
PARTITION BY RANGE (TO_DAYS(data))
( start VALUES LESS THAN (0),
PARTITION p201501 VALUES LESS THAN (TO_DAYS('2015-02-01')),
PARTITION p201502 VALUES LESS THAN (TO_DAYS('2015-03-01')),
PARTITION p201503 VALUES LESS THAN (TO_DAYS('2015-04-01')),
PARTITION p201504 VALUES LESS THAN (TO_DAYS('2015-05-01')),
PARTITION p201505 VALUES LESS THAN (TO_DAYS('2015-06-01')),
PARTITION p201506 VALUES LESS THAN (TO_DAYS('2015-07-01')),
PARTITION future VALUES LESS THAN MAXVALUE
)
With that, and the 26 WEEK range, it will hit about 7 partitions. Today that is all the partitions, but in the future, it will continue to be only about 7.
See my partition blog for more details.
Do not have lots of empty partitions for the future; it is a performance hit.
Other comments
I do not know the distribution of code_art, nor whether the IN is typical, so I hessitate to even mention the option of PARTITION BY RANGE(code_art) and use the other INDEX. Oops, it is not possible, since code_art is a VARCHAR and BY RANGE does not work with that datatype.
Caveat: If you have other SELECTs, they need to be considered simultaneously with this one -- Optimization, especially when PARTITIONs are involved, cannot be done one query at a time.
Why use VARCHAR for what looks like numbers?
For further discussion, please include EXPLAIN PARTITIONS SELECT ...\G.