Mysql partition indexing - mysql

I want to create a table from batch data for data mining purposes. I will have about 25 million rows of data a day going into this table. There are several indices defined on the table, so the insertion (I do batch insertions) speed is quite slow. With no indices I can stick 40K rows, while with indices it is more like 3-4 K, which makes this whole thing infeasible. So the idea is to partition the data by day, disable the keys, and then do the day's insertions, and reenable the indices. Reenabling indices on a day's worth of data takes, say, 20 minutes, which is fine. This takes me to my question. When you reenable the indices, will it have to recalculate the indices on all partions, or just for that day? It seems clear that for the index that the partitions are on (date in this case), it should be for that day only. But how about the other indices? If it needs to recalculate the indices for all partitions, there is no way it can be done in a reasonable amount of time. Does anyone know?
Show create is like this:
sts | CREATE TABLE `sts` (
`userid` int(10) unsigned DEFAULT NULL,
`urlid` int(10) unsigned DEFAULT NULL,
`geoid` mediumint(8) unsigned DEFAULT NULL,
`cid` mediumint(8) unsigned DEFAULT NULL,
`m` smallint(5) unsigned DEFAULT NULL,
`t` smallint(5) unsigned DEFAULT NULL,
`d` tinyint(3) unsigned DEFAULT NULL,
`requested` int(10) unsigned DEFAULT NULL,
`rate` tinyint(4) DEFAULT NULL,
`mode` varchar(12) DEFAULT NULL,
`session` smallint(5) unsigned DEFAULT NULL,
`sins` smallint(5) unsigned DEFAULT NULL,
`tos` mediumint(8) unsigned DEFAULT NULL,
PRIMARY KEY (userid, urlid, requested),
KEY `id_index` (`m`),
KEY `id_index2` (`t`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
It is not currently partitioned.

You disable/enable index on a table. It means that index will be disabled/enables on all parts of the table.
Consider this scenario for loading new data:
Create a staging table defining all partitions you will need
Load data into staging table without indexes.
Create indexes on this table.
Move partition to target table, which is partitioned same as staging table.
Drop indexes on staging table
To partition your existing data in controllable manner you can use same logic to move data to new partitioned table.

Related

Mysql Partitioning Query Performance

i have created partitions on pricing table. below is the alter statement.
ALTER TABLE `price_tbl`
PARTITION BY HASH(man_code)
PARTITIONS 87;
one partition consists of 435510 records. total records in price_tbl is 6 million.
EXPLAIN query showing only one partion is used for the query . Still the query takes 3-4 sec to execute. below is the query
EXPLAIN SELECT vrimg.image_cap_id,vm.man_name,vr.range_code,vr.range_name,vr.range_url, MIN(`finance_rental`) AS from_price, vd.der_id AS vehicle_id FROM `range_tbl` vr
LEFT JOIN `image_tbl` vrimg ON vr.man_code = vrimg.man_code AND vr.type_id = vrimg.type_id AND vr.range_code = vrimg.range_code
LEFT JOIN `manufacturer_tbl` vm ON vr.man_code = vm.man_code AND vr.type_id = vm.type_id
LEFT JOIN `derivative_tbl` vd ON vd.man_code=vm.man_code AND vd.type_id = vr.type_id AND vd.range_code=vr.range_code
LEFT JOIN `price_tbl` vp ON vp.vehicle_id = vd.der_id AND vd.type_id = vp.type_id AND vp.product_type_id=1 AND vp.maintenance_flag='N' AND vp.man_code=164
AND vp.initial_rentals_id =(SELECT rental_id FROM `rentals_tbl` WHERE rental_months='9')
AND vp.annual_mileage_id =(SELECT annual_mileage_id FROM `mileage_tbl` WHERE annual_mileage='8000')
WHERE vr.type_id = 1 AND vm.man_url = 'audi' AND vd.type_id IS NOT NULL GROUP BY vd.der_id
Result of EXPLAIN.
Same query without partitioning takes 3-4 sec.
Query with partitioning takes 2-3 sec.
how we can increase query performance as it is too slow yet.
attached create table structure.
price table - This consists 6 million records
CREATE TABLE `price_tbl` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`lender_id` bigint(20) DEFAULT NULL,
`type_id` bigint(20) NOT NULL,
`man_code` bigint(20) NOT NULL,
`vehicle_id` bigint(20) DEFAULT NULL,
`product_type_id` bigint(20) DEFAULT NULL,
`initial_rentals_id` bigint(20) DEFAULT NULL,
`term_id` bigint(20) DEFAULT NULL,
`annual_mileage_id` bigint(20) DEFAULT NULL,
`ref` varchar(255) DEFAULT NULL,
`maintenance_flag` enum('Y','N') DEFAULT NULL,
`finance_rental` decimal(20,2) DEFAULT NULL,
`monthly_rental` decimal(20,2) DEFAULT NULL,
`maintenance_payment` decimal(20,2) DEFAULT NULL,
`initial_payment` decimal(20,2) DEFAULT NULL,
`doc_fee` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`,`type_id`,`man_code`),
KEY `type_id` (`type_id`),
KEY `vehicle_id` (`vehicle_id`),
KEY `term_id` (`term_id`),
KEY `product_type_id` (`product_type_id`),
KEY `finance_rental` (`finance_rental`),
KEY `type_id_2` (`type_id`,`vehicle_id`),
KEY `maintenanace_idx` (`maintenance_flag`),
KEY `lender_idx` (`lender_id`),
KEY `initial_idx` (`initial_rentals_id`),
KEY `man_code_idx` (`man_code`)
) ENGINE=InnoDB AUTO_INCREMENT=5830708 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (man_code)
PARTITIONS 87 */
derivative table - This consists 18k records.
CREATE TABLE `derivative_tbl` (
`type_id` bigint(20) DEFAULT NULL,
`der_cap_code` varchar(20) DEFAULT NULL,
`der_id` bigint(20) DEFAULT NULL,
`body_style_id` bigint(20) DEFAULT NULL,
`fuel_type_id` bigint(20) DEFAULT NULL,
`trans_id` bigint(20) DEFAULT NULL,
`man_code` bigint(20) DEFAULT NULL,
`range_code` bigint(20) DEFAULT NULL,
`model_code` bigint(20) DEFAULT NULL,
`der_name` varchar(255) DEFAULT NULL,
`der_url` varchar(255) DEFAULT NULL,
`der_intro_year` date DEFAULT NULL,
`der_disc_year` date DEFAULT NULL,
`der_last_spec_date` date DEFAULT NULL,
KEY `der_id` (`der_id`),
KEY `type_id` (`type_id`),
KEY `man_code` (`man_code`),
KEY `range_code` (`range_code`),
KEY `model_code` (`model_code`),
KEY `body_idx` (`body_style_id`),
KEY `capcodeidx` (`der_cap_code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
range table - This consists 1k records
CREATE TABLE `range_tbl` (
`type_id` bigint(20) DEFAULT NULL,
`man_code` bigint(20) DEFAULT NULL,
`range_code` bigint(20) DEFAULT NULL,
`range_name` varchar(255) DEFAULT NULL,
`range_url` varchar(255) DEFAULT NULL,
KEY `range_code` (`range_code`),
KEY `type_id` (`type_id`),
KEY `man_code` (`man_code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY HASH is essentially useless if you are hoping for improved performance. BY RANGE is useful in a few use cases_.
In most situations, improvements in indexes are as good as trying to use partitioning.
Some likely problems:
No explicit PRIMARY KEY for InnoDB tables. Add a natural PK, if applicable, else an AUTO_INCREMENT.
No "composite" indexes -- they often provide a performance boost. Example: The LEFT JOIN between vr and vrimg involves 3 columns; a composite index on those 3 columns in the 'right' table will probably help performance.
Blind use of BIGINT when smaller datatypes would work. (This is an I/O issue when the table is big.)
Blind use of 255 in VARCHAR.
Consider whether most of the columns should be NOT NULL.
That query may be a victim of the "explode-implode" syndrome. This is where you do JOIN(s), which create a big intermediate table, followed by a GROUP BY to bring the row-count back down.
Don't use LEFT unless the 'right' table really is optional. (I see LEFT JOIN vd ... vd.type_id IS NOT NULL.)
Don't normalize "continuous" values (annual_mileage and rental_months). It is not really beneficial for "=" tests, and it severely hurts performance for "range" tests.
Same query without partitioning takes 3-4 sec. Query with partitioning takes 2-3 sec.
The indexes almost always need changing when switching between partitioning and non-partitioning. With the optimal indexes for each case, I predict that performance will be close to the same.
Indexes
These should help performance whether or not it is partitioned:
vm: (man_url)
vr: (man_code, type_id) -- either order
vd: (man_code, type_id, range_code, der_id)
-- `der_id` 4th, else in any order (covering)
vrimg: (man_code, type_id, range_code, image_cap_id)
-- `image_cap_id` 4th, else in any order (covering)
vp: (type_id, der_id, product_type_id, maintenance_flag,
initial_rentals, annual_mileage, man_code)
-- any order (covering)
A "covering" index is an extra boost, in that it can do all the work just in the index's BTree, without touching the data's BTree.
Implement a bunch of what I recommend, then come back (in another Question) for further tweaking.
Usually the "partition key" should be last in a composite index.

cannot partition this mysql table

I have an innoDB table named "transaction" with ~1.5 million rows. I would like to partition this table (probably on column "gas_station_id" since it is used a lot in join queries) but I've read in MySQL 5.7 Reference Manual that
All columns used in the table's partitioning expression must be part of every unique key that the table may have, including any primary key.
I have two questions:
The column "gas_station_id" is not part of unique key or primary key. How could I partition this table then?
even if I could partition this table, I am not sure which partitioning type would be better in this case? (I was thinking about LIST partitioning (we have about 40 different(distinct) gas stations) but I am not sure since there will be only one value in each list partition like the following :
ALTER TABLE transaction
PARTITION BY LIST(gas_station_id)
( PARTITION p1 VALUES IN (9001),
PARTITION p2 VALUES IN (9002),.....)
I tried partitioning by KEY, but I receive the following error (I think because id is not part of all unique keys..):
#1053 - a UNIQUE INDEX must include all columns in the table's partitioning function
This is the structure of the "transaction" table:
EDIT
and this is what SHOW CREATE TABLE shows:
CREATE TABLE `transaction` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`terminal_transaction_id` int(11) NOT NULL,
`fuel_terminal_id` int(11) NOT NULL,
`fuel_terminal_serial` int(11) NOT NULL,
`xboard_id` int(11) NOT NULL,
`gas_station_id` int(11) NOT NULL,
`operator_id` varchar(16) NOT NULL,
`shift_id` int(11) NOT NULL,
`xboard_total_counter` int(11) NOT NULL,
`fuel_type` tinyint(2) NOT NULL,
`start_fuel_time` int(11) NOT NULL,
`end_fuel_time` int(11) DEFAULT NULL,
`preset_amount` int(11) NOT NULL,
`actual_amount` int(11) DEFAULT NULL,
`fuel_cost` int(11) DEFAULT NULL,
`payment_cost` int(11) DEFAULT NULL,
`purchase_type` int(11) NOT NULL,
`payment_ref_id` text,
`unit_fuel_price` int(11) NOT NULL,
`fuel_status_id` int(11) DEFAULT NULL,
`fuel_mode_id` int(11) NOT NULL,
`payment_result` int(11) NOT NULL,
`card_pan` varchar(20) DEFAULT NULL,
`state` int(11) DEFAULT NULL,
`totalizer` int(11) NOT NULL DEFAULT '0',
`shift_start_time` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `terminal_transaction_id` (`terminal_transaction_id`,`fuel_terminal_id`,`start_fuel_time`) USING BTREE,
KEY `start_fuel_time_idx` (`start_fuel_time`),
KEY `fuel_terminal_idx` (`fuel_terminal_id`),
KEY `xboard_idx` (`xboard_id`),
KEY `gas_station_id` (`gas_station_id`) USING BTREE,
KEY `purchase_type` (`purchase_type`) USING BTREE,
KEY `shift_start_time` (`shift_start_time`) USING BTREE,
KEY `fuel_type` (`fuel_type`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1665335 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT
Short answer: Don't use PARTITION. Let's see the query to help speed it up.
Long answer:
1.5M rows is only marginally big enough to consider partitioning.
PARTITION BY LIST is probably useless for performance.
You have not given enough info to give you answers other that vague hints. Please provide at least SHOW CREATE TABLE and the slow SELECT.
It is possible to add the partition key onto the end of the PRIMARY or UNIQUE key; you will lose the uniqueness test.
Don't index a low-cardinality column; it won't be used.
More on PARTITION

Alter table to apply partitioning by key in mysql

I have a table with million of rows and the frequency of growth will probably increase in future, so far about 4.3 million rows are added in a month, causing the database to slow down. I have already applied indexing but it's not really optimizing the speed. Is applying Partitioning to such data favorable?
Also how can I apply partitioning on a table with million of rows? I know it will look something like this
ALTER TABLE gpsloggs
PARTITION BY KEY(DeviceCode)
PARTITIONS 10;
The problem is I was Partitioning on DeviceCode which is not a primary key so partitioning isn't permissible.
DROP TABLE IF EXISTS `gpslogss`;
CREATE TABLE `gpslogss` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`DeviceCode` varchar(255) DEFAULT NULL,
`Latitude` varchar(255) DEFAULT NULL,
`Longitude` varchar(255) DEFAULT NULL,
`Speed` double DEFAULT NULL,
`rowStamp` datetime DEFAULT NULL,
`Date` varchar(255) DEFAULT NULL,
`Time` varchar(255) DEFAULT NULL,
`AlarmCode` int(11) DEFAULT NULL,
PRIMARY KEY `Id` (`Id`) USING BTREE,
KEY `DeviceCode` (`DeviceCode`) USING BTREE
);
So I altered the table and made the table in a new database with 0 records this way and it worked fine
DROP TABLE IF EXISTS `gpslogss`;
CREATE TABLE `gpslogss` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`DeviceCode` varchar(255) DEFAULT NULL,
`Latitude` varchar(255) DEFAULT NULL,
`Longitude` varchar(255) DEFAULT NULL,
`Speed` double DEFAULT NULL,
`rowStamp` datetime DEFAULT NULL,
`Date` varchar(255) DEFAULT NULL,
`Time` varchar(255) DEFAULT NULL,
`AlarmCode` int(11) DEFAULT NULL,
KEY `Id` (`Id`) USING BTREE,
KEY `DeviceCode` (`DeviceCode`) USING BTREE
);
PARTITION BY KEY(DeviceCode)
PARTITIONS 10;
How should I render the code so that I can apply partitioning to the table with million of rows? How should I drop keys and alter the table to apply partitioning without damaging data?
Short answer: Don't.
Long answer: PARTITION BY KEY does not provide any performance benefit (that I know of). And why else use PARTITION?
Other notes:
You should use InnoDB for virtually all tables.
InnoDB tables should have an explicit PRIMARY KEY.
There is a DATETIME datatype; don't use VARCHAR for date or time, and don't split them.
latitude and longitude are numeric; don't use VARCHAR. FLOAT is a likely candidate (precise enough to differentiate vehicles, but not people).
Your real question is about speed. Let's see the slow SELECTs and work backward from them. Adding PARTITIONing is rarely a solution to performance.

MYSQL improve performance in a very big table

I have this table
CREATE TABLE llegada (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`emc_id` int(10) unsigned DEFAULT NULL,
`cuartel_id` int(10) unsigned DEFAULT NULL,
`fecha` datetime DEFAULT NULL,
`nro_entrada` int(10) unsigned DEFAULT NULL,
`valor` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `ind_llegada` (`emc_id`,`cuartel_id`,`fecha`,`nro_entrada`)
) ENGINE=MyISAM AUTO_INCREMENT=18822145 DEFAULT CHARSET=latin1
This tablet has approximately 100000000 records. And to improve the performance I would like to partition to this table, in 6 parts depending on the year. But first two problems happens I'm not sure how to do it and not know if would modify the queries made to the table. My idea would not have to modify the query page that accesses the database.
Thanks for advance.
I have never heard of a way to partition sql itself without just creating multiple databases. As you said, you'll need to modify the query page or the way information is stored to the various databases on your site, which you should do anyway because that's going to be a lot of wasted processing time. I'm surprised it hasn't already affected your user experience.

Is there a better index to speed up this query?

The following query is using temporary and filesort. I'd like to avoid that if possible.
SELECT lib_name, description, count(seq_id), floor(avg(size))
FROM libraries l JOIN sequence s ON (l.lib_id=s.lib_id)
WHERE s.is_contig=0 and foreign_seqs=0 GROUP BY lib_name;
The EXPLAIN says:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,SIMPLE,s,ref,libseq,contigs,contigs,4,const,28447,Using temporary; Using filesort
1,SIMPLE,l,eq_ref,PRIMARY,PRIMARY,4,s.lib_id,1,Using where
The tables look like this:
libraries
CREATE TABLE `libraries` (
`lib_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lib_name` varchar(30) NOT NULL,
`method_id` int(10) unsigned DEFAULT NULL,
`lib_efficiency` decimal(4,2) unsigned DEFAULT NULL,
`insert_avg` decimal(5,2) DEFAULT NULL,
`insert_high` decimal(5,2) DEFAULT NULL,
`insert_low` decimal(5,2) DEFAULT NULL,
`amtvector` decimal(4,2) unsigned DEFAULT NULL,
`description` text,
`foreign_seqs` tinyint(1) NOT NULL DEFAULT '0' COMMENT '1 means the sequences in this library are not ours',
PRIMARY KEY (`lib_id`),
UNIQUE KEY `lib_name` (`lib_name`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=latin1;
sequence
CREATE TABLE `sequence` (
`seq_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`seq_name` varchar(40) NOT NULL DEFAULT '',
`lib_id` int(10) unsigned DEFAULT NULL,
`size` int(10) unsigned DEFAULT NULL,
`add_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`sequencing_date` date DEFAULT '0000-00-00',
`comment` text DEFAULT NULL,
`is_contig` int(10) unsigned NOT NULL DEFAULT '0',
`fasta_seq` longtext,
`primer` varchar(15) DEFAULT NULL,
`gc_count` int(10) DEFAULT NULL,
PRIMARY KEY (`seq_id`),
UNIQUE KEY `seq_name` (`seq_name`),
UNIQUE KEY `libseq` (`lib_id`,`seq_id`),
KEY `primer` (`primer`),
KEY `sgitnoc` (`seq_name`,`is_contig`),
KEY `contigs` (`is_contig`,`seq_name`) USING BTREE,
CONSTRAINT `FK_sequence_1` FOREIGN KEY (`lib_id`) REFERENCES `libraries` (`lib_id`)
) ENGINE=InnoDB AUTO_INCREMENT=61508 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
Are there any changes I can do to make the query go faster? If not, when (for a web application) is it worth putting the results of a query like the above into a MEMORY table?
First strategy: make it faster for mySQL to locate the records you want summarized.
You've already got an index on sequence.is_contig. You might try indexing on libraries.foreign_seqs. I don't know if that will help, but it's worth a try.
Second strategy: see if you can get your sort to run in memory, rather than in a file. Try making the sort_buffer_size parameter bigger. This will consume RAM on your server, but that's what RAM is for.
Third strategy: IF your application needs to do this query a lot but updates the underlying data only a little, take your own suggestion and create a summary table. Perhaps use an EVENT to remake the summary table., and run it once every few minutes. If you're going to follow that strategy, start by creating a view with this table in it and have your app retrieve information from the view. Then get the summary table stuff working, drop the view, and give the summary table the same name as the view. That way your data model work and your application design work can proceed independently of each other.
Final suggestion: If this is truly slowly-changing summary data, switch to myISAM. It's a little faster for this kind of data wrangling.