Is there a better index to speed up this query? - mysql

The following query is using temporary and filesort. I'd like to avoid that if possible.
SELECT lib_name, description, count(seq_id), floor(avg(size))
FROM libraries l JOIN sequence s ON (l.lib_id=s.lib_id)
WHERE s.is_contig=0 and foreign_seqs=0 GROUP BY lib_name;
The EXPLAIN says:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,SIMPLE,s,ref,libseq,contigs,contigs,4,const,28447,Using temporary; Using filesort
1,SIMPLE,l,eq_ref,PRIMARY,PRIMARY,4,s.lib_id,1,Using where
The tables look like this:
libraries
CREATE TABLE `libraries` (
`lib_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lib_name` varchar(30) NOT NULL,
`method_id` int(10) unsigned DEFAULT NULL,
`lib_efficiency` decimal(4,2) unsigned DEFAULT NULL,
`insert_avg` decimal(5,2) DEFAULT NULL,
`insert_high` decimal(5,2) DEFAULT NULL,
`insert_low` decimal(5,2) DEFAULT NULL,
`amtvector` decimal(4,2) unsigned DEFAULT NULL,
`description` text,
`foreign_seqs` tinyint(1) NOT NULL DEFAULT '0' COMMENT '1 means the sequences in this library are not ours',
PRIMARY KEY (`lib_id`),
UNIQUE KEY `lib_name` (`lib_name`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=latin1;
sequence
CREATE TABLE `sequence` (
`seq_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`seq_name` varchar(40) NOT NULL DEFAULT '',
`lib_id` int(10) unsigned DEFAULT NULL,
`size` int(10) unsigned DEFAULT NULL,
`add_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`sequencing_date` date DEFAULT '0000-00-00',
`comment` text DEFAULT NULL,
`is_contig` int(10) unsigned NOT NULL DEFAULT '0',
`fasta_seq` longtext,
`primer` varchar(15) DEFAULT NULL,
`gc_count` int(10) DEFAULT NULL,
PRIMARY KEY (`seq_id`),
UNIQUE KEY `seq_name` (`seq_name`),
UNIQUE KEY `libseq` (`lib_id`,`seq_id`),
KEY `primer` (`primer`),
KEY `sgitnoc` (`seq_name`,`is_contig`),
KEY `contigs` (`is_contig`,`seq_name`) USING BTREE,
CONSTRAINT `FK_sequence_1` FOREIGN KEY (`lib_id`) REFERENCES `libraries` (`lib_id`)
) ENGINE=InnoDB AUTO_INCREMENT=61508 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
Are there any changes I can make to speed up the query? If not, when (for a web application) is it worth putting the results of a query like the above into a MEMORY table?

First strategy: make it faster for MySQL to locate the records you want summarized.
You've already got an index on sequence.is_contig. You might try indexing on libraries.foreign_seqs. I don't know if that will help, but it's worth a try.
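A minimal sketch of that experiment, assuming the index name `foreign_seqs_idx` (hypothetical; any unused name works). Re-running EXPLAIN afterwards shows whether the optimizer actually picks the new index; with only a handful of rows in libraries it may well ignore it.

```sql
-- Add the index suggested above (the index name is an illustrative assumption).
ALTER TABLE libraries ADD INDEX foreign_seqs_idx (foreign_seqs);

-- Re-check the plan for the original query.
EXPLAIN
SELECT l.lib_name, l.description, COUNT(s.seq_id), FLOOR(AVG(s.size))
FROM libraries l
JOIN sequence s ON l.lib_id = s.lib_id
WHERE s.is_contig = 0 AND l.foreign_seqs = 0
GROUP BY l.lib_name;

-- If the plan does not improve, the index can be removed again.
-- ALTER TABLE libraries DROP INDEX foreign_seqs_idx;
```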
Second strategy: see if you can get your sort to run in memory, rather than in a file. Try making the sort_buffer_size parameter bigger. This will consume RAM on your server, but that's what RAM is for.
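As a sketch of the second strategy, the buffer can be inspected and raised for a single session before touching the server-wide setting. The 4 MB figure is an arbitrary illustrative value, not a recommendation.

```sql
-- Show the current value (in bytes).
SHOW VARIABLES LIKE 'sort_buffer_size';

-- Raise it for the current connection only; 4 MB is an assumed example value.
SET SESSION sort_buffer_size = 4 * 1024 * 1024;

-- Run the query again in this session and compare timings and EXPLAIN output.
```

Trying the change per session first keeps the experiment from affecting other connections.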
Third strategy: If your application runs this query often but updates the underlying data only rarely, take your own suggestion and create a summary table. Perhaps use an EVENT to rebuild the summary table, and run it once every few minutes. If you're going to follow that strategy, start by creating a view that wraps this query and have your app retrieve information from the view. Then get the summary table working, drop the view, and give the summary table the same name as the view. That way your data model work and your application design work can proceed independently of each other.
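The view-then-summary-table approach could look roughly like the following. The names `library_summary` and `refresh_library_summary`, the column names `seq_count` and `avg_size`, and the 5-minute interval are all assumptions made for illustration. The event scheduler must be enabled for the event to fire, and `DELIMITER` is a command of the mysql command-line client.

```sql
-- Step 1 (interim): the application reads from a view.
CREATE VIEW library_summary AS
SELECT l.lib_name, l.description,
       COUNT(s.seq_id) AS seq_count,
       FLOOR(AVG(s.size)) AS avg_size
FROM libraries l
JOIN sequence s ON l.lib_id = s.lib_id
WHERE s.is_contig = 0 AND l.foreign_seqs = 0
GROUP BY l.lib_name;

-- Step 2 (later): replace the view with a real table of the same name.
DROP VIEW library_summary;

CREATE TABLE library_summary (
  lib_name    VARCHAR(30) NOT NULL,
  description TEXT,
  seq_count   INT UNSIGNED NOT NULL,
  avg_size    INT UNSIGNED NULL,
  PRIMARY KEY (lib_name)
) ENGINE=InnoDB;

-- Rebuild the summary periodically (requires event_scheduler = ON).
DELIMITER //
CREATE EVENT refresh_library_summary
ON SCHEDULE EVERY 5 MINUTE
DO
BEGIN
  START TRANSACTION;
  DELETE FROM library_summary;
  INSERT INTO library_summary (lib_name, description, seq_count, avg_size)
  SELECT l.lib_name, l.description, COUNT(s.seq_id), FLOOR(AVG(s.size))
  FROM libraries l
  JOIN sequence s ON l.lib_id = s.lib_id
  WHERE s.is_contig = 0 AND l.foreign_seqs = 0
  GROUP BY l.lib_name;
  COMMIT;
END//
DELIMITER ;
```

Because the table keeps the view's name and column names, the application query (`SELECT ... FROM library_summary`) does not change when the swap happens.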
Final suggestion: If this is truly slowly-changing summary data, switch to MyISAM. It's a little faster for this kind of data wrangling.

Related

Mysql Partitioning Query Performance

I have created partitions on the pricing table. Below is the ALTER statement.
ALTER TABLE `price_tbl`
PARTITION BY HASH(man_code)
PARTITIONS 87;
One partition contains 435510 records. The total number of records in price_tbl is 6 million.
EXPLAIN shows that only one partition is used for the query. Still, the query takes 3-4 sec to execute. Below is the query:
EXPLAIN SELECT vrimg.image_cap_id,vm.man_name,vr.range_code,vr.range_name,vr.range_url, MIN(`finance_rental`) AS from_price, vd.der_id AS vehicle_id FROM `range_tbl` vr
LEFT JOIN `image_tbl` vrimg ON vr.man_code = vrimg.man_code AND vr.type_id = vrimg.type_id AND vr.range_code = vrimg.range_code
LEFT JOIN `manufacturer_tbl` vm ON vr.man_code = vm.man_code AND vr.type_id = vm.type_id
LEFT JOIN `derivative_tbl` vd ON vd.man_code=vm.man_code AND vd.type_id = vr.type_id AND vd.range_code=vr.range_code
LEFT JOIN `price_tbl` vp ON vp.vehicle_id = vd.der_id AND vd.type_id = vp.type_id AND vp.product_type_id=1 AND vp.maintenance_flag='N' AND vp.man_code=164
AND vp.initial_rentals_id =(SELECT rental_id FROM `rentals_tbl` WHERE rental_months='9')
AND vp.annual_mileage_id =(SELECT annual_mileage_id FROM `mileage_tbl` WHERE annual_mileage='8000')
WHERE vr.type_id = 1 AND vm.man_url = 'audi' AND vd.type_id IS NOT NULL GROUP BY vd.der_id
Result of EXPLAIN.
Same query without partitioning takes 3-4 sec.
Query with partitioning takes 2-3 sec.
How can we improve query performance? It is still too slow.
The CREATE TABLE statements are attached.
price table - This contains 6 million records
CREATE TABLE `price_tbl` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`lender_id` bigint(20) DEFAULT NULL,
`type_id` bigint(20) NOT NULL,
`man_code` bigint(20) NOT NULL,
`vehicle_id` bigint(20) DEFAULT NULL,
`product_type_id` bigint(20) DEFAULT NULL,
`initial_rentals_id` bigint(20) DEFAULT NULL,
`term_id` bigint(20) DEFAULT NULL,
`annual_mileage_id` bigint(20) DEFAULT NULL,
`ref` varchar(255) DEFAULT NULL,
`maintenance_flag` enum('Y','N') DEFAULT NULL,
`finance_rental` decimal(20,2) DEFAULT NULL,
`monthly_rental` decimal(20,2) DEFAULT NULL,
`maintenance_payment` decimal(20,2) DEFAULT NULL,
`initial_payment` decimal(20,2) DEFAULT NULL,
`doc_fee` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`,`type_id`,`man_code`),
KEY `type_id` (`type_id`),
KEY `vehicle_id` (`vehicle_id`),
KEY `term_id` (`term_id`),
KEY `product_type_id` (`product_type_id`),
KEY `finance_rental` (`finance_rental`),
KEY `type_id_2` (`type_id`,`vehicle_id`),
KEY `maintenanace_idx` (`maintenance_flag`),
KEY `lender_idx` (`lender_id`),
KEY `initial_idx` (`initial_rentals_id`),
KEY `man_code_idx` (`man_code`)
) ENGINE=InnoDB AUTO_INCREMENT=5830708 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (man_code)
PARTITIONS 87 */
derivative table - This contains 18k records.
CREATE TABLE `derivative_tbl` (
`type_id` bigint(20) DEFAULT NULL,
`der_cap_code` varchar(20) DEFAULT NULL,
`der_id` bigint(20) DEFAULT NULL,
`body_style_id` bigint(20) DEFAULT NULL,
`fuel_type_id` bigint(20) DEFAULT NULL,
`trans_id` bigint(20) DEFAULT NULL,
`man_code` bigint(20) DEFAULT NULL,
`range_code` bigint(20) DEFAULT NULL,
`model_code` bigint(20) DEFAULT NULL,
`der_name` varchar(255) DEFAULT NULL,
`der_url` varchar(255) DEFAULT NULL,
`der_intro_year` date DEFAULT NULL,
`der_disc_year` date DEFAULT NULL,
`der_last_spec_date` date DEFAULT NULL,
KEY `der_id` (`der_id`),
KEY `type_id` (`type_id`),
KEY `man_code` (`man_code`),
KEY `range_code` (`range_code`),
KEY `model_code` (`model_code`),
KEY `body_idx` (`body_style_id`),
KEY `capcodeidx` (`der_cap_code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
range table - This contains 1k records
CREATE TABLE `range_tbl` (
`type_id` bigint(20) DEFAULT NULL,
`man_code` bigint(20) DEFAULT NULL,
`range_code` bigint(20) DEFAULT NULL,
`range_name` varchar(255) DEFAULT NULL,
`range_url` varchar(255) DEFAULT NULL,
KEY `range_code` (`range_code`),
KEY `type_id` (`type_id`),
KEY `man_code` (`man_code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY HASH is essentially useless if you are hoping for improved performance. BY RANGE is useful in a few use cases.
In most situations, improvements in indexes are as good as trying to use partitioning.
Some likely problems:
No explicit PRIMARY KEY for InnoDB tables. Add a natural PK, if applicable, else an AUTO_INCREMENT.
No "composite" indexes -- they often provide a performance boost. Example: The LEFT JOIN between vr and vrimg involves 3 columns; a composite index on those 3 columns in the 'right' table will probably help performance.
Blind use of BIGINT when smaller datatypes would work. (This is an I/O issue when the table is big.)
Blind use of 255 in VARCHAR.
Consider whether most of the columns should be NOT NULL.
That query may be a victim of the "explode-implode" syndrome. This is where you do JOIN(s), which create a big intermediate table, followed by a GROUP BY to bring the row-count back down.
Don't use LEFT unless the 'right' table really is optional. (I see LEFT JOIN vd ... vd.type_id IS NOT NULL.)
Don't normalize "continuous" values (annual_mileage and rental_months). It is not really beneficial for "=" tests, and it severely hurts performance for "range" tests.
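As a sketch of the LEFT JOIN point in the list above: the WHERE clause in the question already discards rows where vm or vd failed to match (`vm.man_url = 'audi'` and `vd.type_id IS NOT NULL`), so those two joins can be written as plain JOINs without changing the result. This assumes `finance_rental` comes from price_tbl, which is the only table shown that has it.

```sql
-- Same SELECT list as the question; only the join types for vm and vd change.
SELECT vrimg.image_cap_id, vm.man_name, vr.range_code, vr.range_name, vr.range_url,
       MIN(vp.finance_rental) AS from_price, vd.der_id AS vehicle_id
FROM range_tbl vr
JOIN manufacturer_tbl vm
  ON vr.man_code = vm.man_code AND vr.type_id = vm.type_id
JOIN derivative_tbl vd
  ON vd.man_code = vm.man_code AND vd.type_id = vr.type_id
 AND vd.range_code = vr.range_code
LEFT JOIN image_tbl vrimg
  ON vr.man_code = vrimg.man_code AND vr.type_id = vrimg.type_id
 AND vr.range_code = vrimg.range_code
LEFT JOIN price_tbl vp
  ON vp.vehicle_id = vd.der_id AND vd.type_id = vp.type_id
 AND vp.product_type_id = 1 AND vp.maintenance_flag = 'N' AND vp.man_code = 164
 AND vp.initial_rentals_id = (SELECT rental_id FROM rentals_tbl WHERE rental_months = '9')
 AND vp.annual_mileage_id = (SELECT annual_mileage_id FROM mileage_tbl WHERE annual_mileage = '8000')
WHERE vr.type_id = 1 AND vm.man_url = 'audi'
GROUP BY vd.der_id;
```

The explicit inner joins give the optimizer more freedom in choosing the join order, for example starting from vm filtered on man_url.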
Same query without partitioning takes 3-4 sec. Query with partitioning takes 2-3 sec.
The indexes almost always need changing when switching between partitioning and non-partitioning. With the optimal indexes for each case, I predict that performance will be close to the same.
Indexes
These should help performance whether or not it is partitioned:
vm: (man_url)
vr: (man_code, type_id) -- either order
vd: (man_code, type_id, range_code, der_id)
-- `der_id` 4th, else in any order (covering)
vrimg: (man_code, type_id, range_code, image_cap_id)
-- `image_cap_id` 4th, else in any order (covering)
vp: (type_id, vehicle_id, product_type_id, maintenance_flag,
initial_rentals_id, annual_mileage_id, man_code)
-- any order (covering)
A "covering" index is an extra boost, in that it can do all the work just in the index's BTree, without touching the data's BTree.
Implement a bunch of what I recommend, then come back (in another Question) for further tweaking.
Usually the "partition key" should be last in a composite index.
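The index list above could be created roughly as follows. The index names are hypothetical, and the definitions of manufacturer_tbl and image_tbl are not shown in the question, so their column names are taken from the query and assumed to exist as written. For price_tbl the column names follow the CREATE TABLE (`vehicle_id`, `initial_rentals_id`, `annual_mileage_id`).

```sql
-- vm: lookup by URL
ALTER TABLE manufacturer_tbl ADD INDEX man_url_idx (man_url);

-- vr: join columns, either order
ALTER TABLE range_tbl ADD INDEX man_type_idx (man_code, type_id);

-- vd: join columns first, der_id last so the index is covering
ALTER TABLE derivative_tbl
  ADD INDEX man_type_range_der_idx (man_code, type_id, range_code, der_id);

-- vrimg: join columns first, image_cap_id last so the index is covering
ALTER TABLE image_tbl
  ADD INDEX man_type_range_img_idx (man_code, type_id, range_code, image_cap_id);

-- vp: all filtered columns, with the partition key (man_code) last
ALTER TABLE price_tbl
  ADD INDEX price_lookup_idx (type_id, vehicle_id, product_type_id, maintenance_flag,
                              initial_rentals_id, annual_mileage_id, man_code);
```

Note that the query also reads `finance_rental` for the MIN(); the vp index only covers the query fully if that column is appended as well, which is an addition beyond the list given here.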

Alter table to apply partitioning by key in mysql

I have a table with millions of rows, and the growth rate will probably increase in the future. So far about 4.3 million rows are added per month, which is slowing the database down. I have already added indexes, but they are not really improving the speed. Is partitioning a good fit for such data?
Also, how can I apply partitioning to a table with millions of rows? I know it will look something like this:
ALTER TABLE gpsloggs
PARTITION BY KEY(DeviceCode)
PARTITIONS 10;
The problem is that I was partitioning on DeviceCode, which is not part of the primary key, so the partitioning isn't permitted.
DROP TABLE IF EXISTS `gpslogss`;
CREATE TABLE `gpslogss` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`DeviceCode` varchar(255) DEFAULT NULL,
`Latitude` varchar(255) DEFAULT NULL,
`Longitude` varchar(255) DEFAULT NULL,
`Speed` double DEFAULT NULL,
`rowStamp` datetime DEFAULT NULL,
`Date` varchar(255) DEFAULT NULL,
`Time` varchar(255) DEFAULT NULL,
`AlarmCode` int(11) DEFAULT NULL,
PRIMARY KEY `Id` (`Id`) USING BTREE,
KEY `DeviceCode` (`DeviceCode`) USING BTREE
);
So I changed the definition and created the table in a new database with 0 records this way, and it worked fine:
DROP TABLE IF EXISTS `gpslogss`;
CREATE TABLE `gpslogss` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`DeviceCode` varchar(255) DEFAULT NULL,
`Latitude` varchar(255) DEFAULT NULL,
`Longitude` varchar(255) DEFAULT NULL,
`Speed` double DEFAULT NULL,
`rowStamp` datetime DEFAULT NULL,
`Date` varchar(255) DEFAULT NULL,
`Time` varchar(255) DEFAULT NULL,
`AlarmCode` int(11) DEFAULT NULL,
KEY `Id` (`Id`) USING BTREE,
KEY `DeviceCode` (`DeviceCode`) USING BTREE
)
PARTITION BY KEY(DeviceCode)
PARTITIONS 10;
How should I write the statements so that I can apply partitioning to the existing table with millions of rows? How should I drop keys and alter the table to apply partitioning without damaging data?
Short answer: Don't.
Long answer: PARTITION BY KEY does not provide any performance benefit (that I know of). And why else use PARTITION?
Other notes:
You should use InnoDB for virtually all tables.
InnoDB tables should have an explicit PRIMARY KEY.
There is a DATETIME datatype; don't use VARCHAR for date or time, and don't split them.
latitude and longitude are numeric; don't use VARCHAR. FLOAT is a likely candidate (precise enough to differentiate vehicles, but not people).
Your real question is about speed. Let's see the slow SELECTs and work backward from them. Adding PARTITIONing is rarely a solution to performance.
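To make the datatype notes concrete, here is one possible shape for the table. Everything in it is an assumption for illustration: the table name `gpslogs_revised`, the column name `LoggedAt` (replacing the separate `Date` and `Time` strings), the `VARCHAR(32)` length for `DeviceCode`, and the composite index, which only helps if the slow SELECTs filter by device and time.

```sql
CREATE TABLE gpslogs_revised (
  Id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
  DeviceCode VARCHAR(32)  NOT NULL,   -- length is a guess; size it to the real codes
  Latitude   FLOAT        NOT NULL,   -- numeric instead of VARCHAR(255)
  Longitude  FLOAT        NOT NULL,
  Speed      FLOAT        NULL,
  LoggedAt   DATETIME     NOT NULL,   -- one DATETIME instead of split Date/Time strings
  AlarmCode  INT          NULL,
  PRIMARY KEY (Id),                   -- explicit PRIMARY KEY for InnoDB
  KEY device_time (DeviceCode, LoggedAt)
) ENGINE=InnoDB;
```

The smaller row and index sizes are likely to matter more at this growth rate than partitioning, but the right indexes can only be chosen once the slow SELECTs are known.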

Why would this simple MySQL update query take so long?

My host has been sending me messages over the last few months saying that my site is using way too many MySQL minutes. They also send some logs showing which queries use up the most time on occasion. Some of the queries are kind of long and complicated, so I understand why they would be an issue. But a few have me scratching my head. The one I want to focus on next is this:
UPDATE parentmessages SET views=views+1 WHERE parentid='11308'
The number is just an example, it could be any parentid. The parentmessages table has parentid as the primary key, so I would think it would be indexed and easily found. There are about 11,000 records in the table, which is not really that many. Here are the numbers my host gave me for how long this query took over 6 instances yesterday:
Taking 0.126455 , 1.472929 , 1.638743 , 3.040538 , 7.130041 , 112.498037 seconds to complete
The 112 could be a random glitch I suppose, but why would it take 3 or 7 seconds sometimes?! My best guess is that it's because I have a lot of indices on the table, but I don't know enough about MySQL to know if that would matter. And why would it sometimes be 1/10th of a second and sometimes many seconds?
Here is the show create table:
CREATE TABLE `parentmessages` (
`parentid` int(7) NOT NULL AUTO_INCREMENT,
`active` tinyint(1) NOT NULL,
`level` int(2) NOT NULL,
`type` varchar(10) NOT NULL,
`hidden` tinyint(1) DEFAULT NULL,
`sticky` tinyint(1) NOT NULL,
`poll` tinyint(1) NOT NULL,
`topic` varchar(120) DEFAULT NULL,
`message` varchar(30000) NOT NULL,
`views` int(6) NOT NULL,
`replies` int(5) NOT NULL,
`userid` int(7) NOT NULL,
`datetimecalc` int(11) NOT NULL,
`lastreplycalc` int(11) NOT NULL,
`lastreplyuser` int(7) NOT NULL,
`editedcalc` int(11) DEFAULT NULL,
`editeduser` int(7) DEFAULT NULL,
`realediteduser` int(7) DEFAULT NULL,
`altint` int(7) DEFAULT NULL,
`imageurl` varchar(125) DEFAULT NULL,
`locked` tinyint(1) NOT NULL,
`tempid` int(12) NOT NULL,
PRIMARY KEY (`parentid`),
KEY `useridindex` (`userid`),
KEY `datetimecalcindex` (`datetimecalc`),
KEY `activeindex` (`active`),
KEY `lastreplycalcindex` (`lastreplycalc`),
KEY `levelindex` (`level`),
KEY `stickyindex` (`sticky`)
) ENGINE=MyISAM AUTO_INCREMENT=11716 DEFAULT CHARSET=latin1
One reason could be that another slow query is locking the table and your update is simply waiting for that query to finish.
Don't use MyISAM. I forget who said it, maybe PeterZ, but "using MyISAM means you don't care about your data". The easiest way to check for table locking is to look at the processlist. Dumps, inserts, updates, etc. will all lock the table. MyISAM is all but deprecated in 5.6 for good reason.
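A short sketch of both checks, assuming you have the privileges to run them on a shared host (that may not be the case). The engine conversion rebuilds the table, so it is best tried on a copy first.

```sql
-- See what is running while the UPDATE is slow; look for long-running
-- statements and for threads waiting on a table-level lock.
SHOW FULL PROCESSLIST;

-- List tables that are currently in use or locked.
SHOW OPEN TABLES WHERE In_use > 0;

-- Move the table to InnoDB so that a single-row UPDATE takes a row lock
-- instead of locking the whole table.
ALTER TABLE parentmessages ENGINE=InnoDB;
```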

Keys are reported as redundant, but are they really?

I am trying to optimize the schemas for tables by removing redundant keys. Both the percona toolkit and common_schema tell me that the following key is redundant:
mysql> SELECT redundant_index_name, sql_drop_index FROM redundant_keys;
+----------------------+-------------------------------------------------------------------------------+
| redundant_index_name | sql_drop_index |
+----------------------+-------------------------------------------------------------------------------+
| deviceName | ALTER TABLE `reporting`.`tbCardData` DROP INDEX `deviceName` |
+----------------------+-------------------------------------------------------------------------------+
1 rows in set (0.18 sec)
mysql> show create table `reporting`.`tbCardData`;
CREATE TABLE `tbCardData` (
`pkCardDataId` bigint(12) unsigned NOT NULL AUTO_INCREMENT,
`deviceName` varchar(64) DEFAULT NULL,
`shelfId` smallint(3) unsigned DEFAULT NULL,
`cardId` smallint(3) unsigned DEFAULT NULL,
`cardName` varchar(64) DEFAULT NULL,
`cardType` smallint(3) unsigned DEFAULT NULL,
`cardSubType` smallint(3) unsigned DEFAULT NULL,
`cardSpareGroupId` smallint(3) unsigned DEFAULT NULL,
`cardSerialNum` varchar(64) DEFAULT NULL,
`cardCarrierSerialNum` varchar(64) DEFAULT NULL,
`dom` tinyint(2) unsigned NOT NULL DEFAULT '0',
`updateTime` int(11) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`pkCardDataId`),
UNIQUE KEY `devchascarddom` (`deviceName`,`shelfId`,`cardId`,`dom`),
KEY `deviceName` (`deviceName`),
KEY `dom` (`dom`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I understand that the deviceName key and the unique key devchascarddom share the leftmost attribute, deviceName, but it would seem to me that each unique key value occurs once, whereas there are several rows per deviceName in the list. I guess what I am saying is that dropping the key deviceName doesn't seem to make sense to me here, but I am no MySQL guru. Should I drop it, or is this just a report from these tools that I'll have to disregard?
MySQL can use the first part of the compound index devchascarddom in the same way it can use deviceName. These tools are telling you the truth. The deviceName index will be smaller, and if you can get rid of devchascarddom instead, that would be better. You'll have to look at the EXPLAIN output for your queries to see if that's possible.
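One way to check this on your own data is to compare plans before and after dropping the single-column index. The literal `'device-01'` is a made-up value; substitute a real device name and your real queries.

```sql
-- Before: note which index the optimizer reports in the "key" column.
EXPLAIN SELECT pkCardDataId
FROM `reporting`.`tbCardData`
WHERE deviceName = 'device-01';

-- Drop the index the tools flagged as redundant.
ALTER TABLE `reporting`.`tbCardData` DROP INDEX `deviceName`;

-- After: the same lookup should be able to use the leftmost column
-- of the compound index devchascarddom.
EXPLAIN SELECT pkCardDataId
FROM `reporting`.`tbCardData`
WHERE deviceName = 'device-01';
```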
It's saying if it needed an index on devicename, it would use the compound unique key devchascarddom to get it, presumably because that index would itself be indexed by devicename.
Mind you I've no idea whether that is true, but it would make more sense than replicating each member.
e.g.
Device1
  Shelf1
    Card1
      dom1
      dom2
etc.
Now if you had an index on Shelf it wouldn't say it was redundant.

How do fields not selected for in a MySQL query affect query speed for the fields I am selecting on?

This is a theoretical question based on an application I have. I am wondering if there is some technical insight to be gained beyond just speed tests on my system.
I have the following two tables:
CREATE TABLE `files` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`url` varchar(255) NOT NULL DEFAULT '',
`processed` tinyint(1) unsigned NOT NULL DEFAULT '0',
`last_processed` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `url` (`url`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
and...
CREATE TABLE `file_metas` (
`file_id` int(10) unsigned NOT NULL,
`title` varchar(255) NOT NULL DEFAULT '',
`description` varchar(1000) NOT NULL DEFAULT '',
`keywords` varchar(1000) NOT NULL DEFAULT '',
PRIMARY KEY (`file_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
The file_metas data is long text strings about each file from the files table. Each file only has one entry in the file_metas table so these two tables could be combined.
I'm wondering what effect adding the long text fields to the files table will have on the performance of SELECT statements on the files table when I'm not selecting title, description, or keywords. I'm curious about the technical details. Does simply having the text fields in the table slow queries not involving those fields? How does this work in general with MySQL MyISAM tables? Is there any good reason to keep the file_metas data in a separate table?