I have a very simple query that I'm trying to optimize; it takes 2 to 5 seconds to execute.
These are my CREATE TABLE statements:
CREATE TABLE `artist` (
`id` INTEGER NOT NULL AUTO_INCREMENT,
`name` VARCHAR(100) character set utf8 NOT NULL,
`bio` MEDIUMTEXT character set utf8 DEFAULT NULL,
`hits` INTEGER NOT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `album` (
`id` INTEGER NOT NULL AUTO_INCREMENT,
`artist_id` INTEGER NOT NULL,
`title` VARCHAR(100) character set utf8 NOT NULL,
`year` INTEGER,
`hits` INTEGER NOT NULL,
PRIMARY KEY (`id`),
KEY (`artist_id`)
);
CREATE TABLE `track` (
`id` INTEGER NOT NULL AUTO_INCREMENT,
`name` VARCHAR(100) character set utf8 NOT NULL,
`lyric` MEDIUMTEXT character set utf8,
`album_id` INTEGER NOT NULL,
`hits` INTEGER NOT NULL,
`date` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY (`album_id`)
);
ALTER TABLE `album` ADD FOREIGN KEY (artist_id) REFERENCES `artist` (`id`);
ALTER TABLE `track` ADD FOREIGN KEY (album_id) REFERENCES `album` (`id`);
And this is the query I'm running:
SELECT DISTINCT artist.name, track.name
FROM track
LEFT JOIN album ON track.album_id = album.id
LEFT JOIN artist ON album.artist_id = artist.id
ORDER BY track.hits DESC
LIMIT 5
EXPLAIN shows this:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE track ALL NULL NULL NULL NULL 103796 Using temporary; Using filesort
1 SIMPLE album eq_ref PRIMARY PRIMARY 4 lyrics.track.album_id 1
1 SIMPLE artist eq_ref PRIMARY PRIMARY 4 lyrics.album.artist_id 1
I'm new to MySQL, but I guess that "Using temporary; Using filesort" is bad and that's why the query is so slow. Can you give me a hint here? Thanks!
Update: The main problem here is that the very same song can be in the DB 5 times with different IDs, because the same song can be on different albums. If I don't use DISTINCT, this doesn't happen, but I must use it for this reason.
This answer isn't 100% an answer for the original question. The original question is what came up when searching using the messages from my problem though, so just in case it helps someone else, I'll leave the solution for a problem that is closely related.
The "using temporary; using filesort" was actually a red herring and the index that was added was never getting used. The index was not getting used because one of the joined tables had a different character encoding on it than the other.
Converting all tables in the query so that they all used the same character encoding fixed it instantly.
(In our case, that meant converting a utf8-encoded table to latin1.)
Hope it helps someone.
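A minimal sketch of how one might check for and fix this kind of mismatch. The schema name `mydb` and the table names `orders` and `customers` are hypothetical placeholders, not from the original problem. Converting rewrites the whole table and can lose characters the target character set cannot represent, so it is worth trying on a copy first.

```sql
-- List the character set and collation of every text column in the
-- tables that take part in the join (schema and table names are placeholders).
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME IN ('orders', 'customers')
  AND CHARACTER_SET_NAME IS NOT NULL;

-- Bring one table in line with the other (here: everything to utf8).
ALTER TABLE customers CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
```

After the conversion, rerunning EXPLAIN on the join should show whether the index is now picked up.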
You can get it to use an index by adding
create index idx_tracks_on_album_id_name_hits on track(album_id, name, hits);
And since you are doing a DISTINCT across two tables, no single index can identify the unique rows, so MySQL puts the rows into a temporary table to get rid of the duplicates.
I think that if you create an index on track.hits, you might get rid of "Using temporary; Using filesort". The likely reason for it is that MySQL cannot find an index to do the sort.
ALTER TABLE `track`
ADD KEY `idx_hits` (`hits`);
Let me know if it worked.
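If the index alone does not help, one possible rewrite is to fetch a small batch of top tracks through the index first and de-duplicate only that batch. This is a sketch under assumptions: it relies on the idx_hits index from the ALTER TABLE above, the over-fetch size of 50 is an arbitrary guess, and the aliases artist_name, track_name and top_hits are made up for illustration. It uses plain JOIN because album_id and artist_id are NOT NULL foreign keys, so every track has an album and an artist.

```sql
SELECT artist.name AS artist_name,
       t.name      AS track_name,
       MAX(t.hits) AS top_hits
FROM (
    -- Read only the 50 most-hit track rows; with an index on hits this
    -- can be served in index order instead of sorting the whole table.
    -- 50 is a guess: it must be large enough that 5 distinct songs remain
    -- after duplicates of the same song on different albums are merged.
    SELECT name, album_id, hits
    FROM track
    ORDER BY hits DESC
    LIMIT 50
) AS t
JOIN album  ON t.album_id = album.id
JOIN artist ON album.artist_id = artist.id
-- GROUP BY replaces DISTINCT so the sort key (top_hits) is well defined.
GROUP BY artist.name, t.name
ORDER BY top_hits DESC
LIMIT 5;
```

The temporary table is still there, but it now holds at most 50 rows rather than all 100k.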
Why do you use DISTINCT? Why do you use LEFT JOIN (instead of JOIN)?
The problem:
I have one table of approx. 5,000 rows called imported_cities.
I have one table of approx. 800,000 rows called postalcodes, containing the cities for each postal code.
I need to validate each distinct city from imported_cities against the cities in the postal codes table, based on the city name and its province. See the table structures below.
If they match exactly (yes, exactly; the rest of the cities are validated manually), I have to update a column on imported_cities and enter both the city from imported_cities and the city from postalcodes (side by side) into a third table called imported_cities_equiv.
What I have tried:
I added indexes to the tables and ran the query below. It takes forever... :(
explain SELECT DISTINCT ic.destinationCity, pc.city FROM (imported_cities ic, postalcodes pc)
WHERE LOWER(ic.destinationCity) = LOWER(pc.city)
The result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ip index NULL company_city 478 NULL 4221 Using index; Using temporary
1 SIMPLE pc index NULL city_prov 160 NULL 765407 Using where; Using index; Using join buffer (Block...
--
-- Table structure for table postalcodes
CREATE TABLE IF NOT EXISTS `postalcodes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`code` varchar(11) NOT NULL,
`city` varchar(50) NOT NULL,
`province` varchar(50) NOT NULL,
`provinceISO` varchar(2) NOT NULL,
`latitude` decimal(17,13) NOT NULL,
`longitude` decimal(17,13) NOT NULL,
PRIMARY KEY (`id`),
KEY `code` (`code`),
KEY `city_prov` (`city`,`provinceISO`)
);
--
-- Table structure for table imported_cities
CREATE TABLE IF NOT EXISTS `imported_cities` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`companyName` varchar(30) CHARACTER SET utf8 NOT NULL,
`destinationCity` varchar(128) CHARACTER SET utf8 NOT NULL,
`destinationProvince` varchar(20) CHARACTER SET utf8 NOT NULL,
`equivCity` varchar(128) CHARACTER SET utf8 DEFAULT NULL,
`minAmount` decimal(6,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `company_city` (`companyName`,`destinationCity`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=7933 ;
--
-- Table structure for table imported_cities_equiv
CREATE TABLE IF NOT EXISTS `imported_cities_equiv` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`imported_city` varchar(128) CHARACTER SET utf8 NOT NULL,
`pc_city` varchar(128) CHARACTER SET utf8 NOT NULL,
`province` varchar(20) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=149 ;
Any help or suggestion is appreciated. Thank you.
The query you want to get your information is:
SELECT ip.*, (pc.city IS NOT NULL) AS exact_match
FROM imported_cities ip LEFT JOIN
postalcodes pc
ON LOWER(ip.destinationCity) = LOWER(pc.city) AND
LOWER(ip.destinationProvince) = LOWER(pc.province);
However, this will have really bad performance. Getting rid of the lower() would help:
SELECT ip.*, (pc.city IS NOT NULL) AS exact_match
FROM imported_cities ip LEFT JOIN
postalcodes pc
ON ip.destinationCity = pc.city AND
ip.destinationProvince = pc.province;
Because then you can add an index on postalcodes(city, province).
If you cannot remove lower(), then alter the table to add new columns and put the lower-case values in those columns. Then build an index on the new columns and use them in the join.
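The extra-column approach could look roughly like this. The column names city_lower and province_lower and the index name city_prov_lower are made up for illustration, and the new columns have to be kept in sync whenever rows are inserted or updated.

```sql
-- Add pre-lowercased copies of the join columns (names are hypothetical).
ALTER TABLE postalcodes
    ADD COLUMN city_lower     VARCHAR(50) NOT NULL DEFAULT '',
    ADD COLUMN province_lower VARCHAR(50) NOT NULL DEFAULT '';

-- Fill them once from the existing data.
UPDATE postalcodes
SET city_lower     = LOWER(city),
    province_lower = LOWER(province);

-- Index the new columns so the join can do index lookups.
CREATE INDEX city_prov_lower ON postalcodes (city_lower, province_lower);

-- LOWER() now only wraps the small driving table's columns;
-- the indexed postalcodes columns are compared as-is.
SELECT ic.*, (pc.city_lower IS NOT NULL) AS exact_match
FROM imported_cities ic
LEFT JOIN postalcodes pc
       ON pc.city_lower     = LOWER(ic.destinationCity)
      AND pc.province_lower = LOWER(ic.destinationProvince);
```

If the two tables use different character sets, the index may still be skipped, so it is worth confirming with EXPLAIN.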
Thank you all for pointing me in the right direction.
Some changes have been made following your advice:
added indexes on the imported_cities table on the destinationCity and destinationProvince columns
added indexes on the postalcodes table on the city and provinceISO columns
the JOIN clause applies upper() on one side only, since ic.destinationCity is already in uppercase
limited the query by province in the WHERE clause for performance
The final SQL is:
SELECT DISTINCT pc.city, pc.provinceISO
FROM postalcodes pc
LEFT JOIN imported_cities ic
ON upper(pc.city) = ic.destinationCity AND
pc.provinceISO = ic.destinationProvince
WHERE ic.destinationProvince = 'QC';
And the EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE pc ref province province 8 const 278115 Using index condition; Using temporary
1 SIMPLE ip ref destinationCity,destinationProvince destinationCity 386 func 1 Using index condition; Using where; Distinct
Going forward, I can now build the INSERT query in PHP and run a single INSERT to put all equivalent cities into the third table. Thank you all.
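As an alternative to assembling the INSERT in PHP, the matches could be copied in one statement with INSERT ... SELECT. This is only a sketch: it assumes destinationProvince is the value wanted in the province column, and it uses an inner JOIN because only matching pairs are of interest.

```sql
-- Insert every exact city match for one province directly into the
-- equivalence table, without round-tripping the rows through PHP.
INSERT INTO imported_cities_equiv (imported_city, pc_city, province)
SELECT DISTINCT ic.destinationCity, pc.city, ic.destinationProvince
FROM imported_cities ic
JOIN postalcodes pc
  ON UPPER(pc.city) = ic.destinationCity
 AND pc.provinceISO = ic.destinationProvince
WHERE ic.destinationProvince = 'QC';
```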
Sorry for the long post, but this is really strange and I am close to giving up. Two tables:
CREATE TABLE `endu_results` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`base_name` varchar(200) NOT NULL,
`base_nr` int(11) DEFAULT NULL,
`base_yob` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `endu_results_206a6355` (`base_name`),
KEY `endu_results_63df4402` (`base_nr`),
KEY `base_yob` (`base_yob`)
) ENGINE=InnoDB AUTO_INCREMENT=3424028 DEFAULT CHARSET=utf8;
and 2nd:
CREATE TABLE `endu_resultinterest` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`result_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `endu_resultinterest_3b529087` (`result_id`),
CONSTRAINT `result_id_refs_id_19e24435` FOREIGN KEY (`result_id`) REFERENCES `endu_results` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=48590 DEFAULT CHARSET=utf8;
There are about 2 million records in the endu_results table and fewer than 100K in endu_resultinterest. I have a slow query:
explain select base_yob from endu_resultinterest
inner join endu_results
on (endu_results.id = endu_resultinterest.result_id)
order by endu_results.base_yob;
1 SIMPLE endu_resultinterest index endu_resultinterest_3b529087 endu_resultinterest_3b529087 4 NULL 47559 Using index; Using temporary; Using filesort
The question is: why is MySQL using the index endu_resultinterest_3b529087 when it should use base_yob, the column the sort is requested on?
To test it further, I manually created two additional identical tables, endu_testresults and endu_testresultinterest, and filled them with some records:
CREATE TABLE `endu_testresults` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`base_yob` int(11) DEFAULT NULL,
`base_name` varchar(200) NOT NULL,
`base_nr` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `endu_testresults_a65b2616` (`base_yob`),
KEY `endu_testresults_ba0ab39c` (`base_name`),
KEY `endu_testresults_d75ba04d` (`base_nr`)
) ENGINE=InnoDB AUTO_INCREMENT=20 DEFAULT CHARSET=utf8;
So I go again for explain:
explain select base_yob from endu_testresultinterest
inner join endu_testresults
on (endu_testresults.id = endu_testresultinterest.result_id)
order by endu_testresults.base_yob;
and surprise, surprise:
1 SIMPLE endu_testresults index PRIMARY endu_testresults_a65b2616 5 NULL 19 Using index
The index on the sort column base_yob (endu_testresults_a65b2616) is now used.
Why is the index used in one case, while in the other I get "Using temporary; Using filesort"? Does size matter? I will try to copy records from one table to the other, but I don't get what is going on with the indexes. MySQL is 5.6.16.
Short answer: Because it is faster.
Long answer...
Your EXPLAINs seem to be incomplete -- I would expect 2 lines in each.
The first table is 20 (70?) times as big as the second. The optimizer picked the smaller table to start with. Hence it is initially doing 1/20th the amount of work. The sort that comes later (ORDER BY ...) is much less work than if it had to do 20 times as much work to start with.
The output is only 48K rows, correct? And that is how many rows in the 2nd table, correct?
Your test tables did not have the same bigger/smaller ratio, did they? Hence the different EXPLAIN.
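One way to check this yourself is to compare the optimizer's own plan with one where the join order is forced. A sketch, assuming the table and index names from the question; the aliases r and i are made up. STRAIGHT_JOIN makes MySQL read the tables in the order they are listed, and FORCE INDEX asks it to use the base_yob index.

```sql
-- Plan the optimizer picks on its own: starts from the small table,
-- then sorts the roughly 48K joined rows.
EXPLAIN
SELECT r.base_yob
FROM endu_resultinterest i
JOIN endu_results r ON r.id = i.result_id
ORDER BY r.base_yob;

-- Forced plan: walk the big table in base_yob order and probe the small one.
-- This avoids the sort but touches far more rows.
EXPLAIN
SELECT STRAIGHT_JOIN r.base_yob
FROM endu_results r FORCE INDEX (base_yob)
JOIN endu_resultinterest i ON i.result_id = r.id
ORDER BY r.base_yob;
```

Timing both variants on the real data should show which plan is actually cheaper.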
Someone helped me come up with this query, but it's still too slow. The ORDER BY is slowing it down, and I don't think it's using my index.
I'm hoping someone can fix it for me :D Yes, I read the manual page, but I can't understand it.
Query:
EXPLAIN SELECT u.id, u.url, u.title, u.numsaves
FROM urls u
JOIN tags t ON t.url_id = u.id
AND t.tag = 'osx'
ORDER BY u.numsaves DESC
LIMIT 20
Showing rows 20 - 19 ( 20 total, Query took 1.5395 sec) [numsaves: 6130 - 2107]
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t ref tag_id tag_id 767 const 49432 Using where; Using index; Using temporary; Using filesort
1 SIMPLE u eq_ref PRIMARY,id_numsaves_IX PRIMARY 4 jcooper_whatrethebest_urls.t.url_id 1
Database:
CREATE TABLE `urls` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`url` text NOT NULL,
`domain` text,
`title` text NOT NULL,
`description` text,
`numsaves` int(11) NOT NULL,
`firstsaved` varchar(256) DEFAULT NULL,
`md5` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
UNIQUE KEY `md5` (`md5`),
KEY `id_numsaves_IX` (`id`,`numsaves`)
) ENGINE=InnoDB AUTO_INCREMENT=2958560 DEFAULT CHARSET=utf8
CREATE TABLE `tags` (
`url_id` int(11) DEFAULT NULL,
`hash` varchar(255) NOT NULL,
`tag` varchar(255) NOT NULL,
UNIQUE KEY `tag_id` (`tag`,`url_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
I think the main problem with your query is your choice of indexes.
1) tags has a compound UNIQUE KEY on tag and url_id but no PRIMARY KEY.
If nothing else, you should make it primary - this may help a bit with performance. Also, you might want to take a close look if VARCHAR(255) is really necessary for your tags. It makes the index quite big.
2) add a separate index on numsaves since you're ordering by that. The compound index on id and numsaves is not going to help here.
3) EXPLAIN says that you have 49432 rows in tags that match "osx". This is quite redundant. You may want to split your tags table into two, one containing the text while the other contains the N:M link to urls.
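Points 2 and 3 might be sketched as follows. The names numsaves_IX, tag_names and url_tags are invented for illustration, and VARCHAR(64) is a guess at a sensible tag length. The query still has to sort the rows matching the tag, but it reads a much narrower integer index to find them.

```sql
-- (2) Separate index on the sort column.
ALTER TABLE urls ADD INDEX numsaves_IX (numsaves);

-- (3) Store each tag text once...
CREATE TABLE tag_names (
    id  INT NOT NULL AUTO_INCREMENT,
    tag VARCHAR(64) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY tag (tag)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ...and keep the N:M link to urls as two integers with a real primary key.
CREATE TABLE url_tags (
    tag_id INT NOT NULL,
    url_id INT NOT NULL,
    PRIMARY KEY (tag_id, url_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- The original query against the split tables.
SELECT u.id, u.url, u.title, u.numsaves
FROM tag_names tn
JOIN url_tags ut ON ut.tag_id = tn.id
JOIN urls u      ON u.id = ut.url_id
WHERE tn.tag = 'osx'
ORDER BY u.numsaves DESC
LIMIT 20;
```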
Having some real issues with a few queries, this one in particular. Info below.
tgmp_games, about 20k rows
CREATE TABLE IF NOT EXISTS `tgmp_games` (
`g_id` int(8) NOT NULL AUTO_INCREMENT,
`site_id` int(6) NOT NULL,
`g_name` varchar(255) NOT NULL,
`g_link` varchar(255) NOT NULL,
`g_url` varchar(255) NOT NULL,
`g_platforms` varchar(128) NOT NULL,
`g_added` datetime NOT NULL,
`g_cover` varchar(255) NOT NULL,
`g_impressions` int(8) NOT NULL,
PRIMARY KEY (`g_id`),
KEY `g_platforms` (`g_platforms`),
KEY `site_id` (`site_id`),
KEY `g_link` (`g_link`),
KEY `g_release` (`g_release`),
KEY `g_genre` (`g_genre`),
KEY `g_name` (`g_name`),
KEY `g_impressions` (`g_impressions`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
tgmp_reviews - about 200k rows
CREATE TABLE IF NOT EXISTS `tgmp_reviews` (
`r_id` int(8) NOT NULL AUTO_INCREMENT,
`site_id` int(6) NOT NULL,
`r_source` varchar(128) NOT NULL,
`r_date` date NOT NULL,
`r_score` int(3) NOT NULL,
`r_copy` text NOT NULL,
`r_link` text NOT NULL,
`r_int_link` text NOT NULL,
`r_parent` int(8) NOT NULL,
`r_platform` varchar(12) NOT NULL,
`r_impressions` int(8) NOT NULL,
PRIMARY KEY (`r_id`),
KEY `site_id` (`site_id`),
KEY `r_parent` (`r_parent`),
KEY `r_platform` (`r_platform`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
Here is the query; it takes about 3 seconds:
SELECT * FROM tgmp_games g
RIGHT JOIN tgmp_reviews r ON g_id = r.r_parent
WHERE g.site_id = '34'
GROUP BY g_name
ORDER BY g_impressions DESC LIMIT 15
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE r ALL r_parent NULL NULL NULL 201133 Using temporary; Using filesort
1 SIMPLE g eq_ref PRIMARY,site_id PRIMARY 4 engine_comp.r.r_parent 1 Using where
I am just trying to grab the 15 most viewed games, then grab a single review for each game (it doesn't really matter which one; I guess the highest rated, by r_score, would be ideal).
Can someone help me figure out why this is so horribly inefficient?
I don't understand the purpose of the GROUP BY g_name in your query, but it makes MySQL perform aggregation over the selected columns, which here means all columns from both tables. Please try removing it and check whether that helps.
Also, RIGHT JOIN makes the database query tgmp_reviews first, which is not what you want. I suppose LEFT JOIN is a better choice here. Please try changing the join type.
If neither of these options helps, you need to redesign your query. Since you need the 15 most viewed games for the site, the query will be:
SELECT g_id
FROM tgmp_games g
WHERE site_id = 34
ORDER BY g_impressions DESC
LIMIT 15;
This is the very first part that should be executed by the database, as it provides the best selectivity. Then you can get the desired reviews for the games:
SELECT r_parent, max(r_score)
FROM tgmp_reviews r
WHERE r_parent IN (/*1st query*/)
GROUP BY r_parent;
Such a construct will force the database to execute the first query first (sorry for the tautology) and will give you the maximum score for each of the wanted games. I hope you will be able to use the results for your purpose.
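The two steps can also be combined by joining against the first query as a derived table, which sidesteps the restriction in many MySQL versions on LIMIT inside an IN subquery. A sketch, with best_score as a made-up alias:

```sql
SELECT g.g_id, g.g_name, g.g_impressions,
       MAX(r.r_score) AS best_score
FROM (
    -- Step 1: the 15 most viewed games for the site.
    SELECT g_id, g_name, g_impressions
    FROM tgmp_games
    WHERE site_id = 34
    ORDER BY g_impressions DESC
    LIMIT 15
) AS g
-- Step 2: only the reviews of those 15 games are read, via the r_parent index.
-- LEFT JOIN keeps games that have no review yet (best_score is NULL for them).
LEFT JOIN tgmp_reviews r ON r.r_parent = g.g_id
GROUP BY g.g_id, g.g_name, g.g_impressions
ORDER BY g.g_impressions DESC;
```

This returns the highest score per game rather than a full review row; fetching the matching review would need one more join on r_parent and r_score.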
Your MyISAM table is small, so you can try converting it to InnoDB to see if that resolves the issue. Do you have a reason for using MyISAM instead of InnoDB for that table?
You can also try running an analyze on each table to update the statistics to see if the optimizer chooses something different.
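Both suggestions are one-liners. As a sketch (run on a copy or during a quiet period, since the engine conversion rebuilds the table):

```sql
-- Convert the small MyISAM table so both tables use the same engine.
ALTER TABLE tgmp_games ENGINE=InnoDB;

-- Refresh the index statistics the optimizer bases its plan on.
ANALYZE TABLE tgmp_games, tgmp_reviews;
```

Rerun EXPLAIN afterwards to see whether the plan changed.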
This query:
explain
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation`
ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1'
correctly uses the Donation.order_line_id and Lineitem.id indexes, shown in this EXPLAIN output:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Lineitem ref session_id session_id 97 const 1 Using where; Using index
1 SIMPLE Donation ref order_line_id order_line_id 4 Lineitem.id 2 Using index
However, this query, which simply includes another field:
explain
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`npo_id`,
`Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation`
ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1'
Shows that the Donation table does not use an index:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Lineitem ref session_id session_id 97 const 1 Using where; Using index
1 SIMPLE Donation ALL order_line_id NULL NULL NULL 3
All of the _id fields in the tables are indexed, but I can't figure out how adding this field into the list of selected fields causes the index to be dropped.
As requested by James C, here are the table definitions:
CREATE TABLE `donations` (
`id` int(10) unsigned NOT NULL auto_increment,
`npo_id` int(10) unsigned NOT NULL,
`order_line_detail_id` int(10) unsigned NOT NULL default '0',
`order_line_id` int(10) unsigned NOT NULL default '0',
`created` datetime default NULL,
`modified` datetime default NULL,
PRIMARY KEY (`id`),
KEY `npo_id` (`npo_id`),
KEY `order_line_id` (`order_line_id`),
KEY `order_line_detail_id` (`order_line_detail_id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8
CREATE TABLE `order_line` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`order_id` bigint(20) NOT NULL,
`npo_id` bigint(20) NOT NULL default '0',
`session_id` varchar(32) collate utf8_unicode_ci default NULL,
`created` datetime default NULL,
PRIMARY KEY (`id`),
KEY `order_id` (`order_id`),
KEY `npo_id` (`npo_id`),
KEY `session_id` (`session_id`)
) ENGINE=InnoDB AUTO_INCREMENT=23 DEFAULT CHARSET=utf8
I also did some reading about cardinality, and it looks like both the Donations.npo_id and Donations.order_line_id have a cardinality of 2. Hopefully this suggests something useful?
I'm thinking that a USE INDEX might solve the problem, but I'm using an ORM that makes this a bit tricky, and I don't understand why it wouldn't grab the correct index when the JOIN specifically names indexed fields?!?
Thanks for your brainpower!
The first EXPLAIN has "Using index" at the end. This means that MySQL was able to find the rows and return the result for the query by just looking at the index, without having to fetch or analyse any row data.
In the second query you add a column that is not part of the index used for the join. This means that MySQL has to look at the table data. I'm not sure why the optimiser chose to do a table scan, but I think it's likely that when the table is fairly small, it's easier to just read everything than to pick out individual rows.
edit: I think adding the following indexes will improve things even more and let all of the join use indexes only:
ALTER TABLE order_line ADD INDEX(session_id, id);
ALTER TABLE donations ADD INDEX(order_line_id, npo_id, id);
This will allow order_line to find the rows using session_id and then return id, and also allow donations to join on order_line_id and then return the other two columns.
Looking at the auto_increment values, I assume that there's not much data in there. It's worth noting that the amount of data in the tables will have an effect on the query plan, and it's good practice to put some sample data in there to test things out. For more detail, have a look at this blog post I wrote some time back: http://webmonkeyuk.wordpress.com/2010/09/27/what-makes-a-good-mysql-index-part-2-cardinality/
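To see what the optimizer is working with, the index statistics can be inspected directly, and the join index can be forced to compare plans. A sketch using the table and index names from the question; whether the forced plan is actually faster on such a small table is not a given.

```sql
-- The Cardinality column shows the estimated number of distinct values per index.
SHOW INDEX FROM donations;
SHOW INDEX FROM order_line;

-- Same query as in the question, but asking MySQL to use the join index
-- instead of scanning the donations table.
EXPLAIN
SELECT Lineitem.id, Donation.id, Donation.npo_id, Donation.order_line_id
FROM order_line AS Lineitem
LEFT JOIN donations AS Donation FORCE INDEX (order_line_id)
       ON Donation.order_line_id = Lineitem.id
WHERE Lineitem.session_id = '1';
```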