MySQL simple join using temporary table - mysql

I've got (what I thought) was a very simple query on a MySQL database, but using explain shows that the query is using a temporary table. I've tried changing the order of the select and the join but to no avail. I've reduced the tables to their simplest (to see if it was an issue with the complexity of my tables, but I still have the problem). I've tried it with two basic tables, one with a "name" field, and the other with a foreign key reference back to that table:
a:
+-------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(128) | NO | MUL | NULL | |
+-------+--------------+------+-----+---------+----------------+
b:
+-------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| a_id | int(11) | NO | MUL | NULL | |
+-------+---------+------+-----+---------+----------------+
And this is my query:
SELECT a.id, a.name FROM a JOIN b ON a.id = b.a_id ORDER BY a.name;
Which I thought was very simple... just a list of all records in a that have records in b, ordered by name. Alas explain says this:
+----+-------------+-------+--------+---------------+---------+---------+--------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+--------+------+----------------------------------------------+
| 1 | SIMPLE | b | index | a_id | a_id | 4 | NULL | 2 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 4 | b.a_id | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+--------+------+----------------------------------------------+
It looks like it should be using the key on the b table but for some reason it isn't. I get the feeling I'm missing something basic (or my RDBMS knowledge needs brushing up somewhat). Anyone any ideas why it's using a temporary table for such a simple query?

Related

How to optimize mysql query with COUNT(*) and GROUP BY

I have the original table of 4 columns, described as follows:
+----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| FieldID | varchar(10) | NO | MUL | NULL | |
| PaperID | varchar(10) | NO | | NULL | |
| RefID | varchar(10) | NO | | NULL | |
| FieldID2 | varchar(10) | NO | MUL | NULL | |
+----------+-------------+------+-----+---------+-------+
I want to run a query with COUNT(*) and GROUP BY :
select FieldID, FieldID2, count(*) from nFPRF75_1 GROUP BY FieldID, FieldID2
I've created indexes on both column FieldID and column FieldID2, however, they seem to be ineffective. I have also tried OPTIMIZE table_name and created redundant indexes on these two columns (as is indicated by other optimization questions), unfortunately it didn't work out either.
Here is what I get from EXPLAIN:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+----------+---------------------------------+
| 1 | SIMPLE | nFPRF75_1 | ALL | NULL | NULL | NULL | NULL | 90412507 | Using temporary; Using filesort |
I wonder if there's anyway that I can use indexes in this query, or any other way to optimize it. Now it's of very low efficiency since there's lots of lines.
Thanks a lot for the help!
You should create a multi-column index of (FieldID, FieldID2).
Create an index of FieldID, FieldID2 if you are grouping by them. That must improve the speed.
Also, I recommend you change count(*) to count('myIntColumn') which improve the speed too.

MySQL second abstraction join slows query markedly

When I add a second level of abstraction to a join, ie joining on a table that was join in the first place, the query time is be multiplied by 1000x
mysql> SELECT xxx.content.id, columns.stories.title, columns.stories.date
-> FROM xxx.content
-> LEFT JOIN columns.stories on xxx.content.contentable_id = columns.stories.id
-> WHERE columns.stories.title IS NOT NULL
-> AND xxx.content.contentable_type = 'PublisherStory';
yields results in 0.01 seconds
mysql> SELECT xxx.content.id, columns.photos.id as pid, columns.stories.title, columns.stories.date
-> FROM xxx.content
-> LEFT JOIN columns.stories on xxx.content.contentable_id = columns.stories.id
-> LEFT JOIN columns.photos on columns.stories.id = columns.photos.story_id
-> WHERE columns.stories.title IS NOT NULL
-> AND xxx.content.contentable_type = 'PublisherStory';
yields results in 14 seconds
this is being performed on tables with records in the 10ks to low 100ks of records
Is this normal or what could be causing such a slowdown?
first query plan:
+----+-------------+---------+--------+---------------+---------+---------+--------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+---------------+---------+---------+--------------------------------------+------+-------------+
| 1 | SIMPLE | content | ALL | NULL | NULL | NULL | NULL | 7099 | Using where |
| 1 | SIMPLE | stories | eq_ref | PRIMARY | PRIMARY | 8 | xxx.content.contentable_id | 1 | Using where |
+----+-------------+---------+--------+---------------+---------+---------+--------------------------------------+------+-------------+
Second Query
+----+-------------+---------+--------+---------------+---------+---------+--------------------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+---------------+---------+---------+--------------------------------------+-------+-------------+
| 1 | SIMPLE | content | ALL | NULL | NULL | NULL | NULL | 7099 | Using where |
| 1 | SIMPLE | stories | eq_ref | PRIMARY | PRIMARY | 8 | xxx.content.contentable_id | 1 | Using where |
| 1 | SIMPLE | photos | ALL | NULL | NULL | NULL | NULL | 21239 | |
+----+-------------+---------+--------+---------------+---------+---------+--------------------------------------+-------+-------------+
if joining columns.photos slows down so much, it could be because :
photos.story_id is not a foreign key from stories and there is no index on that column.
Not seing your table structure, I could no tell exactly but I suggest you to
verify that photos.story_id is a foreign key and if your mysql version does not
support foreign key ( pretty old) put an index on that column.
I would check indexes on contentable_type and stories.title also as they are
part of the where close.

Comparing performance of two queries in MySQL

I am trying to optimize a query on a mysql table I've created. I expect that there will be many many rows in the table. Looking at this question the accepted answer and the top voted answer suggests two different approaches.
I wrote these two queries and want to know which one is more performant.
SELECT uv.*
FROM UserVisit uv INNER JOIN
(SELECT ID,MAX(visitDate) visitDate
FROM UserVisit GROUP BY ID) last
ON (uv.ID = last.ID AND uv.visitDate = last.visitDate);
Running this with EXPLAIN yields:
+----+-------------+------------+--------+---------------+---------+---------+--------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+--------------------------------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | |
| 1 | PRIMARY | uv | eq_ref | PRIMARY | PRIMARY | 11 | last.playscanID,last.visitDate | 1 | |
| 2 | DERIVED | UserVisit | index | NULL | PRIMARY | 11 | NULL | 4 | Using index |
+----+-------------+------------+--------+---------------+---------+---------+--------------------------------+------+-------------+
3 rows in set (0.01 sec)
And the other query:
SELECT lastVisits.*
FROM ( SELECT * FROM UserVisit ORDER BY visitDate DESC ) lastVisits
GROUP BY lastVisits.ID
Running that with EXPLAIN yields:
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort |
| 2 | DERIVED | UserVisit | ALL | NULL | NULL | NULL | NULL | 4 | Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
2 rows in set (0.00 sec)
I am uncertain how to interpret the result of the two EXPLAINs.
Which of these queries can I expect to be faster and why?
EDIT:
This is the way UserVisit table looks:
+----------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+---------------------+------+-----+---------+-------+
| ID | bigint(20) unsigned | NO | PRI | NULL | |
| visitDate | date | NO | PRI | NULL | |
| visitTime | time | NO | | NULL | |
| analysisResult | decimal(3,2) | NO | | NULL | |
+----------------+---------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
Firstly, you might want to read the manual on EXPLAIN. It's a dense read, but it should provide most of the information you want.
Secondly, as Strawberry says, the second query works by accident. The behaviour may change in future versions, and your query would not return an error, just different data. That's nearly always a bad thing.
Finally, the EXPLAIN suggests that version 1 will be faster. In EXTRA, it's saying it's using an index, which is much faster than filesort. Without a schema, it's hard to be sure, but I think you will also benefit from a compound key on ID and visitdate.

MySQL index slowing down query

MySQL Server version: 5.0.95
Tables All: InnoDB
I am having an issue with a MySQL db query. Basically I am finding that if I index a particular varchar(50) field tag.name, my queries take longer (x10) than not indexing the field. I am trying to speed this query up, however my efforts seem to be counter productive.
The culprit line and field seems to be:
WHERE `t`.`name` IN ('news','home')
I have noticed that if i query the tag table directly without a join using the same criteria and with the name field indexed, i do not have the issue.. It actually works faster as expected.
EXAMPLE Query **
SELECT `a`.*, `u`.`pen_name`
FROM `tag_link` `tl`
INNER JOIN `tag` `t`
ON `t`.`tag_id` = `tl`.`tag_id`
INNER JOIN `article` `a`
ON `a`.`article_id` = `tl`.`link_id`
INNER JOIN `user` `u`
ON `a`.`user_id` = `u`.`user_id`
WHERE `t`.`name` IN ('news','home')
AND `tl`.`type` = 'article'
AND `a`.`featured` = 'featured'
GROUP BY `article_id`
LIMIT 0 , 5
EXPLAIN with index **
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------+---------+---------+-------------------+------+-----------------------------------------------------------+
| 1 | SIMPLE | t | range | PRIMARY,name | name | 152 | NULL | 4 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | tl | ref | tag_id,link_id,link_id_2 | tag_id | 4 | portal.t.tag_id | 10 | Using where |
| 1 | SIMPLE | a | eq_ref | PRIMARY,fk_article_user1 | PRIMARY | 4 | portal.tl.link_id | 1 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | portal.a.user_id | 1 | |
+----+-------------+-------+--------+--------------------------+---------+---------+-------------------+------+-----------------------------------------------------------+
EXPLAIN without index **
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
| 1 | SIMPLE | a | index | PRIMARY,fk_article_user1 | PRIMARY | 4 | NULL | 8742 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | portal.a.user_id | 1 | |
| 1 | SIMPLE | tl | ref | tag_id,link_id,link_id_2 | link_id | 4 | portal.a.article_id | 3 | Using where |
| 1 | SIMPLE | t | eq_ref | PRIMARY | PRIMARY | 4 | portal.tl.tag_id | 1 | Using where |
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
TABLE CREATE
CREATE TABLE `tag` (
`tag_id` int(11) NOT NULL auto_increment,
`name` varchar(50) NOT NULL,
`type` enum('layout','image') NOT NULL,
`create_dttm` datetime default NULL,
PRIMARY KEY (`tag_id`)
) ENGINE=InnoDB AUTO_INCREMENT=43077 DEFAULT CHARSET=utf8
INDEXS
SHOW INDEX FROM tag_link;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tag_link | 0 | PRIMARY | 1 | tag_link_id | A | 42023 | NULL | NULL | | BTREE | |
| tag_link | 1 | tag_id | 1 | tag_id | A | 10505 | NULL | NULL | | BTREE | |
| tag_link | 1 | link_id | 1 | link_id | A | 14007 | NULL | NULL | | BTREE | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
SHOW INDEX FROM article;
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| article | 0 | PRIMARY | 1 | article_id | A | 5723 | NULL | NULL | | BTREE | |
| article | 1 | fk_article_user1 | 1 | user_id | A | 1 | NULL | NULL | | BTREE | |
| article | 1 | create_dttm | 1 | create_dttm | A | 5723 | NULL | NULL | YES | BTREE | |
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Final Solution
It seems that MySQL is just sorted the data incorrectly. In the end it turned out faster to look at the tag table as a sub query returning the ids.
It seems that article_id is the primary key for the article table.
Since you're grouping by article_id, MySQL needs to return the records in order by that column, in order to perform the GROUP BY.
You can see that without the index, it scans all records in the article table, but they're at least in order by article_id, so no later sort is required. The LIMIT optimization can be applied here, since it's already in order, it can just stop after it gets five rows.
In the query with the index on tag.name, instead of scanning the entire articles table, it utilizes the index, but against the tag table, and starts there. Unfortunately, when doing this, the records must later be sorted by article.article_id in order to complete the GROUP BY clause. The LIMIT optimization can't be applied since it must return the entire result set, then order it, in order to get the first 5 rows.
In this case, MySQL just guesses wrongly.
Without the LIMIT clause, I'm guessing that using the index is faster, which is maybe what MySQL was guessing.
How big are your tables?
I noticed in the first explain you have a "Using temporary; Using filesort" which is bad. Your query is likely being dumped to disc which makes it way slower than in memory queries.
Also try to avoid using "select *" and instead query the minimum fields needed.

MySQL: How do I speed up a "Count()" query with a "JOIN" and "order_by"?

I have the following two (simplified for the sake of example) tables in my MySQL db:
DESCRIBE appname_item;
-----------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(200) | NO | | NULL | |
+-----------------+---------------+------+-----+---------+----------------+
DESCRIBE appname_favorite;
+---------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | MUL | NULL | |
| item_id | int(11) | NO | MUL | NULL | |
+---------------+----------+------+-----+---------+----------------+
I'm trying to get a list of items ordered by the number of favorites. The query below works, however there are thousands of records in the Item table, and the query is taking up to a couple of minutes to complete.
SELECT `appname_item`.`id`, `appname_item`.`name`, COUNT(`appname_favorite`.`id`) AS `num_favorites`
FROM `appname_item`
LEFT OUTER JOIN `appname_favorite` ON (`appname_item`.`id` = `appname_favorite`.`item_id`)
GROUP BY `appname_item`.`id`, `appname_item`.`name`
ORDER BY `num_favorites` DESC;
Here are the results of EXPLAIN, which provides some insight as to why the query is so slow (type "ALL", "using temporary", and "using filesort" should all be avoided if possible.)
+----+-------------+--------------------+------+-----------------------------+-----------------------------+---------+-------------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+------+-----------------------------+-----------------------------+---------+-------------------------------+------+---------------------------------+
| 1 | SIMPLE | appname_item | ALL | NULL | NULL | NULL | NULL | 574 | Using temporary; Using filesort |
| 1 | SIMPLE | appname_favorite | ref | appname_favorite_67b70d25 | appname_favorite_67b70d25 | 4 | appname.appname_item.id | 1 | |
+----+-------------+--------------------+------+-----------------------------+-----------------------------+---------+-------------------------------+------+---------------------------------+
I know that the easiest way to optimize the query is to add an Index, but I can't seem to figure out how to add an Index for a Count() query that involves a JOIN and an order_by. I should also mention that I am running this through the Django ORM, so would prefer to not change the sql query and just work on fixing and fine tuning the db to run the query in the most effective way.
I've been trying to figure this out for a while, so any help would be much appreciated!
UPDATE
Here are the indexes that are already in the db:
+--------------------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| appname_favorite | 0 | PRIMARY | 1 | id | A | 594 | NULL | NULL | | BTREE | |
| appname_favorite | 1 | appname_favorite_fbfc09f1 | 1 | user_id | A | 12 | NULL | NULL | | BTREE | |
| appname_favorite | 1 | appname_favorite_67b70d25 | 1 | item_id | A | 594 | NULL | NULL | | BTREE | |
+--------------------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Actually you can't avoid filesort because the count is determined at the calculation time and is unknown in the index. The only solution I can imagine is to create a composite index for table appname_item, which may help a little or not, depending on your particular data:
ALTER TABLE appname_item ADD UNIQUE INDEX `item_id_name` (`id` ASC, `name` ASC);
There is nothing wrong with your query - it looks good.
It could be the the optimizer has out-of-date info about the table. Try running this:
ANALYZE TABLE <tableaname>;
for all tables involved.
Firstly, for the count() function, you can check this answer to know more detail:
https://stackoverflow.com/a/2710630/1020600
For example, using MySQL, count(*) will be fast under a MyISAM table
but slow under an InnoDB. Under InnoDB you should use count(1) or
count(pk)
If your storage engines is MYISAM and if you want to count on row (i guess so), use count(*) is enough.
From your EXPLAIN, I found there's no Key for appname_item, if i try to add a condition
where `appname_item`.`id` = `appname_favorite`.`item_id`
then the "key" appears. so funny but it's work.
The final sql like this
explain SELECT `appname_item`.`id`, `appname_item`.`name`, COUNT(*) AS `num_favorites`
FROM `appname_item`
LEFT OUTER JOIN `appname_favorite` ON (`appname_item`.`id` = `appname_favorite`.`item_id`)
where `appname_item`.`id` = `appname_favorite`.`item_id`
GROUP BY `appname_item`.`id`, `appname_item`.`name`
ORDER BY `num_favorites` DESC;
+----+-------------+------------------+--------+---------------+---------+---------+-------------------------------+------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key
| key_len | ref | rows | Extra
|
+----+-------------+------------------+--------+---------------+---------+---------+-------------------------------+------+----------------------------------------------+ | 1 | SIMPLE | appname_favorite | index | item_id |
item_id | 5 | NULL | 2312 | Using
index; Using temporary; Using filesort | | 1 | SIMPLE |
appname_item | eq_ref | PRIMARY | PRIMARY | 4 |
test.appname_favorite.item_id | 1 | Using where
|
+----+-------------+------------------+--------+---------------+---------+---------+-------------------------------+------+----------------------------------------------+
On my computer, table appname_item has 1686 rows and appname_favorite has 2312 rows, old sql takes from 15 to 23ms. new sql takes 3.7 to 5.3ms