I have two tables:
localities:
CREATE TABLE `localities` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
`type` varchar(30) NOT NULL,
`parent_id` int(11) DEFAULT NULL,
`lft` int(11) DEFAULT NULL,
`rgt` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_localities_on_parent_id_and_type` (`parent_id`,`type`),
KEY `index_localities_on_name` (`name`),
KEY `index_localities_on_lft_and_rgt` (`lft`,`rgt`)
) ENGINE=InnoDB;
locatings:
CREATE TABLE `locatings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`localizable_id` int(11) DEFAULT NULL,
`localizable_type` varchar(255) DEFAULT NULL,
`locality_id` int(11) NOT NULL,
`category` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_locatings_on_locality_id` (`locality_id`),
KEY `localizable_and_category_index` (`localizable_type`,`localizable_id`,`category`),
KEY `index_locatings_on_category` (`category`)
) ENGINE=InnoDB;
The localities table is implemented as a nested set.
Now, when a user belongs to some locality (through some locating), he also belongs to all of its ancestors (higher-level localities). I need a query that selects, into a view, all the localities every user belongs to.
Here is my try:
select distinct lca.*, lt.localizable_type, lt.localizable_id
from locatings lt
join localities lc on lc.id = lt.locality_id
left join localities lca on (lca.lft <= lc.lft and lca.rgt >= lc.rgt)
The problem here is that it takes way too much time to execute.
I consulted EXPLAIN:
+----+-------------+-------+--------+---------------------------------+---------+---------+----------------------------------+-------+----------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+---------------------------------+---------+---------+----------------------------------+-------+----------+-----------------+
| 1 | SIMPLE | lt | ALL | index_locatings_on_locality_id | NULL | NULL | NULL | 4926 | 100.00 | Using temporary |
| 1 | SIMPLE | lc | eq_ref | PRIMARY | PRIMARY | 4 | bzzik_development.lt.locality_id | 1 | 100.00 | |
| 1 | SIMPLE | lca | ALL | index_localities_on_lft_and_rgt | NULL | NULL | NULL | 11439 | 100.00 | |
+----+-------------+-------+--------+---------------------------------+---------+---------+----------------------------------+-------+----------+-----------------+
3 rows in set, 1 warning (0.00 sec)
The last join obviously doesn’t use the lft, rgt index as I expect it to. I’m desperate.
UPDATE:
After adding a condition as @cairnz suggested, the query still takes too much time to process.
UPDATE 2: Selecting column names instead of the asterisk.
Updated query:
SELECT DISTINCT lca.id, lt.`localizable_id`, lt.`localizable_type`
FROM locatings lt FORCE INDEX(index_locatings_on_category)
JOIN localities lc
ON lc.id = lt.locality_id
INNER JOIN localities lca
ON lca.lft <= lc.lft AND lca.rgt >= lc.rgt
WHERE lt.`category` != "Unknown";
Updated EXPLAIN:
+----+-------------+-------+--------+-----------------------------------------+-----------------------------+---------+---------------------------------+-------+----------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+-----------------------------------------+-----------------------------+---------+---------------------------------+-------+----------+-------------------------------------------------+
| 1 | SIMPLE | lt | range | index_locatings_on_category | index_locatings_on_category | 153 | NULL | 2545 | 100.00 | Using where; Using temporary |
| 1 | SIMPLE | lc | eq_ref | PRIMARY,index_localities_on_lft_and_rgt | PRIMARY | 4 | bzzik_production.lt.locality_id | 1 | 100.00 | |
| 1 | SIMPLE | lca | ALL | index_localities_on_lft_and_rgt | NULL | NULL | NULL | 11570 | 100.00 | Range checked for each record (index map: 0x10) |
+----+-------------+-------+--------+-----------------------------------------+-----------------------------+---------+---------------------------------+-------+----------+-------------------------------------------------+
Any help appreciated.
Ah, it just occurred to me.
Since you are asking for everything in the table, MySQL decides to use a full table scan instead, as it deems that more efficient.
In order to get some key usage, add some filters so that you are not looking for every row in all the tables anyway.
Updated answer:
Your second query does not make sense. You are left joining to lca, yet you have a filter on it; this negates the left join by itself. You are also looking for your data in the last step of the query, meaning you will have to look through all of lt, lc and lca in order to find it. And you have no index with left-most column 'type' on locatings, so you still need a full table scan to find your data.
If you had some sample data and example of what you are trying to achieve it would perhaps be easier to help.
Try experimenting with forcing an index (http://dev.mysql.com/doc/refman/5.1/en/index-hints.html); maybe it's just an optimizer issue.
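For example, a minimal sketch of forcing the lft/rgt index on the ancestor join (only an experiment, using the index name from the schema above):
SELECT DISTINCT lca.id, lt.localizable_id, lt.localizable_type
FROM locatings lt
JOIN localities lc ON lc.id = lt.locality_id
JOIN localities lca FORCE INDEX (index_localities_on_lft_and_rgt)
  ON lca.lft <= lc.lft AND lca.rgt >= lc.rgt;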
It looks like you want the parents of the single result.
According to the person credited with defining Nested Sets in SQL, Joe Celko at http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html "This model is a natural way to show a parts explosion, because a final assembly is made of physically nested assemblies that break down into separate parts."
In other words, Nested Sets are used to filter children efficiently down to an arbitrary number of independent levels within a single collection. You have two tables, but I don't see why the properties of the set "locatings" couldn't be de-normalized into "localities".
If the localities table had a geometry column, could I not find the one locality from a "locating" and then select from the one table using a single filter: parent.lft <= row.lft AND parent.rgt >= row.rgt?
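For illustration, a minimal self-join sketch of that single-table filter (the locality id is a placeholder for the one resolved from a locating):
SELECT ancestor.*
FROM localities node
JOIN localities ancestor
  ON ancestor.lft <= node.lft AND ancestor.rgt >= node.rgt
WHERE node.id = 123;  -- placeholder locality id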
UPDATED
In this answer https://stackoverflow.com/a/1743952/3018894, there is an example from http://explainextended.com/2009/09/29/adjacency-list-vs-nested-sets-mysql/ where the following example gets all the ancestors to an arbitrary depth of 100000:
SELECT hp.id, hp.parent, hp.lft, hp.rgt, hp.data
FROM (
    SELECT @r AS _id,
           @level := @level + 1 AS level,
           (
           SELECT @r := NULLIF(parent, 0)
           FROM t_hierarchy hn
           WHERE id = _id
           )
    FROM (
         SELECT @r := 1000000,
                @level := 0
         ) vars,
         t_hierarchy hc
    WHERE @r IS NOT NULL
    ) hc
JOIN t_hierarchy hp
ON hp.id = hc._id
ORDER BY
    level DESC
Related
I have a MySQL table where all status changes are recorded. I want to be able to query the status of all items on a specific date, or the last date for all items. The table I have now is:
CREATE TABLE `tra_rel_sta` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`tra_id` int(11) DEFAULT NULL,
`sta_id` int(11) DEFAULT NULL,
`changed_on` datetime DEFAULT NULL,
`changed_by` int(11) DEFAULT NULL,
`comments` text,
PRIMARY KEY (`id`),
KEY `tra_id` (`tra_id`),
KEY `rel` (`tra_id`,`sta_id`,`changed_on`),
KEY `sta_id` (`sta_id`),
KEY `changed_on` (`changed_on`),
KEY `tra_changed` (`tra_id`,`changed_on`)
) ENGINE=InnoDB AUTO_INCREMENT=51734 DEFAULT CHARSET=utf8;
(I know I'm probably overdoing the indexes, but I haven't exactly figured out how to optimize indexes yet).
The query I'm using now, which works is:
SELECT rel.changed_on, rel.changed_by, rel.tra_id, sta.id AS sta_id, sta.status, sta.description, sta.onHold, sta.awaitingApproval, sta.approved, sta.complete, sta.locked
FROM (
SELECT tra_id, MAX(changed_on) AS lst
FROM tra_rel_sta
GROUP BY tra_id
) AS rec
LEFT JOIN tra_rel_sta AS rel ON rel.changed_on = rec.lst AND rel.tra_id = rec.tra_id
LEFT JOIN tra_status AS sta ON sta.id = rel.sta_id
If I want to use a specific date, I insert a WHERE statement in the sub-query.
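For illustration, a sketch of where that WHERE clause would sit (the cutoff date is a placeholder):
SELECT tra_id, MAX(changed_on) AS lst
FROM tra_rel_sta
WHERE changed_on <= '2015-06-30 23:59:59'  -- placeholder cutoff date
GROUP BY tra_id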
This works, but it takes about 0.65 seconds to run in PHP with about 51,733 records in the table. This query is used as a sub-query in several others when I need to know the last status of an object, and as a result it is slowing down many applications.
I've tried to use a sub query in the WHERE statement as described in MySQL: how to select record with latest date before a certain date but it takes almost twice as long. I've tried using a JOIN statement as described in MySQL select of record with latest date but I'm getting about the same or just slightly slower results.
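For reference, a sketch of the self-join (anti-join) variant from the linked approach, assuming changed_on is unique per tra_id:
SELECT rel.*
FROM tra_rel_sta AS rel
LEFT JOIN tra_rel_sta AS newer
  ON newer.tra_id = rel.tra_id
 AND newer.changed_on > rel.changed_on
WHERE newer.id IS NULL;  -- keeps only the latest row per tra_id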
How can I optimize this query or fix my indexes to make this more effective?
Thanks!!
As requested, EXPLAIN of query:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
---|-------------|-------------|--------|-----------------------------------|---------|---------|-------------------|-------|-------------
1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 49931 | NULL
1 | PRIMARY | rel | ref | tra_id,rel,changed_on,tra_changed | tra_id | 5 | rec.tra_id | 1 | Using where
1 | PRIMARY | sta | eq_ref | PRIMARY | PRIMARY | 4 | csinfo.rel.sta_id | 1 | NULL
2 | DERIVED | tra_rel_sta | index | tra_id,rel,tra_changed | tra_id | 5 | NULL | 49931 | NULL
I've got this mysql query:
SELECT DISTINCT post.postId,hash,previewUrl,lastRetrieved
FROM post INNER JOIN (tag as t1,taggedBy as tb1,tag as t2,taggedBy as tb2,tag as t3,taggedBy as tb3)
ON post.id=tb1.postId AND tb1.tagId=t1.id AND post.id=tb2.postId AND tb2.tagId=t2.id AND post.id=tb3.postId AND tb3.tagId=t3.id
WHERE ((t1.name="a" AND t2.name="b") OR t3.name="c")
ORDER BY post.postId DESC LIMIT 0,100;
it takes around 15 seconds to run that query, whereas the same query without DISTINCT takes less than a second.
EXPLAIN output for the query with DISTINCT:
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-----------------------+
| 1 | SIMPLE | post | index | PRIMARY | postId | 4 | NULL | 1 | Using temporary |
| 1 | SIMPLE | tb1 | ref | PRIMARY,tagId | PRIMARY | 4 | e621datamirror.post.id | 13 | Using index; Distinct |
| 1 | SIMPLE | t1 | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4 | e621datamirror.tb1.tagId | 1 | Distinct |
| 1 | SIMPLE | tb2 | ref | PRIMARY,tagId | PRIMARY | 4 | e621datamirror.post.id | 13 | Using index; Distinct |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4 | e621datamirror.tb2.tagId | 1 | Distinct |
| 1 | SIMPLE | tb3 | ref | PRIMARY,tagId | PRIMARY | 4 | e621datamirror.post.id | 13 | Using index; Distinct |
| 1 | SIMPLE | t3 | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4 | e621datamirror.tb3.tagId | 1 | Using where; Distinct |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-----------------------+
7 rows in set (0.01 sec)
EXPLAIN output for the query without DISTINCT:
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-------------+
| 1 | SIMPLE | post | index | PRIMARY | postId | 4 | NULL | 1 | NULL |
| 1 | SIMPLE | tb1 | ref | PRIMARY,tagId | PRIMARY | 4 | e621datamirror.post.id | 13 | Using index |
| 1 | SIMPLE | t1 | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4 | e621datamirror.tb1.tagId | 1 | NULL |
| 1 | SIMPLE | tb2 | ref | PRIMARY,tagId | PRIMARY | 4 | e621datamirror.post.id | 13 | Using index |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4 | e621datamirror.tb2.tagId | 1 | NULL |
| 1 | SIMPLE | tb3 | ref | PRIMARY,tagId | PRIMARY | 4 | e621datamirror.post.id | 13 | Using index |
| 1 | SIMPLE | t3 | eq_ref | PRIMARY,name,name_2 | PRIMARY | 4 | e621datamirror.tb3.tagId | 1 | Using where |
+----+-------------+-------+--------+---------------------+---------+---------+--------------------------+------+-------------+
CREATE TABLE `post` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`postId` int(11) NOT NULL,
`hash` varchar(32) COLLATE utf8_bin NOT NULL,
`previewUrl` varchar(512) COLLATE utf8_bin NOT NULL,
`lastRetrieved` datetime NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `postId` (`postId`),
UNIQUE KEY `hash` (`hash`),
KEY `postId_2` (`postId`),
KEY `postId_3` (`postId`)
) ENGINE=InnoDB AUTO_INCREMENT=692561 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
CREATE TABLE `tag` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`),
KEY `name_2` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=157876 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
CREATE TABLE `taggedBy` (
`postId` int(11) NOT NULL,
`tagId` int(11) NOT NULL,
PRIMARY KEY (`postId`,`tagId`),
KEY `tagId` (`tagId`),
CONSTRAINT `taggedBy_ibfk_1` FOREIGN KEY (`postId`) REFERENCES `post` (`id`) ON DELETE CASCADE,
CONSTRAINT `taggedBy_ibfk_2` FOREIGN KEY (`tagId`) REFERENCES `tag` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
What causes this query to be so slow? How can I speed it up?
I hope I've given enough information so you can give me some meaningful answers. If I've left something out, I'll be happy to add it.
Several things are being discussed, even in @SlimGhost's reasonable (but deleted) answer.
DISTINCT vs GROUP BY
Although GROUP BY can sometimes be used to replace DISTINCT, don't do it; they are meant for different things.
They both require some form of extra effort. (I'll get to the 10x later.) Both have to discover common values -- either in the entire row (for DISTINCT) or for the grouped items. This can be done in one of at least two ways. (Probably most engines have these options built in.) Note that the DISTINCT or GROUP BY must logically come after WHERE, but before ORDER BY and LIMIT.
Keep some kind of internal associative array as the output is being generated. This is practical if the optimizer can see that there won't be "too many" possible different values.
Sort the output; then dedup or group in a pass over the output. This works regardless of size.
ORDER BY + LIMIT
Notice that the query is doing DISTINCT over 4 columns: post.postId, hash, previewUrl, lastRetrieved. It is not obvious whether these are all in post or scattered across the 7 tables. (Please clarify by qualifying every column.)
Let's assume the JOINs need to be done to find the 4 columns.
Let's say there is no DISTINCT. Now, the operations are
Walk through post in ORDER BY post.postID order.
For each such row, do the JOINs and check the WHERE.
After 100 rows have passed the WHERE, stop.
But with DISTINCT, the optimizer can't make such a simplifying assumption in order to stop short. Instead:
Walk through post in ORDER BY post.postID order. (Starting with t1/t2/t3 is out of the question because of OR.) Actually, it is unclear whether the optimizer would bother going in this order.
For each such row, do the JOINs and check the WHERE.
Do something about DISTINCT.
After 100 rows have passed the WHERE, stop. Note: This may involve lots more rows from post (perhaps 10x?)
Keep in mind that the optimizer knows nothing about whether postId is 1:1 with hash, etc. So, it can't make simplifying assumptions. Suppose there were 200 rows in the JOIN with the smallest postId, and the hash happened to be in descending order. Smells like a need for a "sort".
EXPLAIN FORMAT=JSON SELECT ... might give you some of these details.
Ouch. You have both an id and UNIQUE(postId)? Get rid of id and turn postId into the PRIMARY KEY. This alone may speed things up.
What is the hash a hash of?
Please use the JOIN ... ON ... syntax.
You have 3 indexes on postId; get rid of the extra two.
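A hedged sketch of that cleanup (postId_2 and postId_3 duplicate the existing UNIQUE KEY on postId):
ALTER TABLE post
  DROP INDEX postId_2,
  DROP INDEX postId_3;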
Why use DISTINCT?
Now that I see that all the SELECTed columns come from the one table, and that they will obviously be easy to make distinct, why even consider using DISTINCT?
(updates)
JOIN ON
FROM post INNER JOIN (tag as t1,taggedBy as tb1,...
ON post.id=tb1.postId AND tb1.tagId=t1.id AND ...
-->
FROM post
JOIN taggedBy AS tb1 ON tb1.postId = post.id
JOIN tag AS t1 ON t1.id = tb1.tagId
... (each ON is next to the JOIN it applies to)
A speedup technique
SELECT p2.postId, p2.hash, p2.previewUrl, p2.lastRetrieved
FROM (
SELECT DISTINCT postId -- Only the PRIMARY KEY
FROM post
JOIN ... etc
WHERE ... ...
ORDER BY postId
LIMIT 100
) x
JOIN post AS p2 ON x.postId = p2.postId -- self join for getting rest of fields
ORDER BY x.postId -- assuming you need the ordering
This puts the DISTINCT in the inner query, where you are fetching only the one column (postId). (I am not sure whether this technique will help much in your case.)
Why would this query (and a number of similar variants) not use the index on ASIN in the 'tags' table? It insists on a full-table scan even when A contains just a few rows. As the 'tags' table in production contains nearly a million entries, it's killing the query rather badly.
SELECT C.tag, count(C.tag) AS total
FROM
(
SELECT B.*
FROM
(
SELECT ASIN FROM requests WHERE user_id=9
) A
INNER JOIN tags B USING(ASIN)
) C
GROUP BY C.tag ORDER BY total DESC
EXPLAIN shows no index being used (run on test DB so rows in 'tags' is low, but still a full table scan):
| 1 | PRIMARY | <derived2> | system | NULL | NULL | NULL | NULL | 0 | const row not found |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 28 | |
| 2 | DERIVED | B | ALL | NULL | NULL | NULL | NULL | 2593 | Using where; Using join buffer |
| 3 | DERIVED | borrowing_requests | ref | idx_user_id | idx_user_id | 5 | | 27 | Using where
Indexes:
| book_tags | 1 | asin | 1 | ASIN | A | 432 | NULL | NULL | | BTREE | |
| book_tags | 1 | idx_tag | 1 | tag | A | 1296 | NULL | NULL | | BTREE | |
| book_tags | 1 | idx_updated_on | 1 | updated_on | A | 518 | NULL | NULL | | BTREE
The query was rewritten from an INNER JOIN which was having the same problem:
SELECT tag, count(tag) AS total
FROM tags
INNER JOIN requests ON requests.ASIN=tags.ASIN
WHERE user_id=9
GROUP BY tag
ORDER BY total DESC
EXPLAIN:
| 1 | SIMPLE | tags | ALL | NULL | NULL | NULL | NULL | 2593 | Using temporary; Using filesort |
| 1 | SIMPLE | requests | ref | idx_ASIN,idx_user_id | idx_ASIN | 33 | func | 3 | Using where
I get the idea this is a real basic point I'm missing, but about 4 hours work on it has got me nowhere. Any advice is welcome.
EDIT:
Thanks to some replies, I can see that the first query, which uses sub-queries, won't use indexes; but it was being used because it ran twice as quickly as the bottom query with just the INNER JOIN.
As an example, there are 70k rows in requests (all with an indexed ASIN), and 700k rows in tags, with 95k different ASINs in tags, each with less than 10 different tag records.
If a user has 10 requests, I only want the tags from those 10 ASINs to be listed and counted. In my mind, this should use tags.idx_ASIN and should look up at most 100 rows (10 ASINs, each with a maximum of 10 tags) from the tags table.
I'm missing something...I just can't see what.
EDIT:
requests CREATE TABLE:
CREATE TABLE IF NOT EXISTS `requests` (
`bid` int(40) NOT NULL AUTO_INCREMENT,
`user_id` int(20) DEFAULT NULL,
`ASIN` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
`status` enum('active','inactive','pending','deleted','completed') COLLATE utf8_unicode_ci NOT NULL,
`added_on` datetime NOT NULL,
`status_changed_on` datetime NOT NULL,
`last_emailed` datetime DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`bid`),
KEY `idx_ASIN` (`ASIN`),
KEY `idx_status` (`status`),
KEY `idx_added_on` (`added_on`),
KEY `idx_user_id` (`user_id`),
KEY `idx_status_changed_on` (`status_changed_on`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=149380 ;
tags CREATE TABLE
CREATE TABLE IF NOT EXISTS `tags` (
`ASIN` varchar(10) NOT NULL,
`tag` varchar(50) NOT NULL,
`updated_on` datetime NOT NULL,
KEY `idx_tag` (`tag`),
KEY `idx_updated_on` (`updated_on`),
KEY `idx_asin` (`ASIN`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
There is no primary key on tags. I don't usually have tables without primary keys, but didn't see the need on this one. Could this be an issue?
MySQL doesn't index subqueries. If you want indexes to improve performance of your queries, rewrite them to not use subqueries.
Try reversing the order of the tables in your original query:
SELECT tag, count(tag) AS total
FROM requests
INNER JOIN tags ON requests.ASIN=tags.ASIN
WHERE user_id=9
GROUP BY tag
ORDER BY total DESC
AHA! Different charsets and collations. I shall correct that and try again!
Later:
That got it. Query went down from 10secs to 0.006secs. Thanks to everyone for getting me to look at this differently.
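For reference, a minimal sketch of aligning the character sets (assuming utf8/utf8_unicode_ci is the desired target, to match requests.ASIN):
ALTER TABLE tags CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;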
I have the following query:
explain select * from users, dls where dls.user_id=users.id and users.status = 'accepted' and users.acc = 0 order by users.user_name desc limit 18416, 16
Which results in the following explain;
+----+-------------+-------+------+------------------------+-------------+---------+---------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------------+-------------+---------+---------------------------------+-------+---------------------------------+
| 1 | SIMPLE | dls | ALL | PRIMARY,user_id | NULL | NULL | NULL | 19910 | Using temporary; Using filesort |
| 1 | SIMPLE | users | ref | PRIMARY,id_user_name | id_user_name | 4 | dls.user_id | 1 | Using where |
+----+-------------+-------+------+------------------------+-------------+---------+---------------------------------+-------+---------------------------------+
2 rows in set (0.00 sec)
This query is really, really slow and I cannot figure out how to fix it. I tried all kinds of indexes from reading articles on how to optimize order by / limit queries, but the result remains the same. Can anyone please help?
Edit: schemas:
CREATE TABLE `users` (
`id` int(10) unsigned NOT NULL auto_increment,
`user_name` varchar(100) character set utf8 NOT NULL,
`status` enum('accepted','rejected') character set utf8 NOT NULL,
`acc` varchar(6) character set utf8 NOT NULL,
PRIMARY KEY (`id`),
KEY `user_name` (`user_name`),
KEY `id_user_name` (`id`,`user_name`)
)
CREATE TABLE `dls` (
`user_id` int(10) unsigned NOT NULL,
`category_id` bigint(20) NOT NULL,
`download_url` varchar(255) character set utf8 NOT NULL,
PRIMARY KEY (`user_id`,`category_id`),
KEY `user_id` (`user_id`)
)
Output for the query suggested by Scrummeister:
+----+-------------+-------+------+------------------------+--------+---------+------------------------------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------------+--------+---------+------------------------------+-------+-----------------------------+
| 1 | SIMPLE | u | ALL | PRIMARY,id_user_name | NULL | NULL | NULL | 10838 | Using where; Using filesort |
| 1 | SIMPLE | dls | ref | PRIMARY,user_id | user_id | 4 | u.id | 2 | |
+----+-------------+-------+------+------------------------+--------+---------+------------------------------+-------+-----------------------------+
MySQL is known to have issues with a LIMIT using a large offset.
The STRAIGHT_JOIN keyword tells MySQL to first scan the users table and then, for every user, look up the matching rows in the dls table.
SELECT STRAIGHT_JOIN *
FROM users u JOIN dls ON dls.user_id = u.id
WHERE u.status = 'accepted' AND u.acc = 0
ORDER BY u.user_name DESC
LIMIT 18416, 16
Using STRAIGHT_JOIN is not recommended unless there is a need for it. In this specific case I believe it might work, since it can use the user_name index for sorting.
Other options you have:
Increase the size of sort_buffer_size
Increase the size of read_rnd_buffer_size (with caution!)
Do the paging on the users table only, regardless of how many dls rows each user has, and only then apply the JOIN.
Handle the paging in your code. Assuming a user goes from page to page without skipping too many, store the first and last user names of each page. When the user clicks the next page, add a WHERE user_name condition against the last user name of the previous page together with LIMIT 0, 16; this keeps the offset small (see the sketch below).
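A minimal sketch of that approach (the boundary value is a placeholder your code would remember from the previous page; with the DESC ordering the comparison flips to <):
SELECT u.*, dls.*
FROM users u
JOIN dls ON dls.user_id = u.id
WHERE u.status = 'accepted'
  AND u.acc = 0
  AND u.user_name < 'lastUserNameOfPreviousPage'  -- placeholder boundary
ORDER BY u.user_name DESC
LIMIT 16;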
For other optimization, read ORDER BY Optimization and Limit Optimization
Try adding an index to the users table with the following columns:
status, acc, user_name
or
acc, status, user_name
whichever is faster.
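A hedged sketch of the first variant (the index name is arbitrary):
ALTER TABLE users ADD INDEX idx_status_acc_user_name (status, acc, user_name);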
In MySQL 5.0.75-0ubuntu10.2 I've got a fixed table layout like this:
Table parent with an id
Table parent2 with an id
Table children1 with a parentId
CREATE TABLE `Parent` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(200) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
CREATE TABLE `Parent2` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(200) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
CREATE TABLE `Children1` (
`id` int(11) NOT NULL auto_increment,
`parentId` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `parent` (`parentId`)
) ENGINE=InnoDB
A child has a parent in one of the tables Parent or Parent2. When I need to get a child I use a query like this:
select * from Children1 c
inner join (
select id as parentId from Parent
union
select id as parentId from Parent2
) p on p.parentId = c.parentId
Explaining this query yields:
+----+--------------+------------+-------+---------------+---------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+-------+---------------+---------+---------+------+------+-----------------------------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables |
| 2 | DERIVED | Parent | index | NULL | PRIMARY | 4 | NULL | 1 | Using index |
| 3 | UNION | Parent2 | index | NULL | PRIMARY | 4 | NULL | 1 | Using index |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------+-------+---------------+---------+---------+------+------+-----------------------------------------------------+
4 rows in set (0.00 sec)
which is reasonable given the layout.
Now the problem: the previous query is somewhat useless, since it returns no columns from the parent elements. The moment I add more columns to the inner query, no index is used anymore:
mysql> explain select * from Children1 c inner join ( select id as parentId,name from Parent union select id as parentId,name from Parent2 ) p on p.parentId = c.parentId;
+----+--------------+------------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables |
| 2 | DERIVED | Parent | ALL | NULL | NULL | NULL | NULL | 1 | |
| 3 | UNION | Parent2 | ALL | NULL | NULL | NULL | NULL | 1 | |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------+------+---------------+------+---------+------+------+-----------------------------------------------------+
4 rows in set (0.00 sec)
Can anyone explain why the (PRIMARY) indices are not used any more? Is there a workaround for this problem if possible without having to change the DB layout?
Thanks!
I think that the optimizer falls down once you start pulling out multiple columns in the derived query because of the possibility that it would need to convert data types on the union (not in this case, but in general). It may also be due to the fact that your query essentially wants to be a correlated derived subquery, which isn't possible (from dev.mysql.com):
Subqueries in the FROM clause cannot be correlated subqueries, unless used within the ON clause of a JOIN operation.
What you are trying to do (but isn't valid) is:
select * from Children1 c
inner join (
select id as parentId from Parent where Parent.id = c.parentId
union
select id as parentId from Parent2 where Parent2.id = c.parentId
) p
Result: "Unknown column 'c.parentId' in 'where clause'.
Is there a reason you don't prefer two left joins and IFNULLs:
select *, IFNULL(p1.name, p2.name) AS name from Children1 c
left join Parent p1 ON p1.id = c.parentId
left join Parent2 p2 ON p2.id = c.parentId
The only difference between the queries is that in yours you'll get two rows if there is a parent in each table. If that's what you want/need, then this will also work well, and the joins will be fast and will always make use of the indexes:
(select * from Children1 c join Parent p1 ON p1.id = c.parentId)
union
(select * from Children1 c join Parent2 p2 ON p2.id = c.parentId)
My first thought is to insert a "significant" number of records into the tables and use ANALYZE TABLE to update the statistics. A table with 4 records will always be faster to read using a full scan rather than going via the index!
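For example (table names taken from the schema above):
ANALYZE TABLE Parent, Parent2, Children1;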
Further, you can try USE INDEX to force the usage of the index and see how the plan changes.
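A minimal sketch of the hint syntax, using the existing parent key on Children1 (whether it actually changes this particular plan is uncertain):
SELECT *
FROM Children1 c USE INDEX (parent)
INNER JOIN (
  SELECT id AS parentId, name FROM Parent
  UNION
  SELECT id AS parentId, name FROM Parent2
) p ON p.parentId = c.parentId;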
I also recommend reading this documentation and seeing which bits are relevant:
MYSQL::Optimizing Queries with EXPLAIN
This article can also be useful
7 ways to convince MySQL to use the right index