Here's the query:
SELECT
u.uid as UID,
fuo.uid as FUO_UID,
fo.prid as FO_NAME
FROM
users u
LEFT OUTER JOIN firstpoint_users_organisations fuo ON (u.uid=fuo.uid)
LEFT OUTER JOIN firstpoint_organisations fo ON (fo.nid=fuo.nid)
WHERE
u.status=1 AND u.uid>1
ORDER BY u.uid
LIMIT 3;
And the tables:
users
+------------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------+------+-----+---------+----------------+
| uid | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(60) | NO | UNI | | |
| status | tinyint(4) | NO | | 0 | |
+-----------------------------------------------------------------------------+
firstpoint_users_organisations
+-------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+-------+
| nid | int(10) unsigned | NO | PRI | 0 | |
| uid | int(10) unsigned | NO | PRI | 0 | |
+-------+------------------+------+-----+---------+-------+
firstpoint_organisations
+----------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+-------+
| nid | int(10) unsigned | NO | PRI | 0 | |
| prid | varchar(32) | NO | | | |
+------------------------------------------------------------+
I wish to show users.uid and firstpoint_organisations.prid for every row in users, even though some users won't have a prid, in which case I show NULL (hence the left outer joins). The connection should be as follows:
users
uid - firstpoint_users_organisations
\---->uid
nid - firstpoint_organisations
\-------->nid
prid
So each user (users) has a user id (uid), and the organisation they're associated with (firstpoint_users_organisation) has a node id (nid) and stores this association. The organisation's details are then stored in firstpoint_organisations.
So every user will have a prid, but if they don't, show NULL.
Now, if I do an INNER JOIN on firstpoint_users_organisations and then on firstpoint_organisations, I get a good query speed (the above query runs in 0.02 seconds). But, when I switch both to LEFT OUTER JOIN, so I can get all users, prid or no prid, the above query takes ~90 seconds to run.
Is there anything I can do to speed this query up? There are approx. 70,000 rows in the users table, but even with LIMIT 3, the making the INNER JOIN a LEFT OUTER JOIN takes a horrible amount of time. Interestingly, the query takes the same amount of time to run with LIMIT 30, so I think there's something fundamentally wrong with my query.
EXPLAIN as requested:
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+-------+----------------------------------------------+
| 1 | SIMPLE | u | range | PRIMARY | PRIMARY | 4 | NULL | 13152 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | fuo | index | NULL | PRIMARY | 8 | NULL | 3745 | Using index |
| 1 | SIMPLE | fo | eq_ref | PRIMARY | PRIMARY | 4 | dbdb-dbdb_uat.fuo.nid | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------+-------+----------------------------------------------+
3 rows in set (0.00 sec)
Your query is pointlessly (because uid > 1 would include all but one of the users) using the index on uid, so use the IGNORE INDEX hint for that index:
SELECT
u.uid as UID,
fuo.uid as FUO_UID,
fo.prid as FO_NAME
FROM users u IGNORE INDEX (uid)
LEFT JOIN firstpoint_users_organisations fuo ON u.uid=fuo.uid
LEFT JOIN firstpoint_organisations fo ON fo.nid=fuo.nid
WHERE u.status=1
AND u.uid > 1
ORDER BY u.uid
LIMIT 3
You should put an index on users(status), which may give you some benefit if there are enough rows with status != 1
It is quite expected that changing the LIMIT would have no effect, because 70000 rows must be sorted before the limit is applied to know which rows are the first rows to return - the limit has little effect, except that less rows are returned to the client (less comma IO)
I'm a believer in "less code is good", so from a strictly style point of view I have removed non essential code from your query:
removed OUTER because there is no other kind of left join
removed brackets around join conditions because you don't need 'em
I would use a unique index on u.status,u.uid for that, because mysql has to make a fullscan to look, which entries has status = 1 I think.
I hope it's faster afterwards ;)
Related
I have a MySQL query which has a JOIN of 12 tables. When I explain the query, It showing 394699 rows for one table and 185368 rows for another table. All other tables has 1-3 rows. The total result which I am getting from the query id 472 rows only. But for that, it is taking more than 1 minute.
Is there any way to check how many rows has been analyzed to produce such a result? So that, I can find which is the table costs the higher time.
I am giving the query structure below. As the table structure is too high, I am not able to provide it here.
SELECT h.nid,h.attached_nid,h.created, s.field_species_value as species, g.field_gender_value as gender, u.field_unique_id_value as unqid, n.title, dob.field_adult_healthy_weight_value as birth_date, dcolor.field_dog_primary_color_value as dogcolor, ccolor.field_primary_color_value as catcolor, sdcolor.field_dog_secondary_color_value as sdogcolor, sccolor.field_secondary_color_value as scatcolor, dpattern.field_dog_pattern_value as dogpattern, cpattern.field_cat_pattern_value as catpattern
FROM table1 h
JOIN table2 n ON n.nid = h.nid
JOIN table3 s ON n.nid = s.entity_id
JOIN table4 u ON n.nid = u.entity_id
LEFT JOIN table5 g ON n.nid = g.entity_id
LEFT JOIN table6 dob ON n.nid = dob.entity_id
LEFT JOIN table7 AS dcolor ON n.nid = dcolor.entity_id
LEFT JOIN table8 AS ccolor ON n.nid = ccolor.entity_id
LEFT JOIN table9 AS sdcolor ON n.nid = sdcolor.entity_id
LEFT JOIN table10 AS sccolor ON n.nid = sccolor.entity_id
LEFT JOIN table11 AS dpattern ON n.nid = dpattern.entity_id
LEFT JOIN table12 AS cpattern ON n.nid = cpattern.entity_id
WHERE h.title = '4208'
AND ((h.created BETWEEN 1483257600 AND 1485935999))
AND h.uid!=1
AND h.uid IN(
SELECT etid
FROM `table`
WHERE gid=464
AND entity_type='user')
AND h.attached_nid>0
ORDER BY CAST(h.created as UNSIGNED) DESC;
Below is the EXPLAIN result which I get
+------+--------------+---------------+--------+----------------------+---------------------+---------+----------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+---------------+--------+----------------------+---------------------+---------+----------------------+--------+----------------------------------------------+
| 1 | PRIMARY | s | index | entity_id | field_species_value | 772 | NULL | 394699 | Using index; Using temporary; Using filesort |
| 1 | PRIMARY | u | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | n | eq_ref | PRIMARY | PRIMARY | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | g | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | dob | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | dcolor | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | ccolor | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | sdcolor | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | sccolor | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | dpattern | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | cpattern | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | h | ref | attached_nid,nid,uid | nid | 5 | pantheon.s.entity_id | 3 | Using index condition; Using where |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 4 | func | 1 | Using where |
| 2 | MATERIALIZED | og_membership | ref | entity,gid | gid | 4 | const | 185368 | Using where |
+------+--------------+---------------+--------+----------------------+---------------------+---------+----------------------+--------+----------------------------------------------+
You can find the ROWS_EXAMINED by using the Performance Schema.
Here is a link to the performance schema quick start guide.
https://dev.mysql.com/doc/refman/5.5/en/performance-schema-quick-start.html
This is the query I run in PHP applications, to find out what queries I need to optimize. You should be able to adapt it pretty easily.
The query finds the stats on the query that was run before this one. So in my apps, I run query after every query I run, store the results, then at the end of the PHP script I output the stats for every query I ran during the request.
SELECT `EVENT_ID`, TRUNCATE(`TIMER_WAIT`/1000000000000,6) as Duration,
`SQL_TEXT`, `DIGEST_TEXT`, `NO_INDEX_USED`, `NO_GOOD_INDEX_USED`, `ROWS_AFFECTED`, `ROWS_SENT`, `ROWS_EXAMINED`
FROM `performance_schema`.`events_statements_history`
WHERE
`CURRENT_SCHEMA` = '{$database}' AND `EVENT_NAME` LIKE 'statement/sql/%'
AND `THREAD_ID` = (SELECT `THREAD_ID` FROM `performance_schema`.`threads` WHERE `performance_schema`.`threads`.`PROCESSLIST_ID` = CONNECTION_ID())
ORDER BY `EVENT_ID` DESC LIMIT 1;
To decrease the number of rows accessed from og_membership, try adding an index containing the gid, entity_type, and etid fields. Including gid and entity_type should make the lookup more performant and including etid will make the index a covering index.
After adding the index, run EXPLAIN again to look at the results. Based on the new explain plan, either keep the index, remove the index, and/or add an additional index. Keep doing this until you get results you are satisfied with.
For sure, you will want to try and eliminate any mentions of Using temporary or Using filesort. Using temporary implies a temporary table is being used to make this query probably for the sheer size of your intermittent. Using filesort implies ordering isn't being satisfied with an index and is being done by examining the matching rows.
An detail explanation about EXPLAIN can be found at https://dev.mysql.com/doc/refman/5.7/en/explain-output.html.
Key-Value (EAV) schema sucks.
Indexes:
table1: INDEX(title, created)
table1: INDEX(uid, title, created)
table: INDEX(gid, entity_type, etid)
table* -- Is `entity_id` already an index? Can it be the PRIMARY KEY?
Does nid need to be NULL instead of NOT NULL?
If those don't do enough, try:
And turn the IN ( SELECT ... ) into a JOIN ( SELECT ... ) USING(hid)
If you still need help, please provide SHOW CREATE TABLE and EXPLAIN SELECT ...
I need a piece of advice, building an app now and I need to run some queries on rather large tables, possibly at a very frequent rate, so I'm trying to get the best approach performance wise.
I have the following 2 tables:
Albums:
+---------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| eventid | int(11) | NO | MUL | NULL | |
| album | varchar(200) | NO | | NULL | |
| filename | varchar(200) | NO | | NULL | |
| obstacle_time | time | NO | | NULL | |
+---------------+--------------+------+-----+---------+----------------+
and keywords:
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| eventid | int(11) | NO | MUL | NULL | |
| filename | varchar(200) | NO | | NULL | |
| bibnumbers | varchar(200) | NO | | NULL | |
| gender | varchar(20) | YES | | NULL | |
| top_style | varchar(20) | YES | | NULL | |
| pants_style | varchar(20) | YES | | NULL | |
| other | varchar(20) | YES | | NULL | |
| cap | varchar(200) | NO | | NULL | |
| tshirt | varchar(200) | NO | | NULL | |
| pants | varchar(200) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
Both table have a unique_index declared which is a constraint of the eventid+filename column.
Both table contains information about some images, but the albums table is available instantly (as soon as I have the images), while the keywords table usually becomes available several days later after a manual tagging of the images is completed
Now I will have people searching for all kind of things once the tagging is enabled, but since the results can be HUGE (up to 10.000 or more) I'm only showing them in small chunks so the browser doesn't get killed with trying to load a huge amount of images, because of this my server will be hit with loads of query requests (every time the visitor scrolls to the bottom of the page, an ajax query will return the next chunk of images).
Now my question is, which of the following queries is better performance wise:
SELECT `albums`.`filename`,`basket`.`id`,`albums`.`id`,`obstacle_time`
FROM `albums`
LEFT JOIN `basket`
ON `basket`.`eventid` = `albums`.`eventid`
AND `basket`.`fileid` = `albums`.`id`
AND `basket`.`visitor_id` = 1
LEFT JOIN `keywords`
ON `keywords`.`eventid` = `albums`.`eventid`
AND `albums`.`filename` = `keywords`.`filename`
WHERE
`albums_2015`.`eventid` = 1
AND `album` LIKE '%string%'
AND `obstacle_time` >= '08:00:00'
AND `obstacle_time` <= '14:11:10'
AND `gender` = 1
AND `top_style` REGEXP '[[:<:]]0[[:>:]]|[[:<:]]1[[:>:]]'
AND `cap` = '2'
AND `tshirt` = '1'
AND `pants` = '3'
ORDER BY `obstacle_time`
LIMIT X, 10
OR using an IN CLAUSE inside WHERE like:
SELECT `albums`.`filename`,`basket`.`id`,`albums`.`id`,`obstacle_time`
FROM `albums`
LEFT JOIN `basket`
ON `basket`.`eventid` = `albums`.`eventid`
AND `basket`.`fileid` = `albums`.`id`
AND `basket`.`visitor_id` = 1
WHERE
`albums_2015`.`eventid` = 1
AND `album` LIKE '%string%'
AND `obstacle_time` >= '08:00:00'
AND `obstacle_time` <= '14:11:10'
AND `filename` IN (
SELECT `filename`
FROM `keywrods`
WHERE
`eventid` = 1
AND `gender` = 1
AND `top_style` REGEXP '[[:<:]]0[[:>:]]|[[:<:]]1[[:>:]]'
AND `cap` = '2'
AND `tshirt` = '1'
AND `pants` = '3'
)
ORDER BY `obstacle_time`
LIMIT X, 10
I had looked to similar questions but wasn't able to figure it out which is the best course of action.
My understanding so far is that:
Using LEFT JOINtakes advantages of INDEXING, BUT!!! if I use it I will get a full join of the tables even when I only need a significantly smaller result set, so it's almost a wast to join thousands of rows just to then filter out most of them.
Using IN and subquery isn't indexed??? I'm not 100% sure about this, I'm using MySQL 5.6 and to the best of my understanding since 5.6 even subqueries get automatically indexed my MySQL. I think this method has benefits when there result is significantly filtered, not sure if there will be any benefit if the subquery will return all the possible filenames.
As footnote questions:
Should I consider returning the whole result to the client on the first query and use client side (HTML) techniques to load the images gradually rather than re-querying the server each time?
Should I consider merging the 2 tables into 1, how much of a performance impact will that have? (can be tricky due to various reasons, which have no place in the question)
Thanks.
EDIT 1
Explain for JOIN query:
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+
| 1 | SIMPLE | albums_2015 | ref | unique_index | unique_index | 4 | const | 6475 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | basket | ALL | NULL | NULL | NULL | NULL | 2 | Using where; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | keywords_2015 | eq_ref | unique_index | unique_index | 206 | const,mybibnumber.albums_2015.filename | 1 | Using index |
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+
Using WHERE IN:
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+--+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | |
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+--+
| 1 | SIMPLE | albums_2015 | ref | unique_index | unique_index | 4 | const | 6475 | Using where; Using temporary; Using filesort | |
| 1 | SIMPLE | keywords_2015 | eq_ref | unique_index | unique_index | 206 | const,mybibnumber.albums_2015.filename | 1 | Using where | |
| 1 | SIMPLE | basket | ALL | NULL | NULL | NULL | NULL | 2 | Using where; Using join buffer (Block Nested Loop) | |
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+--+
EDIT 2
I wasn't able to set up a SQL Fiddler (keep getting error of something went wrong), so I have created a test database on one of my servers.
Address: http://188.165.217.185/phpmyadmin/, user: temp_test, pass: test_temp
I'm still building the whole thing and I don't have all the values filled in yet, like top_style, pants_style, etc, so a more appropriate query for the test scenario will be:
WHERE IN:
SELECT `albums_2015`.`filename`,
`albums_2015`.`id`,
`obstacle_time`
FROM `albums_2015`
WHERE `albums_2015`.`eventid` = 1
AND `album` LIKE '%'
AND `obstacle_time` >= '08:00:00'
AND `obstacle_time` <= '14:11:10'
AND `filename` IN (SELECT `filename`
FROM `keywords_2015`
WHERE eventid = 1
AND
`bibnumbers` REGEXP '[[:<:]]113[[:>:]]|[[:<:]]106[[:>:]]')
ORDER BY `obstacle_time`
LIMIT 0, 10
LEFT JOIN
SELECT `albums_2015`.`filename`,`albums_2015`.`id`,`obstacle_time`
FROM `albums_2015`
LEFT JOIN `keywords_2015`
ON `keywords_2015`.`eventid` = `albums_2015`.`eventid`
AND `albums_2015`.`filename` = `keywords_2015`.`filename`
WHERE
`albums_2015`.`eventid` = 1
AND `album` LIKE '%'
AND `obstacle_time` >= '08:00:00'
AND `obstacle_time` <= '14:11:10'
AND `bibnumbers` REGEXP '[[:<:]]113[[:>:]]|[[:<:]]106[[:>:]]'
ORDER BY `obstacle_time`
LIMIT 0, 10
More a bunch of tips :
Join using index are the best if you have to deal with multi table query,
Don't mind adding some index to speed up your query (index take space, but on INT field it's nothing and you gain way more than you lose).
In case of big table, caching the data in the distant table is usually a good idea.
An insert Trigger on TAG_table that cache the displayed part in the distant table (like the tag name for the overview of albums) can help you keeping your join query at a descent frequency.
Be careful with REGEX, it's something that hurt badly the perf. Adding a new table to split data is a better idea (and use indexing which is native optimisation)
For every field in a WHERE clause of a big and frequent query you should have an index on it. If you can't put one, then your DB model is f**cked-up and need to be changed.
I have:
_sms_users_
id
_join_smsuser_campaigns_
sms_user_id
I'd like to pull out the sms_users that don't have a record in the join_smsuser_campaigns table (via the join_smsuser_campaigns.sms_user_id = sms_users.id relationships)
I have the sql:
select * from sms_users where id not in (select sms_user_id from join_smsuser_campaigns);
edit:
here's the explain select results:
mysql> explain select u.* from sms_users u left join join_smsuser_campaigns c on u.id = c.sms_user_id where c.sms_user_id is null;
+----+-------------+-------+-------+---------------+-------------------------------------------------------------+---------+------+-------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------------------------------------------------------------+---------+------+-------+--------------------------------------+
| 1 | SIMPLE | u | ALL | NULL | NULL | NULL | NULL | 42303 | |
| 1 | SIMPLE | c | index | NULL | index_join_smsuser_campaigns_on_campaign_id_and_sms_user_id | 8 | NULL | 30722 | Using where; Using index; Not exists |
+----+-------------+-------+-------+---------------+-------------------------------------------------------------+---------+------+-------+--------------------------------------+
mysql> describe sms_users;
+------------------------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
mysql> describe join_smsuser_campaigns;
+---------------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+------------------+------+-----+---------+-------+
| sms_user_id | int(11) | NO | | NULL | |
Looks like the problm is that I don't have an index on sms_user_id on the join?
This takes about 5 minutes to run and I see that there is one record. Is there a more efficient way to do this via a join? My sql skills are pretty basic.
Queries can get slow when having many items in the IN clause. Use a left join instead
select u.*
from sms_users u
left join join_smsuser_campaigns c on u.id = c.sms_user_id
where c.sms_user_id is null
See this great join explanation
(edited) For more details about the app it self, please, also see:
Simple but heavy application consuming a lot of resources. How to Optimize?
(The adopted solution was use both joins and fulltext search)
I have the following query running up to roughly 500.000 rows in 25 seconds. If I remove the ORDER, it takes 0.5 seconds.
Fisrt test
Keeping the ORDER and removing all t. and tu. columns, the query takes 7 seconds.
Second test
If I add or remove an INDEX to the i.created_at field the response time remain the same.
QUERY:
**EDITED: I'VE NOTICED THAT BOTH GROUP BY AND ORDER BY SLOW DOWN THE QUERY (I've also achieve a little gain in the query changing the joins. The gain was to 10secs, but at all, the problem remains). With the modification, the EXPLAIN have stopped to return filesort, but stills returning "using temporary" **
SELECT SQL_NO_CACHE
DISTINCT `i`.`id`,
`i`.`entity`,
`i`.`created_at`,
`i`.`collected_at`,
`t`.`status_id` AS `twt_status_id`,
`t`.`user_id` AS `twt_user_id`,
`t`.`content` AS `twt_content`,
`tu`.`id` AS `twtu_id`,
`tu`.`screen_name` AS `twtu_screen_name`,
`tu`.`profile_image` AS `twtu_profile_image`
FROM `mtrt_items` AS `i`
LEFT JOIN `mtrt_users` AS `u` ON i.user_id =u.id
LEFT JOIN `twt_tweets_content` AS `t` ON t.id =i.id
LEFT JOIN `twt_users` AS `tu` ON u.id = tu.id
INNER JOIN `mtrt_items_searches` AS `r` ON i.id =r.item_id
INNER JOIN `mtrt_searches` AS `s` ON s.id =r.search_id
INNER JOIN `mtrt_searches_groups` AS `sg` ON sg.search_id =s.id
INNER JOIN `mtrt_search_groups` AS `g` ON sg.group_id =g.id
INNER JOIN `account_clients` AS `c` ON g.client_id =c.id
ORDER BY `i`.`created_at` DESC
LIMIT 100 OFFSET 0
Here is the EXPLAIN (EDITED):
+----+-------------+-------+--------+--------------------+-----------+---------+------------------------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------+-----------+---------+------------------------+------+------------------------------+
| 1 | SIMPLE | c | index | PRIMARY | PRIMARY | 4 | NULL | 1 | Using index; Using temporary |
| 1 | SIMPLE | g | ref | PRIMARY,client_id | client_id | 4 | clubr_new.c.id | 3 | Using index |
| 1 | SIMPLE | sg | ref | group_id,search_id | group_id | 4 | clubr_new.g.id | 1 | Using index |
| 1 | SIMPLE | s | eq_ref | PRIMARY | PRIMARY | 4 | clubr_new.sg.search_id | 1 | Using index |
| 1 | SIMPLE | r | ref | search_id,item_id | search_id | 4 | clubr_new.s.id | 4359 | Using where |
| 1 | SIMPLE | i | eq_ref | PRIMARY | PRIMARY | 8 | clubr_new.r.item_id | 1 | |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 8 | clubr_new.i.user_id | 1 | Using index |
| 1 | SIMPLE | t | eq_ref | PRIMARY | PRIMARY | 4 | clubr_new.i.id | 1 | |
| 1 | SIMPLE | tu | eq_ref | PRIMARY | PRIMARY | 8 | clubr_new.u.id | 1 | |
+----+-------------+-------+--------+--------------------+-----------+---------+------------------------+------+------------------------------+
Here is the mtrt_items table:
+--------------+-------------------------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------------------------------------------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| entity | enum('twitter','facebook','youtube','flickr','orkut') | NO | MUL | NULL | |
| user_id | bigint(20) | NO | MUL | NULL | |
| created_at | datetime | NO | MUL | NULL | |
| collected_at | datetime | NO | | NULL | |
+--------------+-------------------------------------------------------+------+-----+---------+----------------+
CREATE TABLE `mtrt_items` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`entity` enum('twitter','facebook','youtube','flickr','orkut') COLLATE utf8_unicode_ci NOT NULL,
`user_id` bigint(20) NOT NULL,
`created_at` datetime NOT NULL,
`collected_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `mtrt_user_id` (`user_id`),
KEY `entity` (`entity`),
KEY `created_at` (`created_at`),
CONSTRAINT `mtrt_items_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `mtrt_users` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=309650 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
The twt_tweets_content is MyISAM and is also used for fulltext searches:
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| user_id | int(11) | NO | MUL | NULL | |
| status_id | varchar(100) | NO | MUL | NULL | |
| content | varchar(200) | NO | MUL | NULL | |
+-----------+--------------+------+-----+---------+-------+
Instead of placing the Order By into the main query, wrap it, like so:
SELECT * FROM (
... your query
) ORDER BY `created at`
Take a look at the query plan. You will find that in your case, the sort is performed on your table mtrt_items before the outer join is performed. In the rewrite I've partially provided, the sort is applied after the outer joins, and is applied on a much smaller set.
UPDATE
Assuming that the LIMIT is being applied to a large set (500,000?), it looks like you can perform the top before doing any of the joins.
SELECT * from (
SELECT
`id`, ... `created_at`, ...
ORDER BY `i`.`created_at` DESC
LIMIT 100 OFFSET 0) as i
LEFT JOIN `mtrt_users` AS `u` ON i.user_id =u.id
LEFT JOIN `twt_tweets_content` AS `t` ON t.id =i.id
LEFT JOIN `twt_users` AS `tu` ON t.user_id = tu.id
INNER JOIN `mtrt_items_searches` AS `r` ON i.id =r.item_id
INNER JOIN `mtrt_searches` AS `s` ON s.id =r.search_id
INNER JOIN `mtrt_searches_groups` AS `sg` ON sg.search_id =s.id
INNER JOIN `mtrt_search_groups` AS `g` ON sg.group_id =g.id
INNER JOIN `account_clients` AS `c` ON g.client_id =c.id
GROUP BY i.id
Don't include the VARCHAR/TEXT fields in your initial query. This will create the TEMPORARY table required for the sorting, using the MEMORY engine and this will increase the efficiency dramatically. You can collect the text fields later using another query, without any sorting, simply with a condition on the PRIMARY KEY field and merge the data in your script (assuming that you are using one).
Also get rid of any JOINs (INNER or OUTER) that you don't actually take any data from.
I am trying to improve a query which does the following:
For every job, add up all the costs, add up the invoiced amount, and calculate a profit/loss. The costs come from several different tables, e.g. purchaseorders, users_events (engineer allocated time/time he spent on site), stock used etc.
The query also needs to output some other columns like the name of the site for the work, so that that column can be sorted by (an ORDER BY is appended after all of this).
SELECT
jobs.job_id,
jobs.start_date,
jobs.end_date,
events.time,
sites.name site,
IFNULL(stock_cost,0) stock_cost,
labour,
materials,
labour+materials+plant+expenses revenue,
(labour+materials+plant)-(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)) profit,
((labour+materials+plant)-(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)))/(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)) ratio
FROM
jobs
LEFT JOIN (
SELECT
job_id,
SUM(labour_charge) labour,
SUM(materials_charge) materials,
SUM(plant_hire_charge) plant,
SUM(expenses) expenses
FROM invoices
GROUP BY job_id
ORDER BY NULL
) invoices USING(job_id)
LEFT JOIN (
SELECT
job_id,
SUM(IF(start_onsite && end_onsite,end_onsite-start_onsite,end-start)) time,
SUM(travel+parking+materials) user_expenses
FROM users_events
WHERE type='job'
GROUP BY job_id
ORDER BY NULL
) events USING(job_id)
LEFT JOIN (
SELECT
job_id,
SUM(IFNULL(total,0))*0.01 orders_cost
FROM purchaseorders
GROUP BY job_id
ORDER BY NULL
) purchaseorders USING(job_id)
LEFT JOIN (
SELECT
location job_id,
SUM(amount*cost))*0.01 stock_cost
FROM stock_location
LEFT JOIN stock_items ON stock_items.id=stock_location.stock_id
WHERE location>=3000 AND amount>0 AND cost>0
GROUP BY location
ORDER BY NULL
) stock USING(job_id)
LEFT JOIN contacts_sites sites ON sites.id=jobs.site_id;
I read this: http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html but don't see how/if I can apply anything therein.
For testing purposes, I have tried adding all sorts of indices on fields left, right and centre with no improvement to the EXPLAIN output:
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
| 1 | PRIMARY | jobs | ALL | NULL | NULL | NULL | NULL | 7088 | |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 5038 | |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 6476 | |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 904 | |
| 1 | PRIMARY | <derived5> | ALL | NULL | NULL | NULL | NULL | 531 | |
| 1 | PRIMARY | sites | eq_ref | PRIMARY | PRIMARY | 4 | bestbee_db.jobs.site_id | 1 | |
| 5 | DERIVED | stock_location | ALL | stock,location,amount,…| NULL | NULL | NULL | 5426 | Using where; Using temporary; |
| 5 | DERIVED | stock_items | eq_ref | PRIMARY | PRIMARY | 4 | bestbee_db.stock_location.stock_id | 1 | Using where |
| 4 | DERIVED | purchaseorders | ALL | NULL | NULL | NULL | NULL | 1445 | Using temporary; |
| 3 | DERIVED | users_events | ALL | type,type_job | NULL | NULL | NULL | 11295 | Using where; Using temporary; |
| 2 | DERIVED | invoices | ALL | NULL | NULL | NULL | NULL | 5320 | Using temporary; |
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
The rows produced is 5 x 10^21 (down from 3 x 10^42 before I started optimising this query!)
It currently takes seven seconds to execute (down from 26) but I would like that to be under one second.
By the way: GROUP BY x ORDER BY NULL is a great way to eliminate unnecessary filesorts from subqueries! (from http://www.mysqlperformanceblog.com/2006/09/04/group_concat-useful-group-by-extension/)
Based on your comment to my question, I would do the following...
At the very top...
SELECT STRAIGHT_JOIN (just add the "STRAIGH_JOIN" keyword)
Then, for each of your subqueries for invoices, events, p/o's, etc, change the ORDER BY to the JOB_ID explicitly so it might help the optimization against the primary JOBS table join.
Finally, ensure each of your subquery tables HAS an index on the Job_ID (Invoices, User_events, PurchaseOrders, Stock_Location)
Additionally, for the Stock_Location table, you might want to help the WHERE clause for your subquery by having a compound index on
(job_id, location, amount) Three fields deep should be enough even though you have the key plus 3 where condition elements.