MySQL ORDER BY columns across multiple joined tables - mysql

I have a database which consists of three tables, with the following structure:
restaurant table: restaurant_id, location_id, rating. Example: 1325, 77, 4.5
restaurant_name table: restaurant_id, language, name. Example: 1325, 'en', 'Pizza Express'
location_name table: location_id, language, name. Example: 77, 'en', 'New York'
I would like to get the restaurant info in English, sorted by location name and restaurant name, and use the LIMIT clause to paginate the result. So my SQL is:
SELECT ln.name, rn.name
FROM restaurant r
INNER JOIN location_name ln
ON r.location_id = ln.location_id
AND ln.language = 'en'
INNER JOIN restaurant_name rn
ON r.restaurant_id = rn.restaurant_id
AND rn.language = 'en'
ORDER BY ln.name, rn.name
LIMIT 0, 50
This is terribly slow - so I refined my SQL with deferred JOIN, which make things a lot faster (from over 10 seconds to 2 seconds):
SELECT ln.name, rn.name
FROM restaurant r
INNER JOIN (
SELECT r.restaurant_id
FROM restaurant r
INNER JOIN location_name ln
ON r.location_id = ln.location_id
AND ln.language = 'en'
INNER JOIN restaurant_name rn
ON r.restaurant_id = rn.restaurant_id
AND rn.language = 'en'
ORDER BY ln.name, rn.name
LIMIT 0, 50
) r1
ON r.restaurant_id = r1.restaurant_id
INNER JOIN location_name ln
ON r.location_id = ln.location_id
AND ln.language = 'en'
INNER JOIN restaurant_name rn
ON r.restaurant_id = rn.restaurant_id
AND rn.language = 'en'
ORDER BY ln.name, rn.name
2 seconds is unfortunately still not very acceptable to the user, so I go and check the EXPLAIN of the my query, and it appears that the slow part is on the ORDER BY clause, which I see "Using temporary; Using filesort". I checked the official reference manual about ORDER BY optimization and I come across this statement:
In some cases, MySQL cannot use indexes to resolve the ORDER BY,
although it may still use indexes to find the rows that match the
WHERE clause. Examples:
The query joins many tables, and the columns in the ORDER BY are not
all from the first nonconstant table that is used to retrieve rows.
(This is the first table in the EXPLAIN output that does not have a
const join type.)
So for my case, given that the two columns I'm ordering by are from the nonconstant joined tables, index cannot be used. My question is, is there any other approach I can take to speed things up, or what I've done so far is already the best I can achieve?
Thanks in advance for your help!
EDIT 1
Below is the EXPLAIN output with the ORDER BY clause:
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 50 | |
| 1 | PRIMARY | rn | ref | idx_restaurant_name_1 | idx_restaurant_name_1 | 1538 | r1.restaurant_id,const,const | 1 | Using where |
| 1 | PRIMARY | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | r1.restaurant_id | 1 | |
| 1 | PRIMARY | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id,const,const | 1 | Using where |
| 2 | DERIVED | rn | ALL | idx_restaurant_name_1 | NULL | NULL | NULL | 8484 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | test.rn.restaurant_id | 1 | |
| 2 | DERIVED | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id | 1 | Using where |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+----------------------------------------------+
Below is the EXPLAIN output without the ORDER BY clause:
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+--------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 50 | |
| 1 | PRIMARY | rn | ref | idx_restaurant_name_1 | idx_restaurant_name_1 | 1538 | r1.restaurant_id,const,const | 1 | Using where |
| 1 | PRIMARY | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | r1.restaurant_id | 1 | |
| 1 | PRIMARY | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id,const,const | 1 | Using where |
| 2 | DERIVED | rn | index | idx_restaurant_name_1 | idx_restaurant_name_1 | 1538 | NULL | 8484 | Using where; Using index |
| 2 | DERIVED | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | test.rn.restaurant_id | 1 | |
| 2 | DERIVED | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id | 1 | Using where; Using index |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+--------------------------+
EDIT 2
Below are the DDL of the table. I built them for illustrating this problem only, the real table has much more columns.
CREATE TABLE restaurant (
restaurant_id INT NOT NULL AUTO_INCREMENT,
location_id INT NOT NULL,
rating INT NOT NULL,
PRIMARY KEY (restaurant_id),
INDEX idx_restaurant_1 (location_id)
);
CREATE TABLE restaurant_name (
restaurant_id INT NOT NULL,
language VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
INDEX idx_restaurant_name_1 (restaurant_id, language),
INDEX idx_restaurant_name_2 (name)
);
CREATE TABLE location_name (
location_id INT NOT NULL,
language VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
INDEX idx_location_name_1 (location_id, language),
INDEX idx_location_name_2 (name)
);

Based on the EXPLAIN numbers, there could be about 170 "pages" of restaurants (8484/50)? I suggest that that is impractical for paging through. I strongly recommend you rethink the UI. In doing so, the performance problem you state will probably vanish.
For example, the UI could be 2 steps instead of 170 to get to the restaurants in Zimbabwe. Step 1, pick a country. (OK, that might be page 5 of the countries.) Step 2, view the list of restaurants in that country; it would be only a few pages to flip through. Much better for the user; much better for the database.
Addenda
In order to optimize the pagination, get the paginated list of pages from a single table (so that you can 'remember where you left off'). Then join the language table(s) to look up the translations. Note that this only looks up on page's worth of translations, not thousands.

Related

Find the number total rows analyzed in a MySQL JOIN query

I have a MySQL query which has a JOIN of 12 tables. When I explain the query, It showing 394699 rows for one table and 185368 rows for another table. All other tables has 1-3 rows. The total result which I am getting from the query id 472 rows only. But for that, it is taking more than 1 minute.
Is there any way to check how many rows has been analyzed to produce such a result? So that, I can find which is the table costs the higher time.
I am giving the query structure below. As the table structure is too high, I am not able to provide it here.
SELECT h.nid,h.attached_nid,h.created, s.field_species_value as species, g.field_gender_value as gender, u.field_unique_id_value as unqid, n.title, dob.field_adult_healthy_weight_value as birth_date, dcolor.field_dog_primary_color_value as dogcolor, ccolor.field_primary_color_value as catcolor, sdcolor.field_dog_secondary_color_value as sdogcolor, sccolor.field_secondary_color_value as scatcolor, dpattern.field_dog_pattern_value as dogpattern, cpattern.field_cat_pattern_value as catpattern
FROM table1 h
JOIN table2 n ON n.nid = h.nid
JOIN table3 s ON n.nid = s.entity_id
JOIN table4 u ON n.nid = u.entity_id
LEFT JOIN table5 g ON n.nid = g.entity_id
LEFT JOIN table6 dob ON n.nid = dob.entity_id
LEFT JOIN table7 AS dcolor ON n.nid = dcolor.entity_id
LEFT JOIN table8 AS ccolor ON n.nid = ccolor.entity_id
LEFT JOIN table9 AS sdcolor ON n.nid = sdcolor.entity_id
LEFT JOIN table10 AS sccolor ON n.nid = sccolor.entity_id
LEFT JOIN table11 AS dpattern ON n.nid = dpattern.entity_id
LEFT JOIN table12 AS cpattern ON n.nid = cpattern.entity_id
WHERE h.title = '4208'
AND ((h.created BETWEEN 1483257600 AND 1485935999))
AND h.uid!=1
AND h.uid IN(
SELECT etid
FROM `table`
WHERE gid=464
AND entity_type='user')
AND h.attached_nid>0
ORDER BY CAST(h.created as UNSIGNED) DESC;
Below is the EXPLAIN result which I get
+------+--------------+---------------+--------+----------------------+---------------------+---------+----------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+---------------+--------+----------------------+---------------------+---------+----------------------+--------+----------------------------------------------+
| 1 | PRIMARY | s | index | entity_id | field_species_value | 772 | NULL | 394699 | Using index; Using temporary; Using filesort |
| 1 | PRIMARY | u | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | n | eq_ref | PRIMARY | PRIMARY | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | g | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | dob | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | dcolor | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | ccolor | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | sdcolor | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | sccolor | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | dpattern | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | cpattern | ref | entity_id | entity_id | 4 | pantheon.s.entity_id | 1 | |
| 1 | PRIMARY | h | ref | attached_nid,nid,uid | nid | 5 | pantheon.s.entity_id | 3 | Using index condition; Using where |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 4 | func | 1 | Using where |
| 2 | MATERIALIZED | og_membership | ref | entity,gid | gid | 4 | const | 185368 | Using where |
+------+--------------+---------------+--------+----------------------+---------------------+---------+----------------------+--------+----------------------------------------------+
You can find the ROWS_EXAMINED by using the Performance Schema.
Here is a link to the performance schema quick start guide.
https://dev.mysql.com/doc/refman/5.5/en/performance-schema-quick-start.html
This is the query I run in PHP applications, to find out what queries I need to optimize. You should be able to adapt it pretty easily.
The query finds the stats on the query that was run before this one. So in my apps, I run query after every query I run, store the results, then at the end of the PHP script I output the stats for every query I ran during the request.
SELECT `EVENT_ID`, TRUNCATE(`TIMER_WAIT`/1000000000000,6) as Duration,
`SQL_TEXT`, `DIGEST_TEXT`, `NO_INDEX_USED`, `NO_GOOD_INDEX_USED`, `ROWS_AFFECTED`, `ROWS_SENT`, `ROWS_EXAMINED`
FROM `performance_schema`.`events_statements_history`
WHERE
`CURRENT_SCHEMA` = '{$database}' AND `EVENT_NAME` LIKE 'statement/sql/%'
AND `THREAD_ID` = (SELECT `THREAD_ID` FROM `performance_schema`.`threads` WHERE `performance_schema`.`threads`.`PROCESSLIST_ID` = CONNECTION_ID())
ORDER BY `EVENT_ID` DESC LIMIT 1;
To decrease the number of rows accessed from og_membership, try adding an index containing the gid, entity_type, and etid fields. Including gid and entity_type should make the lookup more performant and including etid will make the index a covering index.
After adding the index, run EXPLAIN again to look at the results. Based on the new explain plan, either keep the index, remove the index, and/or add an additional index. Keep doing this until you get results you are satisfied with.
For sure, you will want to try and eliminate any mentions of Using temporary or Using filesort. Using temporary implies a temporary table is being used to make this query probably for the sheer size of your intermittent. Using filesort implies ordering isn't being satisfied with an index and is being done by examining the matching rows.
An detail explanation about EXPLAIN can be found at https://dev.mysql.com/doc/refman/5.7/en/explain-output.html.
Key-Value (EAV) schema sucks.
Indexes:
table1: INDEX(title, created)
table1: INDEX(uid, title, created)
table: INDEX(gid, entity_type, etid)
table* -- Is `entity_id` already an index? Can it be the PRIMARY KEY?
Does nid need to be NULL instead of NOT NULL?
If those don't do enough, try:
And turn the IN ( SELECT ... ) into a JOIN ( SELECT ... ) USING(hid)
If you still need help, please provide SHOW CREATE TABLE and EXPLAIN SELECT ...

Sql query optimization and debug

I am trying to optimize sql query in mysql db. Tried different ways of rewriting it , adding/removing indexes, but nothing seems to decrease the load. Maybe I am missing something.
Query:
select co.country_name as state, ci.city_name as city, ci.city_id, ci.country_id,
count(l.id) as num
FROM cities ci
INNER JOIN countries co ON (ci.country_id = co.country_id)
INNER JOIN dancers l ON (l.city_id = ci.city_id AND l.closed = 0 AND l.approved = 1 )
WHERE 1 AND ci.main=1
GROUP BY ci.city_id
ORDER BY city
Duration : 2.01sec - 2.20sec
Optimized query:
select co.country_name as state, ci.city_name as city, ci.city_id, ci.country_id, count(l.id) as num from
(select ci1.city_name, ci1.city_id, ci1.country_id from cities ci1
where ci1.main=1) as ci
INNER JOIN countries co ON (ci.country_id = co.country_id)
INNER JOIN dancers l ON (l.city_id = ci.city_id AND l.closed = 0 AND l.approved = 1 ) GROUP BY ci.city_id ORDER BY city
Duration : 0.82sec - 0.90sec
But i feel that this query can be optimized even more but not getting the ideea how to optimized it. There are 3 tables
Table 1 : countries ( country_id, country_name)
Table 2 : cities ( city_id, city_name, main, country_id)
Table 3 : dancers ( id, country_id, city_id, closed, approved)
I am trying to get all the cities which have main=1 and for each to count all the profiles that are into those cities joining with countries to get the country_name.
Any ideas are welcomed, thank you.
Later edit : - first query explain
+----+-------------+-------+-------------+---------------------------------------------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------------+---------------------------------------------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
| 1 | SIMPLE | l | index_merge | city_id,closed,approved,city_id_2 | closed,approved | 1,2 | NULL | 75340 | Using intersect(closed,approved); Using where; Using temporary; Using filesort |
| 1 | SIMPLE | ci | eq_ref | PRIMARY,state_id_2,state_id,city_name,lat,city_name_shorter,city_id | PRIMARY | 4 | db.l.city_id | 1 | Using where |
| 1 | SIMPLE | co | eq_ref | PRIMARY | PRIMARY | 4 | db.ci.country_id | 1 | Using where |
+----+-------------+-------+-------------+---------------------------------------------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
Second query explain :
+----+-------------+------------+------+-----------------------------------+-------------+---------+------------------+-------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+-----------------------------------+-------------+---------+------------------+-------+------------------------------------+
| 1 | PRIMARY | co | ALL | PRIMARY | NULL | NULL | NULL | 51 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | ref | <auto_key1> | <auto_key1> | 4 | db.co.country_id | 176 | Using where |
| 1 | PRIMARY | l | ref | city_id,closed,approved,city_id_2 | city_id_2 | 4 | ci.city_id | 44 | Using index condition; Using where |
| 2 | DERIVED | ci1 | ALL | NULL | NULL | NULL | NULL | 11765 | Using where |
+----+-------------+------------+------+-----------------------------------+-------------+---------+------------------+-------+------------------------------------+
#used_by_already query explain :
+----+-------------+------------+-------------+-----------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------------+-----------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
| 1 | PRIMARY | co | ALL | PRIMARY | NULL | NULL | NULL | 51 | NULL |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 4 | db.co.country_id | 565 | Using where |
| 2 | DERIVED | l | index_merge | city_id,closed,approved,city_id_2 | closed,approved | 1,2 | NULL | 75341 | Using intersect(closed,approved); Using where; Using temporary; Using filesort |
| 2 | DERIVED | ci1 | eq_ref | PRIMARY,state_id_2,city_id | PRIMARY | 4 | db.l.city_id | 1 | Using where |
+----+-------------+------------+-------------+-----------------------------------+-----------------+---------+------------------+-------+--------------------------------------------------------------------------------+
I suggest you try this:
SELECT
co.country_name AS state
, ci.city_name AS city
, ci.city_id
, ci.country_id
, ci.num
FROM (
SELECT
ci1.city_id
, ci1.city_name
, ci1.country_id
, COUNT(l.id) AS num
FROM cities ci1
INNER JOIN dancers l ON l.city_id = ci1.city_id
AND l.closed = 0
AND l.approved = 1
WHERE ci1.main = 1
GROUP BY
ci1.city_id
, ci1.city_name
, ci1.country_id
) AS ci
INNER JOIN countries co ON ci.country_id = co.country_id
;
And that you provide the explain plan output for further analysis if needed. When optimizing knowing what indexes exist, and having he explain plan, are essentials.
Not also, that MySQL does permit non-standard GROUP BY syntax (where only one or some of the columns in the select list are included under the group by clause).
In recent versions of MySQL the default behaviour for GROUP BY has changed to SQL standard syntax (where all "non-aggregating" columns in the select list must be included under the group by clause). While your existing queries use the non-standard group by syntax, the query supplied here dis compliant.

Debugging Slow mySQL query with Explain

Have found an inefficient query in our system. content holds versions of slides, and this is supposed to select the highest version of a slide by id.
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901'
group by `slide_id`
) c ON `c`.`version` = `content`.`version`;
EXPLAIN
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | content | NULL | ref | PRIMARY,version | PRIMARY | 8 | const | 9703 | 100.00 | NULL |
| 2 | DERIVED | content | NULL | ref | PRIMARY,fk_content_slides_idx,thumbnail_asset_id,version,slide_id | fk_content_slides_idx | 8 | const | 1 | 100.00 | Using where; Using index |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
One big issue is that it returns almost all the slides in the system as the outer query does not filter by slide id. After adding that I get...
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901' group by `slide_id`
) c ON `c`.`version` = `content`.`version`
WHERE `slide_id` = '16901';
EXPLAIN
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | content | NULL | const | PRIMARY,fk_content_slides_idx,version,slide_id | PRIMARY | 16 | const,const | 1 | 100.00 | NULL |
| 2 | DERIVED | content | NULL | ref | PRIMARY,fk_content_slides_idx,thumbnail_asset_id,version,slide_id | fk_content_slides_idx | 8 | const | 1 | 100.00 | Using where; Using index |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
That reduces the amount of rows down to one correctly, but doesnt really speed things up.
There are indexes on version, slide_id and a unique key on version AND slide_id.
Is there anything else I can do to speed this up?
Use a TOP LIMIT 1 insetead of Max ?
m
MySQL seems to take an index (version, slide_id) to join the tables. You should get a better result with
SELECT `content`.*
FROM `content`
FORCE INDEX FOR JOIN (fk_content_slides_idx)
join (
SELECT `slide_id`, max(version) as `version` from `content`
WHERE `slide_id` = '16901' group by `slide_id`
) c ON `c`.`slide_id` = `content`.`slide_id` and `c`.`version` = `content`.`version`
You need an index that has slide_id as first column, I just guessed that's fk_content_slides_idx, if not, take another one.
The part FORCE INDEX FOR JOIN (fk_content_slides_idx) is just to enforce it, you should try if mysql takes it by itself without forcing (it should).
You might get even a slightly better result with an index (slide_id, version), it depends on the amount of data (e.g. the number of versions per id) if you see a difference (but you should not spam indexes, and you already have a lot on this table, but you can try it for fun.)
Just a suggestion i think you should avoid the group by slide_id because you are filter by one slide_id only (16901)
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901'
) c ON `c`.`version` = `content`.`version`
WHERE `slide_id` = '16901';

MySQL Join Optimisation: Improving join type with derived tables and GROUP BY

I am trying to improve a query which does the following:
For every job, add up all the costs, add up the invoiced amount, and calculate a profit/loss. The costs come from several different tables, e.g. purchaseorders, users_events (engineer allocated time/time he spent on site), stock used etc.
The query also needs to output some other columns like the name of the site for the work, so that that column can be sorted by (an ORDER BY is appended after all of this).
SELECT
jobs.job_id,
jobs.start_date,
jobs.end_date,
events.time,
sites.name site,
IFNULL(stock_cost,0) stock_cost,
labour,
materials,
labour+materials+plant+expenses revenue,
(labour+materials+plant)-(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)) profit,
((labour+materials+plant)-(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)))/(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)) ratio
FROM
jobs
LEFT JOIN (
SELECT
job_id,
SUM(labour_charge) labour,
SUM(materials_charge) materials,
SUM(plant_hire_charge) plant,
SUM(expenses) expenses
FROM invoices
GROUP BY job_id
ORDER BY NULL
) invoices USING(job_id)
LEFT JOIN (
SELECT
job_id,
SUM(IF(start_onsite && end_onsite,end_onsite-start_onsite,end-start)) time,
SUM(travel+parking+materials) user_expenses
FROM users_events
WHERE type='job'
GROUP BY job_id
ORDER BY NULL
) events USING(job_id)
LEFT JOIN (
SELECT
job_id,
SUM(IFNULL(total,0))*0.01 orders_cost
FROM purchaseorders
GROUP BY job_id
ORDER BY NULL
) purchaseorders USING(job_id)
LEFT JOIN (
SELECT
location job_id,
SUM(amount*cost))*0.01 stock_cost
FROM stock_location
LEFT JOIN stock_items ON stock_items.id=stock_location.stock_id
WHERE location>=3000 AND amount>0 AND cost>0
GROUP BY location
ORDER BY NULL
) stock USING(job_id)
LEFT JOIN contacts_sites sites ON sites.id=jobs.site_id;
I read this: http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html but don't see how/if I can apply anything therein.
For testing purposes, I have tried adding all sorts of indices on fields left, right and centre with no improvement to the EXPLAIN output:
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
| 1 | PRIMARY | jobs | ALL | NULL | NULL | NULL | NULL | 7088 | |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 5038 | |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 6476 | |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 904 | |
| 1 | PRIMARY | <derived5> | ALL | NULL | NULL | NULL | NULL | 531 | |
| 1 | PRIMARY | sites | eq_ref | PRIMARY | PRIMARY | 4 | bestbee_db.jobs.site_id | 1 | |
| 5 | DERIVED | stock_location | ALL | stock,location,amount,…| NULL | NULL | NULL | 5426 | Using where; Using temporary; |
| 5 | DERIVED | stock_items | eq_ref | PRIMARY | PRIMARY | 4 | bestbee_db.stock_location.stock_id | 1 | Using where |
| 4 | DERIVED | purchaseorders | ALL | NULL | NULL | NULL | NULL | 1445 | Using temporary; |
| 3 | DERIVED | users_events | ALL | type,type_job | NULL | NULL | NULL | 11295 | Using where; Using temporary; |
| 2 | DERIVED | invoices | ALL | NULL | NULL | NULL | NULL | 5320 | Using temporary; |
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
The rows produced is 5 x 10^21 (down from 3 x 10^42 before I started optimising this query!)
It currently takes seven seconds to execute (down from 26) but I would like that to be under one second.
By the way: GROUP BY x ORDER BY NULL is a great way to eliminate unnecessary filesorts from subqueries! (from http://www.mysqlperformanceblog.com/2006/09/04/group_concat-useful-group-by-extension/)
Based on your comment to my question, I would do the following...
At the very top...
SELECT STRAIGHT_JOIN (just add the "STRAIGH_JOIN" keyword)
Then, for each of your subqueries for invoices, events, p/o's, etc, change the ORDER BY to the JOB_ID explicitly so it might help the optimization against the primary JOBS table join.
Finally, ensure each of your subquery tables HAS an index on the Job_ID (Invoices, User_events, PurchaseOrders, Stock_Location)
Additionally, for the Stock_Location table, you might want to help the WHERE clause for your subquery by having a compound index on
(job_id, location, amount) Three fields deep should be enough even though you have the key plus 3 where condition elements.

Currently using View, Should I use a hard table instead?

I am currently debating whether my table, mapping_uGroups_uProducts, which is a view formed by the following table:
CREATE ALGORITHM=UNDEFINED DEFINER=`root`#`localhost`
SQL SECURITY DEFINER VIEW `db`.`mapping_uGroups_uProducts`
AS select distinct `X`.`upID` AS `upID`,`Z`.`ugID` AS `ugID` from
((`db`.`mapping_uProducts_Products` `X` join `db`.`productsInfo` `Y`
on((`X`.`pID` = `Y`.`pID`))) join `db`.`mapping_uGroups_Groups` `Z`
on((`Y`.`gID` = `Z`.`gID`)));
My current query is:
SELECT upID FROM uProductsInfo \
JOIN fs_uProducts USING (upID) column \
JOIN mapping_uGroups_uProducts USING (upID) -- could be faster if we use hard table and index \
JOIN mapping_fs_key USING (fsKeyID) \
WHERE fsName="OVERALL" \
AND ugID=1 \
ORDER BY score DESC \
LIMIT 0,30;
which is pretty slow. (for 30 results, it requires about 10 secondes). I think the reason for my query being so slow is definitely due to the fact that that particular query relies on a VIEW which has no index to speed things up.
+----+-------------+----------------+--------+----------------+---------+---------+---------------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+--------+----------------+---------+---------+---------------------------------------+-------+---------------------------------+
| 1 | PRIMARY | mapping_fs_key | const | PRIMARY,fsName | fsName | 386 | const | 1 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 19706 | Using where |
| 1 | PRIMARY | uProductsInfo | eq_ref | PRIMARY | PRIMARY | 4 | mapping_uGroups_uProducts.upID | 1 | Using index |
| 1 | PRIMARY | fs_uProducts | ref | upID | upID | 4 | db.uProductsInfo.upID | 221 | Using where |
| 2 | DERIVED | X | ALL | PRIMARY | NULL | NULL | NULL | 40772 | Using temporary |
| 2 | DERIVED | Y | eq_ref | PRIMARY | PRIMARY | 4 | db.X.pID | 1 | Distinct |
| 2 | DERIVED | Z | ref | PRIMARY | PRIMARY | 4 | db.Y.gID | 2 | Using index; Distinct |
+----+-------------+----------------+--------+----------------+---------+---------+---------------------------------------+-------+---------------------------------+
7 rows in set (0.48 sec)
The explain here looks pretty cryptic, and I don't know whether I should drop view and write a script to just insert everything in the view to a hard table. ( obviously, it will lose the flexibility of the view since the mapping changes quite frequently).
Does anyone have any idea to how I can optimize my schema better?
You current plan uses the view as a driven table: it is scanned for each record in mapping_fs_key with fsName = 'OVERALL'
You could replace the view with this function:
SELECT upID FROM uProductsInfo
JOIN fs_uProducts USING (upID)
JOIN mapping_fs_key USING (fsKeyID)
WHERE fsName='OVERALL'
AND upID IN
(
SELECT upID
FROM mapping_uGroups_Groups Z
JOIN productsInfo Y
ON y.gID = z.gID
JOIN mapping_uProducts_Products X
ON x.pID = y.pID
WHERE z.ugID = 1
)
ORDER BY
score DESC
LIMIT 0,30