I have a query, which is not operating on a lot of data (IMHO) but takes a number of minutes (5-10) to execute and ends up filling the /tmp space (takes up to 20GB) while executing. Once it's finished the space is freed again.
The query is as follows:
SELECT c.name, count(b.id), c.parent_accounting_reference, o.contract, a.contact_person, a.address_email, a.address_phone, a.address_fax, concat(ifnull(concat(a.description, ', '),''), ifnull(concat(a.apt_unit, ', '),''), ifnull(concat(a.preamble, ', '),''), ifnull(addr_entered,'')) FROM
booking b
join visit v on (b.visit_id = v.id)
join super_booking s on (v.super_booking_id = s.id)
join customer c on (s.customer_id = c.id)
join address a on (a.customer_id = c.id)
join customer_number cn on (cn.customer_numbers_id = c.id)
join number n on (cn.number_id = n.id)
join customer_email ce on (ce.customer_emails_id = c.id)
join email e on (ce.email_id = e.id)
left join organization o on (o.accounting_reference = c.parent_accounting_reference)
left join address_type at on (a.type_id = at.id and at.name_key = 'billing')
where s.company_id = 1
and v.expected_start_date between '2015-01-01 00:00:00' and '2015-02-01 00:00:00'
group by s.customer_id
order by count(b.id) desc
And the explain plan for the same is:
+----+-------------+-------+--------+--------------------------------------------------------------+---------------------+---------+--------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------------------------------------------+---------------------+---------+--------------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | s | ref | PRIMARY,FKC4F8739580E01B03,FKC4F8739597AD73B1 | FKC4F8739580E01B03 | 9 | const | 74088 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | ce | ref | FK864C4FFBAF6458E3,customer_emails_id,customer_emails_id_2 | customer_emails_id | 9 | id_dev.s.customer_id | 1 | Using where |
| 1 | SIMPLE | cn | ref | FK530F62CA30E87991,customer_numbers_id,customer_numbers_id_2 | customer_numbers_id | 9 | id_dev.ce.customer_emails_id | 1 | Using where |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 8 | id_dev.s.customer_id | 1 | |
| 1 | SIMPLE | e | eq_ref | PRIMARY | PRIMARY | 8 | id_dev.ce.email_id | 1 | Using index |
| 1 | SIMPLE | n | eq_ref | PRIMARY | PRIMARY | 8 | id_dev.cn.number_id | 1 | Using index |
| 1 | SIMPLE | v | ref | PRIMARY,FK6B04D4BEF4FD9A | FK6B04D4BEF4FD9A | 8 | id_dev.s.id | 1 | Using where |
| 1 | SIMPLE | b | ref | FK3DB0859E1684683 | FK3DB0859E1684683 | 8 | id_dev.v.id | 1 | Using index |
| 1 | SIMPLE | o | ref | org_acct_reference | org_acct_reference | 767 | id_dev.c.parent_accounting_reference | 1 | |
| 1 | SIMPLE | a | ref | FKADDRCUST,customer_address_idx | FKADDRCUST | 9 | id_dev.c.id | 256 | Using where |
| 1 | SIMPLE | at | eq_ref | PRIMARY | PRIMARY | 8 | id_dev.a.type_id | 1 | |
+----+-------------+-------+--------+--------------------------------------------------------------+---------------------+---------+--------------------------------------+-------+----------------------------------------------+
It appears to be using the correct indexes and such so I can't understand why the large usage of /tmp and long execution time.
Your query uses a temporary table, which you can see by the Using temporary; note in the EXPLAIN result. Your MySQL settings are probably configured to use /tmp to store temporary tables.
If you want to optimize the query further, you should probably investigate why the temporary table is needed at all. The best way to do that is gradually simplifying the query until you figure out what is causing it. In this case, probably just the amount of rows needed to be processed, so if you really do need all this data, you probably need the temp table too. But don't give up on optimizing on my account ;)
By the way, on another note, you might want to look into COALESCE for handling NULL values.
You're stuck with a temporary table, because you're doing an aggregate query and then ordering it by one of the results in the aggregate. Your optimizing goal should be to reduce the number of rows and/or columns in that temporary table.
Add an index on visit.expected_start_date. This may help MySQL satisfy your query more quickly, especially if your visit table has many rows that lie outside the date range in your query.
It looks like you're trying to find the customers with the most bookings in a particular date range.
So, let's start with a subquery to summarize the least amount of material from your database.
SELECT count(*) booking_count, s.customer_id
FROM visit v
JOIN super_booking s ON v.super_booking_id = s.id
JOIN booking b ON v.id = b.visit_id
WHERE v.expected_start_date <= '2015-01-01 00:00:00'
AND v.expected_start_date > '2015-02-01 00:00:00'
AND s.company_id = 1
GROUP BY s.customer_id
This gives back a list of booking counts and customer ids for the date range and company id in question. It will be pretty efficient, especially if you put an index on expected_start_date in the visit table
Then, let's join that subquery to the one that pulls out all that information you need.
SELECT c.name, booking_count, c.parent_accounting_reference,
o.contract,
a.contact_person, a.address_email, a.address_phone, a.address_fax,
concat(ifnull(concat(a.description, ', '),''),
ifnull(concat(a.apt_unit, ', '),''),
ifnull(concat(a.preamble, ', '),''),
ifnull(addr_entered,''))
FROM (
SELECT count(*) booking_count, s.customer_id
FROM visit v
JOIN super_booking s ON v.super_booking_id = s.id
JOIN booking b ON v.id = b.visit_id
WHERE v.expected_start_date <= '2015-01-01 00:00:00'
AND v.expected_start_date > '2015-02-01 00:00:00'
AND s.company_id = 1
GROUP BY s.customer_id
) top
join customer c on top.customer_id = c.id
join address a on (a.customer_id = c.id)
join customer_number cn on (cn.customer_numbers_id = c.id)
join number n on (cn.number_id = n.id)
join customer_email ce on (ce.customer_emails_id = c.id)
join email e on (ce.email_id = e.id)
left join organization o on (o.accounting_reference = c.parent_accounting_reference)
left join address_type at on (a.type_id = at.id and at.name_key = 'billing')
order by booking_count DESC
That should speed your work up a whole bunch, by reducing the size of the data you need to summarize.
Note: Beware the trap in date BETWEEN this AND that. You really want
date >= this
AND date < that
because BETWEEN means
date >= this
AND date <= that
Related
I have a select query, that selects over 50k records from MySQL 5.5 database at once, and this amount is expected to grow. The query contains multiple subquery which is taking over 120s to execute.
Initially some of the sale_items and stock tables didn't have more that the ID keys, so I added some more:
SELECT
`p`.`id` AS `id`,
`p`.`Name` AS `Name`,
`p`.`Created` AS `Created`,
`p`.`Image` AS `Image`,
`s`.`company` AS `supplier`,
`s`.`ID` AS `supplier_id`,
`c`.`name` AS `category`,
IFNULL((SELECT
SUM(`stocks`.`Total_Quantity`)
FROM `stocks`
WHERE (`stocks`.`Product_ID` = `p`.`id`)), 0) AS `total_qty`,
IFNULL((SELECT
SUM(`sale_items`.`quantity`)
FROM `sale_items`
WHERE (`sale_items`.`product_id` = `p`.`id`)), 0) AS `total_sold`,
IFNULL((SELECT
SUM(`sale_items`.`quantity`)
FROM `sale_items`
WHERE ((`sale_items`.`product_id` = `p`.`id`) AND `sale_items`.`Sale_ID` IN (SELECT
`refunds`.`Sale_ID`
FROM `refunds`))), 0) AS `total_refund`
FROM ((`products` `p`
LEFT JOIN `cats` `c`
ON ((`c`.`ID` = `p`.`cat_id`)))
LEFT JOIN `suppliers` `s`
ON ((`s`.`ID` = `p`.`supplier_id`)))
This is the explain result
+----+--------------------+------------+----------------+------------------------+------------------------+---------+---------------------------------
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+----------------+------------------------+------------------------+---------+---------------------------------
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 20981 | |
| 2 | DERIVED | p | ALL | NULL | NULL | NULL | NULL | 20934 | |
| 2 | DERIVED | c | eq_ref | PRIMARY | PRIMARY | 4 | p.cat_id | 1 | |
| 2 | DERIVED | s | eq_ref | PRIMARY | PRIMARY | 4 | p.supplier_id | 1 | |
| 5 | DEPENDENT SUBQUERY | sale_items | ref | sales_items_product_id | sales_items_product_id | 5 | p.id | 33 | Using where |
| 6 | DEPENDENT SUBQUERY | refunds | index_subquery | IDX_refunds_sale_id | IDX_refunds_sale_id | 5 | func | 1 | Using index; Using where |
| 4 | DEPENDENT SUBQUERY | sale_items | ref | sales_items_product_id | sales_items_product_id | 5 | p.id | 33 | Using where |
| 3 | DEPENDENT SUBQUERY | stocks | ref | IDX_stocks_product_id | IDX_stocks_product_id | 5 | p.id | 1 | Using where |
+----+--------------------+------------+----------------+------------------------+------------------------+---------+---------------------------------
I am expecting that the query takes less that 3s at most, but I can't seem to figure out the best way to optimize this query.
The query looks fine to me. You select all data and aggregate some of it. This takes time. Your explain plan shows there are indexes on the IDs, which is good. And at a first glance there is not much we seem to be able to do here...
What you can do, though, is provide covering indexes, i.e. indexes that contain all columns you need from a table, so the data can be taken from the index directly.
create index idx1 on cats(id, name);
create index idx2 on suppliers(id, company);
create index idx3 on stocks(product_id, total_quantity);
create index idx4 on sale_items(product_id, quantity, sale_id);
This can really boost your query.
What you can try About the query itself is to move the subqueries to the FROM clause. MySQL's optimizer is not great, so although it should get the same execution plan, it may well be that it favors the FROM clause.
SELECT
p.id,
p.name,
p.created,
p.image,
s.company as supplier,
s.id AS supplier_id,
c.name AS category,
COALESCE(st.total, 0) AS total_qty,
COALESCE(si.total, 0) AS total_sold,
COALESCE(si.refund, 0) AS total_refund
FROM products p
LEFT JOIN cats c ON c.id = p.cat_id
LEFT JOIN suppliers s ON s.id = p.supplier_id
LEFT JOIN
(
SELECT SUM(total_quantity) AS total
FROM stocks
GROUP BY product_id
) st ON st.product_id = p.id
LEFT JOIN
(
SELECT
SUM(quantity) AS total,
SUM(CASE WHEN sale_id IN (SELECT sale_id FROM refunds) THEN quantity END) as refund
FROM sale_items
GROUP BY product_id
) si ON si.product_id = p.id;
(If sale_id is unique in refunds, then you can even join it to sale_items. Again: this should usually not make a difference, but in MySQL it may still. MySQL was once notorious for treating IN clauses much worse than the FROM clause. This may not be the case anymore, I don't know. You can try - if refunds.sale_id is unique).
I have the following query which is working correctly, however it is running very poorly. I am suspecting that my issue is with the two comparison conditions in the INNER JOIN statement. Both of the fields have an index, but the query optimiser in MySQL seems to be ignoring them. Here is my query:
EDIT: Changed query to use the one suggested below by Gordon, as it has kept the same results but is performing faster. EXPLAIN statement is still not happy though, and the output is shown below.
SELECT a.id
FROM pc a INNER JOIN
(SELECT correction_value, MAX(seenDate) mxdate
FROM pc FORCE INDEX (IDX_SEENDATE)
WHERE seenDate BETWEEN '2017-03-01' AND '2017-04-01'
GROUP BY correction_value
) b
ON a.correction_value = b.correction_value AND
a.seenDate = b.mxdate INNER JOIN
cameras c
ON c.camera_id = a.camerauid
WHERE c.in_out = 0;
EXPLAIN
+----+-------------+------------+------------+-------+-------------------+--------------+---------+----------+---------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+-------------------+--------------+---------+----------+---------+----------+---------------------------------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 2414394 | 100 | Using where; |
| | | | | | | | | | | | Using temporary; |
| | | | | | | | | | | | Using filesort |
+----+-------------+------------+------------+-------+-------------------+--------------+---------+----------+---------+----------+---------------------------------------+
| 1 | PRIMARY | a | NULL | ref | correction_value, | idx_seenDate | 5 | b.mxdate | 1 | 3.8 | Using where |
| | | | | | idx_seenDate, | | | | | | |
| | | | | | fk_camera_idx | | | | | | |
+----+-------------+------------+------------+-------+-------------------+--------------+---------+----------+---------+----------+---------------------------------------+
| 1 | PRIMARY | c | NULL | ALL | PRIMARY | NULL | NULL | NULL | 41 | 2.44 | Using where; |
| | | | | | | | | | | | Using join buffer (Block Nested Loop) |
+----+-------------+------------+------------+-------+-------------------+--------------+---------+----------+---------+----------+---------------------------------------+
| 2 | DERIVED | pc | NULL | range | correction_value, | idx_seenDate | 5 | NULL | 2414394 | 100 | Using index Condition; |
| | | | | | idx_seenDate | | | | | | Using temporary; |
| | | | | | | | | | | | Using filesort |
+----+-------------+------------+------------+-------+-------------------+--------------+---------+----------+---------+----------+---------------------------------------+
How can the query be optimised but still have the same outcome?
Let's start by focusing on the subquery.
SELECT correction_value,
MAX(seenDate) mxdate
FROM pc
WHERE seenDate BETWEEN '2017-03-01' AND '2017-04-01'
GROUP BY correction_value
Please run that with twice, with
INDEX sc (seenDate, correction_value)
INDEX cs (correction_value, seenDate)
Please FORCE one index, then the other. Depending on what version of MySQL you are running, one of the indexes will work better than the other.
I think that later versions will prefer "cs" because it can leapfrog through the index very efficiently.
Once you have determined which composite index to use, then remove the FORCE and the unused index, then try the entire query. The same index should do fine for the combined query.
Since your task seems to involve a "groupwise max", I suggest you see if there are performance tips here: http://mysql.rjweb.org/doc.php/groupwise_max
Try this
SELECT
a.id
FROM pc a
INNER JOIN
(SELECT correction_value, MAX(seenDate) mxdate
FROM pc
INNER JOIN cameras ON (cameras.camera_id = pc.camerauid AND cameras.in_out = 0)
WHERE pc.seenDate BETWEEN '2017-03-01' AND '2017-04-01'
GROUP BY correction_value) b ON (a.correction_value = b.correction_value AND a.seenDate = b.mxdate);
use index on pc.seenDate column.
I would start by writing the query as:
SELECT a.id
FROM pc a INNER JOIN
(SELECT correction_value, MAX(seenDate) mxdate
FROM pc
WHERE seenDate BETWEEN '2017-03-01' AND '2017-04-01'
GROUP BY correction_value
) b
ON a.correction_value = b.correction_value AND
a.seenDate = b.mxdate INNER JOIN
cameras c
ON c.camera_id = a.camerauid
WHERE c.in_out = 0; - don't use single quotes if `in_out` is a number
The place to start with this query is to have indexes: pc(seendate, correction_value, seendate) and cameras(camera_id, in_out).
There may also be ways to rewrite the query, if that is not sufficient.
RDBMS uses the output of the first query as a input of the next query. So, if we look at the derived query, it is using a filter so we can use that as a first query, then join to the pc then join to the camera table.
Indexes: mentioned by Gordon Linof or pc(id, correction_value, seendate) and cameras(camera_id, in_out)
The final query could be rewrite as below:
SELECT a.id
--add any other column here, you want to show in the EXPLAINED output
FROM
(
SELECT id, correction_value, MAX(seenDate) mxdate
FROM pc
WHERE seenDate BETWEEN '2017-03-01' AND '2017-04-01'
GROUP BY correction_value
) a
INNER JOIN pc b
ON a.correction_value = b.correction_value
AND a.seenDate = b.mxdate
INNER JOIN cameras c
ON c.camera_id = a.camerauid
WHERE c.in_out = 0;
From your question it is not clear how the tables are indexed, but in this subquery
(SELECT correction_value, MAX(seenDate) mxdate
FROM pc FORCE INDEX (IDX_SEENDATE)
WHERE seenDate BETWEEN '2017-03-01' AND '2017-04-01'
GROUP BY correction_value
) b
you want to have a composite index on both fields seenDate, correction_value:
CREATE INDEX seenCorr_ndx ON pc (seenDate, correction_value);
(you can drop any index on seenDate alone, and I expect you do not need the FORCE INDEX either).
You may end up needing two composite indexes, one with seenDate first, one with correction_value first.
I have a report that needs running to satisfy our reporting requirements for a government body. The report is supposed to return the study load for each student in each module for a given period of time.
For example the report needs to return the students enrolled in a given module for a given intake in a given year and semester, with a census date (a government specified date that after which the student is liable for the cost of the unit even if they withdraw)
So I've written this mysql query
SELECT
e.enrolstudent AS '313',
(SELECT c.ntiscode FROM course c WHERE c.courseid=ec.courseid) AS '307',
e.startdate as '534',
'AOU' as '333',
m.mod_eftsl as '339',
e.enrolmod as '354',
e.census_date as '489',
m.diciplinecode as '464',
(CASE
WHEN m.mode = 'Face to Face' THEN 1
WHEN m.mode = 'Online' THEN 2
WHEN m.mode = 'RPL' THEN 5
ELSE 3
END) AS '329',
'A6090' as '477',
up.citizen AS '358',
vf.maxcontribute as '392',
vf.studentstatus as '490',
vf.total_amount_charged as '384',
vf.amount_paid as '381',
vf.loan_fee as '529',
u.chessn as '488',
m.workexp as '337',
'0' as '390',
m.sumwinschool as '551',
vf.help_debt as '558'
FROM
enrolment e
INNER JOIN enrolcourse AS ec ON ec.studentid=e.enrolstudent
INNER JOIN vetfee AS vf ON vf.userid=e.enrolstudent
INNER JOIN users AS u ON u.userid = e.enrolstudent
INNER JOIN users_personal AS up ON up.userid = e.enrolstudent
INNER JOIN module AS m ON m.modshortname = e.enrolmod
WHERE
e.online_intake in (select oi.intakecode from online_intake oi where STR_TO_DATE(oi.censusdate,'%d-%m-%Y') > '2015-07-01' and STR_TO_DATE(oi.censusdate,'%d-%m-%Y') < '2015-09-31') AND
e.enrolstudent NOT LIKE '%onlinetutor%' AND
e.enrolstudent NOT LIKE '%tes%' AND
e.enrolstudent NOT like '%student%' AND
e.enrolrole = 'student'
ORDER BY e.enrolstudent;"
It seems to hang, I've left it running for an hour with no result. There's only 10189 records in th enrolment table, 1538 in enrolcourse,650 in module. I don't think its the number of records, I'm guessing I've just constructed my query wrong, first time using joins (other than natural). Any ideas or tips in improving this would be greatly appreciated.
select count(*) from enrolment;
+----------+
| count(*) |
+----------+
| 10189 |
+----------+
select count(*) from enrolcourse;
+----------+
| count(*) |
+----------+
| 1538 |
+----------+
select count(*) from vetfee;
+----------+
| count(*) |
+----------+
| 1538 |
+----------+
select count(*) from users;
+----------+
| count(*) |
+----------+
| 1249 |
+----------+
select count(*) from users_personal;
+----------+
| count(*) |
+----------+
| 941 |
+----------+
select count(*) from module;
+----------+
| count(*) |
+----------+
| 650 |
Here's the results of the EXPLAIN
+----+--------------------+-------+------+---------------+------+---------+------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+------+---------+------+-------+---------------------------------+
| 1 | PRIMARY | m | ALL | NULL | NULL | NULL | NULL | 691 | Using temporary; Using filesort |
| 1 | PRIMARY | up | ALL | NULL | NULL | NULL | NULL | 987 | Using join buffer |
| 1 | PRIMARY | u | ALL | NULL | NULL | NULL | NULL | 1180 | Using where; Using join buffer |
| 1 | PRIMARY | ec | ALL | NULL | NULL | NULL | NULL | 1607 | Using where; Using join buffer |
| 1 | PRIMARY | e | ALL | NULL | NULL | NULL | NULL | 10629 | Using where; Using join buffer |
| 1 | PRIMARY | vf | ALL | NULL | NULL | NULL | NULL | 10959 | Using where; Using join buffer |
| 3 | DEPENDENT SUBQUERY | oi | ALL | NULL | NULL | NULL | NULL | 42 | Using where |
| 2 | DEPENDENT SUBQUERY | c | ALL | NULL | NULL | NULL | NULL | 23 | Using where |
+----+--------------------+-------+------+---------------+------+---------+------+-------+---------------------------------+
Get rid of those correlated subqueries. Use a join instead.
Also, use BETWEEN to reduce one STR_TO_DATE call
Finally, you should look at a way of eliminating all those LIKE calls.
SELECT
e.enrolstudent AS '313',
c.ntiscode AS '307',
e.startdate as '534',
'AOU' as '333',
m.mod_eftsl as '339',
e.enrolmod as '354',
e.census_date as '489',
m.diciplinecode as '464',
(CASE
WHEN m.mode = 'Face to Face' THEN 1
WHEN m.mode = 'Online' THEN 2
WHEN m.mode = 'RPL' THEN 5
ELSE 3
END) AS '329',
'A6090' as '477',
up.citizen AS '358',
vf.maxcontribute as '392',
vf.studentstatus as '490',
vf.total_amount_charged as '384',
vf.amount_paid as '381',
vf.loan_fee as '529',
u.chessn as '488',
m.workexp as '337',
'0' as '390',
m.sumwinschool as '551',
vf.help_debt as '558'
FROM
enrolment e
INNER JOIN enrolcourse AS ec ON ec.studentid=e.enrolstudent
INNER JOIN course AS c ON c.courseid = ec.courseid
INNER JOIN vetfee AS vf ON vf.userid=e.enrolstudent
INNER JOIN users AS u ON u.userid = e.enrolstudent
INNER JOIN users_personal AS up ON up.userid = e.enrolstudent
INNER JOIN module AS m ON m.modshortname = e.enrolmod
INNER JOIN online_intake oi ON oi.intakecode = e.online_intake
AND STR_TO_DATE(oi.censusdate, '%d-%m-%Y') BETWEEN '2015-07-01' AND '2015-09-31'
WHERE e.enrolstudent NOT LIKE '%onlinetutor%'
AND e.enrolstudent NOT LIKE '%tes%'
AND e.enrolstudent NOT like '%student%'
AND e.enrolrole = 'student'
ORDER BY e.enrolstudent;
Given your posted EXPLAIN output, you'll also want to add the following indexes:
ALTER TABLE enrolment
ADD INDEX (enrolstudent),
ADD INDEX (enrolmod),
ADD INDEX (online_intake);
ALTER TABLE enrolcourse
ADD INDEX (studentid),
ADD INDEX (courseid);
ALTER TABLE course
ADD INDEX (courseid);
ALTER TABLE vetfee
ADD INDEX (userid);
ALTER TABLE users
ADD INDEX (userid);
ALTER TABLE users_personal
ADD INDEX (userid);
ALTER TABLE module
ADD INDEX (modshortname);
ALTER TABLE online_intake
ADD INDEX (intakecode);
below is a query I use to get the latest record per serverID unfortunately this query does take endless to process. According to the stackoverflow question below it should be a very fast solution. Is there any way to speed up this query or do I have to split it up? (first get all serverIDs than get the last record for each server)
Retrieving the last record in each group
SELECT s1.performance, s1.playersOnline, s1.serverID, s.name, m.modpack, m.color
FROM stats_server s1
LEFT JOIN stats_server s2
ON (s1.serverID = s2.serverID AND s1.id < s2.id)
INNER JOIN server s
ON s1.serverID=s.id
INNER JOIN modpack m
ON s.modpack=m.id
WHERE s2.id IS NULL
ORDER BY m.id
15 rows in set (34.73 sec)
Explain:
+------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+------------------+
1 row in set, 1 warning (0.00 sec)
Sample Output:
+-------------+---------------+----------+---------------+-------------------------+--------+
| performance | playersOnline | serverID | name | modpack | color |
+-------------+---------------+----------+---------------+-------------------------+--------+
| 99 | 18 | 15 | hub | Lobby | AAAAAA |
| 98 | 12 | 10 | horizons | Horizons | AA00AA |
| 97 | 6 | 11 | m_lobby | Monster | AA0000 |
| 99 | 1 | 12 | m_north | Monster | AA0000 |
| 86 | 10 | 13 | m_south | Monster | AA0000 |
| 87 | 17 | 14 | m_east | Monster | AA0000 |
| 98 | 10 | 16 | m_west | Monster | AA0000 |
| 84 | 7 | 5 | tppi | Test Pack Please Ignore | 55FFFF |
| 95 | 15 | 6 | agrarian_plus | Agrarian Skies | 00AA00 |
| 98 | 23 | 7 | agrarian2 | Agrarian Skies | 00AA00 |
| 74 | 18 | 9 | agrarian | Agrarian Skies | 00AA00 |
| 97 | 37 | 17 | agrarian3 | Agrarian Skies | 00AA00 |
| 99 | 17 | 3 | bteam_pvp | Attack of the B-Team | FFAA00 |
| 73 | 44 | 8 | bteam_pve | Attack of the B-Team | FFAA00 |
| 93 | 11 | 4 | crackpack | Crackpack | EFEFEF |
+-------------+---------------+----------+---------------+-------------------------+--------+
15 rows in set (38.49 sec)
Sample Data:
http://www.mediafire.com/download/n0blj1io0c503ig/mym_bridge.sql.bz2
Edit
Ok I solved it. Here is expanded rows showing your original slow query:
And here is a fast query using MAX() with GROUP BY that gives the identical results. Please try it for yourself.
SELECT s1.id
,s1.performance
,s1.playersOnline
,s1.serverID
,s.name
,m.modpack
,m.color
FROM stats_server s1
JOIN (
SELECT MAX(id) as 'id'
FROM stats_server
GROUP BY serverID
) AS s2
ON s1.id = s2.id
JOIN server s
ON s1.serverID = s.id
JOIN modpack m
ON s.modpack = m.id
ORDER BY m.id
I would phrase this query using not exists:
SELECT ss.performance, ss.playersOnline, ss.serverID, s.name, m.modpack, m.color
FROM stats_server ss INNER JOIN
server s
ON ss.serverID = s.id INNER JOIN
modpack m
ON s.modpack = m.id
WHERE NOT EXISTS (select 1
from stats_server ss2
where ss2.serverID = ss.serverID AND ss2.id > ss.id
)
Apart from the primary key indexes on server and modpack (which I assume are there), you also want an index on stats_server(ServerId, id). This index should also help your version of the query.
Am I missing something? Why wouldn't a standard uncorrelated subquery work?
SELECT x.id, x.performance, x.playersOnline, s.name, m.modpack, m.color, x.timestamp
FROM stats_server x
JOIN
( SELECT serverid, MAX(id) maxid FROM stats_server GROUP BY serverid ) y
ON y.serverid = x.serverid AND y.maxid = x.id
JOIN server s
ON x.serverID=s.id
JOIN modpack m
ON s.modpack=m.id
I'm guessing that you really want this (notice the order of the joins and the join criteria), and this matches the indexes that you've created:
SELECT s1.performance, s1.playersOnline, s1.serverID, s.name, m.modpack, m.color
FROM server s
INNER JOIN stats_server s1
ON s1.serverID = s.id
LEFT JOIN stats_server s2
ON s2.serverID = s.id AND s2.id > s1.id
INNER JOIN modpack m
ON m.id = s.modpack
WHERE s2.id IS NULL
ORDER BY m.id
MySQL doesn't always inner join the tables in the order that you write them in the query since the order doesn't really matter for the result set (though it can affect index use).
With no usable index specified in the WHERE clause, MySQL might want to start with the table with the least number of rows (maybe stats_server in this case). With the ORDER BY clause, MySQL might want to start with modpack so it doesn't have to order the results later.
MySQL picks the execution plan then sees if it has the proper index for joining rather than seeing what indexes it has to join on then picking the execution plan. MySQL doesn't just automatically pick the plan that matches your indexes.
STRAIGHT_JOIN tells MySQL in what order to join the tables so that it uses the indexes that you expect it to use:
SELECT s1.performance, s1.playersOnline, s1.serverID, s.name, m.modpack, m.color
FROM server s
STRAIGHT_JOIN stats_server s1
ON s1.serverID = s.id
LEFT JOIN stats_server s2
ON s2.serverID = s.id AND s2.id > s1.id
STRAIGHT_JOIN modpack m
ON m.id = s.modpack
WHERE s2.id IS NULL
ORDER BY m.id
I don't know what indexes you've defined since you've not provided an EXPLAIN result or shown your indexes, but this should give you some idea on how to improve the situation.
Given this result-set:
mysql> EXPLAIN SELECT c.cust_name, SUM(l.line_subtotal) FROM customer c
-> JOIN slip s ON s.cust_id = c.cust_id
-> JOIN line l ON l.slip_id = s.slip_id
-> JOIN vendor v ON v.vend_id = l.vend_id WHERE v.vend_name = 'blahblah'
-> GROUP BY c.cust_name
-> HAVING SUM(l.line_subtotal) > 49999
-> ORDER BY c.cust_name;
+----+-------------+-------+--------+---------------------------------+---------------+---------+----------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------------------+---------------+---------+----------------------+------+----------------------------------------------+
| 1 | SIMPLE | v | ref | PRIMARY,idx_vend_name | idx_vend_name | 12 | const | 1 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | l | ref | idx_vend_id | idx_vend_id | 4 | csv_import.v.vend_id | 446 | |
| 1 | SIMPLE | s | eq_ref | PRIMARY,idx_cust_id,idx_slip_id | PRIMARY | 4 | csv_import.l.slip_id | 1 | |
| 1 | SIMPLE | c | eq_ref | PRIMARY,cIndex | PRIMARY | 4 | csv_import.s.cust_id | 1 | |
+----+-------------+-------+--------+---------------------------------+---------------+---------+----------------------+------+----------------------------------------------+
4 rows in set (0.04 sec)
I'm a bit baffled as to why the query referenced by this EXPLAIN statement is still taking about a minute to execute. Isn't it true that this query only has to search through 449 rows? Anyone have any idea as to what could be slowing it down so much?
I think the having sum() is the root of all evil. This forces to mysql to make two joins (from customer to slip and then to line) to get the value of the sum. After this it has to retrieve all the data to properly filter by the sum() value to get a meaningful result.
It might be optimized to the following and probably get better response times:
select c.cust_name,
grouping.line_subtotal
from customer c join
(select c.cust_id,
l.vend_id,
sum(l.line_subtotal) as line_subtotal
from slip s join line l on s.slip_id = l.slip_id
group by c.cust_id, l.vend_id) grouping
on c.cust_id = grouping.cust_id
join vendor v on v.vend_id = grouping.vend_id
where v.vend_name = 'blablah'
and grouping.line_subtotal > 499999
group by c.cust_name
order by c.cust_name;
In other words, create a sub-select that does all the necessary grouping before making the real query.
You can run your select vendor query first, and then join the results with the rest:
SELECT c.cust_name, SUM(l.line_subtotal) FROM customer c
-> JOIN slip s ON s.cust_id = c.cust_id
-> JOIN line l ON l.slip_id = s.slip_id
-> JOIN (SELECT * FROM vendor WHERE vend_name='blahblah') v ON v.vend_id = l.vend_id
-> GROUP BY c.cust_name
-> HAVING SUM(l.line_subtotal) > 49999
-> ORDER BY c.cust_name;
Also, do vend_name and/or cust_name have an index? That might be an issue here.