MySQL query with join optimization - mysql

I have a query:
SELECT a.nick,grp,count(*) FROM help_mails h JOIN accounts a ON h.helper=a.id WHERE closed=1 GROUP BY helper, grp, a.nick
What is wrong with this join?
When I made 2 queries:
SELECT helper,grp,count(*) FROM help_mails h WHERE closed=1 GROUP BY helper, grp;
SELECT nick FROM accounts WHERE id IN (...)
It is 100 times faster.
EXPLAIN returns this:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE h ref closed closed 1 const 1846 Using temporary; Using filesort
1 SIMPLE a ref PRIMARY PRIMARY 4 margonem.h.helper 1 Using where; Using index
accounts.id, help_mails.grp and help_mails.closed are indexed.

Note that your first query is not the same as the second pair of queries.
If two accounts have the same NICK, the COUNT(*) values for these accounts will be merged together in the first query and returned separately in the second one.
If you want separate counts for separate accounts to always be returned, you may combine your queries into one:
SELECT a.nick, grp, cnt
FROM (
SELECT helper, grp, COUNT(*) AS cnt
FROM help_mails h
WHERE closed = 1
GROUP BY
helper, grp
) ho
JOIN accounts a
ON a.id = ho.helper
or change a GROUP BY condition for the first query:
SELECT a.nick, grp, count(*)
FROM help_mails h
JOIN accounts a
ON h.helper = a.id
WHERE closed = 1
GROUP BY
helper, grp, a.id, a.nick
Building a composite index on help_mails (closed, helper, grp) will help you a lot, since it will be used in GROUP BY.

It looks like what's wrong is that help_mails.helper isn't indexed.

Related

Super slow SQL query when `WHERE` and `OR` are used together [duplicate]

The following query takes MySQL almost 7 times longer to execute than getting the same result with two separate queries that avoid OR in the WHERE clause. I would prefer a single query, since then I can sort and group everything.
Here is the problematic query:
EXPLAIN SELECT *
FROM `posts`
LEFT JOIN `teams_users`
ON (teams_users.team_id=posts.team_id
AND teams_users.user_id='7135')
WHERE (teams_users.status='1'
OR posts.user_id='7135');
Result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE posts ALL user_id NULL NULL NULL 169642
1 SIMPLE teams_users eq_ref PRIMARY PRIMARY 8 posts.team_id,const 1 Using where
Now if I do the following two queries instead, the aggregate execution time, as said, is shorter by 7 times:
EXPLAIN SELECT *
FROM `posts`
LEFT JOIN `teams_users`
ON (teams_users.team_id=posts.team_id
AND teams_users.user_id='7135')
WHERE (teams_users.status='1');
Result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE teams_users ref PRIMARY,status status 1 const 5822 Using where
1 SIMPLE posts ref team_id team_id 5 teams_users.team_id 9 Using where
and:
EXPLAIN SELECT *
FROM `posts`
LEFT JOIN `teams_users`
ON (teams_users.team_id=posts.team_id
AND teams_users.user_id='7135')
WHERE (posts.user_id='7135');
Result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE posts ref user_id user_id 4 const 142
1 SIMPLE teams_users eq_ref PRIMARY PRIMARY 8 posts.team_id,const 1
Obviously the amount of scanned rows is much lower on the two queries.
Why is the initial query slow?
Thanks.
Yes, OR is frequently a performance-killer. A common work-around is to do UNION. For your example:
SELECT *
FROM `posts`
LEFT JOIN `teams_users`
ON (teams_users.team_id=posts.team_id
AND teams_users.user_id='7135')
WHERE (teams_users.status='1')
UNION DISTINCT
SELECT *
FROM `posts`
LEFT JOIN `teams_users`
ON (teams_users.team_id=posts.team_id
AND teams_users.user_id='7135')
WHERE (posts.user_id='7135');
If you are sure there are no duplicates, change to the faster UNION ALL.
If you are not fishing for missing teams_users rows, use JOIN instead of LEFT JOIN.
If you need ORDER BY, add some parens:
( SELECT ... )
UNION ...
( SELECT ... )
ORDER BY ...
Otherwise, the ORDER BY would apply only to the second SELECT. (If you also need 'pagination', see my blog.)
Please note that you might also need LIMIT in certain circumstances.
The queries without the OR clause are both sargable. That is, they both can be satisfied using indexes.
The query with the OR would be sargable if the MySQL query planner contained logic to figure out that it can rewrite it as the UNION ALL of two queries. But the MySQL query planner doesn't (yet) have that kind of logic.
So, it does table scans to get the result set. Those are often very slow.
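As a rough check of the UNION rewrite discussed above, here is a minimal sketch that compares the OR query with its UNION form on a toy dataset. It uses Python's built-in sqlite3 as a stand-in for MySQL, so it only demonstrates that the two forms return the same rows, not how MySQL plans them. The sample rows and the explicit column list are assumptions made up for the illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; the table contents are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, team_id INTEGER, user_id INTEGER);
CREATE TABLE teams_users (team_id INTEGER, user_id INTEGER, status INTEGER,
                          PRIMARY KEY (team_id, user_id));
INSERT INTO posts VALUES (1, 10, 7135), (2, 10, 1), (3, 20, 7135), (4, 30, 2);
INSERT INTO teams_users VALUES (10, 7135, 1), (20, 7135, 0), (30, 99, 1);
""")

# Shared SELECT ... LEFT JOIN part of all three queries.
JOINED = """
SELECT posts.id, posts.team_id, posts.user_id, teams_users.status
FROM posts
LEFT JOIN teams_users
  ON teams_users.team_id = posts.team_id AND teams_users.user_id = 7135
"""
BY_STATUS = JOINED + " WHERE teams_users.status = 1"
BY_AUTHOR = JOINED + " WHERE posts.user_id = 7135"

# Original form: one query with OR.
or_rows = conn.execute(
    JOINED + " WHERE teams_users.status = 1 OR posts.user_id = 7135"
).fetchall()

# Rewritten form: UNION removes the row that matches both branches.
union_rows = conn.execute(BY_STATUS + " UNION " + BY_AUTHOR).fetchall()

# UNION ALL keeps that row twice, so it is only safe without overlap.
union_all_rows = conn.execute(BY_STATUS + " UNION ALL " + BY_AUTHOR).fetchall()

print(sorted(or_rows) == sorted(union_rows))
print(len(union_rows), len(union_all_rows))
```

In this toy data, post 1 satisfies both conditions, which is why UNION ALL returns one row more than UNION.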

How to get rows even if count is 0? Only 1 table

I want the count even if the count is 0. My current query is
SELECT `id`,count(0) as `fetchpc` FROM `user` WHERE pid in('4,6,7,8') GROUP BY `id`
But it returns only those id where count is greater than 0
Edit:
The values used for in('4,6,7,8') are first fetched from the database in another query, and a script then converts those rows to 4,6,7,8.
So all the values are present in the database.
It is also possible for the list to grow to 100+ values.
You could left join this query on a "fictive" query that queries these IDs as literals:
SELECT ids.id, COALESCE(cnt, 0)
FROM (SELECT 4 AS id
UNION ALL
SELECT 6 AS id
UNION ALL
SELECT 7 AS id
UNION ALL
SELECT 8 AS id) ids
LEFT JOIN (SELECT id, COUNT(*) AS cnt
FROM `user`
GROUP BY id) t ON t.id = ids.id
You can use a derived table. I would recommend:
SELECT i.id, COUNT(u.id) as fetchpc
FROM (SELECT 4 as id UNION ALL
SELECT 6 as id UNION ALL
SELECT 7 as id UNION ALL
SELECT 8 as id
) i LEFT JOIN
`user` u
ON u.id = i.id
GROUP BY i.id;
From a performance perspective, this is much better than aggregating first (in a subquery) and then joining. Basically, the aggregation (in that case) has to aggregate all the data and afterwards filter out the unnecessary rows.
This formulation filters the rows first, which should speed the aggregation.
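Since the question mentions that the list can grow to 100+ values, here is a minimal sketch of building the derived table of ids from application code and left joining against it. It uses Python's built-in sqlite3 as a stand-in for MySQL. Joining on pid (the column filtered in the question) is an assumption about the intent, and the sample rows are invented.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are made up for the illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "user" (id INTEGER PRIMARY KEY, pid INTEGER);
INSERT INTO "user" (pid) VALUES (4), (4), (6), (9);
""")

wanted = [4, 6, 7, 8]

# One "SELECT ? AS id" per wanted value, glued together with UNION ALL.
ids_sql = " UNION ALL ".join("SELECT ? AS id" for _ in wanted)

query = f"""
SELECT i.id, COUNT(u.id) AS fetchpc
FROM ({ids_sql}) i
LEFT JOIN "user" u ON u.pid = i.id   -- assumption: the list holds pid values
GROUP BY i.id
ORDER BY i.id
"""

# COUNT(u.id) skips the NULLs produced by the LEFT JOIN, giving 0 for misses.
counts = conn.execute(query, wanted).fetchall()
print(counts)
```

Counting u.id rather than * is what makes unmatched ids show up with 0 instead of 1.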

MySQL - Slow Query when kept a View

In MySQL, I have a simple join between 2 tables. Something like
select a.id, SUM(b.qty) from a inner join b on a.id=b.id
where a.id=12345
group by a.id
It runs fine as a plain query. But when I keep the query
select a.id, SUM(b.qty) from a inner join b on a.id=b.id
group by a.id
in a view called view_ab, the view takes an enormous amount of time when I run the following query on it.
select * from view_ab where id = 12345
Both of these tables are large. I am unable to figure out the reason for such a drop in performance. Please help me resolve this performance issue.
EDIT:
This is the view SQL
CREATE VIEW view_ab AS SELECT
r.drid AS drid,
SUM(s.return_qty) AS return_qty
FROM tbl_deliveryroute r INNER JOIN tbl_deliveryroute_sku s ON r.drid =
s.drid GROUP BY r.drid;
This is the query
SELECT
r.drid AS drid,
SUM(s.return_qty) AS return_qty
FROM tbl_deliveryroute r INNER JOIN tbl_deliveryroute_sku s ON r.drid =
s.drid WHERE r.drid=12718651
GROUP BY r.drid;
This is the query on the VIEW
SELECT * FROM view_ab WHERE drid=12718651;
Execution plan of the view
EXPLAIN EXTENDED SELECT * FROM view_ab WHERE drid=12718651;
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY (NULL) ref 4 const 10 100.00 (NULL)
2 DERIVED s (NULL) ALL idx_tbl_deliverroute_sku_drid (NULL) (NULL) (NULL) 15060913 100.00 USING TEMPORARY; USING filesort
2 DERIVED r (NULL) eq_ref PRIMARY,FK_tbl_deliveryroute_1 PRIMARY 4 humdemotest.s.drid 1 100.00 USING INDEX
EXPLAIN EXTENDED SELECT
r.drid AS drid,
SUM(s.return_qty) AS return_qty
FROM tbl_deliveryroute r INNER JOIN tbl_deliveryroute_sku s ON r.drid =
s.drid WHERE r.drid=12718651
GROUP BY r.drid;
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE r (NULL) const PRIMARY PRIMARY 4 const 1 100.00 USING INDEX
1 SIMPLE s (NULL) ref idx_tbl_deliverroute_sku_drid idx_tbl_deliverroute_sku_drid 4 const 22 100.00 (NULL)
From what I can see, you don't even need the join: you are joining A to B on the same key column, and that key already exists in table B, so you can query B alone and group by it. I would also make sure tbl_deliveryroute_sku has an index on its route ID column.
SELECT
s.drid,
sum( s.return_qty ) Return_Qty
from
tbl_DeliveryRoute_Sku s
where
s.drID = 12718651
group by
s.drID;
Since you are only selecting the key and the sum, you don't even NEED the other table. If you needed columns from the first table OTHER THAN the key, then yes, you would need the join. You could even simplify a step further, since you are only querying a single key ID:
SELECT
sum( s.return_qty ) Return_Qty
from
tbl_DeliveryRoute_Sku s
where
s.drID = 12718651;
The reason the view is slow is simple. You are executing:
SELECT *
FROM view_ab
WHERE drid = 12718651;
What you want to execute is:
select a.id, SUM(b.qty)
from a inner join
b
on a.id = b.id
where a.id = 12345
group by a.id;
What is actually being executed is:
select ab.*
from (select a.id, SUM(b.qty)
from a inner join
b
on a.id = b.id
group by a.id
) ab
where ab.id = 12345;
That is, the entire aggregation is performed first, and only then is the where applied. What you want is for the predicate to be pushed down into the view's query (MySQL calls this merging). You can review the documentation on this subject.
One solution would seem to be rephrasing the query as a correlated subquery:
select a.id,
(select sum(b.qty) from b where b.id = a.id) as qty
from a
where a.id = 12345;
Alas, subqueries in the select have the same effect, so this doesn't work.
I don't know of a solution using a view. You can avoid using views for this. The ultimate solution would be to implement a trigger to store the summarized results in another table -- effectively materializing the view.
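To make the trigger idea concrete, here is a minimal sketch of a summary table kept up to date by a trigger. It uses Python's built-in sqlite3 as a stand-in, and SQLite's trigger syntax differs from MySQL's, so treat it as an outline of the approach rather than MySQL-ready DDL. The names ab_summary and b_after_insert are hypothetical. Only INSERTs on b are handled; UPDATEs and DELETEs would need their own triggers. The join to a is left out on the assumption that every b.id exists in a.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE b (id INTEGER, qty INTEGER);

-- Hypothetical summary table: one pre-aggregated row per id.
CREATE TABLE ab_summary (id INTEGER PRIMARY KEY, qty INTEGER NOT NULL);

-- Hypothetical trigger: fold every new b row into the running total.
CREATE TRIGGER b_after_insert AFTER INSERT ON b
BEGIN
    INSERT OR IGNORE INTO ab_summary (id, qty) VALUES (NEW.id, 0);
    UPDATE ab_summary SET qty = qty + NEW.qty WHERE id = NEW.id;
END;

INSERT INTO b VALUES (12345, 3), (12345, 4), (777, 10);
""")

# Primary-key lookup on the summary table instead of aggregating b.
summary_qty = conn.execute(
    "SELECT qty FROM ab_summary WHERE id = 12345"
).fetchone()[0]

# The same number computed the slow way, for comparison.
direct_qty = conn.execute(
    "SELECT SUM(qty) FROM b WHERE id = 12345"
).fetchone()[0]

print(summary_qty, direct_qty)
```

The trade-off is extra work on every write to b in exchange for a cheap lookup at read time.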

Optimize Subquery in Join

I have the following query:
SELECT *
FROM s
JOIN b ON s.borrowerId = b.id
JOIN (
SELECT MIN(id) AS id
FROM tbl
WHERE dealId IS NULL
GROUP BY borrowerId, created
) s2 ON s.id = s2.id
Is there a simple way to optimize this so that I can do the JOIN directly and utilize indexes?
UPDATE
The created field is part of the GROUP BY because, due to the limitations of our version of MySQL and of the ORM being used, it is possible to have multiple records with the same created timestamp value. As a result, I need to find the first record for each combination of borrowerId and created.
Typically I might attempt something like this:
SELECT *
FROM s
INNER JOIN b ON s.borrowerId = b.id
LEFT OUTER JOIN s2
ON s.borrowerId = s2.borrowerId
AND s.created = s2.created
AND s.id <> s2.id
AND s.id < s2.id
WHERE s2.id IS NULL
AND s.dealId IS NULL;
But I'm not sure if that works 100% the way I want.
EXPLAIN from MySQL outputs the following:
1 PRIMARY b ALL NULL NULL NULL NULL 129690
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 317751 Using join buffer
1 PRIMARY s eq_ref PRIMARY,borrowerId_2,borrowerId PRIMARY 4 s2.id 1 Using where
2 DERIVED statuses ref dealId dealId 5 183987 Using where; Using temporary; Using filesort
As you can see, it has to query a massive number of records to build the subquery data set and when joining to the derived subquery, no indexes are found and so no indexes are used.
The first query needs this composite index:
INDEX(borrowerId, created, id)
Note that MySQL rarely uses two indexes for one SELECT, but a composite index is often very handy.
The second query seems grossly inefficient.
Please provide SHOW CREATE TABLE for each table.

MySQL query optimisation with group by and order by rand

I have a problem with the following query, which is very slow:
SELECT A.* FROM B
INNER JOIN A ON A.id=B.fk_A
WHERE A.creationDate BETWEEN '20120309' AND '20120607'
GROUP BY A.id
ORDER BY RAND()
LIMIT 0,5
EXPLAIN :
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE B index fk_A fk_A 4 \N 58962 Using index; Using temporary; Using filesort
1 SIMPLE A eq_ref PRIMARY,creationDate PRIMARY 4 B.fk_A 1 Using where
INDEXES :
A.id (int) = PRIMARY index
A.creationDate (date) = index
B.fk_A = index
Do you see anything to optimize?
Thanks a lot for your advice.
I think RAND() generates a random value for every row (this is why "Using temporary" shows up, and "Using filesort" because it can't use an index).
The best way would be to run SELECT MAX(id) FROM a to get the maximum id.
Then create 5 random numbers between 1 and MAX(id) and do a SELECT ... WHERE a.id IN (...) query.
If the result has fewer than 5 rows (because a record has been deleted), repeat the procedure until you have enough (or initially create 100 random numbers and LIMIT the query to 5).
That is not a 100% MySQL solution, because you have to do the logic in your code, but I believe it will be much faster.
Update
I just found an interesting article online that says basically the same thing: http://akinas.com/pages/en/blog/mysql_random_row/
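The procedure described above could look roughly like this in application code. This is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL. The function name sample_rows and its parameters are hypothetical, the test data is invented, and the date literals use ISO format because SQLite compares dates as text. The date range and the "has a row in B" condition from the question are applied to the candidate ids.

```python
import random
import sqlite3


def sample_rows(conn, wanted=5, batch=100, max_rounds=20):
    """Pick up to `wanted` random qualifying rows of A by guessing ids."""
    max_id = conn.execute("SELECT MAX(id) FROM A").fetchone()[0]
    if max_id is None:
        return []  # empty table
    picked = {}
    for _ in range(max_rounds):
        # Draw a batch of distinct candidate ids between 1 and MAX(id).
        candidates = random.sample(range(1, max_id + 1), min(batch, max_id))
        placeholders = ",".join("?" * len(candidates))
        rows = conn.execute(
            f"""SELECT A.id, A.creationDate FROM A
                WHERE A.id IN ({placeholders})
                  AND A.creationDate BETWEEN '2012-03-09' AND '2012-06-07'
                  AND EXISTS (SELECT 1 FROM B WHERE B.fk_A = A.id)""",
            candidates,
        ).fetchall()
        for row in rows:
            picked.setdefault(row[0], row)
        if len(picked) >= wanted:
            break  # enough rows survived the filters
    # Trim to the requested number of rows, chosen at random.
    return random.sample(list(picked.values()), min(wanted, len(picked)))


# Invented test data: 200 rows in A, every odd id referenced from B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (id INTEGER PRIMARY KEY, creationDate TEXT);
CREATE TABLE B (id INTEGER PRIMARY KEY, fk_A INTEGER);
""")
conn.executemany("INSERT INTO A VALUES (?, ?)",
                 [(i, "2012-04-01") for i in range(1, 201)])
conn.executemany("INSERT INTO B (fk_A) VALUES (?)",
                 [(i,) for i in range(1, 201, 2)])

rows = sample_rows(conn)
print(rows)
```

The lookup by id uses the primary key, so no temporary table or filesort over the whole join is needed. If very few rows qualify, the loop gives up after max_rounds and returns what it found.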
One possible rewriting of the query:
SELECT A.*
FROM A
WHERE A.creationDate BETWEEN '20120309' AND '20120607'
AND EXISTS
( SELECT *
FROM B
WHERE A.id = B.fk_A
)
ORDER BY RAND()
LIMIT 0,5
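As a quick check that the EXISTS form selects the same rows of A as the original JOIN with GROUP BY, here is a minimal sketch on a toy dataset. It uses Python's built-in sqlite3 as a stand-in for MySQL. The sample rows are invented, the date literals are in ISO format for SQLite, and ORDER BY RAND() with LIMIT is left out so the two results can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (id INTEGER PRIMARY KEY, creationDate TEXT);
CREATE TABLE B (id INTEGER PRIMARY KEY, fk_A INTEGER);
INSERT INTO A VALUES (1, '2012-04-01'), (2, '2012-05-01'),
                     (3, '2011-01-01'), (4, '2012-04-15');
INSERT INTO B (fk_A) VALUES (1), (1), (1), (2), (3);
""")

# Original shape: join, then collapse the duplicates with GROUP BY.
join_rows = conn.execute("""
SELECT A.* FROM B
INNER JOIN A ON A.id = B.fk_A
WHERE A.creationDate BETWEEN '2012-03-09' AND '2012-06-07'
GROUP BY A.id
ORDER BY A.id
""").fetchall()

# Rewritten shape: each row of A is tested once, no duplicates to collapse.
exists_rows = conn.execute("""
SELECT A.* FROM A
WHERE A.creationDate BETWEEN '2012-03-09' AND '2012-06-07'
  AND EXISTS (SELECT * FROM B WHERE A.id = B.fk_A)
ORDER BY A.id
""").fetchall()

print(join_rows == exists_rows)
```

Row 3 is outside the date range and row 4 has no match in B, so both queries return rows 1 and 2.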