MySQL takes almost 7 times longer to execute the following query than to run the equivalent two separate queries, which avoid OR in the WHERE clause. I would prefer a single query, because then I can sort and group everything.
Here is the problematic query:
EXPLAIN SELECT *
FROM `posts`
LEFT JOIN `teams_users`
ON (teams_users.team_id=posts.team_id
AND teams_users.user_id='7135')
WHERE (teams_users.status='1'
OR posts.user_id='7135');
Result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE posts ALL user_id NULL NULL NULL 169642
1 SIMPLE teams_users eq_ref PRIMARY PRIMARY 8 posts.team_id,const 1 Using where
Now if I run the following two queries instead, the combined execution time is, as mentioned, about 7 times shorter:
EXPLAIN SELECT *
FROM `posts`
LEFT JOIN `teams_users`
ON (teams_users.team_id=posts.team_id
AND teams_users.user_id='7135')
WHERE (teams_users.status='1');
Result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE teams_users ref PRIMARY,status status 1 const 5822 Using where
1 SIMPLE posts ref team_id team_id 5 teams_users.team_id 9 Using where
and:
EXPLAIN SELECT *
FROM `posts`
LEFT JOIN `teams_users`
ON (teams_users.team_id=posts.team_id
AND teams_users.user_id='7135')
WHERE (posts.user_id='7135');
Result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE posts ref user_id user_id 4 const 142
1 SIMPLE teams_users eq_ref PRIMARY PRIMARY 8 posts.team_id,const 1
Obviously, the number of rows scanned is much lower in the two separate queries.
Why is the initial query slow?
Thanks.
Yes, OR is frequently a performance-killer. A common work-around is to do UNION. For your example:
SELECT *
FROM `posts`
LEFT JOIN `teams_users`
ON (teams_users.team_id=posts.team_id
AND teams_users.user_id='7135')
WHERE (teams_users.status='1')
UNION DISTINCT
SELECT *
FROM `posts`
LEFT JOIN `teams_users`
ON (teams_users.team_id=posts.team_id
AND teams_users.user_id='7135')
WHERE (posts.user_id='7135');
If you are sure there are no duplicates, change to the faster UNION ALL.
If you are not fishing for missing teams_users rows, use JOIN instead of LEFT JOIN.
If you need ORDER BY, add some parens:
( SELECT ... )
UNION ...
( SELECT ... )
ORDER BY ...
Otherwise, the ORDER BY would apply only to the second SELECT. (If you also need 'pagination', see my blog.)
Please note that you might also need LIMIT in certain circumstances.
The queries without the OR clause are both sargable. That is, they both can be satisfied using indexes.
The query with the OR would be sargable if the MySQL query planner contained logic to figure out that it can rewrite it as the UNION of two queries. But the MySQL query planner doesn't (yet) have that kind of logic.
So, it does table scans to get the result set. Those are often very slow.
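The rewrite that the planner does not do for you can be checked by hand. Below is a minimal sketch, assuming a tiny made-up data set and using Python's built-in sqlite3 module as a stand-in for MySQL (so it says nothing about MySQL's timings or plans); it only shows that the OR form and the UNION form return the same rows. The sample rows and the `JOINED` helper string are assumptions for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; table names follow the question,
# the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, team_id INTEGER, user_id INTEGER);
    CREATE TABLE teams_users (
        team_id INTEGER, user_id INTEGER, status INTEGER,
        PRIMARY KEY (team_id, user_id)
    );
    INSERT INTO posts VALUES
        (1, 10, 7135),  -- own post, team where user 7135 is active
        (2, 10, 1),     -- someone else's post in that team
        (3, 20, 7135),  -- own post, team where user 7135 is inactive
        (4, 20, 2),     -- someone else's post, inactive membership
        (5, 30, 3);     -- team the user does not belong to
    INSERT INTO teams_users VALUES (10, 7135, 1), (20, 7135, 0), (30, 3, 1);
""")

# Shared SELECT ... LEFT JOIN part of both formulations.
JOINED = """
    SELECT posts.id, posts.team_id, posts.user_id, teams_users.status
    FROM posts
    LEFT JOIN teams_users
      ON teams_users.team_id = posts.team_id AND teams_users.user_id = 7135
"""

# Single query with OR in the WHERE clause.
or_rows = conn.execute(
    JOINED + " WHERE teams_users.status = 1 OR posts.user_id = 7135 ORDER BY posts.id"
).fetchall()

# Two index-friendly queries glued together; UNION removes the duplicate post 1.
union_rows = conn.execute(
    JOINED + " WHERE teams_users.status = 1"
    " UNION "
    + JOINED + " WHERE posts.user_id = 7135"
    " ORDER BY 1"
).fetchall()

print(or_rows == union_rows)
```

Post 1 satisfies both conditions, which is why plain UNION (that is, UNION DISTINCT) is used rather than UNION ALL in this sketch.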
Related
In MySQL, I have a simple join between 2 tables. Something like
select a.id, SUM(b.qty) from a inner join b on a.id=b.id
where a.id=12345
group by a.id
It runs fine as a plain query. But when I keep the query
select a.id, SUM(b.qty) from a inner join b on a.id=b.id
group by a.id
in a view called view_ab, the view takes an enormous amount of time when I run the following query on it:
select * from view_ab where id = 12345
Both tables are large. I am unable to figure out the reason for such a drop in performance. Please help me resolve this issue.
EDIT:
This is the view SQL
CREATE VIEW view_ab AS SELECT
r.drid AS drid,
SUM(s.return_qty) AS return_qty
FROM tbl_deliveryroute r INNER JOIN tbl_deliveryroute_sku s ON r.drid =
s.drid GROUP BY r.drid;
This is the query
SELECT
r.drid AS drid,
SUM(s.return_qty) AS return_qty
FROM tbl_deliveryroute r INNER JOIN tbl_deliveryroute_sku s ON r.drid =
s.drid WHERE r.drid=12718651
GROUP BY r.drid;
This is the query on the VIEW
SELECT * FROM view_ab WHERE drid=12718651;
Execution plan of the view
EXPLAIN EXTENDED SELECT * FROM view_ab WHERE drid=12718651;
id  select_type  table  partitions  type    possible_keys                   key      key_len  ref                 rows      filtered  Extra
1   PRIMARY             (NULL)      ref                                              4        const               10        100.00    (NULL)
2   DERIVED      s      (NULL)      ALL     idx_tbl_deliverroute_sku_drid   (NULL)   (NULL)   (NULL)              15060913  100.00    USING TEMPORARY; USING filesort
2   DERIVED      r      (NULL)      eq_ref  PRIMARY,FK_tbl_deliveryroute_1  PRIMARY  4        humdemotest.s.drid  1         100.00    USING INDEX
EXPLAIN EXTENDED SELECT
r.drid AS drid,
SUM(s.return_qty) AS return_qty
FROM tbl_deliveryroute r INNER JOIN tbl_deliveryroute_sku s ON r.drid =
s.drid WHERE r.drid=12718651
GROUP BY r.drid;
id  select_type  table  partitions  type   possible_keys                  key                            key_len  ref    rows  filtered  Extra
1   SIMPLE       r      (NULL)      const  PRIMARY                        PRIMARY                        4        const  1     100.00    USING INDEX
1   SIMPLE       s      (NULL)      ref    idx_tbl_deliverroute_sku_drid  idx_tbl_deliverroute_sku_drid  4        const  22    100.00    (NULL)
From what I am seeing, you don't even need a join: you are joining A to B on the same key column, and that key already exists in table B, so just query B and group by it. Also, I would put an index on the route ID column (drid) of tbl_deliveryroute_sku.
SELECT
s.drid,
sum( s.return_qty ) Return_Qty
from
tbl_DeliveryRoute_Sku s
where
s.drID = 12718651
group by
s.drID;
Since you are only selecting the key and the sum, you don't even NEED the other table. Now if you needed other columns from the first table OTHER THAN the key, then yes, you would need the join. You could even simplify a step further, since you are only querying a single key ID:
SELECT
sum( s.return_qty ) Return_Qty
from
tbl_DeliveryRoute_Sku s
where
s.drID = 12718651;
The reason the view is slow is simple. You are executing:
SELECT *
FROM view_ab
WHERE drid = 12718651;
What you want to execute is:
select a.id, SUM(b.qty)
from a inner join
b
on a.id = b.id
where a.id = 12345
group by a.id;
What is actually being executed is:
select ab.*
from (select a.id, SUM(b.qty)
from a inner join
b
on a.id = b.id
group by a.id
) ab
where ab.id = 12345;
That is, the entire aggregation is performed first, and only then is the WHERE applied. What you want is for the predicate to be pushed down into the subquery (MySQL calls this merging). You can review the documentation on this subject.
One solution would seem to be rephrasing the query as a correlated subquery:
select a.id,
(select sum(b.qty) from b where b.id = a.id) as qty
from a
where a.id = 12345;
Alas, a view with a subquery in the SELECT list cannot be merged either, so this doesn't work.
I don't know of a solution using a view. You can avoid using views for this. The ultimate solution would be to implement a trigger to store the summarized results in another table -- effectively materializing the view.
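The trigger idea can be sketched as follows. This is an illustration only, assuming Python's built-in sqlite3 module as a stand-in for MySQL (MySQL's trigger syntax differs, for example one trigger body per event with its own delimiter handling). The table `ab_summary` and the trigger names are hypothetical, and the sketch summarizes `b` alone, which matches the view only when every `b.id` has a row in `a`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER NOT NULL, qty INTEGER NOT NULL);

    -- Hypothetical summary table standing in for the aggregating view.
    CREATE TABLE ab_summary (id INTEGER PRIMARY KEY, qty INTEGER NOT NULL);

    CREATE TRIGGER b_after_insert AFTER INSERT ON b
    BEGIN
        INSERT OR IGNORE INTO ab_summary (id, qty) VALUES (NEW.id, 0);
        UPDATE ab_summary SET qty = qty + NEW.qty WHERE id = NEW.id;
    END;

    CREATE TRIGGER b_after_update AFTER UPDATE ON b
    BEGIN
        UPDATE ab_summary SET qty = qty - OLD.qty WHERE id = OLD.id;
        INSERT OR IGNORE INTO ab_summary (id, qty) VALUES (NEW.id, 0);
        UPDATE ab_summary SET qty = qty + NEW.qty WHERE id = NEW.id;
    END;

    CREATE TRIGGER b_after_delete AFTER DELETE ON b
    BEGIN
        UPDATE ab_summary SET qty = qty - OLD.qty WHERE id = OLD.id;
    END;
""")

# Exercise all three triggers.
conn.executemany("INSERT INTO b (id, qty) VALUES (?, ?)",
                 [(12345, 4), (12345, 6), (777, 1)])
conn.execute("UPDATE b SET qty = 10 WHERE id = 777")
conn.execute("DELETE FROM b WHERE id = 12345 AND qty = 4")

# Reading the summary is a primary-key lookup instead of an aggregation.
summary = conn.execute("SELECT qty FROM ab_summary WHERE id = 12345").fetchone()[0]
recomputed = conn.execute("SELECT SUM(qty) FROM b WHERE id = 12345").fetchone()[0]
print(summary, recomputed)
```

One design choice to be aware of: this sketch leaves a zero-quantity row in `ab_summary` once all detail rows for an id are deleted, whereas the view would return no row for that id.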
I've got a composite key table CUSTOMER_PRODUCT_XREF
__________________________________________________________________
|CUSTOMER_ID (PK NN VARCHAR(191)) | PRODUCT_ID(PK NN VARCHAR(191))|
-------------------------------------------------------------------
In my batch program I need to select 500 updated customers, get the PRODUCT_IDs purchased by each customer as a comma-separated list, and update our SOLR index. In my query I select 500 customers and do a left join to CUSTOMER_PRODUCT_XREF:
SELECT
customer.*, group_concat(xref.PRODUCT_ID separator ', ')
FROM
CUSTOMER customer
LEFT JOIN CUSTOMER_PRODUCT_XREF xref ON customer.CUSTOMER_ID=xref.CUSTOMER_ID
group by customer.CUSTOMER_ID
LIMIT 500;
EDIT: EXPLAIN QUERY
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE customer ALL PRIMARY NULL NULL NULL 74236 Using where; Using temporary; Using filesort
1 SIMPLE xref index NULL PRIMARY 1532 NULL 121627 Using where; Using index; Using join buffer (Block Nested Loop)
I got a lost-connection exception after the above query had been running for 20 minutes.
I then tried the following (with a subquery); it returned in 1.7 seconds, which is still slow.
SELECT
customer.*, (SELECT group_concat(PRODUCT_ID separator ', ')
FROM CUSTOMER_PRODUCT_XREF xref
WHERE customer.CUSTOMER_ID=xref.CUSTOMER_ID
GROUP BY customer.CUSTOMER_ID)
FROM
CUSTOMER customer
LIMIT 500;
EDIT: EXPLAIN QUERY produces
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY customer ALL NULL NULL NULL NULL 74236 NULL
2 DEPENDENT SUBQUERY xref index NULL PRIMARY 1532 NULL 121627 Using where; Using index; Using temporary; Using filesort
Question
CUSTOMER_PRODUCT_XREF already has both columns set as PRIMARY KEY and NOT NULL, so why is my query still very slow? I thought having a primary key on a column was enough to build an index for it. Do I need further indexing?
DATABASE INFO:
All the IDs in my database are VARCHAR(191) because the IDs can contain letters.
I'm using the utf8mb4_unicode_ci collation.
I'm using SET group_concat_max_len := @@max_allowed_packet to get the maximum number of product_ids for each customer. I prefer using group_concat in one main query so that I don't have to run separate queries to get the products for each customer.
Your original version of the query is doing the join first and then sorting all the resulting data -- which is probably pretty big given how large the fields are.
You can "fix" that version by selecting 500 customers first and then doing the join:
SELECT c.*, group_concat(xref.PRODUCT_ID separator ', ')
FROM (select c.*
from CUSTOMER c
order by c.customer_id
limit 500
) c LEFT JOIN
CUSTOMER_PRODUCT_XREF xref
ON c.CUSTOMER_ID=xref.CUSTOMER_ID
group by c.CUSTOMER_ID ;
An alternative that might or might not have a big impact would be to do the aggregation by customer in a subquery and join that, as in:
SELECT c.*, xref.products
FROM (select c.*
from CUSTOMER c
order by c.customer_id
limit 500
) c LEFT JOIN
(select customer_id, group_concat(xref.PRODUCT_ID separator ', ') as products
from CUSTOMER_PRODUCT_XREF xref
group by customer_id
) xref
ON c.CUSTOMER_ID=xref.CUSTOMER_ID;
What you have discovered is that the MySQL optimizer does not recognize this situation (where the limit has a big impact on performance). Some other database engines do a better job of optimization in this case.
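The "limit first, then join" shape can be tried on a toy data set. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL; note that SQLite spells the separator as a second argument, `group_concat(x, ', ')`, where MySQL uses `SEPARATOR`. The sample rows and `BATCH_SIZE` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE customer_product_xref (
        customer_id TEXT NOT NULL, product_id TEXT NOT NULL,
        PRIMARY KEY (customer_id, product_id)
    );
    INSERT INTO customer VALUES ('c1', 'Ann'), ('c2', 'Bob'), ('c3', 'Cy'), ('c4', 'Di');
    INSERT INTO customer_product_xref VALUES
        ('c1', 'p1'), ('c1', 'p2'), ('c3', 'p9'), ('c4', 'p1');
""")

BATCH_SIZE = 2  # the question uses 500

# The derived table picks the batch of customers first, so the join and
# the GROUP BY only ever see BATCH_SIZE customers.
rows = conn.execute("""
    SELECT c.customer_id, c.name, group_concat(x.product_id, ', ') AS products
    FROM (SELECT * FROM customer ORDER BY customer_id LIMIT ?) AS c
    LEFT JOIN customer_product_xref AS x ON x.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""", (BATCH_SIZE,)).fetchall()

for row in rows:
    print(row)
```

Customers without purchases (here `c2`) still appear, with NULL in the products column, because of the LEFT JOIN.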
Alright, the queries in my question became much faster once I created an index on just CUSTOMER_ID in the CUSTOMER_PRODUCT_XREF table.
So I've got two indexes now:
PRIMARY_KEY_INDEX on PRODUCT_ID and CUSTOMER_ID
CUSTOMER_ID_INDEX on CUSTOMER_ID
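A likely explanation is that the composite primary key lists PRODUCT_ID first, so a lookup by CUSTOMER_ID alone cannot use it as a leftmost prefix. The sketch below illustrates that effect with Python's built-in sqlite3 module as a stand-in; SQLite's `EXPLAIN QUERY PLAN` output is not MySQL's `EXPLAIN`, and the exact wording of the plan text varies by SQLite version. The `plan` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_product_xref (
        product_id TEXT NOT NULL, customer_id TEXT NOT NULL,
        PRIMARY KEY (product_id, customer_id)
    )
""")

def plan(sql):
    """Return the human-readable plan steps (last column of each plan row)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

LOOKUP = "SELECT product_id FROM customer_product_xref WHERE customer_id = 'c1'"

# With only the (product_id, customer_id) key, customer_id is not a
# leftmost prefix, so the lookup has to scan.
before = plan(LOOKUP)

conn.execute("CREATE INDEX customer_id_index ON customer_product_xref (customer_id)")

# With a dedicated index, the same lookup becomes an index search.
after = plan(LOOKUP)

print("before:", before)
print("after: ", after)
```

Under that assumption, declaring the primary key as (CUSTOMER_ID, PRODUCT_ID) instead would serve the same lookup without a second index.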
Given is a MySQL table named "orders_products" with the following relevant fields:
products_id
orders_id
Both fields are indexed.
I am running the following query:
SELECT products_id, count( products_id ) AS counter
FROM orders_products
WHERE orders_id
IN (
SELECT DISTINCT orders_id
FROM orders_products
WHERE products_id = 85094
)
AND products_id != 85094
GROUP BY products_id
ORDER BY counter DESC
LIMIT 4
This query takes extremely long, around 20 seconds. The database is not very busy otherwise, and performs well on other queries.
I am wondering, what causes the query to be so slow?
The table is rather big (around 1.5 million rows, about 210 MB). Could this be a memory issue?
Is there a way to tell exactly what is taking MySQL so long?
Output of Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY orders_products range products_id products_id 4 NULL 1577863 Using where; Using temporary; Using filesort
2 DEPENDENT SUBQUERY orders_products ref orders_id,products_id products_id 4 const 2 Using where; Using temporary
Queries that use WHERE ID IN (subquery) perform notoriously badly in MySQL.
With most cases of such queries however, it is possible to rewrite them as a JOIN, and this one is no exception:
SELECT
t2.products_id,
count(t2.products_id) AS counter
FROM orders_products t1
JOIN orders_products t2
ON t2.orders_id = t1.orders_id
AND t2.products_id != 85094
WHERE t1.products_id = 85094
GROUP BY t2.products_id
ORDER BY counter DESC
LIMIT 4
If you want to return rows where there are no other products (and show a zero count for them), change the join to a LEFT JOIN.
Note how the first instance of the table carries the WHERE products_id = X condition, which allows an index lookup and immediately reduces the number of rows. The second instance holds the target data: it is looked up by orders_id (again fast) and filtered in the join condition so that only the other products are counted.
Give these a try:
MySQL does not optimize IN with a subquery, so join the tables together instead.
Your query contains a != condition, which is very difficult to optimize. Can you narrow down the products and use multiple lookups rather than an inequality comparison?
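To check that the self-join returns the same answer as the IN (subquery) form, here is a minimal sketch on made-up rows, assuming Python's built-in sqlite3 module as a stand-in for MySQL. It assumes a product appears at most once per order; with duplicate (orders_id, products_id) rows the join would count some rows more than once. A tie-breaker is added to ORDER BY so that both results are deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_products (orders_id INTEGER, products_id INTEGER);
    CREATE INDEX idx_orders ON orders_products (orders_id);
    CREATE INDEX idx_products ON orders_products (products_id);
    INSERT INTO orders_products VALUES
        (1, 85094), (1, 11), (1, 12),
        (2, 85094), (2, 11),
        (3, 11), (3, 13),          -- order without product 85094
        (4, 85094), (4, 11), (4, 12), (4, 14);
""")

# Original formulation: IN with a subquery.
in_rows = conn.execute("""
    SELECT products_id, COUNT(products_id) AS counter
    FROM orders_products
    WHERE orders_id IN (SELECT DISTINCT orders_id FROM orders_products
                        WHERE products_id = 85094)
      AND products_id != 85094
    GROUP BY products_id
    ORDER BY counter DESC, products_id
    LIMIT 4
""").fetchall()

# Rewritten as a self-join: t1 finds the orders, t2 counts the other products.
join_rows = conn.execute("""
    SELECT t2.products_id, COUNT(t2.products_id) AS counter
    FROM orders_products t1
    JOIN orders_products t2
      ON t2.orders_id = t1.orders_id AND t2.products_id != 85094
    WHERE t1.products_id = 85094
    GROUP BY t2.products_id
    ORDER BY counter DESC, t2.products_id
    LIMIT 4
""").fetchall()

print(in_rows == join_rows, join_rows)
```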
I have a problem with the following query which is very slow :
SELECT A.* FROM B
INNER JOIN A ON A.id=B.fk_A
WHERE A.creationDate BETWEEN '20120309' AND '20120607'
GROUP BY A.id
ORDER BY RAND()
LIMIT 0,5
EXPLAIN :
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE B index fk_A fk_A 4 \N 58962 Using index; Using temporary; Using filesort
1 SIMPLE A eq_ref PRIMARY,creationDate PRIMARY 4 B.fk_A 1 Using where
INDEXES :
A.id (int) = PRIMARY index
A.creationDate (date) = index
B.fk_A = index
Do you see something to optimize ?
Thanks a lot for your advice
I think the RAND() function will create a RAND() value for every row (this is why Using temporary shows up, along with Using filesort, because it can't use an index).
The best way would be to run SELECT MAX(id) FROM a to get the maximum id,
then create 5 random numbers between 1 and MAX(id) and do a SELECT ... WHERE a.id IN (...) query.
If the result has fewer than 5 rows (because a record has been deleted), repeat the procedure until you have enough (or initially create 100 random numbers and LIMIT the query to 5).
That is not a 100% MySQL solution, because you have to do the logic in your code, but I believe it will be much faster.
Update
I just found an interesting article on the net that says basically the same thing: http://akinas.com/pages/en/blog/mysql_random_row/
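The random-id procedure described in this answer could look roughly like the following in application code. This is a sketch under stated assumptions, not a drop-in solution: it uses Python's built-in sqlite3 module as a stand-in for a MySQL connection, the function `pick_random_rows` and its parameters are hypothetical, and it ignores the creationDate filter and the join to B from the question (adding them to the WHERE clause would simply make misses more frequent).

```python
import random
import sqlite3

def pick_random_rows(conn, wanted=5, oversample=4, max_rounds=20):
    """Return up to `wanted` distinct rows of table a, chosen via random ids."""
    max_id = conn.execute("SELECT MAX(id) FROM a").fetchone()[0]
    if max_id is None:  # empty table
        return []
    picked = {}
    for _ in range(max_rounds):
        if len(picked) >= wanted:
            break
        # Oversample so that gaps left by deleted rows rarely force a retry.
        candidates = [random.randint(1, max_id) for _ in range(wanted * oversample)]
        marks = ",".join("?" * len(candidates))
        query = "SELECT id, creationDate FROM a WHERE id IN (%s)" % marks
        for row in conn.execute(query, candidates):
            picked.setdefault(row[0], row)  # keep each id only once
    rows = list(picked.values())
    random.shuffle(rows)  # the IN query returns rows in index order
    return rows[:wanted]

# Demo data: ids 1..100 with every third id missing, simulating deletions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, creationDate TEXT)")
conn.executemany("INSERT INTO a VALUES (?, ?)",
                 [(i, "2012-04-01") for i in range(1, 101) if i % 3])

sample = pick_random_rows(conn)
print(sample)
```

Each round is a primary-key lookup for a handful of ids, so no temporary table or filesort over the whole table is needed. If the id space is very sparse, more rounds or a larger `oversample` value may be required.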
One possible rewriting of the query:
SELECT A.*
FROM A
WHERE A.creationDate BETWEEN '20120309' AND '20120607'
AND EXISTS
( SELECT *
FROM B
WHERE A.id = B.fk_A
)
ORDER BY RAND()
LIMIT 0,5
I have a many-to-many query that I'd like to optimize.
What indexes should I create for it?
SELECT (SELECT COUNT(post_id)
FROM posts
WHERE post_status = 1) as total,
p.*,
GROUP_CONCAT(t.tag_name) tagged
FROM tags_relation tr
JOIN posts p ON p.post_id = tr.rel_post_id
JOIN tags t ON t.tag_id = tr.rel_tag_id
WHERE p.post_status=1
GROUP BY p.post_id
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY p ALL PRIMARY NULL NULL NULL 5 Using where; Using filesort
You can take a look at the query execution plan using the Explain statement. This will show you whether a full table scan is happening or if it was able to find an index to retrieve the data. From that point on you can optimize further.
Edit
Based on your query execution plan, the first optimization step is to check that your tables have primary keys defined; you can then add indexes on the post_status and tag_name columns.