How can I speed up my MySQL query (JOINs + GROUP BY) - mysql

I have this query in MySQL. Since it takes almost 20 seconds to execute, I want to use subselects with limits instead of inner joins in order to make the execution faster.
SELECT t1.order_id, CONCAT(t3.first_name,' ',t3.last_name),
buyer_first_name, buyer_last_name,
max(product_quantity) as product_quantity, order_status,
order_value, t5.first_name staff_firstnamelogin,
t5.last_name staff_lastnamelogin, t6.day_name
FROM t_seller_order t0
INNER JOIN t_orders t1
ON t0.event_id = t1.event_id
AND t1.seller_order_token = t0.seller_order_token
INNER JOIN t_tickets t2
ON t1.order_id = t2.order_id
INNER JOIN t_login t3
ON t3.login_id = t1.login_id
INNER JOIN t_login t5
ON t0.login_id = t5.login_id
INNER JOIN t_event_days t6
ON t2.product_id = t6.event_day_id
WHERE t0.event_id = 35
group by t1.order_id
order by order_id desc;

There are many things about the schema that prevent speeding up the query. Let's see what can or cannot be done...
Since the WHERE and GROUP BY hit different tables, no index is useful for both. The best is to have t0: INDEX(event_id).
Indexes for JOINs: t2..t6 need indexes (or PKs) on order_id, login_id, event_day_id. t1 needs INDEX(event_id, seller_order_token) in either order.
The GROUP BY and ORDER BY are the 'same', so that will take only one sort, not two.
A potential speedup is to finish the GROUP BY before doing some of the JOINs. The current structure is "inflate-deflate", wherein the JOINs conspire to create a huge temp table, then the GROUP BY deflates the results. So...
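See if you can write a SELECT like this:
SELECT t0.id, t1.id -- I need the PRIMARY KEYs for these two tables
FROM t_seller_order AS t0
JOIN t_orders AS t1
ON t0.event_id = t1.event_id
AND t1.seller_order_token = t0.seller_order_token -- same ON clause as the original query
WHERE t0.event_id = 35
GROUP BY t1.order_id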
How fast is that? Hopefully we can build the rest of the query around this, but without taking too much more time. There are two approaches; I don't know which will be better.
Plan A: Use subqueries (when possible) instead of JOINs. For example, instead of JOINing to t3, plan on this being one item in the SELECT:
( SELECT CONCAT(first_name,' ',last_name)
FROM t_login WHERE login_id = t1.login_id
) AS login_name
(Ditto for any other columns in the SELECT that touch a table only once. As it stands, t5 is touched twice, so this approach may be impractical.)
Plan B: JOIN after the GROUP BY. That is, after the "deflate".
SELECT ...
FROM ( SELECT t0.id, t1.id ... GROUP BY... ) AS x -- as discussed above
JOIN y ON y.foo = x.foo
JOIN z ON z.bar = x.bar
-- the GROUP BY is avoided
ORDER BY x.order_id desc; -- The ORDER BY is still necessary
In your example, I lean toward Plan B, but a mixture of both 'Plans' may be desirable.
Further notes: LEFT JOIN and LIMIT add wrinkles to the above discussion. Since you did not have either, I will not clutter this discussion with them.
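To make Plan B concrete, here is a minimal sketch. It runs against an in-memory SQLite database through Python's sqlite3 module purely so that it is self-contained; SQLite stands in for MySQL, `||` replaces CONCAT, and the three toy tables with their columns and rows are made up for illustration (the real schema is larger). It only checks that finishing the GROUP BY first and joining t_login afterwards returns the same rows as joining everything and grouping at the end. It says nothing about timing.

```python
import sqlite3

# Hypothetical, heavily simplified tables; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_login   (login_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE t_orders  (order_id INTEGER PRIMARY KEY, event_id INTEGER, login_id INTEGER);
CREATE TABLE t_tickets (ticket_id INTEGER PRIMARY KEY, order_id INTEGER, product_quantity INTEGER);
INSERT INTO t_login   VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO t_orders  VALUES (10, 35, 1), (11, 35, 2), (12, 99, 1);
INSERT INTO t_tickets VALUES (1, 10, 2), (2, 10, 5), (3, 11, 1), (4, 12, 7);
""")

# "Inflate-deflate": join everything, then GROUP BY at the end.
inflate_deflate = conn.execute("""
    SELECT o.order_id, l.first_name || ' ' || l.last_name, MAX(t.product_quantity)
    FROM t_orders o
    JOIN t_tickets t ON t.order_id = o.order_id
    JOIN t_login   l ON l.login_id = o.login_id
    WHERE o.event_id = 35
    GROUP BY o.order_id, l.first_name, l.last_name
    ORDER BY o.order_id DESC
""").fetchall()

# Plan B: finish the GROUP BY in a derived table, then JOIN the lookup table.
deflate_first = conn.execute("""
    SELECT x.order_id, l.first_name || ' ' || l.last_name, x.product_quantity
    FROM ( SELECT o.order_id, o.login_id, MAX(t.product_quantity) AS product_quantity
           FROM t_orders o
           JOIN t_tickets t ON t.order_id = o.order_id
           WHERE o.event_id = 35
           GROUP BY o.order_id, o.login_id ) AS x
    JOIN t_login l ON l.login_id = x.login_id
    ORDER BY x.order_id DESC
""").fetchall()

print(deflate_first)
```

On real data, comparing the two forms with EXPLAIN is the way to find out whether the derived table actually helps.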

Related

How to optimize limit offset when I join multiple tables?

Here's the format of mysql code
select a,b,c
from table1
left join table2 on x=y
left join table3 on m=n
limit 100000, 10
I know how to optimize LIMIT when I have a large offset on a single table, but I couldn't find a way to optimize it across multiple tables. Is there any way to make my query faster?
First of all, offsets and limits are unpredictable unless you include ORDER BY clauses in your query. Without ORDER BY, your SQL server is allowed to return result rows in any order it chooses.
Second, large offsets and small limits are a notorious query-performance antipattern. There's not much you can do to make the problem go away.
To get decent performance, it's helpful to rethink why you want to use this kind of access pattern, and then try to use WHERE filters on some indexed column value.
For example, let's say you're doing this kind of thing.
select a.user_id, b.user_email, c.user_account
from table1 a
left join table2 b on a.user_id = b.user_id
left join table3 c on b.account_id = c.account_id
limit whatever
Let's say you're paginating the query so you get fifty users at a time. Then you can start with a last_seen_user_id variable in your program, initialized to -1.
Your query looks like this:
select a.user_id, b.user_email, c.user_account
from (
select user_id
from table1
where user_id > ?last_seen_user_id?
order by user_id
limit 50
) u
join table1 a on u.user_id = a.user_id
left join table2 b on a.user_id = b.user_id
left join table3 c on b.account_id = c.account_id
order by a.user_id
Then, when you retrieve that result, set your last_seen_user_id to the value from the last row in the result.
Run the query again to get the next fifty users. If table1.user_id is a primary key or a unique index, this will be fast.
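The paging loop described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's sqlite3 module with an in-memory database in place of MySQL, a made-up table1 holding 120 sequential user_id values, and a hypothetical helper named fetch_page. It runs just the inner subquery; the joins to table2 and table3 are left out for brevity.

```python
import sqlite3

# Toy stand-in for table1; SQLite replaces MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (user_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(i,) for i in range(1, 121)])

PAGE_SIZE = 50  # fifty users per page, as in the answer


def fetch_page(last_seen_user_id):
    """Return the next page of user_ids strictly after last_seen_user_id."""
    rows = conn.execute(
        "SELECT user_id FROM table1 WHERE user_id > ? ORDER BY user_id LIMIT ?",
        (last_seen_user_id, PAGE_SIZE),
    ).fetchall()
    return [r[0] for r in rows]


last_seen_user_id = -1  # initial value suggested in the answer
pages = []
while True:
    page = fetch_page(last_seen_user_id)
    if not page:
        break  # no rows left
    pages.append(page)
    last_seen_user_id = page[-1]  # remember the last row of this page

print([len(p) for p in pages])
```

Each call seeks directly to the first user_id after the previous page through the primary key, instead of reading and discarding all the rows before the offset.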

How to optimize mysql on left join

I'll try to explain at a very high level.
I have two complex SELECT queries (for the sake of example, I have reduced them to the following):
SELECT id, t3_id FROM t1;
SELECT t3_id, MAX(added) as last FROM t2 GROUP BY t3_id;
Query 1 returns 16k rows and query 2 returns 15k.
Each query individually takes less than 1 second to compute.
However, what I need is to sort the results by the added column of query 2, so I tried a LEFT JOIN:
SELECT
t1.id, t1.t3_Id
FROM
t1
LEFT JOIN
(SELECT t3_id, MAX(added) as last FROM t2 GROUP BY t3_id) AS t_t2
ON t_t2.t3_id = t1.t3_id
GROUP BY t1.t3_id
ORDER BY t_t2.last
However, the execution time goes up to over 1 minute.
I would like to understand the reason.
What is the cause of such a huge explosion?
NOTE:
ALL the used columns on every table have been indexed
e.g. :
table t1 has index on id,t3_Id
table t2 has index on t3_id and added
EDIT1
After @Tim Biegeleisen's suggestion, I changed the query to the following, and it now executes in about 16 seconds. If I remove the ORDER BY, the query executes in less than 1 second, so the ORDER BY is the sole reason for the slowdown.
SELECT
t1.id, t1.t3_Id
FROM
t1
LEFT JOIN
t2 ON t2.t3_id = t1.t3_id
GROUP BY t1.t3_id
ORDER BY MAX(t2.added)
Even though table t2 has an index on column t3_id, when you join t1 you are actually joining to a derived table, which either can't use the index, or can't use it completely effectively. Since t1 has 16K rows and you are doing a LEFT JOIN, this means the database engine will need to scan the entire derived table for each record in t1.
You should use MySQL's EXPLAIN to see what the exact execution strategy is, but my suspicion is that the derived table is what is slowing you down.
The correct query should be:
SELECT
t1.id,
t1.t3_Id,
MAX(t2.added) as last
FROM t1
LEFT JOIN t2 on t1.t3_Id = t2.t3_Id
GROUP BY t1.t3_id -- group on t1 so that unmatched t1 rows are not collapsed into a single NULL group
ORDER BY last;
This happens because a temporary table is generated for each record.
I think you could try to order everything after the records are available. Maybe:
select * from (
select a.t3_id, a.id, b.last from -- explicit columns avoid a duplicate t3_id in the derived table
(select t3_id, max(id) as id from t1 group by t3_id) as a
left join (select t3_id, max(added) as last from t2 group by t3_id) as b
on a.t3_id = b.t3_id ) as xx
order by last

Efficiently combine two queries with both have a common inner join

I would like to join three tables and then union the results. Two of the tables that are joined are the same in both of the union'd queries, and it seems like a waste to perform this join twice. See below for an example. How is this best performed? Thanks
SELECT t1.c1,t2.c1,t3.c1
FROM audits AS t1
INNER JOIN t2 ON t2.t1_id=t1.id
INNER JOIN t3 ON t3.t1_id=t1.id
WHERE t2.fk1=123
UNION
SELECT t1.c1,t2.c1,t4.c1
FROM audits AS t1
INNER JOIN t2 ON t2.t1_id=t1.id
INNER JOIN t4 ON t4.t1_id=t1.id
WHERE t2.fk1=123
ORDER BY t1.fk1 ASC
This would work, if the syntax is supported by MySQL, and might be slightly more efficient:
SELECT t1.c1, t2.c1, t.c1
FROM audits AS t1
INNER JOIN t2 ON t2.t1_id=t1.id
INNER JOIN (
select t1_id, c1 from t3
union
select t1_id, c1 from t4
) as t ON t.t1_id=t1.id
WHERE t2.fk1=123
ORDER BY t1.fk1 ASC
The reason for a possible performance improvement is the smaller footprint of the relation being UNION'ed: two columns instead of three. UNION eliminates duplicates (unlike UNION ALL), so the entire collection of records must be processed (for example, sorted) to eliminate duplicates.
At the meta-level, this query informs the optimizer of a specific optimization that it may be unable to determine on its own.

Create a VIEW where a record in t1 is not present in t2 ? Confirmation on Union/Left Join/Inner Join?

I am trying to make a view of records in t1 where the source id from t1 is not in t2.
Like... "what records are not present in the other table?"
Do I need to include t2 in the FROM clause? Thanks
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1
WHERE t1.fee_source_id NOT IN (
SELECT t1.fee_source_id
FROM t1 INNER JOIN t2 ON t1.fee_source_id = t2.fee_source
)
ORDER BY t1.aif_id DESC
You're looking to effect an anti-join, for which there are three possibilities in MySQL:
Using IN:
SELECT fee_source_id, company_name, document
FROM t1
WHERE fee_source_id NOT IN (SELECT fee_source FROM t2)
ORDER BY aif_id DESC
Using EXISTS:
SELECT fee_source_id, company_name, document
FROM t1
WHERE NOT EXISTS (
SELECT * FROM t2 WHERE t2.fee_source = t1.fee_source_id LIMIT 1
)
ORDER BY aif_id DESC
Using JOIN:
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1 LEFT JOIN t2 ON t2.fee_source = t1.fee_source_id
WHERE t2.fee_source IS NULL
ORDER BY t1.aif_id DESC
According to @Quassnoi's analysis:
Summary
MySQL can optimize all three methods to do a sort of NESTED LOOPS ANTI JOIN.
It will take each value from t_left and look it up in the index on t_right.value. In case of an index hit or an index miss, the corresponding predicate will immediately return FALSE or TRUE, respectively, and the decision to return the row from t_left or not will be made immediately without examining other rows in t_right.
However, these three methods generate three different plans which are executed by three different pieces of code. The code that executes EXISTS predicate is about 30% less efficient than those that execute index_subquery and LEFT JOIN optimized to use Not exists method.
That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.
However, I'm not entirely sure how this analysis reconciles with the MySQL manual section on Optimizing Subqueries with EXISTS Strategy which (to my reading) suggests that the second approach above should be more efficient than the first.
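As a quick sanity check of the three anti-join forms, here is a small sketch. It uses Python's sqlite3 module with an in-memory database rather than MySQL, and the rows are invented, so it shows only which rows come back, not how any engine executes them. It also shows one difference the quoted analysis does not cover: under standard SQL NULL semantics, NOT IN returns no rows at all once the subquery yields a NULL, while the other two forms are unaffected.

```python
import sqlite3

# Invented sample data; SQLite stands in for MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (fee_source_id INTEGER, company_name TEXT);
CREATE TABLE t2 (fee_source INTEGER);
INSERT INTO t1 VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Core');
INSERT INTO t2 VALUES (1), (3);
""")

QUERIES = {
    "not_in": """SELECT fee_source_id FROM t1
                 WHERE fee_source_id NOT IN (SELECT fee_source FROM t2)""",
    "not_exists": """SELECT fee_source_id FROM t1
                     WHERE NOT EXISTS (SELECT 1 FROM t2
                                       WHERE t2.fee_source = t1.fee_source_id)""",
    "left_join": """SELECT t1.fee_source_id FROM t1
                    LEFT JOIN t2 ON t2.fee_source = t1.fee_source_id
                    WHERE t2.fee_source IS NULL""",
}


def run_all():
    """Run each anti-join variant and return its sorted fee_source_id values."""
    return {name: sorted(r[0] for r in conn.execute(sql)) for name, sql in QUERIES.items()}


before = run_all()  # t2.fee_source contains no NULLs: all three agree
conn.execute("INSERT INTO t2 VALUES (NULL)")  # a NULL sneaks into t2.fee_source
after = run_all()   # NOT IN now evaluates to NULL for every row and returns nothing

print(before)
print(after)
```

If t2.fee_source is declared NOT NULL, the three forms stay interchangeable as far as results go, and the choice comes down to the execution plan.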
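Another option below (similar to an anti-join). Note that MINUS is Oracle syntax; MySQL does not support it, although MySQL 8.0.31 and later offer the equivalent EXCEPT. Great answer above though. Thanks!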
SELECT D1.deptno, D1.dname
FROM dept D1
MINUS
SELECT D2.deptno, D2.dname
FROM dept D2, emp E2
WHERE D2.deptno = E2.deptno
ORDER BY 1;

Shorten a join query

I have a query with 3 joins:
SELECT t1.email, t2.firstname, t2.lastname, t4.value
FROM t1
left join t2 on t1.email = t2.email
Inner join t3 on t2.entity_id = t3.order_id
Inner join t4 on t3.product_id = t4.entity_id
WHERE t4.attribute_id = 126
I think my server just can't make it :) --> time is running out so an error occurs!
Thanks a lot
Table structure:
T1:
email (which is the same as in t2)
T2:
email firstname lastname orderid (which is called entity id in t3)
T3:
entityid product id (which is called entity id in t4)
T4:
entityid attributeid value
Unless t2 links straight to t4 there is no way.
Also, do you need a left join between t1 and t2?
As @Sachin already stated, you can't "shorten" this query unless t2 links straight to t4 without requiring a comparison with t3. However, in order to speed up your query, you should have indexes on some or all of the columns referenced in your join conditions (i.e. t1.email, t2.email, t2.entity_id, etc).
Having an index on each of these columns will give you much faster SELECT queries, but it will slow down your INSERT and UPDATE queries. So if you SELECT more often than you INSERT or UPDATE, then you should definitely be using indexes. If not, try to make indexes in wise places (tables that have INSERT or UPDATE statements run less often but still have a lot of rows, for instance).
For further clarification, see the following links:
More information on how indexes work
Syntax for creating indexes
Try your query this way:
SELECT t1.email, t2.firstname, t2.lastname, t4.value
FROM t4
INNER JOIN t3 ON t3.product_id = t4.entity_id
INNER JOIN t2 ON t2.entity_id = t3.order_id
INNER JOIN t1 ON t1.email = t2.email
WHERE t4.attribute_id = 126
It's basically your query but "backwards". As originally written, your DBMS has to try to join t2 for ALL records in t1, then join t3 for ALL records found in t2, before it can even attempt to address your WHERE clause.
My way, you're finding all the records in t4 where attribute_id = 126 first, THEN attempting to join other tables. It should be a lot quicker. You should then be able to speed things up even more by making sure the proper indexes exist on the tables involved. You can prepend the keyword EXPLAIN to your query to see how the DBMS attempts to seek data in your query.
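Before swapping the query, it is worth confirming that the "backwards" version returns the same rows. The sketch below does that on a handful of invented rows, using Python's sqlite3 module with an in-memory database in place of MySQL. The column names follow the original query, and the data is purely hypothetical. It also illustrates why the LEFT JOIN is not needed: the later INNER JOIN on t2.entity_id discards any t1 row that found no match in t2.

```python
import sqlite3

# Invented rows; SQLite stands in for MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (email TEXT);
CREATE TABLE t2 (email TEXT, firstname TEXT, lastname TEXT, entity_id INTEGER);
CREATE TABLE t3 (order_id INTEGER, product_id INTEGER);
CREATE TABLE t4 (entity_id INTEGER, attribute_id INTEGER, value TEXT);
INSERT INTO t1 VALUES ('a@x.test'), ('b@x.test'), ('c@x.test');
INSERT INTO t2 VALUES ('a@x.test', 'Ann', 'Lee', 1), ('b@x.test', 'Bob', 'Ray', 2);
INSERT INTO t3 VALUES (1, 100), (2, 200);
INSERT INTO t4 VALUES (100, 126, 'red'), (200, 7, 'blue');
""")

# The query as asked: LEFT JOIN first, filter on t4 last.
original = conn.execute("""
    SELECT t1.email, t2.firstname, t2.lastname, t4.value
    FROM t1
    LEFT JOIN t2 ON t1.email = t2.email
    INNER JOIN t3 ON t2.entity_id = t3.order_id
    INNER JOIN t4 ON t3.product_id = t4.entity_id
    WHERE t4.attribute_id = 126
    ORDER BY t1.email
""").fetchall()

# The "backwards" form: start from the filtered table, inner joins only.
reordered = conn.execute("""
    SELECT t1.email, t2.firstname, t2.lastname, t4.value
    FROM t4
    INNER JOIN t3 ON t3.product_id = t4.entity_id
    INNER JOIN t2 ON t2.entity_id = t3.order_id
    INNER JOIN t1 ON t1.email = t2.email
    WHERE t4.attribute_id = 126
    ORDER BY t1.email
""").fetchall()

print(reordered)
```

Here 'c@x.test' has no row in t2 and 'b@x.test' leads to a t4 row with a different attribute_id, so both queries return only the row for 'a@x.test'.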