Why join on subquery is so slow? - mysql

Consider following query:
SELECT
...
FROM table1
LEFT JOIN table2 ...
LEFT JOIN table3 ...
LEFT JOIN table4 ...
LEFT JOIN table5 ...
LEFT JOIN
(
SELECT id, COUNT(*) as qty FROM other WHERE ... GROUP BY id
) temp ON temp.id = table1.id
WHERE temp.qty = 123
GROUP BY table1.id
This query is very slow, however when I execute
SELECT id, COUNT(*) as qty FROM other WHERE ... GROUP BY id
alone, it's blazing fast, it returns only few (20-30) rows...
My current solution is a temporary table with index, I fill it with data, then I use join:
DROP TABLE IF EXISTS tmp_counts;
CREATE TABLE tmp_counts id INT(11), qty INT(11) ...
INSERT INTO tmp_counts (id,qty) (SELECT id, COUNT(*) as qty FROM other WHERE ... GROUP BY id);
SELECT
...
FROM table1
LEFT JOIN table2 ...
LEFT JOIN table3 ...
LEFT JOIN table4 ...
LEFT JOIN table5 ...
LEFT JOIN tmp_counts ON tmp_counts.id = table1.id
WHERE tmp_counts.qty = 123
GROUP BY table1.id
It works very fast, but I feel like it's an ugly solution.
Is MySQL really that stupid I need to do mysql job manually by myself?

MySQL isn't that stupid. Optimizing databases is complicated. In fact, when you think about it, there is very little software that does such optimizations across such a large variety of different situations. Procedural and object oriented languages -- they are told what to do. In SQL, we say what we want and let the optimizer figure out the best. What is best, in turn, can depend heavily on the underlying data.
Sometimes the optimizer is wrong. Sometimes we can convince it otherwise. The problem here is quite possibly the choice of join order or join algorithms. One method to get around a problem like this is to replace the subquery with a correlated subquery in the select:
select
SELECT . . .,
(SELECT COUNT(*)
FROM other
WHERE . . . AND
other.id = table1.id
) as qty
...
FROM table1
LEFT JOIN table2 ...
LEFT JOIN table3 ...
LEFT JOIN table4 ...
LEFT JOIN table5 ...
GROUP BY table1.id;
This, in turn, can be further optimized by creating an index on other. At the very least, this would be other.id.

Related

How to join where duplicates are needed?

I have a table structure like this
Table1
PersonID, referrer
Table2
Event_A_ID, PersonID, status
Table3
Event_B_ID, PersonID, status
I want to get the event status for everyone from table one with referrer=X by joining all of the event tables like Event_A...K and checking for PersonID. Since people can appear in multiple events we can have cases like this
PersonID=1001, EventA_ID, referrer=X, status
PersonID=1001, EventB_ID, referrer=X, status
PersonID=1001, EventK_ID, referrer=X, status
PersonID=1002, ...
PersonID=1003, ...
But I am not sure how to JOIN all of the event tables since the IDs can be duplicates (and are desired).
I tried to make a separate select and use the in syntax
...
WHERE 1=1
AND PersonID IN (SELECT PersonID from table1 where referrer=X)
But then I realized everything before will be an aggregate of events.
Should I start with the SELECT from Table1? Select the valid IDs first and then select from the events after? If so, how do I JOIN? I feel like ideal looks like union of all the event tables and then select
You can use union all for row wise implementation of data or you can also use inner joins between tables if there is not much table events. This will represent data in column format.
SELECT * FROM tb2 AS t2 INNER JOIN tb3 t3 ON t2.personId = t3.personId INNER JOIN tb1 t1 ON t1.personId = t2.personId WHERE t1.refer='refer1'
There can be many other approach too depending on the number of tables you want to join.
You should also consider to use a better relations among your db tables as your current scenario will lead you to have as many tables as many events you have. This will create slowness in retrieving the data for multiple events.
use union all and then apply join
select a.person_id,a.referrer,b.eventID,b.PersonID,b.status from Table1 a inner join
(
select Event_A_ID as eventID, PersonID, status from Table2
union all
select Event_B_ID as eventID, PersonID, status from Table3
)b on a.personid=b.personid
You can do something like this with left joins:
SELECT t1.PersonID, t1.referrer,
t2.Event_A_ID, t2.status as status_a,
t3.Event_B_ID, t3.status as status_b
.
.
.
FROM table1 t1
LEFT JOIN table2 t2 ON t2.PersonID = t1.PersonID
LEFT JOIN table3 t3 ON t3.PersonID = t1.PersonID
.
.
.
WHERE t1.referrer = 'x'

Calculate quantity in a faster way than in this query

I need to calculate total available qty from the database, and for that I need to do joins with a couple of tables. I can not paste my whole query, but the following is the basic structure:
select sum(qty) as qty, field
from
(
(
select SUM(table1.qty) as qty , field
from
table1
left join table2 on table1.field = table2.field
left join table3 on table3.field = table2.field
where condition
group by fieldname
)
UNION ALL
(
selecy SUM(table1.qty) as qty,field
from
table1
left join table2 on table1.field = table2.field
left join table3 on table3.field = table2.field
where condition
group by fieldname
)
UNION ALL
(
select SUM(table1.qty) as qty, field
from
table1
left join table2 on table1.field = table2.field
left join table3 on table3.field = table2.field
where condition
group by fieldname
)
...
..
12 times
) as temp
LEFT JOIN another_main_table ON another_main_table.field = temp.field
I have taken care of indexes of each table, but there are some unions which are taking longer time than expected. There are around 45 tables used in this query and all are examined fully. Some tables have around 2.6 million records.
Can you please suggest me how I can get the result in 1/2 seconds? As of now I am getting the result in around one minute.
Since your given example one can not properly justify the proper solutions, but still if I roughly examine your query, you have used LEFT JOIN, So this will take a longer time compare to INNER JOIN.
So, Use INNER JOIN if your data permits

3 tables and 2 left joins

Query 1:
SELECT sum(total_revenue_usd)
FROM table1 c
WHERE c.irt1_search_campaign_id IN (
SELECT assign_id
FROM table2 ga
LEFT JOIN table3 d
ON d.campaign_id = ga.assign_id
)
Query 2:
SELECT sum(total_revenue_usd)
FROM table1 c
LEFT JOIN table2 ga
ON c.irt1_search_campaign_id = ga.assign_id
LEFT JOIN table3 d
ON d.campaign_id = ga.assign_id
Query 1 gives me the correct result where as I need it in the second style without using 'in'. However Query 2 doesn't give the same result.
How can I change the first query without using 'in' ?
The reason being is that the small query is part of a much larger query, there are other conditions that won't work with 'in'
You could try something along the lines of
SELECT sum(total_revenue_usd)
FROM table1 c
JOIN
(
SELECT DISTINCT ga.assign_id
FROM table2 ga
JOIN table3 d
ON d.campaign_id = ga.assign_id
) x
ON c.irt1_search_campaign_id = x.assign_id
The queries do very different things:
The first query sums the total_revenue_usd from table1 where irt1_search_campaign_id exists in table2 as assign_id. (The outer join to table3 is absolutely unnecessary, by the way, because it doesn't change wether a table2.assign_id exists or not.) As you look for existence in table2, you can of course replace IN with EXISTS.
The second query gets you combinations of table1, table2 and table3. So, in case there are two records in table2 for an entry in table1 and three records in table3 for each of the two table2 records, you will get six records for the one table1 record. Thus you sum its total_revenue_usd sixfold. This is not what you want. Don't join table1 with the other tables.
EDIT: Here is the query using an exists clause. As mentioned, outer joining table3 doesn't alter the results.
Select sum(total_revenue_usd)
from table1 c
where exists
(
select *
from table2 ga
-- left join table3 d on d.campaign_id = ga.assign_id
where ga.assign_id = c.irt1_search_campaign_id
);

SQL Query select from many tables

I have 5 Tables like this:
TABLE 1: PRIMARY_KEY,NAME,FK_TABLE2
TABLE 2: PRIMARY_KEY,FK_TABLE3
TABLE 3: PRIMARY_KEY,FK_TABLE4
TABLE 4: PRIMARY_KEY,FK_TABLE5
TABLE 5: PRIMARY_KEY,DIAGRAM_NAME
And what I want is when I search for a name in a search bar, it returns Name from Table1, and also DIAGRAM_NAME from table 5.
The first part is easy:
SELECT `TABLE1`.name
from Table 1
Where `TABLE1`.name LIKE '%$search%'
But for the second part I need your help...
Thank you!
You need to look into using JOIN:
SELECT T.Name, T5.Diagram_Name
FROM Table1 T
JOIN Table2 T2 ON T.FK_TABLE2 = T2.PRIMARY_KEY
JOIN Table3 T3 ON T2.FK_TABLE3 = T3.PRIMARY_KEY
JOIN Table4 T4 ON T3.FK_TABLE4 = T4.PRIMARY_KEY
JOIN Table5 T5 ON T4.FK_TABLE5 = T5.PRIMARY_KEY
WHERE T.Name LIKE '%$search%'
If you want to return the names that don't have matching diagram names, use LEFT JOIN instead.
Good luck.
You are going to need to JOIN the tables using the Primary Key and FK values:
select t1.name, t5.DIAGRAM_NAME
from table1 t1
left join table2 t2
on t1.FK_TABLE2 = t2.PRIMARY_KEY
left join table3 t3
on t2.FK_TABLE2 = t3.PRIMARY_KEY
left join table4 t4
on t3.FK_TABLE3 = t4.PRIMARY_KEY
left join table5 t5
on t4.FK_TABLE4 = t5.PRIMARY_KEY
Where t1.name LIKE '%$search%'
If you need help learning JOIN syntax, here is a great visual explanation of joins.
I used a LEFT JOIN in my example query, which will return all rows from table1 even if there is not a matching row in the remaining tables.
If you know that there is a matching row in all of the tables that you are joining, then you can use an INNER JOIN.

How to retrieve non-matching results in mysql

I'm sure this is straight-forward, but how do I write a query in mysql that joins two tables and then returns only those records from the first table that don't match. I want it to be something like:
Select tid from table1 inner join table2 on table2.tid = table1.tid where table1.tid != table2.tid;
but this doesn't seem to make alot of sense!
You can use a left outer join to accomplish this:
select
t1.tid
from
table1 t1
left outer join table2 t2 on
t1.tid = t2.tid
where
t2.tid is null
What this does is it takes your first table (table1), joins it with your second table (table2), and fills in null for the table2 columns in any row in table1 that doesn't match a row in table2. Then, it filters that out by selecting only the table1 rows where no match could be found.
Alternatively, you can also use not exists:
select
t1.tid
from
table1 t1
where
not exists (select 1 from table2 t2 where t2.tid = t1.tid)
This performs a left semi join, and will essentially do the same thing that the left outer join does. Depending on your indexes, one may be faster than the other, but both are viable options. MySQL has some good documentation on optimizing the joins, so you should check that out..