I need to calculate the total available qty from the database, and for that I need to join a couple of tables. I cannot paste my whole query, but the following is its basic structure:
select sum(qty) as qty, field
from
(
(
select SUM(table1.qty) as qty , field
from
table1
left join table2 on table1.field = table2.field
left join table3 on table3.field = table2.field
where condition
group by fieldname
)
UNION ALL
(
select SUM(table1.qty) as qty, field
from
table1
left join table2 on table1.field = table2.field
left join table3 on table3.field = table2.field
where condition
group by fieldname
)
UNION ALL
(
select SUM(table1.qty) as qty, field
from
table1
left join table2 on table1.field = table2.field
left join table3 on table3.field = table2.field
where condition
group by fieldname
)
... (the same UNION ALL block, 12 times in total)
) as temp
LEFT JOIN another_main_table ON another_main_table.field = temp.field
I have added indexes to each table, but some of the unions take longer than expected. There are around 45 tables in this query, and all of them are examined fully (every row is scanned). Some tables have around 2.6 million records.
Can you please suggest how I can get the result within 1/2 seconds? Currently I get the result in around one minute.
Your example alone is not enough to justify a definitive solution, but from a rough look at your query: you have used LEFT JOIN, which can take longer than INNER JOIN.
So, use INNER JOIN if your data permits it, i.e. if you don't need the rows that have no match in the joined table.
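Whether "your data permits" it can be checked directly: the two join types return the same rows only when every left-hand row has a match. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the items/stock tables, their rows, and the totals helper are hypothetical. It says nothing about speed, only about when the swap is safe.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (field TEXT, qty INTEGER);
    CREATE TABLE stock (field TEXT PRIMARY KEY);
    INSERT INTO items VALUES ('a', 5), ('b', 3), ('c', 7);
    INSERT INTO stock VALUES ('a'), ('b');  -- 'c' has no match yet
""")

def totals(join_kind):
    """Sum items.qty per field using the given join ("LEFT JOIN" or "INNER JOIN")."""
    sql = f"""
        SELECT i.field, SUM(i.qty)
        FROM items i
        {join_kind} stock s ON s.field = i.field
        GROUP BY i.field
        ORDER BY i.field
    """
    return conn.execute(sql).fetchall()

print(totals("LEFT JOIN"))   # → [('a', 5), ('b', 3), ('c', 7)]
print(totals("INNER JOIN"))  # → [('a', 5), ('b', 3)]

# Once every items row has a match, both joins agree and the swap is safe.
conn.execute("INSERT INTO stock VALUES ('c')")
print(totals("LEFT JOIN") == totals("INNER JOIN"))  # → True
```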
I'm trying to filter some data, but as the data comes from 2 tables in the database, I have to union them together, permitting NULLs from the second table, because I always need every value available in the first table: this is a product database where some products have sub-combinations and some have none. So far I've come up with using a UNION to join the two tables together, but now I need a way to filter the data with a WHERE clause, and this is where I get stuck. I tried putting the union as a SELECT statement in the FROM clause: no data returned. I tried putting it into the SELECT clause as a subquery: no data returned...
In short I need something like this:
SELECT id_product, id_product_attribute,upc1,upc2
FROM (UNION)
WHERE upc1='xyz' OR upc2='xyz';
where, for example, the result might be:
-> 100, null, 9912456, null
or
-> 200, 153, 9915559, 9977123
Currently I have this (sorry I don't have more):
(SELECT product.id_product as id_product,
product.upc as upc1,
comb.id_product_attribute,
comb.upc as upc2
FROM `db`.table1 product
LEFT JOIN `db`.table2 comb
ON comb.id_product = product.id_product
)
UNION
(SELECT product.id_product as id_product,
product.upc as headCNK,
comb.id_product_attribute,
comb.upc
FROM `db`.table1 product
RIGHT JOIN `db`.table2 comb
ON comb.id_product = product.id_product
);
Also note that upc1 comes from table1, and upc2 from table2.
In the worst case I could run the entire query and filter everything out with some business logic, but I'd rather not, as I don't want to run endless queries where I don't have to; my service provider doesn't like that...
UPDATE:
I also tried:
SELECT *
from db.t1 as prod
CROSS JOIN db.t2 as comb ON prod.id_product = comb.id_product
WHERE prod.upc = 'xyz' OR comb.upc = 'xyz';
This didn't work either.
I placed a fiddle with some small sample data here:
http://sqlfiddle.com/#!9/340d7d
The output for '991002' used in the WHERE clause of the query SELECT id_product, id_product_attribute, table1.upc, table2.upc should be: 101, null, 991002, null
And for '990001' it should then be: 101, 201, 990001, 990001
To get all values, try:
SELECT t1.id_product, t2.id_product_attribute, t1.upc, t2.upc
FROM ( SELECT upc FROM table1
UNION
SELECT upc FROM table2 ) t0
LEFT JOIN table1 t1 USING (upc)
LEFT JOIN table2 t2 USING (upc)
For a specific upc value, change it to:
...
SELECT t1.id_product, t2.id_product_attribute, t1.upc, t2.upc
FROM ( SELECT 990001 upc ) t0
LEFT JOIN table1 t1 USING (upc)
LEFT JOIN table2 t2 USING (upc)
...
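A runnable sketch of this approach, using Python's sqlite3 as a stand-in for MySQL. The rows are made up (they are not the fiddle's data), and the lookup helper is hypothetical. The joins use explicit ON t0.upc conditions instead of USING; that should be equivalent here and makes it unambiguous which upc column each join compares against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id_product INTEGER, upc TEXT);
    CREATE TABLE table2 (id_product_attribute INTEGER, id_product INTEGER, upc TEXT);
    -- hypothetical sample data
    INSERT INTO table1 VALUES (100, '9912456'), (200, '9915559');
    INSERT INTO table2 VALUES (153, 200, '9977123'), (154, 100, '9912456');
""")

def lookup(upc=None):
    """Rows of (id_product, id_product_attribute, table1.upc, table2.upc).

    upc=None lists every UPC found in either table; otherwise only that UPC.
    """
    if upc is None:
        # t0 = every distinct UPC that appears in at least one of the tables
        source, params = "SELECT upc FROM table1 UNION SELECT upc FROM table2", ()
    else:
        # t0 = a one-row table holding just the requested UPC
        source, params = "SELECT ? AS upc", (upc,)
    sql = f"""
        SELECT t1.id_product, t2.id_product_attribute, t1.upc, t2.upc
        FROM ({source}) AS t0
        LEFT JOIN table1 t1 ON t1.upc = t0.upc
        LEFT JOIN table2 t2 ON t2.upc = t0.upc
    """
    return conn.execute(sql, params).fetchall()

print(lookup("9912456"))  # → [(100, 154, '9912456', '9912456')]
print(lookup("9915559"))  # → [(200, None, '9915559', None)]
print(lookup("9977123"))  # → [(None, 153, None, '9977123')]
```

Because t0 drives both LEFT JOINs, a UPC present in only one table still yields a row, with NULLs on the side where it is missing.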
I have a table structure like this
Table1
PersonID, referrer
Table2
Event_A_ID, PersonID, status
Table3
Event_B_ID, PersonID, status
I want to get the event status for everyone in Table1 with referrer=X by joining all of the event tables (Event_A...K) on PersonID. Since people can appear in multiple events, we can have cases like this:
PersonID=1001, EventA_ID, referrer=X, status
PersonID=1001, EventB_ID, referrer=X, status
PersonID=1001, EventK_ID, referrer=X, status
PersonID=1002, ...
PersonID=1003, ...
But I am not sure how to JOIN all of the event tables since the IDs can be duplicates (and are desired).
I tried to make a separate select and use the IN syntax:
...
WHERE 1=1
AND PersonID IN (SELECT PersonID from table1 where referrer=X)
But then I realized everything before will be an aggregate of events.
Should I start with the SELECT from Table1, i.e. select the valid IDs first and then select from the events afterwards? If so, how do I JOIN? I feel like the ideal would be a union of all the event tables, followed by a select.
You can use UNION ALL to return the data row-wise, or, if there are not many event tables, you can use inner joins between the tables, which returns the data in column format (note that INNER JOIN only keeps people who appear in every joined event table):
SELECT * FROM table2 AS t2 INNER JOIN table3 t3 ON t2.PersonID = t3.PersonID INNER JOIN table1 t1 ON t1.PersonID = t2.PersonID WHERE t1.referrer = 'X'
There are many other approaches too, depending on the number of tables you want to join.
You should also consider a better relational design for your tables: with the current schema you will end up with one table per event type, which will make retrieving data for multiple events slow.
Use UNION ALL first, and then apply the join:
select a.PersonID, a.referrer, b.eventID, b.PersonID, b.status from Table1 a inner join
(
select Event_A_ID as eventID, PersonID, status from Table2
union all
select Event_B_ID as eventID, PersonID, status from Table3
) b on a.PersonID = b.PersonID
where a.referrer = 'X'
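A small sketch of the UNION ALL approach, with SQLite (via Python's sqlite3) standing in for MySQL and made-up rows. The event_type column is a hypothetical addition, not part of the answer, that records which event table each row came from, since Event_A_ID and Event_B_ID values could collide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (PersonID INTEGER, referrer TEXT);
    CREATE TABLE table2 (Event_A_ID INTEGER, PersonID INTEGER, status TEXT);
    CREATE TABLE table3 (Event_B_ID INTEGER, PersonID INTEGER, status TEXT);
    -- hypothetical sample data
    INSERT INTO table1 VALUES (1001, 'X'), (1002, 'X'), (1003, 'Y');
    INSERT INTO table2 VALUES (1, 1001, 'done'), (2, 1003, 'open');
    INSERT INTO table3 VALUES (7, 1001, 'open'), (8, 1002, 'done');
""")

rows = conn.execute("""
    SELECT a.PersonID, a.referrer, b.event_type, b.eventID, b.status
    FROM table1 a
    INNER JOIN (
        -- stack all event tables into one row-wise set first
        SELECT 'A' AS event_type, Event_A_ID AS eventID, PersonID, status FROM table2
        UNION ALL
        SELECT 'B', Event_B_ID, PersonID, status FROM table3
    ) b ON a.PersonID = b.PersonID
    WHERE a.referrer = 'X'
    ORDER BY a.PersonID, b.event_type
""").fetchall()

for row in rows:
    print(row)
# → (1001, 'X', 'A', 1, 'done')
# → (1001, 'X', 'B', 7, 'open')
# → (1002, 'X', 'B', 8, 'done')
```

Person 1001 appears once per event, which matches the duplicate-IDs-are-desired requirement, and person 1003 is excluded by the referrer filter.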
You can do something like this with left joins:
SELECT t1.PersonID, t1.referrer,
t2.Event_A_ID, t2.status as status_a,
t3.Event_B_ID, t3.status as status_b
.
.
.
FROM table1 t1
LEFT JOIN table2 t2 ON t2.PersonID = t1.PersonID
LEFT JOIN table3 t3 ON t3.PersonID = t1.PersonID
.
.
.
WHERE t1.referrer = 'x'
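One thing to watch with this column-wise form: when a person has several events in more than one event table, the LEFT JOINs pair every event from one table with every event from the other. A sketch with made-up rows, using Python's sqlite3 in place of MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (PersonID INTEGER, referrer TEXT);
    CREATE TABLE table2 (Event_A_ID INTEGER, PersonID INTEGER, status TEXT);
    CREATE TABLE table3 (Event_B_ID INTEGER, PersonID INTEGER, status TEXT);
    -- hypothetical data: 1001 has two A events and two B events, 1002 has none
    INSERT INTO table1 VALUES (1001, 'x'), (1002, 'x');
    INSERT INTO table2 VALUES (1, 1001, 'done'), (2, 1001, 'open');
    INSERT INTO table3 VALUES (7, 1001, 'open'), (8, 1001, 'done');
""")

rows = conn.execute("""
    SELECT t1.PersonID, t1.referrer,
           t2.Event_A_ID, t2.status AS status_a,
           t3.Event_B_ID, t3.status AS status_b
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.PersonID = t1.PersonID
    LEFT JOIN table3 t3 ON t3.PersonID = t1.PersonID
    WHERE t1.referrer = 'x'
    ORDER BY t1.PersonID, t2.Event_A_ID, t3.Event_B_ID
""").fetchall()

# 1001 appears 2 x 2 = 4 times (each A event paired with each B event);
# 1002 appears once, with NULLs for both event tables.
print(len(rows))  # → 5
```

With more event tables the per-person row count is the product of that person's event counts, so the UNION ALL form scales better when people have many events.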
I have about twenty rather small tables (the largest has about 2k rows, but most have about 100 rows, with 4 to 20 columns each) that I try to join like this:
select ... from table1
left join table2 on table1.name = table2.t2name
left join table3 on table1.name = table3.othername
left join table4 on table2.t2name = table4.something
and so on
In theory it should return about 2k rows with maybe 80 columns, so I guess the amount of data itself is not the problem.
But it runs out of memory. From reading several posts here, I figured out that MySQL internally builds a big "all x all" table first and reduces it later. How can I force it to process the joins one at a time, so that it uses much less memory?
Just to make things clear, in your case the expected amount of data is not the problem.
What appears to be the problem is the fact that you are asking the system to compare A X B X C X D... rows (calculate what it means and you will get the picture).
The general idea described in one of my previous comments is to make your query look as follows:
SELECT * FROM (select ... from table1
where .....
) A
LEFT JOIN (select ... from table2
where .....
) B
ON A.name = B.t2name
LEFT JOIN (select ... from table3
where .....
) C
ON A.name = C.othername
LEFT JOIN (select ... from table4
where .....
) D
ON B.t2name = D.something
In this way, and assuming that this is applicable in the sense that you do have conditions to put in the where ..... clause of the inner selects, you will be reducing the number of records from each table that would need to be compared during the join.
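A caveat when moving conditions into the inner selects: for the table on the right of a LEFT JOIN, a filter inside its derived table behaves like a condition in the ON clause, not like one in the outer WHERE. A sketch with hypothetical rows and an assumed active column, run through Python's sqlite3 rather than MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT);
    CREATE TABLE table2 (t2name TEXT, active INTEGER);  -- 'active' is hypothetical
    INSERT INTO table1 VALUES ('a'), ('b');
    INSERT INTO table2 VALUES ('a', 1), ('b', 0);
""")

def run(sql):
    return conn.execute(sql).fetchall()

# Filter applied inside the derived table B (the pattern suggested above)
derived = run("""
    SELECT A.name, B.t2name
    FROM (SELECT name FROM table1) A
    LEFT JOIN (SELECT t2name FROM table2 WHERE active = 1) B
           ON A.name = B.t2name
    ORDER BY A.name
""")
# The same filter written in the ON clause
on_clause = run("""
    SELECT t1.name, t2.t2name
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.name = t2.t2name AND t2.active = 1
    ORDER BY t1.name
""")
# The same filter in the outer WHERE: unmatched left rows disappear
outer_where = run("""
    SELECT t1.name, t2.t2name
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.name = t2.t2name
    WHERE t2.active = 1
    ORDER BY t1.name
""")

print(derived)      # → [('a', 'a'), ('b', None)]
print(on_clause)    # → [('a', 'a'), ('b', None)]
print(outer_where)  # → [('a', 'a')]
```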
Consider following query:
SELECT
...
FROM table1
LEFT JOIN table2 ...
LEFT JOIN table3 ...
LEFT JOIN table4 ...
LEFT JOIN table5 ...
LEFT JOIN
(
SELECT id, COUNT(*) as qty FROM other WHERE ... GROUP BY id
) temp ON temp.id = table1.id
WHERE temp.qty = 123
GROUP BY table1.id
This query is very slow, however when I execute
SELECT id, COUNT(*) as qty FROM other WHERE ... GROUP BY id
alone, it's blazing fast and returns only a few (20-30) rows...
My current solution is a temporary table with an index: I fill it with data, then join it:
DROP TABLE IF EXISTS tmp_counts;
CREATE TABLE tmp_counts (id INT(11), qty INT(11) ...);
INSERT INTO tmp_counts (id,qty) (SELECT id, COUNT(*) as qty FROM other WHERE ... GROUP BY id);
SELECT
...
FROM table1
LEFT JOIN table2 ...
LEFT JOIN table3 ...
LEFT JOIN table4 ...
LEFT JOIN table5 ...
LEFT JOIN tmp_counts ON tmp_counts.id = table1.id
WHERE tmp_counts.qty = 123
GROUP BY table1.id
It works very fast, but I feel like it's an ugly solution.
Is MySQL really so stupid that I need to do its job manually?
MySQL isn't that stupid. Optimizing databases is complicated. In fact, when you think about it, there is very little software that does such optimizations across such a large variety of different situations. Procedural and object-oriented languages -- they are told what to do. In SQL, we say what we want and let the optimizer figure out the best. What is best, in turn, can depend heavily on the underlying data.
Sometimes the optimizer is wrong. Sometimes we can convince it otherwise. The problem here is quite possibly the choice of join order or join algorithms. One method to get around a problem like this is to replace the subquery with a correlated subquery in the select:
SELECT . . .,
(SELECT COUNT(*)
FROM other
WHERE . . . AND
other.id = table1.id
) as qty
...
FROM table1
LEFT JOIN table2 ...
LEFT JOIN table3 ...
LEFT JOIN table4 ...
LEFT JOIN table5 ...
GROUP BY table1.id;
This, in turn, can be further optimized by creating an index on other. At the very least, this would be other.id.
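A sketch of the correlated-subquery rewrite with the suggested index on other.id, checked against the question's tmp_counts workaround. It uses Python's sqlite3 in place of MySQL; the label and kind columns, the kind = 'k' filter standing in for the elided WHERE ..., and the target count 2 (instead of 123) are all hypothetical. Because the answer's query drops the qty = 123 filter, this sketch restores it with a HAVING clause; that is an assumption, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical schema and data
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE other (id INTEGER, kind TEXT);
    INSERT INTO table1 VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO other VALUES (1, 'k'), (1, 'k'), (2, 'k'), (3, 'z');
    -- the index suggested above, so each correlated lookup can seek by id
    CREATE INDEX other_id ON other (id);
    -- the question's workaround: materialise the aggregate into a real table
    CREATE TABLE tmp_counts (id INTEGER, qty INTEGER);
    CREATE INDEX tmp_counts_id ON tmp_counts (id);
    INSERT INTO tmp_counts (id, qty)
        SELECT id, COUNT(*) FROM other WHERE kind = 'k' GROUP BY id;
""")

# Correlated subquery in the select list; HAVING restores the qty filter
correlated = conn.execute("""
    SELECT t1.id, t1.label,
           (SELECT COUNT(*) FROM other
             WHERE other.kind = 'k' AND other.id = t1.id) AS qty
    FROM table1 t1
    GROUP BY t1.id
    HAVING qty = 2
    ORDER BY t1.id
""").fetchall()

# The temp-table version from the question
via_tmp_table = conn.execute("""
    SELECT table1.id, table1.label, tmp_counts.qty
    FROM table1
    LEFT JOIN tmp_counts ON tmp_counts.id = table1.id
    WHERE tmp_counts.qty = 2
    GROUP BY table1.id
    ORDER BY table1.id
""").fetchall()

print(correlated)     # → [(1, 'one', 2)]
print(via_tmp_table)  # → [(1, 'one', 2)]
```

One behavioural difference: the correlated COUNT(*) yields 0 for an id with no matching rows in other, whereas the LEFT JOIN to tmp_counts yields NULL. That only matters if the target count could be 0.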
Query 1:
SELECT sum(total_revenue_usd)
FROM table1 c
WHERE c.irt1_search_campaign_id IN (
SELECT assign_id
FROM table2 ga
LEFT JOIN table3 d
ON d.campaign_id = ga.assign_id
)
Query 2:
SELECT sum(total_revenue_usd)
FROM table1 c
LEFT JOIN table2 ga
ON c.irt1_search_campaign_id = ga.assign_id
LEFT JOIN table3 d
ON d.campaign_id = ga.assign_id
Query 1 gives me the correct result, whereas I need it in the second style, without using IN. However, Query 2 doesn't give the same result.
How can I rewrite the first query without using IN?
The reason is that this small query is part of a much larger query, with other conditions that won't work with IN.
You could try something along the lines of
SELECT sum(total_revenue_usd)
FROM table1 c
JOIN
(
SELECT DISTINCT ga.assign_id
FROM table2 ga
LEFT JOIN table3 d
ON d.campaign_id = ga.assign_id
) x
ON c.irt1_search_campaign_id = x.assign_id
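A small check of the derived-table rewrite against Query 1, using Python's sqlite3 in place of MySQL and made-up rows; the total and distinct_join helpers are hypothetical. It also shows why the join kind to table3 inside the derived table matters: an inner JOIN there keeps only campaigns that have a table3 row, while Query 1 (a LEFT JOIN inside IN) does not require one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (irt1_search_campaign_id INTEGER, total_revenue_usd INTEGER);
    CREATE TABLE table2 (assign_id INTEGER);
    CREATE TABLE table3 (campaign_id INTEGER);
    -- hypothetical data: campaign 10 is assigned twice and has three table3 rows,
    -- campaign 20 has no table3 row, campaign 30 is not assigned at all
    INSERT INTO table1 VALUES (10, 100), (20, 50), (30, 7);
    INSERT INTO table2 VALUES (10), (10), (20);
    INSERT INTO table3 VALUES (10), (10), (10);
""")

def total(sql):
    return conn.execute(sql).fetchone()[0]

query_1 = total("""
    SELECT SUM(total_revenue_usd) FROM table1 c
    WHERE c.irt1_search_campaign_id IN (
        SELECT assign_id FROM table2 ga
        LEFT JOIN table3 d ON d.campaign_id = ga.assign_id)
""")

def distinct_join(join_kind):
    # DISTINCT collapses duplicate assign_ids before they are joined to table1
    return total(f"""
        SELECT SUM(total_revenue_usd) FROM table1 c
        JOIN (SELECT DISTINCT ga.assign_id
              FROM table2 ga
              {join_kind} table3 d ON d.campaign_id = ga.assign_id) x
          ON c.irt1_search_campaign_id = x.assign_id
    """)

print(query_1)                     # → 150
print(distinct_join("LEFT JOIN"))  # → 150
print(distinct_join("JOIN"))       # → 100 (campaign 20 dropped: no table3 row)
```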
The queries do very different things:
The first query sums the total_revenue_usd from table1 where irt1_search_campaign_id exists in table2 as assign_id. (The outer join to table3 is absolutely unnecessary, by the way, because it doesn't change whether a table2.assign_id exists or not.) As you look for existence in table2, you can of course replace IN with EXISTS.
The second query gets you combinations of table1, table2 and table3. So, in case there are two records in table2 for an entry in table1 and three records in table3 for each of the two table2 records, you will get six records for the one table1 record. Thus you sum its total_revenue_usd sixfold. This is not what you want. Don't join table1 with the other tables.
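The multiplication can be reproduced with made-up rows. A sketch using Python's sqlite3 in place of MySQL, where campaign 10 has two table2 rows and three table3 rows; note that Query 2 also counts campaign 30, which has no table2 row at all, because nothing filters out unmatched table1 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (irt1_search_campaign_id INTEGER, total_revenue_usd INTEGER);
    CREATE TABLE table2 (assign_id INTEGER);
    CREATE TABLE table3 (campaign_id INTEGER);
    -- hypothetical sample data
    INSERT INTO table1 VALUES (10, 100), (20, 50), (30, 7);
    INSERT INTO table2 VALUES (10), (10), (20);
    INSERT INTO table3 VALUES (10), (10), (10);
""")

query_1 = conn.execute("""
    SELECT SUM(total_revenue_usd) FROM table1 c
    WHERE c.irt1_search_campaign_id IN (
        SELECT assign_id FROM table2 ga
        LEFT JOIN table3 d ON d.campaign_id = ga.assign_id)
""").fetchone()[0]

query_2 = conn.execute("""
    SELECT SUM(total_revenue_usd)
    FROM table1 c
    LEFT JOIN table2 ga ON c.irt1_search_campaign_id = ga.assign_id
    LEFT JOIN table3 d ON d.campaign_id = ga.assign_id
""").fetchone()[0]

# Campaign 10: 2 table2 rows x 3 table3 rows = 6 copies of 100.
# Campaign 20: 1 copy of 50 (no table3 rows; the LEFT JOIN keeps one NULL row).
# Campaign 30: 1 copy of 7 (no table2 row; the LEFT JOIN still keeps it).
expected_query_2 = 6 * 100 + 50 + 7

print(query_1)                    # → 150
print(query_2, expected_query_2)  # → 657 657
```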
EDIT: Here is the query using an exists clause. As mentioned, outer joining table3 doesn't alter the results.
Select sum(total_revenue_usd)
from table1 c
where exists
(
select *
from table2 ga
-- left join table3 d on d.campaign_id = ga.assign_id
where ga.assign_id = c.irt1_search_campaign_id
);
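A quick check, with made-up rows and SQLite (via Python's sqlite3) standing in for MySQL, that the EXISTS version returns the same total as Query 1, with or without the commented-out join to table3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (irt1_search_campaign_id INTEGER, total_revenue_usd INTEGER);
    CREATE TABLE table2 (assign_id INTEGER);
    CREATE TABLE table3 (campaign_id INTEGER);
    -- hypothetical sample data
    INSERT INTO table1 VALUES (10, 100), (20, 50), (30, 7);
    INSERT INTO table2 VALUES (10), (10), (20);
    INSERT INTO table3 VALUES (10), (10), (10);
""")

def total(sql):
    return conn.execute(sql).fetchone()[0]

via_in = total("""
    SELECT SUM(total_revenue_usd) FROM table1 c
    WHERE c.irt1_search_campaign_id IN (
        SELECT assign_id FROM table2 ga
        LEFT JOIN table3 d ON d.campaign_id = ga.assign_id)
""")

via_exists = total("""
    SELECT SUM(total_revenue_usd) FROM table1 c
    WHERE EXISTS (SELECT * FROM table2 ga
                  WHERE ga.assign_id = c.irt1_search_campaign_id)
""")

# Re-enabling the outer join to table3 cannot change whether a table2 row exists
via_exists_with_table3 = total("""
    SELECT SUM(total_revenue_usd) FROM table1 c
    WHERE EXISTS (SELECT * FROM table2 ga
                  LEFT JOIN table3 d ON d.campaign_id = ga.assign_id
                  WHERE ga.assign_id = c.irt1_search_campaign_id)
""")

print(via_in, via_exists, via_exists_with_table3)  # → 150 150 150
```

Since EXISTS only tests for at least one matching row, each table1 row is counted at most once, however many table2 or table3 rows match it.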