(query updated for cdhowie comments)
Here's the situation.
I want to "count the number of tasks assigned to each worker within kind 1,2 of task location AND kind 3,4 of task department".
Suppose I have the following tables
Task : id, Name
Task_Worker_Combi : Task_id, Worker_id
Worker : id, Name
Task_Location_Combi : Task_id, Location_id
Task_Department_Combi : Task_id, Department_id
Location : id, Name
Department : id, Name
I got as far as the following (however, since it takes forever, there must be something wrong with the query):
SELECT W.id, W.Name, COUNT(TWC.Task_id) AS Count
FROM Worker AS W
LEFT JOIN Task_Worker_Combi AS TWC
ON (W.id=TWC.Worker_id)
WHERE W.id>0 AND TWC.Task_id IN
(
SELECT T.id
FROM Task as T
LEFT JOIN (Task_Location_Combi AS TLC, Task_Department_Combi AS TDC)
ON (T.id=TLC.Task_id AND T.id=TDC.Task_id)
WHERE 1 AND TLC.Location_id IN (1,2) AND TDC.Department_id IN (3,4)
GROUP BY T.id
)
GROUP BY W.id
ORDER BY W.Name
Without this subquery it returns the number of tasks assigned to each worker (unconditionally) just fine:
AND TWC.Task_id IN
(
SELECT T.id
FROM Task as T
LEFT JOIN (Task_Location_Combi AS TLC, Task_Department_Combi AS TDC)
ON (T.id=TLC.Task_id AND T.id=TDC.Task_id)
WHERE 1 AND TLC.Location_id IN (1,2) AND TDC.Department_id IN (3,4)
GROUP BY T.id
)
Where did I go wrong, and how would you rewrite this query so it works efficiently? I've been stuck on this for over a week now.
The actual query is the following (Job here corresponds to Task in the simplified query above, and Industry corresponds to Worker):
EXPLAIN SELECT id, Name, COUNT( J.Industry ) AS Count
FROM industry_db.industry AS I
LEFT JOIN job_db.industry AS J ON ( I.id = J.Industry )
WHERE I.id >0
AND J.Job
IN (
SELECT t1.id
FROM job_db.job AS t1
LEFT JOIN (
company_db.company AS t2, job_db.industry AS t3, location_db.city AS t4, job_db.function AS t5, job_db.tag AS t6, job_db.degree AS t7, location_db.state AS t8, location_db.group AS t9
) ON ( t1.Company = t2.id
AND t1.id = t3.Job
AND t1.City = t4.id
AND t1.id = t5.Job
AND t1.id = t6.Job
AND t1.id = t7.Job
AND t1.State = t8.id
AND t1.State_Group = t9.id )
WHERE 1
AND t1.Open = '1'
GROUP BY t1.id)
GROUP BY id
HAVING Count >0
ORDER BY Name
And the EXPLAIN result from phpMyAdmin is the following.
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY I range PRIMARY,id PRIMARY 1 NULL 39 Using where; Using temporary; Using filesort
1 PRIMARY J ref Industry Industry 1 industry_db.I.id 403 Using where
2 DEPENDENT SUBQUERY t1 index NULL PRIMARY 4 NULL 2868 Using where
2 DEPENDENT SUBQUERY t9 eq_ref PRIMARY PRIMARY 1 job_db.t1.State_Group 1 Using index
2 DEPENDENT SUBQUERY t2 eq_ref PRIMARY PRIMARY 2 job_db.t1.Company 1 Using index
2 DEPENDENT SUBQUERY t8 eq_ref PRIMARY PRIMARY 1 job_db.t1.State 1 Using index
2 DEPENDENT SUBQUERY t4 eq_ref PRIMARY PRIMARY 4 job_db.t1.City 1 Using index
2 DEPENDENT SUBQUERY t7 ref PRIMARY PRIMARY 4 job_db.t1.id 1 Using index
2 DEPENDENT SUBQUERY t3 ref Job Job 4 job_db.t1.id 1 Using index
2 DEPENDENT SUBQUERY t5 ref PRIMARY PRIMARY 4 job_db.t7.Job 1 Using index
2 DEPENDENT SUBQUERY t6 ref PRIMARY PRIMARY 4 job_db.t7.Job 2 Using index
Try this:
SELECT w.id, w.Name, COUNT( tw.Task_id )
FROM Worker AS w
LEFT JOIN Task_Worker_Combi AS tw
ON(
    w.id = tw.Worker_id AND
    EXISTS( SELECT Task_id FROM Task_Location_Combi
            WHERE Task_id = tw.Task_id AND Location_id IN(1, 2) ) AND
    EXISTS( SELECT Task_id FROM Task_Department_Combi
            WHERE Task_id = tw.Task_id AND Department_id IN(3, 4) )
)
GROUP BY w.id, w.Name
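If the correlated EXISTS checks are still slow, here is a join-only sketch of the same idea (a sketch only, assuming the simplified schema above; the CASE inside COUNT(DISTINCT ...) keeps workers with zero matching tasks and avoids double counting when a task matches several qualifying locations or departments):
-- Count each worker's tasks that have at least one location in (1,2)
-- AND at least one department in (3,4); workers with no such tasks keep a count of 0.
SELECT w.id, w.Name,
       COUNT(DISTINCT CASE
               WHEN tlc.Task_id IS NOT NULL AND tdc.Task_id IS NOT NULL
               THEN twc.Task_id
             END) AS Count
FROM Worker AS w
LEFT JOIN Task_Worker_Combi AS twc
       ON twc.Worker_id = w.id
LEFT JOIN Task_Location_Combi AS tlc
       ON tlc.Task_id = twc.Task_id AND tlc.Location_id IN (1, 2)
LEFT JOIN Task_Department_Combi AS tdc
       ON tdc.Task_id = twc.Task_id AND tdc.Department_id IN (3, 4)
GROUP BY w.id, w.Name
ORDER BY w.Name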
At first I used whereHas, but then I decided to do it this way. The results are better than with whereHas, but it still doesn't satisfy me: the query response time is 873ms, and I have 400k+ rows in the table.
select count(*) as aggregate
from `orders`
where (`pickup_address_id` in (
select `id`
from `addresses`
where `region_id` = 12)
or `delivery_address_id` in (
select `id`
from `addresses`
where `region_id` = 12)
) and `orders`.`status` = 2
Try this:
select count(distinct o.`id`) as aggregate
from `orders` o
inner join `addresses` a ON a.`id` IN (o.`pickup_address_id`, o.`delivery_address_id`)
AND a.`region_id` = 12
where o.`status` = 2
Alternatively:
SELECT count(distinct id) as aggregate
FROM (
select o.`id`
from `orders` o
inner join `addresses` a ON a.`id` = o.`pickup_address_id`
AND a.`region_id` = 12
where o.`status` = 2
UNION
select o.`id`
from `orders` o
inner join `addresses` a ON a.`id` = o.`delivery_address_id`
AND a.`region_id` = 12
where o.`status` = 2
) t
But I don't know how much more you can improve; looking through 400K rows in less than a second is already fairly quick.
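Another variant worth benchmarking (a sketch, assuming the same table and column names) pushes the region check into EXISTS subqueries, so each order is probed against at most two address rows via the primary key:
select count(*) as aggregate
from `orders` o
where o.`status` = 2
  and ( exists (select 1 from `addresses` a
                where a.`id` = o.`pickup_address_id` and a.`region_id` = 12)
     or exists (select 1 from `addresses` a
                where a.`id` = o.`delivery_address_id` and a.`region_id` = 12) )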
First, you can eliminate evaluating the same subquery multiple times (twice, to be precise) by using a Common Table Expression:
WITH CTE(id) AS (
SELECT id
FROM addresses
WHERE region_id = 12
)
This CTE would be evaluated once.
Second, get the row count from the orders table joined with the CTE on whether pickup_address_id or delivery_address_id appears in it:
WITH CTE(id) AS (
SELECT id
FROM addresses
WHERE region_id = 12
)
SELECT COUNT(*)
FROM orders
CROSS JOIN CTE ON CTE.id = orders.delivery_address_id
OR CTE.id = orders.pickup_address_id
Finally, add the filter on status = 2, and the query looks like this:
WITH CTE(id) AS (
SELECT id
FROM addresses
WHERE region_id = 12
)
SELECT COUNT(*)
FROM orders
CROSS JOIN CTE ON CTE.id = orders.delivery_address_id
OR CTE.id = orders.pickup_address_id
WHERE orders.status = 2
Also you should have the following indexes:
addresses table:
INDEX (region_id)
orders table:
INDEX (pickup_address_id),
INDEX (delivery_address_id),
INDEX (status)
Give it a try.
With empty tables I've got this:
Schema (MySQL v8.0)
create table addresses (
id int primary key,
region_id int not null,
index(region_id)
);
create table orders (
id int primary key,
pickup_address_id int,
delivery_address_id int,
status int not null,
index (pickup_address_id),
index (delivery_address_id),
index(status),
foreign key (pickup_address_id) references addresses(id),
foreign key (delivery_address_id) references addresses(id)
);
Query #1
explain with cte(id) as (
select id from addresses where region_id = 12)
select count(*) from orders
cross join cte on cte.id = orders.delivery_address_id
or cte.id = orders.pickup_address_id
where status = 2;
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE orders NULL ref pickup_address_id,delivery_address_id,status status 4 const 1 100 NULL
1 SIMPLE addresses NULL ref PRIMARY,region_id region_id 4 const 1 100 Using where; Using index
In MySQL, I have a simple join between 2 tables. Something like
select a.id, SUM(b.qty) from a inner join b on a.id=b.id
where a.id=12345
group by a.id
It runs normally as a standalone query. But when I put the query
select a.id, SUM(b.qty) from a inner join b on a.id=b.id
group by a.id
in a view called view_ab, the view takes an enormous amount of time when I run the following query on it.
select * from view_ab where id = 12345
Both of these tables are large. I am unable to figure out the reason for such a drop in performance. Please help me resolve this performance issue.
EDIT:
This is the view SQL
CREATE VIEW view_ab AS SELECT
r.drid AS drid,
SUM(s.return_qty) AS return_qty
FROM tbl_deliveryroute r INNER JOIN tbl_deliveryroute_sku s ON r.drid =
s.drid GROUP BY r.drid;
This is the query
SELECT
r.drid AS drid,
SUM(s.return_qty) AS return_qty
FROM tbl_deliveryroute r INNER JOIN tbl_deliveryroute_sku s ON r.drid =
s.drid WHERE r.drid=12718651
GROUP BY r.drid;
This is the query on the VIEW
SELECT * FROM view_ab WHERE drid=12718651;
Execution plan of the view
EXPLAIN EXTENDED SELECT * FROM view_ab WHERE drid=12718651;
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> (NULL) ref <auto_key0> <auto_key0> 4 const 10 100.00 (NULL)
2 DERIVED s (NULL) ALL idx_tbl_deliverroute_sku_drid (NULL) (NULL) (NULL) 15060913 100.00 USING TEMPORARY; USING filesort
2 DERIVED r (NULL) eq_ref PRIMARY,FK_tbl_deliveryroute_1 PRIMARY 4 humdemotest.s.drid 1 100.00 USING INDEX
EXPLAIN EXTENDED SELECT
r.drid AS drid,
SUM(s.return_qty) AS return_qty
FROM tbl_deliveryroute r INNER JOIN tbl_deliveryroute_sku s ON r.drid =
s.drid WHERE r.drid=12718651
GROUP BY r.drid;
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE r (NULL) const PRIMARY PRIMARY 4 const 1 100.00 USING INDEX
1 SIMPLE s (NULL) ref idx_tbl_deliverroute_sku_drid idx_tbl_deliverroute_sku_drid 4 const 22 100.00 (NULL)
From what I am seeing, you don't even need a join: since you are joining A to B on the same key column, that key already exists in table B, so you can just query B and group by it. Also, I would put an index on tbl_DeliveryRoute_Sku on its route ID column.
SELECT
s.drid,
sum( s.return_qty ) Return_Qty
from
tbl_DeliveryRoute_Sku s
where
s.drID = 12718651
group by
s.drID;
Since you are only selecting the key and the sum, you don't even NEED the other table. Now, if you needed other columns from the first table OTHER THAN the key, then yes, you would need the join. You could even simplify a step further, since you are only querying a single key ID:
SELECT
sum( s.return_qty ) Return_Qty
from
tbl_DeliveryRoute_Sku s
where
s.drID = 12718651;
The reason the view is slow is simple. You are executing:
SELECT *
FROM view_ab
WHERE drid = 12718651;
What you want to execute is:
select a.id, SUM(b.qty)
from a inner join
b
on a.id = b.id
where a.id = 12345
group by a.id;
What is actually being executed is:
select ab.*
from (select a.id, SUM(b.qty)
from a inner join
b
on a.id = b.id
group by a.id
) ab
where ab.id = 12345;
That is, the entire aggregation is performed first, and only then is the WHERE applied. What you want is for the predicate to be pushed down into the view (MySQL calls this merging). You can review the documentation on this subject.
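The catch is that a view containing GROUP BY or aggregate functions cannot use the MERGE algorithm at all; MySQL silently falls back to materializing it. A quick sketch that illustrates this, reusing the view definition from the question:
-- Even if MERGE is requested explicitly, MySQL issues a warning and falls back,
-- because the GROUP BY / SUM() make the view non-mergeable.
CREATE OR REPLACE ALGORITHM = MERGE VIEW view_ab AS
SELECT r.drid AS drid, SUM(s.return_qty) AS return_qty
FROM tbl_deliveryroute r
INNER JOIN tbl_deliveryroute_sku s ON r.drid = s.drid
GROUP BY r.drid;

SHOW WARNINGS;  -- warns that the view merge algorithm can't be used here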
One solution would seem to be rephrasing the query as a correlated subquery:
select a.id,
(select sum(b.qty) from b where b.id = a.id) as qty
from a
where a.id = 12345;
Alas, subqueries in the select have the same effect, so this doesn't work.
I don't know of a solution using a view. You can avoid using views for this. The ultimate solution would be to implement a trigger to store the summarized results in another table -- effectively materializing the view.
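For illustration, a minimal sketch of that materialization idea (the summary table, trigger name, and column type are made up here, and only the INSERT path is shown; UPDATE and DELETE on tbl_deliveryroute_sku would need their own triggers):
CREATE TABLE view_ab_mat (
    drid       INT PRIMARY KEY,
    return_qty DECIMAL(12,2) NOT NULL DEFAULT 0
);

DELIMITER $$
CREATE TRIGGER trg_drsku_ai AFTER INSERT ON tbl_deliveryroute_sku
FOR EACH ROW
BEGIN
    -- keep the per-route total in sync as SKU rows are inserted
    INSERT INTO view_ab_mat (drid, return_qty)
    VALUES (NEW.drid, NEW.return_qty)
    ON DUPLICATE KEY UPDATE return_qty = return_qty + NEW.return_qty;
END$$
DELIMITER ;

-- the fast lookup then becomes:
-- SELECT return_qty FROM view_ab_mat WHERE drid = 12718651;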
The example below is just a demonstration of the problem I am facing.
I have two tables, A and B, which I want to query. Their schemas are as follows:
create table A
(
id int,
B_id int,
D_id int,
created date
);
create table B
(
id int,
C_id int
);
Table A can have multiple rows for a given B_id.
I insert test data as below:
insert into A(id, B_id, D_id, created) values(2, 1, 0, now());
insert into A(id, B_id, D_id, created) values(3, 1, 0, now());
Now I want to fetch the newest row in A (the one with the highest created value) that has B_id = 1.
Now, the problem: I tried the double inner join below, which did not work.
select * from A
inner join B on A.B_id = B.id
inner join ( select * from A where A.B_id = B.id order by created desc limit 1) as A1 on A.id = A1.id
and A.B_id = 1;
This fails with the error "Unknown column 'B.id' in 'where clause'".
However, if I replace the second inner join with a where clause as below, it works:
select * from A
inner join B on A.B_id = B.id
and A.id = ( select id from A where A.B_id = B.id order by created desc limit 1)
and A.B_id = 1;
Why can the where clause access B.id in global scope, but the inner join can't?
This is a good point that took me a while to get my head around. First of all, I personally would not call it "global" scope, but I get your point. :)
Here is how I understand it. Please correct me if I'm wrong.
First query: I changed B.id in your query to 1 so that I could actually run it. It became the following:
select * from A
inner join B on A.B_id = B.id
inner join ( select * from A where A.B_id = 1 order by created desc limit 1) as A1 on A.id = A1.id
and A.B_id = 1;
After I changed it, I ran explain select ... to see how it works. Here is what I got:
id select_type table type possible_keys key key_len ref rows Extra
--------------------------------------------------------------------------------------------------------
1 PRIMARY <derived2> system null null null null 1 null
1 PRIMARY B ALL null null null null 1 Using where
1 PRIMARY A ALL null null null null 4 Using where; Using join buffer (Block Nested Loop)
2 DERIVED A ALL null null null null 4 Using where; Using filesort
It seems that your subquery select * from A where A.B_id = 1 order by created desc limit 1 is executed during or before the INNER JOIN, so B.id is not yet available at that point.
Second query: I did the same explain select ...
id select_type table type possible_keys key key_len ref rows Extra
----------------------------------------------------------------------------------------------------
1 PRIMARY B ALL null null null null 1 Using where
1 PRIMARY A ALL null null null null 4 Using where; Using join buffer (Block Nested Loop)
2 DEPENDENT SUBQUERY A ALL null null null null 4 Using where; Using filesort
As you can see, this time your subquery is executed after the INNER JOIN, so B.id is available.
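For what it's worth, MySQL 8.0.14 and later also support LATERAL derived tables, which are explicitly allowed to reference columns of tables that precede them in the same FROM clause. A sketch of the first query rewritten that way, assuming that MySQL version:
SELECT *
FROM A
INNER JOIN B ON A.B_id = B.id
INNER JOIN LATERAL (
    -- LATERAL lets this derived table see B.id from the preceding FROM tables
    SELECT a2.id
    FROM A AS a2
    WHERE a2.B_id = B.id
    ORDER BY a2.created DESC
    LIMIT 1
) AS A1 ON A.id = A1.id
WHERE A.B_id = 1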
I want to optimize this query; below are some statistics for the tables involved.
The products and products_categories tables each have around 500,000 records, but the category mentioned below has only about 1,600 products. I have created slots for these 1,600 products: every product has between 1 and 10 slots, and the slot table has around 300,000 records, including slots that have already expired. I want the products that are going to expire soon to come first, with the rest of the products after them.
I have created an index on the end_time column, but because the query uses comparison conditions, that index is not being used. I want to optimize this query; kindly tell me the best way.
EXPLAIN
SELECT
xcart_products.*
FROM xcart_products
INNER JOIN xcart_products_categories
ON xcart_products_categories.productid = xcart_products.productid
LEFT JOIN (SELECT
t1.*
FROM bvira_megahour_time_slot t1
LEFT OUTER JOIN bvira_megahour_time_slot t2
ON (t1.product_id = t2.product_id
AND t1.end_time > t2.end_time
AND t1.end_time > NOW())
WHERE t2.product_id IS NULL) as bvira_megahour_time_slot
ON bvira_megahour_time_slot.product_id = xcart_products.productid
WHERE xcart_products_categories.categoryid = '4410'
AND xcart_products.saleid = 2
GROUP BY xcart_products.productid
Below is the result of the EXPLAIN query.
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY xcart_products_categories ref PRIMARY,cpm,productid,orderby,pm cpm 4 const 1523 Using index; Using temporary; Using filesort
1 PRIMARY xcart_products eq_ref PRIMARY,saleid PRIMARY 4 wwwbvira_xcart.xcart_products_categories.productid 1 Using where
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 77215
2 DERIVED t1 ALL NULL NULL NULL NULL 398907
2 DERIVED t2 ref i_product_id,i_end_time i_product_id 4 wwwbvira_xcart.t1.product_id 4 Using where; Not exists
I have rewritten your query as follows:
EXPLAIN
SELECT p.*
FROM xcart_products p
INNER JOIN xcart_products_categories c
ON c.productid = p.productid
LEFT JOIN (
SELECT t1.*
FROM bvira_megahour_time_slot t1
LEFT JOIN bvira_megahour_time_slot t2
ON ( t1.product_id = t2.product_id
AND t1.end_time > t2.end_time
AND t1.end_time > NOW()
)
WHERE t2.product_id IS NULL
) AS bvira_megahour_time_slot
ON bvira_megahour_time_slot.product_id = p.productid
WHERE c.categoryid = '4410'
AND p.saleid = 2
GROUP BY p.productid
Please make sure that you have the following compound (multi-column) indexes:
bvira_megahour_time_slot: (product_id, end_time)
xcart_products: (productid, saleid)
xcart_products_categories: (productid, categoryid) or (categoryid, productid)
With these indexes, it should work much better.
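For reference, a sketch of those indexes as DDL (the index names are illustrative; double-check the column names against the actual schema):
ALTER TABLE bvira_megahour_time_slot
    ADD INDEX idx_slot_product_end (product_id, end_time);
ALTER TABLE xcart_products
    ADD INDEX idx_products_prod_sale (productid, saleid);
ALTER TABLE xcart_products_categories
    ADD INDEX idx_cat_cat_prod (categoryid, productid);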
I have a Link table with from_uid and to_uid (both indexed) and I want to filter out certain ids. So I do:
SELECT l.uid
FROM Link l
JOIN filter_ids t1 ON l.from_uid = t1.id
JOIN filter_ids t2 ON l.to_uid = t2.id
Now for some reason this is unexpectedly slow :( whereas each individual join is very fast. Can it not use the indexes correctly?
EXPLAIN tells me:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t1 index Null PRIMARY 34 Null 12205 Using index
1 SIMPLE l ref from_uid,to_uid from_uid 96 func 6 Using where
1 SIMPLE t2 index Null PRIMARY 34 Null 12205 Using where; Using index; Using join buffer
No idea if it'll help but try:
select l.uid
from Link l
where l.from_uid in (select id from filter_ids)
and l.to_uid in (select id from filter_ids)
Maybe it will make better use of the indexes.
The EXPLAIN tells you that the JOIN actually starts from the t1 table. That is, you need to add a new index on Link (or better, extend the current from_uid index):
(from_uid, to_uid, uid)
or if uid is the primary key, just:
(from_uid, to_uid)
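A sketch of that in DDL (the index name is illustrative; use the second form if uid is the primary key):
ALTER TABLE Link ADD INDEX idx_link_from_to_uid (from_uid, to_uid, uid);
-- or, if uid is the primary key:
-- ALTER TABLE Link ADD INDEX idx_link_from_to (from_uid, to_uid);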
UPD
What you are describing is strange. You can try running just:
SELECT STRAIGHT_JOIN l.uid
FROM Link l
JOIN filter_ids t1 ON l.from_uid = t1.id
JOIN filter_ids t2 ON l.to_uid = t2.id