I have a following query:
select
qs.*
from
(
select
tsk.id,
tsk.request_id,
tsk.hash_id
from
user_tasks as usr
inner join unassigned_tasks as tsk on usr.task_id = tsk.id
where
usr.assigned_to = 53
AND tsk.product_id NOT IN (
SELECT
product_id
FROM
product_progresses
WHERE
request_id = tsk.request_id
)
ORDER BY
tsk.id
) as qs <-- this takes about 233ms.
WHERE
qs.id = ( <-- should this subquery execute for every row from outer result-set row?
SELECT
min(qt.id)
FROM
(
select
tsk.id,
tsk.request_id,
tsk.hash_id
from
user_tasks as usr
inner join unassigned_tasks as tsk on usr.task_id = tsk.id
where
usr.assigned_to = 53
AND tsk.product_id NOT IN (
SELECT
product_id
FROM
product_progresses
WHERE
request_id = tsk.request_id
)
ORDER BY
tsk.id
) as qt <-- this takes about 233ms.
WHERE
qt.id > 83934
)
Both qs and qt takes about the same time to execute, i.e. around 233ms.
Also, executing this whole query takes about the same time.
I have a conception that inner query inside where qs.id = (...) executes once for every row in the result-set generated from qs.
In this current case, qs outputs 10 rows.
So I decided to test if the sub-query is executed for each row 10 times.
Here's what I put inside sub-query:
WHERE qs.id = (
SELECT
SLEEP(1)
)
instead of where qs.id = ( select min(qt.id) ... ).
This took about 10.246 s which proves that the inner query is being run for each row.
So with this, shouldn't the initial query take about (qt time) 233 * 10 (rows from qs) = 2330ms?
Is it because select min().. is calculated once only?
In MySQL subqueries within the FROM clause are evaluated before subqueries in the WHERE clause.
Useful article:
https://www.eversql.com/sql-order-of-operations-sql-query-order-of-execution/
In your case, this is causing qs to be fully evaluated before the WHERE clause is factored in. If you want qs to be limited by qt you should be able to achieve that by unnesting qs or moving qt inside qs.
In practice though, rather than use a subquery for the WHERE clause, it would be probably better to use joins.
See SQL select only rows with max value on a column
Related
I have an SQL join function ready to implement however, if I run the snippet below only rows from wp_tours_tours are returned which also have rows in wp_tours_tours_bookings.
SELECT
wp_tours_tours.*,
SUM(wp_tours_tours_bookings.qty) AS "TotalBookings"
FROM wp_tours_tours
left join wp_tours_tours_bookings ON wp_tours_tours.tour_id=wp_tours_tours_bookings.tour_id
WHERE wp_tours_tours.post_id=12
If I use the below script without the SUM() function in the select statement it returns multiple rows (even those without any rows in wp_tours_tours_bookings)
SELECT
wp_tours_tours.*
FROM wp_tours_tours
left join wp_tours_tours_bookings ON wp_tours_tours.tour_id=wp_tours_tours_bookings.tour_id
WHERE wp_tours_tours.post_id=12
I need to get the first snippet working so that it just returns 0 for "TotalBookings" if it has no rows and to pull through all data from tours_tours even if tours_tours_bookings has no rows.
I would use a correlated subquery here:
select
t.*,
(
select coalesce(sum(tb.qty), 0)
from wp_tours_tours_bookings tb
where tb.tour_id = t.tour_id
) as totalbookings
from wp_tours_tours t
where t.post_id=12
Your query is malformed, because you have an aggregation function (SUM()) and columns that are not aggregated -- but no GROUP BY.
You need a GROUP BY:
SELECT tt.*, COALESCE(SUM(ttb.qty), 0) AS TotalBookings
FROM wp_tours_tours tt LEFT JOIN
wp_tours_tours_bookings ttb
ON tt.tour_id = ttb.tour_id
WHERE tt.post_id = 12
GROUP BY tt.tour_id; -- or whatever the primary key is
You can also use a correlated subquery (which probably has better performance):
SELECT tt.*,
(SELECT COALESCE(SUM(ttb.qty), 0)
FROM wp_tours_tours_bookings ttb
WHERE tt.tour_id = ttb.tour_id
) AS TotalBookings
FROM wp_tours_tours tt
WHERE tt.post_id = 12
I have a query which gets the correct result but it is taking 5.5 sec to get the output.. Is there any other way to write a query for this -
SELECT metricName, metricValue
FROM Table sm
WHERE createdtime = (
SELECT MAX(createdtime)
FROM Table b
WHERE sm.metricName = b.metricName
AND b.sinkName='xx'
)
AND sm.sinkName='xx'
In your code, the subselect has to be run for every result row of the outer query, which should be quite expensive. Instead, you could select your filter data in a separate query and join both accordingly:
SELECT `metricName`, `metricValue` FROM Table sm
INNER JOIN (SELECT max(`createdtime`) AS `maxTime, `metricName` from Table b WHERE b.sinkName='xx' GROUP BY `metricName` ) filter
ON (sm.`createdtime` = filter.`maxTime`) AND ( sm.`metricName` = filter.`metricName`)
WHERE sm.sinkName='xx'
Everything in the following query results in one line for each invBlueprintTypes row with the correct information. But I'm trying to add something to it. See below the codeblock.
Select
blueprintType.typeID,
blueprintType.typeName Blueprint,
productType.typeID,
productType.typeName Item,
productType.portionSize,
blueprintType.basePrice * 0.9 As bpoPrice,
productGroup.groupName ItemGroup,
productCategory.categoryName ItemCategory,
blueprints.productionTime,
blueprints.techLevel,
blueprints.researchProductivityTime,
blueprints.researchMaterialTime,
blueprints.researchCopyTime,
blueprints.researchTechTime,
blueprints.productivityModifier,
blueprints.materialModifier,
blueprints.wasteFactor,
blueprints.maxProductionLimit,
blueprints.blueprintTypeID
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
So what I need to get in here is the following table with the columns below it so I can use the values timestamp and sort the entire result by profitHour
tablename: invBlueprintTypesPrices
columns: blueprintTypeID, timestamp, profitHour
I need this information with the following select in mind. Using a select to show my intention of the JOIN/in-query select or whatever that can do this.
SELECT * FROM invBlueprintTypesPrices
WHERE blueprintTypeID = blueprintType.typeID
ORDER BY timestamp DESC LIMIT 1
And I need the main row from table invBlueprintTypes to still show even if there is no result from the invBlueprintTypesPrices. The LIMIT 1 is because I want the newest row possible, but deleting the older data is not a option since history is needed.
If I've understood correctly I think I need a subquery select, but how to do that? I've tired adding the exact query that is above with a AS blueprintPrices after the query's closing ), but did not work with a error with the
WHERE blueprintTypeID = blueprintType.typeID
part being the focus of the error. I have no idea why. Anyone who can solve this?
You'll need to use a LEFT JOIN to check for NULL values in invBlueprintTypesPrices. To mimic the LIMIT 1 per TypeId, you can use the MAX() or to truly make sure you only return a single record, use a row number -- this depends on whether you can have multiple max time stamps for each type id. Assuming not, then this should be close:
Select
...
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Left Join (
SELECT MAX(TimeStamp) MaxTime, TypeId
FROM invBlueprintTypesPrices
GROUP BY TypeId
) blueprintTypePrice On blueprints.blueprintTypeID = blueprintTypePrice.typeID
Left Join invBlueprintTypesPrices blueprintTypePrices On
blueprintTypePrice.TypeId = blueprintTypePrices.TypeId AND
blueprintTypePrice.MaxTime = blueprintTypePrices.TimeStamp
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
Order By
blueprintTypePrices.profitHour
Assuming you might have the same max time stamp with 2 different records, replace the 2 left joins above with something similar to this getting the row number:
Left Join (
SELECT #rn:=IF(#prevTypeId=TypeId,#rn+1,1) rn,
TimeStamp,
TypeId,
profitHour,
#prevTypeId:=TypeId
FROM (SELECT *
FROM invBlueprintTypesPrices
ORDER BY TypeId, TimeStamp DESC) t
JOIN (SELECT #rn:=0) t2
) blueprintTypePrices On blueprints.blueprintTypeID = blueprintTypePrices.typeID AND blueprintTypePrices.rn=1
You don't say where you are putting the subquery. If in the select clause, then you have a problem because you are returning more than one value.
You can't put this into the from clause directly, because you have a correlated subquery (not allowed).
Instead, you can put it in like this:
from . . .
(select *
from invBLueprintTypesPrices ibptp
where ibtp.timestamp = (select ibptp2.timestamp
from invBLueprintTypesPrices ibptp2
where ibptp.blueprintTypeId = ibptp2.blueprintTypeId
order by timestamp desc
limit 1
)
) ibptp
on ibptp.blueprintTypeId = blueprintType.TypeID
This identifies the most recent records for all the blueprintTypeids in the subquery. It then joins in the one that matches.
In the following query, I show the latest status of the sale (by stage, in this case the number 3). The query is based on a subquery in the status history of the sale:
SELECT v.id_sale,
IFNULL((
SELECT (CASE WHEN IFNULL( vec.description, '' ) = ''
THEN ve.name
ELSE vec.description
END)
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
WHERE veh.id_sale = v.id_sale
AND vec.id_stage = 3
ORDER BY veh.id_record DESC
LIMIT 1
), 'x') sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
WHERE 1 =1
AND v.flag =1
AND v.id_quarters =4
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
the query delay 0.0057seg and show 1011 records.
Because I have to filter the sales by the name of the state as it would have to repeat the subquery in a where clause, I have decided to change the same query using joins. In this case, I'm using the MAX function to obtain the latest status:
SELECT
v.id_sale,
IFNULL(veh3.State3,'x') AS sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
LEFT JOIN (
SELECT veh.id_sale,
(CASE WHEN IFNULL(vec.description,'') = ''
THEN ve.name
ELSE vec.description END) AS State3
FROM t_record veh
INNER JOIN (
SELECT id_sale, MAX(id_record) AS max_rating
FROM(
SELECT veh.id_sale, id_record
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign AND vec.id_stage = 3
) m
GROUP BY id_sale
) x ON x.max_rating = veh.id_record
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
) veh3 ON veh3.id_sale = v.id_sale
WHERE v.flag = 1
AND v.id_quarters = 4
This query shows the same results (1011). But the problem is it takes 0.0753 sec
Reviewing the possibilities I have found the factor that makes the difference in the speed of the query:
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
If I remove this clause, both queries the same time delay... Why it works better? Is there any way to use this clause in the joins? I hope your help.
EDIT
I will show the results of EXPLAIN for each query respectively:
q1:
q2:
Interesting, so that little statement basically determines if there is a match between t_record.id_sale and t_sale.id_sale.
Why is this making your query run faster? Because Where statements applied prior to subSelects in the select statement, so if there is no record to go with the sale, then it doesn't bother processing the subSelect. Which is netting you some time. So that's why it works better.
Is it going to work in your join syntax? I don't really know without having your tables to test against but you can always just apply it to the end and find out. Add the keyword EXPLAIN to the beginning of your query and you will get a plan of execution which will help you optimize things. Probably the best way to get better results in your join syntax is to add some indexes to your tables.
But I ask you, is this even necessary? You have a query returning in <8 hundredths of a second. Unless this query is getting ran thousands of times an hour, this is not really taxing your DB at all and your time is probably better spent making improvements elsewhere in your application.
I am trying to create an update query and making little progress in getting the right syntax.
The following query is working:
SELECT t.Index1, t.Index2, COUNT( m.EventType )
FROM Table t
LEFT JOIN MEvents m ON
(m.Index1 = t.Index1 AND
m.Index2 = t.Index2 AND
(m.EventType = 'A' OR m.EventType = 'B')
)
WHERE (t.SpecialEventCount IS NULL)
GROUP BY t.Index1, t.Index2
It creates a list of triplets Index1,Index2,EventCounts.
It only does this for case where t.SpecialEventCount is NULL. The update query I am trying to write should set this SpecialEventCount to that count, i.e. COUNT(m.EventType) in the query above. This number could be 0 or any positive number (hence the left join). Index1 and Index2 together are unique in Table t and they are used to identify events in MEvent.
How do I have to modify the select query to become an update query? I.e. something like
UPDATE Table SET SpecialEventCount=COUNT(m.EventType).....
but I am confused what to put where and have failed with numerous different guesses.
I take it that (Index1, Index2) is a unique key on Table, otherwise I would expect the reference to t.SpecialEventCount to result in an error.
Edited query to use subquery as it didn't work using GROUP BY
UPDATE
Table AS t
LEFT JOIN (
SELECT
Index1,
Index2,
COUNT(EventType) AS NumEvents
FROM
MEvents
WHERE
EventType = 'A' OR EventType = 'B'
GROUP BY
Index1,
Index2
) AS m ON
m.Index1 = t.Index1 AND
m.Index2 = t.Index2
SET
t.SpecialEventCount = m.NumEvents
WHERE
t.SpecialEventCount IS NULL
Doing a left join with a subquery will generate a giant
temporary table in-memory that will have no indexes.
For updates, try avoiding joins and using correlated
subqueries instead:
UPDATE
Table AS t
SET
t.SpecialEventCount = (
SELECT COUNT(m.EventType)
FROM MEvents m
WHERE m.EventType in ('A','B')
AND m.Index1 = t.Index1
AND m.Index2 = t.Index2
)
WHERE
t.SpecialEventCount IS NULL
Do some profiling, but this can be significantly faster in some cases.
my example
update card_crowd as cardCrowd
LEFT JOIN
(
select cc.id , count(1) as num
from card_crowd cc LEFT JOIN
card_crowd_r ccr on cc.id = ccr.crowd_id
group by cc.id
) as tt
on cardCrowd.id = tt.id
set cardCrowd.join_num = tt.num;