Query for self join in MySQL - mysql

I have a query which gets the correct result but it is taking 5.5 sec to get the output.. Is there any other way to write a query for this -
SELECT metricName, metricValue
FROM Table sm
WHERE createdtime = (
SELECT MAX(createdtime)
FROM Table b
WHERE sm.metricName = b.metricName
AND b.sinkName='xx'
)
AND sm.sinkName='xx'

In your code, the subselect has to be run for every result row of the outer query, which should be quite expensive. Instead, you could select your filter data in a separate query and join both accordingly:
SELECT `metricName`, `metricValue` FROM Table sm
INNER JOIN (SELECT max(`createdtime`) AS `maxTime, `metricName` from Table b WHERE b.sinkName='xx' GROUP BY `metricName` ) filter
ON (sm.`createdtime` = filter.`maxTime`) AND ( sm.`metricName` = filter.`metricName`)
WHERE sm.sinkName='xx'

Related

optimizing mysql subquery execution time

I have a following query:
select
qs.*
from
(
select
tsk.id,
tsk.request_id,
tsk.hash_id
from
user_tasks as usr
inner join unassigned_tasks as tsk on usr.task_id = tsk.id
where
usr.assigned_to = 53
AND tsk.product_id NOT IN (
SELECT
product_id
FROM
product_progresses
WHERE
request_id = tsk.request_id
)
ORDER BY
tsk.id
) as qs <-- this takes about 233ms.
WHERE
qs.id = ( <-- should this subquery execute for every row from outer result-set row?
SELECT
min(qt.id)
FROM
(
select
tsk.id,
tsk.request_id,
tsk.hash_id
from
user_tasks as usr
inner join unassigned_tasks as tsk on usr.task_id = tsk.id
where
usr.assigned_to = 53
AND tsk.product_id NOT IN (
SELECT
product_id
FROM
product_progresses
WHERE
request_id = tsk.request_id
)
ORDER BY
tsk.id
) as qt <-- this takes about 233ms.
WHERE
qt.id > 83934
)
Both qs and qt takes about the same time to execute, i.e. around 233ms.
Also, executing this whole query takes about the same time.
I have a conception that inner query inside where qs.id = (...) executes once for every row in the result-set generated from qs.
In this current case, qs outputs 10 rows.
So I decided to test if the sub-query is executed for each row 10 times.
Here's what I put inside sub-query:
WHERE qs.id = (
SELECT
SLEEP(1)
)
instead of where qs.id = ( select min(qt.id) ... ).
This took about 10.246 s which proves that the inner query is being run for each row.
So with this, shouldn't the initial query take about (qt time) 233 * 10 (rows from qs) = 2330ms?
Is it because select min().. is calculated once only?
In MySQL subqueries within the FROM clause are evaluated before subqueries in the WHERE clause.
Useful article:
https://www.eversql.com/sql-order-of-operations-sql-query-order-of-execution/
In your case, this is causing qs to be fully evaluated before the WHERE clause is factored in. If you want qs to be limited by qt you should be able to achieve that by unnesting qs or moving qt inside qs.
In practice though, rather than use a subquery for the WHERE clause, it would be probably better to use joins.
See SQL select only rows with max value on a column

How to speedup my update query associated with subquery

I have a query which is pretty that contains LEFT JOIN subquery. It takes 20 minutes to load completely.
Here is my query:
UPDATE orders AS o
LEFT JOIN (
SELECT obe_order_master_id, COUNT(id) AS count_files, id, added
FROM customer_instalments
GROUP BY obe_order_master_id
) AS oci ON oci.obe_order_master_id = SUBSTRING(o.order_id, 4)
SET o.final_customer_file_id = oci.id,
o.client_work_delivered = oci.added
WHERE oci.count_files = 1
Is there any way that I can make this query runs faster?
Move Where condition in Temp Table and replace WHERE with HAVING Clause, this will eliminate unnecessary rows from temp table so reduce the filtering and may help to improve performance
UPDATE orders AS o
LEFT JOIN (
SELECT obe_order_master_id, id, added
FROM customer_instalments
GROUP BY obe_order_master_id
HAVING COUNT(id) = 1
) AS oci ON oci.obe_order_master_id = SUBSTRING(o.order_id, 4)
SET o.final_customer_file_id = oci.id,
o.client_work_delivered = oci.added
I would suggest to create separate column for Order_id substring and make index on it. Then use this column in WHERE.

Performing queries on based on the output of a subquery

I am trying to perform a query based on the output of a sub-query. I would join the tables and perform one big query but that forces the query to search the entire table of one of the tables. The goal is to have the who (from table1), the what (from table2) and the when (from table3).
Here's what I have,
SELECT
DB1.TB1.`Date`,
DB1.TB1.`Sequence`,
DB1.TB1.`InstanceId`
FROM
(
SELECT
DB1.TB2.`UserName` AS USER,
DB1.TB2.`FirstName`,
DB1.TB2.`LastName`,
DB1.TB3.`ObjectName` AS OBJECT,
DB1.TB3.`ObjectType`
FROM
DB1.TB2
INNER JOIN DB1.TB3 ON DB1.TB2.`UserName` = DB1.TB3.`UserName`
INNER JOIN DB1.TB4 ON DB1.TB3.`ObjectName` = DB1.TB4.`ObjectName`
WHERE
DB1.TB4.`ADD` = 'Y' AND
DB1.TB2.`ADUC` NOT LIKE 'ServiceAccount' AND
DB1.TB3.`ObjectName` NOT IN ('ThisAdmin','ThatAdmin')
) AS MySubQuery
WHERE
DB1.TB1.`UserName` LIKE 'USER' AND
DB1.TB1.`ActionDetail` LIKE '%OBJECT%'
ORDER BY
DB1.TB1.`Date` DESC LIMIT 1

MySQL MAX(datetime) not working

I am trying to retrieve the max(date_entered) for a group of computer_ids.
The first query won't return accurate results. The second query gives me accurate results but essentially hangs unless I filter by a specific computer_id.
I'd rather use this first query
SELECT *, max(reports.date_entered)
FROM reports, hardware_reports
WHERE reports.report_id=hardware_reports.report_id
GROUP BY computer_id;
than this second query
SELECT *
FROM reports a
JOIN hardware_reports
ON a.report_id=hardware_reports.report_id
AND a.date_entered = (
SELECT MAX(date_entered)
FROM reports AS b
WHERE a.computer_id = b.computer_id)
and computer_id = 1648;
I need to either optimize second or get max to work in first.
You can alternative join it on a subquery that gets the latest record for every computer_ID.
SELECT a.*, c.*
FROM reports a
INNER JOIN
(
SELECT computer_ID, MAX(date_entered) date_entered
FROM reports
GROUP BY computer_ID
) b ON a.computer_ID = b.computer_ID
AND a.date_entered = b.date_entered
INNER JOIN hardware_reports c
ON a.report_id = c.report_id
To make it more efficient, provide an index on columns:
ALTER TABLE reports INDEX idx_report_compDate (computer_ID, date_entered)

MySQL Inner Join with where clause sorting and limit, subquery?

Everything in the following query results in one line for each invBlueprintTypes row with the correct information. But I'm trying to add something to it. See below the codeblock.
Select
blueprintType.typeID,
blueprintType.typeName Blueprint,
productType.typeID,
productType.typeName Item,
productType.portionSize,
blueprintType.basePrice * 0.9 As bpoPrice,
productGroup.groupName ItemGroup,
productCategory.categoryName ItemCategory,
blueprints.productionTime,
blueprints.techLevel,
blueprints.researchProductivityTime,
blueprints.researchMaterialTime,
blueprints.researchCopyTime,
blueprints.researchTechTime,
blueprints.productivityModifier,
blueprints.materialModifier,
blueprints.wasteFactor,
blueprints.maxProductionLimit,
blueprints.blueprintTypeID
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
So what I need to get in here is the following table with the columns below it so I can use the values timestamp and sort the entire result by profitHour
tablename: invBlueprintTypesPrices
columns: blueprintTypeID, timestamp, profitHour
I need this information with the following select in mind. Using a select to show my intention of the JOIN/in-query select or whatever that can do this.
SELECT * FROM invBlueprintTypesPrices
WHERE blueprintTypeID = blueprintType.typeID
ORDER BY timestamp DESC LIMIT 1
And I need the main row from table invBlueprintTypes to still show even if there is no result from the invBlueprintTypesPrices. The LIMIT 1 is because I want the newest row possible, but deleting the older data is not a option since history is needed.
If I've understood correctly I think I need a subquery select, but how to do that? I've tired adding the exact query that is above with a AS blueprintPrices after the query's closing ), but did not work with a error with the
WHERE blueprintTypeID = blueprintType.typeID
part being the focus of the error. I have no idea why. Anyone who can solve this?
You'll need to use a LEFT JOIN to check for NULL values in invBlueprintTypesPrices. To mimic the LIMIT 1 per TypeId, you can use the MAX() or to truly make sure you only return a single record, use a row number -- this depends on whether you can have multiple max time stamps for each type id. Assuming not, then this should be close:
Select
...
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Left Join (
SELECT MAX(TimeStamp) MaxTime, TypeId
FROM invBlueprintTypesPrices
GROUP BY TypeId
) blueprintTypePrice On blueprints.blueprintTypeID = blueprintTypePrice.typeID
Left Join invBlueprintTypesPrices blueprintTypePrices On
blueprintTypePrice.TypeId = blueprintTypePrices.TypeId AND
blueprintTypePrice.MaxTime = blueprintTypePrices.TimeStamp
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
Order By
blueprintTypePrices.profitHour
Assuming you might have the same max time stamp with 2 different records, replace the 2 left joins above with something similar to this getting the row number:
Left Join (
SELECT #rn:=IF(#prevTypeId=TypeId,#rn+1,1) rn,
TimeStamp,
TypeId,
profitHour,
#prevTypeId:=TypeId
FROM (SELECT *
FROM invBlueprintTypesPrices
ORDER BY TypeId, TimeStamp DESC) t
JOIN (SELECT #rn:=0) t2
) blueprintTypePrices On blueprints.blueprintTypeID = blueprintTypePrices.typeID AND blueprintTypePrices.rn=1
You don't say where you are putting the subquery. If in the select clause, then you have a problem because you are returning more than one value.
You can't put this into the from clause directly, because you have a correlated subquery (not allowed).
Instead, you can put it in like this:
from . . .
(select *
from invBLueprintTypesPrices ibptp
where ibtp.timestamp = (select ibptp2.timestamp
from invBLueprintTypesPrices ibptp2
where ibptp.blueprintTypeId = ibptp2.blueprintTypeId
order by timestamp desc
limit 1
)
) ibptp
on ibptp.blueprintTypeId = blueprintType.TypeID
This identifies the most recent records for all the blueprintTypeids in the subquery. It then joins in the one that matches.