I have a MySQL query that maps Users to Zones according to their location, and the zone boundaries:
UPDATE User u SET u.zoneId = (
SELECT z.id FROM Zone z
WHERE ST_Contains(z.boundary, u.location)
ORDER BY z.level DESC
LIMIT 1
);
This works fine, but it is quite slow as it's performing a subquery for every single record.
Is it possible to rewrite it using a JOIN, even though it's using ORDER BY ... LIMIT 1 in the subquery?
This ORDER BY ... LIMIT 1 is necessary as several encapsulated zones can match a location, and only the smallest one (highest level) must be assigned.
In the absence of any test data or full DDL I can't really test this, but it might work. It uses joins and a subquery, but buries the subquery an extra level down, which might allow MySQL to ignore the use of the table being updated in the subquery.
It does rely on the User table having a unique key (I have assumed it is called id).
UPDATE User u
INNER JOIN Zone z
ON ST_Contains(z.boundary, u.location)
INNER JOIN
(
SELECT id, MaxLevel
FROM
(
SELECT u.id, MAX(z.level) AS MaxLevel
FROM User u
INNER JOIN Zone z
ON ST_Contains(z.boundary, u.location)
GROUP BY u.id
) Sub1
) Sub2
ON u.id = Sub2.id AND z.level = Sub2.MaxLevel
SET u.zoneId = z.id
If you could set up some test data in SQL fiddle I can test this.
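For anyone who wants to try the idea without spatial data, here is a minimal sketch of the same "highest level per user" join. It is only an illustration under assumptions: in-memory SQLite stands in for MySQL, a one-dimensional BETWEEN test on made-up lo/hi columns replaces ST_Contains, and the rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, and a 1-D interval (lo..hi)
# replaces the spatial ST_Contains(z.boundary, u.location) test.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Zone (id INTEGER PRIMARY KEY, level INTEGER, lo INTEGER, hi INTEGER);
    CREATE TABLE User (id INTEGER PRIMARY KEY, location INTEGER);
    INSERT INTO Zone VALUES (1, 0, 0, 100), (2, 1, 10, 50), (3, 2, 20, 30);
    INSERT INTO User VALUES (1, 25), (2, 40), (3, 90), (4, 500);
""")

# The derived table finds the highest matching level per user;
# joining back on (user, level) recovers the id of that zone.
rows = conn.execute("""
    SELECT u.id, z.id
    FROM User u
    JOIN Zone z ON u.location BETWEEN z.lo AND z.hi
    JOIN (
        SELECT u2.id AS user_id, MAX(z2.level) AS max_level
        FROM User u2
        JOIN Zone z2 ON u2.location BETWEEN z2.lo AND z2.hi
        GROUP BY u2.id
    ) best ON best.user_id = u.id AND best.max_level = z.level
    ORDER BY u.id
""").fetchall()

zone_by_user = dict(rows)
print(zone_by_user)
```

Two behaviours are worth checking against real data. A user that lies in no zone (user 4 here) simply does not appear, so an INNER JOIN update would leave its zoneId untouched, whereas the original subquery form sets it to NULL. And if two matching zones share the same highest level, the join returns both.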
For a reporting output, I used to DROP and recreate a table 'mis.pr_approval_time', but now I just TRUNCATE it.
After populating the above table with data, I run an UPDATE statement, but I have written that as a SELECT below...
SELECT t.account_id FROM mis.hj_approval_survey h INNER JOIN mis.pr_approval_time t ON h.country = t.country AND t.scheduled_at =
(
SELECT MAX(scheduled_at) FROM mis.pr_approval_time
WHERE country = h.country
AND scheduled_at <= h.created_at
AND TIME_TO_SEC(TIMEDIFF(h.created_at, scheduled_at)) < 91
);
When I run the above statement or even just...
SELECT t.account_id FROM mis.hj_approval_survey h INNER JOIN mis.pr_approval_time t ON h.country = t.country AND t.scheduled_at =
(
SELECT MAX(scheduled_at) FROM mis.pr_approval_time
WHERE country = h.country
);
...it runs forever and does not seem to finish. There are only ~3,400 rows in hj_approval_survey table and 29,000 rows in pr_approval_time. I run this on an Amazon AWS instance with 15+ GB RAM.
Now, if I simply right click on pr_approval_time table and choose ALTER TABLE option, and just close without doing anything, then the above queries run within seconds.
I guess that when I trigger the ALTER TABLE option and Workbench populates the table fields, it somehow improves the execution plan, but I am not sure why. Has anyone faced anything similar to this? How can I trigger a better execution plan check without right clicking the table and choosing 'ALTER TABLE'?
EDIT
It may be worth mentioning that my organisation also uses DOMO. Originally, I had this set up as a MySQL Dataflow on DOMO, but the query would not complete on most occasions, although I have observed it finish at times.
This was the reason why I moved this query back to our AWS MySQL RDS. So the problem has not only been observed on our own MySQL RDS, but probably also on DOMO.
I suspect this is slow because of the correlated subquery (subquery depends on row values from parent table, meaning it has to execute for each row). I'd try and rework the pr_approval_time table slightly so it's point-in-time and then you can use the JOIN to pick the correct rows without doing a correlated subquery. Something like:
SELECT
hj_approval_survey.country
, hj_approval_survey.created_at
, pr_approval_time.account_id
FROM
mis.hj_approval_survey AS hj_approval_survey
JOIN (
SELECT
current_row.country
, current_row.scheduled_at AS scheduled_at_start
, COALESCE( MIN( next_row.scheduled_at ), NOW() ) AS scheduled_at_end
FROM
mis.pr_approval_time AS current_row
LEFT OUTER JOIN
mis.pr_approval_time AS next_row ON (
next_row.country = current_row.country
AND next_row.scheduled_at > current_row.scheduled_at
)
GROUP BY
current_row.country
, current_row.scheduled_at
) AS pr_approval_pit ON (
pr_approval_pit.country = hj_approval_survey.country
AND ( hj_approval_survey.created_at >= pr_approval_pit.scheduled_at_start
AND hj_approval_survey.created_at < pr_approval_pit.scheduled_at_end
)
)
JOIN mis.pr_approval_time AS pr_approval_time ON (
pr_approval_time.country = pr_approval_pit.country
AND pr_approval_time.scheduled_at = pr_approval_pit.scheduled_at_start
)
WHERE
TIME_TO_SEC( TIMEDIFF( hj_approval_survey.created_at, pr_approval_time.scheduled_at ) ) < 91
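To see the point-in-time idea on a few rows, here is a small sketch. It makes several assumptions that are not in the question: in-memory SQLite stands in for MySQL, timestamps are plain integers (seconds), the rows are invented, and the last interval is left open-ended with an IS NULL check instead of being capped at the current time.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; timestamps are integer seconds.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pr_approval_time (account_id INTEGER, country TEXT, scheduled_at INTEGER);
    CREATE TABLE hj_approval_survey (country TEXT, created_at INTEGER);
    INSERT INTO pr_approval_time VALUES (1, 'DE', 100), (2, 'DE', 200), (3, 'FR', 150);
    INSERT INTO hj_approval_survey VALUES ('DE', 130), ('DE', 250), ('FR', 400), ('FR', 100);
""")

# pit turns each schedule row into an interval [start_at, end_at), where
# end_at is the next schedule time for the same country (NULL for the last).
rows = conn.execute("""
    SELECT h.country, h.created_at, t.account_id
    FROM hj_approval_survey h
    JOIN (
        SELECT cur.country,
               cur.scheduled_at      AS start_at,
               MIN(nxt.scheduled_at) AS end_at
        FROM pr_approval_time cur
        LEFT JOIN pr_approval_time nxt
               ON nxt.country = cur.country
              AND nxt.scheduled_at > cur.scheduled_at
        GROUP BY cur.country, cur.scheduled_at
    ) pit ON pit.country = h.country
         AND h.created_at >= pit.start_at
         AND (pit.end_at IS NULL OR h.created_at < pit.end_at)
    JOIN pr_approval_time t
      ON t.country = pit.country
     AND t.scheduled_at = pit.start_at
    WHERE h.created_at - t.scheduled_at < 91
    ORDER BY h.country, h.created_at
""").fetchall()

print(rows)
```

With this data the two DE surveys pick up accounts 1 and 2. The FR survey at 400 falls in an interval but is more than 90 seconds after its start, and the FR survey at 100 precedes every schedule, so neither is returned.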
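Assuming you have proper indexes on the columns involved in the join, you could try refactoring your query to join on a subquery grouped by country. Note that this corresponds to the simplified second query (the latest scheduled_at per country), not the version that also compares against h.created_at: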
SELECT t.account_id
FROM mis.hj_approval_survey h
INNER JOIN mis.pr_approval_time t ON h.country = t.country
INNER JOIN (
SELECT country, MAX(scheduled_at) max_sched
FROM mis.pr_approval_time
group by country
) z on z.country = t.country and t.scheduled_at = z.max_sched
I have two tables:
history
business
I want to run this query :
SELECT name, talias.*
FROM
(SELECT business.bussName as name, history.*
FROM history
INNER JOIN business on history.bussID = business.bussID
WHERE history.activity = 'Insert' OR history.activity = 'Update'
UNION
SELECT NULL as name, history.*
FROM history
WHERE history.activity = 'Delete'
) as talias
WHERE 1
order by talias.date DESC
LIMIT $fetch,20
This query takes 13 seconds. I think the problem is that MySQL joins all the rows of the history and business tables, while it should join just 20 rows!
How can I fix that?
If I understand you correctly, you want all rows from history where the activity is 'Delete', plus all those rows where the activity is 'Insert' or 'Update' and there is a corresponding row in the business table.
I don't know if that is going to be faster than your query - you will need to check the execution plan to verify this.
SELECT *
FROM history
where activity = 'Delete'
or ( activity in ('Insert','Update')
AND exists (select 1
from business
where history.bussID = business.bussID))
order by `date` DESC
LIMIT $fetch,20
Edit (after the question has changed)
If you do need columns from the business table, replacing the union with an outer join might improve performance.
But to be honest, I don't expect it. The MySQL optimizer isn't very smart and I wouldn't be surprised if the outer join was actually implemented using some kind of union. Again only you can test that by looking at the execution plan.
SELECT h.*,
b.bussName as name
FROM history h
LEFT JOIN business b
ON h.bussID = b.bussID
AND h.activity in ('Insert','Update')
WHERE h.activity in ('Delete', 'Insert','Update')
ORDER BY h.`date` DESC
LIMIT $fetch,20
Btw: date is a horrible column name. First because it's a reserved word, second (and more important) because it doesn't document anything. Is that the "creation date"? The "deletion date"? A "due date"? Some other date?
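Here is a quick way to see what the outer-join version returns. This is only a sketch: in-memory SQLite stands in for MySQL, the rows are invented, and the paging is fixed at the first 20 rows instead of using $fetch.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all rows below are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business (bussID INTEGER PRIMARY KEY, bussName TEXT);
    CREATE TABLE history (id INTEGER PRIMARY KEY, bussID INTEGER, activity TEXT, date TEXT);
    INSERT INTO business VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO history VALUES
        (1, 1, 'Insert', '2020-01-01'),
        (2, 2, 'Update', '2020-01-02'),
        (3, 1, 'Delete', '2020-01-03'),
        (4, 9, 'Insert', '2020-01-04');  -- bussID 9 has no business row
""")

# The activity test lives in the ON clause, so 'Delete' rows never match
# a business row and come back with a NULL name.
rows = conn.execute("""
    SELECT h.id, h.activity, b.bussName AS name
    FROM history h
    LEFT JOIN business b
           ON h.bussID = b.bussID
          AND h.activity IN ('Insert', 'Update')
    WHERE h.activity IN ('Delete', 'Insert', 'Update')
    ORDER BY h.date DESC
    LIMIT 20 OFFSET 0
""").fetchall()

print(rows)
```

One difference from the UNION in the question: row 4 is an 'Insert' whose bussID has no match in business. The INNER JOIN branch of the UNION would drop it, while the LEFT JOIN keeps it with a NULL name. If such rows must be excluded, the EXISTS variant is closer to the original.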
Try this:
SELECT h.*
FROM history AS h
WHERE (h.activity IN ('Insert', 'Update')
AND EXISTS (SELECT * FROM business AS b WHERE b.bussID = h.bussID))
OR h.activity = 'Delete'
ORDER BY h.date DESC
LIMIT $fetch, 20
For the ORDER BY and LIMIT to be efficient, make sure you have an index on history.date.
I am trying to retrieve the max(date_entered) for a group of computer_ids.
The first query won't return accurate results. The second query gives me accurate results but essentially hangs unless I filter by a specific computer_id.
I'd rather use this first query
SELECT *, max(reports.date_entered)
FROM reports, hardware_reports
WHERE reports.report_id=hardware_reports.report_id
GROUP BY computer_id;
than this second query
SELECT *
FROM reports a
JOIN hardware_reports
ON a.report_id=hardware_reports.report_id
AND a.date_entered = (
SELECT MAX(date_entered)
FROM reports AS b
WHERE a.computer_id = b.computer_id)
and computer_id = 1648;
I need to either optimize second or get max to work in first.
Alternatively, you can join it on a subquery that gets the latest record for every computer_ID.
SELECT a.*, c.*
FROM reports a
INNER JOIN
(
SELECT computer_ID, MAX(date_entered) date_entered
FROM reports
GROUP BY computer_ID
) b ON a.computer_ID = b.computer_ID
AND a.date_entered = b.date_entered
INNER JOIN hardware_reports c
ON a.report_id = c.report_id
To make it more efficient, add an index on the columns:
ALTER TABLE reports ADD INDEX idx_report_compDate (computer_ID, date_entered);
Everything in the following query results in one line for each invBlueprintTypes row with the correct information. But I'm trying to add something to it. See below the codeblock.
Select
blueprintType.typeID,
blueprintType.typeName Blueprint,
productType.typeID,
productType.typeName Item,
productType.portionSize,
blueprintType.basePrice * 0.9 As bpoPrice,
productGroup.groupName ItemGroup,
productCategory.categoryName ItemCategory,
blueprints.productionTime,
blueprints.techLevel,
blueprints.researchProductivityTime,
blueprints.researchMaterialTime,
blueprints.researchCopyTime,
blueprints.researchTechTime,
blueprints.productivityModifier,
blueprints.materialModifier,
blueprints.wasteFactor,
blueprints.maxProductionLimit,
blueprints.blueprintTypeID
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
So what I need to get in here is the following table with the columns below it so I can use the values timestamp and sort the entire result by profitHour
tablename: invBlueprintTypesPrices
columns: blueprintTypeID, timestamp, profitHour
I need this information with the following select in mind. Using a select to show my intention of the JOIN/in-query select or whatever that can do this.
SELECT * FROM invBlueprintTypesPrices
WHERE blueprintTypeID = blueprintType.typeID
ORDER BY timestamp DESC LIMIT 1
And I need the main row from table invBlueprintTypes to still show even if there is no result from the invBlueprintTypesPrices. The LIMIT 1 is because I want the newest row possible, but deleting the older data is not a option since history is needed.
If I've understood correctly, I think I need a subquery select, but how do I do that? I've tried adding the exact query above, with AS blueprintPrices after the query's closing ), but it did not work; the error pointed at the
WHERE blueprintTypeID = blueprintType.typeID
part being the focus of the error. I have no idea why. Anyone who can solve this?
You'll need to use a LEFT JOIN to check for NULL values in invBlueprintTypesPrices. To mimic the LIMIT 1 per TypeId, you can use the MAX() or to truly make sure you only return a single record, use a row number -- this depends on whether you can have multiple max time stamps for each type id. Assuming not, then this should be close:
Select
...
From
invBlueprintTypes As blueprints
Inner Join invTypes As blueprintType On blueprints.blueprintTypeID = blueprintType.typeID
Inner Join invTypes As productType On blueprints.productTypeID = productType.typeID
Inner Join invGroups As productGroup On productType.groupID = productGroup.groupID
Inner Join invCategories As productCategory On productGroup.categoryID = productCategory.categoryID
Left Join (
SELECT MAX(timestamp) MaxTime, blueprintTypeID
FROM invBlueprintTypesPrices
GROUP BY blueprintTypeID
) blueprintTypePrice On blueprints.blueprintTypeID = blueprintTypePrice.blueprintTypeID
Left Join invBlueprintTypesPrices blueprintTypePrices On
blueprintTypePrice.blueprintTypeID = blueprintTypePrices.blueprintTypeID AND
blueprintTypePrice.MaxTime = blueprintTypePrices.timestamp
Where
blueprints.techLevel = 1 And
blueprintType.published = 1 And
productType.marketGroupID Is Not Null And
blueprintType.basePrice > 0
Order By
blueprintTypePrices.profitHour
Assuming you might have the same max time stamp with 2 different records, replace the 2 left joins above with something similar to this getting the row number:
Left Join (
SELECT @rn:=IF(@prevTypeId=blueprintTypeID,@rn+1,1) rn,
timestamp,
blueprintTypeID,
profitHour,
@prevTypeId:=blueprintTypeID
FROM (SELECT *
FROM invBlueprintTypesPrices
ORDER BY blueprintTypeID, timestamp DESC) t
JOIN (SELECT @rn:=0) t2
) blueprintTypePrices On blueprints.blueprintTypeID = blueprintTypePrices.blueprintTypeID AND blueprintTypePrices.rn=1
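On engines with window functions (MySQL 8.0 and later, or SQLite 3.25 and later), the same "one newest row per type" can be expressed with ROW_NUMBER() instead of user variables. The sketch below is an illustration only: it runs on in-memory SQLite with invented rows, and breaking timestamp ties by the higher profitHour is an arbitrary assumption.

```python
import sqlite3

# Assumption: SQLite (3.25+ for window functions) stands in for MySQL 8.0+.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invBlueprintTypesPrices (
        blueprintTypeID INTEGER, timestamp INTEGER, profitHour REAL);
    INSERT INTO invBlueprintTypesPrices VALUES
        (10, 1, 5.0), (10, 2, 7.0), (10, 2, 9.0),  -- two rows tie at timestamp 2
        (20, 1, 3.0);
""")

# rn = 1 marks exactly one row per blueprintTypeID, even when the newest
# timestamp appears twice (the tie-break on profitHour is arbitrary).
latest = conn.execute("""
    SELECT blueprintTypeID, timestamp, profitHour
    FROM (
        SELECT p.*,
               ROW_NUMBER() OVER (
                   PARTITION BY blueprintTypeID
                   ORDER BY timestamp DESC, profitHour DESC
               ) AS rn
        FROM invBlueprintTypesPrices p
    ) ranked
    WHERE rn = 1
    ORDER BY blueprintTypeID
""").fetchall()

print(latest)
```

In the full query this derived table would take the place of the LEFT JOIN subquery, joined on blueprintTypeID.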
You don't say where you are putting the subquery. If in the select clause, then you have a problem because you are returning more than one value.
You can't put this into the from clause directly, because you have a correlated subquery (not allowed).
Instead, you can put it in like this:
from . . .
(select *
from invBLueprintTypesPrices ibptp
where ibptp.timestamp = (select ibptp2.timestamp
from invBLueprintTypesPrices ibptp2
where ibptp.blueprintTypeId = ibptp2.blueprintTypeId
order by timestamp desc
limit 1
)
) ibptp
on ibptp.blueprintTypeId = blueprintType.TypeID
This identifies the most recent records for all the blueprintTypeids in the subquery. It then joins in the one that matches.
Here's an SQL statement (actually two statements) that works -- it's taking a series of matching rows and adding a delivery_number which increments for each row:
SELECT @i:=0;
UPDATE pipeline_deliveries AS d
SET d.delivery_number = @i:=@i+1
WHERE d.pipelineID = 11
ORDER BY d.setup_time;
But now, the client no longer wants them ordered by setup_time. They needed to be ordered according to departure time, which is a field in another table. I can't figure out how to do this.
The MySQL docs, as well as this answer, suggest that in version 4.0 and up (we're running MySQL 5.0) I should be able to do this:
SELECT @i:=0;
UPDATE pipeline_deliveries AS d RIGHT JOIN pipeline_routesXdeliveryID AS rXd
ON d.pipeline_deliveryID = rXd.pipeline_deliveryID
LEFT JOIN pipeline_routes AS r
ON rXd.pipeline_routeID = r.pipeline_routeID
SET d.delivery_number = @i:=@i+1
WHERE d.pipelineID = 11
ORDER BY r.departure_time,d.pipeline_deliveryID;
but I get the error #1221 - Incorrect usage of UPDATE and ORDER BY.
So what's the correct usage?
You can't mix UPDATE joining 2 (or more) tables and ORDER BY.
You can bypass the limitation, with something like this:
UPDATE
pipeline_deliveries AS upd
JOIN
( SELECT t.pipeline_deliveryID,
@i := @i+1 AS row_number
FROM
( SELECT @i:=0 ) AS dummy
CROSS JOIN
( SELECT d.pipeline_deliveryID
FROM
pipeline_deliveries AS d
JOIN
pipeline_routesXdeliveryID AS rXd
ON d.pipeline_deliveryID = rXd.pipeline_deliveryID
LEFT JOIN
pipeline_routes AS r
ON rXd.pipeline_routeID = r.pipeline_routeID
WHERE
d.pipelineID = 11
ORDER BY
r.departure_time, d.pipeline_deliveryID
) AS t
) AS tmp
ON tmp.pipeline_deliveryID = upd.pipeline_deliveryID
SET
upd.delivery_number = tmp.row_number ;
The above uses two features of MySQL: user-defined variables and ordering inside a derived table. Because the latter is not standard SQL, it may very well break in a future release of MySQL (when the optimizer is clever enough to figure out that ordering inside a derived table is useless unless there is a LIMIT clause). In fact the query would do exactly that in the latest versions of MariaDB (5.3 and 5.5). It would run as if the ORDER BY was not there and the results would not be as expected. See a related question at the MariaDB site: GROUP BY trick has been optimized away.
The same may very well happen in any future release of mainstream MySQL (maybe in 5.6, anyone care to test this?) that improves the optimizer code.
So, it's better to write this in standard SQL. The best would be window functions which haven't been implemented yet. But you could also use a self-join, which will be not very bad regarding efficiency, as long as you are dealing with a small subset of rows to be affected by the update.
UPDATE
pipeline_deliveries AS upd
JOIN
( SELECT t1.pipeline_deliveryID
, COUNT(*) AS row_number
FROM
( SELECT d.pipeline_deliveryID
, r.departure_time
FROM
pipeline_deliveries AS d
JOIN
pipeline_routesXdeliveryID AS rXd
ON d.pipeline_deliveryID = rXd.pipeline_deliveryID
LEFT JOIN
pipeline_routes AS r
ON rXd.pipeline_routeID = r.pipeline_routeID
WHERE
d.pipelineID = 11
) AS t1
JOIN
( SELECT d.pipeline_deliveryID
, r.departure_time
FROM
pipeline_deliveries AS d
JOIN
pipeline_routesXdeliveryID AS rXd
ON d.pipeline_deliveryID = rXd.pipeline_deliveryID
LEFT JOIN
pipeline_routes AS r
ON rXd.pipeline_routeID = r.pipeline_routeID
WHERE
d.pipelineID = 11
) AS t2
ON t2.departure_time < t1.departure_time
OR t2.departure_time = t1.departure_time
AND t2.pipeline_deliveryID <= t1.pipeline_deliveryID
OR t1.departure_time IS NULL
AND ( t2.departure_time IS NOT NULL
OR t2.departure_time IS NULL
AND t2.pipeline_deliveryID <= t1.pipeline_deliveryID
)
GROUP BY
t1.pipeline_deliveryID
) AS tmp
ON tmp.pipeline_deliveryID = upd.pipeline_deliveryID
SET
upd.delivery_number = tmp.row_number ;
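The counting self-join is easier to follow on a flat table. The sketch below is a simplification under assumptions: in-memory SQLite stands in for MySQL, a hypothetical deliveries table replaces the three-table join, the rows are invented, and the NULL departure_time branch is left out.

```python
import sqlite3

# Assumption: 'deliveries' is a made-up flat table standing in for the
# joined pipeline_deliveries / pipeline_routes result; no NULL times.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deliveries (pipeline_deliveryID INTEGER PRIMARY KEY, departure_time INTEGER);
    INSERT INTO deliveries VALUES (1, 900), (2, 800), (3, 900), (4, 700);
""")

# For each row t1, count the rows t2 that sort at or before it:
# earlier departure_time, or same time and an id that is not larger.
numbered = conn.execute("""
    SELECT t1.pipeline_deliveryID, COUNT(*) AS delivery_number
    FROM deliveries t1
    JOIN deliveries t2
      ON t2.departure_time < t1.departure_time
      OR (t2.departure_time = t1.departure_time
          AND t2.pipeline_deliveryID <= t1.pipeline_deliveryID)
    GROUP BY t1.pipeline_deliveryID
    ORDER BY delivery_number
""").fetchall()

print(numbered)
```

Every row counts itself, so numbering starts at 1, and the tie on departure_time 900 is broken by the smaller pipeline_deliveryID. The join compares every pair of rows, which is why this only stays cheap for a small subset.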
Based on this documentation
For the multiple-table syntax, UPDATE updates rows in each table named
in table_references that satisfy the conditions. In this case, ORDER
BY and LIMIT cannot be used.
Without knowing too much about MySQL you could open up a cursor and process this row by row, or by passing it back to the client code (PHP,Java, etc) that you maintain to handle this processing.
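As a rough illustration of the client-side route, the sketch below reads the ids in the desired order and writes the numbers back one row at a time. It is a sketch under assumptions: Python's sqlite3 module stands in for the real MySQL client library, the table and column names are taken from the question, and the rows are invented.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL connection.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pipeline_deliveries (
        pipeline_deliveryID INTEGER PRIMARY KEY, pipelineID INTEGER, delivery_number INTEGER);
    CREATE TABLE pipeline_routesXdeliveryID (pipeline_routeID INTEGER, pipeline_deliveryID INTEGER);
    CREATE TABLE pipeline_routes (pipeline_routeID INTEGER PRIMARY KEY, departure_time INTEGER);
    INSERT INTO pipeline_deliveries VALUES (1, 11, NULL), (2, 11, NULL), (3, 11, NULL), (4, 12, NULL);
    INSERT INTO pipeline_routes VALUES (100, 900), (101, 800);
    INSERT INTO pipeline_routesXdeliveryID VALUES (100, 1), (101, 2), (100, 3), (101, 4);
""")

# Step 1: a plain SELECT may use ORDER BY across the joined tables.
ordered_ids = [row[0] for row in conn.execute("""
    SELECT d.pipeline_deliveryID
    FROM pipeline_deliveries d
    JOIN pipeline_routesXdeliveryID rXd ON d.pipeline_deliveryID = rXd.pipeline_deliveryID
    LEFT JOIN pipeline_routes r ON rXd.pipeline_routeID = r.pipeline_routeID
    WHERE d.pipelineID = ?
    ORDER BY r.departure_time, d.pipeline_deliveryID
""", (11,))]

# Step 2: number the ids in the client and update row by row,
# inside a single transaction.
with conn:
    conn.executemany(
        "UPDATE pipeline_deliveries SET delivery_number = ? WHERE pipeline_deliveryID = ?",
        [(number, delivery_id) for number, delivery_id in enumerate(ordered_ids, start=1)],
    )

result = conn.execute(
    "SELECT pipeline_deliveryID, delivery_number FROM pipeline_deliveries "
    "ORDER BY pipeline_deliveryID"
).fetchall()
print(result)
```

This trades one set-based statement for one UPDATE per row, so it is mainly reasonable when the number of affected rows is small.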
After more digging:
To eliminate the badly optimized subquery, you need to rewrite the
subquery as a join, but how can you do that and retain the LIMIT and
ORDER BY? One way is to find the rows to be updated in a subquery in
the FROM clause, so the LIMIT and ORDER BY can be nested inside the
subquery. In this way work_to_do is joined against the ten
highest-priority unclaimed rows of itself. Normally you can’t
self-join the update target in a multi-table UPDATE, but since it’s
within a subquery in the FROM clause, it works in this case.
update work_to_do as target
inner join (
select w.client, work_unit
from work_to_do as w
inner join eligible_client as e on e.client = w.client
where processor = 0
order by priority desc
limit 10
) as source on source.client = target.client
and source.work_unit = target.work_unit
set processor = @process_id;
There is one downside: the rows are not locked in primary key order.
This may help explain the occasional deadlock we get on this table
The hard way:-
ALTER TABLE eav_attribute_option
ADD temp_value TEXT NOT NULL
AFTER sort_order;
UPDATE eav_attribute_option o
JOIN eav_attribute_option_value ov ON o.option_id=ov.option_id
SET o.temp_value = ov.value
WHERE o.attribute_id=90;
SET @x = 0;
UPDATE eav_attribute_option
SET sort_order = (@x:=@x+1)
WHERE attribute_id=90
ORDER BY temp_value ASC;
ALTER TABLE eav_attribute_option
DROP temp_value;