Here's an SQL statement (actually two statements) that works -- it's taking a series of matching rows and adding a delivery_number which increments for each row:
SELECT #i:=0;
UPDATE pipeline_deliveries AS d
SET d.delivery_number = #i:=#i+1
WHERE d.pipelineID = 11
ORDER BY d.setup_time;
But now, the client no longer wants them ordered by setup_time. They needed to be ordered according to departure time, which is a field in another table. I can't figure out how to do this.
The MySQL docs, as well as this answer, suggest that in version 4.0 and up (we're running MySQL 5.0) I should be able to do this:
SELECT #i:=0;
UPDATE pipeline_deliveries AS d RIGHT JOIN pipeline_routesXdeliveryID AS rXd
ON d.pipeline_deliveryID = rXd.pipeline_deliveryID
LEFT JOIN pipeline_routes AS r
ON rXd.pipeline_routeID = r.pipeline_routeID
SET d.delivery_number = #i:=#i+1
WHERE d.pipelineID = 11
ORDER BY r.departure_time,d.pipeline_deliveryID;
but I get the error #1221 - Incorrect usage of UPDATE and ORDER BY.
So what's the correct usage?
You can't mix UPDATE joining 2 (or more) tables and ORDER BY.
You can bypass the limitation, with something like this:
UPDATE
pipeline_deliveries AS upd
JOIN
( SELECT t.pipeline_deliveryID,
#i := #i+1 AS row_number
FROM
( SELECT #i:=0 ) AS dummy
CROSS JOIN
( SELECT d.pipeline_deliveryID
FROM
pipeline_deliveries AS d
JOIN
pipeline_routesXdeliveryID AS rXd
ON d.pipeline_deliveryID = rXd.pipeline_deliveryID
LEFT JOIN
pipeline_routes AS r
ON rXd.pipeline_routeID = r.pipeline_routeID
WHERE
d.pipelineID = 11
ORDER BY
r.departure_time, d.pipeline_deliveryID
) AS t
) AS tmp
ON tmp.pipeline_deliveryID = upd.pipeline_deliveryID
SET
upd.delivery_number = tmp.row_number ;
The above uses two features of MySQL, user defined variables and ordering inside a derived table. Because the latter is not standard SQL, it may very well break in a feature release of MySQL (when the optimizer is clever enough to figure out that ordering inside a derived table is useless unless there is a LIMIT clause). In fact the query would do exactly that in the latest versions of MariaDB (5.3 and 5.5). It would run as if the ORDER BY was not there and the results would not be the expected. See a related question at MariaDB site: GROUP BY trick has been optimized away.
The same may very well happen in any future release of main-strean MySQL (maybe in 5.6, anyone care to test this?) that will improve the optimizer code.
So, it's better to write this in standard SQL. The best would be window functions which haven't been implemented yet. But you could also use a self-join, which will be not very bad regarding efficiency, as long as you are dealing with a small subset of rows to be affected by the update.
UPDATE
pipeline_deliveries AS upd
JOIN
( SELECT t1.pipeline_deliveryID
, COUNT(*) AS row_number
FROM
( SELECT d.pipeline_deliveryID
, r.departure_time
FROM
pipeline_deliveries AS d
JOIN
pipeline_routesXdeliveryID AS rXd
ON d.pipeline_deliveryID = rXd.pipeline_deliveryID
LEFT JOIN
pipeline_routes AS r
ON rXd.pipeline_routeID = r.pipeline_routeID
WHERE
d.pipelineID = 11
) AS t1
JOIN
( SELECT d.pipeline_deliveryID
, r.departure_time
FROM
pipeline_deliveries AS d
JOIN
pipeline_routesXdeliveryID AS rXd
ON d.pipeline_deliveryID = rXd.pipeline_deliveryID
LEFT JOIN
pipeline_routes AS r
ON rXd.pipeline_routeID = r.pipeline_routeID
WHERE
d.pipelineID = 11
) AS t2
ON t2.departure_time < t2.departure_time
OR t2.departure_time = t2.departure_time
AND t2.pipeline_deliveryID <= t1.pipeline_deliveryID
OR t1.departure_time IS NULL
AND ( t2.departure_time IS NOT NULL
OR t2.departure_time IS NULL
AND t2.pipeline_deliveryID <= t1.pipeline_deliveryID
)
GROUP BY
t1.pipeline_deliveryID
) AS tmp
ON tmp.pipeline_deliveryID = upd.pipeline_deliveryID
SET
upd.delivery_number = tmp.row_number ;
Based on this documentation
For the multiple-table syntax, UPDATE updates rows in each table named
in table_references that satisfy the conditions. In this case, ORDER
BY and LIMIT cannot be used.
Without knowing too much about MySQL you could open up a cursor and process this row by row, or by passing it back to the client code (PHP,Java, etc) that you maintain to handle this processing.
After more digging:
To eliminate the badly optimized subquery, you need to rewrite the
subquery as a join, but how can you do that and retain the LIMIT and
ORDER BY? One way is to find the rows to be updated in a subquery in
the FROM clause, so the LIMIT and ORDER BY can be nested inside the
subquery. In this way work_to_do is joined against the ten
highest-priority unclaimed rows of itself. Normally you can’t
self-join the update target in a multi-table UPDATE, but since it’s
within a subquery in the FROM clause, it works in this case.
update work_to_do as target
inner join (
select w. client, work_unit
from work_to_do as w
inner join eligible_client as e on e.client = w.client
where processor = 0
order by priority desc
limit 10
) as source on source.client = target.client
and source.work_unit = target.work_unit
set processor = #process_id;
There is one downside: the rows are not locked in primary key order.
This may help explain the occasional deadlock we get on this table
The hard way:-
ALTER TABLE eav_attribute_option
ADD temp_value TEXT NOT NULL
AFTER sort_order;
UPDATE eav_attribute_option o
JOIN eav_attribute_option_value ov ON o.option_id=ov.option_id
SET o.temp_value = ov.value
WHERE o.attribute_id=90;
SET #x = 0;
UPDATE eav_attribute_option
SET sort_order = (#x:=#x+1)
WHERE attribute_id=90
ORDER BY temp_value ASC;
ALTER TABLE eav_attribute_option
DROP temp_value;
Related
For a reporting output, I used to DROP and recreate a table 'mis.pr_approval_time'. but now I just TRUNCATE it.
After populating the above table with data, I run an UPDATE statement, but I have written that as a SELECT below...
SELECT t.account_id FROM mis.hj_approval_survey h INNER JOIN mis.pr_approval_time t ON h.country = t.country AND t.scheduled_at =
(
SELECT MAX(scheduled_at) FROM mis.pr_approval_time
WHERE country = h.country
AND scheduled_at <= h.created_at
AND TIME_TO_SEC(TIMEDIFF(h.created_at, scheduled_at)) < 91
);
When I run the above statement or even just...
SELECT t.account_id FROM mis.hj_approval_survey h INNER JOIN mis.pr_approval_time t ON h.country = t.country AND t.scheduled_at =
(
SELECT MAX(scheduled_at) FROM mis.pr_approval_time
WHERE country = h.country
);
...it runs forever and does not seem to finish. There are only ~3,400 rows in hj_approval_survey table and 29,000 rows in pr_approval_time. I run this on an Amazon AWS instance with 15+ GB RAM.
Now, if I simply right click on pr_approval_time table and choose ALTER TABLE option, and just close without doing anything, then the above queries run within seconds.
I guess when I trigger the ALTER TABLE option and Workbench populates the table fields, it probably improves its execution plan somehow, but I am not sure why. Has anyone faced anything similar to this? How can I trigger a better execution plan check without right clicking the table and choosing 'ALTER TABLE'
EDIT
It may be noteworthy to mention that my organisation also uses DOMO. Originally, I had this setup as an MySQL Dataflow on DOMO, but the query would not complete on most occassions, but I have observed it finish at times.
This was the reason why I moved this query back to our AWS MySQL RDS. So the problem has not only been observed on our own MySQL RDS, but probably also on DOMO
I suspect this is slow because of the correlated subquery (subquery depends on row values from parent table, meaning it has to execute for each row). I'd try and rework the pr_approval_time table slightly so it's point-in-time and then you can use the JOIN to pick the correct rows without doing a correlated subquery. Something like:
SELECT
hj_approval_survey.country
, hj_approval_survey.created_at
, pr_approval_time.account_id
FROM
#hj_approval_survey AS hj_approval_survey
JOIN (
SELECT
current_row.country
, current_row.scheduled_at AS scheduled_at_start
, COALESCE( MIN( next_row.scheduled_at ), GETDATE() ) AS scheduled_at_end
FROM
#pr_approval_time AS current_row
LEFT OUTER JOIN
#pr_approval_time AS next_row ON (
next_row.country = current_row.country
AND next_row.scheduled_at > current_row.scheduled_at
)
GROUP BY
current_row.country
, current_row.scheduled_at
) AS pr_approval_pit ON (
pr_approval_pit.country = hj_approval_survey.country
AND ( hj_approval_survey.created_at >= pr_approval_pit.scheduled_at_start
AND hj_approval_survey.created_at < pr_approval_pit.scheduled_at_end
)
)
JOIN #pr_approval_time AS pr_approval_time ON (
pr_approval_time.country = pr_approval_pit.country
AND pr_approval_time.scheduled_at = pr_approval_pit.scheduled_at_start
)
WHERE
TIME_TO_SEC( TIMEDIFF( hj_approval_survey.created_at, pr_approval_time.scheduled_at ) ) < 91
Assuming you have proper index on the columns involved in join
You could try refactoring your query using a grouped by subquery and join on country
SELECT t.account_id
FROM mis.hj_approval_survey h
INNER JOIN mis.pr_approval_time t ON h.country = t.country
INNER JOIN (
SELECT country, MAX(scheduled_at) max_sched
FROM mis.pr_approval_time
group by country
) z on z.contry = t.country and t.scheduled_at = z.max_sched
I have a query which is pretty that contains LEFT JOIN subquery. It takes 20 minutes to load completely.
Here is my query:
UPDATE orders AS o
LEFT JOIN (
SELECT obe_order_master_id, COUNT(id) AS count_files, id, added
FROM customer_instalments
GROUP BY obe_order_master_id
) AS oci ON oci.obe_order_master_id = SUBSTRING(o.order_id, 4)
SET o.final_customer_file_id = oci.id,
o.client_work_delivered = oci.added
WHERE oci.count_files = 1
Is there any way that I can make this query runs faster?
Move Where condition in Temp Table and replace WHERE with HAVING Clause, this will eliminate unnecessary rows from temp table so reduce the filtering and may help to improve performance
UPDATE orders AS o
LEFT JOIN (
SELECT obe_order_master_id, id, added
FROM customer_instalments
GROUP BY obe_order_master_id
HAVING COUNT(id) = 1
) AS oci ON oci.obe_order_master_id = SUBSTRING(o.order_id, 4)
SET o.final_customer_file_id = oci.id,
o.client_work_delivered = oci.added
I would suggest to create separate column for Order_id substring and make index on it. Then use this column in WHERE.
I am trying to compare values against same table which has more than 1,000,000 rows. Below is my query and it takes around 25 secs to get results.
EXPLAIN SELECT DISTINCT a.studyid,a.number,a.load_number,b.studyid,b.number,b.load_number FROM
(SELECT t1.*, buildnumber,platformid FROM t t1
INNER JOIN testlog t2 ON t1.`testid` = t2.`testid`
WHERE (buildnumber =1031719 AND platformid IN (SELECT platformid FROM platform WHERE platform.`Description` = "Windows 7 SP1"))
)AS a
JOIN
(SELECT t1.*,buildnumber,platformid FROM t t1
INNER JOIN testlog t2 ON t1.`testid` = t2.`testid`
WHERE (buildnumber =1030716 AND platformid IN (SELECT platformid FROM platform WHERE platform.`Description` = "Windows 7 SP1"))
)AS b
ON a.studyid=b.studyid AND a.load_number = b.load_number AND a.number = b.number
Could you anyone help me to improve this query to get fast enough results?
The problem here is even I have number and load_number index, the query doesn't use that. I dont know why it is always ignored..
Thanks.
First, you have a silly query. You are retrieving six columns, but there are only three values. Look at the on clause.
I think your best bet is to rewrite the query using conditional aggregation. I think the following is equivalent:
SELECT t1.studyid, t1.load_number, t1.number
FROM t t1 INNER JOIN
testlog t2
ON t1.testid = t2.testid
WHERE t2.buildnumber IN (1031719, 1030716) AND
platformid IN (SELECT platformid FROM platform p WHERE p.Description = 'Windows 7 SP1'))
GROUP BY studyid, load_number, number
HAVING MIN(buildnumber) <> MAX(buildnumber)
For this query, you want indexes on platform(Description, platformid) and testlog(buildnumber, platformid) and t(testid).
Problem #1:
IN ( SELECT ... ) optimizes very poorly. The subquery is rerun again and again. It looks like you are expecting exactly one id from that query; if so, change it to = ( SELECT ... ). That way it will be run exactly once.
Problem #2:
FROM ( SELECT ... )
JOIN ( SELECT ... ) ON ...
optimizes poorly because neither subquery. Can you merge the two subqueries into one, as Gordon was trying? If not, then put one of them into a TEMPORARY TABLE and add an appropriate index to that table so that the ON will be able to use it. Probably PRIMARY KEY(studyid, load_number, number).
Footnote: The latest versions of MySQL have made improvements on these problems by dynamically generating indexes. What version are you using?
I have a MySQL query that maps Users to Zones according to their location, and the zone boundaries:
UPDATE User u SET u.zoneId = (
SELECT z.id FROM Zone z
WHERE ST_Contains(z.boundary, u.location)
ORDER BY z.level DESC
LIMIT 1
);
This works fine, but it is quite slow as it's performing a subquery for every single record.
Is it possible to rewrite it using a JOIN, even though it's using ORDER BY ... LIMIT 1 in the subquery?
This ORDER BY ... LIMIT 1 is necessary as several encapsulated zones can match a location, and only the smallest one (highest level) must be assigned.
In the absence of any test data or full DDLs I can't really test this, but this might work. Using joins and a sub query, but burying the sub query an extra level down which might allow MySQL to ignore he use of the table to be updated in the sub query.
Does rely on the user table having a unique key ( I have just taken it as being called id ).
UPDATE User u
INNER JOIN Zone z
ON ST_Contains(z.boundary, u.location)
INNER JOIN
(
SELECT id, MaxLevel
FROM
(
SELECT u.id, MAX(z.level) AS MaxLevel
FROM User u
INNER JOIN Zone z
ON ST_Contains(z.boundary, u.location)
GROUP BY u.id
) Sub1
) Sub2
ON u.id = Sub2.id AND z.level = Sub2.MaxLevel
SET u.zoneId = z.id
If you could set up some test data in SQL fiddle I can test this.
Learning SQL, sorry if this is rudimentary. Trying to figure out a working UPDATE solution for the following pseudoish-code:
UPDATE tableA
SET tableA.col1 = '$var'
WHERE tableA.user_id = tableB.id
AND tableB.username = '$varName'
ORDER BY tableA.datetime DESC LIMIT 1
The above is more like SELECT syntax, but am basically trying to update a single column value in the latest row of tableA, where a username found in tableB.username (represented by $varName) is linked to its ID number in tableB.id, which exists as the id in tableA.user_id.
Hopefully, that makes sense. I'm guessing some kind of JOIN is necessary, but subqueries seem troublesome for UPDATE. I understand ORDER BY and LIMIT are off limits when multiple tables are involved in UPDATE... But I need the functionality. Is there a way around this?
A little confused, thanks in advance.
The solution is to nest ORDER BY and LIMIT in a FROM clause as part of a join. This let's you find the exact row to be updated (ta.id) first, then commit the update.
UPDATE tableA AS target
INNER JOIN (
SELECT ta.id
FROM tableA AS ta
INNER JOIN tableB AS tb ON tb.id = ta.user_id
WHERE tb.username = '$varName'
ORDER BY ta.datetime DESC
LIMIT 1) AS source ON source.id = target.id
SET col1 = '$var';
Hat tip to Baron Schwartz, a.k.a. Xaprb, for the excellent post on this exact topic:
http://www.xaprb.com/blog/2006/08/10/how-to-use-order-by-and-limit-on-multi-table-updates-in-mysql/
You can use following query syntax:
update work_to_do as target
inner join (
select w. client, work_unit
from work_to_do as w
inner join eligible_client as e on e.client = w.client
where processor = 0
order by priority desc
limit 10
) as source on source.client = target.client
and source.work_unit = target.work_unit
set processor = #process_id;
This works perfectly.