I tried to come up with a query that updates records in a MySQL table using other records in the same table, but I got mixed results between local testing and production. I don't know much about subqueries, so I want to bring this question here. In local development with MySQL InnoDB 5.6.23, the query takes 25 to 30 seconds on a dataset of about 180k records. On a staging server with MySQL InnoDB 5.5.32 and a dataset of 254k records, the query seems to stall for hours, using 100% of a CPU core until it's stopped.
This is the query I came up with:
UPDATE
`product_lang` AS `pl1`
SET
pl1.`name` = (
SELECT pl2.`name` FROM (SELECT `name`, `id_product`, `id_lang` FROM `product_lang`) AS `pl2`
WHERE pl1.`id_product` = pl2.`id_product`
AND pl2.`id_lang` = 1
)
WHERE
pl1.`id_lang` != 1
The objective is to replace the name of every product record whose id_lang is not 1 (the default language, for the sake of explaining) with the name from the record for the same product whose id_lang is 1.
I know that subqueries are inefficient, but I really don't know how to solve this problem, and it would be a great plus to leave this in SQL-land instead of using the app layer to do the heavy lifting.
If you write the query like this:
UPDATE product_lang pl1
SET pl1.name = (SELECT pl2.`name`
FROM (SELECT `name`, `id_product`, `id_lang`
FROM `product_lang`
) `pl2`
WHERE pl1.`id_product` = pl2.`id_product` AND pl2.`id_lang` = 1
)
WHERE pl1.`id_lang` <> 1
Then you have a problem. The only index that can help is on product_lang(id_lang).
I would recommend writing this as a join:
UPDATE product_lang pl1 join
(select id_product, name
from product_lang
where id_lang = 1
) pl2
on pl1.id_lang <> 1 and pl2.id_product = pl1.id_product
SET pl1.name = pl2.name
WHERE pl1.id_lang <> 1
The indexes that you want for this query are product_lang(id_lang, id_product) and product_lang(id_product). However, this seems like a strange update, because it will set all the names to the name from language 1.
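One behavioural difference between the correlated-subquery form and the JOIN form is worth checking before running either on production: with the subquery, a product that has no id_lang = 1 row gets its name set to NULL, while the JOIN simply skips it. A minimal sketch of this, using Python's built-in sqlite3 module as a stand-in for MySQL (the sample rows are made up). SQLite lacks MySQL's UPDATE ... JOIN syntax, so the JOIN's "only matched rows" behaviour is emulated here with a WHERE EXISTS guard:

```python
import sqlite3

def make_db():
    # In-memory stand-in for the product_lang table (sample rows are hypothetical).
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE product_lang (id_product INT, id_lang INT, name TEXT,"
                " PRIMARY KEY (id_product, id_lang))")
    con.executemany("INSERT INTO product_lang VALUES (?, ?, ?)", [
        (1, 1, "Chair"), (1, 2, "Chaise"),
        (2, 1, "Table"), (2, 2, "Tableau"),
        (3, 2, "Lampe"),  # product 3 has no default-language (id_lang = 1) row
    ])
    return con

# Correlated subquery only: rows without a default-language match get NULL.
SUBQUERY_ONLY = """
UPDATE product_lang
SET name = (SELECT pl2.name FROM product_lang AS pl2
            WHERE pl2.id_product = product_lang.id_product AND pl2.id_lang = 1)
WHERE id_lang <> 1
"""

# Same update, restricted to rows that have a default-language match,
# which is what the UPDATE ... JOIN version does in MySQL.
GUARDED = SUBQUERY_ONLY + """
  AND EXISTS (SELECT 1 FROM product_lang AS pl2
              WHERE pl2.id_product = product_lang.id_product AND pl2.id_lang = 1)
"""

def names_after(sql):
    """Run one update on a fresh copy of the data and return the id_lang = 2 names."""
    con = make_db()
    con.execute(sql)
    return dict(con.execute(
        "SELECT id_product, name FROM product_lang WHERE id_lang = 2").fetchall())

print(names_after(SUBQUERY_ONLY))  # product 3's name becomes None
print(names_after(GUARDED))        # product 3 keeps "Lampe"
```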
UPDATE product_lang AS pl1
JOIN product_lang AS pl2
ON pl1.`id_product` =
pl2.`id_product`
SET pl1.name = pl2.name
WHERE pl2.`id_lang` = 1
AND pl1.`id_lang` != 1;
And have INDEX(id_lang, id_product).
Make sure that there is an index on the columns id_product and id_lang.
update product_lang pl1
,product_lang pl2
set pl1.name=pl2.name
where pl1.id_product = pl2.id_product AND pl2.id_lang = 1
and pl1.id_lang <> 1
The composite index that will be required is on (id_product, id_lang).
Related
I would like to optimize the following MySQL query. It picks data from 2 tables: the authorchecklist table holds the detailed records, while the selectedrevlist table is used for the reviewers' reviews.
I want to get those records from selectedrevlist where scoreSubmit = 1
SELECT *
FROM `authorchecklist` acl
WHERE acl.manuscriptStatus = 'Awaiting Reviewer Assignment'
AND acl.submitStatus = '1'
AND
( SELECT COUNT( 1 )
FROM selectedrevlist srl
WHERE srl.OrderNumber = acl.OrderNumber
AND srl.editorType = 'Editor'
AND srl.editorID = '10'
AND srl.scoreSubmit = '1'
) = 1
The above query works, but it takes approximately 20 seconds to load the records.
This is your query:
SELECT acl.*
FROM authorchecklist acl
WHERE acl.manuscriptStatus = 'Awaiting Reviewer Assignment' AND
acl.submitStatus = 1 AND
(SELECT COUNT(1)
FROM selectedrevlist srl
WHERE srl.OrderNumber = acl.OrderNumber AND
srl.editorType = 'Editor' AND
srl.editorID = 10 AND
srl.scoreSubmit = 1
) = 1 ;
For this query, you want indexes on authorchecklist(submitStatus, manuscriptStatus, OrderNumber) and selectedrevlist(OrderNumber, editorId, scoreSubmit).
I rearranged the query to make it easier to read:
SELECT *
FROM `authorchecklist` acl
WHERE acl.manuscriptStatus = 'Awaiting Reviewer Assignment'
AND acl.submitStatus = '1'
AND (
SELECT COUNT( 1 )
FROM selectedrevlist srl
WHERE srl.OrderNumber = acl.OrderNumber
AND srl.editorType = 'Editor'
AND srl.editorID = '10' AND srl.scoreSubmit = '1'
) = 1
I'm assuming there's only one selectedrevlist row for each authorchecklist row.
You didn't post the table definitions ("CREATE TABLE ..."), but probably at least one of these fields is not indexed:
authorchecklist.manuscriptStatus
selectedrevlist.OrderNumber
If they're not indexed, the SQL server will need to traverse all the records. It will traverse all the authorchecklist rows, and for each authorchecklist row, it will traverse all the selectedrevlist rows to find the ones where "srl.OrderNumber = acl.OrderNumber". Indexes may make inserts a bit slower, but they speed up reads if they're used appropriately.
If you're using MySQL, add "LIMIT 1" when you're sure that only one record will ever be fetched. Also, use the InnoDB engine and add foreign keys: InnoDB requires an index on the foreign key columns (it creates one if none exists), and the constraint validates the relations.
Have a look at these:
https://logicalread.com/optimize-mysql-indexes-mc12/#.WsDpLtP4_Eg
https://www.eversql.com/choosing-the-best-indexes-for-mysql-query-optimization/
https://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html
https://logicalread.com/mysql-foreign-keys-mc13/#.WsDrx9P4_Eg
Add these indexes:
ALTER TABLE
`authorchecklist`
ADD
INDEX `authorchecklist_idx_manuscriptstatu_submitstatus` (`manuscriptStatus`, `submitStatus`);
ALTER TABLE
`authorchecklist`
ADD
INDEX `authorchecklist_idx_ordernumber` (`OrderNumber`);
ALTER TABLE
`selectedrevlist`
ADD
INDEX `selectedrevlist_idx_editort_editori_scoresu_ordernu` (
`editorType`,
`editorID`,
`scoreSubmit`,
`OrderNumber`
);
Use EXISTS instead of a subquery that counts records.
An EXISTS subquery can stop as soon as it finds a matching row, instead of counting all rows that match the filters.
SELECT
*
FROM
`authorchecklist` acl
WHERE
acl.manuscriptStatus = 'Awaiting Reviewer Assignment'
AND acl.submitStatus = '1'
AND EXISTS (SELECT *
FROM
selectedrevlist srl
WHERE
srl.OrderNumber = acl.OrderNumber
AND srl.editorType = 'Editor'
AND srl.editorID = '10'
AND srl.scoreSubmit = '1')
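EXISTS and COUNT(...) = 1 are not strictly equivalent: COUNT(...) = 1 keeps only orders with exactly one matching review row, whereas EXISTS keeps orders with one or more. A small sketch using Python's sqlite3 module as a stand-in for MySQL, with trimmed-down, hypothetical versions of the two tables (the index name srl_order_editor is also made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE authorchecklist (OrderNumber INTEGER PRIMARY KEY, manuscriptStatus TEXT, submitStatus INT);
CREATE TABLE selectedrevlist (OrderNumber INT, editorType TEXT, editorID INT, scoreSubmit INT);
CREATE INDEX srl_order_editor ON selectedrevlist (OrderNumber, editorType, editorID, scoreSubmit);
INSERT INTO authorchecklist VALUES
  (100, 'Awaiting Reviewer Assignment', 1),
  (200, 'Awaiting Reviewer Assignment', 1),
  (300, 'Awaiting Reviewer Assignment', 1);
-- order 100: one matching review; order 200: two; order 300: none
INSERT INTO selectedrevlist VALUES
  (100, 'Editor', 10, 1),
  (200, 'Editor', 10, 1),
  (200, 'Editor', 10, 1);
""")

# Shared correlated filter on selectedrevlist.
MATCH = ("FROM selectedrevlist srl WHERE srl.OrderNumber = acl.OrderNumber "
         "AND srl.editorType = 'Editor' AND srl.editorID = 10 AND srl.scoreSubmit = 1")

def orders(condition):
    """Return the OrderNumbers that pass the outer filters plus `condition`."""
    sql = ("SELECT acl.OrderNumber FROM authorchecklist acl "
           "WHERE acl.manuscriptStatus = 'Awaiting Reviewer Assignment' "
           "AND acl.submitStatus = 1 AND " + condition + " ORDER BY acl.OrderNumber")
    return [row[0] for row in con.execute(sql)]

count_one = orders("(SELECT COUNT(*) " + MATCH + ") = 1")  # drops order 200
exists = orders("EXISTS (SELECT 1 " + MATCH + ")")          # keeps order 200
print(count_one, exists)
```

If duplicate review rows can't occur, both versions return the same rows and EXISTS has less work to do; otherwise, pick the one that matches the intended rule.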
I have this query:
EXPLAIN UPDATE requests R
JOIN profile AS P ON R.intern_id = P.intern_id OR R.intern_id_decoded = P.intern_id OR R.intern_id_full_decode = P.intern_id
SET R.found_id = P.id
WHERE R.id >= 28000001 AND R.id <= 28000001+2000000 AND R.found_id IS NULL
The EXPLAIN output:
id | select_type | table | partitions | type  | possible_keys                                       | key     | key_len | ref  | rows      | filtered | Extra
1  | UPDATE      | R     | NULL       | range | PRIMARY,intern_id_customer_id_batch_num,id_found_id | PRIMARY | 4       | NULL | 3616888   | 10.00    | Using where
1  | SIMPLE      | P     | NULL       | ALL   | intern_id_dt_snapshot,intern_id                     | NULL    | NULL    | NULL | 179586254 | 27.10    | Range checked for each record (index map: 0x6)
That query takes about 40 seconds to execute; it updates 5,000-10,000 rows out of a set of 2 million rows.
I am currently updating in "jobs" of 2 million rows to make the join perform faster.
The whole table currently has about 170 million records.
The intern_id fields are VARCHARs; found_id and id are INT.
The second line of the EXPLAIN output does not use an index, and I'm not sure if that's normal. Does the EXPLAIN output look like the query is performing efficiently?
I would do this logic using multiple joins:
UPDATE requests r LEFT JOIN
profile p1
ON r.intern_id = p1.intern_id LEFT JOIN
profile p2
ON r.intern_id_decoded = p2.intern_id AND p1.id IS NULL LEFT JOIN
profile p3
ON r.intern_id_full_decode = p3.intern_id AND p2.id IS NULL
SET r.found_id = COALESCE(p1.id, p2.id, p3.id)
WHERE r.id >= 28000001 AND r.id <= 28000001 + 2000000 AND
r.found_id is NULL;
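Databases are very bad at optimizing OR in JOIN conditions. With explicit, separate JOINs, each lookup is a plain equality that can use an index on profile(intern_id).
The p1.id IS NULL and p2.id IS NULL conditions in the ON clauses also ensure that only the first match is used.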
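A sketch of the "first match wins" logic, using Python's sqlite3 module as a stand-in for MySQL with hypothetical sample data. To stay portable across SQLite versions it avoids UPDATE ... FROM and expresses the three lookups as scalar subqueries inside COALESCE, which gives the same priority order as the three LEFT JOINs; profile.intern_id is assumed to be unique:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE profile (id INTEGER PRIMARY KEY, intern_id TEXT UNIQUE);
CREATE TABLE requests (id INTEGER PRIMARY KEY, intern_id TEXT, intern_id_decoded TEXT,
                       intern_id_full_decode TEXT, found_id INT);
INSERT INTO profile VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO requests (id, intern_id, intern_id_decoded, intern_id_full_decode) VALUES
  (10, 'a', 'b', 'c'),   -- all three match: the first column wins
  (11, 'x', 'b', 'c'),   -- the decoded column is the first match
  (12, 'x', 'y', 'c'),   -- only the fully decoded column matches
  (13, 'x', 'y', 'z');   -- no match: found_id stays NULL
""")

# Each lookup is a separate equality on profile.intern_id, so each one can use
# the UNIQUE index, unlike a single ON clause that ORs the three conditions.
con.execute("""
UPDATE requests
SET found_id = COALESCE(
      (SELECT p.id FROM profile p WHERE p.intern_id = requests.intern_id),
      (SELECT p.id FROM profile p WHERE p.intern_id = requests.intern_id_decoded),
      (SELECT p.id FROM profile p WHERE p.intern_id = requests.intern_id_full_decode))
WHERE id BETWEEN ? AND ? AND found_id IS NULL
""", (10, 13))

found = dict(con.execute("SELECT id, found_id FROM requests ORDER BY id"))
print(found)
```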
I would do 3 chunked-up UPDATEs -- one for each of the ON conditions.
10K rows to update is excessive; crank it down to perhaps 1K. That means cranking the chunking down to 200K. (The speed might even be faster.)
UPDATE ... ON P.intern_id = R.intern_id SET ... WHERE ...
UPDATE ... ON P.intern_id = R.intern_id_decoded SET ... WHERE ...
UPDATE ... ON P.intern_id = R.intern_id_full_decode SET ... WHERE ...
(The range is the same for each set of 3, thereby helping with caching of R.)
Possibly INDEX(found_id) would help, but this is not a given.
See here for more chunking suggestions, especially the tip on finding 1000 rows before starting the operation:
SELECT id FROM requests WHERE id > ... AND found_id IS NULL ORDER BY id LIMIT 1000,1;
Then use that id as the upper bound instead of the 2-millionth id. A goal here is to even out the number of rows updated per chunk.
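The boundary-finding idea can be sketched as a loop, again with Python's sqlite3 module standing in for MySQL. The chunk size, the sample data, the helper name next_boundary and the placeholder SET found_id = 0 are all illustrative assumptions. Each chunk ends at the id of the CHUNK-th still-unresolved row, so every UPDATE touches about the same number of rows no matter how the NULLs are spread out:

```python
import sqlite3

CHUNK = 3  # rows per UPDATE; the answer suggests ~1000, kept tiny here for illustration

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, found_id INT)")
# Hypothetical data: every third row is already resolved, the other 20 are NULL.
con.executemany("INSERT INTO requests VALUES (?, ?)",
                [(i, 1 if i % 3 == 0 else None) for i in range(1, 31)])

def next_boundary(start):
    """Id of the CHUNK-th unresolved row after `start`, or None if fewer remain."""
    row = con.execute(
        "SELECT id FROM requests WHERE id > ? AND found_id IS NULL "
        "ORDER BY id LIMIT 1 OFFSET ?", (start, CHUNK - 1)).fetchone()
    return row[0] if row else None

max_id = con.execute("SELECT MAX(id) FROM requests").fetchone()[0]
chunk_sizes = []
start = 0
while start < max_id:
    end = next_boundary(start)
    if end is None:
        end = max_id  # fewer than CHUNK rows left: finish the tail in one statement
    # Placeholder UPDATE: the real job would look up the profile id here.
    cur = con.execute(
        "UPDATE requests SET found_id = 0 WHERE id > ? AND id <= ? AND found_id IS NULL",
        (start, end))
    chunk_sizes.append(cur.rowcount)
    start = end

print(chunk_sizes)
```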
Let's assume I have the following tables:
items table
item_id|view_count
item_views table
view_id|item_id|ip_address|last_view
What I would like to do is:
If the last view of the item with the given item_id by the given ip_address was 1+ hour ago, I would like to increment the view_count of the item in the items table, and then get the item's view count as a result. This is how I would normally do it:
q = SELECT count(*) FROM item_views WHERE item_id='item_id' AND ip_address='some_ip' AND last_view < current_time-60*60
if(q==1) then q = UPDATE items SET view_count = view_count+1 WHERE item_id='item_id'
//and finally get view_count of item
q = SELECT view_count FROM items WHERE item_id='item_id'
Here I used 3 SQL queries. How can I merge it into one SQL query? And how can it affect the processing time? Will it be faster or slower than previous method?
I don't think your logic is correct for what you describe that you want. The query:
SELECT count(*)
FROM item_views
WHERE item_id='item_id' AND
ip_address='some_ip' AND
last_view < current_time-60*60
is counting the number of views longer ago than your time frame. I think you want:
last_view > current_time-60*60
and then have if q = 0 on the next line.
MySQL is pretty good with the performance of not exists, so the following should work well:
update items
set view_count = view_count+1
WHERE item_id='item_id' and
not exists (select 1
from item_views
where item_id='item_id' AND
ip_address='some_ip' AND
last_view > current_time-60*60
)
It will work much better with an index on item_views(item_id, ip_address, last_view) and an index on items(item_id).
In MySQL scripting, you could then write:
. . .
set view_count = (@q := view_count+1)
. . .
This would also give you the variable you are looking for.
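A sketch of the whole flow in one transaction, using Python's sqlite3 module as a stand-in for MySQL. It uses the NOT EXISTS form from this answer with last_view > now - 3600, then reads the count back with a plain SELECT, since SQLite has no MySQL-style @variables. Timestamps are plain integers, and logging each view as a new item_views row is an assumption; the question does not say how that table is maintained:

```python
import sqlite3

ONE_HOUR = 60 * 60

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE items (item_id INTEGER PRIMARY KEY, view_count INT NOT NULL DEFAULT 0);
CREATE TABLE item_views (view_id INTEGER PRIMARY KEY, item_id INT, ip_address TEXT, last_view INT);
CREATE INDEX item_views_lookup ON item_views (item_id, ip_address, last_view);
INSERT INTO items (item_id, view_count) VALUES (7, 41);
""")

def register_view(item_id, ip, now):
    """Count a view only if this ip has not viewed the item in the last hour; return view_count."""
    with con:  # one transaction: conditional UPDATE, log the view, read the count back
        con.execute("""
            UPDATE items SET view_count = view_count + 1
            WHERE item_id = ? AND NOT EXISTS (
                SELECT 1 FROM item_views
                WHERE item_id = ? AND ip_address = ? AND last_view > ?)""",
            (item_id, item_id, ip, now - ONE_HOUR))
        con.execute("INSERT INTO item_views (item_id, ip_address, last_view) VALUES (?, ?, ?)",
                    (item_id, ip, now))
        return con.execute("SELECT view_count FROM items WHERE item_id = ?",
                           (item_id,)).fetchone()[0]

first = register_view(7, "10.0.0.1", now=10_000)                        # counted
repeat = register_view(7, "10.0.0.1", now=10_000 + 600)                 # 10 minutes later: not counted
later = register_view(7, "10.0.0.1", now=10_000 + 600 + ONE_HOUR + 1)   # over an hour later: counted
print(first, repeat, later)
```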
update items target
inner join (
select item_id
from item_views
where item_id = 'item_id'
and ip_address = 'some_ip'
and last_view < current_time - 60*60
) ref
on ref.item_id = target.item_id
set target.view_count = target.view_count + 1;
You can only combine the UPDATE statement with the condition by using a join, as in the above example; you'll still need a separate SELECT statement to read the view count.
It may be slower on very large set and/or unindexed table.
The following query constantly times out. Is there a lower-overhead way to achieve the same thing?
UPDATE Invoices SET ispaid = 0
WHERE Invoice_number IN (SELECT invoice_number
FROM payment_allocation
WHERE transactionID=305)
What I'm doing is unallocating invoices from a transaction. There can be 30 or more records returned, but it stops the database dead every time I try to run it.
Use a JOIN instead of the subquery; it will improve performance.
Create an index on the Invoice_number column in both tables if you haven't already.
Try this:
UPDATE Invoices i
INNER JOIN payment_allocation pa ON i.Invoice_number = pa.invoice_number
SET i.ispaid = 0
WHERE pa.transactionID = 305;
I'd try EXISTS:
UPDATE Invoices a set ispaid=0
WHERE EXISTS
(
SELECT NULL FROM payment_allocation b
WHERE b.Invoice_number =a.Invoice_number AND b.transactionID=305
)
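A quick way to sanity-check the EXISTS version before running it on the real tables, using Python's sqlite3 module as a stand-in for MySQL; the sample rows and the index name pa_txn_invoice are hypothetical. With an index on (transactionID, invoice_number), each EXISTS probe is an index lookup rather than a scan of payment_allocation:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Invoices (Invoice_number INTEGER PRIMARY KEY, ispaid INT);
CREATE TABLE payment_allocation (transactionID INT, invoice_number INT);
-- Lets each EXISTS probe seek on (transactionID, invoice_number).
CREATE INDEX pa_txn_invoice ON payment_allocation (transactionID, invoice_number);
INSERT INTO Invoices VALUES (1, 1), (2, 1), (3, 1), (4, 1);
INSERT INTO payment_allocation VALUES (305, 1), (305, 3), (306, 2);
""")

cur = con.execute("""
UPDATE Invoices SET ispaid = 0
WHERE EXISTS (SELECT 1 FROM payment_allocation b
              WHERE b.invoice_number = Invoices.Invoice_number
                AND b.transactionID = 305)""")
unallocated = cur.rowcount  # invoices 1 and 3 belong to transaction 305
state = dict(con.execute("SELECT Invoice_number, ispaid FROM Invoices"))
print(unallocated, state)
```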
In MySQL 5.5, an IN (SELECT ...) subquery is rewritten as a correlated subquery that is re-executed for every row of the outer table, so it cannot be optimized well. This is probably why your query is so slow. Refactor your query to get rid of the inner SELECT statement.
UPDATE Invoices, payment_allocation
SET ispaid=0
WHERE payment_allocation.transactionID=305 AND
Invoices.Invoice_number = payment_allocation.invoice_number
An interesting side note: MariaDB (a fork of MySQL by its original creator) has implemented subquery optimizations.
UPDATE invoices i
JOIN payment_allocation pa
ON pa.invoice_number = i.invoice_number
SET i.ispaid=0
WHERE pa.transactionID = 305;
I have a table like this (MySQL 5.0.x, MyISAM):
response{id, title, status, ...} (status: 1 new, 3 multi)
I would like to update the status from new (status=1) to multi (status=3) of all the responses if at least 20 have the same title.
I have this one, but it does not work:
UPDATE response SET status = 3 WHERE status = 1 AND title IN (
SELECT title FROM (
SELECT DISTINCT(r.title) FROM response r WHERE EXISTS (
SELECT 1 FROM response spam WHERE spam.title = r.title LIMIT 20, 1)
)
as u)
Please note:
I do the nested select to avoid the famous "You can't specify target table 'response' for update in FROM clause" error.
I cannot use GROUP BY for performance reasons. The query cost with a solution using LIMIT is way better (but it is less readable).
EDIT:
It is possible to SELECT FROM an UPDATE target in MySQL. See solution here
The issue is with the data selected, which is totally wrong.
The only solution I found which works is with a GROUP BY:
UPDATE response SET status = 3
WHERE status = 1 AND title IN (SELECT title
FROM (SELECT title
FROM response
GROUP BY title
HAVING COUNT(1) >= 20)
as derived_response)
Thanks for your help! :)
MySQL doesn't like it when you try to UPDATE and SELECT from the same table in one query. It has to do with locking priorities, etc.
Here's how I would solve this problem:
SELECT CONCAT('UPDATE response SET status = 3 ',
'WHERE status = 1 AND title = ', QUOTE(title), ';') AS `sql`
FROM response
GROUP BY title
HAVING COUNT(*) >= 20;
This query produces a series of UPDATE statements, with the quoted titles that deserve to be updated embedded. Capture the result and run it as an SQL script.
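The generate-then-run approach can be sketched with Python's sqlite3 module as a stand-in for MySQL: SQLite's || operator and quote() function play the roles of MySQL's CONCAT() and QUOTE(). The threshold is lowered to 3 and the sample titles are made up so the data stays short; one title contains an apostrophe to show why the quoting matters:

```python
import sqlite3

THRESHOLD = 3  # the question uses 20; smaller here so the sample data stays short

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE response (id INTEGER PRIMARY KEY, title TEXT, status INT)")
rows = [("spam", 1)] * 3 + [("it's spam", 1)] * 3 + [("unique", 1)] * 2
con.executemany("INSERT INTO response (title, status) VALUES (?, ?)", rows)

# Step 1: build one UPDATE per qualifying title. quote() escapes the
# embedded apostrophe in "it's spam" so the generated SQL stays valid.
script = "\n".join(stmt for (stmt,) in con.execute(
    "SELECT 'UPDATE response SET status = 3 WHERE status = 1 AND title = ' "
    "|| quote(title) || ';' FROM response GROUP BY title HAVING COUNT(*) >= ?",
    (THRESHOLD,)))

# Step 2: run the generated statements as a script.
con.executescript(script)
statuses = dict(con.execute("SELECT title, MAX(status) FROM response GROUP BY title"))
print(script)
print(statuses)
```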
I understand that GROUP BY in MySQL often incurs a temporary table, and this can be costly. But is that a deal-breaker? How frequently do you need to run this query? Besides, any other solutions are likely to require a temporary table too.
I can think of one way to solve this problem without using GROUP BY:
CREATE TEMPORARY TABLE titlecount (c INTEGER, title VARCHAR(100) PRIMARY KEY);
INSERT INTO titlecount (c, title)
SELECT 1, title FROM response
ON DUPLICATE KEY UPDATE c = c+1;
UPDATE response JOIN titlecount USING (title)
SET response.status = 3
WHERE response.status = 1 AND titlecount.c >= 20;
But this also uses a temporary table, which is why you try to avoid using GROUP BY in the first place.
I would write something straightforward like the following:
UPDATE `response`, (
SELECT title, count(title) as count from `response`
WHERE status = 1
GROUP BY title
) AS tmp
SET response.status = 3
WHERE status = 1 AND response.title = tmp.title AND count >= 20;
Is using GROUP BY really that slow? The solution you tried to implement seems to query the same table again and again, so it should be much slower than GROUP BY, even if it worked.
This is a funny peculiarity with MySQL - I can't think of a way to do it in a single statement (GROUP BY or no GROUP BY).
You could select the appropriate response rows into a temporary table first then do the update by selecting from that temp table.
You'll have to use a temporary table:
create temporary table r_update (title varchar(255));
insert r_update
select title
from response
group
by title
having count(*) < 20;
update response r
left outer
join r_update ru
on ru.title = r.title
set r.status = case when ru.title is null then 3 else 1 end
where r.status = 1;
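A sketch of the temporary-table approach with Python's sqlite3 module standing in for MySQL (threshold lowered to 3, sample rows made up). The status = 1 filter matters: without it, the CASE expression would also overwrite rows that have other statuses:

```python
import sqlite3

THRESHOLD = 3  # 20 in the question

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE response (id INTEGER PRIMARY KEY, title TEXT, status INT)")
con.executemany("INSERT INTO response (title, status) VALUES (?, ?)",
                [("spam", 1)] * 3 + [("spam", 2)] + [("rare", 1)] * 2)

con.executescript(f"""
-- Titles that appear fewer than THRESHOLD times keep status 1.
CREATE TEMP TABLE r_update AS
  SELECT title FROM response GROUP BY title HAVING COUNT(*) < {THRESHOLD};
-- Only status = 1 rows are touched, so the status = 2 'spam' row is left alone.
UPDATE response
SET status = CASE WHEN title IN (SELECT title FROM r_update) THEN 1 ELSE 3 END
WHERE status = 1;
""")

result = con.execute("SELECT title, status, COUNT(*) FROM response "
                     "GROUP BY title, status ORDER BY title, status").fetchall()
print(result)
```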