I am using InnoDB tables in MySQL. I want to update several rows in a table, with each row getting a different value, e.g.:
UPDATE tbl_1 SET
col1=3 WHERE id=25,
col1=5 WHERE id=26
In Postgres I believe this is possible:
UPDATE tbl_1 SET col1 = t.col1 FROM (VALUES
(25, 3),
(26, 5)
) AS t(id, col1)
WHERE tbl_1.id = t.id;
How do you do this efficiently and effectively in a transaction?
Issues I hit so far:
using an intermediate temporary MEMORY table turns out not to be transaction-safe
using a TEMPORARY table - presumably of the MEMORY type again - is virtually undocumented, and I can find no real explanation of how it works or how well it suits my case, for example any discussion of whether the table is truncated after each transaction on the session
using an InnoDB table as a temporary table - filling it, joining to update, then truncating it inside the transaction - seems a very expensive thing to do; I've been fighting MySQL's poor throughput enough as it is
You can update with a CASE, setting the value for col1 depending on id:
UPDATE tbl_1 SET col1=CASE id WHEN 25 THEN 3 WHEN 26 THEN 5 END WHERE id IN (25,26)
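Since the question asks for a transaction, here is a minimal sketch wrapping the same statement (a single UPDATE is atomic by itself; the explicit transaction only matters if you batch several statements):
START TRANSACTION;
UPDATE tbl_1
SET col1 = CASE id
    WHEN 25 THEN 3
    WHEN 26 THEN 5
END
WHERE id IN (25, 26);
COMMIT;
On MySQL 8.0.19 or later, I believe you can also mirror the Postgres form with a VALUES table constructor in a multi-table UPDATE (an untested sketch, not benchmarked against the CASE form):
UPDATE tbl_1
JOIN (VALUES ROW(25, 3), ROW(26, 5)) AS t(id, col1) ON tbl_1.id = t.id
SET tbl_1.col1 = t.col1;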
What does the bold text refer to? The "SELECT part acts like READ COMMITTED" part I already understand from this SQL:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION; -- snapshot 1 for this transaction is created
SELECT * FROM t1; -- result is 1 row, snapshot 1 is used
-- another transaction (different session) inserts and commits new row into t1 table
SELECT * FROM t1; -- result is still 1 row, because it's REPEATABLE READ, still using snapshot 1
INSERT INTO t2 SELECT * FROM t1; -- this SELECT creates new snapshot 2
SELECT * FROM t2; -- result is 2 rows
SELECT * FROM t1; -- result is still 1 row, using snapshot 1
Here: https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html
The type of read varies for selects in clauses like INSERT INTO ... SELECT, UPDATE ... (SELECT), and CREATE TABLE ... SELECT that do not specify FOR UPDATE or FOR SHARE:
By default, InnoDB uses stronger locks for those statements and the SELECT part acts like READ COMMITTED, where each consistent read, even within the same transaction, sets and reads its own fresh snapshot.
I do not understand THIS: what does a stronger lock mean?
InnoDB uses stronger locks for those statements
This question helped me, but I still don't understand that part of the sentence.
Prevent INSERT INTO ... SELECT statement from creating its own fresh snapshot
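Here is how those "stronger locks" could be probed, for reference. A sketch against the t1/t2 tables above; whether you actually see row locks depends on settings such as binlog_format:
START TRANSACTION;
INSERT INTO t2 SELECT * FROM t1;
-- inspect the locks the SELECT part took on t1:
SELECT lock_type, lock_mode FROM performance_schema.data_locks WHERE object_name = 't1';
ROLLBACK;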
I've encountered an undocumented behavior of "SET @my_var = (SELECT ..)" inside a transaction:
The first is that it locks rows (depending on whether the lookup uses a unique index or not).
Example -
START TRANSACTION;
SET @my_var = (SELECT id FROM table_name WHERE id = 1);
SELECT trx_rows_locked FROM information_schema.innodb_trx;
ROLLBACK;
The output is 1 row locked, which is strange; a plain read shouldn't acquire a lock.
Also, the equivalent statement SELECT id INTO @my_var won't produce a lock.
It can lead to a deadlock in case of an UPDATE after the SET statement (for 2 concurrent requests).
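For contrast, the INTO form under the same probe shows no locked rows:
START TRANSACTION;
SELECT id INTO @my_var FROM table_name WHERE id = 1;
SELECT trx_rows_locked FROM information_schema.innodb_trx; -- 0 rows locked
ROLLBACK;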
In REPEATABLE READ -
The SELECT inside the SET statement gets a new snapshot of the data, instead of using the transaction's original snapshot.
SESSION 1:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT data FROM my_table WHERE id = 2; # Output : 2
SESSION 2:
UPDATE my_table SET data = 3 WHERE id = 2;
SESSION 1:
SET @data = (SELECT data FROM my_table WHERE id = 2);
SELECT @data; # Output : 3, instead of 2
ROLLBACK;
However, I would expect @data to contain the original value from the first snapshot (2).
If I use SELECT data INTO @data FROM my_table WHERE id = 2, then I do get the expected value, 2.
Do you have any idea what causes the different behavior of SET @var = (SELECT ..) compared to SELECT data INTO @var FROM ..?
Thanks.
Correct: when you SELECT in a context where you're copying the results into a variable or a table, it implicitly works as if you had used a locking read, SELECT ... FOR SHARE.
This means it places a shared lock on the rows examined, and it also means that the statement reads only the most recently committed version of rows, as if your transaction were in READ-COMMITTED isolation level.
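In other words, with respect to locking and visibility the question's SET statement behaves roughly like an explicit locking read (a sketch, reusing the question's table):
SET @my_var = (SELECT id FROM table_name WHERE id = 1);
-- reads and locks like:
SELECT id FROM table_name WHERE id = 1 FOR SHARE;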
I'm not sure why SELECT ... INTO @var does not do the same kind of implicit locking in MySQL 8.0. My memory is that in older versions of MySQL it did do locking in that query form. I've searched the manual for an explanation but I can't find one yet.
Other cases that implicitly lock the rows examined by the SELECT, and therefore read data as if your transaction were READ-COMMITTED:
INSERT INTO <table> SELECT ...
Multi-table UPDATE or DELETE: even if you don't update or delete a given table, the rows joined from it become locked (see the sketch after this list).
SELECT inside a trigger
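A sketch of the multi-table case (the tables a and b and their columns are hypothetical): even though only rows of a change, the matched rows of b are locked too:
START TRANSACTION;
UPDATE a JOIN b ON a.b_id = b.id SET a.flag = 1 WHERE b.kind = 'x';
SELECT trx_rows_locked FROM information_schema.innodb_trx; -- counts rows locked in both a and b
ROLLBACK;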
Bulk update like this:
update table_name set price='x1' where sku='x1'; #updated because sku=x1 exists
update table_name set price='x2' where sku='x2'; #ignored because sku=x2 does not exist
update table_name set price='x3' where sku='x3'; #ignored because sku=x3 does not exist
...about 10000 lines...
Some lines do not update anything because the sku does not exist. I want to know: would these no-op updates make MySQL slow, or do they cost nothing?
If you have an index on sku, then the updates should have reasonable performance, and running them in a single transaction keeps the overhead acceptable.
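If the index is missing, each statement scans the whole table; a sketch of adding one (the index name is my choice):
ALTER TABLE table_name ADD INDEX idx_sku (sku);
And running the ~10000 statements in one transaction instead of 10000 autocommits:
START TRANSACTION;
update table_name set price='x1' where sku='x1';
...
COMMIT;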
However, you would be better off putting the new data into a table and using a join for the update.
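A sketch of that approach; the new_prices staging table and its column types are my invention:
CREATE TEMPORARY TABLE new_prices (sku VARCHAR(64) PRIMARY KEY, price VARCHAR(64));
-- load the ~10000 (sku, price) pairs into new_prices, then:
UPDATE table_name t
JOIN new_prices n ON n.sku = t.sku
SET t.price = n.price;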
This is from the MySQL docs (link provided below):
Note
The snapshot of the database state applies to SELECT statements within a transaction, not necessarily to DML statements. If you insert or modify some rows and then commit that transaction, a DELETE or UPDATE statement issued from another concurrent REPEATABLE READ transaction could affect those just-committed rows, even though the session could not query them. If a transaction does update or delete rows committed by a different transaction, those changes do become visible to the current transaction. For example, you might encounter a situation like the following:
SELECT COUNT(c1) FROM t1 WHERE c1 = 'xyz';
-- Returns 0: no rows match.
DELETE FROM t1 WHERE c1 = 'xyz';
-- Deletes several rows recently committed by other transaction.
SELECT COUNT(c2) FROM t1 WHERE c2 = 'abc';
-- Returns 0: no rows match.
UPDATE t1 SET c2 = 'cba' WHERE c2 = 'abc';
-- Affects 10 rows: another txn just committed 10 rows with 'abc' values.
SELECT COUNT(c2) FROM t1 WHERE c2 = 'cba';
-- Returns 10: this txn can now see the rows it just updated.
Link to docs: https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html
Could someone answer this question authoritatively: in the example above we can see that a SELECT after an UPDATE is able to see changes committed by a different concurrent transaction. It looks like the UPDATE statement refreshes the snapshot used by the SELECT, right? Does it refresh the whole snapshot for subsequent SELECT statements, or just the snapshot of the t1 table?
I am using mySQL from their C API, but that shouldn't be relevant.
My code must process records from a table that match some criteria, and then update those records to flag them as processed. The lines in the table are modified/inserted/deleted by another process I don't control. I am afraid that in the following, the UPDATE might flag some records erroneously, since the set of matching records may have changed between step 1 and step 3.
SELECT * FROM myTable WHERE <CONDITION>; # step 1
<iterate over the selected set of lines. This may take some time.> # step 2
UPDATE myTable SET processed=1 WHERE <CONDITION> # step 3
What's the smart way to ensure that the UPDATE updates all the lines processed, and only them? A transaction doesn't seem to fit the bill, as it doesn't provide isolation of that sort: a recently modified record that was not in the originally selected set might still be targeted by the UPDATE statement. For the same reason, SELECT ... FOR UPDATE doesn't seem to help, though it sounds promising :-)
The only way I can see is to use a temporary table to memorize the set of rows to be processed, doing something like:
CREATE TEMPORARY TABLE workOrder (jobId INT(11));
INSERT INTO workOrder SELECT myID as jobId FROM myTable WHERE <CONDITION>;
SELECT * FROM myTable WHERE myID IN (SELECT jobId FROM workOrder);
<iterate over the selected set of lines. This may take some time.>
UPDATE myTable SET processed=1 WHERE myID IN (SELECT jobId FROM workOrder);
DROP TABLE workOrder;
But this seems wasteful and not very efficient.
Is there anything smarter?
Many thanks from a SQL newbie.
There are several options:
You could lock the table
You could add an AND foo_id IN (all_the_ids_you_processed) as the update condition (see the sketch after this list).
You could update before selecting, and then select only the updated rows (e.g. by processing date).
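A sketch of the second option, reusing the question's placeholders:
SELECT myID FROM myTable WHERE <CONDITION>; # step 1: remember the ids you fetched
<iterate over the selected set of lines> # step 2
UPDATE myTable SET processed=1 WHERE <CONDITION> AND myID IN (<the ids remembered in step 1>); # step 3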
I eventually solved this issue by using a column in that table that flags lines according to their status. This column lets me implement a simple state machine. Conceptually, I have two possible values for this status:
kNoProcessingPlanned = 0; #default "idle" value
kProcessingUnderWay = 1;
Now my algorithm does something like this:
UPDATE myTable SET status=kProcessingUnderWay WHERE <CONDITION>; # step 0
SELECT * FROM myTable WHERE status=kProcessingUnderWay; # step 1
<iterate over the selected set of lines. This may take some time.> # step 2
UPDATE myTable SET processed=1, status=kNoProcessingPlanned WHERE status=kProcessingUnderWay # step 3
This idea of having rows in several states can be extended to as many states as needed.
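For instance, a hypothetical third state for jobs that fail during step 2 (the constant and the queries below are illustrative, not part of the original solution):
kProcessingFailed = 2;
UPDATE myTable SET status=kProcessingFailed WHERE myID=<failed_id>; # mark a failure during step 2
UPDATE myTable SET status=kProcessingUnderWay WHERE status=kProcessingFailed; # re-queue failed jobs later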