High-level summary of the issue: placing orders causes lock contention on the inventory table, and the resulting lock-wait timeouts make orders fail.
Tracing through the checkout process, I see the following queries being executed (comments added by me):
-- Lock stock and product tables
SELECT `si`.*, `p`.`type_id` FROM `cataloginventory_stock_item` AS `si`
INNER JOIN `catalog_product_entity` AS `p` ON p.entity_id=si.product_id
WHERE (stock_id=1) AND (product_id IN(28775, 28777)) FOR UPDATE
-- Perform the actual stock update
UPDATE `cataloginventory_stock_item`
SET `qty` =
CASE product_id
WHEN 28775 THEN qty-2
WHEN 28777 THEN qty-1
ELSE
qty
END
WHERE (product_id IN (28775, 28777)) AND (stock_id = 1)
My understanding of the FOR UPDATE modifier on a SELECT statement is that every row returned by the SELECT, in every table involved, is locked (for both reads and writes?) until the transaction is committed.
From my understanding of MySQL, the fact that the cataloginventory_stock_item UPDATE computes the new qty value inside the query (i.e. the value isn't calculated in PHP and passed in; the new column value is derived from the existing value at the moment the query runs) means it is not susceptible to race conditions.
My questions are:
1) Are my assumptions correct?
2) Why does Magento need to lock catalog_product_entity in order to update the stock?
3) Why does Magento need to lock cataloginventory_stock_item if the cataloginventory_stock_item UPDATE is atomic?
1) Yes, your assumptions regarding FOR UPDATE are correct: the rows selected from both cataloginventory_stock_item and catalog_product_entity will be locked for reading and writing. That is, other queries touching these rows will block.
2) I don't know, and in fact it seems it doesn't need to. Perhaps this is to prevent race conditions when a user is manually updating stock status or similar, but I still don't see why it couldn't be removed. Another possibility is that the original author intended to support multiple stock items per product and thought the "parent" row should be locked.
3) Because the PHP code checks whether the item is "salable" using the loaded values before issuing the UPDATE. Without locking, two processes could load the same value and then race to update it. So even though the UPDATE itself is atomic, it does not fail properly if the data it was based on was stale when loaded.
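To make point 3 concrete, here is a small Python simulation (illustrative only, not Magento code) of the check-then-act race: two clients snapshot the quantity, both pass the "salable" check on stale data, and the relative update drives stock negative. Serializing the read-check-update sequence, which is what FOR UPDATE achieves, prevents it.

```python
# Deterministic simulation of the check-then-act race that FOR UPDATE prevents.
# The row dict and helper function are invented for illustration.

def place_order(row, snapshot_qty, amount):
    """Apply an order using a qty value read earlier (the PHP-side check)."""
    if snapshot_qty >= amount:          # "is salable" check on possibly stale data
        row["qty"] -= amount            # atomic-style relative update
        return True
    return False

# Without locking: both clients snapshot qty=1 before either updates.
row = {"qty": 1}
a_snap = row["qty"]                     # client A reads qty = 1
b_snap = row["qty"]                     # client B reads qty = 1 (interleaved)
results_unlocked = [place_order(row, a_snap, 1), place_order(row, b_snap, 1)]
# Both checks passed on stale snapshots -> qty driven to -1 (oversell).

# With FOR UPDATE semantics: each client re-reads under the lock, fully serialized.
row2 = {"qty": 1}
results_locked = []
for _client in ("client A", "client B"):
    snap = row2["qty"]                  # the read happens inside the lock
    results_locked.append(place_order(row2, snap, 1))

print(results_unlocked, row["qty"])     # [True, True] -1
print(results_locked, row2["qty"])      # [True, False] 0
```

The atomic relative update alone is not enough; it is the stale PHP-side check that creates the race, which is why the lock is needed.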
Related
I read about the optimistic locking scheme, where clients can read values, perform their computation, and, when a write needs to happen, updates are validated before being written to the database.
Let's say we employ a version mechanism for optimistic locks. Then (in the case of two clients) both will issue an update statement like:
UPDATE tableName SET field = val, version = oldVersion + 1
WHERE version = oldVersion AND id = x;
Now let's consider the following scenario with two clients:
Both clients read the values of field and version.
Both clients compute something on their end and generate a new value for field.
Now both clients send their query request to the database server.
As soon as it reaches the database:
One client's UPDATE query starts executing.
But in the meantime interleaving happens, and the other client's UPDATE starts executing.
Will this query interleaving cause data races in the table?
What I mean to say is, we can't claim that an optimistic lock works entirely on its own. I understand the case where row-level locking (or some other locking, like table-level locking) happens; then it's fine. But then optimistic locking doesn't work on its own: it also needs a pessimistic lock (row-level or table-level, which depends entirely on the underlying storage engine implementation).
What happens when there are no row- or table-level locks already, but we want to implement an optimistic locking strategy? With query interleaving, will it cause data races in the table (i.e. only field is updated, version is not, and then interleaving happens)? Does this depend entirely on the isolation level set for the query?
I'm a little bit confused by this scenario.
Also, what is the right use case where optimistic locking can really help and increase the overall performance of an application compared to pessimistic locking?
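One way to see why the version scheme is safe without any explicit pessimistic lock (beyond the row-level atomicity of a single UPDATE statement, which the storage engine guarantees) is a small Python sketch; the row layout and function names are invented for illustration:

```python
# Minimal sketch of version-based optimistic locking. The version comparison
# and the write are one atomic step, mimicking a single UPDATE statement.

def optimistic_update(row, expected_version, new_value):
    """Mimics: UPDATE t SET field=?, version=version+1
               WHERE id=? AND version=?   -- atomic as one statement"""
    if row["version"] != expected_version:
        return 0                          # affected rows = 0: someone else won
    row["field"] = new_value
    row["version"] += 1
    return 1                              # affected rows = 1: our write stuck

row = {"field": 10, "version": 1}

# Both clients read version=1, compute new values, then write in turn.
v_seen_by_a = row["version"]
v_seen_by_b = row["version"]
a_rows = optimistic_update(row, v_seen_by_a, 11)   # wins: version becomes 2
b_rows = optimistic_update(row, v_seen_by_b, 12)   # loses: version mismatch

print(a_rows, b_rows, row)   # 1 0 {'field': 11, 'version': 2}
```

Because field and version change together in one atomic statement, interleaving cannot leave field updated while version is not; the losing client simply sees zero affected rows and must re-read and retry.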
The scenario in pseudocode for the worst case: two clients update the same record.
Scenario 1 (your scenario: optimistic locking):
Final constraints are checked on the application side. Optimistic locking is used only for presentation purposes.
Client one orders a product of which there is only 1 in stock.
Client two orders the same product of which there is only 1 in stock.
Both clients get this presented on the screen.
Products table:
CREATE TABLE products (
product_id VARCHAR(200),
stock INT,
price DOUBLE(5,2)
) ENGINE=InnoDB;
Presentation code:
-- Presentation:
SELECT * FROM products WHERE product_id="product_a";
-- Presented to client
Order code:
-- Verification of record (executed in the same block of code within
-- an as short time interval as possible):
SELECT stock FROM products WHERE product_id="product_a";
IF(stock>0) THEN
-- Client clicks "order" (one click method=also payment);
START TRANSACTION;
-- Gets a record lock
SELECT * FROM products WHERE product_id="product_a" FOR UPDATE;
UPDATE products SET stock=stock-1 WHERE product_id="product_a";
INSERT INTO orders (customer_id,product_id,price)
VALUES (customer_1, "product_a",price);
COMMIT;
END IF;
The result of this scenario is that both orders can succeed: both clients see stock>0 in the verification SELECT, and then both execute the order placement. This is an unwanted situation in almost any scenario, so it would then have to be addressed in code by cancelling one of the orders, costing a few more transactions.
Scenario 2: Alternative to optimistic locking:
Final constraints are checked on the database side. Optimistic locking is used only for presentation purposes. Fewer database queries than in the previous optimistic locking scenario, and less chance of redos.
Client one orders a product of which there is only 1 in stock.
Client two orders the same product of which there is only 1 in stock.
Both clients get this presented on the screen.
Products table:
CREATE TABLE products (
product_id VARCHAR(200),
stock INT,
price DOUBLE(5,2),
CHECK (stock>=0) -- The constraint preventing overselling
) ENGINE=InnoDB;
(Note: MySQL enforces CHECK constraints only as of version 8.0.16; on older versions the clause is parsed but ignored, so the same guard would need a trigger or an UPDATE ... WHERE stock > 0 condition.)
Presentation code:
-- Presentation:
SELECT * FROM products WHERE product_id="product_a";
-- Presented to client
Order code:
-- Client clicks "order" (one click method=also payment);
START TRANSACTION;
-- Gets a record lock
SELECT * FROM products WHERE product_id="product_a" FOR UPDATE;
UPDATE products SET stock=stock-1 WHERE product_id="product_a";
INSERT INTO orders (customer_id,product_id,price)
VALUES (customer_1, "product_a",price);
COMMIT;
So now two customers are presented this product and click order at the same time. The system executes both orders simultaneously. The result: one order is placed; the other gets an exception, since the constraint fails to verify and the transaction is aborted. This abort (the exception) has to be handled in code but does not take any further queries or transactions.
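The database-side check can be sketched in Python (an illustrative simulation, assuming the engine actually enforces the CHECK constraint): the decrement and the constraint validation happen as one atomic step, so the losing client gets an exception instead of a silent oversell.

```python
# Sketch of scenario 2: the decrement and the constraint check are one atomic
# step, so the "loser" gets an exception rather than overselling.

class ConstraintError(Exception):
    pass

def order_product(row):
    """Mimics UPDATE products SET stock=stock-1 under CHECK (stock>=0)."""
    new_stock = row["stock"] - 1
    if new_stock < 0:                  # the CHECK constraint fires
        raise ConstraintError("stock would go negative; transaction aborted")
    row["stock"] = new_stock           # constraint satisfied; change commits

product = {"product_id": "product_a", "stock": 1}

outcomes = []
for customer in ("customer_1", "customer_2"):
    try:
        order_product(product)
        outcomes.append("placed")
    except ConstraintError:
        outcomes.append("aborted")     # handled in code, no extra queries

print(outcomes, product["stock"])      # ['placed', 'aborted'] 0
```

No compensating transaction is needed: the failed order never touched the data.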
It is unclear to me (from reading the MySQL docs) whether the following query, run on InnoDB tables on MySQL 5.1, takes a write lock on each of the rows it updates internally, one at a time (5000 in total), or locks all the rows in the batch. As the database is under really heavy load, this is very important.
UPDATE `records`
INNER JOIN (
SELECT id, name FROM related LIMIT 0, 5000
) AS `j` ON `j`.`id` = `records`.`id`
SET `name` = `j`.`name`
I'd expect it to be per row, but as I do not know a way to make sure of it, I decided to ask someone with deeper knowledge. If this is not the case and the DB would lock all the rows in the set, I'd be thankful for an explanation of why.
The UPDATE runs in a transaction; it's an atomic operation, which means that if one of the rows fails (because of a unique constraint, for example) none of the 5000 rows are updated. This is one of the ACID properties of a transactional database.
Because of this, the UPDATE holds a lock on all of the rows for the entire transaction. Otherwise another transaction could further update the value of a row based on its current value (say, UPDATE records SET value = value * 2). That statement would produce a different result depending on whether the first transaction commits or rolls back, so it must wait for the first transaction to complete all 5000 updates.
If you want to release the locks, just do the update in (smaller) batches.
P.S. autocommit controls whether each statement is issued in its own transaction, but does not affect the execution of a single query.
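The batching advice can be sketched as follows; the batch size and the id list are illustrative, and each chunk would be wrapped in its own BEGIN/COMMIT so the row locks are released between batches:

```python
# Sketch of splitting one big UPDATE into smaller committed batches so row
# locks are released at each COMMIT. The execute step is a stand-in.

def batches(ids, size):
    """Yield consecutive chunks of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

all_ids = list(range(1, 5001))        # the 5000 rows from the question
statements = []
for chunk in batches(all_ids, 1000):
    # Each iteration would be: BEGIN; UPDATE ... WHERE id IN (<chunk>); COMMIT;
    statements.append((chunk[0], chunk[-1], len(chunk)))

print(len(statements), statements[0], statements[-1])
# 5 (1, 1000, 1000) (4001, 5000, 1000)
```

The trade-off: a crash mid-way leaves some batches applied and others not, so batching only fits when partial progress is acceptable.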
I am using a MySQL database, and I must now make sure that a value in a column is 1, while it may be 1 already. So, consider the following two statements:
UPDATE category SET is_leaf=1 WHERE id=9
or
UPDATE category SET is_leaf=1 WHERE id=9 AND is_leaf=0
id is the primary key. The difference is whether to update when is_leaf is already 1. Which is more efficient?
I know it doesn't matter a lot, but I want to find out, to better understand MySQL.
UPDATE category SET is_leaf=1 WHERE id=9 AND is_leaf=0
is more efficient, because it then updates only the relevant rows; otherwise rows where is_leaf=1 already would also be updated.
Assume there are 1000 records in the table and it takes 1 s to update one record. Updating all the records would then take 1000 s; but if in this scenario there are only 150 records with is_leaf=0, the second statement will take only 150 s instead of 1000 s.
Edit :
Queries With Search Arguments (SARGs)
A WHERE clause helps you to restrict the number of rows returned by a query. However, the manner in which the WHERE condition is specified can impact the performance of the query. If the WHERE condition is written such that it uses a function that takes an indexed column as the input, then the index is ignored and the entire table is scanned. This results in performance degradation.
For example, the following results in a table scan because the column OrderDate is used in a function:
SELECT CustomerID, EmployeeID FROM Orders
WHERE DATEDIFF(m, OrderDate, GetDate())>3
If the function is rewritten as shown below, then the query seeks the required value using an index and this improves performance:
SELECT CustomerID, EmployeeID FROM Orders
WHERE OrderDate < DATEADD(m, -3, GetDate())
The filter criteria in the second query is said to use a Searchable Argument or SARG because the query optimizer can use an index seek operation during execution.
For more information about this, read about improving query performance, and also about speeding up searches and filters.
The query with the AND is_leaf=0 included can be more efficient sometimes.
It's not going to make much difference when locating the row based on the primary key. (The availability of an index on (id,is_leaf) might make a small difference.) But as soon as MySQL identifies that there is no row to be updated, it can take a shorter code path.
Absent that predicate, however, MySQL is going to have to locate the row, obtain a row lock (in the case of InnoDB), and fire any 'BEFORE UPDATE FOR EACH ROW' trigger. Then MySQL has to check if any column values are actually being changed (note that the execution of the trigger may be setting one or more columns to different values). If MySQL detects there is no change to the row, it can skip the setting of any 'ON UPDATE' timestamp column, fire off any 'AFTER UPDATE FOR EACH ROW' trigger, and set the affected rows count to zero. (In the context of a transaction, it's not clear if MySQL can release the row lock once it determines the row is not being changed, or whether a row lock will continue to be held until the commit or rollback.)
So, one big difference between the two statements is that MySQL will fire the 'FOR EACH ROW' triggers even if there are no actual changes to the row; but it won't fire any 'FOR EACH ROW' triggers for rows that are excluded by the WHERE clause.
In the simple case, absent any triggers, I don't expect there is any measurable difference in performance.
My personal preference is to include the extra predicate. This ensures that no (InnoDB) intent row locks will be requested or held, and no FOR EACH ROW triggers will be fired.
And apart from the row locking and trigger execution, the two statements aren't really exactly equivalent, at least not in the general case, where there is a possibility that is_leaf contains NULL or a value other than 0 or 1.
Given this statement:
UPDATE category SET is_leaf=1 WHERE id=9
For an equivalent statement that sets is_leaf to 1 whenever it is not already equal to 1, we would actually need to check for NULL and any value different than 1, such as:
UPDATE category SET is_leaf=1 WHERE id=9 AND NOT (is_leaf <=> 1)
Consider what happens when is_leaf is NULL or 2, for example, with this statement:
UPDATE category SET is_leaf=1 WHERE id=9 AND is_leaf=0
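The difference between the two predicates is easy to see by simulating SQL's three-valued logic in Python (eq models `=`, null_safe_eq models `<=>`; the value set is hypothetical):

```python
# Which rows each WHERE predicate would match, under SQL's three-valued logic
# (NULL compares as unknown with '='; '<=>' is NULL-safe equality).

def eq(a, b):
    """SQL '=': returns None (unknown) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b

def null_safe_eq(a, b):
    """SQL '<=>': NULL <=> NULL is TRUE, never unknown."""
    return a == b

values = [0, 1, 2, None]   # hypothetical is_leaf values for rows with id=9

matches_eq_zero = [v for v in values if eq(v, 0) is True]
matches_not_nse_one = [v for v in values if not null_safe_eq(v, 1)]

print(matches_eq_zero)       # [0]            -- misses 2 and NULL
print(matches_not_nse_one)   # [0, 2, None]   -- every row not already 1
```

So `AND is_leaf=0` skips rows where is_leaf is NULL or 2, while `AND NOT (is_leaf <=> 1)` updates every row that is not already 1, matching the behavior of the unconditional statement.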
I need a little help with SELECT FOR UPDATE (resp. LOCK IN SHARE MODE).
I have a table with around 400 000 records and I need to run two different processing functions on each row.
The table structure is approximately this:
data (
`id`,
`mtime`, -- When was data1 set last
`data1`,
`data2` DEFAULT NULL,
`priority1`,
`priority2`,
PRIMARY KEY `id`,
INDEX (`mtime`),
FOREIGN KEY ON `data2`
)
Functions are a little different:
first function - has to run in a loop on all records (it is pretty fast), should select records based on priority1; sets data1 and mtime
second function - has to run only once on each record (it is pretty slow), should select records based on priority2; sets data2 and mtime
They shouldn't modify the same row at the same time, but the selects may return the same row in both (priority1 and priority2 have different values), and it's okay for a transaction to wait if that's the case (I'd expect that to be the only case when it blocks).
I'm selecting data based on following queries:
-- For the first function - not processed first, then the oldest,
-- the same age goes based on priority
SELECT id FROM data ORDER BY mtime IS NULL DESC, mtime, priority1 LIMIT 250 FOR UPDATE;
-- For the second function - only not-yet-processed records, ordered by priority
SELECT id FROM data WHERE data2 IS NULL ORDER BY priority2 LIMIT 50 FOR UPDATE;
But what I am experiencing is that only one query returns at a time.
So my questions are:
Is it possible to acquire two separate locks in two separate transactions on separate bunch of rows (in the same table)?
Do I have that many collisions between the first and second query? (I have trouble debugging this; any hint on how to debug SELECT ... FROM (SELECT ...) WHERE ... IN (SELECT) would be appreciated.)
Can ORDER BY ... LIMIT ... cause any issues?
Can indexes and keys cause any issues?
Key things to check for before getting much further:
Ensure the table engine is InnoDB; otherwise "for update" isn't going to lock the rows, as there will be no transactions.
Make sure you're using the "for update" feature correctly. If you select something for update, it's locked to that transaction. While other transactions may be able to read the row, it can't be selected for update, updated or deleted by any other transaction until the lock is released by the original locking transaction.
To keep things clean, try explicitly starting a transaction using "START TRANSACTION", run your select "for update", do whatever you're going to do to the records that are returned, and finish up by explicitly executing a "COMMIT" to close out the transaction.
Order and limit will have no impact on the issue you're experiencing as far as I can tell, whatever was going to be returned by the Select will be the rows that get locked.
To answer your questions:
Is it possible to acquire two separate locks in two separate transactions on separate bunch of rows (in the same table)?
Yes, but not on the same rows. A given row lock can be held by only one transaction at a time.
Do I have that many collisions between the first and second query? (I have trouble debugging this; any hint on how to debug SELECT ... FROM (SELECT ...) WHERE ... IN (SELECT) would be appreciated.)
There could be a short period while the row locks are being acquired, which will delay the second query; however, unless you're running many hundreds of these SELECT ... FOR UPDATE statements at once, it shouldn't cause any significant or noticeable delays.
Can ORDER BY ... LIMIT ... cause any issues?
Not in my experience. They should work just as they always would on a normal select statement.
Can indexes and keys cause any issues?
Indexes should exist as always to ensure sufficient performance, but they shouldn't cause any issues with obtaining a lock.
All the points in the accepted answer seem fine except these two:
"whatever was going to be returned by the Select will be the rows that get locked." and
"Can indexes and keys cause any issues? ... they shouldn't cause any issues with obtaining a lock."
Instead, all the rows the database reads internally while deciding which rows to select and return will be locked. For example, the query below will lock all rows of the table but might select and return only a few:
select * from table where non_primary_non_indexed_column = ? for update
Since there is no index, the database has to read the entire table to search for the desired rows, and hence it locks every row (effectively locking the entire table).
If you want to lock only one row, you need to specify its primary key or an indexed column in the WHERE clause. Thus indexing becomes very important when locking only the appropriate rows.
This is a good reference - https://dev.mysql.com/doc/refman/5.7/en/innodb-locking-reads.html
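A toy Python model of the point above: without a usable index the engine examines (and locks) every row it scans, not just the rows it returns. The table, the index dict, and the function are all invented for illustration:

```python
# Toy model of "scanned rows get locked, not just returned rows": without an
# index the engine touches every row while evaluating the WHERE clause.

rows = [
    {"id": 1, "status": "new"},
    {"id": 2, "status": "done"},
    {"id": 3, "status": "new"},
    {"id": 4, "status": "done"},
]
index_on_status = {"new": [1, 3], "done": [2, 4]}   # hypothetical secondary index

def select_for_update(rows, status, index=None):
    """Return (matching ids, ids that had to be locked during the scan)."""
    if index is not None:
        candidates = index[status]                  # index seek: touch only matches
    else:
        candidates = [r["id"] for r in rows]        # full scan: touch every row
    locked = list(candidates)
    matched = [r["id"] for r in rows
               if r["id"] in candidates and r["status"] == status]
    return matched, locked

no_index = select_for_update(rows, "new")
with_index = select_for_update(rows, "new", index_on_status)

print(no_index)     # ([1, 3], [1, 2, 3, 4]) -- all four rows locked
print(with_index)   # ([1, 3], [1, 3])       -- only the matches locked
```

Same result set either way; the difference is how many rows other transactions are blocked on.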
I was wondering how to ensure that multiple queries are all executed, or how to roll back to the original state if one of the queries fails.
For example:
$qry1 = "INSERT INTO table1 (column1, column2) VALUES (a, b)";
$qry2 = "UPDATE table2 SET column3 = column3 - 1";
Similarly, there are about 4 queries to be executed, in a scenario like:
inserting item sales into the items table.
updating the stock of all those items in the stock balance table.
inserting the journal entries into the journal tables.
and so on.
Basically either all the queries should run or none of them runs.
Transactions are the way to go. http://www.devshed.com/c/a/MySQL/Using-Transactions-In-MySQL-Part-1/
It's called a transaction.
What you're looking for are MySQL transactions. More info in their manual.
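A minimal sketch of the all-or-nothing pattern, using Python's built-in sqlite3 for brevity (the same commit/rollback logic applies to MySQL through any client library; the table names are illustrative):

```python
# All-or-nothing: if any statement in the transaction fails, roll back so
# none of the earlier statements take effect.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE stock (item_id INTEGER, qty INTEGER CHECK (qty >= 0))")
conn.execute("INSERT INTO stock VALUES (1, 0)")   # no stock available
conn.commit()

try:
    # The sqlite3 module opens a transaction implicitly before the INSERT.
    conn.execute("INSERT INTO items VALUES (1, 'item sale')")         # succeeds
    conn.execute("UPDATE stock SET qty = qty - 1 WHERE item_id = 1")  # violates CHECK
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()            # undoes the INSERT as well: nothing is applied

item_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
qty = conn.execute("SELECT qty FROM stock WHERE item_id = 1").fetchone()[0]
print(item_count, qty)         # 0 0  -- neither change survived
```

Note that with MySQL the tables must use a transactional engine such as InnoDB; MyISAM tables ignore transactions entirely.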