I wonder: if I have an SQL transaction where one statement does a SELECT query and a later statement does an UPDATE query, is it guaranteed that nothing has been changed by outside factors in between the two?
So I select a number of rows ⟵ in the transaction
Another procedure changes one of the rows ⟵ outside the transaction
Then I want to do the update ⟵ in the transaction
So can this happen? I know the total transaction either happens or it doesn't, but are all individual statements in one transaction also executed as one atomic unit, where nothing can happen in between two different statements? Or is setting a manual table lock the only way to ensure nothing changes between the two statements?
This problem is about the following, by the way: I transfer money from one user (the buyer) to another (the seller). However, the buyer already deposited money when placing the buy order, and he may cancel this buy order at any moment, in which case I give him back the deposited money. So it can happen that I'm in the middle of transferring the deposited money from the buyer to the seller while the buyer cancels his order and I give him back his money. The money is then given to the buyer AND to the seller. This requires a fairly high isolation level, right?
It depends on the transaction isolation level.
The default isolation level on all of the databases I have ever used is Read Committed, and this level allows a transaction to see changes made by other committed transactions.
In contrast, the Serializable or Snapshot isolation levels isolate the current transaction from others, but they do not scale as well as Read Committed.
You can change the isolation level per transaction or globally on all modern databases, but I would not suggest doing it without a very good reason. Read Committed is a good isolation level for typical use cases because it needs no locking for reads; Serializable isolation uses heavy locking to make transactions run serially instead of concurrently, and it might not scale for typical use cases.
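For reference, here is how the isolation level can be changed in MySQL, either for just the next transaction or for the whole session (standard MySQL syntax, shown purely as illustration):

-- applies only to the next transaction started in this session
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
-- statements that need full isolation go here
COMMIT;

-- or change the default for the whole session
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;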
Related
MySQL uses several kinds of locks. Among them, SELECT ... FROM is a consistent read: it reads a snapshot of the data and sets no locks (unless the transaction isolation level is SERIALIZABLE) (https://dev.mysql.com/doc/refman/5.7/en/innodb-locks-set.html)
The snapshot (MVCC) is implemented in MySQL by adding a header to each tuple (containing a transaction version and a pointer) together with logical visibility rules
But discussions always emphasize the design of the visibility rules while ignoring the fact that reading and writing the tuple header are two mutually exclusive actions, a conflict that can only be avoided by locking.
So how should the claim that a consistent read takes no locks be understood? Is it only lock-free in a broad sense? How is atomic reading and writing of the tuple header designed? Is there a performance overhead? Is any information available on this?
----- supplementary note -----
When a row (tuple) is updated, the new version of the row is kept along with the old copy or copies. Each copy has a sequence number (transaction ID) with it.
The transaction ID and the pointer to the copies are stored in the row header; that is, when creating a copy, you need to modify the row header (updating the transaction ID and the pointer to the copies). When accessing the row, you need to read the row header first to decide which version (location) to access.
Modifying the row header and reading the row header should be two mutually exclusive actions (otherwise dirty data would be read under concurrent reads and writes). What I want to know is how MySQL designs the read/write logic for this part (the row header). Is it a read-write lock, a spin lock, or some other clever method?
I think the answer goes something like this.
When a row is updated, the new version of the row is kept, along with the old copy/copies. Each copy has a sequence number (transaction id) with it. After both transactions COMMIT or ROLLBACK, the set of rows is cleaned up -- the surviving one is kept; the other(s) are tossed.
That sequence number has the effect of labeling rows as being part of the snapshot of the dataset that was taken at BEGIN time.
Rows with sequence numbers that are equal to or older than the transaction in question are considered as fair game for reading by the transaction. Note that no lock is needed for such a "consistent read".
Each copy of the row has its own tuple header, with a different sequence number. The copies are chained together in a "history list". Updates/deletes to the row will add new items to the list, but leave the old copies unchanged. (Again, this points out that no lock is needed for an old read.)
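To make the consistent-read behaviour concrete, here is a two-session sketch, using a hypothetical table t holding a row (id=1, val=10) and assuming InnoDB's default REPEATABLE READ level; A and B are two separate sessions:

BEGIN;                                -- A
SELECT val FROM t WHERE id = 1;       -- A: sees 10; no lock taken
UPDATE t SET val = 20 WHERE id = 1;   -- B: autocommitted in its own session
SELECT val FROM t WHERE id = 1;       -- A: still sees 10, served from the history list
COMMIT;                               -- A
SELECT val FROM t WHERE id = 1;       -- A: a new snapshot; now sees 20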
The "read uncommitted" (dirty read) isolation level allows the transaction to go through the history list to 'see' the latest copy.
Performance overhead? Sure. Everything has a performance overhead. But... The "performance" that matters is the grand total of all actions. There is a lot of complexity, but the end result is 'better' performance.
The history list is a lot of overhead, but it helps by decreasing the locking, etc.
InnoDB uses "optimistic" locking -- that is, it starts a transaction with the assumption (hope) that it will COMMIT. The cost is that ROLLBACK is less efficient. This seems like a reasonable tradeoff.
InnoDB has a lot of overhead, yet it can beat the much-simpler MyISAM Engine in many benchmarks. Faster and ACID -- May as well get rid of MyISAM. And that is the direction Oracle is taking MySQL.
I’m really new to relational databases. I’m working on a project that involves finances and so I want any actions that affect balance not to occur at the same time and I want to achieve that using locks, however I’m not sure how to use them. Vision I have now:
I want to have a separate table for each action and a balance field in the users table, whose value would be derived from all the relevant tables. That being said, I'm never actually going to update existing records, only add new ones. I want to make sure only one record per user is being inserted at a time into these tables. For instance: 3 transactions occur at the same time, so 3 records are about to be added to the relevant tables. Two of the records have the same userid, a foreign key to my users table, and the other one has a different one. I want my records with the same foreign keys to be pipelined, and the other one can be done whenever. How do I achieve this? Are there any better ways to approach this?
I want any actions that affect balance not to occur at the same time
Why?
I want to achieve that using locks
Why?
To give you a counter example: let's say you want to avoid having negative account balances. When a user withdraws $500, how can you model that without locks?
-- deduct $500, but only if the funds are there
UPDATE accounts
SET balance = balance - 500
WHERE accountholderid = 42
AND balance >= 500;
This works without any explicit locks and is safe for concurrent access. You will have to check the update count; if it is 0, you would have overdrawn the account.
(I'm aware MySQL will still acquire a row lock)
It still makes sense to have a ledger but even there the need for locks is not obvious to me.
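As a sketch of how the update count and a ledger could fit together (the ledger table and its columns are invented for illustration; ROW_COUNT() is MySQL's affected-rows function):

BEGIN;
UPDATE accounts
SET balance = balance - 500
WHERE accountholderid = 42
AND balance >= 500;
-- the application checks the affected-row count here;
-- 0 means insufficient funds: ROLLBACK and report the overdraft
SELECT ROW_COUNT();
INSERT INTO ledger (accountholderid, amount, description)
VALUES (42, -500, 'withdrawal');
COMMIT;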
Use ENGINE=InnoDB for all your tables.
Use transactions:
BEGIN;
do all the work for a single action
COMMIT;
The classic example of a single action is to remove money from one account and add it to another account. The removing would include a check for overdraft, in which case you would have code to ROLLBACK instead of COMMIT.
The locks you get assure that everything for the single action is either completely done, or nothing at all is done. This even applies if the system crashes between the BEGIN and COMMIT.
Without BEGIN and COMMIT, but with autocommit=ON, each statement is implicitly surrounded by BEGIN and COMMIT. That is, the UPDATE example in a previous answer is 'atomic'. However, if the money deducted from one account needs to be added to another account, what happens if a crash occurs just after the UPDATE? The money vanishes. So, you really need
BEGIN;
if not enough funds, ROLLBACK and exit
UPDATE to take money from one account
UPDATE to add that money to another account
INSERT into some log or audit trail to track all transactions
COMMIT;
Check after each step -- ROLLBACK and take evasive action on any unexpected error.
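Filled in with invented table and column names, that outline might look like this; the overdraft check is folded into the first UPDATE, as in the earlier answer, so a zero update count means ROLLBACK:

BEGIN;
UPDATE accounts SET balance = balance - 100
WHERE id = 1 AND balance >= 100;    -- take money, refusing an overdraft
-- if the affected-row count is 0: not enough funds, ROLLBACK and exit
UPDATE accounts SET balance = balance + 100
WHERE id = 2;                       -- add that money to the other account
INSERT INTO transfer_log (from_id, to_id, amount) VALUES (1, 2, 100);
COMMIT;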
What happens if 2 (or more) actions happen at the "same time"?
Either one waits for the other, or there is a deadlock and a ROLLBACK is forced on one of them.
But, in no case, will the data be messed up.
A further note... In some cases you need FOR UPDATE:
BEGIN;
SELECT some stuff from a row FOR UPDATE;
test the stuff, such as account balance
UPDATE that same row;
COMMIT;
The FOR UPDATE says to other threads "Keep your hands off this row, I'm likely to change it; please wait until I am finished." Without FOR UPDATE, another thread could sneak in and drain the account of the money you thought was there.
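A sketch of that pattern with invented names (the test in the middle is application code):

BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;  -- row is now locked
-- application code: verify balance >= 100, otherwise ROLLBACK
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;  -- the lock is released here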
Comments on some of your thoughts:
One table is usually sufficient for many users and their account. It would contain the "current" balance for each account. I mentioned a "log"; that would be a separate table; it would contain a "history" (as opposed to just the "current" info).
FOREIGN KEYs are mostly irrelevant in this discussion. They serve 2 purposes: Verify that another table has a row that should be there; and implicitly create an INDEX to make that check faster.
Pipelining? If you are not doing more than a hundred 'transactions' per second, the BEGIN..COMMIT logic is all you need to worry about.
"Same time" and "simultaneously" are misused terms. It is very unlikely that two users will hit the database at the "same time" -- consider browser delays, network delays, OS delays, etc. Plus the fact that most of those steps force activity to go single-file. The network forces one message to get there before another. Meanwhile, if one of your 'transactions' takes 0.01 second, who cares if the "simultaneous" request has to wait for it to finish. The point is that what I described will force the "wait" if needed to avoid messing up the data.
All that said, there still can be some "at the same time" -- If transactions don't touch the same rows, then the few milliseconds it takes from BEGIN to COMMIT could overlap. Consider this timeline of two transactions that came in almost simultaneously:
BEGIN; -- A
pull money from Alice -- A
BEGIN; -- B
pull money from Bobby -- B
give Alice's money to Alan -- A
give Bobby's money to Betty -- B
COMMIT; -- A
COMMIT; -- B
I was practicing some "system design" coding questions and I was interested in how to solve a concurrency problem in MySQL. The problem was "design an inventory checkout system".
Let's say you are trying to check out a specific item from an inventory, a library book for instance.
If two people are on the website, both looking to check it out, is it possible that they both succeed? Let's assume the query updates the status of the row, marking a boolean checked_out as True.
Would transactions solve this issue by causing the second query that runs to fail (assuming they are the same query)?
Alternatively, we insert rows into a checkouts table. Since both queries read that the item is not checked out currently, they could both insert into the table. I don't think a transaction would solve this, unless the transaction includes reading the table to see if a checkout currently exists for this item that hasn't yet ended.
How would I simulate two writes at the exact same time to test this?
No, transactions alone do not address concurrency issues. Let's quickly revisit MySQL's definition of transactions:
Transactions are atomic units of work that can be committed or rolled back. When a transaction makes multiple changes to the database, either all the changes succeed when the transaction is committed, or all the changes are undone when the transaction is rolled back.
To sum it up: transactions are a way to ensure data integrity.
RDBMSs use various types of locking, isolation levels, and storage-engine-level mechanisms to address concurrency. People often mistake transactions for a means to control concurrency, because transactions affect how long certain locks are held.
Focusing on InnoDB: when you issue an UPDATE statement, MySQL places an exclusive lock on the record being updated. Only the transaction holding the exclusive lock can modify the given record; the others have to wait until that transaction is committed.
How does this help you preventing multiple users checking out the same book? Let's say you have an id field uniquely identifying the books and a checked_out field indicating the status of the book.
You can use the following atomic update to check out a book:
UPDATE books SET checked_out=1 WHERE id=xxx AND checked_out=0;
The checked_out=0 criteria makes sure that the update only succeeds if the book is not checked out yet. So, if the above statement affects a row, then the current user checks out the book. If it does not affect any rows, then someone else has already checked out the book. The exclusive lock makes sure that only one transaction can update the record at any given time, thus serializing the access to that record.
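This also suggests a way to simulate two writes at the "same time": open two mysql client sessions and interleave the statements by hand. A sketch, using an arbitrary id of 7:

BEGIN;                            -- session 1
UPDATE books SET checked_out=1
WHERE id=7 AND checked_out=0;     -- session 1: reports 1 row affected
UPDATE books SET checked_out=1
WHERE id=7 AND checked_out=0;     -- session 2: blocks on the row lock
COMMIT;                           -- session 1
-- session 2 now resumes and reports 0 rows affected: the book was taken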
If you want to use a separate checkouts table for reserving books, then you can use a unique index on book ids to prevent the same book being checked out more than once.
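A minimal sketch of that variant (the schema is invented for illustration): the unique key guarantees that of two concurrent INSERTs for the same book, exactly one succeeds and the other fails with a duplicate-key error (ER_DUP_ENTRY), which the application treats as "already checked out". The row is deleted, or moved to a history table, when the book is returned.

CREATE TABLE checkouts (
  book_id INT NOT NULL,
  user_id INT NOT NULL,
  checked_out_at DATETIME NOT NULL,
  UNIQUE KEY uniq_active_checkout (book_id)
) ENGINE=InnoDB;

-- both users run this; only one INSERT can succeed
INSERT INTO checkouts (book_id, user_id, checked_out_at)
VALUES (7, 123, NOW());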
Transactions don't cause updates to fail. They cause sequences of queries to be serialized. Only one accessor can run the sequence of queries; others wait.
Everything in SQL is a transaction, single-statement update operations included. The kind of transaction denoted by BEGIN TRANSACTION; ... COMMIT; bundles a series of queries together.
I don't think a transaction would solve this, unless the transaction includes reading the table to see if a checkout currently exists for this item.
That's generally correct. Checkout schemes must always read availability from the database. The purpose of the transaction is to avoid race conditions when multiple users attempt to check out the same item.
SQL doesn't have thread-safe atomic test-and-set instructions like multithreaded processor cores have. So you need to use transactions for this kind of thing.
The simplest form of checkout uses a transaction, something like this.
BEGIN TRANSACTION;
SELECT is_item_available, id FROM item WHERE catalog_number = whatever FOR UPDATE;
/* if the item is not available, tell the user and commit the transaction without update*/
UPDATE item SET is_item_available = 0 WHERE id = itemIdPreviouslySelected;
/* tell the user the checkout succeeded. */
COMMIT;
It's clearly possible for two or more users to attempt to check out the same item more-or-less simultaneously. But only one of them actually gets the item.
A more complex checkout scheme, not detailed here, uses a two-step system. First step: a transaction to reserve the item for a user, rejecting the reservation if someone else has it checked out or reserved. Second step: reservation holder has a fixed amount of time to accept the reservation and check out the item, or the reservation expires and some other user may reserve the item.
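For what it's worth, the reservation step of that scheme could be sketched like this (reserved_by and reserved_until are invented columns, and 15 minutes is an arbitrary window):

-- reserve the item for user 123, but only if it is free
UPDATE item
SET reserved_by = 123,
    reserved_until = NOW() + INTERVAL 15 MINUTE
WHERE catalog_number = 42
AND is_item_available = 1
AND (reserved_until IS NULL OR reserved_until < NOW());
-- 0 rows affected: the item is checked out or has a live reservation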
Can two transactions occur at the same time? Let's say you have transactions A and B, each of which will perform a read to get the max value of some column then a write to insert a new row with that max+1. Is it possible that A performs a read to get the max, then B performs a read before A writes, causing both transactions to write the same value to the column?
Doing this with the "read uncommitted" isolation option set to false seems to prevent the duplicates, but I can't wrap my head around why.
Can two transactions occur at the same time?
Yes, that is quite possible; in fact every RDBMS supports it out of the box to speed things up. Think about an application accessed by thousands of users simultaneously: if everything ran in sequence, users might have to wait days to get a response.
Let's say you have transactions A and B, each of which will perform a read to get the max value of some column then a write to insert a new row with that max+1. Is it possible that A performs a read to get the max, then B performs a read before A writes, causing both transactions to write the same value to the column?
If A and B happen in two different sessions, that is quite a possible case.
Doing this with the "read uncommitted" isolation option set to false seems to prevent the duplicates, but I can't wrap my head around why?
I think your requirement, getting the next increment number under isolation, is quite common. Here you need to instruct the database to make the read mutually exclusive with the write that has to follow it. You can do that by setting the isolation level, or perhaps a "temporary" isolation applied to just this operation would solve your problem.
If getting the next number is the only problem and you have no other constraints, then
MySQL's AUTO_INCREMENT would be the best-suited answer for you.
But since you have asked this question specifically, you may have constraints.
Refer to my similar question and answer.
Your solution should be something like below.
BEGIN;
SELECT last_number FROM TABLE1 ... FOR UPDATE;
-- read the result in the app
UPDATE TABLE1 SET last_number = last_number + 1 WHERE ...;
COMMIT;
I use a table with one row to keep the last used ID (I have my reasons not to use auto_increment). My app will run on a server farm, so I wonder how I can update the last inserted ID (i.e. increment it) and select the new ID in one step, to avoid thread-safety problems (a race condition between servers in the server farm).
You're going to use a server farm for the database? That doesn't sound "right".
You may want to consider using GUIDs for IDs. They may be big, but they don't have duplicates.
With a single "next id" value you will run into locking contention on that record. What I've done in the past is use a table of ID ranges (RangeId, RangeFrom, RangeTo). The range table has a primary key of RangeId, a simple number (e.g. 1 to 100). The "get next id" routine picks a random number from 1 to 100 and gets the first range record with an ID lower than the random number. This spreads the locks out across N records. You can use tens, hundreds, or thousands of range records. When a range is fully consumed, just delete the range record.
If you're really using multiple databases then you can manually ensure each database's set of range records do not overlap.
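A rough sketch of the range-table idea, with all names invented and MySQL user variables standing in for application state:

CREATE TABLE id_ranges (
  range_id   INT PRIMARY KEY,   -- e.g. 1 to 100
  range_from BIGINT NOT NULL,   -- next id to hand out
  range_to   BIGINT NOT NULL    -- last id in this range
);

BEGIN;
-- the application picks a random number, say 37, so that
-- concurrent callers rarely contend for the same row
SELECT range_id, range_from INTO @rid, @next_id
FROM id_ranges
WHERE range_id <= 37
ORDER BY range_id DESC LIMIT 1
FOR UPDATE;
UPDATE id_ranges SET range_from = range_from + 1 WHERE range_id = @rid;
-- when range_from passes range_to, the range is consumed: DELETE the row
COMMIT;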
You need to make sure that your ID column is only ever accessed under a lock; then only one client can read the highest ID and set the new highest ID.
You can do this in C# using a lock statement around the code that accesses the table (note that this only serializes access within a single process, not across a server farm), or in your database you can wrap the read and write in a transaction. I don't know the exact syntax for this in MySQL.
Use a transactional database and control transactions manually. That way you can submit multiple queries without risking having something mixed up. Also, you may store the relevant query sets in stored procedures, so you can simply invoke these transactional queries.
If you have problems with performance, increment the ID by 100 and use a thread per "client" server. The thread does the increment and hands each interested party a new ID. This way, the thread only needs to access the DB once per 100 IDs.
If the thread crashes, you'll lose a couple of IDs, but if that doesn't happen all the time, you shouldn't need to worry about it.
AFAIK the only way to get nicely incrementing numbers out of a DB is transactional locking at the DB, which is hideous performance-wise. You can get lock-free behaviour using GUIDs, but frankly you're going to run into transaction requirements in every CRUD operation you can think of anyway.
Assuming that your database is configured to run with a transaction isolation of READ_COMMITTED or better, then use one SQL statement that updates the row, setting it to the old value selected from the row plus an increment. With lower levels of transaction isolation you might need to use INSERT combined with SELECT FOR UPDATE.
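In MySQL specifically, the single-statement version can use LAST_INSERT_ID(expr), which stores the expression as the connection's last-insert id so it can be read back without a second race-prone query (sequence is an assumed one-row table):

UPDATE sequence SET last_id = LAST_INSERT_ID(last_id + 1);
SELECT LAST_INSERT_ID();  -- this connection's new id; other connections cannot interfere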
As pointed out [by Aaron Digulla] it is better to allocate blocks of IDs, to reduce the number of queries and table locks.
The application must perform the ID acquisition in a separate transaction from any business logic, otherwise any transaction that needs an ID will end up waiting for every transaction that asks for an ID first to commit/rollback.
This article: http://www.ddj.com/architect/184415770 explains the HIGH-LOW strategy that allows your application to obtain IDs from multiple allocators. Multiple allocators improve concurrency, reliability and scalability.
There is also a long discussion here: http://www.theserverside.com/patterns/thread.tss?thread_id=4228 "HIGH/LOW Singleton+Session Bean Universal Object ID Generator"