Is there a mechanism in MySQL (5.6 or later) to have a transaction (or statement) to volunteer to be a victim in the case it is involved in a deadlock?
With InnoDB deadlock detection on, when a deadlock is identified, InnoDB determines which transaction to kill (to be the victim) in order to allow the other transaction(s) to proceed. There's an algorithm used to determine which transaction is the victim.
My question is whether there is any syntax that we can use in a statement that will influence the algorithm, that basically tells InnoDB "if this statement/transaction is involved in a deadlock, then pick me as the victim."
"It is a far, far better thing I do..." — TRX #8675309
The current algorithm for which transaction is killed in case of a deadlock is that the transaction that has changed fewer rows is killed. In case of a tie, the choice is made arbitrarily by InnoDB internal code; we do not know the reason for the choice.
So the only way one transaction can "volunteer" is to change fewer rows than the other transaction.
Related
What is the default implementation of concurrency control in MySQL? Is it optimistic locking (multi version concurrency control), or pessimistic locking (2 phase locking)? More specifically, how does InnoDb do it?
Internally, how does mysql (with innodb) decide on the start of a transaction whether to lock the row, or rollback after a conflict?
InnoDB uses optimistic locking.
There is no locking at the start of a transaction. How would it know which rows to lock until you execute a specific query? It doesn't even know which table(s) that you will eventually need to lock rows in.
There is no need for a rollback after a lock conflict. If you do a query in one transaction that has to wait because another session holds the lock, then your query waits up to a certain number of seconds (per the config option innodb_lock_wait_timeout, default 50 seconds).
If the other session commits before the timeout, then your session stops waiting, acquires the locks it needs, and proceeds with the query.
If your wait times out before the other session commits, your query returns an error. This still does NOT rollback your transaction; previous changes you made during your transaction are still able to be committed. You can even try the query that timed out again.
Exception: in cases of deadlock, InnoDB chooses one of the transactions involved in the deadlock, and forcibly does a rollback on one of them. It tries to choose the transaction that has modified fewer rows. If the transactions are tied, then the choice is arbitrary.
I was reading about database locking(pessimistic,optimistic) mechanism,
session 1:
t1: open transaction:
t2: sleep(3 sec)
t5: updte user set name='x' where id =1
session 2:
t2:update user set name='y' where id=1
my doubts are:
1. What will happen at t5
2. does It has to do any thing with Isolation level?if yes what will be the behavior in different isolation level.
3. Does database(mysql,oracle) only do pessimistic locking?
Let me answer your questions in a reverse order bacause this way I do not have to repeat certain parts.
Since optimistic locking means that the records read in a transaction are not locked, optimistic locks cannot be implemented. You should not really use the term optimistic lock, use optimistic concurrency control instead. The pessimistic locking strategy is the one that involves database level locks, which are implemented by all rdbms that use transactions - including mysql with innodb.
Mysql does not have any database level support for optimistic concurrency control. This does not mean that other rdbms do not support OCC either. You need to check out their manuals.
Isolation levels do not affect the outcome of the scenario described in the question, since there is no select there, only 2 atomic updates and the field referenced in the where clause is not updated.
Isolation levels mainly influence how data is read by transactions, not how they can update it.
The outcome of the scenario described in the question depends on which session issues the update first and how long that transaction is open. Whichever session executes the update first will make the change and sets an exclusive lock on the index record. The other transaction will not be able to execute the update until the first transaction completes. If the first transaction runs for a long time, then the other one may time out while waiting for the lock to be released.
I know there is one issue in MySQL with concurrent SELECT and INSERT. However, my question is if I open up two connections with MySQL and keep loading data using both of them, does MySQL takes data concurrently or waits for one to finish before loading another?
I’d like to know how MySQL behaves in both cases. Like when I am trying to load data in the same table or different tables concurrently when opening separate connections.
If you will create a new connection to the database and perform inserts from both the links, then from the database's perspective, it will still be sequential.
The documentation of Concurrent Inserts for MyISAM on the MySQL's documentation page says something like this:
If MyISAM storage is used and table has no holes, multiple INSERT statements are queued and performed in sequence, concurrently with the SELECT statements.
Mind that there is no control over the order in which two concurrent inserts will take place. The order in this concurrency is at the mercy of a lot of different factors. To ensure order, by default you will have to sacrifice concurrency.
MySQL does support parallel data inserts into the same table.
But approaches for concurrent read/write depends upon storage engine you use.
InnoDB
MySQL uses row-level locking for InnoDB tables to support simultaneous write access by multiple sessions, making them suitable for multi-user, highly concurrent, and OLTP applications.
MyISAM
MySQL uses table-level locking for MyISAM, MEMORY, and MERGE tables, allowing only one session to update those tables at a time, making them more suitable for read-only, read-mostly, or single-user applications
But, the above mentioned behavior of MyISAM tables can be altered by concurrent_insert system variable in order to achieve concurrent write. Kindly refer to this link for details.
Hence, as a matter of fact, MySQL does support concurrent insert for InnoDB and MyISAM storage engine.
You ask about Deadlock detection, ACID and particulary MVCC, locking and transactions:
Deadlock Detection and Rollback
InnoDB automatically detects transaction deadlocks and rolls back a
transaction or transactions to break the deadlock. InnoDB tries to
pick small transactions to roll back, where the size of a transaction
is determined by the number of rows inserted, updated, or deleted.
When InnoDB performs a complete rollback of a transaction, all locks
set by the transaction are released. However, if just a single SQL
statement is rolled back as a result of an error, some of the locks
set by the statement may be preserved. This happens because InnoDB
stores row locks in a format such that it cannot know afterward which
lock was set by which statement.
https://dev.mysql.com/doc/refman/5.6/en/innodb-deadlock-detection.html
Locking
The system of protecting a transaction from seeing or changing data
that is being queried or changed by other transactions. The locking
strategy must balance reliability and consistency of database
operations (the principles of the ACID philosophy) against the
performance needed for good concurrency. Fine-tuning the locking
strategy often involves choosing an isolation level and ensuring all
your database operations are safe and reliable for that isolation
level.
http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_locking
ACID
An acronym standing for atomicity, consistency, isolation, and
durability. These properties are all desirable in a database system,
and are all closely tied to the notion of a transaction. The
transactional features of InnoDB adhere to the ACID principles.
Transactions are atomic units of work that can be committed or rolled
back. When a transaction makes multiple changes to the database,
either all the changes succeed when the transaction is committed, or
all the changes are undone when the transaction is rolled back. The
database remains in a consistent state at all times -- after each
commit or rollback, and while transactions are in progress. If related
data is being updated across multiple tables, queries see either all
old values or all new values, not a mix of old and new values.
Transactions are protected (isolated) from each other while they are
in progress; they cannot interfere with each other or see each other's
uncommitted data. This isolation is achieved through the locking
mechanism. Experienced users can adjust the isolation level, trading
off less protection in favor of increased performance and concurrency,
when they can be sure that the transactions really do not interfere
with each other.
http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_acid
MVCC
InnoDB is a multiversion concurrency control (MVCC) storage engine
which means many versions of the single row can exist at the same
time. In fact there can be a huge amount of such row versions.
Depending on the isolation mode you have chosen, InnoDB might have to
keep all row versions going back to the earliest active read view, but
at the very least it will have to keep all versions going back to the
start of SELECT query which is currently running
https://www.percona.com/blog/2014/12/17/innodbs-multi-versioning-handling-can-be-achilles-heel/
It depends.
It depends on the client -- some clients allow concurrent access; some will serialize access, thereby losing the expected gain. You have not even specified PHP vs Java vs ... or Apache vs ... or Windows vs ... Many combinations simply do not provide any parallelism.
If different tables, there is only general contention for I/O, CPU, Mutexes on the buffer_pool, etc. A reasonable amount of parallelism is possible.
If same table, it depends on the indexes and access patterns. In some cases the threads will block each other. In some cases it will even "deadlock" and rollback one of the transactions. Deadlocks not only slow you down, but make you retry the inserts.
If you looking for high speed ingestion of a lot of rows, see my blog. It lays out techniques, and points out sever of the ramifications, such as replication, Engine choice, multi-threading.
Multiple threads inserting into the same tables -- It depend a lot on the values you are providing for any PRIMARY or UNIQUE keys. It depends on whether other actions are taken in the same transaction. It depends on how much I/O is involved. It depends on whether you are doing single-row inserts, or batching. It depends on ... (Sorry to be vague, but your question is not very specific.)
If you would like to present specifics on two or three designs, we can discuss the specifics.
When running queries while using myisam engine, because its not transactional, long queries (as far as I understand) don't affect the data from other queries.
In InnoDB, one of the things it warns is to avoid long queries. When InnoDB snapshots, is it snap shotting everything?
The reason I am asking this is: say a query for whatever reason takes a longer time than normal and eventually rolls back. Meanwhile, 200 other users have updated or inserted rows into the database. When the long query rolls back, does it also remove the updates/inserts that were made by the other users? or are the rows that involved the other users safe, unless they crossed over with the one that gets rolled back?
Firstly, I think that it would be useful as background to read up on multi-version concurrency control (MVCC) as a background to this answer.
InnoDB implements MVCC, which means it can use non-locking reads for regular SELECT. This does not require creating a "snapshot" and in fact InnoDB doesn't have any real concept of a snapshot as an object. Instead, every record in the database keeps track of its own version number and maintains a "roll pointer" to an "undo log" record (which may or may not still exist) which modifies the row to its previous version. If an older version of a record is needed, the current version is read and those roll pointers are followed and undo records applied until a sufficiently old version of the record is produced.
Normally the system is constantly cleaning up those undo logs and re-using the space they consume.
Any time any long-running transaction (note, not necessarily a single query) is present, the undo logs must be kept (not purged) in order to sufficiently recreate old enough versions of all records to satisfy that transaction. In a very busy system, those undo logs can very quickly accumulate to consume gigabytes of space. Additionally if specific individual records are very frequently modified, reverting that record to an old enough version to satisfy the query could take very many undo log applications (thousands).
That is what makes "long-running queries" expensive and frowned upon. They will increase disk space consumption for keeping the undo logs in the system tablespace, and they will perform poorly due to undo log record application to revert row versions upon read.
Some databases implement a maximum amount of undo log space that can be consumed, and once they have reached that limit they start throwing away older undo log records and invalidating running transactions. This generates a "snapshot too old" error message to the user. InnoDB has no such limit, and allows accumulation indefinitely.
Whether your queries affect concurrency or not have to do with the types of queries. Having many read queries won't affect concurrency in MyISAM or InnoDB (besides performance issues).
Inserts (to the end of an index with InnoDB, or the end of a table with MyISAM) also don't impact concurrency.
However, as soon as you have an update query, rows get locked in InnoDB, and with MyISAM, it's the entire table that gets write locked. When you try to update a record (or table) that has a write lock, you must wait until the lock is released before you can proceed. In MyISAM, updates are served before reads, so you have to wait until the updates are processed.
MyISAM can be more performant because table locks are faster than record locks (though record locks are fast). However, when you start making a significant number of updates, InnoDB is generally preferred because different users are generally not likely to contend for the same records. So, with InnoDB, many users can work in parallel without affecting each other too much, thanks to the record level locking (rather than table locks).
Not to mention the benefit of full ACID compliance that you get with InnoDB, enforcement of foreign key constraints, and the speed of clustered indexes.
Snapshots (log entries) are kept long enough to complete the current transaction and are discarded if they are rolled back or committed. The longer a transaction runs, the more likely it is that other updates will occur, which grows the number of log entries required to roll back.
There will be no "cross-over" due to locking. When there is write contention for the same records, one user must wait until the other commits or rolls back.
You can read more about The InnoDB Transaction Model and Locking.
can someone explain the need to lock tables and/or rows in mysql?
I am assuming that it to prevent multiple writes to the same field, is this the best practise?
First lets look a good document This is not a mysql related documentation, it's about postgreSQl, but it's one of the simplier and clear doc I've read on transaction. You'll understand MySQl transaction better after reading this link http://www.postgresql.org/docs/8.4/static/mvcc.html
When you're running a transaction 4 rules are applied (ACID):
Atomicity : all or nothing (rollback)
Coherence : coherent before, coherent after
Isolation: not impacted by others?
Durability : commit, if it's done, it's really done
In theses rules there's only one which is problematic, it's Isolation. using a transaction does not ensure a perfect isolation level. The previous link will explain you better what are the phantom-reads and suchs isolation problems between concurrent transactions. But to make it simple you should really use Row levels locks to prevent other transaction, running in the same time as you (and maybe comitting before you), to alter the same records. But with locks comes deadlocks...
Then when you'll try using nice transactions with locks you'll need to handle deadlocks and you'll need to handle the fact that transaction can fail and should be re-launched (simple for or while loops).
Edit:------------
Recent versions of InnoDb provides greater levels of isolation than previous ones. I've done some tests and I must admit that even the phantoms reads that should happen are now difficult to reproduce.
MySQL is on level 3 by default of the 4 levels of isolation explained in the PosgtreSQL document (where postgreSQL is in level 2 by default). This is REPEATABLE READS. That means you won't have Dirty reads and you won't have Non-repeatable reads. So someone modifying a row on which you made your select in your transaction will get an implicit LOCK (like if you had perform a select for update).
Warning: If you work with an older version of MySQL like 5.0 you're maybe in level 2, you'll need to perform the row lock using the 'FOR UPDATE' words!
We can always find some nice race conditions, working with aggregate queries it could be safer to be in the 4th level of isolation (by using LOCK IN SHARE MODE at the end of your query) if you do not want people adding rows while you're performing some tasks. I've been able to reproduce one serializable level problem but I won't explain here the complex example, really tricky race conditions.
There is a very nice example of race conditions that even serializable level cannot fix here : http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERIALIZABILITY
When working with transactions the more important things are:
data used in your transaction must always be read INSIDE the transaction (re-read it if you had data from before the BEGIN)
understand why the high isolation level set implicit locks and may block some other queries ( and make them timeout)
try to avoid dead locks (try to lock tables in the same order) but handle them (retry a transaction aborted by MySQL)
try to freeze important source tables with serialization isolation level (LOCK IN SHARE MODE) when your application code assume that no insert or update should modify the dataset he's using (if not you will not have problems but your result will have ignored the concurrent changes)
It is not a best practice. Modern versions of MySQL support transactions with well defined semantics. Use transactions, and forget about locking stuff by hand.
The only new thing you'll have to deal with is that transaction commits may fail because of race conditions, but you'd be doing error checking with locks anyway, and it is easier to retry the logic that led to a transaction failure than to recover from errors in a non-transactional setup.
If you do get race conditions and failed commits, then you may want to fine-tune the isolation configuration for your transactions.
For example if you need to generate invoice numbers which are sequential and have no numbers missing - this is a requirement at least in the country I live in.
If you have a few web servers, then a few users might be buying stuff literally at the same time.
If you do select max(invoice_id)+1 from invoice to get the new invoice number, two web servers might do that at the same time (before the new invoice has been added), and get the same invoice number for the invoices they're trying to create.
If you use a mechanism such as "auto_increment", this is just meant to generate unique values, and makes no guarantees about not missing out numbers (if one transaction tries to insert a row, then does a rollback, the number is "lost"),
So the solution is to (a) lock the table (b) select max(invoice_id)+1 from invoice (c) do the insert (d) commit + unlock the table.
On another note, in MySQL you're best using InnoDB and using row-level locking. Doing a lock table command can implicitly commit the transaciton you're working on.
Take a look here for general introduction to what transactions are and how to use them.
Databases are designed to work in concurrent environments, so locking the tables and/or records helps to keep the transactions consistent.
So a record affected by one transaction should not be altered until this transaction commits or rolls back.