Store MySQL deadlocks

I was wondering if there is a way of storing every transaction that causes a deadlock in a MySQL database in a separate table, the moment it is recorded by InnoDB?

The innodb_print_all_deadlocks variable became available in MySQL 5.6 (it was also backported to some 5.5-based builds, such as Percona Server 5.5.30). Set it to ON, but be aware that the file it writes to (the error log) may clutter the disk.
Techniques for reducing the number of deadlocks:
Speed up the transactions.
Move DML statements out of a transaction (when it is OK to do so).
If a statement has an IN or OR list (e.g., DELETEing several rows), sort the values.
The last one may turn a deadlock into a plain lock wait (governed by innodb_lock_wait_timeout), wherein one of the transactions is silently stalled until the other finishes.
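There is no built-in way to have InnoDB write deadlocks into a table of your own, so a table like that has to be filled by parsing the log or the SHOW ENGINE INNODB STATUS output yourself. A minimal sketch of the server-side pieces (the deadlock_log table is a hypothetical example):

    -- Log every deadlock to the error log (MySQL 5.6+; needs SUPER):
    SET GLOBAL innodb_print_all_deadlocks = ON;

    -- Only the most recent deadlock appears here; a cron job or scheduled
    -- event could parse this output and INSERT it into your own table:
    SHOW ENGINE INNODB STATUS;

    -- Hypothetical target table for the parsed deadlock reports:
    CREATE TABLE deadlock_log (
        id        BIGINT AUTO_INCREMENT PRIMARY KEY,
        logged_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        report    TEXT
    );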

Long-running InnoDB query generates a big undo file in MariaDB

I have a big query in PHP that uses MYSQLI_USE_RESULT so the full result set is not loaded into PHP memory.
If I used MYSQLI_STORE_RESULT instead, all of the data would be buffered in memory at once, which takes multiple GB of RAM, rather than being fetched row by row.
The query returns millions of rows, and each row triggers an API request, so the query will be running for days.
In the meantime, other MySQL queries update/insert into the tables involved in the first query, and I think this causes the undo log to grow without stopping.
I set up innodb_undo_tablespaces=2 and innodb_undo_log_truncate=ON so the undo log is separated from ibdata1, but the undo files still stay big until I kill the queries that have been running for days.
I executed "SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;" before running the long running query, hoping that it would prevent undo file to grow, but it didn't.
The other queries that are updating/inserting have autocommit.
In 1-2 day, the undo file is already 40GB large.
The question : how to prevent this undo file to increase ? As I don't want to keep the previous version of the data while the query is running. It's not important if I get updated data instead of the data that was at the time of the query.
Regardless of your transaction isolation level, a given query will always establish a fixed snapshot, which requires the data to be preserved in the state it was in when the query started.
In other words, READ-COMMITTED or READ-UNCOMMITTED allow subsequent queries in the same transaction to see updated data, but a single query will never see a changing data set. Thus concurrent updates to data will force old record versions to be copied to the undo log, and those record versions will be preserved there until your long-running query is finished.
READ-UNCOMMITTED doesn't help any more than READ-COMMITTED. In fact, I've never needed to use READ-UNCOMMITTED for any reason. Allowing "dirty reads" of unfinished transactions breaks rules of ACID databases, and leads to anomalies.
The only way to avoid long-lasting growth of your undo log is to finish your query.
The simplest way to achieve this is to use multiple short-running queries, each fetching a subset of the result, and to finish each query in a timely way (a sketch of this pattern follows the list below).
Another solution would be to run the whole query for the millions of rows of result, and store the result somewhere that isn't constrained by InnoDB transaction isolation.
MyISAM table
Message queue
Plain file on disk
Cache like Memcached or Redis
PHP memory (but you said you aren't comfortable with this because of the size)
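For the first approach, here is a minimal keyset-pagination sketch; big_table, id, and payload are hypothetical names. Because each query finishes quickly, no long-lived read view holds back undo purging:

    -- Fetch the next chunk after the last id seen; repeat until empty.
    -- Requires an indexed, unique ordering column (here: id).
    SELECT id, payload
    FROM big_table
    WHERE id > 123456          -- last id from the previous chunk
    ORDER BY id
    LIMIT 10000;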

How do I add millions of rows to a live production MySQL table?

I'm looking to add about 7 million rows to a live production database table that gets 1-2 writes per second. Can I do this without locking the database for writes? I think so because the table uses InnoDB?
Are there other considerations or do I just write the insert statement and let it rip?
If you're using InnoDB, you don't need to do anything special.
Just run your inserts. InnoDB uses row-level locking in these situations, so it will not lock the entire table.
Of course your performance could still take a hit due to the parallel work.
To answer your other question:
"One confusion about transactions: If I am working on transaction A and a stack of writes B come in, do those writes get processed after I commit my transaction"
In general, no; they will not need to wait for your commit. This does depend on whether you are working within the same keyspace, and on the isolation level you are working under.
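As a hedged sketch of "letting it rip" in a controlled way: split the bulk load into modest batches so each transaction holds its row locks only briefly. The table and column names here are placeholders:

    -- Assumed schema, for illustration only:
    CREATE TABLE big_table (id BIGINT PRIMARY KEY, payload VARCHAR(100));

    -- Insert in multi-row batches inside short transactions:
    START TRANSACTION;
    INSERT INTO big_table (id, payload) VALUES
        (1, 'a'), (2, 'b'), (3, 'c');   -- e.g. ~1000 rows per statement
    COMMIT;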

How does MySQL handle concurrent inserts?

I know there is one issue in MySQL with concurrent SELECT and INSERT. However, my question is: if I open two connections to MySQL and keep loading data through both of them, does MySQL load the data concurrently, or does it wait for one load to finish before starting the other?
I'd like to know how MySQL behaves in both cases: loading data into the same table, and into different tables, concurrently over separate connections.
If you create a new connection to the database and perform inserts over both connections, then from the database's perspective the inserts will still be sequential.
The MySQL documentation on Concurrent Inserts for MyISAM says, roughly:
If MyISAM storage is used and the table has no holes, multiple INSERT statements are queued and performed in sequence, concurrently with SELECT statements.
Mind that there is no control over the order in which two concurrent inserts take place; the order is at the mercy of many different factors. To guarantee order, you will have to sacrifice concurrency.
MySQL does support parallel data inserts into the same table.
But the approach to concurrent reads/writes depends on the storage engine you use.
InnoDB
MySQL uses row-level locking for InnoDB tables to support simultaneous write access by multiple sessions, making them suitable for multi-user, highly concurrent, and OLTP applications.
MyISAM
MySQL uses table-level locking for MyISAM, MEMORY, and MERGE tables, allowing only one session to update those tables at a time, making them more suitable for read-only, read-mostly, or single-user applications.
But the above-mentioned behavior of MyISAM tables can be altered via the concurrent_insert system variable in order to achieve concurrent writes; see the MySQL documentation on concurrent_insert for details.
Hence MySQL does in fact support concurrent inserts for both the InnoDB and MyISAM storage engines.
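For reference, the MyISAM behavior is controlled like this (0/NEVER disables concurrent inserts, 1/AUTO is the default for hole-free tables, 2/ALWAYS permits them even when the table has holes):

    -- Allow concurrent inserts into MyISAM tables even when the table
    -- has holes left by deleted rows:
    SET GLOBAL concurrent_insert = 2;   -- or 'ALWAYS'
    SHOW VARIABLES LIKE 'concurrent_insert';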
You also ask about deadlock detection, ACID, and particularly MVCC, locking, and transactions:
Deadlock Detection and Rollback
InnoDB automatically detects transaction deadlocks and rolls back a transaction or transactions to break the deadlock. InnoDB tries to pick small transactions to roll back, where the size of a transaction is determined by the number of rows inserted, updated, or deleted. When InnoDB performs a complete rollback of a transaction, all locks set by the transaction are released. However, if just a single SQL statement is rolled back as a result of an error, some of the locks set by the statement may be preserved. This happens because InnoDB stores row locks in a format such that it cannot know afterward which lock was set by which statement.
https://dev.mysql.com/doc/refman/5.6/en/innodb-deadlock-detection.html
Locking
The system of protecting a transaction from seeing or changing data that is being queried or changed by other transactions. The locking strategy must balance reliability and consistency of database operations (the principles of the ACID philosophy) against the performance needed for good concurrency. Fine-tuning the locking strategy often involves choosing an isolation level and ensuring all your database operations are safe and reliable for that isolation level.
http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_locking
ACID
An acronym standing for atomicity, consistency, isolation, and durability. These properties are all desirable in a database system, and are all closely tied to the notion of a transaction. The transactional features of InnoDB adhere to the ACID principles. Transactions are atomic units of work that can be committed or rolled back. When a transaction makes multiple changes to the database, either all the changes succeed when the transaction is committed, or all the changes are undone when the transaction is rolled back. The database remains in a consistent state at all times -- after each commit or rollback, and while transactions are in progress. If related data is being updated across multiple tables, queries see either all old values or all new values, not a mix of old and new values. Transactions are protected (isolated) from each other while they are in progress; they cannot interfere with each other or see each other's uncommitted data. This isolation is achieved through the locking mechanism. Experienced users can adjust the isolation level, trading off less protection in favor of increased performance and concurrency, when they can be sure that the transactions really do not interfere with each other.
http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_acid
MVCC
InnoDB is a multiversion concurrency control (MVCC) storage engine, which means many versions of a single row can exist at the same time. In fact there can be a huge number of such row versions. Depending on the isolation mode you have chosen, InnoDB might have to keep all row versions going back to the earliest active read view, but at the very least it will have to keep all versions going back to the start of the SELECT query which is currently running.
https://www.percona.com/blog/2014/12/17/innodbs-multi-versioning-handling-can-be-achilles-heel/
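To make the deadlock-detection passage above concrete, here is a minimal two-session sketch, assuming a hypothetical InnoDB table t containing rows with id 1 and 2; each session locks one row and then asks for the other's:

    -- Session A:
    START TRANSACTION;
    UPDATE t SET v = v + 1 WHERE id = 1;   -- A locks row 1
    -- Session B (in parallel):
    --   START TRANSACTION;
    --   UPDATE t SET v = v + 1 WHERE id = 2;   -- B locks row 2
    UPDATE t SET v = v + 1 WHERE id = 2;   -- A now waits for B
    -- Session B then runs: UPDATE t SET v = v + 1 WHERE id = 1;
    -- InnoDB detects the lock cycle and rolls one session back with:
    --   ERROR 1213 (40001): Deadlock found when trying to get lock;
    --   try restarting transaction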
It depends.
It depends on the client -- some clients allow concurrent access; some will serialize access, thereby losing the expected gain. You have not even specified PHP vs Java vs ... or Apache vs ... or Windows vs ... Many combinations simply do not provide any parallelism.
If different tables, there is only general contention for I/O, CPU, Mutexes on the buffer_pool, etc. A reasonable amount of parallelism is possible.
If same table, it depends on the indexes and access patterns. In some cases the threads will block each other. In some cases it will even "deadlock" and rollback one of the transactions. Deadlocks not only slow you down, but make you retry the inserts.
If you are looking for high-speed ingestion of a lot of rows, see my blog. It lays out techniques and points out several of the ramifications, such as replication, engine choice, and multi-threading.
Multiple threads inserting into the same tables -- it depends a lot on the values you are providing for any PRIMARY or UNIQUE keys. It depends on whether other actions are taken in the same transaction. It depends on how much I/O is involved. It depends on whether you are doing single-row inserts or batching. It depends on ... (Sorry to be vague, but your question is not very specific.)
If you would like to present specifics on two or three designs, we can discuss them.

MySQL/InnoDB and long-running queries

When running queries with the MyISAM engine, because it's not transactional, long queries (as far as I understand) don't affect the data seen by other queries.
In InnoDB, one of the warnings is to avoid long queries. When InnoDB takes a snapshot, is it snapshotting everything?
The reason I am asking: say a query for whatever reason takes longer than normal and eventually rolls back. Meanwhile, 200 other users have updated or inserted rows into the database. When the long query rolls back, does it also remove the updates/inserts that were made by the other users? Or are the rows involving the other users safe, unless they crossed over with the one that gets rolled back?
Firstly, I think it would be useful to read up on multi-version concurrency control (MVCC) as background for this answer.
InnoDB implements MVCC, which means it can use non-locking reads for regular SELECT. This does not require creating a "snapshot" and in fact InnoDB doesn't have any real concept of a snapshot as an object. Instead, every record in the database keeps track of its own version number and maintains a "roll pointer" to an "undo log" record (which may or may not still exist) which modifies the row to its previous version. If an older version of a record is needed, the current version is read and those roll pointers are followed and undo records applied until a sufficiently old version of the record is produced.
Normally the system is constantly cleaning up those undo logs and re-using the space they consume.
Any time any long-running transaction (note, not necessarily a single query) is present, the undo logs must be kept (not purged) in order to sufficiently recreate old enough versions of all records to satisfy that transaction. In a very busy system, those undo logs can very quickly accumulate to consume gigabytes of space. Additionally if specific individual records are very frequently modified, reverting that record to an old enough version to satisfy the query could take very many undo log applications (thousands).
That is what makes "long-running queries" expensive and frowned upon. They will increase disk space consumption for keeping the undo logs in the system tablespace, and they will perform poorly due to undo log record application to revert row versions upon read.
Some databases implement a maximum amount of undo log space that can be consumed, and once they have reached that limit they start throwing away older undo log records and invalidating running transactions. This generates a "snapshot too old" error message to the user. InnoDB has no such limit, and allows accumulation indefinitely.
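As an aside, the buildup described above can be watched on a live server with two standard diagnostics (only the interpretation thresholds are up to you):

    -- Purge backlog: look for "History list length" in the TRANSACTIONS
    -- section; a steadily growing number means undo logs are piling up.
    SHOW ENGINE INNODB STATUS;

    -- Find long-running transactions that are holding back purging:
    SELECT trx_id, trx_started,
           TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds
    FROM information_schema.INNODB_TRX
    ORDER BY trx_started;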
Whether your queries affect concurrency has to do with the types of queries. Having many read queries won't affect concurrency in MyISAM or InnoDB (besides performance issues).
Inserts (to the end of an index with InnoDB, or the end of a table with MyISAM) also don't impact concurrency.
However, as soon as you have an update query, rows get locked in InnoDB, and with MyISAM, it's the entire table that gets write locked. When you try to update a record (or table) that has a write lock, you must wait until the lock is released before you can proceed. In MyISAM, updates are served before reads, so you have to wait until the updates are processed.
MyISAM can be more performant because table locks are faster than record locks (though record locks are fast). However, when you start making a significant number of updates, InnoDB is generally preferred because different users are generally not likely to contend for the same records. So, with InnoDB, many users can work in parallel without affecting each other too much, thanks to the record level locking (rather than table locks).
Not to mention the benefit of full ACID compliance that you get with InnoDB, enforcement of foreign key constraints, and the speed of clustered indexes.
Snapshots (log entries) are kept long enough to complete the current transaction and are discarded once it commits or rolls back. The longer a transaction runs, the more likely it is that other updates will occur, which grows the number of log entries required to roll back.
There will be no "cross-over" due to locking. When there is write contention for the same records, one user must wait until the other commits or rolls back.
You can read more about The InnoDB Transaction Model and Locking.
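That "wait until the lock is released" behavior is easy to demonstrate; a minimal two-session sketch with a hypothetical table t (this also previews the lock wait timeout in the next question):

    -- Session A:
    START TRANSACTION;
    UPDATE t SET v = 1 WHERE id = 42;   -- holds the row lock until COMMIT

    -- Session B, touching the same row, blocks; if Session A has not
    -- committed within innodb_lock_wait_timeout seconds (default 50),
    -- B fails with:
    --   ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
    UPDATE t SET v = 2 WHERE id = 42;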

InnoDB transactions: Lock wait timeout

I have a table in my database (actually a few related tables) that can be manipulated manually from various points through our interface, but also automatically from two sources on a continuous basis. The periodic updates can contain huge amounts of data and can result in thousands of inserts/updates. In order to improve the performance of the inserts/updates, I have used "SET autocommit = 0" around the updates from these automated sources. This has resulted in the desired performance improvement, maybe even more than expected. However, the problem now is that if the automated sources overlap, or if a manual update is performed, very often the database locks up and after a while throws an error:
Lock wait timeout exceeded; try restarting transaction
This may be thrown even for a single statement with autocommit on and no transaction, but I guess that is reasonable as well if it conflicts with a transaction. I have read various suggestions; unfortunately there is no ideal solution. I guess my options are:
Try to order the updates/inserts on the tables so that all threads request locks in the same order and there is no deadlock. Unfortunately this is not possible; updates need to be applied in the order they are received.
Use LOCK TABLES to serialize transactions. This is theoretically possible, but a) apart from the two automated sources, the tables are updated from many points in the system, including triggers, schedules, and manually from various interfaces, so it would be a nightmare to identify and maintain LOCK TABLES around all these places, with no easy way to know that all have been identified; and b) LOCK TABLES has to lock all tables involved, and the updates/inserts, though not often, may sometimes need to update many tables as a result, so again all the tables that might be updated would need to be identified and maintained so that they are included in the LOCK TABLES.
Use a semaphore table before each update in order to serialize updates, as with LOCK TABLES above, but without actually having to use LOCK TABLES. This is an improvement, but it still has problem a) of LOCK TABLES above.
Any other suggestions? Could the performance benefits of autocommit = 0 (transactions) be achieved some other way that does not involve locks? Could InnoDB be configured to not lock, or to lock much less, on updates/inserts?
A last-resort option may be to move to MyISAM tables. Would this actually improve performance for heavy insert/update workloads?
Thank you
You can achieve the benefits of autocommit = 0 while still not using long transactions.
a) You can commit the transaction every X statements, assuming that you don't need to roll back the entire transaction.
b) Instead of autocommit = 0 you can use ALTER TABLE x DISABLE KEYS / ALTER TABLE x ENABLE KEYS before/after the import. This is where the performance improvement comes from: the non-unique indexes are not updated row by row, but are rebuilt in bulk when ENABLE KEYS runs. (Note that DISABLE KEYS applies only to the non-unique indexes of MyISAM tables; InnoDB ignores it.)
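A minimal sketch of both options, assuming a hypothetical batch size of about 1000 statements; each COMMIT releases the row locks accumulated so far, which keeps other sessions from hitting the lock wait timeout:

    -- Option a): periodic commits instead of one huge transaction.
    SET autocommit = 0;
    -- ... run the automated inserts/updates ...
    COMMIT;   -- issue after every ~1000 statements to release row locks

    -- Option b): MyISAM only; InnoDB ignores DISABLE KEYS.
    ALTER TABLE x DISABLE KEYS;
    -- ... bulk import ...
    ALTER TABLE x ENABLE KEYS;   -- non-unique indexes are rebuilt in bulk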