MySQL/InnoDB and long-running queries

When running queries with the MyISAM engine, because it's not transactional, long queries (as far as I understand) don't affect the data seen by other queries.
In InnoDB, one of the warnings is to avoid long queries. When InnoDB takes a snapshot, is it snapshotting everything?
The reason I am asking this is: say a query for whatever reason takes longer than normal and eventually rolls back. Meanwhile, 200 other users have updated or inserted rows into the database. When the long query rolls back, does it also remove the updates/inserts that were made by the other users? Or are the other users' rows safe, unless they crossed over with the one that gets rolled back?

Firstly, I think that it would be useful as background to read up on multi-version concurrency control (MVCC) as a background to this answer.
InnoDB implements MVCC, which means it can use non-locking reads for regular SELECT. This does not require creating a "snapshot" and in fact InnoDB doesn't have any real concept of a snapshot as an object. Instead, every record in the database keeps track of its own version number and maintains a "roll pointer" to an "undo log" record (which may or may not still exist) which modifies the row to its previous version. If an older version of a record is needed, the current version is read and those roll pointers are followed and undo records applied until a sufficiently old version of the record is produced.
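A quick way to see this behaviour is with two sessions under the default REPEATABLE READ isolation level (a minimal sketch, using a hypothetical InnoDB table t with columns id and val):

    -- Session 1
    START TRANSACTION;
    SELECT val FROM t WHERE id = 1;   -- non-locking read, returns the current value

    -- Session 2
    UPDATE t SET val = 'new' WHERE id = 1;
    COMMIT;

    -- Session 1, same open transaction
    SELECT val FROM t WHERE id = 1;   -- still returns the old value, rebuilt via the roll pointer / undo log
    COMMIT;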
Normally the system is constantly cleaning up those undo logs and re-using the space they consume.
Any time any long-running transaction (note, not necessarily a single query) is present, the undo logs must be kept (not purged) in order to sufficiently recreate old enough versions of all records to satisfy that transaction. In a very busy system, those undo logs can very quickly accumulate to consume gigabytes of space. Additionally if specific individual records are very frequently modified, reverting that record to an old enough version to satisfy the query could take very many undo log applications (thousands).
That is what makes "long-running queries" expensive and frowned upon. They will increase disk space consumption for keeping the undo logs in the system tablespace, and they will perform poorly due to undo log record application to revert row versions upon read.
Some databases implement a maximum amount of undo log space that can be consumed, and once they have reached that limit they start throwing away older undo log records and invalidating running transactions. This generates a "snapshot too old" error message to the user. InnoDB has no such limit, and allows accumulation indefinitely.
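If you want to check for this on your own server, you can look for long-running transactions and for the undo backlog (the "History list length" counter). A sketch, assuming a version where INFORMATION_SCHEMA.INNODB_TRX is available (MySQL 5.5 and later):

    -- Transactions that have been open for more than 60 seconds
    SELECT trx_id, trx_started, trx_mysql_thread_id, trx_query
    FROM information_schema.INNODB_TRX
    WHERE trx_started < NOW() - INTERVAL 60 SECOND;

    -- "History list length" in the TRANSACTIONS section shows how many
    -- undo log records have not yet been purged
    SHOW ENGINE INNODB STATUS;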

Whether your queries affect concurrency or not has to do with the types of queries. Having many read queries won't affect concurrency in MyISAM or InnoDB (besides performance issues).
Inserts (to the end of an index with InnoDB, or the end of a table with MyISAM) also don't impact concurrency.
However, as soon as you have an update query, rows get locked in InnoDB, and with MyISAM, it's the entire table that gets write locked. When you try to update a record (or table) that has a write lock, you must wait until the lock is released before you can proceed. In MyISAM, updates are served before reads, so you have to wait until the updates are processed.
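For example (a sketch with a hypothetical InnoDB table accounts):

    -- Session 1
    START TRANSACTION;
    UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- row 1 is now write-locked

    -- Session 2
    UPDATE accounts SET balance = balance + 10 WHERE id = 2;  -- different row: proceeds immediately
    UPDATE accounts SET balance = balance + 10 WHERE id = 1;  -- same row: waits until session 1 commits or rolls back

With MyISAM there is no row locking: any concurrent UPDATE would wait for the whole-table write lock instead.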
MyISAM can be more performant because table locks are faster than record locks (though record locks are fast). However, when you start making a significant number of updates, InnoDB is generally preferred because different users are generally not likely to contend for the same records. So, with InnoDB, many users can work in parallel without affecting each other too much, thanks to the record level locking (rather than table locks).
Not to mention the benefit of full ACID compliance that you get with InnoDB, enforcement of foreign key constraints, and the speed of clustered indexes.
Snapshots (undo log entries) are kept long enough to complete the current transaction and are discarded once it is committed or rolled back. The longer a transaction runs, the more likely it is that other updates will occur, which grows the number of log entries required to roll back.
There will be no "cross-over" due to locking. When there is write contention for the same records, one user must wait until the other commits or rolls back.
You can read more about The InnoDB Transaction Model and Locking.

Related

Store mysql deadlocks

I was wondering if there is a way of storing every transaction that causes a deadlock in a MySQL database in a separate table the moment it is recorded by InnoDB?
In version 5.5.30, innodb_print_all_deadlocks became available. Set it to ON, but be aware that the log file it writes to (the error log) may clutter the disk.
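For example (the variable is dynamic, so it can be turned on without a restart):

    SET GLOBAL innodb_print_all_deadlocks = ON;  -- every deadlock is written to the error log

    -- Without it, only the most recent deadlock is visible, in the
    -- "LATEST DETECTED DEADLOCK" section of:
    SHOW ENGINE INNODB STATUS;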
Techniques for diminishing the number of deadlocks:
Speed up the transactions.
Move DML statements out of a transaction (when it is OK to do so).
If there is an IN or OR in some statement (e.g., DELETEing several rows), sort the values (see the sketch after this list).
The last one may turn a deadlock into a lock wait timeout, wherein one of the transactions is silently stalled until the other finishes.
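A sketch of that sorting advice, using a hypothetical orders table; the idea is that if every transaction touches rows in the same order, neither can hold a lock the other needs:

    -- Deadlock-prone: two sessions deleting overlapping ids in different orders
    DELETE FROM orders WHERE id IN (42, 7, 19);
    DELETE FROM orders WHERE id IN (19, 42, 7);

    -- Safer: have the application sort the list first, so locks are
    -- always requested in ascending id order
    DELETE FROM orders WHERE id IN (7, 19, 42);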

How does MySQL handle concurrent inserts?

I know there is one issue in MySQL with concurrent SELECT and INSERT. However, my question is: if I open up two connections with MySQL and keep loading data using both of them, does MySQL take data concurrently or wait for one to finish before loading the other?
I'd like to know how MySQL behaves in both cases, i.e. when I am trying to load data into the same table or into different tables concurrently using separate connections.
If you create a new connection to the database and perform inserts over both connections, then from the database's perspective, the inserts will still be sequential.
The documentation on Concurrent Inserts for MyISAM in the MySQL manual says something like this:
If MyISAM storage is used and the table has no holes, multiple INSERT statements are queued and performed in sequence, concurrently with the SELECT statements.
Mind that there is no control over the order in which two concurrent inserts will take place. The order is at the mercy of a lot of different factors. To ensure order, you will generally have to sacrifice concurrency.
MySQL does support parallel data inserts into the same table.
But the approach to concurrent reads/writes depends upon the storage engine you use.
InnoDB
MySQL uses row-level locking for InnoDB tables to support simultaneous write access by multiple sessions, making them suitable for multi-user, highly concurrent, and OLTP applications.
MyISAM
MySQL uses table-level locking for MyISAM, MEMORY, and MERGE tables, allowing only one session to update those tables at a time, making them more suitable for read-only, read-mostly, or single-user applications.
However, the above-mentioned behavior of MyISAM tables can be altered with the concurrent_insert system variable in order to achieve concurrent writes. Refer to the documentation for details.
Hence, MySQL does support concurrent inserts for both the InnoDB and MyISAM storage engines.
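For reference, a sketch of that MyISAM setting (1 is the default; 2 allows concurrent inserts even when the table has holes):

    -- 0 = never, 1 = only when the table has no holes (default), 2 = always
    SET GLOBAL concurrent_insert = 2;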
You ask about deadlock detection, ACID, and particularly MVCC, locking, and transactions:
Deadlock Detection and Rollback
InnoDB automatically detects transaction deadlocks and rolls back a transaction or transactions to break the deadlock. InnoDB tries to pick small transactions to roll back, where the size of a transaction is determined by the number of rows inserted, updated, or deleted.
When InnoDB performs a complete rollback of a transaction, all locks set by the transaction are released. However, if just a single SQL statement is rolled back as a result of an error, some of the locks set by the statement may be preserved. This happens because InnoDB stores row locks in a format such that it cannot know afterward which lock was set by which statement.
https://dev.mysql.com/doc/refman/5.6/en/innodb-deadlock-detection.html
Locking
The system of protecting a transaction from seeing or changing data that is being queried or changed by other transactions. The locking strategy must balance reliability and consistency of database operations (the principles of the ACID philosophy) against the performance needed for good concurrency. Fine-tuning the locking strategy often involves choosing an isolation level and ensuring all your database operations are safe and reliable for that isolation level.
http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_locking
ACID
An acronym standing for atomicity, consistency, isolation, and durability. These properties are all desirable in a database system, and are all closely tied to the notion of a transaction. The transactional features of InnoDB adhere to the ACID principles.
Transactions are atomic units of work that can be committed or rolled back. When a transaction makes multiple changes to the database, either all the changes succeed when the transaction is committed, or all the changes are undone when the transaction is rolled back.
The database remains in a consistent state at all times -- after each commit or rollback, and while transactions are in progress. If related data is being updated across multiple tables, queries see either all old values or all new values, not a mix of old and new values.
Transactions are protected (isolated) from each other while they are in progress; they cannot interfere with each other or see each other's uncommitted data. This isolation is achieved through the locking mechanism. Experienced users can adjust the isolation level, trading off less protection in favor of increased performance and concurrency, when they can be sure that the transactions really do not interfere with each other.
http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_acid
MVCC
InnoDB is a multiversion concurrency control (MVCC) storage engine, which means many versions of the single row can exist at the same time. In fact there can be a huge amount of such row versions. Depending on the isolation mode you have chosen, InnoDB might have to keep all row versions going back to the earliest active read view, but at the very least it will have to keep all versions going back to the start of the SELECT query which is currently running.
https://www.percona.com/blog/2014/12/17/innodbs-multi-versioning-handling-can-be-achilles-heel/
It depends.
It depends on the client -- some clients allow concurrent access; some will serialize access, thereby losing the expected gain. You have not even specified PHP vs Java vs ... or Apache vs ... or Windows vs ... Many combinations simply do not provide any parallelism.
If different tables, there is only general contention for I/O, CPU, Mutexes on the buffer_pool, etc. A reasonable amount of parallelism is possible.
If same table, it depends on the indexes and access patterns. In some cases the threads will block each other. In some cases it will even "deadlock" and roll back one of the transactions. Deadlocks not only slow you down, but make you retry the inserts.
If you are looking for high-speed ingestion of a lot of rows, see my blog. It lays out techniques and points out several of the ramifications, such as replication, engine choice, and multi-threading.
Multiple threads inserting into the same tables -- it depends a lot on the values you are providing for any PRIMARY or UNIQUE keys. It depends on whether other actions are taken in the same transaction. It depends on how much I/O is involved. It depends on whether you are doing single-row inserts, or batching. It depends on ... (Sorry to be vague, but your question is not very specific.)
If you would like to present specifics on two or three designs, we can discuss the specifics.

In MySQL, why do time-consuming updates get slowed down by other processes waiting for the table(s) being updated?

Once in a while, I need to perform a massive update to a very large table. If users continue hitting the Web site while the update is being run, there will be a line-up of MySQL clients.
It appears that the longer the line-up, the slower the main operation gets (i.e. it updates fewer rows per unit time). Killing those processes can speed things up, but they're bound to come back.
Is there a way to address this (other than by bringing the site down)? I don't mind the users waiting a few minutes, but once the line-up has reached a certain size, the operation never completes.
This applies to UPDATE statements, as well as statements resulting in a temporary table being created (e.g. ALTER TABLE).
The waiting connections take up memory, and they're queueing up lock requests on your large and busy table. Eventually you're going to exhaust your maximum DB connections or one of your memory pools due to the number of connections held open. If I had to guess, I'd guess that your slowdown is due to memory exhaustion and the resultant swap-thrashing.
If the update you're doing doesn't require consistency between rows in the large table, you can try lowering the isolation level of the update transaction, using SET TRANSACTION ISOLATION LEVEL. This will greatly decrease the amount of locking and work that MySQL normally does to provide each client "repeatable reads" on a table being updated and read concurrently. You could also try partitioning your large table and running one update per partition, or otherwise breaking up the update operation into multiple pieces so that the table isn't locked for a long time at any one stretch.
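A sketch of both suggestions, against a hypothetical big_table keyed by an integer id:

    -- Relax the isolation level for this session only
    SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

    -- Break the big update into primary-key ranges so no single statement
    -- holds locks on a huge number of rows for long
    UPDATE big_table SET flag = 1 WHERE id BETWEEN 1 AND 100000;
    UPDATE big_table SET flag = 1 WHERE id BETWEEN 100001 AND 200000;
    -- ... and so on for the remaining ranges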
If you do require consistency to be maintained between the rows, i.e. the whole table has to go from state X to X' in a single transaction with no intermediate states ever being visible, you're not going to be able to use the above techniques. You might try cloning the table, doing the update on the new table, then renaming the old table out of the way and renaming the new table into its place. Since it's a large table, this may require a significant increase in the runtime and storage needed for the operation. There are also caveats for doing this when triggers and constraints are present. The benefit is that you avoid holding a write lock on the table being updated, except during the relatively fast rename operations. Your users will only be delayed during that small swap window, and this will likely not take so long as to cause the major slowdowns you've experienced.
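A sketch of the clone-and-swap approach, with hypothetical table names (triggers, foreign keys, and writes that arrive between the copy and the rename all need separate handling):

    CREATE TABLE big_table_new LIKE big_table;

    -- Copy the rows and apply the massive update on the copy, which nobody reads yet
    INSERT INTO big_table_new SELECT * FROM big_table;
    UPDATE big_table_new SET flag = 1;

    -- Swap the tables in one atomic step; only this holds a lock, and briefly
    RENAME TABLE big_table TO big_table_old, big_table_new TO big_table;
    DROP TABLE big_table_old;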

Do table locks scale? / Would row locks be more efficient for nested sets?

I'm using nested sets to store hierarchical data in a MyISAM table; the table consists of several hierarchical sets for each user. Each user will be the only one writing to his respective trees, but other users may read from them. Node deletion / Insertion requires that other rows in the same tree have their lft and rgt values updated, potentially hundreds of rows.
In order to do this, I need to get a table write lock, update the other nodes in the tree, delete/insert the row and unlock the table.
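That sequence, as a sketch (a hypothetical MyISAM table tree holding the nested sets, inserting a new node where the parent's rgt value is 7):

    LOCK TABLES tree WRITE;

    -- Make room for the new node (values are illustrative)
    UPDATE tree SET rgt = rgt + 2 WHERE user_id = 42 AND rgt >= 7;
    UPDATE tree SET lft = lft + 2 WHERE user_id = 42 AND lft > 7;
    INSERT INTO tree (user_id, name, lft, rgt) VALUES (42, 'new node', 7, 8);

    UNLOCK TABLES;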
What I'm wondering is this -- Do table locks scale to hundreds of concurrent users? Thousands?
Would InnoDB's row locks be more efficient in this case? (locking a few hundred rows that will mostly be used only by the user himself)
If I were to use row locks, do I need to add explicit logic to deal with deadlock errors?
Well, the philosophy on locking is different between the two engines.
With MyISAM, the reason for full table locking is that writes should normally be fast. There are only two operations needed for a write (lock the table, then write the row to disk). MyISAM performance is really bound by disk speed for this reason.
With InnoDB, it gets a little more complicated. Since it's fully ACID compliant, every write takes 4 steps (lock the row, write to the transaction log, write the row to disk, write to the transaction log). Note that it writes to the disk three times. So that means that (in practice) an InnoDB write will take 3 times longer than a MyISAM write. That's one reason for the row-level locking (transactions are another).
But it's not that easy. With MyISAM, the table lock requires one semaphore for that table. So the impact on both memory usage and speed is trivial at best. With InnoDB, however, it requires an index and one semaphore per row. It needs an index to speed up the "check" to see if there's already a lock for the row. Now, if you're updating one or 10 rows at the same time, there's little difference. But when you're talking millions of rows the difference can be non-trivial (both in memory usage and speed, since it needs to traverse the lock "index" for each row to be locked).
There is also an additional tradeoff. Since InnoDB is ACID compliant, if there's a power loss (or other crash), you're never left in an inconsistent state. There's no uncommitted transaction data in the db, and no committed transaction is left corrupted (it will automatically replay the transaction log if it detects something to fix). With MyISAM, a power loss (or crash) during a write can leave the table in an inconsistent state and there's nothing you can do about it. If you care about your data, InnoDB would be better. But, with good binary logs and a backup system, you should be able to recover MyISAM, though it will require some manual intervention...
Now, with that said, your question of which scales better is really hard. First, are most of your writes dealing with one or two rows? If so, InnoDB and Row level locking will tend to scale better. If you do a lot of queries updating a lot of rows at the same time (tens of thousands and up), you'll notice that MyISAM will tend to have better performance.
As for your question of deadlocks, MySQL will locate and handle them for you (but it won't execute one of the queries, so you may want some exception handling code to retry the query or take some other action). The internal system will prevent the deadlock...
Now, another note. Since MySQL supports more than one engine in a db, why not put your data into InnoDB, and then make a MyISAM join table to handle the nested set data? Store parenting info in the data table (via a parent_id mechanism). That way, all your data is in an ACID compliant db, but you can gain the speed increase by using the faster (for reading and large writes) MyISAM for the nested set logic...
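A sketch of that hybrid layout (table and column names are hypothetical):

    -- The real data, fully transactional
    CREATE TABLE items (
        id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        user_id   INT UNSIGNED NOT NULL,
        parent_id INT UNSIGNED NULL,        -- parenting info kept with the data
        payload   TEXT
    ) ENGINE=InnoDB;

    -- The nested set structure only, rebuildable from parent_id if needed
    CREATE TABLE item_tree (
        item_id INT UNSIGNED NOT NULL PRIMARY KEY,
        lft     INT UNSIGNED NOT NULL,
        rgt     INT UNSIGNED NOT NULL,
        KEY lft_rgt (lft, rgt)
    ) ENGINE=MyISAM;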

A lot of writes,but few reads - what Mysql storage engine to use?

I was wondering if anyone has a suggestion for what kind of storage engine to use. The program needs to perform a lot of writes to the database but very few reads.
[edit] No foreign keys necessary. The data is simple, but it needs to perform the writes very fast.
From jpipes:
MyISAM and Table-Level Locks
Unlike InnoDB, which employs row-level locking, MyISAM uses a much coarser-grained locking system to ensure that data is written to the data file in a protected manner. Table-level locking is the only level of lock for MyISAM, and this has a couple consequences:
Any connection issuing an UPDATE or DELETE against a MyISAM table will request an exclusive write lock on the MyISAM table. If no other locks (read or write) are currently placed on the table, the exclusive write lock is granted and all other connections issuing requests of any kind (DDL, SELECT, UPDATE, INSERT, DELETE) must wait until the thread with the exclusive write lock updates the record(s) it needs to and then releases the write lock.
Since there is only table-level locks, there is no ability (like there is with InnoDB) to only lock one or a small set of records, allowing other threads to SELECT from other parts of the table data.
The point is, for writing, InnoDB is better as it will lock less of the resource and enable more parallel actions/requests to occur.
"It needs to perform the writes very fast" is a vague requirement. Whatever you do, writes may be delayed by contention in the database. If your application needs to not block when it's writing audit records to the database, you should make the audit writing asynchronous and keep your own queue of audit data on disc or in memory (so you don't block the main worker thread/process)
InnoDB may allow concurrent inserts, but that doesn't mean they won't be blocked by contention for resources or internal locks for things like index pages.
MyISAM allows one inserter and several readers ("Concurrent inserts") under the following circumstances:
The table has no "holes" in it
There are no threads trying to do an UPDATE or DELETE
If you have an append-only table, which you recreate each day (or create a new partition every day if you use 5.1 partitioning), you may get away with this.
MyISAM concurrent inserts are mostly very good, IF you can use them.
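If DELETEs or UPDATEs have already left holes in the middle of the data file, one way back into the concurrent-insert case is to defragment the table (a sketch, with a hypothetical audit_log table; note that OPTIMIZE TABLE itself locks the table while it runs):

    -- Rewrites the MyISAM data file and removes the holes left by deletes
    OPTIMIZE TABLE audit_log;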
When writing audit records, do several at a time if possible - this applies whichever storage engine you use. It is a good idea for the audit process to "batch up" records and do an insert of several at once.
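For example, instead of one INSERT per audit record, queue them up and flush several in a single statement (a sketch, again with a hypothetical audit_log table):

    -- One round trip and one lock acquisition for several records
    INSERT INTO audit_log (user_id, action, created_at) VALUES
        (17, 'login',  NOW()),
        (23, 'logout', NOW()),
        (42, 'login',  NOW());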
You've not really given us enough information to make a considered suggestion - are you wanting to use foreign keys? Row-level locking? Page-level locking? Transactions?
As a general rule, if you want to use transactions, InnoDB/BerkeleyDB. If you don't, MyISAM.
In my experience, MyISAM is great for fast writes as long as, after insertion, it's read-only. It'll keep happily appending faster than any other option I'm familiar with (including supporting indexes).
But as soon as you start deleting records or updating index keys, and it needs to refill emptied holes (in tables or indexes) the discussion gets a lot more complicated.
For classic log-type or journal-type tables, though, it's very happy.