What are MySQL InnoDB intention locks used for?

I have read the MySQL manual about intention locks:
http://dev.mysql.com/doc/refman/5.5/en/innodb-locking.html#innodb-intention-locks
It says "To make locking at multiple granularity levels practical", but how? It does not explain that.
Can anyone give a detailed explanation and an example?

Think of the InnoDB data space as a collection of databases, each database being a collection of tables, and each table being a collection of rows. This forms a hierarchy, where lower and lower levels offer more and more granularity.
Now, when you want to update some part(s) of this tree in a transaction, how do you do it? Well, InnoDB employs multiple granularity locking (MGL). The mechanism in MGL is that you specify "intentions" to lock at a particular granularity level as shared or exclusive; MGL then combines all these intentions and marches up the hierarchy until it finds the minimum spanning set that has to be locked given those intentions.
Without intention locks, you have high level shared and exclusive locks which really don't give you much flexibility: they're all or nothing, which is what we see in MyISAM. But MGL brings the concept of intended shared and intended exclusive, which it uses as I said above to provide "just enough" locking.
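For a concrete feel of what this buys you, here is a small multi-session sketch (the table name t and the row values are made up for illustration):

-- Session A: lock a single row exclusively. InnoDB first sets an IX
-- (intention exclusive) lock on the table, then an X lock on just that row.
START TRANSACTION;
SELECT * FROM t WHERE id = 1 FOR UPDATE;

-- Session B: row-level work on a different row proceeds immediately,
-- because two IX locks on the same table are compatible.
START TRANSACTION;
SELECT * FROM t WHERE id = 2 FOR UPDATE;   -- does not block

-- Session C: a whole-table lock has to wait. The table-level IX locks held by
-- A and B announce the conflict up front, so the server never has to scan
-- every row to discover it.
LOCK TABLES t WRITE;                       -- blocks until A and B commit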
If you'd like to know about the specific C level implementation, refer to Introduction to Transaction Locks in InnoDB.

From this link
the table-level intention locks are still not released so other transactions cannot lock the whole table in S or X mode.
I think intention locks exist to make table locking more efficient (MySQL doesn't have to traverse the entire tree to see if there is a conflicting lock).


Is locking required when reading and writing tuple headers in MVCC?

MySQL introduces several kinds of locks. Among them, SELECT ... FROM is a consistent read: it reads from a snapshot of the data and sets no locks (unless the transaction isolation level is SERIALIZABLE) (https://dev.mysql.com/doc/refman/5.7/en/innodb-locks-set.html).
The snapshot (MVCC) is implemented in MySQL by adding a header to each tuple (holding the transaction version and a pointer) plus logical visibility rules.
But we always emphasize the design of the visibility rules and ignore the fact that reading and writing the tuple header are two mutually exclusive actions, a conflict that can only be avoided by locking.
So how should the statement that a consistent read takes no locks be understood? Does it just mean no locks in the broad sense? How is atomic reading and writing of the tuple header designed? Is there any performance overhead? Is there any information available in this regard?
----- supplementary note -----
When a row (tuple) is updated, the new version of the row is kept along with the old copy or copies. Each copy has a sequence number (transaction ID) with it.
The transaction ID and the pointer to the copies are stored in the row header; that is, when creating a copy you need to modify the row header (update the transaction ID and the pointer to the copies), and when accessing the row you need to read the row header first to decide which version (location) to access.
Modifying the row header and reading the row header should be two mutually exclusive actions (otherwise dirty data would be read under concurrent reads and writes). What I want to know is how MySQL designs this read/write logic for the row header. Is it a read-write lock, a spin lock, or some other clever method?
I think the answer goes something like this.
When a row is updated, the new version of the row is kept, along with the old copy/copies. Each copy has a sequence number (transaction id) with it. After both transactions COMMIT or ROLLBACK, the set of rows is cleaned up -- the surviving one is kept; the other(s) are tossed.
That sequence number has the effect of labeling rows as being part of the snapshot of the dataset that was taken at BEGIN time.
Rows with sequence numbers that are equal to or older than the transaction in question are considered as fair game for reading by the transaction. Note that no lock is needed for such a "consistent read".
Each copy of the row has its own tuple header, with a different sequence number. The copies are chained together in a "history list". Updates/deletes to the row will add new items to the list, but leave the old copies unchanged. (Again, this points out that no lock is needed for an old read.)
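Here is a two-session sketch of such a consistent read (the table t, the balance column, and the values are illustrative; REPEATABLE READ, the default isolation level, is assumed):

-- Session A
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SELECT balance FROM t WHERE id = 1;        -- returns, say, 100; no row lock is taken

-- Session B, meanwhile
UPDATE t SET balance = 200 WHERE id = 1;   -- takes an X row lock, creates a new row version
COMMIT;

-- Session A again
SELECT balance FROM t WHERE id = 1;        -- still returns 100, reconstructed via the history list
COMMIT;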
The "read dirty" (READ UNCOMMITTED) isolation level allows the transaction to 'see' the latest copy in the history list, even if it is not yet committed.
Performance overhead? Sure. Everything has a performance overhead. But... The "performance" that matters is the grand total of all actions. There is a lot of complexity, but the end result is 'better' performance.
The history list is a lot of overhead, but it helps by decreasing the locking, etc.
InnoDB uses "optimistic" locking -- that is, it starts a transaction with the assumption (hope) that it will COMMIT. The cost is that ROLLBACK is less efficient. This seems like a reasonable tradeoff.
InnoDB has a lot of overhead, yet it can beat the much-simpler MyISAM Engine in many benchmarks. Faster and ACID -- May as well get rid of MyISAM. And that is the direction Oracle is taking MySQL.

Designing parent and child relationship in MySQL

Need input on data model design
I have parent_table as
id (PK)
current_version
latest_child_id
child_table as
id (PK)
parent_table_id (FK to parent)
version (running number; the largest number implies the latest child record)
The relationship between parent_table and child_table is 1:m.
The parent_table in addition keeps a pointer to the latest version of the record in the child table.
The system will insert n mutable rows into the child_table and update the parent_table to point to the latest version, for faster reads.
My question is:
Is it a bad practice to have the parent_table store the latest version of the child table?
Am I looking at potential performance problems / locking issues, since any insert into the child table needs a lock on the parent table as well?
Database in question: MySQL
Is it a bad practice to have the parent_table store the latest version of the child table?
Phrases like "bad practice" are loaded with context. I much prefer to consider the trade-offs, and understand the decision at that level.
By storing an attribute which you could otherwise calculate, you're undertaking denormalization. This is an established way of dealing with performance challenges - but it's only one of several. The trade-offs are roughly as follows.
Negative: takes more storage space. Assume this doesn't matter.
Negative: requires more code. More code means more opportunity for bugs. Consider wrapping the data access code in a test suite.
Negative: denormalized schemas can require additional "brain space" - you have to remember that you calculate (for instance) the number of children a parent has, but find the latest one by looking at the attribute in the parent table. In an ideal world, a normalized schema describes the business context without having to remember implementation details.
Negative: may make your data model harder to extend in future. As you add more entities and attributes, this denormalized table may become harder and harder to keep in sync. One denormalized column is usually easy to work with, but if you have lots of denormalized columns, keeping them all up to date may be very difficult.
Negative: for data that is not accessed often, the denormalized design may be a bigger performance hit than calculating on the fly. Your question 2 is an example of this. In complex scenarios, it's possible that multiple threads create inconsistencies in the denormalized data.
Positive: with data that is read often, and where the calculation is expensive, a denormalized schema will allow faster read access.
In your case, I doubt you need to store this data as a denormalized attribute. If you create an index on (parent_table_id, version DESC), retrieving this data on the fly will be too fast to measure (assuming your database holds 10s of millions of records, not 10s of billions).
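As a sketch (the index name is made up; the table and column names come from the question; an ascending index is fine because MySQL can scan it backwards for the ORDER BY ... DESC):

ALTER TABLE child_table ADD INDEX idx_parent_version (parent_table_id, version);

-- Latest child for one parent, answered with a single index dive plus one row read:
SELECT *
FROM child_table
WHERE parent_table_id = 42
ORDER BY version DESC
LIMIT 1;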
In general, I recommend only denormalizing if:
You can prove you have a performance problem (i.e. you have measured it)
You cannot improve performance by creating better indexes
You cannot improve performance through better hardware
Am I looking at potential performance problems / locking issues, since any insert into the child table needs a lock on the parent table as well?
As @TheImpaler writes, probably not. However, it depends on the complexity of your insert logic (does it do any complicated calculations which might slow things down?) and the likelihood of several concurrent threads trying to update the parent record. You may also end up with inconsistent data in these scenarios.
ORDER BY child_id DESC LIMIT 1
Is a very efficient way to get the "latest" child (assuming you have INDEX(child_id)).
This eliminates the need for the naughty "redundant" info you are proposing.
Is it a bad practice to have the parent_table store the latest version of the child table?
No, that's perfectly OK, if it fits the requirements of your application. You need to add the extra logic to update the tables correctly, but that's it. Databases offer you a range of possibilities to store your data and relationships, and this is a perfectly good one.
Am I looking at potential performance problems / locking issues, since any insert into the child table needs a lock on the parent table as well?
It depends on how often you are updating/inserting/deleting children. Most likely it's not going to be a problem unless the rate of changes goes above roughly 200 per second, considering current database servers. Exclusive locking can become a problem for a high volume of transactions.
Normally the locks will be at the row level. That is, they will lock only the row you are working with, so multiple threads working on different parents will not create a bottleneck.
If your system really requires high level of transactions (1000+ / second), then the options I see are:
Throw more hardware at it: The easiest way. Just buy a bigger machine and problem solved... at least for a while, until your system grows again.
Use Optimistic Locking: this strategy doesn't require you to perform any actual lock at all. However, you'll need to add an extra numeric column to store the version number of the row (a rough SQL sketch follows after this list).
Switch to another database: MySQL may not handle really high volume perfectly well. If that's the case you can consider PostgreSQL, or even Oracle Database, which surely has better caching technology but is also very expensive.
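A rough sketch of option 2 in plain SQL (the row_version column and the literal values are illustrative):

-- Read the row and remember its version:
SELECT id, latest_child_id, row_version FROM parent_table WHERE id = 42;
-- suppose row_version came back as 7

-- Write only if nobody else changed the row in the meantime:
UPDATE parent_table
SET latest_child_id = 1001, row_version = row_version + 1
WHERE id = 42 AND row_version = 7;

-- If the UPDATE reports 0 affected rows, another transaction won the race:
-- re-read the row and retry (or report the conflict to the caller).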

How to prevent multiple workers from racing to process the same task?

I start this worker 10 times to give it a sense of concurrency:
class AnalyzerWorker
  @queue = :analyzer

  def self.perform
    loop do
      # My attempt to lock pictures from other worker instances that may
      # try to analyze the same picture (race condition)
      pic = Pic.where(locked: false).first
      pic.update_attributes locked: true
      pic.analyze
    end
  end
end
This code is actually still vulnerable to a race condition; one reason, I think, is that there is a gap in time between fetching the unlocked picture and actually locking it.
Maybe there are more reasons. Is there any robust approach to prevent this?
Active Record provides optimistic locking and pessimistic locking.
In order to use optimistic locking, the table needs to have a column
called lock_version of type integer. Each time the record is updated,
Active Record increments the lock_version column. If an update request
is made with a lower value in the lock_version field than is currently
in the lock_version column in the database, the update request will
fail with an ActiveRecord::StaleObjectError.
Pessimistic locking uses a locking mechanism provided by the
underlying database. Using lock when building a relation obtains an
exclusive lock on the selected rows. Relations using lock are usually
wrapped inside a transaction for preventing deadlock conditions.
Code samples are provided in the referenced links...
Either should work, but each needs a different implementation. From what you are doing, I'd consider pessimistic locking, since the possibility of a conflict is relatively high.
Your current implementation is kind of a mixture of both; however, as you indicated, it really doesn't solve the problem. You might be able to make yours work, but using the Active Record implementation makes sense.
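If you go the pessimistic route, the SQL that ActiveRecord's lock generates boils down to SELECT ... FOR UPDATE. A minimal sketch against a pics table (the table and column names just mirror the Pic model from the question; SKIP LOCKED needs MySQL 8.0+ and can simply be dropped on older versions, in which case concurrent workers queue instead):

START TRANSACTION;

-- Pick one unlocked picture and hold an exclusive row lock on it until COMMIT,
-- skipping rows that other workers have already locked.
SELECT id INTO @pic_id
FROM pics
WHERE locked = 0
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;

UPDATE pics SET locked = 1 WHERE id = @pic_id;

COMMIT;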

When to LOCK TABLES in MySQL (MyISAM tables)?

Is the internal locking of MySQL sufficient for a small to medium sized website? My tables are MyISAM. There might be a few hundred people concurrently hitting a specific table with SELECTs and INSERTs. None of the INSERT/UPDATE queries would overlap; that is, no two users will be updating the same comment ID. INSERTs/UPDATEs would be one-off operations: there would be no reading of data and performing additional operations within the same query.
Specifically, I am setting up a comment/chat system for my website. At worst, there might be a couple of hundred people performing a SELECT statement on the comment/chat tables in order to read new posts. With respect to INSERT's, there might be 100(?) different people trying to INSERT a new comment at any time.
I found this article in another question on SO, and it states that LOCK TABLES is "never required for self-contained insert, update, or delete operations." Is this good practice for the amount of DB traffic that I might have? TIA for any advice.
The only form of locking available for MyISAM tables is table locks. The idea is that they designed it to be way-fast enough that no one else should need access while it works. Right - YMM definitely V. But for most small-to-medium-size websites, it's fine. For instance, that's what WordPress uses.
Locking is probably only something you need to worry about if rows are being viewed and edited at the same time. Don't worry about locking for what you're doing - skipping explicit locks is not only good practice here, using LOCK TABLES would actually hurt performance.

InnoDB row level locking performance - how many rows?

I just read a lot of stuff about MyISAM and InnoDB as I have to decide which type to use.
Row-level locking support is always mentioned for InnoDB. Of course this only makes sense beyond a certain number of rows.
Roughly how many would that be?
EDIT: Apparently I mis-worded my question. I know what table locking and row locking mean but I wondered when this does matter.
If I have just 100 rows inserted per day, of course table locking would be more than enough, but for, let's say, 100 rows per SECOND, I think InnoDB would be the better choice.
My question: does row locking also make sense for 10 rows per second, or 5 rows per second? When does this choice significantly affect performance?
It's not entirely clear what you're asking. Locking ensures that only one user attempts to modify a given row at any given time. Row-level locking means only the one row they're modifying is locked. The usual alternatives are to either lock the entire table for the duration of the modification, or else to lock some subset of the table. Row-level locking simply reduces that subset of the rows to the smallest number that still ensures integrity.
The idea is to allow one user to modify one thing without preventing other users from modifying other things. It's worth noting, however, that in some cases this can be something of a false economy, so to speak. A few databases support row-level locking, but make a row-level lock considerably more expensive than locking a larger part of the table -- enough more expensive that it can be counterproductive.
Edit: Your edit to the original post helps, but not really a lot. First of all, the sizes of rows and levels of hardware involved have a huge effect (inserting an 8-byte row onto a dozen striped 15K SAS hard drives is just a tad faster than inserting a one megabyte row onto a single consumer class hard drive).
Second, it's largely about the number of simultaneous users, so the pattern of insertion makes a big difference. 1000 rows inserted at 3 AM probably won't be noticed at all. 1000 rows inserted evenly throughout the day means a bit more (but probably only a bit). 1000 rows inserted as a batch right when 100 other users need data immediately might get somebody fired (especially if one of those 100 is the owner of the company).
MyISAM tables support concurrent inserts (aka no table lock for inserts). So if you meet the criteria, there's no problem:
http://dev.mysql.com/doc/refman/5.0/en/concurrent-inserts.html
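You can check (and change) that behaviour directly; concurrent_insert = 1 (AUTO, the default) allows concurrent inserts as long as the table has no deleted-row holes in the middle, and 2 (ALWAYS) allows them even with holes:

SHOW VARIABLES LIKE 'concurrent_insert';
SET GLOBAL concurrent_insert = 2;   -- requires a privileged account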
So, like most things, the answer is "it depends". There is no bright line test. Only you can make the determination; we know nothing about your application/hardware/usage statistics/etc. and, by definition, can't know more about it than you do.