Which isolation level to use in a basic MySQL project? - mysql

Well, I got an assignment (a mini-project) in which one of the most important issues is database consistency.
The project is a web application that allows multiple users to access and work with it. I can expect concurrent querying and updating requests against a small set of tables, some of them related to one another (via FOREIGN KEYs).
In order to keep the database as consistent as possible, we were advised to use isolation levels. After reading a bit (maybe not enough?) about them, I figured the most useful ones for me are READ COMMITTED and SERIALIZABLE.
I can divide the queries into three kinds:
Fetching query
Updating query
Combo
For the first kind I of course need the data to be consistent: I don't want to present dirty or uncommitted data, etc. Therefore, I thought of using READ COMMITTED for these queries.
For the updating query, I thought using SERIALIZABLE would be the best option, but after reading a bit, I found myself lost.
In the combo case, I'll probably have to read from the DB and decide whether I need/can update or not; these 2-3 calls will be under the same transaction.
I wanted to ask for some advice on which isolation level to use for each of these query types. Should I even consider different isolation levels for each type, or just stick to one?
I'm using MySQL 5.1.53, along with MySQL JDBC 3.1.14 driver (Requirements... Didn't choose the JDBC version)
Your insights are much appreciated!
Edit:
I've decided I'll be using REPEATABLE READ, which seems to be the default level.
I'm not sure if it's the right way to do it, but I guess REPEATABLE READ along with LOCK IN SHARE MODE and FOR UPDATE on the queries should work fine...
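For example, for the combo case I imagine something like this (just a rough sketch; the item table, stock column and id value are made-up names):
START TRANSACTION;
-- lock the row we're about to decide on so nobody changes it under us
SELECT stock FROM item WHERE item_id = 42 FOR UPDATE;
-- the application decides whether the update is needed/allowed, then:
UPDATE item SET stock = stock - 1 WHERE item_id = 42;
COMMIT;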
What do you guys think?

I would suggest READ COMMITTED. It seems natural to be able to see other sessions' committed data as soon as they're committed.
It's unclear why MySQL has a default of REPEATABLE READ.

I think you worry too much about the isolation level.
If you have multiple tables to update you need to do:
START TRANSACTION;
UPDATE table1 ....;
UPDATE table2 ....;
UPDATE table3 ....;
COMMIT;
This is the important stuff, the isolation level is just gravy.
The default level of repeatable read will do just fine for you.
Note that SELECT ... FOR UPDATE takes locks on the rows it reads (and can effectively lock the whole table if no suitable index is used); this can result in deadlocks, which may be worse than the problem you are trying to solve.
Only use it if you are deleting rows in your DB.
To be honest, I rarely see rows being deleted from a DB; if you are just doing updates, normal SELECTs are fine.
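If you do need a locking read, keep it as narrow as possible, something like this (a sketch with a made-up orders table; id is assumed to be indexed):
START TRANSACTION;
-- with an index on id this locks only the matching row(s), not the whole table
SELECT * FROM orders WHERE id = 10 FOR UPDATE;
-- decide what to change, then modify the locked row
UPDATE orders SET status = 'shipped' WHERE id = 10;
COMMIT;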
Anyway see: http://dev.mysql.com/doc/refman/5.0/en/innodb-transaction-model.html

Related

MySQL query synchronization/locking question

I have a quick question that I can't seem to find an answer to online; I'm not sure whether I'm using the right wording or not.
Do MySQL databases automatically synchronize queries coming in at around the same time? For example, if I send a query to insert something into a database at the same time another connection sends a query to select something from the database, does MySQL automatically lock the database while the insert is happening, and then unlock it when it's done, allowing the select query to access it?
Thanks
Do MySQL databases automatically synchronize queries coming in at around the same time?
Yes.
Think of it this way: there's no such thing as simultaneous queries. MySQL always carries out one of them first, then the second one. (This isn't exactly true; the server is far more complex than that. But it robustly provides the illusion of sequential queries to us users.)
If, from one connection you issue a single INSERT query or a single UPDATE query, and from another connection you issue a SELECT, your SELECT will get consistent results. Those results will reflect the state of data either before or after the change, depending on which query went first.
You can even do stuff like this (read-modify-write operations) and maintain consistency.
UPDATE table
SET update_count = update_count + 1,
update_time = NOW()
WHERE id = something
If you must do several INSERT or UPDATE operations as if they were one, you'll need to use the InnoDB engine, and you'll need to use transactions. The transaction will block SELECT operations while it is in progress. Teaching you to use transactions is beyond the scope of a Stack Overflow answer.
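A minimal sketch of the idea (the payments/accounts tables and their columns are made up here):
START TRANSACTION;
INSERT INTO payments (account_id, amount) VALUES (1, 50.00);
UPDATE accounts SET balance = balance - 50.00 WHERE account_id = 1;
COMMIT;
-- if anything fails in between, issue ROLLBACK instead and neither change becomes visible to other sessions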
The key to understanding how a modern database engine like InnoDB works is Multi-Version Concurrency Control or MVCC. This is how simultaneous operations can run in parallel and then get reconciled into a consistent "view" of the database when fully committed.
If you've ever used Git you know how you can have several updates to the same base happening in parallel but so long as they can all cleanly merge together there's no conflict. The database works like that as well, where you can begin a transaction, apply a bunch of operations, and commit it. Should those apply without conflict the commit is successful. If there's trouble the transaction is rolled back as if it never happened.
This ability to juggle multiple operations simultaneously is what makes a transaction-capable database engine really powerful. It's an important component necessary to meet the ACID standard.
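As a rough illustration of what MVCC gives you under InnoDB's default REPEATABLE READ level (the accounts table and values here are made up):
-- Session A
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;   -- sees 100
-- Session B now runs and commits: UPDATE accounts SET balance = 90 WHERE id = 1;
SELECT balance FROM accounts WHERE id = 1;   -- still sees 100, the transaction's snapshot
COMMIT;
SELECT balance FROM accounts WHERE id = 1;   -- now sees 90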
MyISAM, the older engine from the MySQL 3.x days, doesn't have any of these features and locks the whole table on any write operation to avoid conflicts. It works the way you thought it did.
When creating tables in MySQL you have your choice of engine, but InnoDB should be your default. There's really no reason to use MyISAM, as the interesting features of that engine (e.g. full-text indexes) have since been ported over to InnoDB.

InnoDB Isolation Level for single SELECT query

I know that every single query sent to MySQL (with InnoDB as the engine) is run as a separate transaction. However, my concern is about the default isolation level (REPEATABLE READ).
My question is: as SELECT queries are sent one by one, what is the need to run the transaction in REPEATABLE READ? Doesn't InnoDB add overhead for nothing in this case?
For instance, in my web application I have a lot of single read queries where accuracy doesn't matter: as an example, I can retrieve the number of books at a given time, even if some modifications are being processed, because I know that such a number can change after my HTTP request anyway.
In this case READ UNCOMMITTED seems appropriate. Do I need to turn every similar transaction-with-single-request to such an ISOLATION LEVEL, or does InnoDB handle it automatically?
Thanks.
First of all, your question is part of the wider topic of performance tuning, and it is hard to answer knowing only this much. But I'll try to give you at least some overview.
The fact that REPEATABLE READ is good enough for most databases does not mean it is also the best for you!
BTW, I think MySQL is the only one that uses this level by default; most databases default to READ COMMITTED (e.g. Oracle). In my opinion that is enough for most cases.
My question is: as SELECT queries are sent one by one, what is the need
to run the transaction in repeatable read?
Basically, no need. The REPEATABLE READ level ensures you do not allow dirty reads, non-repeatable reads and phantom rows (though that last one is a slightly different story). Those anomalies only matter when DML is running concurrently, so when you only run pure SELECTs one by one, this simply does not apply.
Doesn't InnoDB add overhead for nothing in this case?
Yes, there is some overhead, but it is not there for nothing. In general, the ACID model in InnoDB comes at the cost of having data consistently stored without any doubt about its reliability, and that is not free of charge. It is simply a trade-off between performance and data consistency/reliability.
In more detail, MySQL uses special segments to store snapshots and old row values for rollback purposes, and refers to them when necessary. As I said, that costs something.
It is also worth mentioning that the performance difference is much more visible when doing INSERT, UPDATE and DELETE; a SELECT does not cost as much, but it still costs.
If you do not need that isolation, skipping it is a theoretical benefit. How big? You need to assess it yourself by measuring your query performance in your environment.
A lot depends on individual factors, including scale, how many reads/writes there are, how often, the application design, the database, and much more. With the same problem in different environments, the answer could simply be different.
Another alternative you could consider is simply changing the engine to MyISAM (if you do not need foreign keys, for example). In my experience it is a very good choice for read-heavy workloads. Again, it all depends, but in many cases it is faster than InnoDB. It is of course less safe, but if you are aware of the possible risks it can be a good solution.
In this case READ UNCOMMITTED seems appropriate. Do I need to turn
every similar transaction-with-single-request to such an ISOLATION LEVEL,
or does InnoDB handle it automatically?
You can set the isolation level globally, for the current session, or for the next transaction.
To set the transaction level globally for subsequent sessions:
SET GLOBAL tx_isolation = 'READ-UNCOMMITTED';
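If you only want it for the current session, or just for the next transaction, these narrower variants also exist:
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;   -- this connection only
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;           -- next transaction only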
http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html

Locking mySQL tables/rows

Can someone explain the need to lock tables and/or rows in MySQL?
I am assuming that it is to prevent multiple writes to the same field; is this the best practice?
First, let's look at a good document. It is not MySQL documentation (it's about PostgreSQL), but it's one of the simpler and clearer docs I've read on transactions. You'll understand MySQL transactions better after reading this link: http://www.postgresql.org/docs/8.4/static/mvcc.html
When you're running a transaction, 4 rules apply (ACID):
Atomicity: all or nothing (rollback)
Consistency: consistent before, consistent after
Isolation: not impacted by other concurrent transactions
Durability: once committed, it's really done
Of these rules only one is problematic: Isolation. Using a transaction does not ensure a perfect isolation level. The previous link explains better what phantom reads and other such isolation problems between concurrent transactions are. To make it simple: you should really use row-level locks to prevent other transactions, running at the same time as you (and maybe committing before you), from altering the same records. But with locks come deadlocks...
So when you try using nice transactions with locks, you'll need to handle deadlocks, and you'll need to handle the fact that a transaction can fail and should be retried (simple for or while loops).
Edit:
Recent versions of InnoDB provide better isolation than previous ones. I've done some tests, and I must admit that even the phantom reads that should happen are now difficult to reproduce.
By default MySQL is at level 3 of the 4 isolation levels explained in the PostgreSQL document (whereas PostgreSQL defaults to level 2). This is REPEATABLE READ. That means you won't get dirty reads and you won't get non-repeatable reads: a row you have already read inside your transaction will keep showing the same values for the rest of that transaction.
Warning: if you work with an older version of MySQL (like 5.0) you may effectively be at level 2, and you'll need to take the row lock explicitly with FOR UPDATE!
You can always find some nice race conditions. When working with aggregate queries it can be safer to be at the 4th isolation level (by adding LOCK IN SHARE MODE at the end of your query) if you do not want people adding rows while you're performing your task. I've been able to reproduce one serializable-level problem, but I won't explain the complex example here; it involves really tricky race conditions.
There is a very nice example of race conditions that even the serializable level cannot fix here: http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERIALIZABILITY
When working with transactions, the most important things are:
data used in your transaction must always be read INSIDE the transaction (re-read it if you obtained it before the BEGIN)
understand why a high isolation level sets implicit locks and may block some other queries (and make them time out)
try to avoid deadlocks (lock tables in the same order) but handle them anyway (retry a transaction aborted by MySQL)
try to freeze important source tables at the serializable-style level (LOCK IN SHARE MODE) when your application code assumes that no insert or update should modify the dataset it's using (otherwise you won't see errors, but your results will have ignored the concurrent changes); see the sketch after this list
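A sketch of that last point, with a made-up orders table and report query:
START TRANSACTION;
-- share-lock the rows the report depends on so they cannot be changed until we commit
SELECT SUM(total) FROM orders WHERE order_day = '2011-01-01' LOCK IN SHARE MODE;
-- build the rest of the report from this consistent, locked dataset
COMMIT;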
It is not a best practice. Modern versions of MySQL support transactions with well defined semantics. Use transactions, and forget about locking stuff by hand.
The only new thing you'll have to deal with is that transaction commits may fail because of race conditions, but you'd be doing error checking with locks anyway, and it is easier to retry the logic that led to a transaction failure than to recover from errors in a non-transactional setup.
If you do get race conditions and failed commits, then you may want to fine-tune the isolation configuration for your transactions.
For example, suppose you need to generate invoice numbers that are sequential with no numbers missing; this is a requirement at least in the country I live in.
If you have a few web servers, then a few users might be buying stuff literally at the same time.
If you do select max(invoice_id)+1 from invoice to get the new invoice number, two web servers might do that at the same time (before the new invoice has been added), and get the same invoice number for the invoices they're trying to create.
If you use a mechanism such as "auto_increment", it is just meant to generate unique values and makes no guarantee that no numbers are skipped (if one transaction tries to insert a row and then rolls back, the number is "lost").
So the solution is to (a) lock the table (b) select max(invoice_id)+1 from invoice (c) do the insert (d) commit + unlock the table.
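A sketch of steps (a) to (d), with made-up columns (note the warning below: LOCK TABLES interacts with transactions, so the usual pattern disables autocommit first):
SET autocommit = 0;
LOCK TABLES invoice WRITE;
SET @next_id = (SELECT MAX(invoice_id) + 1 FROM invoice);
INSERT INTO invoice (invoice_id, customer_id, total) VALUES (@next_id, 7, 99.90);
COMMIT;
UNLOCK TABLES;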
On another note, in MySQL you're best off using InnoDB and row-level locking. Be aware that a LOCK TABLES command can implicitly commit the transaction you're working on.
Take a look here for a general introduction to what transactions are and how to use them.
Databases are designed to work in concurrent environments, so locking the tables and/or records helps to keep the transactions consistent.
So a record affected by one transaction should not be altered until this transaction commits or rolls back.

When will a SELECT statement without FOR UPDATE cause a lock?

I'm using MySQL.
I sometimes see a SELECT statement whose status is 'Locked' when I run SHOW PROCESSLIST,
but after testing it locally, I can't reproduce the 'Locked' status.
It probably depends on what else is happening. I'm no MySQL expert, but in SQL Server various lock levels control when data can be read and written. For example, in production your SELECT statement might want to read a record that is being updated; it has to wait until the update is done. Vice versa, an update might have to wait for a read to finish.
Messing with the default lock levels is dangerous, and since dev environments don't have nearly as much traffic, you probably don't see that kind of contention there.
If you spot it again, check whether any update is being made against one of the tables your SELECT references.
I'm no expert in MySQL, but it sounds like another user is holding a lock against a table/field while you're trying to read it.
I'm no MySQL expert either, but locking behavior strongly depends on the isolation level / transaction isolation. I would suggest searching for those terms in the MySQL docs.

What does MySQL do if you attempt to update a table that is being queried?

I have a very slow query that I need to run on a MySQL database from time to time.
I've discovered that attempts to update the table that is being queried are blocked until the query has finished.
I guess this makes sense, as otherwise the results of the query might be inconsistent, but it's not ideal for me, as the query is of much lower importance than the update.
So my question really has two parts:
Out of curiosity, what exactly does MySQL do in this situation? Does it lock the table for the duration of the query? Or try to lock it before the update?
Is there a way to make the slow query non-blocking? I guess the options might be:
Kill the query when an update is needed.
Run the query on a copy of the table as it was just before the update took place
Just let the query go wrong.
Anyone have any thoughts on this?
It sounds like you are using a MyISAM table, which uses table level locking. In this case, the SELECT will set a shared lock on the table. The UPDATE then will try to request an exclusive lock and block and wait until the SELECT is done. Once it is done, the UPDATE will run like normal.
MyISAM Locking
If you switched to InnoDB, then your SELECT will set no locks by default. There is no need to change transaction isolation levels as others have recommended (repeatable read is default for InnoDB and no locks will be set for your SELECT). The UPDATE will be able to run at the same time. The multi-versioning that InnoDB uses is very similar to how Oracle handles the situation. The only time that SELECTs will set locks is if you are running in the serializable transaction isolation level, you have a FOR UPDATE/LOCK IN SHARE MODE option to the query, or it is part of some sort of write statement (such as INSERT...SELECT) and you are using statement based binary logging.
InnoDB Locking
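If you're not sure which engine a table uses, you can check and, if appropriate, convert it (a sketch; mytable is a placeholder name):
SHOW TABLE STATUS LIKE 'mytable';     -- the Engine column shows MyISAM or InnoDB
ALTER TABLE mytable ENGINE = InnoDB;  -- rebuilds the table as InnoDB (copies all data, so it can be slow)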
For the purposes of the select statement, you should probably issue a:
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
command on the connection, which causes the subsequent select statements to operate without locking.
Don't use the 'SELECT ... FOR UPDATE', as that definitely locks the table rows that are affected by the select statement.
The full list of MySQL transaction isolation levels is in the docs.
First of all, you need to know which engine you're using (MyISAM or InnoDB).
This is clearly a transaction problem.
Take a look at section 13.4.6, SET TRANSACTION Syntax, in the MySQL manual.
UPDATE LOW_PRIORITY .... may be helpful. The MySQL docs aren't clear whether this lets the user requesting the update continue while the update happens when it can (which is what I think happens), or whether the user has to wait (which would be worse than at present ...), and I can't remember.
What table types are you using? If you are on MyISAM, switching to InnoDB (if you can; it has no full-text indexing) opens up more options for this sort of thing, as it supports transactions and row-level locking.
I don't know MySQL, but it sounds like a transaction problem.
You should be able to set the transaction type to a dirty read for your select query.
That won't necessarily give you correct results, but it shouldn't be blocked.
Better still would be to make the first query go faster. Do some analysis and check whether you can speed it up with correct indexing and so on.