Read changes from within a transaction - mysql

Whatever changes are made to the MySQL database within a transaction, are those changes readable within the same transaction? Or do I have to commit the transaction before I can read the changes?
I could easily test this, but posting a question on SO brings up a lot of good suggestions. Thanks for any input.

Assuming you're using InnoDB, the answer to your first question is generally yes, implying the answer to your second is generally no.
By default MySQL's InnoDB uses a technique called consistent non-locking reads:
The query sees the changes made by transactions that committed before that point of time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query sees the changes made by earlier statements within the same transaction.
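As a quick illustration, here is a minimal sketch; the accounts table and its columns are invented for this example:
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- This SELECT runs inside the same transaction, so it already sees the
-- reduced balance even though nothing has been committed yet:
SELECT balance FROM accounts WHERE id = 1;
-- Other sessions will not see the change until we commit:
COMMIT;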
That being said, there's a lot of stuff to know about transactions. You can change the isolation level of a transaction in order to control the transaction results more thoroughly.
The chapter on the InnoDB Transaction Model is a great place to start.

Related

Is there a significant cost of Beginning & Committing a MySQL Transaction even if no changes made to tables?

This application is running on AWS EC2, using RDS/MySQL (5.7).
We have an audit script that visits every user in a MySQL users table.
For each user, we
Unconditionally start a transaction,
Possibly make some changes to the user's record and to other tables.
Unconditionally commit the transaction.
Now, it's likely (and common) that step 2 makes no changes to any table. The question arose during code review: what is the performance impact of starting/committing a transaction for many records when no changes are made?
I've read elsewhere that MySQL is optimized for commits, not rollback. But I've yet to find a discussion on the cost of starting/committing transactions when no work was done.
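Roughly, the per-user loop body looks like this (table and column names are invented for illustration):
START TRANSACTION;                      -- step 1: always
SELECT * FROM users WHERE user_id = ?;  -- read the user's record
-- step 2: application logic decides whether any UPDATEs are needed;
-- for most users no UPDATE is issued at all
COMMIT;                                 -- step 3: always, even if nothing changed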
There's no significant cost in short running transactions.
Transactions only really start to cost anything once changes are actually made.
Rollbacks are only expensive if there is data to be changed during the rollback; otherwise it's a fairly empty operation.
Committing (and probably rollback) where there are no changes shouldn't incur a penalty.
Keep in mind that as soon as you read an InnoDB table, whether you explicitly "started a transaction" or not, InnoDB does similar work.
The alternative to starting a transaction is relying on autocommit. But autocommit doesn't mean no transaction happens. Autocommit means the transaction starts implicitly when your query touches an InnoDB table, and the transaction commits automatically as soon as the query is done. You can't run more than one statement, and you can't rollback, but otherwise it's the same as an explicit transaction.
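A small sketch of the difference (the users table is invented for illustration):
-- With autocommit (the default), each statement is its own transaction:
SELECT * FROM users WHERE user_id = 42;   -- implicit begin + commit around this one statement

-- With an explicit transaction, several statements share one transaction
-- and can still be rolled back before COMMIT:
START TRANSACTION;
SELECT * FROM users WHERE user_id = 42;
UPDATE users SET last_seen = NOW() WHERE user_id = 42;
COMMIT;   -- or ROLLBACK;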
You don't really save anything by trying to avoid starting a transaction.
There is most definitely an overhead, and it can be quite significant. I recently worked with a client who had exactly this issue (lots of empty commits due to a bug in their ORM) and their database was burning through about 30% of CPU on empty commits.
Capture a day of slow log with long_query_time=0 (important!), put it through mysqldumpslow or pt-query-digest, and see if you have a query that is just COMMIT; using up a large amount of CPU.
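Something along these lines turns on that capture (MySQL 5.7 variable names; restore your normal settings afterwards):
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 0;   -- log every query, not just slow ones
-- note: the GLOBAL value applies to connections opened after this change;
-- let it run for a representative period, then feed the slow log file
-- to mysqldumpslow or pt-query-digest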

Repeatable read implementation in MySql server side?

I understand what Repeatable Read Transaction Isolation Level means.
During a repeatable-read transaction, data committed by other transactions after my transaction started won't be visible to my transaction.
However, I am having a tough time understanding how this is actually implemented on the MySQL server side. Is it that a snapshot of the database is taken at the start of every transaction and set aside for that particular transaction?
If that is so, wouldn't the memory required be huge if multiple repeatable-read transactions are running at the same time?
Also, can someone shed light on the role of shared/exclusive locks in repeatable read?
I sought the same answer for a while, and after some searching I think it is basically implemented using MVCC (snapshot reads) + gap locks + next-key locks.
I am not completely sure I understand it correctly, but those keywords may help in further searching.
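A rough two-session sketch of how those pieces show up in practice (the orders table is invented; session B's statements are shown as comments):
-- Session A (REPEATABLE READ, the default)
START TRANSACTION;
SELECT total FROM orders WHERE id = 10;   -- MVCC snapshot read, takes no locks

-- Session B meanwhile runs:
--   UPDATE orders SET total = 99 WHERE id = 10; COMMIT;

-- Back in session A:
SELECT total FROM orders WHERE id = 10;   -- still the old value: repeatable read via MVCC

SELECT * FROM orders WHERE id BETWEEN 10 AND 20 FOR UPDATE;
-- a locking read: takes next-key (record + gap) locks, so other sessions
-- cannot insert rows into that range until session A commits
COMMIT;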
By the way, if anyone understands Chinese well, here are some good explanations written in Chinese:
http://hedengcheng.com/?p=771
https://www.cnblogs.com/kismetv/p/10331633.html

InnoDB Isolation Level for single SELECT query

I know that every single query sent to MySQL (with InnoDB as the engine) runs as a separate transaction. However, my concern is about the default isolation level (REPEATABLE READ).
My question is: since SELECT queries are sent one by one, what is the need to run the transaction at REPEATABLE READ? In this case, doesn't InnoDB add overhead for nothing?
For instance, in my web application I have a lot of single read queries where accuracy doesn't matter: as an example, I can retrieve the number of books at a given time, even if some modifications are being processed, because I know perfectly well that such a number can change after my HTTP request.
In this case READ UNCOMMITTED seems appropriate. Do I need to switch every similar single-statement transaction to that isolation level, or does InnoDB handle it automatically?
Thanks.
First of all, your question is part of the wider topic of performance tuning. It is hard to answer knowing only this, but I'll try to give you at least an overview.
The fact that REPEATABLE READ is good enough for most databases does not mean it is also best for you! That is absolutely true!
BTW, I think MySQL is the only one that uses this level by default; most databases default to READ COMMITTED (e.g. Oracle). In my opinion that is enough for most cases.
My question is: since SELECT queries are sent one by one, what is the need to run the transaction at REPEATABLE READ?
Basically, there is no need. The REPEATABLE READ level ensures you do not get dirty reads, non-repeatable reads, or phantom rows (though that last one may be a slightly different story). And basically these only come into play when DML is being run concurrently. So when you only run pure SELECTs one by one, this simply does not apply.
In this case, doesn't InnoDB add overhead for nothing?
Another yes: it does not do it for nothing. In general, InnoDB's ACID model comes at the cost of having data stored consistently, without any doubts about data reliability. And this is not free of charge. It is simply a trade-off between performance on one side and data consistency and reliability on the other.
In more detail, MySQL uses special (undo) segments to store snapshots and old row values for rollback purposes, and refers to them when necessary. As I said, this costs.
It is also worth mentioning that the performance difference is much more visible when doing INSERT, UPDATE, or DELETE; a SELECT does not cost that much. But still.
If you do not need it, avoiding it is a theoretically obvious benefit. How big? You need to assess that yourself by measuring your query performance in your environment.
A lot depends on your individual situation, including scale, how many reads and writes there are and how often, application design, database design, and much, much more.
And with the same problem in different environments the answer can simply be different.
Another alternative you could consider is simply changing the engine to MyISAM (if you do not need foreign keys, for example). In my experience it is a very good choice for read-heavy workloads. Again, it all depends, but in many cases it is faster than InnoDB. It is of course less safe, but if you are aware of the possible risks it is a good solution.
In this case READ UNCOMMITTED seems appropriate. Do I need to switch every similar single-statement transaction to that isolation level, or does InnoDB handle it automatically?
You can set the isolation level globally, for the current session, or for the next transaction.
Set your transaction level globally for next sessions.
SET GLOBAL tx_isolation = 'READ-UNCOMMITTED';
http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html
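If you only need it for one session or one transaction rather than globally, the narrower forms look like this:
-- Only for the current session:
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- Only for the next transaction in this session:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;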

Which isolation level to use in a basic MySQL project?

Well, I got an assignment [mini-project] in which one of the most important issues is database consistency.
The project is a web application which allows multiple users to access and work with it. I can expect concurrent querying and updating requests on a small set of tables, some of them connected to one another (using FOREIGN KEYs).
In order to keep the database as consistent as possible, we were advised to use isolation levels. After reading a bit (maybe not enough?) about them, I figured the most useful ones for me are READ COMMITTED and SERIALIZABLE.
I can divide the queries into three kinds:
Fetching query
Updating query
Combo
For the first one, I of course need the data to be consistent; I don't want to present dirty or uncommitted data. Therefore, I thought to use READ COMMITTED for these queries.
For the updating query, I thought using SERIALIZABLE would be the best option, but after reading a bit I found myself lost.
In the combo, I'll probably have to read from the DB, and decide whether I need/can update or not, these 2-3 calls will be under the same transaction.
Wanted to ask for some advice on which isolation level to use for each of these query types. Should I even consider different isolation levels for each type, or just stick to one?
I'm using MySQL 5.1.53, along with MySQL JDBC 3.1.14 driver (Requirements... Didn't choose the JDBC version)
Your insights are much appreciated!
Edit:
I've decided I'll be using REPEATABLE READ which seems like the default level.
I'm not sure if it's the right way to do it, but I guess REPEATABLE READ along with LOCK IN SHARE MODE and FOR UPDATE on the queries should work fine...
What do you guys think?
I would suggest READ COMMITTED. It seems natural to be able to see other sessions' committed data as soon as they're committed.
It's unclear why MySQL has a default of REPEATABLE READ.
I think you worry too much about the isolation level.
If you have multiple tables to update you need to do:
START TRANSACTION;
UPDATE table1 ....;
UPDATE table2 ....;
UPDATE table3 ....;
COMMIT;
This is the important stuff, the isolation level is just gravy.
The default level of repeatable read will do just fine for you.
Note that SELECT ... FOR UPDATE will lock the selected rows; this can result in deadlocks, which may be worse than the problem you are trying to solve.
Only use this if you are deleting rows in your DB.
To be honest I rarely see rows being deleted in a DB; if you are just doing updates, then just use normal selects.
Anyway see: http://dev.mysql.com/doc/refman/5.0/en/innodb-transaction-model.html

Locking mySQL tables/rows

Can someone explain the need to lock tables and/or rows in MySQL?
I am assuming that it is to prevent multiple writes to the same field; is this the best practice?
First, let's look at a good document. It is not MySQL documentation, it's about PostgreSQL, but it's one of the simplest and clearest docs I've read on transactions. You'll understand MySQL transactions better after reading it: http://www.postgresql.org/docs/8.4/static/mvcc.html
When you run a transaction, 4 rules apply (ACID):
Atomicity: all or nothing (rollback)
Consistency: consistent before, consistent after
Isolation: not impacted by other concurrent transactions
Durability: once committed, it's really done
Of these rules, only one is problematic: Isolation. Using a transaction does not ensure a perfect isolation level. The link above explains phantom reads and the other isolation problems between concurrent transactions. But to put it simply, you should really use row-level locks to prevent other transactions, running at the same time as yours (and maybe committing before yours), from altering the same records. But with locks come deadlocks...
So when you try using nice transactions with locks, you'll need to handle deadlocks and the fact that a transaction can fail and should be retried (simple for or while loops).
Edit:
Recent versions of InnoDB provide stronger isolation levels than previous ones. I've done some tests, and I must admit that even the phantom reads that should happen are now difficult to reproduce.
MySQL is by default at level 3 of the 4 isolation levels explained in the PostgreSQL document (where PostgreSQL is at level 2 by default). This is REPEATABLE READ. That means you won't get dirty reads or non-repeatable reads: a row you selected earlier in your transaction keeps showing the same values to you even if someone else modifies it, because your reads come from a consistent snapshot (to actually block the writer you still need a locking read such as SELECT ... FOR UPDATE).
Warning: if you work with an older version of MySQL like 5.0, you may be at level 2, and you'll need to take the row lock yourself using the FOR UPDATE keywords!
We can always find some nice race conditions. When working with aggregate queries it can be safer to be at the 4th isolation level (by adding LOCK IN SHARE MODE at the end of your query) if you do not want people adding rows while you're performing some task. I've been able to reproduce one serializable-level problem, but I won't explain the complex example here; it involves really tricky race conditions.
There is a very nice example of race conditions that even the serializable level cannot fix here: http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERIALIZABILITY
When working with transactions, the most important things are:
data used in your transaction must always be read INSIDE the transaction (re-read it if you had fetched it before the BEGIN)
understand why high isolation levels set implicit locks and may block some other queries (and make them time out)
try to avoid deadlocks (lock tables in the same order) but handle them (retry a transaction aborted by MySQL)
try to freeze important source tables at the serializable level (LOCK IN SHARE MODE) when your application code assumes that no insert or update should modify the dataset it is using (otherwise you will not get errors, but your results will have ignored the concurrent changes); see the sketch below
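A minimal sketch of those two locking reads (table names invented for illustration):
START TRANSACTION;

-- Shared lock: freeze the rows you are reading so no one can modify or delete
-- them before you commit, while still allowing other readers:
SELECT * FROM products WHERE category_id = 3 LOCK IN SHARE MODE;

-- Exclusive lock: reserve the rows you intend to modify yourself:
SELECT * FROM stock WHERE product_id = 42 FOR UPDATE;
UPDATE stock SET quantity = quantity - 1 WHERE product_id = 42;

COMMIT;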
It is not a best practice. Modern versions of MySQL support transactions with well defined semantics. Use transactions, and forget about locking stuff by hand.
The only new thing you'll have to deal with is that transaction commits may fail because of race conditions, but you'd be doing error checking with locks anyway, and it is easier to retry the logic that led to a transaction failure than to recover from errors in a non-transactional setup.
If you do get race conditions and failed commits, then you may want to fine-tune the isolation configuration for your transactions.
For example, suppose you need to generate invoice numbers which are sequential with no numbers missing; this is a requirement in at least the country I live in.
If you have a few web servers, then a few users might be buying stuff literally at the same time.
If you do select max(invoice_id)+1 from invoice to get the new invoice number, two web servers might do that at the same time (before the new invoice has been added), and get the same invoice number for the invoices they're trying to create.
If you use a mechanism such as AUTO_INCREMENT, it is only meant to generate unique values and makes no guarantee that no numbers are skipped (if one transaction tries to insert a row and then rolls back, that number is "lost").
So the solution is to (a) lock the table (b) select max(invoice_id)+1 from invoice (c) do the insert (d) commit + unlock the table.
On another note, in MySQL you're best off using InnoDB and row-level locking. Issuing a LOCK TABLES command can implicitly commit the transaction you're working on.
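A hedged sketch of the same pattern using InnoDB row-level locking instead of LOCK TABLES (table and column names are invented; in practice a dedicated counter table is often simpler):
START TRANSACTION;

-- The locking read blocks any other session doing the same read until we
-- commit, so two web servers cannot grab the same invoice number:
SELECT COALESCE(MAX(invoice_id), 0) + 1 INTO @next_id FROM invoice FOR UPDATE;

INSERT INTO invoice (invoice_id, customer_id, total)
VALUES (@next_id, 123, 99.95);

COMMIT;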
Take a look here for a general introduction to what transactions are and how to use them.
Databases are designed to work in concurrent environments, so locking the tables and/or records helps to keep the transactions consistent.
So a record affected by one transaction should not be altered until this transaction commits or rolls back.