Do transactions add overhead to the DB? - mysql

Would it add overhead to put a DB transaction around every single service method in our application?
We currently only use DB transactions where it's an explicit/obvious necessity. I have recently suggested transactions around all service methods, but some other developers asked the prudent question: will this add overhead?
My feeling is not - auto commit is the same as a transaction from the DB perspective. But is this accurate?
DB: MySQL

You are right: with autocommit, every statement is wrapped in a transaction. If your service methods execute multiple SQL statements, it would be good to wrap them in a transaction. Take a look at this answer for more details, and here is a nice blog post on the subject.
And to answer your question: yes, transactions do add performance overhead, but in your specific case you will not notice the difference, since you already have autocommit enabled. The exception is long-running statements in service methods, which will hold locks on the tables participating in the transaction for longer. If you wrap your multiple statements inside a transaction, you will get one transaction (instead of one transaction per individual statement), as pointed out here ("A session that has autocommit enabled can perform a multiple-statement transaction by starting it with an explicit START TRANSACTION or BEGIN statement and ending it with a COMMIT or ROLLBACK statement"), and you will achieve atomicity at the service-method level.
In the end, I would go with your solution if it makes sense from the perspective of achieving atomicity at the service-method level (which I think is what you want), but there are both positive and negative effects on performance, depending on your queries, requests per second, etc.
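To make the autocommit-versus-explicit-transaction point concrete, here is a minimal sketch of a service method that wraps two statements in one transaction. It is an illustration only: it uses Python's built-in sqlite3 module as a stand-in so it runs without a MySQL server, and the accounts table and the transfer function are hypothetical names, not anything from the question. With a MySQL driver the shape would be the same (start the transaction, run the statements, then COMMIT or ROLLBACK).

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Hypothetical service method: both UPDATEs commit together or not at all."""
    try:
        conn.execute("BEGIN")  # explicit transaction instead of per-statement autocommit
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        row = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if row is None or row[0] < 0:
            raise ValueError("insufficient funds or unknown account")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute("COMMIT")
    except Exception:
        # Undo the debit as well, so no half-finished transfer is left behind.
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts sqlite3 in autocommit mode, which mirrors the
# MySQL setup described in the question.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
transfer(conn, 1, 2, 30)
```

If the method raises halfway through, the ROLLBACK discards the debit too, which is the service-method-level atomicity discussed above.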

Yes, they can add overhead. The extra "bookkeeping" required to isolate transactions from each other can become significant, especially if the transactions are held open for a long time.

The short answer is that it depends on your table type. If you're using MyISAM (the default storage engine before MySQL 5.5), there are no real transactions, so there should be no effect on performance.
But you should use them anyway. Without transactions, there is no demarcation of work. If you upgrade to InnoDB or a real database like PostgreSQL, you'll want to add these transactions to your service methods anyway, so you may as well make it a habit now while it isn't costing you anything.
Besides, you should already be using a transactional store. How do you clean up if a service method fails currently? If you write some information to the database and then your service method throws an exception, how do you clean out that incomplete or erroneous information? If you were using transactions, you wouldn't have to—the database would throw away rolled back data for you. Or what do you do if I'm halfway through a method and another request comes in and finds my half-written data? Is it going to blow up when it goes looking for the other half that isn't there yet? A transactional data store would handle this for you: your transactions would be isolated from each other, so nobody else could see a partially written transaction.
Like everything with databases, the only definitive answer will come from testing with realistic data and realistic loads. I recommend that you do this always, no matter what you suspect, because when it comes to databases very different code paths get activated when the data are large versus when they are not. But I strongly suspect the cost of using transactions even with InnoDB is not great. After all, these systems are heavily used constantly, every day, by organizations large and small that depend on transactions performing well. MVCC adds very little overhead. The benefits are vast, the costs are low—use them!

Related

Does MySQL replication have immediate data consistency?

I am considering a NoSQL solution for a current project, but I'm hesitant about the 'eventual consistency' clause in many of these databases. Is eventual consistency different from dealing with a MySQL database where replication lags? One solution I have used in the past with lagging replication is to read from the master when immediate data consistency is needed.
However, I am then confused as to why relational databases claim to have strong data consistency. I guess I should use transactions and that will give me strong consistency. Is it good practice, then, to write applications assuming MySQL replication may lag?
Consistency in the sense it is used in ACID means that all constraints are satisfied before and after any change. When a system assures that you can't read data that is inconsistent, they're saying for example that you will never read data where a child row references a non-existent parent row, or where half of a transaction has been applied but the other half hasn't yet been applied (the textbook example is debiting one bank account but not yet having credited the recipient bank account).
Replication in MySQL is asynchronous by default, or "semi-synchronous" at best. It certainly lags in either case. In fact, the replica is always lagging behind by at least a fraction of a second, because the master doesn't write changes to its binary log until the transaction commits, and then the replica has to download the binary log and relay the event.
But the changes are still atomic. You can't read data that is partially changed. You either read committed changes, in which case all constraints are satisfied, or else the changes haven't been committed yet, in which case you see the state of data from before the transaction began.
So you might temporarily read old data in a replication system that lags, but you won't read inconsistent data.
Whereas in an "eventually consistent" system, you might read data that is partially updated, where the one account has been debited but the second account has not yet been credited. So you can see inconsistent data.
You're right that you may need to be careful about reading from replicas if your application requires absolutely current data. Each application has a different tolerance for replication lag, and in fact within one application, different queries have different tolerance for lag. I did a presentation about this: Read/Write Splitting for MySQL and PHP (Percona webinar 2013)
For completeness, I'll also answer the question from the CAP theorem point of view. Note that Consistency in ACID is not the same as Consistency in CAP.
In terms of Consistency in the CAP theorem, which says every read receives the most recent write or an error (this is referred to as linearizability, a.k.a. strong consistency, a.k.a. atomic consistency), MySQL is not strongly consistent by default because it uses asynchronous replication. So there is a period of time during which some nodes in the group have the most recent write while others don't yet.
Also, if your MySQL version is 8.0.14 or higher, group_replication_consistency is configurable, but its default value is still EVENTUAL (in previous MySQL versions, which I believe most apps are running on, this behavior isn't configurable and is the default). Details: https://dev.mysql.com/doc/refman/8.0/en/group-replication-configuring-consistency-guarantees.html
Furthermore, if you're using MySQL Cluster (which is a different product/technology, and I find it confusing that they've called it Cluster), the MySQL documentation itself says it only guarantees eventual consistency. Details: https://dev.mysql.com/doc/mysql-cluster-manager/1.4/en/mcm-eventual-consistency.html
So we are safe to say that it's an eventually consistent system. And every asynchronously replicated system is eventually consistent by definition.

Proper locking for reliable insertion (MySQL)

When receiving a so-called IPN message from PayPal, I need to update a row in my database.
The issue is that I need perfect reliability.
Currently I use InnoDB. I am afraid that the transaction may fail due to a race condition.
Should I use LOCK TABLES? Any other reliable solution?
Should I check for a failure and repeat the transaction several (how many?) times?
You cannot make a distributed process (like adding a row locally and notifying the server remotely) perfectly reliable, no matter the order. This is a lot like the Two Generals' Problem: there is no single event which can denote the successful completion of the transaction on both sides simultaneously, as any message might get lost along the way.
I'm not sure I understand your issue correctly, but perhaps the following would work: Write a line to some table noting the fact that you are going to verify a given message. Then do the verification, and afterwards write a line to the database about the result of that verification. In the unlikely but important scenario that something broke in between, you will have an intent line with no matching result line. You can then detect such situations and recover from them manually.
On your local database, you'd have single-row updates, which you may execute in their own transactions, probably even with autocommit turned on. You have to make sure that the first write is actually committed to disk (and preferably to a binary log on some other disk as well) before you start talking to the PayPal server, but I see no need for locking or similar. You might want to retry failed transactions, I'd say up to three times, but the important thing is that in the end you can have admin intervention to fix anything your code can't handle.
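The intent-then-result idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's sqlite3 module as a stand-in for the local MySQL database, and the ipn_log table, the function names, and the verify callback (standing in for the remote PayPal verification call) are all hypothetical.

```python
import sqlite3


def open_log():
    """Create the hypothetical log table; autocommit mode, one commit per write."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE ipn_log ("
        " msg_id TEXT NOT NULL, kind TEXT NOT NULL, detail TEXT,"
        " PRIMARY KEY (msg_id, kind))"
    )
    return conn


def handle_ipn(conn, msg_id, verify):
    # 1. Record the intent BEFORE talking to the remote side.
    conn.execute("INSERT INTO ipn_log VALUES (?, 'intent', NULL)", (msg_id,))
    # 2. Remote verification; may raise if the network or the peer fails.
    outcome = verify(msg_id)
    # 3. Record the result only after verification returned.
    conn.execute("INSERT INTO ipn_log VALUES (?, 'result', ?)", (msg_id, outcome))


def unresolved(conn):
    """Intent lines with no matching result line: candidates for manual recovery."""
    rows = conn.execute(
        "SELECT i.msg_id FROM ipn_log i"
        " LEFT JOIN ipn_log r ON r.msg_id = i.msg_id AND r.kind = 'result'"
        " WHERE i.kind = 'intent' AND r.msg_id IS NULL"
        " ORDER BY i.msg_id"
    )
    return [row[0] for row in rows]
```

A periodic job (or an admin) can call unresolved() to find messages where something broke between the two writes.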

How to Handle Eventual Consistency Issues in MySQL Read/Write Splitting

I've been looking into solutions to scale MySQL. One that often comes up beyond adding a Memcached layer is read/write splitting -- all writes go to the master and all reads go to a set of load balanced slaves.
The one issue that obviously comes up with this approach is "eventual consistency." When I run a write on the master, replication to the read slaves takes a certain amount of time. Thus, if I make a request for a newly created row, it may not be there.
Does anyone know of specific strategies to handle this issue? I've read about a conceptual partial solution, the ability to "read your writes". But does anyone have any ideas on how to implement such a solution, whether conceptually or specifically in a Spring/Hibernate stack?
I've not done this, but here's an idea. You could have a memcache server on your write database that you connect to before each read query. When you do a write, add a key of some kind to your memcache, and when you replicate [1], remove the key.
When you do the memcache read and you're reading a single record, if the key of that record is found, you should read it from the master only. If you're selecting several records, read them from a slave, and then check each found ID against the memcache keys. If any are found in memcache, re-read only those records from the master database.
You may find that there are some (write-heavy) use cases where this strategy would negate the benefits of having a read/write split. But I would wager that in most cases, the extra checking of memcache and the occasional master re-reads will still make it worthwhile.
[1] If you are using standard replication and cannot track whether a particular record has replicated fully, just timestamp all your keys, and remove/expire them after a worst-case delay. For example, if your slaves lag behind your master by two minutes, ignore (and delete) any keys that are older than two minutes, since they are sure to have replicated.
That all said: don't forget there are lots of cases where lag is acceptable. For example if you have a website at which users update their profiles, if their changes do not fully propagate for five minutes, this is in most cases fine. The key is, imo, not to over-engineer something to get instant propagation if it is not necessary.
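For what it's worth, the timestamped-key variant from the footnote might look something like this. It is a rough sketch, not a tested production design: a plain in-process dict stands in for memcache, and RecentWrites, choose_source, and ids_to_reread are hypothetical names invented for the example.

```python
import time


class RecentWrites:
    """Remembers which keys were written within the worst-case replication lag."""

    def __init__(self, max_lag_seconds, clock=time.monotonic):
        self.max_lag = max_lag_seconds
        self.clock = clock          # injectable so the logic can be tested
        self._written_at = {}       # stand-in for memcache: key -> write time

    def note_write(self, key):
        self._written_at[key] = self.clock()

    def is_recent(self, key):
        written = self._written_at.get(key)
        if written is None:
            return False
        if self.clock() - written > self.max_lag:
            # Older than the worst-case lag: assume replicated, drop the key.
            del self._written_at[key]
            return False
        return True


def choose_source(tracker, key):
    """Single-record read: go to the master only if the key was written recently."""
    return "master" if tracker.is_recent(key) else "replica"


def ids_to_reread(tracker, ids):
    """Multi-record read: IDs fetched from a replica that need a master re-read."""
    return [i for i in ids if tracker.is_recent(i)]
```

With a real memcache you would likely set the expiry on the key itself instead of checking timestamps by hand, but the routing decision stays the same.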

Is it a good idea to wrap a data migration into a single transaction scope?

I'm doing a data migration at the moment of a subset of data from one database into another.
I'm writing a .NET application that is going to communicate with our in-house ORM, which will drag data from the source database to the target database.
I was wondering: is it feasible, or even a good idea, to put the entire process into a transaction scope and then commit it if there are no problems?
I'd say I'd be moving possibly about 1 GB of data across.
Performance is not a problem, but is there a limit on how much modified or new data can be inside a transaction scope?
There's no limit other than the physical size of the log file (note that the size required will be much more than the size of the migrated data). Also consider that if there is an error and you roll back the transaction, the rollback may take a very, very long time.
If the original database is relatively small (< 10 gigs) then I would just make a backup and run the migration non-logged without a transaction.
If there are any issues, just restore from backup.
(I am assuming that you can take the database offline for this - doing migrations when live is a whole other ball of wax...)
If you need to do it while live then doing it in small batches within a transaction is the only way to go.
I assume you are copying data between different servers.
In answer to your question, there is no limit as such. However, there are limiting factors which will affect whether this is a good idea. The primary one is locking and lock contention:
If the server is in use for other queries, your long-running transaction will probably lock other users out.
Whereas if the server is not in use, you don't need a transaction.
Other suggestions:
Consider writing the code so that it is incremental and interruptible, i.e. it does a bit at a time and will carry on from wherever it left off. This will involve lots of small transactions.
Consider loading the data into a temporary or staging table within the target database, then use a transaction when updating from that source, using a stored procedure or SQL batch. You should not have too much trouble putting that into a transaction because, being on the same server, it should be much, much quicker.
Also consider SSIS as an option. Actually, I know nothing about SSIS, but it is supposed to be good at this kind of stuff.
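The incremental, interruptible approach suggested above could be sketched like this. It is only an illustration: Python's sqlite3 module stands in for both databases (the question is about .NET and an in-house ORM), and the items and migration_checkpoint tables and all function names are hypothetical. Each batch and its checkpoint update commit together, so a crash never leaves the checkpoint out of step with the copied rows.

```python
import sqlite3


def make_dbs(n_rows):
    """Build a hypothetical source with n_rows rows and an empty target."""
    src = sqlite3.connect(":memory:", isolation_level=None)
    dst = sqlite3.connect(":memory:", isolation_level=None)
    for db in (src, dst):
        db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
    src.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(i, f"row-{i}") for i in range(1, n_rows + 1)],
    )
    dst.execute("CREATE TABLE migration_checkpoint (last_id INTEGER NOT NULL)")
    dst.execute("INSERT INTO migration_checkpoint VALUES (0)")
    return src, dst


def migrate_batch(src, dst, batch_size):
    """Copy up to batch_size rows past the checkpoint; return how many were copied."""
    last_id = dst.execute("SELECT last_id FROM migration_checkpoint").fetchone()[0]
    rows = src.execute(
        "SELECT id, payload FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    ).fetchall()
    if not rows:
        return 0
    try:
        dst.execute("BEGIN")  # one small transaction per batch
        dst.executemany("INSERT INTO items (id, payload) VALUES (?, ?)", rows)
        dst.execute("UPDATE migration_checkpoint SET last_id = ?", (rows[-1][0],))
        dst.execute("COMMIT")
    except Exception:
        dst.execute("ROLLBACK")
        raise
    return len(rows)


def migrate_all(src, dst, batch_size=1000):
    """Run batches until nothing is left; safe to re-run after an interruption."""
    total = 0
    while True:
        copied = migrate_batch(src, dst, batch_size)
        if copied == 0:
            return total
        total += copied
```

Because progress lives in the target database, re-running migrate_all after a failure simply resumes from the last committed batch.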

How to solve lock_wait_timeout, subsequent rollback and data disappearance in MySQL 5.1.38

I am using TopLink with Struts 2 for a high-usage app. The app constantly accesses a single table, with multiple reads and writes per second. This causes a lock_wait_timeout error and the transaction rolls back, causing the data just entered to disappear from the front end. (MySQL's autocommit has been set to 1.) The exception is caught and sent to an error page in the app, but a rollback still occurs (it has to be a TopLink exception, as MySQL does not have the rollback feature turned on). The raw data file, ibdata01, shows the entry when opened in an editor. As this happens infrequently, I have not been able to replicate it under test conditions.
Can anyone be kind enough to suggest a way out of this dilemma? What sort of approach should such a high-access app (constant reads and writes on the same table) take? Any help would be greatly appreciated.
What is the nature of your concurrent reads/updates? Are you updating the same rows constantly from different sessions? What do you expect to happen when two sessions update the same row at the same time?
If it is just reads conflicting with updates, consider reducing your transaction isolation on your database.
If you have multiple writes conflicting, then you may consider using pessimistic locking to ensure each transaction succeeds. But either way, you will have a lot of contention, so you may want to reconsider your data model or your application's usage of the data.
See,
http://en.wikibooks.org/wiki/Java_Persistence/Locking
lock_wait_timeouts are a fact of life for transactional databases. the normal response should usually be to trap the error and attempt to re-run the transaction. not many developers seem to understand this, so it bears repeating: if you get a lock_wait_timeout error and you still want to commit the transaction, then run it again.
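a bare-bones sketch of that retry loop, in Python for brevity. LockWaitTimeout is a hypothetical stand-in for whatever exception your driver or ORM raises for MySQL error 1205 (lock wait timeout exceeded), and run_with_retry is an illustrative helper, not part of any library. the txn argument is assumed to be a callable that runs the whole transaction from the start, since a retry has to redo every statement, not just the one that timed out.

```python
import time


class LockWaitTimeout(Exception):
    """Hypothetical stand-in for the driver's error for MySQL error 1205."""


def run_with_retry(txn, attempts=3, base_delay=0.05, sleep=time.sleep):
    """Call txn(); on a lock wait timeout, back off and run it again."""
    for attempt in range(1, attempts + 1):
        try:
            return txn()
        except LockWaitTimeout:
            if attempt == attempts:
                raise  # out of retries: let the caller report the failure
            sleep(base_delay * attempt)  # short, growing pause before retrying
```

any other exception propagates immediately; only the lock wait timeout is treated as retryable here.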
other things to look out for are:
persistent connections and not explicitly COMMIT'ing your transactions leads to long-running transactions that result in unnecessary locks.
since you have auto-commit off, if you log in from the mysql CLI (or any other interactive query tool) and start running queries you stand a significant chance of locking rows and not releasing them in a timely manner.