Proper locking for reliable insertion (MySQL)

When receiving a so-called IPN (Instant Payment Notification) message from PayPal, I need to update a row in my database.
The issue is that I need perfect reliability.
Currently I use InnoDB. I am afraid that the transaction may fail due to a race condition.
Should I use LOCK TABLES? Any other reliable solution?
Should I check for a failure and repeat the transaction several (how many?) times?

You cannot make a distributed process (like adding a row locally and notifying a remote server) perfectly reliable, no matter the order of operations. This is a lot like the Two Generals' Problem: there is no single event which can denote the successful completion of the transaction on both sides simultaneously, as any message might get lost along the way.
I'm not sure I understand your issue correctly, but perhaps the following would work: Write a line to some table noting the fact that you are going to verify a given message. Then do the verification, and afterwards write a line to the database about the result of that verification. In the unlikely but important scenario that something broke in between, you will have an intent line with no matching result line. You can then detect such situations and recover from them manually.
On your local database, you'd have single-row updates, which you may execute in their own transaction, probably even with autocommit turned on. You have to make sure that the first write is actually committed to disk (and preferably to a binary log on some other disk as well) before you start talking to the PayPal server, but I see no need for locking or similar. You might want to retry failed transactions, I'd say up to three times, but the important thing is that in the end you can have admin intervention to fix anything your code can't handle.
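A minimal sketch of that intent/result pattern, assuming the mysql-connector-python driver and a hypothetical ipn_log table (all names here are illustrative, not from the question):

    import mysql.connector

    conn = mysql.connector.connect(host="localhost", user="app",
                                   password="secret", database="shop")
    conn.autocommit = True  # each single-row write commits on its own

    def verify_with_paypal(payload):
        # placeholder for the real PayPal verification round trip
        raise NotImplementedError

    def handle_ipn(ipn_id, payload):
        cur = conn.cursor()
        # Step 1: record the intent BEFORE talking to PayPal. With
        # autocommit on (and innodb_flush_log_at_trx_commit = 1), the row
        # is durable once execute() returns.
        cur.execute("INSERT INTO ipn_log (ipn_id, status) VALUES (%s, 'verifying')",
                    (ipn_id,))
        verified = verify_with_paypal(payload)
        # Step 2: record the outcome. If we crash between the two writes,
        # a row stuck in 'verifying' has no matching result and can be
        # detected and recovered manually, as described above.
        cur.execute("UPDATE ipn_log SET status = %s WHERE ipn_id = %s",
                    ("verified" if verified else "failed", ipn_id))
        cur.close()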

Related

What's causing subsequent errors when restarting deadlocked transaction?

When restarting a transaction that failed at the commit stage, I get a second failure. This is running Galera Cluster under MariaDB 10.2.6.
The sequence of events goes like this:
Commit a transaction (say a single insert).
COMMIT fails with error 1213 "Deadlock found when trying to get lock"
Begin a new transaction to replay the SQL statement[s].
BEGIN fails with error 1047 "WSREP has not yet prepared node for application use"
My application bails to avoid a more serious crash (see notes below)
This happens quite regularly and although the cluster recovers, individual threads receive failures. Yesterday this happened 15 times in one second.
I cannot identify any root cause for this. It seems that the deadlock is the initiator of the problem. The situation should be recoverable (and often is). But with multiple clients all trying to resolve their deadlocks at the same time, the whole thing seems to just fail.
Notes:
This is related to an earlier question where retrying failed transactions caused a total crash of the cluster. I've managed to prevent crashes by retrying transactions only on deadlocks, i.e. if a different type of error occurs during a restart, the application gives up.
I'm aware that 10.2.6 is not the latest version of MariaDB. I'm nervous to upgrade right now as I've had such bad experiences. I would like to understand the current problem before doing an upgrade and I've been unable to reproduce the errors in a test environment.
I'm not sure, but I suspect 3 tries (not 2) is appropriate. Committing involves two steps:
Checking for a Deadlock purely within the node you are connected to. (Eg: another query is touching the same row or gap.)
Checking with the other nodes to see if they will complain. (Eg: The same row has already been inserted into another node.)
Sure, either of those could happen repeatedly, and in any order. But making 3 tries seems reasonable.
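As a sketch, the retry-three-times logic might look like this in application code (Python with the mysql-connector-python driver; error numbers 1213 and 1047 are the ones quoted above, everything else is illustrative):

    import mysql.connector

    RETRYABLE = {1213, 1047}  # deadlock / "WSREP has not yet prepared node"
    MAX_TRIES = 3

    def run_with_retry(conn, statements):
        """statements: list of (sql, params) forming one transaction."""
        for attempt in range(1, MAX_TRIES + 1):
            try:
                cur = conn.cursor()
                for sql, params in statements:
                    cur.execute(sql, params)
                conn.commit()  # this is where error 1213 tends to surface
                cur.close()
                return
            except mysql.connector.Error as e:
                try:
                    conn.rollback()
                except mysql.connector.Error:
                    pass  # the node may refuse even a rollback while resyncing
                if e.errno not in RETRYABLE or attempt == MAX_TRIES:
                    raise  # give up and get a human (DBA) involved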
Now, once you have failed "too many" times, it is right to abort and get a human (a DBA type) involved. I suspect that you could restructure your code / application logic / etc. in some way to avoid most of the failures. Would you like to provide more details, so we can discuss that possibility?
What kind of table? (Queue, transactions, logging, etc)
SHOW CREATE TABLE. (auto_inc, unique keys, etc; too many UNIQUE keys can aggravate the situation)
What does the INSERT look like?
How often do you run inserts like this one? How often does it fail? (Instrument your code so you count even those that you can recover from.)
How spread out is the Cluster? (ping time)
What other queries are hitting the table? (They may be aggravating the issue.)

Immediate command during MySQL transaction

During a sign-up process, I'm using a transaction to enclose all the operations involved in the setup of an account so that in the event of a problem they can be rolled back.
The last item that occurs is a billing process: if payment is successful, the commit operation is called to finalise the account creation; if, say, the user's card is declined, I roll back.
However, I am wondering what the best way is to write a log of the attempted billing to the database without that particular write operation being 'covered' by the transaction protecting the other database operations. Is this possible in MySQL? The log table in question does not depend on any others. Holding on to the data in the application to write it after the rollback operation is somewhat difficult due to legacy payment libraries created before we started using transactions. I'd like to avoid that if MySQL has a solution.
I would not use transactions with that goal in mind. The operations you describe seem to have full right to exist independently.
For example, an invoice has a header and one or more lines. You use a transaction to ensure that you don't store an incomplete invoice in your database because that would be an application error: there's no circumstance in business logic where you have e.g. a line without a header.
However, having an unconfirmed account makes perfect sense from the business logic point of view. The customer will probably prefer to be informed about the situation and be able to provide another payment method rather than start over again.
Furthermore, using a transaction for such a lengthy process requires keeping an open connection with MySQL Server. If you ever need to implement an HTTP interface you'll have to rethink the whole logic.
In short, transactions are a tool to protect against application errors, not a mechanism to implement business logic.
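That said, for the narrow technical question: MySQL has no autonomous transactions, so the usual workaround is to write the log row on a second connection, whose commit is independent of the main transaction. A minimal sketch, assuming the mysql-connector-python driver and a hypothetical billing_log table:

    import mysql.connector

    params = dict(user="app", password="secret", database="shop")
    main = mysql.connector.connect(**params)    # carries the sign-up transaction
    logger = mysql.connector.connect(**params)  # separate session for the log
    logger.autocommit = True

    def log_billing_attempt(account_id, outcome):
        cur = logger.cursor()
        cur.execute("INSERT INTO billing_log (account_id, outcome) VALUES (%s, %s)",
                    (account_id, outcome))
        cur.close()
        # A later main.rollback() leaves this row in place, because it was
        # committed by a different connection.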

Do transactions add overhead to the DB?

Would it add overhead to put a DB transactions around every single service method in our application?
We currently only use DB transactions where it's an explicit/obvious necessity. I have recently suggested transactions around all service methods, but some other developers asked the prudent question: will this add overhead?
My feeling is no: autocommit is the same as a transaction from the DB's perspective. But is this accurate?
DB: MySQL
You are right: with autocommit, every statement is wrapped in a transaction. If your service methods execute multiple SQL statements, it would be good to wrap them in a single transaction.
And to answer your question: yes, transactions do add performance overhead, but in your specific case you will not notice the difference since you already have autocommit enabled, unless you have long-running statements in service methods, which will cause longer locks on the tables participating in the transactions. If you just wrap your multiple statements inside a transaction, you will get one transaction (instead of a transaction for every individual statement), as the MySQL manual notes ("A session that has autocommit enabled can perform a multiple-statement transaction by starting it with an explicit START TRANSACTION or BEGIN statement and ending it with a COMMIT or ROLLBACK statement"), and you will achieve atomicity at the service-method level.
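To illustrate the manual's point, a sketch of both modes from application code (mysql-connector-python; the accounts table is illustrative):

    import mysql.connector

    conn = mysql.connector.connect(user="app", password="secret", database="shop")
    conn.autocommit = True
    cur = conn.cursor()

    # With autocommit on, each statement is its own transaction:
    cur.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
    cur.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")

    # Wrapping them in an explicit transaction makes the pair atomic,
    # exactly as the manual describes:
    cur.execute("START TRANSACTION")
    cur.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
    cur.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")
    conn.commit()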
In the end, I would go with your solution, if that makes sense from the perspective of achieving atomicity at the service-method level (which I think is what you want to achieve), but there are positive and negative effects on performance, depending on your queries, requests per second, etc.
Yes, they can add overhead. The extra "bookkeeping" required to isolate transactions from each other can become significant, especially if the transactions are held open for a long time.
The short answer is that it depends on your table type. If you're using MyISAM (the default storage engine before MySQL 5.5), there are no transactions really, so there should be no effect on performance.
But you should use them anyway. Without transactions, there is no demarcation of work. If you upgrade to InnoDB or a real database like PostgreSQL, you'll want to add these transactions to your service methods anyway, so you may as well make it a habit now while it isn't costing you anything.
Besides, you should already be using a transactional store. How do you clean up if a service method fails currently? If you write some information to the database and then your service method throws an exception, how do you clean out that incomplete or erroneous information? If you were using transactions, you wouldn't have to—the database would throw away rolled back data for you. Or what do you do if I'm halfway through a method and another request comes in and finds my half-written data? Is it going to blow up when it goes looking for the other half that isn't there yet? A transactional data store would handle this for you: your transactions would be isolated from each other, so nobody else could see a partially written transaction.
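A sketch of the clean-up being described: wrap the method body in a transaction and roll back on any exception, so half-written data never becomes visible (the order tables and method are illustrative):

    def create_order(conn, customer_id, items):
        cur = conn.cursor()
        try:
            cur.execute("INSERT INTO orders (customer_id) VALUES (%s)",
                        (customer_id,))
            order_id = cur.lastrowid
            for sku, qty in items:
                cur.execute("INSERT INTO order_lines (order_id, sku, qty) "
                            "VALUES (%s, %s, %s)", (order_id, sku, qty))
            conn.commit()    # all of it becomes visible at once...
        except Exception:
            conn.rollback()  # ...or none of it does
            raise
        finally:
            cur.close()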
Like everything with databases, the only definitive answer will come from testing with realistic data and realistic loads. I recommend that you do this always, no matter what you suspect, because when it comes to databases very different code paths get activated when the data are large versus when they are not. But I strongly suspect the cost of using transactions even with InnoDB is not great. After all, these systems are heavily used constantly, every day, by organizations large and small that depend on transactions performing well. MVCC adds very little overhead. The benefits are vast, the costs are low—use them!

Is it a good idea to wrap a data migration into a single transaction scope?

I'm doing a data migration at the moment, moving a subset of data from one database into another.
I'm writing a .NET application that is going to communicate with our in-house ORM, which will drag data from the source database to the target database.
I was wondering: is it feasible, or even a good idea, to put the entire process into a transaction scope and then, if there are no problems, commit it?
I'd say I'd be moving about 1 GB of data across.
Performance is not a problem, but is there a limit on how much modified or new data can be inside a transaction scope?
There's no limit other than the physical size of the log file (note that the size required will be much more than the size of the migrated data). Also consider that if there is an error and you roll back the transaction, that may take a very, very long time.
If the original database is relatively small (< 10 gigs) then I would just make a backup and run the migration non-logged without a transaction.
If there are any issues just restore from back-up.
(I am assuming that you can take the database offline for this - doing migrations when live is a whole other ball of wax...)
If you need to do it while live then doing it in small batches within a transaction is the only way to go.
I assume you are copying data between different servers.
In answer to your question, there is no limit as such. However there are limiting factors which will affect whether this is a good idea. The primary one is locking and lock contention. I.e.:
If the server is in use for other queries, your long-running transaction will probably lock other users out.
Whereas, if the server is not in use, you don't need a transaction.
Other suggestions:
Consider writing the code so that it is incremental and interruptible, i.e. it does a bit at a time and will carry on from wherever it left off. This will involve lots of small transactions (see the sketch after these suggestions).
Consider loading the data into a temporary or staging table within the target database, then use a transaction when updating from that source, using a stored procedure or SQL batch. You should not have too much trouble putting that into a transaction because, being on the same server, it should be much, much quicker.
Also consider SSIS as an option. Actually, I know nothing about SSIS, but it is supposed to be good at this kind of stuff.
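A sketch of the incremental, interruptible approach from the first suggestion: copy rows in keyed batches and commit each batch, so the job can stop and resume from wherever it left off (src and dst are open connections; table and column names are illustrative):

    BATCH = 1000

    def migrate(src, dst):
        last_id = 0
        while True:
            scur = src.cursor()
            scur.execute("SELECT id, col_a, col_b FROM source_table "
                         "WHERE id > %s ORDER BY id LIMIT %s", (last_id, BATCH))
            rows = scur.fetchall()
            scur.close()
            if not rows:
                break
            dcur = dst.cursor()
            dcur.executemany("INSERT INTO target_table (id, col_a, col_b) "
                             "VALUES (%s, %s, %s)", rows)
            dst.commit()           # one small transaction per batch
            dcur.close()
            last_id = rows[-1][0]  # the resume point survives interruption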

How to solve lock_wait_timeout, subsequent rollback and data disappearance in MySQL 5.1.38

I am using TopLink with Struts 2 for a high-usage app; the app constantly accesses a single table with multiple reads and writes per second. This causes a lock_wait_timeout error and the transaction rolls back, causing the data just entered to disappear from the front end (MySQL's autocommit has been set to 1). The exception has been caught and sent to an error page in the app, but still a rollback occurs (it has to be a TopLink exception, as MySQL does not have the rollback feature turned on). The raw data file, ibdata1, shows the entry when opened in an editor. As this happens infrequently, I have not been able to replicate it in test conditions.
Can anyone be kind enough to provide some sort of way out of this dilemma? What sort of approach suits such a high-access pattern (constant reads and writes on the same table all the time)? Any help would be greatly appreciated.
What is the nature of your concurrent reads/updates? Are you updating the same rows constantly from different sessions? What do you expect to happen when two sessions update the same row at the same time?
If it is just reads conflicting with updates, consider reducing your transaction isolation on your database.
If you have multiple writes conflicting, then you may consider using pessimistic locking to ensure each transaction succeeds. But either way you will have a lot of contention, so you may want to reconsider your data model or your application's usage of the data.
See http://en.wikibooks.org/wiki/Java_Persistence/Locking
Lock wait timeouts are a fact of life for transactional databases. The normal response should usually be to trap the error and attempt to re-run the transaction. Not many developers seem to understand this, so it bears repeating: if you get a lock_wait_timeout error and you still want to commit the transaction, then run it again.
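A sketch of that trap-and-rerun advice (error 1205 is MySQL's lock wait timeout code; the cap of three tries is an arbitrary choice, and mysql-connector-python is assumed):

    import mysql.connector

    def commit_with_rerun(conn, do_work, tries=3):
        for attempt in range(tries):
            try:
                do_work(conn)  # re-issues the transaction's statements
                conn.commit()
                return
            except mysql.connector.Error as e:
                conn.rollback()
                if e.errno != 1205 or attempt == tries - 1:
                    raise  # a different error, or out of tries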
Other things to look out for:
Persistent connections and not explicitly COMMITting your transactions lead to long-running transactions that result in unnecessary locks.
Since you have autocommit off, if you log in from the mysql CLI (or any other interactive query tool) and start running queries, you stand a significant chance of locking rows and not releasing them in a timely manner.