Immediate command during MySQL transaction - mysql

During a sign up process, I'm using a Transaction to enclose all the operations involved in the setup of an account so that in the event of a problem they can be rolled back.
The last step is the billing process: if payment succeeds, Commit is called to finalise the account creation; if, say, the user's card is declined, I roll back.
However, I am wondering what the best way is to write a log of the attempted billing to the database without that particular write operation being 'covered' by the transaction protecting the other database operations. Is this possible in MySQL? The log table in question does not depend on any others. Holding on to the data in the application to write it after the rollback operation is somewhat difficult due to legacy payment libraries created before we started using transactions. I'd like to avoid that if MySQL has a solution.

I would not use transactions with that goal in mind. The operations you describe seem to have every right to exist independently.
For example, an invoice has a header and one or more lines. You use a transaction to ensure that you don't store an incomplete invoice in your database because that would be an application error: there's no circumstance in business logic where you have e.g. a line without a header.
However, having an unconfirmed account makes perfect sense from the business logic point of view. The customer will probably prefer to be informed about the situation and be able to provide another payment method rather than starting over again.
Furthermore, using a transaction for such a lengthy process requires keeping an open connection with MySQL Server. If you ever need to implement an HTTP interface you'll have to rethink the whole logic.
In short, transactions are a tool to protect against application errors, not a mechanism to implement business logic.
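To answer the original question directly: MySQL has no autonomous transactions, so a write made through the connection that holds the sign-up transaction will always be rolled back with it. The usual workaround is a second, independent connection whose commits are unaffected by the main transaction's fate. Below is a minimal sketch of that idea; the table and column names are made up for illustration, and sqlite3 stands in for MySQL (since SQLite allows only one writer at a time, the log write is committed before the main transaction's writes here, whereas MySQL/InnoDB connections can interleave freely).

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "demo.db")

main = sqlite3.connect(db)    # carries the sign-up transaction
logger = sqlite3.connect(db)  # commits independently of `main`

main.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, email TEXT)")
main.execute("CREATE TABLE billing_log (id INTEGER PRIMARY KEY, email TEXT, outcome TEXT)")
main.commit()

# Record the billing attempt through the *other* connection; this commit
# is durable no matter what happens to the sign-up transaction.
logger.execute(
    "INSERT INTO billing_log (email, outcome) VALUES (?, ?)",
    ("user@example.com", "card declined"),
)
logger.commit()

# Sign-up work on the main connection (the driver opens a transaction
# implicitly on the first write)...
main.execute("INSERT INTO account (email) VALUES (?)", ("user@example.com",))

# ...which we roll back: the account disappears, the log survives.
main.rollback()

print(main.execute("SELECT COUNT(*) FROM account").fetchone()[0])      # 0
print(main.execute("SELECT COUNT(*) FROM billing_log").fetchone()[0])  # 1
```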

Related

Moving resources stored in database tables in two steps using a 'reservation'

I need to architect a database and service. I have resources that I need to deliver to users, and the delivery takes some time or requires the user to do some more work.
These are the tables I store information into.
Table - Description
_______________________
R - to store resources
RESERVE - to reserve requested resources
HACK - to track some requests that couldn't be made with my client application (statistics)
FAIL - to track requests that can't be resolved, but the user isn't guilty (statistics)
SUCCESS - to track successfully delivery (statistics)
The first step, when a user requests a resource:
IF (condition1 is true - user has the right to request the resource) THEN
    IF (I've successfully RESERVE-d the resource and committed the transaction) THEN
        nothing more to do
    ELSE
        save request into FAIL
ELSE
    save request into HACK
Then the second step:
IF (condition2 is true - the user has done his job and requests the reserved resource) THEN
    IF (the resource is delivered successfully) THEN
        save request into SUCCESS
    ELSE
        save request into FAIL
    depending on application logic, move the resource from RESERVE back to R or not
ELSE
    save request into HACK, contact the user,
    and if this is really a hacker, move the resource from RESERVE back to R
This is how I'm thinking of implementing the system. I've put the transactions into stored procedures, but the main application logic, where I decide which procedure to call, is in the application/service layer.
Am I on the right track? Is such a division of code between the DB and service layers normal? Your experienced opinions are very important.
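The first step above can be sketched as a single transaction that moves a free resource into RESERVE, with FAIL and HACK written outside of it. This is only an illustration: the table names come from the question, but the schema, the quota check (reduced to a boolean), and the helper function are all hypothetical, and sqlite3 stands in for MySQL. In MySQL you would add SELECT ... FOR UPDATE inside the transaction so two concurrent users cannot grab the same free resource.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R       (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE RESERVE (id INTEGER PRIMARY KEY, user_id INTEGER, resource_id INTEGER);
    CREATE TABLE FAIL    (id INTEGER PRIMARY KEY, user_id INTEGER, reason TEXT);
    CREATE TABLE HACK    (id INTEGER PRIMARY KEY, user_id INTEGER, reason TEXT);
    INSERT INTO R (name) VALUES ('resource-1');
""")

def request_resource(user_id, allowed):
    """Step one: reserve a free resource, or record why we could not."""
    if not allowed:  # condition1 failed -> request looks illegitimate
        conn.execute("INSERT INTO HACK (user_id, reason) VALUES (?, 'no right to request')",
                     (user_id,))
        conn.commit()
        return "hack"
    try:
        # Find a resource not already reserved (in MySQL: ... FOR UPDATE).
        row = conn.execute(
            "SELECT id FROM R WHERE id NOT IN (SELECT resource_id FROM RESERVE) LIMIT 1"
        ).fetchone()
        if row is None:
            raise LookupError("nothing free")
        conn.execute("INSERT INTO RESERVE (user_id, resource_id) VALUES (?, ?)",
                     (user_id, row[0]))
        conn.commit()
        return "reserved"
    except Exception:
        conn.rollback()  # undo the half-finished reservation, then log the failure
        conn.execute("INSERT INTO FAIL (user_id, reason) VALUES (?, 'reservation failed')",
                     (user_id,))
        conn.commit()
        return "fail"

r1 = request_resource(1, allowed=True)   # grabs the only resource
r2 = request_resource(2, allowed=True)   # nothing left -> FAIL
r3 = request_resource(3, allowed=False)  # no right -> HACK
print(r1, r2, r3)  # reserved fail hack
```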
Clarifying and answering RecentCoin's questions.
The difference between the HACK and FAIL tables is that I store more information in the HACK table, like user IP and XFF. I'm not going to penalize every user that appears in that table. There are two reasons a request can be tracked as a hack. The first is that I have a bug (mainly in the client app), and this will help me fix it. The second is that someone makes manual requests and tries to bypass the rules. If he tries 'harder', I'll be able to take some precautions.
The separation of the reserve and the success tables has these reasons.
2.1. I use the reserve table in some transactions and queries without touching the success table, so I can lock them separately.
2.2. The data stored in success will not slow down my queries while I'm querying the reserve table.
2.3. The success table is a kind of log for statistics, which I can delete or move to another database for future analysis.
2.4. I delete rows from reserve after moving them to the success table, so I can estimate the approximate maximum row count in that table, because I have a maximum reservation limit per user.
Points 2.3 and 2.4 could also be achieved by keeping everything in one table.
So are reasons 2.1 and 2.2 good enough to keep the data separate?
The resource being "delivered successfully" means that the admin and the service have done everything they could successfully; if they couldn't, the reservation fails.
4 and 6. The restrictions and rights are simple; they are like city and country restrictions. The users are 'flat' and don't have any roles or hierarchy.
I have some tables to store users and their information. I don't have LDAP or AD.
You're going in the right direction, but there are some other things that need to be more clearly thought out.
You're going to have to define what constitutes a "hack" vs a "fail". Especially with new systems, users get confused and it's pretty easy for them to make honest mistakes. This seems like something you want to penalize them for in some fashion, so I'd be extremely careful with this.
You will want to consider making "reserve" and "success" equivalent. Why store the same record twice? You should have a really compelling reason to do that.
You will need to define "delivered successfully" since that could be anything from an entry in a calendar to getting more pens and post notes.
You will want to define your resources as well as which user(s) have rights to them. For example, you may have a conference room that only managers are allowed to book, but you might want to include the managers' administrative assistants in that list since they would be booking the room for the manager(s).
Do you have a database of users? LDAP or Active Directory, or will you need to create all of that yourself? If you do have LDAP or AD, can you use something like SAML?
6. You are going to want to consider how you want to assign those rights. Will they be group based, where group membership confers the rights to reserve, request, or use a given thing? For example, you may only want architects printing to the large-format printer.

MySQL transactions: reads while writing

I'm implementing PayPal Payments Standard on the website I'm working on. The question is not specific to PayPal; I just want to present it through my real problem.
PayPal can notify your server about a payment in two ways:
PayPal IPN - after each payment, PayPal sends a (server-to-server) notification to a URL (chosen by you) with the transaction details.
PayPal PDT - after a payment (if you set this up in your PP account), PayPal redirects the user back to your site, passing the transaction id in the URL so you can query PayPal for the transaction details.
The problem is that you can't be sure which one happens first:
Will your server be notified by IPN first?
Will the user be redirected back to your site first?
Whichever is happening first, I want to be sure I'm not processing a transaction twice.
So, in both cases, I query my DB with the transaction id coming from PayPal (and the payment status too, actually, but that doesn't matter now) to see if I have already saved and processed that transaction. If not, I process it and save the transaction id with the other transaction details into my database.
QUESTION
What happens if I start processing the first request (say the PDT: the user was redirected back to my site, but my server hasn't been notified by IPN yet), and before I actually save the transaction to the database, the second (IPN) request arrives and tries to process the transaction too, because it doesn't find it in the DB?
I would love to make sure that while I'm writing a transaction into the database, no other query can read the table looking for that transaction id.
I'm using InnoDB and don't want to lock the whole table for the duration of the write.
Can this be solved simply with transactions, or do I have to lock that row "manually"? I'm really confused, and I hope some more experienced MySQL developers can help make this clear for me and solve the problem.
Native database locks are almost useless in a Web context, particularly in situations like this. MySQL connections are generally NOT done in a persistent way - when a script shuts down, so does the MySQL connection and all locks are released and any in-flight transactions are rolled back.
e.g.
situation 1: You direct a user to PayPal's site to complete the purchase.
When they head off to PayPal, the script which sent the HTTP redirect terminates and shuts down. Locks and transactions are released or rolled back, and the user comes back to a "virgin" state as far as the DB is concerned. Their record is no longer locked.
situation 2: PayPal does a server-to-server response. This happens via a completely separate HTTP connection, utterly distinct from the connection established by the user to your server. That means any locks you establish in the yourserver<->user connection belong to a different session than the paypal<->yourserver one, and the PayPal response may run into locked tables. And of course, there's no way of predicting when the PayPal response comes in. If the network gods smile upon you and PayPal's not swamped, you get a response very quickly, possibly while the user<->you connection is still open. If things are slow and the response is delayed, that response MAY encounter unlocked tables/rows because the user<->server session has completed.
You COULD use persistent MySQL connections, but they open up a whole other world of pain. For example, consider the case where your script has a bug which gets triggered halfway through processing. You connect, do some transaction work, set up some locks... and then the script dies. Because the MySQL connection is persistent, MySQL will NOT see that the client script has died, and it will keep the transactions and locks in flight. But the connection is still sitting there in the shared pool, waiting for another session to pick it up. When it inevitably is picked up, that new script has no idea it's gotten a stale connection. It'll step into the middle of a mess of locks and transactions it has no idea exist. You can VERY easily get yourself into a deadlock situation this way, because your buggy scripts have dumped garbage all over the system and other scripts cannot cope with that garbage.
Basically, unless you implement your own locking mechanism on top of the system, e.g. UPDATE users SET locked=1 WHERE id=XXX, you cannot use native DB locking mechanisms in a Web context except in 1-shot-per-script contexts. Locks should never be attempted over multiple independent requests.
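An alternative to a hand-rolled `locked` flag that also works across independent requests is to put a unique key on the PayPal transaction id and let the database arbitrate: whichever request inserts the row first "wins"; the other gets a duplicate-key error and skips processing. A minimal sketch of the idea follows (table and column names are hypothetical; sqlite3 stands in for InnoDB, where you would catch MySQL duplicate-key error 1062 or use INSERT IGNORE / ON DUPLICATE KEY UPDATE instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE payment (
        txn_id TEXT PRIMARY KEY,  -- unique key: one row per PayPal transaction
        status TEXT
    )
""")

def handle_notification(txn_id):
    """Called by both the IPN handler and the PDT handler."""
    try:
        # Claim the transaction first; the unique key makes the claim atomic.
        conn.execute("INSERT INTO payment (txn_id, status) VALUES (?, 'processing')",
                     (txn_id,))
        conn.commit()
    except sqlite3.IntegrityError:
        return "already handled"  # the other notification got here first
    # ...safe to do the real processing exactly once here...
    conn.execute("UPDATE payment SET status = 'done' WHERE txn_id = ?", (txn_id,))
    conn.commit()
    return "processed"

first = handle_notification("TXN-1")
second = handle_notification("TXN-1")
print(first, second)  # processed already handled
```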

Proper locking for reliable insertion (MySQL)

When receiving so called IPN message from PayPal, I need to update a row in my database.
The issue is that I need perfect reliability.
Currently I use InnoDB. I am afraid that the transaction may fail due to a race condition.
Should I use LOCK TABLES? Any other reliable solution?
Should I check for a failure and repeat the transaction several (how many?) times?
You cannot make a distributed process (like adding a row locally and notifying a server remotely) perfectly reliable, no matter the order. This is a lot like the Two Generals' Problem: there is no single event which can denote the successful completion of the transaction on both sides simultaneously, as any message might get lost along the way.
I'm not sure I understand your issue correctly, but perhaps the following would work: Write a line to some table noting the fact that you are going to verify a given message. Then do the verification, and afterwards write a line to the database about the result of that verification. In the unlikely but important scenario that something broke in between, you will have an intent line with no matching result line. You can then detect such situations and recover from them manually.
On your local database, you'd have single-row updates, which you may execute in their own transactions, probably even with autocommit turned on. You have to make sure that the first write is actually committed to disk (and preferably a binary log on some other disk as well) before you start talking to the PayPal server, but I see no need for locking or similar. You might want to retry failed transactions, I'd say up to three times, but the important thing is that in the end you can have admin intervention to fix anything your code can't handle.
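The intent-line/result-line pattern from the answer might look like the sketch below. The table names and the recovery query are illustrative assumptions, sqlite3 stands in for MySQL, and the remote verification is elided:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ipn_intent (msg_id TEXT PRIMARY KEY, started_at TEXT);
    CREATE TABLE ipn_result (msg_id TEXT PRIMARY KEY, outcome TEXT);
""")

def verify(msg_id, paypal_says_valid):
    # 1. Commit the intent line BEFORE talking to the PayPal server.
    conn.execute("INSERT INTO ipn_intent VALUES (?, datetime('now'))", (msg_id,))
    conn.commit()
    # 2. ...remote verification would happen here; the process may die...
    # 3. Commit the result line in its own small transaction.
    conn.execute("INSERT INTO ipn_result VALUES (?, ?)",
                 (msg_id, "verified" if paypal_says_valid else "invalid"))
    conn.commit()

verify("MSG-1", True)

# Simulate a crash: an intent was written but the process died before the result.
conn.execute("INSERT INTO ipn_intent VALUES ('MSG-2', datetime('now'))")
conn.commit()

# Recovery sweep: intent lines with no matching result line need manual attention.
orphans = conn.execute("""
    SELECT i.msg_id FROM ipn_intent i
    LEFT JOIN ipn_result r ON r.msg_id = i.msg_id
    WHERE r.msg_id IS NULL
""").fetchall()
print(orphans)  # [('MSG-2',)]
```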

Database strategy for synchronization based on changes

I have a Spring+Hibernate+MySQL backend that exposes my model (8 different entities) to a desktop client. To keep synchronized, I want the client to regularly ask the server for recent changes. The process may be as follows:
Point A: The client connects for the first time and retrieves the whole model from the server.
Point B: The client asks the server for all changes since Point A.
Point C: The client asks the server for all changes since Point B.
To retrieve the changes (points B and C), I could create an HQL query that returns all rows in all my tables that have been modified since my previous retrieval. However, I'm afraid this can be a heavy query and degrade my performance if executed often.
For this reason I was considering other alternatives, such as keeping a separate table with recent updates for fast access. I have looked at using the L2 query cache, but it doesn't seem to serve my purpose.
Does someone know a good strategy for my purpose? My initial thought is to keep control of synchronization and avoid using "automatic" synchronization tools.
Many thanks
You can store changes in a queue table. Triggers can populate the queue on insert, update, and delete; this preserves the order of the changes (insert, update, update, delete). Empty the queue after download.
Emptying the queue would cause issues if you have multiple clients, though; you may need to think about a design to handle that case.
There are several designs you can go with, all with trade-offs. I have used the queue design before, but it was only copying data to a single destination, not multiple.
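The trigger-populated queue might look like the sketch below. The schema is a hypothetical example, and sqlite3 stands in for MySQL (MySQL trigger syntax differs slightly, but AFTER INSERT/UPDATE/DELETE triggers work the same way). Note the monotonically increasing `seq` column: instead of emptying the queue after download, each client can simply remember the last `seq` it saw, which sidesteps the multiple-client problem:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE change_queue (
        seq     INTEGER PRIMARY KEY AUTOINCREMENT,  -- preserves change order
        op      TEXT,
        item_id INTEGER
    );
    CREATE TRIGGER item_ins AFTER INSERT ON item
        BEGIN INSERT INTO change_queue (op, item_id) VALUES ('insert', NEW.id); END;
    CREATE TRIGGER item_upd AFTER UPDATE ON item
        BEGIN INSERT INTO change_queue (op, item_id) VALUES ('update', NEW.id); END;
    CREATE TRIGGER item_del AFTER DELETE ON item
        BEGIN INSERT INTO change_queue (op, item_id) VALUES ('delete', OLD.id); END;
""")

# Normal application writes; the triggers record them automatically.
conn.execute("INSERT INTO item (name) VALUES ('a')")
conn.execute("UPDATE item SET name = 'b' WHERE id = 1")
conn.execute("DELETE FROM item WHERE id = 1")
conn.commit()

# A client that last saw seq 0 downloads everything newer, in order.
rows = conn.execute(
    "SELECT op, item_id FROM change_queue WHERE seq > 0 ORDER BY seq"
).fetchall()
print(rows)  # [('insert', 1), ('update', 1), ('delete', 1)]
```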

Pattern for updating slave SQL Server 2008 databases from a master whilst minimising disruption

We have an ASP.NET web application hosted by a web farm of many instances using SQL Server 2008 in which we do aggregation and pre-processing of data from multiple sources into a format optimised for fast end user query performance (producing 5-10 million rows in some tables). The aggregation and optimisation is done by a service on a back end server which we then want to distribute to multiple read only front end copies used by the web application instances to facilitate maximum scalability.
My question is about the best way to get this data from a back end database out to the read only front end copies in such a way that does not kill their performance during the process. The front end web application instances will be under constant high load and need to have good responsiveness at all times.
The backend database is constantly being updated so I suspect that transactional replication will not be the best approach, as the constant stream of updates to the copies will hurt their performance.
Staleness of data is not a huge issue so snapshot replication might be the way to go, but this will result in poor performance during the periods of replication.
Doing a drop and bulk insert will result in periods with no data for user queries.
I don't really want to get into writing a complex cluster approach where we drop copies out of the cluster during updating - is there something along these lines that we can do without too much effort, or is there a better alternative?
There is actually a technology built into SQL Server 2005 (and 2008) that is designed to address this kind of issue: Service Broker (referred to below as SSB). The problem is that it has a very steep learning curve.
I know MySpace went public about how it uses SSB to manage its fleet of SQL Servers: MySpace Uses SQL Server Service Broker to Protect Integrity of 1 Petabyte of Data. I know of several more (major) sites that use similar patterns, but unfortunately they have not gone public, so I cannot name them. I was personally involved with some projects around this technology (I am a former member of the SQL Server team).
Now bear in mind that SSB is not a dedicated data transfer technology like Replication. As such, you will not find anything similar to the publishing wizards and simple deployment options of Replication (check a table and it gets transferred). SSB is a reliable messaging technology, and its primitives stop at the level of message exchange: you would have to write the code that captures data changes and packs them into messages, and the code that unpacks the messages into relational tables at the destination.
The reason some companies still prefer SSB over Replication for a task like you describe is that SSB has a far better story when it comes to reliability and scalability. I know of projects that exchange data between 1500+ sites, far beyond the capabilities of Replication. SSB is also abstracted from the physical topology: you can move databases, rename machines, and rebuild servers, all without changing the application. Because data flow occurs over logical routes, the application can adapt on the fly to new topologies. SSB is also resilient to long periods of disconnect and downtime, being capable of resuming the data flow after hours, days, or even months of disconnect. High throughput achieved by engine integration (SSB is part of the SQL engine itself, not a collection of satellite applications and processes like Replication) means that a backlog of changes can be processed in reasonable time (I know of sites that go through half a million transactions per minute). SSB applications typically rely on internal Activation to process the incoming data. SSB also has some unique features like built-in load balancing (via routes) with sticky-session semantics, support for deadlock-free application-specific correlated processing, priority data delivery, specific support for database mirroring, certificate-based authentication for cross-domain operations, built-in persisted timers, and more.
This is not a specific answer to "how to move data from table T on server A to server B". It's more a generic technology for exchanging data between server A and server B.
I've never had to deal with this scenario before but did come up with a possible solution for this. Basically, it would require a change in your main database structure. Instead of storing the data, you would keep records of modifications of this data. Thus, if a record is added, you store "Table X, inserted new record with these values: ..." With modifications, just store the table, field and changed value. With deletions, just store which record is deleted. Every modification will be stored with a timestamp.
Your client systems would keep their local copies of the database and will regularly ask for all database modifications after a certain date/time. You then execute those modifications on the local database and it will be up-to-date again.
And the back-end? Well, it would just keep a list of modifications and perhaps a table with the base data. Keeping just the modifications also means you're keeping track of history, allowing you to ask the system what it looked like a year ago.
How well this would perform depends on the number of modifications on the back-end database. But if you request the changes every 15 minutes, it shouldn't be that much data every time.
But again, I never had the chance to work this out in a real application, so it's still a theoretical principle for me. It seems fast, but a lot of work will be required.
Option 1: Write an app to transfer the data using row level transactions. It might take longer but would result in no interruption of the site using the data because the rows are there before and after the read occurs, just with new data. This processing would happen on a separate server to minimize load.
In SQL Server 2008 you can set READ_COMMITTED_SNAPSHOT to ON to ensure that the row being updated does not cause blocking.
But basically all this app does is read the new data as it is available out from one database and into the other.
Option 2: Move the data (tables or entire database) from the aggregation server to the front-end server. Automate this if possible. Then switch your web application to point to the new database or tables for future requests. This works but requires control over the web app, which you may not have.
Option 3: If you were talking about a single table (though this could work with many), what you can do is a view swap. You write your code against a SQL view which points to Table A. You do your work on Table B, and when it's ready, you update the view to point to Table B. You can even write a function that determines the active table and automates the whole swap.
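The view swap in Option 3 can be sketched as follows. The table and view names are made up for illustration, and sqlite3 stands in for SQL Server, where you would use a single ALTER VIEW statement (a metadata-only change) rather than drop-and-recreate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_a (id INTEGER PRIMARY KEY, total INTEGER);
    CREATE TABLE sales_b (id INTEGER PRIMARY KEY, total INTEGER);
    INSERT INTO sales_a VALUES (1, 100);          -- currently live data
    CREATE VIEW sales AS SELECT * FROM sales_a;   -- the app queries the view only
""")

# Load the fresh aggregate into the offline table at leisure...
conn.execute("INSERT INTO sales_b VALUES (1, 250)")
conn.commit()

# ...then repoint the view; readers never see a half-loaded table.
conn.executescript("""
    DROP VIEW sales;
    CREATE VIEW sales AS SELECT * FROM sales_b;
""")

print(conn.execute("SELECT total FROM sales").fetchone()[0])  # 250
```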
Option 4: You might be able to use something like byte-level replication of the server, which is basically copying the server from point A to point B exactly, down to the very bytes. That sounds scary, though. It's mostly used in DR situations, and this sounds like it could be a kinda/sorta DR situation, but not really.
Option 5: Give up and learn how to sell insurance. :)