MySQL transactions: reads while writing

I'm implementing PayPal Payments Standard on the website I'm working on. The question is not really about PayPal; I just want to present it through my real problem.
PayPal can notify your server about a payment in two ways:
PayPal IPN - after each payment, PayPal sends a (server-to-server) notification with the transaction details to a URL of your choosing.
PayPal PDT - after a payment (if you set this up in your PP account), PayPal redirects the user back to your site, passing the transaction id in the URL, so you can query PayPal about that transaction to get its details.
The problem is that you can't be sure which one happens first:
Your server is notified by IPN
The user is redirected back to your site
Whichever happens first, I want to be sure I'm not processing a transaction twice.
So, in both cases, I query my DB for the transaction id coming from PayPal (and actually the payment status too, but that doesn't matter now) to see if I have already saved and processed that transaction. If not, I process it and save the transaction id along with the other transaction details in my database.
QUESTION
What happens if I start processing the first request (say the PDT, so the user was redirected back to my site but my server hasn't been notified by IPN yet), but before I actually save the transaction to the database, the second (IPN) request arrives and tries to process the transaction too, because it doesn't find it in the DB?
I would like to make sure that while I'm writing a transaction to the database, no other query can read the table looking for that transaction id.
I'm using InnoDB and don't want to lock the whole table for the duration of the write.
Can this be solved simply with transactions, or do I have to lock that row "manually"? I'm really confused, and I hope some more experienced MySQL developers can help make this clear for me and solve the problem.

Native database locks are almost useless in a Web context, particularly in situations like this. MySQL connections are generally NOT done in a persistent way - when a script shuts down, so does the MySQL connection and all locks are released and any in-flight transactions are rolled back.
e.g.
situation 1: You direct a user to paypal's site to complete the purchase
When they head off to PayPal, the script that sent the HTTP redirect terminates and shuts down. Locks/transactions are released/rolled back, and everything returns to a "virgin" state as far as the DB is concerned. Their record is no longer locked.
situation 2: Paypal does a server-to-server response. This will be done via a completely separate HTTP connection, utterly distinct from the connection established by the user to your server. That means any locks you establish in the yourserver<->user connection will be distinct from the paypal<->yourserver session, and the paypal response will encounter locked tables. And of course, there's no way of predicting when the paypal response comes in. If the network gods smile upon you and paypal's not swamped, you get a response very quickly and possibly while the user<->you connection is still open. If things are slow and the response is delayed, that response MAY encounter unlocked tables/rows because the user<->server session has completed.
You COULD use persistent MySQL connections, but they open up a whole other world of pain. For example, consider the case where your script has a bug that gets triggered halfway through processing. You connect, do some transaction work, set up some locks... and then the script dies. Because the MySQL connection is persistent, MySQL will NOT see that the client script has died, and it will keep the transactions/locks in flight. The connection just sits in the shared pool, waiting for another session to pick it up. When one inevitably does, that new script has no idea it has been handed this old "stale" connection. It will step into the middle of a mess of locks and transactions it doesn't know exist. You can VERY easily get yourself into a deadlock this way, because your buggy scripts have dumped garbage all over the system and other scripts cannot cope with it.
Basically, unless you implement your own locking mechanism on top of the system, e.g. UPDATE users SET locked=1 WHERE id=XXX, you cannot use native DB locking mechanisms in a Web context except in 1-shot-per-script contexts. Locks should never be attempted over multiple independent requests.
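Here is a minimal sketch of the kind of application-level lock described above. It assumes a hypothetical `payments` table keyed by the PayPal transaction id. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere.

The claim is a single conditional `UPDATE`. Only the request whose `UPDATE` actually changes a row goes on to process the payment. Because the claim is one statement, no lock is held across requests.

The function names are illustrative. A crash between claiming and marking the payment processed would leave the row claimed; handling that is left out here.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    # In-memory SQLite stands in for MySQL; table and column names are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments ("
        " txn_id TEXT PRIMARY KEY,"
        " locked INTEGER NOT NULL DEFAULT 0,"
        " processed INTEGER NOT NULL DEFAULT 0)"
    )
    return conn

def try_claim(conn: sqlite3.Connection, txn_id: str) -> bool:
    """Return True only for the single caller that wins the claim on txn_id."""
    with conn:  # commits on success, rolls back on exception
        # Ensure a row exists; a duplicate is silently ignored (MySQL: INSERT IGNORE).
        conn.execute("INSERT OR IGNORE INTO payments (txn_id) VALUES (?)", (txn_id,))
        # Atomic check-and-set: flips locked from 0 to 1 at most once.
        cur = conn.execute(
            "UPDATE payments SET locked = 1 WHERE txn_id = ? AND locked = 0",
            (txn_id,),
        )
    return cur.rowcount == 1

def mark_processed(conn: sqlite3.Connection, txn_id: str) -> None:
    with conn:
        conn.execute("UPDATE payments SET processed = 1 WHERE txn_id = ?", (txn_id,))

def handle_notification(conn: sqlite3.Connection, txn_id: str, source: str, log: list) -> None:
    # Called by both the PDT redirect handler and the IPN handler.
    if try_claim(conn, txn_id):
        log.append(f"{source} processed {txn_id}")
        mark_processed(conn, txn_id)
    else:
        log.append(f"{source} skipped {txn_id}")

conn = make_db()
log = []
handle_notification(conn, "TXN-1", "PDT", log)  # arrives first, wins the claim
handle_notification(conn, "TXN-1", "IPN", log)  # arrives second, skips
print(log)  # ['PDT processed TXN-1', 'IPN skipped TXN-1']
```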

Related

How to make polling from database scalable?

I am trying to find a scalable way to let my desktop application run a command when a change is made in the database.
The application is for running a remote command on your PC. The user logs into the website and can choose to run the command. Currently, users have to download a desktop application that checks the database every few seconds to see if a value has changed. The value can only be changed when they log in to the website and press a button.
For now it seems to be working fine since there aren't many users. But once I have 100+ users, hitting the database 100+ times every few seconds is not good. What might be a better approach?
It's true that polling for changes is too expensive, especially if you have many clients. The queries are often very costly, and it's tempting to run the queries frequently to make sure the client gets notified promptly after a change. It's better to avoid polling the database.
One suggestion in the comments above is to use a UDF called from a trigger. But I don't recommend this, because a trigger runs when you do an INSERT/UPDATE/DELETE, not when you COMMIT the change. So a client could be notified of a change, and then when they check the database the change appears to not be there, because either the transaction was rolled back, or else the transaction simply hasn't been committed yet.
Another reason the trigger solution is not good is that MySQL triggers execute once for each row changed, not once for each INSERT/UPDATE/DELETE statement. So you could cause notification spam, if you do an UPDATE that affects thousands of rows.
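Both trigger problems can be seen concretely with Python's `sqlite3`. SQLite triggers are also row-level. A Python function registered via `create_function` stands in for a notification UDF; the `notify` name and `commands` table are hypothetical.

A single multi-row `UPDATE` produces one notification per row. Those notifications have already gone out even though the transaction is then rolled back.

```python
import sqlite3

sent = []  # stands in for an external notification channel

def notify(row_id):
    # Side effect outside the database: ROLLBACK cannot undo it.
    sent.append(row_id)
    return None

conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/ROLLBACK below
conn.create_function("notify", 1, notify)
conn.execute("CREATE TABLE commands (id INTEGER PRIMARY KEY, pending INTEGER)")
conn.executemany("INSERT INTO commands (id, pending) VALUES (?, 0)",
                 [(i,) for i in range(1, 1001)])
conn.execute(
    "CREATE TRIGGER commands_notify AFTER UPDATE ON commands "
    "FOR EACH ROW BEGIN SELECT notify(NEW.id); END"
)

conn.execute("BEGIN")
conn.execute("UPDATE commands SET pending = 1")  # one statement touching 1000 rows
conn.execute("ROLLBACK")                          # the change never becomes visible

print(len(sent))  # 1000: one notification per row, all for a rolled-back update
print(conn.execute("SELECT COUNT(*) FROM commands WHERE pending = 1").fetchone()[0])  # 0
```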
A different solution is to use a message queue like RabbitMQ or ActiveMQ or Amazon SQS (there are many others). When a client commits their INSERT/UPDATE/DELETE, they confirm the commit succeeded, then post a message on a message queue topic. Many clients can be notified efficiently this way. But it requires that every client who commits changes to the database write code to post to the message queue.
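The commit-then-publish ordering can be sketched as follows. Python's `sqlite3` stands in for MySQL, and an in-process `queue.Queue` per subscriber stands in for a real broker such as RabbitMQ. The `Broker` class, `request_command` helper and `commands` table are hypothetical.

The message is posted only after `commit()` returns. A failed transaction publishes nothing.

```python
import queue
import sqlite3

class Broker:
    """Toy fan-out broker: each subscriber gets its own queue (stand-in for a topic)."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self) -> queue.Queue:
        q = queue.Queue()
        self.subscribers.append(q)
        return q

    def publish(self, message: dict) -> None:
        for q in self.subscribers:
            q.put(message)

def request_command(conn, broker, user_id, command):
    try:
        conn.execute("INSERT INTO commands (user_id, command) VALUES (?, ?)",
                     (user_id, command))
        conn.commit()
    except sqlite3.Error:
        conn.rollback()
        raise  # nothing was committed, so nothing is published
    # Only reached once the commit has succeeded.
    broker.publish({"user_id": user_id, "command": command})

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commands (id INTEGER PRIMARY KEY,"
             " user_id INTEGER NOT NULL, command TEXT NOT NULL)")
broker = Broker()
desktop_client = broker.subscribe()  # the desktop app blocks here instead of polling

request_command(conn, broker, 42, "restart")
print(desktop_client.get(timeout=1))  # {'user_id': 42, 'command': 'restart'}
```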
Another solution is for clients to subscribe to MySQL's binary log and read it as a change data capture log. Every committed change to the database is logged in the binary log. You can make clients read this, and it has no more impact on the database server than a replication client (MySQL can easily support hundreds of replicas).
A hybrid solution is to consume the binary log, and turn those changes into events in a message queue. This is how a product like Debezium works. It reads the binary log, and posts events to an Apache Kafka message queue. Then other clients can wait for events on the Kafka queue and respond to them.

How to lock MySQL resources while open in webform and release when closed / done editing

I'm building a platform that is supposed to serve many corporate users at the same time. The system contains a lot of records that from time to time require updating. It is important that every change is logged and appropriately stored. I use a Laravel 6 implementation for the back-end and Angular 6 for the front-end. The front-end sends requests to the back-end via HTTP requests. The data is stored in a MySQL database.
Users load a specific dataset in either read-only or read-write mode. In read-only mode there is no need to lock the resource, as the user is aware that he can only read the data. In read-write mode the dataset should be locked from the moment the user requests the data, so that it cannot be requested by a different user while someone is working on it. The user working on the data then has it open in a web form for editing. As soon as the user saves, cancels, or closes the browser, the data should be unlocked.
Locking the data in the database is not the problem: I keep a table of which datasets are locked for editing, and whenever another user tries to access this data an error is thrown. Releasing the data when the user saves or cancels is not a problem either; I just release the lock in the table.
However, since there is no interaction with the back-end when the browser is closed while editing, the dataset remains locked indefinitely (I could fix this with timestamps, cron jobs and so on, but those are not an option in the company infrastructure). How can I fix this?
One idea I had, though I don't know how to implement it, is to keep the HTTP connection between the client and the server open and have the server (Laravel) execute some code to release the locks when the connection is closed. Any tips, hints or pointers on how to continue from here?
Use a timestamp field, without a cron job, to indicate how long a record is locked. If the current time is later than the timestamp, consider the record unlocked.
While the web form is open, you can send an AJAX request every couple of minutes to push this timestamp a couple of minutes further into the future, thus keeping the record locked.
If the user closes the browser window, the timestamp stops being updated and the record becomes unlocked within a couple of minutes.
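Here is a minimal sketch of that lease scheme. It assumes a hypothetical `dataset_locks` table and a 120-second lease. `now` is passed in explicitly so that the expiry logic is easy to follow, and `sqlite3` again stands in for MySQL.

- `acquire` succeeds if nobody holds the lock, the caller already holds it, or the previous holder's lease has expired.
- The periodic AJAX call maps to `renew`.
- Save/cancel maps to `release`.

```python
import sqlite3

LEASE_SECONDS = 120  # hypothetical "couple of minutes"

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset_locks ("
                 " dataset_id INTEGER PRIMARY KEY,"
                 " user_id INTEGER NOT NULL,"
                 " expires_at REAL NOT NULL)")
    return conn

def acquire(conn, dataset_id, user_id, now):
    """Take the lock if it is free, already ours, or expired. Returns True on success."""
    with conn:
        # Create the row if this dataset has never been locked.
        conn.execute(
            "INSERT OR IGNORE INTO dataset_locks (dataset_id, user_id, expires_at) "
            "VALUES (?, ?, ?)", (dataset_id, user_id, now + LEASE_SECONDS))
        # Take over only if the lease is ours or has lapsed (one atomic statement).
        cur = conn.execute(
            "UPDATE dataset_locks SET user_id = ?, expires_at = ? "
            "WHERE dataset_id = ? AND (user_id = ? OR expires_at < ?)",
            (user_id, now + LEASE_SECONDS, dataset_id, user_id, now))
    return cur.rowcount == 1

def renew(conn, dataset_id, user_id, now):
    """Heartbeat from the open web form: extend the lease only if we still hold it."""
    with conn:
        cur = conn.execute(
            "UPDATE dataset_locks SET expires_at = ? "
            "WHERE dataset_id = ? AND user_id = ? AND expires_at >= ?",
            (now + LEASE_SECONDS, dataset_id, user_id, now))
    return cur.rowcount == 1

def release(conn, dataset_id, user_id):
    """Explicit unlock on save or cancel."""
    with conn:
        conn.execute("DELETE FROM dataset_locks WHERE dataset_id = ? AND user_id = ?",
                     (dataset_id, user_id))

conn = make_db()
print(acquire(conn, 1, user_id=10, now=0))    # True: user 10 opens the form
print(acquire(conn, 1, user_id=20, now=60))   # False: still locked by user 10
print(renew(conn, 1, user_id=10, now=100))    # True: heartbeat, lease now ends at 220
print(acquire(conn, 1, user_id=20, now=300))  # True: user 10 stopped sending heartbeats
```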

Read after write consistency with mysql and multiple concurrent connections

I'm trying to understand whether it is possible to achieve the following:
I have multiple instances of an application server running behind a round-robin load balancer. The client expects GET-after-POST/PUT semantics. In particular, the client will make a POST request, wait for the response and immediately make a GET request, expecting the response to reflect the change made by the POST request, e.g.:
> Request: POST /some/endpoint
< Response: 201 CREATED
< Location: /some/endpoint/123
> Request: GET /some/endpoint/123
< Response must not be 404 Not Found
It is not guaranteed that both requests are handled by the same application server. Each application server has a pool of connections to the DB. Each request will commit a transaction before responding to the client.
Thus the database will see, on one connection, an INSERT statement followed by a COMMIT. On another connection, it will see a SELECT statement. Temporally, the SELECT will be strictly after the commit, but possibly by only a tiny delay on the order of milliseconds.
The application server I have in mind uses Java, Spring, and Hibernate. The database is MySQL 5.7.11 managed by Amazon RDS in a multiple availability zone setup.
I'm trying to understand whether this behavior can be achieved and how so. There is a similar question, but the answer suggesting to lock the table does not seem right for an application that must handle concurrent requests.
Under ordinary circumstances, you will not have any issue with this sequence of requests, since your MySQL will have committed the changes to the database by the time the 201 response has been sent back. Therefore, any subsequent statements will see the created / updated record.
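The ordinary case can be illustrated with two separate connections standing in for two application servers' pools. This uses a temporary SQLite file rather than MySQL on RDS. It therefore only demonstrates the general principle: a row committed before the response is sent is visible to a read that starts afterwards on another connection. The `items` table is hypothetical.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

server_a = sqlite3.connect(path)  # handles the POST
server_b = sqlite3.connect(path)  # handles the follow-up GET

server_a.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
server_a.commit()

# POST: insert and commit *before* sending 201 Created.
cur = server_a.execute("INSERT INTO items (name) VALUES (?)", ("widget",))
new_id = cur.lastrowid
server_a.commit()

# GET on a different connection, strictly after the commit.
row = server_b.execute("SELECT name FROM items WHERE id = ?", (new_id,)).fetchone()
print(new_id, row)  # 1 ('widget',)
```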
What could be the extraordinary circumstances under which the subsequent select will not find the updated / inserted record?
Another process commits an update or delete statement that changes or removes the given record. There is not much you can do about this, since it is part of normal operation. If you do not want such a thing to happen, then you have to implement application-level locking of the data.
The subsequent GET request is routed not only to a different application server, but that server uses (or is forced to use) a different database instance, which does not have the most recent state of that record. I would envisage this happening if there is a severe failure at either the application or the database server level, or if request routing goes really wrong (e.g. the request is routed to a data center in a different geographical location). These should not happen too frequently.
If you're using MyISAM tables, you might be seeing the effects of 'concurrent inserts' (see 8.11.3 in the mysql manual). You can avoid them by either setting the concurrent_insert system variable to 0, or by using the HIGH_PRIORITY keyword on the INSERT.

Will IIS ever terminate the thread if a POST gets canceled by the browser

Environment:
Windows Server 2003 - IIS 6.x
ASP.NET 3.5 (C#)
IE 7,8,9
FF (whatever the latest 10 versions are)
User Scenario:
User enters search criteria against large data-set. After initiating the request, they are navigated to a results page, where they wait until the data is loaded and can then refine the data.
Technical Scenario:
After the user sends search criteria (via an AJAX call), the UI calls a back-end service. The back-end service queries the transactional system(s) and puts the resulting data into a db "cache": a denormalized table set up for further refining of the data (i.e. sorting, filtering). The UI waits until the data is cached and, upon being notified that the process is done, navigates to a results page. The results page then makes a call to get the data from the denormalized table.
Problem:
The search is relatively slow (15-25 seconds) for large queries that end up having to query many systems based on the criteria entered. It is relatively fast for other queries ( <4 seconds).
Technical Constraints:
We cannot entirely re-architect this search/results system. There are way too many complexities in how the UI and the back-end are tied together. The page is required (because of constraints that cannot be solved on StackOverflow) to change after the search criteria are submitted.
We also can not ask the organization to denormalize the data prior to searching because the data has to be real-time, i.e. if a user makes a change in other systems, the data has to show up correctly if they do a search afterwards.
Process that I want to follow:
I want to cheat a little. I want to issue the "Cache" request via an async HttpHandler in a fire-and-forget model.
After issuing the query, I want to transition the page to the resulting page.
On the transition page, I want to poll the "Cache" table to see if the data has been inserted into it yet.
The reason I want to do this transition right away is that the results page is expensive in itself, even without getting the data. It still takes 2 seconds to load before it even gets to calling the service that gets the data from the cache.
Question:
Will the ASP.NET thread that is called via the async handler reliably continue processing even if I navigate away from the page using a javascript redirect?
Technical Boundaries 2:
Yes, I know... This search process does not sound efficient. There is nothing I can do about that right now. I am trying to do whatever I can to get it to perform a little better while we continue researching how we are going to re-architect it.
If your answer is to: "Throw it away and start over", please do not answer. That is not acceptable.
Yes.
There is the property Response.IsClientConnected, which is used to find out whether the client of a long-running process is still connected. This property exists because processes continue running even if the client disconnects. A premature disconnect must be detected manually via this property, and the process shut down manually. By default, a running process is not stopped when the client disconnects.
Reference to this property: http://msdn.microsoft.com/en-us/library/system.web.httpresponse.isclientconnected.aspx
Update:
FYI, this is a very bad property to rely on these days with sockets. I strongly encourage an approach that completes the request quickly after recording the long-running task in some database or queue (probably RabbitMQ or something like that). That in turn uses socket.io or similar to update the web page or app once the task is complete.
How about not doing the async operation on an ASP.NET thread at all? Let the ASP.NET code call a service to queue the data search, then return to the browser with a token from the service. The browser then redirects to the results page, which awaits the completed result by polling with the token from the service.
That way, you won't have to worry about whether or not ASP.NET will somehow learn that the browser has moved to a different page.
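Here is a rough sketch of the queue-and-token flow. It is written in Python rather than ASP.NET to keep it self-contained. A `queue.Queue` plus a worker thread stands in for the separate search service. The `JobService` class, its method names and the fake search are hypothetical, and error handling is omitted.

`submit` returns a token immediately. The results page would then poll with `status`.

```python
import queue
import threading
import time
import uuid

class JobService:
    """Toy stand-in for an out-of-process search service."""
    def __init__(self):
        self._jobs = {}  # token -> {"state": ..., "result": ...}
        self._lock = threading.Lock()
        self._queue = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, criteria):
        """Queue the search and return a token right away."""
        token = str(uuid.uuid4())
        with self._lock:
            self._jobs[token] = {"state": "queued", "result": None}
        self._queue.put((token, criteria))
        return token

    def status(self, token):
        """What the results page polls; unknown tokens report 'unknown'."""
        with self._lock:
            job = self._jobs.get(token)
            return dict(job) if job else {"state": "unknown", "result": None}

    def _worker(self):
        while True:
            token, criteria = self._queue.get()
            with self._lock:
                self._jobs[token]["state"] = "running"
            result = self._run_search(criteria)
            with self._lock:
                self._jobs[token] = {"state": "done", "result": result}

    @staticmethod
    def _run_search(criteria):
        time.sleep(0.1)  # pretend this is the slow 15-25 second query
        return [f"row matching {criteria!r}"]

service = JobService()
token = service.submit("acme")  # the request handler returns; the browser redirects
while service.status(token)["state"] != "done":
    time.sleep(0.02)            # the results page polling with the token
print(service.status(token)["result"])  # ["row matching 'acme'"]
```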
Another option is to use Threading (System.Threading).
When the user sends the search criteria, the server begins processing the page request and creates a new Thread responsible for executing the search. It then finishes the response to the browser, which redirects to the results page while the thread continues to execute in the background on the server.
The results page would keep checking with the server whether the query has finished, since the started Thread would share its progress information. When it does finish, the results are returned on the next AJAX call made by the results page.
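Here is a sketch of the thread-with-shared-progress idea, written in Python for brevity rather than with `System.Threading`. The `SearchProgress` class, the `run_search` function and the list of back-end systems are hypothetical.

The request thread and the background thread share the progress state through a lock, so each AJAX poll reads a consistent snapshot.

```python
import threading
import time

class SearchProgress:
    """Progress shared between the request thread and the background search thread."""
    def __init__(self, total_steps):
        self._lock = threading.Lock()
        self.total = total_steps
        self.done_steps = 0
        self.results = []
        self.finished = False

    def record(self, partial):
        with self._lock:
            self.done_steps += 1
            self.results.extend(partial)
            self.finished = self.done_steps == self.total

    def snapshot(self):
        """What each AJAX poll from the results page would read."""
        with self._lock:
            return {"percent": 100 * self.done_steps // self.total,
                    "finished": self.finished,
                    "results": list(self.results) if self.finished else None}

def run_search(progress, systems):
    for name in systems:  # one step per back-end system queried
        time.sleep(0.05)  # stand-in for a slow query
        progress.record([f"{name}: hit"])

systems = ["billing", "crm", "inventory"]  # hypothetical back-end systems
progress = SearchProgress(total_steps=len(systems))
threading.Thread(target=run_search, args=(progress, systems), daemon=True).start()
# The request handler would return here and the browser would redirect.

while not (snap := progress.snapshot())["finished"]:
    time.sleep(0.02)  # the results page polling
print(snap["percent"], snap["results"])  # 100 ['billing: hit', 'crm: hit', 'inventory: hit']
```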
WebSockets could also be considered, since they offer a full-duplex communication channel: the web server itself could tell the browser when it has finished executing the query.

Immediate command during MySQL Transaction

During a sign up process, I'm using a Transaction to enclose all the operations involved in the setup of an account so that in the event of a problem they can be rolled back.
The last step is a billing process. If payment succeeds, Commit is called to finalise the account creation; if, say, the user's card is declined, I roll back.
However, I am wondering what the best way is to write a log of the attempted billing to the database without that particular write operation being 'covered' by the transaction protecting the other database operations. Is this possible in MySQL? The log table in question does not depend on any others. Holding on to the data in the application to write it after the rollback operation is somewhat difficult due to legacy payment libraries created before we started using transactions. I'd like to avoid that if MySQL has a solution.
I would not use transactions with that goal in mind. The operations you describe seem to have full right to exist independently.
For example, an invoice has a header and one or more lines. You use a transaction to ensure that you don't store an incomplete invoice in your database because that would be an application error: there's no circumstance in business logic where you have e.g. a line without a header.
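The invoice case is exactly the kind of all-or-nothing write that transactions are for. Here is a small sketch with `sqlite3` (the `invoices`/`invoice_lines` tables and the `save_invoice` helper are hypothetical). If any line fails to insert, the header is rolled back with it, so no line-less header or orphan lines survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)")
conn.execute("CREATE TABLE invoice_lines ("
             " id INTEGER PRIMARY KEY,"
             " invoice_id INTEGER NOT NULL REFERENCES invoices(id),"
             " amount INTEGER NOT NULL CHECK (amount > 0))")

def save_invoice(conn, customer, amounts):
    """Insert a header and all its lines atomically; returns the invoice id."""
    with conn:  # one transaction: commit if the block succeeds, roll back otherwise
        cur = conn.execute("INSERT INTO invoices (customer) VALUES (?)", (customer,))
        invoice_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO invoice_lines (invoice_id, amount) VALUES (?, ?)",
            [(invoice_id, a) for a in amounts])
    return invoice_id

save_invoice(conn, "Acme", [100, 250])
try:
    save_invoice(conn, "Globex", [100, -5])  # the second line violates the CHECK
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0])       # 1
print(conn.execute("SELECT COUNT(*) FROM invoice_lines").fetchone()[0])  # 2
```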
However, having an unconfirmed account makes perfect sense from the business logic point of view. The customer will probably prefer to be informed about the situation and be able to provide another payment method rather than start over again.
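One way to model this is sketched below with `sqlite3`, hypothetical `accounts`/`billing_log` tables and a stand-in `charge_card` function in place of the legacy payment library. The flow is:

1. The account is committed first with a `pending_payment` status.
2. Each billing attempt is logged in its own short transaction.
3. A successful charge flips the status to `active`.

A declined card leaves both the account and its log entry in place, and no transaction stays open while the payment provider is being called.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, email TEXT, status TEXT)")
conn.execute("CREATE TABLE billing_log (id INTEGER PRIMARY KEY,"
             " account_id INTEGER, outcome TEXT)")

def charge_card(card_number):
    """Hypothetical stand-in for the legacy payment library."""
    return card_number.endswith("0")  # pretend cards ending in 0 are accepted

def sign_up(conn, email, card_number):
    with conn:  # short transaction 1: create the unconfirmed account
        account_id = conn.execute(
            "INSERT INTO accounts (email, status) VALUES (?, 'pending_payment')",
            (email,)).lastrowid
    approved = charge_card(card_number)  # no transaction is held open here
    with conn:  # short transaction 2: log the attempt and update the status together
        conn.execute("INSERT INTO billing_log (account_id, outcome) VALUES (?, ?)",
                     (account_id, "approved" if approved else "declined"))
        if approved:
            conn.execute("UPDATE accounts SET status = 'active' WHERE id = ?",
                         (account_id,))
    return account_id

sign_up(conn, "a@example.com", "4000")
sign_up(conn, "b@example.com", "4001")
print(conn.execute("SELECT id, status FROM accounts ORDER BY id").fetchall())
# [(1, 'active'), (2, 'pending_payment')]
print(conn.execute("SELECT account_id, outcome FROM billing_log ORDER BY id").fetchall())
# [(1, 'approved'), (2, 'declined')]
```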
Furthermore, using a transaction for such a lengthy process requires keeping an open connection with MySQL Server. If you ever need to implement an HTTP interface you'll have to rethink the whole logic.
In short, transactions are a tool to protect against application errors, not a mechanism to implement business logic.