Stable pagination of rapidly changing data with repeatable read transaction - mysql

I have a database with data that changes fairly quickly, and there's enough data that user's need to paginate. I'm trying to figure out a way to do stable pagination and I'm wondering if there's a way to take advantage of the repeatable read transaction isolation level to give each pagination request a stable snapshot of the data. The API runs on AWS Lambda, so my question is specifically, is it possible to run multiple queries, in separate lambda invocations against the same transaction? Everything I've found so far indicates that transactions must all operate in the same statement like this
START TRANSACTION
...
COMMIT
I would like to do something like
USE TRANSACTION 12345
...
END
Hopefully that gives you an idea of what I'm trying to accomplish even though the syntax is fictitious. If this is not possible, are there any other options built-in to MySQL that would solve this problem? If not, we will implement our own stable snapshot/caching layer.

Related

What could cause mysql db read to return stale data

I am chasing a problem on a mysql application. At some point my client INSERTs some data, using a query wrapped in a START TRANSACTION; .... COMMIT; statement. Right after that another client comes are read back the data, and it is not there (I am sure of the order of things).
I am running nodejs, express, mysql2, and use connection pooling, with multiple statements queries.
What is interesting is that I see weird things on mysqlworkbench. I just had a workbench instance which would not see the newly inserted data either. I opened a second one, it saw the new data. Minutes later, the first instance would still not see the new data. Hit 'Reconnect to DBMS', and now it sees it. The workbench behaviour, if applied to my node client, would explain the bad result I see in node / mysql2.
There is some sort of caching going on somewhere... no idea where to start :-( Any pointers? Thanks!
It sounds like your clients are living in their own snapshot of the database, which would be true if they have an open transaction using the REPEATABLE-READ isolation level. In other words, no data committed after that client started its transaction will be visible to that client.
One workaround is to force a new transaction to start. Just run COMMIT in the client session where it appears to be viewing stale data. That will resolve any open transaction and the next query will start a new transaction.
Another way you can test is to use a locking read query such as SELECT ... FOR UPDATE. This will read the most recently committed data, regardless of the client's transaction isolation level. That is, even if the client had started their transaction using REPEATABLE-READ, a locking read behaves as if they had started their transaction with READ-COMMITTED.

it's possible to do transactions through http requests in perl?

I'm doing a web application and i want to do is if user don't like the changes or he makes a mistake, he could rollback the changes, and if he likes, save it. I'm using Perl with DBI module and MySQL.
First I send the data to update to a other Perl file, in that page I perform the update and i return the flow to the first page and show the changes to the user.
So I am wondering if its possible to persist or keep alive the transaction through HTTP request or how to do the transaction?
I did the following:
$dbh->{AutoCommit} = 0;
$dbh-do("update ...")
I'm a beginner with Perl and DBI so any answer will be appreciated
How complex a transaction is it? One table, or multiple tables and complex relationships?
If it's a single table, it might be a lot simpler for the confirmation page to show the before (DBI) values and the after (form) values, and perform the transaction following a 'commit' from there.
Apache::DBI and other ORM modules do exist that attempt to persist database connections, but given each web-server process has its own memory-space, you quickly get into some pretty hairy problems. Not for the noob, I would suggest.
I would also recommend that before you go too far with hand-crafted DBI, have a look at some of the object-relational mapping modules out there. DBIx::Class is the most popular/actively maintained one.

Do transactions add overhead to the DB?

Would it add overhead to put a DB transactions around every single service method in our application?
We currently only use DB transactions where it's an explicit/obvious necessity. I have recently suggested transactions around all service methods, but some other developers asked the prudent question: will this add overhead?
My feeling is not - auto commit is the same as a transaction from the DB perspective. But is this accurate?
DB: MySQL
You are right, with autocommit every statement is wrapped in transaction. If your service methods are executing multiple sql statements, it would be good to wrap them into a transaction. Take a look at this answer for more details, and here is a nice blog post on the subject.
And to answer your question, yes, transactions do add performance overhead, but in your specific case, you will not notice the difference since you already have autocommit enabled, unless you have long running statements in service methods, which will cause longer locks on tables participating in transactions. If you just wrap your multiple statements inside a transaction, you will get one transaction (instead of transaction for every individual statement), as pointed here ("A session that has autocommit enabled can perform a multiple-statement transaction by starting it with an explicit START TRANSACTION or BEGIN statement and ending it with a COMMIT or ROLLBACK statement") and you will achieve atomicity on a service method level...
At the end, I would go with your solution, if that makes sense from the perspective of achieving atomicity on a service method level (which I think that you want to achieve), but there are + and - effects on performance, depending on your queries, requests/s etc...
Yes, they can add overhead. The extra "bookkeeping" required to isolate transactions from each other can become significant, especially if the transactions are held open for a long time.
The short answer is that it depends on your table type. If you're using MyISAM, the default, there are no transactions really, so there should be no effect on performance.
But you should use them anyway. Without transactions, there is no demarcation of work. If you upgrade to InnoDB or a real database like PostgreSQL, you'll want to add these transactions to your service methods anyway, so you may as well make it a habit now while it isn't costing you anything.
Besides, you should already be using a transactional store. How do you clean up if a service method fails currently? If you write some information to the database and then your service method throws an exception, how do you clean out that incomplete or erroneous information? If you were using transactions, you wouldn't have to—the database would throw away rolled back data for you. Or what do you do if I'm halfway through a method and another request comes in and finds my half-written data? Is it going to blow up when it goes looking for the other half that isn't there yet? A transactional data store would handle this for you: your transactions would be isolated from each other, so nobody else could see a partially written transaction.
Like everything with databases, the only definitive answer will come from testing with realistic data and realistic loads. I recommend that you do this always, no matter what you suspect, because when it comes to databases very different code paths get activated when the data are large versus when they are not. But I strongly suspect the cost of using transactions even with InnoDB is not great. After all, these systems are heavily used constantly, every day, by organizations large and small that depend on transactions performing well. MVCC adds very little overhead. The benefits are vast, the costs are low—use them!

Multiple processes accessing Django db backend; records not showing up until manually calling _commit

I have a Django project in which multiple processes are accessing the backend mysql db. One process is creating records, while a second process is trying to read those records. I am having an issue where the second process that is trying to read the records can't actually find the records until I manually call connection._commit().
This question has been asked before:
caching issues in MySQL response with MySQLdb in Django
The OP stated that he solved the problem, but didn't quite explain how. Can anyone shed some light on this? I'd like to be able to access the records without manually calling _commit().
Thanks,
Asif
He said:
Django's autocommit isn't an actual autocommit in the db.
So, you have to ensure that autocommit is set at the db level. Otherwise, because of transaction isolation, processes will not see changes made by a different process (different connection), until a commit is done. AFAIK this is not especially a Django issue, other than the lack of clarity in the docs about Django autocommit != db autocommit.
Update: Paraphrasing slightly from the MySQL docs:
REPEATABLE READ is the default
isolation level for InnoDB. For
consistent reads, there is an
important difference from the READ
COMMITTED isolation level: All
consistent reads within the same
transaction read the snapshot
established by the first read. (My
emphasis.)
So, with REPEATABLE READ you only get, on subsequent reads, what was read in the first read. With READ COMMITTED, each read creates and reads its own fresh snapshot, so that you see subsequent updates from other transactions. So - in answer to your comment - your change to the transaction level is correct.
Are you running the processes as views? If so, they're probably committing when the request is finished processing, but it sounds like you're running these processes concurrently. If you run the process outside of a view, they should commit on each save.

How To Mutex Across a Network?

I have a desktop application that runs on a network and every instance connects to the same database.
So, in this situation, how can I implement a mutex that works across all running instances that are connected to the same database?
In other words, I don't wan't that two+ instances to run the same function at the same time. If one is already running the function, the other instances shouldn't have access to it.
PS: Database transaction won't solve, because the function I wan't to mutex doesn't use the database. I've mentioned the database just because it can be used to exchange information across the running instances.
PS2: The function takes about ~30 minutes to complete, so if a second instance tries to run the same function I would like to display a nice message that it can't be performed right now because computer 'X' is already running that function.
PS3: The function has to be processed on the client machine, so I can't use stored procedures.
I think you're looking for a database transaction. A transaction will isolate your changes from all other clients.
Update:
You mentioned that the function doesn't currently write to the database. If you want to mutex this function, there will have to be some central location to store the current mutex holder. The database can work for this -- just add a new table that includes the computername of the current holder. Check that table before starting your function.
I think your question may be confusion though. Mutexes should be about protecting resources. If your function is not accessing the database, then what shared resource are you protecting?
put the code inside a transaction either - in the app, or better -inside a stored procedure, and call the stored procedure.
the transaction mechanism will isolate the code between the callers.
Conversely consider a message queue. As mentioned, the DB should manage all of this for you either in transactions or serial access to tables (ala MyISAM).
In the past I have done the following:
Create a table that basically has two fields, function_name and is_running
I don't know what RDBMS you are using, but most have a way to lock individual records for update. Here is some pseduocode based on Oracle:
BEGIN TRANS
SELECT FOR UPDATE is_running FROM function_table WHERE function_name='foo';
-- Check here to see if it is running, if not, you can set running to 'true'
UPDATE function_table set is_running='Y' where function_name='foo';
COMMIT TRANS
Now I don't have the Oracle PSQL docs with me, but you get the idea. The 'FOR UPDATE' clause locks there record after the read until the commit, so other processes will block on that SELECT statement until the current process commits.
You can use Terracotta to implement such functionality, if you've got a Java stack.
Even if your function does not currently use the database, you could still solve the problem with a specific table for the purpose of synchronizing this function. The specifics would depend on your DB and how it handles isolation levels and locking. For example, with SQL Server you would set the transaction isolation to repeatable read, read a value from your locking row and update it inside a transaction. Don't commit the transaction until your function is done. You can also use explicit table locks in a transaction on most databases which might be simpler. This is probably the simplest solution given you are already using a database.
If you do not want to rely on the database for whatever reason you could write a simple service that would accept TCP connections from your client. Each client would request permission to run and would return a response when done. The server would be able to ensure only one client gets permission to run at a time. Dead clients would eventually drop the TCP connection and be detected as long as you have the correct keep alive setting.
The message queue solution suggested by Xepoch would also work. You could use something like MSMQ or Java Message Queue and have a single message that would act as a run token. All your clients would request the message and then repost it when done. You risk a deadlock if a client dies before reposting so you would need to devise some logic to detect this and it might get complicated.