I have a process that makes a call to a 3rd party system to perform some action. My users can schedule these calls in my web application, which inserts a row in my database. A scheduled process will check my database for any outstanding requests every hour and attempt to make the calls to the 3rd party service. Users can also opt to manually make calls they have scheduled in the web application.
I want to avoid a situation where the scheduled process and a user attempt to make the call at the same time. To add this safety I introduced row statuses: "waiting" for new rows, "trying" for rows being attempted, "complete" for calls that finished, and "error" for ones that failed.
The first step in attempting the 3rd party call is to update the status from "waiting" to "trying" using a SQL statement like this:
UPDATE remote_calls SET status = 'trying' WHERE status = 'waiting' AND id = ?;
If the affected-row count is 0, exit; if it is 1, proceed with the call. Thus, if the user and the scheduled process attempt the same call concurrently, only one UPDATE will match the row and the other caller will exit.
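In code, the claim step might look like the following. This is a minimal sketch assuming a pymysql-style Python driver; call_third_party is a hypothetical helper standing in for the actual 3rd party call.

def attempt_call(conn, call_id):
    cur = conn.cursor()
    # claim the row: only one concurrent caller's UPDATE will match it
    cur.execute(
        "UPDATE remote_calls SET status = 'trying' "
        "WHERE status = 'waiting' AND id = %s",
        (call_id,),
    )
    conn.commit()  # publish the claim before the slow remote work
    if cur.rowcount == 0:
        return False              # lost the race; another caller got it
    call_third_party(call_id)     # hypothetical stand-in for the remote call
    return True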
This works fine on my test box, but I found out that it will be deployed in a master-master replication environment. Do I have a problem? I worry that if the user's attempt and the scheduled process's attempt update different masters, both UPDATEs will match and succeed. Is this a legitimate worry? If so, is there any way I can guarantee that only one of them succeeds?
Related
I have a MySQL database where tasks are inserted. Sometimes they arrive 1 or 2 at a time, but sometimes 1000 or more arrive at once. I have workers on multiple servers, controlled by listeners on those servers: one server = one listener. A listener selects tasks that are not done yet and are not being processed right now. It selects each task's ID and status. If the status is empty, the task is new and needs to be processed. The listener then updates the status with its own unique listener ID, which means this server wants to take the task. Then it selects tasks by that unique listener ID to check whether another listener tried to claim the same tasks at the same time; if so, the status will contain the other listener's ID. In that case we skip the task and continue, but if the ID is ours we update the status to a processing state carrying our ID (only if the ID in the status is still ours). After this step we can process the task.
The question is: why does it sometimes happen that several servers process the same task, and how can I avoid this?
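For reference, the flow described above might be sketched in Python like this (a rough sketch, not the actual code: the tasks table, its columns, and the pymysql-style driver are all assumptions):

def claim_tasks(conn, listener_id):
    cur = conn.cursor()
    # step 1: find tasks that look new (empty status)
    cur.execute("SELECT id FROM tasks WHERE status = ''")
    candidates = [row[0] for row in cur.fetchall()]
    # step 2: stamp each candidate with this listener's unique id
    for task_id in candidates:
        cur.execute("UPDATE tasks SET status = %s WHERE id = %s",
                    (listener_id, task_id))
    conn.commit()
    # step 3: read back and keep only tasks whose status still carries
    # our id; anything stamped by another listener is skipped
    cur.execute("SELECT id FROM tasks WHERE status = %s", (listener_id,))
    return [row[0] for row in cur.fetchall()]

Note that a task verified in step 3 can still be re-stamped by a listener running step 2 a moment later, which is one way two servers can end up processing the same task.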
I have a database, let's say in MySQL, that logs runs of client programs. During a run, the client program connects to the database, inserts a "Run" record with the start timestamp into the "Runs" table, enters its data into other tables for that run, and then updates the same "Runs" record with the end timestamp of the run. The end timestamp is NULL until the end of the run.
The problem is that the client program can be interrupted: someone can hit Ctrl-C, the system can crash, and so on. This would leave the end timestamp NULL, i.e. I couldn't tell the difference between a run that's still ongoing and one that terminated ungracefully at some point.
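In rough code, the flow looks like this (a sketch; the Runs table's exact column names and the pymysql-style driver are assumptions):

def start_run(conn):
    cur = conn.cursor()
    cur.execute("INSERT INTO Runs (start_ts, end_ts) VALUES (NOW(), NULL)")
    conn.commit()
    return cur.lastrowid  # id of the new Run record

def finish_run(conn, run_id):
    cur = conn.cursor()
    cur.execute("UPDATE Runs SET end_ts = NOW() WHERE id = %s", (run_id,))
    conn.commit()
    # if the client dies before reaching this, end_ts stays NULL, and an
    # interrupted run looks exactly like one that is still in progress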
I wouldn't want to wrap the entire run in a transaction, because runs can take a long time and upload a lot of data, and I'd still want the data from a partial run. (There will be lots of smaller transactions during the run, however.) I also need to be able to view the data in real time from another SQL connection as the client uploads it, so a mega-transaction for the entire run would not work for that either.
During a run, the client will have a continuous session with the SQL server, so it would be nice if there could be a "trigger" or similar functionality on the connection closing that would update the Run record with the ending timestamp. It would also be nice if such a "trigger" could add a status like "completed successfully" vs. "terminated ungracefully" to boot.
Is there a solution for this in MySQL? How about PostgreSQL or any other popular relational database system?
I am using 2 separate processes via multiprocessing in my application. Both have access to a MySQL database via SQLAlchemy Core (not the ORM). One process reads data from various sources and writes it to the database. The other process just reads the data from the database.
I have a query which gets the latest record from a table and displays its id. However, it always returns the id that was the latest when I started the program, rather than the most recently inserted id (new rows are created every few seconds).
If I run the same query manually in a separate MySQL tool I get the correct result, but SQLAlchemy keeps giving me stale results.
Since you can see the writer process's changes from another MySQL tool, the writer process is indeed committing its data (at least if you are using InnoDB).
InnoDB shows you a consistent snapshot of the database, established when your transaction first reads data. Whatever other tools you are using probably have autocommit turned on, so a new transaction is implicitly started after each query.
To see the changes in SQLAlchemy, do as zzzeek suggests and have your monitoring/reader process begin a new transaction.
One technique I've used to do this myself is to add autocommit=True to the execution_options of my queries, e.g.:
result = conn.execute(
    select([table])
    .where(table.c.id == 123)
    .execution_options(autocommit=True)
)
Assuming you're using InnoDB, the data on your connection will appear "stale" for as long as you keep the current transaction running, or until the other transaction commits. For one process to see the data from the other process, two things need to happen: 1. the transaction that created the new data must be committed, and 2. the current transaction, assuming it has already read some of that data, must be rolled back, or committed and started again. See The InnoDB Transaction Model and Locking.
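For example, the reader could open a fresh connection, and therefore a fresh transaction and snapshot, for each poll. A sketch using the same legacy SQLAlchemy Core API as the query above; the engine and table objects are assumed to be set up elsewhere:

from sqlalchemy import select

def latest_id(engine, table):
    # each connect() begins a new transaction, so every poll sees a
    # fresh snapshot that includes rows committed by the writer
    with engine.connect() as conn:
        row = conn.execute(
            select([table.c.id]).order_by(table.c.id.desc()).limit(1)
        ).fetchone()
        return row[0] if row else None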
I am using Gearman workers to accept jobs from a MySQL database. The procedure for these workers getting a job is something like:
SELECT foo, bar FROM jobs WHERE job_id = 'foobarbaz' AND status = 'WAITING';
Now, if that query returns one row, the worker knows it's been given a valid job and proceeds to work on it. But of course the risk is that another worker might pick up the same job while it's working.
To prevent this, I'm wondering how I can atomically SELECT the data needed to proceed with the job and update its status if it was valid. I thought of doing the UPDATE on the row's status by job ID and then testing the affected-row count, but I wasn't sure whether there is a more sensible way to go about it.
Thanks.
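For what it's worth, the UPDATE-then-check approach floated in the question might look like this (a sketch with a pymysql-style driver; the 'WORKING' status value is an assumption):

def claim_job(conn, job_id):
    cur = conn.cursor()
    # atomically flip the status; only one worker's UPDATE can match
    cur.execute(
        "UPDATE jobs SET status = 'WORKING' "
        "WHERE job_id = %s AND status = 'WAITING'",
        (job_id,),
    )
    conn.commit()
    if cur.rowcount != 1:
        return None  # another worker claimed the job first
    cur.execute("SELECT foo, bar FROM jobs WHERE job_id = %s", (job_id,))
    return cur.fetchone()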
Basically our user provisioning algorithm does something like:
- query for a new user
- update the database to show you have that user
I'm wondering how to prevent other instances of the process from performing the "read" step while one has already started. It's a little more aggressive than a typical transaction, because it needs to be a read-read lock; of course, unrelated processes should still be able to read without being affected by the lock.
You can simply run the UPDATE query immediately to "steal" all inactive users for the current server.
Since individual UPDATE queries are always atomic, this will ensure that each user is only grabbed by one server.
Since MySQL does not allow you to return the updated rows from an UPDATE, you will need to add an identifier column to tell you which rows were "stolen".
Every time you provision users, pick a GUID, set the identifier column to that GUID in the UPDATE statement, and then SELECT the rows that carry that GUID.
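Put together, the technique might look like this (a sketch: the users table, the claimed_by identifier column, and the active flag are assumptions; uuid4 supplies the GUID):

import uuid

def provision_users(conn):
    token = uuid.uuid4().hex  # fresh GUID for this provisioning attempt
    cur = conn.cursor()
    # atomically stamp all unclaimed inactive users with our GUID
    cur.execute(
        "UPDATE users SET claimed_by = %s "
        "WHERE claimed_by IS NULL AND active = 0",
        (token,),
    )
    conn.commit()
    # read back exactly the rows this attempt stole
    cur.execute("SELECT id FROM users WHERE claimed_by = %s", (token,))
    return [row[0] for row in cur.fetchall()]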