MySQL task queue with multiple workers

I have a MySQL database into which tasks are inserted, sometimes one or two at a time and sometimes a thousand or more at once. I have workers on multiple servers, each controlled by a listener on that server (one server = one listener). A listener works like this:
1. Select the tasks that are not done yet and not currently being processed, reading each task's ID and status. An empty status means the task is new and should be picked up.
2. Update the status to the listener's unique ID, meaning this server wants to take the task.
3. Select tasks by that unique listener ID to see whether another listener attempted the same claim at the same time; if the status contains another listener's ID, skip the task and continue.
4. If the ID in the status is ours, update the status to the processing state, with the condition that the ID in the status is still ours. After this step the task can be processed.
The question is: why can it happen that several servers end up processing the same tasks, and how can I avoid this?
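For reference, a sketch of the described steps in SQL, assuming a tasks table whose status column is NULL for new tasks ('listener-7' stands in for one server's unique listener ID and 123 for a task ID):
-- Step 1: find new tasks (not done, not being processed).
SELECT id, status FROM tasks WHERE status IS NULL;
-- Step 2: claim a task with this listener's unique ID.
UPDATE tasks SET status = 'listener-7' WHERE id = 123;
-- Step 3: check whether the claim survived a concurrent claim.
SELECT id FROM tasks WHERE id = 123 AND status = 'listener-7';
-- Step 4: move to the processing state, only if the claim is still ours.
UPDATE tasks SET status = 'processing:listener-7' WHERE id = 123 AND status = 'listener-7';
-- Note: steps 2-4 are not atomic as written. Another listener's step 2 can
-- overwrite this claim, or even the processing marker, after our step 3 or 4,
-- which is exactly how two servers end up processing the same task; guarding
-- step 2 with "AND status IS NULL" makes the claim itself atomic.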

Related

SQL: why do we need transactions for ticket booking?

I have read that transactions are usually used on movie ticket booking websites to solve the concurrent purchase problem. However, I fail to understand why they are necessary.
If, at the same time, 2 users book the same seat (ID = 1) on the same show (ID = 99), can't you simply issue the following SQL command?
UPDATE seat_db
SET takenByUserID=someUserId
WHERE showID=99 AND seatID=1 AND takenByUserID IS NULL
As far as I can see, this SQL already executes atomically, so there is no concurrency issue: the database will give seat ID=1 to the first user whose request the server receives, and the second user's request will fail. So why is a transaction still needed for a ticket booking system?
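For reference, the failing second request is visible to the application as an affected-row count of zero; a sketch using MySQL's ROW_COUNT(), with 42 standing in for someUserId:
UPDATE seat_db SET takenByUserID=42 WHERE showID=99 AND seatID=1 AND takenByUserID IS NULL;
-- 1 means this user won the seat; 0 means it was already taken.
SELECT ROW_COUNT();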
When you batch all of your DML statements into a single transaction, you are typically telling the database a couple of things:
- Make this operation (i.e. book a movie ticket) all-or-nothing
- Ensure you don't leave any orphan rows and that the data stays consistent
- Lock the associated rows up front so that no other writes can be done while the operation runs:
  - prevents other transactions from modifying data your current operation wants to access
  - resolves deadlocks, allowing processing to continue, by aborting one of the locking queries
Whether you need to wrap your UPDATE seat_db request in its own transaction depends on what other processing (DML) is being done before and after it.
You'll have to use transactions if your action involves multiple rows, possibly in different tables. For example, if the user has to pay for the ticket, there will be at least two updates: debit the user's credit and mark the seat as occupied. If either update were performed alone, you would definitely get into trouble.
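A minimal sketch of those two updates in one transaction, assuming a hypothetical user_credit table and a ticket price of 50:
START TRANSACTION;
-- Debit the buyer (matches no row if the balance is insufficient).
UPDATE user_credit SET balance = balance - 50 WHERE userID = 42 AND balance >= 50;
-- Claim the seat (matches no row if it is already taken).
UPDATE seat_db SET takenByUserID = 42 WHERE showID = 99 AND seatID = 1 AND takenByUserID IS NULL;
-- If either statement affected 0 rows, issue ROLLBACK instead of COMMIT.
COMMIT;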

How can I guarantee row consistency with a Master-Master database replication?

I have a process that makes a call to a 3rd party system to perform some action. My users can schedule these calls in my web application, which inserts a row in my database. A scheduled process will check my database for any outstanding requests every hour and attempt to make the calls to the 3rd party service. Users can also opt to manually make calls they have scheduled in the web application.
I want to avoid a situation where the scheduled process and a user attempt to make the call at the same time. To add this safety I introduced row statuses: "waiting" for new rows, "trying" for rows being attempted, "complete" for completed ones, and "error" for ones that failed.
The first step in attempting the 3rd party call is to update the status from "waiting" to "trying" using a SQL statement like this:
UPDATE remote_calls SET status = 'trying' WHERE status = 'waiting' AND id = ?;
If rows affected = 0, exit; if rows affected = 1, try to make the call. Thus, if the user and the scheduled process attempt the same call concurrently, only one will succeed and the other will exit.
This works fine on my test box, but I found out that it will be deployed in a master-master replication environment. Do I have a problem? I worry that if the user's attempt and the scheduled process's attempt hit different masters, both updates will succeed. Is this a legitimate worry? If so, is there any way I can guarantee that only one of them succeeds?

JDBC / MySQL query and update in a transaction

Basically, our user provisioning algorithm does something like:
- query for a new user
- update the database to show you have that user
I'm wondering how to prevent other instances of the process from performing the "read" step while one instance has already started. So it's a little more aggressive than a typical transaction, because it needs to be a read-read lock; and of course, unrelated processes should be able to read without being affected by the lock.
You can simply run the UPDATE query immediately to "steal" all inactive users for the current server.
Since individual UPDATE queries are always atomic, this will ensure that each user is only grabbed by one server.
Since MySQL does not allow you to return the updated rows from an UPDATE, you will need to add an identifier column to tell you which rows were "stolen".
Every time you provision users, pick a GUID, set the identifier column to that GUID in the UPDATE statement, then SELECT the rows that still carry that GUID.
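A sketch of that pattern, assuming a hypothetical users table with an is_active flag and a grabbed_by identifier column:
-- Pick a fresh GUID for this provisioning batch.
SET @batch := UUID();
-- "Steal" all unclaimed, inactive users for this server in one atomic statement.
UPDATE users SET grabbed_by = @batch WHERE grabbed_by IS NULL AND is_active = 0;
-- Retrieve exactly the rows that statement claimed.
SELECT id FROM users WHERE grabbed_by = @batch;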

MySQL row-level read lock to replace messaging queue

I have a MySQL table in which I store jobs to be processed, mainly text fields of raw data that will take around a minute each to process.
I have 2 servers pulling data from that table, processing it, and then deleting it.
To manage the job allocation between the 2 servers I am currently using Amazon SQS: I store all the row IDs that need processing in SQS, and the worker servers poll SQS to get new rows to work on.
The system currently works, but SQS adds a layer of complexity and cost that I feel is overkill for what I am doing.
I am trying to implement the same thing without SQS and was wondering whether there is any way to read-lock a row, so that while one worker is working on it no other worker can select it. Or is there a better way to do this?
A simple workaround: add one more column to your jobs table, is_taken_by INT.
Then in your worker you do something like this:
START TRANSACTION;
SELECT id INTO @job_id FROM jobs WHERE is_taken_by IS NULL LIMIT 1 FOR UPDATE;
UPDATE jobs SET is_taken_by = @worker_pid WHERE id = @job_id;
COMMIT;
SELECT ... FOR UPDATE sets exclusive locks on rows it reads. This way you ensure that no other worker can take the same job.
Note: you have to run those statements in a single explicit transaction, as above (@worker_pid stands for the worker's unique process ID, set beforehand). Locking of rows for update using SELECT ... FOR UPDATE only applies when autocommit is disabled (either by beginning a transaction with START TRANSACTION or by setting autocommit to 0). If autocommit is enabled, the rows matching the condition are not locked.

Login Process (Possibly in MySQL/PHP)

I have a problem which I cannot seem to resolve: I'm creating a login process for an application; any time a user attempts to log in and fails, an attempt record is inserted or updated, up to 5 attempts. After the 5th attempt, the account is locked.
I have 2 tables for that process, the 'user' table where the user information is kept, and the 'attempts' table.
First, I don't want to use a session or cookie variable for counting the attempts (cookies can be deleted, and session variables can become too much overhead, since it's a high-traffic site); I plan to use an UPDATE statement to increase the count by 1.
Here's where the confusion starts:
First, I'm not sure whether I should update the row in the attempts table or insert a new row for every attempt (my preference is to insert on the first attempt, and to update the row on the remaining 4 attempts).
Second, I need a way to indicate that an attempt made today is completely separate from one made yesterday. For example, if a user attempted to log in yesterday and succeeded on the third attempt, and then attempts to log in again today, I don't want today's attempt to increment yesterday's count. So, in a way, after every successful login, I need to ensure that any later attempt starts a new login process of its own.
I'm not sure if my question is clear. Please, ask for more clarification if needed.
I've racked my brain for 2 days without finding a solution to this.
Thanks
P.S.: I'm using stored procedures for most of the processing to eliminate round trips to the database.
Rather than having a separate table for login attempts, simply add a counter as a new integer column on the user table. Each time a failed attempt is made, increment that column for that record; each time a successful login is made, reset it to 0.
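A minimal sketch of those two statements plus the lock check, assuming a hypothetical failed_attempts column and using 'alice' as a stand-in for the attempted username:
-- On a failed login attempt:
UPDATE user SET failed_attempts = failed_attempts + 1 WHERE username = 'alice';
-- On a successful login:
UPDATE user SET failed_attempts = 0 WHERE username = 'alice';
-- Lock check (5 or more failures means the account is locked):
SELECT failed_attempts >= 5 AS is_locked FROM user WHERE username = 'alice';
Note that the reset on success also handles the yesterday-versus-today concern: each successful login starts the count over, so a failed attempt after a success begins a fresh cycle.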
If you need to keep a running audit of all attempts, that's a separate concern. Auditing isn't part of the login process. For that concern you'd write failed attempts to some kind of audit log. This log can be a table in the database, but it shouldn't be linked to the transactional tables in any way. And it can be a general logging system for all kinds of application events, not just failed logins. (Again, another concern entirely.)