How do I track progress on a queue job?

I am using the database queue driver in Laravel to run jobs in the background.
One of my jobs creates a given number of records (thousands to hundreds of thousands) in the database. I wrapped the code for this job in a transaction so that if the job failed, the database writes would not be committed.
Initially, to track progress of the job, I thought I would count the number of created records, divide by the total number of expected records, and display that in a UI as a percentage against each job, so that users know how much longer they have to wait.
This doesn't work because the tables are locked during the transaction.
I am wondering if anybody knows how to track progress on a queued job.

For those who stumble on this question, there is a package which enables this: https://github.com/imTigger/laravel-job-status
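If you would rather roll your own, one workaround is to report progress through a store that the database transaction cannot roll back, such as the cache. A minimal sketch, assuming a hypothetical Record model and a $jobId you track yourself:

```php
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\DB;

// Inside the job's handle() method. The cache write is not part of the
// database transaction, so the UI can poll it while the job still runs.
DB::transaction(function () use ($rows, $jobId) {
    $total = count($rows);

    foreach ($rows as $i => $row) {
        Record::create($row); // hypothetical model for the created records

        // Update progress every 500 records to avoid hammering the cache.
        if ($i % 500 === 0) {
            Cache::put("job-progress-{$jobId}", (int) ($i / $total * 100), 600);
        }
    }
});
```

The UI endpoint then reads Cache::get("job-progress-{$jobId}") and renders the percentage.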

As given in http://laravel.com/docs/5.1/queues#job-events, the Queue::after method registers a callback that is called once a job has completed successfully.
As given in http://laravel.com/docs/5.1/queues#failed-job-events, the Queue::failing method registers a callback that is called when a queued job fails.
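A minimal sketch of wiring both callbacks up, e.g. in a service provider's boot() method, following the 5.1 docs linked above:

```php
use Illuminate\Support\Facades\Queue;

// Fired after a job completes successfully.
Queue::after(function ($connection, $job, $data) {
    // e.g. mark the job as finished in your own tracking table.
});

// Fired when a queued job fails.
Queue::failing(function ($connection, $job, $data) {
    // e.g. flag the job as failed so the UI stops showing progress.
});
```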
Hope this is helpful :)

Related

Running a cron to update 1 million records every hour fails

We have an e-commerce system with more than 1 million users and a total of 4 to 5 million records in the order table. We use the CodeIgniter framework on the back end and MySQL as the database.
Due to this large number of users and purchases, we use cron jobs to update the order details and referral bonus points every hour to keep things working.
Now we have a situation where these data updates take longer than one hour, and the next batch of updates arrives before the previous one finishes, leading to deadlocks and failure of the system.
I'd like to know about the different possible architectural and database scaling options and suggestions to get out of this situation. We are using a monolithic architecture to run this application.
Don't use cron. Have a single process that starts over when it finishes. If one pass lasts more than an hour, the next one will start late. (Checking PROCESSLIST is clumsy and error-prone. OTOH, this continually-running approach needs a "keep-alive" cronjob.)
Don't UPDATE millions of rows. Instead, find a way to put the desired info in a separate table that the user joins to. Presumably, that extra table would have only 1 row (if everyone is controlled by the same game) or a small number of rows (if there are only a small number of patterns to handle).
Do have the slowlog turned on, with a small value for long_query_time (possibly "1.0", maybe lower). Use pt-query-digest to summarize it to find the "worst" queries. Then we can help you make them take less time, thereby helping to calm your busy system and improve the 'user experience'.
Do use batched INSERTs. (A single INSERT with 100 rows runs about 10 times as fast as 100 single-row INSERTs.) Batching UPDATEs is trickier, but can be done with IODKU (INSERT ... ON DUPLICATE KEY UPDATE); a sketch follows this list.
Do use batches of 100-1000 rows. (This is somewhat optimal considering the various things that can happen.)
Do use transactions judiciously. Do check for errors (including deadlocks) at every step.
Do tell us what you are doing in the hourly update. We might be able to provide more targeted advice than that 15-year-old book.
Do realize that you have scaled beyond the capabilities of the typical 3rd-party package. That is, you will have to learn the details of SQL.
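To illustrate the batched-write point above, here is a minimal PHP/PDO sketch of an IODKU batch; the table and column names are made up for illustration:

```php
// Batch 100 rows into one INSERT ... ON DUPLICATE KEY UPDATE (IODKU)
// statement instead of issuing 100 single-row statements.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

$rows = array_slice($pendingOrders, 0, 100);  // one batch of 100 rows
$placeholders = implode(',', array_fill(0, count($rows), '(?, ?)'));

$sql = "INSERT INTO order_totals (order_id, bonus_points)
        VALUES {$placeholders}
        ON DUPLICATE KEY UPDATE bonus_points = VALUES(bonus_points)";

$params = [];
foreach ($rows as $row) {
    $params[] = $row['order_id'];
    $params[] = $row['bonus_points'];
}

$pdo->prepare($sql)->execute($params);
```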
I have some ideas here for you - mixed up with some questions.
Assuming you are limited in what you can do (i.e. you can't re-architect your way out of this) and that the database can't be tuned further:
Make the list of records to be processed as small as possible
i.e. Does the job have to run over all records? These 4-5 million records: are they all active orders, or is that how many you have in total for all time? Obviously, just process the bare minimum.
Split and parallel process
You mentioned "batches" but never explained what that meant - can you elaborate?
Can you get multiple instances of the cron job to run at once, each covering a different segment of the records?
Multi-Record Operations
The easy (lazy) way to program updates is to do it in a loop that iterates through each record and processes it individually, but relational databases can do updates over multiple records at once; the usual term for that is set-based operations. Are you processing each row individually or doing multi-record updates?
How does the cron job query the database? Have you hand-crafted the most efficient queries possible, or are you using some ORM / framework to do stuff for you?
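To make the multi-record idea concrete: instead of looping over rows in PHP, one set-based UPDATE can touch them all in a single statement. A hypothetical sketch (the table and column names are assumptions):

```php
// One set-based UPDATE instead of N per-row round-trips: credit referral
// bonus points for every order processed in the last hour.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

$affected = $pdo->exec(
    "UPDATE users u
     JOIN orders o ON o.user_id = u.id
     SET u.bonus_points = u.bonus_points + o.order_points
     WHERE o.processed_at >= NOW() - INTERVAL 1 HOUR"
);

echo "Updated {$affected} users\n"; // exec() returns the affected row count
```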

Producer/consumer pattern via MySQL

I have 2 processes that act as a producer/consumer via a table.
One process does only INSERTs into the table, while the other process does a SELECT for new records and then an UPDATE of those records when it is done with them, to mark them as finished.
This keeps happening constantly.
As far as I can see, there is no need for any locking or transactions for this simple interaction. Am I right about this?
Am I overlooking something?
I would say the prime consideration to take into account is a scenario where multiple workers retrieve the same row.
The UPDATE and SELECT operations themselves should be fine, but if you have multiple workers consuming via SELECT on the same table, then you might get two workers simultaneously processing the same row.
If each worker is required to process separate rows, locking on SELECT may be required with careful consideration of deadlock if there's a significant unit of work associated with your process.
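One common way to guarantee separate rows per worker is to claim them with a locking SELECT inside a transaction. A minimal sketch, assuming a tasks table with a status column; note that FOR UPDATE SKIP LOCKED requires MySQL 8.0+:

```php
// Claim up to 10 unprocessed rows; concurrent workers skip rows that are
// already locked, so no two workers ever pick the same ones.
$pdo = new PDO('mysql:host=localhost;dbname=queue', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->beginTransaction();

$rows = $pdo->query(
    "SELECT id, payload FROM tasks
     WHERE status = 'new'
     ORDER BY id
     LIMIT 10
     FOR UPDATE SKIP LOCKED"
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($rows as $row) {
    process($row['payload']); // hypothetical: your unit of work
    $pdo->prepare("UPDATE tasks SET status = 'done' WHERE id = ?")
        ->execute([$row['id']]);
}

$pdo->commit(); // releases the row locks
```

Keeping the unit of work short matters here, since the row locks are held until the commit.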

Reliably select from a database table at fixed time intervals

I have a fairly 'active' CDR table and I want to select records from it every 5 minutes or so, covering the last 5 minutes. The problem is that it has SHA IDs generated from a few of the other columns, so all I have to lean on is a timestamp field, by which I filter to select the time window of records I want.
The next problem is that I obviously cannot guarantee my script will run precisely on the second every time, or that the server's wall clock will be correct (which doesn't matter), and most importantly there will almost certainly be more than one record per second; say 3 rows at '2013-08-08 14:57:05', and before that second expires one more might be inserted.
By the time I query for '2013-08-08 14:57:05' and get records BETWEEN '2013-08-08 14:57:05' AND '2013-08-08 15:02:05', there may be more records for '2013-08-08 14:57:05' which I will have missed.
Essentially:
imprecise wall clock time
no sequential IDs
multiple records per second
query execution time
unreliable frequency of running the query
are all preventing me from getting a valid set of rows in a specified rolling time window. Any suggestions for how I can get around these?
If you are using the same clock then I see no reason why things would go wrong. A resolution you would want to consider is a datetime table: that way, every time the job runs you update the start and stop times based on the server time, and as things are added they are guaranteed to be within that timeframe.
I mean, you COULD do it by hardcoding, but my way would forcibly store a start and stop point in the database to use.
I would use cron to handle the intervals and timing somewhat: not use the time from it, but just to avoid locking up the database by checking all the time.
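A minimal sketch of that idea: persist the end of the last processed window in a bookmark table and always query from there up to a small safety margin behind NOW(), so late-arriving rows for the boundary second are not missed. The table and column names are assumptions:

```php
// Each run resumes exactly where the previous one stopped, no matter when
// cron actually fired. The 1-second margin behind NOW() avoids reading a
// second that may still be receiving inserts.
$pdo = new PDO('mysql:host=localhost;dbname=calls', 'user', 'pass');

$since = $pdo->query("SELECT last_seen FROM window_bookmark WHERE id = 1")
             ->fetchColumn();
$upper = $pdo->query("SELECT NOW() - INTERVAL 1 SECOND")->fetchColumn();

$stmt = $pdo->prepare(
    "SELECT * FROM cdr WHERE created_at >= ? AND created_at < ?"
);
$stmt->execute([$since, $upper]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// ... process $rows ...

// Advance the bookmark to the end of the window just read.
$pdo->prepare("UPDATE window_bookmark SET last_seen = ? WHERE id = 1")
    ->execute([$upper]);
```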
I probably haven't got all the details, but to answer your question title, "Reliably select from a database table at fixed time intervals"...
I don't think you can even hope for a query to be run at a "second-precise" time.
One key problem with that approach is that you will have to deal with concurrent access and locks. You might be able to send the query at a fixed time, but your query might be waiting on the DB server for several seconds (or be executed against a fairly outdated snapshot of the DB), especially in your case, since the table is apparently "busy".
As a suggestion, if I were you, I would spend some time thinking about queue messaging systems (like http://www.rabbitmq.com/, just to cite one, without presuming it is somehow "your" solution). Anyway, those kinds of tools are probably better suited to your needs.

How to ensure MySQL selects do not interfere?

I have a table in my MySQL DB which basically contains "cron"-like tasks. Basically, a user visits a page and the script (PHP) checks the cron table in the DB, gets the latest 5 tasks that are "available", and executes the scripts related to those tasks.
The only issue I foresee at the moment is that 2 users might get the same tasks. Note that currently I first run an UPDATE query which assigns 5 tasks to the current user. After that I do a SELECT query to get the 5 tasks assigned to the current user, and when he's done I mark the tasks as completed.
Theoretically no 2 users should ever get the same tasks, but I'm uncertain. I'm simply wondering if MySQL has a built-in mechanism to ensure this, or if there are known methods for it?
Thanks.
You want to use Transactions. This way you can ensure that a multi-step operation, such as [UPDATE, SELECT, UPDATE] is either wholly completed, or does not happen at all.
This is a classic concurrency problem; it's worth reading up about concurrency and transactions in general so that you understand the principles. This will help you avoid problems down the line (there are lots of knotty problems in concurrency!).
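A minimal sketch of the [UPDATE, SELECT, UPDATE] flow wrapped in a transaction with PDO; the schema here is an assumption based on the question:

```php
// Claim-then-fetch in one transaction so two concurrent users can never
// be assigned the same tasks.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

try {
    $pdo->beginTransaction();

    // Assign 5 available tasks to this user; the row locks taken here
    // block a concurrent request from claiming the same rows.
    $pdo->prepare(
        "UPDATE cron_tasks SET assigned_to = ?
         WHERE assigned_to IS NULL AND completed = 0
         ORDER BY id LIMIT 5"
    )->execute([$userId]);

    $stmt = $pdo->prepare(
        "SELECT * FROM cron_tasks WHERE assigned_to = ? AND completed = 0"
    );
    $stmt->execute([$userId]);
    $tasks = $stmt->fetchAll(PDO::FETCH_ASSOC);

    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}
```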

High Concurrency Counter (For Credit Management)

I am wondering about the best way to do this for speed and accuracy; here is what our application does:
Check if credit is 1 or above (pre-check)
Process job (takes a little time)
Job complete: check that credits still exist to finish the job (credit count 1 or above)
Deduct credit
Finish job
This process is repeated 50,000+ times (threaded, using a queue system) and currently uses a MySQL database to handle the counter.
Are there any better solutions than a MySQL-based counter?
I was thinking a schema like:
user_id | credit_count
Is this the best schema I should use?
And each thread just locks the row, then deducts the credit, then releases the row for the next thread.
I'm not sure which technology you are using for the process.
You could accumulate the count in the language of your processes, and only dump the counter to the database from time to time.
That way you won't be hitting the database on each request, but every n requests...
Ever looked at memcached? If you want to stick with MySQL, you could change the table to a MEMORY table.
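For what it's worth, the lock-row-then-deduct step from the question can also be collapsed into a single atomic UPDATE, so no explicit lock is needed. A minimal sketch against the proposed user_id | credit_count schema:

```php
// Deduct one credit only if at least one is available, atomically.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$stmt = $pdo->prepare(
    "UPDATE users SET credit_count = credit_count - 1
     WHERE user_id = ? AND credit_count >= 1"
);
$stmt->execute([$userId]);

if ($stmt->rowCount() === 1) {
    // Credit secured: finish the job.
} else {
    // No credits left: abort or requeue the job.
}
```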