High Concurrency Counter (For Credit Management) - mysql

I am wondering about the best way to do this for speed and accuracy; here is what our application does:
Check if credit is 1 or above (Pre check)
Process job (takes a little time)
Job complete, check if still credits exist to finish job (credit count 1 or above)
Deduct credit
Finish job
This process is repeated 50,000+ times (threaded, using a queue system) and currently uses a MySQL database to handle the counter.
Are there any better solutions than a MySQL database-style counter?
I was thinking a schema like:
user_id | credit_count
Is this the best schema I should use?
And the thread just locks the row, then deducts a credit, then releases the row for the next thread.
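With that schema, one option is to collapse the check and the deduction into a single conditional UPDATE, so no explicit row lock is held across the whole job. A minimal sketch, assuming the table and columns from the question (InnoDB, autocommit):

    -- assumed schema from the question
    CREATE TABLE user_credits (
        user_id      INT UNSIGNED NOT NULL PRIMARY KEY,
        credit_count INT UNSIGNED NOT NULL DEFAULT 0
    ) ENGINE=InnoDB;

    -- atomic check-and-deduct: affects one row only if a credit is available
    UPDATE user_credits
       SET credit_count = credit_count - 1
     WHERE user_id = 123
       AND credit_count >= 1;
    -- affected-row count 0 => the credits ran out while the job was processing

With autocommit, the row lock only lasts for that single statement, so the 50,000+ threaded workers queue briefly on the hot row instead of holding it for the length of the job.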

I'm not sure which technology you are using for the process.
You could accumulate the counting in the language of your processes, and only dump the counter to the database from time to time.
You wouldn't be updating the count on each request, but every n requests...
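A sketch of that idea (numbers invented): accumulate deductions per user in the worker process and flush them in one statement every n jobs.

    -- flush 25 accumulated deductions for user 123 in a single round trip
    UPDATE user_credits
       SET credit_count = credit_count - 25
     WHERE user_id = 123
       AND credit_count >= 25;

The trade-off is that the pre-check sees a slightly stale balance between flushes.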

Ever looked at memcached? If you want to stick with MySQL, you could change the table to a MEMORY table.
Also read this.

Related

Running a cron to update 1 million records in every hour fails

We have an e-commerce system with more than 1 million users and a total of 4 to 5 million records in the order table. We use the CodeIgniter framework as the back end and MySQL as the database.
Because of this large number of users and purchases, we use cron jobs to update the order details and referral bonus points every hour to keep things working.
Now we have a situation where these data updates take more than one hour and the next batch of updates arrives before the previous one finishes, thereby leading to deadlocks and failure of the system.
I'd like to know about the different possible architectural and database scaling options and suggestions to get out of this situation. We are running this application as a monolithic architecture.
Don't use cron. Have a single process that starts over when it finishes. If one pass lasts more than an hour, the next one will start late. (Checking PROCESSLIST is clumsy and error-prone. OTOH, this continually-running approach needs a "keep-alive" cronjob.)
Don't UPDATE millions of rows. Instead, find a way to put the desired info in a separate table that the user joins to. Presumably, that extra table would have only 1 row (if everyone is controlled by the same game) or a small number of rows (if there are only a small number of patterns to handle).
Do have the slowlog turned on, with a small value for long_query_time (possibly "1.0", maybe lower). Use pt-query-digest to summarize it to find the "worst" queries. Then we can help you make them take less time, thereby helping to calm your busy system and improve the 'user experience'.
Do use batched INSERT. (One INSERT with 100 rows runs about 10 times as fast as 100 single-row INSERTs.) Batching UPDATEs is tricky, but can be done with IODKU (INSERT ... ON DUPLICATE KEY UPDATE); see the sketch after this list.
Do use batches of 100-1000 rows. (This is somewhat optimal considering the various things that can happen.)
Do use transactions judiciously. Do check for errors (including deadlocks) at every step.
Do tell us what you are doing in the hourly update. We might be able to provide more targeted advice than that 15-year-old book.
Do realize that you have scaled beyond the capabilities of the typical 3rd-party package. That is, you will have to learn the details of SQL.
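A rough illustration of the batched INSERT and IODKU points above (the bonus_points table and its columns are invented for the example):

    -- one multi-row INSERT instead of 100 single-row INSERTs
    INSERT INTO bonus_points (user_id, points) VALUES
        (1, 10), (2, 25), (3, 5);          -- ...continue up to a few hundred rows per batch

    -- batched "UPDATE" via IODKU: rows that already exist (matched on the
    -- primary/unique key) are updated, new ones are inserted
    INSERT INTO bonus_points (user_id, points) VALUES
        (1, 10), (2, 25), (3, 5)
    ON DUPLICATE KEY UPDATE points = points + VALUES(points);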
I have some ideas here for you - mixed up with some questions.
Assuming you are limited in what you can do (i.e. you can't re-architect your way out of this) and that the database can't be tuned further:
Make the list of records to be processed as small as possible
i.e. Does the job have to run over all records? These 4-5 million records - are they all active orders, or that's how many you have in total for all time? Obviously just process the bare minimum.
Split and parallel process
You mentioned "batches" but never explained what that meant - can you elaborate?
Can you get multiple instances of the cron job to run at once, each covering a different segment of the records?
Multi-Record Operations
The easy (lazy) way to program updates is to do it in a loop that iterates through each record and processes it individually, but relational databases can do updates over multiple records at once. I'm pretty sure there's a proper term for that but I can't recall it. Are you processing each row individually or doing multi-record updates?
How does the cron job query the database? Have you hand-crafted the most efficient queries possible, or are you using some ORM / framework to do stuff for you?
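To make the Multi-Record Operations point concrete, here is the difference sketched in SQL (the orders table and columns are invented for the example):

    -- row-by-row, which is what an ORM loop typically does:
    -- one statement and one round trip per record
    -- UPDATE orders SET bonus_points = 12 WHERE order_id = 1;
    -- UPDATE orders SET bonus_points = 7  WHERE order_id = 2;
    -- ...repeated millions of times

    -- set-based: a single statement updates every qualifying row at once
    UPDATE orders
       SET bonus_points = ROUND(order_total * 0.01)
     WHERE status = 'completed'
       AND bonus_points IS NULL;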

Using table locking to prevent multiple users from updating at a given time

I am building a simple shopping cart. Currently, to ensure that a customer can never purchase a product that is out of stock, when processing the order I have a loop for each product in their cart:
-- Begin a transaction --
Loop through each product in the cart and
Select the stock count from the products table
If it is in stock:
I reduce the product's stock count
Add the product to the order items table
Otherwise, I call a rollback and return an error
-- If there is no call for a rollback, everything ends with a commit --
However, if the stock count for a product is updated by another request after it has been checked for that particular product, there may be inconsistencies.
Question: would it be a good idea to lock the table from writes whenever I am processing an order, so that while the 'loop' above runs, I can be assured that no one else is able to alter the product count and it will always be accurate?
The idea is that the product count/availability will always be consistent, and there will never be an instance where the stock count goes to -1 (which would be unfulfillable).
However, I have seen so many posts on locks being inefficient/having bad effects. If so, what is the best way to accomplish this?
I have seen alternatives like handling it in an update + select query, but have seen that it may also not be suitable in some cases.
You have at least three strategies:
1. Pessimistic Locking
If your application will experience low activity then you can lock the tables (or single rows) to make sure no other thread changes the values during the processing of a purchase. It works, but it has performance limitations.
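A minimal sketch of this in MySQL/InnoDB, assuming a products table with id and stock columns:

    START TRANSACTION;

    -- lock only this product's row until COMMIT/ROLLBACK
    SELECT stock FROM products WHERE id = 42 FOR UPDATE;

    -- if the selected stock covers the quantity ordered:
    UPDATE products SET stock = stock - 1 WHERE id = 42;
    INSERT INTO order_items (order_id, product_id, quantity) VALUES (1001, 42, 1);

    COMMIT;    -- otherwise ROLLBACK and report the item as out of stock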
2. Optimistic Locking
If your application/web site must serve a high load then you can opt for the "optimistic locking" strategy. In this case you add a version number column to your critical tables and then you use it when reading/writing it.
When updating the stock, you check that the version number you are updating is still the same one you read. If it is not (another thread modified it), you roll back the transaction and can retry a couple of times until you succeed.
It requires more development effort since you need to identify the bad case and implement retry logic (if you want to).
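A sketch of the version-number approach (column names invented):

    -- read the current stock together with its version
    SELECT stock, version FROM products WHERE id = 42;   -- say it returns stock = 3, version = 7

    -- write back only if nobody changed the row in the meantime
    UPDATE products
       SET stock = stock - 1,
           version = version + 1
     WHERE id = 42
       AND version = 7;          -- the version read above
    -- 0 rows affected => another thread got there first: roll back and retry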
3. Processing Queues
You can implement processing queues. When a thread wants to "purchase an order" it can submit it to a processing queue for purchase orders. This queue can be implemented by one or more threads dedicated to this task; if you choose multiple threads they can be divided by order types, regions, categories, etc. to distribute the load.
This requires more programming effort since you need to manage asynchronous processing, but can sustain much higher levels of load.
You can use this strategy for multiple different tasks: purchasing orders, refilling stock, sending notifications, processing promotions, etc.

Derived vs Stored account balance in high rate transactions system

I'm writing a Spring Boot 2.x application using Mysql as DBMS. I use Spring Data and Hibernate.
I want to realize a SMS gateway for my customers. Each customer has an account in my system and a balance.
For each SMS sent, the balance of the customer must be reduced by the SMS cost. Furthermore, before sending the SMS the balance should be checked to see if the customer has enough credit (this implies having an up-to-date balance to check).
I want to handle a high rate of sms because customers are business and not just final users.
Each customer could therefore send hundreds of SMS in a really short time. I'm looking for an efficient way to update the customer's balance. Each transaction has a small price, but I have a lot of them.
I could derive the balance with a SELECT SUM(deposit-costs) FROM... but this would become very expensive as soon as I have millions of records in my system.
On the other hand, if I keep the value of the balance in a column, I would have two problems:
concurrency problem: I could have many transactions at the same time that want to update the balance. I could use pessimistic locking, but that would slow down the entire system
correctness of the data: the balance could become wrong due to some wrong/missed update
I could mitigate these points running a task at the end of the day to fix the stored balance with value of the derived one, but:
if I have hundreds of customers it could stall my system for some time
some attentive customer could notice the change in his balance and ask for an explanation. It's not nice for your balance to change without explanation when you are not doing anything
I'm looking for some advice and best practice to follow. In the end several big companies are selling their service "pay as you go", so I guess there is a common way to handle the problem.
In banking, people are quite careful about money. Generally, the "place for truth" is the database. You can make the "place for truth" memory, but this is more sophisticated, requiring concurrent in-memory databases. What if one of your servers goes down in the middle of a transaction? You need to be able to quickly fail over the database to a backup.
Do a benchmark to see if database update times meet your needs. There are various ways to speed them up moderately. If those rates are in your acceptable range, then do it this way. It is the simplest.
A common approach to speed up txn times is to have a threadpool and assign one thread to an account. This way all txns on an account are always handled by the same thread. This allows further optimization.
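One common compromise, sketched here under assumed table names rather than taken from the answer above, is to keep both: a stored balance updated by a conditional UPDATE, and an append-only ledger row written in the same transaction, so the balance stays fast to check yet auditable and recomputable if it ever drifts:

    START TRANSACTION;

    -- deduct only if the customer can afford the message
    UPDATE accounts
       SET balance = balance - 0.05
     WHERE customer_id = 42
       AND balance >= 0.05;

    -- only if the UPDATE affected one row: record the movement for auditing
    INSERT INTO account_ledger (customer_id, amount, reason)
    VALUES (42, -0.05, 'sms');

    COMMIT;    -- if the UPDATE affected 0 rows, ROLLBACK and reject the SMS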

How do I track progress on a queue job?

I am using the database queue driver in Laravel to run jobs in the background.
One of my jobs creates a given number (thousands to hundreds of thousands) of records in the database. I wrapped the code for this job in a transaction so that if the job failed, the database writes would not be committed.
Initially, to track progress of the job, I thought I would count the number of created records, divide by the total number of expected records, and then display that in a UI as a percentage against each job, so that users can know how much longer they have to wait.
This doesn't work because the tables are locked during the transaction.
I am wondering if anybody knows how to track progress on a queued job.
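One workaround, sketched here on the assumption that you can add a table, is to report progress through a second database connection running in autocommit mode, so the counter is visible outside the job's big transaction:

    -- hypothetical progress table, written via a separate connection
    CREATE TABLE job_progress (
        job_id     VARCHAR(64) NOT NULL PRIMARY KEY,
        done_rows  INT UNSIGNED NOT NULL,
        total_rows INT UNSIGNED NOT NULL
    );

    -- issued every few thousand inserts, outside the main transaction
    INSERT INTO job_progress (job_id, done_rows, total_rows)
    VALUES ('abc123', 5000, 100000)
    ON DUPLICATE KEY UPDATE done_rows = VALUES(done_rows);

    -- the UI polls:
    -- SELECT done_rows / total_rows AS fraction_done FROM job_progress WHERE job_id = 'abc123';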
For those who stumble on this question, there is a package which allows that: https://github.com/imTigger/laravel-job-status
As given in http://laravel.com/docs/5.1/queues#job-events
The Queue::after method lets you register a callback that is called once a job has completed successfully
As given in http://laravel.com/docs/5.1/queues#failed-job-events
The Queue::failing method lets you register a callback that is called when a queued job fails
Hope this is helpful :)

Using a table to keep the last used ID in a web server farm

I use a table with one row to keep the last used ID (I have my reasons not to use auto_increment). My app should work in a server farm, so I wonder how I can update the last inserted ID (i.e. increment it) and select the new ID in one step, to avoid thread-safety problems (race conditions between servers in the server farm).
You're going to use a server farm for the database? That doesn't sound "right".
You may want to consider using GUIDs for IDs. They may be big, but they don't have duplicates.
With a single "next id" value you will run into locking contention for that record. What I've done in the past is use a table of ranges of IDs (RangeId, RangeFrom, RangeTo). The range table has a primary key of "RangeId" that is a simple number (e.g. 1 to 100). The "get next id" routine picks a random number from 1 to 100 and gets the first range record with an id lower than the random number. This spreads the locks out across N records. You can use tens, hundreds, or thousands of range records. When a range is fully consumed, just delete the range record.
If you're really using multiple databases then you can manually ensure each database's set of range records do not overlap.
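A rough sketch of that range-table idea (names and numbers invented):

    CREATE TABLE id_ranges (
        range_id   INT    NOT NULL PRIMARY KEY,   -- e.g. 1 to 100
        range_from BIGINT NOT NULL,               -- next id to hand out
        range_to   BIGINT NOT NULL
    ) ENGINE=InnoDB;

    -- "get next id": the application picks a random bucket (say 37)
    -- and locks only the one matching row
    START TRANSACTION;
    SELECT range_id, range_from FROM id_ranges
     WHERE range_id <= 37
     ORDER BY range_id DESC
     LIMIT 1
     FOR UPDATE;
    UPDATE id_ranges SET range_from = range_from + 1 WHERE range_id = 35;  -- the row just read
    COMMIT;
    -- when range_from passes range_to, delete that range record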
You need to make sure that your ID column is only ever accessed under a lock, so that only one client can read the highest ID and set the new one.
You can do this in C# using a lock statement around the code that accesses the table, or in your database you can put the read/write together in a transaction. I don't know the exact syntax for this in MySQL.
Use a transactional database and control transactions manually. That way you can submit multiple queries without risking having something mixed up. Also, you may store the relevant query sets in stored procedures, so you can simply invoke these transactional queries.
If you have problems with performance, increment the ID by 100 and use a thread per "client" server. The thread should do the increment and hand each interested party a new ID. This way, the thread needs only access the DB once for 100 IDs.
If the thread crashes, you'll lose a couple of IDs, but if that doesn't happen all the time, you shouldn't need to worry about it.
AFAIK the only way to get this out of a DB with nicely incrementing numbers is going to be transactional locks at the DB, which is hideous performance-wise. You can get lock-free behaviour using GUIDs, but frankly you're going to run into transaction requirements in every CRUD operation you can think of anyway.
Assuming that your database is configured to run with a transaction isolation of READ_COMMITTED or better, then use one SQL statement that updates the row, setting it to the old value selected from the row plus an increment. With lower levels of transaction isolation you might need to use INSERT combined with SELECT FOR UPDATE.
As pointed out [by Aaron Digulla] it is better to allocate blocks of IDs, to reduce the number of queries and table locks.
The application must perform the ID acquisition in a separate transaction from any business logic, otherwise any transaction that needs an ID will end up waiting for every transaction that asks for an ID first to commit/rollback.
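In MySQL, the update-then-read-in-one-step pattern described above is often written with LAST_INSERT_ID(expr), whose value is per-connection and therefore race-free across a server farm (a sketch; the table name is assumed):

    -- one-row table holding the last used id
    UPDATE id_sequence
       SET last_id = LAST_INSERT_ID(last_id + 1);

    -- returns the value just set, local to this connection
    SELECT LAST_INSERT_ID();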
This article: http://www.ddj.com/architect/184415770 explains the HIGH-LOW strategy that allows your application to obtain IDs from multiple allocators. Multiple allocators improve concurrency, reliability and scalability.
There is also a long discussion here: http://www.theserverside.com/patterns/thread.tss?thread_id=4228 "HIGH/LOW Singleton+Session Bean Universal Object ID Generator"