Derived vs Stored account balance in high rate transactions system - mysql

I'm writing a Spring Boot 2.x application using MySQL as the DBMS. I use Spring Data and Hibernate.
I want to build an SMS gateway for my customers. Each customer has an account in my system and a balance.
For each SMS sent, the SMS cost must be subtracted from the customer's balance. Furthermore, before sending the SMS, the balance should be checked to see whether the customer has enough credit (which implies having an up-to-date balance to check).
I want to handle a high rate of SMS because my customers are businesses, not just end users.
Each customer could therefore send hundreds of SMS in a very short time. I'm looking for an efficient way to update the customer's balance. Each transaction has a small price, but I have a lot of them.
I could derive the balance with a SELECT SUM(deposit-costs) FROM... but this would become very expensive as soon as I have millions of records in my system.
On the other hand, if I keep the value of the balance in a column, I would have two problems:
concurrency problem: I could have many transactions at the same time that want to update the balance. I could use a pessimistic lock, but that would slow down the entire system
correctness of the data: the balance could be wrong due to a wrong or missed update
I could mitigate these points by running a task at the end of the day that overwrites the stored balance with the derived one, but:
if I have hundreds of customers, it could stall my system for some time
an attentive customer could notice the change in their balance and ask for an explanation. It's not nice when your balance changes without explanation while you are not doing anything
I'm looking for advice and best practices to follow. After all, several big companies sell their services "pay as you go", so I guess there is a common way to handle this problem.

In banking, people are quite careful about money. Generally, the "place for truth" is the database. You can make memory the "place for truth", but this is more sophisticated and requires concurrent in-memory databases. What if one of your servers goes down in the middle of a transaction? You need to be able to quickly fail over the database to a backup.
Run a benchmark to see whether database update times meet your needs. There are various ways to speed them up moderately. If the rates are in your acceptable range, then do it this way. It is the simplest.
A common approach to speed up transaction times is to have a thread pool and assign one thread to an account. This way all transactions on an account are always handled by the same thread. This allows further optimization.
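A minimal sketch of the one-thread-per-account idea, in Python rather than Spring for brevity. The names AccountExecutor and charge, the number of lanes, and the in-memory balances dict (standing in for the balance column) are all assumptions made for illustration, not part of the original answer.

```python
from concurrent.futures import ThreadPoolExecutor


class AccountExecutor:
    """Routes every task for one account to the same single-thread lane."""

    def __init__(self, workers=4):
        # Each lane is an executor with exactly one worker thread.
        self._lanes = [ThreadPoolExecutor(max_workers=1) for _ in range(workers)]

    def submit(self, account_id, fn, *args):
        # The same account id always maps to the same lane within this process.
        lane = self._lanes[hash(account_id) % len(self._lanes)]
        return lane.submit(fn, *args)

    def shutdown(self):
        for lane in self._lanes:
            lane.shutdown(wait=True)


balances = {"acme": 100}  # hypothetical balance in cents


def charge(account_id, cost):
    # No lock is taken here: only one thread ever touches a given account.
    if balances[account_id] < cost:
        return False
    balances[account_id] -= cost
    return True


pool = AccountExecutor(workers=4)
futures = [pool.submit("acme", charge, "acme", 1) for _ in range(150)]
accepted = sum(f.result() for f in futures)
pool.shutdown()
print(accepted, balances["acme"])
```

Because all charges for one account are serialized on one thread, the check and the decrement cannot interleave with another charge for the same account. Persisting the balance to the database (per charge or in batches) is left out of this sketch.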

Related

Should you scale through tables or computation in Mysql?

I have a project with customers buying a product with platform-based tokens. I have one MySQL table that tracks a customer buying x amount and one tracking customer consumption (-x amount). In order to display the amount of tokens they have left on the platform, and to check the remaining funds when they spend, I wanted to query (buys - consumed). But I remembered that people always say space is cheaper than computation (not just in $ but in query time as well). Should I have a separate table for querying the amount that gets updated with each buy or consume?
So far I have always tried to use the fewest tables possible to keep things simple and easy to oversee, but I'm starting to question whether that is right...
There is no right answer. Keep in mind the goal of the application and the software updates that are likely to happen.
If these two tables hold every transaction a user makes, then the extra column would be necessary, because otherwise you would have to sum the rows. If there is one row per user (likely your case), then in 90% of cases you should use those two tables only.
I would suggest you not have that extra column. In my experience, the downside in this kind of situation is that the bigger the project becomes, the harder it is for you and the other developers to remember to update the new column, because it is a dependent variable.
Also, when the user buys products or consumes tokens, you will have to update the new value as well, which costs energy and time.
You can store (buys - consumed) in the session and update it when needed (if real-time updates are not necessary and the user is not on multiple devices).
If you need continuous updates, meaning many queries over time, then the energy and time saved outweigh the memory spent, so you should have that third table or column.

MySQL query design for booking database with limited amount of goods and heavy traffic

We're running a site for booking salmon fishing licenses. The site has no problem handling the traffic 364 days a year. The 365th day is when the license sale opens, and that's where the problem occurs. The servers are struggling more and more each year due to increased traffic, and we have to further optimize our booking query.
The licenses are divided into many different types (tbl_license_types), and each license type is connected to one or more fishing zones (tbl_zones).
Each license type can have a seasonal quota, which is a single value set as an integer field in tbl_license_types.
Each zone can have a daily quota, a weekly quota and a seasonal quota. The daily quota is the same for all days, and the seasonal quota of course is a single value. Daily and seasonal are therefore integer fields in tbl_zones. The weekly quota however differs by week, therefore those are specified in the separate tbl_weekly_quotas.
Bookings can be for one or more consecutive dates, but are only stated as From_date and To_date in tbl_shopping_cart (and tbl_bookings). For each booking attempt made by a user, the quotas have to be checked against already allowed bookings in both tbl_shopping_cart and tbl_bookings.
To be able to count/group by date, we use a view called view_season_calendar with a single column containing all the dates of the current season.
In the beginning we used a transaction where we first made a query to check the quotas, and if quotas allowed we would use a second query to insert the booking to tbl_bookings.
However that gave a lot of deadlocks under relatively moderate traffic, so we redesigned it to a single query (pseudo-code):
INSERT INTO tbl_bookings (_booking_)
WHERE _lowest_found_quota_ >= _requested_number_of_licenses_
where _lowest_found_quota_ is a ~330 lines long SELECT with multiple subqueries and the related tables being joined multiple times in order to check all quotas.
Example: User wants to book License type A, for zones 5 and 6, from 2020-05-19 to 2020-05-25.
The system needs to
count previous bookings of license type A against the license type A seasonal quota.
count previous bookings in zone 5 for each of the 6 dates against zone 5 daily quota.
same count for zone 6.
count previous bookings in zone 5 for each of the two weeks the dates are part of, against zone 5 weekly quota.
same count for zone 6.
count all previous bookings in zone 5 for the current season against zone 5 seasonal quota.
same count for zone 6.
If all are within quotas, insert the booking.
As I said, this was working well earlier, but due to higher traffic load we need to optimize this query further now. I have some thoughts on how to do this:
1. Using isolation level READ UNCOMMITTED on each booking until quotas for the requested zones/license type are nearly full, then falling back to the default REPEATABLE READ. As long as there's a lot left of the quota, the count doesn't need to be 100% correct. This will greatly reduce lock waits and deadlocks, right?
2. Creating one or more views which keep count of all bookings for each date, week, zone and license type, then using those views in the WHERE clauses of the insert.
3. If doing nr 2, use READ UNCOMMITTED in the views. If the views report that a relevant quota is nearly full, cancel the INSERT and start a new one with the design we're using today. (Hopefully traffic levels come down before the quotas fill up.)
I would greatly appreciate thoughts on how the query can be done as efficient as possible.
Rate Per Second = RPS
Suggestions to consider for your AWS Parameters group
innodb_lru_scan_depth=100 # from 1024 to conserve 90% of CPU cycles used for function every second
innodb_flush_neighbors=2 # from 1 to reduce time required to lower innodb_buffer_pool_pages_dirty count when busy
read_rnd_buffer_size=128K # from 512K to reduce handler_read_rnd_next RPS of 3,227
innodb_io_capacity=1900 # from 200 to use more of SSD IOPS
log_slow_verbosity=query_plan,explain # to make slow query log more useful
have_symlink=NO # from YES for some protection from ransomware
You will find these changes will cause transactions to complete processing quicker. For additional assistance, view profile, Network profile for contact info and free downloadable Utility Scripts to assist with performance tuning. On our FAQ page you will find "Q. How can I find JOINS or QUERIES not using indexes?" to assist in reducing select_scan RPhr of 1,173. Com_rollback averages 1 every ~ 2,700 seconds, usually correctable with consistent read order in maintenance queries.
See if you can upgrade the AWS starting a day before the season opens, then downgrade after the rush. It's a small price to pay for what might be a plenty big performance boost.
Rather than the long complex query for counting, decrement some counters as you go. (This may or may not help, so play around with the idea.)
Your web server has some limit on the number of connections it will handle; limit that rather than letting 2K users get into MySQL and stumble over each other. Think of what a grocery store is like when the aisles are so crowded that no one is getting finished!
Be sure to use "transactions", but don't let them be too broad. If they encompass too many things, the traffic will degenerate to single file (and/or transactions will abort with deadlocks).
Do as much as you can outside of transactions -- such as collecting and checking user names/addresses, etc. If you do this after issuing the license, be ready to undo the license if something goes wrong. (This should be done in code, not via ROLLBACK.)
(More)
VIEWs are syntactic sugar; they do not provide any performance or isolation. OTOH, if you make "materialized" views, there might be something useful.
A long "History list" is a potential performance problem (especially CPU). This can happen when lots of connections are in the middle of a transaction at the same time -- each needs to hang onto its 'snapshot' of the dataset.
Wherever possible, terminate transactions as soon as possible -- even if you turn around and start a new one. An example in Data Warehousing is to do the 'normalization' before starting the main transaction. (Probably this example does not apply to your app.)
Ponder having a background task computing the quotas. The hope is that the regular tasks can run faster by not having the computation inside their transactions.
A technique used in the reservation industry: (And this sounds somewhat like your item 1.) Plow ahead with minimal locking. At the last moment, have the shortest possible transaction to make the reservation and verify that the room (plane seat, etc) is still available.
If the whole task can be split into (1) read some stuff, then (2) write (and reread to verify that the thing is still available), then... If the read step is heavier than the write step, add more Slaves ('replicas') and use them for the read step. Reserve the Master for the write step. Note that Replicas are easy (and cheap) to add and toss for a brief time.
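The "decrement some counters" and "shortest possible transaction" suggestions above could look roughly like the sketch below. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the table zone_day_quota, its columns, and the function try_book are hypothetical names invented for this example; only the daily zone quota is modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE zone_day_quota ("
    " zone_id INTEGER, day TEXT, remaining INTEGER,"
    " PRIMARY KEY (zone_id, day))"
)
conn.executemany(
    "INSERT INTO zone_day_quota VALUES (?, ?, ?)",
    [(5, "2020-05-19", 2), (5, "2020-05-20", 1)],
)
conn.commit()


def try_book(zone_id, days, licenses):
    """Decrement one counter per day; undo everything if any day is full."""
    try:
        with conn:  # one short transaction: commit on success, roll back on error
            for day in days:
                cur = conn.execute(
                    "UPDATE zone_day_quota SET remaining = remaining - ? "
                    "WHERE zone_id = ? AND day = ? AND remaining >= ?",
                    (licenses, zone_id, day, licenses),
                )
                # rowcount is 0 when the quota check in the WHERE clause failed.
                if cur.rowcount != 1:
                    raise LookupError(f"quota exhausted for zone {zone_id} on {day}")
        return True
    except LookupError:
        return False


def remaining(zone_id, day):
    row = conn.execute(
        "SELECT remaining FROM zone_day_quota WHERE zone_id = ? AND day = ?",
        (zone_id, day),
    ).fetchone()
    return row[0]


first = try_book(5, ["2020-05-19", "2020-05-20"], 1)
second = try_book(5, ["2020-05-19", "2020-05-20"], 1)
print(first, second, remaining(5, "2020-05-19"), remaining(5, "2020-05-20"))
```

The check and the decrement happen in the same UPDATE statement, so there is no separate counting query inside the transaction. Whether this beats the single big INSERT ... WHERE query under real load would have to be measured.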

Accounting System: How to Implement Account Closing and Manage Billions of Transactions Created by Multiple Users?

I am implementing an accounting system in CodeIgniter using a MySQL relational database, and I'm facing some issues. I am recording all transactions in a "Transaction Table", which can be seen as the General Ledger in accounting terms. There could be more than a thousand users of the system, so there could be billions of transactions in my "Transaction Table".
To extract accounting reports like a Balance Sheet, I need to search all of the transactions of a specific user and then perform other operations to form an "Accounting Report".
Now to the issue: transaction growth is very high, which would definitely slow down my system. How can I handle this?
I came up with a solution, "account closing": if I close accounts every month and prevent users from creating, updating and deleting in closed periods, then I'll be able to use pre-calculated account values for my accounting reports. But I cannot prevent users from VIEWING previous transactions, so in that case the system will have to search for their transactions among billions of rows, which does not solve my problem.
I'm thinking about moving closed transactions to a separate table for every user, which could solve the problem, but my database is normalized (3NF). Would it be a good idea to create a separate table for every user when creating that user's account, and to manage that newly created table's relations?
[Image: Transaction Table]
Your model may be wrong. You assume one credited and one debited account per transaction. Often there are 3 or more accounts taking part in a transaction.
Regarding your scale question - don't worry about it now.
Get your product to be successful. If you get close to a billion entries, hire a consultant to help you out. There are many techniques he or she could use, sharding being one of the keywords.
If you don't expect to be able to afford a consultant once you have 1 billion entries in a multi-user accounting system, your business model is flawed and you're getting yourself into big trouble down the road. If that's the case, it's better to rethink it now.
By my estimate, with your current structure the table would hold about 0.5 GB per billion entries. MySQL should handle this with no issue, unless your server has space limitations.

Decrementing money balance stored on master server from numerous backends? (distributed counter, eh?)

I have some backend servers located in two different datacenters (in the USA and in Europe). These servers are just delivering ads on a CPM basis.
Besides that, I have a big & fat master MySQL server holding the money balances of advertisers' ad campaigns. Again, all ad campaigns are delivered on a CPM basis.
On every impression served from any of the backends, I have to decrement the ad campaign's money balance according to the impression price.
For example, the price per impression is 1 cent. Backend A has delivered 50 impressions and will decrement the money balance by 50 cents. Backend B has delivered 30 impressions and will decrement the money balance by 30 cents.
So, main problems as I see are:
Backends are serving about 2-3K impressions every second. So, decrementing the money balance on the fly in MySQL is not a good idea imho.
Backends are located in US and EU datacenters. MySQL master server is located in USA. Network latency could be a problem [EU backend] <-> [US master]
As possible solutions I see:
Using Cassandra as distributed counter storage. I will try to avoid this solution as long as possible.
Reserving part of the money per backend. For example, backend A connects to the master and tries to reserve $1. Once $1 is reserved and stored locally on the backend (in a local Redis, for example), there is no problem decrementing it at light speed. The main problem I see is returning money from the backend to the master server if the backend is removed from the delivery scheme ("disconnected" from the balancer). Anyway, it seems to be a very nice solution and will allow us to stay in the current technology stack.
Any suggestions?
UPD: One important addition. It is not so important to deliver ads impressions with high precision. We can deliver more impressions than requested, but never less.
How about instead of decrementing balance, you keep a log of all reported work from each backend, and then calculate balance when you need it by subtracting the sum of all reported work from the campaign's account?
Tables:
campaign (campaign_id, budget, ...)
impressions (campaign_id, backend_id, count, ...)
Report work:
INSERT INTO impressions VALUES ($campaign_id, $backend_id, $served_impressions);
Calculate balance of a campaign only when necessary:
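SELECT campaign.budget - COALESCE(SUM(impressions.count), 0) * $impression_price AS balance
FROM campaign LEFT JOIN impressions USING (campaign_id)
WHERE campaign.campaign_id = $campaign_id
GROUP BY campaign.campaign_id, campaign.budget;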
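For illustration, here is a small self-contained sketch of this log-and-sum approach, using Python's built-in sqlite3 in place of MySQL. The helper names report_work and balance are made up for the example, amounts are kept in integer cents, and the numbers come from the question (50 and 30 impressions at 1 cent each); the budget of 1000 cents is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE campaign (campaign_id INTEGER PRIMARY KEY, budget INTEGER);
    CREATE TABLE impressions (campaign_id INTEGER, backend_id TEXT, count INTEGER);
""")
conn.execute("INSERT INTO campaign VALUES (1, 1000)")  # assumed budget, in cents


def report_work(campaign_id, backend_id, served):
    # Append-only: each backend reports how many impressions it served.
    conn.execute(
        "INSERT INTO impressions VALUES (?, ?, ?)",
        (campaign_id, backend_id, served),
    )


def balance(campaign_id, price_cents):
    # The balance is derived on demand: budget minus the sum of reported work.
    row = conn.execute(
        "SELECT c.budget - COALESCE(SUM(i.count), 0) * ? "
        "FROM campaign c LEFT JOIN impressions i ON i.campaign_id = c.campaign_id "
        "WHERE c.campaign_id = ? "
        "GROUP BY c.campaign_id, c.budget",
        (price_cents, campaign_id),
    ).fetchone()
    return row[0]


report_work(1, "A", 50)
report_work(1, "B", 30)
print(balance(1, 1))
```

Since backends only insert rows, they never contend for the same balance row. The cost moves to the read side, where the sum grows with the number of reports unless old rows are rolled up periodically.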
This is perhaps the most classical ad-serving/impression-counting problem out there. You're basically trying to balance a few goals:
Not under-serving ad inventory, thus not making as much money as you could.
Not over-serving ad inventory, thus serving for free since you can't charge the customer for your mistake.
Not serving the impressions too quickly, because usually customers want an ad to run through a given calendar time period, and serving them all in an hour between 2-3 AM makes those customers unhappy and doesn't do them any good.
This is tricky because you don't necessarily know how many impressions will be available for a given spot (since it depends on traffic), and it gets even more tricky if you do CPC instead of CPM, since you then introduce another unknowable variable of click-through rate.
There isn't a single "right" pattern for this, but what I have seen to be successful through my years of consulting is:
Treat the backend database as your authoritative store. Partition it by customer as necessary to support your goals for scalability and fault tolerance (limiting possible outages to a fraction of customers). The database knows that you have an ad insertion order for e.g. 1000 impressions over the course of 7 days. It is periodically updated (minutes to hours) to reflect the remaining inventory and some basic stats to bootstrap the cache in case of cache loss, such as actual
Don't bother with money balances at the ad server level. Deal with impression counts, rates, and targets only. Settle that to money balances after the fact through logging and offline processing.
Serve ad inventory from a very lightweight and fast cache (near the web servers) which caches the impression remaining count and target serving velocity of an insertion order, and calculates the actual serving velocity.
Log all served impressions with relevant data.
Periodically collect serving velocities and push them back to the database.
Periodically collect logs and calculate actual served inventory and push it back to the database. (You may need to recalculate from logs due to outages, DoSes, spam, etc.)
Create a service on your big & fat master MySQL server serving advertiser's ad campaign's money balances.
This service must implement a getCampaignFund(idcampaign, requestingServerId, currentLocalAccountBalanceAtTheRequestingServer) method that returns a creditLimit to the regional server.
Imagine a credit card mechanism. Your master server gives a limit to each of your regional servers. Once the remaining limit drops below a threshold, the regional server triggers this request to get a new limit. But to get the new credit limit, the regional server must report how much of the previous limit it has used.
Your regional servers might additionally implement these services:
currentLocalCampaignAccountBalance getCampaignAccountBalance(idcampaign): to report the current usage of a specific campaign, so the main server can update all campaigns at a specific time.
addCampaign(idcampaign, initialBalance): to register a new campaign and its starting credit limit.
suspendCampaign(idcampaign): to suspend impressions for a campaign.
resumeCampaign(idcampaign): to resume impressions for a campaign.
currentLocalCampaignAccountBalance finishCampaign(idcampaign): to finish a campaign and return the current local account balance.
currentLocalCampaignAccountBalance updateCampaignLimit(idcampaign, newCampaignLimit): to update the limit (reallocation of credit between regional servers). This service updates the campaign credit limit and returns the account balance left from the previously acquired credit limit.
Services are great because they give you a loosely coupled architecture. Even if your main server goes offline for some time, your regional servers will keep running until they have used up their credit limits.
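A rough sketch of the master side of this credit-limit mechanism, under a few assumptions that are not in the answer: amounts are integer cents, the master hands out at most a fixed chunk per request, and any unused part of the previous limit goes back into the pool before a new limit is granted. The class name MasterLedger and its fields are hypothetical.

```python
class MasterLedger:
    """Master-side bookkeeping for per-server credit limits (illustrative only)."""

    def __init__(self, budgets, chunk):
        self.unallocated = dict(budgets)       # campaign -> cents not yet granted
        self.spent = {c: 0 for c in budgets}   # campaign -> cents confirmed spent
        self.granted = {}                      # (campaign, server) -> outstanding limit
        self.chunk = chunk                     # assumed maximum limit per request

    def get_campaign_fund(self, campaign, server, local_balance):
        # The server reports what is left of its previous limit; the rest was spent.
        previous = self.granted.get((campaign, server), 0)
        self.spent[campaign] += previous - local_balance
        self.unallocated[campaign] += local_balance  # unused credit returns to the pool
        limit = min(self.chunk, self.unallocated[campaign])
        self.unallocated[campaign] -= limit
        self.granted[(campaign, server)] = limit
        return limit


master = MasterLedger({"c1": 250}, chunk=100)
eu = master.get_campaign_fund("c1", "eu", 0)   # first grant for the EU server
us = master.get_campaign_fund("c1", "us", 0)   # first grant for the US server
eu = master.get_campaign_fund("c1", "eu", 0)   # EU has used its whole limit
us = master.get_campaign_fund("c1", "us", 30)  # US still had 30 cents left
print(eu, us, master.spent["c1"], master.unallocated["c1"])
```

At any time, spent plus unallocated plus the outstanding limits adds up to the campaign budget, so a regional server that disappears can cost at most its last granted limit.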
this may not be a detailed canonical answer but i'll offer my thoughts as possible [and at least partial] solutions.
i'll have to guess a bit here because the question doesn't say much about what measurements have been taken to identify mysql bottlenecks, which imho is the place to start. i say that because imho 1-2k transactions per second is not out of range for mysql. i've easily supported volumes this high [and much higher] with some combination of the following techniques, in no particular order here because it depends on what measurements tell me are the bottlenecks: 0-database redesign; 1-tuning buffers; 2-adding ram; 3-solid state drives; 4-sharding; 5-upgrading to mysql 5.6+ if on 5.5 or lower. so i'd take some measurements and apply the foregoing as called for by the results of the measurements.
hope this helps.
I assume
Ads are probably bought in batches of at least a couple of thousands
There are ads from several different batches being delivered at the same time, not all of which will be near empty at the same time
It is OK to serve some extra ads if your infrastructure is down.
So, here's how I would do it.
The BigFat backend has these methods
getCurrentBatches() that will deliver a list of batches that can be used for a while. Each batch contains a rate with the number of ads that can be served each second. Each batch also contains a serveMax; how many ads might be served before talking to BigFat again.
deductAndGetNextRateAndMax(batchId, adsServed) that will deduct the number of ads served since last call and return a new rate (that might be the same) and a new serveMax.
The reason to have a rate for each batch is that when one batch is starting to run out of funds it will be served less until it's totally depleted.
If one backend doesn't connect to BigFat for a while it will reach serveMax and only serve ads from other batches.
The backends could have a report period of seconds, minutes or even hours depending on serveMax. A brand new batch with millions of impressions left can run safely for a long while before reporting back.
When BigFat gets a call to deductAndGetNextRateAndMax it deducts the number of served ads and then returns something like 75% of the total remaining impressions up to a configured max. This means that at the end of batch, if it isn't refilled, there will be some ads delivered after the batch is empty but it's better that the batch is actually depleted rather than almost depleted for a long time.
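The serveMax part of deductAndGetNextRateAndMax could be sketched as below. This only models a single backend and leaves the per-batch rate out, since the answer does not say how the rate is computed. The 75% fraction comes from the answer; the cap value, the rounding up, and the Python names are assumptions.

```python
import math


class BigFat:
    """Hands out a shrinking serveMax per batch (rate calculation omitted)."""

    def __init__(self, impressions, fraction=0.75, cap=1000):
        self.remaining = dict(impressions)  # batch id -> impressions left
        self.fraction = fraction            # share of the remainder handed out
        self.cap = cap                      # assumed configured maximum per call

    def deduct_and_get_next_max(self, batch_id, ads_served):
        # Deduct what the backend served since its last call.
        self.remaining[batch_id] -= ads_served
        left = max(0, self.remaining[batch_id])
        # Round up so the last few impressions are eventually handed out.
        return min(self.cap, math.ceil(left * self.fraction))


master = BigFat({"b1": 2000})
served_total = 0
serve_max = master.deduct_and_get_next_max("b1", 0)
while serve_max > 0:
    # Pretend the backend serves its full allowance before reporting back.
    served_total += serve_max
    serve_max = master.deduct_and_get_next_max("b1", serve_max)
print(served_total, master.remaining["b1"])
```

With several backends each holding an allowance at the same time, the total can overshoot the batch, which matches the assumption that serving a few extra ads is acceptable.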

High Concurrency Counter (For Credit Management)

I am wondering the best way to do this for speed and accuracy, here is what our application does:
Check if credit is 1 or above (Pre check)
Process job (takes a little time)
Job complete, check that credits still exist to finish the job (credit count 1 or above)
Deduct credit
Finish job
This process is repeated 50,000+ times (threaded, using a queue system) and is currently using a mysql database to handle the counter.
Are there any better solutions than a MySQL-style database counter?
I was thinking a schema like:
user_id | credit_count
Is this the best schema I should use?
And the thread just locks the row, then deducts the credit, then releases the row for the next thread.
I'm not sure which technology you are using for the processing.
You could accumulate the count in the language your processes are written in, and only dump the counter to the database from time to time.
You wouldn't be updating the count on each request, but every n requests...
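One way to picture the "every n requests" idea is the sketch below. BatchedCounter and its flush_fn callback are hypothetical names; in a real system flush_fn would run something like a single UPDATE that subtracts the batched amount, while here it just appends to a list.

```python
import threading


class BatchedCounter:
    """Accumulates deductions in memory and flushes them every n units."""

    def __init__(self, flush_every, flush_fn):
        self._lock = threading.Lock()
        self._pending = 0
        self._flush_every = flush_every
        self._flush_fn = flush_fn  # called with the accumulated amount

    def add(self, amount=1):
        with self._lock:
            self._pending += amount
            if self._pending < self._flush_every:
                return
            to_flush, self._pending = self._pending, 0
        # Write outside the lock so other threads are not blocked by the database.
        self._flush_fn(to_flush)

    def flush(self):
        # Push whatever is left, e.g. on shutdown.
        with self._lock:
            to_flush, self._pending = self._pending, 0
        if to_flush:
            self._flush_fn(to_flush)


writes = []  # stand-in for database writes
counter = BatchedCounter(flush_every=10, flush_fn=writes.append)
for _ in range(25):
    counter.add()
counter.flush()
print(writes)
```

The trade-off is that the stored credit count lags behind by up to n deductions, and unflushed deductions are lost if the process crashes, so this only fits when a small overshoot is tolerable.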
Ever look at memcached? If you want to stick with mysql you could change the table to a memory table.
Also read this.