MySQL query design for booking database with limited amount of goods and heavy traffic - mysql

We're running a site for booking salmon fishing licenses. The site has no problem handling the traffic 364 days a year. The 365th day is when the license sale opens, and that's where the problem occurs. The servers are struggling more and more each year due to increased traffic, and we have to further optimize our booking query.
The licenses are divided into many different types (tbl_license_types), and each license type is connected to one or more fishing zones (tbl_zones).
Each license type can have a seasonal quota, which is a single value set as an integer field in tbl_license_types.
Each zone can have a daily quota, a weekly quota and a seasonal quota. The daily quota is the same for all days, and the seasonal quota of course is a single value. Daily and seasonal are therefore integer fields in tbl_zones. The weekly quota however differs by week, therefore those are specified in the separate tbl_weekly_quotas.
Bookings can be for one or more consecutive dates, but are only stated as From_date and To_date in tbl_shopping_cart (and tbl_bookings). For each booking attempt made by a user, the quotas have to be checked against already allowed bookings in both tbl_shopping_cart and tbl_bookings.
To be able to count/group by date, we use a view called view_season_calendar with a single column containing all the dates of the current season.
In the beginning we used a transaction where we first made a query to check the quotas, and if quotas allowed we would use a second query to insert the booking to tbl_bookings.
However that gave a lot of deadlocks under relatively moderate traffic, so we redesigned it to a single query (pseudo-code):
INSERT INTO tbl_bookings (_booking_)
SELECT _booking_
WHERE _lowest_found_quota_ >= _requested_number_of_licenses_
where _lowest_found_quota_ is a ~330-line SELECT with multiple subqueries, joining the related tables multiple times in order to check all the quotas. (An INSERT has no WHERE clause of its own, so the condition lives in the SELECT feeding the INSERT.)
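As a self-contained sketch of this conditional-insert pattern (SQLite stands in for MySQL here, the schema is invented, and the real quota check is of course far larger than this one subquery):

```python
import sqlite3

# Toy schema: one zone with a daily quota of 2 licenses.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE quotas (zone INTEGER PRIMARY KEY, daily_quota INTEGER)")
db.execute("CREATE TABLE bookings (zone INTEGER, day TEXT)")
db.execute("INSERT INTO quotas VALUES (5, 2)")

def try_book(zone, day):
    """Insert a booking only if the day's count is still under quota.
    The check and the insert happen in one statement, so there is no
    window between them (the same idea as the question's big query)."""
    cur = db.execute(
        """INSERT INTO bookings (zone, day)
           SELECT ?, ?
           WHERE (SELECT COUNT(*) FROM bookings WHERE zone = ? AND day = ?)
                 < (SELECT daily_quota FROM quotas WHERE zone = ?)""",
        (zone, day, zone, day, zone))
    return cur.rowcount == 1  # True if the booking was accepted

print(try_book(5, "2020-05-19"))  # True  (0 of 2 used)
print(try_book(5, "2020-05-19"))  # True  (1 of 2 used)
print(try_book(5, "2020-05-19"))  # False (quota full)
```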
Example: User wants to book License type A, for zones 5 and 6, from 2020-05-19 to 2020-05-25.
The system needs to
count previous bookings of license type A against the license type A seasonal quota.
count previous bookings in zone 5 for each of the 7 dates against zone 5's daily quota.
same count for zone 6.
count previous bookings in zone 5 for each of the two weeks the dates are part of, against zone 5 weekly quota.
same count for zone 6.
count all previous bookings in zone 5 for the current season against zone 5 seasonal quota.
same count for zone 6.
If all are within quotas, insert the booking.
As I said, this was working well earlier, but due to higher traffic load we need to optimize this query further now. I have some thoughts on how to do this:
Using isolation level READ UNCOMMITTED for each booking until quotas for the requested zones/license type are nearly full, then falling back to the default REPEATABLE READ. As long as plenty of the quota remains, the count doesn't need to be 100% correct. This will greatly reduce lock waits and deadlocks, right?
Creating one or more views which keeps count of all bookings for each date, week, zone and license type, then using those views in the WHERE clauses of the insert.
If doing no. 2, use READ UNCOMMITTED in the views. If the views report a relevant quota as nearly full, cancel the INSERT and retry with the design we're using today. (Hopefully traffic levels will have come down before the quotas fill up.)
I would greatly appreciate thoughts on how the query can be made as efficient as possible.

Rate Per Second = RPS
Suggestions to consider for your AWS parameter group:
innodb_lru_scan_depth=100 # from 1024 to conserve ~90% of the CPU cycles this function uses every second
innodb_flush_neighbors=2 # from 1 to reduce time required to lower innodb_buffer_pool_pages_dirty count when busy
read_rnd_buffer_size=128K # from 512K to reduce handler_read_rnd_next RPS of 3,227
innodb_io_capacity=1900 # from 200 to use more of SSD IOPS
log_slow_verbosity=query_plan,explain # to make slow query log more useful
have_symlink=NO # from YES for some protection from ransomware
You will find these changes will cause transactions to complete processing more quickly. Also look for JOINs or queries not using indexes, to help reduce the select_scan rate (currently about 1,173 per hour). Com_rollback averages about 1 every 2,700 seconds, usually correctable with a consistent read order in maintenance queries.

See if you can upgrade the AWS starting a day before the season opens, then downgrade after the rush. It's a small price to pay for what might be a plenty big performance boost.
Rather than the long complex query for counting, decrement some counters as you go. (This may or may not help, so play around with the idea.)
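A minimal sketch of the decrement-counters idea (SQLite stands in for MySQL, and the table and column names are invented): the WHERE clause makes the check and the decrement one atomic step.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# remaining is pre-seeded with the quota and decremented on each sale.
db.execute("CREATE TABLE zone_day_quota (zone INTEGER, day TEXT, remaining INTEGER)")
db.execute("INSERT INTO zone_day_quota VALUES (5, '2020-05-19', 3)")

def reserve(zone, day, n):
    """Atomically take n licenses if enough remain.
    The WHERE clause makes check-and-decrement one indivisible step,
    so two concurrent buyers cannot both take the last license."""
    cur = db.execute(
        """UPDATE zone_day_quota SET remaining = remaining - ?
           WHERE zone = ? AND day = ? AND remaining >= ?""",
        (n, zone, day, n))
    return cur.rowcount == 1

print(reserve(5, "2020-05-19", 2))  # True, 1 left
print(reserve(5, "2020-05-19", 2))  # False, only 1 left
print(reserve(5, "2020-05-19", 1))  # True, 0 left
```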
Your web server has some limit on the number of connections it will handle; limit that rather than letting 2K users get into MySQL and stumble over each other. Think of what a grocery store is like when the aisles are so crowded that no one is getting finished!
Be sure to use "transactions", but don't let them be too broad. If they encompass too many things, the traffic will degenerate to single file (and/or transactions will abort with deadlocks).
Do as much as you can outside of transactions -- such as collecting and checking user names/addresses, etc. If you do this after issuing the license, be ready to undo the license if something goes wrong. (This should be done in code, not via ROLLBACK.)
VIEWs are syntactic sugar; they do not provide any performance or isolation benefits. OTOH, if you make "materialized" views, there might be something useful.
A long "History list" is a potential performance problem (especially CPU). This can happen when lots of connections are in the middle of a transaction at the same time -- each needs to hang onto its 'snapshot' of the dataset.
Wherever possible, terminate transactions as soon as possible -- even if you turn around and start a new one. An example in Data Warehousing is to do the 'normalization' before starting the main transaction. (Probably this example does not apply to your app.)
Ponder having a background task computing the quotas. The hope is that the regular tasks can run faster by not having the computation inside their transactions.
A technique used in the reservation industry: (And this sounds somewhat like your item 1.) Plow ahead with minimal locking. At the last moment, have the shortest possible transaction to make the reservation and verify that the room (plane seat, etc) is still available.
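That reservation-industry pattern might be sketched as follows (SQLite and an invented schema): a cheap, possibly stale read first, then the shortest possible write that re-verifies availability.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE seats (id INTEGER PRIMARY KEY, taken INTEGER DEFAULT 0)")
db.execute("INSERT INTO seats (id) VALUES (42)")

def book_seat(seat_id):
    # Phase 1: cheap, lock-free read -- fine if slightly stale.
    row = db.execute("SELECT taken FROM seats WHERE id = ?", (seat_id,)).fetchone()
    if row is None or row[0]:
        return False  # bail out early without locking anything
    # Phase 2: the shortest possible write, re-verifying availability.
    cur = db.execute(
        "UPDATE seats SET taken = 1 WHERE id = ? AND taken = 0", (seat_id,))
    return cur.rowcount == 1

print(book_seat(42))  # True
print(book_seat(42))  # False: already taken
```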
If the whole task can be split into (1) read some stuff, then (2) write (and reread to verify that the thing is still available), then... If the read step is heavier than the write step, add more Slaves ('replicas') and use them for the read step. Reserve the Master for the write step. Note that Replicas are easy (and cheap) to add and toss for a brief time.

Related

Running a cron to update 1 million records in every hour fails

We have an e-commerce system with more than 1 million users and a total of 4 to 5 million records in the order table. We use the CodeIgniter framework on the back end and MySQL as the database.
Due to this large number of users and purchases, we use cron jobs to update the order details and referral bonus points every hour to keep things working.
Now we have a situation where these data updates exceed one hour and the next batch of updates arrives before the previous one finishes, thereby leading to deadlocks and failure of the system.
I'd like to know about the different possible architectural and database scaling options and suggestions to get out of this situation. We are using only a monolithic architecture to run this application.
Don't use cron. Have a single process that starts over when it finishes. If one pass lasts more than an hour, the next one will start late. (Checking PROCESSLIST is clumsy and error-prone. OTOH, this continually-running approach needs a "keep-alive" cronjob.)
Don't UPDATE millions of rows. Instead, find a way to put the desired info in a separate table that the user joins to. Presumably, that extra table would have only 1 row (if everyone is controlled by the same game) or a small number of rows (if there are only a small number of patterns to handle).
Do have the slowlog turned on, with a small value for long_query_time (possibly "1.0", maybe lower). Use pt-query-digest to summarize it to find the "worst" queries. Then we can help you make them take less time, thereby helping to calm your busy system and improve the 'user experience'.
Do use batched INSERTs. (One INSERT with 100 rows runs about 10 times as fast as 100 single-row INSERTs.) Batching UPDATEs is tricky, but can be done with IODKU (INSERT ... ON DUPLICATE KEY UPDATE).
Do use batches of 100-1000 rows. (This is somewhat optimal considering the various things that can happen.)
Do use transactions judiciously. Do check for errors (including deadlocks) at every step.
Do tell us what you are doing in the hourly update. We might be able to provide more targeted advice than that 15-year-old book.
Do realize that you have scaled beyond the capabilities of the typical 3rd-party package. That is, you will have to learn the details of SQL.
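The batched-INSERT advice above can be sketched like this (SQLite stands in for MySQL and the schema is invented; in MySQL itself, a single multi-row INSERT ... VALUES (...),(...),... is what gives the ~10x speedup):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, bonus INTEGER)")

rows = [(i, i * 10) for i in range(1000)]

# One statement per batch instead of one statement per row.
# executemany reuses the prepared statement; a multi-row VALUES list
# in MySQL additionally saves per-statement round trips and commits.
batch_size = 500
for start in range(0, len(rows), batch_size):
    db.executemany("INSERT INTO orders VALUES (?, ?)",
                   rows[start:start + batch_size])
db.commit()

print(db.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1000
```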
I have some ideas here for you - mixed up with some questions.
Assuming you are limited in what you can do (i.e. you can't re-architect your way out of this) and that the database can't be tuned further:
Make the list of records to be processed as small as possible
i.e. Does the job have to run over all records? These 4-5 million records -- are they all active orders, or is that how many you have in total for all time? Obviously, just process the bare minimum.
Split and parallel process
You mentioned "batches" but never explained what that meant - can you elaborate?
Can you get multiple instances of the cron job to run at once, each covering a different segment of the records?
Multi-Record Operations
The easy (lazy) way to program updates is to do it in a loop that iterates through each record and processes it individually, but relational databases can do updates over multiple records at once (the proper term is "set-based" operations). Are you processing each row individually or doing multi-record updates?
How does the cron job query the database? Have you hand-crafted the most efficient queries possible, or are you using some ORM / framework to do stuff for you?
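The multi-record point above, sketched minimally (SQLite, invented schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, points INTEGER)")
db.executemany("INSERT INTO users VALUES (?, ?)", [(i, 0) for i in range(5)])

# Row-at-a-time (the slow loop the answer warns about) would be:
#   for uid in ids: UPDATE users SET points = points + 10 WHERE id = uid
# Set-based: one statement updates every qualifying row at once.
db.execute("UPDATE users SET points = points + 10 WHERE id % 2 = 0")

print([r for r in db.execute("SELECT id, points FROM users ORDER BY id")])
# ids 0, 2, 4 get 10 points; ids 1 and 3 stay at 0
```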

Derived vs Stored account balance in high rate transactions system

I'm writing a Spring Boot 2.x application using Mysql as DBMS. I use Spring Data and Hibernate.
I want to build an SMS gateway for my customers. Each customer has an account in my system and a balance.
For each SMS sent, the customer's balance must be reduced by the SMS cost. Furthermore, before sending the SMS, the balance should be checked to see whether the customer has enough credit (this implies having an up-to-date balance to check).
I want to handle a high rate of SMS because my customers are businesses, not just end users.
Each customer could therefore send hundreds of SMS messages in a very short time. I'm looking for an efficient way to update the customer's balance. Each transaction has a small price, but I have a lot of them.
I could derive the balance with a SELECT SUM(deposit - costs) FROM ..., but this would become very expensive once I have millions of records in my system.
On the other hand, if I keep the value of the balance in a column, I would have two problems:
concurrency problem: I could have many transactions at the same time that want to update the balance. I could use pessimistic locking, but that would slow down the entire system
correctness of the data: the balance could be wrong due to a wrong or missed update
I could mitigate these points by running a task at the end of the day to reconcile the stored balance with the derived one, but:
if I have hundreds of customers, it could stall my system for some time
some attentive customer might notice the variation in their balance and ask for an explanation. It's not nice for your balance to change without explanation when you aren't doing anything
I'm looking for advice and best practices to follow. After all, several big companies sell their services "pay as you go", so I guess there is a common way to handle the problem.
In banking, people are quite careful about money. Generally, the "place for truth" is the database. You can make memory the "place for truth", but this is more sophisticated, requiring concurrent in-memory databases. What if one of your servers goes down in the middle of a transaction? You need to be able to quickly fail over the database to a backup.
Do a benchmark to see if database update times meet your needs. There are various ways to speed them up moderately. If these rates are in your acceptable range, then do it this way. It is the simplest.
A common approach to speeding up transaction times is to have a thread pool and assign one thread per account. That way, all transactions on an account are always handled by the same thread, which allows further optimization.
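A minimal sketch of the stored-balance route, assuming an invented accounts table: a single conditional UPDATE both checks the credit and debits it, so no explicit pessimistic lock spans the check and the write.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO accounts VALUES (1, 100)")  # balance in tenths of a cent

def charge(account_id, cost):
    """Check-and-debit in one atomic statement: the row lock is held
    only for the duration of this UPDATE, not for a whole transaction."""
    cur = db.execute(
        "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
        (cost, account_id, cost))
    return cur.rowcount == 1  # False means insufficient credit

print(charge(1, 60))  # True, 40 left
print(charge(1, 60))  # False, only 40 left
```

For the correctness concern, the usual complement is an append-only transaction log per charge, reconciled against the stored balance off-peak.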

Decrementing money balance stored on master server from numerous backends? (distributed counter, eh?)

I have some backend servers located in two different datacenters (in the USA and in Europe). These servers just deliver ads on a CPM basis.
Besides that, I have a big & fat master MySQL server holding the advertisers' ad campaign money balances. Again, all ad campaigns are delivered on a CPM basis.
On every impression served from any of the backends, I have to decrement the ad campaign's money balance according to the impression price.
For example, the price per impression is 1 cent. Backend A has delivered 50 impressions and will decrement the money balance by 50 cents. Backend B has delivered 30 impressions and will decrement the money balance by 30 cents.
So, the main problems as I see them are:
The backends serve about 2-3K impressions every second, so decrementing the money balance on the fly in MySQL is not a good idea IMHO.
The backends are located in US and EU datacenters, while the MySQL master server is located in the USA. Network latency could be a problem: [EU backend] <-> [US master].
As possible solutions I see:
Using Cassandra as distributed counter storage. I'd like to avoid this solution for as long as possible.
Reserving part of the money on the backend. For example, backend A connects to the master and tries to reserve $1. Once that $1 is reserved and stored locally on the backend (in local Redis, for example), there is no problem decrementing it at light speed. The main problem I see is returning money from a backend to the master server when the backend is removed from the delivery scheme ("disconnected" from the balancer). Anyway, it seems a very nice solution and would let us stay within our current technology stack.
Any suggestions?
UPD: One important addition. It is not so important to deliver ad impressions with high precision. We can deliver more impressions than requested, but never fewer.
How about, instead of decrementing the balance, you keep a log of all reported work from each backend, and then calculate the balance when you need it by subtracting the sum of all reported work from the campaign's budget?
Tables:
campaign (campaign_id, budget, ...)
impressions (campaign_id, backend_id, count, ...)
Report work:
INSERT INTO impressions VALUES ($campaign_id, $backend_id, $served_impressions);
Calculate balance of a campaign only when necessary:
SELECT campaign.budget - SUM(impressions.count) * $impression_price AS balance
FROM campaign INNER JOIN impressions USING (campaign_id)
GROUP BY campaign.campaign_id;
(Note the SUM and GROUP BY: a campaign will accumulate many impressions rows, one per report.)
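A self-contained sketch of this log-and-derive design (SQLite stands in for MySQL; budget, price, and counts are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE campaign (campaign_id INTEGER PRIMARY KEY, budget INTEGER)")
db.execute("CREATE TABLE impressions (campaign_id INTEGER, backend_id TEXT, count INTEGER)")
db.execute("INSERT INTO campaign VALUES (1, 1000)")  # budget in cents

# Each backend just appends its served counts -- cheap, append-only writes.
db.execute("INSERT INTO impressions VALUES (1, 'A', 50)")
db.execute("INSERT INTO impressions VALUES (1, 'B', 30)")

price = 1  # cents per impression
balance = db.execute(
    """SELECT c.budget - COALESCE(SUM(i.count), 0) * ?
       FROM campaign c LEFT JOIN impressions i USING (campaign_id)
       WHERE c.campaign_id = 1""", (price,)).fetchone()[0]
print(balance)  # 1000 - (50 + 30) * 1 = 920
```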
This is perhaps the most classical ad-serving/impression-counting problem out there. You're basically trying to balance a few goals:
Not under-serving ad inventory, thus not making as much money as you could.
Not over-serving ad inventory, thus serving for free since you can't charge the customer for your mistake.
Not serving the impressions too quickly, because usually customers want an ad to run through a given calendar time period, and serving them all in an hour between 2-3 AM makes those customers unhappy and doesn't do them any good.
This is tricky because you don't necessarily know how many impressions will be available for a given spot (since it depends on traffic), and it gets even more tricky if you do CPC instead of CPM, since you then introduce another unknowable variable of click-through rate.
There isn't a single "right" pattern for this, but what I have seen to be successful through my years of consulting is:
Treat the backend database as your authoritative store. Partition it by customer as necessary to support your goals for scalability and fault tolerance (limiting possible outages to a fraction of customers). The database knows that you have an ad insertion order for e.g. 1000 impressions over the course of 7 days. It is periodically updated (minutes to hours) to reflect the remaining inventory and some basic stats to bootstrap the cache in case of cache loss, such as actual
Don't bother with money balances at the ad server level. Deal with impression counts, rates, and targets only. Settle that to money balances after the fact through logging and offline processing.
Serve ad inventory from a very lightweight and fast cache (near the web servers) which caches the impression remaining count and target serving velocity of an insertion order, and calculates the actual serving velocity.
Log all served impressions with relevant data.
Periodically collect serving velocities and push them back to the database.
Periodically collect logs and calculate actual served inventory and push it back to the database. (You may need to recalculate from logs due to outages, DoSes, spam, etc.)
Create a service on your big & fat master MySQL server holding the advertisers' ad campaign money balances.
This service must implement getCampaignFund(idcampaign, requestingServerId, currentLocalAccountBalanceAtTheRequestingServer), which returns a creditLimit to the regional server.
Imagine a credit card mechanism. Your master server gives some limit to each regional server. Once the remaining limit drops below a threshold, the regional server requests a new limit. But to get the new credit limit, the regional server must report how much of the previous limit it has used.
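The credit-limit leasing could be sketched like this (pure Python; all names and numbers are invented, and this simplified version assumes regional servers report sequentially):

```python
class MasterBalance:
    """Master-side bookkeeping for one campaign's money balance."""
    def __init__(self, balance_cents, chunk_cents):
        self.balance = balance_cents   # money remaining on the campaign
        self.chunk = chunk_cents       # credit granted per request

    def request_credit(self, spent_from_previous_limit):
        # The regional server reports what it actually used, then
        # receives a fresh limit (which shrinks near depletion).
        self.balance -= spent_from_previous_limit
        return min(self.chunk, max(self.balance, 0))

master = MasterBalance(balance_cents=250, chunk_cents=100)
print(master.request_credit(0))    # 100: initial lease
print(master.request_credit(100))  # 100: spent it all, 150 cents remain
print(master.request_credit(100))  # 50: only 50 cents left to lease
```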
Your regional servers might additionally implement these services:
getCampaignAccountBalance(idcampaign): report the current usage of a specific campaign, so the main server can update all campaigns at a given time. Returns currentLocalCampaignAccountBalance.
addCampaign(idcampaign, initialBalance): register a new campaign and its starting credit limit.
suspendCampaign(idcampaign): suspend impressions for a campaign.
resumeCampaign(idcampaign): resume impressions for a campaign.
finishCampaign(idcampaign): finish a campaign and return the current local account balance (currentLocalCampaignAccountBalance).
updateCampaignLimit(idcampaign, newCampaignLimit): update the limit (reallocation of credit between regional servers). This service updates the campaign credit limit and returns the account balance of the previous credit limit acquired (currentLocalCampaignAccountBalance).
Services are great because you get a loosely coupled architecture. Even if your main server goes offline for some time, your regional servers will keep running as long as they haven't exhausted their credit limits.
This may not be a detailed canonical answer, but I'll offer my thoughts as possible (and at least partial) solutions.
I'll have to guess a bit here, because the question doesn't say much about what measurements have been taken to identify MySQL bottlenecks, which IMHO is the place to start. I say that because IMHO 1-2K transactions per second is not out of range for MySQL. I've easily supported volumes this high (and much higher) with some combination of the following techniques, in no particular order because it depends on what the measurements identify as the bottlenecks: 0) database redesign; 1) tuning buffers; 2) adding RAM; 3) solid-state drives; 4) sharding; 5) upgrading to MySQL 5.6+ if on 5.5 or lower. So I'd take some measurements and apply the above as called for by the results.
Hope this helps.
I assume
Ads are probably bought in batches of at least a couple of thousand
There are ads from several different batches being delivered at the same time, and not all of them will be near empty at the same time
It is OK to serve some extra ads if your infrastructure is down.
So, here's how I would do it.
The BigFat backend has these methods
getCurrentBatches() that will deliver a list of batches that can be used for a while. Each batch contains a rate with the number of ads that can be served each second. Each batch also contains a serveMax; how many ads might be served before talking to BigFat again.
deductAndGetNextRateAndMax(batchId, adsServed) that will deduct the number of ads served since last call and return a new rate (that might be the same) and a new serveMax.
The reason to have a rate for each batch is that when one batch is starting to run out of funds it will be served less until it's totally depleted.
If one backend doesn't connect to BigFat for a while it will reach serveMax and only serve ads from other batches.
The backends could have a report period of seconds, minutes or even hours depending on serveMax. A brand new batch with millions of impressions left can run safely for a long while before reporting back.
When BigFat gets a call to deductAndGetNextRateAndMax it deducts the number of served ads and then returns something like 75% of the total remaining impressions up to a configured max. This means that at the end of batch, if it isn't refilled, there will be some ads delivered after the batch is empty but it's better that the batch is actually depleted rather than almost depleted for a long time.
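That 75%-of-remaining logic might look like this (a hypothetical sketch with invented numbers; the per-batch rate handling is omitted):

```python
def deduct_and_get_next_max(remaining, ads_served, fraction=0.75):
    """Deduct what the backend reports as served, then lease out ~75%
    of what is left. Near the end of the batch the leases shrink, so
    at worst there is a small overshoot past empty -- which the
    question says is acceptable ("more than requested, never less")."""
    remaining = max(remaining - ads_served, 0)
    return remaining, int(remaining * fraction)

remaining, serve_max = deduct_and_get_next_max(1000, 0)
print(remaining, serve_max)   # 1000 750
remaining, serve_max = deduct_and_get_next_max(remaining, serve_max)
print(remaining, serve_max)   # 250 187
```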

max mysql queries per minute

How many queries can the slave handle per minute?
In my case, I used SHOW STATUS LIKE '%questions%' and found that over an interval of 1 minute, around 5,000 queries were executed.
Is this normal behavior, or can it be improved?
There are many inter-related factors which influence the answer to your questions. They include, but are not limited to:
The distribution of queries to your slave
The timing and types of updates coming in via replication
Your dataset
Your hardware
Your MySQL version and engines being employed
Having said that, the stat of "5,000 questions per minute" by itself is not sufficient to raise any flags in my book.
It might be more worthwhile to determine whether your application is operating within an acceptable range; e.g., what is the application's average or worst-case response time?
It's worth noting that the Questions counter includes more than just SELECTs. I can't find a comprehensive list or reference, but, for some examples, I believe that UPDATEs, DELETEs, SHOW STATUS, USE $db, and SHOW TABLES all increment the Questions status variable.
Well, my server is currently reporting that in the month it's been running it has executed an average of 3,090 queries per second, that's 185k per minute. So there's definitely something that could be improved!

Storing affiliate leads and conversions

I've created an affiliate system that tracks leads and conversions. The lead and conversion records will run into the millions, so I need a good way to store them. Users will need to track the stats hourly, daily, weekly, and monthly.
What's the best way to store the leads and conversions?
For this type of system, you need to keep all of the detail records, the reason being that at some point someone is going to contest an invoice.
However, you should have some roll-up tables. Each hour, compute the current hour's work and store the results. Do the same daily, weekly, and monthly.
If some skew is okay, you can compute the daily amounts from the 24 hourly records, and the weekly amounts from the last 7 daily records. For monthly you might want to compute back from the hourly records, because a month doesn't quite add up to 4 full weeks. Also, it helps reduce noise from any averaging you might be doing.
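The roll-up idea can be sketched like this (SQLite, invented schema): detail rows are kept for disputed invoices, while stats queries read the small roll-up table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE leads (id INTEGER, created_at TEXT)")
db.execute("""CREATE TABLE hourly_rollup
              (day TEXT, hour INTEGER, lead_count INTEGER,
               PRIMARY KEY (day, hour))""")

# Detail rows (kept forever, for disputed invoices).
db.executemany("INSERT INTO leads VALUES (?, ?)",
               [(1, "2024-01-01 09:15:00"),
                (2, "2024-01-01 09:45:00"),
                (3, "2024-01-01 10:05:00")])

# Hourly job: aggregate the detail rows once into the roll-up table.
db.execute("""INSERT INTO hourly_rollup
              SELECT date(created_at), strftime('%H', created_at), COUNT(*)
              FROM leads GROUP BY 1, 2""")

# Daily stats now come from 24 small roll-up rows, not millions of detail rows.
daily = db.execute("""SELECT day, SUM(lead_count) FROM hourly_rollup
                      GROUP BY day""").fetchall()
print(daily)  # [('2024-01-01', 3)]
```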
I'd recommend a two step archival process. The first one should run once a day and move the records into a separate "hot" database. Try to keep 3 months hot for any type of research queries you need to do.
The second archive process is up to you. You could simply move any records older than 3 months into some type of csv file and simply back it up. After some period of time (a year?) delete them depending on your data retention agreements.
Depending on the load, you may need multiple web servers handling the lead and conversion pixel firing. One option is to store the raw data records on each web/MySQL server, and then run an archival process every 5-10 minutes that stores them in a highly normalized table structure and performs any required roll-ups to achieve the performance you are looking for.
Make sure you keep row size as small as possible: store IPs as unsigned INTs, store referrers as INTs that reference lookup tables, etc.
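The IP-as-integer tip, sketched in Python (in MySQL itself the equivalents are INET_ATON()/INET_NTOA() with an INT UNSIGNED column):

```python
import socket
import struct

def ip_to_int(ip: str) -> int:
    """Pack a dotted-quad IPv4 address into a 4-byte unsigned int
    (4 bytes per row instead of up to 15 characters)."""
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def int_to_ip(n: int) -> str:
    """Inverse: unpack the unsigned int back into dotted-quad form."""
    return socket.inet_ntoa(struct.pack("!I", n))

print(ip_to_int("1.2.3.4"))   # 16909060
print(int_to_ip(16909060))    # 1.2.3.4
```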