How to manage realtime push notifications for a large number of users? - mysql

I have a cron service running in the background every minute. The cron service is responsible for sending realtime notifications to users based on complex logic. It is mandatory for the cron to finish processing and deliver push notifications to all of the users within that minute. I use a third-party push notification service to manage delivery of the notifications.
At any given minute, I have 50,000 users (increasing over time) who are possible candidates to receive these notifications. I iterate through the 50,000 users in a for loop, in batches of 1,000, to run my application logic. The application logic involves 20+ simple database queries per user.
Is this a recommended architecture?
When the cron executes, CPU utilisation shoots up to 100% for a span of 3-4 seconds, 4-5 times a minute. The repeated spikes to 100% are because I have added delays between subsequent heavy computations to even out the utilisation over the minute. Should I be concerned about this?
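For scale, 50,000 users × 20+ queries is on the order of a million queries per minute. One way to cut that down is to fold the per-user lookups into a single query per batch. A minimal JDBC sketch of that idea (table, column, and connection details here are hypothetical, not from the original post):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchedCandidateLoader {

    // Loads notification preferences for a whole batch of users in one round
    // trip instead of one query per user. Table and column names
    // (user_preferences, user_id, channel, enabled) are hypothetical.
    static Map<Long, List<String>> loadPreferences(Connection conn, List<Long> userIds) throws Exception {
        String placeholders = String.join(",", Collections.nCopies(userIds.size(), "?"));
        String sql = "SELECT user_id, channel FROM user_preferences"
                   + " WHERE enabled = 1 AND user_id IN (" + placeholders + ")";
        Map<Long, List<String>> prefs = new HashMap<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < userIds.size(); i++) {
                ps.setLong(i + 1, userIds.get(i));
            }
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    prefs.computeIfAbsent(rs.getLong("user_id"), k -> new ArrayList<>())
                         .add(rs.getString("channel"));
                }
            }
        }
        return prefs;
    }

    public static void main(String[] args) throws Exception {
        // Connection details are illustrative only.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/app", "app", "secret")) {
            List<Long> batch = new ArrayList<>();
            for (long id = 1; id <= 1000; id++) batch.add(id);   // one batch of 1,000 users
            Map<Long, List<String>> prefs = loadPreferences(conn, batch);
            System.out.println("Loaded preferences for " + prefs.size() + " users in one query");
        }
    }
}
```

The same idea applies to the other per-user queries: anything keyed on user_id can usually be fetched once per 1,000-user batch and looked up in memory, turning 20,000+ queries per batch into roughly 20.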

Related

AWS RDS MySQL innodb/btr_search_latch

I am running MySQL 5.7.24 on AWS RDS. I have an InnoDB table that works fine under normal traffic, but when I send push notifications to 50k users the problem happens.
The server has 32 GB RAM and 8 vCPUs; the RDS instance class is db.m5.2xlarge.
The wait/synch/sxlock/innodb/btr_search_latch event consumes far more resources than wait/io/table/sql/handler (as shown in the Performance Schema wait-event breakdown).
innodb_adaptive_hash_index is currently enabled.
You're trying to send 50,000 push notifications in five minutes?
50,000 / 300 seconds means you're pushing 167 notifications per second, and I assume you are then updating the database to record the result of each push. You are probably doing this in many concurrent threads so the pushes run in parallel.
Have you considered doing these push notifications more gradually, like over 10 or 15 minutes?
Or updating the database in batches?
Or using fewer threads to avoid the high contention on the database?
I used to work for SchoolMessenger, a company that provides notification services for the majority of public schools in the USA. We sent millions of notifications, SMS messages, and phone calls every day. The way we did it was to have a very complex Java application queue up the notifications and then post them gradually. Then, as the results of the pushes came in, those were also queued up and applied to the database gradually.
We used MySQL, but together with ActiveMQ as a persistent queue. All the tasks to be done were pushed into the queue, a pool of worker threads acted on the tasks and pushed their results into another queue, and a result-reading thread read batches of results from that queue and updated the database in bulk updates.
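A minimal in-process sketch of that shape, with a plain BlockingQueue standing in for the persistent ActiveMQ queue and the push and bulk-update calls stubbed out (all names and sizes here are illustrative, not the original system):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueuedPushPipeline {

    record PushTask(long userId, String message) {}
    record PushResult(long userId, boolean delivered) {}

    static final BlockingQueue<PushTask> tasks = new LinkedBlockingQueue<>();
    static final BlockingQueue<PushResult> results = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws InterruptedException {
        // 1. Producer: queue up everything that has to be sent.
        for (long id = 1; id <= 50_000; id++) {
            tasks.put(new PushTask(id, "hello"));
        }

        // 2. Worker pool: drain the task queue gradually and push results onto
        //    the result queue instead of writing to MySQL directly.
        ExecutorService workers = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 8; i++) {
            workers.submit(() -> {
                PushTask t;
                while ((t = tasks.poll(1, TimeUnit.SECONDS)) != null) {
                    boolean delivered = sendToPushProvider(t);   // third-party push call
                    results.put(new PushResult(t.userId(), delivered));
                }
                return null;
            });
        }
        workers.shutdown();

        // 3. Single result reader: collect results and write them to the
        //    database in bulk instead of one UPDATE per notification.
        List<PushResult> batch = new ArrayList<>();
        while (!(workers.isTerminated() && results.isEmpty())) {
            PushResult r = results.poll(1, TimeUnit.SECONDS);
            if (r != null) batch.add(r);
            if (batch.size() >= 500) {
                bulkUpdate(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) bulkUpdate(batch);
    }

    static boolean sendToPushProvider(PushTask t) { return true; }       // stub
    static void bulkUpdate(List<PushResult> batch) { /* one multi-row UPDATE */ }
}
```

The important property is that MySQL only ever sees one writer doing large batched updates, no matter how many threads are sending notifications.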
When you are designing a back-end system to do large-scale work, you have to think of new ways to architect your application to avoid choke points.
As a database performance and scaling consultant, I have observed this rule many times:
For every 10x growth in data or traffic, you should reevaluate your software architecture. You may have to redesign some parts of it to work at the larger scale.

Spring Framework #Async method + MySql Performance Degradation - Scalability Problem

I have an API, notifyCustomers(), implemented on my batch server, which gets called from my application server. It can send notifications via three channels: SMS, push, and email. I have separate helper classes for each of them, and they all execute in async mode.
I have around 30k users, out of which I usually send notifications to a particular set ranging from 3k to 20k. The issue I face is that whenever I call that API, MySQL performance goes for a toss, particularly CPU: utilisation sits at around 100% for a very long period, around 30 minutes.
I've figured out a workaround by doing the following things, and it's helping me keep things under control:
Using projection instead of domain object
Getting data in batch of 500 in each call
Implemented indexing based on the criteria that I need
No database calls from async methods of SMS, Email and Push
Thread.sleep(10 mins) between each subsequent fetch of a data batch <== this is the dirty hack that's bothering me a lot
If I remove Thread.sleep(), everything goes haywire, because the batch server just calls the async methods and then fires the DB call to fetch the next batch of 500 users in very quick succession, until the DB server stops responding.
I need help with what I should be doing to get rid of the 5th point while keeping things under control. I'm running MySQL on RDS with 300 IOPS and 4 GB RAM (db.t3.medium).
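One possible way to replace the fixed Thread.sleep() is backpressure: bound the executor that processes batches, and let the fetching thread stall (or run the work itself) when the pool is saturated. A minimal sketch with java.util.concurrent follows; the method names, pool sizes, and queue capacity are illustrative assumptions, and Spring's ThreadPoolTaskExecutor can be configured with a similar bounded queue and rejection policy:

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ThrottledNotifier {

    public void notifyCustomers() throws InterruptedException {
        // Bounded pool: at most 4 batches in flight and 2 waiting. When the
        // queue is full, CallerRunsPolicy makes the submitting thread execute
        // the task itself, so the next fetch of 500 users is delayed until
        // capacity frees up -- backpressure instead of a fixed Thread.sleep().
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2),
                new ThreadPoolExecutor.CallerRunsPolicy());

        int page = 0;
        List<Long> userIds;
        while (!(userIds = fetchUserIdBatch(page++, 500)).isEmpty()) {
            final List<Long> batch = userIds;
            pool.execute(() -> sendSmsEmailAndPush(batch));   // fan-out to the channel helpers
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    // Hypothetical projection query: only the ids needed, 500 per page.
    private List<Long> fetchUserIdBatch(int page, int size) { return List.of(); }

    // Hypothetical method that hands the batch to the SMS, email and push helpers.
    private void sendSmsEmailAndPush(List<Long> userIds) { }
}
```

With CallerRunsPolicy the fetch loop slows down to roughly the rate the workers, and the database behind them, can sustain, so no fixed sleep is needed.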

MySQL jobs stuck in sidekiq queue

My Rails application takes a JSON blob of ~500 entries (from an API endpoint) and throws it into a Sidekiq/Redis background queue. The background job parses the blob, then loops through the entries to perform a basic Rails Model.find_or_initialize_by_field_and_field() and model.update_attributes().
If this job were in the foreground, it would take a matter of seconds (if that long). I'm seeing these jobs remain in the sidekiq queue for 8 hours. Obviously, something's not right.
I've recently re-tuned the MySQL database to use 75% of available RAM as the buffer_pool_size and divided that amongst 3 buffer pools. I originally thought that might be part of the deadlock, but the load average on the box is still well below any problematic level (5 CPUs and a load of ~2.5). At this point I'm not convinced the DB is the problem, though, of course, I can't rule it out.
I'm sure, at this point, that I need to scale back the Sidekiq worker instances. In anticipation of the added load I increased the concurrency to 300 per worker (I have 2 active workers on different servers). Under a relatively small amount of load the queues operate as expected; even the problematic jobs are completed in ~1 minute. Though, per the Sidekiq documentation, >50 concurrent workers is a bad idea, I wasn't having any stability issues at 150 workers per instance. The problem has been this newly introduced job that performs ~500 MySQL finds and updates.
If this were a database timeout issue, the background job should have failed and been moved from the active (busy) queue to the failed queue. That's not the case. They're just getting stuck in the queue.
What other MySQL or Rails/Sidekiq tuning parameters should I be examining to ensure these jobs succeed, fail, or properly time out?

Make periodic task occur every 2 seconds

I need to check regularly if a new message has been received, because the API service I am integrating with does not have a push notification service. How do I set how often a periodic task runs?
I have the boilerplate code (e.g. http://www.c-sharpcorner.com/uploadfile/54f4b6/periodic-and-resourceintensive-tasks-in-windows-phone-mango/) from the examples on the internet, but it seems it can only run roughly every 30 minutes?
Unfortunately, periodic tasks run no more often than every 30 minutes, and they are not even guaranteed to run. If you want to run more often than that, your only bet is setting up a push notification service...

Does a query to an ScriptDb consume quotas?

The official ScriptDb documentation describes ScriptDb's storage capabilities well. However, it is not clear to me whether querying a ScriptDb or saving new objects to a ScriptDb consumes quota. Is there a limit on the number of queries that an app can make to a ScriptDb per day?
ScriptDb is only capped by overall store size, which starts at 50 MB and is the total across all the databases owned by the script author.
The only thing you'll want to watch with reads is that the script doesn't run over its execution time limits (usually, if a script runs for several minutes, you should be concerned). For example, the total trigger execution time allowed per day is 1 hour. If the script runs as the person accessing it, you'll be less likely to run into this.