Slick with Hikari doesn't use more connections when needed - MySQL

I'm trying to understand how Slick-Hikari works. I've read a lot of documentation, but I have a use case whose behavior I don't understand.
I'm using Slick 3 with Hikari, with the default configuration. I already have a production app with ~1000 users connected concurrently. My app works with websockets, and when I deploy a new release all clients are reconnected. (I know it's not the best way to handle a deploy, but I don't have clustering at the moment.) When all these users reconnect, they all start running queries to get their user state (dog-pile effect). When that happens, Slick starts to throw a lot of errors like:
java.util.concurrent.RejectedExecutionException: Task slick.backend.DatabaseComponent$DatabaseDef$$anon$2@4dbbd9d1 rejected from java.util.concurrent.ThreadPoolExecutor@a3b8495[Running, pool size = 20, active threads = 20, queued tasks = 1000, completed tasks = 23740]
What I think is happening is that Slick's queue of pending queries is full because it can't handle all the clients requesting information from the database. But if I look at the metrics that Dropwizard provides me, I see the following:
Near 16:45 we see a deploy. Until the old instance is terminated, we can see that the number of connections goes from 20 to 40. I think that's normal, given how the deploy process is done.
But, if the query queue of Slick becomes full because of the dog-pile effect, why is it not using more than 3-5 connections if it has 20 connections available? The database is performing really well, so I think the bottleneck is in Slick.
Do you have any advice for improving this deploy process? I only have 1000 users now, but I'll have a lot more in a few weeks.

Based on the "rejected" exception, I think many Slick actions were submitted concurrently, which exceeded the default size (1000) of the queue embedded in Slick.
So I think you should:
increase the queue size (queueSize) to hold more unprocessed actions;
increase the number of threads (numThreads) in Slick to process more actions concurrently.
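For example, if the database is loaded with Database.forConfig, both settings live in its config block in application.conf. A minimal sketch, assuming a block named mydb; the URL and values are illustrative and need tuning against your real load:

    mydb {
      url = "jdbc:mysql://localhost/mydb"  // illustrative connection details
      driver = "com.mysql.jdbc.Driver"
      connectionPool = "HikariCP"          // the default when slick-hikaricp is on the classpath
      numThreads = 40                      // threads executing database actions; default is 20
      queueSize = 5000                     // pending actions before rejection; default is 1000
    }

If you raise numThreads, keep an eye on maxConnections as well; extra threads don't help if they all end up waiting for Hikari to hand them a connection.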
You can get more tips here

Related

Spring Framework @Async method + MySQL Performance Degradation - Scalability Problem

I have an API, notifyCustomers(), implemented on my batch server, which gets called from my application server. It can send notifications via three channels: SMS, Push, and Email. I have separate helper classes for each of them, and they all execute in async mode.
I've got around 30k users, out of which I usually send notifications to a particular set ranging from 3k to 20k. The issue I face is that whenever I call that API, MySQL performance just goes for a toss, particularly CPU: utilisation sits around 100% for a very long period, around 30 minutes.
I've figured out a workaround by doing the following things, and it's helping me keep things under control:
1. Using projections instead of domain objects
2. Getting data in batches of 500 per call
3. Implementing indexes based on the criteria I need
4. No database calls from the async methods of SMS, Email and Push
5. Thread.sleep(10 mins) between each subsequent fetch of a data batch <== This is the dirty hack that's bothering me a lot
If I remove the Thread.sleep() then everything goes haywire, because the batch server just calls the async methods and then fires up the db call to fetch the next batch of 500 users in very quick succession, until the db server stops responding.
I need help with what I should be doing in order to get rid of the 5th point while keeping things under control. I'm running MySQL on RDS with 300 IOPS and 4 GB RAM (db.t3.medium).
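For concreteness, here is a rough sketch of the flow as described, with hypothetical names standing in for the real Spring beans (UserContactView, UserRepository and Notifier are assumptions, not the asker's actual code):

    import java.util.List;

    // Hypothetical projection: only the columns needed to send a notification.
    interface UserContactView { String phone(); String email(); String pushToken(); }

    // Hypothetical stand-ins for the real Spring beans.
    interface UserRepository { List<UserContactView> findContacts(int page, int size); }
    interface Notifier {
        void sendSms(UserContactView u);   // @Async helper in the real code
        void sendPush(UserContactView u);  // @Async helper in the real code
        void sendEmail(UserContactView u); // @Async helper in the real code
    }

    public class NotificationBatcher {
        private static final int BATCH_SIZE = 500;
        private final UserRepository repo;
        private final Notifier notifier;

        public NotificationBatcher(UserRepository repo, Notifier notifier) {
            this.repo = repo;
            this.notifier = notifier;
        }

        public void notifyCustomers() throws InterruptedException {
            int page = 0;
            List<UserContactView> batch;
            // Fetch a projection of 500 users at a time (points 1 and 2).
            while (!(batch = repo.findContacts(page++, BATCH_SIZE)).isEmpty()) {
                for (UserContactView u : batch) {
                    notifier.sendSms(u);   // fire-and-forget; no DB calls inside (point 4)
                    notifier.sendPush(u);
                    notifier.sendEmail(u);
                }
                Thread.sleep(10 * 60 * 1000L); // point 5: the sleep that throttles the fetches
            }
        }
    }

The sleep is the only thing stopping the loop from issuing fetch after fetch while the async work piles up, which is why removing it overwhelms the database.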

MySQL jobs stuck in sidekiq queue

My Rails application takes a JSON blob of ~500 entries (from an API endpoint) and throws it into a sidekiq/redis background queue. The background job parses the blob, then loops through the entries to perform a basic Rails Model.find_or_initialize_by_field_and_field() and model.update_attributes().
If this job were in the foreground, it would take a matter of seconds (if that long). I'm seeing these jobs remain in the sidekiq queue for 8 hours. Obviously, something's not right.
I've recently re-tuned the MySQL database to use 75% of available RAM as the buffer_pool_size and divided that amongst 3 buffer pools. I originally thought that might be part of the deadlock, but the load average on the box is still well below any problematic level (5 CPUs and a load of ~2.5). At this point I'm not convinced the DB is the problem, though, of course, I can't rule it out.
I'm sure, at this point, that I need to scale back the sidekiq worker instances. In anticipation of the added load I increased the concurrency to 300 per worker (I have 2 active workers on different servers). Under a relatively small amount of load the queues operate as expected; even the problematic jobs are completed in ~1 minute. Though, per the sidekiq documentation, more than 50 concurrent workers is a bad idea. I wasn't having any stability issues at 150 workers per instance. The problem has been this newly introduced job that performs ~500 MySQL finds and updates.
If this were a database timeout issue, the background job should have failed and been moved from the active (busy) queue to the failed queue. That's not the case. They're just getting stuck in the queue.
What other either MySQL or Rails/ sidekiq tuning parameters should I be examining to ensure these jobs succeed, fail, or properly time out?

How to find source of Too Many Connections exception

I am getting intermittent "Too Many Connections" exceptions in my Django web app. Having looked at other Stack Overflow questions regarding "Too many connections", it generally seems to be an error in coding (e.g. spawning a bunch of threads) that causes many sleeping connections.
I have used select * from information_schema.processlist order by host; to check for such connections, and I can see maybe 1 or 2 at most, and most sleeping connections live for under 10s.
So, I am wondering 2 things:
Is there a (relatively easy) method for tracking down what in Django may be causing a large number of connections?
If it is really a matter of too many people accessing the site at a particular time, what is the standard method for scaling the number of connections up?
I found the source of the problem: we have a RESTful API and someone was running a periodic script that spawned 700 or so threads to make "reading through the API faster".
However, more important than the cause (which is very specific) is the method for finding it. So I will provide the details in hopes it helps someone else if they encounter a similar situation.
Some further details about my setup:
In a cloud environment
Multiple Django web servers behind a loadbalancer
Steps for troubleshooting:
Use a monitoring system to alert you if you go over the max connections. Failing that, rig a script that polls the MySQL database and uses the following select statement to get info (a sketch of such a poller follows these steps):
select substring_index(host, ':', 1), count(*) from information_schema.processlist group by substring_index(host, ':', 1);
From the polling/monitoring, check to see if there are any patterns of the system going over the limit at regular intervals. (E.g. in my case I saw it go over the max every 5 minutes or so.)
Check webserver (apache/nginx/etc.) or Django logs to see which URLs/views were being accessed at the time of an overage. (This was harder to determine in my case because the load balancer caused only a few of the offending URL accesses to happen on each server, but based on the time pattern I was eventually able to figure it out.)
Have a friendly chat with the person causing this grief :)
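A minimal sketch of the polling script mentioned in the first step above, assuming plain JDBC with the MySQL Connector/J driver on the classpath; the host, credentials, limit and interval are all illustrative:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ConnectionPoller {
        // Illustrative threshold: warn at 80% of the server's max_connections.
        private static final int MAX_CONNECTIONS = 150;

        public static void main(String[] args) throws Exception {
            String sql = "select substring_index(host, ':', 1) as client, count(*) as conns "
                       + "from information_schema.processlist "
                       + "group by substring_index(host, ':', 1)";
            while (true) {
                int total = 0;
                try (Connection c = DriverManager.getConnection(
                         "jdbc:mysql://dbhost/mysql", "monitor", "secret"); // illustrative
                     Statement st = c.createStatement();
                     ResultSet rs = st.executeQuery(sql)) {
                    while (rs.next()) {
                        System.out.printf("%s: %d%n", rs.getString("client"), rs.getInt("conns"));
                        total += rs.getInt("conns");
                    }
                }
                if (total > MAX_CONNECTIONS * 0.8) {
                    System.err.println("WARNING: " + total + " connections, nearing the limit");
                }
                Thread.sleep(60_000); // poll every minute to catch the 5-minute spikes
            }
        }
    }

Logging the per-host counts over time is what makes the pattern (and the offending host) visible.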
As for the 2nd part of my original question: because I am in a hosted cloud environment, the operators control the max number of connections and often scale it based on the DB instance size. I attempted to upsize once, but because of the many threads, the more connections I made available, the more the offending process used up.

Producer Consumer setup: How to handle Database Connections?

I'm building my first single-producer/single-consumer app in which the consumer takes items off the queue and stores them in a MySQL database.
Previously, when it was a single-threaded app, I would open a connection to the DB, send the query, close the connection, and repeat every time new info came in.
With a producer-consumer setup, what is the better way to handle the DB connection? Should I open it once before starting the consumer loop (I can't see a problem with this, but I'm sure that one of you fine folks will point it out if there is one)? Or should I open and close the DB connection on each iteration of the loop (seems like a waste of time and resources)?
This software runs on approximately 30 small linux computers and all of them talk to the same database. I don't see 30 simultaneous connections being an issue, but I'd love to hear your thoughts.
Apologies if this has been covered, I couldn't find it anywhere. If it has, a link would be fantastic. Thanks!
EDIT FOR CLARITY
My main focus here is the speed of the consumer thread. The whole reason for switching from single- to multi-threaded was because the single-threaded version was missing incoming information because it was busy trying to connect to the database. Given that the producer thread is expected to start dumping info into the buffer at quite a high rate, and given that the buffer will be limited in size, it is very important that the consumer work through the buffer as quickly as possible while remaining stable.
Your MySQL shouldn't have any problems handling connections in the hundreds, if not thousands.
On each of your consumers you should set up a connection pool and use that from your consumer. If you consume the messages in a single thread (per application) the pool only needs one connection, but it's also fine to consume and start parallel threads that each use a connection.
The reason for using a connection pool is that it will handle reconnection and keep-alive for you. Just ask it for a connection and it promises that it will work (it does this by running a small query against the database). If you don't use a connection for a while and it gets terminated, the pool will just create a new one.
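A minimal sketch of that setup using HikariCP as the pool; the queue contents, table name and credentials are illustrative assumptions:

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.concurrent.BlockingQueue;

    public class Consumer implements Runnable {
        private final HikariDataSource pool;
        private final BlockingQueue<String> queue; // filled by the producer thread

        public Consumer(BlockingQueue<String> queue) {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl("jdbc:mysql://dbhost/mydb"); // illustrative
            config.setUsername("app");
            config.setPassword("secret");
            config.setMaximumPoolSize(2); // a single-threaded consumer needs very few
            this.pool = new HikariDataSource(config);
            this.queue = queue;
        }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    String item = queue.take(); // block until the producer hands us work
                    // Borrow a validated connection; the pool replaces dead ones.
                    try (Connection c = pool.getConnection();
                         PreparedStatement ps = c.prepareStatement(
                             "insert into items (payload) values (?)")) { // illustrative table
                        ps.setString(1, item);
                        ps.executeUpdate();
                    } // the connection goes back to the pool here, it isn't really closed
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // shut down cleanly
                } catch (SQLException e) {
                    e.printStackTrace(); // keep consuming; don't let one failure kill the loop
                }
            }
        }
    }

The key point is that close() on a pooled connection just returns it to the pool, so the per-iteration connect/disconnect cost you were worried about mostly disappears.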

MySQL "Too many connections" error influenced by number of mongrel instances?

Recently I have started getting MySQL "too many connections" errors at times of high traffic. My Rails app runs on a mongrel cluster with 2 instances on a shared host. Some recent changes that might be driving it:
Traffic to my site has increased. I am now averaging about 4K pages a day.
Database size has increased. My largest table has ~100K rows.
Some associations could return several hundred instances in the worst case, though most are far less.
I have added some features that increased the number and size of database calls in some actions.
I have done a code review to reduce database calls, optimize SQL queries, add missing indexes, and use :include for eager loading. However, many of my methods still make 5-10 separate SQL calls. Most of my actions have a response time of around 100ms, but one of my most common actions averages 300-400ms, and some actions randomly peak at over 1000ms.
The logs are of little help, as the errors seem to occur randomly, or at least the pattern does not appear related to the actions being called or data being accessed.
Could I alleviate the error by adding additional mongrel instances? Or are the MySQL connections limited by the server, and thus unrelated to the number of processes I divide my traffic across?
Is this most likely a problem with my coding, or should I be pressing my host for more capacity/less load on the shared server?
ActiveRecord has pooled database connections since Rails 2.2, and it's likely that that's what's causing your excess connections here. Try turning down the value of pool in your database.yml for that environment (it defaults to 5).
Docs can be found here.
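For reference, the setting lives in config/database.yml; a minimal sketch (the adapter line and exact value are illustrative):

    production:
      adapter: mysql
      database: myapp_production
      pool: 2  # default is 5; with 2 single-threaded mongrels a small pool is plenty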
Are you caching anything? It's an important part of alleviating application and database load. The Rails Guides have a section on caching.
Something is wrong. A Mongrel instance processes 1 request at a time, so if you have 2 Mongrel instances then you should not be seeing more than 2 active MySQL connections (from the mongrels, at least).
You could log or graph the output of SHOW STATUS LIKE 'Threads_connected' over time.
PS: this is not very many Mongrels. If you want to be able to service more than 2 simultaneous requests then you'll want more. If memory is tight, you can switch to Phusion Passenger and REE.