AWS: Too many connections - mysql

I have an RDS instance hosting a mySQL database. Instance size is db.t2.micro
I also have an ExpressJS backend connecting to the mySQL RDS instance via a connection pool:
Additionally i have a mobile app, the client, feeding off the ExpressJS API.
The issue i'm facing is, either via the mobile app or via Postman, there are times where i get a 'Too many connections' error and therefore several requests fail:
On the RDS instance. On current activity i sometimes get 65 connections, showing it's reaching the limit. What i need clarity on is:
When 200 mobile app instances connect to the API, to the RDS instance, does it register as 200 connections or 1 connection from ExpressJS?
Is it normal to be reaching the RDS instance 65 connection limit?
Is this just a matter of me using db.t2.micro instance size which is not recommended for prod? Will upgrading the instance size resolve this issue?
Is there something i'm doing wrong with my requests?
Thank you and your feedback is appreciated.

If your app creates a connection pool of 100, that's the number of database connections it will try to open. It must be lower than your MySQL connection limit.
Typically connection pools open all the connections for the pool, so they are ready when a client calls the http API. The connections might normally be running no SQL queries, if there are not many clients using the API at a given moment. The database connections are nevertheless connected.
Sort of like when you ssh to a remote linux server but you just sit there at a shell prompt for a while before running any command. You're still connected.
You asked if a db.t2.micro instance was not recommended for production. Yes, I would agree with that. It's tempting to use the smallest instance possible to save money, but a db.t2.micro is too small for anything but light testing, in my opinion.
In fact, I would not use any t2 instance for production, regardless of size. The t2 type uses "burstable" performance. This means it can provide only brief periods of good performance. Once the instance depletes its performance credits, they recharge slowly, and while they recharge, the performance of that instance is very low. This is okay for testing, but not for production, if you expect to provide consistent performance at any time.

Related

MariaDB. connection re-use

i have a database that thousands of users need to connect to (via ODBC) for very brief periods (it's a subscription licensing database for a win32 desktop app). They connect, get their approval to run and disconnect).
max_connections is set to 1000 but am not seeing the re-use i would expect server side. i.e. server currently has about 800 processes/connections sleeping (and another 200 connected to real data in other databases on the same server) .... yet a new attempt by a client app was rejected 'too many connections'.
What am i missing?
have increased the max_connections for now to 1500 but if that just means another 500 sleeping connections it's not a long term solution. pretty sure clients are disconnecting properly but am adding some diagnostics to the win32 app just in case.
MariaDB 10.3.11
with MySQL ODBC 5.3 ANSI Driver
It's normal to see a lot of sessions "Sleeping". That means the client is connected, but not executing a query at this moment. The client is likely doing other tasks, before or after running an SQL query. Just like if you are logged into a server with ssh, most of the time you're just sitting at the shell prompt not running any program.
It's up to you to design your clients to wait to connect until they need data, then disconnect promptly after getting their data. It's pretty common in apps that they connect to the database at startup, and remain connected. It's also pretty common in some frameworks to make multiple connections at startup, and treat them as a pool that can be used by multiple threads of the client app. It's your app, so you should configure this as needed.
Another thing to try is to enable the thread pool in the MariaDB server. See https://mariadb.com/kb/en/thread-pool-in-mariadb/
This is different from a client-side connection pool. The thread pool allows many thousands of clients to think they're connected, without allocating a full-blown thread in the MariaDB server for every single connection. When a client has something to query, at that time it is given one of the threads. When that client is done, it may continue to maintain a connection, but the thread in the MariaDB server is reallocated to a different client's request.
This is good for "bursty" workloads by many clients, and it sounds like your case might be a good candidate.

AWS RDS connection from external client extremely slow

I am currently connecting to an RDS instance (MariaDB) without an issue from within the configured VPC.
I am also connecting to the RDS instance from local clients (external to the VPC) with no connectivity issues but have serious issues with SQL execution speeds. It can take up to 20 times longer to execute a query remotely vs locally (an EC2 on same VPC).
I have the Security Group for the RDS instance setup to allow the external IPs as incoming rules and the RDS instance is listening on a non-default port (not 3306).
I cannot think of anything I should be doing differently on the network side of things and I have set skip-name-resolve=1, yet the speed is ridiculous.
It has no preference in terms of what the SQL query may be (SELECT, UPDATE, DELETE), they all execute slow.
Server RDS is MariaDb 10.1.19 on a db.t2.medium instance.
Client connection is via MySQL .NET Connector and connection string:
Server=<ip>;Port=<port>;Database=<dbname>;User ID=<dbuser>;Pooling=true;CharSet=utf8;Password=<dbpass>
Client has no connectivity or speed issues when DB in not an RDS (local MySQL).
I have seen various network related issues popping up now and then (connection stream dropped) but nothing serious apart from that, just very slow.
Any pointers on how to at least determine where the problem is?
The scenario I am trying to achieve (with acceptable speeds) is described here (albeit vague in their instructions):
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.Scenarios.html#USER_VPC.Scenario4

Amazon Aurora DB Cluster Not Auto Balancing Correctly

I have created an Amazon Aurora Database cluster runing MySQL with three instances: the main instance that backs the cluster and two read replicas for balancing. However, the cluster does not seem to be balancing the reads at all. I have one replica managing 700+ Selects/sec maximizing the CPU at 99.75% or higher while the other replica is doing virtually nothing with a CPU usage of 4% at 1 select per second, if that. The main cluster instance itself is at 33% CPU usage as it is being written to simultaneously while the replicas should are being read from. The lag time between the replicas is under 20 milliseconds. My application is querying the read only endpoint of the cluster but no balancing appears to be happening. Does anyone have any insight into why this may be happening or why the replica is at such a high CPU usage? The queries being ran against it are not complex by any means.
Aurora Cluster endpoints are DNS records and they only do DNS round robin during resolution. This means that when your client application opens connections to a cluster endpoint, you end up resolving the endpoint to different instances (different IPs basically), there by striping your connections across multiple replicas. Past that point, there is no load balancing. Connections are striped across instances, and queries run on each of those connections go to the corresponding instance backing it.
Now consider the scenario where your connection pool was already created to the cluster endpoint when you have one instance behind it. Now, if you add more instances, there will be no impact to your application, unless you terminate your connection and reestablish them. You would do a DNS round robin again, and this time some of your connections would land on the new instance that you provisioned.
Few callouts:
In Aurora, you have 2 cluster endpoints. One (RW) endpoint always points to the current writer and one (RO) does the DNS round robin between your read replicas.
Also, DNS propagation might take a few seconds when failovers happen, so that occasional errors are quite natural when failovers occur.
Hope this helps.
We've implemented a driver to try to mitigate this problem, with some visible gains: https://github.com/DiceTechnology/dice-fairlink
It regularly discovers the read-replicas to catch up with cluster changes and round-robins connections among them.
Despite not measuring any CPU utilisation, we've observed a better load distribution than with the native DNS based round-robin of the cluster reader endpoint
The Aurora's DNS based load balancing works at the connection level (not the individual query level). You must keep resolving the endpoint without caching DNS to get a different instance IP on each resolution. If you only resolve the endpoint once and then keep the connection in your pool, every query on that connection goes to the same instance. If you cache DNS, you receive the same instance IP each time you resolve the endpoint.
Unless you use a smart database driver, you depend on DNS record updates and DNS propagation for failovers, instance scaling, and load balancing across Aurora Replicas. Currently, Aurora DNS zones use a short Time-To-Live (TTL) of 5 seconds. Ensure that your network and client configurations don’t further increase the DNS cache TTL. Remember that DNS caching can occur anywhere from your network layer, through the operating system, to the application container. For example, Java virtual machines (JVMs) are notorious for caching DNS indefinitely unless configured otherwise. Here are AWS documentation and Aurora whitepaper on configuring DNS cache ttl.
My guess is that you are not connecting to the cluster endpoint.
Load Balancing – Connecting to the cluster endpoint allows Aurora to load-balance connections across the replicas in the DB cluster. This helps to spread the read workload around and can lead to better performance and more equitable use of the resources available to each replica. In the event of a failover, if the replica that you are connected to is promoted to the primary instance, the connection will be dropped. You can then reconnect to the reader endpoint in order to send your read queries to the other replicas in the cluster.
New Reader Endpoint for Amazon Aurora – Load Balancing & Higher Availability
[EDIT]
To load balance within a single application, you will need to reconnect to the endpoint. If you use the same connection for all queries only one replica will be responding. However, opening connections is expensive so this might not provide much benefit unless your queries take some time to run.

Activerecord Connection Pool Optimization for Highly Cached System

I have a rails activerecord project that has been scaled out to serve approximately 60-100k requests per minute. We use AWS and it takes about 5 xlarge c4 ec2 instances to serve this many requests. We have optimized the system to serve 99.99% of those requests off of a redis cache rendering our mysql DB barely used.
This is great and all but we keep running into connection limits for mysql. Amazon RDS apparently limits the number of connections we can have and it seems silly for upping our RDS instance size just so that we can have a larger number of sleeping connections. We literally profiled the RDS server and it maybe gets 10-50 queries a day depending on how many times we update the system.
Is there any way to keep the activerecord connection pool from reserving connections?
We tried simply lowering the connection pool and it helped in lowering connections, but then we started getting:
(ActiveRecord::ConnectionTimeoutError) "could not obtain a database connection within 5 seconds
What I would like to achieve is for the rails project to stop trying to pre-allocate connections and only open then up when they are necessary. I'm not well-versed enough in the rails framework and activerecord to understand how the system is reserving connections and why we are getting ConnectionTimeoutErrors even though the application isn't even making any DB calls.

Number and capacity of MySQL connections

I know it is very basic. But want to clear some concept of mysql connections. I have following scenario.
Db server and web servers are on different locations.
Web server is running an article based web site.
Articles data is stored in db server.
Web server is delivering 100 articles/pages per second.
My questions are as follows:
Single connection between web server and db server can handle it ?
How many connections would be created by default ?
If I suppose connections as pipes, what is i/o capacity of each connection ?
Is there any relationship between db server's RAM, processor, OS and number of connections ?
Thanks in advance
Single connection between web server and db server can handle it?
A single connection can pass several requests, but not at the same time, and only for a single client process. So the best scaling approach might be one connection pool per web server process, where threads can obtain an established connection to perform their request, and return the connection to the pool once they are done.
How many connections would be created by default ?
Depends on the client language, the MySQL connector implementation, the frameworks you use, the web server configuration, and probably a bunch of other details like this. So your best bet is to simply look at the list of network connections to the MySQL service, e.g. using lsof or netstat.
If I suppose connections as pipes, what is i/o capacity of each connection ?
The limiting factor will probably be shared resources, like the network bandwidth or processing capabilities at either end. So the number of connections shouldn't have a large impact on the data transfer rate. The overhead to establish a connection is significant, though, which is why I suggested reducing the number of connections using pooling.
Is there any relationship between db server's RAM, processor, OS and number of connections ?
Might be, if some application makes choices based on these parameters, but in general I'd consider this rather unlikely.