Lambda times out without error when querying RDS

I have a Lambda function that runs successfully the first time I run it (after a new deploy), but times out every subsequent time. It definitely has something to do with RDS (MySQL), because if I take the query call out, it works normally. I can console.log whatever I like to CloudWatch, either before or after the query to RDS, and all the logs make it into CloudWatch, but no errors are thrown. It's clearly executing the entire Lambda, just not returning. It's also not the Lambda callback code because, again, I can simply remove the RDS query and I get a response without a hiccup.

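Turns out, it's because I'm not shutting down the RDS connection. Apparently, Lambda will not respond as long as an RDS connection remains open. (As I understand it, the underlying mechanism is that in the Node.js runtime a callback-style handler by default waits for the event loop to be empty before the response is returned, and an open MySQL connection keeps the event loop busy; setting context.callbackWaitsForEmptyEventLoop = false is the usual alternative to closing the connection.) Of course this makes sense, since you could potentially end up with an unbounded number of connections open (depending on the load of your Lambda function). Unfortunately, Lambda has no connection pooling feature for RDS. :( Perhaps this won't be an issue when Aurora Serverless is released?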

Related

Will AWS Lambda automatically close MySQL connections?

If we don't close the MySQL connection at the end of the Lambda handler function, will the connection close automatically when the Lambda dies, and reconnect at the next cold start?

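The connections won't be closed immediately, but eventually they will. By default, MySQL's connection timeout (wait_timeout) is 8 hours, and here max_connections was capped at 66 (that number is not a MySQL default; on RDS the default max_connections is derived from the instance class's memory).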
show variables like "wait_timeout"; -- 28800
show variables like "max_connections"; -- 66
Each connection to the MySQL server is served by its own thread on the server, which you can inspect with:
show status where variable_name = 'threads_connected';
select * from information_schema.processlist;
After a Lambda executes a request and sends a response, the execution environment is not removed immediately, and the same one may be used to serve other requests. This is your warm Lambda, and in this case an already-open MySQL connection really helps your function's execution; that is only possible if you did not close the connection in the previous invocation. Eventually, when there are no more requests, the execution environment can be shut down and its resources returned to the pool of AWS compute resources. When it shuts down, the TCP connection from the Lambda to the MySQL server also terminates. The MySQL server can then remove the thread associated with that Lambda, which shrinks the set of active connections on the server. This also takes a bit of time. So if you are getting a lot of concurrent requests and the maximum number of connections is already in use, new requests will start failing.
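The warm-reuse idea can be sketched in Java (the answer itself is not tied to one language). This is a minimal sketch: ConnectionCache is a hypothetical helper, and the opener and validity check are passed in as functions so it runs without a database. In a real handler they would wrap DriverManager.getConnection(...) and Connection.isValid(...), both of which throw the checked SQLException and so need a try/catch inside the lambdas.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper: keeps one connection per execution environment.
// Held in a static field, it survives across warm invocations of the same environment.
final class ConnectionCache<C> {
    private final Supplier<C> opener;     // would wrap DriverManager.getConnection(...)
    private final Predicate<C> isUsable;  // would wrap Connection.isValid(...)
    private C current;
    private int opens;                    // how many times a new connection was opened

    ConnectionCache(Supplier<C> opener, Predicate<C> isUsable) {
        this.opener = opener;
        this.isUsable = isUsable;
    }

    // Returns the cached connection, reopening only if it is missing or no longer usable
    // (for example because the server dropped it after wait_timeout).
    synchronized C get() {
        if (current == null || !isUsable.test(current)) {
            current = opener.get();
            opens++;
        }
        return current;
    }

    synchronized int openCount() {
        return opens;
    }
}
```

A handler would keep the cache in a static field and call get() at the start of each request, so a warm invocation reuses the connection and only a cold start (or a dropped connection) pays for a new one.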
I ran some tests to see how long it really takes to reclaim the connections. [Snapshot of the results: X axis in minutes; Y axis from 0 to 70, with gridlines 10 units apart.]
It takes roughly 10-15 minutes to reclaim the connections, but again, that depends on the Lambda usage pattern as well.
So should you close the connection on every invocation? Well, it depends!
Take a look at Lambda extensions and see whether you can use the shutdown phase to close the connection. If you can, then while the execution environment is serving multiple requests you use a cached connection, and just before the environment is taken away from you, you close it.
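For a Java function, one way to run code at that point is a JVM shutdown hook, since the JVM runs shutdown hooks when the process receives SIGTERM. This sketch assumes the runtime actually gets that signal, which, as I understand AWS's graceful-shutdown guidance, requires at least one Lambda extension to be registered. ShutdownCloser is a hypothetical helper, and any AutoCloseable (such as a java.sql.Connection) can be tracked.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper: closes the cached connection when the JVM shuts down.
final class ShutdownCloser {
    private final AtomicReference<AutoCloseable> cached = new AtomicReference<>();

    // Registers a JVM shutdown hook; the JVM runs it on SIGTERM or normal exit.
    ShutdownCloser register() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::closeQuietly, "close-db"));
        return this;
    }

    // Remember the connection the handler is currently reusing.
    void track(AutoCloseable connection) {
        cached.set(connection);
    }

    // Closes the tracked connection at most once; returns false if nothing was tracked.
    boolean closeQuietly() {
        AutoCloseable c = cached.getAndSet(null);
        if (c == null) return false;
        try {
            c.close();
        } catch (Exception ignored) {
            // the process is exiting, so there is nothing useful to do with the error
        }
        return true;
    }
}
```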
Amazon RDS Proxy is also an alternative, as mentioned in the other answer, but it is not free. Before you take the RDS Proxy route, do consider another serverless option such as AWS Fargate; there you would probably use a connection pool, just like any long-running server-side application.

No, they will not be closed automatically unless you are doing something with your MySQL client that implicitly closes the connection when it goes out of scope.
The connection will stay open until it times out. Many people have reported problems with poorly written Lambdas creating tons of open sessions/connections to relational databases, because the connections were not properly closed and had to wait to be timed out.
One feature that came out a year or so ago is RDS Proxy, a sort of intermediary between clients and the MySQL server that implements connection pooling. This solves the problem of Lambdas not being able to use connection pooling effectively, since RDS Proxy can do the pooling on behalf of serverless clients.
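If you do close the connection yourself on every invocation, Java's try-with-resources gives the "closes when it goes out of scope" behaviour mentioned above: the resource is closed when the block exits, even if the query throws. FakeConnection is a hypothetical stand-in for java.sql.Connection (which is also AutoCloseable), so the sketch runs without a database.

```java
// Hypothetical stand-in for java.sql.Connection, which also implements AutoCloseable.
final class FakeConnection implements AutoCloseable {
    boolean closed;

    String query(String sql) {
        if (sql.isEmpty()) throw new IllegalArgumentException("empty query");
        return "ok";
    }

    @Override
    public void close() {
        closed = true; // a real driver would end the server-side session here
    }
}

final class PerInvocationHandler {
    // Runs one query and always closes the connection on exit, even if the query throws.
    static String handle(FakeConnection conn, String sql) {
        try (FakeConnection c = conn) {
            return c.query(sql);
        }
    }
}
```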

Azure database for MySQL DB 5.7 Transient handling in .net core

I am creating a .NET Core 2.1 MVC application that uses Azure Database for MySQL 5.7.
I have read the links below, but they seem to apply to MS SQL.
https://learn.microsoft.com/en-us/azure/mysql/concepts-high-availability
https://learn.microsoft.com/en-us/azure/architecture/best-practices/retry-service-specific
Is transient fault handling not possible for MySQL? Please point me to similar MySQL-specific pages.

A transient error, also known as a transient fault, is an error that will resolve itself. Most typically these errors manifest as a dropped connection to the database server, or as new connections to the server failing to open. Transient errors can occur, for example, when a hardware or network failure happens.
Transient errors should be handled using retry logic. Situations that must be considered:
An error occurs when you try to open a connection
An idle connection is dropped on the server side. When you try to issue a command it can't be executed
An active connection that currently is executing a command is dropped.
The first and second cases are fairly straightforward to handle: try to open the connection again. When you succeed, the transient error has been mitigated by the system, and you can use your Azure Database for MySQL again. We recommend waiting before retrying the connection, and backing off if the initial retries fail. This way the system can use all available resources to overcome the error situation. A good pattern to follow is:
Wait for 5 seconds before your first retry.
For each following retry, increase the wait exponentially, up to 60 seconds.
Set a max number of retries at which point your application considers the operation failed.
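The backoff schedule above can be sketched as follows. The question is about .NET, so treat this Java version as an illustration of the pattern only. BackoffRetry and its Sleeper hook are hypothetical, and for simplicity the sketch retries on any exception, whereas real code should retry only on errors it knows to be transient (connection failures, not SQL syntax errors).

```java
import java.util.concurrent.Callable;

// Sketch of the retry pattern: wait 5 s before the first retry, double each time, cap at 60 s.
final class BackoffRetry {
    static final long FIRST_DELAY_MS = 5_000;
    static final long MAX_DELAY_MS = 60_000;

    // Delay before retry number `retry` (1-based): 5 s, 10 s, 20 s, 40 s, 60 s, 60 s, ...
    static long delayMs(int retry) {
        long delay = FIRST_DELAY_MS;
        for (int i = 1; i < retry && delay < MAX_DELAY_MS; i++) {
            delay *= 2;
        }
        return Math.min(delay, MAX_DELAY_MS);
    }

    // Hypothetical sleep hook so callers (and tests) can record waits instead of sleeping,
    // e.g. pass Thread::sleep in production.
    interface Sleeper {
        void sleep(long ms) throws InterruptedException;
    }

    // Calls `action` once, then retries up to `maxRetries` times on failure.
    // If the last retry also fails, its exception is rethrown: the operation is considered failed.
    static <T> T run(Callable<T> action, int maxRetries, Sleeper sleeper) throws Exception {
        for (int retry = 0; ; retry++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (retry >= maxRetries) throw e;
                sleeper.sleep(delayMs(retry + 1));
            }
        }
    }
}
```

With a maximum of 3 retries, the operation is attempted up to four times, waiting 5, 10 and 20 seconds in between, before it is reported as failed.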
Read more here.
You can also read more in the article Troubleshoot connection issues to Azure Database for MySQL here.

After Aurora Cluster DB failover, unable to write to DB

Right now I am connecting to a cluster endpoint that I have set up for an Aurora MySQL-compatible cluster, and after I do a "failover" from the AWS console, my web application is unable to properly connect to the DB that should be writable.
My setup is like this:
A Java web app (Tomcat 8) with HikariCP as the connection pool and Connector/J as the MySQL driver. I am evaluating Aurora MySQL to see whether it will satisfy some of the application's needs. The web app runs on an EC2 instance in the same VPC and security group as the Aurora MySQL cluster, and I connect to the database through the cluster endpoint.
After a failover, I would expect HikariCP to break connections (it does) and then attempt to reconnect (it does); however, the application must be connecting to the wrong server, because any time a write hits the database, a SQLException is thrown that says:
The MySQL server is running with the --read-only option so it cannot execute this statement
What is the solution here? Should I rework my code to flush DNS after all connections go down, or after I start receiving this error, and then try to re-initiate connections after that? That doesn't seem right...

I don't know why I keep asking questions if I just answer them (I should really be more patient), but here's an answer in case anyone stumbles upon this in a Google search:
RDS uses DNS changes on the cluster endpoint to make failover look "seamless". Since the IP behind the hostname can change, any DNS caching along the way means the change won't be reflected right away. Here's a page from AWS' docs that goes into it a bit more: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-jvm-ttl.html
To resolve my issue, I went into the JVM's java.security file and set the DNS cache TTL (networkaddress.cache.ttl) to 0, just to verify my theory. It seems correct. Now I just need to figure out how to do it properly...
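A less drastic setting than 0 is a short, non-zero TTL; the AWS page linked above recommends no more than 60 seconds, which is the value assumed here. Instead of editing java.security, the same security property can be set programmatically with java.security.Security.setProperty, provided it runs before the JVM performs its first hostname lookup (for example at the very start of application startup). DnsTtl is a hypothetical wrapper.

```java
import java.security.Security;

// Sketch: cap how long the JVM caches DNS answers so that a changed Aurora
// cluster endpoint IP is picked up after a failover. Must run before the first lookup.
final class DnsTtl {
    static final String TTL_PROPERTY = "networkaddress.cache.ttl";

    static void configure(int seconds) {
        // Value is in seconds; "0" disables caching and "-1" caches forever.
        Security.setProperty(TTL_PROPERTY, Integer.toString(seconds));
    }
}
```

Calling DnsTtl.configure(60) once at startup means that connections HikariCP opens after a failover should resolve the cluster endpoint again within about a minute, instead of reusing the old writer's IP.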

AWS Aurora: The MySQL server is running with the --read-only option so it cannot execute this statement

I am getting this error when executing a GRANT statement on my Aurora DB instance in AWS:
The MySQL server is running with the --read-only option so it cannot execute this statement
My user is not read-only though, so why is this happening?

It turned out to be a silly mistake, but posting it anyway in case anyone else has the problem:
I was accessing the replica instance by mistake: I had copied the endpoint for the replica, and it is apparently read-only. So if you have this problem, verify that you are connecting to the primary instance or, better yet, the DB cluster endpoint.
Edit: According to Justin's answer, we should definitely use the DB cluster:
You need to connect to the cluster, rather than an instance. This is because instances seem to take a turn to be the readers and writers.
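Because this mix-up comes down to which hostname was copied, a check on the endpoint name can catch it before connecting. The sketch below is hypothetical and relies on the Aurora naming convention, in which the cluster (writer) endpoint contains ".cluster-", the reader endpoint ".cluster-ro-", and custom endpoints ".cluster-custom-"; the hostnames in the test are made up. Once connected, running SELECT @@innodb_read_only; is, as far as I know, a more direct check: it returns 1 on an Aurora MySQL reader.

```java
import java.util.Locale;

// Hypothetical helper: guesses what kind of Aurora endpoint a hostname is,
// based on the endpoint naming convention.
final class AuroraEndpoint {
    enum Kind { CLUSTER_WRITER, CLUSTER_READER, CUSTOM, INSTANCE }

    static Kind classify(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        // Check the more specific patterns first, since both contain ".cluster-".
        if (h.contains(".cluster-ro-")) return Kind.CLUSTER_READER;   // load-balances across replicas, if any
        if (h.contains(".cluster-custom-")) return Kind.CUSTOM;       // role depends on how it was configured
        if (h.contains(".cluster-")) return Kind.CLUSTER_WRITER;      // follows the current primary
        return Kind.INSTANCE; // one instance: may be a reader, and roles swap on failover
    }
}
```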

In my case, I was receiving this error after performing a Blue/Green failover in a Test environment. I was trying to access the Blue database, in order to confirm the process for reverting back to Blue database should that be required later.
Accessing Blue via the cluster address yielded this error, as did attempting to use the direct links to the Blue "reader" and "writer" instances. In the end, I performed a failover of the Blue "reader" and "writer" instances, after which the cluster address was in a working state again.
tl;dr
Try a failover of the "writer".

Detect when DB server goes down during JDBC query

My application makes queries to MySQL using JDBC. Sometimes, while a query is running, connectivity to the server is lost. Rather than detecting this and throwing an exception, the code hangs until the TCP connection finally times out (which takes over 10 minutes).
Setting a query timeout doesn't work. If the DB server stays up, it will time out queries, but it does nothing if the server goes down before the timeout triggers.
Setting socketTimeout in the MySQL connection string or invoking setNetworkTimeout on the Connection object sort of works. This does force the connection to time out if no response is received within the specified timeout; however, it also kills queries that run longer than the timeout even if the DB server is up. I want to fail fast if the DB server goes down but still be able to run long queries.
If I could get at the socket's keepalive settings, I could lower the probe interval and number of probes, which would solve the problem, but I can't see any way to do that with the MySQL JDBC driver.
How can I cause queries to fail quickly when the DB server goes down, while still being able to run long queries?
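The keepalive settings the question asks about do exist at the Java socket level: on JDK 11 and later, on platforms that support them (Linux, for example), jdk.net.ExtendedSocketOptions exposes TCP_KEEPIDLE, TCP_KEEPINTERVAL and TCP_KEEPCOUNT. The sketch below only configures a plain java.net.Socket; getting the MySQL driver to use such a socket (for instance through a custom socket factory) is not shown and would need checking against the Connector/J documentation. KeepAliveSockets is a hypothetical helper, and the 30/5/3 values are illustrative.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.StandardSocketOptions;
import jdk.net.ExtendedSocketOptions;

// Sketch: enable TCP keepalive with short probe settings so a vanished peer is detected
// after roughly idle + interval * probes seconds of silence, instead of the OS default.
final class KeepAliveSockets {
    static Socket configure(Socket s, int idleSec, int intervalSec, int probes) throws IOException {
        s.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
        // The per-socket tuning options are platform-dependent, so check before setting them.
        if (s.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPIDLE)) {
            s.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSec);
            s.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSec);
            s.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, probes);
        }
        return s;
    }
}
```

Unlike socketTimeout, keepalive probes are answered by the server's operating system even while a long query is still running, so a healthy server keeps the connection alive, while a dead one is noticed after about 45 seconds with these values.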
