Reaching QueuePool limit overflow in FastAPI application - sqlalchemy

I have the following project on GitHub, built with FastAPI, SQLAlchemy and PostgreSQL.
During load tests with Locust I found that my server fails with the following error:
sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00
Here is a picture in Locust:
Why do these requests fail with HTTPError('500 Server Error: Internal Server Error') rather than as timeouts? How can I turn these failures into timeouts instead?
Obviously I can set the pool_size and max_overflow options in the engine constructor to larger values, but I don't think that is good practice.
Is it possible to control the number of sessions created?
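One way to cap the number of concurrently open sessions is to put a gate in front of session creation. The following is a minimal sketch using only the standard library, not part of the original project: the names SessionGate and PoolBusyError are hypothetical, and the limit of 15 is an assumption taken from pool_size 5 plus max_overflow 10 in the error above. It uses a thread-based semaphore, so it fits synchronous endpoints; async code would need asyncio.Semaphore instead.

```python
import threading
from contextlib import contextmanager


class PoolBusyError(Exception):
    """Raised when no slot frees up in time (hypothetical name)."""


class SessionGate:
    """Caps how many sessions may be open at the same time."""

    def __init__(self, max_sessions: int, wait_seconds: float):
        self._slots = threading.BoundedSemaphore(max_sessions)
        self._wait_seconds = wait_seconds

    @contextmanager
    def slot(self):
        # Wait a bounded time for a free slot instead of queuing on the pool.
        if not self._slots.acquire(timeout=self._wait_seconds):
            raise PoolBusyError("no free database slot")
        try:
            yield
        finally:
            # Always give the slot back, even if the request handler failed.
            self._slots.release()


# Assumption: keep the cap at or below pool_size + max_overflow (5 + 10 here).
gate = SessionGate(max_sessions=15, wait_seconds=2.0)
```

A session would then be opened inside `with gate.slot():`. An unhandled exception is what produces the 500 response, so registering an exception handler for PoolBusyError (FastAPI supports this through app.exception_handler) could return a 503 or 504 instead.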

Related

AWS Time Out Problems with Elastic Beanstalk App with DB Access

Hi. When my Elastic Beanstalk environment (m5a.large Windows Server with a deployed .NET Core Web API) comes under heavy load, the status on the Health page for my EC2 instances turns red, and my requests and the health check time out. This happens around 1-3 minutes after a server sustains at least 10-20 requests/second.
I have to launch many servers so that each one receives only 1-5 requests/second and does not turn red.
In my logs I saw the following Errors:
Exception=MySql.Data.MySqlClient.MySqlException (0x80004005): Unable to connect to any of the specified MySQL hosts.
---> MySql.Data.MySqlClient.MySqlException (0x80004005): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
These errors led me to the topic of connection pooling, so I switched from
using MySql.Data.MySqlClient;
to
using MySqlConnector;
Now these errors no longer appear, but the problem remains.
The monitoring features of EB and RDS do not show any obvious problems. Running queries against the database in MySQL Workbench is as fast as usual.
At the moment, my database calls from the server are synchronous and do not use the async features of MySqlConnector.
Can the m5a.large really not process more than 5 requests/second?
Kind Regards

MySql errors while executing queries on a Django project on AWS Lambda function

I have a Django project hosted on an AWS Lambda function.
This microservice uses pymysql to connect to an AWS Aurora RDS database.
The service executes one and only one query, over and over again.
1 in 200 times the query fails with an EOF packet error.
To investigate this issue I implemented a "repeater" that repeats the same query if it fails (a maximum of 2 repeats with a 0.25-second delay).
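For illustration, a repeater of that kind could look roughly like the sketch below. This is not the code from the project: run_with_repeats and its parameters are hypothetical names, and catching Exception is a simplification (a real service would catch only database errors).

```python
import time


def run_with_repeats(query, max_repeats=2, delay_seconds=0.25, sleep=time.sleep):
    """Call query(); if it raises, repeat it up to max_repeats more times."""
    last_error = None
    for attempt in range(max_repeats + 1):
        try:
            return query()
        except Exception as exc:  # simplification: catches every error type
            last_error = exc
            if attempt < max_repeats:
                # Fixed pause before the next attempt.
                sleep(delay_seconds)
    # All attempts failed: re-raise the most recent error.
    raise last_error
```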
Once again, on a rare occasion, a query failed, and I expected to see a successful query after the first reattempt. However, it failed in all 3 consecutive calls, each time with a DIFFERENT error message.
Error messages (in order):
AssertionError: Protocol error, expecting EOF
django.db.utils.InternalError: Packet sequence number wrong - got 24 expected 1
django.db.utils.InterfaceError: (0, '')
These are errors from 3 separate queries executed against the MySQL Aurora RDS database. (I just want to emphasize that this is not a single stack trace but errors from different queries.)
More useful info:
The microservice uses Django ORM to create queries.
The database is in Master-Slave configuration and those queries go against a Slave database.
The parameters observed in Master and Slave databases (such as CPU usage, free RAM space, various latencies, various throughputs, etc.) are completely normal and do not indicate any potential errors.
It is not a multithreaded environment.
Error stack traces:
Complete stack for *EOF error*:
https://pastebin.com/BracLTZX
Complete stack for *Packet sequence error*:
https://pastebin.com/fYmRGh69
Complete stack for *Interface error*:
https://pastebin.com/bstG1r2q

Azure database for MySQL DB 5.7 Transient handling in .net core

I am building a .NET Core 2.1 MVC application that uses Azure Database for MySQL 5.7.
I have read the links below, but they seem to apply only to MS SQL.
https://learn.microsoft.com/en-us/azure/mysql/concepts-high-availability
https://learn.microsoft.com/en-us/azure/architecture/best-practices/retry-service-specific
Is transient-fault handling not possible for MySQL? Please point me to similar pages that cover MySQL.
A transient error, also known as a transient fault, is an error that will resolve itself. Most typically these errors manifest as a connection to the database server being dropped. Also new connections to a server can't be opened. Transient errors can occur for example when hardware or network failure happens.
Transient errors should be handled using retry logic. Situations that must be considered:
An error occurs when you try to open a connection
An idle connection is dropped on the server side. When you try to issue a command it can't be executed
An active connection that currently is executing a command is dropped.
The first and second cases are fairly straightforward to handle: try to open the connection again. When you succeed, the transient error has been mitigated by the system and you can use your Azure Database for MySQL again. We recommend waiting before retrying the connection, and backing off if the initial retries fail. This way the system can use all available resources to overcome the error situation. A good pattern to follow is:
Wait for 5 seconds before your first retry.
For each following retry, increase the wait exponentially, up to 60 seconds.
Set a max number of retries at which point your application considers the operation failed.
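The pattern above can be sketched as follows. This is an illustrative Python sketch rather than .NET code from the linked pages: backoff_delays and connect_with_retry are hypothetical names, the doubling factor is an assumption (the text only says "exponentially"), and connect stands for whatever function opens your connection.

```python
import time


def backoff_delays(first_wait=5.0, max_wait=60.0, max_retries=5):
    """Yield the wait before each retry: 5, 10, 20, 40, then capped at 60."""
    wait = first_wait
    for _ in range(max_retries):
        yield min(wait, max_wait)
        wait *= 2  # assumption: double the wait on every retry


def connect_with_retry(connect, max_retries=5, sleep=time.sleep):
    """Try connect() once, then retry after each backoff delay."""
    try:
        return connect()
    except ConnectionError as exc:
        last_error = exc
    for wait in backoff_delays(max_retries=max_retries):
        sleep(wait)
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
    # Retry budget exhausted: the operation is considered failed.
    raise last_error
```

Only ConnectionError is retried here, so errors that are not transient (for example, bad credentials surfacing as another exception type) fail immediately.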
Read more here.
You can also read more on how to troubleshoot connection issues to Azure Database for MySQL here.

nodejs runtime stops responding when testing under stress

I'm basically checking all the routes with mocha, using the request module.
https://www.npmjs.com/package/request
I'm doing a stress test by opening two console windows side by side and running the tests in both simultaneously. Most of the time the tests succeed, but at some point they fail without a timeout error, and one specific route stops responding when I call it from Postman.
It happens about once in 7 runs, and I'm wondering what I could do to figure this out.
Edit:
I increased this to 4 console windows running tests simultaneously. They ran fine a couple of times but then started to time out.
There is not even any console output from the app.get, app.post etc. routes.
Any suggestions?
Edit
Following the suggestion, I caught some request errors within the tests.
Uncaught AssertionError: { [Error: connect ECONNREFUSED]
code: 'ECONNREFUSED',
errno: 'ECONNREFUSED',
syscall: 'connect' } == null
The corresponding code for the above error is
request({url: endpoint + "/SignIn?emailAddress=" + emailAddress + "&password=" + password}, function (error, response, body) {
    assert.equal(error, null);
    // ... (rest of the test not shown in the original post)
});
Edit 2
Digging deeper with console statements, I noticed that the mysql connection callback was not called. I'm attaching a screenshot that shows some connection limit. Is this the cause? I'm using connection pools, though.
The logs say "forcing close of threads".
Probable Answer:
This thread helped with the issue.
https://github.com/felixge/node-mysql/issues/405
I set waitForConnections: false and then started to see this error:
[Error: No connections available.]
So it seems that the system was waiting for a connection, but the test runner didn't wait that long and ended with a timeout error.
There also seems to be a limit on the maximum number of connections, even though I was calling release on connections after each query. How does this work on production systems? Is there a limit there too?
You are running out of TCP connections. You need to make a few changes at the system and application level to handle more load.
1. Change your connection settings to keepAlive wherever possible.
2. On Unix you have ulimit, which includes the maximum number of file handles that a process can hold at any instant. Remember that on Unix every socket is also a file.
3. Tune your timeout settings based on the response time of your database server or other web server.
If you have a multi-tier architecture, you'll have to make similar changes at each tier that handles the request.
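As a quick check for point 2, the file-handle limit of a running process can be read from inside the process. The sketch below is in Python rather than Node.js and assumes a Unix system, since the standard resource module is not available on Windows; open_file_limits is a hypothetical helper name.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    # RLIMIT_NOFILE covers sockets as well as regular files.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit:", soft, "hard limit:", hard)
```

The soft limit is the one the process actually runs into; it corresponds to what `ulimit -n` reports in the shell that started the process.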

MySQL giving "read ECONNRESET" error after idle time on node.js server

I'm running a Node server connecting to MySQL via the node-mysql module. Connecting to and querying MySQL works great initially without any errors, however, the first query after leaving the Node server idle for a couple hours results in an error. The error is the familiar read ECONNRESET, coming from the depths of the node-mysql module.
A stack trace (note that the first three entries of the trace belong to my app's error reporting code):
Error
at exports.Error.utils.createClass.init (D:\home\site\wwwroot\errors.js:180:16)
at new newclass (D:\home\site\wwwroot\utils.js:68:14)
at Query._callback (D:\home\site\wwwroot\db.js:281:21)
at Query.Sequence.end (D:\home\site\wwwroot\node_modules\mysql\lib\protocol\sequences\Sequence.js:78:24)
at Protocol.handleNetworkError (D:\home\site\wwwroot\node_modules\mysql\lib\protocol\Protocol.js:271:14)
at PoolConnection.Connection._handleNetworkError (D:\home\site\wwwroot\node_modules\mysql\lib\Connection.js:269:18)
at Socket.EventEmitter.emit (events.js:95:17)
at net.js:441:14
at process._tickCallback (node.js:415:13)
This error happens both on my cloud Node server and MySQL server as well as a local setup of both.
My questions:
Does this problem appear to be a disconnection of Node's connection to my MySQL server(s), perhaps due to a connection lifetime limitation?
When using connection pools, node-mysql is supposed to gracefully handle disconnections and prune them from the pool. Is it not aware of the disconnect until I make a query, thus making the error unavoidable?
Considering that I see the "read ECONNRESET" error a lot in other StackOverflow posts, should I be looking elsewhere from MySQL to diagnose the problem?
Update: After more browsing, I think my issue is a duplicate of this one. It appears his connection is disconnecting as well, but no one has suggested how to keep the connection alive or how to address the error outside of failing on the first query back.
I reached out to the node-mysql folks on their Github page and got some firm answers.
MySQL does indeed prune idle connections. There's a MySQL variable "wait_timeout" that sets the number of seconds before timeout, and the default is 8 hours. We can set it to be much larger than that. Use show variables like 'wait_timeout'; to view your timeout setting and set wait_timeout=28800; to change it (28800 seconds is the 8-hour default, so substitute a larger value to extend it).
According to this issue, node-mysql doesn't prune pool connections after these sorts of disconnections. The module developers recommended using a heartbeat to keep the connection alive, such as calling SELECT 1; on an interval. They also recommended using the node-pool module and its idleTimeoutMillis option to automatically prune idle connections.
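The idea behind pruning idle connections can be sketched as below. This is a Python illustration of the concept, not node-pool's implementation: IdlePruningPool, its parameters, and the connect factory are hypothetical, and connections are only assumed to have a close() method. The idle timeout should be set below the server's wait_timeout so that a stale connection is discarded before a query is ever sent on it.

```python
import time
from collections import deque


class IdlePruningPool:
    """Hands out pooled connections, discarding ones that sat idle too long."""

    def __init__(self, connect, idle_timeout_seconds, clock=time.monotonic):
        self._connect = connect            # hypothetical factory returning a connection
        self._idle_timeout = idle_timeout_seconds
        self._clock = clock
        self._idle = deque()               # (connection, time it was released)

    def acquire(self):
        while self._idle:
            conn, released_at = self._idle.pop()
            if self._clock() - released_at < self._idle_timeout:
                return conn
            # Idle for too long: the server may already have dropped it.
            conn.close()
        # Nothing usable in the pool, so open a fresh connection.
        return self._connect()

    def release(self, conn):
        self._idle.append((conn, self._clock()))
```

This sketch prunes lazily, at acquire time. A pool that prunes on a timer, or a heartbeat that runs SELECT 1 periodically, keeps connections warm instead of replacing them.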
If this happens when establishing a single reused connection, it can be avoided by establishing a connection pool instead.
For example, if you're doing something like this...
var db = require('mysql')
.createConnection({...});
// Call connect() separately so that db still refers to the connection object.
db.connect(function(err){});
do this instead...
var db = require('mysql')
.createPool({...});
Does this problem appear to be a disconnection of Node's connection to my MySQL server(s), perhaps due to a connection lifetime limitation?
Yes. The server has closed its end of the connection.
When using connection pools, node-mysql is supposed to gracefully handle disconnections and prune them from the pool. Is it not aware of the disconnect until I make a query, thus making the error unavoidable?
Correct, but it should handle the error internally, not pass it back to you. This appears to be a bug in node-mysql. Report it.
Considering that I see the "read ECONNRESET" error a lot in other StackOverflow posts, should I be looking elsewhere from MySQL to diagnose the problem?
It is either a bug in the node-mysql connection pool implementation, or else you haven't configured it properly to detect failures.
I have been facing the same issue. Apparently it happened because a backend process had been triggered on a table that my API was referring to.
This caused the table to go into a lock wait state, and my query failed with a connection reset. I'm still wondering why I didn't receive a lock wait error, though.