I have 2 Virtual Machines on Azure in the same Virtual Network.
One virtual machine runs a NodeJs process which is responsible for MySQL operations.
Other virtual machine runs a MySQL instance. I can connect to it from the other VM and from the NodeJs process fine.
Sometimes it will fail and throw an error about Connection timeout when acquiring a connection from the pool.
My connection string uses a local IP address from within the Virtual Network to access the database so it should have this much delay to exceed a 10 second timeout. When it works it's rapid, I mean really fast! But sometimes it just breaks and randomly starts working again. Anyone ever come across this?
If it's any help this is a MySQL instance based on Ubuntu Server 15.10.
Exception:
{
"error": {
"name": "Error",
"status": 500,
"message": "connect ETIMEDOUT",
"errorno": "ETIMEDOUT",
"code": "ETIMEDOUT",
"syscall": "connect",
"fatal": true,
"stack": "Error: connect ETIMEDOUT
at PoolConnection.Connection._handleConnectTimeout (projectdir/node_modules/loopback-connector-mysql/node_modules/mysql/lib/Connection.js:375:13)
at Socket.g (events.js:180:16)
at Socket.EventEmitter.emit (events.js:92:17)
at Socket._onTimeout (net.js:327:8)
at Timer.unrefTimeout [as ontimeout] (timers.js:412:13)
--------------------
at Protocol._enqueue (projectdir/node_modules/loopback-connector-mysql/node_modules/mysql/lib/protocol/Protocol.js:135:48)
at Protocol.handshake (projectdir/node_modules/loopback-connector-mysql/node_modules/mysql/lib/protocol/Protocol.js:52:41)
at PoolConnection.connect (projectdir/node_modules/loopback-connector-mysql/node_modules/mysql/lib/Connection.js:123:18)
at Pool.getConnection (projectdir/node_modules/loopback-connector-mysql/node_modules/mysql/lib/Pool.js:45:23)
at MySQL.executeSQL (projectdir/node_modules/loopback-connector-mysql/lib/mysql.js:200:12)
at projectdir/node_modules/loopback-connector-mysql/node_modules/loopback-connector/lib/sql.js:408:10
at projectdir/node_modules/loopback-datasource-juggler/lib/observer.js:175:9
at doNotify (projectdir/node_modules/loopback-datasource-juggler/lib/observer.js:93:49)
at MySQL.ObserverMixin._notifyBaseObservers (projectdir/node_modules/loopback-datasource-juggler/lib/observer.js:116:5)
at MySQL.ObserverMixin.notifyObserversOf (projectdir/node_modules/loopback-datasource-juggler/lib/observer.js:91:8)"
}
}
Per my experience, there are 2 situations may raise your issue.
The connections reaches the max_connection number of MySQL server, and there are no available connections for a new connection client. In this situation, you may check your code, whether you release the connection after MySQL operations.
On the other hand, when you get the timeout exception, you may login on Azure manage portal, in monitor page of your VM portal, to check whether the metrics reach the bottleneck of the VM which will also occur your issue. In this situation, you can scale on your VM to enlarge your VM hardwares.
The Node.js mysql module has a few options.
One of these options is the 'connectTimeout', which defaults to 10000ms (which is roughly 10 seconds).
If nothing is done within those 10 seconds, the connection closes automatically.
What the solution could be for you problem, is using pooling connections.
With this, you create a connection pool. Everytime a query needs to be executed, it takes a connection from the pool uses it and when it expires it automatically returns to the pool, ready to be restarted, thus no more connection timeout errors.
I have a process running all the time and when it idles for a while the connections in the pool are in a sleeping state. Eventually MySQL purges those connections based on the wait_timeout setting in my.cnf. Once this happens and I try to use a connection it will fail because the module assumes the connection is still live and tries to use it only to get a timeout or connection exception.
To prevent this you can either overwrite the mysql module code to support "connection lifetime" in the pool or stop using the pool and manage your own connections.
Related
I have aplications that connect to a remote server (MySQL 5.5 on Windows Server 2012), at first I started receiving "too many connections" message which I solved by increasing MAX_CONNECTION value in my.inf to 500, then I start getting "can't create new thread" message so I decrease decrease timeouts to avoid idle connections using a socket, which didn't completely work. Now I get odd messages like 'file not found', as soon as I restart the service I stop getting the messages and everything works correctly.
The problem occurs when the server reaches around 170 connections at the same time.
Is there some configuration I'm missing?, I really don't know what info you need to give me a hint to fix this. I mean, there are servers that accept a lot morw of connections at the same time, right? waht I'm missing.
RAM and CPU of the system dosen't reach 35-40% at max connections (170).
Edit: Error occur at 2 'places', when running a query or at the attempt of conennection, it's like the MySQL service rejects the attempt. VB6 is the language used in the client app (ODBC connector). The app opens, executes and closes the connection.
Note: I have full control over client app and server config.
If there are many requests of db server at the same time saying that the QPS is 100, and the DB server has a connection limit saing 1000, so if the requests are slow queries which will eventually got inactivity timeout, at this time what shoud i do to prevent the npm package mysql from creating new connection?
Because the npm package mysql will remove the connection object from the connection object pool with fatal error like inactivity timeout and leave space for creating new connection.
For high load, you should use connection pools with persistent connections. Those are usually available in hight level query builders and ORMs like knex and sequelize.
But if you don't want use them, you can also try native pools.
I have a Java EE application running in GlassFish on EC2, with a MySQL database on Amazon RDS.
I am trying to configure the JDBC connection pool to in order to minimize downtime in case of database failover.
My current configuration isn't working correctly during a Multi-AZ failover, as the standby database instance appears to be available in a couple of minutes (according to the AWS console) while my GlassFish instance remains stuck for a long time (about 15 minutes) before resuming work.
The connection pool is configured like this:
asadmin create-jdbc-connection-pool --restype javax.sql.ConnectionPoolDataSource \
--datasourceclassname com.mysql.jdbc.jdbc2.optional.MysqlConnectionPoolDataSource \
--isconnectvalidatereq=true --validateatmostonceperiod=60 --validationmethod=auto-commit \
--property user=$DBUSER:password=$DBPASS:databaseName=$DBNAME:serverName=$DBHOST:port=$DBPORT \
MyPool
If I use a Single-AZ db.m1.small instance and reboot the database from the console, GlassFish will invalidate the broken connections, throw some exceptions and then reconnect as soon the database is available. In this setup I get less than 1 minute of downtime.
If I use a Multi-AZ db.m1.small instance and reboot with failover from the AWS console, I see no exception at all. The server halts completely, with all incoming requests timing out. After 15 minutes I finally get this:
Communication failure detected when attempting to perform read query outside of a transaction. Attempting to retry query. Error was: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.3.2.v20111125-r10461): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 940,715 milliseconds ago. The last packet sent successfully to the server was 935,598 milliseconds ago.
It appears as if each HTTP thread gets blocked on an invalid connection without getting an exception and so there's no chance to perform connection validation.
Downtime in the Multi-AZ case is always between 15-16 minutes, so it looks like a timeout of some sort but I was unable to change it.
Things I have tried without success:
connection leak timeout/reclaim
statement leak timeout/reclaim
statement timeout
using a different validation method
using MysqlDataSource instead of MysqlConnectionPoolDataSource
How can I set a timeout on stuck queries so that connections in the pool are reused, validated and replaced?
Or how can I let GlassFish detect a database failover?
As I commented before, it is because the sockets that are open and connected to the database don't realize the connection has been lost, so they stayed connected until the OS socket timeout is triggered, which I read might be usually in about 30 minutes.
To solve the issue you need to override the socket Timeout in your JDBC Connection String or in the JDNI COnnection Configuration/Properties to define the socketTimeout param to a smaller time.
Keep in mind that any connection longer than the value defined will be killed, even if it is being used (I haven't been able to confirm this, is what I read).
The other two parameters I mention in my comment are connectTimeout and autoReconnect.
Here's my JDBC Connection String:
jdbc:(...)&connectTimeout=15000&socketTimeout=60000&autoReconnect=true
I also disabled Java's DNS cache by doing
java.security.Security.setProperty("networkaddress.cache.ttl" , "0");
java.security.Security.setProperty("networkaddress.cache.negative.ttl" , "0");
I do this because Java doesn't honor the TTL's, and when the failover takes place, the DNS is the same but the IP changes.
Since you are using an Application Server, the parameters to disable DNS cache must be passed to the JVM when starting the glassfish with -Dnet and not the application itself.
I'm running a Node server connecting to MySQL via the node-mysql module. Connecting to and querying MySQL works great initially without any errors, however, the first query after leaving the Node server idle for a couple hours results in an error. The error is the familiar read ECONNRESET, coming from the depths of the node-mysql module.
A stack trace (note that the three entries of the trace belong to my app's error reporting code):
Error
at exports.Error.utils.createClass.init (D:\home\site\wwwroot\errors.js:180:16)
at new newclass (D:\home\site\wwwroot\utils.js:68:14)
at Query._callback (D:\home\site\wwwroot\db.js:281:21)
at Query.Sequence.end (D:\home\site\wwwroot\node_modules\mysql\lib\protocol\sequences\Sequence.js:78:24)
at Protocol.handleNetworkError (D:\home\site\wwwroot\node_modules\mysql\lib\protocol\Protocol.js:271:14)
at PoolConnection.Connection._handleNetworkError (D:\home\site\wwwroot\node_modules\mysql\lib\Connection.js:269:18)
at Socket.EventEmitter.emit (events.js:95:17)
at net.js:441:14
at process._tickCallback (node.js:415:13)
This error happens both on my cloud Node server and MySQL server as well as a local setup of both.
My questions:
Does this problem appear to be a disconnection of Node's connection to my MySQL server(s), perhaps due to a connection lifetime limitation?
When using connection pools, node-mysql is supposed to gracefully handle disconnections and prune them from the pool. Is it not aware of the disconnect until I make a query, thus making the error unavoidable?
Considering that I see the "read ECONNRESET" error a lot in other StackOverflow posts, should I be looking elsewhere from MySQL to diagnose the problem?
Update: After more browsing, I think my issue is a duplicate of this one. It appears his connection is disconnecting as well, but no one has suggested how to keep the connection alive or how to address the error outside of failing on the first query back.
I reached out to the node-mysql folks on their Github page and got some firm answers.
MySQL does indeed prune idle connections. There's a MySQL variable "wait_timeout" that sets the number of second before timeout and the default is 8 hours. We can set the default to be much larger than that. Use show variables like 'wait_timeout'; to view your timeout setting and set wait_timeout=28800; to change it.
According to this issue, node-mysql doesn't prune pool connections after these sorts of disconnections. The module developers recommended using a heartbeat to keep the connection alive such as calling SELECT 1; on an interval. They also recommended using the node-pool module and its idleTimeoutMillis option to automatically prune idle connections.
If this happens when establishing a single reused connection, it can be avoided by establishing a connection pool instead.
For example, if you're doing something like this...
var db = require('mysql')
.createConnection({...})
.connect(function(err){});
do this instead...
var db = require('mysql')
.createPool({...});
Does this problem appear to be a disconnection of Node's connection to my MySQL server(s), perhaps due to a connection lifetime limitation?
Yes. The server has closed its end of the connection.
When using connection pools, node-mysql is supposed to gracefully handle disconnections and prune them from the pool. Is it not aware of the disconnect until I make a query, thus making the error unavoidable?
Correct, but it should handle the error internally, not pass it back to you. This appears to be a bug in node-mysql. Report it.
Considering that I see the "read ECONNRESET" error a lot in other StackOverflow posts, should I be looking elsewhere from MySQL to diagnose the problem?
It is either a bug in the node-MySQL connection pool implementation, o else you haven't configured it properly to detect failures.
I have been also facing the same issue. Apparently it was happening because one of the backend process has been triggered on table which was being referred in my api.
This caused table to go in lock wait state and my query request got failed with connection reset. Though i'm wondering why i didn't receive lock wait error .
We are experiencing the same error as this StackOverflow Q ...
System.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 - A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)
at System.Data.ProviderBase.DbConnectionPool.GetConnection(DbConnection owningObject)
at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
at System.Data.SqlClient.SqlConnection.Open()
at System.Data.Linq.SqlClient.SqlConnectionManager.UseConnection(IConnectionUser user)
at System.Data.Linq.SqlClient.SqlProvider.get_IsSqlCe()
at System.Data.Linq.SqlClient.SqlProvider.InitializeProviderMode()
at System.Data.Linq.SqlClient.SqlProvider.System.Data.Linq.Provider.IProvider.Execute(Expression query)
... except that in the referenced StackOverflow Q, they need to restart SQL Server once the error occurs - and we do not. We'll get this error once a day, or once every few days - and all is fine after the error occurs, until the next time it occurs.
This makes us think it's not a "forgot to close connections" issue. We have a moderately busy ASP.NET 4.0 WebForms / SQL Server 2008 R2 app; but we're quite positive we're not exceeding the max # of database connections.
Any thoughts on this problem, or an approach to diagnose?
Thought I would comment on our progress with this.
While none of the SQL Server documentation/articles/blogs mention that this error can be caused by server busyness, I found a forum posting where some seasoned IT pro named Matt Neerincx states that it can be, as follows:
Possible reasons for this error include:
1. Poor network link from client to server.
2. Server is very busy (meaning high CPU) and cannot respond to new connection attempts.
3. Server is running out of memory (so high memory usage for SQL).
4. tcp-ip layer on client is over-saturated with connection attempts so tcp-ip layer rejects the connection.
5. tcp-ip layer on server side is over-staturated with connection attempts and so tcp-ip layer is rejecting new connections.
6. With SQL 2005 SP2 and later there could be a custom login trigger that rejects your connection.
You can increase the connect timeout to potentially alleviate issues #2, #3, #4, #5. Setting a longer connect timeout means the driver will try longer to connect and may eventually succeed.
To determine the root cause of these intermittent failures is not super easy to do unfortunately. What I normally do is start by examining the server environment, is the server constantly running in high CPU for example, this points to #2. Is the server using a hugh amount of memory, this points to #3. You can run SQL Profiler to monitor logins and look for patterns of logins, perhaps every morning at 9AM there is a flurry of connections etc...
So we are presently walking down this path - reducing the # of queries that execute at the same time in some of our batch queries, optimizing some of our queries, etc.
Also, in our app connection string, we increased the connection timeout, and set Min Pool Size to 20 (thinking it's good to try to ensure some existing, unused connections for the app to grab, rather than needing to establish a new connection).
At this moment, it's been almost 48 hours without receiving the error; making us very hopeful.