Tomcat 8.5 Connection Pool not reconnecting after DB failover - mysql

I have an application using the Tomcat 8.5 connection pool, Java 8, and a Multi-AZ AWS RDS MySQL database. Over the last few years we have had a couple of database issues that led to a failover. Whenever a failover occurred, the pool was always able to detect that the connection was closed (No operations allowed after connection closed) and reconnect correctly a minute or so later, once the backup node was up.
A few days ago we had a failover that didn't follow this pattern. Because of a hardware issue, the database became unavailable and a failover took place. When the backup node came up a couple of minutes later, we could connect to the database correctly from our desktop MySQL client.
However, even several minutes after the failover, with connectivity to the database restored, the application kept logging hundreds of exceptions like:
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No operations allowed after connection closed
...
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
...
The last packet successfully received from the server was 20,017 milliseconds ago. The last packet sent successfully to the server was 20,016 milliseconds ago
...
Caused by: java.net.SocketTimeoutException: Read timed out
...
The application couldn't reconnect until we restarted the Tomcat servers.
Our pool is configured this way:
initialSize = 5
maxActive = 16
minIdle = 5
maxIdle = 8
maxWait = 10000
maxAge = 600000
timeBetweenEvictionRunsMillis = 5000
minEvictableIdleTimeMillis = 60000
validationQuery = "SELECT 1"
validationQueryTimeout = 3
validationInterval = 15000
testOnBorrow = true
testWhileIdle = true
testOnReturn = false
jdbcInterceptors = "ConnectionState;StatementCache(max=200)"
defaultTransactionIsolation = java.sql.Connection.TRANSACTION_READ_COMMITTED
And the JDBC connection URL has these parameters:
autoreconnect=true&socketTimeout=20000
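For reference, here is a minimal sketch of how this configuration maps onto the Tomcat JDBC pool when set up programmatically (the URL host, schema, and credentials are placeholders; the property values are the ones listed above):
import org.apache.tomcat.jdbc.pool.DataSource;
import org.apache.tomcat.jdbc.pool.PoolProperties;

PoolProperties p = new PoolProperties();
// placeholder endpoint; the real URL carries autoreconnect=true&socketTimeout=20000
p.setUrl("jdbc:mysql://<rds-endpoint>:3306/<schema>?autoreconnect=true&socketTimeout=20000");
p.setDriverClassName("com.mysql.jdbc.Driver");
p.setUsername("<user>");
p.setPassword("<password>");
p.setInitialSize(5);
p.setMaxActive(16);
p.setMinIdle(5);
p.setMaxIdle(8);
p.setMaxWait(10000);
p.setMaxAge(600000);                        // recycle any connection older than 10 minutes
p.setTimeBetweenEvictionRunsMillis(5000);
p.setMinEvictableIdleTimeMillis(60000);
p.setValidationQuery("SELECT 1");
p.setValidationQueryTimeout(3);             // seconds
p.setValidationInterval(15000);             // validate a given connection at most every 15 s
p.setTestOnBorrow(true);
p.setTestWhileIdle(true);
p.setTestOnReturn(false);
p.setJdbcInterceptors("ConnectionState;StatementCache(max=200)");
p.setDefaultTransactionIsolation(java.sql.Connection.TRANSACTION_READ_COMMITTED);
DataSource dataSource = new DataSource(p);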
As I understand it, the validationQuery should have failed, the connection should have been discarded, and a new, healthy connection should have been created. Also, according to maxAge, after 10 minutes all connections should have been discarded and replaced with new ones.
The pool had not recovered even after 20 minutes; as mentioned, we had to restart the Tomcat servers.
Is there any explanation for why the pool has always recovered correctly from a failover before, but couldn't in this case?

Try adding ENABLE=BROKEN to your connection string.
For example (Oracle thin driver syntax):
jdbc:oracle:thin:@(DESCRIPTION=(ENABLE=BROKEN)(ADDRESS=(PROTOCOL=tcp)(PORT=)(HOST=))(CONNECT_DATA=(SID=)))

I ended up adding an AWS RDS Proxy, which resolved this issue.
I provoked DB failovers for an hour and everything worked fine, with outages of less than 20 seconds. And this without modifying my application code at all; I only pointed it at the new proxy endpoint.
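For illustration, the only change on the application side was the JDBC URL (both endpoints below are made-up examples; the pool settings stay exactly the same):
// before: direct RDS instance endpoint
// dataSource.setUrl("jdbc:mysql://mydb.xxxxxxxxxxxx.eu-west-1.rds.amazonaws.com:3306/mydb");
// after: RDS Proxy endpoint
dataSource.setUrl("jdbc:mysql://mydb-proxy.proxy-xxxxxxxxxxxx.eu-west-1.rds.amazonaws.com:3306/mydb");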

Related

Jmeter-Springboot-Mysql-ec2 Hikari connection pool - Connection not available

I am trying to perform JMeter testing for a REST API built using a Spring Boot microservice and JPA.
The API connects to a MySQL instance, executes a couple of queries in parallel (async), and returns the result as JSON.
The MySQL instance is deployed in AWS.
It works fine for up to 10 users. If I increase the load beyond that, I get "connection not available":
engine.jdbc.spi.SqlExceptionHelper : HikariPool-1 - Connection is not available, request timed out after 30000ms.
The only property I have configured in application.properties is connectionTimeout (This property controls the maximum number of milliseconds that a client (that's you) will wait for a connection from the pool. If this time is exceeded without a connection becoming available, a SQLException will be thrown. Lowest acceptable connection timeout is 250 ms. Default: 30000 (30 seconds))
spring.datasource.hikari.connectionTimeout: 80000 (80 seconds).
I read about HikariCP and found that
maximumPoolSize: default: 10.
minimumIdle : Default: same as maximumPoolSize
maxLifetime: Default: 1800000 (30 minutes)
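As a reference point, here is a minimal sketch of how those properties can be set programmatically on a HikariConfig (the URL and credentials are placeholders, and the values shown are just the defaults quoted above; in a Spring Boot app the equivalent spring.datasource.hikari.* keys in application.properties do the same thing):
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:mysql://<mysql-host>:3306/<db>");   // placeholder
config.setUsername("<user>");
config.setPassword("<password>");
config.setMaximumPoolSize(10);       // default 10: caps how many queries can run concurrently
config.setMinimumIdle(10);           // defaults to maximumPoolSize
config.setConnectionTimeout(30000);  // how long a thread waits for a free connection (ms)
config.setMaxLifetime(1800000);      // 30 minutes, the default
HikariDataSource dataSource = new HikariDataSource(config);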
I am trying various combinations of the properties mentioned above to test 100 concurrent users at a time.
Can someone tell me which connection pool properties to tweak in order to test this for 100 users? Or what is the optimum configuration?
Thanks in advance.

Grails : Timeout: Pool empty. Unable to fetch a connection in 10 sec

Timeout: Pool empty. Unable to fetch a connection in 10 seconds, none available[size:50; busy:50; idle:0; lastwait:10000]
Whenever we connect to the web app over a socket, it throws this error and the socket gets disconnected.
Even after doing the following things, the problem still persists:
Scaled up AWS EC2 from micro to large
In /etc/my.cnf
wait_timeout = 28800
interactive_timeout = 28800
Added the following configuration under both the development and production environments:
maxActive = 50
minIdle = 5
maxIdle = 25
maxWait = 10000
maxAge = 10 * 60000
Has anyone faced this problem?

connection issues with cleardb from cloudfoundry (on pivotal)

We constantly face issues with connections to MySQL hosted by ClearDB. We have a dedicated plan that offers more than 300 connections for our application.
I know that the CBR on ClearDB's side automatically closes an inactive connection after 60s.
The (Spring) application runs in Tomcat and uses a ConnectionPool with the following settings:
org.apache.tomcat.jdbc.pool.DataSource dataSource = new org.apache.tomcat.jdbc.pool.DataSource();
dataSource.setDriverClassName("com.mysql.jdbc.Driver");
dataSource.setUrl(serviceInfo.getJdbcUrl());
dataSource.setUsername(serviceInfo.getUserName());
dataSource.setPassword(serviceInfo.getPassword());
dataSource.setInitialSize(10);
dataSource.setMaxActive(30);
dataSource.setMaxIdle(30);
dataSource.setTimeBetweenEvictionRunsMillis(34000);
dataSource.setMinEvictableIdleTimeMillis(55000);
dataSource.setTestOnBorrow(true);
dataSource.setTestWhileIdle(true);
dataSource.setValidationInterval(34000);
dataSource.setValidationQuery("SELECT 1");
The error we see in our stack is:
2015-01-13T13:36:22.75+0100 [App/0] OUT The last packet successfully received from the server was 90,052 milliseconds ago. The last packet sent successfully to the server was 90,051 milliseconds ago.; nested exception is com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
2015-01-13T13:36:22.75+0100 [App/0] OUT The last packet successfully received from the server was 90,052 milliseconds ago. The last packet sent successfully to the server was 90,051 milliseconds ago.
2015-01-13T13:36:22.75+0100 [App/0] OUT ... 52 common frames omitted
2015-01-13T13:36:22.75+0100 [App/0] OUT Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
2015-01-13T13:36:22.75+0100 [App/0] OUT at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2914) ~[mysql-connector-java-5.1.33.jar:5.1.33]
2015-01-13T13:36:22.75+0100 [App/0] OUT at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3337) ~[mysql-connector-java-5.1.33.jar:5.1.33]
2015-01-13T13:36:22.75+0100 [App/0] OUT ... 64 common frames omitted
Do you have any ideas what could be causing this or did you have similar experiences with ClearDB and maybe moved somewhere else?
Unfortunately I'm out of ideas; any help is really appreciated.
The error you listed looks a lot like your connection has been disconnected on the remote end (i.e. by ClearDB). 60s is a pretty short window for idle connections, so I'd suggest a few changes to your pool config.
1.) Set initialSize and minIdle (which defaults to initialSize) intentionally low. This will keep the number of idle connections low. Fewer idle connections means there's a better chance a connection will be reused before the 60s window expires.
2.) You don't need maxIdle here. It defaults to maxActive.
3.) Keep timeBetweenEvictionRunsMillis low. This sets how often the pool will check for idle connections. The default of 5s is probably fine.
4.) Lower minEvictableIdleTimeMillis. This is the minimum amount of time the connection will be in the pool before it can be evicted. It doesn't mean it will be evicted exactly when it's this old though. If the idle check just ran and your connection is minEvictableIdleTimeMillis - 1s old, it will have to wait for the next check to evict the connection (i.e timeBetweenEvictionRunsMillis). If you're using the default timeBetweenEvictionRunsMillis of 5s, setting this to 50s should give it plenty of time.
5.) Set the validationInterval lower. This determines how long the pool will wait since the last successful validation before it validates the connection again. I'd go with something between 2 and 5s. It's high enough that you'll get some benefit when you're busy, and low enough that it won't cause you to miss validation on bad connections.
6.) I'd also suggest that you enable removeAbandoned and logAbandoned, with removeAbandonedTimeout set to something like 5 or 10s (most web apps shouldn't hold the db connection for that long). This will eliminate the possibility that your web app is holding the connection in an idle state for more than 60s, then trying to use it again.
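Putting those suggestions together against the pool setup from the question, a rough sketch could look like this (the numbers are just the illustrative values discussed above, not tested recommendations):
org.apache.tomcat.jdbc.pool.DataSource dataSource = new org.apache.tomcat.jdbc.pool.DataSource();
dataSource.setDriverClassName("com.mysql.jdbc.Driver");
dataSource.setUrl(serviceInfo.getJdbcUrl());
dataSource.setUsername(serviceInfo.getUserName());
dataSource.setPassword(serviceInfo.getPassword());
dataSource.setInitialSize(2);                       // keep the idle pool intentionally small
dataSource.setMinIdle(2);
dataSource.setMaxActive(30);
// no setMaxIdle() call -- it defaults to maxActive
dataSource.setTimeBetweenEvictionRunsMillis(5000);  // default: check for idle connections every 5 s
dataSource.setMinEvictableIdleTimeMillis(50000);    // evict idle connections well before the 60 s cutoff
dataSource.setValidationInterval(5000);             // re-validate a given connection at most every 5 s
dataSource.setValidationQuery("SELECT 1");
dataSource.setTestOnBorrow(true);
dataSource.setTestWhileIdle(true);
dataSource.setRemoveAbandoned(true);                // reclaim connections the app holds for too long
dataSource.setLogAbandoned(true);
dataSource.setRemoveAbandonedTimeout(10);           // seconds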

Grails app seems like it's holding a reference to a stale DB connection

Update: Rackspace got back to me and told me that their MySQL cloud uses a wait_timeout value of 120 seconds.
I've been banging my head against this, so I thought I'd ask you guys. Any ideas you might have would be appreciated.
util.JDBCExceptionReporter - SQL Error: 0, SQLState: 08S01
util.JDBCExceptionReporter - Communications link failure
The last packet successfully received from the server was 264,736 milliseconds
ago. The last packet sent successfully to the server was 32 milliseconds ago.
The error occurs intermittently, often just minutes after the server in question comes up. The DB is nowhere near capacity in terms of load or connections, and I've tried dozens of different configuration combinations.
The fact that this connection last received a packet from the server 264 seconds ago is revealing because it's well above the 120 second timeout put in place by Rackspace. I've also confirmed from the DB end that my 30 second limit is being respected.
Things I've tried
Set DBCP to expire connections aggressively after 30 seconds, and verified that the MySQL instance reflects this behaviour via SELECT * FROM PROCESSLIST
Switched connection string from hostname to IP address, so this isn't a DNS issue
Various different combinations of connection settings
Tried declaring the connection pool settings in DataSources.groovy or resources.groovy, but I'm fairly sure that the settings are being respected as the DB reflects them: anything over 30 seconds is quickly killed
Any ideas?
Right now my best guess is that something in Grails is holding onto a reference to a stale connection for long enough that the 120 second limit becomes a problem... but it's a desperate theory and realistically I doubt it's true, which leaves me short of ideas.
The latest config I've tried:
dataSource {
pooled = true
driverClassName = "com.mysql.jdbc.Driver"
dialect = 'org.hibernate.dialect.MySQL5InnoDBDialect'
properties {
maxActive = 50
maxIdle = 20
minIdle = 5
maxWait = 10000
initialSize = 5
minEvictableIdleTimeMillis = 1000 * 30
timeBetweenEvictionRunsMillis = 1000 * 5
numTestsPerEvictionRun = 50
testOnBorrow = true
testWhileIdle = true
testOnReturn = true
validationQuery = "SELECT 1"
}
}
Stack trace:
2012-10-25 12:36:12,375 [http-bio-8080-exec-2] WARN util.JDBCExceptionReporter - SQL Error: 0, SQLState: 08S01
2012-10-25 12:36:12,375 [http-bio-8080-exec-2] ERROR util.JDBCExceptionReporter - Communications link failure
The last packet successfully received from the server was 264,736 milliseconds ago. The last packet sent successfully to the server was 32 milliseconds ago.
2012-10-25 12:36:12,433 [http-bio-8080-exec-2] ERROR errors.GrailsExceptionResolver - EOFException occurred when processing request: [GET] /cart
Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.. Stacktrace follows:
org.hibernate.exception.JDBCConnectionException: could not execute query
at grails.orm.HibernateCriteriaBuilder.invokeMethod(HibernateCriteriaBuilder.java:1531)
at SsoRealm.hasRole(SsoRealm.groovy:30)
at org.apache.shiro.grails.RealmWrapper.hasRole(RealmWrapper.groovy:193)
at org.apache.shiro.authz.ModularRealmAuthorizer.hasRole(ModularRealmAuthorizer.java:374)
at org.apache.shiro.mgt.AuthorizingSecurityManager.hasRole(AuthorizingSecurityManager.java:153)
at org.apache.shiro.subject.support.DelegatingSubject.hasRole(DelegatingSubject.java:225)
at ShiroSecurityFilters$_closure1_closure4_closure6.doCall(ShiroSecurityFilters.groovy:98)
at grails.plugin.cache.web.filter.PageFragmentCachingFilter.doFilter(PageFragmentCachingFilter.java:195)
at grails.plugin.cache.web.filter.AbstractFilter.doFilter(AbstractFilter.java:63)
at org.apache.shiro.grails.SavedRequestFilter.doFilter(SavedRequestFilter.java:55)
at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:449)
at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:380)
at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 264,736 milliseconds ago. The last packet sent successfully to the server was 32 milliseconds ago.
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1116)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3589)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3478)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4019)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2490)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2651)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2683)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2144)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2310)
at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
... 20 more
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3039)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3489)
... 29 more
Figured this out. The grails-elasticsearch plugin was holding onto stale connections. This was a known issue in that plugin and a fix came in via this pull request:

Correct way to keep pooled connections alive (or time them out and get fresh ones) during longer inactivity for MySQL, Grails 2 app

I have a Grails app that has flurries of high activity, but also periods of inactivity that can last from several hours to overnight. I notice that the first users in the morning get the following type of exception, and I believe this is due to the connections in the pool going stale and the MySQL database closing them.
While Googling I've found conflicting information about whether using the Connector/J connection property 'autoReconnect=true' is a good idea (and whether the client will still get an exception even after the connection is restored), or whether to set other properties that will periodically evict or refresh idle connections, test on borrow, etc. Grails uses DBCP underneath. I currently have a simple config as below, and am looking for an answer on how to best ensure that any connection grabbed out of the pool after a long inactive period is valid and not closed.
dataSource {
pooled = true
dbCreate = "update"
url = "jdbc:mysql://my.ip.address:3306/databasename"
driverClassName = "com.mysql.jdbc.Driver"
dialect = org.hibernate.dialect.MySQL5InnoDBDialect
username = "****"
password = "****"
properties {
//what should I add here?
}
}
Exception
2012-06-20 08:40:55,150 [http-bio-8443-exec-1] ERROR transaction.JDBCTransaction - JDBC begin failed
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: The last packet successfully received from the server was 64,129,968 milliseconds ago. The last packet sent successfully to the server was 64,129,968 milliseconds ago. is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1116)
at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3851)
...... Lots more .......
Caused by: java.sql.SQLException: Already closed.
at org.apache.commons.dbcp.PoolableConnection.close(PoolableConnection.java:114)
The easiest approach is to configure the connection pool with a validation query that is run to test each connection before it is handed to the application:
validationQuery="select 1 as dbcp_connection_test"
testOnBorrow=true
This same "connection validation" query can be run on other events. I'm not sure of the defaults for these:
testOnReturn=true
testWhileIdle=true
There are also configuration settings that limit the "age" of idle connections in the pool, which can be useful if idle connections are being closed at the server end.
minEvictableIdleTimeMillis
timeBetweenEvictionRunsMillis
http://commons.apache.org/dbcp/configuration.html
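For reference, the same settings expressed directly against commons-dbcp's BasicDataSource look roughly like this (a sketch with placeholder connection details and illustrative timings; in a Grails app they would go into the dataSource properties block instead):
import org.apache.commons.dbcp.BasicDataSource;

BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName("com.mysql.jdbc.Driver");
ds.setUrl("jdbc:mysql://<host>:3306/<db>");   // placeholder
ds.setUsername("<user>");
ds.setPassword("<password>");
ds.setValidationQuery("select 1 as dbcp_connection_test");
ds.setTestOnBorrow(true);                     // validate right before handing a connection to the app
ds.setTestOnReturn(true);                     // validate when a connection is returned to the pool
ds.setTestWhileIdle(true);                    // validate idle connections during eviction runs
ds.setTimeBetweenEvictionRunsMillis(5000);    // how often the evictor thread runs (ms)
ds.setMinEvictableIdleTimeMillis(30000);      // idle connections older than this become eligible for eviction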
I don't know if this is the best way to handle database connections, but I had the same problems you describe. I tried a lot of things and ended up with the c3p0 connection pool.
Using c3p0 you can force your app to renew database connections after a certain age.
Place the c3p0.jar into your lib folder and add your configuration to conf/spring/resources.groovy.
My resources.groovy looks like this:
import com.mchange.v2.c3p0.ComboPooledDataSource
import org.codehaus.groovy.grails.commons.ConfigurationHolder as CH
beans = {
/**
* c3P0 pooled data source that forces renewal of DB connections of certain age
* to prevent stale/closed DB connections and evicts excess idle connections
* Still using the JDBC configuration settings from DataSource.groovy
* to have easy environment specific setup available
*/
dataSource(ComboPooledDataSource) { bean ->
bean.destroyMethod = 'close'
//use grails' datasource configuration for connection user, password, driver and JDBC url
user = CH.config.dataSource.username
password = CH.config.dataSource.password
driverClass = CH.config.dataSource.driverClassName
jdbcUrl = CH.config.dataSource.url
//force connections to renew after 4 hours
maxConnectionAge = 4 * 60 * 60
//get rid of excess idle connections after 30 minutes
maxIdleTimeExcessConnections = 30 * 60
}
}