Spring Boot + Liquibase: Communications link failure - mysql

I use Liquibase in all my projects and really like the way it handles DB updates, but recently I've been running into this issue:
liquibase : Successfully acquired change log lock
liquibase : Successfully released change log lock
liquibase : Could not release lock
liquibase.exception.LockException: liquibase.exception.DatabaseException: liquibase.exception.DatabaseException: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown.
at liquibase.lockservice.StandardLockService.releaseLock(StandardLockService.java:283) ~[liquibase-core-3.5.5.jar!/:na]
at liquibase.Liquibase.update(Liquibase.java:218) [liquibase-core-3.5.5.jar!/:na]
at liquibase.Liquibase.update(Liquibase.java:192) [liquibase-core-3.5.5.jar!/:na]
.
.
.
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown.
.
.
liquibase : Failed to restore the auto commit to true
.
.
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 16,913 milliseconds ago. The last packet sent successfully to the server was 89 milliseconds ago.
.
.
Caused by: java.lang.NullPointerException: null
at com.mysql.jdbc.MysqlIO.clearInputStream(MysqlIO.java:899) ~[mysql-connector-java-5.1.46.jar!/:5.1.46]
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2477) ~[mysql-connector-java-5.1.46.jar!/:5.1.46]
I have three apps running on my server. One has been running for about 2 years, the second for 2 months, and the third is new.
They are all Spring Boot applications.
This started when I tried to update the jar of the second app: I stopped the old jar and ran the new one, but it would not start and failed with the error above.
Then I ran the old jar of the same app and it started just fine. I tried stopping and restarting the old jar; a few times I got the error above, but most of the time it started fine.
I tried restarting the server (all three apps start as services at system boot), but none of them started successfully because of the same error, not even the one that had been running for 2 years. I stopped all the other apps and tried again with this long-running app; it kept failing until it eventually started.
Could it be a memory issue? I'm using a DigitalOcean droplet with 1 core and 2 GB of RAM.
Also, I've noticed that when the app starts successfully the logs look like this:
liquibase : Successfully acquired change log lock
liquibase : Reading from myDataBase.DATABASECHANGELOG
liquibase : Successfully released change log lock
Note the statement between the acquisition and the release of the lock. I also suspect a timing issue between acquiring the lock and reading the changelog, but I don't know whether that time can be increased or how to do it.
UPDATE
I don't know the source of the issue, but after updating liquibase-core to v3.7.0 all the apps started correctly.
UPDATE 2
About the previous update: after more testing, the problem is not solved, just less frequent. It has happened again.
IMPORTANT UPDATE 3
After some digging I realized that around the time this started happening, the team was experimenting with the DB connection pool. The application.properties file contains this:
spring.datasource.tomcat.initial-size=5
spring.datasource.tomcat.max-wait=20000
spring.datasource.tomcat.max-active=5
spring.datasource.tomcat.max-idle=5
spring.datasource.tomcat.min-idle=5
spring.datasource.tomcat.default-auto-commit=true
Sadly, I don't understand much about DB connection pools, but what caught my attention was the last line, about auto-commit. The last time this error popped up, I disabled all of those lines and the app started fine. I don't know whether that was just a coincidence.
Thank you very much!
UPDATE 4
I'm back to square one... I don't know how to solve this. Any clue would be appreciated.
UPDATE 5
I updated mysql-connector-java to v8.0.15. My server runs MySQL v5.6.33. I'm still having this issue, and now I cannot restart my app.
After the above change, the exception now ends with:
The last packet successfully received from the server was 10,036 milliseconds ago. The last packet sent successfully to the server was 10,036 milliseconds ago.
.
.
.
Caused by: java.io.IOException: Socket is closed
at com.mysql.cj.protocol.AbstractSocketConnection.getMysqlInput(AbstractSocketConnection.java:72) ~[mysql-connector-java-8.0.15.jar!/:8.0.15]
at com.mysql.cj.protocol.a.NativeProtocol.clearInputStream(NativeProtocol.java:833) ~[mysql-connector-java-8.0.15.jar!/:8.0.15]
... 92 common frames omitted
UPDATE 6 - WORKAROUND SOLUTION
In a moment of clarity I decided to disable Liquibase in my project by setting liquibase.enabled=false in application.properties. Now I can start, stop, and restart the app at any time with no problem. Since I only use Liquibase to maintain the database, and the changes I make are executed (the problem occurs when releasing the lock), I will enable Liquibase only to update the schema and then disable it to run the app.
UPDATE 7 - I think I've solved it
I kept stopping and starting the apps over and over, and every time I got the "Socket is closed" error, so I began to wrap my head around a timing issue. From the exceptions thrown I noticed that every time it says:
The last packet successfully received from the server was xxxx milliseconds ago.
The smallest value I got for those milliseconds was 10,085, and by chance I had some old logs of the same app from a time when this issue never appeared. From those log statements I measured the time between acquiring the change log lock and releasing it. To my surprise, it was around 9 seconds and never more than that.
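The measurement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the timestamp prefix and its format are hypothetical (the log excerpts in this post carry no timestamps), and the helper name is made up; only the two lock messages are taken from the Liquibase output shown earlier.

```python
from datetime import datetime

# Hypothetical sample: the timestamp prefix is an assumption,
# the messages themselves come from the Liquibase log above.
SAMPLE_LOG = """\
2019-03-01 10:15:30.120  INFO liquibase : Successfully acquired change log lock
2019-03-01 10:15:36.480  INFO liquibase : Reading from myDataBase.DATABASECHANGELOG
2019-03-01 10:15:39.020  INFO liquibase : Successfully released change log lock
"""

ACQUIRED = "Successfully acquired change log lock"
RELEASED = "Successfully released change log lock"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # assumed log timestamp format
TIMESTAMP_LENGTH = 23  # length of "2019-03-01 10:15:30.120"


def lock_hold_seconds(log_text):
    """Return the seconds between each lock acquisition and the next release."""
    durations = []
    acquired_at = None
    for line in log_text.splitlines():
        try:
            stamp = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip stack-trace lines and anything without a timestamp
        if ACQUIRED in line:
            acquired_at = stamp
        elif RELEASED in line and acquired_at is not None:
            durations.append((stamp - acquired_at).total_seconds())
            acquired_at = None
    return durations


print(lock_hold_seconds(SAMPLE_LOG))
```

Run against real logs, the largest value in the returned list is the one to compare with the server-side timeouts.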
So I logged in to the mysql console and issued:
show variables LIKE '%timeout%';
That gave me a list of the defined timeouts. And there it was: connect_timeout was the smallest, set to 10 seconds. That had to be it.
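To compare the "milliseconds ago" figure from the exception with the server timeouts, a small sketch like the following may help. The regular expression and helper names are made up for illustration. connect_timeout=10 is the value reported above; the other timeout values are stock MySQL defaults used as placeholders, not the actual output from this server. A match only shows that the numbers line up; it does not prove which timeout closed the socket.

```python
import re

# Matches the Connector/J wording quoted in the exceptions above.
LAST_PACKET = re.compile(
    r"successfully received from the server was ([\d,.]+) milliseconds ago"
)


def last_packet_ms(message):
    """Extract the 'last packet received' gap in milliseconds, or None."""
    match = LAST_PACKET.search(message)
    if match is None:
        return None
    # The driver prints a whole number with grouping separators (10,036).
    return int(re.sub(r"[,.]", "", match.group(1)))


def timeouts_at_or_below(elapsed_ms, timeouts_seconds):
    """Names of timeouts whose value is no larger than the observed gap."""
    return sorted(
        name
        for name, seconds in timeouts_seconds.items()
        if seconds * 1000 <= elapsed_ms
    )


# Placeholder values: only connect_timeout=10 comes from this post.
TIMEOUTS_SECONDS = {
    "connect_timeout": 10,
    "net_read_timeout": 30,
    "net_write_timeout": 60,
    "interactive_timeout": 28800,
    "wait_timeout": 28800,
}

MESSAGE = (
    "The last packet successfully received from the server was "
    "10,036 milliseconds ago."
)

elapsed = last_packet_ms(MESSAGE)
print(elapsed, timeouts_at_or_below(elapsed, TIMEOUTS_SECONDS))
```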
I googled how to set that value. I tried creating a file /etc/my.cnf and adding to it:
[mysqld]
connect_timeout=20
Then I restarted mysql with:
sudo /etc/init.d/mysql restart
But when I logged back in to the mysql console the value was still 10. So in the mysql console I issued this other command:
SET GLOBAL connect_timeout=20;
And the value was updated (I don't know if this will persist after a restart).
I restarted mysql again and... voilà! Now the apps start normally with Liquibase enabled. I'm happy! :-)

We were working with MySQL 8.0.19. In my case, only one query was failing, with the exception Caused by: java.io.IOException: Socket is closed.
I found that a bug had been reported against MySQL 8.0.19 with a stack trace similar to mine: https://bugs.mysql.com/bug.php?id=99234
After upgrading the MySQL server to 8.0.21, it started working! No other changes were needed.
I hope this helps anyone who has spent a long time on specific queries failing with the same error and has already tried various other options.

Have you solved it yet?
I had a similar problem and solved it by forcing Liquibase to execute in synchronous mode.
Was your Spring Boot app generated by JHipster, by any chance? If so, take a look at DatabaseConfiguration and change AsyncSpringLiquibase to SpringLiquibase.

Related

Communication link failure: 1047 WSREP has not yet prepared node for application use in

We had a three-node cluster with MariaDB 10.4. We had an outage and the servers all rebooted with one having an irrecoverable network issue at the time.
We set up another server and added it to the cluster as a third member later.
However, ever since then, we have been getting this error intermittently.
*3287799 FastCGI sent in stderr: "PHP message: An Error occurred while handling another error:
PDOException: SQLSTATE[08S01]: Communication link failure: 1047 WSREP has not yet prepared node for application use in /var/....yii2/db/Command.php:1293
To fix this, we shut down all three nodes one by one and then re-initialized the cluster, even using a new cluster name.
The first node was started with "galera_new_cluster" and the remaining two were added to this cluster. However, we still kept getting the same error intermittently.
We followed the workaround from "mariadb galera - Error when a node shutdown ERROR 1047 WSREP has not yet prepared node for application use", but, as expected, it made no difference.
Next, we set up a single fresh server and installed the new MariaDB 10.5.X server on it. We took a backup from the old cluster using mariabackup and restored it onto this new server.
This server was set up as a new cluster with fresh details. We wanted to run it as a single-node cluster to see whether the error persisted. Oddly enough, the error is still there and shows up every half an hour or so.
Does anyone have a clue what could be causing this? We don't know what exactly the issue is, which is why we're having a hard time solving it.
Any help would be greatly appreciated.
Update:
We turned off Galera on this single-node cluster and ran it as a plain stand-alone MariaDB server. However, we still got the same errors in our web server's logs. This is bonkers.
Any idea? Anyone?

ColdFusion 10 Communications link failure to MySQL

We are migrating some websites onto a cloud infrastructure running Windows 2008 virtual machines. These websites all run on ColdFusion with MySQL databases. They currently are running in our CoLo with no problems. Additionally, they are running on our development network in our offices with no problems.
We are setting up our cloud to match as closely as possible the configuration we currently use which is, essentially, CF10 + IIS on one server and MySQL on a separate machine. We are 99% finished and most things are running great. However....
We have run into a couple, as in 2, places where we click a link/button and are greeted with:
Error Executing Database Query.
Communications link failure The last packet successfully received from the server was 0 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.
Scanning the stack-trace I also find:
Caused by: java.net.SocketException: Connection reset
The communications link error is ALWAYS: 0ms.
What's most puzzling is that the queries that seem to be causing this are simple queries that are used ALL OVER the sites with no problems. Why they are failing at these 2 particular places has us at our wits' end.
Our only clue is that, looking at the CF error description of which scripts are called, the script where the query is failing appears to be called twice. For example, one of the occurrences is in our Application file:
>The error occurred in D:/Our_Web_Sites/oursite/Application.cfm: line 73
>Called from D:/Our_Web_Sites/oursite/Application.cfm: line 17
>Called from D:/Our_Web_Sites/oursite/Application.cfm: line 1
>Called from D:/Our_Web_Sites/oursite/Application.cfm: line 73
>Called from D:/Our_Web_Sites/oursite/Application.cfm: line 17
>Called from D:/Our_Web_Sites/oursite/Application.cfm: line 1
We can find nothing in our CF code that would cause the script to be called twice, so our guess is that the first call fails on the query and CF tries again, only to fail and raise the error.
Googling this issue I've found lots of posts about changing the MySQL timeouts. None of those worked and I didn't expect them to since what we're dealing with doesn't appear to be a timeout issue. These pages fail each and every time.
The closest we've come to a solution came from this blog posting:
http://www.talkingtree.com/blog/index.cfm/2011/1/12/Validation-Query-for-MySQL-communications-link-failure!
If we UNCHECK the "Maintain connections across client requests" setting in CFAdmin, the error goes away. The blog suggests leaving that checked, which is our preference, and using a Connection Validation query of "SELECT 1;". We tried that: same error.
We've also tried the JDBC autoReconnect=true option. No effect.
We downloaded the latest JDBC connector and used it instead of the standard CF10 MySQL connector. No effect.
Again, 99% of the site works with the exception of these two links, both of which work just fine in all our other environments. Any other ideas?
I feel like I've had a similar problem every time I upgrade CF or MySQL. Usually a change in the JDBC driver or connection string helps, which I see you already tried.
Have you checked the MySQL error log for any hints? Ours is in /var/lib/mysql (whatever your 'datadir' variable is set to) and ends with a .err extension.
Also, maybe try some of the other JDBC connection string options for your version? I see there is some extended logging you can enable.
http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-configuration-properties.html
Found the issue. We are running our network on Savvis' cloud infrastructure. The Windows server instances we were using from Savvis had Trend Micro Deep Security Agent installed. This is an intrusion protection system and it was the problem. Disabling the service cleared up all communication errors. I have no clue why it was rejecting some queries that it had just accepted previously. I am just glad to (finally) put this behind me!

CI insertion failed

I was having a problem registering a new user in CodeIgniter: the insert failed. I somehow managed to echo the query, and when I run it manually in phpMyAdmin it gives me the following error:
#1205 - Lock wait timeout exceeded; try restarting transaction
MySQL error 1205 can occur when your application (in this case, PHP) dies in the middle of a transaction and the connection is not closed.
Restarting MySQL will clear the problem, but you should try to find out where the error is in your code. It should be in the PHP error log. If CodeIgniter was able to catch the error (unlikely, given the circumstances), it may be in application/logs.
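The general idea of never leaving a transaction open when the code fails can be sketched as follows. This is an illustration only, not CodeIgniter code: it is written in Python and uses the standard-library sqlite3 module as a stand-in for MySQL, and the table, data, and helper name are hypothetical. The point is just that every code path ends in either a commit or a rollback, so no locks are left behind by a half-finished transaction.

```python
import sqlite3


def run_in_transaction(connection, statements):
    """Run all statements; commit on success, roll back on any error."""
    try:
        for sql, params in statements:
            connection.execute(sql, params)
        connection.commit()
    except Exception:
        connection.rollback()  # release whatever the failed transaction held
        raise


# sqlite3 in-memory database used as a stand-in for the real MySQL server.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (name TEXT NOT NULL UNIQUE)")

run_in_transaction(
    connection, [("INSERT INTO users (name) VALUES (?)", ("alice",))]
)

try:
    run_in_transaction(
        connection,
        [
            ("INSERT INTO users (name) VALUES (?)", ("bob",)),
            ("INSERT INTO users (name) VALUES (?)", ("alice",)),  # violates UNIQUE
        ],
    )
except sqlite3.IntegrityError:
    pass  # the whole batch was rolled back, including "bob"

names = [row[0] for row in connection.execute("SELECT name FROM users ORDER BY name")]
connection.close()
print(names)
```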
If execution time might be the culprit, check PHP: Runtime Configuration. Specifically:
max_execution_time and, additionally, memory_limit
Increasing these values (the amount of which, you'll have to experiment with) should eliminate script execution time as a potential issue. You can set these in your php.ini. Since you're using xampp, the location will be either <installation drive/directory>\xampp\php or <installation drive/directory>\xampp\apache\bin.
Hopefully this helps you resolve your question.
This problem was on my local server, where I am using XAMPP; the query was running fine on the dev server. Maybe there is some execution time issue with XAMPP. But the issue is fixed now. Thanks to all :)

Quartz failure in notifyJobStoreJobComplete method

Scenario:
We have a scheduler that uses the JDBC job store. The Quartz version is 2.1.2.
The job being scheduled also updates a database.
The database is the same for both Quartz and the job itself and is hosted on a MySQL server. Both the application tables and the Quartz tables are stored in the same database.
The application and Quartz use separate connection pools. In the application we use Spring for connection pooling, and Quartz is configured to use connection pooling via quartz.properties.
Here is the snippet of quartz.properties:
org.quartz.dataSource.qzDS.driver = com.mysql.jdbc.Driver
org.quartz.dataSource.qzDS.URL = jdbc:mysql://localhost:3306/dbname?autoReconnect=true
org.quartz.dataSource.qzDS.user = dbuser
org.quartz.dataSource.qzDS.password =dbpassword
org.quartz.dataSource.qzDS.maxConnections = 30
org.quartz.datasource.qzDS.validationQuery = select 1
#org.quartz.datasource.qzDS.minEvictableIdleTimeMillis=21600000
#org.quartz.datasource.qzDS.timeBetweenEvictionRunsMillis=1800000
#org.quartz.datasource.qzDS.numTestsPerEviction=-1
#org.quartz.datasource.qzDS.testWhileIdle=true
org.quartz.datasource.qzDS.debugUnreturnedConnectionStackTraces=true
org.quartz.datasource.qzDS.unreturnedConnectionTimeout=120
org.quartz.datasource.qzDS.initialPoolSize=5
org.quartz.datasource.qzDS.minPoolSize=5
org.quartz.datasource.qzDS.maxPoolSize=30
org.quartz.datasource.qzDS.acquireIncrement=5
org.quartz.datasource.qzDS.maxIdleTime=120
org.quartz.datasource.qzDS.validateOnCheckout=true
The database is clustered with master-master replication on two servers, which are accessed via a virtual IP everywhere in the application and in Quartz.
The scheduler, i.e. Quartz, is also clustered on the same two machines where MySQL is clustered.
The problem:
One of the servers (so far we have only seen the problem on the backup server machine) occasionally throws a database connection error while calling the notifyJobStoreJobComplete method. This leaves the job in the BLOCKED state even though the job itself completed successfully, because Quartz was unable to update its status.
Questions:
What could be the cause of the problem?
How can we move the BLOCKED jobs into the WAITING state so that they can at least run at their next scheduled time? Directly editing the QRTZ_SIMPLE_TRIGGERS table would not be a good solution, even if it works.
EDIT: To bump up the question.
The error during notifyJobStoreJobComplete is: org.quartz.impl.jdbcjobstore.JobStoreTX - Failed to override connection auto commit/transaction isolation.
[java] com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: The last packet successfully received from the server was 619,082,686 milliseconds ago. The last packet sent successfully to the server was 619,082,686 milliseconds ago. is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.
I think the main problem was the MySQL communications link failure, which we solved by increasing 'wait_timeout' to 14 days. Since our maintenance is scheduled every 15 days, we restart each MySQL server in our DB cluster then (we have master-master replication in place). With this approach we haven't had any communications link failure since. In fact, sometimes we don't restart the servers every 15 days and still get no error (touch wood). :)
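As a rough check of the numbers involved, the idle time from the exception in the question can be compared against a wait_timeout value. This is only arithmetic in Python: the 28,800-second figure is MySQL's stock default for wait_timeout and is an assumption here, since the value originally configured on these servers is not stated; the helper name is made up.

```python
SECONDS_PER_DAY = 86_400


def idle_report(idle_ms, wait_timeout_seconds):
    """Convert an idle time in ms to days and compare it with wait_timeout."""
    idle_seconds = idle_ms / 1000
    return {
        "idle_days": round(idle_seconds / SECONDS_PER_DAY, 2),
        "exceeds_wait_timeout": idle_seconds > wait_timeout_seconds,
    }


IDLE_MS = 619_082_686  # "last packet ... 619,082,686 milliseconds ago"

# Assumed default wait_timeout of 8 hours (28,800 s).
print(idle_report(IDLE_MS, 28_800))

# wait_timeout raised to 14 days, as described in this answer.
print(idle_report(IDLE_MS, 14 * SECONDS_PER_DAY))
```

The connection in the exception had been idle for roughly a week, which is far beyond an 8-hour timeout but still within a 14-day one.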
As for Quartz triggers being stuck in the BLOCKED state, we updated Quartz to 2.1.4, which possibly has a fix for almost the same problem. Since the update, triggers have ended up in the BLOCKED state far less frequently.
We still haven't found out how to get a trigger out of the BLOCKED state without directly modifying the Quartz tables. Whenever we face this problem, we manually remove the entry for the BLOCKED trigger from the qrtz_fired_triggers table, and that solves the problem. I think the enterprise version of Quartz may offer this feature through a web UI.

Best way to debug MySQL connections that are being closed on me after ~39 minutes?

I have Hibernate 3.3, c3p0, MySQL 5.1, and Spring.
The MySQL connections in my service calls are consistently being closed after ~39 minutes. The natural running time of my service call is on the order of ~5 hours.
I've tried changing various c3p0 config settings to avoid the 39-minute cap. No luck.
Is there a more direct, systematic way to log or troubleshoot this? That is, can I find out why the connection is being closed, by whom, and at which layer?
Update: stack trace
24 Oct 2010 02:22:12,262 [WARN] 012e323c-df4b-11df-89ed-97e9a9c1ac19 (Foobar Endpoint : 3) org.hibernate.util.JDBCExceptionReporter: SQL Error: 0, SQLState: 08003
24 Oct 2010 02:22:12,264 [ERROR] 012e323c-df4b-11df-89ed-97e9a9c1ac19 (Foobar Endpoint : 3) org.hibernate.util.JDBCExceptionReporter: No operations allowed after connection closed.
24 Oct 2010 02:22:12,266 [ERROR] 012e323c-df4b-11df-89ed-97e9a9c1ac19 (Foobar Endpoint : 3) org.hibernate.event.def.AbstractFlushingEventListener: Could not synchronize database state with session
>I have Hibernate 3.3, c3p0, MySQL 5.1, and Spring. The MySQL connections in my service calls are consistently being closed after ~39 minutes. The natural running time of my service call is on the order of ~5 hours.
I'm not sure I understand. Do you have processes that are supposed to run for 5 hours but currently get aborted after ~39 min (or probably 2400 seconds)? Can you confirm? Was this previously working? Did you change anything?
Meanwhile, here are some ideas:
- start with the database (see B.5.2.11. Communication Errors and Aborted Connections)
  - start the server with the --log-warnings option and check the logs for suspicious messages
  - see if you can reproduce the problem using a MySQL client from the db host
  - if it works, do the same thing from the app server machine
  - if it works, you'll know MySQL is ok
- move on to the app server level
  - activate logging (of Hibernate and C3P0) to get a full stack trace and/or more hints about the culprit
- also please show your C3P0 configuration settings
And don't forget that C3P0's configuration when using Hibernate is very specific, and some settings must go in a c3p0.properties file.
Auto reconnect configuration
http://hibernatedb.blogspot.com/2009/05/automatic-reconnect-from-hibernate-to.html