Google MySQL fails with ERROR 2013 HY000 system error 2 - mysql

I have a D8 google mysql instance. I'm running an etl process trying to push about 100GB of data in but the script keep stopping because of the error.
In order to get it working I have to restart the mysql instance and then re run the process from where it failed. Any help is greatly appreciated. I haven't found anything on google.

This error tends to mean the connection to the MySQL server was lost at some point. As mentioned in this 'Lost connection to MySQL server' article, this error may be encountered when writing a significant number or rows.
The article suggests setting the net_read_timeout to something greater than the default value of 30 seconds. You may also want to consider breaking up your write tasks as well.

Related

Communication link failure: 1047 WSREP has not yet prepared node for application use in

We had a three-node cluster with MariaDB 10.4. We had an outage and the servers all rebooted with one having an irrecoverable network issue at the time.
We set up another server and added it to the cluster as a third member later.
However, ever since that, we have constantly been getting this error every now and then.
*3287799 FastCGI sent in stderr: "PHP message: An Error occurred while handling another error:
PDOException: SQLSTATE[08S01]: Communication link failure: 1047 WSREP has not yet prepared node for application use in /var/....yii2/db/Command.php:1293
In order to fix this issue, we turned down all three nodes one by one and then re-initialized the cluster, even with a new cluster name and all.
The first one was started with "galera_new_cluster" and the remaining two were added to this cluster. However, we still kept getting the same error intermittently.
The workaround at mariadb galera - Error when a node shutdown ERROR 1047 WSREP has not yet prepared node for application use was followed but that didn't do anything, as expected.
Next, what we did is set up a single fresh server and installed the new 10.5.X MariaDB server on it. Took backup from the old cluster using mariabackup and restored it onto this new single server.
This single server was set up as a new cluster with fresh details and everything. We wanted to run it as a single node cluster to make sure if the error still persisted. Oddly enough, the error is still there and it comes off every half an hour or so.
Has anyone got any clue what could be the reason for this weird issue we're facing? Currently, we don't know what exactly is the issue which is why we're facing a hard time solving it.
Any help would be greatly appreciated.
Update:
We turned off galera on this single-node cluster and ran it as a simple stand-alone mariadb server. However we still go the same errors in our web-server's logs. This is bonkers.
Any idea? Anyone?

MySQL Query running even after losing connection

I've a MySQL 5.1.41 Server installed on a Ubuntu machine. I get connected to it through Workbench from my Windows machine over TCP/IP. I run a bigger query, after 900 seconds I got the below message, (there is no wait_timeout defined in the server's configuration file my.cnf)
Error Code: 2013. Lost connection to MySQL server during query
But when I look into the process list by using show processlist; command, I can still see my query running.
I got this link http://dev.mysql.com/doc/refman/5.0/en/gone-away.html where I found the below lines,
The problem on Windows is that in some cases MySQL does not get an
error from the OS when writing to the TCP/IP connection to the server,
but instead gets the error when trying to read the answer from the
connection.
I'm not sure whether this is the reason for my observation.
Please clarify me on this.
Thanks in advance!!
Closing connection is not a reason to stop a query. A query might be update, or kind of transaction, or select with output to remote (server) file.
Closed connection is just is just means, that you will not receive any data from DBMS after executing query (data, timings - nothing).
The reason of closing connection could be different, as SO-User posted. Try increasing
on server side:
wait_timeout
max_allowed_packet
on client side:
any kinds of timeout you find in your client (i.e. that SO-User suggests)
Do not forget to reload DBMS config and restart client (for sure)
In MySQL WorkBench we have an option to change timeout.
Find it under
Edit → Preferences → SQL Editor → DBMS connection read time out (in seconds): 600
Changed the value to 6000 or something higher.
Update
Lost connection to MySQL server
There are three likely causes for this error message.
Usually it indicates network connectivity trouble and you should check
the condition of your network if this error occurs frequently. If the
error message includes “during query,” this is probably the case you
are experiencing.
Sometimes the “during query” form happens when millions of rows are
being sent as part of one or more queries. If you know that this is
happening, you should try increasing net_read_timeout from its default
of 30 seconds to 60 seconds or longer, sufficient for the data
transfer to complete.
More rarely, it can happen when the client is attempting the initial
connection to the server. In this case, if your connect_timeout value
is set to only a few seconds, you may be able to resolve the problem
by increasing it to ten seconds, perhaps more if you have a very long
distance or slow connection. You can determine whether you are
experiencing this more uncommon cause by using SHOW GLOBAL STATUS LIKE
'Aborted_connects'. It will increase by one for each initial
connection attempt that the server aborts. You may see “reading
authorization packet” as part of the error message; if so, that also
suggests that this is the solution that you need.
If the cause is none of those just described, you may be experiencing
a problem with BLOB values that are larger than max_allowed_packet,
which can cause this error with some clients. Sometime you may see an
ER_NET_PACKET_TOO_LARGE error, and that confirms that you need to
increase max_allowed_packet.
Doc link: Error lost connection
and also check here

MySQL 5.5 : "Got an error reading communication packets"

I just upgraded MySQL from 5.1 to 5.5.
I fixed few issues running mysql_upgrade, and changing some deprecated configurations...
I also updated PHP, from 5.3.3-7 to 5.3.29-1.
But, since that, I'm having a reccurent problem (always thrown in this order) :
1. Client* - PHP Warning
Warning: Packets out of order. Expected 1 received 0. Packet size=1 in
/home/www/www.mywebsite.com/shared/vendor/doctrine/dbal/lib/Doctrine/DBAL/Connection.php
line 694
2. Client* - PHP Warning
Warning: PDOStatement::execute() [pdostatement.execute]: Error reading
result set's header in
/home/www/www.mywebsite.com/shared/vendor/doctrine/dbal/lib/Doctrine/DBAL/Connection.php
line 694
3. Server* - MySQL Warning :
150127 17:25:15 [Warning] Aborted connection 309 to db:
'my_database' user: 'root' host: '127.0.0.1' (Got an error
reading communication packets)
4. Client* - PHP Error
PDOStatement::execute() [pdostatement.execute]: MySQL server
has gone away in
/home/www/www.mywebsite.com/shared/vendor/doctrine/dbal/lib/Doctrine/DBAL/Connection.php
line 694
*NB: What I call "Client" is the PHP Application, and "Server" is the MySQL Server, even if they're both on the same localhost Server.
So, apparently, the origin of all those problems is the first one : "Packets out of order".
But when I search for this error I can't find many answers, and they are most of the time not related to my problem : I use Doctrine as an abstraction, so I don't write any query or fetch any result myself. Plus, it's almost never the same values as me, but in my case I always get those values ("Expected 1 received 0. Packet size=1").
The closest result would be this MySQL bug report, but "No feedback was provided for this bug for over a month, so it is
being suspended automatically"...
Plus, some of the "2." errors aren't thrown by my PHP Doctrine code (they're not executed from my localhost, but from another known external service, probably using some old PHP Propel code).
So that might mean there is a problem with my MySQL configuration itself, but I tried changing some parameters without obtaining any obvious effect (sometimes it takes more time after restarting MySQL to get the first errors for example).
Any help would be very much appreciated !
And here is my current configuration (I've got 2 MySQL instances, the second one using replication is mostly for read only).
I also checked most of the system resources with Munin and didn't see anything abnormal (the RAM usage for example is pretty high, but as there is 50Go on the server it's not full at all).
UPDATE
I isolated an SQL query that was repeatedly failing from my PHP Client. When I executed from my local with MySQL Workbench, it did exactly the same (closed the connexion with a MySQL server has gone away message). When I did it from the sql command line it also did the same. Then I executed it from the sql command line on the server host, and it succeded. But some time after when I tried again from Workbench/whatever it worked... So it looks like those "corrupted packets" are cached and disapear after some time.
Thanks, I fixed this issue doing :
RESET QUERY CACHE;
FLUSH QUERY CACHE;

MySQL Cluster ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER

Today my datacenter had a breaker fail which resulted in my servers losing power. I'm running a 4 node MySQL cluster. I restarted the cluster, first the management nodes, then the data nodes, then after the data nodes were running I started the SQL nodes. I then checked the cluster with ndb_mgm -e SHOW. Everything seemed fine until I tried running a query. I got this error,
ERROR 1296 (HY000): Got error 157 'Unknown error code' from NDBCLUSTER
I check the MySQL logs and could not find any errors. I then tried a full shutdown and restart of the MySQL cluster and checked config between the shutdown and start. Everything seemed to check out. I then ran a query on another database using the NDBCLUSTER engine. The query was successful. I've tried searching google but no one seems to have any answers that help. I've checked the config, I've made sure ndbd is running on the data nodes, ect. The other databases seem to be working fine except this one. I have a backup of the database but I would much preferably recover the database if possible.
If anyone has any suggestion or ideas, it would be greatly appreciated.
Thanks in advance.
Error 157 is actually 'could not connect to storage engine' and the fact that MySQL fails to report that error correctly is a bug: http://bugs.mysql.com/bug.php?id=44817
The case described in that bug mentions that you get the error when you try to query a table in NDB when the cluster is still down.
So I'm just guessing, but I would conclude that your cluster is not started. Either you missed starting one of the nodes, or else something went wrong starting one of the nodes.
Check the mysql server is really connected to the NDB storage.
Do from a mysql server that should be connected to NDB
SHOW GLOBAL STATUS LIKE 'Ndb_cluster_node_id';
Is the answer > 0?
SHOW GLOBAL STATUS LIKE 'Ndb_number_of_data_nodes';
Is the answer > 0 ?
If not, then the mysql server is not connected and then i would recommend you check your firewall and /etc/hosts table and make sure you dont have a line like:
127.0.0.1 localhost ..
Best regards
Johan

ColdFusion 10 Communications link failure to MySQL

We are migrating some websites onto a cloud infrastructure running Windows 2008 virtual machines. These websites all run on ColdFusion with MySQL databases. They currently are running in our CoLo with no problems. Additionally, they are running on our development network in our offices with no problems.
We are setting up our cloud to match as closely as possible the configuration we currently use which is, essentially, CF10 + IIS on one server and MySQL on a separate machine. We are 99% finished and most things are running great. However....
We have run into a couple, as in 2, places where we click a link/button and are greeted with:
Error Executing Database Query.
Communications link failure The last packet successfully received from the server was 0 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.
Scanning the stack-trace I also find:
Caused by: java.net.SocketException: Connection reset
The communications link error is ALWAYS: 0ms.
What's most puzzling is the Queries that seem to be causing this are simple queries that are used ALL OVER the sites with no problems. Why they are failing at hese 2 particular places has us at wits end.
Our only clue is, looking at the CF Error description of what scripts are called, we can see the script where the query is failing is getting called twice? For example, one of the occurences is in our Application file:
>The error occurred in D:/Our_Web_Sites/oursite/Application.cfm: line 73
>Called from D:/Our_Web_Sites/oursite/Application.cfm: line 17
>Called from D:/Our_Web_Sites/oursite/Application.cfm: line 1
>Called from D:/Our_Web_Sites/oursite/Application.cfm: line 73
>Called from D:/Our_Web_Sites/oursite/Application.cfm: line 17
>Called from D:/Our_Web_Sites/oursite/Application.cfm: line 1
We can find nothing in our CF code that would be causing the script to be called twice so our guess is the first call is failing on the Query so CF tries again...only to fail and error.
Googling this issue I've found lots of posts about changing the MySQL timeouts. None of those worked and I didn't expect them to since what we're dealing with doesn't appear to be a timeout issue. These pages fail each and every time.
The closest we've come to a solution came from this blog posting:
http://www.talkingtree.com/blog/index.cfm/2011/1/12/Validation-Query-for-MySQL-communications-link-failure!
If we UNCHECK the "Maintain connections across client requests. " setting in CFAdmin then the error goes away. The blog suggests leaving that checked, which is our preference, and using Connection Validation of "SELECT 1;". Try that...same error.
We've also tried the JDBC AutoConnect=true option. No effect.
Downloaded latest JDBC Connector and used it instead of standard CF10-MySQL connector. No effect.
Again, 99% of the site works with the exception of these two links, both of which work just fine in all our other environments. Any other ideas?
I feel like I've had a similar problem every time I upgrade CF or MySQL. Usually a change in the JDBC driver or connection string helps, which I see you already tried.
Have you checked the MySQL error log for any hints? Ours is in /var/lib/mysql (whatever your 'datadir' variable is set to) and ends with a .err extension.
Also, maybe trying some of the other JDBC connection string options for your version? I see some extended logging you can enable.
http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-configuration-properties.html
Found the issue. We are running our network on Savvis' cloud infrastructure. The Windows server instances we were using from Savvis had Trend Micro Deep Security Agent installed. This is an intrusion protection system and it was the problem. Disabling the service cleared up all communication errors. I have no clue why it was rejecting some queries that it had just accepted previously. I am just glad to (finally) put this behind me!