MySQL events stop working after a while

I have some games where users' health and other attributes are updated every couple of minutes using MySQL events. I ran into a problem where eventually the events are no longer run: the SQL in the event doesn't get executed.
I wasn't sure how else to fix it, so I tried restarting MySQL, and that fixes it for a while. I set up a cron job to restart MySQL every night, but that's not a very good solution. Sometimes MySQL fails to restart and hangs.
Edit: All of the tables in my databases that use the events are InnoDB.

It could be that you have events that are not completing and are holding many locks. Eventually additional jobs will "stack up", each trying to acquire locks but appearing to do no work. This is especially likely if you are using MyISAM tables, as they have table-level, not row-level, locking.
Consider configuring pt-stalk (part of the Percona Toolkit) to capture regular snapshots of SHOW PROCESSLIST and other important details. Then you can track down when things "stop working" and work backwards to when the problem started.
To prevent jobs from "stacking up", use the GET_LOCK function:
SELECT GET_LOCK('THIS_IS_A_NAMED_LOCK', 0) INTO @got_lock;
IF @got_lock = 1 THEN
  SELECT 'do something here';
  SELECT RELEASE_LOCK('THIS_IS_A_NAMED_LOCK') INTO @discard;
END IF;
If you are using InnoDB, make sure that you issue START TRANSACTION and COMMIT commands in your event to ensure that you are not creating long running transactions.
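Putting the two suggestions together, an event body could look something like the sketch below. The event name, lock name, table, and column are hypothetical placeholders rather than anything from the question; only the GET_LOCK plus explicit transaction pattern is the point.

DELIMITER //
CREATE EVENT update_player_stats
ON SCHEDULE EVERY 2 MINUTE
DO
BEGIN
  DECLARE got_lock INT;
  -- Timeout 0: give up immediately if the previous run still holds the lock,
  -- so runs never stack up behind each other.
  SELECT GET_LOCK('update_player_stats_lock', 0) INTO got_lock;
  IF got_lock = 1 THEN
    START TRANSACTION;
    UPDATE player_stats SET health = LEAST(health + 5, 100);  -- hypothetical work
    COMMIT;
    SELECT RELEASE_LOCK('update_player_stats_lock') INTO got_lock;
  END IF;
END//
DELIMITER ;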

I ended up taking my iOS games offline because of this problem recently. I couldn't figure out how to implement the answer by Justin Swanhart. If anyone is interested, I can make my PHP/MySQL code available to see if you can fix this problem. Just let me know. andy.triboletti#gmail.com

Related

Why is MySQL Event Scheduler Stuck Opening Tables?

I'm using MySQL 8.0.21 from the MySQL Community Installer on Windows 10 (updated to version 2004), and for some reason, if I create an event in the event scheduler which calls a procedure once every second (regardless of what that SP actually does; I'll explain my test case), my CPU maxes out, and when I look at the active connections in MySQL Workbench, a ton of worker threads stack up and stall in the "Opening Tables" state. My PC freezes, and I have to edit the event to disable it, stop the MySQL process in Task Manager and start the service again.
TEST CASE
During setup of a brand new server, I used all default settings, except that I enabled the general log and I use the new 8.0+ caching_sha2_password authentication (although I ALTER USER to mysql_native_password for phpMyAdmin, so that might revert it; I'm honestly not sure).
I create a new Schema called "Test"
I create one table called "TestTable" with only one column, "column1" INT.
I then create a stored procedure "TestProc" which does "SELECT COUNT(*) FROM TestTable;", with privileges adjusted so that the definer is root@localhost and the SQL access type is READS SQL DATA.
And finally I create an event called "TestEvent" which does "CALL TestProc();", recurring every 1 second, with ON COMPLETION PRESERVE, and definer root@localhost.
Restart the server before the event is fired.
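For reference, the test case above expressed as SQL looks roughly like this (a reconstruction, so the exact statements may have differed slightly):

CREATE SCHEMA Test;
USE Test;

CREATE TABLE TestTable (column1 INT);

DELIMITER //
CREATE DEFINER = 'root'@'localhost' PROCEDURE TestProc()
    READS SQL DATA
BEGIN
  SELECT COUNT(*) FROM TestTable;
END//

CREATE DEFINER = 'root'@'localhost' EVENT TestEvent
ON SCHEDULE EVERY 1 SECOND
ON COMPLETION PRESERVE
DO CALL TestProc()//
DELIMITER ;

-- Scheduler on, event enabled, then the MySQL service is restarted to reproduce.
SET GLOBAL event_scheduler = ON;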
Also, if I enable the event, or create it, it'll run without issue. It's important to note that the issue begins when the event scheduler is left on and the event is left enabled, and then the server is restarted from the services in Task Manager. Immediately the CPU jacks up to max and the active connections show threads stacking up without completing.
Any clues are appreciated; I find no actual errors, nor do I have any idea where to begin debugging anymore. I've tried skipping grant tables (but obviously that's not optimal, and it didn't work).
I did find a hint when reviewing the MySQL 8.0+ docs:
"If a repeating event does not terminate within its scheduling interval, the result may be multiple instances of the event executing simultaneously. If this is undesirable, you should institute a mechanism to prevent simultaneous instances. For example, you could use the GET_LOCK() function, or row or table locking."
However, when analyzing, there do not appear to be any locks, nor should I need to implement such a mechanism manually just for this test case (or my actual program).
UPDATE
Up to this point, albeit a rather niche bug, I do believe that is exactly what this is, and I have posted it on the MySQL bug forum.
The answer has actually turned out to be a reproducible bug - Bug #100449

Kill Long Running Processes in MySQL

Scenario: you have hundreds of reports running on a slave machine. These reports are either scheduled by MySQL's event scheduler or are called via a Python/R or shell script. Apart from that, there are fifty-odd users connecting to the MySQL slave and running random queries. These people don't really know how to write good queries, and that's fair; they are not supposed to. So, every now and then (read: every day), you see some queries which are stuck because of read/write locks. How do you fix that?
What you do is this: you don't kill whatever is being written; instead, you kill all the read queries. Now, that is also tricky, because if you kill all the read queries, you will also kill off OUTFILE queries, which are actually write queries (they just don't write to MySQL, but to disk).
Why killing is necessary (I'm only speaking for MySQL, do not take this out of context)
I have got two words for you - Slave lag. We don't want that to happen, because if that happens, all users, reports, consumers suffer.
I have written the following to kill processes in MySQL based on three questions:
how long has the query been running?
who is running the query?
do you want to kill write/modify queries too?
What I have intentionally not done yet is maintain a history of the processes that have been killed. One should do that so as to analyse and find out who is running all the bad queries. But there are other ways to find that out.
I have created a procedure for this. I haven't spent much time on it, so please suggest whether this is a good way to do it or not.
GitHub Gist
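The gist itself isn't reproduced here, but the core of the approach can be sketched directly against information_schema.PROCESSLIST; the 600-second threshold and the excluded users below are placeholders to adapt:

-- Candidate read queries to kill: long-running SELECTs that are not OUTFILE
-- exports and not replication/system threads.
SELECT ID, USER, TIME, LEFT(INFO, 80) AS query_start
FROM information_schema.PROCESSLIST
WHERE COMMAND = 'Query'
  AND TIME > 600
  AND UPPER(TRIM(INFO)) LIKE 'SELECT%'
  AND UPPER(INFO) NOT LIKE '%INTO OUTFILE%'
  AND USER NOT IN ('system user', 'event_scheduler', 'repl');
-- Each returned ID can then be terminated with: KILL <id>;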
Switch to MariaDB. Versions 10.0 and 10.1 implement several limits and timeouts: https://mariadb.com/kb/en/library/query-limits-and-timeouts/
Then write an API between what the users write and actually hitting the database. In this layer, add the appropriate limitations.
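As a concrete example of the first suggestion, max_statement_time (MariaDB 10.1+, one of the limits on the page linked above) aborts statements that exceed a time budget; the 600-second value here is only illustrative:

-- Abort any statement that runs longer than 600 seconds (0 = no limit).
SET GLOBAL max_statement_time = 600;
-- Or only for a given session, e.g. the ad-hoc reporting users:
SET SESSION max_statement_time = 600;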

What's causing subsequent errors when restarting deadlocked transaction?

When restarting a failed transaction at the commit stage, I get a second failure when restarting the transaction. This is running Galera Cluster under MariaDB 10.2.6.
The sequence of events goes like this:
Commit a transaction (say a single insert).
COMMIT fails with error 1213 "Deadlock found when trying to get lock"
Begin a new transaction to replay the SQL statement[s].
BEGIN fails with error 1047 "WSREP has not yet prepared node for application use"
My application bails to avoid a more serious crash (see notes below)
This happens quite regularly and although the cluster recovers, individual threads receive failures. Yesterday this happened 15 times in one second.
I cannot identify any root cause for this. It seems that the deadlock is the initiator of the problem. The situation should be recoverable (and often is), but with multiple clients all trying to resolve their deadlocks at the same time, the whole thing seems to just fail.
Notes:
This is related to an earlier question where retrying failed transactions caused total crash of the cluster. I've managed to prevent crashes by retrying transactions only on deadlocks. i.e. if a different type of error occurs during a restart the application gives up.
I'm aware that 10.2.6 is not the latest version of MariaDB. I'm nervous to upgrade right now as I've had such bad experiences. I would like to understand the current problem before doing an upgrade and I've been unable to reproduce the errors in a test environment.
I'm not sure, but I suspect 3 tries (not 2) is appropriate. Committing involves two steps:
Checking for a Deadlock purely within the node you are connected to. (Eg: another query is touching the same row or gap.)
Checking with the other nodes to see if they will complain. (Eg: The same row has already been inserted into another node.)
Sure, either of those could happen repeatedly, and in any order. But making 3 tries seems reasonable.
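To make the "3 tries" idea concrete, the retry could take roughly the following shape if done in a stored procedure; the table, column, and limit of three are assumptions for illustration rather than anything from the question (many applications do the equivalent in client code instead):

DELIMITER //
CREATE PROCEDURE insert_with_retry(IN p_val INT)
BEGIN
  DECLARE attempts INT DEFAULT 0;
  DECLARE succeeded INT DEFAULT 0;
  WHILE attempts < 3 AND succeeded = 0 DO
    BEGIN
      DECLARE deadlocked INT DEFAULT 0;
      -- 1213 = ER_LOCK_DEADLOCK, which is also how Galera certification
      -- failures are reported to the client.
      DECLARE CONTINUE HANDLER FOR 1213 SET deadlocked = 1;
      START TRANSACTION;
      INSERT INTO some_table (col1) VALUES (p_val);  -- hypothetical statement
      IF deadlocked = 0 THEN
        COMMIT;
      END IF;
      IF deadlocked = 0 THEN
        SET succeeded = 1;
      ELSE
        ROLLBACK;
        SET attempts = attempts + 1;
      END IF;
    END;
  END WHILE;
  IF succeeded = 0 THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Giving up after 3 deadlock retries';
  END IF;
END//
DELIMITER ;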
Now, once you have failed "too many" times, it is right to abort and get a human (a DBA type) involved. I suspect that you could restructure your code / application logic / etc in some way to avoid most of the failures. Would you like to provide more details, so we can discuss that possibility...
What kind of table? (Queue, transactions, logging, etc)
SHOW CREATE TABLE. (auto_inc, unique keys, etc; too many UNIQUE keys can aggravate the situation)
What does the INSERT look like?
How often do you run inserts like this one? How often does it fail? (Instrument your code so you count even those that you can recover from.)
How spread out is the Cluster? (ping time)
What other queries are hitting the table? (They may be aggravating the issue.)

Lots of "Query End" states in MySQL, all connections used in a matter of minutes

This morning I noticed that our MySQL server load was going sky high. The maximum should be 8, but it hit over 100 at one point. When I checked the process list I found loads of update queries (simple ones, incrementing a "hitcounter") that were in the "query end" state. We couldn't kill them (well, we could, but they remained in the killed state indefinitely) and our site ground to a halt.
We had loads of problems restarting the service and had to forcibly kill some processes. When we did we were able to get MySQLd to come back up but the processes started to build up again immediately. As far as we're aware, no configuration had been changed at this point.
So, we changed innodb_flush_log_at_trx_commit from 2 to 1 (note that we need ACID compliance) in the hope that this would resolve the problem, and set the connections in PHP/PDO to be persistent. This seemed to work for an hour or so, and then the connections started to run out again.
Fortunately, I set a slave server up a couple of months ago and was able to promote it and it's taking up the slack for now, but I need to understand why this has happened and how to stop it, since the slave server is significantly underpowered compared to the master, so I need to switch back soon.
Has anyone any ideas? Could it be that something needs clearing out? I don't know what, maybe the binary logs or something? Any ideas at all? It's extremely important that we can get this server back as the master ASAP but frankly I have no idea where to look and everything I have tried so far has only resulted in a temporary fix.
Help! :)
I'll answer my own question here. I checked the partition sizes with a simple df command and there I could see that /var was 100% full. I found an archive that someone had left that was 10GB in size. I deleted that, started MySQL, ran a PURGE BINARY LOGS BEFORE '2012-10-01 00:00:00' query to clear out a load of space, and reduced the /var/lib/mysql directory size from 346GB to 169GB. I changed back to the master and everything is running great again.
From this I've learnt that our log files get VERY large, VERY quickly. So I'll be establishing a maintenance routine to not only keep the log files down, but also to alert me when we're nearing a full partition.
I hope that's some use to someone in the future who stumbles across this with the same problem. Check your drive space! :)
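As a follow-up on the maintenance-routine point, something along these lines keeps the binary logs from filling the partition; the one-week retention is an arbitrary example rather than a recommendation from the answer above:

-- Keep roughly a week of binary logs; older ones are purged automatically.
SET GLOBAL expire_logs_days = 7;  -- MySQL 8.0+ uses binlog_expire_logs_seconds instead
-- One-off cleanup of anything older than seven days:
PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 7 DAY);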
We've been having a very similar problem, where the mysql processlist showed that almost all of our connections were stuck in the "query end" state. Our problem was also related to replication and writing the binlog.
We changed the sync_binlog variable from 1 to 0, which means that instead of flushing binlog changes to disk on each commit, it allows the operating system to decide when to fsync() to the binlog. That entirely resolved the "query end" problem for us.
According to this post from Mats Kindahl, writing to the binlog won't be as much of a problem in the 5.6 release of MySQL.
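For anyone wanting to try the same change, sync_binlog is a dynamic variable, so (accepting the durability trade-off) it can be tested without a restart:

-- Let the OS decide when to fsync the binary log instead of flushing on every commit.
SET GLOBAL sync_binlog = 0;
-- To make it permanent, also set sync_binlog = 0 under [mysqld] in my.cnf.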
In my case, it was indicative of maxing out the I/O on disk. I had already reduced fsyncs to a minimum, so it wasn't that. Another symptom is that "log*.tokulog*" files start accumulating because the system can't catch up with all the writes.
I met this problem in my production use case. I was using REPLACE INTO ... SELECT on a table for archival, and it had 10 lakh (1 million) records in it. I interrupted the query in the middle and it went to the killed state.
I set innodb_flush_log_at_trx_commit to 1 as well as sync_binlog = 1, and it completely recovered in my case. I have since triggered my archival script again without any issues.

How do I oversee my MySQL replication server?

I've had a tough time setting up my replication server. Is there any program (OS X, Windows, Linux, or PHP no problem) that lets me monitor and resolve replication issues? (btw, for those following, I've been on this issue here, here, here and here)
My production database is several megs in size and growing. Every time the database replication stops and the databases inevitably begin to slide out of sync, I cringe. My last resync from a dump took almost 4 hours round trip!
As always, even after sync, I run into this kind of show-stopping error:
Error 'Duplicate entry '252440' for key 1' on query.
I would love it if there was some way to closely monitor what's going on and perhaps let the software deal with it. I'm even all ears for service companies which may help me monitor my data better. Or an alternate way to mirror altogether.
Edit: going through my previous questions I found this, which helps tremendously. I'm still all ears for a monitoring solution.
To monitor the servers we use the free tools from Maatkit ... simple, yet efficient.
The binary replication is available in 5.1, so I guess you've got some balls. We still use 5.0 and it works OK, but of course we had our share of issues with it.
We use Master-Master replication with a MySQL Proxy as a load balancer in front, and to prevent it from having errors:
we removed all unique indexes
for the few cases where we really needed unique constraints we made sure we used REPLACE instead of INSERT (MySQL Proxy can be used to guard for proper usage ... it can even rewrite your queries)
scheduled scripts doing intensive reports are always accessing the same server (not the load-balancer) ... so that dangerous operations are replicated safely
Yeah, I know it sounds simple and stupid, but it solved 95% of all the problems we had.
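To spell out the REPLACE-instead-of-INSERT point above: a write that may arrive on either master (or be replayed) should not fail on a duplicate key. The table and columns here are made up for illustration:

-- Fails with a duplicate-key error if the row already exists on this master:
--   INSERT INTO user_email (user_id, email) VALUES (42, 'a@example.com');
-- Safe to apply on either master; it overwrites the existing row instead:
REPLACE INTO user_email (user_id, email) VALUES (42, 'a@example.com');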
We use MySQL replication to replicate data to close to 30 servers. We monitor them with Nagios. You can probably check the replication status and use an event handler to restart it with 'SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;'. That will fix the error, but you'll lose the insert that caused the error.
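Spelled out as the statements you would run by hand (note that the skip counter can only be set while the slave SQL thread is stopped):

SHOW SLAVE STATUS\G      -- check Last_SQL_Error before deciding to skip it
STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;  -- skip the statement that caused the error
START SLAVE;
SHOW SLAVE STATUS\G      -- confirm Slave_SQL_Running: Yes and the error is gone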
About the error: do you use MEMORY tables on your slaves? I ask this because the only time we ever got a lot of these errors, they were caused by a bug in the latest releases of MySQL. 'DELETE FROM Table WHERE Field = Value' would delete only one row in MEMORY tables even though there were multiple matching rows.
MySQL bug description