How to avoid high CPU usage when there are no active queries - MySQL

On my Linux server I ran into a problem this morning. After I killed all the background tasks that had been making ~200k database entries (1 GB in size) throughout the night, I realized that my CPU usage was still at 80%, caused entirely by MySQL.
Neither a reboot nor restarting MySQL or nginx helped.
"InnoDB saved the data in the rows it was changing, and queries against rows whose changes are still being rolled back should be answered transparently using the data that's still in the undo log."
I am not too familiar with this topic, but it seems like this explains why there is high CPU usage even when there are no queries. When I run SHOW PROCESSLIST, it shows three connections whose "State" column says "Copying to tmp table".
I guess right now I just have to wait until all those rollback processes are finished, but why do they come up in the first place, and how can I prevent this from happening again?

Writes and index-updates are "delayed". This can lead to I/O and CPU activity even after all queries are finished.
"copying to tmp table" in the PROCESSLIST implies that something is still running. Chase down that query. It can possibly be improved with a better index or a rewrite. Killing mysqld will lead to a costly rollback now and/or when mysqld is restarted.
Killing a process in the middle of a transaction leads to an immediate ROLLBACK. Change the application to intercept the 'kill' and gracefully wait until things are in a good position for shutting down.
UPDATEing a million rows in a single statement takes a loooong time. Perhaps you killed that (or something like it)? Consider breaking such a statement up into chunks using 1000-row ranges on the PRIMARY KEY.
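A minimal sketch of that chunking idea, assuming a hypothetical table hits with an auto-increment PRIMARY KEY id and a counter column (your table, column names, and chunk size will differ):

    DELIMITER //
    CREATE PROCEDURE update_in_chunks()
    BEGIN
      -- Walk the PRIMARY KEY in 1000-row ranges so each chunk is a small,
      -- quickly committed transaction and any rollback stays cheap.
      DECLARE max_id BIGINT;
      DECLARE lo BIGINT DEFAULT 0;
      SELECT COALESCE(MAX(id), 0) INTO max_id FROM hits;
      WHILE lo < max_id DO
        UPDATE hits
           SET counter = counter + 1
         WHERE id > lo AND id <= lo + 1000;
        COMMIT;                      -- end the chunk's transaction explicitly
        SET lo = lo + 1000;
      END WHILE;
    END //
    DELIMITER ;

    CALL update_in_chunks();

Each chunk holds its locks only briefly, so interrupting the job midway leaves at most one 1000-row range to roll back instead of a million rows.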

Related

MySQL Debezium connector for RDS in production caused deadlocks

We are creating a data pipeline from MySQL in RDS to Elasticsearch to build search indexes, and for this we are using Debezium CDC with its MySQL source connector and the Elasticsearch sink connector.
Since MySQL is in RDS, we have to give the MySQL user the LOCK TABLES permission for the two tables we want CDC on, as mentioned in the docs.
We also have various other MySQL users performing transactions that may touch either of those two tables.
As soon as we connected the MySQL connector to our production database, a lock was created and our whole system went down. After realising this we quickly stopped Kafka and removed the connector, but the locks kept increasing; the problem was only resolved after we stopped all new queries by halting our production code and manually killing the processes.
What could be the potential cause of this, and how could we prevent it?
I'm only guessing because I don't know your query traffic. I would assume the locks you saw increasing were the backlog of queries that had been waiting for the table locks to be released.
I mean the following sequence is what I believe happened:
Debezium starts table locks on your two tables.
The application is still working, and it is trying to execute queries that access those locked tables. The queries begin waiting for the lock to be released. They will wait for up to 1 year (this is the default lock_wait_timeout value).
As you spend some minutes trying to figure out why your site is not responding, a large number of blocked queries accumulate, potentially as many as max_connections. Once all the allowed connections are occupied by blocked queries, the application cannot connect to MySQL at all (the diagnostic sketch after this sequence shows how to check these limits).
Finally you stop the Debezium process that is trying to read its initial snapshot of data. It releases its table locks.
Immediately when the table locks are released, the waiting queries can proceed.
But many of them do need to acquire locks too, if they are INSERT/UPDATE/DELETE/REPLACE or if they are SELECT ... FOR UPDATE or other locking statements.
Since there are so many of these queries queued up, it's more likely for them to be requesting locks that overlap, which means they have to wait for each other to finish and release their locks.
Also, because there are hundreds of queries executing at the same time, they overtax system resources like CPU, causing high system load, and this slows them all down further. So queries take longer to complete, and when they block each other they have to wait longer still.
Meanwhile the application is still trying to accept requests, and therefore keeps adding more queries to execute, which are subject to the same queueing and resource exhaustion.
Eventually you stop the application, which at least allows the queue of waiting queries to gradually be finished. As the system load goes down, MySQL is able to process the queries more efficiently and finishes them all pretty soon.
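A small diagnostic sketch for checking the limits mentioned in this sequence; the sys view assumes MySQL 5.7 or newer:

    SHOW VARIABLES LIKE 'lock_wait_timeout';   -- default is 31536000 seconds (1 year)
    SHOW VARIABLES LIKE 'max_connections';
    SHOW STATUS LIKE 'Threads_connected';      -- how close you are to the connection limit
    -- Which sessions are waiting on table locks, and who holds them (MySQL 5.7+):
    SELECT * FROM sys.schema_table_lock_waits;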
The suggestion by the other answer to use a read replica for your Debezium snapshot is a good one. If your application can read from the master MySQL instance for a while, then no query will be blocked on the replica while Debezium has it locked. Eventually Debezium will finish reading all the data, and release the locks, and then go on to read only the binlog. Then the app can resume using the replica as a read instance.
If your binlog uses GTID, you should be able to make a CDC tool like Debezium read the snapshot from the replica, then when that's done, switch to the master to read the binlog. But if you don't use GTID, that's a little more tricky. The tool would have to know the binlog position on the master corresponding to the snapshot on the replica.
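A quick sketch for checking which situation you are in; run the first two statements on the master and the last one on the replica:

    SELECT @@global.gtid_mode;   -- ON means GTID-based positioning is available
    SHOW MASTER STATUS;          -- current binlog file/position and executed GTID set
    SHOW SLAVE STATUS;           -- on the replica: look at Retrieved_Gtid_Set / Executed_Gtid_Set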
If locking is a problem and you cannot afford the locking-vs-consistency tradeoff, then please take a look at the snapshot.locking.mode config option.
Use a replica to prevent the LOCK TABLES statement from being executed on the master. Why does Debezium need to lock tables at all? All CDC tools fetch events from the binlogs.
The reason is that Debezium (as of version 1.5) does not behave exactly as the documentation describes. If acquiring the global read lock (FLUSH TABLES WITH READ LOCK) fails, it falls back to locking the captured tables, and those table locks are only released after the snapshot has been read. If you see a message in the log like "Unable to refresh and obtain the global read lock, the table read lock will be used after reading the table name", then congratulations, you are one of the lucky ones.
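A hedged sketch of how to check which locking path the snapshot will take; the user name 'debezium' and the table names are placeholders, and whether the global read lock is permitted depends on the privileges your RDS user actually has:

    SHOW GRANTS FOR 'debezium'@'%';      -- RELOAD is needed for the global read lock
    -- What the connector attempts first:
    FLUSH TABLES WITH READ LOCK;
    UNLOCK TABLES;
    -- The fallback it uses on the captured tables (needs LOCK TABLES + SELECT):
    LOCK TABLES mydb.orders READ, mydb.customers READ;
    UNLOCK TABLES;
    -- Alternatively, snapshot.locking.mode=none in the connector config skips
    -- these locks entirely, trading away snapshot consistency (see above).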

Lots of "Query End" states in MySQL, all connections used in a matter of minutes

This morning I noticed that our MySQL server load was going sky high. Max should be 8 but it hit over 100 at one point. When I checked the process list I found loads of update queries (simple ones, incrementing a "hitcounter") that were in query end state. We couldn't kill them (well, we could, but they remained in the killed state indefinitely) and our site ground to a halt.
We had loads of problems restarting the service and had to forcibly kill some processes. When we did we were able to get MySQLd to come back up but the processes started to build up again immediately. As far as we're aware, no configuration had been changed at this point.
So, we changed innodb_flush_log_at_trx_commit from 2 to 1 (note that we need ACID compliance) in the hope that this would resolve the problem, and set the connections in PHP/PDO to be persistent. This seemed to work for an hour or so, and then the connections started to run out again.
Fortunately, I set a slave server up a couple of months ago and was able to promote it and it's taking up the slack for now, but I need to understand why this has happened and how to stop it, since the slave server is significantly underpowered compared to the master, so I need to switch back soon.
Has anyone any ideas? Could it be that something needs clearing out? I don't know what, maybe the binary logs or something? Any ideas at all? It's extremely important that we can get this server back as the master ASAP but frankly I have no idea where to look and everything I have tried so far has only resulted in a temporary fix.
Help! :)
I'll answer my own question here. I checked the partition sizes with a simple df command and there I could see that /var was 100% full. I found an archive that someone had left that was 10GB in size. I deleted that, started MySQL, ran a PURGE BINARY LOGS BEFORE '2012-10-01 00:00:00' query to clear out a load of space, and reduced the /var/lib/mysql directory size from 346GB to 169GB. Changed back to master and everything is running great again.
From this I've learnt that our log files get VERY large, VERY quickly. So I'll be establishing a maintenance routine to not only keep the log files down, but also to alert me when we're nearing a full partition.
I hope that's some use to someone in the future who stumbles across this with the same problem. Check your drive space! :)
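For the binlog side of that maintenance routine, a minimal sketch (a 7-day retention is just an example; the variable name differs by MySQL version):

    SHOW BINARY LOGS;                                   -- how much space the binlogs occupy
    PURGE BINARY LOGS BEFORE NOW() - INTERVAL 7 DAY;    -- one-off cleanup
    SET GLOBAL expire_logs_days = 7;                    -- MySQL 5.x: purge automatically
    -- MySQL 8.0+: SET GLOBAL binlog_expire_logs_seconds = 604800;
    -- Before purging on a master, check that no replica still needs the old files.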
We've been having a very similar problem, where the mysql processlist showed that almost all of our connections were stuck in the "query end" state. Our problem was also related to replication and writing the binlog.
We changed the sync_binlog variable from 1 to 0, which means that instead of flushing binlog changes to disk on each commit, it allows the operating system to decide when to fsync() to the binlog. That entirely resolved the "query end" problem for us.
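A sketch of that change; note the durability tradeoff, since with sync_binlog = 0 the most recent binlog events can be lost on a crash:

    SHOW VARIABLES LIKE 'sync_binlog';   -- confirm the current setting
    SET GLOBAL sync_binlog = 0;          -- let the OS decide when to fsync the binlog
    -- To make it permanent, also set it under [mysqld] in my.cnf:
    --   sync_binlog = 0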
According to this post from Mats Kindahl, writing to the binlog won't be as much of a problem in the 5.6 release of MySQL.
In my case, it was indicative of maxing out the I/O on disk. I had already reduced fsyncs to a minimum, so it wasn't that. Another symptom is that "log*.tokulog*" files start accumulating because the system can't keep up with all the writes.
I ran into this problem in my production use case. I was using REPLACE INTO ... SELECT on a table for archival, and it had 10 lakh (1 million) records in it. I interrupted the query in the middle and it went into the killed state.
I had set innodb_flush_log_at_trx_commit to 1 as well as sync_binlog = 1, and it completely recovered in my case. I then triggered my archival script again without any issues.

Started optimize table on Magento db, but sites are down. Can I kill it and how?

All my websites are running super slow because I tried to optimize the Magento DB in phpmyadmin in an effort to get the sites to speed UP. They're so slow they might as well be down. I want the pain to stop. Can I kill the optimize and how would I do that? Or is it better just to wait until it's finished.
For the record, I have a backup of the database.
Usually you can KILL any operation you don't want to finish, but be aware that the time to undo the operation may in fact be longer than it would take to simply let it complete.
Unless you're running a massive installation, it sounds like your database needs to be tuned better. MySQL ships with a default my.cnf configuration that is terrible: barely any memory is allocated to it, and it runs slowly.
Secondly, you might be running a lot of queries that are slow because they're missing indexes. How much data is in this table you're trying to optimize? SHOW TABLE STATUS will give you an idea.
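A small sketch of both steps; the thread id and the table name are placeholders taken from your own processlist and schema:

    SHOW FULL PROCESSLIST;                             -- find the Id of the OPTIMIZE TABLE thread
    KILL 12345;                                        -- replace 12345 with that Id
    SHOW TABLE STATUS LIKE 'catalog_product_entity';   -- row count plus data and index size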

Flush InnoDB cache

I have some reporting queries that are rarely run, which I need to be performant without relying on them being cached anywhere in the system. In testing various schema and sproc changes I'll typically see the first run be very slow and subsequent runs fast, so I know there's some caching going on that's making it cumbersome to test changes. Restarting mysqld or running several other large queries are the only reliable ways to reproduce it. I'm wondering if there's a better way.
The MySQL Query Cache is turned OFF.
Monitoring the disk, I don't see any reads happening except on the first run. I'm not that familiar with the disk cache, but I would expect that if that's where the caching were happening I'd still see disk reads; they'd just be very fast.
MONyog gives me what I think is the definitive proof, which is the InnoDB cache hit ratio. Monitoring it, I see that when the query is fast it's hitting the InnoDB buffer pool, and when it's slow it's hitting disk.
On a live system I'll gladly let InnoDB do this, but for development and test purposes I'm interested in worst case scenarios.
I'm using MySQL 5.5 on Windows Server 2008R2
I found a post on the Percona blog that says:
For MySQL Caches you can restart MySQL and this is the only way to clean all of the caches. You can do FLUSH TABLES to clean MySQL table cache (but not Innodb table meta data) or you can do “set global key_buffer_size=0; set global key_buffer_size=DEFAULT” to zero out key buffer but there is no way to clean Innodb Buffer Pool without restart.
In the comments he goes on to say:
Practically everything has caches. To do real profiling you need to profile real query mix which will have each query having appropriate cache/hit ratio not running one query in the loop and assuming results will be fine.
I guess that sums it up. It does make it hard to test individual queries. My case is that I want to try forcing different indices to make sure the query planner is picking the right one, and apparently I'll have to restart MySQL between tests to take the cache out of the equation!
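A short sketch of what the quoted advice amounts to in practice; these clear the table cache and the MyISAM key buffer, while the InnoDB buffer pool still requires a restart of mysqld:

    FLUSH TABLES;                          -- drop the table cache (not InnoDB metadata)
    SET GLOBAL key_buffer_size = 0;        -- empty the MyISAM key buffer...
    SET GLOBAL key_buffer_size = DEFAULT;  -- ...then restore its configured size
    -- The InnoDB buffer pool is only emptied by restarting the server, e.g. on
    -- Windows (service name may differ):  net stop MySQL  &&  net start MySQL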

MySQL process in a loop cannot be stopped

One process in MySQL is taking over 98% of CPU resources, and whenever I kill this process it comes back with a new PID, so I guess it's in a loop. Is there a way to stop this process permanently? It's causing my site to slow down dramatically. Any suggestions?
I think this is caused by non-optimized or overly heavy queries. If you kill MySQL and someone restarts the script the load will, logically, go up again. Try seeing what queries are executing.
If MySQL is used as the backend of a website, you probably need to take a look at the slow query log to check which query is causing this intensive CPU usage.
Also you probably need some new indexes.
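A minimal sketch for finding the culprit: turn the slow query log on at runtime and watch what is currently running (the log file path is just an example):

    SET GLOBAL slow_query_log = 'ON';
    SET GLOBAL long_query_time = 1;          -- log anything slower than 1 second
    SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
    SHOW FULL PROCESSLIST;                   -- see the running query; KILL <id> stops one execution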