Safely killing a mySQL query that will not respond to KILL PROCESS

Safely killing a mySQL query that will not respond to KILL PROCESS - mysql

Running a regular OPTIMIZE TABLE query on a MYISAM table that usually takes a few minutes has now taken about 3 days. This also happened a week or so ago, but it did complete successfully and I wrongly assumed that the table had been repaired okay, and this time it wouldn't present a problem. Not so...
Last time, the process did not respond to KILL PROCESS and I ended up having to restart mysqld. The error log reported warnings such as:
[Warning] /usr/sbin/mysqld: Forcing close of thread 24974085 user: 'XXXX'
And when mysqld restarted, a number of other tables had been left corrupted and had to be rebuilt. Some of these tables are large, and I want to avoid this happening again. I understood that restarting mysqld should safely close tables, but it didn't seem to.
The web services that use the mySQL server will be put into maintenance mode to prevent new queries hitting the DB. But how can I safely close all open tables given that FLUSH TABLES is being blocked by the OPTIMIZE query?
SHOW OPEN TABLES;
[etc...]
114 rows in set (0.01 sec)
I really don't want 114 tables to have to be repaired. Any advice?
NB: I'm quite aware that the table being optimized will be corrupted and can live with that. I just want to minimize problems with the other tables.
Thanks.
UPDATE:
So FLUSH TABLES accepts a comma-separated list of tables... Maybe it could be as simple as flushing all open tables, but I'm still open to reassurance about this, or any other ideas. Flushing a specific table does remove it from the open tables list so this is looking promising.
http://dev.mysql.com/doc/refman/5.1/en/flush.html

FWIW, this all went very smoothly. Steps followed were something like this. I'm not sure if flushing and closing each table separately was overkill, but no tables crashed and total downtime was about 45 mins, so it was worth it:
Make a full slave backup
Redirect FQDN for the master to a dead IP so all services are down and master receives no new queries
When master is receiving no new queries, flush and close all tables individually
Stop mysql on master
Ensure OPTIMIZE TABLE does not cause problems for slave if it is replicated out
Stop mysql on slave
Copy all blog tables from slave to master
Restart master
Restart slave
Check for any corrupt tables and allow them to be repaired
Map FQDN back to the master internal IP
Sanity check of posts on main blog
Restart replication on slave and masters
Make sure replication progresses correctly (multi-master)
Make a full master backup

Related

Sql databases get corrupt after every backup

I have a problem with my unix server. This started a week ago. One day after a backup (I used to keep 3 backup files) I visited a website on the server but it wouldn't work. I restarted the server and it seemed to be working fine except the mysql service. My attempts to restart it failed. Then I figured that was because the server was full, so I deleted one of the backups, cleaned up some space and the mysql service restarted successfully. Than I figured tables in one of the databases (MYIsam tables) were corrupt. So I repaired them through myisamchk command via ssh and all worked fine. However, the very next day I woke up they were corrupt again (despite mysql was working fine), and this time there was no disk space problem on the server. I repaired them again. The next day the same thing happenned; and this time innodb tables that were part of another database were corrupt as well. I've fixed them too, so now all is working well but I guess the same thing will happen after tonight's backup.
I can't identify the problem and I don't know what logs to look into to understand the problem. Can anyone please help me out? Thanks very much in advance.

No easy answer here. My immediate thought is that the dbase is still busy when the backups commence, possibly corrupting indexes, interferring with caches, etc. Turn on full logging and check for problems when the backup starts happens. Maybe you will find something.
Look for the my.cnf file. On my CentOs it is located in /etc/my.cnf. It will have a config setting for the location of the error log.

My strongest suspect is OOM kill by the kernel or some other issue that results from running the system out of memory. Try this:
Start top on the server and press M to sort by memory so the biggest memory user is at the top.
note the pid of mysqld
manually perform the backup as you observe the value of the RES column in the top output (resident memory size)
once the backup is over see if the pid of mysqld has changed
If the pid has changed (meaning restart took place), and you saw the memory footprint of mysqld take up something comparable to the total amount of system memory, then my suspicion is correct, and we need to lower some settings in my.cnf to make it use less memory, e.g key_buffer_size and innodb_buffer_pool_size.
EDIT - From the log you posted, there are additional issues although it is not clear how they could be contributing to the table corruption. Your server appears to be running with --skip-innodb and your backup script is not able to deal with the absence of InnoDB storage engine printing exception error messages, but nevertheless continuing. It is also attempting to do a repair, which is failing due to the lack of system privileges (error 1 is Operation not permitted). It is possible that encountering those errors triggers some faulty logic in your backup script that leaves the tables corrupted.
At this point I would recommend disabling MySQL backup using the cPanel tool, and using mysqldump or some other solution (e.g. Xtrabackup (https://www.percona.com/doc/percona-xtrabackup/2.3/index.html)) from a cron job instead.
EDIT2 - from the test results. The manual backup does not run the system out of memory and does not crash the server. The jury is still out on the automatic one.

Don't kill mysqld; shut it down gracefully.
Switch from MyISAM to InnoDB; the latter does not suffer from that 'error'.

Making new MYSQL replication

I need to make working mysql replication from master to slave. (tried it once already)
The database is quite large (over 100GB) and it will take some hours to make it ready for new slave.
The database has MyIsam and innoDB engine and both are being written
I think my only choice is to copy the data files from master to a new slave? (or make a database dump which im referring later in the topic of ROUND 2)
Before that I have to run down all the services which uses the database and
make writelock for tables or should i shut down the whole database?
After data directory sync to the new replication server I started it up and the database with the tables was there. First error that I got rid off by changing bin.log to 007324 and position to 0.
Error 1:
140213 4:52:07 [ERROR] Got fatal error 1236: 'Could not find first log file name in binary log index file' from master when reading data from binary log
140213 4:52:07 [Note] Slave I/O thread exiting, read up to log 'bin-log.007323', position 46774422
After that I got new problems from database and this error came out from every table.
Error 2:
Error 'Incorrect information in file: './database/table.frm'' on query. Default database: 'database'.
Seems that something went wrong.
ROUND 2!
After this scene I started to think that can this be done without long service break.
Master database has been already configured and it works ok to another slave.
So i did some googling and this is what i came up with.
Making read lock to tables:
FLUSH TABLES WITH READ LOCK;
Taking dump:
mysqldump --skip-lock-tables --single-transaction --flush-logs --master-data=2 -A > dbdump.sql
Packaging and moving:
gzip (pigz) the the dbdump and moving it to slave server after that finding the MASTER_LOG_FILE and MASTER_LOG_POS from the dump.
After that i don't think that i want to import the dbdump.sql because its over 100GB and
will take time. So i think SOURCE would be ok option for it.
On SLAVE server:
CREATE DATABASE dbdump;
USE dbdump;
SOURCE dbdump.db;
CHANGE MASTER TO MASTER_HOST='x.x.x.x',MASTER_USER='replication',MASTER_PASSWORD='slavepass',
MASTER_LOG_FILE='mysql-bin.000001',MASTER_LOG_POS=X;
start slave;
SHOW SLAVE STATUS \G
I haven't tested this yet, am I on to something?
--bp

Realize that issuing a SOURCE command is the same as running an import of the dumped SQL from shell. Either way, it is going to take a long time. Outside of that, you have the steps correct - flush table with read lock on master, make a database dump of master, make sure you note master binlog coordinates, import dump on slave, set binlog coordinates, start replication. Do not work with the raw binaries unless you REALLY know what you are doing (especially for INNODB tables).
If you have a number of large tables (i.e. not just one big one), you could consider parallelizing your dumps/imports by table (or groups of tables) to speed things along. There are actually tools out there to help you do this.
You CAN work with the raw binaries, but it is not for the faint of heart. In the past, I have used rsync to differentially update the raw binaries between master and slave (you still must use flush table with read lock and gather master binlog coordinates before doing this). For MyISAM tables this works pretty well actually. For InnoDB, it can be more tricky. I prefer to use the option to set InnoDB to write index and data files per table. You would need to rsync the ibdata* files. You would delete ib_logfile* files from slave.
This whole thing is a bit of a high wire act, so I would not resort to doing this unless you have no other viable options. Absolutely take a traditional SQL dump before even thinking about attempting a binary file sync, and each time until you are VERY comfortable that you actually know what you are doing.

Replaying Mysql Replication error

We're running a standard Mysql Master/Slave replication system which has been working nicely for a couple of years. So nicely in fact that we were able to recover from a server outage (which rendered the Master offline) by using the Slave DB (after turning it into a Master).
Anyway, the issue that is causing our problem occurred after we recovered the server and returned the original Master to being a Master and the original slave back to being a Slave.
It was caused by me being an idiot - not by anything else!
Basically a write occurred on the Slave - meaning that a write on the same table on the Master cannot be replicated across due to a Duplicated Primary Key error.
As I said, my fault.
I can delete the record from the Slave that was erroneously written there - thus removing the conflicting ID from the Slave but I want the record from the Master DB.
My question is this: I know that I can 'fix' the replication by skipping over the relevant query from the Master (by setting the SQL_SLAVE_SKIP_COUNTER to 1) - but that will mean I lose the record from the Master.
So, is there a way of 'replaying' the errored replication write? Basically pointing the replication to be one query back?
Or do I have to go through the whole rigmarole of dropping my Slave, repopulating it from my last successful backup and set it to be a Slave again?
Many thanks in advance.

If it's a simple modification/update/... why don't you simply play it on the master (manually or taking it from the slave binary log if you have one), with slave's slave process off, then set the new replication pointer (file & position) on the slave (show master status; on master), and restart the slave?

The answer was actually staring me in the face (isn't it always!).
The duplicate Primary key error was caused by me doing an insert on the Slave and an insert to the same table on the master - meaning both new records had the same ID.
Luckily for me, I did not need the insert on the Slave, so I simply deleted it - meaning that there was only one record with the primary key value.
I just needed a way to get the replication to attempt to replicate the record across - and this was easy.
Just restart the Slave!
So:
mysql root:(none)>START SLAVE;
Query OK, 0 rows affected (0.03 sec)
mysql root:(none)> _

MySql Replication - slave lagging behind master

I have a master/slave replication on my MySql DB.
my slave DB was down for a few hours and is back up again (master was up all the time), when issuing show slave status I can see that the slave is X seconds behind the master.
the problem is that the slave dont seem to catch up with the master, the X seconds behind master dont seem to drop...
any ideas on how I can help the slave catch up?

Here is an idea
In order for you to know that MySQL is fully processing the SQL from the relay logs. Try the following:
STOP SLAVE IO_THREAD;
This will stop replication from downloading new entries from the master into its relay logs.
The other thread, known as the SQL thread, will continue processing the SQL statements it downloaded from the master.
When you run SHOW SLAVE STATUS\G, keep your eye on Exec_Master_Log_Pos. Run SHOW SLAVE STATUS\G again. If Exec_Master_Log_Pos does not move after a minute, you can go ahead run START SLAVE IO_THREAD;. This may reduce the number of Seconds_Behind_Master.
Other than that, there is really nothing you can do except to:
Trust Replication
Monitor Seconds_Behind_Master
Monitor Exec_Master_Log_Pos
Run SHOW PROCESSLIST;, take note of the SQL thread to see if it is processing long running queries.
BTW Keep in mind that when you run SHOW PROCESSLIST; with replication running, there should be two DB Connections whose user name is system user. One of those DB Connections will have the current SQL statement being processed by replication. As long as a different SQL statement is visible each time you run SHOW PROCESSLIST;, you can trust mysql is still replicating properly.

What binary log format are you using ? Are you using ROW or STATEMENT ?
SHOW GLOBAL VARIABLES LIKE 'binlog_format';
If you are using ROW as a binlog format make sure that all your tables has Primary or Unique Key:
SELECT t.table_schema,t.table_name,engine
FROM information_schema.tables t
INNER JOIN information_schema .columns c
on t.table_schema=c.table_schema
and t.table_name=c.table_name
and t.table_schema not in ('performance_schema','information_schema','mysql')
GROUP BY t.table_schema,t.table_name
HAVING sum(if(column_key in ('PRI','UNI'), 1,0)) =0;
If you execute e.g. one delete statement on the master to delete 1 million records on a table without a PK or unique key then only one full table scan will take place on the master's side, which is not the case on the slave.
When ROW binlog_format is being used, MySQL writes the rows changes to the binary logs (not as a statement like STATEMENT binlog_format) and that change will be applied on the slave's side row by row, which means a 1 million full table scan will take place on the slave's to reflect only one delete statement on the master and that is causing slave lagging problem.

"seconds behind" isn't a very good tool to find out how much behind the master you really is. What it says is "the query I just executed was executed X seconds ago on the master". That doesn't mean that you will catch up and be right behind the master the next second.
If your slave is normally not lagging behind and the work load on the master is roughly constant you will catch up, but it might take some time, it might even take "forever" if the slave is normally just barely keeping up with the master. Slaves operate on one single thread so it is by design much slower than the master, also if there are some queries that take a while on the master they will block replication while running on the slave.

Just check if you have same time and timezones on both the servers, i.e., Master as well as Slave.

If you are using INNODB tables, check that you have innodb_flush_log_at_trx_commit to a value different that 0 at SLAVE.
http://dev.mysql.com/doc/refman/4.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit

We had exactly the same issue after setting up our slave from a recent backup.
We had changed the configuration of our slave to be more crash-safe:
sync_binlog = 1
sync_master_info = 1
relay_log_info_repository = TABLE
relay_log_recovery = 1
I think that especially the sync_binlog = 1 causes the problem, as the specs of this slave is not so fast as in the master. This config option forces the slave to store every transaction in the binary lo before they are executed (instead of the default every 10k transactions).
After disabling these config options again to their default values I see that the slave is catching up again.

Just to add the findings in my similar case.
There were few bulk temporary table insert/update/delete were happening in master which occupied most of the space from relay log in slave. And in Mysql 5.5, since being single threaded, CPU was always in 100% and took lot of time to process these records.
All I did was to add these line in mysql cnf file
replicate-ignore-table=<dbname>.<temptablename1>
replicate-ignore-table=<dbname>.<temptablename2>
and everything became smooth again.
Inorder to figure out which tables are taking more space in relay log, try the following command and then open in a text editor. You may get some hints
cd /var/lib/mysql
mysqlbinlog relay-bin.000010 > /root/RelayQueries.txt
less /root/RelayQueries.txt

If u have multiple schema's consider using multi threaded slave replication.This is relatively new feature.
This can be done dynamically without stopping server.Just stop the slave sql thread.
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_threads = 4;
START SLAVE SQL_THREAD;

I have an issue similar to this. and both of my MySQL server hosted on AWS EC2 (master and replication). by increasing EBS disk size (which automatically increased IOPS) for MySQL slave server, its turned out the solution for me. R/W Throughput and bandwidth is increased R/W latency were decreased.
now my MySQL database replication is catching up to the master. and Seconds_Behind_Master was decreased (it was got increased from day to day).
so if you have MySQL hosted on EC2. I suggest you tried to increase EBS disk size or its IOPS on the slave.

I know it's been a while since OP asked but it would have helped me to read the following answer.
In /etc/mysql/mysql.cnf :
[mysql]
disable_log_bin
innodb_flush_log_at_trx_commit=2
innodb_doublewrite = 0
sync_binlog=0
disable_log_bin REALLY carried the trick for me.

MySql - create replication with minimal downtime

I have a ~80GB MySql DB.
I want to create a replication on that DB while having the current DB as master and setting up a slave for it.
My main question is how can i move the data (all 80GB) of it from the master to the new slave with as minimal downtime as possible, preferably none.
my initial thought was to stop the DB (after taking the log position), and then copy the files from the mysqldata lib, and then re start the server but just copying the files would take ~2 hours.
any thoughts?

On July 8, 2011 I addressed a similar question. I wrote scripts that would zap binary logs and starting performing an rsync.
On June 16, 2011, I wrote a post contrasting doing an rsync versus using XtraBackup.
On May 23, 2011, I discussed what considerations to make when doing this kind of backup.
Rather than reinvent the wheel and rewrite in the information I already wrote in those posts, I simply provided the links to my own posts that address this question.
Please read them carefully.
Give it a Try !!!
CAVEAT
The only downtime in my rsync algorithm is when after you have performed multiple rsyncs as specified, you shutdown mysql, perform one more rsync, and then start up mysql.
I would like to clarify the reason for the shutdown:
When you shutdown mysql:
All open MyISAM tables are closed, There is a header that marks how many file handles are open to the MyISAM table. That must be at zero(0) for the table to be OK. Otherwise, a closed MyISAM tables with a nonzero value in this header field marks the table as crashed and in need of a table repair. Shutting down mysql cleans all of that up.
All InnoDB tables that have either data pages or index pages in the Buffer Pool that are marked dirty needs to be flushed to disk. Performing a shutdown triggers a full flush of the Buffer Pool. Naturally, the bigger the pool and the higher the number of dirty pages, the longer the Buffer Pool flush time will be. To shorten this phase of the mysqld's shutdown, run SET GLOBAL innodb_max_dirty_pages_pct = 0; before performing any of the rsyncs. All transactions are completed (either commited or rolled back).

I think you have some misunderstanding.
before it start, you must enable binary log on the master
restart mysql on master
login to master
lock ALL tables from write
record the master binary position
copy binary data from master (DIRECTLY copy *.MYI, *.MYD...etc, you can copy to another location in master database)
after copy is completed, remove write lock
scp data to slave (depends on the network distance)
setup relevant master information into slave (binary log position, and remember to disable binary log)
start slave
After that, it should have huge delay on slave,
and slave will try to catch up with master automatically,
once it catch up, your slave is ready!
So, the down-time is only when you locking table and copy the binary data into another location in your master database.
docs:- http://dev.mysql.com/doc/refman/5.1/en/replication-howto.html

I've found the following tool to be of GREAT help and efficiency. The author currently works for facebook and used to work for dema in japan.
It's quite easy to set-up and you will reach 4 9's HA. ;-)
MHA tool for MySQL replication high availability
I have to say though that MySQL cluster is better, lol ;-)

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008