SQL databases get corrupted after every backup - MySQL

I have a problem with my Unix server. This started a week ago. One day, after a backup (I used to keep 3 backup files), I visited a website on the server but it wouldn't work. I restarted the server and everything seemed to be working fine except the MySQL service. My attempts to restart it failed. Then I figured that was because the server was full, so I deleted one of the backups, cleaned up some space, and the MySQL service restarted successfully. Then I found that the tables in one of the databases (MyISAM tables) were corrupt. So I repaired them with the myisamchk command via SSH and all worked fine.
However, the very next day they were corrupt again (even though MySQL was running fine), and this time there was no disk space problem on the server. I repaired them again. The next day the same thing happened, and this time InnoDB tables that were part of another database were corrupt as well. I've fixed them too, so now everything is working, but I guess the same thing will happen after tonight's backup.
I can't identify the problem and I don't know which logs to look at to understand it. Can anyone please help me out? Thanks very much in advance.

No easy answer here. My immediate thought is that the database is still busy when the backups commence, possibly corrupting indexes, interfering with caches, etc. Turn on full logging and check for problems around the time the backup starts. Maybe you will find something.
Look for the my.cnf file. On my CentOS box it is located at /etc/my.cnf. It will have a config setting for the location of the error log.
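For example, two ways to locate the error log (exact paths and output will vary by setup):
$ grep -iE 'log[-_]error' /etc/my.cnf
$ mysql -e "SHOW VARIABLES LIKE 'log_error';"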

My strongest suspect is OOM kill by the kernel or some other issue that results from running the system out of memory. Try this:
Start top on the server and press M to sort by memory so the biggest memory user is at the top.
Note the PID of mysqld.
Manually perform the backup while observing the value of the RES column (resident memory size) in the top output.
Once the backup is over, see if the PID of mysqld has changed.
If the PID has changed (meaning a restart took place), and you saw the memory footprint of mysqld grow to something comparable to the total amount of system memory, then my suspicion is correct, and we need to lower some settings in my.cnf to make it use less memory, e.g. key_buffer_size and innodb_buffer_pool_size.
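If that turns out to be the case, the change would be something along these lines in my.cnf (the values are only illustrative; size them to the RAM you actually have and the storage engines you actually use):
[mysqld]
key_buffer_size = 128M          # MyISAM index cache
innodb_buffer_pool_size = 512M  # main InnoDB cache, usually the biggest memory consumer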
EDIT - From the log you posted, there are additional issues, although it is not clear how they could be contributing to the table corruption. Your server appears to be running with --skip-innodb, and your backup script is not able to deal with the absence of the InnoDB storage engine: it prints exception error messages but nevertheless continues. It is also attempting to do a repair, which is failing due to a lack of system privileges (error 1 is Operation not permitted). It is possible that encountering those errors triggers some faulty logic in your backup script that leaves the tables corrupted.
At this point I would recommend disabling MySQL backup using the cPanel tool, and using mysqldump or some other solution (e.g. Xtrabackup (https://www.percona.com/doc/percona-xtrabackup/2.3/index.html)) from a cron job instead.
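As a rough sketch of the cron approach (schedule, paths and credentials are placeholders; since your server seems to be MyISAM-only, --lock-all-tables is used here instead of --single-transaction):
# /etc/cron.d/mysql-backup -- nightly dump at 03:30
30 3 * * * root mysqldump --all-databases --lock-all-tables | gzip > /backup/all-$(date +\%F).sql.gz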
EDIT2 - from the test results. The manual backup does not run the system out of memory and does not crash the server. The jury is still out on the automatic one.

Don't kill mysqld; shut it down gracefully.
Switch from MyISAM to InnoDB; the latter does not suffer from that 'error'.
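For example, to find the remaining MyISAM tables and convert one (the schema and table names are placeholders):
SELECT table_schema, table_name FROM information_schema.tables WHERE engine = 'MyISAM';
ALTER TABLE mydb.mytable ENGINE=InnoDB;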

Related

MySQL, recover database after mysqlbinlog loading error?

I am copying a MySQL DB from an initial dump file and the set of binlogs created after the dump.
The initial load from the dump is fine. Then, while loading the binlogs using mysqlbinlog, one of the files fails, for example with a "server has gone away" error.
Is there any way to recover from a failed mysqlbinlog run, or is the database copy now irreparably corrupted? I know which log has failed, but I can't just rerun that log since the error could have occurred at any query within the log.
Is there a way to handle this moving forward?
I can look into minimizing the chances that there will be an error in the first place, but it doesn't seem like much of a recovery process (or master/slave process) if any MySQL issue during the loading completely ruins the database. I feel that I must be missing something.
I'd check the configuration value for max_allowed_packet. This is pretty small by default (4MB or 64MB depending on MySQL version). You might need to increase it.
Note that you need to increase that option both in the server and in the client that is applying binlogs. The effective limit on packet size is the lesser of the server and client's configuration value.
Even if the binlog succeeded through replication, it might not succeed when replaying binlogs, because you need to replay with mysql while specifying the --max-allowed-packet option.
See https://dev.mysql.com/doc/refman/8.0/en/gone-away.html for more explanation of the error you got.
If you don't know the binlog coordinates of the last binlog event that succeeded, you'll have to start over: remove the partially-restored instance and restore from the backup again, then apply the binlog.
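A sketch of how that replay might look (the binlog file name, start position, user and sizes are placeholders; adjust to your own coordinates):
SET GLOBAL max_allowed_packet = 1073741824;   -- on the server; value in bytes, 1 GB is the maximum
$ mysqlbinlog --start-position=4 binlog.000123 | mysql --max-allowed-packet=1G -u root -p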

Ran out of disk space: will my live application mysql database be corrupted?

I have a large Drupal 7 site with many concurrent logged-in users running on a server where disk space recently ran out (MySQL InnoDB behind Memcached on Ubuntu 16.04). How/why this happened is a discussion for another day.
I've cleared up the disk space issue, and the site seems to be running fine as far as general interaction indicates, but Drupal log is full of errors like this:
Uncaught exception 'PDOException' with message 'SQLSTATE[HY000]: General error: 3 Error writing file '/tmp/MYGWmIvU' (Errcode: 28 - No space left on device)' in /var/www/pixelscrapper.com/public_html/includes/database/database.inc:2229
My question now is: will the mysql database that is running Drupal likely be corrupted/flaky at this point? i.e. On a scale of 0-10, how vital is it that I restore the database to a point before the disk space ran out?
(Anything other than 0 means I will likely restore the database--but there are other things that went wrong as well here, which means that we would lose quite a few days of data if I need to restore, which is a huge drag. C'est la vie, etc.).
My assumption is that the data in MySQL may be more or less fine, but that I cannot rely on the integrity of the actual Drupal data (users, nodes, etc.) which are made up of collections of many database rows...
Out-of-space crashes can cause serious damage to databases. The sole fact that your database is able to start up and run apparently as usual is already a good indication that it was not totally messed up by the incident.
The next thing you can do is perform an in-depth scan using the mysqlcheck program:
The mysqlcheck client performs table maintenance: It checks, repairs, optimizes, or analyzes tables.
Check all tables in all databases:
$ mysqlcheck -A
Check and repair all tables in all databases:
$ mysqlcheck -A -r
NB: as explained in the documentation, you had better shut down your application and make a backup before you run this.

AWS RDS - automatic backup vs snapshot with MyISAM tables

I have an AWS RDS MySQL 5.7 database with MyISAM tables that I would like to migrate to another RDS in a custom VPC, and once migrated, convert those MyISAM tables to InnoDB.
If I understood correctly, the only way to create a correct automatic backup is using the following procedure, explained here: "Automated Backups with Unsupported MySQL Storage Engines"
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithAutomatedBackups.html#Overview.BackupDeviceRestrictions
Stop all activity to your MyISAM tables (that is, close all sessions).
You can close all sessions by calling the mysql.rds_kill command for each process that is returned from the SHOW FULL PROCESSLIST command.
Lock and flush each of your MyISAM tables (see the sketch after this list).
Create a snapshot of your DB instance. When the snapshot has completed, release the locks and resume activity on the MyISAM tables.
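In SQL, the lock-and-flush step is roughly (table names are placeholders):
FLUSH TABLES mytable1, mytable2 WITH READ LOCK;
-- take the snapshot from the console or CLI, then:
UNLOCK TABLES;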
Has someone done this procedure before?
How is it that the snapshots are being created successfully every night from the current RDS DB instance, even though it contains MyISAM tables?
Thanks!
The problem isn't with snapshot creation. It's what can go wrong when you actually try to use one of the snapshots.
RDS snapshots work by capturing a snapshot of your RDS instance's underlying EBS volume (you can't see this volume, but it's there -- RDS runs on EC2, with "hidden" instances and volumes).
EBS snapshots capture the entire contents of the hard drive exactly as they happened to exist at the moment in time when the snapshot process starts.
What ends up on the snapshot is essentially the same thing that you would have on a MySQL Server if you executed sudo killall -9 mysqld -- it is as if the server had halted everything, immediately, without doing any of the things it normally does to clean up for a graceful shutdown. With RDS, things are not quite that dramatic, because RDS does take some precautions, but fundamentally, this is the nature of what is happening.
When you create an RDS instance from a snapshot, the first thing that happens when the instance starts up is the same thing your hypothetical server would do when you restarted the killed MySQL Server daemon: InnoDB Crash Recovery.
InnoDB Crash Recovery
To recover from a MySQL server crash, the only requirement is to restart the MySQL server. InnoDB automatically checks the logs and performs a roll-forward of the database to the present. InnoDB automatically rolls back uncommitted transactions that were present at the time of the crash.
https://dev.mysql.com/doc/refman/5.7/en/innodb-recovery.html#innodb-crash-recovery
Crash recovery is InnoDB's mechanism for bringing everything back into harmony in its internal data structures and ensuring that all data is intact, exactly as your application left it. It's possible because InnoDB is a transactional storage engine. That means a lot of different things, but what it specifically means in this case is that InnoDB doesn't just change table data when you change a table. It goes through a process that can be simplified to something like this:
store the proposed changes to disk¹
actually make the changes
mark the changes as complete
What this means is that until the changes are finalized, InnoDB can be interrupted and will subsequently be able to pick up where it left off, without corrupting or losing data.
MyISAM has no such mechanisms. It just writes to the data files, directly. Even if a MyISAM table isn't actively being used, it may still need to be repaired when the server comes up, to clean up its structures. In some circumstances, repairing the table can be impossible, and all or part of the data in the table will be lost.
If your MyISAM tables are flushed and locked when the snapshot occurs, they are in a quiescent state on the disk, as though the server had actually been gracefully shut down before the snapshot had occurred, so they will be stable on the snapshot.
But the snapshot process will always appear to succeed, because the snapshot is just duplicating whatever is on the disk, as it appears at the moment in time when the snapshot gets underway.
The problem is that what the snapshot captured may not be usable, and you have no way of knowing whether the snapshot will be fully viable.
¹ Note that the first step, "store the proposed changes to disk" is related to the system variable innodb_flush_log_at_trx_commit which makes the system slower if set to 1 but also is the safest setting, because your query doesn't actually succeed until that first step is done. A setting of 2 is still reasonably safe, because it still writes the changes but continues without requiring that the operating system confirm that they have actually been written to the hard drive before your query returns success... but in a crash, a transaction your application thinks was committed may or may not have survived.
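In my.cnf terms, the trade-off described above looks like this:
[mysqld]
innodb_flush_log_at_trx_commit = 1    # flush and sync the redo log at every commit (safest, slowest)
# innodb_flush_log_at_trx_commit = 2  # write at commit, sync about once per second; a crash can lose ~1s of commits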

What is MySQL doing?? 100% disk utilization from boot

I have a large database on a Win10 machine. mysqld.exe does a lot of disk I/O: 100% utilization for hours and hours, 100 MB/s consistently, mostly writes, and it persists after numerous reboots. How can I find out what the hell it is actually doing, and stop it? I know the database is not being used at the moment; I want to figure out where this I/O comes from and stop it. The only solutions I found on the internet were general configuration advice. I don't need that, I need to shut this thing down now!
show processlist shows nothing.
UPDATE: The problem was a huge background rollback operation on a table. The solution is:
1) kill mysqld.exe
2) add innodb_force_recovery=3 to my.ini
3) start mysqld.exe
4) export the table (96GB table resulted in about 40GB .sql file)
5) drop the table
6) kill mysqld.exe
7) set innodb_force_recovery=0 in my.ini
8) reboot and import the table back
No idea about data integrity yet, but seems fine.
Thanks to Milney.
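For reference, the export/drop/import steps above might look roughly like this on the command line (schema and table names are placeholders):
# my.ini, under [mysqld], while exporting:
innodb_force_recovery = 3

mysqldump mydb bigtable > bigtable.sql
mysql -e "DROP TABLE mydb.bigtable;"

# set innodb_force_recovery back to 0 (or remove the line), restart, then:
mysql mydb < bigtable.sql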
If you view the Disk tab of Resource Monitor (from Task Manager) you can see which files are being written; this will give you a hint as to which database it is.
You can then use something like SELECT * FROM information_schema.innodb_trx\G to view open Transactions and see which statements are causing this
Simply increase the InnoDB buffer pool size. If it is still at a small default (e.g. 8 MB), raise it to 512 MB or more:
SET GLOBAL innodb_buffer_pool_size = 5168709120;
(The value is in bytes; the one above is roughly 5 GB.)

MySQL backup strategy for high-traffic sites

I'm currently using mysqldump to back up databases that are growing rapidly in size. Though I run it late at night, there have been occasional problems when it happens to run during a moment of high traffic (which happens at night sometimes). For example, last night one of my sites locked up just after the time of the database backup with a completely full (and non-clearing) processlist.
Does anyone have a suggestion for a better way to approach this? Putting the site into a temporary maintenance state during the backup is not an option, as the goal is to maximize availability (some SQL dumps take a while). One idea that comes to mind is to run both master and slave copies and shut down + back up the slave copy, leaving the master copy alone during the process. Hopefully there is a simpler solution though - I'd rather not run a slave copy for backup purposes only unless absolutely necessary. Any suggestions?
Thanks.
Two thoughts:
Run a slave. If nothing else, it gives you a warm spare for your production traffic in case of failure. You can also run reports and tools from it, freeing up cycles on your production server.
Move to InnoDB and use mysqldump --single-transaction (see the man page); a sketch follows below.
Good luck!
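For the InnoDB case, the dump might look something like this (options and paths are illustrative; --master-data=2 records the binlog coordinates in the dump, provided binary logging is enabled):
mysqldump --single-transaction --master-data=2 --all-databases | gzip > /backup/all.sql.gz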
I use Percona XtraBackup, which is similar to InnoDB Hot Backup but with more functionality, and it is distributed for free. XtraBackup takes snapshots without locking InnoDB tables and will record the current master logfile info and, if requested, the slave info if you are taking a backup from a slave.
I would recommend running a slave and doing a backup like this or with mysqldump. The slave gives you a hot backup that you can quickly switch over to and be up and running within minutes if your master blows up due to a hardware issue or various software or user error issues that take out the server. The backup with xtrabackup or mysqldump gives you a backup that you can use to restore data in case you accidentally drop a table or delete some rows you shouldn't have, since the replicated server wouldn't save you there.
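A minimal XtraBackup run is roughly as follows (the target directory is a placeholder; see the Percona docs for the full prepare/restore procedure):
xtrabackup --backup --target-dir=/data/backups/full
xtrabackup --prepare --target-dir=/data/backups/full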