Major causes of MySQL/InnoDB crash and restart

After a recent DB crash with no RCA from our vendor, I'm left wanting to expand my knowledge of what sorts of things can cause a database to crash.
In our specific case, the logs show record index mismatches just before the crash. We believe this was due to copying .frm and .ibd files to another DB rather than using mysqldump. There are logged warnings about that for about a week prior to the crash, starting when the files were copied. But would it really take that long for the DB to crash?

In my experience, most crashes are due to hardware errors, i.e. your disk is failing. The second most common cause is user error, like moving InnoDB tablespace files around as if they were ordinary files (you already know: don't do that). Third is bugs in MySQL, because all software has bugs.
It's certainly possible for an indefinite amount of time to pass before user activity hits the code path that triggers the crash, so you can't draw any conclusion from the delay.
Ultimately, you have to create redundancy to protect against crashes. This is especially important for databases. Examples of redundancy:
Use RAID 1 or RAID 10 to do disk mirroring
Use replication to copy data continuously to another MySQL instance (a minimal setup sketch follows this list)
Host the other MySQL instance on another physical computer, ideally on a separate rack in your data center
You might even have another replica in another region of the country
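To make the replication item concrete, here is a minimal sketch in pre-8.0 syntax; the host names, user, password, and binlog coordinates are placeholders, not values from your environment.
-- Minimal replication sketch (pre-8.0 syntax); all names and values are placeholders.
-- On the source: create a user the replica can connect as.
CREATE USER 'repl'@'replica-host' IDENTIFIED BY 'choose-a-password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'replica-host';
-- On the replica: point it at the source, using coordinates taken from a consistent backup.
CHANGE MASTER TO
  MASTER_HOST = 'source-host',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'choose-a-password',
  MASTER_LOG_FILE = 'mysql-bin.000042',   -- from SHOW MASTER STATUS at backup time
  MASTER_LOG_POS = 154;
START SLAVE;
SHOW SLAVE STATUS\G                        -- check Slave_IO_Running and Slave_SQL_Running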


How to reduce the startup time for MySQL 5.7 with many databases?

I have MySQL 5.7.24 running on a Windows VM. It has a few thousand databases (about 7000). I understand this is not the recommended setup for MySQL, but business requirements have necessitated this multi-tenant DB structure and unfortunately I cannot change that.
The server works fine while it is running, but the startup time can get pretty long: 20-30 minutes after a clean shutdown of the MySQL service and over an hour after a restart of the Windows VM.
Is there any way to reduce the startup time?
In my configuration, I observed that innodb_file_per_table = ON (which is the default for MySQL 5.7 I believe) and so I think that at startup it is scanning every .ibd file.
Would changing to innodb_file_per_table = OFF and then altering each table to get rid of the .ibd files be a viable option? One thing to note is that, in general, each database is pretty small; even with 7000 databases, the total size of the data is only about 60 GB. To my understanding, innodb_file_per_table = ON is more beneficial when there are single tables that can get pretty large, which is not the case for my server.
Question: Is my logic reasonable, and could innodb_file_per_table be the reason for the slow startup? Or is there some other config variable I can change so that each .ibd file is not scanned before the server starts accepting connections?
Any help to guide me in the right direction would be much appreciated. Thanks in advance!
You should upgrade to MySQL 8.0.
I was working on a system with the same problem as yours. In our case, we had about 1500 schemas per MySQL instance, and a little over 100 tables per schema. So it was about 160,000+ tables per instance. It caused lots of problems trying to use innodb_file_per_table, because the mysqld process couldn't work with that many open file descriptors efficiently. The only way to make the system work was to abandon file-per-table, and move all the tables into the central tablespace.
But that causes a different problem. Tablespaces never shrink, they only grow. The only way to shrink a tablespace is to move the tables to another tablespace, and drop the big one.
One day one of the developers added some code that used a table like a log, inserting a vast number of rows very rapidly. I got him to stop logging that data, but by then it was too late. MySQL's central tablespace had expanded to 95% of the size of the database storage, leaving too little space for binlogs and other files. And I could never shrink it without incurring downtime for our business.
I asked him, "Why were you writing to that table so much? What are you doing with the data you're storing?" He shrugged and said casually, "I dunno, I thought the data might be interesting sometime, but I had no specific use for them." I felt like strangling him.
The point of this story is that one naïve developer can cause a lot of inconvenience if you disable innodb_file_per_table.
When MySQL 8.0 was being planned, the MySQL Product Manager solicited ideas for scalability criteria. I told him about the need to support instances with a lot of tables, like 160k or more. MySQL 8.0 included an all-new implementation of internal code for handling metadata about tables, and he asked the engineers to test the scalability with up to 1 million tables (with file-per-table enabled).
So the best solution to your problem is not to turn off innodb_file_per_table. That will just lead to another kind of crisis. The best solution is to upgrade to 8.0.
Re your comment:
As far as I know, InnoDB does not open tables at startup time. It opens tables when they are first queried.
Make sure you have table_open_cache and innodb_open_files tuned for your scale (a quick sketch of checking these follows the links). Here is some reading:
https://dev.mysql.com/doc/refman/5.7/en/table-cache.html
https://www.percona.com/blog/2009/11/18/how-innodb_open_files-affects-performance/
https://www.percona.com/blog/2018/11/28/what-happens-if-you-set-innodb_open_files-higher-than-open_files_limit/
https://www.percona.com/blog/2017/10/01/one-million-tables-mysql-8-0/
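As a starting point, here is a rough sketch of how you might check and raise those settings; the numbers are purely illustrative, not recommendations tuned for 7000 databases.
-- Rough sketch; values are illustrative only.
SHOW GLOBAL VARIABLES LIKE 'table_open_cache';
SHOW GLOBAL VARIABLES LIKE 'innodb_open_files';
SHOW GLOBAL VARIABLES LIKE 'open_files_limit';
SHOW GLOBAL STATUS LIKE 'Opened_tables';   -- if this climbs quickly, the cache is too small
SET GLOBAL table_open_cache = 20000;       -- persist the value in my.cnf as well
-- innodb_open_files is not dynamic in 5.7; set it in my.cnf and restart:
--   [mysqld]
--   innodb_open_files = 20000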
I hope you are using an SSD for storage, not a spinning disk. This makes a huge difference when doing a lot of small I/O operations. SSD storage devices have been a standard recommendation for database servers for about 10 years.
Also, this probably doesn't help you, but I gave up on using Windows around 2007, both as a server and as a desktop.

Restore truncated table from online server

I accidentally truncated a table on my online server and I don't have a backup of it. Can anyone please help me with what I should do?
Most viable, least work:
From a backup
Check again if you have one
Ask your hoster if they do backups; their default configuration for some setups might include a backup that you are unaware of, e.g. a database backup for WordPress or a file backup if you have a VM
Viable in some situations, little work if applicable:
From binary logs. Check if they are enabled (maybe as part of your hoster's default configuration; also, maybe only the hoster can access them, so you may need to ask them). They contain the most recent changes to your database, and, if you are lucky, "recent" might reach back far enough to include everything
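If the binary logs do turn out to be available, the recovery path looks roughly like this; the file names, schema name, and time window are placeholders you would need to adapt.
# Rough sketch only; adjust file names, the schema, and the time window to your case.
mysqlbinlog --database=mydb \
  --start-datetime="2024-01-01 00:00:00" \
  --stop-datetime="2024-01-07 12:00:00" \
  binlog.000101 binlog.000102 > recovered.sql
# Review recovered.sql, remove the TRUNCATE statement itself,
# then replay it against a scratch database before touching production:
mysql -u root -p scratch_db < recovered.sql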
Less viable, more work:
Try to recover from related data, e.g. history tables, other related tables, or log files (e.g. the MySQL general query log or log files that your application created); you can try to analyze them to figure out what should be in your table
Least viable, most work, most expensive:
In theory, since the data is still stored on the hard drive until it is overwritten by new data, you can try to recover it, similar to the way tools find lost blocks or deleted files on a hard drive.
You need to stop any activity on your hard drive to increase the probability of success. This will depend on your configuration and setup. E.g., in shared hosting, freed disk space might be overwritten by other users beyond your control; on the other hand, if you are using InnoDB and have disabled innodb_file_per_table, the data is stored in a single tablespace file (and the disk space is not freed), so stopping your MySQL server should prevent any remaining recoverable data from being overwritten.
While there are some tools to help you with that, you will likely have to pay someone to do it for you (and even then you only get back the data that hasn't been overwritten so far), so this option is most likely only viable if your data is very valuable.

InnoDB table is corrupt when doing mysqlcheck but MySQL server does not crash now or upon restart

I have a database that has InnoDB tables in it, and one of the InnoDB tables is marked as corrupt (and I know data is missing, etc.). However, when I restart MySQL, it doesn't crash.
I expected it to crash, but it doesn't. (I read before that if an InnoDB table is corrupted, the MySQL server will be stopped.)
Should it not be crashing now?
InnoDB is my default storage engine.
A corrupt table doesn't necessarily cause a crash. You ought to repair the table and, if possible, reload it from a backup, though. Operating on a corrupt table is flaky at best, and in any case it is liable to give you incorrect results, as you have already discovered.
Do not trust the fact that the system is not "exploding" -- a database has several intermediate states. The one you're in now could well be "I'm not exploding yet, I'm waiting for the corruption to spread and contaminate other tables' data". If you know the table is corrupt, act now.
About repairing InnoDB tables, see How do I repair an InnoDB table? .
To verify if an InnoDB table is corrupt, see https://dba.stackexchange.com/questions/6191/how-do-you-identify-innodb-table-corruption .
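The linked questions have the details; as a rough outline (table and schema names are placeholders, and innodb_force_recovery must be used cautiously and removed afterwards), the usual repair path is to force the server up far enough to dump the table and then rebuild it:
# Rough outline only; see the linked questions for the caveats.
# 1. In my.cnf set innodb_force_recovery = 1 (raise it gradually only if needed) and restart mysqld.
mysqldump -u root -p mydb broken_table > broken_table.sql
# 2. Remove innodb_force_recovery from my.cnf, restart, then rebuild the table:
mysql -u root -p mydb -e "DROP TABLE broken_table;"
mysql -u root -p mydb < broken_table.sql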
Detecting corruption
To do this you need an acceptance test that will examine a bunch of data and give it a clean bill of health -- or not: exporting a table to SQL and seeing whether that even succeeds, and/or running checks on tuple cardinality and/or relations, and... you get my drift.
On a table where no one is expected to write, so that any modification equals corruption, an MD5 of the disk file could be quicker.
To make things more efficient (e.g. in production systems) you can consider file snapshots, database replication, or even High Availability. These methods will detect programmatic corruption (e.g. a rogue UPDATE), but may not detect some kinds of hardware corruption on the master (a false negative: the checks on the slave pan out, yet the data on the master is still corrupt) or may suffer mishaps on the slave (which fails and raises a false positive, since the data on the master is actually untainted).
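For the first couple of checks, a minimal sketch (schema and table names are placeholders):
-- Minimal sketch; schema/table names are placeholders.
CHECK TABLE mydb.orders EXTENDED;    -- have the server walk the table and its indexes
CHECKSUM TABLE mydb.orders;          -- compare against a known-good copy or a replica
-- The "export it and see whether it succeeds" test, from the shell:
-- mysqldump -u root -p mydb orders > /dev/null && echo "dump completed"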
It is important (and efficient) to monitor system vital statistics, both to catch the first symptoms of an impending failure (e.g. with SMART) and to supply data for forensic investigation ("Funny that every time the DB failed it was always shortly after a sudden peak in system load -- what if we ferreted out what caused that?").
And of course rely on full and adequate backups (and run a test restore every now and then. Been there, done that, got my ass handed to me).
Corruption causes [not related to original question]
The source of corruption varies with the software setup. In general, of course, something must intrude somewhere along the chain -- server memory representation, writer process, OS file handle, journaling, IOSS, OS cache, disk, disk cache, on-disk file layout -- and wreak havoc.
Improper system shutdown may cause trouble at several levels, preventing data from being written at any stage of the pipeline.
Manhandling the files on disk messes with the very last stage (using a pipeline of its own, of which the server knows nothing).
Other, more esoteric possibilities exist:
subtle firmware/hardware failure in the hard disk itself,
accidental and probably unrecoverable, due to disk wear and tear, defective firmware, or even a defective firmware update (I seem to remember, some years back, a Hitachi acoustic-management update that could be run against a slightly different model; after the update the disk "thought" it had more cache than it actually had, and writes to the nonexistent areas of the cache of course went directly to bit heaven).
"intentional" and probably recoverable: it is sometimes possible to stretch your hard disk too thin using hdparm. Setting the disk for the very top performance is all well and good if every component is suited to that level of performance and knows it, or at least is able to signal when it is not. Sometimes the only "warning" you get is a system malfunction.
process space or IOSS corruption: I saw this on an Apache installation where somehow, probably thanks to a CGI that was suid root, the access.log file was filling with a stream of GIF images that were supposed to go to the user's browser. It was fixed and nothing happened, but what if it had been a more vital file instead of a log? Such problems may be difficult to diagnose, and you might need to inspect all log files to see whether some application noticed or did anything untoward.
hard disk sector relocation: fabled to happen, never seen it myself, but modern hard disks have "spare" sectors they will swap in for defective sectors to keep sporting a "zero defect" surface. Except that if the defective sector is no longer readable when it is swapped for an empty one, the net effect is the same as that sector suddenly being zeroed. This you can easily check using SMART reporting (hddhealth or smartctl).
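A quick look at the relocation counters mentioned in that last item might look like this (the device name is a placeholder; pick the disk that holds your datadir):
# Device name is a placeholder.
smartctl -H /dev/sda                                           # overall health self-assessment
smartctl -A /dev/sda | grep -Ei 'reallocat|pending|uncorrect'  # relocation-related attributes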
Many more possibilities exist, of course, depending on the setup. Googling for file corruption finds a jillion pages; useful terms to add to the query are the filesystem (ext4, NTFS, btrfs, ...), the hard disk make and model, the OS, the software suffering the problems, and other software installed.

How long should a 20GB restore take in MySQL? (A.k.a. Is something broken?)

I'm trying to build a dev copy of a production MySQL database by loading one of the backups. How long should it take if the uncompressed dump is ~20 GB?
This command has been running for something like 24 hours at 10% CPU load, and I'm wondering if it's just slow or if it, or I, am doing something wrong.
mysql -u root -p < it_mysql_dump.sql
BTW, it's on a beefy desktop dev machine with plenty of RAM, but it might be reading and writing to the same HDD. I think I'm using InnoDB.
Restoring MySQL dumps can take a long time, because the import really does rebuild the entire tables (data and indexes) from scratch.
Exactly what you need to do to fix it depends on the engine, but in general I would say, do the following:
Zeroth rule: Only use a 64-bit OS.
Make sure that you have enough physical RAM to fit the biggest single table into memory; include any overhead for the OS in this calculation (NB: on operating systems that use 4k pages, i.e. virtually all of them, the page tables themselves take up a lot of memory on large-memory systems - don't forget this)
Tune innodb_buffer_pool_size so that it is bigger than the largest single table; or, if using MyISAM, tune key_buffer_size so that it is big enough to hold the indexes of the largest table (a sketch of related import settings follows this list).
Be patient.
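On top of the buffer sizing above, a few settings are commonly relaxed for the duration of a large import and reverted afterwards; this is a hedged sketch, not something to leave enabled permanently.
-- Hedged sketch: settings commonly relaxed during a bulk load, then reverted.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;  -- fewer fsyncs per commit while loading
-- In the session that runs the import (mysqldump output usually sets the last two itself):
SET SESSION sql_log_bin = 0;                    -- skip binlogging the import, if you can afford to
SET SESSION foreign_key_checks = 0;
SET SESSION unique_checks = 0;
SOURCE it_mysql_dump.sql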
Now, if you are still finding that it is slow having done the above, it may be that your particular database has a very tricky structure to restore.
Personally I've managed to rebuild a server with ~2 TB in under 48 hours, but that was a particular case.
Be sure that your development system has production-grade hardware if you intend to load production data into it.
In particular, if you think that you can bulk-load data into tables which don't fit into memory (or at least, mostly into memory), forget it.
If this all seems like too much, remember that with InnoDB you can just take a filesystem or LVM snapshot online and then copy the files. With MyISAM it's a bit trickier, but it can still be done.
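For the snapshot route, here is a rough sketch with LVM; the volume, mount point, and destination names are made up, and this is a sketch rather than a hardened backup procedure.
# Names are placeholders; assumes the MySQL datadir lives on the logical volume /dev/vg0/mysql_data.
lvcreate --snapshot --size 10G --name mysql_snap /dev/vg0/mysql_data
mkdir -p /mnt/mysql_snap
mount -o ro /dev/vg0/mysql_snap /mnt/mysql_snap
rsync -a /mnt/mysql_snap/ /backup/mysql_copy/   # copy the files at leisure
umount /mnt/mysql_snap
lvremove -f /dev/vg0/mysql_snap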
Open another terminal, run mysql, and count the rows in some of the tables in your dump (SELECT COUNT(*) FROM table). Compare to the source database. That'll tell you the progress.
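A variation on the same idea that doesn't require knowing the table names up front (the schema name is a placeholder, and table_rows is only an estimate for InnoDB):
-- Schema name is a placeholder; row counts are approximate for InnoDB.
SELECT table_name, table_rows,
       ROUND(data_length / 1024 / 1024) AS data_mb
FROM information_schema.tables
WHERE table_schema = 'it_db'
ORDER BY create_time DESC
LIMIT 10;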
I INSERTed about 80 GB of data into MySQL over a network in about 14 hours. The dumps were one insert per row (slow), with a good bit of overhead, going into a server with fast disks.
24 hours is possible if the hardware is old enough, or your import is competing with something else for disk IO and memory.
I just went through the experience of restoring a 51.8 GB database from a 36.8 GB mysqldump file to create an IMDb database. For me the restore, which was not done over the network but from a file on the local machine, took a little under 4 hours.
The machine is a quad-core server running Windows Server 2008. People have wondered if there is a way to monitor progress. There actually is: you can watch the restore create the database files by going to the ProgramData directory, finding the MySQL subdirectory, and then finding the subdirectory with your database name.
The files are gradually built up in that directory and you can watch them grow. No small comfort when you have a production issue and you are wondering whether the restore job is hung or just taking a long time.

How many databases can MySQL handle?

My MySQL server currently has 235 databases. Should I worry?
They all have the same structure, with MyISAM tables.
The hardware is a virtual machine with 2 GB RAM running on a Quad-Core AMD Opteron 2.2GHz.
Recently cPanel sent me an email saying that MySQL had failed and a restart was made.
New databases are expected to be created, and I wonder if I should add more memory or simply add another virtual machine.
The "databases" in MySQL are really just catalogues; it has no effect on MySQL's limits whether you put all the tables in one database or each in its own.
The main problem is the table cache. Without tuning it, you're going to have the default table cache (=64 typically), which means you will be closing a table every time you open one. This is incredibly bad.
With MyISAM it's even worse, because closing a table throws its key blocks out of the key cache, which means subsequent index lookups or scans will be reading actual blocks from disk, which is horribly slow and really needs to be avoided.
My advice is:
If possible, immediately increase the table cache to > the total number of tables
Monitor the global status variable Opened_tables; if it increases rapidly, this is bad (a sketch of the relevant queries follows this list).
Carry out performance and robustness testing on the same hardware in a non-production environment (if you are not doing so already).
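A sketch of the queries behind the first two points; the value shown is only an example, and on very old servers the variable is table_cache rather than table_open_cache.
-- Example value only.
SHOW GLOBAL STATUS LIKE 'Opened_tables';        -- sample periodically; rapid growth means the cache is too small
SET GLOBAL table_open_cache = 4096;             -- aim for more than the total number of tables; if your version
                                                --  doesn't allow setting it at runtime, set it in my.cnf and restart
SHOW GLOBAL VARIABLES LIKE 'key_buffer_size';   -- the MyISAM key cache discussed above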
(reposting my comment for better visibility)
Thank you all for your comments. The system is somewhat similar to Google Analytics. Visits to users' websites are logged into a "master" table. A native application monitors the master table, processes the registered visits, and writes them to each user's database. Each user has their own DB; this was decided for sharding. Various reports and statistics are run for each user, and it is faster if they only run on a specific DB (less data). I know this is not the best setup, but we have to deal with it for a while.
I don't believe there is a hard limit; the only things really limiting you will be your hardware and the traffic these databases will be getting.
You seem to have very little memory, which probably means you don't have massive numbers of connections...
You should start by profiling usage for each database (or set of databases, depending on how they are used of course).
My suggestion: MySQL (or any database server, for that matter) could use more memory. You can never have enough.
You are doing it wrong.
Comment with some specifics about your databases, and we can probably fill you in on where your design went wrong.