How long should a 20GB restore take in MySQL? (A.k.a. Is something broken?) - mysql

I'm trying to build a dev copy of a production MySQL database by loading one of the backups. How long should it take to do this if the uncompressed dump is ~20G?
This command has been running for something like 24 hours with about 10% CPU load, and I'm wondering whether it's just slow or whether I'm doing something wrong.
mysql -u root -p < it_mysql_dump.sql
BTW it's on a beefy desktop dev machine with plenty of RAM, but it might be reading from and writing to the same HDD. I think I'm using InnoDB.

Restoring MySQL dumps can take a long time. This is because the import really does rebuild the entire tables, indexes and all.
Exactly what you need to do depends on the storage engine, but in general I would do the following:
Zeroth rule: Only use a 64-bit OS.
Make sure that you have enough physical RAM to fit the biggest single table into memory; include the OS's overhead in this calculation (NB: on operating systems that use 4 KB pages, i.e. all of them, the page tables themselves take up a lot of memory on large-memory systems - don't forget this).
Tune innodb_buffer_pool_size so that it is bigger than the largest single table; or, if using MyISAM, tune key_buffer_size so that it is big enough to hold the indexes of the largest table (see the example just after this list).
Be patient.
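For illustration, here is roughly what that buffer tuning looks like. This is only a sketch: the sizes are made-up placeholders, and resizing the InnoDB buffer pool online only works on MySQL 5.7.5 or newer; on older versions (or for a permanent change) set these in my.cnf and restart.
-- Check the current sizes first
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'key_buffer_size';
-- InnoDB: make the buffer pool larger than your biggest table (8 GB here is only an example)
SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;
-- MyISAM: make the key buffer big enough for the largest table's indexes (example value)
SET GLOBAL key_buffer_size = 2 * 1024 * 1024 * 1024;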
Now, if you are still finding that it is slow having done the above, it may be that your particular database has a very tricky structure to restore.
Personally I've managed to rebuild a server with ~2 TB in under 48 hours, but that was a particular case.
Be sure that your development system has production-grade hardware if you intend to load production data into it.
In particular, if you think that you can bulk-load data into tables which don't fit into memory (or at least, mostly into memory), forget it.
If this all seems like too much, remember that you can just use a filesystem or LVM snapshot online with InnoDB, and then just copy the files. With MyISAM it's a bit trickier but can still be done.
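If you go the snapshot route, the SQL side of it is small; here is a rough sketch (the snapshot itself is taken with whatever filesystem/LVM tooling you have, which isn't shown here):
-- From one session, quiesce writes while the snapshot is taken
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;   -- optional: note the binlog position if you need it for replication
-- ... take the filesystem/LVM snapshot now, while this session keeps the lock ...
UNLOCK TABLES;        -- release the lock as soon as the snapshot exists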

Open another terminal, run mysql, and count the rows in some of the tables in your dump (SELECT COUNT(*) FROM table). Compare to the source database. That'll tell you the progress.
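If you'd rather not run COUNT(*) against half-loaded tables, a rougher but cheaper sketch is to watch InnoDB's estimated row counts and sizes (the schema name below is a placeholder, and the numbers are estimates rather than exact counts):
SELECT table_name, table_rows, ROUND(data_length / 1024 / 1024) AS data_mb
FROM information_schema.tables
WHERE table_schema = 'your_database'
ORDER BY data_length DESC;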
I INSERTed about 80 GB of data into MySQL over a network in about 14 hours. They were one-insert-per-row dumps (slow) with a good bit of overhead, inserting on a server with fast disks.
24 hours is possible if the hardware is old enough, or your import is competing with something else for disk IO and memory.

I just went through the experience of restoring a 51.8 GB database from a 36.8 GB mysqldump file to create an IMDb database. For me the restore, which was not done over the network but from a file on the native machine, took a little under 4 hours.
The machine is a quad-core server running Windows Server 2008. People have wondered if there is a way to monitor progress. There actually is. You can watch the restore create the database files by going to the ProgramData directory, finding the MySQL subdirectory, and then finding the subdirectory with your database name.
The files are gradually built in the directory and you can watch them build up. No small comfort when you have a production issue and you are wondering if the restore job is hung up or just taking a long time.
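If you're not sure where that directory is on your install, the server will tell you (works on any platform):
SHOW VARIABLES LIKE 'datadir';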

Related

How to reduce the startup time for MySQL 5.7 with many databases?

I have MySQL 5.7.24 running on a Windows VM. It has a few thousand databases (7000). I understand this is not the recommended set up for MySQL but some business requirements have necessitated this multi-tenant db structure and I cannot change that unfortunately.
The server works fine when it is running but the startup time can get pretty long, almost 20-30 mins after a clean shutdown of the MySQL service and 1+ hours after a restart of the Windows VM.
Is there any way to reduce the startup time?
In my configuration, I observed that innodb_file_per_table = ON (which is the default for MySQL 5.7 I believe) and so I think that at startup it is scanning every .ibd file.
Would changing innodb_file_per_table = OFF and then altering each table to get rid of the .ibd files be a viable option? One thing to note is that, in general, every database is pretty small; even with 7000 databases, the total size of the data is only about 60 GB. So to my understanding, innodb_file_per_table = ON is more beneficial when there are single tables that can get pretty large, which is not the case for my server.
Question: Is my logic reasonable and could this innodb_file_per_table be the reason for the slow startup? Or is there some other config variable that I can change so that each .ibd file is not scanned before the server starts accepting connections.
Any help to guide me in the right direction would be much appreciated. Thanks in advance!
You should upgrade to MySQL 8.0.
I was working on a system with the same problem as yours. In our case, we had about 1500 schemas per MySQL instance, and a little over 100 tables per schema. So it was about 160,000+ tables per instance. It caused lots of problems trying to use innodb_file_per_table, because the mysqld process couldn't work with that many open file descriptors efficiently. The only way to make the system work was to abandon file-per-table, and move all the tables into the central tablespace.
But that causes a different problem. Tablespaces never shrink, they only grow. The only way to shrink a tablespace is to move the tables to another tablespace, and drop the big one.
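For what that looks like in practice, here is a rough sketch on MySQL 5.7.6+ (table and tablespace names are placeholders; note that the ALTER copies the table, and moving tables out of the system tablespace still does not shrink ibdata1 - only a dump and reload of the instance does that):
-- Move a table out of a shared tablespace back into its own .ibd file
ALTER TABLE mydb.mytable TABLESPACE = innodb_file_per_table;
-- Or move it into a named general tablespace created beforehand:
-- CREATE TABLESPACE my_general_ts ADD DATAFILE 'my_general_ts.ibd';
-- ALTER TABLE mydb.mytable TABLESPACE = my_general_ts;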
One day one of the developers added some code that used a table like a log, inserting a vast number of rows very rapidly. I got him to stop logging that data, but by then it was too late. MySQL's central tablespace had expanded to 95% of the size of the database storage, leaving too little space for binlogs and other files. And I could never shrink it without incurring downtime for our business.
I asked him, "Why were you writing to that table so much? What are you doing with the data you're storing?" He shrugged and said casually, "I dunno, I thought the data might be interesting sometime, but I had no specific use for them." I felt like strangling him.
The point of this story is that one naïve developer can cause a lot of inconvenience if you disable innodb_file_per_table.
When MySQL 8.0 was being planned, the MySQL Product Manager solicited ideas for scalability criteria. I told him about the need to support instances with a lot of tables, like 160k or more. MySQL 8.0 included an all-new implementation of internal code for handling metadata about tables, and he asked the engineers to test the scalability with up to 1 million tables (with file-per-table enabled).
So the best solution to your problem is not to turn off innodb_file_per_table. That will just lead to another kind of crisis. The best solution is to upgrade to 8.0.
Re your comment:
As far as I know, InnoDB does not open tables at startup time. It opens tables when they are first queried.
Make sure you have table_open_cache and innodb_open_files tuned for your scale (there's a quick example after the links below). Here is some reading:
https://dev.mysql.com/doc/refman/5.7/en/table-cache.html
https://www.percona.com/blog/2009/11/18/how-innodb_open_files-affects-performance/
https://www.percona.com/blog/2018/11/28/what-happens-if-you-set-innodb_open_files-higher-than-open_files_limit/
https://www.percona.com/blog/2017/10/01/one-million-tables-mysql-8-0/
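As a rough sketch of what that tuning looks like (the values are illustrative only; size them to your table count and keep them under open_files_limit):
-- Check what you're running with now
SHOW VARIABLES LIKE 'table_open_cache';
SHOW VARIABLES LIKE 'innodb_open_files';
SHOW VARIABLES LIKE 'open_files_limit';
-- table_open_cache can be raised online
SET GLOBAL table_open_cache = 16384;
-- innodb_open_files is not dynamic in 5.7: set it in my.cnf and restart, e.g.
-- [mysqld]
-- innodb_open_files = 16384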
I hope you are using an SSD for storage, not a spinning disk. This makes a huge difference when doing a lot of small I/O operations. SSD storage devices have been a standard recommendation for database servers for about 10 years.
Also, this probably doesn't help you, but I gave up on using Windows around 2007, neither as a server nor as a desktop.

Seeking a faster way to update a MySQL database

I have a database with data that is read-only as far as the application using it is concerned. However, different groups of tables within the database need to be refreshed weekly, or monthly (the organization generating the data provides entirely new flat files for this purpose). Most of the updates are small but some are large (more than 5 million rows with a large number of fields). I like to load the data to a test database and then just replace the entire database in production. So far I have been doing this by exporting the data using mysqldump and then importing it into production. The problem is that the import takes 6-8 hours and the system is unusable during that time.
I would like to get the downtime as short as possible. I've tried all the tips I could find to speed up mysqldump, such as those listed here: http://vitobotta.com/smarter-faster-backups-restores-mysql-databases-with-mysqldump/#sthash.cQ521QnX.hpG7WdMH.dpbs. I know that many people recommend Percona's XtraBackup, but unfortunately I'm on a Windows 8 Server and Percona does not run on Windows. Other fast backup/restore options are too expensive (e.g., MySQL Enterprise). Since my test server and production server are both 64-bit Windows machines and are both running the same version of MySQL (5.6), I thought I could just zip up the database files and copy them over to swap out the whole database at once (all are InnoDB). However, that didn't work. I saw the tables in MySQL Workbench, but couldn't access them.
I’d like to know if copying the database files is a viable option (I may have done it wrong) and if it is not, then what low cost options are available to reduce my downtime?

mysql windows 2003 defrag harddisk

We have a Windows 2003 server with MySQL 5.1 installed. For the past few days the mysqldump has been almost an hour slower than normal. The total size of the databases has been growing by approx. 1 MB per day, so not much difference there.
Our technical support guy can't find anything strange on our machine. He says it would be a good idea to defrag our server with MySQL on it. I never thought about defragmentation, but I have Defraggler installed. After analyzing, it says 80% fragmentation!
Could that be the issue / reason why the dumps are getting almost an hour slower?
My first idea was that some Windows update might be the reason.
Please send me your thoughts.
And is it wise to do a defragmentation on a MySQL production server?
Defraggler says the defragmentation could take more than 1 day.
Your database dump is probably disk-bound, meaning that it is limited by how fast data can be read from and written to the disk.
If the data on the disk is highly fragmented, it can significantly affect the performance. Depending on the layout of the data on the disk, MySQL may be filling in lots of small gaps in the disk with the new data, which will be much slower to work with than large contiguous blocks of data.
Regular defragmentation is a good idea. However, since MySQL likely has a lock on the data files for your databases, it may be necessary to shut down MySQL to defragment its files.
I suggest at least running Defraggler without shutting down MySQL, to defragment as much as possible. I'm not familiar with Defraggler, but if it offers it, try to find out whether MySQL's data files are fragmented. It might have a table of fragmented files or a graphical view of the data on the disk. The MySQL data files will probably be some of the largest files, which should make them easier to find. If you find that they are heavily fragmented, you may then want to schedule some time to shut down MySQL and defragment its files.

Why should we store log files and bin-log files on different paths or disks in MySQL?

I have a replication setup for my MySQL databases. The log files and the bin-log files are all at one path, which is the default: MySQL's data directory.
I have read that for better performance one should store them separately.
Can anyone explain how this improves performance? Is there documentation available for this, and the reason why one should do so?
Mainly because reads and writes can then happen almost in parallel. "Stored separately" means on different disks.
Linux and H/W optimizations for MySQL is a nice presentation of ways to improve MySQL performance - it presents benchmarks and conclusions of when to use SSD disks and when to use SCSI disks, what kind of processors are better for what tasks.
Very good presentation, a must read for any DBA!!
It also can be really embarrassing to have your log files fill the file system and bring the database to a halt.
One consideration is that using a separate disk for binlogging introduces another SPOF since if MySQL cannot write the binlog it will croak the same as if it couldn't write to the data files. Otherwise, adding another disk just better separates the two tasks so that binlog writes and data file writes don't have to contend for resources. With SSDs this is much less of an issue unless you have some crazy heavy write load and are already bound by SSD performance.
It's mostly for cases where your database write traffic is so high that a single disk volume can't keep up while writing for both data files and log files. Disks have a finite amount of throughput, and you could have a very busy database server.
But it's not likely that separating data files from binlogs will give better performance for queries, because MySQL writes to the binlog at commit time, not at query time. If your disks were too slow to keep up with the traffic, you'd see COMMIT become a bottleneck.
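As a quick sanity check before rearranging anything, you can ask the server where the data files and binlogs currently live (log_bin_basename exists on MySQL 5.6 and later; relocating the binlog means changing log-bin in my.cnf and restarting, since it is not a dynamic variable):
SHOW VARIABLES WHERE Variable_name IN ('datadir', 'log_bin_basename');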
The system I currently support stores binlogs in the same directory as the datadir. The datadir is on a RAID10 volume over 12 physical drives. This has plenty of throughput to support our workload. But if we had about double our write traffic, this RAID array wouldn't be able to keep up.
You don't need to do every tip that someone says gives better performance, because any given tip might make no difference to your application's workload. You need to measure many metrics of performance and resource use, and come up with the right tuning or configuration to help the bottlenecks under your workload.
There is no magic configuration that makes everything have high performance.

Reliable backups for huge mysql databases?

I have a 200 GB / 400M-row MySQL/InnoDB database - far beyond what's reasonable, as I found out.
One surprising problem is restoring backups. mysqldump generates huge SQL files, and they take about a week to import back into a fresh database (attempts at making it faster, like bigger/smaller transactions, turning off keys during import, network compression, etc., have failed so far; a MyISAM import seems 2x faster, but then there would be no transactions).
What's worse - and I hope to get some help with this - a network connection which transfers >200 GB over a period of a week has a non-trivial chance of breaking, and the SQL import process cannot easily be resumed.
What would be the best way of dealing with it? Right now if I notice a broken connection I manually try to figure out when it ended by checking highest primary key of the last imported table, and then have a perlscript which basically does this:
perl -nle 'BEGIN{open F, "prelude.txt"; @a=<F>; print @a; close F;}; print if $x; $x++ if /INSERT.*last-table-name.*highest-primary-key/'
This really isn't the way to go, so what would be the best way?
Does your MySQL box have enough hard drive space for all the data doubled? Local storage would be best here, but if it's not an option, you could also try some sort of NAS device utilizing iSCSI. It's still happening over the network, but in this case you get more throughput and reliability, because you're only relying on a NAS which has a pretty slim OS and almost never has to be rebooted.
You can't use mysqldump to back up large databases - 200 GB is feasible, but for bigger ones it gets worse and worse.
Your best bet is to take a volume snapshot of the database directory and zip that up somehow - that's what we've generally done - or rsync it somewhere else.
If your filesystem or block device does not support snapshots then you're basically in trouble. You can shut the db down to take a backup, but I don't imagine you want to do that.
To restore it, just do the opposite then restart and wait (possibly some time) for innodb recovery to fix things up.
The Maatkit mk-parallel-dump and mk-parallel-restore tools are a bit better than mysqldump, speed-wise - but I'm not 100% convinced of their correctness.
Edit: re-reading the question, I think filesystem snapshot + rsync is probably the best way to go; you can do this without impacting the live system (you'll only need to transfer what changed since the last backup) too much and you can resume the rsync if the connection fails, and it'll continue where it left off.
Do you need everything in the database?
Can you push some of the information to an archive database and add something to your application that would allow people to view records in the archive?
Obviously this depends a lot upon your application and setup, but it may be a solution? Your DB is probably only going to get bigger....