I have a 42 GB mysqldump of my database (which is about 100 GB on disk). I have searched the web for a way to reduce the disk size of the database; I mean, once the dump is restored, I want the disk footprint to drop from 100 GB to roughly 87-90 GB. I haven't found any relevant information yet.
I would appreciate it if anyone could guide me a little bit on this.
Thanks
You could filter the CREATE TABLE statements so they create compressed tables as they restore:
sed -e 's/ENGINE=InnoDB/& ROW_FORMAT=COMPRESSED/' dump.sql | mysql ...
Another idea is to drop some or all of the indexes in large tables before restoring data. Insert ALTER TABLE <tablename> DROP KEY <indexname>; statements after the CREATE TABLE and before the subsequent INSERT statements.
Even if you decide later that you want the indexes after all, creating the index after data has been loaded often results in a more compact index.
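For example, a sketch of how the edited dump might read (the table, column, and index names are made up for illustration):
CREATE TABLE orders (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  customer_id INT NOT NULL,
  total DECIMAL(10,2),
  KEY idx_customer (customer_id)
) ENGINE=InnoDB;
ALTER TABLE orders DROP KEY idx_customer;    -- added by you, before the INSERTs
-- ... the INSERT statements from the dump run here ...
ALTER TABLE orders ADD KEY idx_customer (customer_id);    -- optional: rebuild the index after the data is loaded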
Removing indexes might impact the performance of some of your queries that need those indexes. But if it's more important to make the database smaller, then it's up to you how much you sacrifice query performance.
I'll leave it to you to figure out how you want to edit a 42GB file. Different solutions exist depending on your environment (Mac, Windows, Linux).
Related
My client has a WooCommerce store with a 1.2 GB database. I know that a similar store (counted by products) should be approximately 700 MB.
The biggest table is wp_posts (760 MB) alone! Which I think is strange; usually the biggest table is wp_postmeta or wp_options.
I tried optimizing this database with plugins (WP-Sweep and WP-Optimize), so there are no revisions or drafts left.
I also tried SQL:
OPTIMIZE TABLE
but it is InnoDB, so it does not support it. I get this message:
Table does not support optimize, doing recreate + analyze instead
So is it done? I mean, is the "recreate + analyze" already done, or should I still do it? And how?
I read that with InnoDB I should dump the table and restore it, but when I do this with DBeaver I get the same size.
Any idea what I should do?
The error message is a bit misleading, because it dates back to the days when MyISAM was the default storage engine, and OPTIMIZE TABLE does a few things in MyISAM that are different from what it does in InnoDB. For example, MyISAM can't reclaim space from deleted rows until you do OPTIMIZE TABLE (whereas InnoDB does reclaim space dynamically).
InnoDB does support OPTIMIZE TABLE and it does useful things. It does basically the same as an ALTER TABLE when using the COPY algorithm. That is, it creates a new file, and copies the data row by row into the new file. This accomplishes defragmentation and rebuilding the indexes, just as if you had done a dump and restore. So you don't need to dump and restore.
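A minimal illustration, using the table from the question:
OPTIMIZE TABLE wp_posts;
-- roughly equivalent to the "null" rebuild below, which copies the rows and rebuilds the indexes:
ALTER TABLE wp_posts ENGINE=InnoDB;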
After OPTIMIZE TABLE, the InnoDB table may be close to the same size it was before, if there was little fragmentation.
Frankly, a table 1.2GB in size is not so large by the standards of most MySQL projects I've worked on. We start to get concerned if a table is larger than 500GB, and we start alerting developers if the table is larger than 800GB, or larger than the remaining free disk space.
I'm trying to import a large SQL file that was generated by mysqldump for an InnoDB table but it is taking a very long time even after adjusting some parameters in my.cnf and disabling AUTOCOMMIT (as well as FOREIGN_KEY_CHECKS and UNIQUE_CHECKS but the table does not have any foreign or unique keys). But I'm wondering if it's taking so long because of the several indexes in the table.
Looking at the SQL file, it appears that the indexes are being created in the CREATE TABLE statement, prior to inserting all the data. Based on my (limited) research and personal experience, I've found that it's faster to add the indexes after inserting all the data. Does it not have to check the indexes for every INSERT? I know that mysqldump does have a --disable-keys option which does exactly that – disable the keys prior to inserting, but apparently this only works with MyISAM tables and not InnoDB.
But why couldn't mysqldump omit the keys from the CREATE TABLE statement for InnoDB tables, and then do an ALTER TABLE after all the data is inserted? Or does InnoDB work differently, so that there is no speed difference?
Thanks!
I experimented with this concept a bit at a past job, where we needed a fast method of copying schemas between MySQL servers.
There is indeed a performance overhead when you insert to tables that have secondary indexes. Inserts need to update the clustered index (aka the table), and also update secondary indexes. The more indexes a table has, the more overhead it causes for inserts.
InnoDB has a feature called the change buffer which helps a bit by postponing index updates, but they have to get merged eventually.
Inserts to a table with no secondary indexes are faster, so it's tempting to try to defer index creation until after your data is loaded, as you describe.
Percona Server, a branch of MySQL, experimented with an --innodb-optimize-keys option for mysqldump. When you use this option, it changes the output of mysqldump so that the CREATE TABLE has no secondary indexes, then all the data is INSERTed, then an ALTER TABLE adds the indexes after the data is loaded. See https://www.percona.com/doc/percona-server/LATEST/management/innodb_expanded_fast_index_creation.html
But in my experience, the net improvement in performance was small. It still takes a while to insert a lot of rows, even into a table with no indexes. Then the restore needs to run an ALTER TABLE to build the indexes, which takes a while for a large table. When you count the time of the INSERTs plus the extra time to build the indexes, it's only a few (low single-digit) percent faster than inserting the traditional way, into a table with indexes.
Another benefit of this post-processing index creation is that the indexes are stored more compactly, so if you need to save disk space, that's a better reason to use this technique.
I found it much more beneficial to performance to restore by loading several tables in parallel.
The mysqlpump tool (introduced in MySQL 5.7) supports multi-threaded dumps.
The open-source tool mydumper supports multi-threaded dumps, and also has a multi-threaded restore tool called myloader. The worst downside of mydumper/myloader is that the documentation is virtually non-existent, so you need to be an intrepid power user to figure out how to run it.
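For instance, a rough sketch of parallel dump/restore invocations (database name, paths, and thread counts are placeholders; check each tool's --help for the exact flags in your version):
# mysqlpump: parallel dump to a single SQL file
mysqlpump --default-parallelism=4 mydatabase > mydatabase.sql
# mydumper/myloader: parallel dump to a directory, then parallel restore
mydumper --database=mydatabase --outputdir=/backups/mydatabase --threads=4
myloader --database=mydatabase --directory=/backups/mydatabase --threads=4 --overwrite-tables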
Another strategy is to use mysqldump --tab to dump tab-delimited data files instead of SQL scripts. Bulk-loading these data files is much faster than executing SQL scripts to restore the data. To be precise, it creates two files per table: an .sql file with the table definition and a .txt data file to import. You have to recreate the tables by running all the .sql files (this is quick), and then use mysqlimport to load the data files. The mysqlimport tool even has a --use-threads option for parallel execution.
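A sketch of that workflow, assuming a database named mydatabase and a dump directory the server can write to (for example, wherever secure_file_priv points):
mysqldump --single-transaction --tab=/var/lib/mysql-files/mydatabase mydatabase
cat /var/lib/mysql-files/mydatabase/*.sql | mysql mydatabase          # recreate the tables
mysqlimport --use-threads=4 mydatabase /var/lib/mysql-files/mydatabase/*.txt   # bulk-load the data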
Test carefully with different numbers of parallel threads. My experience is that 4 threads is the best. With greater parallelism, InnoDB becomes a bottleneck. But your experience may be different, depending on the version of MySQL and your server hardware's performance capacity.
The fastest restore method of all is to use a physical backup tool; the most popular is Percona XtraBackup. It allows fast backups and even faster restores. The backed-up files are literally ready to be copied into place and used as live tablespace files. The downside is that you must shut down your MySQL Server to perform the restore.
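A rough sketch of the XtraBackup cycle (the target directory is a placeholder, and the restore assumes mysqld is stopped and the data directory is empty):
xtrabackup --backup --target-dir=/backups/full
xtrabackup --prepare --target-dir=/backups/full
xtrabackup --copy-back --target-dir=/backups/full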
I've recently been thrust into the position of db admin for our server so I'm having to learn as I go. We recently found that one of our tables had maxed out the id column and needs to be migrated to bigint.
This is for an InnoDB table with roughly 301 GB of data. We are running MySQL version 5.5.38. The command I'm running to migrate the table is
ALTER TABLE tb_name CHANGE id id BIGINT NOT NULL;
I kicked off the migration and we are now 18 hours in, but I'm not seeing the disk space on the server change at all, which makes me think nothing is happening. We have plenty of memory, so no concern there, but SHOW PROCESSLIST still shows the following state:
copy to tmp table
Does anyone have any ideas or know what I'm doing incorrectly? Please ask if you need more information.
Yes, it will take a looooong time. The disks are probably spinning as fast as they can. (SSDs employ faster hamsters.)
You can safely kill the ALTER if you want to abort it; all it is doing, as the state says, is copying into a tmp table, and the real table isn't touched until that copy finishes, at which point the tmp table is renamed to be the real table and the old copy is dropped.
I hope you had innodb_file_per_table = ON when you started the ALTER. Else it will be expanding ibdata1, which won't shrink afterwards.
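You can check the current setting with:
SHOW VARIABLES LIKE 'innodb_file_per_table';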
pt-online-schema-change is an alternative. It will still take a loooooong time (with one extra 'o' because it will be slightly slower). It will do the job without blocking other activity.
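A sketch of how that might be invoked for this table (the database name and connection options are placeholders):
pt-online-schema-change --alter "MODIFY id BIGINT NOT NULL" D=mydb,t=tb_name --execute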
This might have been a good time to check all the columns and indexes in the table:
Could some INTs be turned into MEDIUMINT or something smaller?
Are some of the INDEXes unused?
How about normalizing some of the VARCHARs?
Maybe even PARTITIONing (but not without a good reason)? Time-series is a typical use for Data Warehousing.
Summarize the data, and toss at least the older data?
If you would like further guidance, please provide SHOW CREATE TABLE.
I use the SELECT * INTO OUTFILE option in MySQL to back up the data into text files in tab-separated format. I call this statement against each table.
And I use LOAD DATA INFILE to import the data into MySQL for each table.
I have not done any locking or disabled keys while performing these operations.
Now I face some issues:
While the backup is running, other updates and selects get slow.
It takes too much time to import the data for huge tables.
How can I improve the method to solve the above issues?
Is mysqldump an option? I see that it uses insert statements, so before I try it, I wanted to request advice.
Does using locks and disabling keys before each LOAD DATA improve import speed?
If you have a lot of databases/tables, it will definitely be much easier for you to use mysqldump, since you only need to run it once per database (or even once for all databases, if you do a full backup of your system). Also, it has the advantage that it also backs up your table structure (something you cannot do using only select *).
The speed is probably similar, but it would be best to test both and see which one works best in your case.
Someone here tested the options, and mysqldump proved to be faster in his case. But again, YMMV.
If you're concerned about speed, also take a look at the mysqldump/mysqlimport combination. As mentioned here, it is faster than mysqldump alone.
As for locks and disable keys, I am not sure, so I will let someone else answer that part :)
Using mysqldump is important if you want your data backup to be consistent. That is, the data dumped from all tables represents the same instant in time.
If you dump tables one by one, they are not in sync, so you could have data for one table that references rows in another table that aren't included in the second table's backup. When you restore, it won't be pretty.
For performance, I'm using:
mysqldump --single-transaction --tab=/path/to/dump/dir mydatabase
This dumps, for each table, one .sql file with the table definition and one .txt file with the data.
Then when I import, I run the .sql files to define tables:
mysqladmin create mydatabase
cat *.sql | mysql mydatabase
Then I import all the data files:
mysqlimport --local --use-threads=4 mydatabase *.txt
In general, running mysqlimport is faster than running the insert statements output by default by mysqldump. And running mysqlimport with multiple threads should be faster too, as long as you have the CPU resources to spare.
Using locks when you restore does not help performance.
DISABLE KEYS is intended to defer index building until after the data is fully loaded and the keys are re-enabled, but this helps only for non-unique indexes on MyISAM tables. And you shouldn't be using MyISAM tables anyway.
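In a mysqldump file they show up as versioned comments wrapped around each table's INSERTs; for example (the table name is illustrative):
/*!40000 ALTER TABLE `mytable` DISABLE KEYS */;
-- ... INSERT statements ...
/*!40000 ALTER TABLE `mytable` ENABLE KEYS */;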
For more information, read:
https://dev.mysql.com/doc/refman/5.7/en/mysqldump.html
https://dev.mysql.com/doc/refman/5.7/en/mysqlimport.html
I have about 100 databases (all with the same structure, just on different servers) with approximately a dozen tables each. Most tables are small (let's say 100 MB or less). There are occasional edge cases where a table may be large (let's say 4 GB+).
I need to run a series of ALTER TABLE commands on just about every table in each database. Mainly adding some columns to the structure, plus a few changes like converting a column from VARCHAR to TINYTEXT (or vice versa). Also adding a few new indexes (but indexing the new columns, not existing ones, so I'm assuming that isn't a big deal).
I am wondering how safe this is to do, and if there are any best practices to this process.
First, is there any chance I may corrupt or delete data in the tables. I suspect no, but need to be certain.
Second, I presume for the larger tables (4GB+), this may be a several-minutes to several-hours process?
Anything and everything I should know about performing ALTER TABLE commands on a production database I am interested in learning.
If it's of any value knowing, I am planning on issuing the commands via phpMyAdmin for the most part.
Thanks -
First off, before applying any changes, make backups. There are two ways you can do it: mysqldump everything, or copy your MySQL data folder.
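For the mysqldump route, a full logical backup could be as simple as this (the output file name is a placeholder):
mysqldump --all-databases --single-transaction --routines --events > full_backup.sql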
Secondly, you may want to use mysql from the command line. phpMyAdmin will probably time out, since most PHP servers have a timeout of less than 10 minutes, or you might accidentally close the browser.
Here are my suggestions:
Fail over the apps first (make sure there are no connections to any of the databases).
Create the indexes using CREATE INDEX statements; don't use ALTER TABLE ... ADD INDEX statements.
Do all of this with a script: keep all the statements in a file and run it with SOURCE.
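A sketch of what such a script might contain (the table, column, index, and file names are made up):
-- alter_changes.sql
ALTER TABLE my_table ADD COLUMN new_col TINYTEXT NULL;
CREATE INDEX idx_new_col ON my_table (new_col(32));
Then run it from the mysql client:
SOURCE /path/to/alter_changes.sql;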
Your table sizes look very small, so this shouldn't cause any headaches.