mysqldump vs select into outfile - mysql

I use the SELECT * INTO OUTFILE option in MySQL to back up data into tab-separated text files; I call this statement against each table.
I then use LOAD DATA INFILE to import the data back into MySQL for each table.
I have not yet used any locks or DISABLE KEYS while performing this operation.
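For reference, here is a minimal sketch of the statements being described, using a hypothetical table mytable and an example file path:
SELECT * FROM mytable INTO OUTFILE '/tmp/mytable.txt' FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
LOAD DATA INFILE '/tmp/mytable.txt' INTO TABLE mytable FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';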
Now I face some issues:
While the backup is running, other updates and selects get slow.
It takes too much time to import data for huge tables.
How can I improve the method to solve the above issues?
Is mysqldump an option? I see that it uses INSERT statements, so before I try it, I wanted to ask for advice.
Does using locks and DISABLE KEYS before each LOAD DATA improve import speed?

If you have a lot of databases/tables, it will definitely be much easier for you to use mysqldump, since you only need to run it once per database (or even once for all databases, if you do a full backup of your system). It also has the advantage of backing up your table structure (something you cannot do using only SELECT *).
The speed is probably similar, but it would be best to test both and see which one works best in your case.
Someone here tested the options, and mysqldump proved to be faster in his case. But again, YMMV.
If you're concerned about speed, also take a look at the mysqldump/mysqlimport combination. As mentioned here, it is faster than mysqldump alone.
As for locks and disable keys, I am not sure, so I will let someone else answer that part :)
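For reference, the per-database and whole-server invocations mentioned above look roughly like this (the database name and credentials are placeholders):
mysqldump -u root -p mydatabase > mydatabase.sql
mysqldump -u root -p --all-databases > all_databases.sql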

Using mysqldump is important if you want your data backup to be consistent. That is, the data dumped from all tables represents the same instant in time.
If you dump tables one by one, they are not in sync, so you could have data for one table that references rows in another table that aren't included in the second table's backup. When you restore, it won't be pretty.
For performance, I'm using:
mysqldump --single-transaction --tab=/path/to/dumpdir mydatabase
This dumps, for each table, one .sql file with the table definition and one .txt file with the data, into the directory given to --tab.
Then when I import, I run the .sql files to define tables:
mysqladmin create mydatabase
cat *.sql | mysql mydatabase
Then I import all the data files:
mysqlimport --local --use-threads=4 mydatabase *.txt
In general, running mysqlimport is faster than running the INSERT statements that mysqldump outputs by default. And running mysqlimport with multiple threads should be faster still, as long as you have the CPU resources to spare.
Using locks when you restore does not help performance.
DISABLE KEYS is intended to defer index building until after the data is fully loaded and the keys are re-enabled, but this helps only for non-unique indexes on MyISAM tables. And you shouldn't be using MyISAM tables anyway.
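For completeness, the DISABLE KEYS pattern looks like this (the table and file names are hypothetical, and it only has an effect on non-unique MyISAM indexes):
ALTER TABLE mytable DISABLE KEYS;
LOAD DATA INFILE '/path/to/mytable.txt' INTO TABLE mytable;
ALTER TABLE mytable ENABLE KEYS;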
For more information, read:
https://dev.mysql.com/doc/refman/5.7/en/mysqldump.html
https://dev.mysql.com/doc/refman/5.7/en/mysqlimport.html

Related

InnoDB indexes before and after importing

I'm trying to import a large SQL file that was generated by mysqldump for an InnoDB table, but it is taking a very long time even after adjusting some parameters in my.cnf and disabling AUTOCOMMIT (as well as FOREIGN_KEY_CHECKS and UNIQUE_CHECKS, though the table has no foreign or unique keys). I'm wondering if it's taking so long because of the several indexes on the table.
Looking at the SQL file, it appears that the indexes are being created in the CREATE TABLE statement, prior to inserting all the data. Based on my (limited) research and personal experience, I've found that it's faster to add the indexes after inserting all the data. Does it not have to check the indexes for every INSERT? I know that mysqldump does have a --disable-keys option which does exactly that – disable the keys prior to inserting, but apparently this only works with MyISAM tables and not InnoDB.
But why couldn't mysqldump leave the keys out of the CREATE TABLE statement for InnoDB tables, then do an ALTER TABLE after all the data is inserted? Or does InnoDB work differently, so that there is no speed difference?
Thanks!
I experimented with this concept a bit at a past job, where we needed a fast method of copying schemas between MySQL servers.
There is indeed a performance overhead when you insert to tables that have secondary indexes. Inserts need to update the clustered index (aka the table), and also update secondary indexes. The more indexes a table has, the more overhead it causes for inserts.
InnoDB has a feature called the change buffer which helps a bit by postponing index updates, but they have to get merged eventually.
Inserts to a table with no secondary indexes are faster, so it's tempting to try to defer index creation until after your data is loaded, as you describe.
Percona Server, a branch of MySQL, experimented with a mysqldump --optimize-keys option. When you use this option, it changes the output of mysqldump to have CREATE TABLE with no indexes, then INSERT all data, then ALTER TABLE to add the indexes after the data is loaded. See https://www.percona.com/doc/percona-server/LATEST/management/innodb_expanded_fast_index_creation.html
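The resulting dump follows roughly this pattern (the table, columns, and index names here are made up for illustration):
CREATE TABLE t (id INT PRIMARY KEY, a INT, b VARCHAR(50)) ENGINE=InnoDB;
INSERT INTO t VALUES (1, 10, 'x'), (2, 20, 'y');
-- ...many more INSERTs...
ALTER TABLE t ADD INDEX idx_a (a), ADD INDEX idx_b (b);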
But in my experience, the net improvement in performance was small. It still takes a while to insert a lot of rows, even into tables with no indexes. Then the restore needs to run an ALTER TABLE to build the indexes, which takes a while for a large table. When you count the time of the INSERTs plus the extra time to build indexes, it's only a few (low single-digit) percent faster than inserting the traditional way, into a table with indexes.
Another benefit of this post-processing index creation is that the indexes are stored more compactly, so if you need to save disk space, that's a better reason to use this technique.
I found it much more beneficial to performance to restore by loading several tables in parallel.
The new MySQL 8.0 tool mysqlpump supports multi-threaded dump.
The open-source tool mydumper supports multi-threaded dump, and also has a multi-threaded restore tool called myloader. The worst downside of mydumper/myloader is that the documentation is virtually non-existent, so you need to be an intrepid power user to figure out how to run them.
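As a rough sketch of a parallel dump and restore with these tools (the database name, paths, and thread counts are assumptions, and option names can differ between versions):
mysqlpump --default-parallelism=4 mydatabase > mydatabase.sql
mydumper --database=mydatabase --outputdir=/backups/mydatabase --threads=4
myloader --database=mydatabase --directory=/backups/mydatabase --threads=4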
Another strategy is to use mysqldump --tab to dump data files instead of SQL scripts. Bulk-loading these files is much faster than executing SQL scripts to restore the data. More precisely, it creates separate files for each table: an .sql file for the table definition and a tab-delimited .txt file with the data to import. You have to recreate the tables by loading all the .sql files (this is quick), and then use mysqlimport to load the data files. The mysqlimport tool even has a --use-threads option for parallel execution.
Test carefully with different numbers of parallel threads. My experience is that 4 threads is the best. With greater parallelism, InnoDB becomes a bottleneck. But your experience may be different, depending on the version of MySQL and your server hardware's performance capacity.
The fastest restore method of all is to use a physical backup tool; the most popular is Percona XtraBackup. This allows for fast backups and even faster restores. The backed-up files are literally ready to be copied into place and used as live tablespace files. The downside is that you must shut down your MySQL server to perform the restore.
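A sketch of that workflow with XtraBackup (the target directory is an example, and the exact invocation varies by XtraBackup version):
xtrabackup --backup --target-dir=/data/backups/full
xtrabackup --prepare --target-dir=/data/backups/full
# stop mysqld, then copy the prepared files back into the data directory:
xtrabackup --copy-back --target-dir=/data/backups/full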

Mysqldump taking too much time

The mysqldump and upload process is taking too long (~8 hours) to complete.
I am dumping an active database into a mysqldump.tar file of almost 3 GB. When I load it into the new database, it takes 6-8 hours to complete the process (uploading into the new database).
What would be the recommended solution for completing the process faster?
If I understand correctly, your main problem is that loading the data into your new database is the step that's taking a lot of time. Besides reading the link provided by asdf in his comment ("How can I optimize a mysqldump of a large database?"), I suggest a few things:
Use the --disable-keys option; this will add ALTER TABLE your_table DISABLE KEYS before the inserts and ALTER TABLE your_table ENABLE KEYS after the inserts are done. When I've used this option, insertion has been about 30% faster.
If possible, use the --delayed-insert option; this will use INSERT DELAYED instead of the "normal" INSERT.
If possible, dump the data of different tables into different files; that way you can upload them concurrently, as sketched below.
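A minimal sketch of that approach, assuming hypothetical table names and omitting credentials (the trailing ampersands run the imports concurrently):
mysqldump --disable-keys mydb table1 > table1.sql
mysqldump --disable-keys mydb table2 > table2.sql
mysql mydb < table1.sql &
mysql mydb < table2.sql &
wait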
Check the reference manual for further information.

mysqldump without interrupting live production INSERT

I'm about to migrate our production database to another server. It's about 38 GB and uses MyISAM tables. Because I have no physical access to the new server's file system, we can only use mysqldump.
I have looked through this site to see whether an online mysqldump backup will bring down our production website. According to this post, Run MySQLDump without Locking Tables, mysqldump will obviously lock the db and prevent inserts. But after a few tests, I was surprised to find that it shows otherwise.
If I use
mysqldump -u root -ppassword --flush-logs testDB > /tmp/backup.sql
mysqldump will by default do a '--lock-tables', and this takes READ LOCAL locks (refer to the MySQL 5.1 docs), under which concurrent inserts are still available. I ran a for loop that inserted into one of the tables every second while mysqldump took one minute to complete, and a record was inserted every second during that period. That means mysqldump does not interrupt the production server and INSERTs can still go on.
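A sketch of the test loop, with placeholder table and column names (the real test inserted into one of the existing tables):
while true; do
  mysql -u root -ppassword testDB -e "INSERT INTO test_table (created_at) VALUES (NOW())"
  sleep 1
done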
Does anyone have a different experience? I want to be sure about this before carrying on to my production server, so I'd be glad to know if I've done anything wrong that makes my test incorrect.
[My version of mysql-server is 5.1.52, and mysqldump is 10.13]
Now, you may have a database with disjoint tables, or a data warehouse - where nothing is normalized (at all), and where there are no links whatsoever between the tables. In that case, any dump would work.
I ASSUME that a production database containing 38 GB of data contains graphics in some form (BLOBs), and then - ubiquitously - you have links from other tables. Right?
Therefore, you are - as far as I can see it - at risk of losing important links between tables (usually primary/foreign key pairs); that is, you may capture one table at the point of being updated/inserted into, while its dependent (which uses that table as its primary source) has not been updated yet. Thus, you will lose the so-called integrity of your database.
More often than not, it is extremely cumbersome to re-establish integrity, most often because the system using/generating/maintaining the database has not been built as a transaction-oriented system, so relationships in the database cannot be tracked except via the primary/foreign key relations.
Thus, you may surely get away with copying your tables without locks and many of the other proposals above - but you are at risk of burning your fingers, and depending on how sensitive the operations of the system are, you may burn yourself severely or just get a surface scratch.
Example: if your database is a mission-critical system, containing recommended heart rates for life-support devices in an ICU, I would think more than twice before making the migration.
If, however, the database contains pictures from facebook or a similar site - you may be able to live with the consequences of anything from 0 up to 129,388 lost links :-).
Now - so much for analysis. Solution:
YOU WOULD HAVE to create a piece of software which does the dump for you with full integrity, table-set by table-set, tuple by tuple. You need to identify the cluster of data which can be copied from your current online 24/7/365 database to your new database, do that, then mark that it has been copied.
IFFF changes then occur to the records you have already copied, you will need to do a subsequent copy of those. It can be a tricky affair to do so.
IFFF you are running a more advanced version of MySQL - you can actually create another site and/or a replica, or a distributed database - and then get away with it that way.
IFFF you have a window of, let's say, 10 minutes - which you can create if you need it - then you can also just COPY the physical files located on the drive. I am talking about the .frm, .MYD, .MYI - and so on - files; then you can shut down the server for a few minutes and copy them.
Now to a cardinal question:
You need to do maintenance on your machines from time to time. Doesn't your system have room for that kind of operation? If not - then what will you do when the hard disk crashes? Pay attention to the 'when', not the 'if'.
1) Use of --opt is the same as specifying --add-drop-table, --add-locks, --create-options, --disable-keys, --extended-insert, --lock-tables, --quick, and --set-charset. All of the options that --opt stands for also are on by default because --opt is on by default.
2) mysqldump can retrieve and dump table contents row by row, or it can retrieve the entire content from a table and buffer it in memory before dumping it. Buffering in memory can be a problem if you are dumping large tables. To dump tables row by row, use the --quick option (or --opt, which enables --quick). The --opt option (and hence --quick) is enabled by default, so to enable memory buffering, use --skip-quick.
3) --single-transaction: This option issues a BEGIN SQL statement before dumping data from the server (for transactional tables such as InnoDB).
If your schema is a combination of both InnoDB and MyISAM, the following example will help you:
mysqldump -uuid -ppwd --skip-opt --single-transaction --max_allowed_packet=512M db > db.sql
I've never done it before but you could try --skip-add-locks when dumping.
Though it might take longer, you could dump in several batches, each of which would take very little time to complete. Adding --skip-add-drop-table would allow you to upload these multiple smaller dumps into the same table without re-creating it. Using --extended-insert would make the SQL files smaller to boot.
Possibly try something like mysqldump -u -p${password} --skip-add-drop-table --extended-insert --where='id between 0 and 20000' test_db test_table > test.sql. You would need to dump the table structures and upload them first in order to do it this way, or remove the --skip-add-drop-table for the first dump.
mysqldump doesn't add --lock-tables by default. Try to use --lock-tables
Let me know if it helped
BTW - you should also use add-locks which will make your import faster!

Is there a MySql binary dump format? Or anything better than plain text INSERT statements?

Is there anything better (faster or smaller) than pages of plain text CREATE TABLE and INSERT statements for dumping MySql databases? It seems awfully inefficient for large amounts of data.
I realise that the underlying database files can be copied, but I assume they will only work in the same version of MySql that they come from.
Is there a tool I don't know about, or a reason for this lack?
Not sure if this is what you're after, but I usually pipe the output of mysqldump directly to gzip or bzip2 (etc.). It tends to be considerably faster than dumping to a plain .sql file, and the output files are much smaller thanks to the compression.
mysqldump --all-databases (other options) | gzip > mysql_dump-2010-09-23.sql.gz
It's also possible to dump to XML with the --xml option if you're looking for "portability" at the expense of consuming (much) more disk space than the gzipped SQL...
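A sketch of the corresponding restore, reusing the example file name from above (credentials are placeholders):
gunzip < mysql_dump-2010-09-23.sql.gz | mysql -u root -p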
Sorry, there is no binary dump for MySQL. However, the binary logs of MySQL are specifically intended for backup and database replication purposes: http://dev.mysql.com/doc/refman/5.5/en/binary-log.html . They are not hard to configure. Only data changes such as updates and deletes are logged, so each log file (created automatically by MySQL) is also an incremental backup of the changes in the DB. This way you can save a whole snapshot of the db from time to time (once a month?), then store just the log files, and in case of a crash, restore the latest snapshot and run through the logs.
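A rough sketch of that setup and a replay, with log file names and retention that are only examples:
# in my.cnf, enable the binary log (option names as in the 5.5-era server referenced above):
[mysqld]
log-bin=mysql-bin
expire_logs_days=30
# after reloading the last snapshot, replay the logs recorded since then:
mysqlbinlog mysql-bin.000001 mysql-bin.000002 | mysql -u root -p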
It's worth noting that MySQL has a special syntax for doing bulk inserts. From the manual:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
Would insert 3 rows in a single operation. So loading this way isn't as inefficient as it might otherwise be with one statement per row: instead of 129 bytes across 3 separate INSERT statements, this takes 59 bytes, and that advantage only gets bigger the more rows you have.
I've never tried this, but aren't mysql tables just binary files on the hard drive? Couldn't you just copy the table files themselves? Presumably that's essentially what you are asking for.
I don't know how to stitch that together, but it seems to me a copy of /var/lib/mysql would do the trick.
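If you try that, the usual caveat is to stop the server (or flush and lock the tables) while copying. A minimal sketch, assuming a systemd-managed service named mysql and a hypothetical backup path:
sudo systemctl stop mysql
sudo cp -a /var/lib/mysql /backups/mysql-copy
sudo systemctl start mysql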

How to dump a single table in MySQL without locking?

When I run the following command, the output only consists of the create syntax for 'mytable', but none of the data:
mysqldump --single-transaction -umylogin -p mydatabase mytable > dump.sql
If I drop --single-transaction, I get an error as I can't lock the tables.
If I drop 'mytable' (and dump the whole DB), it looks like it's creating the INSERT statements, but the entire DB is huge.
Is there a way I can dump the table -- schema & data -- without having to lock the table?
(I also tried INTO OUTFILE, but lacked access for that as well.)
The answer might depend on the database engine that you are using for your tables. InnoDB has some support for non-locking backups. Given your comments about permissions, you might lack the permissions required for that.
The best option that comes to mind would be to create a duplicate table without the indices. Copy all of the data from the table you would like to back up over to the new table. If you write your query in a way that can easily page through the data, you can adjust the duration of the locks. I have found that 10,000 rows per iteration is usually pretty darn quick. Once you have this query, you can just keep running it until all rows are copied.
Now that you have a duplicate, you can either drop it, truncate it, or keep it around and try to update it with the latest data and leverage it as a backup source.
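A minimal sketch of that paged copy, with a hypothetical table and id column:
CREATE TABLE mytable_copy LIKE mytable;
-- optionally drop secondary indexes on mytable_copy here
INSERT INTO mytable_copy SELECT * FROM mytable WHERE id BETWEEN 1 AND 10000;
INSERT INTO mytable_copy SELECT * FROM mytable WHERE id BETWEEN 10001 AND 20000;
-- ...repeat for subsequent ranges until all rows are copied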
Jacob