Avoiding InnoDB transaction overhead in table copy using INSERT ... SELECT - mysql

Instead of doing ALTER TABLE I prefer to create a new table, copy the data into it, and then switch over to using it. When doing so in InnoDB I always have a hard time performing:
INSERT INTO new_huge_tbl (SELECT * FROM old_huge_tbl)
Because of the nature of transactions, if at any time I need to stop this operation, the rollback isn't easy, to say the least. Is there any way I can perform this operation in InnoDB without it being a transaction?

No, there is no simple way to avoid the transactional overhead. You have essentially two options:
In your own application, use many smaller transactions (of e.g. 10k rows each) to copy the data in small batches (see the sketch below).
Use an existing tool which does the copy for you using the same strategy. I could suggest pt-archiver from the Percona Toolkit.
Internally, when doing table copies for e.g. ALTER TABLE, InnoDB does in fact do exactly that, batching the copy into many smaller transactions.
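For illustration, here is a minimal sketch of the first option. It assumes old_huge_tbl has an auto-increment primary key named id (that column name is an assumption); your application repeats the pair of statements until the INSERT copies no more rows. Each batch commits on its own under autocommit, so stopping or rolling back only ever affects the current batch:
SET @last_id = COALESCE((SELECT MAX(id) FROM new_huge_tbl), 0);  -- resume point
INSERT INTO new_huge_tbl
  SELECT * FROM old_huge_tbl
  WHERE id > @last_id
  ORDER BY id
  LIMIT 10000;   -- one 10k-row batch per iteration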

Related

MySQL: What is the best way to do these multiple batch INSERTs?

I have a MySQL database (InnoDB, if that matters) and I want to add a lot of rows. I want to do this on a production database so there can be no downtime. Each time (about once a day) I want to add about 1M rows to the database, in batches of 10k (from some tests I ran this seemed to be the optimal batch size to minimize time). While I'm doing these inserts the table needs to be readable. What is the "correct" way to do this? For starters you can assume there are no indexes.
Option A: https://dev.mysql.com/doc/refman/5.7/en/commit.html
SET autocommit = 0;
START TRANSACTION;
INSERT INTO my_table (etc etc batch insert);
INSERT INTO my_table (etc etc batch insert);
INSERT INTO my_table (etc etc batch insert);
INSERT INTO my_table (etc etc batch insert);
(more)
COMMIT;
Option B:
copy my_table into my_table_temp:
CREATE TABLE my_table_temp LIKE my_table;
INSERT INTO my_table_temp SELECT * FROM my_table;
INSERT INTO my_table_temp (etc etc batch insert);
INSERT INTO my_table_temp (etc etc batch insert);
INSERT INTO my_table_temp (etc etc batch insert);
INSERT INTO my_table_temp (etc etc batch insert);
(more)
RENAME TABLE my_table TO my_table_old;
RENAME TABLE my_table_temp TO my_table;
I've used the second method before and it works. There's only a tiny window where something might go wrong, which is the time it takes to rename the tables.
But my confusion is: if this were the best solution, then what's the point of START TRANSACTION/COMMIT? Surely that was invented to take care of the thing I'm describing, no?
Bonus question: What if we have indexes? My case is easily adaptable: just disable the indexes in the temp table and turn them back on after the inserts have finished, before the rename. What about option A? It seems hard to reconcile with doing inserts with indexes.
then what's the point of START TRANSACTION/COMMIT? Surely that was invented to take care of the thing I'm describing, no?
Yes, exactly. In InnoDB, thanks to its MVCC architecture, writers never block readers. You don't have to worry about bulk inserts blocking readers.
The exception is if you're doing locking reads with SELECT...FOR UPDATE or SELECT...LOCK IN SHARE MODE. Those might conflict with INSERTs, depending on the data you're selecting, and whether it requires gap locks where the new data is being inserted.
Likewise LOAD DATA INFILE does not block non-locking readers of the table.
You might like to see the results I got for bulk loading data in my presentation, Load Data Fast!
There's only a tiny window where something might go wrong, which is the time it takes to rename the tables.
It's not necessary to do the table-swapping for bulk INSERT, but for what it's worth, if you ever do need to do that, you can do multiple table renames in one statement. The operation is atomic, so there's no chance any concurrent transaction can sneak in between.
RENAME TABLE my_table TO my_table_old, my_table_temp TO my_table;
Re your comments:
what if I have indexes?
Let the indexes be updated incrementally as you do the INSERT or LOAD DATA INFILE. InnoDB will do this while other concurrent reads are using the index.
There is overhead to updating an index during INSERTs, but it's usually preferable to let the INSERT take a little longer instead of disabling the index.
If you disable the index, no concurrent clients can use it, so other queries will slow down. Also, when you re-enable the index, MySQL locks the table and blocks other queries while it rebuilds the index. Avoid this.
why do I need to wrap the thing in "START TRANSACTION/COMMIT"?
The primary purpose of a transaction is to group changes that should be committed as one change, so that no other concurrent query sees the change in a partially-complete state. Ideally, you'd do all the INSERTs for your bulk load in one transaction.
The secondary purpose of the transaction is to reduce overhead. If you rely on autocommit instead of explicitly starting and committing, you're still using transactions—but autocommit implicitly starts and commits one transaction for every INSERT statement. The overhead of starting and committing is small, but it adds up if you do it 1 million times.
There's also a practical, physical reason to reduce the number of individual transactions. InnoDB by default does a filesystem sync after each commit, to ensure data is safely stored on disk. This is important to prevent data loss if you have a crash. But a filesystem sync isn't free. You can only do a finite number of syncs per second (this varies based on what type of disk you use). So if you are trying to do 1 million syncs for individual transactions, but your disk can only physically do 100 syncs per second (typical for a single spinning, non-SSD hard disk), then your bulk load will take a minimum of 10,000 seconds. This is a good reason to group your bulk INSERT into batches.
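As a rough sketch of grouping the bulk load into batches (table and column names here are placeholders, and the ~10k batch size comes from the question above):
SET autocommit = 0;            -- don't commit after every statement
START TRANSACTION;
INSERT INTO my_table (col1, col2) VALUES (1, 'a'), (2, 'b'), (3, 'c');
-- ...more multi-row INSERTs until the batch reaches roughly 10k rows...
COMMIT;                        -- one commit (and one filesystem sync) per batch
START TRANSACTION;
-- ...next batch...
COMMIT;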
So for both logical reasons of atomic updates, and physical reasons of being kind to your hardware, use transactions when you have some bulk work to do.
However, I don't want to scare you into using transactions to group things inappropriately. Do commit your work promptly after you do some other type of UPDATE. Leaving a transaction hanging open for an unbounded amount of time is not a good idea either. MySQL can handle the rate of commits of ordinary day-to-day work. I am suggesting batching work when you need to do a bunch of bulk changes in rapid succession.
I think that the best way is LOAD DATA INFILE.
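For what it's worth, a minimal sketch of that (the file path, delimiters, and column list are made up for illustration, and the server's secure_file_priv setting restricts where the input file may live):
LOAD DATA INFILE '/tmp/my_table.csv'
INTO TABLE my_table
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES          -- skip the header row
(col1, col2);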

Create a table both in-memory and transaction-safe in MySQL

I know I should use engine=MEMORY to make the table in memory and engine=INNODB to make the table transaction safe. However, how can I achieve both objectives? I tried engine=MEMORY, INNODB, but I failed. My purpose is to access tables fast and allow multiple threads to change contents of tables.
You haven't stated your goals above. I guess you're looking for good performance, and you also seem to want the table to be transactional. Your only option really is InnoDB. As long as you have configured InnoDB to use enough memory to hold your entire table (with innodb_buffer_pool_size), and there is not excessive pressure from other InnoDB tables on the same server, the data will remain in memory. If you're concerned about write performance (and again barring other uses of the same system), you can trade durability for a drastic increase in write speed by setting innodb_flush_log_at_trx_commit = 0 and disabling binary logging.
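For example, a my.cnf sketch with illustrative values only (size the buffer pool to your data and RAM; with innodb_flush_log_at_trx_commit = 0 you can lose roughly the last second of commits in a crash):
[mysqld]
innodb_buffer_pool_size = 8G          # large enough to hold the whole table
innodb_flush_log_at_trx_commit = 0    # flush the redo log about once a second,
                                      # not at every commit
# leave log-bin unset (or comment it out) to keep binary logging disabled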
Using any sort of triggers with temporary tables will be a mess to maintain, and won't give you any benefits of transactionality on the temporary tables.
You are asking for a way to create a table with 2 (or more) engines; that is not possible in MySQL.
However, I will guess that you want to use MEMORY because you don't think InnoDB will be fast enough for your needs. I think InnoDB is pretty fast and will probably be enough, but if you really need it, you can try creating 2 tables:
table1, MEMORY <-- here is where you will make all the SELECTs
table2, InnoDB <-- here you will make the UPDATEs, INSERTs, DELETEs, etc., and add a TRIGGER so that when this one is modified, table1 gets the same modification.
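A minimal sketch of that two-table idea, with made-up table and column names; a complete setup would also need triggers for UPDATE and DELETE:
CREATE TABLE table2 (id INT PRIMARY KEY, val VARCHAR(100)) ENGINE=InnoDB;   -- writes go here
CREATE TABLE table1 (id INT PRIMARY KEY, val VARCHAR(100)) ENGINE=MEMORY;   -- reads go here

CREATE TRIGGER table2_after_insert AFTER INSERT ON table2
FOR EACH ROW
  INSERT INTO table1 (id, val) VALUES (NEW.id, NEW.val);  -- mirror the new row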
As far as I know, there are two ways.
1st way
Create a temporary table (it behaves like a normal table, with the small difference that it is dropped when the session ends):
CREATE TEMPORARY TABLE sample (id INT) ENGINE=InnoDB;
2nd way
Create two tables, one with the MEMORY engine and the other with InnoDB.
First insert all the data into your InnoDB table, then use a trigger to copy the data into the MEMORY table.
If you want to empty the data in the InnoDB table, you can do that with the same trigger.
You can also achieve this using events.

Inserting New Column in MySQL taking too long

We have a huge database and inserting a new column is taking too long. Anyway to speed up things?
Unfortunately, there's probably not much you can do. When adding a new column, MySQL makes a full copy of the table with the new definition and copies the existing data into it. You may find it faster to do
CREATE TABLE new_table LIKE old_table;
ALTER TABLE new_table ADD COLUMN (column definition);
INSERT INTO new_table(old columns) SELECT * FROM old_table;
RENAME TABLE old_table TO tmp, new_table TO old_table;
DROP TABLE tmp;
This hasn't been my experience, but I've heard others have had success. You could also try disabling indices on new_table before the insert and re-enabling later. Note that in this case, you need to be careful not to lose any data which may be inserted into old_table during the transition.
Alternatively, if your concern is impacting users during the change, check out pt-online-schema-change which makes clever use of triggers to execute ALTER TABLE statements while keeping the table being modified available. (Note that this won't speed up the process however.)
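The invocation looks roughly like this (database, table, and column names here are made up; check the pt-online-schema-change documentation for the exact options in your version):
pt-online-schema-change --alter "ADD COLUMN new_col INT" D=mydb,t=huge_table --execute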
There are four main things that you can do to make this faster:
If using innodb_file_per_table the original table may be highly fragmented in the filesystem, so you can try defragmenting it first.
Make the buffer pool as big as sensible, so more of the data, particularly the secondary indexes, fits in it.
Make innodb_io_capacity high enough, perhaps higher than usual, so that insert buffer merging and flushing of modified pages will happen more quickly. Requires MySQL 5.1 with InnoDB plugin or 5.5 and later.
MySQL 5.1 with the InnoDB plugin and MySQL 5.5 and later support fast alter table. One of the things it makes a lot faster is adding or rebuilding indexes that are both not unique and not in a foreign key. So you can do this (sketched below):
A. ALTER TABLE to ADD your column and DROP your non-unique indexes that aren't in FKs.
B. ALTER TABLE to ADD back your non-unique, non-FK indexes.
This should provide these benefits:
a. Less use of the buffer pool during step A because the buffer pool will only need to hold some of the indexes, the ones that are unique or in FKs. Indexes are randomly updated during this step so performance becomes much worse if they don't fully fit in the buffer pool. So more chance of your rebuild staying fast.
b. The fast alter table rebuilds the index by sorting the entries then building the index. This is faster and also produces an index with a higher page fill factor, so it'll be smaller and faster to start with.
The main disadvantage is that this is in two steps and after the first one you won't have some indexes that may be required for good performance. If that is a problem you can try the copy to a new table approach, using just the unique and FK indexes at first for the new table, then adding the non-unique ones later.
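A sketch of steps A and B with made-up column and index names (the real statements depend on your schema):
-- Step A: add the column and drop the non-unique, non-FK indexes in one ALTER
ALTER TABLE huge_table
  ADD COLUMN new_col INT,
  DROP INDEX idx_col_a,
  DROP INDEX idx_col_b;

-- Step B: rebuild those indexes afterwards; fast index creation sorts the
-- entries first, so the rebuilt indexes are also more compact
ALTER TABLE huge_table
  ADD INDEX idx_col_a (col_a),
  ADD INDEX idx_col_b (col_b);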
It's only in MySQL 5.6, but the feature request in http://bugs.mysql.com/bug.php?id=59214 increases the speed with which insert buffer changes are flushed to disk and limits how much space the insert buffer can take in the buffer pool. This can be a performance limit for big jobs. The insert buffer is used to cache changes to secondary index pages.
We know that this is still frustratingly slow sometimes and that a true online alter table is very highly desirable.
This is my personal opinion. For an official Oracle view, contact an Oracle public relations person.
James Day, MySQL Senior Principal Support Engineer, Oracle
Usually a slow column insert means that there are many indexes, so I would suggest reconsidering your indexing.
Michael's solution may speed things up a bit, but perhaps you should have a look at the database and try to break the big table into smaller ones. Normalizing your database tables may save you loads of time in the future.

Problematic performance with continuous UPDATE / INSERT in MySQL

Currently we have a database and a script which runs 2 UPDATEs, 1 SELECT, and 1 INSERT.
The problem is that we have 20,000 people who run this script every hour, which causes MySQL to run at 100% CPU.
The INSERT is for logging: we want to log all the data to MySQL, but as the table scales up, the application becomes slower and slower. We are running on InnoDB, but some people say it should be MyISAM. Which should we use? We only pull data out of this log table for statistical purposes, 40-50 times a day.
Our solution is to use Gearman (http://gearman.org/) to delay the inserts to the database. But what about the updates?
We need to update 2 tables: one for the customer, to update the balance (balance = balance - 1), and the other to update a count in another table.
How should we make this faster and more CPU efficient?
Thank you
but as the table scales up, the application becomes slower and slower
This usually means that you're missing an index somewhere.
MyISAM is not good: in addition to being non-ACID-compliant, it locks the whole table to do an insert, which kills concurrency.
Read the MySQL documentation carefully:
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
Especially "innodb_flush_log_at_trx_commit" -
http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html
I would stay away from MyISAM as it has concurrency issues when mixing SELECT and INSERT statements. If you can keep your insert tables small enough to stay in memory, they'll go much faster. Batching your updates in a transaction will help them go faster as well. Setting up a test environment and tuning for your actual job is important.
You may also want to look into partitioning to rotate your logs. You'd drop the old partition and create a new one for the current data. This is much faster than deleting the old rows.
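A rough sketch of that rotation, with made-up table, column, and partition names; note that for a partitioned InnoDB table the partitioning column must be part of every unique key, hence the composite primary key:
CREATE TABLE request_log (
  id        BIGINT NOT NULL AUTO_INCREMENT,
  logged_at DATE   NOT NULL,
  message   VARCHAR(255),
  PRIMARY KEY (id, logged_at)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(logged_at)) (
  PARTITION p2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
  PARTITION p2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01'))
);

-- Rotate: drop the oldest month and add the next one
ALTER TABLE request_log DROP PARTITION p2024_01;
ALTER TABLE request_log ADD PARTITION
  (PARTITION p2024_03 VALUES LESS THAN (TO_DAYS('2024-04-01')));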

How to improve MySQL INSERT and UPDATE performance?

Performance of INSERT and UPDATE statements in our database seems to be degrading and causing poor performance in our web app.
Tables are InnoDB and the application uses transactions. Are there any easy tweaks that I can make to speed things up?
I think we might be seeing some locking issues, how can I find out?
You could change the settings to speed InnoDB inserts up.
And even more ways to speed up InnoDB
...and one more optimization article
INSERT and UPDATE get progressively slower as the number of rows increases on a table with an index. InnoDB tables are even slower than MyISAM tables for inserts, and the delayed key write option is not available.
The most effective way to speed things up would be to save the data first into a flat file and then do LOAD DATA INFILE; this is about 20x faster.
The second option would be to create a temporary in-memory table, load the data into it, and then do an INSERT ... SELECT in batches. That is, once you have about 100 rows in your temp table, load them into the permanent one.
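A rough sketch of that staging approach, with made-up table and column names:
CREATE TEMPORARY TABLE staging (id INT, val VARCHAR(100)) ENGINE=MEMORY;

INSERT INTO staging (id, val) VALUES (1, 'a'), (2, 'b');  -- ...up to ~100 rows
INSERT INTO permanent_table (id, val) SELECT id, val FROM staging;  -- flush the batch
TRUNCATE TABLE staging;  -- empty the staging table for the next batch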
Additionally, you can get a small improvement in speed by moving the index file onto a separate physical hard drive from the one where the data file is stored. Also try to move any binlogs onto a different device. The same applies to the temporary file location.
I would try setting your tables to delay index updates (note that DELAY_KEY_WRITE applies only to MyISAM tables):
ALTER TABLE {name} DELAY_KEY_WRITE = 1;
If your update queries are not using indexes, adding appropriate indexes can help improve their performance.
I would not look at locking/blocking unless the number of concurrent users has been increasing over time.
If the performance gradually degraded over time I would look at the query plans with the EXPLAIN statement.
It would be helpful to have the results of these from the development or initial production environment, for comparison purposes.
Dropping or adding an index may be needed, or some other maintenance action specified in other posts.
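For example, to see whether an UPDATE's WHERE clause can use an index, EXPLAIN the equivalent SELECT (EXPLAIN for UPDATE/DELETE only arrived in MySQL 5.6; the table and column names below are made up):
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- type = ref/range with a non-NULL key means an index is being used;
-- type = ALL means a full table scan, so an index on customer_id may be needed:
-- ALTER TABLE orders ADD INDEX idx_customer_id (customer_id);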