Rebuild InnoDB Index ONLINE to prevent timeouts? - mysql

When I update a particularly large table, the update times out because the table is locked while the indexes rebuild. Is there any way to rebuild the index ONLINE (as Oracle can) so the update does not time out?

Simple answer: No, there is no way.
More complex answer: You can emulate online index addition by using statement-based replication: add the index on the slave first, then promote the slave to master. This is why people use packages like http://mysql-mmm.org/.
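A rough sketch of that sequence (table and column names are placeholders; the promotion step depends entirely on your replication setup):

-- 1. On the slave, build the index; replication keeps applying the master's writes meanwhile.
ALTER TABLE my_table ADD INDEX idx_col1_col2 (col1, col2);
-- 2. Promote the slave to master (e.g. via MMM), then repeat the ALTER
--    on the demoted old master so both servers end up with the index.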

OPTIMIZE TABLE rebuilds indexes online
OPTIMIZE TABLE uses online DDL for regular and partitioned InnoDB tables, which reduces downtime for concurrent DML operations. The table rebuild triggered by OPTIMIZE TABLE and performed under the cover by ALTER TABLE ... FORCE is completed in place. An exclusive table lock is only taken briefly during the prepare phase and the commit phase of the operation. During the prepare phase, metadata is updated and an intermediate table is created. During the commit phase, table metadata changes are committed.
Source: https://dev.mysql.com/doc/refman/8.0/en/optimize-table.html
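Since OPTIMIZE TABLE itself takes no ALGORITHM or LOCK clause, one way to run the equivalent rebuild while asserting non-blocking behavior is the explicit ALTER form; a sketch, with mytable as a placeholder:

-- Fails immediately instead of falling back to a blocking table copy
-- (it will refuse, for example, if the table has a FULLTEXT index).
ALTER TABLE mytable FORCE, ALGORITHM=INPLACE, LOCK=NONE;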
But! I've heard a lot that you should not rebuild indexes in InnoDB because they are always kept up to date. Google a bit about that.

pt-online-schema-change can be used to optimize a table. OPTIMIZE TABLE is effectively a no-op ALTER TABLE.
pt-online-schema-change --alter "ENGINE=InnoDB" D=sakila,t=actor --execute
(The tool does nothing unless you pass --execute; use --dry-run to test first.)

Related

Is it possible to run OPTIMIZE TABLE without having replication lag/downtime?

I had a table with 100,000,000 records and 500GB of data. I have been backing up a lot of the older records into a backup DB and deleting them from the main DB. However, the disk space hasn't decreased, and I noticed the data_free value has grown a lot for that table.
My understanding is that I need to run OPTIMIZE TABLE to reduce the disk size, but I have read this causes replication lag. I am using MySQL 5.7 InnoDB.
So my question is, can I run OPTIMIZE TABLE without causing replication lag? For example, running OPTIMIZE TABLE on the master like this:
OPTIMIZE NO_WRITE_TO_BINLOG TABLE tblname;
Then run the same command on the slaves one by one. Would that work? Are there some risks in doing that? Or is there any other way?
At my company we use Percona's free tool pt-online-schema-change.
It doesn't literally do an OPTIMIZE TABLE, but for InnoDB tables, any table-copy operation will accomplish the same result. That is, it makes a new InnoDB tablespace, copies all the rows to that tablespace, and rebuilds all the indexes for that table. The new tablespace will be a defragmented version of the original tablespace.
Any ALTER will work; you don't have to change anything in the table. I use the no-op ALTER TABLE <name> FORCE.
The advantage of pt-online-schema-change is that while it's working, you can continue to read and write the table. It only needs a brief metadata lock to create triggers as it starts, and another brief metadata lock at the end to swap the new table for the old.
If you use OPTIMIZE TABLE, this causes long replication lag, because it won't start running on the replica until after it's finished on the source.
Whereas with pt-online-schema-change, the table copy starts replicating immediately and proceeds alongside other concurrent transactions, so when it's done on the source, it's only a moment until it's also done on the replica.
It actually takes longer than OPTIMIZE TABLE, but since it doesn't prevent you from using the table, that doesn't matter as much.
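For reference, an invocation in the same style as the earlier example; mydb and big_table are placeholders, and --max-lag/--check-interval are standard pt-online-schema-change options that pause the row copy whenever replicas fall too far behind:

pt-online-schema-change --alter "ENGINE=InnoDB" D=mydb,t=big_table \
  --max-lag 5 --check-interval 10 --execute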
I ended up testing this locally by setting up a replication environment.
It seems possible to run OPTIMIZE TABLE tblname; without causing any downtime or replication lag.
You need to run OPTIMIZE NO_WRITE_TO_BINLOG TABLE tblname; on master, to avoid writing to the bin logs and replicating the query to the slaves.
Then you have to run OPTIMIZE TABLE tblname; individually in every slave.
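Putting the whole sequence together (tblname as above; the SHOW SLAVE STATUS check between slaves is my own extra safeguard, not part of the original steps):

-- On the master: skip the binary log so the statement is not replicated.
OPTIMIZE NO_WRITE_TO_BINLOG TABLE tblname;
-- On each slave, one at a time:
OPTIMIZE TABLE tblname;
-- Before moving to the next slave, confirm this one has caught up
-- (Seconds_Behind_Master should be back to 0):
SHOW SLAVE STATUS\G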
Here is more detailed explanation of what happens: https://dev.mysql.com/doc/refman/5.7/en/optimize-table.html#optimize-table-innodb-details
It says:
an exclusive table lock is only taken briefly during the prepare phase and the commit phase of the operation.
So there is almost no lock time.
There are edge cases that could cause downtime (when the server falls back to the table-copy method instead of online DDL, the table is locked); some of those are listed in the link above.
Another thing to consider is disk space. With InnoDB I observed that it recreates the table, so if the contents of your table add up to 100GB, you would need at least an extra 100GB of free space to run the command successfully.
As Bill suggested, it may be safer to use pt-online-schema-change; however, if you can't use it, careful operation makes it possible to avoid replication lag and downtime.

Analyze + Optimize on InnoDB Tables

Back when I was working heavily with MyISAM tables, I always had a cron job which ran
~# mysqlanalyze -o database
I know that MyISAM benefits from this in certain ways, e.g. fragmentation and whatnot.
Now, when running the same command on a database where the majority of tables are InnoDB, I wonder if this "does any good" to the tables and whether it's considered good practice to do so every now and then, or if it's rather counterproductive. I keep reading a lot of:
Table does not support optimize, doing recreate + analyze instead
which sounds expensive with regard to disk I/O / CPU time?!
I would appreciate some input on this.
https://dev.mysql.com/doc/refman/8.0/en/optimize-table.html says:
For InnoDB tables, OPTIMIZE TABLE is mapped to ALTER TABLE ... FORCE, which rebuilds the table to update index statistics and free unused space in the clustered index.
This does do some good in cases when you had too much fragmentation. Pages will be filled more efficiently, indexes will be rebuilt, and disk space occupied by the table will be reduced if you use innodb_file_per_table (which is the default in recent versions).
It does take time, depending on the size of your table. It will lock the table while it's running. It will require extra disk space while it's running, as it creates a copy of the table.
Doing optimize table on an InnoDB table is usually not necessary to do frequently, but only after you do a lot of insert/update/delete against the table in a way that could result in fragmentation.
ANALYZE TABLE has much less impact for InnoDB. It doesn't require building a copy of the table. It's a read-only action: it just reads a random sample of pages from the table and uses that to estimate the number of rows and the average size of rows, and it updates statistics about the indexes to guide the query optimizer. This is safe to run anytime; it will lock the table for a moment, but the lock time doesn't grow with the size of the table.
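In practice (mydb.mytable is a placeholder):

-- Refresh the optimizer statistics; the brief lock does not grow with table size.
ANALYZE TABLE mydb.mytable;
-- Inspect the refreshed estimates (see the Cardinality column):
SHOW INDEX FROM mydb.mytable;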
Don't bother. InnoDB almost never needs either ANALYZE or OPTIMIZE; don't waste your time unless you have identified a need.
An exception is a FULLTEXT index on an InnoDB table. Such can benefit from DROP INDEX, then ADD INDEX.
If you are "reloading" the table from new data, then the following avoids downtime:
CREATE TABLE `new` LIKE `real`;
-- load `new` here (e.g. INSERT ... SELECT or LOAD DATA INFILE)
RENAME TABLE `real` TO `old`, `new` TO `real`; -- fast, atomic
DROP TABLE `old`;
(Backticks because NEW and REAL are reserved words in MySQL.)
(Caveat: The above technique probably has issues if there are FOREIGN KEYS.)

Create index locks MySQL 5.6 table. How to avoid that?

I need to create an index on a large InnoDB production table and want to do this without locking the table in any way. I am using MySQL 5.6 (.38-83.90).
I tried
create index my_index on my_table(col1, col2);
Neither column is a primary key. col1 is a foreign key.
Well, this totally locked the table. Other queries were stalled with "Waiting for table metadata lock" bringing my website to its knees. I had to kill the create index query.
From this https://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-overview.html I thought that it would not lock the table: "... no syntax changes are required... The table remains available for read and write operations while the index is being created or dropped."
I see that I can set LOCK=NONE or LOCK=SHARED, but I don't see that it should be necessary or, if it is, which one I need to use.
"You can specify LOCK=NONE to assert that concurrent DML is permitted during the DDL operation. MySQL automatically permits concurrent DML when possible."
"You can specify LOCK=SHARED to assert that concurrent queries are permitted during a DDL operation. MySQL automatically permits concurrent queries when possible."
None of the limitations https://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-limitations.html seem to apply to my case.
What am I missing?
My guess (just a guess) is that you are missing the ALGORITHM=INPLACE clause on the CREATE INDEX statement.
CREATE INDEX my_index ON my_table(col1, col2) ALGORITHM=INPLACE ;
                                              ^^^^^^^^^^^^^^^^^
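If you want the server to refuse outright rather than fall back to a blocking build, the ALTER TABLE form accepts explicit ALGORITHM and LOCK clauses (a sketch reusing the question's names):

-- Fails immediately if an in-place, non-blocking build is not possible.
ALTER TABLE my_table ADD INDEX my_index (col1, col2), ALGORITHM=INPLACE, LOCK=NONE;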
Also be aware of transactions acquiring and holding metadata locks.
https://dev.mysql.com/doc/refman/5.6/en/metadata-locking.html
Any transaction that has referenced my_table will continue to hold a metadata lock on that table until the transaction is committed or rolled back. I suggest checking the TRANSACTIONS section of SHOW ENGINE INNODB STATUS output.
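A quick way to spot such transactions on 5.6 is to query information_schema (standard columns, shown here as a sketch):

-- Oldest transactions are listed first; trx_mysql_thread_id is the
-- connection id you would pass to KILL if necessary.
SELECT trx_id, trx_started, trx_mysql_thread_id, trx_query
  FROM information_schema.innodb_trx
 ORDER BY trx_started;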

Database Table very slow after delete

I have a MySQL InnoDB database running on Google App Engine.
One of the tables has the current date and a user_id as primary key stored with some additional data.
The table had around 7 million rows and I deleted 6 million of them with a DELETE query. Since then, any query using this table has been much slower than before.
Any ideas what could cause this behavior or how to solve this?
Thanks in advance!
After such a massive delete on InnoDB, you had better use the OPTIMIZE TABLE statement.
Use OPTIMIZE TABLE in these cases, depending on the type of table:
After doing substantial insert, update, or delete operations on an InnoDB table that has its own .ibd file because it was created with the innodb_file_per_table option enabled. The table and indexes are reorganized, and disk space can be reclaimed for use by the operating system.
After doing substantial insert, update, or delete operations on columns that are part of a FULLTEXT index in an InnoDB table. Set the configuration option innodb_optimize_fulltext_only=1 first. To keep the index maintenance period to a reasonable time, set the innodb_ft_num_word_optimize option to specify how many words to update in the search index, and run a sequence of OPTIMIZE TABLE statements until the search index is fully updated.
Before optimizing, check the table's state using ANALYZE TABLE and its indexes using SHOW INDEX. These statements will provide you with information regarding the "flaws" that OPTIMIZE can fix.
All this is easy to do in phpMyAdmin.
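A sketch of the FULLTEXT case quoted above (database and table names are placeholders; both variables are the standard server options named in the quote):

SET GLOBAL innodb_optimize_fulltext_only = 1;
SET GLOBAL innodb_ft_num_word_optimize = 2000;
OPTIMIZE TABLE mydb.articles; -- repeat until the search index is fully updated
SET GLOBAL innodb_optimize_fulltext_only = 0;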

Optimize mysql table to avoid locking

How do I optimize MySQL tables so that they don't use locking? Can I ALTER a table to 'turn off' locking all the time?
Situation:
I have an app which uses a database of 15M records. Once a week, scripts do some tasks (insert/update/delete) for 20 hours while app servers feed data to the front end (web server), and that is fine; I see very little performance loss during that time.
Problem:
Once a month I need to optimize the table. Since there is a huge number of records, it takes 1-2 hours to finish this task (starting the optimize from the MySQL command line or phpMyAdmin, same result), and during that period MySQL DOESN'T SERVE data to the front end (I suppose it is about the tables being locked for the optimize).
Question:
So how do I optimize the tables while avoiding locking? Since there is only reading of data (no inserts or updates), I suppose 'unlocking' during the optimize can't do any damage in this case?
If your table engine is InnoDB and your MySQL version is 5.6.17 or later, the lock won't happen. Actually, there will be a lock, but only for a VERY short period.
Prior to MySQL 5.6.17, OPTIMIZE TABLE does not use online DDL. Consequently, concurrent DML (INSERT, UPDATE, DELETE) is not permitted on a table while OPTIMIZE TABLE is running, and secondary indexes are not created as efficiently.
As of MySQL 5.6.17, OPTIMIZE TABLE uses online DDL for regular and partitioned InnoDB tables. The table rebuild, triggered by OPTIMIZE TABLE and performed under the cover by ALTER TABLE ... FORCE, is performed in place and only locks the table for a brief interval, which reduces downtime for concurrent DML operations.
Optimize Tables Official Ref.
Just be sure to prepare free space greater than the space currently occupied by your table, because a whole-table copy can happen for the index rebuild.
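To size that up beforehand, the usual place to look is information_schema (mydb/mytable are placeholders; data_free is the same counter the earlier question mentioned):

SELECT table_name,
       ROUND(data_length / 1024 / 1024)  AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb,
       ROUND(data_free / 1024 / 1024)    AS free_mb
  FROM information_schema.tables
 WHERE table_schema = 'mydb' AND table_name = 'mytable';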