Changing Large MySQL InnoDB Tables

Adding a new column or a new index can take hours or even days for large InnoDB tables in MySQL with more than 10 million rows. What is the best way to increase performance on large InnoDB tables in these two cases? More memory, tweaking the configuration (for example, increasing sort_buffer_size or innodb_buffer_pool_size), or some kind of trick? Instead of altering a table directly, one could create a new one, change it, and copy the old data into the new one, like this (a technique that is useful for MyISAM tables and for making multiple changes at once):
-- Create an empty copy of the table and apply the change to it:
CREATE TABLE tablename_tmp LIKE tablename;
ALTER TABLE tablename_tmp ADD fieldname fieldtype;
-- Copy all existing rows into the altered copy:
INSERT INTO tablename_tmp SELECT * FROM tablename;
-- Swap the tables:
ALTER TABLE tablename RENAME tablename_old;
ALTER TABLE tablename_tmp RENAME tablename;
Is this advisable for InnoDB tables too, or is it just what the ALTER TABLE command does anyway?

Edit 2016: we recently (August 2016) released gh-ost, and I am modifying my answer to reflect it.
Today there are several tools which allow you to do an online ALTER TABLE for MySQL. These are:
- gh-ost: GitHub's triggerless schema migration tool, added in the 2016 edit (disclaimer: I am the author of this tool)
- oak-online-alter-table, part of the openark-kit (disclaimer: I am the author of this tool)
- pt-online-schema-change, part of the Percona Toolkit
- Facebook's online schema change for MySQL
Let's consider the "normal" `ALTER TABLE`:
A large table will take a long time to ALTER. innodb_buffer_pool_size is important, and so are other variables, but on a very large table their effect is negligible. It just takes time.
What MySQL does to ALTER a table is create a new table with the new format, copy all rows, then swap the two. During this time the table is locked for writes: reads are still served, but writes are stalled until the new table is ready.
Consider your own suggestion:
It will most probably perform worst of all the options. Why is that? Because you're using an InnoDB table, the INSERT INTO tablename_tmp SELECT * FROM tablename runs as a single transaction, and a huge one at that. It will create even more load than the normal ALTER TABLE.
Moreover, you would have to shut down your application at that time so that it does not write (INSERT, DELETE, UPDATE) to your table. If it does, the copy misses those writes and your whole transaction is pointless.
What the online tools provide
The tools do not all work alike. However, the basics are shared (see the sketch after this list):
- They create a "shadow" table with the altered schema.
- They create and use triggers to propagate changes from the original table to the shadow table.
- They slowly copy all the rows from your table to the shadow table. They do so in chunks: say, 1,000 rows at a time.
- They do all of the above while you are still able to access and manipulate the original table.
- When satisfied, they swap the two, using a RENAME.
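To make the mechanics concrete, here is a minimal hand-rolled sketch of the shadow-table approach; the table, column, and trigger names are hypothetical, and real tools handle throttling, cut-over locking, and many edge cases this sketch ignores:
-- 1. Create the shadow table and apply the schema change to it.
CREATE TABLE tablename_shadow LIKE tablename;
ALTER TABLE tablename_shadow ADD COLUMN fieldname INT NULL;
-- 2. Propagate ongoing changes from the original table via triggers.
CREATE TRIGGER tablename_ai AFTER INSERT ON tablename FOR EACH ROW
  REPLACE INTO tablename_shadow (id, col1) VALUES (NEW.id, NEW.col1);
CREATE TRIGGER tablename_au AFTER UPDATE ON tablename FOR EACH ROW
  REPLACE INTO tablename_shadow (id, col1) VALUES (NEW.id, NEW.col1);
CREATE TRIGGER tablename_ad AFTER DELETE ON tablename FOR EACH ROW
  DELETE FROM tablename_shadow WHERE id = OLD.id;
-- 3. Copy existing rows in small chunks, advancing the key range each pass.
INSERT IGNORE INTO tablename_shadow (id, col1)
  SELECT id, col1 FROM tablename WHERE id BETWEEN 1 AND 1000;
-- 4. When the copy is done, swap the tables atomically.
RENAME TABLE tablename TO tablename_old, tablename_shadow TO tablename;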
The openark-kit tool has been in use for 3.5 years now. The Percona tool is a few months old, but possibly better tested than the former. Facebook's tool is said to work well for Facebook, but does not provide a general solution for the average user. I haven't used it myself.
Edit 2016: gh-ost is a triggerless solution, which significantly reduces the write load on the master and decouples the migration write load from the normal load. It is auditable, controllable, and testable. We developed it internally at GitHub and released it as open source; we do all our production migrations via gh-ost today.
Each tool has its own limitations; look closely at its documentation.
The conservative way
The conservative way is to use an Active-Passive Master-Master replication, do the ALTER on the standby (passive) server, then switch roles and do the ALTER again on what used to be the active server, now turned passive. This is also a good option, but requires an additional server, and deeper knowledge of replication.
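A rough sketch of that flow, assuming classic master-master replication and a hypothetical table; the role-switch mechanics vary widely by setup and are omitted here:
-- On the passive master, run the ALTER without writing it to the binary
-- log, so the statement does not replicate to the active master:
SET sql_log_bin = 0;
ALTER TABLE tablename ADD COLUMN fieldname INT NULL;
SET sql_log_bin = 1;
-- Let replication catch up, switch roles so this server becomes active,
-- then repeat the same steps on the former active (now passive) server.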

Rename screws up referenced tables.
If you have, say, a table_2 which is a child of tablename, then on ALTER TABLE tablename RENAME tablename_old;
table_2 will start pointing to tablename_old.
Now, without altering table_2, you cannot point it back to tablename. You have to keep making alters in every child and referencing table.
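A minimal demonstration of the problem, with hypothetical table names; the child's foreign key definition silently follows the parent through the rename:
CREATE TABLE tablename (id INT PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE table_2 (
  id INT PRIMARY KEY,
  parent_id INT,
  FOREIGN KEY (parent_id) REFERENCES tablename (id)
) ENGINE=InnoDB;
ALTER TABLE tablename RENAME tablename_old;
-- SHOW CREATE TABLE table_2 now reports:
--   FOREIGN KEY (`parent_id`) REFERENCES `tablename_old` (`id`)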

Related

ALTER TABLE table_name ENGINE= / mysql_convert_table_format: which is better?

I have big InnoDB databases in MariaDB (> 1T each). I want to convert them to MyISAM. I can google two methods:
https://dev.mysql.com/doc/refman/5.0/en/mysql-convert-table-format.html
http://dev.mysql.com/doc/refman/5.6/en/converting-tables-to-innodb.html
The second one is for converting to InnoDB, but the argument can be changed to MyISAM.
Which one is more efficient (I've installed the Perl modules that mysql_convert_table_format needs)? Thank you!
mysql_convert_table_format is just a tool that uses ALTER TABLE to do the job. It provides a nice interface and help output, but other than that it doesn't do anything special.
You can use your favorite MySQL client program to connect to the database and run:
ALTER TABLE table_name ENGINE='MyISAM'
to change the table format (this is what mysql_convert_table_format also does, in the end).
Warning!
While the table format is being changed, the table will be locked, i.e. inaccessible to other queries. This may translate to your site not responding, behaving strangely, or even being offline.
Other option
pt-online-schema-change, part of the Percona Toolkit, is a tool that aims to change the definition of a large table while keeping it online. It creates a new table with the new structure and then intelligently copies the data from the old table to the new one in chunks.
I haven't tried it, but I think it can help you.

Which is the better way to change the character set for huge data tables?

In my production database, the Alerts-related tables were created with a default charset of latin1; because of this, we get errors when we try to insert Japanese characters into them. We need to change the default charset of the tables and columns to UTF8.
As these tables hold a huge amount of data, the ALTER command might take a very long time (it took 5 hours in my local DB with the same amount of data) and lock the table, which could cause data loss. Can we plan a mechanism to change the charset to UTF8 without data loss?
Which is the better way to change the charset for huge data tables?
I found this in the MySQL manual, http://dev.mysql.com/doc/refman/5.1/en/alter-table.html:
In most cases, ALTER TABLE makes a temporary copy of the original table. MySQL waits for other operations that are modifying the table, then proceeds. It incorporates the alteration into the copy, deletes the original table, and renames the new one. While ALTER TABLE is executing, the original table is readable by other sessions. Updates and writes to the table that begin after the ALTER TABLE operation begins are stalled until the new table is ready, then are automatically redirected to the new table without any failed updates.
So yes, it's tricky to minimize downtime while doing this. It depends on the usage profile of your table: are there more reads or writes?
One approach I can think of is to use some sort of replication: create a new Alerts table that uses UTF-8, and find a way to replicate the original table to the new one without affecting availability or throughput. When the replication is complete (or close enough), switch the tables by renaming them.
Of course, this is easier said than done; it needs more research to establish whether it's even possible.
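For reference, the direct (blocking) conversion is a single statement, and a hand-rolled version of the copy-and-swap idea above might look like the following; table names are hypothetical, and writes arriving during the copy would still need to be handled separately:
-- Direct conversion: rewrites the whole table and blocks writes meanwhile.
ALTER TABLE alerts CONVERT TO CHARACTER SET utf8;
-- Copy-and-swap alternative:
CREATE TABLE alerts_utf8 LIKE alerts;
ALTER TABLE alerts_utf8 CONVERT TO CHARACTER SET utf8;  -- empty, so instant
INSERT INTO alerts_utf8 SELECT * FROM alerts;           -- ideally in chunks
RENAME TABLE alerts TO alerts_old, alerts_utf8 TO alerts;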
You may take a look at the pt-online-schema-change tool from the Percona Toolkit:
pt-online-schema-change
It does exactly this, "alters a table's structure without blocking reads or writes", with some limitations (only InnoDB tables, etc.) and risks involved.
Create a replicated copy of your database on another machine or instance. Once replication is set up, issue a STOP SLAVE command and alter the table. If you have more than one table, consider issuing START SLAVE between each conversion to resynchronize the two databases (if you do not, it may take longer to synchronize). When you complete the conversion, the replicated copy can replace your old production database, and you can remove the old one. This is the way I found to minimize downtime.
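A minimal sketch of that sequence on the replica, with a hypothetical table name:
STOP SLAVE;
ALTER TABLE alerts CONVERT TO CHARACTER SET utf8;
START SLAVE;  -- let the replica catch up before converting the next table
-- Repeat per table; when all are converted and the replica has caught up,
-- promote the replica to be the new production database.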

Converting a big MyISAM to InnoDB

I'm trying to convert a 10-million-row MySQL MyISAM table into InnoDB.
I tried ALTER TABLE, but that made my server get stuck, so I killed the MySQL process manually. What is the recommended way to do this?
Options I've thought about:
1. Making a new table which is InnoDB and inserting parts of the data each time.
2. Dumping the table into a text file and then doing LOAD DATA INFILE.
3. Trying again and just keeping the server non-responsive until it finishes (I tried for 2 hours, and it's a production server, so I would prefer to keep it running).
4. Duplicating the table, removing its indexes, then converting, and then adding the indexes back.
Changing the engine of a table requires a rewrite of the table, which is why the table is unavailable for so long. Removing the indexes, then converting, then adding the indexes back may speed up the initial conversion, but adding an index takes a read lock on the table, so the net effect will be the same.
Making a new table and transferring the data is the way to go (see the sketch below). Usually this is done in two parts: first copy the records, then replay any changes that were made while the records were being copied. If you can afford disabling inserts/updates on the table while leaving reads enabled, this is not a problem. If not, there are several possible solutions. One of them is to use Facebook's online schema change tool. Another option is to have the application write to both tables while the records are being migrated, then switch over to the new table only. This depends on the application code, and the crucial part is handling unique keys and duplicates, since in the old table you may update a record that in the new one you need to insert (the transaction isolation level may also play a crucial role here; lower it as much as you can). The "classic" way is to use replication, which, as far as I know, is also done in two parts: you start recording changes at a known master position, import a dump of the database into the second server, then start it as a slave to catch up with the changes.
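A minimal sketch of the copy phase, with hypothetical table and column names; the dual-write or replay part depends on the application and is only indicated in comments:
-- New table with the target engine and otherwise identical structure.
CREATE TABLE tablename_innodb LIKE tablename;
ALTER TABLE tablename_innodb ENGINE=InnoDB;
-- Copy in primary-key ranges to keep each transaction small;
-- repeat with advancing ranges until the whole table is covered.
INSERT INTO tablename_innodb
  SELECT * FROM tablename WHERE id >= 1 AND id < 10001;
-- Meanwhile the application writes to both tables (or changes are
-- replayed afterwards); once the copy is complete, swap atomically:
RENAME TABLE tablename TO tablename_myisam, tablename_innodb TO tablename;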
Have you tried ordering your data by the primary key first? E.g.:
ALTER TABLE tablename ORDER BY PK_column;
This should speed up the conversion.

Why does it take so long to rename a column in mysql?

I have a 12 GB table full of pictures, and I'm trying to rename the blob column that holds the data; it is taking forever. Can someone give me a blow-by-blow account of why it takes so long to rename the column? I would have thought this operation would be pretty quick, no matter the size of the table.
EDIT: The query I ran is as follows
alter table `rails_production`.`pictures` change `data` `image_file_data` mediumblob NULL
It appears that most of the time is spent waiting for MySQL to make a temporary copy of the pictures table, which, since the table is very large, takes a while.
Changing the picture storage from the database to the filesystem is on our list of things to do.
EDIT2: MySQL server version: 5.0.51a-24+lenny2 (Debian)
I can't give you the blow-by-blow (feature request #34354 would help, except that it probably wouldn't be back-ported to MySQL 5.0), but the extra time is due to the fact that an ALTER ... CHANGE may change the type of the column (and column attributes, if any), which necessitates converting the stored values and performing other checks. MySQL 5.0 doesn't include optimizations for the case where the new type and attributes are the same as the old. From the documentation for ALTER TABLE under MySQL 5.0:
In most cases, ALTER TABLE works by making a temporary copy of the original table. The alteration is performed on the copy, and then the original table is deleted and the new one is renamed. While ALTER TABLE is executing, the original table is readable by other sessions. Updates and writes to the table are stalled until the new table is ready, and then are automatically redirected to the new table without any failed updates.
[...]
If you use any option to ALTER TABLE other than RENAME, MySQL always creates a temporary table, even if the data wouldn't strictly need to be copied (such as when you change the name of a column).
Under 5.1, ALTER has some additional optimizations:
In some cases, no temporary table is necessary:
Alterations that modify only table metadata and not table data can be made immediately by altering the table's .frm file and not touching table contents. The following changes are fast alterations that can be made this way:
Renaming a column, except for the InnoDB storage engine.
[...]
Because MySQL will rebuild the entire table when you make schema changes.
This is done because in some cases it's the only way, and rebuilding in every case keeps the server logic much simpler anyway.
Yes, MySQL makes a temporary copy of the table. I don't think there's an easy way around that. You should really think about storing the pictures on the filesystem and only storing their paths in MySQL. That's the only way to speed it up, I guess.
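A hypothetical sketch of that migration on the schema side; the column names are assumptions, and writing the blobs out to disk is application work not shown here:
-- Add a path column, backfill it as the application writes each blob
-- to disk, and drop the blob column once the files are verified:
ALTER TABLE pictures ADD COLUMN image_file_path VARCHAR(255) NULL;
-- ... application copies each blob to the filesystem and sets the path ...
ALTER TABLE pictures DROP COLUMN image_file_data;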

How to update DB structure when updating production system without doing a teardown / rebuild

If I'm working on a development server and have updates to the database structure for some of our releases, what is the best way to update the structure on the production server?
Currently we create a new production database containing the structure only, do a SQL dump of the data on the 'old' production database, then run a SQL query to insert the data into the new database.
I know there is an easier way to do these updates, right?
Thanks in advance.
We don't run anything on prod without a script, and that script must be in source control. Additionally, we have to write a rollback script in case the initial script goes bad and we have to back it out. When we move to prod, configuration management does a differential compare between prod and dev to see if we have missed anything in the production script (any differences have to be traceable to development that is not yet ready to move to prod, and documented). A product like Red Gate's SQL Compare can do this. Our process is very formalized so that we can maintain a certification required by our larger clients.
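As a toy illustration of the script-plus-rollback discipline; the file, table, and column names are made up:
-- deploy_0042.sql: forward migration, kept in source control.
ALTER TABLE customers ADD COLUMN loyalty_tier TINYINT NOT NULL DEFAULT 0;
UPDATE customers SET loyalty_tier = 1 WHERE lifetime_spend > 1000;
-- rollback_0042.sql: written up front, in case the deploy goes bad.
ALTER TABLE customers DROP COLUMN loyalty_tier;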
If you have large tables, even ALTER TABLE can be slow, but it's still generally more efficient in total time than making a copy of the table with a new name and structure, copying the data into it, renaming the old table, giving the new table the name of the original, and then deleting the old table.
However, there are times when that is the preferable process, because the total downtime apparent to the user is just the time it takes to rename two tables (see the sketch below). That makes it a good approach for tables that are filled only from the backend, not from the application (if the application can update the tables, this practice is dangerous, as you may lose changes made while the tables were in transition). Which process to use depends a lot on the nature of the change you are making. Some changes should be done in a maintenance window where users are not allowed to access the database: for instance, if you are adding a new field with a default value to a table with 100,000,000 records, you are liable to lock users out of the table while the update happens. It is better to do this in single-user mode during off hours (and when the users have been told in advance that the database will not be available). Other changes take only milliseconds and can happen easily while users are logged in.
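For reference, MySQL can swap the two tables in one atomic statement, which keeps the user-visible window as small as possible; the table names here are hypothetical:
-- Build and populate the new table offline, then swap both names at once:
RENAME TABLE orders TO orders_old, orders_new TO orders;
-- Verify the application is happy, then clean up:
DROP TABLE orders_old;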
Look at ALTER TABLE to change the schema.
It might not be easier than your method, but it means less copying of the database.
This is actually quite a deep question. If the only changes you've made are to add some columns, then ALTER TABLE is probably sufficient. But if you're renaming or deleting columns, then ALTER statements may break various foreign key constraints. In addition, sometimes you need to make changes both to the database structure and to the data, which is pretty much unscriptable in a generic way.
Most likely the best way to automate this would be to write a simple script for each deployment (along with a script to roll back!), which is basically what some systems like Rails will do for you, I believe. Some scripts might be simple ALTER statements, some might temporarily disable foreign-key checking and triggers (see the fragment below), some might run update statements as well, and some might dump the db and rebuild it. I don't think there's a one-size-fits-all solution here, sorry :)
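A fragment of that kind of deployment script; FOREIGN_KEY_CHECKS is a real session variable, while the table and column names are made up:
-- Temporarily relax FK checking so the statements can run in any order.
SET FOREIGN_KEY_CHECKS = 0;
ALTER TABLE invoices DROP COLUMN legacy_code;
UPDATE invoices SET status = 'open' WHERE status IS NULL;
SET FOREIGN_KEY_CHECKS = 1;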
Use the ALTER TABLE command: http://dev.mysql.com/doc/refman/5.0/en/alter-table.html