Is it faster to alter multiple columns in the same query? (MySQL)

Is it any faster to add or drop multiple columns in one query, rather than executing a query for each column? For example, is this:
ALTER TABLE t2 DROP COLUMN c, DROP COLUMN d;
any faster than this?
ALTER TABLE t2 DROP COLUMN c;
ALTER TABLE t2 DROP COLUMN d;

Yes, it should be faster to run a single ALTER TABLE statement than two.
In my experience (with InnoDB on MySQL 5.1 and 5.5), MySQL doesn't modify the table "in place". MySQL actually creates a new table, as a copy of the old table with the specified modifications.
Two separate statements would require MySQL to do that copy operation twice.
With a single statement, you give MySQL the opportunity to make all the changes with just one copy operation. (I don't know the details of MySQL's internals, so it's possible MySQL still performs the copy more than once even for a single statement.)
Other storage engines (MyISAM et al.) may be handled differently.
I believe the InnoDB plugin and/or newer versions of InnoDB in MySQL (>5.5) have algorithms other than the "copy" method, at least for some changes, which allow the table to remain available for read queries while the ALTER TABLE is running. But I don't know all the details.
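For example, several different kinds of changes can be rolled into one statement, so any table copy happens at most once (the added and modified columns here are illustrative, not from the question):
ALTER TABLE t2
  DROP COLUMN c,
  DROP COLUMN d,
  ADD COLUMN e INT NULL,
  MODIFY COLUMN f VARCHAR(100);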

Yes, it's faster. You only have to make one call to the database API, and it only has to parse the query once.
However, for ALTER TABLE queries, performance usually isn't a concern. You shouldn't be doing this frequently, only when you redesign your schema.
But if your question were about UPDATE queries, for instance, it would probably be significant. E.g. you should do:
UPDATE table
SET col1 = foo, col2 = bar
WHERE <condition>;
rather than
UPDATE table
SET col1 = foo
WHERE <condition>;
UPDATE table
SET col2 = bar
WHERE <condition>;

Related

MySql - How to insert and on duplicate key update without explicitly specifying all non key columns

I have a table which was created as a SELECT * from a view (and then I added a PK).
I want to periodically update the table with all the data from the view.
I thought the best option is to do this using: INSERT INTO table_a SELECT * FROM view_a ON DUPLICATE KEY UPDATE non_key_col_1 = VALUES(non_key_col_1), non_key_col_2 = VALUES(non_key_col_2), ... ;
Since there are quite a lot of columns, and they might change in the future (in which case I can re-create the table, but I'd rather not have to edit the periodic insert as well), I was wondering if there is a way to avoid explicitly specifying all the columns.
There is no such syntax in MySQL, unfortunately. You'll have to write out the update for each column, one by one.
You could go with a trigger on the insert operation, that is: if the primary key exists, update the row, otherwise insert it. But that is definitely going to hurt performance with large amounts of data.
One thing I can think of is to get the column names from INFORMATION_SCHEMA.COLUMNS and use those to dynamically compose your query in your app (see the sketch after this answer).
SELECT * FROM information_schema.columns WHERE table_name = 'view_a';
Now you have the columns, regardless of whether the view changes.
Do the same for the table, and you have the column differences.
Use those differences to run ALTER TABLE statements, or drop the table and recreate it altogether.
Of course, this is probably even more laborious than dropping and recreating the table manually.
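As a minimal sketch of the dynamic-composition idea (table_a and view_a come from the question; the variable name and everything else are illustrative), the ON DUPLICATE KEY UPDATE clause can even be generated and executed entirely in SQL with a prepared statement:
SET SESSION group_concat_max_len = 1000000; -- the generated clause can get long
SELECT CONCAT(
    'INSERT INTO table_a SELECT * FROM view_a ON DUPLICATE KEY UPDATE ',
    GROUP_CONCAT(CONCAT('`', column_name, '` = VALUES(`', column_name, '`)'))
  )
INTO @upsert_sql
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND table_name = 'table_a'
  AND column_key <> 'PRI'; -- skip the primary key column(s)
PREPARE stmt FROM @upsert_sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;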

Table traversing with multiple operations in ALTER TABLE

Some databases, like MySQL [1] and PostgreSQL [2], support bundling several compatible operations into a single ALTER TABLE statement (as non-standard SQL).
For example we can have:
ALTER TABLE `my_table`
DROP COLUMN `column_1`,
DROP COLUMN `column_2`,
...
or
ALTER TABLE `my_table`
MODIFY `column_1` ... ,
MODIFY `column_2` ... ,
instead of having individual statements:
ALTER TABLE `my_table` DROP COLUMN `column_1`;
ALTER TABLE `my_table` DROP COLUMN `column_2`;
or
ALTER TABLE `my_table` MODIFY `column_1` ... ;
ALTER TABLE `my_table` MODIFY `column_2` ... ;
etc
For comparison, PostgreSQL [2], which implements the same feature, will perform all operations in a single scan:
The main reason for providing the option to specify multiple changes in a single ALTER TABLE is that multiple table scans or rewrites can thereby be combined into a single pass over the table.
Although for DROP COLUMN specifically it will often not even need to do that:
The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations...
Questions:
Would the multi-column statement result in traversing all the rows just once and performing all changes needed?
How does MySQL actually perform DROP COLUMN? Does it also "hide" the columns first, or does it delete the data straight away?
Assumptions:
Using InnoDB
No indexes/complex defaults are involved in any of the columns we want to change/drop (so basically changes that would not require a temporary table when run as individual alter statements)
References:
[1] MySQL ALTER TABLE docs
[2] PostgreSQL ALTER TABLE docs
MySQL's InnoDB:
(This does not really answer the Questions, but provides a little more insight into the bigger question of ALTER.)
If any of the alters needs to copy the table over, you are probably better off putting all alters into the same statement. Changing the PRIMARY KEY, for example, requires rebuilding the data that is clustered with the PK.
Some alters can be achieved by simply altering the schema; these are virtually instantaneous and could safely be done via separate ALTER statements. Adding a value to an ENUM, for example, was implemented that way long ago.
Some alters need some form of scan, but can do it "in the background". DROP INDEX can be done by quickly "hiding" it, then freeing up the BTree in the background.
I have left out a grey area in which you batch 'simple' alters. One would hope that ALTER is smart enough to simply go through them quickly, rather than deciding to copy the table over.
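For instance, appending a value at the end of an ENUM is one of those metadata-only changes (table and column names here are illustrative):
ALTER TABLE my_table
  MODIFY COLUMN status ENUM('new', 'active', 'retired', 'archived') NOT NULL;
-- Appending 'archived' at the end only edits the schema; inserting a value
-- in the middle, or reordering the values, would force a table rebuild.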
I got some useful feedback but decided to respond to my own question to provide a more concrete set of answers.
Would the multi-column statement result in traversing all the rows just once and performing all changes needed?
Yes, if the alter statement results in rebuilding the table then it only needs to do it once.*
* This answer comes from my own testing and other mostly anecdotal evidence (including @Uueerdo's in this post). It would be useful to have some official docs for this...
How does MySQL actually perform DROP COLUMN? Does it also "hide" the columns first, or does it delete the data straight away?
MySQL will rebuild the table in place (rather than create a copy or just change metadata) for most column operations. Each specific case can be found in the Online DDL docs for InnoDB.
A few operations like renaming a column or setting a default value will just alter metadata, so they don't require a table rebuild.
However, dropping a column DOES require a full table rebuild.
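A hedged way to check what a given server will do (MySQL 5.6+ online DDL; my_table and its columns are illustrative) is to request the algorithm explicitly; MySQL returns an error instead of silently falling back to a table copy if the request can't be honored:
ALTER TABLE my_table
  DROP COLUMN column_1,
  DROP COLUMN column_2,
  ALGORITHM=INPLACE, LOCK=NONE;
-- Fails with ER_ALTER_OPERATION_NOT_SUPPORTED_REASON if these changes
-- cannot be performed in place on this version.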

How do I efficiently change a MySQL table structure on a table with millions of entries?

I have a MySQL database that is up to about 17 GB in size and has 38 million entries. At the moment, I need to both increase the size of one column (varchar 40 to varchar 80) and add more columns.
Many of the fields are indexed, including the one that I need to change. It is part of a unique pair that is necessary for the application to work. When I attempted to just make the change yesterday, the query ran for almost four hours without finishing, at which point I decided to cut the outage short and just bring the service back up.
What is the most efficient way to make changes to something of this size?
Many of these entries are also old. If there is a good way to shard off old entries but still keep them available, that might help with this problem by bringing the table down to a much more manageable size.
You have some choices.
In any case you should take a backup before you do this stuff.
One possibility is to take your service offline and do it in place, as you have tried. If you do that, you should disable key checks and constraints first.
ALTER TABLE bigtable DISABLE KEYS;
SET FOREIGN_KEY_CHECKS=0;
ALTER TABLE (whatever);
ALTER TABLE (whatever else);
...
SET FOREIGN_KEY_CHECKS=1;
ALTER TABLE bigtable ENABLE KEYS;
This will allow the ALTER TABLE operation to go faster. It will regenerate the indexes all at once when you do ENABLE KEYS.
Another possibility is to create a new table with the new schema you want, then disable the keys on the new table, then do as @Bader suggested and insert the contents of the old table.
After your new table is built, you will re-enable the keys on it, then rename the old table to something like "old_bigtable" and rename the new table to "bigtable".
It's possible that you can keep your service online while you're populating the new table. But that might work poorly.
A third possibility is to dump your giant table (to a flat file) and then load it into a new table with the new layout. That is pretty much like the second possibility, except that you get a table backup for free. You can make this go pretty fast with SELECT ... INTO OUTFILE and LOAD DATA INFILE. You'll need access to your server machine's file system to do this.
In all cases, disable, then re-enable, the constraints and keys to get things to go fast.
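A minimal sketch of that third approach, assuming a hypothetical bigtable that gains a widened name column plus one new column (all names, columns, and the /tmp path are illustrative):
-- 1. Dump the existing data; the dump doubles as a backup.
SELECT * FROM bigtable
INTO OUTFILE '/tmp/bigtable_dump.txt'
FIELDS TERMINATED BY '\t';
-- 2. Create the new layout up front.
CREATE TABLE bigtable_new (
  id BIGINT NOT NULL PRIMARY KEY,
  name VARCHAR(80) NOT NULL,
  extra_info VARCHAR(255) NULL
);
-- 3. Load with keys disabled, listing only the columns present in the dump.
ALTER TABLE bigtable_new DISABLE KEYS;
LOAD DATA INFILE '/tmp/bigtable_dump.txt'
INTO TABLE bigtable_new
FIELDS TERMINATED BY '\t'
(id, name);
ALTER TABLE bigtable_new ENABLE KEYS;
-- 4. Swap the tables in one atomic statement.
RENAME TABLE bigtable TO bigtable_old, bigtable_new TO bigtable;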
Create a new table with the new structure you want, under a different name, for example NewTable.
Then insert the data into this new table from the old table using the following query:
INSERT INTO NewTable (field1, field2, ...) SELECT field1, field2, ... FROM OldTable;
After this is done, you can drop the old table and rename the new table to the original name:
DROP TABLE `OldTable`;
RENAME TABLE `NewTable` TO `OldTable`;
I have tried this approach on a very large table and it's much much faster than altering the table.
With MySQL 5.1, and again with 5.5, certain ALTER statements were enhanced to just modify the structure without rewriting the entire table (http://dev.mysql.com/doc/refman/5.5/en/alter-table.html; search for "in-place"). The availability of this varies by the type of change you are making and the engine in use; the most value comes from the InnoDB Plugin. In the case of your specific changes, though, the entire table would be rewritten.
When we encounter these issues, we typically try to leverage replica databases. As long as you are adding and not removing, you can run your DDL against the replica first and then schedule a brief outage to promote the replica to the master role. If you happen to be on RDS, this is even one of their suggested uses for replica instances: http://aws.amazon.com/about-aws/whats-new/2012/10/11/amazon-rds-mysql-rr-promotion/.
Some other alternatives include:
Selecting a subset of records out into a new table with the desired structure (use INTO OUTFILE to avoid a table lock). Once complete, you can schedule a maintenance window and REPLACE INTO or UPDATE any records that have changed in the origin table since the initial data copy. Once the update is complete, a RENAME TABLE... of both tables wraps the changes up. A sketch follows this list.
Using a tool like Percona's pt-online-schema-change: http://www.percona.com/doc/percona-toolkit/2.1/pt-online-schema-change.html. This tool works with triggers, so if you already have triggers on the tables you want to change, this may not fit your needs.
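A rough sketch of the first alternative, assuming a hypothetical bigtable(id, name, updated_at) where updated_at lets you find rows changed since the initial copy (all names and the cutoff timestamp are illustrative):
-- Build the new structure up front, e.g. widening the column.
CREATE TABLE bigtable_new LIKE bigtable;
ALTER TABLE bigtable_new MODIFY name VARCHAR(80) NOT NULL;
-- Initial bulk copy while the service stays online.
INSERT INTO bigtable_new (id, name, updated_at)
SELECT id, name, updated_at FROM bigtable;
-- Maintenance window: re-copy anything that changed since the bulk copy.
REPLACE INTO bigtable_new (id, name, updated_at)
SELECT id, name, updated_at FROM bigtable
WHERE updated_at >= '2012-10-01 00:00:00';
-- Swap both tables in one statement.
RENAME TABLE bigtable TO bigtable_old, bigtable_new TO bigtable;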

Change column name without recreating the MySQL table

Is there a way to rename a column on an InnoDB table without a major alter?
The table is pretty big and I want to avoid major downtime.
Renaming a column (with ALTER TABLE ... CHANGE COLUMN) unfortunately requires MySQL to run a full table copy.
Check out pt-online-schema-change. This helps you to make many types of ALTER changes to a table without locking the whole table for the duration of the ALTER. You can continue to read and write the original table while it's copying the data into the new table. Changes are captured and applied to the new table through triggers.
Example:
pt-online-schema-change h=localhost,D=databasename,t=tablename \
--alter 'CHANGE COLUMN oldname newname NUMERIC(9,2) NOT NULL'
Update: MySQL 5.6 can do some types of ALTER operations without rebuilding the table, and changing the name of a column is one of those supported as an online change. See http://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-overview.html for an overview of which types of alterations do or don't support this.
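On 5.6+, a way to make sure you actually get the online behaviour rather than a silent table copy is to request it explicitly (reusing the hypothetical column definition from the pt-online-schema-change example above):
ALTER TABLE tablename
  CHANGE COLUMN oldname newname NUMERIC(9,2) NOT NULL,
  ALGORITHM=INPLACE, LOCK=NONE;
-- MySQL raises an error instead of copying the table if this
-- combination of changes cannot be done in place.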
If there aren't any constraints on the column, you can alter it without much hassle, as far as I know. If there are, you'll have to remove the constraints first, alter the column, and then add the constraints back.
Altering a table with many rows can take a long time (though if the columns involved are not indexed, it may be trivial).
If you want to avoid using the ALTER TABLE syntax created specifically for that purpose, you can always create a table with almost the exact same structure (but a different name) and copy all the data into it, like so:
CREATE TABLE `your_table2` ...;
-- (using the query from SHOW CREATE TABLE `your_table`,
-- but modified with your new column changes)
LOCK TABLES `your_table` WRITE;
INSERT INTO `your_table2` SELECT * FROM `your_table`;
UNLOCK TABLES; -- RENAME TABLE cannot run while tables are locked with LOCK TABLES
RENAME TABLE `your_table` TO `your_table_old`, `your_table2` TO `your_table`;
For some ALTER TABLE queries, the above can be quite a bit faster. However, for a simple column name change, it could be trivial. I might try creating an identical table and performing the change on it in order to see how much time you're actually looking at.

create an index without locking the DB

I have a table with 10+ million rows. I need to create an index on a single column, however, the index takes so long to create that I get locks against the table.
It may be important to note that the index is being created as part of a 'rake db:migrate' step... I'm not averse to creating the index manually if that will work.
UPDATE: I suppose I should have mentioned that this is a write-often table.
MySQL's NDBCLUSTER engine can create indexes online, without locking writes to the table. However, the most widely used InnoDB engine does not support this feature. Another free and open-source database, PostgreSQL, supports CREATE INDEX CONCURRENTLY.
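For reference, the PostgreSQL form looks like this (index and column names are illustrative):
CREATE INDEX CONCURRENTLY idx_my_table_col ON my_table (col);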
you can prevent the blockage with something like this (pseudo-code; the application-side steps are comments, and the indexed column name is illustrative):
-- 1. Create an empty table with the same structure.
CREATE TABLE temp LIKE my_table;
-- 2. Point the logger at temp, so my_table stops receiving writes.
-- 3. Build the index while my_table is quiet.
ALTER TABLE my_table ADD INDEX new_index (some_column);
-- 4. Replay the rows that accumulated while the index was building.
INSERT INTO my_table SELECT * FROM temp;
-- 5. Point the logger back at my_table.
DROP TABLE temp;
Where the logger would be whatever adds rows/updates to your table in regular use (e.g. a PHP script). This sets up a temporary table to use while the other one is being altered.
Try to make sure that the index is created before the records are inserted. That way, the index will also be filled during the population of the table. Although that will take longer, at least it will be ready to go when the rake task is done.