I have a large table with a column that is currently NOT NULL.
I now tried to alter the table to allow nulls for this column. However, the query never finishes. Is this to be expected or is there a better way to do this?
That can happen on large tables. I think the best approach is to create a new table with the desired schema and load the data from the old table into it.
Some databases, like MySQL [1] and PostgreSQL [2], support bundling of certain compatible ALTER TABLE statements (as non-standard SQL).
For example we can have:
ALTER TABLE `my_table`
DROP COLUMN `column_1`,
DROP COLUMN `column_2`,
...
or
ALTER TABLE `my_table`
MODIFY `column_1` ... ,
MODIFY `column_2` ... ,
instead of having individual statements:
ALTER TABLE `my_table` DROP COLUMN `column_1`;
ALTER TABLE `my_table` DROP COLUMN `column_2`;
or
ALTER TABLE `my_table` MODIFY `column_1` ... ;
ALTER TABLE `my_table` MODIFY `column_2` ... ;
etc
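As a rough sketch of the bundling described above, a small Python helper can assemble the single multi-clause statement. The function names (quote_identifier, combined_alter, drop_columns) are made up for illustration, and the helper only builds the SQL string; it does not talk to a server.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def combined_alter(table: str, clauses: list) -> str:
    """Join several ALTER TABLE clauses into one comma-separated statement."""
    if not clauses:
        raise ValueError("at least one clause is required")
    body = ",\n  ".join(clauses)
    return f"ALTER TABLE {quote_identifier(table)}\n  {body};"


def drop_columns(table: str, columns: list) -> str:
    """Build a single statement that drops every listed column."""
    return combined_alter(
        table, [f"DROP COLUMN {quote_identifier(c)}" for c in columns]
    )


# Prints one ALTER TABLE statement with two DROP COLUMN clauses.
print(drop_columns("my_table", ["column_1", "column_2"]))
```

The same combined_alter helper works for MODIFY clauses, as long as each clause is passed in as a complete string.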
For comparison of the same feature, PostgreSQL [2], which also implements this, will perform all operations in a single scan:
The main reason for providing the option to specify multiple changes in a single ALTER TABLE is that multiple table scans or rewrites can thereby be combined into a single pass over the table.
Although for DROP COLUMN specifically it will often not even need to do that:
The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations...
Questions:
Would the multi-column statement result in traversing all the rows just once and performing all changes needed?
How does MySQL actually perform DROP COLUMN? Does it also "hide" the columns first, or does it delete the data straight away?
Assumptions:
Using InnoDB
No indexes/complex defaults are involved in any of the columns we want to change/drop (so basically changes that would not require a temporary table when run as individual alter statements)
References:
[1] MySQL ALTER TABLE docs
[2] PostgreSQL ALTER TABLE docs
MySQL's InnoDB:
(This does not really answer the Questions, but provides a little more insight into the bigger question of ALTER.)
If any of the alters needs to copy the table over, you are probably better off putting all alters into the same statement. Changing the PRIMARY KEY, for example, requires rebuilding the data that is clustered with the PK.
Some alters can be achieved by simply altering the schema; these are virtually instantaneous, and could be done via separate alter statements. Adding an option to an ENUM is one example; that was implemented long ago.
Some alters need some form of scan, but can do it "in the background". DROP INDEX can be done by quickly "hiding" it, then freeing up the BTree in the background.
I have left out a grey area in which you batch 'simple' alters. One would hope that ALTER is smart enough to simply go through them quickly, rather than deciding to copy the table over.
I got some useful feedback but decided to respond to my own question to provide a more concrete set of answers.
Would the multi-column statement result in traversing all the rows just once and performing all changes needed?
Yes, if the alter statement results in rebuilding the table then it only needs to do it once.*
* This answer comes from my own testing and other mostly anecdotal evidence (including @Uueerdo's in this post). It would be useful to have some official docs for this...
How does MySQL actually perform DROP COLUMN? Does it also "hide" the columns first, or does it delete the data straight away?
MySQL will rebuild the table in place (rather than create a copy or just change metadata) for most column operations. Each specific case can be found in the Online DDL docs for InnoDB.
A few operations like renaming a column or setting a default value will just alter metadata, so they don't require a table rebuild.
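However, dropping a column DOES require a full table rebuild, at least in the versions those docs covered when this was written; MySQL 8.0.29 and later can drop a column with ALGORITHM=INSTANT.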
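One way to find out which of these cases a given change falls into is to state your expectation in the statement itself. MySQL 5.6 and later accept ALGORITHM and LOCK clauses on ALTER TABLE, and the statement fails with an error if the server cannot honor them, rather than quietly doing something more expensive. Below is a minimal sketch that appends those clauses; guarded_alter is a hypothetical helper name, and it only builds the SQL text.

```python
# Values accepted by MySQL's ALGORITHM and LOCK clauses
# (INSTANT needs MySQL 8.0.12 or later).
ALGORITHMS = {"DEFAULT", "INSTANT", "INPLACE", "COPY"}
LOCKS = {"DEFAULT", "NONE", "SHARED", "EXCLUSIVE"}


def guarded_alter(table, clauses, algorithm="INPLACE", lock="NONE"):
    """Build one ALTER TABLE that states the algorithm and lock level it expects.

    Pass lock=None to leave the LOCK clause out entirely.
    """
    if "`" in table:
        raise ValueError("table name must not contain backticks")
    if not clauses:
        raise ValueError("at least one clause is required")
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm}")
    if lock is not None and lock not in LOCKS:
        raise ValueError(f"unknown lock level: {lock}")
    parts = list(clauses) + [f"ALGORITHM={algorithm}"]
    if lock is not None:
        parts.append(f"LOCK={lock}")
    return f"ALTER TABLE `{table}` " + ", ".join(parts) + ";"


print(guarded_alter("my_table", ["DROP COLUMN `column_1`", "DROP COLUMN `column_2`"]))
```

If the server rejects the statement, nothing has been changed, and you can decide whether a slower algorithm or an external tool is acceptable.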
Similar questions have been asked, but I have had issues in the past by using
ALTER TABLE tablename MODIFY columnname SMALLINT
I had a server crash and had to recover my table when I ran this the last time. Is it safe to use this command when there is that much data in the table? What if there are other queries that may be running on the table in parallel? Should I copy the table and run the query on the new table? Should I copy the column and move the data to the new column?
Please let me know if there are any best or "safest" practices when doing this.
Also, I know this depends on a lot of factors, but does anyone know how long the query should take on an InnoDB table with ~5.5 million rows (rough estimate)? The column in question is a TINYINT and has data in it. I want to upgrade to a SMALLINT to handle larger values.
Thanks!
On a slow disk, and with lots of columns in the table, it could take hours to finish.
The ALTER is "safe" because of how it has traditionally worked:
1. Lock the table.
2. Create a similar table, but with SMALLINT instead of TINYINT.
3. Copy all the rows over to the new table.
4. Rename the tables and drop the old one.
5. Unlock.
Step 3 is the slow part. The only vulnerability is in step 4, which is very fast.
A server crash during steps 1-3 should have left the old table intact, but possibly left behind a partially created tmp table named something like #sql....
Percona's pt-online-schema-change has the advantage of being virtually lockless.
This cannot be easily answered.
It depends on things like
Does the table have its own file, or does it share one with other tables?
How big is the table in terms of bytes?
etc.
It can take anywhere from a few minutes to several hours, and it can involve copying the whole content of the table, so you may need roughly the table's size again in free disk space.
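To put a number on the table size, one option is to ask information_schema. The sketch below assumes conn is a Python DB-API connection from a MySQL driver that uses %s placeholders and returns rows as tuples; table_size_bytes and human_readable are hypothetical helper names. For InnoDB the reported figures are approximate, so treat the result as an estimate.

```python
# Data plus index size of one table, as reported by information_schema.
SIZE_QUERY = (
    "SELECT DATA_LENGTH + INDEX_LENGTH "
    "FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s"
)


def table_size_bytes(conn, schema, table):
    """Return the approximate on-disk size of schema.table in bytes."""
    cursor = conn.cursor()
    try:
        cursor.execute(SIZE_QUERY, (schema, table))
        row = cursor.fetchone()
    finally:
        cursor.close()
    if row is None or row[0] is None:
        raise LookupError(f"no size information for {schema}.{table}")
    return int(row[0])


def human_readable(num_bytes):
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"
```

Comparing that figure with the free space on the data volume gives a first idea of whether a full table copy can fit.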
You can add a new SMALLINT column to the table:
ALTER TABLE tablename ADD columnname_new SMALLINT AFTER columnname;
then copy the data from old column to new one:
UPDATE tablename SET columnname_new = columnname WHERE columnname_new IS NULL LIMIT 100000;
Repeat the statement above until it affects no more rows.
Then you can drop the old column:
ALTER TABLE tablename DROP COLUMN columnname;
and finally rename the new column:
ALTER TABLE tablename CHANGE columnname_new columnname SMALLINT;
Copying the values in batches of 100,000 rows keeps each UPDATE transaction short, which limits lock time and undo log growth.
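The "repeat until done" part can be scripted. Here is a minimal sketch, assuming conn is a Python DB-API connection to the MySQL server; backfill_in_batches and pause_seconds are hypothetical names. The extra "columnname IS NOT NULL" condition is my own addition so that rows holding NULL in the old column cannot stall the loop.

```python
import time


def backfill_in_batches(conn, batch_size=100_000, pause_seconds=0.0):
    """Copy columnname into columnname_new in batches; return rows changed."""
    sql = (
        "UPDATE tablename SET columnname_new = columnname "
        "WHERE columnname_new IS NULL AND columnname IS NOT NULL "
        f"LIMIT {int(batch_size)}"
    )
    total = 0
    cursor = conn.cursor()
    try:
        while True:
            cursor.execute(sql)
            changed = cursor.rowcount
            conn.commit()  # one short transaction per batch
            if changed <= 0:
                break  # nothing left to copy
            total += changed
            if pause_seconds:
                time.sleep(pause_seconds)  # give other queries room
    finally:
        cursor.close()
    return total
```

Committing after every batch is the point of the exercise: no single transaction ever touches more than batch_size rows.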
I would add a new column, then change the code so that it reads from the new column when a value exists there and falls back to the old column otherwise, and so that it always writes to the new column. At this point you can migrate the data at will, copying over values from the old column into the new column where a value does not exist in the new column.
Once all of the data has been migrated you can drop the old column.
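The application-side logic could look roughly like this. It is only a sketch: rows are modeled as plain Python dicts, the column names are borrowed from the earlier answer, and read_value, write_value and migrate_row are hypothetical names rather than anything from a real framework.

```python
from typing import Optional

OLD = "columnname"      # existing TINYINT column
NEW = "columnname_new"  # new SMALLINT column, NULL until migrated


def read_value(row: dict) -> Optional[int]:
    """Prefer the new column; fall back to the old one for unmigrated rows."""
    value = row.get(NEW)
    return value if value is not None else row.get(OLD)


def write_value(row: dict, value: int) -> None:
    """Always write to the new column, which can hold the larger values."""
    row[NEW] = value


def migrate_row(row: dict) -> bool:
    """Copy old -> new only where the new column is still empty."""
    if row.get(NEW) is None and row.get(OLD) is not None:
        row[NEW] = row[OLD]
        return True
    return False
```

Because migrate_row never overwrites an existing value in the new column, the background migration cannot clobber anything the application has written in the meantime.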
I have a table in a MySQL database which has about 10 million records/rows. I want to add a new column to the table. However, a simple ADD COLUMN query doesn't seem to work well for me.
This is what I have tried,
ALTER TABLE contacts ADD processed INT(11);
I waited for about 5 hours, but nothing happened. Is there any way to insert a new column in such a huge table?
Hope I am clear with my question. Any help would be appreciated.
If it's production:
You should use pt-online-schema-change of Percona Toolkit.
pt-online-schema-change emulates the way that MySQL alters tables internally, but it works on a copy of the table you wish to alter. This means that the original table is not locked, and clients may continue to read and change data in it.
pt-online-schema-change works by creating an empty copy of the table to alter, modifying it as desired, and then copying rows from the original table into the new table. When the copy is complete, it moves away the original table and replaces it with the new one. By default, it also drops the original table.
Or oak-online-alter-table, which is part of the openark kit:
oak-online-alter-table allows for non blocking ALTER TABLE operations, table rebuilds and creating a table's ghost.
With either tool the alter itself will be slower, but it doesn't lock the table.
If it's not production and downtime is okay, try this approach:
CREATE TABLE contacts_tmp LIKE contacts;
ALTER TABLE contacts_tmp ADD COLUMN processed INT UNSIGNED NOT NULL;
INSERT INTO contacts_tmp (contact_table_fields) SELECT * FROM contacts;
RENAME TABLE contacts TO contacts_old, contacts_tmp TO contacts;
DROP TABLE contacts_old;
Is there a way to rename a column on an InnoDB table without a major alter?
The table is pretty big and I want to avoid major downtime.
Renaming a column (with ALTER TABLE ... CHANGE COLUMN) unfortunately requires MySQL to run a full table copy in versions before 5.6.
Check out pt-online-schema-change. This helps you to make many types of ALTER changes to a table without locking the whole table for the duration of the ALTER. You can continue to read and write the original table while it's copying the data into the new table. Changes are captured and applied to the new table through triggers.
Example:
pt-online-schema-change h=localhost,D=databasename,t=tablename \
--alter 'CHANGE COLUMN oldname newname NUMERIC(9,2) NOT NULL' \
--execute
Update: MySQL 5.6 can do some types of ALTER operations without rebuilding the table, and changing the name of a column is one of those supported as an online change. See http://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-overview.html for an overview of which types of alterations do or don't support this.
If there aren't any constraints on it, you can alter it without a hassle as far as I know. If there are, you'll have to remove the constraints first, alter the column, and then add the constraints back.
Altering a table with many rows can take a long time (though if the columns involved are not indexed, it may be trivial).
If you specifically want to avoid using the ALTER TABLE syntax created specifically for that purpose, you can always create a table with almost the exact same structure (but different name) and copy all the data into it, like so:
CREATE TABLE `your_table2` ...;
-- (using the query from SHOW CREATE TABLE `your_table`,
-- but modified with your new column changes)
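LOCK TABLES `your_table` WRITE, `your_table2` WRITE;
INSERT INTO `your_table2` SELECT * FROM `your_table`;
-- RENAME TABLE on locked tables needs MySQL 8.0.13+; on older versions run UNLOCK TABLES first
RENAME TABLE `your_table` TO `your_table_old`, `your_table2` TO `your_table`;
UNLOCK TABLES;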
For some ALTER TABLE queries, the above can be quite a bit faster. However, for a simple column name change, the ALTER itself could be trivial. I would try creating an identical table and performing the change on it in order to see how much time you're actually looking at.
In my application, I make some changes and upload them to a testing server. Because I have no access to the server database I run ALTER commands to make changes on it.
Using a method in my application, I ran the following command on the server:
ALTER TABLE `blahblahtable` ADD COLUMN `newcolumn` INT(12) NOT NULL;
After that, I found that all the data in the table had been removed. The table is now empty.
So I need to alter the table without removing its data. Is there any way to do that?
The key detail is that you're adding a new column to the table and setting it to NOT NULL without a default.
To make things clearer, here is what the server has to deal with when you run the command:
1. You add a new column, so every existing row of the table needs a value for that column.
2. As you don't declare any default value, the rows have no explicit value to use; NULL would be the natural choice.
3. The column doesn't allow nulls, so NULL is not a legal value here.
4. MySQL itself resolves this conflict by filling the column with the type's implicit default (0 for INT) and keeps the rows. If the rows disappeared, the deletion most likely came from the method or tool that issued the ALTER rather than from the statement itself.
There are some good fixes for this issue:
Set a default value (recommended) for the column you're creating.
Create the column without the NOT NULL, set the appropriate values, and then make the column NOT NULL.
Create a temp table, copy all the rows from the table you want to alter into it, alter the table, and then copy the rows back.
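The first two fixes can be written out as statements. The sketch below only generates the SQL text; the helper names (not_null_with_default, not_null_in_three_steps) are hypothetical, and the default and fill values are restricted to integers to keep quoting out of the picture.

```python
def not_null_with_default(table, column, col_type, default):
    """Fix 1: add the column as NOT NULL with an explicit integer default."""
    return (
        f"ALTER TABLE `{table}` ADD COLUMN `{column}` "
        f"{col_type} NOT NULL DEFAULT {int(default)};"
    )


def not_null_in_three_steps(table, column, col_type, fill_value):
    """Fix 2: add as nullable, fill in the values, then tighten to NOT NULL."""
    return [
        f"ALTER TABLE `{table}` ADD COLUMN `{column}` {col_type} NULL;",
        f"UPDATE `{table}` SET `{column}` = {int(fill_value)} "
        f"WHERE `{column}` IS NULL;",
        f"ALTER TABLE `{table}` MODIFY `{column}` {col_type} NOT NULL;",
    ]


print(not_null_with_default("blahblahtable", "newcolumn", "INT", 0))
for statement in not_null_in_three_steps("blahblahtable", "newcolumn", "INT", 0):
    print(statement)
```

Whichever variant you pick, running it against a copy of the table first is a cheap way to confirm that the existing rows survive.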