Drop Column from Large Table - mysql

I have this largish table with three columns as such:
+-----+-----+----------+
| id1 | id2 | associd |
+-----+-----+----------+
| 1 | 38 | 73157604 |
| 1 | 112 | 73157605 |
| 1 | 113 | 73157606 |
| 1 | 198 | 31936810 |
| 1 | 391 | 73157607 |
+-----+-----+----------+
This continues for 38m rows. The problem is I want to remove the 'associd' column but running ALTER TABLE table_name DROP COLUMN associd; simply takes too long. I wanted to do something like: ALTER TABLE table_name SET UNUSED associd; and ALTER TABLE table_name DROP UNUSED COLUMNS CHECKPOINT 250; then which apparently speeds up the process but it isn't possible in MySQL?
Is there an alternative to remove this column-- maybe creating a new table with only the two columns or getting a drop with checkpoints?

Anything that you do is going to require reading and writing 38m rows, so nothing is going to be real fast. Probably the fastest method is probably to put the data into a new table:
create table newTable as
select id1, id2
from oldTable;
Or, if you want to be sure that you preserve types and indexes:
create table newTable like oldTable;
alter table newTable drop column assocId;
insert into newTable(id1, id2)
select id1, id2
from oldTable;
However, it is usually faster to drop all index on a table before loading a bunch of data and then recreate the indexes afterwards.

Disclaimer: this answer is MySQL oriented and might not work for other databases.
I think in the accepted answer there are some things missing, I have tried to expose here a generic sequence I use to do this kind of operations in a production environment, not only for adding/removing columns but also to add indexes for example.
We call it the Indiana Jones' movement.
Create a new table
A new table using the old one as template:
create table my_table_new like my_table;
Remove the column in the new table
In the new table:
alter table my_table_new drop column column_to_delete;
Add the foreign keys to the new table
The are not generate automatically in the create table like command.
You can check the actual foreign keys:
mysql> show create table my_table;
Then apply them to the new table:
alter table my_table_new
add constraint my_table_fk_1 foreign key (field_1) references other_table_1 (id),
add constraint my_table_fk_2 foreign key (field_2) references other_table_2 (id)
Clone the table
Copy all fields but the one you want to delete.
I use a where sentence to be able to run this command many times if necessary.
As I suppose this is a production environment the my_table will have new records continuously so we have to keep synchronizing until we are capable to do the name changing.
Also I have added a limit because if the table is too big and the indexes are too heavy making a one-shot clone can shut down the performance of your database. Plus, if in the middle of the process you want to cancel the operation it will must to rollback all the already done insertions which means your database won't be recovered instantly (https://dba.stackexchange.com/questions/5654/internal-reason-for-killing-process-taking-up-long-time-in-mysql)
insert my_table_new select field_1, field_2, field_3 from my_table
where id > ifnull((select max(id) from my_table_new), 0)
limit 100000;
As I was doing this several times I created a procedure: https://gist.github.com/fguillen/5abe87f922912709cd8b8a8a44553fe7
Do the name changing
Be sure you run this commands inmediately after you have replicate the last records from your table. Idealy run all commands at once.
rename table my_table to my_table_3;
rename table my_table_new to my_table;
Delete the old table
Be sure you have a back up before you do this ;)
drop table my_table_3
Disclaimer: I am not sure what will happen with foreign keys that were pointing to the old table.

The best solution in this case in MySQL is to:
1) change the table Engine to MyISAM
2) change whatever you want to do (Drop column, alter data type,etc..)
3) change it back to InnoDB
In this case the DBMS will not be locking/unlocking at each record iteration.
However note that this solution would be good if you have several things you want to change in your table/database, because once you revert it back to InnoDB, this will take the same amount of time to drop one column.
So only consider this solution if you have multiple things to change in your database.

You can speed up the process by temporarily turning off unique checks and foreign key checks. You can also change the algorithm that gets used.
SET unique_checks=0;
SET foreign_key_checks=0;
ALTER TABLE table_name DROP COLUMN column_name, algorithm=inplace;
SET unique_checks=1;
SET foreign_key_checks=1;
Using the above code, it took my PC about 2 minutes to drop a column from a 20 million row table.
If you're using a program like Workbench, then you may want to increase the default timeout period in your settings before starting the operation.
If you find that the operation is hanging indefinitely, then you may need to look through the list of processes and kill whatever process has a lock on the table. You can do that using these commands:
SHOW FULL PROCESSLIST;
KILL PROCESS_NUMBER_GOES_HERE;

Related

MySQL renaming and create table at the same time

I need to rename MySQL table and create a new MySQL table at the same time.
There is critical live table with large number of records. master_table is always inserted records from scripts.
Need to backup the master table and create a another master table with same name at the same time.
General SQL is is like this.
RENAME TABLE master_table TO backup_table;
Create table master_table (id,value) values ('1','5000');
Is there a possibility to record missing data during the execution of above queries?
Any way to avoid missing record? Lock the master table, etc...
What I do is the following. It results in no downtime, no data loss, and nearly instantaneous execution.
CREATE TABLE mytable_new LIKE mytable;
...possibly update the AUTO_INCREMENT of the new table...
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;
By renaming both tables in one statement, they are swapped atomically. There is no chance for any data to be written "in between" while there is no table to receive the write. If you don't do this atomically, some writes may fail.
RENAME TABLE is virtually instantaneous, no matter how large the table. You don't have to wait for data to be copied.
If the table has an auto-increment primary key, I like to make sure the new table starts with an id value greater than the current id in the old table. Do this before swapping the table names.
SELECT AUTO_INCREMENT FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA='mydatabase' AND TABLE_NAME='mytable';
I like to add some comfortable margin to that value. You want to make sure that the id values inserted to the old table won't exceed the value you queried from INFORMATION_SCHEMA.
Change the new table to use this new value for its next auto-increment:
ALTER TABLE mytable_new AUTO_INCREMENT=<increased value>;
Then promptly execute the RENAME TABLE to swap them. As soon as new rows are inserted to the new, empty table, it will use id values starting with the increased auto-increment value, which should still be greater than the last id inserted into the old table, if you did these steps promptly.
Instead of renaming the master_backup table and recreating it, you could
just create a backup_table with the data from the master_table for the first backup run.
CREATE TABLE backup_table AS
SELECT * FROM master_table;
If you must add a primary key to the backup table then run this just once, that is for the first backup:
ALTER TABLE backup_table ADD CONSTRAINT pk_backup_table PRIMARY KEY(id);
For future backups do:
INSERT INTO backup_table
SELECT * FROM master_table;
Then you can delete all the data in the backup_table found in the master_table like:
DELETE FROM master_table A JOIN
backup_table B ON A.id=B.id;
Then you can add data to the master_table with this query:
INSERT INTO master_table (`value`) VALUES ('5000'); -- I assume the id field is auto_incrementable
I think this should work perfectly even without locking the master table, and with no missing executions.

Mysql delete duplicates

I'm able to display duplicates in my table
table name reportingdetail and column name ReportingDetailID
SELECT DISTINCT ReportingDetailID from reportingdetail group by ReportingDetailID HAVING count(ReportingDetailID) > 1;
+-------------------+
| ReportingDetailID |
+-------------------+
| 664602311 |
+-------------------+
1 row in set (2.81 sec)
Dose anyone know how can I go about deleting duplicates and keep only one record?
I tired the following
SELECT * FROM reportingdetail USING reportingdetail, reportingdetail AS vtable WHERE (reportingdetailID > vtable.id) AND (reportingdetail.reportingdetailID=reportingdetailID);
But it just deleted everything and kept single duplicates records!
The quickest way (that I know of) to remove duplicates in MySQL is by adding an index.
E.g., assuming reportingdetailID is going to be the PK for that table:
mysql> ALTER IGNORE TABLE reportingdetail
-> ADD PRIMARY KEY (reportingdetailID);
From the documentation:
IGNORE is a MySQL extension to standard SQL. It controls how ALTER
TABLE works if there are duplicates on unique keys in the new table or
if warnings occur when strict mode is enabled. If IGNORE is not
specified, the copy is aborted and rolled back if duplicate-key errors
occur. If IGNORE is specified, only the first row is used of rows with
duplicates on a unique key. The other conflicting rows are deleted.
Incorrect values are truncated to the closest matching acceptable
value.
Adding this index will both remove duplicates and prevent any future duplicates from being inserted. If you do not want the latter behavior, just drop the index after creating it.
The following MySQL commands will create a temporary table and populate it with all columns GROUPED by one column name (the column that has duplicates) and order them by the primary key ascending. The second command creates a real table from the temporary table. The third command drops the table that is being used and finally the last command renames the second temporary table to the current being used table name.
Thats a really fast solution. Here are the four commands:
CREATE TEMPORARY TABLE videos_temp AS SELECT * FROM videos GROUP by
title ORDER BY videoid ASC;
CREATE TABLE videos_temp2 AS SELECT * FROM videos_temp;
DROP TABLE videos;
ALTER TABLE videos_temp2 RENAME videos;
This should give you duplicate entries.
SELECT `ReportingDetailID`, COUNT(`ReportingDetailID`) AS Nummber_of_Occurrences FROM reportingdetail GROUP BY `ReportingDetailID` HAVING ( COUNT(`ReportingDetailID`) > 1 )

ALTER TABLE ADD COLUMN takes a long time

I was just trying to add a column called "location" to a table (main_table) in a database. The command I run was
ALTER TABLE main_table ADD COLUMN location varchar (256);
The main_table contains > 2,000,000 rows. It keeps running for more than 2 hours and still not completed.
I tried to use mytop
to monitor the activity of this database to make sure that the query is not locked by other querying process, but it seems not. Is it supposed to take that long time? Actually, I just rebooted the machine before running this command. Now this command is still running. I am not sure what to do.
Your ALTER TABLE statement implies mysql will have to re-write every single row of the table including the new column. Since you have more than 2 million rows, I would definitely expect it takes a significant amount of time, during which your server will likely be mostly IO-bound. You'd usually find it's more performant to do the following:
CREATE TABLE main_table_new LIKE main_table;
ALTER TABLE main_table_new ADD COLUMN location VARCHAR(256);
INSERT INTO main_table_new SELECT *, NULL FROM main_table;
RENAME TABLE main_table TO main_table_old, main_table_new TO main_table;
DROP TABLE main_table_old;
This way you add the column on the empty table, and basically write the data in that new table that you are sure no-one else will be looking at without locking as much resources.
I think the appropriate answer for this is using a feature like pt-online-schema-change or gh-ost.
We have done migration of over 4 billion rows with this, though it can take upto 10 days, with less than a minute of downtime.
Percona works in a very similar fashion as above
Create a temp table
Creates triggers on the first table (for inserts, updates, deletes) so that they are replicated to the temp table
In small batches, migrate data
When done, rename table to new table, and drop the other table
You can speed up the process by temporarily turning off unique checks and foreign key checks. You can also change the algorithm that gets used.
If you want the new column to be at the end of the table, use algorithm=instant:
SET unique_checks = 0;
SET foreign_key_checks = 0;
ALTER TABLE main_table ADD location varchar(256), algorithm=instant;
SET unique_checks = 1;
SET foreign_key_checks = 1;
Otherwise, if you need the column to be in a specific location, use algorithm=inplace:
SET unique_checks = 0;
SET foreign_key_checks = 0;
ALTER TABLE main_table ADD location varchar(256) AFTER othercolumn, algorithm=inplace;
SET unique_checks = 1;
SET foreign_key_checks = 1;
For reference, it took my PC about 2 minutes to alter a table with 20 million rows using the inplace algorithm. If you're using a program like Workbench, then you may want to increase the default timeout period in your settings before starting the operation.
If you find that the operation is hanging indefinitely, then you may need to look through the list of processes and kill whatever process has a lock on the table. You can do that using these commands:
SHOW FULL PROCESSLIST;
KILL PROCESS_NUMBER_GOES_HERE;
Alter table takes a long time with a big data like in your case, so avoid to use it in such situations, and use some code like this one:
select main_table.*,
cast(null as varchar(256)) as null_location, -- any column you want accepts null
cast('' as varchar(256)) as not_null_location, --any column doesn't accept null
cast(0 as int) as not_null_int, -- int column doesn't accept null
into new_table
from main_table;
drop table main_table;
rename table new_table TO main_table;
DB2 z/OS does a virtual add of the column instantly. And puts the table into Advisory-Reorg status. Anything that runs before the reorg gets the default value or null if no default. When updates are done, they expand the rows updated. Inserts are done expanded. The next reorg expands every unexpanded row and assigns the default value to anything it expands.
Only a real database handles this well. DB2 z/OS.

Optimize mySql for faster alter table add column

I have a table that has 170,002,225 rows with about 35 columns and two indexes. I want to add a column. The alter table command took about 10 hours. Neither the processor seemed busy during that time nor were there excessive IO waits. This is on a 4 way high performance box with tons of memory.
Is this the best I can do? Is there something I can look at to optimize the add column in tuning of the db?
I faced a very similar situation in the past and i improve the performance of the operation in this way :
Create a new table (using the structure of the current table) with the new column(s) included.
execute a INSERT INTO new_table (column1,..columnN) SELECT (column1,..columnN) FROM current_table;
rename the current table
rename the new table using the name of the current table.
ALTER TABLE in MySQL is actually going to create a new table with new schema, then re-INSERT all the data and delete the old table. You might save some time by creating the new table, loading the data and then renaming the table.
From "High Performance MySQL book" (the percona guys):
The usual trick for loading MyISAM table efficiently is to disable keys, load the data and renalbe the keys:
mysql> ALTER TABLE test.load_data DISABLE KEYS;
-- load data
mysql> ALTER TABLE test.load_data ENABLE KEYS;
Well, I would recommend using latest Percona MySQL builds plus since there is the following note in MySQL manual
In other cases, MySQL creates a
temporary table, even if the data
wouldn't strictly need to be copied.
For MyISAM tables, you can speed up
the index re-creation operation (which
is the slowest part of the alteration
process) by setting the
myisam_sort_buffer_size system
variable to a high value.
You can do ALTER TABLE DISABLE KEYS first, then add column and then ALTER TABLE ENABLE KEYS. I don't see anything can be done here.
BTW, can't you go MongoDB? It doesn't rebuild anything when you add column.
Maybe you can remove the index before alter the table because what is take most of the time to build is the index?
Combining some of the comments on the other answers, this was the solution that worked for me (MySQL 5.6):
create table mytablenew like mytable;
alter table mytablenew add column col4a varchar(12) not null after col4;
alter table mytablenew drop index index1, drop index index2,...drop index indexN;
insert into mytablenew (col1,col2,...colN) select col1,col2,...colN from mytable;
alter table mytablenew add index index1 (col1), add index index2 (col2),...add index indexN (colN);
rename table mytable to mytableold, mytablenew to mytable
On a 75M row table, dropping the indexes before the insert caused the query to complete in 24 minutes rather than 43 minutes.
Other answers/comments have insert into mytablenew (col1) select (col1) from mytable, but this results in ERROR 1241 (21000): Operand should contain 1 column(s) if you have the parenthesis in the select query.
Other answers/comments have insert into mytablenew select * from mytable;, but this results in ERROR 1136 (21S01): Column count doesn't match value count at row 1 if you've already added a column.

Quickest way to delete enormous MySQL table

I have an enormous MySQL (InnoDB) database with millions of rows in the sessions table that were created by an unrelated, malfunctioning crawler running on the same server as ours. Unfortunately, I have to fix the mess now.
If I try to truncate table sessions; it seems to take an inordinately long time (upwards of 30 minutes). I don't care about the data; I just want to have the table wiped out as quickly as possible. Is there a quicker way, or will I have to just stick it out overnight?
(As this turned up high in Google's results, I thought a little more instruction might be handy.)
MySQL has a convenient way to create empty tables like existing tables, and an atomic table rename command. Together, this is a fast way to clear out data:
CREATE TABLE new_foo LIKE foo;
RENAME TABLE foo TO old_foo, new_foo TO foo;
DROP TABLE old_foo;
Done
The quickest way is to use DROP TABLE to drop the table completely and recreate it using the same definition. If you have no foreign key constraints on the table then you should do that.
If you're using MySQL version greater than 5.0.3, this will happen automatically with a TRUNCATE. You might get some useful information out of the manual as well, it describes how a TRUNCATE works with FK constraints. http://dev.mysql.com/doc/refman/5.0/en/truncate-table.html
EDIT: TRUNCATE is not the same as a drop or a DELETE FROM. For those that are confused about the differences, please check the manual link above. TRUNCATE will act the same as a drop if it can (if there are no FK's), otherwise it acts like a DELETE FROM with no where clause.
EDIT: If you have a large table, your MariaDB/MySQL is running with a binlog_format as ROW and you execute a DELETE without a predicate/WHERE clause, you are going to have issues to keep up the replication or even, to keep your Galera nodes running without hitting a flow control state. Also, binary logs can get your disk full. Be careful.
The best way I have found of doing this with MySQL is:
DELETE from table_name LIMIT 1000;
Or 10,000 (depending on how fast it happens).
Put that in a loop until all the rows are deleted.
Please do try this as it will actually work. It will take some time, but it will work.
Couldn't you grab the schema drop the table and recreate it?
drop table should be the fastest way to get rid of it.
Have you tried to use "drop"? I've used it on tables over 20GB and it always completes in seconds.
If you just want to get rid of the table altogether, why not simply drop it?
Truncate is fast, usually on the order of seconds or less. If it took 30 minutes, you probably had a case of some foreign keys referencing the table you were truncating. There may also be locking issues involved.
Truncate is effectively as efficient as one can empty a table, but you may have to remove the foreign key references unless you want those tables scrubbed as well.
We had these issues. We no longer use the database as a session store with Rails 2.x and the cookie store. However, dropping the table is a decent solution. You may want to consider stopping the mysql service, temporarily disable logging, start things up in safe mode and then do your drop/create. When done, turn on your logging again.
I'm not sure why it's taking so long. But perhaps try a rename, and recreate a blank table. Then you can drop the "extra" table without worrying how long it takes.
searlea's answer is nice, but as stated in the comments, you lose the foreign keys during the fight.
this solution is similar: the truncate is executed within a second, but you keep the foreign keys.
The trick is that we disable/enable the FK checks.
SET FOREIGN_KEY_CHECKS=0;
CREATE TABLE NewFoo LIKE Foo;
insert into NewFoo SELECT * from Foo where What_You_Want_To_Keep
truncate table Foo;
insert into Foo SELECT * from NewFoo;
SET FOREIGN_KEY_CHECKS=1;
Extended answer - Delete all but some rows
My problem was: Because of a crazy script, my table was for with 7.000.000 junk rows. I needed to delete 99% of data in this table, this is why i needed to copy What I Want To Keep in a tmp table before deleteting.
These Foo Rows i needed to keep were depending on other tables, that have foreign keys, and indexes.
something like that:
insert into NewFoo SELECT * from Foo where ID in (
SELECT distinct FooID from TableA
union SELECT distinct FooID from TableB
union SELECT distinct FooID from TableC
)
but this query was always timing out after 1 hour.
So i had to do it like this:
CREATE TEMPORARY TABLE tmpFooIDS ENGINE=MEMORY AS (SELECT distinct FooID from TableA);
insert into tmpFooIDS SELECT distinct FooID from TableB
insert into tmpFooIDS SELECT distinct FooID from TableC
insert into NewFoo SELECT * from Foo where ID in (select ID from tmpFooIDS);
I theory, because indexes are setup correctly, i think both ways of populating NewFoo should have been the same, but practicaly it didn't.
This is why in some cases, you could do like this:
SET FOREIGN_KEY_CHECKS=0;
CREATE TABLE NewFoo LIKE Foo;
-- Alternative way of keeping some data.
CREATE TEMPORARY TABLE tmpFooIDS ENGINE=MEMORY AS (SELECT * from Foo where What_You_Want_To_Keep);
insert into tmpFooIDS SELECT ID from Foo left join Bar where OtherStuff_You_Want_To_Keep_Using_Bar
insert into NewFoo SELECT * from Foo where ID in (select ID from tmpFooIDS);
truncate table Foo;
insert into Foo SELECT * from NewFoo;
SET FOREIGN_KEY_CHECKS=1;