ALTER TABLE ADD COLUMN takes a long time - mysql

I was just trying to add a column called "location" to a table (main_table) in a database. The command I ran was
ALTER TABLE main_table ADD COLUMN location varchar(256);
main_table contains more than 2,000,000 rows. The statement has been running for more than 2 hours and still hasn't completed.
I tried to use mytop to monitor the activity of this database and make sure the query isn't blocked by another process, but it doesn't seem to be. Is it supposed to take this long? Actually, I rebooted the machine just before running this command, and the command is still running. I am not sure what to do.

Your ALTER TABLE statement means MySQL will have to rewrite every single row of the table to include the new column. Since you have more than 2 million rows, I would definitely expect it to take a significant amount of time, during which your server will likely be mostly I/O bound. You'd usually find it more performant to do the following:
CREATE TABLE main_table_new LIKE main_table;
ALTER TABLE main_table_new ADD COLUMN location VARCHAR(256);
INSERT INTO main_table_new SELECT *, NULL FROM main_table;
RENAME TABLE main_table TO main_table_old, main_table_new TO main_table;
DROP TABLE main_table_old;
This way you add the column on an empty table and write the data into that new table, which you can be sure no one else will be looking at, without locking as many resources.

I think the appropriate answer for this is to use a tool like pt-online-schema-change or gh-ost.
We have done migrations of over 4 billion rows with this; it can take up to 10 days, but with less than a minute of downtime.
Percona's tool works in a very similar fashion to the approach above (see the example command after this list):
Create a temp table
Creates triggers on the first table (for inserts, updates, deletes) so that they are replicated to the temp table
In small batches, migrate data
When done, rename table to new table, and drop the other table
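For illustration, a pt-online-schema-change run for the original ALTER might look roughly like this (a sketch; the database name and connection options are assumptions):
pt-online-schema-change --alter "ADD COLUMN location VARCHAR(256)" \
  D=your_database,t=main_table --execute
Run it with --dry-run first to see what the tool would do without changing anything.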

You can speed up the process by temporarily turning off unique checks and foreign key checks. You can also change the algorithm that gets used.
If you want the new column to be at the end of the table, use algorithm=instant (available in MySQL 8.0.12 and later):
SET unique_checks = 0;
SET foreign_key_checks = 0;
ALTER TABLE main_table ADD location varchar(256), algorithm=instant;
SET unique_checks = 1;
SET foreign_key_checks = 1;
Otherwise, if you need the column to be in a specific location, use algorithm=inplace:
SET unique_checks = 0;
SET foreign_key_checks = 0;
ALTER TABLE main_table ADD location varchar(256) AFTER othercolumn, algorithm=inplace;
SET unique_checks = 1;
SET foreign_key_checks = 1;
For reference, it took my PC about 2 minutes to alter a table with 20 million rows using the inplace algorithm. If you're using a program like Workbench, then you may want to increase the default timeout period in your settings before starting the operation.
If you find that the operation is hanging indefinitely, then you may need to look through the list of processes and kill whatever process has a lock on the table. You can do that using these commands:
SHOW FULL PROCESSLIST;
KILL PROCESS_NUMBER_GOES_HERE;

ALTER TABLE takes a long time with big data like in your case, so avoid it in such situations and use something like this instead (in MySQL, SELECT ... INTO a new table isn't supported, so create the new table from the query):
create table new_table as
select main_table.*,
cast(null as char(256)) as null_location, -- any column you want that accepts null
cast('' as char(256)) as not_null_location, -- any column that doesn't accept null
cast(0 as signed) as not_null_int -- an int column that doesn't accept null
from main_table;
drop table main_table;
rename table new_table TO main_table;

DB2 for z/OS does a virtual add of the column instantly and puts the table into advisory-reorg status. Anything that runs before the reorg gets the default value, or null if there is no default. Updates expand the rows they touch, and inserts are written already expanded. The next reorg expands every unexpanded row and assigns the default value to anything it expands.
Only a real database handles this well: DB2 for z/OS.

Related

MySQL: Waiting for metadata lock with ALGORITHM=COPY

I'm trying to execute an ALTER TABLE in MySQL. MySQL only lets me execute it with the ALGORITHM=COPY (because I need to change the type of a column).
There are no queries using that table (neither writes nor reads).
But, and I don't know why, when I execute the ALTER, some queries (UPDATEs) that are not using this table (they are inside a transaction) end up locked. MySQL says they are "waiting for metadata lock".
So the question is: why is the query waiting for a metadata lock if the UPDATE is not using the altered table?
I read some documentation:
https://dev.mysql.com/doc/refman/8.0/en/alter-table.html#alter-table-performance
https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-performance.html#innodb-online-ddl-locking-options
But I still don't understand why the queries are blocked waiting for metadata.
Reproduction of the problem in dev environment:
First, do the alter:
ALTER TABLE API.SEARCHES_ELEMENTS
MODIFY COLUMN TYPE ENUM('A', 'B') NOT NULL,
ALGORITHM=COPY;
Second, change values in other tables (there is no explicit transaction):
UPDATE CLIENTS
SET NAME = CONCAT('test-', RAND())
WHERE ID_CLIENT = 1;
The locks:
SELECT *
FROM performance_schema.metadata_locks
INNER JOIN performance_schema.threads ON THREAD_ID = OWNER_THREAD_ID
WHERE
PROCESSLIST_ID <> CONNECTION_ID();
Maybe the problem is due to the lock over the SCHEMA?
It seems the idea is to avoid ALGORITHM=COPY (rebuilds that are not done in place).
So instead of modifying the column type
ALTER TABLE API.SEARCHES_ELEMENTS
MODIFY COLUMN TYPE ENUM('A', 'B') NOT NULL,
ALGORITHM=COPY;
it is better to create a new column, copy the data, and remove the old one:
ALTER TABLE API.SEARCHES_ELEMENTS
ADD COLUMN TYPE_NEW ENUM('A', 'B') NOT NULL AFTER TYPE,
ALGORITHM=INSTANT;
LOCK TABLES API.SEARCHES_ELEMENTS WRITE;
UPDATE API.SEARCHES_ELEMENTS SET TYPE_NEW = TYPE;
ALTER TABLE API.SEARCHES_ELEMENTS
RENAME COLUMN TYPE TO TYPE_OLD,
RENAME COLUMN TYPE_NEW TO TYPE,
ALGORITHM=INSTANT;
UNLOCK TABLES;
ALTER TABLE API.SEARCHES_ELEMENTS
DROP COLUMN TYPE_OLD,
ALGORITHM=INPLACE;
Note: adding a value to an ENUM may be able to use ALGORITHM=INSTANT:
Modifying the definition of an ENUM or SET column by adding new
enumeration or set members to the end of the list of valid member
values may be performed instantly or in place, as long as the storage
size of the data type does not change. For example, adding a member to
a SET column that has 8 members changes the required storage per value
from 1 byte to 2 bytes; this requires a table copy. Adding members in
the middle of the list causes renumbering of existing members, which
requires a table copy.
A table's metadata is not only locked while a running query is using it, but also if the table has previously been used in a still-active transaction, until that transaction commits or rolls back. This prevents the table from changing while it is still being referenced by the transaction.
If you are running at least MySQL 5.7 and have the performance_schema enabled, you can check for current metadata locks via the performance_schema.metadata_locks table.
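For example (a minimal sketch; in MySQL 5.7 the metadata lock instrument is disabled by default and has to be switched on first, while in 8.0 it is enabled out of the box):
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME = 'wait/lock/metadata/sql/mdl';
SELECT OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, LOCK_STATUS, OWNER_THREAD_ID
FROM performance_schema.metadata_locks;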
See also:
https://dev.mysql.com/doc/refman/5.7/en/performance-schema-metadata-locks-table.html
https://dev.mysql.com/doc/refman/5.7/en/metadata-locking.html

Fastest way to remove a HUGE set of row keys from a table via primary key? [duplicate]

I have two tables. Let's call them KEY and VALUE.
KEY is small, somewhere around 1,000,000 records.
VALUE is huge, say 1,000,000,000 records.
Between them there is a relationship such that each KEY might have many VALUEs. It's not an actual foreign key, but it carries basically the same meaning.
The DDL looks like this
create table KEY (
key_id int,
primary key (key_id)
);
create table VALUE (
key_id int,
value_id int,
primary key (key_id, value_id)
);
Now, my problem. About half of all key_ids in VALUE have been deleted from KEY, and I need to delete them in an orderly fashion while both tables are still under high load.
It would be easy to do
delete v
from VALUE v
left join KEY k using (key_id)
where k.key_id is null;
However, since a LIMIT is not allowed on a multi-table DELETE, I don't like this approach. Such a delete would take hours to run, which makes it impossible to throttle the deletes.
Another approach is to create cursor to find all missing key_ids and delete them one by one with a limit. That seems very slow and kind of backwards.
Are there any other options? Some nice tricks that could help?
Any solution that tries to delete so much data in one transaction is going to overwhelm the rollback segment and cause a lot of performance problems.
A good tool to help is pt-archiver. It performs incremental operations on moderate-sized batches of rows, as efficiently as possible. pt-archiver can copy, move, or delete rows depending on options.
The documentation includes an example of deleting orphaned rows, which is exactly your scenario:
pt-archiver --source h=host,D=db,t=VALUE --purge \
--where 'NOT EXISTS(SELECT * FROM `KEY` WHERE key_id=`VALUE`.key_id)' \
--limit 1000 --commit-each
Executing this will take significantly longer to delete the data, but it won't use too many resources and won't interrupt service on your existing database. I have used it successfully to purge hundreds of millions of rows of outdated data.
pt-archiver is part of the Percona Toolkit for MySQL, a free (GPL) set of scripts that help common tasks with MySQL and compatible databases.
Directly from MySQL documentation
If you are deleting many rows from a large table, you may exceed the
lock table size for an InnoDB table. To avoid this problem, or simply
to minimize the time that the table remains locked, the following
strategy (which does not use DELETE at all) might be helpful:
Select the rows not to be deleted into an empty table that has the same structure as the original table:
INSERT INTO t_copy SELECT * FROM t WHERE ... ;
Use RENAME TABLE to atomically move the original table out of the way and rename the copy to the original name:
RENAME TABLE t TO t_old, t_copy TO t;
Drop the original table:
DROP TABLE t_old;
No other sessions can access the tables involved while RENAME TABLE
executes, so the rename operation is not subject to concurrency
problems. See Section 12.1.9, “RENAME TABLE Syntax”.
So in your case you may do:
INSERT INTO value_copy SELECT * FROM VALUE WHERE key_id IN
(SELECT key_id FROM `KEY`);
RENAME TABLE value TO value_old, value_copy TO value;
DROP TABLE value_old;
And according to the documentation, the RENAME operation is quick and is not affected by the number of records.
What about this for having a limit?
delete x
from `VALUE` x
join (select key_id, value_id
from `VALUE` v
left join `KEY` k using (key_id)
where k.key_id is null
limit 1000) y
on x.key_id = y.key_id AND x.value_id = y.value_id;
First, examine your data. Find the keys which have too many values to be deleted "fast". Then find out which times during the day you have the smallest load on the system. Perform the deletion of the "bad" keys during that time. For the rest, start deleting them one by one with some downtime between deletes so that you don't put too much pressure on the database while you do it.
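A minimal sketch of that first step, reusing the schema from the question, to see which orphaned keys carry the most rows:
SELECT v.key_id, COUNT(*) AS cnt
FROM `VALUE` v
LEFT JOIN `KEY` k USING (key_id)
WHERE k.key_id IS NULL
GROUP BY v.key_id
ORDER BY cnt DESC
LIMIT 100;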
Maybe instead of a limit, divide the whole set of rows into small parts by key_id:
delete v
from VALUE v
left join KEY k using (key_id)
where k.key_id is null and v.key_id > 0 and v.key_id < 100000;
then delete rows with key_id in 100000..200000 and so on.
You can try to delete in separated transaction batches.
This is for MSSQL, but should be similar.
declare @i INT
declare @step INT
set @i = 0
set @step = 100000
while (@i < (select max(VALUE.key_id) from VALUE))
BEGIN
    BEGIN TRANSACTION
    delete from VALUE where
        VALUE.key_id between @i and @i + @step and
        not exists (select 1 from KEY where KEY.key_id = VALUE.key_id and KEY.key_id between @i and @i + @step)
    set @i = @i + @step
    COMMIT TRANSACTION
END
Create a temporary table!
drop table if exists batch_to_delete;
create temporary table batch_to_delete as
select v.* from `VALUE` v
left join `KEY` k on k.key_id = v.key_id
where k.key_id is null
limit 10000; -- tailor batch size to your taste
-- optional but may help for large batch size
create index batch_to_delete_ix_key on batch_to_delete(key_id);
create index batch_to_delete_ix_value on batch_to_delete(value_id);
-- do the actual delete
delete v from `VALUE` v
join batch_to_delete d on d.key_id = v.key_id and d.value_id = v.value_id;
To me, this is the kind of task whose progress I would want to see in a log file, and I would avoid solving it in pure SQL; I would use some scripting in Python or a similar language. Another thing that bothers me is that lots of LEFT JOINs with WHERE ... IS NULL between the tables might cause unwanted locks, so I would avoid JOINs as well.
Here is some pseudo code:
max_key = select_db('SELECT MAX(key) FROM VALUE')
while max_key > 0:
    cur_range = range(max_key, max_key - 100, -1)
    good_keys = select_db('SELECT key FROM KEY WHERE key IN (%s)' % cur_range)
    keys_to_del = set(cur_range) - set(good_keys)
    while 1:
        deleted_count = update_db('DELETE FROM VALUE WHERE key IN (%s) LIMIT 1000' % keys_to_del)
        db_commit()
        log_something()
        if not deleted_count:
            break
    max_key -= 100
This should not bother the rest of the system very much, but it may take a long time. Another issue is optimizing the table after you have deleted all those rows, but that is another story.
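For reference, that cleanup is a single statement; on InnoDB it rebuilds the table, so schedule it for a quiet period:
OPTIMIZE TABLE `VALUE`;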
If the target columns are properly indexed this should go fast,
DELETE FROM `VALUE`
WHERE NOT EXISTS(SELECT 1 FROM `key` k WHERE k.key_id = `VALUE`.key_id)
-- ORDER BY key_id, value_id -- order by PK is good idea, but check the performance first.
LIMIT 1000
Adjust the limit anywhere from 10 to 10000 to get acceptable performance, and rerun it as many times as needed.
Also keep in mind that mass deletes like this take locks and write undo information for each row, which multiplies the execution time per row. There are some advanced methods to prevent this, but the easiest workaround is just to put a transaction around the query.
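A minimal sketch of that suggestion, wrapping one batch of the DELETE above in an explicit transaction:
START TRANSACTION;
DELETE FROM `VALUE`
WHERE NOT EXISTS (SELECT 1 FROM `KEY` k WHERE k.key_id = `VALUE`.key_id)
LIMIT 1000;
COMMIT;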
Do you have a slave or a dev/test environment with the same data?
The first step is to find out your data distribution, if you are worried about a particular key having 1 million value_ids:
SELECT v.key_id, COUNT(IFNULL(k.key_id,1)) AS cnt
FROM `value` v LEFT JOIN `key` k USING (key_id)
WHERE k.key_id IS NULL
GROUP BY v.key_id ;
The EXPLAIN plan for the above query is much better than it would be with this added:
ORDER BY COUNT(IFNULL(k.key_id,1)) DESC;
Since you don't have partitioning on key_id (there would be too many partitions in your case) and want to keep the database running during your delete process, the option is to delete in chunks, with SLEEP() between deletes for different key_id ranges to avoid overwhelming the server (see the sketch below). Don't forget to keep an eye on your binary logs to avoid filling the disk.
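A sketch of that chunked pattern (the range boundaries and sleep duration are arbitrary placeholders):
DELETE v FROM `VALUE` v
LEFT JOIN `KEY` k USING (key_id)
WHERE k.key_id IS NULL AND v.key_id BETWEEN 0 AND 99999;
SELECT SLEEP(5);
DELETE v FROM `VALUE` v
LEFT JOIN `KEY` k USING (key_id)
WHERE k.key_id IS NULL AND v.key_id BETWEEN 100000 AND 199999;
-- ... and so on for the remaining key_id ranges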
The quickest way is:
Stop the application so data is not changed.
Dump key_id and value_id from the VALUE table, keeping only rows whose key_id exists in the KEY table, by using
mysqldump YOUR_DATABASE_NAME value --where="key_id in (select key_id from YOUR_DATABASE_NAME.key)" --lock-all-tables --opt --quick --quote-names --skip-extended-insert > VALUE_DATA.txt
Truncate the VALUE table.
Load the data exported in step 2.
Start the application.
As always, try this in a dev/test environment with production data and the same infrastructure so you can calculate the downtime.
Hope this helps.
I am just curious what the effect would be of adding a non-unique index on key_id in table VALUE. Selectivity is not high at all (~0.001) but I am curious how that would affect the join performance.
Why don't you split your VALUE table into several tables according to some rule like key_id modulo some power of 2 (256, for example)?

Can I adjust the value of an auto-incremented field in the database automatically?

Can I adjust the value of an auto-incremented field in the database automatically?
I have a table called "post" which has a field called "pid" which is set to auto-increment.
Posts from this table may be deleted by the user at a later time, but the auto-incremented value will not be adjusted. Is there a way to adjust the pid field every time posts are deleted?
For example, if I have 4 entries: pid = 1, 2, 3, 4 (pid is auto-increment).
Now if I delete 2, is there a way to update 3 to 2 and 4 to 3, and so on?
Why would you need to adjust the auto-increment? Each post is uniquely identified using the pid and if that is to change, then the whole DB structure will fail. The idea of the auto-increment is based on this principle and that you don't have to worry about assigning numbers yourself.
If deleting a record is a problem, then you might want to keep it in the database and flag it as deleted. Then you can use this flag to show or hide posts from the users (see the sketch below).
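A minimal sketch of that soft-delete approach, using the post/pid names from the question and a hypothetical deleted flag column:
ALTER TABLE post ADD COLUMN deleted TINYINT(1) NOT NULL DEFAULT 0; -- hypothetical flag column
UPDATE post SET deleted = 1 WHERE pid = 2;                         -- "delete" a post by flagging it
SELECT * FROM post WHERE deleted = 0;                              -- show only visible posts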
Deletion from end
You can manually set AUTO_INCREMENT of a table to a specified value via
ALTER TABLE tbl AUTO_INCREMENT = val;
See Using AUTO_INCREMENT in MySQL manual.
This solves deletion from the end – before adding new rows, set AUTO_INCREMENT to 0 and it will automatically be set to the current maximum plus one. Newly inserted rows will then occupy the same IDs as the deleted ones.
Deletion from anywhere – renumbering
It is possible to manually specify the value of a field that has AUTO_INCREMENT; AUTO_INCREMENT is simply ignored then. If you specify a value that is already used, the unique constraint will abort the query. If you specify a value that is bigger than the current maximum, AUTO_INCREMENT is automatically set to that value plus one.
If you do not want to renumber the records manually, write a script for it, or mess with stored procedures, you can use user-defined variables:
SET @id = 0;
UPDATE tbl SET id = @id := @id + 1 ORDER BY id;
SET @alt = CONCAT('ALTER TABLE tbl AUTO_INCREMENT = ', @id + 1);
PREPARE aifix FROM @alt;
EXECUTE aifix;
DEALLOCATE PREPARE aifix;
Example use
http://www.paulwhippconsulting.com.au/webdevelopment/31-renumbering-an-qorderingq-field-in-mysql
http://www.it-iss.com/mysql/mysql-renumber-field-values/
For more info see my answer to a related question.
Warning – this may be harmful!
Usually there is no need to renumber the records. In fact, it may be harmful: you may have dangling references to the old records somewhere (possibly outside the DB), and after renumbering they become valid again while pointing at different rows, which could cause confusion. This is why the AUTO_INCREMENT attribute of the table is not decremented after a row is deleted.
You should just delete the unwanted records and stop worrying about the holes. They exist only in the numbering of the records, which is purely logical; physically they don't need to exist in the storage, so no space is wasted in the long run. For some time the storage really does have holes, but you can let the DB engine get rid of them with OPTIMIZE TABLE tbl or ALTER TABLE tbl ORDER BY column.

Fast way to populate a relational database in MySQL using JDBC?

I am trying to implement a simple program in Java that will be used to populate a MySQL database from a CSV source file. For each row in the CSV file, I need to execute the following sequence of SQL statements (example in pseudo code):
execute("INSERT INTO table_1 VALUES(?, ?)");
String id = execute("SELECT LAST_INSERT_ID()");
execute("INSERT INTO table_2 VALUES(?, ?)");
String id2 = execute("SELECT LAST_INSERT_ID()");
execute("INSERT INTO table_3 values("some value", id1, id2)");
execute("INSERT INTO table_3 values("some value2", id1, id2)");
...
There are three basic problems:
1. The database is not on localhost, so each single INSERT/SELECT has latency; this is the basic problem.
2. The CSV file contains millions of rows (around 15,000,000), so it takes too long.
3. I cannot modify the database structure (add extra tables, disable keys, etc.).
I was wondering how can I speed up the INSERT/SELECT process? Currently 80% of the execution time is consumed by communication.
I have already tried to group the above statements and execute them as a batch, but because of LAST_INSERT_ID it does not work. In any other case it takes too long (see point 1).
The fastest way is to let MySQL parse the CSV and load the records into the table. For that, you can use LOAD DATA INFILE:
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
It works even better if you can transfer the file to the server, or keep it in a shared directory that the server can access.
Once that is done, you can have a column that indicates whether each record has been processed or not. Its value should be false by default.
Once the data is loaded, you can pick up all records where processed = false.
For all such records you can populate tables 2 and 3.
Since all these operations would happen on the server, the server-client latency would not come into the picture.
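A minimal sketch of such a load (the file path, staging table, and column names are placeholders, not part of the question's schema):
LOAD DATA INFILE '/path/to/data.csv'
INTO TABLE staging_table -- hypothetical table with a processed flag defaulting to false
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(col1, col2);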
Feed the data into a blackhole
CREATE TABLE `test`.`blackhole` (
`t1_f1` int(10) unsigned NOT NULL,
`t1_f2` int(10) unsigned NOT NULL,
`t2_f1` ... and so on for all the tables and all the fields.
) ENGINE=BLACKHOLE DEFAULT CHARSET=latin1;
Note that this is a blackhole table, so the data is going nowhere.
However you can create a trigger on the blackhole table, something like this.
And pass it on using a trigger
delimiter $$
create trigger ai_blackhole_each after insert on blackhole for each row
begin
declare lastid_t1 integer;
declare lastid_t2 integer;
insert into table1 values(new.t1_f1, new.t1_f2);
select last_insert_id() into lastid_t1;
insert into table2 values(new.t2_f1, new.t2_f1, lastid_t1);
etc....
end$$
delimiter ;
Now you can feed the blackhole table with a single insert statement at full speed and even insert multiple rows in one go.
insert into blackhole values(a,b,c,d,e,f,g,h),(....),(...)...
Disable index updates to speed things up
ALTER TABLE $tbl_name DISABLE KEYS;
-- ... lots of inserts ...
ALTER TABLE $tbl_name ENABLE KEYS;
This will disable all non-unique index updates and speed up the inserts. (An auto-increment key is unique, so it is not affected.)
If you have any unique keys and you don't want MySQL to check for them during the mass-insert, make sure you do an alter table to eliminate the unique key and enable it afterwards.
Note that the alter table to put the unique key back in will take a long time.
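A hedged sketch of dropping and restoring a unique key (the table, index, and column names are hypothetical):
ALTER TABLE table_1 DROP INDEX uq_some_column;
-- ... run the mass insert ...
ALTER TABLE table_1 ADD UNIQUE INDEX uq_some_column (some_column);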

Delete, Truncate or Drop to clean out a table in MySQL

I am attempting to clean out a table but not get rid of the actual structure of the table. I have an id column that is auto-incrementing; I don't need to keep the ID number, but I do need it to keep its auto-incrementing characteristic. I've found delete and truncate but I'm worried one of these will completely drop the entire table rendering future insert commands useless.
How do I remove all of the records from the table so that I can insert new data?
drop table will remove the entire table along with its data.
delete from table will remove the data, leaving the auto-increment values alone. It also takes a while if there's a lot of data in the table.
truncate table will remove the data, reset the auto-increment value (but leave the column as auto-increment, so it'll just start at 1 and go up from there again), and is very quick.
TRUNCATE will reset your auto-increment seed (on InnoDB tables, at least), although you could note its value before truncating and re-set accordingly afterwards using alter table:
ALTER TABLE t2 AUTO_INCREMENT = value
Drop will do just that: drop the table in question, unless the table is a parent to another table.
Delete will remove all the data that meets the condition; if no condition is specified, it'll remove all the data in the table.
Truncate is similar to delete; however, it resets the auto_increment counter back to 1 (or the initial starting value). It's generally better to use truncate rather than delete here, because delete removes the data row by row and therefore incurs a bigger performance hit than truncate. However, truncate will not work on InnoDB tables that are referenced by foreign keys unless those checks are turned off before the truncate command is issued.
So, relax; unless you issue a drop command on the table, it won't be dropped.
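If incoming foreign keys are what blocks the truncate, a common session-level workaround looks like this (a sketch; be sure you really want to skip the checks):
SET FOREIGN_KEY_CHECKS = 0;
TRUNCATE TABLE your_table; -- hypothetical table name
SET FOREIGN_KEY_CHECKS = 1;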
Truncate table is what you are looking for
http://www.1keydata.com/sql/sqltruncate.html
Another possibility involves creating an empty copy of the table, setting its AUTO_INCREMENT (with some optional leeway for insertions that happen during the non-atomic operation), and then rotating both:
CREATE TABLE t2_new LIKE t2;
SELECT @newautoinc := auto_increment /*+[leeway]*/
FROM information_schema.tables
WHERE table_name = 't2';
SET @query = CONCAT("ALTER TABLE t2_new AUTO_INCREMENT = ", @newautoinc);
PREPARE stmt FROM @query;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
RENAME TABLE t2 TO t2_old, t2_new TO t2;
And then you have the extra advantage of still being able to change your mind before removing the old table.
If you reconsider your decision, you can still bring back old records from the table before the operation:
INSERT /*IGNORE*/ INTO t2 SELECT * FROM t2_old /*WHERE [condition]*/;
When you're good you can drop the old table:
DROP TABLE t2_old;
I've just come across a situation where DELETE is drastically affecting SELECT performance compared to TRUNCATE on a full-text InnoDB query.
If I DELETE all rows and then repopulate the table (1 million rows), a typical query takes 1s to come back.
If instead I TRUNCATE the table, and repopulate it in exactly the same way, a typical query takes 0.05s to come back.
YMMV, but for whatever reason for me on MariaDB 10.3.15-MariaDB-log DELETE seems to be ruining my index.