MySQL ADD COLUMN slow under AWS RDS

I have an RDS MySQL instance with the following settings:
Class: db.m5.xlarge
Storage: Provisioned IOPS (SSD), 1000 IOPS
I then want to add a few columns to a table that is about 20 GB in size (according to INFORMATION_SCHEMA.files). Here's my statement:
ALTER TABLE MY_TABLE
ADD COLUMN NEW_COLUMN_1 DECIMAL(39, 30) NULL,
ADD COLUMN NEW_COLUMN_2 DECIMAL(39, 30) NULL,
ADD COLUMN NEW_COLUMN_3 INT(10) UNSIGNED NULL,
ADD CONSTRAINT SOME_CONSTRAINT FOREIGN KEY (NEW_COLUMN_3) REFERENCES SOME_OTHER_TABLE(SOME_OTHER_PK),
ADD COLUMN NEW_COLUMN_4 DATE NULL;
This query took 172 minutes to execute. Most of that time was spent copying the data to a temporary table.
During the operation, no other queries (read or write) were running; I had the database to myself. SHOW FULL PROCESSLIST showed the State of my query as copy to tmp table.
What I don't understand is that the AWS RDS console tells me the write throughput was between 30 MB/s and 35 MB/s for those 172 minutes.
Assuming a write throughput of 30 MB/s, I should have been able to write 30 * 60 * 172 = 309,600 MB ≈ 302 GB. That is much more than the size of the temporary table created during the operation (20 GB).
So two questions:
What is MySQL/RDS writing besides my temp table? Is there a way to disable it so that the full bandwidth goes to creating the temp table?
Is there any way to speed up the operation? Taking almost 3 hours to write 20 GB of data seems very long.

I was using MySQL 5.7. According to this MySQL blog post, version 8.0 improved the situation: "InnoDB now supports Instant ADD COLUMN".
I therefore changed my query to use the new feature.
-- Completes in 0.375 seconds!
ALTER TABLE MY_TABLE
ADD COLUMN NEW_COLUMN_1 DECIMAL(39, 30) NULL,
ADD COLUMN NEW_COLUMN_2 DECIMAL(39, 30) NULL,
ADD COLUMN NEW_COLUMN_3 INT(10) UNSIGNED NULL,
-- 'ALGORITHM=INSTANT' is not compatible with foreign keys.
-- The foreign key will need to be added in another statement
-- ADD CONSTRAINT SOME_CONSTRAINT FOREIGN KEY (NEW_COLUMN_3) REFERENCES SOME_OTHER_TABLE(SOME_OTHER_PK),
ADD COLUMN NEW_COLUMN_4 DATE NULL,
-- the new option
ALGORITHM=INSTANT;
-- This completed in about 6 minutes.
-- Adding the foreign key creates an index under the hood.
-- This index was 1.5 GB big.
SET FOREIGN_KEY_CHECKS=0;
ALTER TABLE MY_TABLE
ADD FOREIGN KEY (NEW_COLUMN_3) REFERENCES SOME_OTHER_TABLE(SOME_OTHER_PK);
SET FOREIGN_KEY_CHECKS=1;
So my conclusions:
Upgrade to MySQL 8 if you can.
Whenever possible, specify the ALGORITHM=INSTANT option explicitly (see the sketch below).
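A minimal sketch of why spelling the option out matters: when ALGORITHM=INSTANT is requested explicitly and the change cannot be done instantly, MySQL 8.0 rejects the statement instead of silently falling back to a table copy. NEW_COLUMN_5 is just a placeholder here.
ALTER TABLE MY_TABLE
ADD COLUMN NEW_COLUMN_5 DATE NULL,
ALGORITHM=INSTANT;
-- If the requested change is not instant-capable, this fails with an
-- "ALGORITHM=INSTANT is not supported" error rather than quietly copying 20 GB of data.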

InnoDB is probably the storage engine you are using, since it's the default. InnoDB does some I/O that might seem redundant, in order to ensure there is no data loss.
For example:
Data and index pages modified in the buffer pool must be written to the tablespace. The table may need to split some pages during the process of adding columns, because the rows become wider, and fewer rows fit per page.
During writing pages to the tablespace, InnoDB first writes those pages to the doublewrite buffer, to ensure against data loss if there's a crash during a page write.
Transactions are written to the InnoDB redo log, and this may even result in multiple overwrites to the same block in the log.
Transactions are also written to the binary log if it is enabled for replication purposes. This shouldn't be a big cost in the case of an ALTER TABLE statement, though, because DDL statements are always written to the binary log in statement format, not in row format.
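If you want to see how much of this applies to your instance, the relevant settings and counters are visible from SQL. These are standard MySQL 5.7 names; on RDS most of them can only be changed through the parameter group.
SHOW GLOBAL VARIABLES LIKE 'innodb_doublewrite';      -- doublewrite buffer on/off
SHOW GLOBAL VARIABLES LIKE 'innodb_log_file_size';    -- redo log file size
SHOW GLOBAL VARIABLES LIKE 'log_bin';                 -- binary logging enabled?
SHOW GLOBAL STATUS LIKE 'Innodb_dblwr_pages_written'; -- pages written via the doublewrite buffer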
You also asked what can be done to speed up the ALTER TABLE. The reason to want it to run faster is usually because during an ALTER TABLE, the table is locked and may block concurrent queries.
At my company, we use the free tool pt-online-schema-change, so we can continue to use the table more or less freely while it is being altered. It actually takes longer to complete the alter this way, but it's not so inconvenient since it doesn't block our access to the table.

Related

Create foreign key on MySQL table takes forever with copy to tmp table

I am trying to set a foreign key constraint on a 5.7 InnoDB table with 30M+ rows.
It has already been running for 45 minutes on a quad-core, 64 GB server. The processlist shows the state copy to tmp table for the issued ALTER TABLE command.
innodb_buffer_pool_size is set to 32G and has room.
Why does the system create a tmp table, and can the performance somehow be improved?
It's likely that the time is being taken building an index for that foreign key. If you already had an index where the foreign key column(s) were the leftmost columns of the index, then it would use that index and not build a new one.
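For example, a rough sketch of that approach (table, column, and index names are placeholders; in 5.7, adding the foreign key itself can then be done in place only while foreign_key_checks is disabled):
ALTER TABLE child_table ADD INDEX idx_parent_id (parent_id), ALGORITHM=INPLACE, LOCK=NONE;
SET FOREIGN_KEY_CHECKS = 0;
ALTER TABLE child_table
ADD CONSTRAINT fk_child_parent FOREIGN KEY (parent_id) REFERENCES parent_table (id),
ALGORITHM=INPLACE;
SET FOREIGN_KEY_CHECKS = 1;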
45 minutes doesn't sound like an unusual amount of time to build an index on such a large table. You haven't said what the data type of the foreign key column(s) are, so perhaps it's a large varchar or something and it is taking many gigabytes to build that index.
Perhaps your server's disk is too slow. If you're using non-SSD storage, or remote storage (like Amazon EBS), it's slow by modern standards.
The number of CPU cores isn't going to make any difference, because the work is done in a single thread anyway. A faster CPU clock would help, but not more cores.
At my company, we use pt-online-schema-change to apply all schema changes or index builds. This allows clients to read and write the table concurrently, so it doesn't matter that it takes 45 minutes or 90 minutes or even longer. Eventually it finishes, and swaps the new table for the old table.
Attention! This disables key checking, so know what you are doing. In some cases this is not recommended, but it can help many people, so I think it's worth answering.
I had this problem this week. I have a client that is still on MySQL 5.5, so I had to make it work. You just need to disable key checking and put your application into maintenance mode (so there are no other operations).
Before creating your FK or adding a column, use:
ALTER TABLE table_name DISABLE KEYS;
Then run your command; my table with 1M rows took only 57 seconds.
Then you run:
ALTER TABLE table_name ENABLE KEYS;

Reclaiming space from a small but heavily used MySQL 8 table

I have a production MySQL 8 server with a table for user sessions for a PHP application. I am using innodb_file_per_table. The table is small at any given time (about 300-1000 rows), but rows are constantly being deleted and added. Without interference, the sessions.ibd file slowly grows until it takes up all available disk space. This morning, the table held 300 records and took up over 90 GB. This built up over the long term (months).
Running OPTIMIZE TABLE reclaims all of the disk space and brings the table back under 100M. An easy solution would be to make a cron script that runs OPTIMIZE TABLE once a week during our maintenance period. Another proposed suggestion is to convert the table to a MyISAM table, since it doesn't really require any of the features of InnoDB. Both of these solutions should be effective, but they are table specific and don't protect against the general problem. I'd like to know whether there is a solution to this problem that involves database configuration.
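For reference, the weekly job would be little more than the statement below, and the approximate reclaimable space is visible in INFORMATION_SCHEMA beforehand (the schema name is a placeholder):
SELECT table_name,
       ROUND(data_length / 1024 / 1024) AS data_mb,
       ROUND(data_free / 1024 / 1024)   AS reclaimable_mb
FROM information_schema.tables
WHERE table_schema = 'myapp' AND table_name = 'sessions';
-- For InnoDB, OPTIMIZE TABLE is mapped to ALTER TABLE ... FORCE, i.e. a full rebuild:
OPTIMIZE TABLE sessions;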
Here are the non-default innodb configuration options that we're using:
innodb-flush-log-at-trx-commit = 1
innodb-buffer-pool-size = 24G
innodb-log-file-size = 512M
innodb-buffer-pool-instances = 8
Are there other options we should be using so that the sessions.ibd file doesn't continually grow?
Here is the CREATE TABLE for the table:
CREATE TABLE `sessions` (
`id` varchar(255) NOT NULL DEFAULT '',
`data` mediumtext,
`expires` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
In addition to the constant inserts and deletes, the data column is updated often.
MyISAM would have a different problem -- fragmentation. After a delete, there is a hole in the table. The hole is filled in first. Then a link is made to the next piece of the record. Eventually fetching a row would involve jumping around most of the table.
If 300 rows take 100 MB, a row averages 333 KB? That's rather large. And does that number vary a lot? Do you have a lot of text/blob columns? Do they change often? Or is it just delete and add? Would you care to share SHOW CREATE TABLE?
I can't think how a table could grow by a factor of 900 without at least one multi-GB row being added, then deleted. Perhaps with the schema, I could think of some cause and/or workaround.

InnoDB Performance on primary index added/altering

So I have a huge update where I have to insert around 40 GB of data into an InnoDB table. It's taking quite a while, so I'm wondering which method would be the fastest (and, more importantly, why, since I could just do a split test).
Method 1)
a) Insert all rows
b) ALTER TABLE su_tmp_matches ADD PRIMARY KEY ( id )
Method 2)
a) ALTER TABLE su_tmp_matches ADD PRIMARY KEY ( id )
b) Insert all rows
Currently we are using method 1, but step b) seems to take an enormous amount of time. So I'm wondering whether the size is a factor here (40 GB, 5 million rows).
---- so I decided to test this as well ---
A pretty quick, brand-new MySQL server: loads and loads of fast RAM, fast disks as well, and pretty well tuned (we push more than 5000 requests per second on one of these):
1.6 million rows / 6 GB of data:
81 seconds to "delete" a primary index
550 seconds to "add" a primary index (after data is added)
120 seconds to create a copy of the table with the primary index create BEFORE data insert
80 seconds to create a copy of the table without the primary index (which then is 550 seconds to create afterwards)
Seems pretty absurd; the question is whether the indexes are the same thing.
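Roughly, the two timed operations correspond to the statements below (table and column names as in the methods above):
ALTER TABLE su_tmp_matches DROP PRIMARY KEY;       -- the "delete" step, ~81 seconds
ALTER TABLE su_tmp_matches ADD PRIMARY KEY ( id ); -- the "add after insert" step, ~550 seconds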
From the documentation:
InnoDB does not have a special optimization for separate index creation the way the MyISAM storage engine does. Therefore, it does not pay to export and import the table and create indexes afterward. The fastest way to alter a table to InnoDB is to do the inserts directly to an InnoDB table.
It seems to me that adding the uniqueness constraint before the insert could only help the engine if your primary key column is an auto-incremented integer. But I really doubt there would be a notable difference.
A useful recommendation:
During the conversion of big tables, increase the size of the InnoDB buffer pool to reduce disk I/O, to a maximum of 80% of physical memory. You can also increase the sizes of the InnoDB log files.
EDIT: since, in my experience, MySQL doesn't always perform as the documentation suggests, I think any benchmark you do on this would be interesting, even if it's not a definitive answer per se.
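A minimal sketch of the buffer-pool part of that recommendation. The size here is purely illustrative, and innodb_buffer_pool_size only became dynamically resizable in MySQL 5.7.5; on older versions it has to be set in the configuration file and requires a restart.
SET GLOBAL innodb_buffer_pool_size = 24 * 1024 * 1024 * 1024;  -- pick at most ~80% of physical memory
-- innodb_log_file_size is not dynamic; change it in my.cnf and restart the server.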

MySQL Partitioning: Simultaneous insertion to different partitions performance

I have a partitioned InnoDB MySQL table, and I need to insert hundreds of millions of rows.
I am currently using the LOAD DATA INFILE command to load many (think tens of thousands of) .csv files into said table.
What are the performance implications if I simultaneously insert large blocks of data into different distinct partitions?
Might I benefit from running multiple processes which each run batches of LOAD DATA INFILE statements?
Miscellaneous information:
Hardware: Intel i7, 24GB ram, Ubuntu 10.04 w/ MySQL 5.5.11, Raid 1 storage
#mysql on freenode IRC have told me that the performance implications will be the same as with normal InnoDB or MyISAM - InnoDB will do row-level locking and MyISAM will do table-level locking.
Table Structure:
CREATE TABLE `my_table` (
`short_name` varchar(10) NOT NULL,
`specific_info` varchar(20) NOT NULL,
`date_of_inquiry` datetime DEFAULT NULL,
`price_paid` decimal(8,2) DEFAULT NULL,
`details` varchar(255) DEFAULT '',
UNIQUE KEY `unique_record` (`short_name`,`specific_info`,`date_of_inquiry`),
KEY `short_name` (`short_name`),
KEY `underlying_quotedate` (`short_name`,`date_of_inquiry`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50500 PARTITION BY LIST COLUMNS(short_name)*/
(PARTITION pTOYS_R_US VALUES IN ('TOYS-R-US') ENGINE = InnoDB,
PARTITION pZAPPOS VALUES IN ('ZAPPOS') ENGINE = InnoDB,
PARTITION pDC VALUES IN ('DC') ENGINE = InnoDB,
PARTITION pGUCCI VALUES IN ('GUCCI') ENGINE = InnoDB,
...on and on...
);
Not a full list, but some pointers...
The fastest way to insert rows is to use LOAD DATA INFILE
See: http://dev.mysql.com/doc/refman/5.1/en/load-data.html
If that's not an option and you want to speed up things, you'll need to find the bottleneck and optimize for that.
If the partitions are across a network, network traffic might kill you; the same goes for CPU, disk I/O, and memory. Only profiling a sample run will tell.
Disable key updates
If you cannot use LOAD DATA INFILE, make sure you disable key updates:
ALTER TABLE table1 DISABLE KEYS
... lots of inserts
ALTER TABLE table1 ENABLE KEYS
Note that disabling key updates only affects non-unique keys; unique keys are always updated.
Binary log
If the binary log is running, it will record all those inserts, so consider disabling it. You can disable it while MySQL is running by replacing the log with a symlink pointing to /dev/null for the duration of the mass insert.
If you still want the changes in a binary log, you can do a simultaneous insert into a parallel database with blackhole tables and the binary log enabled.
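A gentler alternative to the symlink trick, assuming the loading session has the SUPER privilege: switch binary logging off for just that session (the loaded rows will then be missing from any replica, so only do this if you reload them there too; the file path is a placeholder).
SET SESSION sql_log_bin = 0;  -- this session's writes skip the binary log
LOAD DATA INFILE '/path/to/chunk.csv' INTO TABLE my_table;
SET SESSION sql_log_bin = 1;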
Autoincrement key
If you let MySQL generate the auto-increment key, this will create contention around the key generation. Consider feeding MySQL a precalculated auto-increment primary key value instead of NULL.
Unique keys
Unique keys are checked for uniqueness on every insert, and that eats a lot of time, because MySQL has to probe the unique index for every row inserted.
If you know that the values that you insert are unique, it's better to drop that requirement and add it after you are done.
When you add it back, MySQL will take a while to check it, but at least it does so only once, not on every insert (see the sketch below).
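A sketch of that drop-and-re-add approach, using the index name from the CREATE TABLE above (only safe if the input files really are duplicate-free):
ALTER TABLE my_table DROP INDEX unique_record;
-- ... run all the LOAD DATA INFILE batches ...
ALTER TABLE my_table ADD UNIQUE KEY unique_record (short_name, specific_info, date_of_inquiry);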
If you want to get maximum I/O performance, you'll want the different partitions on different disk volumes.
I'm not sure about the performance implications if all of the partitions are on the same physical disks but obviously you're more likely to run out of I/O capacity that way.
It's likely to depend on your machine specs, but for what it's worth, I've tried this and it definitely speeds things up for my specific task. I.e., it takes me about an hour to load all the data into one partition; if I don't partition, I have to perform the task serially, so it takes 12 * 1 = 12 hours. However, on my machine with 24 cores, I can parallelize the task and complete it in just 1 hour.

high overhead in new mysql table

Does anyone know why I get an overhead of 131.0 MiB on a newly created table (zero rows)?
I'm using phpMyAdmin, and the code of my script is:
CREATE TABLE IF NOT EXISTS `mydb`.`mytable` (
`idRol` INT NOT NULL AUTO_INCREMENT ,
`Rol` VARCHAR(45) NOT NULL ,
PRIMARY KEY (`idRol`) )
ENGINE = InnoDB;
Thanks in advance.
InnoDB uses a shared tablespace. That means that by default all tables, regardless of database, are stored in a single file in the filesystem. This differs from, for example, MyISAM, which stores every table in its own files.
The behaviour of InnoDB can be changed, although I don't think it's really necessary in this case. See Using Per-Table Tablespaces.
The overhead is probably the space left by deleted rows, and InnoDB will reuse it when you insert new data. It's nothing to be concerned about.
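You can check which mode the server is in directly; OFF means new tables go into the shared ibdata1 file, ON means each table gets its own .ibd file.
SHOW VARIABLES LIKE 'innodb_file_per_table';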
It might be because MySQL generated an index on idRol.
Storing an index takes some space, but I am not sure if this is the reason. It's only a guess. I'm not a DBA.