MySQL performance of adding a column to a large table

I have MySQL 5.5.37 with InnoDB installed locally via apt-get on Ubuntu 13.10. My desktop machine has an i7-3770, 32 GB of memory, and an SSD. For a table "mytable" which contains only 1.5 million records, the following DDL query takes more than 20 minutes (!):
ALTER TABLE mytable ADD some_column CHAR(1) NOT NULL DEFAULT 'N';
Is there a way to improve it?
I checked
show processlist;
and it showed that MySQL was copying my table for some reason.
That is disturbingly inconvenient. Is there a way to turn off this copying?
Are there other ways to improve performance of adding a column to a large table?
Other than that, my DB is relatively small, with only a 1.3 GB dump size. Therefore it should (in theory) fit 100% in memory.
Are there settings which can help?
Would migration to Percona change anything for me?
Update: I have
innodb_buffer_pool_size = 134217728

Are there other ways to improve performance of adding a column to a large table?
Short answer: no. You may add values to an ENUM or SET instantly, and you may add secondary indexes while the table is locked only for writes, but otherwise altering the table structure always requires a table copy.
Long answer: your real problem isn't performance but lock time. It doesn't matter that it's slow; what matters is that other clients can't perform queries until your ALTER TABLE is finished. There are some options in that case:
You may use pt-online-schema-change from the Percona Toolkit. Back up your data first! This is the easiest solution, but it may not work in all cases.
If you don't use foreign keys and it's slow because you have a lot of indexes, it might be faster to create a copy of the table with the changes you need but no secondary indexes, populate it with the data, and create all the indexes with a single ALTER TABLE at the end (a sketch of this follows the list of options).
If it's easy for you to create replicas, like if you're hosted at Amazon RDS, you may create a master-master replica, run the alter table there, let it get back in sync, and switch instances after finished.
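For illustration, a minimal sketch of the copy-without-secondary-indexes option, assuming a hypothetical mytable(id INT PRIMARY KEY, name VARCHAR(50), KEY idx_name (name)); adjust the names to your schema, and make sure no new writes hit mytable during the copy (or catch up on them before the swap):
CREATE TABLE mytable_new LIKE mytable;
ALTER TABLE mytable_new
  DROP INDEX idx_name,                          -- strip the secondary indexes
  ADD some_column CHAR(1) NOT NULL DEFAULT 'N'; -- the change you actually need
INSERT INTO mytable_new (id, name)
  SELECT id, name FROM mytable;                 -- some_column takes its DEFAULT
ALTER TABLE mytable_new ADD INDEX idx_name (name); -- rebuild all indexes once, at the end
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;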
UPDATE
As others mentioned, MySQL 8.0 InnoDB added support for instant column adds. It's not a magical solution; it has limitations and side effects (the new column can only be added as the last column, the table must not have a full-text index, etc.), but it should help in many cases.
You can request the instant algorithm explicitly with ALGORITHM=INSTANT, and if an instant schema change isn't possible, MySQL will fail with an error instead of silently falling back to INPLACE or COPY. Example:
ALTER TABLE mytable
ADD COLUMN mycolumn VARCHAR(36) DEFAULT NULL,
ALGORITHM=INSTANT;
https://mysqlserverteam.com/mysql-8-0-innodb-now-supports-instant-add-column/

MariaDB 10.3, MySQL 8.0, and probably other MySQL variants to follow have an "Instant ADD COLUMN" feature whereby most columns (there are a few constraints; see the docs) can be added instantly with no table rebuild.
MariaDB: https://mariadb.com/resources/blog/instant-add-column-innodb
MySQL: https://mysqlserverteam.com/mysql-8-0-innodb-now-supports-instant-add-column/

I know this is a rather old question, but today I encountered a similar problem. I decided to create a new table and import the old table into it. Something like:
CREATE TABLE New_mytable LIKE mytable;
ALTER TABLE New_mytable ADD some_column CHAR(1) NOT NULL DEFAULT 'N';
INSERT INTO New_mytable SELECT *, 'N' FROM mytable; -- the extra 'N' fills the new column
Then
START TRANSACTION;
INSERT INTO New_mytable SELECT *, 'N' FROM mytable WHERE id > (SELECT MAX(id) FROM New_mytable);
COMMIT;
-- RENAME TABLE is DDL: it commits implicitly, so it cannot be part of the
-- transaction above. Doing both renames in one statement keeps the swap atomic:
RENAME TABLE mytable TO Old_mytable, New_mytable TO mytable;
This does not make the update process go any faster, but it does minimize downtime.
Hope this helps.

What about Online DDL?
http://www.tocker.ca/2013/11/05/a-closer-look-at-online-ddl-in-mysql-5-6.html
Maybe you could use TokuDB instead:
http://www.tokutek.com/products/tokudb-for-mysql/

There is no way to avoid copying the table when adding or removing columns because the structure changes. You can add or remove secondary indexes without a table copy.
Your table data doesn't reside in memory: your innodb_buffer_pool_size of 134217728 bytes is only 128 MB, a tenth of your 1.3 GB of data. The indexes can reside in memory.
1.5 million records is not a lot of rows, and 20 minutes seems quite long, but perhaps your rows are large and you have many indexes.
While the table is being copied, you can still select rows from the table. However, if you try to do any updates, they will be blocked until the ALTER is complete.

Related

mysql add multiple column to large table in optimised way

I wanted to add 8 new columns to a large MySQL (version 5.6) InnoDB table with millions of records. I am trying to achieve this in the most optimised way.
Is there any advantage to using a single query to add all the columns over adding the 8 columns in 8 different queries? If so, I would like to know why.
When specifying ALGORITHM=INPLACE, LOCK=NONE, what do I need to take care of so that it won't cause any data corruption or application failure?
I was testing out ALGORITHM=INPLACE, LOCK=NONE with this query:
ALTER TABLE table_test ADD COLUMN test_column TINYINT UNSIGNED DEFAULT 0, ALGORITHM=INPLACE, LOCK=NONE;
But it's taking the same time as the query run with ALGORITHM=DEFAULT. What could be the reason?
The table I'm altering has only the primary key index and no other indexes. From the application, the queries coming to this table are:
insert into table;
select * from table where user_id=uid;
select sum(column) from table where user_id=id and date<NOW();
By "optimized", do you mean "fastest"? Or "least impact on other queries"?
In older versions, the optimal way (using no add-ons) was to put all the ADD COLUMNs in a single ALTER TABLE; then wait until it finishes.
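For example (column names and types are placeholders, not from the question), a single combined ALTER looks like this:
ALTER TABLE mytable
  ADD COLUMN col1 INT UNSIGNED NOT NULL DEFAULT 0,
  ADD COLUMN col2 VARCHAR(32) NULL,
  ADD COLUMN col3 DATE NULL;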
In any version, pt-online-schema-change will add all the columns with only a brief downtime.
Since you mention ALGORITHM=INPLACE, LOCK=NONE, I assume you are using a newer version? So, it may be that 8 ALTERs is optimal. There would be some interference, but perhaps not "too much".
ALGORITHM=DEFAULT lets the server pick the "best". This is almost always really the "best". That is, there is rarely a need to say anything other than DEFAULT.
You can never get data corruption. At worst, a query may fail due to some kind of timeout caused by the interference of the ALTER(s). You should always be checking for errors (including timeouts) and handle them in your app.
To discuss the queries...
insert into table;
One row at a time? Or batched? (Batched is more efficient -- perhaps 10x better.)
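For illustration, a batched insert (table and column names are hypothetical) sends many rows in one statement and one round trip:
INSERT INTO mytable (user_id, amount, date) VALUES
  (1, 10.00, '2016-01-01'),
  (2, 12.50, '2016-01-02'),
  (3,  7.25, '2016-01-03');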
select * from table;
Surely not! That would give you all the columns for millions of rows. Why would you ever do that?
select count(column) from table where pk=id and date<NOW();
COUNT(col) checks col for being NOT NULL -- Do you need that? If not, then simply do COUNT(*).
WHERE pk=id gives you only one row; so why also qualify with date<NOW()? The PRIMARY KEY makes the query as fast as possible.
The only index is PRIMARY KEY? This seems unusual for a million-row table. Is it a "Fact" table in a "Data Warehouse" app?
Internals
(Caveat: Much of this discussion of Internals is derived indirectly, and could be incorrect.)
For some ALTERs, the work is essentially just in the schema. Eg: Adding options on the end of an ENUM; increasing the size of a VARCHAR.
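For example (a sketch; the column definitions are assumptions, not from the question), both of these are essentially metadata-only changes:
ALTER TABLE mytable MODIFY status ENUM('N','Y','MAYBE') NOT NULL; -- 'MAYBE' appended at the end, assuming status was ENUM('N','Y')
ALTER TABLE mytable MODIFY name VARCHAR(120) NOT NULL;            -- widened from an assumed VARCHAR(80); both sizes fit in one length byte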
For some ALTERs with INPLACE, the processing is essentially modifying the data in place -- without having to copy it. Eg: Adding a column at the end.
PRIMARY KEY changes (in InnoDB) necessarily involve rebuilding the BTree containing the data; they cannot be done INPLACE.
Many secondary INDEX operations can be done without touching (other than reading) the data. DROP INDEX throws away a BTree and makes some meta changes. ADD INDEX reads the entire table, building the index BTree on the side, then announcing its existence. CHARACTER SET and COLLATION changes require rebuilding an index.
If the table must be copied over, there is a significant lock on the table. Any ALTER that needs to read all the data has an indirect impact because of the I/O and/or CPU and/or brief locks on blocks/rows/etc.
It is unclear whether the code is smart enough to handle a multi-task ALTER in the most efficient way. Adding 8 columns in one INPLACE pass should be possible, but if it made the code too complex, that operation may be converted to COPY.
Probably a multi-task ALTER will do the 'worst' case. For example, changing the PRIMARY KEY and augmenting an ENUM will simply do both in a single COPY. Since COPY is the original way of doing all ALTERs, it is well debugged and optimized by now. (But it is slow and invasive.)
COPY is really quite simple to implement, mostly involving existing primitives:
Lock real so no one is writing to it
CREATE TABLE new LIKE real;
ALTER TABLE new ... -- whatever you asked for
copy all the rows from real to new -- this is the slow part
RENAME TABLE real TO old, new TO real; -- fast, atomic, etc.
Unlock
DROP TABLE old;
INPLACE is more complex because it must decide among many different algorithms and locking levels. DEFAULT has to punt off to COPY if it cannot do INPLACE.

MySQL change engine big millions rows table

I want to change the engine of a 2 million row table from MyISAM to InnoDB. I am afraid of this long operation, so I created an InnoDB table with a similar structure, and now I want to copy all the data from the old one to the new one. What is the fastest way? INSERT ... SELECT? What about START TRANSACTION? Please help. I don't want to hang my server.
Do yourself a favor: copy the whole setup to your local machine and try it all out there. You'll have a much better idea of what you are getting into. Just be aware of potential differences in hardware between your production server and your local machine.
The fastest way is probably the most straightforward way:
INSERT INTO table2 SELECT * FROM table1;
I suspect that you cannot do it any faster than what is built into the ALTER. And it does have to copy over all the data and rebuild all the indexes.
Be sure to have innodb_buffer_pool_size raised to prepare for InnoDB. And lower key_buffer_size to allow room. Suggest 35% and 12% of RAM, respectively, for the transition. After all tables are converted, suggest 70% and a mere 20MB.
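As a sketch only (assuming, say, a 16 GB server; scale the numbers to your RAM), the my.cnf settings for the transition might look like:
[mysqld]
innodb_buffer_pool_size = 5G   # ~35% of RAM while both engines are in use
key_buffer_size = 2G           # ~12% of RAM for the remaining MyISAM tables
# after all tables are converted:
# innodb_buffer_pool_size = 11G  # ~70% of RAM
# key_buffer_size = 20M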
One slight speedup is to run some SELECTs that fetch the entire table and the entire PRIMARY KEY (if it can be cached) before starting the ALTER. This does some of the I/O up front. Example: SELECT AVG(id) FROM tbl, where id is the primary key, and SELECT AVG(foo) FROM tbl, where foo is not indexed but is numeric. These will force a full scan of the PK index and the data, thereby caching the stuff that the ALTER will have to read.
Other tips on converting: http://mysql.rjweb.org/doc.php/myisam2innodb .

Adding Index to 3 million rows MySQL

I need to add at least 1 index to a column of type int(1) on an InnoDB table. There are about 3 million rows that it would need to index. This is a database on my production server, and it is in use by thousands of people every day. I tried to add an index the standard way, but it was taking too much time (I let it run for about 7 minutes before killing the process) and locking rows, meaning a frozen application for many users.
My VPS that runs all of this has 512 MB of RAM and an Intel Xeon E5504 processor.
How can I add an index to this production database without interrupting my users' experience?
Unless the table is exclusively read from or exclusively written to, you'll probably need to take down the site. Lock the database, run the operation, and wait.
If the table is write-only, swap the writes over to a temporary table, run the operation on the old table, then swap the writes back and insert the rows accumulated in the temporary table.
If the table is read-only, duplicate the table and run the operation on the copy.
If the table is read/write, then a messy alternative that might work is to create a new table with the indexes, set its primary-key start point to the next value in the original table, make your read requests select from both tables, but write exclusively to the new table. Then write a script that copies rows from the old table to the new one and deletes them from the old. It'll take far, far longer than the downtime, and plenty can go wrong, but it should be doable.
You can set the start point of a primary key with:
ALTER TABLE `my_table` AUTO_INCREMENT = X;
Hope that helps.
Take a look at pt-online-schema-change. I think this tool can be quite useful in your case. It will obviously put additional load on your database server, but it should not block access to the table for most of the operation time.
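A sketch of the invocation (database, table, index, and column names are placeholders; try --dry-run before --execute):
pt-online-schema-change \
  --alter "ADD INDEX idx_mycolumn (mycolumn)" \
  D=mydatabase,t=mytable \
  --execute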

Changing Large MySQL InnoDB Tables

Adding a new column or adding a new index can take hours or even days for large InnoDB tables in MySQL with more than 10 million rows. What is the best way to increase performance on large InnoDB tables in these two cases? More memory, tweaking the configuration (for example, increasing sort_buffer_size or innodb_buffer_pool_size), or some kind of trick? Instead of altering a table directly, one could create a new one, change it, and copy the old data to the new, like this (which is useful for MyISAM tables and multiple changes):
CREATE TABLE tablename_tmp LIKE tablename;
ALTER TABLE tablename_tmp ADD fieldname fieldtype;
INSERT INTO tablename_tmp SELECT * FROM tablename;
ALTER TABLE tablename RENAME tablename_old;
ALTER TABLE tablename_tmp RENAME tablename;
Is it recommendable for InnoDB tables too, or is it just what the ALTER TABLE command does anyway?
Edit 2016: we've recently (August 2016) released gh-ost, modifying my answer to reflect it.
Today there are several tools which allow you to do online alter table for MySQL. These are:
edit 2016: gh-ost: GitHub's triggerless schema migration tool (disclaimer: I am the author of this tool)
oak-online-alter-table, as part of the openark-kit (disclaimer: I am the author of this tool)
pt-online-schema-change, as part of the Percona Toolkit
Facebook's online schema change for MySQL
Let's consider the "normal" `ALTER TABLE`:
A large table will take a long time to ALTER. innodb_buffer_pool_size is important, and so are other variables, but on a very large table they are all negligible. It just takes time.
What MySQL does to ALTER a table is to create a new table with the new format, copy all rows, then switch over. During this time the table is completely locked.
Consider your own suggestion:
It will most probably perform worst of all the options. Why is that? Because you're using an InnoDB table, the INSERT INTO tablename_tmp SELECT * FROM tablename makes for a transaction: a huge transaction. It will create even more load than the normal ALTER TABLE.
Moreover, you will have to shut down your application at that time so that it does not write (INSERT, DELETE, UPDATE) to your table. If it does, your whole transaction is pointless.
What the online tools provide
The tools do not all work alike. However, the basics are shared:
They create a "shadow" table with the altered schema
They create and use triggers to propagate changes from the original table to the shadow table
They slowly copy all the rows from your table to the shadow table. They do so in chunks: say, 1,000 rows at a time.
They do all the above while you are still able to access and manipulate the original table.
When satisfied, they swap the two using a RENAME. (A much-simplified sketch of this flow follows.)
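The sketch below assumes a hypothetical mytable(id INT PRIMARY KEY, name VARCHAR(50)); real tools also create UPDATE/DELETE triggers, pace and throttle the chunks, and clean up the triggers and the old table:
CREATE TABLE mytable_shadow LIKE mytable;
ALTER TABLE mytable_shadow ADD some_column CHAR(1) NOT NULL DEFAULT 'N';
-- propagate writes that happen during the copy:
CREATE TRIGGER mytable_shadow_ins AFTER INSERT ON mytable FOR EACH ROW
  REPLACE INTO mytable_shadow (id, name) VALUES (NEW.id, NEW.name);
-- copy existing rows in chunks (here: one chunk covering ids 1..1000):
INSERT IGNORE INTO mytable_shadow (id, name)
  SELECT id, name FROM mytable WHERE id BETWEEN 1 AND 1000;
-- when all chunks are done, swap atomically:
RENAME TABLE mytable TO mytable_old, mytable_shadow TO mytable;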
The openark-kit tool has been in use for 3.5 years now. The Percona tool is a few months old, but possibly more tested than the former. Facebook's tool is said to work well for Facebook, but does not provide a general solution for the average user. I haven't used it myself.
Edit 2016: gh-ost is a triggerless solution which significantly reduces the write load on the master, decoupling the migration write load from the normal load. It is auditable, controllable, testable. We've developed it internally at GitHub and released it as open source; we're doing all our production migrations via gh-ost today. See more here.
Each tool has its own limitations, look closely at documentation.
The conservative way
The conservative way is to use an Active-Passive Master-Master replication, do the ALTER on the standby (passive) server, then switch roles and do the ALTER again on what used to be the active server, now turned passive. This is also a good option, but requires an additional server, and deeper knowledge of replication.
A RENAME screws up referencing tables.
If you have, say, table_2 which is a child of tablename, then on ALTER TABLE tablename RENAME tablename_old;
table_2 will start pointing to tablename_old.
Now, without altering table_2, you cannot point it back to tablename. You have to keep making alters in every child and referencing table.

Changing table type to InnoDB

I have a MyISAM-only dedicated MySQL server with 32 GB RAM that is running on the default configuration. I want to change the engine of one table to InnoDB in order to avoid table locks. It has 50 million records and its size on disk is around 15 GB. I am using MySQL version 5.5.
I guess I will need to add the following options and restart MySQL.
innodb_buffer_pool_size=1G
innodb_log_file_size=100M
innodb_file_per_table=1
What else is considered to be necessary while changing the engine type?
You'll actually be running a command to convert each table.
It goes faster to first sort the table:
ALTER TABLE tablename ORDER BY primary_key_column;
Then, run the alter command:
ALTER TABLE tablename ENGINE = INNODB;
It might take a while if the table is really large, and it will use a lot of your CPU...
First of all, check if your server supports the InnoDB engine (I bet it does ;)):
SHOW ENGINES\G
If so, there are already default InnoDB-related parameters in place; check them with:
SHOW VARIABLES LIKE '%innodb%';
and try to understand them and alter them to your specific needs. Even if you use the default params, you are now fine to play around with InnoDB tables.
If you want to create only InnoDB tables, you can change your default storage engine, either for your current session with SET storage_engine=INNODB; or in your config using the default-storage-engine option.
By the way, the fastest way to convert a table to InnoDB is not the way described above. One can convert a table to InnoDB by simply inserting the data:
CREATE TABLE new AS SELECT * FROM old WHERE 1<>1;
ALTER TABLE new ENGINE = INNODB;
INSERT INTO new SELECT * FROM old;
Of course, you have to add the indexes you need manually, but it's usually worth the time (and pain) saved compared to ALTER TABLE ... on slightly bigger tables.