I'm working with a production database on a table that has > 2 million rows, and a UNIQUE KEY over col_a, col_b.
I need to modify that index to be over col_a, col_b, and col_c.
I believe this to be a valid, atomic command to make the change:
ALTER TABLE myTable
DROP INDEX `unique_cols`,
ADD UNIQUE KEY `unique_cols` (
`col_a`,
`col_b`,
`col_c`
);
Is this the most efficient way to do it?
I'm not certain the following is the best approach for you, but it's what worked for us after we suffered a few database problems ourselves and had to fix them quickly.
We work with very large tables, 4-5 GB in size, with more than 2 million rows.
In our experience, running any form of ALTER query or index creation on such a table is dangerous while the table is being written to.
So in our case here is what we do if the table has writes 24/7:
Create a new empty table with the correct indexes.
Copy the data to the new table row by row, using a tool from the Percona Toolkit or a manually written script.
This keeps memory use down, and it also protects you if you have a MyISAM table.
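The copy-and-swap steps above can be sketched in plain SQL. This is only a sketch under assumptions: the table and index names are hypothetical, and it assumes an auto-increment primary key `id` and that writes are paused or reconciled separately (Percona's pt-online-schema-change automates that part with triggers):

```sql
-- Build the replacement table with the desired index already in place.
CREATE TABLE myTable_new LIKE myTable;
ALTER TABLE myTable_new
  DROP INDEX `unique_cols`,
  ADD UNIQUE KEY `unique_cols` (`col_a`, `col_b`, `col_c`);

-- Copy in primary-key order, in modest chunks, to limit lock time and memory.
INSERT INTO myTable_new SELECT * FROM myTable WHERE id BETWEEN 1 AND 100000;
-- ...repeat for subsequent id ranges...

-- Atomically swap the tables once the copy has caught up.
RENAME TABLE myTable TO myTable_old, myTable_new TO myTable;
DROP TABLE myTable_old;
```

If the table keeps taking writes during the copy, rows changed after their chunk was copied must be reconciled before the swap; that is exactly what the tooling handles for you.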
In the scenario that you have a very large table that is not being written to regularly, you could create the indexes while it is not in use.
This is hard to predict and can lead to problems if you've not estimated correctly.
In either case, your goal should be to:
Save memory and load on the system.
Reduce locks on the tables.
The above also holds true when we add or delete columns on our super large tables, so this is not something we do just for creating indexes, but also for adding and removing columns.
Hope this helps, and anyone is free to disagree / add to my answer.
Some more helpful answers:
https://dba.stackexchange.com/questions/54211/adding-index-to-large-mysql-tables
https://dba.stackexchange.com/a/54214
https://serverfault.com/questions/174749/modifying-columns-of-very-large-mysql-tables-with-little-or-no-downtime
I wanted to add 8 new columns to a large MySQL (version 5.6) InnoDB table with millions of records. I am trying to achieve this in the most optimised way.
Is there any advantage to adding all columns in a single query over adding the 8 columns in 8 different queries? If so, I would like to know why.
When specifying ALGORITHM=INPLACE, LOCK=NONE, what do I need to take care of so that it won't cause any data corruption or application failure?
I was testing out ALGORITHM=INPLACE, LOCK=NONE with the query.
ALTER TABLE table_test ADD COLUMN test_column TINYINT UNSIGNED DEFAULT 0, ALGORITHM=INPLACE, LOCK=NONE;
But it's taking the same time as the query run with ALGORITHM=DEFAULT. What can be the reason?
The table I'm altering has only a primary key index and no other indexes. From the application, the queries coming to this table are:
insert into table;
select * from table where user_id=uid;
select sum(column) from table where user_id=id and date<NOW();
By "optimized", do you mean "fastest"? Or "least impact on other queries"?
In older versions, the optimal way (using no add-ons) was to put all the ADD COLUMNs in a single ALTER TABLE; then wait until it finishes.
In any version, pt-online-schema-change will add all the columns with only a brief downtime.
Since you mention ALGORITHM=INPLACE, LOCK=NONE, I assume you are using a newer version? So it may be that 8 ALTERs are optimal. There would be some interference, but perhaps not "too much".
ALGORITHM=DEFAULT lets the server pick the "best". This is almost always really the "best". That is, there is rarely a need to say anything other than DEFAULT.
You can never get data corruption. At worst, a query may fail due to some kind of timeout caused by the interference of the ALTER(s). You should always be checking for errors (including timeouts) and handle them in your app.
To discuss the queries...
insert into table;
One row at a time? Or batched? (Batched is more efficient -- perhaps 10x better.)
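For illustration, a batched insert sends many rows in one statement; the table and column names here are hypothetical:

```sql
-- One round trip and one transaction for many rows,
-- instead of one per row.
INSERT INTO myTable (user_id, amount, date)
VALUES
  (1, 10.00, NOW()),
  (2, 12.50, NOW()),
  (3,  7.25, NOW());
```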
select * from table;
Surely not! That would give you all the columns for millions of rows. Why should you ever do that?
select count(column) from table where pk=id and date<NOW();
COUNT(col) checks col for being NOT NULL -- Do you need that? If not, then simply do COUNT(*).
WHERE pk=id gives you only one row; so why also qualify with date<NOW()? The PRIMARY KEY makes the query as fast as possible.
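Putting those two points together, the query could be simplified roughly as follows (a sketch, assuming pk really is the primary key and the date filter is unneeded):

```sql
-- COUNT(*) skips the per-column NULL check, and the PK lookup
-- already narrows the result to at most one row.
SELECT COUNT(*) FROM myTable WHERE pk = 123;
```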
The only index is PRIMARY KEY? This seems unusual for a million-row table. Is it a "Fact" table in a "Data Warehouse" app?
Internals
(Caveat: Much of this discussion of Internals is derived indirectly, and could be incorrect.)
For some ALTERs, the work is essentially just in the schema. Eg: Adding options on the end of an ENUM; increasing the size of a VARCHAR.
For some ALTERs with INPLACE, the processing is essentially modifying the data in place -- without having to copy it. Eg: Adding a column at the end.
PRIMARY KEY changes (in InnoDB) necessarily involve rebuilding the BTree containing the data; they cannot be done INPLACE.
Many secondary INDEX operations can be done without touching (other than reading) the data. DROP INDEX throws away a BTree and makes some meta changes. ADD INDEX reads the entire table, building the index BTree on the side, then announcing its existence. CHARACTER SET and COLLATION changes require rebuilding an index.
If the table must be copied over, there is a significant lock on the table. Any ALTER that needs to read all the data has an indirect impact because of the I/O and/or CPU and/or brief locks on blocks/rows/etc.
It is unclear whether the code is smart enough to handle a multi-task ALTER in the most efficient way. Adding 8 columns in one INPLACE pass should be possible, but if it made the code too complex, that operation may be converted to COPY.
Probably a multi-task ALTER will do the 'worst' case. For example, changing the PRIMARY KEY and augmenting an ENUM will simply do both in a single COPY. Since COPY is the original way of doing all ALTERs, it is well debugged and optimized by now. (But it is slow and invasive.)
COPY is really quite simple to implement, mostly involving existing primitives:
Lock real so no one is writing to it
CREATE TABLE new LIKE real;
ALTER TABLE new ... -- whatever you asked for
copy all the rows from real to new -- this is the slow part
RENAME TABLE real TO old, new TO real; -- fast, atomic, etc.
Unlock
DROP TABLE old;
INPLACE is more complex because it must decide among many different algorithms and locking levels. DEFAULT has to punt off to COPY if it cannot do INPLACE.
I have a table in my MySQL database with around 5M rows. Inserting rows into the table is too slow because MySQL updates the indexes while inserting. How can I stop index updates during inserts and do the indexing separately later?
Thanks
Kamrul
Sounds like your table might be over indexed. Maybe post your table definition here so we can have a look.
You have two choices:
Keep your current indexes but remove unused ones. If you have 3 indexes on a table, every single write to the table causes 3 writes to the indexes. An index only helps during reads, so you might want to remove any that are unused. During a load, the indexes are updated, which slows the load down.
Drop your indexes before the load, then recreate them afterwards. You can drop the indexes, insert the data, then rebuild. The rebuild might take longer than the slow inserts, and you will have to rebuild the indexes one by one. Also, recreating a unique index can fail if duplicates were loaded while the index was absent.
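The second choice sketched in SQL, with hypothetical index and column names:

```sql
-- Before the bulk load: drop the non-critical secondary indexes.
ALTER TABLE myTable DROP INDEX idx_user_id, DROP INDEX idx_date;

-- ...run the bulk inserts here...

-- After the load: rebuild them. A single ALTER rebuilds both in one pass.
ALTER TABLE myTable
  ADD INDEX idx_user_id (user_id),
  ADD INDEX idx_date (date);
```

Keep unique indexes in place unless you can guarantee the load contains no duplicates.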
Now I suggest you take a good look at the indexes on the table and drop any that are not used by your queries. Then try both approaches and see what works for you. There is no way I know of in MySQL to disable index maintenance entirely, as the inserted values need to be written to the indexes' internal structures.
Another thing you might want to try is to split the I/O over multiple drives, i.e. partition your table over several drives to get some hardware performance in place.
I've got a table with 10M rows, and I'm trying to ALTER TABLE to add another column (a VARCHAR(80)).
From a data-modelling perspective, that column should be NOT NULL - but the amount of time it takes to run the statement is a consideration, and the client code could be changed to deal with a NULL column if that's warranted.
Should the NULL-ability of the column I'm trying to add significantly impact the amount of time it takes to add the column either way?
More Information
The context in which I'm doing this is a Django app, with a migration generated by South - adding three separate columns, and adding an index on one of the newly-added columns. Looking at the South-generated SQL, it spreads this operation (adding three columns and an index) over 15 ALTER TABLE statements - which seems like it will make this operation take a whole lot longer than it should.
I've seen some references that suggest that InnoDB doesn't actually have to create a field in the on-disk file for nullable fields that are NULL, and just modifies a bitfield in the header. Would this impact the speed of the ALTER TABLE operation?
I don't think the nullability of the column has anything to do with the speed of ALTER TABLE. In most alter table operations, the whole table - with all the indexes - has to be copied (temporarily) and then the alteration is done on the copy. With 10M rows, it's kind of slow. From MySQL docs:
Storage, Performance, and Concurrency Considerations
In most cases, ALTER TABLE makes a temporary copy of the original table. MySQL waits for other operations that are modifying the table, then proceeds. It incorporates the alteration into the copy, deletes the original table, and renames the new one. While ALTER TABLE is executing, the original table is readable by other sessions. Updates and writes to the table that begin after the ALTER TABLE operation begins are stalled until the new table is ready, then are automatically redirected to the new table without any failed updates. The temporary table is created in the database directory of the new table. This can differ from the database directory of the original table for ALTER TABLE operations that rename the table to a different database.
If you want to make several changes in a table's structure, it's usually better to do them in one ALTER TABLE operation.
Allowing client code to make changes to tables is probably not the best idea, and you have hit on one good reason not to allow it. Why do you need it? If you can't do otherwise, it would probably be better, for performance reasons, to have your client code create a new table (with the new column and the PK of the existing table) instead of adding a column.
I have a large table (~50M records) and I want to copy the records from this table to a different table that has the same structure (the new table has one extra index).
I'm using INSERT IGNORE INTO ... to copy the records.
What's the fastest way to do this? Is it by copying small chunks (let's say 1M records) or bigger chunks?
Is there any way I could speed up the process?
Before performing the insert, disable the indexes (DISABLE KEYS) on the destination table, if you can.
Also, if you are not using transactions or foreign-key relations, maybe consider switching to the MyISAM engine.
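A sketch of the DISABLE KEYS approach, with hypothetical table names. Note that DISABLE KEYS only defers maintenance of non-unique indexes on MyISAM tables; for InnoDB it has no effect:

```sql
-- MyISAM only: defer non-unique index maintenance during the load.
ALTER TABLE new_table DISABLE KEYS;

INSERT IGNORE INTO new_table SELECT * FROM old_table;

-- Rebuild the deferred indexes in one pass.
ALTER TABLE new_table ENABLE KEYS;
```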
We have a huge database and inserting a new column is taking too long. Anyway to speed up things?
Unfortunately, there's probably not much you can do. When adding a new column, MySQL makes a copy of the table and inserts the new data there. You may find it faster to do
CREATE TABLE new_table LIKE old_table;
ALTER TABLE new_table ADD COLUMN (column definition);
INSERT INTO new_table(old columns) SELECT * FROM old_table;
RENAME table old_table TO tmp, new_table TO old_table;
DROP TABLE tmp;
This hasn't been my experience, but I've heard others have had success. You could also try disabling indices on new_table before the insert and re-enabling later. Note that in this case, you need to be careful not to lose any data which may be inserted into old_table during the transition.
Alternatively, if your concern is impacting users during the change, check out pt-online-schema-change which makes clever use of triggers to execute ALTER TABLE statements while keeping the table being modified available. (Note that this won't speed up the process however.)
There are four main things that you can do to make this faster:
If using innodb_file_per_table the original table may be highly fragmented in the filesystem, so you can try defragmenting it first.
Make the buffer pool as big as sensible, so more of the data, particularly the secondary indexes, fits in it.
Make innodb_io_capacity high enough, perhaps higher than usual, so that insert buffer merging and flushing of modified pages will happen more quickly. Requires MySQL 5.1 with InnoDB plugin or 5.5 and later.
MySQL 5.1 with the InnoDB plugin and MySQL 5.5 and later support fast ALTER TABLE. One of the things it makes a lot faster is adding or rebuilding indexes that are neither unique nor part of a foreign key. So you can do this:
A. ALTER TABLE ADD your column, DROP your non-unique indexes that aren't in FKs.
B. ALTER TABLE ADD back your non-unique, non-FK indexes.
This should provide these benefits:
a. Less use of the buffer pool during step A because the buffer pool will only need to hold some of the indexes, the ones that are unique or in FKs. Indexes are randomly updated during this step so performance becomes much worse if they don't fully fit in the buffer pool. So more chance of your rebuild staying fast.
b. The fast alter table rebuilds the index by sorting the entries then building the index. This is faster and also produces an index with a higher page fill factor, so it'll be smaller and faster to start with.
The main disadvantage is that this is in two steps and after the first one you won't have some indexes that may be required for good performance. If that is a problem you can try the copy to a new table approach, using just the unique and FK indexes at first for the new table, then adding the non-unique ones later.
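Steps A and B above could look like this; the column and index names are hypothetical:

```sql
-- Step A: add the column and drop the non-unique, non-FK indexes in one pass.
ALTER TABLE myTable
  ADD COLUMN new_col VARCHAR(80) NULL,
  DROP INDEX idx_a,
  DROP INDEX idx_b;

-- Step B: rebuild those indexes. Fast ALTER builds them by sorting the
-- entries, which is quicker and packs the index pages more densely.
ALTER TABLE myTable
  ADD INDEX idx_a (col_a),
  ADD INDEX idx_b (col_b);
```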
It's only in MySQL 5.6, but the feature request in http://bugs.mysql.com/bug.php?id=59214 increases the speed with which insert buffer changes are flushed to disk and limits how much space they can take in the buffer pool. This can be a performance limit for big jobs. The insert buffer is used to cache changes to secondary index pages.
We know that this is still frustratingly slow sometimes and that a true online ALTER TABLE is very highly desirable.
This is my personal opinion. For an official Oracle view, contact an Oracle public relations person.
James Day, MySQL Senior Principal Support Engineer, Oracle
Usually a slow insert of new rows means that there are many indexes, so I would suggest reconsidering your indexing.
Michael's solution may speed things up a bit, but perhaps you should have a look at the database and try to break the big table into smaller ones. Normalizing your database tables may save you loads of time in the future.