Creating an index before a FK in MySQL

I have a not-so-big table, around 2M rows.
Because of a business rule I had to add a new reference column to this table.
Right now the application is writing values to the column but not using it.
Now I need to update all NULL rows to the correct values, create a FK, and start using the column.
But this table gets a lot of reads, and when I try to ALTER TABLE to add the FK, the table is locked and the read queries get blocked.
Is there any way to speed this up?
Does leaving all fields NULL help (since I think there would then be no need to check whether the values are valid)?
Does creating an index beforehand help?
In PostgreSQL I could create a NOT VALID FK and then validate it (which took only row locks, not a table lock); is there anything similar in MySQL?

What's taking time is building the index. A foreign key requires an index. If there is already an index on the appropriate column(s), the FK will use it. If there is no index, then adding the FK constraint implicitly builds a new index. This takes a while, and the table is locked in the meantime.
Starting in MySQL 5.6, building an index should allow concurrent read and write queries. You can try to make this explicit:
ALTER TABLE mytable ADD INDEX (col1, col2), LOCK=NONE;
If this doesn't work (e.g., if it gives an error because it doesn't recognize the LOCK=NONE syntax), then you aren't using a version of MySQL that supports online DDL. See https://dev.mysql.com/doc/refman/5.6/en/innodb-online-ddl-operations.html
If you can't build an index or define a foreign key without locking the table, then I suggest trying the free tool pt-online-schema-change. We use this at my job, and we make many schema changes per day in production, without blocking any queries.
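Putting the pieces together, a sketch of the create-index-first approach might look like this. All table, column, and constraint names are hypothetical, and it assumes MySQL 5.6+ with InnoDB. In 5.6, adding the FK itself avoids a table copy only while foreign_key_checks is disabled, which also skips validating existing rows, loosely similar to PostgreSQL's NOT VALID:

```sql
-- Assumes MySQL 5.6+ / InnoDB; all names here are hypothetical.
SET foreign_key_checks = 0;  -- allows ADD FOREIGN KEY in place, without
                             -- validating existing rows (like NOT VALID)

-- 1) Build the supporting index online, allowing concurrent reads/writes.
ALTER TABLE mytable
  ADD INDEX idx_other_id (other_id),
  ALGORITHM=INPLACE, LOCK=NONE;

-- 2) Add the FK; it reuses idx_other_id instead of building a new index.
ALTER TABLE mytable
  ADD CONSTRAINT fk_mytable_other
    FOREIGN KEY (other_id) REFERENCES other (id);

SET foreign_key_checks = 1;
```

Remember that rows inserted while checks were disabled are not validated, so you should verify the column's values yourself before re-enabling the checks.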

Related

Multiple indexes on the same column

I have a table which already has a column with a BTREE index on it. Now I want to add a unique constraint to the same column to avoid a race condition in my Rails app.
All the reference blogs/articles show that I have to add a migration to create a new unique index on that column, like below:
add_index :products, :key, unique: true
I want to understand:
What happens to the BTREE index which is already present? (I need it.)
Is it OK to have both indexes, and will they both work fine?
The table has around 30M entries; will adding this UNIQUE index lock the table and take a huge amount of time?
You don't need both indexes.
In MySQL's default storage engine InnoDB, a UNIQUE KEY index is also a BTREE. InnoDB only supports BTREE indexes, whether they are unique or not (it also supports fulltext indexes, but that's a different story).
So a unique index is also useful for searching and sorting, just like a non-unique index.
Building an index will lock the table. I suggest using an online schema change tool like pt-online-schema-change or gh-ost. We use the former at my company, and we run hundreds of schema changes per week on production tables without blocking access. In fact, using one of these tools might cause the change to take longer, but we don't care because we aren't suffering any limited access while it's running.
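For reference, a pt-online-schema-change run is a single command. This invocation is a sketch with hypothetical database, table, and index names; note that the tool applies extra safeguards when the ALTER adds a UNIQUE index, because duplicate rows would otherwise be silently dropped during the copy:

```shell
# Percona Toolkit's pt-online-schema-change (all names are hypothetical).
# It copies the table in the background and swaps it in near-atomically.
pt-online-schema-change \
  --alter 'ADD INDEX idx_key (`key`)' \
  --execute \
  D=mydb,t=products
```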
What happens to the BTREE index which is already present? (I need it.)
Nothing. Creating a new index does not affect existing indexes.
Is it OK to have both indexes, and will they both work fine?
Two indexes on the same expression, differing only in uniqueness? That makes no sense.
It is recommended to remove the regular index once the unique one is created. This will save a lot of disk space. Additionally, when a regular and a unique index on the same expression (literally!) both exist, the server will never use the regular one.
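Both steps can be combined so the redundant index is dropped in the same pass that builds the unique one. A sketch, assuming MySQL 5.6+ online DDL and hypothetical index names:

```sql
-- Swap the redundant non-unique index for a unique one in a single
-- online ALTER (index and column names are hypothetical):
ALTER TABLE products
  ADD UNIQUE KEY uniq_key (`key`),
  DROP INDEX index_products_on_key,
  ALGORITHM=INPLACE, LOCK=NONE;
```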
The table has around 30M entries; will adding this UNIQUE index lock the table and take a huge amount of time?
The table will be locked briefly at the start of the index creation process. But if index creation and parallel CUD (create/update/delete) operations run at the same time, both will be slower.
The time needed for index creation can only be determined in practice. Sometimes it cannot even be predicted.

Does MySQL obtain a lock on the table while adding a new composite index?

I have a table in production which is causing performance issues. I've identified a few quick indexes I can add to resolve the issue. Does MySQL obtain a lock on the table while adding a new index? Does it depend on the nature of the index (unique, composite, etc.)?
Note that I don't have any specific need to add a new column, so I won't be running any ALTER command.
A UNIQUE index must check for uniqueness.
Any change to the PRIMARY KEY requires rebuilding the index. (For InnoDB)
Newer versions of MySQL can add an index with very little impact.
Composite does not matter.
Adding a "new column" is a different question. This also "depends": newer versions can do it "instantly" if you don't use the AFTER clause.
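On versions with online DDL you can request the non-blocking behavior explicitly, so MySQL raises an error instead of silently locking when it can't comply. A sketch with hypothetical names:

```sql
-- Explicitly request an online, non-locking build (names hypothetical).
-- If this version can't do it in place, the statement fails up front
-- rather than quietly taking a table lock.
CREATE INDEX idx_a_b ON mytable (col_a, col_b) ALGORITHM=INPLACE LOCK=NONE;
```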

Most efficient way to add a column to a MySQL UNIQUE KEY?

I'm working with a production database on a table that has > 2 million rows, and a UNIQUE KEY over col_a, col_b.
I need to modify that index to cover col_a, col_b, and col_c.
I believe this to be a valid, atomic command to make the change:
ALTER TABLE myTable
DROP INDEX `unique_cols`,
ADD UNIQUE KEY `unique_cols` (
`col_a`,
`col_b`,
`col_c`
);
Is this the most efficient way to do it?
I'm not certain that the following is the best way for you. This is what worked for us after we suffered a few database problems ourselves and had to fix them quickly.
We work on very large tables, over 4-5GB in size.
Those tables have >2 million rows.
In our experience, running any form of ALTER query / index creation on a table is dangerous if the table is being written to.
So in our case, here is what we do if the table has writes 24/7:
Create a new empty table with the correct indexes.
Copy the data to the new table row by row, using a tool like Percona Toolkit's pt-online-schema-change or a manually written script.
This keeps the memory load lower, and also saves you in case you have a MyISAM table (which locks the whole table on writes).
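The copy-and-swap steps above can be sketched in plain SQL, using the table and index names from the question. This simplified version does not capture writes that arrive during the copy; tools like pt-online-schema-change add triggers to replay them:

```sql
-- Simplified copy-and-swap; does NOT handle concurrent writes.
CREATE TABLE myTable_new LIKE myTable;

ALTER TABLE myTable_new
  DROP INDEX unique_cols,
  ADD UNIQUE KEY unique_cols (col_a, col_b, col_c);

INSERT INTO myTable_new SELECT * FROM myTable;  -- the slow part; batch it

RENAME TABLE myTable TO myTable_old,
             myTable_new TO myTable;            -- fast, atomic swap
DROP TABLE myTable_old;
```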
In the scenario that you have a very large table that is not being written to regularly, you could create the indexes while it is not in use.
This is hard to predict and can lead to problems if you've not estimated correctly.
In either case, your goal should be to:
Save memory / load on the system.
Reduce locks on the tables
The above also holds true when we add or delete columns on our super-large tables, so this is not something we do just for creating indexes, but also for adding and removing columns.
Hope this helps, and anyone is free to disagree / add to my answer.
Some more helpful answers:
https://dba.stackexchange.com/questions/54211/adding-index-to-large-mysql-tables
https://dba.stackexchange.com/a/54214
https://serverfault.com/questions/174749/modifying-columns-of-very-large-mysql-tables-with-little-or-no-downtime
most efficient way to add index to large mysql table

mysql add multiple column to large table in optimised way

I want to add 8 new columns to a large MySQL (version 5.6) InnoDB table with millions of records. I am trying to do this in the most optimised way.
Is there any advantage to using a single query to add all the columns over adding the 8 columns in 8 different queries? If so, I would like to know why.
When specifying ALGORITHM=INPLACE, LOCK=NONE, what do I need to take care of so that it won't cause any data corruption or application failure?
I was testing out ALGORITHM=INPLACE, LOCK=NONE with this query:
ALTER TABLE table_test ADD COLUMN test_column TINYINT UNSIGNED DEFAULT 0, ALGORITHM=INPLACE, LOCK=NONE;
But it takes the same time as the query run with ALGORITHM=DEFAULT. What can be the reason?
The table I'm altering has only a primary key index and no other indexes. From the application, the queries coming to this table are:
insert into table;
select * from table where user_id=uid;
select sum(column) from table where user_id=id and date<NOW();
By "optimized", do you mean "fastest"? Or "least impact on other queries"?
In older versions, the optimal way (using no add-ons) was to put all the ADD COLUMNs in a single ALTER TABLE; then wait until it finishes.
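Such a combined ALTER means one pass over the table instead of eight. A sketch with hypothetical column names:

```sql
-- One ALTER, one table rebuild, regardless of how many columns are added.
ALTER TABLE table_test
  ADD COLUMN c1 TINYINT UNSIGNED DEFAULT 0,
  ADD COLUMN c2 TINYINT UNSIGNED DEFAULT 0,
  -- ...repeat for the remaining columns...
  ADD COLUMN c8 TINYINT UNSIGNED DEFAULT 0;
```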
In any version, pt-online-schema-change will add all the columns with only a brief downtime.
Since you mention ALGORITHM=INPLACE, LOCK=NONE, I assume you are using a newer version? So, it may be that 8 ALTERs is optimal. There would be some interference, but perhaps not "too much".
ALGORITHM=DEFAULT lets the server pick the "best". This is almost always really the "best". That is, there is rarely a need to say anything other than DEFAULT.
You will not get data corruption. At worst, a query may fail due to some kind of timeout caused by the interference of the ALTER(s). You should always be checking for errors (including timeouts) and handle them in your app.
To discuss the queries...
insert into table;
One row at a time? Or batched? (Batched is more efficient -- perhaps 10x better.)
select * from table;
Surely not! That would give you all the columns for millions of rows. Why should you ever do that?
select count(column) from table where pk=id and date<NOW();
COUNT(col) checks col for being NOT NULL -- Do you need that? If not, then simply do COUNT(*).
WHERE pk=id gives you only one row; so why also qualify with date<NOW()? The PRIMARY KEY makes the query as fast as possible.
The only index is PRIMARY KEY? This seems unusual for a million-row table. Is it a "Fact" table in a "Data Warehouse" app?
Internals
(Caveat: Much of this discussion of Internals is derived indirectly, and could be incorrect.)
For some ALTERs, the work is essentially just in the schema. Eg: Adding options on the end of an ENUM; increasing the size of a VARCHAR.
For some ALTERs with INPLACE, the processing is essentially modifying the data in place -- without having to copy it. Eg: Adding a column at the end.
PRIMARY KEY changes (in InnoDB) necessarily involve rebuilding the BTree containing the data; they cannot be done INPLACE.
Many secondary INDEX operations can be done without touching (other than reading) the data. DROP INDEX throws away a BTree and makes some meta changes. ADD INDEX reads the entire table, building the index BTree on the side, then announcing its existence. CHARACTER SET and COLLATION changes require rebuilding an index.
If the table must be copied over, there is a significant lock on the table. Any ALTER that needs to read all the data has an indirect impact because of the I/O and/or CPU and/or brief locks on blocks/rows/etc.
It is unclear whether the code is smart enough to handle a multi-task ALTER in the most efficient way. Adding 8 columns in one INPLACE pass should be possible, but if it made the code too complex, that operation may be converted to COPY.
Probably a multi-task ALTER will do the 'worst' case. For example, changing the PRIMARY KEY and augmenting an ENUM will simply do both in a single COPY. Since COPY is the original way of doing all ALTERs, it is well debugged and optimized by now. (But it is slow and invasive.)
COPY is really quite simple to implement, mostly involving existing primitives:
Lock real so no one is writing to it
CREATE TABLE new LIKE real;
ALTER TABLE new ... -- whatever you asked for
copy all the rows from real to new -- this is the slow part
RENAME TABLE real TO old, new TO real; -- fast, atomic, etc.
Unlock
DROP TABLE old;
INPLACE is more complex because it must decide among many different algorithms and locking levels. DEFAULT has to punt off to COPY if it cannot do INPLACE.

MySQL mass deletion from 80 tables

I have a 50GB MySQL database (80 tables) that I need to delete some contents from.
I have a reference table that contains a list of product ids that need to be deleted from the other tables.
The other tables can be 2GB each and contain the items that need to be deleted.
My questions are: since it is not a small database, what is the safest way to delete the data in one shot in order to avoid problems?
And what is the best method to verify that all the data was deleted?
Probably this doesn't help anymore, but you should keep it in mind when creating a database. In MySQL (depending on the table storage engine, for instance InnoDB) you can specify relations, called foreign key constraints. These relations mean that if you delete an entry from one table (for instance products), you can automatically update or delete the entries in other tables that reference that row via a foreign key (such as product_storage). These relations guarantee that you keep a 100% consistent state. However, they might be hard to add in hindsight. If you plan to do this more often, it is definitely worth researching whether you can add them to your database; they will save you a lot of work (all kinds of queries become simpler).
Without these relations you can't be 100% sure. You'd have to go over all the tables, note which columns you want to check, and write a bunch of SQL queries to make sure there are no entries left.
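A cascading foreign key of the kind described above might look like this; the table and column names are hypothetical, and it requires InnoDB:

```sql
-- Deleting a row from products automatically deletes its children in
-- product_storage (hypothetical names; InnoDB only):
ALTER TABLE product_storage
  ADD CONSTRAINT fk_storage_product
    FOREIGN KEY (product_id) REFERENCES products (id)
    ON DELETE CASCADE;
```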
As Thirler has pointed out, it would be nice if you had foreign keys. Without them, burnall's solution can be used with transactions to ensure that no inconsistencies creep in.
Regardless of how you do it, this could take a long time, even hours so please be prepared for that.
As pointed out earlier, foreign keys would be nice here. But regarding question 1, you could run the changes within a transaction from the MySQL prompt. This assumes you are using a transaction-safe storage engine like InnoDB. You can convert from MyISAM to InnoDB if you need to. Anyway, something like this:
START TRANSACTION;
...Perform changes...
...Control changes...
COMMIT;
...or...
ROLLBACK;
Is it acceptable to have any downtime?
When working with PostgreSQL databases >250GB, we use this technique on production servers to perform database changes. If the outcome isn't as expected, we just roll back the transaction. Of course there is a penalty, as the I/O system has to work a bit harder.
// John
I agree with Thirler that using foreign keys is preferable. They guarantee referential integrity and consistency of the whole database.
But I can believe that life sometimes requires trickier logic.
So you could use manual queries like:
delete from a where id in (select id from `keys`)
You could delete all records at once, by ranges of keys, or using LIMIT in DELETE. A proper index is a must.
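A batched delete keeps each transaction short, so locks are held only briefly. A sketch with hypothetical table and column names; run it in a loop until it affects zero rows:

```sql
-- Delete in chunks of 10,000 rows; repeat until 0 rows are affected.
-- (`keys` is backtick-quoted because KEYS is a reserved word in MySQL.)
DELETE FROM child
WHERE product_id IN (SELECT id FROM `keys`)
LIMIT 10000;
```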
To verify consistency you need a function or query. For example:
create function check_consistency() returns boolean
begin
  return not exists(select * from child  where id not in (select id from parent))
     and not exists(select * from child2 where id not in (select id from parent));
  -- and so on
end
Also, something to look into is partitioning of MySQL tables. For more information check the reference manual:
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
It comes down to being able to divide tables into different partitions, for example by datetime values or index sets.