Since partitioning also splits the table into subtables, I wanted to know whether there is any way to index a partitioned table one partition at a time, based on the partition name or ID. I am asking because my table can have over 1 billion rows and an ADD INDEX query takes many hours or even a day, so I wanted to check whether I can add the index partition by partition, starting with the partitions I consider most important.
No, MySQL has no syntax to support creating indexes on a partitioned table one partition at a time. The index will be added to all partitions in one ALTER TABLE or CREATE INDEX statement.
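For example, a plain index build like the following (names are placeholders) creates the index in every partition as part of one operation:
ALTER TABLE mytable ADD INDEX idx_col1 (col1);  -- built in all partitions at once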
At my company, we execute schema changes using pt-online-schema-change, a script that allows clients to continue reading and writing the table while the alter is running. It might even take longer to run the schema change, but since it doesn't block clients, this doesn't cause a problem.
The script is part of the Percona Toolkit, which is a free, open-source collection of tools written in Perl and Bash.
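As a rough sketch (database, table, and index names here are placeholders), an invocation looks something like this; run it with --dry-run first, then --execute:
pt-online-schema-change --alter "ADD INDEX idx_col1 (col1)" D=mydb,t=mytable --execute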
Related
I have a not-so-big table, around 2 million rows.
Because of a business rule I had to add a new reference column to this table.
Right now the application is writing values to it but not using the column.
Now I need to update all null rows to the correct values, create a FK, and start using the column.
But this table gets a lot of reads, and when I try to ALTER TABLE to add the FK, the table is locked and the read queries get blocked.
Is there any way to speed this up?
Does leaving all the fields NULL help speed it up (since I think there would be no need to check whether the values are valid)?
Does creating an index beforehand help speed it up?
In Postgres I could create a NOT VALID FK and then validate it (which caused only a row lock, not a table lock); is there anything similar in MySQL?
What's taking time is building the index. A foreign key requires an index. If there is already an index on the appropriate column(s), the FK will use it. If there is no index, then adding the FK constraint implicitly builds a new index. This takes a while, and the table is locked in the meantime.
Starting in MySQL 5.6, building an index should allow concurrent read and write queries. You can try to make this explicit:
ALTER TABLE mytable ADD INDEX (col1, col2), LOCK=NONE;
If this doesn't work (like if it gives an error because it doesn't recognize the LOCK=NONE syntax), then you aren't using a version of MySQL that supports online DDL. See https://dev.mysql.com/doc/refman/5.6/en/innodb-online-ddl-operations.html
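As a sketch of the two-step approach described above (index, constraint, and parent-table names are placeholders), build the index first, then add the FK so it can reuse that index:
ALTER TABLE mytable ADD INDEX idx_col1 (col1), LOCK=NONE;
ALTER TABLE mytable ADD CONSTRAINT fk_col1 FOREIGN KEY (col1) REFERENCES parent_table (id);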
If you can't build an index or define a foreign key without locking the table, then I suggest trying the free tool pt-online-schema-change. We use this at my job, and we make many schema changes per day in production, without blocking any queries.
I'm working with a production database on a table that has > 2 million rows, and a UNIQUE KEY over col_a, col_b.
I need to modify that index to be over col_a, col_b, and col_c.
I believe this to be a valid, atomic command to make the change:
ALTER TABLE myTable
DROP INDEX `unique_cols`,
ADD UNIQUE KEY `unique_cols` (
`col_a`,
`col_b`,
`col_c`
);
Is this the most efficient way to do it?
I'm not certain that the following way is the best way for you. This is what worked for us after we suffered a few database problems ourselves and had to fix them quickly.
We work on very large tables, over 4-5GB in size.
Those tables have >2 million rows.
In our experience, running any form of ALTER query or index creation on the table is dangerous if the table is being written to.
So in our case here is what we do if the table has writes 24/7:
Create a new empty table with the correct indexes.
Copy data to the new table row by row, using a tool like Percona's or a manually written script (a sketch follows these steps).
This keeps memory usage lower, and also protects you in case you have a MyISAM table.
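A rough sketch of that copy-and-swap approach for the unique-key change in the question (the id batching column and batch size are assumptions, and any writes during the copy must be paused or replicated, e.g. with triggers):
CREATE TABLE myTable_new LIKE myTable;
ALTER TABLE myTable_new
    DROP INDEX unique_cols,
    ADD UNIQUE KEY unique_cols (col_a, col_b, col_c);
-- copy in batches to limit lock time; repeat for later id ranges
INSERT INTO myTable_new SELECT * FROM myTable WHERE id BETWEEN 1 AND 10000;
-- once caught up, swap atomically and drop the old copy
RENAME TABLE myTable TO myTable_old, myTable_new TO myTable;
DROP TABLE myTable_old;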
In the scenario that you have a very large table that is not being written to regularly, you could create the indexes while it is not in use.
This is hard to predict and can lead to problems if you've not estimated correctly.
In either case, your goal should be to:
Save memory / load on the system.
Reduce locks on the tables
The above also holds true when we add or drop columns on our very large tables, so this is not something we do just for creating indexes, but also for adding and removing columns.
Hope this helps, and anyone is free to disagree / add to my answer.
Some more helpful answers:
https://dba.stackexchange.com/questions/54211/adding-index-to-large-mysql-tables
https://dba.stackexchange.com/a/54214
https://serverfault.com/questions/174749/modifying-columns-of-very-large-mysql-tables-with-little-or-no-downtime
most efficient way to add index to large mysql table
I want to add 8 new columns to a large MySQL (version 5.6) InnoDB table with millions of records. I am trying to achieve this in the most optimized way.
Is there any advantage to adding all the columns in a single query over adding the 8 columns in 8 different queries? If so, I would like to know why.
When specifying ALGORITHM=INPLACE, LOCK=NONE, what do I need to take care of so that it won't cause any data corruption or application failure?
I was testing out ALGORITHM=INPLACE, LOCK=NONE with this query:
ALTER TABLE table_test ADD COLUMN test_column TINYINT UNSIGNED DEFAULT 0, ALGORITHM=INPLACE, LOCK=NONE;
But it's taking the same time as the query run with ALGORITHM=DEFAULT. What could be the reason?
The table I'm altering has only a primary key index and no other indexes. The queries coming to this table from the application are:
insert into table;
select * from table where user_id=uid;
select sum(column) from table where user_id=id and date<NOW();
By "optimized", do you mean "fastest"? Or "least impact on other queries"?
In older versions, the optimal way (using no add-ons) was to put all the ADD COLUMNs in a single ALTER TABLE; then wait until it finishes.
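For example, one statement covering all the additions (column names and types are placeholders):
ALTER TABLE table_test
    ADD COLUMN c1 INT,
    ADD COLUMN c2 INT,
    -- ...and so on through the eighth column...
    ADD COLUMN c8 INT;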
In any version, pt-online-schema-change will add all the columns with only a brief downtime.
Since you mention ALGORITHM=INPLACE, LOCK=NONE, I assume you are using a newer version? So, it may be that 8 ALTERs is optimal. There would be some interference, but perhaps not "too much".
ALGORITHM=DEFAULT lets the server pick the "best". This is almost always really the "best". That is, there is rarely a need to say anything other than DEFAULT.
You can never get data corruption. At worst, a query may fail due to some kind of timeout caused by the interference of the ALTER(s). You should always be checking for errors (including timeouts) and handle them in your app.
To discuss the queries...
insert into table;
One row at a time? Or batched? (Batched is more efficient -- perhaps 10x better.)
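For example, a single multi-row INSERT (column names and values are placeholders) instead of one statement per row:
INSERT INTO table_test (user_id, amount, created_date)
VALUES (1, 10.00, NOW()),
       (2, 12.50, NOW()),
       (3,  7.25, NOW());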
select * from table;
Surely not! That would give you all the columns for millions of rows. Why should you ever do that?
select count(column) from table where pk=id and date<NOW();
COUNT(col) checks col for being NOT NULL -- Do you need that? If not, then simply do COUNT(*).
WHERE pk=id gives you only one row; so why also qualify with date<NOW()? The PRIMARY KEY makes the query as fast as possible.
The only index is PRIMARY KEY? This seems unusual for a million-row table. Is it a "Fact" table in a "Data Warehouse" app?
Internals
(Caveat: Much of this discussion of Internals is derived indirectly, and could be incorrect.)
For some ALTERs, the work is essentially just in the schema. Eg: Adding options on the end of an ENUM; increasing the size of a VARCHAR.
For some ALTERs with INPLACE, the processing is essentially modifying the data in place -- without having to copy it. Eg: Adding a column at the end.
PRIMARY KEY changes (in InnoDB) necessarily involve rebuilding the BTree containing the data; they cannot be done INPLACE.
Many secondary INDEX operations can be done without touching (other than reading) the data. DROP INDEX throws away a BTree and makes some meta changes. ADD INDEX reads the entire table, building the index BTree on the side, then announcing its existence. CHARACTER SET and COLLATION changes require rebuilding an index.
If the table must be copied over, there is a significant lock on the table. Any ALTER that needs to read all the data has an indirect impact because of the I/O and/or CPU and/or brief locks on blocks/rows/etc.
It is unclear whether the code is smart enough to handle a multi-task ALTER in the most efficient way. Adding 8 columns in one INPLACE pass should be possible, but if it made the code too complex, that operation may be converted to COPY.
Probably a multi-task ALTER will do the 'worst' case. For example, changing the PRIMARY KEY and augmenting an ENUM will simply do both in a single COPY. Since COPY is the original way of doing all ALTERs, it is well debugged and optimized by now. (But it is slow and invasive.)
COPY is really quite simple to implement, mostly involving existing primitives:
Lock real so no one is writing to it
CREATE TABLE new LIKE real;
ALTER TABLE new ... -- whatever you asked for
copy all the rows from real to new -- this is the slow part
RENAME TABLE real TO old, new TO real; -- fast, atomic, etc.
Unlock
DROP TABLE old;
INPLACE is more complex because it must decide among many different algorithms and locking levels. DEFAULT has to punt off to COPY if it cannot do INPLACE.
I am facing a performance issue in MySQL due to the large index size on my table. The index size has grown to 6GB and my instance is running with 32GB of memory. The majority of rows in that table are not required after a few hours and can be removed selectively. But removing them is time-consuming and doesn't reduce the index size.
Please suggest a solution to manage this index.
You can OPTIMIZE your table to rebuild its indexes and reclaim space if you are not getting it back even after deletion:
optimize table table_name;
But since your table is bulky, it will be locked during OPTIMIZE TABLE, and you also face the issue of how to remove old data when you only need the last few hours of it. So you can do as below:
Step 1: During night hours, or when there is less traffic on your DB, first rename your main table and create a new table with the same name. Then insert the last few hours of data from the old table into the new table (a sketch follows these steps).
This way you remove the unwanted data, and the new table is effectively optimized as well.
Step 2: To avoid this issue in the future, you can create a stored procedure that executes once per day during night hours and either deletes data up to the previous day (as per your requirement) from this table or moves it to a historical table.
Step 3: Since your table now always keeps only a single day of data, you can execute an OPTIMIZE TABLE statement to rebuild it and reclaim space easily.
Note: A DELETE statement will not rebuild the index and will not free space on the server. For that you need to optimize your table, which can be done in various ways, e.g. with an ALTER TABLE statement or an OPTIMIZE TABLE statement.
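A rough sketch of Step 1 (the table name, timestamp column, and retention window are assumptions):
RENAME TABLE mytable TO mytable_old;
CREATE TABLE mytable LIKE mytable_old;
-- keep only the last few hours of data
INSERT INTO mytable SELECT * FROM mytable_old WHERE created_at >= NOW() - INTERVAL 6 HOUR;
DROP TABLE mytable_old;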
If you can remove all the rows older than X hours, then PARTITIONing is the way to go. PARTITION BY RANGE on the hour and use DROP PARTITION to remove an old hour and REORGANIZE PARTITION to create a new hour. You should have X+2 partitions. More details.
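A rough sketch of such an hourly scheme (table, column, and partition names are placeholders; note the partitioning column must be part of the primary key):
CREATE TABLE events (
    id BIGINT NOT NULL AUTO_INCREMENT,
    created_at DATETIME NOT NULL,
    payload VARCHAR(255),
    PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (TO_SECONDS(created_at)) (
    PARTITION p2024010100 VALUES LESS THAN (TO_SECONDS('2024-01-01 01:00:00')),
    PARTITION p2024010101 VALUES LESS THAN (TO_SECONDS('2024-01-01 02:00:00')),
    PARTITION pfuture     VALUES LESS THAN MAXVALUE
);
-- each hour: drop the oldest partition and carve the next hour out of pfuture
ALTER TABLE events DROP PARTITION p2024010100;
ALTER TABLE events REORGANIZE PARTITION pfuture INTO (
    PARTITION p2024010102 VALUES LESS THAN (TO_SECONDS('2024-01-01 03:00:00')),
    PARTITION pfuture     VALUES LESS THAN MAXVALUE
);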
If the deletes are more complex, please provide more details; perhaps we can come up with another solution that deals with the question about index size. Please include SHOW CREATE TABLE.
Even if you cannot use partitions for purging, it may be useful to have partitions for OPTIMIZE. Do not use OPTIMIZE PARTITION; it optimizes the entire table. Instead, use REORGANIZE PARTITION if you see you need to shrink the index.
How big is the table?
How big is innodb_buffer_pool_size?
(6GB index does not seem that bad, especially since you have 32GB of RAM.)
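To check, something like the following (replace the schema and table names with yours):
SELECT table_name,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = 'mydb' AND table_name = 'mytable';
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';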
I have a table with 1 million records. Since my queries will mainly be based on one column (with 32 constant values), I am trying to add 32 partitions using LIST partitioning.
My application can't stop, and records will be inserted in the meantime. Can I add partitions to the table? Does it impact my application, e.g. by locking some rows during the partitioning?
I searched the internet, but didn't find much material about adding partitions to an existing table.
Thank you.
A common way to do a table migration such as this without any impact on the application is to follow these steps (a sketch follows the list):
Create a duplicate table that contains the revisions you require (in your case, the partitioning)
Set up a trigger on the original table to insert into the duplicate table (this will act as a form of replication for a short period of time)
Start a migration from the original table to the new table, at a rate that will not hinder your application (say 1000 rows at a time)
When the migration completes, you'll have your tables in perfect sync; this is the time to modify your application to start reading and writing using the new table.
Once you're happy that your app is functional using the new table, drop the old table.
Migration complete, have a beer.
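A rough sketch of those steps (all names, the column list, and the batch size are placeholders; note the LIST partitioning column must be part of the primary key):
CREATE TABLE mytable_new (
    id          BIGINT NOT NULL,
    category_id INT    NOT NULL,
    payload     VARCHAR(255),
    PRIMARY KEY (id, category_id)
)
PARTITION BY LIST (category_id) (
    PARTITION p0 VALUES IN (0),
    PARTITION p1 VALUES IN (1)
    -- ...one partition per constant, 32 in total
);
-- the trigger keeps the copy in sync while the backfill runs
CREATE TRIGGER mytable_copy_ins AFTER INSERT ON mytable
FOR EACH ROW
    INSERT INTO mytable_new (id, category_id, payload)
    VALUES (NEW.id, NEW.category_id, NEW.payload);
-- backfill in batches of ~1000 rows; INSERT IGNORE skips rows the trigger already copied
INSERT IGNORE INTO mytable_new (id, category_id, payload)
    SELECT id, category_id, payload FROM mytable WHERE id BETWEEN 1 AND 1000;
-- ...repeat for later id ranges, then point the application at mytable_new and drop the old table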