Table with 50 million rows and adding an index takes too much time - mysql

I was working on a table with roughly 50 million rows (about 2 GB in size) and needed to improve its performance. When I added an index on a column through the phpMyAdmin panel, the table got locked, which held up all queries against it in a queue and ultimately forced me to kill/restart them. (And yeah, I forgot to mention I was doing this on production. My bad!)
When I did some research I found solutions like creating a duplicate table, but is there any alternative method?

You may follow these steps:
Create a temp table with the new index.
Create triggers on the original table (for inserts, updates, deletes) so that changes are replicated to the temp table.
Migrate the data in small batches.
When done, rename the temp table to the original name and drop the old table.
But as you said you are doing this in production, you need to account for live traffic while dropping one table and swapping in the other.
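For illustration, here is a rough SQL sketch of those steps, assuming a hypothetical table big_table with primary key id, columns col_a and col_b, and a new index needed on col_a; all names are placeholders and the batching and swap would have to be adapted to your schema and traffic:
CREATE TABLE big_table_new LIKE big_table;
ALTER TABLE big_table_new ADD INDEX idx_col_a (col_a);   -- the index you actually need
-- Keep the copy in sync while the backfill runs:
CREATE TRIGGER big_table_ai AFTER INSERT ON big_table FOR EACH ROW
  REPLACE INTO big_table_new (id, col_a, col_b) VALUES (NEW.id, NEW.col_a, NEW.col_b);
CREATE TRIGGER big_table_au AFTER UPDATE ON big_table FOR EACH ROW
  REPLACE INTO big_table_new (id, col_a, col_b) VALUES (NEW.id, NEW.col_a, NEW.col_b);
CREATE TRIGGER big_table_ad AFTER DELETE ON big_table FOR EACH ROW
  DELETE FROM big_table_new WHERE id = OLD.id;
-- Backfill in small id ranges, repeating until you reach the current maximum:
INSERT IGNORE INTO big_table_new SELECT * FROM big_table WHERE id BETWEEN 1 AND 10000;
-- Final, near-instant swap:
RENAME TABLE big_table TO big_table_old, big_table_new TO big_table;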

Related

How to reduce index size of a table in mysql using innodb engine?

I am facing a performance issue in MySQL due to a large index on my table. The index size has grown to 6GB and my instance is running with 32GB of memory. The majority of rows in that table are not required after a few hours and can be removed selectively, but removing them is time consuming and doesn't reduce the index size.
Please suggest a solution for managing this index.
You can run OPTIMIZE TABLE to rebuild the indexes and reclaim space if it is not freed even after deletion:
optimize table table_name;
But since your table is bulky it will be locked during OPTIMIZE TABLE, and you also have the problem of removing old data even though you only need the last few hours of it. So you can do the following:
Step1: During night hours, or whenever there is less traffic on your DB, first rename your main table and create a new table with the same name. Then insert the last few hours of data from the old table into the new one.
This way you remove the unwanted data, and the new table is already optimized.
Step2: To avoid this issue in the future, create a stored procedure that executes during night hours, once per day, and either deletes data up to the previous day (as per your requirement) from this table or moves it to a historical table.
Step3: Since your table now only ever keeps a single day of data, you can run an OPTIMIZE TABLE statement to rebuild it and reclaim space easily.
Note: a DELETE statement will not rebuild the index and will not free space on the server. For that you need to optimize the table, which can be done in various ways, e.g. with an ALTER statement or an OPTIMIZE statement.
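As a rough illustration of steps 1-3, assuming a hypothetical table named events with a created_at timestamp and a six-hour retention window; the nightly job is shown as a scheduled MySQL event, which is one way to run the suggested procedure automatically (all names and times are placeholders):
-- Step 1: prepare an empty copy, swap it in atomically, then carry over only the recent rows.
CREATE TABLE events_new LIKE events;
RENAME TABLE events TO events_old, events_new TO events;
INSERT INTO events SELECT * FROM events_old WHERE created_at >= NOW() - INTERVAL 6 HOUR;
-- Step 2: purge anything older than today, once per night (requires event_scheduler = ON).
CREATE EVENT purge_old_events
  ON SCHEDULE EVERY 1 DAY STARTS CURRENT_DATE + INTERVAL 1 DAY + INTERVAL 3 HOUR
  DO DELETE FROM events WHERE created_at < CURDATE();
-- Step 3: rebuild the now-small table to reclaim index space.
OPTIMIZE TABLE events;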
If you can remove all the rows older than X hours, then PARTITIONing is the way to go. PARTITION BY RANGE on the hour, use DROP PARTITION to remove an old hour, and use REORGANIZE PARTITION to create a new one. You should have X+2 partitions.
If the deletes are more complex, please provide more details; perhaps we can come up with another solution that deals with the question about index size. Please include SHOW CREATE TABLE.
Even if you cannot use partitions for purging, it may be useful to have partitions for OPTIMIZE. Do not use OPTIMIZE PARTITION; it optimizes the entire table. Instead, use REORGANIZE PARTITION if you see you need to shrink the index.
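A minimal sketch of that layout, assuming a hypothetical table metrics partitioned on an integer hour number hour_id (the real partition expression, names, and boundaries would depend on your schema; note that the partitioning column has to be part of every unique key on the table):
ALTER TABLE metrics
  PARTITION BY RANGE (hour_id) (
    PARTITION p100 VALUES LESS THAN (101),
    PARTITION p101 VALUES LESS THAN (102),
    PARTITION pmax VALUES LESS THAN MAXVALUE
  );
-- Purge the oldest hour instantly instead of deleting row by row:
ALTER TABLE metrics DROP PARTITION p100;
-- Split the catch-all partition to open up the next hour:
ALTER TABLE metrics REORGANIZE PARTITION pmax INTO (
  PARTITION p102 VALUES LESS THAN (103),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);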
How big is the table?
How big is innodb_buffer_pool_size?
(6GB index does not seem that bad, especially since you have 32GB of RAM.)

Adding Index to 3 million rows MySQL

I need to add at least one index to a column of type int(1) on an InnoDB table. There are about 3 million rows it would need to index. This is a database on my production server, and it is in use by thousands of people every day. I tried to add an index the standard way, but it was taking too much time (I let it run for about 7 minutes before killing the process) and locking rows, meaning a frozen application for many users.
My VPS that runs all of this has 512mb of RAM and has an Intel Xeon E5504 processor.
How can I add an index to this production database without interrupting my users' experience?
Unless the table is either read-only or write-only, you'll probably need to take down the site. Lock the database, run the operation, and wait.
If the table is write-only, swap the writes over to a temporary table, run the operation on the old table, then swap the writes back and insert the data from the temporary table.
If the table is read-only, duplicate the table and run the operation on the copy.
If the table is read/write, a messy alternative that might work is to create a new table with the indexes, set its primary key start point to the next value in the original table, have your read queries select from both tables, but write exclusively to the new table. Then write a script that copies rows from the old table to the new one and deletes them from the old table. It'll take far, far longer than the downtime, and plenty can go wrong, but it should be doable.
You can set the start point of an auto-increment primary key with
ALTER TABLE `my_table` AUTO_INCREMENT = X;
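As a sketch of the read/write variant above, assuming a hypothetical table orders with an auto-increment id and a new index needed on status; reads here combine the two tables with UNION ALL, and all names and values are placeholders:
CREATE TABLE orders_new LIKE orders;
ALTER TABLE orders_new ADD INDEX idx_status (status);
ALTER TABLE orders_new AUTO_INCREMENT = 3000001;   -- next value after the old table's max id
-- Reads have to look at both tables until the migration script finishes:
SELECT * FROM orders     WHERE id = 12345
UNION ALL
SELECT * FROM orders_new WHERE id = 12345;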
hope that helps.
Take a look at pt-online-schema-change. I think this tool can be quite useful in your case. It will obviously put additional load on your database server, but it should not block access to the table for most of the operation time.
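A typical invocation looks roughly like this (database, table, column, and index names are placeholders; check the tool's documentation and run with --dry-run first):
pt-online-schema-change \
  --alter "ADD INDEX idx_mycol (mycol)" \
  D=mydb,t=mytable \
  --execute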

Optimize table on huge mysql tables without partition

We have a very large MySQL table using the MyISAM engine. Whenever we run the OPTIMIZE TABLE command, the table is locked and performance is impacted. The table is not read-only, so creating temporary tables and swapping them may not work. We are also not able to partition the table.
Is there any other way or tool to achieve the OPTIMIZE TABLE functionality without degrading performance? Any suggestion would be of great help.
Thanks in advance.
http://dev.mysql.com/doc/refman/5.5/en/optimize-table.html
For InnoDB tables, OPTIMIZE TABLE is mapped to ALTER TABLE, which rebuilds the table (...)
Therefore, I would not expect any improvement in switching to InnoDB, as Quassnoi probably suggests.
By definition, OPTIMIZE TABLE needs some exclusive access to the table, hence the degraded performance during optimization.
Nevertheless, there may be some steps you can take to reduce the time taken by OPTIMIZE, depending on how your table is "huge":
if your table has many fields, it might need to be normalized. Conversely, you might want to spread your columns across several "narrower" tables and establish one-to-one relations between them.
if your table has many records, implement "manual" partitioning in your application code. A simple step would be to create an "archive" table that holds rarely updated records, as sketched below. This way you only need to optimize a smaller set of records (the non-archive table).
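A minimal sketch of that archive idea, assuming a hypothetical table mytable with a last_updated column and a 90-day cutoff (names, column, and threshold are placeholders, and on a real system the move would be done in batches):
CREATE TABLE mytable_archive LIKE mytable;
-- Move the rarely-updated rows out of the hot table:
INSERT INTO mytable_archive
  SELECT * FROM mytable WHERE last_updated < NOW() - INTERVAL 90 DAY;
DELETE FROM mytable WHERE last_updated < NOW() - INTERVAL 90 DAY;
-- OPTIMIZE now only has to rebuild the much smaller hot table:
OPTIMIZE TABLE mytable;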
The OPTIMIZE TABLE command locks the table, which degrades performance.
You can instead use the pt-online-schema-change tool from the Percona Toolkit to rebuild the table.
It does not lock the table for most of the operation.
See the link below:
https://www.percona.com/doc/percona-toolkit/2.1/pt-online-schema-change.html

InnoDB: ALTER TABLE performance related to NULLability?

I've got a table with 10M rows, and I'm trying to ALTER TABLE to add another column (a VARCHAR(80)).
From a data-modelling perspective, that column should be NOT NULL - but the amount of time it takes to run the statement is a consideration, and the client code could be changed to deal with a NULL column if that's warranted.
Should the NULL-ability of the column I'm trying to add significantly impact the amount of time it takes to add the column either way?
More Information
The context in which I'm doing this is a Django app, with a migration generated by South - adding three separate columns, and adding an index on one of the newly-added columns. Looking at the South-generated SQL, it spreads this operation (adding three columns and an index) over 15 ALTER TABLE statements - which seems like it will make this operation take a whole lot longer than it should.
I've seen some references that suggest that InnoDB doesn't actually have to create a field in the on-disk file for nullable fields that are NULL, and just modifies a bitfield in the header. Would this impact the speed of the ALTER TABLE operation?
I don't think the nullability of the column has anything to do with the speed of ALTER TABLE. In most alter table operations, the whole table - with all the indexes - has to be copied (temporarily) and then the alteration is done on the copy. With 10M rows, it's kind of slow. From MySQL docs:
Storage, Performance, and Concurrency Considerations
In most cases, ALTER TABLE makes a temporary copy of the original table. MySQL waits for other operations that are modifying the table, then proceeds. It incorporates the alteration into the copy, deletes the original table, and renames the new one. While ALTER TABLE is executing, the original table is readable by other sessions. Updates and writes to the table that begin after the ALTER TABLE operation begins are stalled until the new table is ready, then are automatically redirected to the new table without any failed updates. The temporary table is created in the database directory of the new table. This can differ from the database directory of the original table for ALTER TABLE operations that rename the table to a different database.
If you want to make several changes in a table's structure, it's usually better to do them in one ALTER TABLE operation.
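For example, the three columns and the index from the migration described above could be combined into a single statement (table, column, and index names here are invented for illustration):
ALTER TABLE myapp_mytable
  ADD COLUMN col_a VARCHAR(80) NULL,
  ADD COLUMN col_b VARCHAR(80) NULL,
  ADD COLUMN col_c VARCHAR(80) NULL,
  ADD INDEX idx_col_a (col_a);
That way the 10M-row table is copied once instead of fifteen times.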
Allowing client code to make changes to tables is probably not the best idea - and you have hit on one good reason not to allow it. Why do you need it? If you can't do otherwise, it would probably be better - for performance reasons - to have your client code create a new table (with the new column and the PK of the existing table) instead of adding a column.

Running optimize on table copy?

I have an InnoDB table in MySQL which used to contain about 600k rows. After deleting 400k+ rows, my guess is that I need to run an OPTIMIZE.
However, since the table will be locked during this operation, the site will not be usable at that time. So, my question is: should I run the OPTIMIZE on the live database table (with a little under 200k rows)? Or is it possible to create a copy of that table, run the OPTIMIZE on that copy, and afterwards rename both tables so that the copy becomes the live table?
If you create a copy, it should be optimised already if you use CREATE TABLE ... AS SELECT ... - no need to run OPTIMIZE separately.
However, I'd consider copying the 200k rows you want to keep into a new table, then renaming the tables.
That way there are fewer steps and less work all round.
CREATE TABLE MyTableCopy AS
SELECT *
FROM myTable
WHERE (insert Keep condition here);
RENAME TABLE
myTable TO myTable_DeleteMelater,
MyTableCopy TO myTable;