MySQL - rebuild partition vs optimize partition - mysql

I have partitioned tables in MySQL 5.1.41 that hold a very large amount of data. Recently I deleted a lot of data, which left around 500 GB of fragmentation, yet a lot of data still remains in the partitions.
To return that space to the OS, I need to defragment the partitions. I consulted the MySQL documentation, https://dev.mysql.com/doc/refman/5.1/en/partitioning-maintenance.html, which confused me with the following statements:
Rebuilding partitions: Rebuilds the partition; this has the same effect as dropping all records stored in the partition, then reinserting them. This can be useful for purposes of defragmentation.
Optimizing partitions: If you have deleted a large number of rows from a partition or if you have made many changes to a partitioned table with variable-length rows (that is, having VARCHAR, BLOB, or TEXT columns), you can use ALTER TABLE ... OPTIMIZE PARTITION to reclaim any unused space and to defragment the partition data file.
I tried both and observed that sometimes "rebuild" is faster and sometimes "optimize" is. Each partition I run these commands on holds from millions to sometimes billions of records. I'm aware of what MySQL does for each of the above statements.
Should the choice depend on the number of rows in the partition? If so, up to how many rows can I use "optimize", and from how many should I use "rebuild"?
Also, which is better to use in general?

MyISAM or InnoDB? (The answer will be different.)
For MyISAM, REBUILD/REORGANIZE/OPTIMIZE will take about the same effort per partition.
For InnoDB, OPTIMIZE PARTITION rebuilds all partitions. So, don't use this if you want to do the partitions one at a time. REORGANIZE PARTITION of the partition into an identical partition definition should act only on the one partition. I recommend that.
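A minimal sketch of that REORGANIZE PARTITION approach follows. The table `events`, its columns, and the partition names are hypothetical and chosen only for illustration; the one thing that matters is that the new definition repeats the existing boundary of the partition being rebuilt.

```sql
-- Hypothetical table, partitioned by day on a DATETIME column.
CREATE TABLE events (
    id         BIGINT UNSIGNED NOT NULL,
    created_at DATETIME NOT NULL,
    payload    VARCHAR(255),
    PRIMARY KEY (id, created_at)  -- the partitioning column must be in every unique key
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
    PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03')),
    PARTITION p_future  VALUES LESS THAN MAXVALUE
);

-- Rebuild only p20240101 by reorganizing it into an identical definition.
-- The other partitions are not touched.
ALTER TABLE events
    REORGANIZE PARTITION p20240101 INTO (
        PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02'))
    );
```

Repeating the statement with each partition's own name and boundary processes the table one partition at a time.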
It is generally not worth using partitioning unless you have at least a million rows. Also, BY RANGE is the only form in which I have found any performance benefit.
Perhaps the main use of partitioning is with a time-series where you want to delete "old" data. PARTITION BY RANGE with weekly or monthly partitions lets you very efficiently DROP PARTITION rather than DELETE. More in my blog.
(My answer applies to all versions through 5.7, not just your antique 5.1.)

Related

How to optimize an "optimize" MYSQL query that takes a lot of time

I have a table (innodb) with 1 million new inserts (20GB) a week. I only need the data for 1 week, so I delete it after 7 days, so each day we delete around 3GB and insert 3GB new. That table is already in a separate database from the rest.
The problem is that disk space is only freed after an optimize query, so we run one every few weeks at night. It works, but it takes 30 minutes and freezes the whole database server during that time, not just the particular database.
Is there any way to optimize faster?
If we run an optimize every time we delete data, will it be faster than running it every few weeks? I thought it might be faster when only 3GB of deleted rows need to be removed from disk; if we run it after 20 days it's 60GB. Is that right? And is there another way to optimize the optimize?
Instead of worrying about speeding up OPTIMIZE TABLE, let's get rid of the need for it.
PARTITION BY RANGE(TO_DAYS(...)) ...
Then DROP PARTITION nightly; this is much faster than using DELETE, and avoids the need for OPTIMIZE.
Be sure to have innodb_file_per_table=ON.
Also nightly, use REORGANIZE PARTITION to turn the future partition into tomorrow's partition and a new, empty, partition.
Details here: http://mysql.rjweb.org/doc.php/partitionmaint
Note that each PARTITION is effectively a separate table so DROP PARTITION is effectively a drop table.
There should be 10 partitions:
1 starter partition, to avoid the overhead of a glitch when partitioning by DATETIME.
7 daily partitions
1 extra day, so that there will be a full 7 days' worth.
1 empty future partition just in case your nightly script fails to run.
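The layout above could look roughly like this. It is a sketch under assumptions: the table `page_log`, its columns, the partition names, and the dates are all invented for illustration, and the nightly statements would normally be generated by a script that fills in the current dates.

```sql
-- Hypothetical table with the 10 partitions listed above:
-- 1 starter + 8 daily (7 days plus 1 extra) + 1 future.
CREATE TABLE page_log (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    created DATETIME NOT NULL,
    html    MEDIUMTEXT,
    PRIMARY KEY (id, created)  -- the partitioning column must be in every unique key
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(created)) (
    PARTITION p_start   VALUES LESS THAN (0),
    PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
    PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03')),
    PARTITION p20240103 VALUES LESS THAN (TO_DAYS('2024-01-04')),
    PARTITION p20240104 VALUES LESS THAN (TO_DAYS('2024-01-05')),
    PARTITION p20240105 VALUES LESS THAN (TO_DAYS('2024-01-06')),
    PARTITION p20240106 VALUES LESS THAN (TO_DAYS('2024-01-07')),
    PARTITION p20240107 VALUES LESS THAN (TO_DAYS('2024-01-08')),
    PARTITION p20240108 VALUES LESS THAN (TO_DAYS('2024-01-09')),
    PARTITION p_future  VALUES LESS THAN MAXVALUE
);

-- Nightly step 1: drop the oldest day instead of DELETEing its rows.
ALTER TABLE page_log DROP PARTITION p20240101;

-- Nightly step 2: split the (empty) future partition into
-- tomorrow's partition and a new, empty future partition.
ALTER TABLE page_log
    REORGANIZE PARTITION p_future INTO (
        PARTITION p20240109 VALUES LESS THAN (TO_DAYS('2024-01-10')),
        PARTITION p_future  VALUES LESS THAN MAXVALUE
    );
```

Because `p_future` is normally empty, the REORGANIZE has no rows to copy and should finish quickly.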
Since you have an antique version that does not have PARTITIONing, here is another solution:
Compress the HTML and store it in a BLOB (instead of TEXT).
Do the compress and uncompress in the client.
This technique can shrink the disk footprint by about 3:1.
That won't eliminate the OPTIMIZE issue, but it will:
Use less disk space.
Be faster (due to having less data to shovel around).
But, as already mentioned, InnoDB cleans up the free space somewhat. I suspect that the table does not grow past 2x after an Optimize? Normally a BTree that starts with no free space degrades to about 69% full after a lot of churn. But then it stays at that ratio.
Emails, HTML, text, code -- all of these shrink about 3:1 with any decent compression library (zlib, PHP's gzcompress(), etc). Most image formats and PDFs are already compressed; they don't benefit from a second compression.
MySQL is not designed for that volume...
Try a data-warehouse (columnar) engine such as AWS Redshift; it will feel like a 4 MB database again :)
If you can't use it, you can install Postgres and add a plugin for compressed columnar tables (it should be similar to Redshift).

How to reduce index size of a table in mysql using innodb engine?

I am facing a performance issue in MySQL due to a large index size on my table. The index has grown to 6GB, and my instance is running with 32GB of memory. The majority of rows in that table are not needed after a few hours and can be removed selectively. But removing them is time consuming and doesn't reduce the index size.
Please suggest some solution to manage this index.
You can optimize the table to rebuild its indexes and reclaim the space that deletion alone does not free:
optimize table table_name;
But because your table is large, it will be locked while OPTIMIZE TABLE runs, and you also still need a way to remove data that is more than a few hours old. So you can proceed as follows:
Step 1: At night, or whenever traffic on your database is low, rename your main table and create a new table with the same name. Then insert the last few hours of data from the old table into the new table.
This removes the unwanted data, and the new table is optimized as well.
Step 2: To avoid the issue in future, create a stored procedure that runs once per day at night and either deletes data up to the previous day (as your requirements dictate) or moves it to a history table.
Step 3: Since the table then holds only a single day of data, you can easily run OPTIMIZE TABLE on it to rebuild it and reclaim space.
Note: A DELETE statement does not rebuild indexes and does not free space on the server. For that you need to optimize the table, which can be done in various ways, such as an ALTER TABLE statement or an OPTIMIZE TABLE statement.
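Steps 1 and 2 might be sketched as follows. Everything named here is an assumption for illustration: the table `my_table`, its `created_at` column, the 3-hour window, and the event name. The sketch also uses a scheduled event rather than a stored procedure for the nightly job, which requires the event scheduler to be enabled.

```sql
-- Hypothetical table standing in for the real one.
CREATE TABLE my_table (
    device_id  INT UNSIGNED NOT NULL,
    created_at DATETIME NOT NULL,
    reading    VARCHAR(100),
    PRIMARY KEY (device_id, created_at),
    KEY idx_created (created_at)
) ENGINE=InnoDB;

-- Step 1: swap in an empty copy, then carry over only the recent rows.
CREATE TABLE my_table_new LIKE my_table;
RENAME TABLE my_table TO my_table_old,
             my_table_new TO my_table;   -- both renames happen as one atomic operation

INSERT INTO my_table
SELECT * FROM my_table_old
WHERE created_at >= NOW() - INTERVAL 3 HOUR;

DROP TABLE my_table_old;

-- Step 2: purge once a day during quiet hours.
CREATE EVENT purge_my_table
    ON SCHEDULE EVERY 1 DAY STARTS '2024-01-01 03:00:00'
    DO DELETE FROM my_table WHERE created_at < CURDATE();
```

If the real table generates its keys automatically, check before the swap that ids created in the new table cannot collide with the rows copied from the old one.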
If you can remove all the rows older than X hours, then PARTITIONing is the way to go. PARTITION BY RANGE on the hour and use DROP PARTITION to remove an old hour and REORGANIZE PARTITION to create a new hour. You should have X+2 partitions. More details.
If the deletes are more complex, please provide more details; perhaps we can come up with another solution that deals with the question about index size. Please include SHOW CREATE TABLE.
Even if you cannot use partitions for purging, it may be useful to have partitions for OPTIMIZE. Do not use OPTIMIZE PARTITION; it optimizes the entire table. Instead, use REORGANIZE PARTITION if you see you need to shrink the index.
How big is the table?
How big is innodb_buffer_pool_size?
(6GB index does not seem that bad, especially since you have 32GB of RAM.)

Disk usage for optimizing a partitioned MySQL-Table

I have a large MyISAM table with 3 million rows that has a size of 31 GB due to a 10KB blob in each row. The table already has 30 partitions. I want to optimize the table since I am going to remove rows with some old data and resize the blobs.
My question is about the disk usage while optimizing:
If I optimize the whole table, does MySQL step through the partitions and optimize only one partition at a time, thus needing extra space for only one small partition? Or do I have to optimize one or a few partitions at a time to keep the extra disk space down?
Optimizing a single partition of ~1 GB takes only a few seconds, and I could not see any heavy disk usage.
(My answer assumes InnoDB. Even if I am overly pessimistic, the 'solution' should work fine for MyISAM.)
For InnoDB, keep in mind the issues of innodb_file_per_table.
OPTIMIZE will build a copy of the entire table.
Solution: If you are tight on space, you can optimize one partition at a time by doing
ALTER TABLE ... REORGANIZE PARTITION ...
INTO ( PARTITION ... );
Yes, you will need to build the ... for one partition at a time, and execute them one at a time.
(Do not do OPTIMIZE PARTITION, that will optimize the entire table.)
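To build those statements you need each partition's name and boundary. One way to look them up, assuming a hypothetical schema `mydb` and table `blob_store` (substitute your own names), is the `information_schema.PARTITIONS` table:

```sql
-- List every partition of one table with its boundary and size figures.
-- 'mydb' and 'blob_store' are placeholder names.
SELECT PARTITION_NAME,
       PARTITION_METHOD,
       PARTITION_DESCRIPTION,   -- e.g. the VALUES LESS THAN boundary for RANGE
       TABLE_ROWS,
       DATA_LENGTH,
       INDEX_LENGTH,
       DATA_FREE                -- allocated but unused bytes
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME   = 'blob_store'
ORDER BY PARTITION_ORDINAL_POSITION;
```

The `DATA_LENGTH` and `DATA_FREE` columns also give a rough idea of how much extra disk space rebuilding a single partition may need and which partitions are worth rebuilding at all.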
Would you like to elaborate on what your table is like? I may want to talk you out of partitioning or talk you into a different way of partitioning.

Why my mysql table has to optimize frequently

I have a MySQL table with 12 columns, one primary key and two unique keys. The table holds roughly 86,000 rows.
I use this mysql code:
INSERT INTO table (col2, col3, ..., col12) VALUES ($val2, $val3, ..., $val12) ON DUPLICATE KEY UPDATE col2=VALUES(col2), col3=VALUES(col3), ..., col12=VALUES(col12)
When I view the structure of this table from cpanel phpmyadmin, I can see 'Optimize Table' link just below the index information of the table. If I click the link, the table is optimized.
But my question is why the 'Optimize Table' link appears so frequently for this table (every 3 to 4 days), while the other tables in this database rarely show it (once a month, or even once every two months or more).
Since I am not deleting rows from this table, only inserting and, if a duplicate key is found, updating, why is optimization required so frequently?
Short answer: switch to InnoDB.
The MyISAM storage engine uses B-trees for indexes and creates index files. Every time you insert a lot of data these indexes change, and that is why you need to optimize your table to reorganize the indexes and regain some space.
MyISAM's indexing mechanism takes much more space compared to InnoDB.
Read the link below
http://www.mysqlperformanceblog.com/2010/12/09/thinking-about-running-optimize-on-your-innodb-table-stop/
There are a lot of other advantages to InnoDB over MyISAM, but that is another topic.
I will explain how inserting records affects a MyISAM table and explain what optimizing does, so you'll understand why inserting records has such a large effect.
Data
With MyISAM, when you insert records, data is simply appended to the end of the data file.
Running optimize on a MyISAM table defrags the data, physically reordering it to match the order of the primary key index. This speeds up sequential record reads (and table scans).
Indexes
Inserting records also adds leaves to the B-Tree nodes in the index. If a node fills up, it must be split, in effect rebuilding at least that page of the index.
When optimizing a MyISAM table, the indexes are flattened out, allowing room for more expansion (insertion) before having to rebuild an index page. This flatter index also speeds searches.
Statistics
MySQL also stores statistics for each index about key distribution, and the query optimizer uses this information to help develop a good execution plan. Inserting (or deleting) many records causes these statistics to become out of date.
Optimizing the table recalculates its statistics after the data has been defragmented and the indexes rebuilt.
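To see these effects on a real table, the statements below may help. `my_table` is a placeholder name. As far as I know, the overhead figure that phpMyAdmin uses when deciding to show the link is derived from `Data_free`, but treat that as an assumption.

```sql
-- Data_free shows allocated but unused bytes in the table.
SHOW TABLE STATUS LIKE 'my_table';

-- The Cardinality column shows the stored key-distribution estimate per index.
SHOW INDEX FROM my_table;

-- Refresh the key-distribution statistics without rebuilding the table.
ANALYZE TABLE my_table;

-- Defragment the data, rebuild the indexes, and refresh the statistics.
OPTIMIZE TABLE my_table;
```

If only the statistics are stale, ANALYZE TABLE is the lighter operation of the two.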
vs. Appending
When you are appending data (adding a record with a higher primary key value such as with auto_increment), that data will not need to be later defragged since it will already be in the proper physical order. Also, when appending (inserting sequentially) into an index, the nodes are kept flat, so there's no rebuilding to be done there either.
vs. InnoDB
InnoDB suffers from the same issues when inserting, but since data is kept in order by primary key due to its clustered index, you take the hit up front (at the time it's inserted) for keeping the data in order, rather than having to defrag it later. Still, optimizing InnoDB does optimize the data by flattening out the B-tree nodes and freeing up unused (deleted) keys, which improves sequential reads (table scans), and secondary indexes are similar to indexes in MyISAM, so they get rebuilt to flatten them out.
Conclusion
I'm not trying to make a case to stick with MyISAM. InnoDB has superior read performance due to the clustered indexes, and better update and append performance due to the record level locking versus MyISAM's table locking (assuming concurrent users). Also, InnoDB has ACID.
Still, my goal was to answer your direct question and provide some technical details rather than conjecture and hearsay.
Neither database storage engine automatically optimizes itself.

Will partitions improve MySQL INSERT speed?

I'm doing a lot of INSERTs via LOAD DATA INFILE on MySQL 5.0. After many inserts, say a few hundred millions rows (InnoDB, PK + a non-unique index, 64 bit Linux 4GB RAM, RAID 1), the inserts slow down considerably and appear IO bound. Are partitions in MySQL 5.1 likely to improve performance if the data flows into separate partition tables?
The previous answer is erroneous in his assumptions that this will decrease performance. Quite the contrary.
Here's a lengthy but informative article on the why and how of partitioning in MySQL:
http://dev.mysql.com/tech-resources/articles/partitioning.html
Partitioning is typically used, as was mentioned, to group like data together. That way, when you decide to archive or outright destroy a partition, your tables do not become fragmented. This, however, does not hurt performance; it can actually increase it. It is not just deletions that fragment; updates and inserts can do that too. By partitioning the data, you are telling the RDBMS the criteria (indexes) by which the data should be manipulated and queried.
Edit: SiLent SoNG is correct. DISABLE / ENABLE KEYS only works for MyISAM, not InnoDB. I never knew that, but I went and read the docs. http://dev.mysql.com/doc/refman/5.1/en/alter-table.html#id1101502.
Updating the indexes may be what's slowing it down. You can disable the indexes while you're doing your load and turn them back on afterwards, so that they are generated once for the whole table.
ALTER TABLE foo DISABLE KEYS;
LOAD DATA INFILE ... ;
ALTER TABLE foo ENABLE KEYS;
This will cause the indexes to all be updated in one go instead of per-row. This also leads to more balanced BTREE indexes.
No improvement on MySQL 5.6
"MySQL can apply partition pruning to SELECT, DELETE, and UPDATE statements. INSERT statements currently cannot be pruned."
http://dev.mysql.com/doc/refman/5.6/en/partitioning-pruning.html
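Whether pruning applies to a given SELECT can be checked with EXPLAIN PARTITIONS. This is a sketch on a hypothetical table `load_log` (names and dates are made up). The PARTITIONS keyword is needed in 5.1 through 5.6; as far as I know, later versions show the partitions column in plain EXPLAIN.

```sql
-- Hypothetical table, partitioned by day.
CREATE TABLE load_log (
    id      BIGINT UNSIGNED NOT NULL,
    created DATETIME NOT NULL,
    val     INT,
    PRIMARY KEY (id, created),
    KEY idx_val (val)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(created)) (
    PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
    PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03')),
    PARTITION p20240103 VALUES LESS THAN (TO_DAYS('2024-01-04')),
    PARTITION p_future  VALUES LESS THAN MAXVALUE
);

-- The "partitions" column of the output lists the partitions
-- the optimizer expects to read for this query.
EXPLAIN PARTITIONS
SELECT COUNT(*)
FROM load_log
WHERE created >= '2024-01-02'
  AND created <  '2024-01-03';
```

A query with no condition on `created` would list every partition, which is an easy way to see the difference.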
If the columns INSERT checks (primary keys, for instance) are indexed - then this will only decrease the speed: MySQL will have to additionally decide on partitioning.
All queries are only improved by adding indexes. Partitioning is useful when you have tons of very old data (e.g. year<2000) which is rarely used: then it'll be nice to create a partition for that data.
Cheers!