Disk usage for optimizing a partitioned MySQL-Table - mysql

I have a large MyISAM table with 3 million rows that is 31 GB in size due to a 10 KB blob in each row. The table already has 30 partitions. I want to optimize the table since I am going to remove rows with old data and resize the blobs.
My question is about the disk usage while optimizing:
If I optimize the whole table, does MySQL step through the partitions and optimize only one partition at a time, thus needing only the extra space of one small partition? Or do I have to optimize one or a few partitions at a time to avoid needing so much extra disk space while optimizing?

Optimizing a single partition of ~1 GB takes only a few seconds, and I could not see any heavy disk usage.

(My answer assumes InnoDB. Even if I am overly pessimistic, the 'solution' should work fine for MyISAM.)
For InnoDB, keep in mind the issues of innodb_file_per_table.
OPTIMIZE will build a copy of the entire table.
Solution: If you are tight on space, you can optimize one partition at a time by doing
ALTER TABLE ... REORGANIZE PARTITION ... INTO (PARTITION ...);
Yes, you will need to build the ... for one partition at a time, and execute them one at a time.
(Do not do OPTIMIZE PARTITION, that will optimize the entire table.)
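As a rough sketch of "build one statement per partition", the helper below generates one REORGANIZE PARTITION statement for each partition, rebuilding it into an identical definition. It assumes the table is partitioned BY RANGE; the function name, the table name `blobs`, and the partition names and bounds are hypothetical placeholders, so take the real definitions from SHOW CREATE TABLE.

```python
def reorganize_statements(table, partitions):
    """Build one ALTER TABLE per partition (RANGE partitioning assumed).

    partitions: list of (name, upper_bound) pairs copied from the existing
    table definition; an upper_bound of None stands for MAXVALUE.
    """
    statements = []
    for name, upper_bound in partitions:
        bound = "MAXVALUE" if upper_bound is None else f"({upper_bound})"
        # Reorganize the partition into an identical definition, so only
        # this one partition is rebuilt.
        statements.append(
            f"ALTER TABLE {table} REORGANIZE PARTITION {name} "
            f"INTO (PARTITION {name} VALUES LESS THAN {bound});"
        )
    return statements


# Hypothetical table and partition layout, for illustration only.
for stmt in reorganize_statements("blobs", [("p0", 100000), ("p1", None)]):
    print(stmt)
```

Run the generated statements one at a time, so that only one partition's worth of extra disk space is needed at any moment.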
Would you like to elaborate on what your table is like? I may want to talk you out of partitioning or talk you into a different way of partitioning.

Related

Drop and Recreate Index on MYSQL table, it will improve the performance?

I executed a query on a newly imported MySQL database, and it took 68 seconds to complete. After I dropped and recreated the same indexes on the 2 main tables, it took only 24 seconds.
Why did this happen? Is this good practice or not?
Thanks in Advance
You are misinterpreting the results and the cause. Dropping and re-creating the indexes isn't what makes it go faster. There are two things that could be going on:
1) The DB doesn't fit into RAM, so recreating the two indexes left most of their pages in the buffer pool by the time you ran the query.
2) The table was fragmented or had very lightly filled blocks. Recreating the indexes probably rebuilt the table, and that may have improved page occupancy. If your query requires a full table scan, this would have meant fewer GBs of table to scan, and possibly less fragmentation (which can matter on spinning rust).
As a general rule you should never need to do that. If you disable the query cache (query_cache_type=0, query_cache_size=0 on MySQL < 8) and run the query twice, the second run shows the speed you can expect with a warm buffer pool.

How to optimize an "optimize" MYSQL query that takes a lot of time

I have a table (InnoDB) with 1 million new inserts (20GB) a week. I only need the data for 1 week, so I delete it after 7 days; each day we delete around 3GB and insert 3GB of new data. That table is already in a separate database from the rest.
The problem is that disk space is only freed after an optimize query, so we run it every few weeks at night. It works, but it takes 30 minutes and freezes the whole database server during that time, not just the particular database.
Is there any way to optimize faster?
If we run an optimize every time we delete the data, will it be faster than running the optimize every few weeks? I thought it might be faster to run it when just 3GB of deleted rows need to be removed from disk; if we run it after 20 days it's 60GB. Is that right? And is there another way to optimize the optimize?
Instead of worrying about speeding up OPTIMIZE TABLE, let's get rid of the need for it.
PARTITION BY RANGE(TO_DAYS(...)) ...
Then DROP PARTITION nightly; this is much faster than using DELETE, and avoids the need for OPTIMIZE.
Be sure to have innodb_file_per_table=ON.
Also nightly, use REORGANIZE PARTITION to turn the future partition into tomorrow's partition and a new, empty, partition.
Details here: http://mysql.rjweb.org/doc.php/partitionmaint
Note that each PARTITION is effectively a separate table so DROP PARTITION is effectively a drop table.
There should be 10 partitions:
1 starter partition to avoid the overhead of a glitch when partitioning by DATETIME.
7 daily partitions
1 extra day, so that there will be a full 7 days' worth.
1 empty future partition just in case your nightly script fails to run.
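A minimal sketch of the nightly step, assuming daily partitions named like p20240110, a catch-all partition named p_future, and 8 daily partitions kept (7 plus the extra day). All of these names, the table name `log`, and the choice of which day counts as the oldest are my assumptions, not part of the answer above; the linked page has the full procedure.

```python
from datetime import date, timedelta


def day_partition(day):
    """Definition of one daily partition holding the rows dated `day`."""
    next_day = day + timedelta(days=1)
    return (
        f"PARTITION p{day:%Y%m%d} "
        f"VALUES LESS THAN (TO_DAYS('{next_day:%Y-%m-%d}'))"
    )


def nightly_statements(table, tomorrow, daily_partitions=8):
    """Statements for one nightly run: drop the oldest day, add tomorrow."""
    oldest = tomorrow - timedelta(days=daily_partitions)
    return [
        # Dropping a partition is roughly as cheap as dropping a table.
        f"ALTER TABLE {table} DROP PARTITION p{oldest:%Y%m%d};",
        # Split the empty future partition into tomorrow + a new future.
        f"ALTER TABLE {table} REORGANIZE PARTITION p_future INTO ("
        f"{day_partition(tomorrow)}, "
        "PARTITION p_future VALUES LESS THAN MAXVALUE);",
    ]


# Hypothetical table name and date, for illustration only.
for stmt in nightly_statements("log", date(2024, 1, 10)):
    print(stmt)
```

Because p_future is normally empty, the REORGANIZE step should have very little data to move.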
Since you have an antique version that does not have PARTITIONing, here is another solution:
Compress the HTML and store it in a BLOB (instead of TEXT).
Do the compress and uncompress in the client.
This technique will shrink the disk footprint by upwards of 3:1.
That won't eliminate the OPTIMIZE issue, but it will
Use less disk space.
Be faster (due to having less data to shovel around).
But, as already mentioned, InnoDB cleans up the free space somewhat. I suspect that the table does not grow past 2x after an Optimize? Normally a BTree that starts with no free space degrades to about 69% full after a lot of churn. But then it stays at that ratio.
Emails, HTML, text, code -- all of these shrink about 3:1 with any decent compression library (zlib, PHP's gzcompress(), etc). Most image formats and PDFs are already compressed; they don't benefit from a second compression.
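A minimal sketch of the client-side compression, using Python's standard zlib module. The function names are hypothetical, and the sample text is artificially repetitive, so the ratio it achieves says nothing about real pages.

```python
import zlib


def pack_html(html):
    """Compress HTML text into bytes suitable for a BLOB column."""
    return zlib.compress(html.encode("utf-8"), 6)


def unpack_html(blob):
    """Restore the original HTML text from the BLOB contents."""
    return zlib.decompress(blob).decode("utf-8")


# Round trip with a made-up sample.
sample = "<p>hello world</p>\n" * 200
blob = pack_html(sample)
restored = unpack_html(blob)
print(len(sample.encode("utf-8")), "bytes raw,", len(blob), "bytes compressed")
```

The client would write `blob` to the BLOB column and call `unpack_html` on whatever it reads back, so the server never handles the uncompressed text.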
MySQL is not designed for that volume...
Try a warehouse database engine (a columnar engine) like AWS Redshift; it will feel like a 4 MB database again :)
If you can't use it, you can install Postgres and add the plugin for compressed columnar tables (which should be similar to Redshift).

How improve query speed with bigdata ? Mysql

The table structure is as follows:
When I run this query, the execute time is about 2-3 minutes:
select id,name,infohash,files from tb_torrent where id between 0 and 10000;
There are just over 200,000 rows; why is the execution so slow? And how can I fix it?
The unnecessary use of BIGINT is not enough to explain the sluggishness. Let's look for other issues.
Does that "key" icon mean that there is an index on id? Perchance is it PRIMARY KEY?
What ENGINE is in use? If it is MyISAM, then you have the drawback of the PK not being 'clustered' with the data, thereby making the 10K lookups slower.
What will you do with 10K rows? Think of the network costs. And the memory costs in the client.
But maybe this is the real problem... If this is InnoDB, and if the TEXT columns are "big", then the values are stored "off record". This leads to another disk hit to get any big text values. Change them to some realistic max len of VARCHAR(...).
How much RAM do you have? What is the value of innodb_buffer_pool_size? Did you time the query twice? (The first time would be I/O-bound; the second might be hitting cache.) How big (in MB or GB) is the table?

How to reduce index size of a table in mysql using innodb engine?

I am facing a performance issue in MySQL due to a large index on my table. The index has grown to 6GB, and my instance is running on 32GB of memory. The majority of rows are not required in that table after a few hours and can be removed selectively. But removing them is time-consuming and doesn't reduce the index size.
Please suggest a solution to manage this index.
You can optimize your table to rebuild the index and get the space back, if it is not released even after deletion:
optimize table table_name;
But since your table is bulky, it will be locked during optimize table, and you also have the problem of how to remove old data, given that you don't need data more than a few hours old. So you can proceed as follows:
Step 1: During night hours, or whenever there is less traffic on your db, rename your main table and create a new table with the same name. Then insert the last few hours of data from the old table into the new table.
This way you remove the unwanted data, and the new table will also be optimized.
Step 2: To avoid this issue in the future, you can create a stored procedure that executes once per day during night hours and either deletes the data up to the previous day (as per your requirement) from this table or moves it to a historical table.
Step 3: Since your table will then always hold only a single day's data, you can execute the optimize table statement to rebuild it and reclaim space easily.
Note: a delete statement will not rebuild the index and will not free space on the server. For that you need to optimize your table, which can be done in various ways, such as an alter statement or an optimize statement.
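Step 1 could look roughly like the statements generated below. This is only a sketch: the function name, the table name `events`, the `_old` suffix, and the `created_at` time column are hypothetical, and it assumes the table has a column recording when each row was inserted.

```python
def swap_and_copy_statements(table, time_column, keep_hours):
    """Statements for Step 1: rename, recreate, copy back the recent rows."""
    old = f"{table}_old"
    return [
        # Move the bulky table out of the way.
        f"RENAME TABLE {table} TO {old};",
        # Create an empty table with the same columns and indexes.
        f"CREATE TABLE {table} LIKE {old};",
        # Copy only the rows that are still needed.
        f"INSERT INTO {table} SELECT * FROM {old} "
        f"WHERE {time_column} >= NOW() - INTERVAL {keep_hours} HOUR;",
    ]


# Hypothetical names, for illustration only.
for stmt in swap_and_copy_statements("events", "created_at", 3):
    print(stmt)
```

Between the RENAME and the CREATE the table briefly does not exist, which is one reason to do this during low traffic. The old table still holds the disk space until it is dropped, so drop it once the copy has been checked.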
If you can remove all the rows older than X hours, then PARTITIONing is the way to go. PARTITION BY RANGE on the hour and use DROP PARTITION to remove an old hour and REORGANIZE PARTITION to create a new hour. You should have X+2 partitions. More details.
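As an illustration of the hourly layout, the sketch below builds a PARTITION BY RANGE clause with X+2 partitions. How the two extra partitions are used is my assumption (one extra hour plus an empty catch-all), as are the use of TO_SECONDS() on a DATETIME column, the column name `created_at`, and the partition naming.

```python
from datetime import datetime, timedelta


def hourly_partition_clause(column, first_hour, hours_to_keep):
    """PARTITION BY RANGE clause with hours_to_keep + 2 partitions."""
    lines = []
    for i in range(hours_to_keep + 1):  # X kept hours + 1 extra hour
        start = first_hour + timedelta(hours=i)
        end = start + timedelta(hours=1)
        lines.append(
            f"  PARTITION p{start:%Y%m%d%H} VALUES LESS THAN "
            f"(TO_SECONDS('{end:%Y-%m-%d %H:%M:%S}'))"
        )
    # Empty catch-all partition, in case the maintenance job misses a run.
    lines.append("  PARTITION p_future VALUES LESS THAN MAXVALUE")
    body = ",\n".join(lines)
    return f"PARTITION BY RANGE (TO_SECONDS({column})) (\n{body}\n)"


# Hypothetical column and start time: keep 3 hours, giving 5 partitions.
print(hourly_partition_clause("created_at", datetime(2024, 1, 1, 0), 3))
```

The clause would be appended to the CREATE TABLE (or an ALTER TABLE); each hour after that, a job drops the oldest partition and splits p_future, as described above.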
If the deletes are more complex, please provide more details; perhaps we can come up with another solution that deals with the question about index size. Please include SHOW CREATE TABLE.
Even if you cannot use partitions for purging, it may be useful to have partitions for OPTIMIZE. Do not use OPTIMIZE PARTITION; it optimizes the entire table. Instead, use REORGANIZE PARTITION if you see you need to shrink the index.
How big is the table?
How big is innodb_buffer_pool_size?
(6GB index does not seem that bad, especially since you have 32GB of RAM.)

MySQL - rebuild partition vs optimize partition

I have partitioned tables in MySQL 5.1.41 that hold a huge amount of data. Recently, I deleted a lot of data, which caused fragmentation of around 500 GB, yet there is still a lot of data in the partitions.
To return that space to the OS, I have to defragment the partitions. I referred to the MySQL documentation, https://dev.mysql.com/doc/refman/5.1/en/partitioning-maintenance.html, which confused me with the following statements:
Rebuilding partitions: Rebuilds the partition; this has the same effect as dropping all records stored in the partition, then reinserting them. This can be useful for purposes of defragmentation.
Optimizing partitions: If you have deleted a large number of rows from a partition or if you have made many changes to a partitioned table with variable-length rows (that is, having VARCHAR, BLOB, or TEXT columns), you can use ALTER TABLE ... OPTIMIZE PARTITION to reclaim any unused space and to defragment the partition data file.
I tried both and observed that sometimes "rebuild" is faster and sometimes "optimize" is. Each partition I run these commands on has from millions to sometimes billions of records. I'm aware of what MySQL does for each of the above statements.
Should the choice depend on the number of rows in the partition? If so, up to how many rows can I use "optimize", and from how many should I use "rebuild"?
Also, which is better to use?
MyISAM or InnoDB? (The answer will be different.)
For MyISAM, REBUILD/REORGANIZE/OPTIMIZE will take about the same effort per partition.
For InnoDB, OPTIMIZE PARTITION rebuilds all partitions. So, don't use this if you want to do the partitions one at a time. REORGANIZE PARTITION of the partition into an identical partition definition should act only on the one partition. I recommend that.
It is generally not worth using partitioning unless you have at least a million rows. Also, BY RANGE is the only form that has any performance benefits that I have found.
Perhaps the main use of partitioning is with a time-series where you want to delete "old" data. PARTITION BY RANGE with weekly or monthly partitions lets you very efficiently DROP PARTITION rather than DELETE. More in my blog.
(My answer applies to all versions through 5.7, not just your antique 5.1.)