I have a table which consists of heavy blobs, and I wanted to conduct some tests on it.
I know deleted space is not reclaimed by InnoDB, so I decided to reuse existing records by updating their values instead of creating new records.
But I noticed that whether I delete and insert a new entry or UPDATE an existing row, InnoDB keeps on growing.
Assuming I have 100 rows, each storing 500KB of information, my InnoDB size is 10MB. Now when I call UPDATE on all rows (no insert, no delete), InnoDB grows by ~8MB on every run. All I am doing is storing exactly 500KB of data in each row, with little modification, and the size of the blob is fixed.
What can I do to prevent this?
I know about OPTIMIZE TABLE, but I can't use it because in regular usage the table is going to be 60-100GB, and running OPTIMIZE would just stall the entire server.
InnoDB is a multiversion system.
This means that each time you update or delete a row, the previous version of the row gets copied to a special place called a rollback segment. The original record is rewritten with the new data (in case of an UPDATE) or marked as deleted (in case of a DELETE).
If you do a transaction rollback or execute a concurrent SELECT statement (which has to see the previous version since the new one is not yet committed), the data is retrieved from the rollback segment.
The problem with this is that the InnoDB tablespace never shrinks. If the tablespace is full, new space is allocated. The space occupied by the rollback information from committed transactions can be reused too, but if the tablespace is full it will be extended, and this is permanent.
There is no supported way to shrink the ibdata* files except backing up the whole database, dropping and recreating the InnoDB files and restoring from the backup.
Even if you use the innodb_file_per_table option, the rollback (undo) information is still stored in the shared ibdata* files, unless the server is configured to use separate undo tablespaces.
Note, though, that the space occupied by the tables and the space occupied by the tablespace are different things. Tables can be shrunk by running OPTIMIZE, and this will make them more compact (within the tablespace), but the tablespace itself cannot be shrunk.
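One way to see the difference between table size and free space inside it is to query information_schema. This is only a sketch: 'mydb' is a placeholder schema name, and for InnoDB the numbers are estimates rather than exact byte counts.

```sql
-- Approximate per-table space usage; 'mydb' is a placeholder schema name.
SELECT table_name,
       engine,
       ROUND(data_length  / 1024 / 1024, 1) AS data_mb,
       ROUND(index_length / 1024 / 1024, 1) AS index_mb,
       ROUND(data_free    / 1024 / 1024, 1) AS free_mb  -- allocated but currently unused
FROM   information_schema.tables
WHERE  table_schema = 'mydb'
ORDER  BY data_length + index_length DESC;
```

For tables that live in the shared tablespace, data_free describes the free space of that shared tablespace rather than of the individual table, so read it with that in mind.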
You can reduce the table size by running an OPTIMIZE query:
OPTIMIZE TABLE table_name;
I have an innodb table with big table size (around 7GB),
I have already deleted about 80% of its rows.
The problem is that the table size remains the same.
innodb_file_per_table is ON.
How do I shrink the table so that its size reflects the actual data and frees up disk space?
It depends...
If you built the table with innodb_file_per_table=ON, then you can do OPTIMIZE TABLE now. (Be sure it is still ON.)
If =OFF, then the table is in the tablespace file ibdata1 and the only way to return the space to the OS is by dumping all tables, removing ibdata1, then reloading the dump. This is tedious and risky. Meanwhile, the free space created by the DELETEs will be used by future INSERTs, etc.
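A minimal sketch of the ON case, assuming a placeholder table name mydb.big_table and enough free disk space for a second copy of the table while it is rebuilt:

```sql
-- Confirm the setting before rebuilding; it should report ON.
SHOW VARIABLES LIKE 'innodb_file_per_table';

-- Rebuild the table into a fresh .ibd file ('mydb.big_table' is a placeholder).
-- For InnoDB the server notes that it does a recreate + analyze instead
-- of a native optimize; that note is expected.
OPTIMIZE TABLE mydb.big_table;
```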
More discussion of big deletes.
I have a MySQL database in my production environment. It had about 430 million rows, of which 190 million were of no use, so I started deleting these rows range by range at night, since doing it in the daytime would have affected my app's performance.
Now, in my monitoring app, I am seeing 100% IO, most of which is writes (12-30MB/s, 400-500 writes/sec).
But when I check the process list I don't find any INSERT or UPDATE query or any rollback.
What could be the issue, and how can I find any hidden query that may be writing to MySQL?
(In iotop, I found that the write operations are being done by mysqld only.)
One more thing: I can see writes at 80MB/s in iotop, but when I check directory sizes under /, I don't see any directory growing.
Back away slowly... and wait.
InnoDB doesn't change the actual data in the tablespace files with each DML query.
It writes the changes to memory, of course, and then the redo log, at which point they are effectively "live" and safely persisted to disk... but they are not yet applied to the actual data (tablespace) files. InnoDB then syncs the changes to the data files in the background but in the meantime, other queries use a combination of the tablespace and log contents to determine what the "official" table data currently contains. This is, of course, an oversimplification, but MVCC necessarily means the physical data is a superset, though not necessarily a proper superset, of the logical data.
That's very likely to be the explanation for what you are seeing now.
It makes sense that free/used disk space isn't changing, because finalizing the deletion of those rows will only really be marking the space inside the tablespace files as unused. They shouldn't grow or shrink.
Resist the temptation to try to "fix" it and whatever you do, don't restart the server... because the best possible outcome is that it will pick up where it left off because it still has work to do.
SHOW ENGINE INNODB STATUS takes some practice to interpret but will likely be another key to the puzzle.
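As a rough starting point, these are the parts worth watching while the background work drains. The \G terminator assumes the mysql command-line client.

```sql
-- Full InnoDB monitor output. In the TRANSACTIONS section, look at
-- "History list length" (undo records not yet purged); in the
-- BUFFER POOL AND MEMORY section, look at "Modified db pages".
SHOW ENGINE INNODB STATUS\G

-- Pages changed in memory that still have to be flushed to the data files.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';

-- Cumulative bytes written by InnoDB; sample it twice to get a rate.
SHOW GLOBAL STATUS LIKE 'Innodb_data_written';
```

If the history list length and the dirty page count keep falling between samples, the server is most likely just catching up, and waiting is the right call.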
Is the delete operation still running? DELETE can be extremely slow and generate a lot of writes. It is often better to create a new identical table, copy the rows you want to KEEP over to it, and then swap it with the production table instead of deleting rows in the production table directly.
If the DELETE has already finished and you suspect that there are other queries running, you can enable the general query log for a few seconds and see which queries are executed:
TRUNCATE TABLE mysql.general_log;
SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 'ON';
SELECT SLEEP(10);
SET GLOBAL general_log = 'OFF';
Then SELECT from mysql.general_log to see which queries executed during the 10 seconds sleep.
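For example, a sketch of a summary query over the captured log. The CONVERT is there because the argument column is stored as binary data in recent versions.

```sql
-- Most frequent statements captured during the sampling window.
SELECT CONVERT(argument USING utf8mb4) AS statement,
       COUNT(*)                        AS executions
FROM   mysql.general_log
WHERE  command_type = 'Query'
GROUP  BY statement
ORDER  BY executions DESC
LIMIT  20;
```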
Could you tell me, will MySQL data in tables be physically fragmented if I insert new data and delete old data (insert in table top, and delete bottom rows)?
Will the size of table be growing in any way, before I do OPTIMIZE?
Are you DELETEing and INSERTing at about the same rate? Is the table InnoDB?
If Yes to both, then "don't worry". The DELETEs will free up blocks and the INSERTs will re-use those blocks (or new blocks if the INSERTs are getting ahead).
InnoDB tables are composed of 16KB blocks that are chained together in a BTree structure. What you described will be freeing blocks from one side of the tree and creating new blocks on the other side. However, each block can reside anywhere, so the concept of "side of the tree" is more 'virtual' than 'physical'.
After you have done a lot of churning, try OPTIMIZE once. If it does not cut the table size in half, that is a further clue that OPTIMIZE is not worth it. A shrinkage of about 30% after a lot of churn is somewhat typical; the wasted space stabilizes at around that level and won't go much over 30%.
If, on the other hand, you are DELETEing most of the table all at once, then you may be better off rebuilding the table with the not-to-be-deleted rows. This combines the delete and the optimize steps into one. Then use this to flip the tables 'instantly':
RENAME TABLE `real` TO `old`, `new` TO `real`; DROP TABLE `old`;
And do not do any inserts during the rebuild.
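The whole rebuild-and-flip could look roughly like this. Every name here (events, events_new, events_old, created_at) and the keep condition are made up for illustration.

```sql
-- Empty copy with the same columns and indexes (placeholder names throughout).
CREATE TABLE events_new LIKE events;

-- Copy only the rows to keep; the WHERE clause is a hypothetical keep condition.
INSERT INTO events_new
SELECT * FROM events
WHERE  created_at >= '2024-01-01';

-- Swap both names in one atomic statement, then discard the old data.
RENAME TABLE events TO events_old, events_new TO events;
DROP TABLE events_old;
```

Note that CREATE TABLE ... LIKE does not copy foreign key definitions, so those would need to be added back by hand if the table has any.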
When innodb_file_per_table is OFF, all data is stored in the ibdata files. When you remove rows, they are just marked as deleted on disk; the space stays allocated to the InnoDB files and can be reused later when you insert or update more rows, but the files never shrink.
If you drop some tables or delete some data, there is no way to reclaim that unused disk space other than the dump/reload method.
But if you are using innodb_file_per_table, you can reclaim the space by running OPTIMIZE TABLE on that table. OPTIMIZE TABLE creates a new, identical, empty table and then copies the data row by row from the old table to the new one. In this process a new .ibd tablespace is created and the space is reclaimed. However, the shared tablespace ibdata1 can still grow.
I have a very large table with around 1M records. Due to bad performance, I have optimized the queries and needed to change the index.
I changed it using ALTER, now I am really not sure how this works in InnoDB. Do I need to restart MySQL server? If I need to restart MySQL server, how do I keep data integrity between tables (so that I don't miss data which is there in memory and did not get written to DB)?
I Googled and found that in the case of MySQL restart, I need to use global variable innodb_fast_shutdown -- what does it do when I set it and what if I don't? It is not very clear.
I am new to MySQL area with InnoDB. Any help is really appreciated.
So changed it using ALTER, now i am really not sure about how this works in innodb?
You are saying you added the index with ALTER TABLE ... ADD INDEX ... (or ADD KEY -- they are two ways of asking for exactly the same thing) presumably?
Once the ALTER TABLE finishes executing and your mysql> prompt returns, there is nothing else needed. At that point, the table has its new index and the index is fully populated.
You're done, and there is no need to restart the server.
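If you want to double-check, something along these lines works. The table, column, and index names are placeholders.

```sql
-- Add the index (placeholder names).
ALTER TABLE orders ADD INDEX idx_customer_created (customer_id, created_at);

-- Confirm that the index now exists on the table.
SHOW INDEX FROM orders;

-- See whether the optimizer picks it for a representative query.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42 ORDER BY created_at;
```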
Since you mentioned it, I'll also try to help clear up your misconceptions about innodb_fast_shutdown and the memory/disk divide in InnoDB.
InnoDB makes a one-time request for a block of memory the size of innodb_buffer_pool_size from the operating system when MySQL server starts up, in this example from the MySQL error log from one of my test servers:
130829 11:27:30 InnoDB: Initializing buffer pool, size = 4.0G
This is where InnoDB stores table and index data in memory, and the best performance is when this pool is large enough for all of your data and indexes. When rows are read, the pages from the tablespace files are read into the buffer pool first, then data extracted from there. If changes are made, the changes are written to the in-memory copies of table data and indexes in the buffer pool, and eventually they are flushed to disk. Pages in the pool are either "clean" -- meaning they are identical to what's on disk, because they've not been changed since they were loaded, or if changed, the changes have already been written to disk -- or "dirty" meaning they do not match what is on disk.
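To get a feel for the clean/dirty split on a running server, you can look at the buffer pool counters:

```sql
-- Configured pool size in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Page counts: total, data, dirty, free, and so on.
-- pages_dirty compared with pages_data shows how much is still waiting to be flushed.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%';
```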
However, InnoDB is ACID-compliant -- and this could not be true if it only wrote the changes in memory and the changes were not persisted immediately somewhere prior to the in-memory changes even being made ... and that "somewhere" is the redo log -- on disk -- that stores what changes to be made in memory, immediately, in a format that allows this operation to be much faster than updating the actual tablespace files themselves in real-time would be.
In turn, the innodb_fast_shutdown variable determines whether MySQL finishes up everything written to the redo log before shutdown -- or after it starts back up. It works fine, either way, but if you need to shut the server down faster, it's faster and perfectly safe to let it pick everything up later, no matter what changes you have made.
Importantly, I don't know what you have read, but in routine operations, you never need to mess with the value of innodb_fast_shutdown unless you are shutting down in preparation for doing an upgrade to your version of MySQL server (and then it is primarily a safety precaution). The data on disk is always consistent with what is in memory, either because the tablespace files are already consistent with the memory representation of the data, or because the pending changes to the tablespace files are safely stored in the redo log, where they will be properly processed when the server comes back online.
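For reference, a short sketch of how the variable is inspected and changed, with the three values as I understand them:

```sql
SHOW VARIABLES LIKE 'innodb_fast_shutdown';
-- 0: slow shutdown (full purge and change buffer merge before exit)
-- 1: fast shutdown (the default); skips those steps
-- 2: flush the logs and stop as if the server had crashed;
--    crash recovery runs at the next startup

-- Only before something like an upgrade: request a slow shutdown.
SET GLOBAL innodb_fast_shutdown = 0;
```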
In the case of ALTER TABLE, anything pending for the table prior to the ALTER would have already been taken care of, since InnoDB typically rebuilds the entire table in response to this command, so the only possible "pending" changes would be DML that occurred after the ALTER.
When you run a delete on a MyISAM table, it leaves a hole in the table until the table is optimized.
This affects concurrent inserts. With the default setting, concurrent inserts only work for tables without holes. However, in the documentation for MyISAM, under the concurrent_insert section, it says:
Enables concurrent inserts for all MyISAM tables, even those that have holes. For a table with a hole, new rows are inserted at the end of the table if it is in use by another thread. Otherwise, MySQL acquires a normal write lock and inserts the row into the hole.
http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_concurrent_insert
Does that mean MyISAM automatically fills in holes whenever a new row is inserted into the table? Previously I thought the holes would not be fixed until you OPTIMIZE a table.
Yes, MyISAM tables reuse deleted/free space. (Whether this is true in all cases, I don't know.)
This can easily be observed by deleting a handful of records and inserting some new one - the MyISAM data file for that table will not grow in size.
Inserts into the middle of the table require a lock. So, in the default setting, MySQL will favor filling the holes, even though it will prevent concurrent inserts.
So, yes, MySQL prefers to fill the holes.
Setting concurrent_insert to 2 tells MySQL that if there is a lock on the table, insert at the end, which doesn't require a lock, even though there are still holes to be filled. Therefore, this allows concurrent inserts even if there are holes in the middle, at the cost of taking longer to fill the holes.
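A small sketch of checking and changing the setting, plus a way to see how much hole space a MyISAM table currently has. 'mydb' is a placeholder schema name.

```sql
SHOW VARIABLES LIKE 'concurrent_insert';
-- 0 (NEVER), 1 (AUTO, the default), 2 (ALWAYS); the names exist in newer versions

-- Allow concurrent inserts even when the table has holes.
SET GLOBAL concurrent_insert = 2;

-- data_free is the allocated-but-unused space, i.e. the holes left by deletes.
SELECT table_name, data_free
FROM   information_schema.tables
WHERE  table_schema = 'mydb' AND engine = 'MyISAM';
```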