I've learned that executing a DELETE statement in MySQL does not release disk space, so I ran an experiment.
During the experiment, I found that when the table was large (about 20,000 rows) and I deleted half of the rows, the data length was not reduced.
But when the table had about 1,000 rows and I deleted half of them, the data length was unexpectedly reduced.
It's really confusing. I have searched a lot about this problem but found nothing. What's the reason?
My MySQL version is 5.6.
InnoDB table statistics are not updated every time you run a DELETE. So the data_length, index_length, rows, avg_row_length and data_free values may not describe the precise state of the table at all times.
You might like to read:
https://www.percona.com/blog/2011/10/06/when-does-innodb-update-table-statistics-and-when-it-can-bite/
https://www.percona.com/blog/2017/09/11/updating-innodb-table-statistics-manually/
Even after the table stats are updated, they are not exact. They are estimates based on a limited sample of the table.
It should also be mentioned that the size of a tablespace file remains at its high-water mark, even if the data_length is reduced. The difference is space that is part of the file on your filesystem, but it is not occupied by data or indexes anymore. InnoDB should reuse that "free" space within the tablespace before it expands the file further.
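The reuse-before-growth behavior can be pictured with a toy model. This is a minimal Python sketch, not InnoDB's real allocator: the Tablespace class and its page-level bookkeeping are hypothetical simplifications (real InnoDB bookkeeping is far more involved). It only shows why the file stays at its high-water mark after deletes while later inserts fill the freed pages first.

```python
class Tablespace:
    """Toy tablespace (hypothetical): freed pages are reused before the
    file grows, and the file itself never shrinks."""

    def __init__(self):
        self.file_pages = 0       # high-water mark: pages the file occupies on disk
        self.free_pages = []      # unused pages that are still inside the file
        self.used_pages = set()   # pages holding data or index records

    def allocate(self):
        # Reuse "free" space inside the tablespace before expanding the file.
        if self.free_pages:
            page = self.free_pages.pop()
        else:
            page = self.file_pages
            self.file_pages += 1
        self.used_pages.add(page)
        return page

    def free(self, page):
        # An emptied page becomes reusable, but it stays part of the file.
        self.used_pages.remove(page)
        self.free_pages.append(page)


ts = Tablespace()
pages = [ts.allocate() for _ in range(100)]
for page in pages[:50]:          # delete half the data
    ts.free(page)
print(ts.file_pages, len(ts.used_pages))  # 100 50
for _ in range(30):              # new inserts fill the gaps first
    ts.allocate()
print(ts.file_pages, len(ts.used_pages))  # 100 80
```

The file size (file_pages) never drops; only the split between used and free pages inside it changes.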
Related
I have a database named "bongoTv" with many tables, and I found one table whose size is about 20GB even though it holds only a small amount of data.
After removing a few rows, the storage was not reduced. Then I ran the command
OPTIMIZE TABLE notifiation to rebuild the indexes, but it increased the table's size to 25GB.
Based on my understanding of other DBMSs, it should have reduced the size, so why did it increase? I think it cached previous information somewhere.
After searching the web, I found that I need to configure innodb_file_per_table=ON. But it is already enabled in my configuration, and that did not help.
I need an expert opinion from someone who works with MySQL in depth.
In that case, what do I need to do on my end? What is the solution to this issue?
#Louis & #P.Salmon, can you help me with this?
Thanks in advance to whoever helps me with this.
In general, InnoDB tablespace files never shrink. If you delete data, it makes some space "unused" and over time InnoDB will try to reuse unused space before expanding the tablespace file further.
But there is also tablespace fragmentation. As you delete rows and leave small gaps of unused space, those small gaps may not be usable for new data. So over time, the gaps grow in number, and the tablespace uses more space than it should, if you were to store the same data as compactly as possible.
The free space that comprises full extents (contiguous 1MB areas) is shown as data_free when you run SHOW TABLE STATUS. But smaller gaps of unused space are not shown; MySQL has no way of reporting the "crumbs" of unused space.
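To make the "only full extents count" point concrete, here is a simplified Python model, assuming the default 16KB page size (64 pages per 1MB extent). The helper names are hypothetical and the input is a made-up list of free-page counts per extent; real data_free accounting has more details, but the model shows how partially free extents ("crumbs") vanish from the report.

```python
PAGE_SIZE = 16 * 1024                      # default innodb_page_size (16KB)
PAGES_PER_EXTENT = 64                      # 64 x 16KB = 1MB extent
EXTENT_SIZE = PAGE_SIZE * PAGES_PER_EXTENT


def reported_data_free(free_pages_per_extent):
    """Bytes that would show up as data_free: only completely free extents."""
    full = sum(1 for n in free_pages_per_extent if n == PAGES_PER_EXTENT)
    return full * EXTENT_SIZE


def actual_free(free_pages_per_extent):
    """All unused pages, including scattered ones MySQL cannot report."""
    return sum(free_pages_per_extent) * PAGE_SIZE


# Hypothetical tablespace: 2 empty extents plus 10 extents with 5 free pages each.
extents = [64, 64] + [5] * 10
print(reported_data_free(extents) // 1024)  # 2048 (KB)
print(actual_free(extents) // 1024)         # 2848 (KB)
```

Space free inside partially used pages is not counted by either function, so the true unused space is larger still.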
When you use OPTIMIZE TABLE on an InnoDB table, it still cannot shrink the existing tablespace; it only copies the data to a new tablespace. It tries to defragment the data, leaving out the gaps where possible. So if there are a lot of large and small gaps in your old tablespace, the new tablespace should have a smaller total size.
However, while filling pages of the new tablespace, InnoDB deliberately leaves 1/16 of each page unused, to allow for future updates that might need just a little bit more room. So in theory, you might see OPTIMIZE TABLE cause the file to grow larger if the original was very compact and the new file was created with more "elbow room."
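A rough calculation shows how large that effect can be. This Python sketch assumes the default 16KB page, a fill of exactly 15/16 per page, and no other overhead; the 15GB input is hypothetical. Even if the original were perfectly compact, the rebuild grows by only 1/15 (about 6.7%), which is far short of a 20GB to 25GB jump.

```python
import math

PAGE_SIZE = 16 * 1024
FILL_FACTOR = 15 / 16   # about 1/16 of each page left free for future updates


def pages_needed(data_bytes, fill=1.0):
    """Pages required to hold data_bytes if each page is filled to `fill`."""
    usable = PAGE_SIZE * fill
    return math.ceil(data_bytes / usable)


data = 15 * 1024 ** 3   # hypothetical: 15GB of row data
compact = pages_needed(data)
rebuilt = pages_needed(data, FILL_FACTOR)
print(compact * PAGE_SIZE / 1024 ** 3)   # 15.0
print(rebuilt * PAGE_SIZE / 1024 ** 3)   # 16.0
```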
But that still does not account for the 20GB to 25GB change you saw. That might be because sizes are cached. That is, the old file was in fact 25GB, but the table status was not reporting it. MySQL 8.0 especially has some caching behavior on some table statistics: https://bugs.mysql.com/bug.php?id=86170
So how to reduce the table size in MySQL?
Deleting rows is the most effective way. If you don't need data to be in the database anymore, delete it. If you might need data for archival purposes but don't need to query it every day, then copy it out to some long-term archiving format, or another database instance on a large-capacity server, and then delete the data from your primary database.
Changing data types to be smaller. For example, why use a BIGINT (64-bits) when a SMALLINT (16-bits) is sufficient for the values you store? It may seem like a small change, but it adds up. Values are stored in the row, but also stored again in any indexes that include that column.
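A back-of-the-envelope estimate of the BIGINT to SMALLINT saving, sketched in Python. The row count and number of indexes are hypothetical, and the function counts only the raw column bytes (8 vs 2): once in the row (the clustered index) and once per secondary index that includes the column. Page overhead and fill factor are ignored.

```python
BIGINT_BYTES = 8
SMALLINT_BYTES = 2


def bytes_saved(rows, indexes_with_column, old=BIGINT_BYTES, new=SMALLINT_BYTES):
    """Raw bytes saved by narrowing one column: once in the row, plus once
    per secondary index entry that contains the column."""
    copies = 1 + indexes_with_column
    return rows * (old - new) * copies


# Hypothetical: 100 million rows, column appears in 2 secondary indexes.
saved = bytes_saved(100_000_000, 2)
print(round(saved / 1024 ** 3, 2))  # 1.68 (GB)
```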
Using compression. The best results are in text and strings that store readable text. The amount of compression depends on the nature of the data. Don't count on this too much, because at best one can expect a 2:1 ratio of compression, and often not even that much.
Ultimately, databases tend to grow larger, and often even the rate of growth accelerates. If you accumulate a lot of data and never delete or archive them, you must make a strategy to support the growth. You may just have to get larger and larger storage volumes.
I tried to find out how much extent space ("free space") my database has after deleting a rather large table (around 10GB).
I have run the command:
SELECT table_schema "Data Base Name",
round( sum( data_free ) / 1024 / 1024 / 1024 ) "Free Space in GB"
FROM information_schema.TABLES
GROUP BY table_schema;
which gave me a list of databases, and their "free spaces".
The problem is that the database which had the 10GB table removed now has 1500GB+ of free space according to this report, which is significantly bigger than my actual hard drive capacity (around 200GB).
How is this possible? How could I get a more realistic report? Am I missing something?
UPDATE
As an experiment, I added and removed a 1GB table in this database, and now the report shows around 110GB more free space. Might there be a problem with my configuration, or is this a common issue?
(This is answering some of the questions buried in Comments.)
Misnomer: "Free" space includes only whole blocks; it does not count spare room inside blocks, among many other details.
Case 1: All tables are in ibdata1 -- SHOW TABLE STATUS (or the equivalent query into information_schema) will show the same Data_free value for every table, namely how much is free in ibdata1. This space can be reused by any table. It is hard to give the space back to the OS.
Case 2: All tables are file_per_table -- Now each Data_free refers to the space for the table. And the SUM() is meaningful. (ibdata1 still exists, but it does not contain any real tables; there is a lot of other stuff that InnoDB needs.)
Case 3: Mixture -- If you turn file_per_table on/off at various times, some tables will be in ibdata1, some will have their own tablespaces.
Case 4: CREATE TABLESPACE in 5.7 -- For example, you can have a tablespace for each database.
Case 5: PARTITIONed tables -- Each partition acts like a table.
Case 6: 8.0 -- Even more changes are coming.
Database == Directory: In MySQL's directory tree, each database can be seen as a filesystem directory. Within that directory you can see a set of files for each table. The .frm file contains the table definition. If an .ibd file exists, the table was created with file_per_table. This may be the most reliable way to discover whether a table is file_per_table. (8.0 will have significant changes here.)
How much space can I reuse? There is no good answer. Usually inserting a row will find space in the block where it belongs, and Data_free will not shrink. But, if there were block split(s), Data_free can drop by some multiple of 16KB (the block size) or 4MB (the "extent size" - or maybe it is 8MB?). Also, random inserts lead to BTree blocks being, on average, about 69% full.
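The ~69% figure matters when estimating space. A minimal Python sketch comparing fully packed 16KB blocks with blocks that are, on average, 69% full; the table size and row width are hypothetical, and per-row and per-page overhead is ignored.

```python
import math

BLOCK_SIZE = 16 * 1024


def blocks_for_rows(rows, avg_row_bytes, fill):
    """Estimated 16KB blocks needed when blocks are, on average, `fill` full."""
    return math.ceil(rows * avg_row_bytes / (BLOCK_SIZE * fill))


rows, row_bytes = 1_000_000, 100  # hypothetical table
print(blocks_for_rows(rows, row_bytes, 1.0))   # 6104
print(blocks_for_rows(rows, row_bytes, 0.69))  # 8846
```

With random inserts, the same data needs roughly 45% more blocks than it would if packed tightly.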
Changing innodb_file_per_table has no effect until the next CREATE TABLE or ALTER TABLE. And then it only has effect on where to put the newly created/copied data+indexes (ibdata1 or .ibd). It will not destroy data.
Big tables usually have 4MB to 7MB of Data_free. When computing how many rows you can add, don't plan on Data_free dropping below that range.
Avg_row_length should be useful, but sometimes it (and Rows) is poorly approximated. Their product (Data_length) is always correct. So this might be a good estimate of "rows to go before grabbing more space from the OS":
(Data_free - 7MB) / Avg_row_length
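The same formula as a small Python function. The 7MB floor comes from the "4MB to 7MB" rule of thumb above; the input values are hypothetical SHOW TABLE STATUS numbers, and the result is only as good as the Avg_row_length estimate.

```python
MB = 1024 * 1024


def rows_before_growth(data_free, avg_row_length, floor=7 * MB):
    """Rough count of rows that fit in Data_free before InnoDB grabs more
    space from the OS; `floor` is the Data_free big tables usually keep."""
    usable = max(0, data_free - floor)
    return usable // avg_row_length


# Hypothetical values: 500MB of Data_free, 200-byte average rows.
print(rows_before_growth(500 * MB, 200))  # 2584739
```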
Tablespace Recommendations: Put 'big' tables in file_per_table. Put 'tiny' tables in ibdata1 or database-specific tablespaces (5.7). Sorry, there is no simple recommendation for the dividing line between 'big' and 'tiny'. And migrating a table is clumsy, and necessarily a full copy of the table: SET GLOBAL innodb_file_per_table = ...; logout; login (to pick up the global); ALTER TABLE tbl ENGINE=InnoDB;
(Caveat: I have left out many details.)
It sounds as though you do not have innodb_file_per_table set, and are therefore using a shared tablespace. If so, your query is returning the global 'allocated but unused' space of the shared tablespace repeatedly, once for each table in each table_schema.
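Here is a minimal Python sketch of that double counting and one way around it: sum each tablespace's data_free once rather than once per table. It assumes you already know which tablespace each table lives in (in 5.6/5.7 the SPACE column of INFORMATION_SCHEMA.INNODB_SYS_TABLES should give this, with 0 meaning the system tablespace, but check it against your version); the rows below are hypothetical. With 110 tables in ibdata1, every extra 1GB of shared free space would be reported 110 times, which would be consistent with the experiment in the question.

```python
from collections import defaultdict

GB = 1024 ** 3


def naive_sum(rows):
    """What SUM(data_free) ... GROUP BY table_schema effectively computes."""
    totals = defaultdict(int)
    for schema, _table, _space, data_free in rows:
        totals[schema] += data_free
    return dict(totals)


def free_space_per_schema(rows):
    """Count each tablespace's data_free once per schema, not once per table."""
    per_space = defaultdict(dict)  # schema -> {tablespace id: data_free}
    for schema, _table, space, data_free in rows:
        per_space[schema][space] = data_free
    return {schema: sum(spaces.values()) for schema, spaces in per_space.items()}


# Hypothetical: 110 tables in ibdata1 (space 0) all report the same 14GB free,
# plus one file_per_table table (space 42) with 1GB free.
rows = [("app", f"t{i}", 0, 14 * GB) for i in range(110)]
rows.append(("app", "big_log", 42, 1 * GB))

print(naive_sum(rows)["app"] // GB)              # 1541
print(free_space_per_schema(rows)["app"] // GB)  # 15
```

Because ibdata1 is shared by all schemas, adding these per-schema figures together would still count it more than once.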
I am trying to reduce the disk space usage of a table in an RDS instance of MySQL 5.6.23. It's an InnoDB table with about 8 million rows and 30 columns. Several of the columns are of type TEXT NULL DEFAULT NULL. One of the reasons why the table is so big is because rather than deleting rows from this table, they are instead marked as deleted via a flag column named 'deleted'.
After reading the MySQL documentation on storage requirements:
http://dev.mysql.com/doc/refman/5.7/en/storage-requirements.html
It seems as though the storage required for a TEXT field depends on the length of text in the field rather than being a fixed size (L + 2 bytes, where L < 2^16 and where L is the length of the value in bytes). So although I've read elsewhere that these fields are in fact fixed width, I processed about 50,000 rows marked as deleted and set all their TEXT column values to null.
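The documented L + 2 formula is easy to compute directly. This Python sketch applies it to a value's encoded byte length (UTF-8 is an assumption; use your column's character set). Treating NULL as 0 bytes is a simplification (in the COMPACT row format a NULL takes only a bit in the record header), and the function says nothing about how much of the file is actually freed, which is what the answers below address.

```python
def text_storage_bytes(value, encoding="utf8"):
    """Bytes for a TEXT value per the documented formula L + 2,
    where L is the value's length in bytes and L < 2**16."""
    if value is None:
        return 0  # simplification: NULL is flagged in the row header, not stored
    length = len(value.encode(encoding))
    if length >= 2 ** 16:
        raise ValueError("TEXT holds at most 65,535 bytes")
    return length + 2


print(text_storage_bytes("hello"))  # 7
print(text_storage_bytes("héllo"))  # 8 (é is 2 bytes in UTF-8)
```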
However, there was no reduction in disk space reported either by the MySQL client or the AWS Console RDS interface. Why didn't this free up disk space?
When you set the column value to NULL, InnoDB would have to reorganize the record storage in order to reduce the total amount of disk space used by the table. You should see a reduction if you run a dummy ALTER TABLE that is not so trivial that MySQL can short-circuit it (so that it actually rebuilds the table), or if you manually drop, re-create, and reinsert the records. OPTIMIZE TABLE should do it as well.
Sasha's answer may or may not apply.
After setting the column to NULL, any freed blocks are made available for future INSERTs / UPDATEs. But the freed blocks are not given back to the OS. Whether a block is freed depends on a lot of details.
The amount of disk space for a TEXT field depends both on the amount of text and the Row_format ("Compact", etc). A TEXT column may be entirely or partially stored in a block separate from the rest of the data.
If your table was created while innodb_file_per_table was ON, then OPTIMIZE TABLE will give the free space back to the OS. And SHOW TABLE STATUS will show some decrease in values.
If innodb_file_per_table had been OFF, the freed space is left in ibdata1, but that file does not shrink. It can be shrunk only by: dump all tables; stop mysqld; remove ibdata1; restart; reload. (Yuck.) OPTIMIZE TABLE will increase Data_free inside ibdata1.
(Assuming OFF) This will make the table more manageable, but leave a lot of free space in an un-shrunken ibdata1:
SET GLOBAL innodb_file_per_table = ON;
ALTER TABLE foo ENGINE=InnoDB;
If you anticipate growth in ibdata1 for other reasons, this may be wise to do. Otherwise, it just makes the disk space problem worse.
I have been using an often-seen method for measuring table size with a query, e.g.:
INSERT INTO tableRecords (loadTime, dataFromDate, tableName, rowCount, sizeMB)
SELECT NOW(),
SUBDATE(CURDATE(),1),
'table_name',
COUNT(*),
(SELECT ROUND(((data_length + index_length) / 1024 / 1024), 2)
FROM information_schema.TABLES
WHERE table_schema = 'db_name' AND table_name = 'table_name')
FROM table_name
I've been running this daily for some time.
However, I notice that the number often stays the same for days at a time, regardless of how many rows are added.
Is there a better way to do this so that I can ensure I'm getting the current table size?
I am able to run multiple queries with the script I am using.
I'm aware of the OPTIMIZE TABLE command, but I'm unsure whether it's the right thing to use, or exactly how and when to use it. Is it necessary to solve my problem? I could see it taking some time to run all of the OPTIMIZE TABLE commands every day in my case (some tables are large).
If that is the right way to go, should I just run this command daily before my INSERT?
Are there any other options?
You should know that the number reported for table size in INFORMATION_SCHEMA.TABLES is only an estimate; it can be off by +/- 10% or more. It may also change from time to time, when MySQL updates its statistics.
This may be why the number doesn't seem to change even as you insert lots of rows. You can force the statistics to update with ANALYZE TABLE.
Using SHOW TABLE STATUS is really just reading the same values in the INFORMATION_SCHEMA, so it has the same issue with precision.
Looking at the physical file size on disk is not accurate either, because the file can normally contain some amount of fragmentation, from past deletions or rows that don't fill up a given space in the file fully. Therefore, the real size of data is quite different from the physical size of the file.
This also means that the "size" of a table could mean different things. Is it the size of real rows of data? The size of the pages that data occupies? The size of the pages if they were defragmented? The size of the physical file on disk?
Why do you need such a precise measure of table size, anyway? Even if you could get a perfectly precise table size measurement, it would become obsolete as soon as you inserted another row. That's why an estimate is sufficient for most purposes.
As for when to use OPTIMIZE TABLE, I recommend using it after an operation that could cause significant fragmentation, like a series of many DELETE statements, or periodically to defragment. How frequently depends on the level of activity on the table. For a table with low traffic, perhaps yearly would be enough; for a table with a lot of inserts/updates/deletes, maybe every couple of weeks. Doing it daily is likely to be overkill.
You're right that it locks your table and blocks activity for some time, and the larger the table, the longer it takes. You can use pt-online-schema-change to let the table restructure happen in the background without blocking activity on the original table.
Also, MySQL 5.6 can do OPTIMIZE TABLE for InnoDB tables as online DDL, without locking.
Re your comment:
InnoDB updates statistics from time to time, but it's not impossible for it to go days between the auto-refresh of statistics. Here are some references that talk about this in more detail:
Controlling Optimizer Statistics Estimation
When Does InnoDB Update Table Statistics? (And When It Can Bite)
When are InnoDB table index statistics updated?
InnoDB Persistent Statistics at last (for MySQL 5.6)
We have a large InnoDB table which is gigabytes in size, and we're now clearing it up. Rows are being removed; however, in MySQL Administrator the data length of the table has not been reduced at all. Why is this so? Should we run "Optimize table"?
Thanks!
Krt_Malta
I don't think OPTIMIZE TABLE will work. The space made available by your deletion is not returned to the disk, but it is most probably indeed empty. The engine will simply start using that free space before taking up more physical disk space.