Reclaim unused space in SQL Server 2008

I have a table with more than 300,000 records, of size approximately 1.5 GB.
In that table I have three varchar(5000) fields; the rest are small fields.
I issued an update, setting those three fields to ''.
After a shrink (database and files) the database uses almost the same space as before...
DBCC SHRINKDATABASE(N'DataBase' )
DBCC SHRINKFILE (N'DataBase' , 1757)
DBCC SHRINKFILE (N'DataBase_log' , 344)
Any ideas on how to reclaim that disk space?

Essentially, you have to "move" the contents of the table from one place on the hard drive to another. When so moved, SQL will "repack" the contents of the pages efficiently. Just replacing 5000 bytes of data with 3 (or 0 and a flipped null bitmask) will not cause SQL to revise or rewrite the contents of the table's pages.
If the table has a clustered index, just reindexing it (ALTER INDEX... REBUILD...) will do the trick.
If the table does not have a clustered index, you can either create one and then drop it, or SELECT...INTO... a new table, drop the old table, and rename the new one to the original name.
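For illustration, a minimal T-SQL sketch of both options; the table and index names (dbo.BigTable, PK_BigTable) are hypothetical:
-- Option 1: the table has a clustered index; rebuild it in place.
ALTER INDEX PK_BigTable ON dbo.BigTable REBUILD;
-- Option 2: the table is a heap; copy, drop, and rename.
SELECT * INTO dbo.BigTable_new FROM dbo.BigTable;
DROP TABLE dbo.BigTable;
EXEC sp_rename 'dbo.BigTable_new', 'BigTable';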

Just because you set the column to an empty string doesn't mean the database will reorg the table. The updated record will still fit on the same page it fit on before (the amount of free space on the page will increase).
Also, you do know, don't you, that varchar(5000) doesn't mean that it takes up 5000 octets? It's variable length: a two-octet length prefix containing the data length of the field, followed by the data octets. Setting a varchar(5000) column in a row to 'foobar' will require 8 octets of space (2+6).
Re-build your indices, including the clustering index.
If you don't have a clustering index, add one. That will force a reorg of the table. Now drop the clustering index.
Now when you shrink the datafile, you should reclaim some disk space.
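A sketch of that sequence, assuming a heap named dbo.BigTable with an id column (names hypothetical):
CREATE CLUSTERED INDEX IX_tmp_reorg ON dbo.BigTable (id); -- forces a reorg of the table
DROP INDEX IX_tmp_reorg ON dbo.BigTable;
DBCC SHRINKFILE (N'DataBase', 1757); -- now the shrink can actually reclaim space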

I've just had to set those fields to null, issue the shrink, and then set them to ''
and the db went from 1.5 GB to 115 MB
pretty strange...
--
In fact, setting those fields to nullable (which means recreating the whole table) did the trick.

Related

innodb_ft_min_token_size = 1 performance implications

If I change innodb_ft_min_token_size =1 from default of 3, will this cause a lot more disk usage? Any performance issues with search?
I want to be able to use fulltext search on 1-character words.
Also once I make this change how would I rebuild the index? Will this put a lot of load on server?
There are not that many 1- and 2-letter words, so the space change may not be that great.
Modifying innodb_ft_min_token_size, innodb_ft_max_token_size, or ngram_token_size [in my.cnf] requires restarting the server.
To rebuild FULLTEXT indexes for an InnoDB table, use ALTER TABLE with the DROP INDEX and ADD INDEX options to drop and re-create each index.
-- https://dev.mysql.com/doc/refman/8.0/en/fulltext-fine-tuning.html
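For example, a sketch assuming a hypothetical table articles with a FULLTEXT index ft_body on column body:
ALTER TABLE articles DROP INDEX ft_body;
ALTER TABLE articles ADD FULLTEXT INDEX ft_body (body);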
The "Scope" of innodb_ft_min_token_size is "Global". That is, it applies to all InnoDB FT indexes.
-- https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_ft_min_token_size
Recreating the index will read the entire table and rebuild the FT index, which will "lock" the table at some level for some period of time. The time to rebuild will be roughly proportional to the size of the table. And it will consume a bunch of extra disk space until it is finished. (The table and all the indexes will be copied over and at least the FT index will be rebuilt.)
If you have a thousand rows, no big deal. If you have a billion rows, you will need a long "downtime".
After changing innodb_ft_min_token_size, I would be afraid to do a short wildcard test like
AGAINST('a*' IN BOOLEAN MODE)
If you have a test server, simply try it.
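The full form of such a test might look like this (hypothetical table t with a FULLTEXT index on txt):
SELECT COUNT(*) FROM t WHERE MATCH(txt) AGAINST('a*' IN BOOLEAN MODE);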
I noticed that the documentation recommends a value of 1 for Chinese, etc.

MySQL Table taking up too much space?

I have a raw text file, with the size of 8.1GB.
The input data is very straightforward:
Lab_A (string), Lab_B (string), Distance (float)
I was trying to load the data into a table, using LOAD DATA INFILE, but the drive ran out of space.
The destination table had the following format:
Id (INT), Lab_A (VARCHAR), Lab_B (VARCHAR), Distance (FLOAT).
With a primary key of Id and an index of (Lab_A + Distance).
Create statement below:
CREATE TABLE `warwick_word_suite`.`distances` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `label1` VARCHAR(45) NOT NULL,
  `label2` VARCHAR(45) NOT NULL,
  `distance` FLOAT NOT NULL,
  PRIMARY KEY (`id`),
  INDEX `LABEL_INDEX` (`label1` ASC, `distance` ASC));
The drive had 50GB and ran out of space. Given 10GB reserved for the system, I am assuming the load was requesting more than 32GB for the table.
My question is:
How much do InnoDB tables actually take up, relative to the size of the input data?
Do indexed tables take up a lot more space, compared to identical unindexed tables?
Should I simply order a bigger drive for my database server?
EDIT:
I tracked down the data hog to "ibdata1", stored in /var/lib/mysql. This file is taking up 30.3GB.
Double trouble.
InnoDB takes 2x-3x what the raw data takes. This is a crude approximation; there are many factors.
ibdata1 is the default place to put the table. Having grown during the attempted load, that file will not shrink. This can be a problem. It would have been better to have innodb_file_per_table = ON before trying to load the file. Then the table would have gone into a separate .ibd file, and upon failure, that file would have vanished. As it is, you are low on disk space with no simple way to recover it. (Recovery includes dumping all the other InnoDB tables, stopping mysqld, removing ibdata1, restarting, and then reloading the other tables.)
Back to the ultimate problem... How to use the data. First, can we see a sample (a few rows) of the data? There may be some clues. How many rows are in the table (or lines in the file)?
This may be a case for loading into MyISAM instead of InnoDB; the size for that table will be closer to 8.1GB, plus two indexes, which may add another 5-10GB. Still unpleasantly tight.
Normalizing the lab names would probably be a big gain. Suppose you have 10K labs and 100M distances (every lab to every other lab); half of those are redundant. Normalizing the lab names would save maybe 50 bytes per row, perhaps half the space.
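A sketch of that normalization, with hypothetical table and column names:
CREATE TABLE labs (
  lab_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT, -- 10K labs fit easily
  name VARCHAR(45) NOT NULL,
  PRIMARY KEY (lab_id),
  UNIQUE KEY (name)
) ENGINE=InnoDB;
CREATE TABLE distances_normalized (
  lab_a SMALLINT UNSIGNED NOT NULL, -- keep only pairs with lab_a < lab_b to drop the redundant half
  lab_b SMALLINT UNSIGNED NOT NULL,
  distance FLOAT NOT NULL,
  PRIMARY KEY (lab_a, lab_b),
  INDEX (lab_a, distance)
) ENGINE=InnoDB;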
Or you could get more disk space.
Ponder which suggestion(s) of the above you want to tackle; then let us know what you still need help with.

Handling huge MyISAM table for optimisation

I have a huge (and growing) MyISAM table (700 million rows = 140GB).
CREATE TABLE `keypairs` (
`ID` char(60) NOT NULL,
`pair` char(60) NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=MyISAM
The table option was changed to ROW_FORMAT=FIXED, because both columns are always fixed length at the max (60). And yes, yes, ID is sadly a string and not an INT.
SELECT queries are pretty OK in speed.
The databases and the MySQL engine are all on 127.0.0.1/localhost (nothing remote).
Sadly, INSERT is slow as hell. I won't even talk about trying to LOAD DATA with millions of new rows... it takes days.
There won't be any concurrent reads on it. All SELECTs are done one by one, by my local server only. (It is not for clients' use.)
(For info: file sizes are .MYD = 88GB, .MYI = 53GB, .TMM = 400MB.)
How could I speed up inserts into that table?
Would it help to PARTITION that huge table? (How then?)
I heard MyISAM uses a "structure cache" in the form of .frm files, and that a line in the config file helps MySQL keep all the .frm files in memory (in case of partitioning). Would that help as well? (Actually, my .frm file is only 9KB for 700 million rows.)
String shortening/compression of the ID string (same idea as rainbow tables)? Even if it lowers the maximum number of unique IDs, I will never reach the max of 60 chars anyway. So maybe it's an idea? But before creating a new unique ID, I would of course have to check that the shortened string doesn't already exist in the db.
Same idea as shortening the ID strings: what about using md5() on the ID? Does a shorter string mean faster in that case, or not?
Sort the incoming data before doing the LOAD. This will improve the cacheability of the PRIMARY KEY(id).
PARTITIONing is unlikely to help, unless there is some useful pattern to ID.
PARTITIONing will not help for single-row insert nor for single-row fetch by ID.
If the strings are not a constant width of 60, you are wasting space and speed by saying CHAR instead of VARCHAR. Change that.
MyISAM's FIXED is useful only if there is a lot of 'churn' (deletes+inserts, and/or updates).
Smaller means more cacheable means less I/O means faster.
The .frm is an encoding of the CREATE TABLE; it is not relevant for this discussion.
A simple compress/zip/whatever will almost always compress text strings longer than 10 characters. And they can be uncompressed, losslessly. What do your strings look like? 60-character English text will shrink to 20-25 bytes.
MD5 is a "digest", not a "compression". You cannot recover the string from its MD5. Anyway, it would take 16 bytes after converting to BINARY(16).
The PRIMARY KEY is a BTree. If ID is somewhat "random", then the 'next' ID (unless the input is sorted) is likely not to be cached. No, the BTree is not rebalanced all the time.
Turning the PRIMARY KEY into a secondary key (after adding an AUTO_INCREMENT) will not speed things up -- it still has to update the BTree with ID in it!
How much RAM do you have? For your situation, and for this LOAD, set MyISAM's key_buffer_size to about 70% of available RAM, but not bigger than the .MYI file. I recommend a big key_buffer because that is where the random accesses are occurring; the .MYD is only being appended to (assuming you have never deleted any rows).
We do need to see your SELECTs to make sure these changes are not destroying performance somewhere else.
Make sure you are using CHARACTER SET latin1 or ascii; utf8 would waste a lot more space with CHAR.
Switching to InnoDB will double, maybe triple, the disk space for the table (data+index). Therefore, it will probably slow down. But a mitigating factor is that the PK is "clustered" with the data, so you are not updating two things for each row inserted. Note that key_buffer_size should be lowered to 10M and innodb_buffer_pool_size should be set to 70% of available RAM.
(My bullet items apply to InnoDB except where MyISAM is specified.)
If using InnoDB, it would be good to try to insert 1000 rows per transaction. Fewer than that leads to more transaction overhead; more than that leads to overrunning the undo log, causing a different form of slowdown.
Hex ID
Since ID is always 60 hex digits, declare it to be BINARY(30) and pack them via UNHEX(...) and fetch via HEX(ID). Test via WHERE ID = UNHEX(...). That will shrink the data about 25%, and MyISAM's PK by about 40%. (25% overall for InnoDB.)
To do just the conversion to BINARY(30):
CREATE TABLE `new` (
  ID BINARY(30) NOT NULL,
  `pair` CHAR(60) NOT NULL
  -- adding the PK later is faster for MyISAM
) ENGINE=MyISAM;
INSERT INTO `new`
  SELECT UNHEX(ID), pair
    FROM keypairs;
ALTER TABLE `new` ADD
  PRIMARY KEY (`ID`); -- for InnoDB, I would do this differently
RENAME TABLE keypairs TO `old`,
             `new` TO keypairs;
DROP TABLE `old`;
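A sketch of how lookups read afterwards (the REPEAT() call just manufactures an illustrative 60-hex-digit value):
SELECT HEX(ID) AS id_hex, pair
  FROM keypairs
  WHERE ID = UNHEX(REPEAT('2a', 30)); -- 30 bytes = 60 hex digits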
Tiny RAM
With only 2GB of RAM, a MyISAM-only dataset should use something like key_buffer_size=300M and innodb_buffer_pool_size=0. For InnoDB-only: key_buffer_size=10M and innodb_buffer_pool_size=500M. Since ID is probably some kind of digest, it will be very random. The small cache and the random key combine to mean that virtually every insert will involve a disk I/O. My first estimate would be more like 30 hours to insert 10M rows. What kind of drives do you have? SSDs would make a big difference if you don't already have such.
The other thing to do to speed up the INSERTs is to sort by ID before starting the LOAD. But that gets tricky with the UNHEX. Here's what I recommend.
1) Create a MyISAM table, tmp, with ID BINARY(30) and pair, but no indexes. (Don't worry about key_buffer_size; it won't be used.)
2) LOAD the data into tmp.
3) ALTER TABLE tmp ORDER BY ID; This will sort the table. There is still no index. I think, without proof, that this will be a filesort, which is much faster than "repair by key buffer" for this case.
4) INSERT INTO keypairs SELECT * FROM tmp; This will maximize the caching by feeding rows to keypairs in ID order.
Again, I have carefully spelled out things so that it works well regardless of which Engine keypairs is. I expect step 3 or 4 to take the longest, but I don't know which.
Optimizing a table requires that you optimize for specific queries. You can't determine the best optimization strategy unless you have specific queries in mind. Any optimization improves one type of query at the expense of other types of queries.
For example, if your query is SELECT SUM(pair) FROM keypairs (a query that would have to scan the whole table anyway), partitioning won't help, and just adds overhead.
If we assume your typical query is inserting or selecting one keypair at a time by its primary key, then yes, partitioning can help a lot. It all depends on whether the optimizer can tell that your query will find its data in a narrow subset of partitions (ideally one partition).
Also make sure to tune MyISAM. There aren't many tuning options:
Allocate key_buffer_size as high as you can spare to cache your indexes. Though I haven't ever tried anything higher than about 10GB, and I can't guarantee that MyISAM key buffers are stable at 53GB (the size of your MYI file).
Pre-load the key buffers: https://dev.mysql.com/doc/refman/5.7/en/cache-index.html
Size read_buffer_size and read_rnd_buffer_size appropriately given the queries you run. I can't give a specific value here, you should test different values with your queries.
Size bulk_insert_buffer_size to something large if you want to speed up LOAD DATA INFILE. It's 8MB by default, I'd try at least 256MB. I haven't experimented with that setting, so I can't speak from experience.
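A sketch of those knobs as runtime settings (the values are illustrative, not recommendations; they can equally go in my.cnf):
SET GLOBAL key_buffer_size = 8 * 1024 * 1024 * 1024; -- index cache, sized to spare RAM
SET GLOBAL bulk_insert_buffer_size = 256 * 1024 * 1024; -- helps LOAD DATA INFILE
SET GLOBAL read_buffer_size = 1 * 1024 * 1024; -- test against your own queries
SET GLOBAL read_rnd_buffer_size = 4 * 1024 * 1024;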
I try not to use MyISAM at all. MySQL is definitely trying to deprecate its use.
...is there a mysql command to ALTER TABLE add INT ID increment column automatically?
Yes, see my answer to https://stackoverflow.com/a/251630/20860
First, your primary key is not incrementable.
Which means, roughly: at every insert the index has to be rebalanced.
No wonder it goes slowpoke on a table of such a size.
And such an engine...
So, second: what's the point of keeping that old MyISAM junk?
Like, for example, you don't mind losing a row or two (or -teen) in case of an accident? Etc., etc., even setting aside that the current MySQL maintainer (Oracle Corp) explicitly discourages usage of MyISAM.
So, here are possible solutions:
1) Switch to InnoDB;
2) If you can't surrender the char ID, then:
Add an auto-increment numerical key and make it the primary key; then the index would be clustered and the cost of insert would drop significantly (see the sketch after this list);
Turn your current key into a secondary index;
3) In case you can surrender it, it's obvious.
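A sketch of option 2, with a hypothetical surrogate column name (seq); the ENGINE clause folds in option 1:
ALTER TABLE keypairs
  DROP PRIMARY KEY,
  ADD COLUMN seq BIGINT UNSIGNED NOT NULL AUTO_INCREMENT FIRST,
  ADD PRIMARY KEY (seq),
  ADD UNIQUE KEY (ID),
  ENGINE=InnoDB;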

Setting TEXT values to null does not reduce disk space usage of MySQL table

I am trying to reduce the disk space usage of a table in an RDS instance of MySQL 5.6.23. It's an InnoDB table with about 8 million rows and 30 columns. Several of the columns are of type TEXT NULL DEFAULT NULL. One of the reasons why the table is so big is because rather than deleting rows from this table, they are instead marked as deleted via a flag column named 'deleted'.
After reading the MySQL documentation on storage requirements:
http://dev.mysql.com/doc/refman/5.7/en/storage-requirements.html
It seems as though the storage required for a TEXT field depends on the length of text in the field rather than being a fixed size (L + 2 bytes, where L < 2^16 and where L is the length of the value in bytes). So although I've read elsewhere that these fields are in fact fixed width, I processed about 50,000 rows marked as deleted and set all their TEXT column values to null.
However, there was no reduction in disk space reported either by the MySQL client or the AWS Console RDS interface. Why didn't this free up disk space?
When you set the column value to NULL, InnoDB would have to reorganize the record storage in order to reduce the total amount of disk space used by the table. You should see a reduction if you run a dummy ALTER TABLE (one that is not dummy enough for MySQL to notice a short-circuit way to do it), making it actually rebuild the table, or if you manually drop, re-create, and re-insert the records. OPTIMIZE TABLE should do it as well.
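Either of these forces the rebuild (the table name is hypothetical):
OPTIMIZE TABLE big_table;
ALTER TABLE big_table ENGINE=InnoDB; -- a "null" ALTER that still copies the table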
Sasha's answer may or may not apply.
After setting the column to NULL, any freed blocks are made available for future INSERTs / UPDATEs. But the freed blocks are not given back to the OS. Whether a block is freed depends on a lot of details.
The amount of disk space for a TEXT field depends both on the amount of text and the Row_format ("Compact", etc). A TEXT column may be entirely or partially stored in a block separate from the rest of the data.
If your table was created while innodb_file_per_table was ON, then OPTIMIZE TABLE will give the free space back to the OS. And SHOW TABLE STATUS will show some decrease in values.
If innodb_file_per_table had been OFF, freed-up space is left in ibdata1, but that file is not shrunken. It can be shrunken only by dumping all tables, stopping mysqld, removing ibdata1, restarting, and reloading. (Yuck.) OPTIMIZE TABLE will increase Data_free inside ibdata1.
(Assuming OFF) This will make the table more manageable, but leave a lot of free space in an un-shrunken ibdata1:
SET GLOBAL innodb_file_per_table = ON;
ALTER TABLE foo ENGINE=InnoDB;
If you anticipate growth in ibdata1 for other reasons, this may be wise to do. Otherwise, it just makes the disk space problem worse.

Does deleting a row in a database make that database free up space?

Inserting a row in a database adds up data to it. Does deleting a row free up space in that database? Or will the row just be deleted without changing the amount of used space?
When you use Oracle, space from removed rows can be reclaimed immediately using specific commands. Otherwise it will stay reserved for new rows, as @oymustang86 mentions.
For SQL Server, it depends on from whose standpoint. For example, if you can insert a million rows before running out of disk space, then after deleting half of them you have space to insert half a million more. (Ignoring transaction logging; just imagine that the logging is done elsewhere.)
The data file, after it has grown, will not shrink, so the drive space used by the data file isn't available to the OS or any other application until a database shrink operation is done. That re-arranges the data inside the file to free up space on the drive for the OS or other applications to use.
Ask Tom says for Oracle:
When you delete the data from the table, the blocks will go onto the freelist for that table (assuming the amount of space on a block that was freed fell below the pctused). These blocks will be used for subsequent inserts and updates into this table.
When you delete data from the index, if the block the index entry was on is now "empty", that block will go back onto the freelist to be used anywhere in the index structure. Else that block stays where it is, and data that would naturally go onto that block (because of where it is in the b*tree) will go there.
Space is effectively reused when you delete. Your database will not show any new free space in dba_free_space; it will have more blocks on freelists and more empty holes in index structures.