LOAD DATA reclaim disk space after delete - mysql

I have a DB schema composed of MyISAM tables, and I am interested in deleting old records from some of the tables from time to time.
I know that DELETE does not reclaim the disk space, but as I found in the description of the DELETE command, subsequent inserts may reuse the deleted space:
In MyISAM tables, deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions.
I am interested in whether the LOAD DATA command also reuses the deleted space.
UPDATE
I am also interested in how the index space is reclaimed.
UPDATE 2012-12-03 23:11
Some more info, based on the answer received from @RolandoMySQLDBA.
After executing the suggested query, I got different results for the different tables whose space needs to be reused or reclaimed:
SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable1';
> Dynamic
SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable2';
> Fixed
UPDATE 2012-12-09 08:06
LOAD DATA does reuse previously deleted space (I have checked it by running a short script), but only if the row format is Fixed, or if the row format is Dynamic and there is a deleted row of exactly the same size.
It seems that if the row_format is Dynamic, a full lookup over the deleted-rows list is made for each record; if a row of the exact size is not found, the deleted record is not reused, the table's disk usage grows, and LOAD DATA takes much more time to import the records.
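For reference, here is a minimal sketch of the kind of check I ran (table, column, and file names are made up; adjust them to your schema):
-- size and deleted-row space before
SELECT data_length, data_free FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable1';
-- free up some rows
DELETE FROM mydb.mytable1 WHERE created < '2012-01-01';
-- bulk load new rows
LOAD DATA INFILE '/tmp/new_rows.csv' INTO TABLE mydb.mytable1
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
-- size and deleted-row space after: if data_length did not grow while data_free shrank,
-- the deleted space was reused
SELECT data_length, data_free FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable1';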
I will accept the answer given here, since it describes the whole process perfectly.

For a MySQL table called mydb.mytable just run the following:
OPTIMIZE TABLE mydb.mytable;
You could also do this in stages:
CREATE TABLE mydb.mytable_new LIKE mydb.mytable;
ALTER TABLE mydb.mytable_new DISABLE KEYS;
INSERT INTO mydb.mytable_new SELECT * FROM mydb.mytable;
ALTER TABLE mydb.mytable_new ENABLE KEYS;
ALTER TABLE mydb.mytable RENAME mydb.mytable_old;
ALTER TABLE mydb.mytable_new RENAME mydb.mytable;
DROP TABLE mydb.mytable_old;
ANALYZE TABLE mydb.mytable;
In either case, the table ends up with no fragmentation.
Give it a Try !!!
UPDATE 2012-12-03 12:50 EDT
If you are concerned whether or not rows are reused upon bulk INSERTs via LOAD DATA INFILE, please note the following:
When you created the MyISAM table, I assumed the default row format would be dynamic. You can check what it is with either
SHOW CREATE TABLE mydb.mytable\G
or
SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable';
Since the row format of your table is Dynamic, the fragmented rows are of various sizes. The MyISAM storage engine would have to keep checking the length of each deleted row to see if the next set of data being inserted will fit. If the incoming data cannot fit in any of the deleted rows, then the new row data is appended.
The presence of such rows can make myisamchk struggle.
This is why I recommended running OPTIMIZE TABLE. That way, data would be appended quicker.
UPDATE 2012-12-03 12:58 EDT
Here is something interesting you can also do: Try setting concurrent_insert to 2. That way, you are always appending to a MyISAM table without checking for gaps in the table. This will speed up INSERTs dramatically but leave all known gaps alone.
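For example (assuming you have the SUPER privilege; the setting can also go in my.cnf under [mysqld]):
SET GLOBAL concurrent_insert = 2;  -- always append at the end, even when the table has holes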
You could still defragment your table at your earliest convenience using OPTIMIZE TABLE.
UPDATE 2012-12-03 13:40 EDT
Why not run my second suggestion?
CREATE TABLE mydb.mytable_new LIKE mydb.mytable;
ALTER TABLE mydb.mytable_new DISABLE KEYS;
INSERT INTO mydb.mytable_new SELECT * FROM mydb.mytable;
ALTER TABLE mydb.mytable_new ENABLE KEYS;
ALTER TABLE mydb.mytable RENAME mydb.mytable_old;
ALTER TABLE mydb.mytable_new RENAME mydb.mytable;
ANALYZE TABLE mydb.mytable;
This will give you an idea
How long OPTIMIZE TABLE would take to run
How much smaller the .MYD and .MYI would be after running OPTIMIZE TABLE
After the INSERT INTO ... SELECT step completes (before the renames), you can compare them with
SELECT
A.mydsize,B.mydsize,A.mydsize - B.mydsize myd_diff,
A.myisize,B.myisize,A.myisize - B.myisize myi_diff
FROM
(
SELECT data_length mydsize,index_length myisize
FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable'
) A,
(
SELECT data_length mydsize,index_length myisize
FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable_new'
) B;
UPDATE 2012-12-03 16:42 EDT
Any table whose ROW_FORMAT is set to Fixed has the luxury of allocating the same length row every time. If a MyISAM table maintains a list of deleted rows, the very first row in the list can always be selected as the next position to insert data into. There is no need to traverse the whole list until a row gap of sufficient length is found. Each deleted row is simply appended to the list after a DELETE, and each INSERT picks the first entry from that list.
We can assume these things because MyISAM tables can do concurrent inserts. In order for this feature to be available via the concurrent_insert option, INSERTs into a MyISAM table must be able to detect one of three (3) things:
1. The presence of a list of deleted rows, thus choosing from the list
   Row_Format=Dynamic : a list of deleted rows, each row with a different length
   Row_Format=Fixed : a list of deleted rows, all rows the same length
2. The absence of a list of deleted rows, thus appending
3. Bypass checking for the presence of a list of deleted rows (set concurrent_insert to 2)
For detection #1 to be the fastest possible, a MyISAM table's row_format must be Fixed. If it is Dynamic, it is very possible that a list traversal is necessary.
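If all of your columns are fixed-length (no VARCHAR/BLOB/TEXT), you can force the fixed format when creating the table. This is only a hypothetical illustration, not your actual schema:
CREATE TABLE mydb.mytable_fixed (
    id INT NOT NULL PRIMARY KEY,
    code CHAR(10) NOT NULL,      -- CHAR, not VARCHAR, so every row stays the same length
    created DATETIME NOT NULL
) ENGINE=MyISAM ROW_FORMAT=FIXED;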

Related

Delete from a specific row to the last one

I have a table (cars) that has 26500 rows. Is it possible to delete from the row number 10001 through the end?
For InnoDB tables:
If you are deleting many rows from a large table, you may exceed the lock table size for an InnoDB table. To avoid this problem, or simply to minimize the time that the table remains locked, the following strategy (which does not use DELETE at all) might be helpful:
Step 1: Select the rows not to be deleted into an empty table that has the same structure as the original table:
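If the empty copy does not exist yet, it can be created first, e.g.:
CREATE TABLE `cars_copy` LIKE `cars`;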
INSERT INTO `cars_copy` SELECT * FROM `cars` LIMIT 10000 ;
Step 2: Use RENAME TABLE to atomically move the original table out of the way and rename the copy to the original name:
RENAME TABLE `cars` TO `cars_old`, `cars_copy` TO `cars` ;
Step 3: Drop the original table:
DROP TABLE `cars_old`;
No other sessions can access the tables involved while RENAME TABLE executes, so the rename operation is not subject to concurrency problems.
When your rows are labelled with an ID, you can just do this:
DELETE FROM cars WHERE ID > 10000

Mysql Batch insert around 11 GB data from one table to another [duplicate]

Is there a more efficient, less laborious way of copying all records from one table to another than doing this:
INSERT INTO product_backup SELECT * FROM product
Typically, the product table will hold around 50,000 records. Both tables are identical in structure and have 31 columns in them. I'd like to point out this is not my database design, I have inherited a legacy system.
There's just one thing you're missing: especially if you're using InnoDB, you want to explicitly add an ORDER BY clause to your SELECT statement to ensure you're inserting rows in primary key (clustered index) order:
INSERT INTO product_backup SELECT * FROM product ORDER BY product_id
Consider removing secondary indexes on the backup table if they're not needed. This will also save some load on the server.
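For example (the index names here are hypothetical; check SHOW CREATE TABLE product_backup for the real ones):
-- drop unneeded secondary indexes on the backup table before the bulk copy
ALTER TABLE product_backup DROP INDEX idx_category, DROP INDEX idx_supplier;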
Finally, if you are using InnoDB, reduce the number of row locks that are required and just explicitly lock both tables:
LOCK TABLES product_backup WRITE, product READ;
INSERT INTO product_backup SELECT * FROM product ORDER BY product_id;
UNLOCK TABLES;
The locking stuff probably won't make a huge difference, as row locking is very fast (though not as fast as table locks), but since you asked.
mysqldump -R --add-drop-table db_name table_name > filepath/file_name.sql
This will take a dump of the specified table, with a drop option so the existing table is deleted when you import it. Then do:
mysql db_name < filepath/file_name.sql
DROP the destination table:
DROP TABLE DESTINATION_TABLE;
CREATE TABLE DESTINATION_TABLE AS (SELECT * FROM SOURCE_TABLE);
I don't think this will be worthwhile for a 50k-row table, but:
If you have a database dump, you can reload the table from it. Since you want to load one table into another, you could change the table name in the dump with a sed command.
Here are some hints:
http://blog.tsheets.com/2008/tips-tricks/mysql-restoring-a-single-table-from-a-huge-mysqldump-file.html
An alternative (depending on your design) would be to use triggers on the original table inserts so that the duplicated table gets the data as well.
And a better alternative would be to create another MySQL instance and either run it in a master-slave configuration or in a daily dump master/load slave fashion.

Why would a duped table be half the size?

I duplicated a large table like this:
CREATE TABLE newtable LIKE oldtable;
INSERT newtable SELECT * FROM oldtable;
I checked rows afterwards like this:
select count(*) from oldtable;
select count(*) from newtable;
to make sure they had the same # of rows
However in Navicat (whose counts I don't trust) the Data Length seemed way off so I ran a query on size like this:
SELECT
table_name AS `Table`,
round(((data_length + index_length) / 1024 / 1024), 2) `Size in MB`
FROM information_schema.TABLES
WHERE table_schema = "$MY_DB"
AND table_name = "#MY_TABLE"
And the duped table is indeed almost 1/2 the size. I know for now that is some sparse information but I am wondering now if either:
There was some error in the duplication (spot checking indicates no)
Duplicating in that way somehow optimize(d)(s) the data/indexes
On an InnoDB table, over time the index will accumulate something similar to hard disk fragmentation, where INSERT and DELETE operations leave space in the index file that is not made available for reuse by the operating system.
By duplicating the table via INSERT INTO...SELECT *..., the new table is essentially getting a cleaner index with less of the fragmentation the old one had. You can reclaim that space on the old table by running an OPTIMIZE TABLE statement.
Use OPTIMIZE TABLE in these cases, depending on the type of table:
After doing substantial insert, update, or delete operations on an InnoDB table that has its own .ibd file because it was created with the innodb_file_per_table option enabled. The table and indexes are reorganized, and disk space can be reclaimed for use by the operating system.
So in your environment:
OPTIMIZE TABLE oldtable
Afterward, check the index_length in information_schema.TABLES again and its size will likely be closer to that of the duplicated table.
Note: On a large table this may take a long time, so you should not attempt it when locking will be an issue. For InnoDB, MySQL may actually drop and recreate the table rather than optimize the existing index in place.
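If you prefer to trigger the rebuild explicitly, a common alternative on InnoDB is a so-called null ALTER, which recreates the table and its indexes:
ALTER TABLE oldtable ENGINE=InnoDB;  -- rebuilds the table, reclaiming fragmented index space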
Addendum after some tests:
Even when duplicating a large table in my testing, though the new table's index_length was considerably smaller than the old one, it still had space that could be reclaimed with OPTIMIZE TABLE. After optimizing the newly duplicated table, (with no intervening inserts/deletions) both old and new had identical sizes.

Mysql delete and optimize very slow

I searched Internet and Stack Overflow for my trouble, but couldn't find a good solution.
I have a table (MySql MyISAM) containing 300,000 rows (one column is blob field).
I must use:
DELETE FROM tablename WHERE id IN (1,4,7,88,568,.......)
There are nearly 30,000 id's in the IN syntax.
It takes nearly 1 hour. Also, it does not make the .MYD file smaller even though I delete 10% of it, so I run an OPTIMIZE TABLE... command. That also takes a long time... (I have to use it, because disk space matters to me).
What's a way to improve performance when deleting the data as above and recover space? (Increasing buffer size? which one? or else?)
With IN, MySQL will scan all the rows in the table and match the record against the IN clause. The list of IN predicates will be sorted, and all 300,000 rows in the database will get a binary search against 30,000 ids.
If you do this with JOIN on a temporary table (no indexes on a temp table), assuming id is indexed, the database will do 30,000 binary lookups on a 300,000 record index.
So, 300,000 binary searches against 30,000 records, or 30,000 binary searches against 300,000 records... which is faster? The second one is faster, by far.
Also, delaying the index rebuilding with DELETE QUICK will result in much faster deletes. All records will simply be marked deleted, both in the data file and in the index, and the index will not be rebuilt.
Then, to recover space and rebuild the indexes at a later time, run OPTIMIZE TABLE.
The size of the list in your IN() statement may be the cause. You could add the IDs to a temporary table and join to do the deletes. Also, as you are using MyISAM you can use the DELETE QUICK option to avoid the index hit whilst deleting:
For MyISAM tables, if you use the QUICK keyword, the storage engine does not merge index leaves during delete, which may speed up some kinds of delete operations.
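A sketch of the combined approach (tablename and id come from the question; ids_to_delete is just an illustrative name):
-- collect the ids to remove in a temporary table
CREATE TEMPORARY TABLE ids_to_delete (id INT NOT NULL PRIMARY KEY);
INSERT INTO ids_to_delete VALUES (1),(4),(7),(88),(568);  -- ...and the rest, ideally in batches
-- multi-table DELETE with QUICK: rows are only marked deleted, index leaves are not merged
DELETE QUICK t FROM tablename AS t
INNER JOIN ids_to_delete AS d ON t.id = d.id;
-- later, when locking is not an issue, rebuild the indexes and reclaim the space
OPTIMIZE TABLE tablename;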
I think the best approach to make it faster is to create a new table, insert into it only the rows which you don't want to delete, rename the tables, and then drop the original.
Something like this:
INSERT INTO NewTable SELECT * FROM My_Table WHERE ... ;
Then you can use RENAME TABLE to rename the copy to the original name
RENAME TABLE My_Table TO My_Table_old, NewTable TO My_Table ;
And then finally drop the original table
DROP TABLE My_Table_old;
Try this:
Create a table named temptable with a single column id.
Insert into it the ids 1,4,7,88,568,......
Then use a multi-table DELETE with a join, something like:
DELETE a FROM originaltable AS a INNER JOIN temptable AS b ON a.id = b.id;
It's just an idea; the query is not tested, so check the syntax before you run it.

Optimize mySql for faster alter table add column

I have a table that has 170,002,225 rows with about 35 columns and two indexes. I want to add a column. The alter table command took about 10 hours. Neither the processor seemed busy during that time nor were there excessive IO waits. This is on a 4 way high performance box with tons of memory.
Is this the best I can do? Is there something I can look at to optimize the add column in tuning of the db?
I faced a very similar situation in the past and I improved the performance of the operation this way:
Create a new table (using the structure of the current table) with the new column(s) included.
Execute an INSERT INTO new_table (column1,..columnN) SELECT (column1,..columnN) FROM current_table;
Rename the current table.
Rename the new table to the name of the current table.
ALTER TABLE in MySQL is actually going to create a new table with new schema, then re-INSERT all the data and delete the old table. You might save some time by creating the new table, loading the data and then renaming the table.
From "High Performance MySQL book" (the percona guys):
The usual trick for loading MyISAM table efficiently is to disable keys, load the data and renalbe the keys:
mysql> ALTER TABLE test.load_data DISABLE KEYS;
-- load data
mysql> ALTER TABLE test.load_data ENABLE KEYS;
Well, I would recommend using the latest Percona MySQL builds. Also, note the following from the MySQL manual:
In other cases, MySQL creates a temporary table, even if the data wouldn't strictly need to be copied. For MyISAM tables, you can speed up the index re-creation operation (which is the slowest part of the alteration process) by setting the myisam_sort_buffer_size system variable to a high value.
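For example (the 256M value and the table/column names are only illustrative; size the buffer to the memory you can spare):
SET SESSION myisam_sort_buffer_size = 256 * 1024 * 1024;  -- 256 MB for this session only
ALTER TABLE mydb.mytable ADD COLUMN new_col INT NULL;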
You can do ALTER TABLE ... DISABLE KEYS first, then add the column, and then ALTER TABLE ... ENABLE KEYS. Beyond that, I don't see what else can be done here.
BTW, can't you go MongoDB? It doesn't rebuild anything when you add column.
Maybe you can remove the indexes before altering the table, since what takes most of the time is rebuilding the indexes?
Combining some of the comments on the other answers, this was the solution that worked for me (MySQL 5.6):
create table mytablenew like mytable;
alter table mytablenew add column col4a varchar(12) not null after col4;
alter table mytablenew drop index index1, drop index index2,...drop index indexN;
insert into mytablenew (col1,col2,...colN) select col1,col2,...colN from mytable;
alter table mytablenew add index index1 (col1), add index index2 (col2),...add index indexN (colN);
rename table mytable to mytableold, mytablenew to mytable;
On a 75M row table, dropping the indexes before the insert caused the query to complete in 24 minutes rather than 43 minutes.
Other answers/comments have insert into mytablenew (col1) select (col1) from mytable, but this results in ERROR 1241 (21000): Operand should contain 1 column(s) if you have the parenthesis in the select query.
Other answers/comments have insert into mytablenew select * from mytable;, but this results in ERROR 1136 (21S01): Column count doesn't match value count at row 1 if you've already added a column.