Is there any way to do a bulk/faster delete in MySQL?

I have a table with 10 million records. What is the fastest way to delete old rows while retaining only the last 30 days of data?
I know this can be done with the event scheduler, but my worry is that if it takes too much time, it might lock the table for too long.
It would be great if you could suggest an optimal approach.
Thanks.

Offhand, I would:
Rename the table
Create an empty table with the same name as your original table
Grab the last 30 days from your "temp" table and insert them back into the new table
Drop the temp table
This will enable you to keep the table live through (almost) the entire process and get the past 30 days' worth of data at your leisure.
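A rough sketch of those steps in SQL, assuming a hypothetical table big_table with a created_at datetime column (adjust names to your schema):
RENAME TABLE big_table TO big_table_temp;
CREATE TABLE big_table LIKE big_table_temp;
INSERT INTO big_table
  SELECT * FROM big_table_temp
  WHERE created_at >= NOW() - INTERVAL 30 DAY;
DROP TABLE big_table_temp;
The INSERT back into the new table can run at leisure, because writes hitting the freshly created big_table are unaffected by it.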

You could try partitioned tables.
PARTITION BY LIST (TO_DAYS(date_field))
This would give you one partition per day, and when you need to prune data you just run:
ALTER TABLE tbl_name DROP PARTITION p#
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
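A minimal sketch of that approach, assuming a hypothetical tbl_name whose date_field column is part of any primary/unique key (a MySQL partitioning requirement); note that some versions insist on pre-computed integer literals inside VALUES IN rather than TO_DAYS() expressions:
ALTER TABLE tbl_name
  PARTITION BY LIST (TO_DAYS(date_field)) (
    PARTITION p1 VALUES IN (TO_DAYS('2012-01-01')),
    PARTITION p2 VALUES IN (TO_DAYS('2012-01-02'))
  );
ALTER TABLE tbl_name DROP PARTITION p1;
Dropping a partition removes that day's rows almost instantly, without the row-by-row cost of a DELETE.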

Not that it helps you with your current problem, but if this is a regular occurrence, you might want to look into a MERGE table: just add tables for different periods in time, and remove them from the MERGE table definition when no longer needed. Another option is partitioning, where it is equally trivial to drop the (oldest) partition.
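A sketch of the MERGE-table idea, with illustrative period tables that must be identical MyISAM tables (the MERGE engine only works over MyISAM):
CREATE TABLE log_2012_01 (id INT, created_at DATETIME, msg VARCHAR(255)) ENGINE=MyISAM;
CREATE TABLE log_2012_02 LIKE log_2012_01;
CREATE TABLE log_all (id INT, created_at DATETIME, msg VARCHAR(255))
  ENGINE=MERGE UNION=(log_2012_01, log_2012_02) INSERT_METHOD=LAST;
ALTER TABLE log_all UNION=(log_2012_02);   -- retire the oldest period
DROP TABLE log_2012_01;
Queries go against log_all, while retiring a period is just a metadata change plus a DROP.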

To expand on Michael Todd's answer.
If you have the space,
Create a blank staging table similar to the table you want to reduce in size
Fill the staging table with only the records you want to have in your destination table
Do a double rename like the following
Assuming:
table is the name of the table you want to purge a large amount of data from
newtable is the staging table name
no other tables are called temptable
rename table table to temptable, newtable to table;
drop table temptable;
The rename is done atomically in a single statement, requiring only a momentary schema lock. Most high-concurrency applications won't notice the change.
Alternatively, if you don't have the space but you do have a long window in which to purge this data, you can use dynamic SQL to insert the primary keys into a temp table, and join the temp table in a DELETE statement. When you insert into the temp table, be aware of the max_allowed_packet setting. Most installations of MySQL use 16MB (16777216 bytes); your INSERT command for the temp table should stay under max_allowed_packet. This approach will not lock the table. Afterwards you'll want to run OPTIMIZE TABLE so the engine can reuse the freed space. You probably won't be able to reclaim disk space itself unless you shut down the engine and move the data files.
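A rough sketch of that idea (batching with LIMIT rather than hand-built INSERT statements), assuming an integer primary key id and a created_at column; all names are placeholders:
CREATE TABLE ids_to_delete (id INT PRIMARY KEY);
INSERT INTO ids_to_delete
  SELECT id FROM big_table
  WHERE created_at < NOW() - INTERVAL 30 DAY
  LIMIT 10000;
DELETE big_table FROM big_table JOIN ids_to_delete USING (id);
TRUNCATE TABLE ids_to_delete;   -- repeat the INSERT/DELETE pair until nothing is left to purge
OPTIMIZE TABLE big_table;       -- reclaim the space afterwards, as noted above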

Shut down access to the resource, then:
SELECT ... INTO OUTFILE, filter the output, drop the table, LOAD DATA LOCAL INFILE optimized_db.txt - it is often cheaper to re-create than to UPDATE in place.
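A sketch of that dump-and-reload cycle, with placeholder names and an illustrative server-side file path:
SELECT * FROM big_table
  WHERE created_at >= NOW() - INTERVAL 30 DAY
  INTO OUTFILE '/tmp/optimized_db.txt';
TRUNCATE TABLE big_table;
LOAD DATA INFILE '/tmp/optimized_db.txt' INTO TABLE big_table;
(Use LOAD DATA LOCAL INFILE instead if the filtered file sits on the client machine rather than the server.)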

Related

Table with 50 million rows where adding an index takes too much time

I was working on a table which has about 50 million rows (2GB in size). I had a requirement to optimize its performance. So when I added an index on a column through the phpMyAdmin panel, the table got locked, which held up all queries on that table in a queue and ultimately forced me to restart/kill them all. (And yeah, I forgot to mention I was doing this on production. My bad!)
When I did some research I found solutions like creating a duplicate table, but is there any alternative method?
You may follow these steps (see the sketch below):
Create a temp table
Create triggers on the first table (for inserts, updates, deletes) so that changes are replicated to the temp table
Migrate the data in small batches
When done, rename the temp table to the original name, and drop the other table
But as you said, you are doing this in production, so you need to account for live traffic while dropping one table and renaming the other.
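A rough sketch of that trigger-based copy, using a hypothetical orders table with columns (id, payload); adjust the trigger bodies to list your real columns, since NEW.* cannot be used directly:
CREATE TABLE orders_new LIKE orders;
ALTER TABLE orders_new ADD INDEX idx_payload (payload);   -- the new index you want
CREATE TRIGGER orders_ai AFTER INSERT ON orders
  FOR EACH ROW REPLACE INTO orders_new (id, payload) VALUES (NEW.id, NEW.payload);
CREATE TRIGGER orders_au AFTER UPDATE ON orders
  FOR EACH ROW REPLACE INTO orders_new (id, payload) VALUES (NEW.id, NEW.payload);
CREATE TRIGGER orders_ad AFTER DELETE ON orders
  FOR EACH ROW DELETE FROM orders_new WHERE id = OLD.id;
INSERT IGNORE INTO orders_new (id, payload)               -- backfill in small key ranges
  SELECT id, payload FROM orders WHERE id BETWEEN 1 AND 10000;
RENAME TABLE orders TO orders_old, orders_new TO orders;  -- atomic swap once caught up
DROP TABLE orders_old;
Tools such as pt-online-schema-change automate essentially this procedure.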

What is MySQL's fastest way to delete database content?

What is the fastest way to delete database content in MySQL on FreeBSD? Please help.
I've tried deleting from Navicat (400,000+ rows), but in an hour only 100,000 were deleted.
I don't have phpMyAdmin.
To delete everything in a table:
TRUNCATE TABLE table_you_want_to_nuke
To delete certain rows, you have two options:
Follow these steps:
Create a temporary table using CREATE TABLE the_temp_table LIKE current_table
Drop all of the indexes on the temp table.
Copy the records you want to keep with INSERT INTO the_temp_table SELECT * FROM current_table WHERE ...
TRUNCATE TABLE current_table
INSERT INTO current_table SELECT * FROM the_temp_table
To speed this option up, you may want to drop all indexes from current_table before the final INSERT INTO, then recreate them after the INSERT. MySQL is much faster at indexing existing data than it is at indexing on the fly.
The option you're currently trying: DELETE FROM your_table WHERE whatever_condition. You probably need to break this into chunks using the WHERE condition or LIMIT, so you can batch it and not bog down the server forever.
Which is better/faster depends on lots of things, mostly the ratio of deleted records to retained records and the number of indexes involved. As always, test this carefully before doing it on a live database, as both DELETE and TRUNCATE will permanently destroy data.
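A minimal sketch of the batched form of the second option (table name and condition are placeholders):
DELETE FROM your_table WHERE whatever_condition LIMIT 10000;
Run it repeatedly (from a script or cron job) until ROW_COUNT() reports zero affected rows; each chunk commits quickly, so other queries get a chance to run between batches.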

Running optimize on table copy?

I have an InnoDB table in MySQL which used to contain about 600k rows. After deleting 400k+ rows, my guess is that I need to run an OPTIMIZE.
However, since the table will be locked during this operation, the site will not be usable at that time. So, my question is: should I run the OPTIMIZE on the live database table (with a little under 200k rows)? Or is it possible to create a copy of that table, run the OPTIMIZE on that copy, and then rename both tables so the optimized copy becomes the live table?
If you create a copy with CREATE TABLE ... AS SELECT ..., it will already be compact; there is no need to run OPTIMIZE separately.
However, I'd consider copying just the 200k rows you want to keep into a new table, then renaming the tables.
That way there are fewer steps and less work all round.
CREATE TABLE MyTableCopy AS
SELECT *
FROM myTable
WHERE (insert Keep condition here);
RENAME TABLE
myTable TO myTable_DeleteMelater,
MyTableCopy TO myTable;

MySQL/ASP - Delete Duplicate Rows

I have a table with 100,000 rows called 'photoSearch'. When transferring the data from other tables (which took bloody ages and left me bloody tired), I accidentally forgot to remove a test transfer I had done, which left 3500 rows in the table before I transferred everything over in one go.
The ID column is 'photoID' (INT), and I need to remove all duplicates that have a photoID of less than 6849. If I could just remove the duplicates, it would be less painful than deleting the table and starting another transfer.
Does anybody have any suggestions on the most practical and safest way to do this?
UPDATE:
I actually answered my own question. I backed up my table for safety, and then I ran this:
ALTER IGNORE TABLE photoSearch ADD UNIQUE INDEX unique_id_index (photoID);
This removed all 3500 duplicates in under a minute :)
Traditional method:
Back up your existing table photoSearch to something like tmp_photoSearch using:
create table tmp_photoSearch select * from photoSearch;
After that, you can perform the data massage on tmp_photoSearch.
Once you have the results you expect, perform a table swap:
rename table photoSearch to photoSearch_backup, tmp_photoSearch to photoSearch;
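One way to do the "data massage" step is the same ALTER IGNORE trick from the update above, applied to the copy so the live table is never locked:
ALTER IGNORE TABLE tmp_photoSearch ADD UNIQUE INDEX unique_id_index (photoID);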
To increase insert speed (if the bottleneck is not network transfer), see:
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
To increase performance for MyISAM tables, for both LOAD DATA INFILE and INSERT, enlarge the key cache by increasing the key_buffer_size system variable

What is the best way to periodically load data into table

I have a database with static tables which require to be updated from CSV weekly.
The tables are MySQL MyISAM, and by static I mean they are used for reads only (except when updated from CSV, obviously).
There are about 50 tables and in total about 200 MB of data to be reloaded weekly.
I can think about 3 ways:
Truncate table
Load data from files
Or
For each table create a temporary table
Load data there
Truncate (or delete rows?) original table
Insert into original table select * from temporary table.
Or
Create table_new and load data there
Rename original table to table_old (or drop table altogether)
Rename table_new into original table
What do you reckon is the most efficient way?
Have you considered using mysqlimport? You can read about it here:
http://dev.mysql.com/doc/refman/5.1/en/mysqlimport.html
I probably wouldn't do anything with deleting the original tables, because then you have to re-create all your foreign keys, indexes, constraints, etc. which is a mess and a maintenance nightmare. Renaming tables can also cause problems (like if you have synonyms for the tables, I'm not sure if mysql has synonyms though).
What I would do, however, is disable the keys before loading the data.
ALTER TABLE tbl_name DISABLE KEYS
In other words, when loading the data you don't want it to be trying to update indexes because that will slow down the load. You want the indexes updated once the load is completed.
So I think by combining mysqlimport with the tip above, you should be able to get a really efficient load.
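Since mysqlimport is essentially a command-line wrapper around LOAD DATA INFILE, the equivalent SQL for one table looks roughly like this (table name, file path and delimiters are placeholders):
ALTER TABLE weekly_prices DISABLE KEYS;
TRUNCATE TABLE weekly_prices;
LOAD DATA LOCAL INFILE '/data/weekly_prices.csv'
  INTO TABLE weekly_prices
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  IGNORE 1 LINES;                 -- skip a header row, if the CSV has one
ALTER TABLE weekly_prices ENABLE KEYS;
Note that DISABLE KEYS only defers non-unique index updates, and only on MyISAM, which matches the tables described in the question.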
You could always do INSERT INTO ... ON DUPLICATE KEY UPDATE ... or REPLACE INTO .... You shouldn't get any downtime (there is no empty-table window like the one between a TRUNCATE and the subsequent INSERT), and there's very little chance of corruption.
Be careful with REPLACE, since it will actually delete each record and re-insert it, firing any triggers you may have (unlikely in this case), but also giving you a new ID if you have an auto-increment field.
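A small illustration of both, with a hypothetical prices table that has a unique key on sku:
INSERT INTO prices (sku, price) VALUES ('A1', 9.99), ('B2', 4.50)
  ON DUPLICATE KEY UPDATE price = VALUES(price);
REPLACE INTO prices (sku, price) VALUES ('A1', 9.99);   -- delete + re-insert, see the caveat above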
Your third option is the best: you can LOCK and DISABLE KEYS on the _new table while importing, and it will be extra quick. You can even do a "batch atomic rename" of all your new tables to the current ones, with zero downtime even if they have relations between them.
I'm assuming the whole tables are contained in the weekly CSV updates (i.e. they're not incremental).
I would prefer the 3rd method and also keep the old table.
create table_new
drop table_old if exists
rename table to table_old
rename table_new to table
The advantage of this method is that it is fast and safe, with less impact on readers. Creating the new table does not affect reads on the existing table, and the rename operation is fast (just a file rename in the case of MyISAM), so the downtime is minimal and clients will barely be affected. You also get to keep the old data in case something is wrong with the new data.
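Expressed as SQL with placeholder names, the swap in those steps looks something like:
CREATE TABLE prices_new LIKE prices;
-- ... load prices_new from the weekly CSV here ...
DROP TABLE IF EXISTS prices_old;
RENAME TABLE prices TO prices_old, prices_new TO prices;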
Since you are not going to update the tables online, I think it would also be good to run myisampack.