I currently have a MySQL table of about 20 million rows, and I need to prune it. I'd like to remove every row whose updateTime (timestamp of insertion) was more than one month ago. I have not personally performed any alterations of the table's order, so the data should be in the order in which it was inserted, and there is a UNIQUE key on two fields, id and updateTime. How would I go about doing this in a short amount of time?
How much down time can you incur? How big are the rows? How many are you deleting?
Simply put, deleting rows is one of the most expensive things you can do to a table. It's just a horrible thing overall.
If you don't have to do it, and you have the disk space for it, and your queries aren't affected by the table size (well indexed queries typically ignore table size), then you may just leave well enough alone.
If you have the opportunity and can take the table offline (and you're removing a good percentage of the table), then your best bet would be to copy the rows you want to keep to a new table, drop the old one, rename the new one to the old name, and THEN recreate your indexes.
Otherwise, you're pretty much stuck with good ol' DELETE.
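For what it's worth, the copy / drop / rename / re-index sequence described above can be sketched roughly like this. It is only an illustration: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the names table1_new, idx_id_time and rebuild_keeping are made up for the example. On MySQL the statements would be CREATE TABLE ... LIKE or CREATE TABLE ... AS SELECT, DROP TABLE and RENAME TABLE.

```python
import sqlite3


def rebuild_keeping(conn, cutoff):
    """Keep only rows with updateTime >= cutoff by rebuilding table1.

    Table and index names are hypothetical; sqlite3 stands in for MySQL.
    """
    cur = conn.cursor()
    # 1. Copy only the rows to keep into a new table that has no indexes yet.
    cur.execute("CREATE TABLE table1_new (id INTEGER, updateTime TEXT)")
    cur.execute(
        "INSERT INTO table1_new SELECT id, updateTime FROM table1 "
        "WHERE updateTime >= ?",
        (cutoff,),
    )
    # 2. Drop the old table; its indexes are dropped with it.
    cur.execute("DROP TABLE table1")
    # 3. Rename the new table into place.
    cur.execute("ALTER TABLE table1_new RENAME TO table1")
    # 4. Recreate the index last, so it is built once over the kept rows.
    cur.execute("CREATE UNIQUE INDEX idx_id_time ON table1 (id, updateTime)")
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, updateTime TEXT)")
    conn.execute("CREATE UNIQUE INDEX idx_id_time ON table1 (id, updateTime)")
    rows = [(i, "2024-01-%02d" % (i + 1)) for i in range(20)]
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    rebuild_keeping(conn, "2024-01-11")
    return conn.execute(
        "SELECT COUNT(*), MIN(updateTime) FROM table1"
    ).fetchone()
```

Building the index after the copy is the point of the ordering: the rows are inserted into a bare table, and the index is built once at the end instead of being maintained row by row.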
There are two ways to remove a large number of rows. First there is the obvious way:
DELETE FROM table1 WHERE updateTime < NOW() - interval 1 month;
The second (slightly more complicated) way is to create a new table and copy the data that you want to keep, truncate your old table, then copy the rows back.
CREATE TABLE table2 AS
SELECT * FROM table1 WHERE updateTime >= NOW() - interval 1 month;
TRUNCATE table1;
INSERT INTO table1
SELECT * FROM table2;
Using TRUNCATE is much faster than a DELETE with a WHERE clause when you have a large number of rows to delete and a relatively small number that you wish to keep.
Splitting the delete into batches with LIMIT might speed up the process.
I had to delete 10M rows, so I issued a single DELETE. It ran for hours without returning.
I killed the query (which itself took a couple of hours),
then split the delete into batches.
DELETE from table where id > XXXX limit 10000;
DELETE from table where id > XXXX limit 10000;
DELETE from table where id > XXXX limit 10000;
DELETE from table where id > XXXX limit 10000;
Then I duplicated this statement many times in a file and ran the file with the source command.
mysql> source /tmp/delete.sql
This was much faster.
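Instead of pasting the same statement into a file many times, the batching can be driven from a small script that repeats the delete until nothing is left to remove. A minimal sketch, assuming Python with the built-in sqlite3 module as a stand-in for MySQL; the table name table1 and the function name are hypothetical. SQLite has no DELETE ... LIMIT by default, so the batch is selected with a subquery; on MySQL the statement inside the loop would simply be the DELETE ... LIMIT 10000 from above.

```python
import sqlite3


def delete_in_chunks(conn, min_id, chunk_size=10000):
    """Delete rows with id > min_id in batches; return how many were deleted."""
    total = 0
    while True:
        # Stand-in for MySQL's "DELETE FROM table1 WHERE id > ? LIMIT ?".
        cur = conn.execute(
            "DELETE FROM table1 WHERE id IN ("
            " SELECT id FROM table1 WHERE id > ? LIMIT ?)",
            (min_id, chunk_size),
        )
        conn.commit()  # commit every batch so each transaction stays small
        if cur.rowcount == 0:
            break  # nothing matched, so the job is done
        total += cur.rowcount
    return total


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)", [(i, "x") for i in range(1, 26)]
    )
    deleted = delete_in_chunks(conn, min_id=5, chunk_size=10)
    remaining = conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]
    return deleted, remaining
```

Stopping on an affected-row count of zero means the script does not need to know up front how many rows match.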
You can also try tools from Percona Toolkit, such as pt-archiver.
Actually even if you can't take the table offline for long, you can still use the 'rename table' technique to get rid of old data.
Stop processes writing to the table.
rename table tableName to tmpTableName;
create table tableName like tmpTableName;
set @currentId=(select max(id) from tmpTableName);
set @currentId=@currentId+1;
set @indexQuery = CONCAT("alter table tableName auto_increment = ", @currentId);
prepare stmt from @indexQuery;
execute stmt;
deallocate prepare stmt;
Start processes writing to the table.
insert into tableName
select * from tmpTableName;
drop table tmpTableName;
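New inserts to tableName will begin at the correct auto-increment value, and the old data will be inserted with its original ids. To actually prune the table, add a WHERE clause to the insert ... select so that only the rows you want to keep are copied back.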
I have a database table of around 700 GB with 1 billion rows; the data is approximately 500 GB and the indexes are 200 GB.
I am trying to delete all the data from before 2021.
There are roughly 298,970,576 rows in 2021, and 708,337,583 rows remaining.
To delete this, I am running the following query non-stop from my Python shell:
DELETE FROM table_name WHERE id < 1762163840 LIMIT 1000000;
id 1762163840 marks the start of the 2021 data. Deleting 1 million rows takes almost 1200-1800 seconds.
Is there any way I can speed this up? The current approach has been running for more than 15 days, not much data has been deleted so far, and it is going to take many more days.
I thought that I could make a table with just the ids of all the records that I want to delete and then do an exact match like
DELETE FROM table_name WHERE id IN (SELECT id FROM _tmp_table_name);
Will that be fast? Is it going to be faster than first making a new table with all the records and then deleting it?
The database is set up on RDS; the instance class is db.r3.large with 2 vCPUs and 15.25 GB RAM, and only 4-5 connections are running.
I would suggest recreating the data you want to keep -- if you have enough space:
create table keep_data as
select *
from table_name
where id >= 1762163840;
Then you can truncate the table and re-insert the data you kept:
truncate table table_name;
insert into table_name
select *
from keep_data;
This will recreate the index.
The downside is that this will still take a while to re-insert the data (renaming keep_data would be faster). But it should be much faster than deleting the rows.
AND . . . this will give you the opportunity to partition the table so future deletes can be handled much faster. You should look into table partitioning if you have such a large table.
Multiple techniques for big deletes: http://mysql.rjweb.org/doc.php/deletebig
It points out that LIMIT 1000000 is unnecessarily big and causes more locking than might be desirable.
In the long run, PARTITIONing would be beneficial; the page covers that as well.
If you do Gordon's technique (rebuilding table with what you need), you lose access to the table for a long time; I provide an alternative that has essentially zero downtime.
id IN (SELECT...) can be terribly slow -- both because of the inefficiency of in-SELECT and due to the fact that DELETE will hang on to a huge number of rows for transactional integrity.
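One way to keep each delete small is to walk the primary key in fixed-size ranges instead of repeating a large LIMIT. The sketch below is my own illustration of that idea, not a copy of the linked page; it uses Python's built-in sqlite3 as a stand-in for MySQL, and the function name and step size are assumptions.

```python
import sqlite3


def delete_by_id_ranges(conn, max_id, step=1000):
    """Delete rows with id < max_id, one primary-key range at a time."""
    low = conn.execute("SELECT MIN(id) FROM table_name").fetchone()[0]
    deleted = 0
    # MIN(id) is None for an empty table, in which case there is nothing to do.
    while low is not None and low < max_id:
        high = min(low + step, max_id)
        # Each statement touches at most `step` ids, located via the primary key.
        cur = conn.execute(
            "DELETE FROM table_name WHERE id >= ? AND id < ?", (low, high)
        )
        conn.commit()  # short transactions: locks are held only briefly
        deleted += cur.rowcount
        low = high
    return deleted


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO table_name VALUES (?, ?)", [(i, "x") for i in range(1, 51)]
    )
    deleted = delete_by_id_ranges(conn, max_id=31, step=7)
    remaining, min_id = conn.execute(
        "SELECT COUNT(*), MIN(id) FROM table_name"
    ).fetchone()
    return deleted, remaining, min_id
```

Because every range is bounded on both sides, a chunk never has to scan past rows that earlier chunks already removed. Gaps in the id sequence just make some chunks smaller or empty.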
Hi, I want to delete all records from a table which has 10 million records, but it hangs and gives me the following error:
Lock wait timeout exceeded; try restarting transaction
I am using the following query:
delete from table where name = '' order by id limit 1000
in a for loop.
Please suggest how to optimize it.
You said you want to delete all records from a table which has 10 million records. Then why not use the TRUNCATE command instead, which has minimal or no logging overhead?
TRUNCATE TABLE tbl_name
You can also use a DELETE statement, but in your case the condition (where name = '' order by id limit 1000) is unnecessary since you want to get rid of all rows. DELETE also has the overhead of writing to the transaction log, which can matter at a volume of millions of records.
Per your comment, you have no option other than delete from table1 where name = 'naresh'. You can delete in chunks using the LIMIT clause, as in delete from table1 where name = 'naresh' limit 1000. So if name='naresh' matches 25000 rows, each statement deletes only 1000 of them.
You can put the same statement in a loop, as below (not tested; it might need minor tweaks):
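-- DECLARE and WHILE are only valid inside a stored program; the procedure name is just an example.
DELIMITER //
CREATE PROCEDURE delete_in_chunks()
BEGIN
  DECLARE v1 INT;
  SELECT count(*) INTO v1 FROM table1 WHERE name = 'naresh';
  WHILE v1 > 0 DO
    DELETE FROM table1 WHERE name = 'naresh' LIMIT 1000;
    SET v1 = v1 - 1000;
  END WHILE;
END //
DELIMITER ;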
So in the above code, the loop runs 25 times, deleting 1000 rows each time (assuming the name='naresh' condition matches 25K rows).
If you want to delete all records (empty the table),
you can use
TRUNCATE TABLE `table_name_here`...
Maybe it will work for you...
(not tried with a big database)
UPDATE myTable SET niceColumn=1 WHERE someVal=1;
SELECT * FROM myTable WHERE someVal=1;
Is there a way to combine these two queries into one? I mean, can I run an update query and have it show the rows it updates? Here I use the "where someVal=1" filter twice, and I don't want that. Also, I think that if someVal changes before the select query runs, I will have trouble with what I get (e.g. the update changes the row, and after that someVal becomes 0 because of another script).
Wrap the two queries in a transaction with the desired ISOLATION LEVEL so that no other thread can affect the locked rows between the update and the select.
Actually, even what you have done will not show the rows it updated, because meanwhile (after the update) some process may add/change rows.
And this will show all the records, including the ones updated yesterday :)
If I want to see exactly which rows were changed, I would go with a temp table. First select into a temp table all the row IDs to be updated. Then perform the update based on the row IDs in the temp table, and then return the rows listed in the temp table.
CREATE TEMPORARY TABLE to_be_updated
SELECT id
FROM myTable
WHERE someVal = 1;
UPDATE myTable
SET niceColumn = 1
WHERE id IN (SELECT * FROM to_be_updated);
SELECT *
FROM myTable
WHERE id IN (SELECT * FROM to_be_updated)
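Driven from application code, the same three steps could look roughly like the sketch below. It assumes Python with the built-in sqlite3 module as a stand-in for MySQL; the function name is hypothetical, and the table and column names are taken from the question.

```python
import sqlite3


def update_and_return(conn):
    """Set niceColumn = 1 where someVal = 1 and return exactly those rows."""
    cur = conn.cursor()
    # Start clean in case an earlier call left the temp table behind.
    cur.execute("DROP TABLE IF EXISTS to_be_updated")
    # 1. Remember which rows match *before* anything changes.
    cur.execute(
        "CREATE TEMP TABLE to_be_updated AS "
        "SELECT id FROM myTable WHERE someVal = 1"
    )
    # 2. Update only the remembered ids.
    cur.execute(
        "UPDATE myTable SET niceColumn = 1 "
        "WHERE id IN (SELECT id FROM to_be_updated)"
    )
    # 3. Read back the same ids, even if someVal has changed in the meantime.
    rows = cur.execute(
        "SELECT id, someVal, niceColumn FROM myTable "
        "WHERE id IN (SELECT id FROM to_be_updated) ORDER BY id"
    ).fetchall()
    conn.commit()
    return rows


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (id INTEGER PRIMARY KEY, someVal INTEGER, "
        "niceColumn INTEGER)"
    )
    conn.executemany(
        "INSERT INTO myTable VALUES (?, ?, ?)",
        [(1, 1, 0), (2, 0, 0), (3, 1, 0)],
    )
    return update_and_return(conn)
```

The filter on someVal is evaluated once, in step 1. Steps 2 and 3 both work from the saved ids, so they always refer to the same set of rows.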
If in your real code the conditional part (where and so on) is too long to repeat, just put it in a variable that you use in both queries.
Unless you encounter a different problem, you shouldn't need these two combined.
I am attempting to clean out a table but not get rid of the actual structure of the table. I have an id column that is auto-incrementing; I don't need to keep the ID number, but I do need it to keep its auto-incrementing characteristic. I've found delete and truncate but I'm worried one of these will completely drop the entire table rendering future insert commands useless.
How do I remove all of the records from the table so that I can insert new data?
drop table will remove the entire table with data
delete from table will remove the data, leaving the autoincrement values alone. It also takes a while if there's a lot of data in the table.
truncate table will remove the data, reset the autoincrement values (but leave them as autoincrement columns, so it'll just start at 1 and go up from there again), and is very quick.
TRUNCATE will reset your auto-increment seed (on InnoDB tables, at least), although you could note its value before truncating and re-set accordingly afterwards using alter table:
ALTER TABLE t2 AUTO_INCREMENT = value
Drop will do just that: drop the table in question, unless the table is a parent to another table.
Delete will remove all the data that meets the condition; if no condition is specified, it'll remove all the data in the table.
Truncate is similar to delete; however, it resets the auto_increment counter back to 1 (or the initial starting value). Truncate is generally the better choice for emptying a table, because delete removes the data row by row and is therefore slower. However, truncate will not work on an InnoDB table that is referenced by a foreign key constraint unless foreign key checks are turned off before the truncate command is issued.
So, relax; unless you issue a drop command on the table, it won't be dropped.
Truncate table is what you are looking for
http://www.1keydata.com/sql/sqltruncate.html
Another possibility involves creating an empty copy of the table, setting its AUTO_INCREMENT (optionally with some leeway for insertions that happen during the non-atomic operation), and then swapping the two:
CREATE TABLE t2_new LIKE t2;
SELECT @newautoinc:=auto_increment /*+[leeway]*/
FROM information_schema.tables
WHERE table_name='t2' AND table_schema=DATABASE();
SET @query = CONCAT("ALTER TABLE t2_new AUTO_INCREMENT = ", @newautoinc);
PREPARE stmt FROM @query;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
RENAME TABLE t2 TO t2_old, t2_new TO t2;
And then, you have the extra advantage of being still able to change your mind before removing the old table.
If you reconsider your decision, you can still bring back old records from the table before the operation:
INSERT /*IGNORE*/ INTO t2 SELECT * FROM t2_old /*WHERE [condition]*/;
When you're good you can drop the old table:
DROP TABLE t2_old;
I've just come across a situation where DELETE is drastically affecting SELECT performance compared to TRUNCATE on a full-text InnoDB query.
If I DELETE all rows and then repopulate the table (1million rows), a typical query takes 1s to come back.
If instead I TRUNCATE the table, and repopulate it in exactly the same way, a typical query takes 0.05s to come back.
YMMV, but for whatever reason for me on MariaDB 10.3.15-MariaDB-log DELETE seems to be ruining my index.
Here is what I'm trying to do, expressed as a query:
DELETE FROM table ORDER BY dateRegistered DESC LIMIT 1000 *
I want to run such a query in a script which I have already designed. Every time it finds older records at position 1001 or beyond, it deletes them.
So it is a kind of maximum row count, enforced by deleting all the older records.
Also, is there a way to set that up in the CREATE statement?
Therefore: if I have 9023 rows in the database, running the query should delete 8023 rows and leave me with 1000.
If you have a unique ID for rows here is the theoretically correct way, but it is not very efficient (not even if you have an index on the dateRegistered column):
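-- The extra derived table is needed because MySQL rejects LIMIT directly
-- inside an IN subquery, and rejects selecting from the table being deleted from.
DELETE FROM table
WHERE id NOT IN (
  SELECT id FROM (
    SELECT id FROM table
    ORDER BY dateRegistered DESC
    LIMIT 1000
  ) AS newest
)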
I think you would be better off by limiting the DELETE directly by date instead of number of rows.
I don't think there is a way to set that up in the CREATE TABLE statement, at least not a portable one.
The only way that immediately occurs to me for this exact job is to do it manually.
First, get a lock on the table. You don't want the row count changing while you're doing this. (If a lock is not practical for your app, you'll have to work out a more clever queuing system rather than using this method.)
Next, get current row count:
SELECT count(*) FROM table
Once you have that, you should with simple maths be able to figure out how many rows need deleting. Let's say it said 1005 - you need to delete 5 rows.
DELETE FROM table ORDER BY dateRegistered ASC LIMIT 5
Now, unlock the table.
If a lock isn't practical for your scenario, you'll have to be a bit more clever - for example, select the unique ID of all the rows that need deleting, and queue them for gradual deletion. I'll let you work that out yourself :)
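The count-then-delete steps described above can be sketched as follows. This is a rough illustration only: it uses Python's built-in sqlite3 as a stand-in for MySQL, the table name registrations and the function name are made up, and the table lock is left out (on MySQL you would wrap the two statements in LOCK TABLES ... UNLOCK TABLES or a transaction). SQLite has no DELETE ... ORDER BY ... LIMIT by default, so the oldest rows are picked with a subquery.

```python
import sqlite3


def trim_to_newest(conn, keep=1000):
    """Delete the oldest rows so that at most `keep` rows remain."""
    (count,) = conn.execute("SELECT COUNT(*) FROM registrations").fetchone()
    excess = max(count - keep, 0)  # e.g. 1005 rows and keep=1000 gives 5
    if excess:
        # Stand-in for MySQL's "DELETE ... ORDER BY dateRegistered ASC LIMIT n".
        conn.execute(
            "DELETE FROM registrations WHERE id IN ("
            " SELECT id FROM registrations"
            " ORDER BY dateRegistered ASC LIMIT ?)",
            (excess,),
        )
    conn.commit()
    return excess


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE registrations ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT, dateRegistered INTEGER)"
    )
    conn.executemany(
        "INSERT INTO registrations (dateRegistered) VALUES (?)",
        [(i,) for i in range(1, 1006)],
    )
    removed = trim_to_newest(conn, keep=1000)
    remaining, oldest = conn.execute(
        "SELECT COUNT(*), MIN(dateRegistered) FROM registrations"
    ).fetchone()
    return removed, remaining, oldest
```

Without a lock the count can be stale by the time the delete runs, which is why the answer asks for one.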