I have a question: when I try to remove over 100,000 rows from a MySQL table, the server freezes and none of its websites can be accessed anymore!
I waited 2 hours, then restarted the server and restored the account.
I used following query:
DELETE FROM `pligg_links` WHERE `link_id` > 10000
The equivalent SELECT, however, works perfectly:
SELECT * FROM `pligg_links` WHERE `link_id` > 10000
Is there a better way to do this?
You could delete the rows in smaller sets. A quick script that deletes 1000 rows at a time should see you through.
"Delete from" can be very expensive for large data sets.
I recommend using partitioning.
This may be done slightly differently in PostgreSQL and MySQL, but in PostgreSQL you can create many tables that are "partitions" of the larger table. Queries and so on can still be run against the larger table, and partitioning correctly can greatly increase query speed. You can also delete a partition by simply dropping it, which is very fast because it is roughly equivalent to dropping a table.
Documentation for table partitioning can be found here:
http://www.postgresql.org/docs/8.3/static/ddl-partitioning.html
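Since the question is about MySQL, the rough equivalent there would look something like the following (assuming link_id is the primary key, since MySQL requires the partitioning column to be part of every unique key; the partition boundaries are illustrative):
ALTER TABLE `pligg_links`
  PARTITION BY RANGE (link_id) (
    PARTITION p0   VALUES LESS THAN (10000),
    PARTITION p1   VALUES LESS THAN (20000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
  );

-- dropping a partition discards its rows almost instantly
ALTER TABLE `pligg_links` DROP PARTITION p1;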
Make sure you have an index on the link_id column.
Then try to delete in chunks of around 10,000 rows at a time.
Deleting from a table is a very costly operation.
Related
We have a large MySQL 5.5 database in which many rows are inserted daily and never deleted or updated. There are also users querying the live database. Tables are MyISAM.
But it is effectively impossible to run ANALYZE TABLE because it takes way too long (15 hours, and it sometimes crashes the tables), so the query optimizer will often pick the wrong index.
We want to try switching everything to InnoDB. Will we need to run ANALYZE TABLE or not?
The MySQL docs say:
The cardinality (the number of different key values) in every index of a table
is calculated when a table is opened, at SHOW TABLE STATUS and ANALYZE TABLE and
on other circumstances (like when the table has changed too much).
But that raises the question: when is a table opened? If that means accessed during a connection, then we need do nothing special. But I do not think that is the case for InnoDB.
So what is the best approach? Run ANALYZE TABLE periodically? Perhaps with an increased dive count?
Or will it all happen automatically?
The querying users use apps to get the data, so each run is a separate connection. They generally do NOT expect the rows to be up to date within just minutes.
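For reference, the statements in question are just the following (table name illustrative); in MySQL 5.6 and later the sampling depth ("dive count") is controlled by the innodb_stats_persistent_sample_pages variable:
ANALYZE TABLE my_big_table;     -- recompute index statistics for the optimizer
SHOW INDEX FROM my_big_table;   -- inspect the cardinality estimates it produced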
I have a MySQL database with approx. 1 TB of data. The table fuelinjection_stroke has approx. 1,000,000,000 rows. DBID is the primary key and is auto-incremented by one with each insert.
I am trying to delete the first 1,000,000 rows using a very simple statement:
Delete from fuelinjection_stroke where DBID < 1000000;
This query is taking very long (>24 h) on my dedicated 8-core Xeon server (32 GB memory, SAS storage).
Any idea whether the process can be sped up?
I believe that your table becomes locked. I've faced the same problem and found that deleting 10k records at a time is pretty fast, so you might want to write a simple script/program that deletes records in chunks.
DELETE FROM fuelinjection_stroke WHERE DBID < 1000000 LIMIT 10000;
Keep executing it until everything is deleted.
Are you space deprived? Is downtime impossible?
If not, you could add a new INT column of length 1 and default it to 1 for "active" (or whatever your terminology is) and 0 for "inactive". Actually, you could use 0 through 9 as 10 different states if necessary.
Adding this new column will take a long time, but once it's done, your UPDATEs should be lightning fast as long as you key off the PRIMARY (as you do with your DELETE) and you don't index this new column.
The reason InnoDB takes so long to DELETE on such a massive table is the clustered index. It physically orders your table based on your PRIMARY key (or the first UNIQUE index it finds, or whatever it picks if it can't find either), so when you pull out one row, the table gets physically reorganized on disk for speed and defragmentation. So it's not the DELETE itself that's taking so long; it's the physical reordering after the row is removed.
When you create a new INT column with a default value, the space will be filled, so when you UPDATE it, there's no need for physical reordering across your huge table.
I'm not sure exactly what your schema is, but using a column for a row's state is much faster than DELETEing; however, it will take more space.
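A rough sketch of that approach, using the table from the question and an illustrative column name:
-- one-time schema change (slow on a table this size)
ALTER TABLE fuelinjection_stroke ADD COLUMN active INT(1) NOT NULL DEFAULT 1;

-- instead of DELETE FROM fuelinjection_stroke WHERE DBID < 1000000:
UPDATE fuelinjection_stroke SET active = 0 WHERE DBID < 1000000;

-- readers then filter on the flag
SELECT COUNT(*) FROM fuelinjection_stroke WHERE active = 1;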
Try setting the following values:
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT (for non-windows machine)
innodb_buffer_pool_size=25GB (currently it is close to 21GB)
innodb_doublewrite=0
innodb_support_xa=0
innodb_thread_concurrency=0...1000 (try different values, beginning with 200)
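In the option file these would sit under the [mysqld] section, roughly like this (some of them are not dynamic, so a server restart may be needed):
[mysqld]
innodb_flush_log_at_trx_commit = 2
innodb_flush_method            = O_DIRECT
innodb_buffer_pool_size        = 25G
innodb_doublewrite             = 0
innodb_support_xa              = 0
innodb_thread_concurrency      = 200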
References:
MySQL docs for description of different variables.
MySQL Server Setting Tuning
MySQL Performance Optimization basics
http://bugs.mysql.com/bug.php?id=28382
What indexes do you have?
I think your issue is that the delete is rebuilding the index on every iteration.
I'd drop the indexes, if there are any, do the delete, then re-add the indexes. It'll be far faster (I think).
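Something along these lines, with the index and column names being placeholders for whatever SHOW INDEX actually reports:
SHOW INDEX FROM fuelinjection_stroke;                      -- list the existing indexes
ALTER TABLE fuelinjection_stroke DROP INDEX idx_example;   -- drop each secondary index
DELETE FROM fuelinjection_stroke WHERE DBID < 1000000;     -- the big delete, now cheaper
ALTER TABLE fuelinjection_stroke ADD INDEX idx_example (example_col);  -- rebuild afterwards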
I was having the same problem, and my table has several indices that I didn't want to have to drop and recreate. So I did the following:
create table keepers
select * from origTable where {clause to retrieve rows to preserve};
truncate table origTable;
insert into origTable select null, keepers.col2, ... keepers.col(last) from keepers;
drop table keepers;
About 2.2 million rows were processed in about 3 minutes.
Your database may be checking for child records that need to be modified via foreign keys (ON DELETE cascades and the like).
But I-Conica's answer is a good point (+1). Deleting a single record and updating a lot of indexes, repeated 100,000 times, is inefficient. Just drop the indexes, delete all the records, and create them again.
And of course, check whether there is any kind of lock in the database. One user or application can lock a record or table, and your query will wait until the user releases the resource or it reaches a timeout. One way to check whether your database is doing real work or just waiting is to launch the query from a connection that sets the innodb_lock_wait_timeout parameter to a few seconds. If it fails, at least you know that the query is OK and that you need to find and release that lock. Examples of locks are SELECT * FROM xxx FOR UPDATE and uncommitted transactions.
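For example (the timeout value and chunk size are purely illustrative):
SET SESSION innodb_lock_wait_timeout = 5;   -- fail fast instead of waiting for minutes
DELETE FROM fuelinjection_stroke WHERE DBID < 1000000 LIMIT 10000;

SHOW PROCESSLIST;              -- who else is connected and what they are running
SHOW ENGINE INNODB STATUS;     -- includes a TRANSACTIONS section showing lock waits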
For such large tables, I'd rather use MyISAM, especially if there aren't a lot of transactions needed.
I don't know the exact answer to your question, but here is another way to delete those rows; please try this. MySQL has no TOP, so the subquery has to be wrapped in a derived table to allow the LIMIT:
delete from fuelinjection_stroke where DBID in
(
    select DBID from
    (
        select DBID from fuelinjection_stroke
        order by DBID asc
        limit 1000000
    ) as t
);
I'm deleting rows using a cron job set to run every hour. For performance and to reduce fragmentation, what is the best way to do this?
Also, should I run optimize table after the delete has finished?
The answer will depend on your data and how many rows you're deleting at a time.
If possible, delete the rows with a single query (rather than one query per row). For example:
DELETE FROM my_table WHERE status="rejected"
If possible, use an indexed column in your WHERE clause. This will help it select the rows that need to be deleted without doing a full table scan.
If you want to delete all the data, use TRUNCATE TABLE.
If deleting the data with a single query is causing performance problems, you could try limiting how many rows it deletes (by adding a LIMIT clause) and running the delete process more frequently. This would spread the deletes out over time.
Per the documentation, OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns).
Optimizing the table can be very expensive. If you can, try deleting your data and optimizing the table once per day (at night). This will limit any impact to your users.
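As a sketch, the hourly job could boil down to something like this (table, column, retention period, and chunk size are all illustrative):
-- hourly: delete in bounded chunks so the job never runs long
DELETE FROM my_table
WHERE created_at < NOW() - INTERVAL 30 DAY
LIMIT 10000;

-- nightly (or less often): reclaim space and defragment
OPTIMIZE TABLE my_table;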
I am deleting approximately 1/3 of the records in a table using the query:
DELETE FROM `abc` LIMIT 10680000;
The query appears in the process list with the state "updating". There are 30m records in total. The table has 5 columns and two indexes, and when dumped to SQL the file is about 9 GB.
This is the only database and table in MySQL.
This is running on a machine with 2GB of memory, a 3 GHz quad-core processor and a fast SAS disk. MySQL is not performing any reads or writes other than this DELETE operation. No other "heavy" processes are running on the machine.
This query has been running for more than 2 hours -- how long can I expect it to take?
Thanks for the help! I'm pretty new to MySQL, so any tidbits about what's happening "under the hood" while running this query are definitely appreciated.
Let me know if I can provide any other information that would be pertinent.
Update: I just ran a COUNT(*), and in 2 hours, it's only deleted 200k records. I think I'm going to take Joe Enos' advice and see how well inserting the data into a new table and dropping the previous table performs.
Update 2: Sorry, I actually misread the number. In 2 hours, it's not deleted anything. I'm confused. Any suggestions?
Update 3: I ended up using mysqldump with --where "true LIMIT 10680000,31622302" and then importing the data into a new table. I then deleted the old table and renamed the new one. This took just over half an hour.
Don't know if this would be any better, but it might be worth thinking about doing the following:
Create a new table and insert 2/3 of the original table into the new one.
Drop the original table.
Rename the new table to the original table's name.
This would prevent the log file from having all the deletes, but I don't know if inserting 20m records is faster than deleting 10m.
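A sketch of that approach; the WHERE clause and column name are placeholders for whatever identifies the rows to keep:
CREATE TABLE abc_new LIKE abc;                                -- same schema and indexes
INSERT INTO abc_new SELECT * FROM abc WHERE id >= 10680000;   -- copy the rows to keep
RENAME TABLE abc TO abc_old, abc_new TO abc;                  -- near-instant swap
DROP TABLE abc_old;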
You should post the table definition.
Also, to find out why it is taking so much time, try enabling profiling for the delete request via:
SET profiling=1;
DELETE FROM abc LIMIT 10680000;
SET profiling=0;
SHOW PROFILES;
SHOW PROFILE ALL FOR QUERY X; (X is the ID of your query shown in SHOW PROFILES)
and post what it returns (but I think the query must finish before it will return the profiling data).
http://dev.mysql.com/doc/refman/5.0/en/show-profiles.html
Also, I think you'll get more responses on ServerFault ;)
When you run this query, the InnoDB log file for the database is used to record all the details of the rows that are deleted. If the log file isn't large enough from the outset, it will be auto-extended as and when necessary (if configured to do so). I'm not familiar with the specifics, but I expect this auto-extension is not blindingly fast. Two hours does seem like a long time, but it wouldn't surprise me if the log file is growing while the query runs.
Is the table from which the records are being deleted on the end of a foreign key (i.e. does another table reference it through a FK constraint)?
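Both of these are easy to check (table name as in the question):
SHOW VARIABLES LIKE 'innodb_log_file%';    -- current redo log file size and count

-- any tables whose foreign keys reference `abc`?
SELECT TABLE_NAME, CONSTRAINT_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE REFERENCED_TABLE_NAME = 'abc';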
I hope your query has finished by now... :) But from what I've seen, LIMIT with large numbers (and I never tried numbers this large) is very slow. I would try something based on the PK, like:
DELETE FROM abc WHERE abc_pk < 10680000;
I have a process that imports a lot of data (950k rows) using inserts that insert 500 rows at a time. The process generally takes about 12 hours, which isn't too bad. Normally doing a query on the table is pretty quick (under 1 second) as I've put (what I think to be) the proper indexes in place. The problem I'm having is trying to run a query when the import process is running. It is making the query take almost 2 minutes! What can I do to make these two things not compete for resources (or whatever)? I've looked into "insert delayed" but not sure I want to change the table to MyISAM.
Thanks for any help!
Have you tried using priority hints?
SELECT HIGH_PRIORITY ... and INSERT LOW_PRIORITY ...
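For example, using the table from later in this thread (note that these hints only matter for storage engines that use table-level locking, such as MyISAM):
SELECT HIGH_PRIORITY * FROM properties WHERE state = 'Florida';

INSERT LOW_PRIORITY INTO properties (state, county)
VALUES ('Florida', 'Hillsborough');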
12 hours to insert 950k rows is pretty heavy duty. How big are these rows? What kind of indexes are on them? Even if the actual data insertion goes quickly, the continual updating of the indexes will definitely cause performance degradation for anything using those table(s) at the time.
Are you doing these imports with the bulk INSERT syntax (insert into tab (x) values (a), (b), (c), etc...) or one INSERT per row? A bulk insert will require a longer index-updating period (as it has to generate index data for 500 rows) than a single-row insert. There will no doubt be some sort of internal lock placed on the indexes while the data is updated, in which case you're contending with 950k/500 = 1,900 locking sessions at minimum.
I found that on some of my bulk-insert scripts (an HTTP log analyzer for some custom data mining), it was quicker to DISABLE the indexes on the relevant tables, then re-enable/rebuild them after the data dump was completed. If I remember right, it was about 37 minutes to insert 200,000 rows of hit data with keys enabled, and about 3 minutes with no indexing.
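The disable/re-enable step looks like this (MyISAM tables only, and the table name is illustrative; for InnoDB you would drop and re-add the secondary indexes instead):
ALTER TABLE access_log DISABLE KEYS;   -- stop maintaining non-unique indexes
-- ... run the bulk INSERTs here ...
ALTER TABLE access_log ENABLE KEYS;    -- rebuild the indexes in one pass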
So I finally found the slowdown while searching during the import of my data. I had one query like this:
SELECT * FROM `properties` WHERE (state like 'Florida%') and (county like 'Hillsborough%') ORDER BY created_at desc LIMIT 0, 50
and when I ran an EXPLAIN on it, I found out it was scanning around 215,000 rows (even with proper indexes on state and county in place). I then ran an EXPLAIN on the following query:
SELECT * FROM `properties` WHERE (state = 'Florida') and (county = 'Hillsborough') ORDER BY created_at desc LIMIT 0, 50
and saw that it only had to scan 500 rows. Considering that the actual result set was something like 350, I think I identified the slowdown.
I've made the switch to not using "like" in my queries and am very happy with the snappier results.
Thanks to everyone for your help and suggestions. They are much appreciated!
You can try importing your data into an auxiliary table and then merging it into the main table. You don't lose performance on your main table, and I think your DB can manage the merge much faster than the many individual insertions.
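A rough sketch of that idea; the file path and table names are illustrative:
CREATE TABLE properties_staging LIKE properties;

-- bulk-load into the staging table; nobody is querying it, so this can be fast
LOAD DATA LOCAL INFILE '/tmp/import.csv'
INTO TABLE properties_staging
FIELDS TERMINATED BY ',';

-- then merge into the live table in one statement
INSERT INTO properties SELECT * FROM properties_staging;
DROP TABLE properties_staging;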