I'm deleting rows using a crontab entry set to run every hour. For performance and less fragmentation, what is the best way to do this?
Also, should I run OPTIMIZE TABLE after the delete has finished?
The answer will depend on your data and how many rows you're deleting at a time.
If possible, delete the rows with a single query (rather than one query per row). For example:
DELETE FROM my_table WHERE status = 'rejected';
If possible, use an indexed column in your WHERE clause. This will help it select the rows that need to be deleted without doing a full table scan.
If you want to delete all the data, use TRUNCATE TABLE.
If deleting the data with a single query is causing performance problems, you could try limiting how many rows it deletes (by adding a LIMIT clause) and running the delete process more frequently. This would spread the deletes out over time.
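For example, a minimal sketch of a chunked delete, reusing the hypothetical my_table above (the batch size is illustrative; repeat the statement until no rows are affected):

DELETE FROM my_table WHERE status = 'rejected' LIMIT 1000;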
Per the documentation, OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns).
Optimizing the table can be very expensive. If you can, try deleting your data and optimizing the table once per day (at night). This will limit any impact to your users.
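As an illustration, a nightly cron job could run something like the following (the table name and condition are the same hypothetical ones as above; note that OPTIMIZE TABLE blocks reads and writes while it runs):

DELETE FROM my_table WHERE status = 'rejected';
OPTIMIZE TABLE my_table;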
Related
If I have a table in an SQL database and put an index on every column of it, what would happen to CRUD?
I think CREATE (INSERT) will definitely be slower and READ will be faster, but I don't know about UPDATE and DELETE.
On one hand, since there are WHERE clauses in UPDATE and DELETE statements, I guess that part will be faster. But since these two operations also modify other (indexed) columns, I guess that part will be slower. Which part counts more, and what is the final impact on UPDATE and DELETE?
DELETE will definitely be slower, because every deleted row must also be removed from each index. Of course, that is offset by any speed-up from the WHERE clause.
UPDATE might be slower or faster. Filtering might be faster, depending on the WHERE clause. On the other hand, every indexed column being modified would need its index entries updated.
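To make this concrete, a small sketch with a hypothetical table t that has an index on every column:

CREATE TABLE t (
  a INT, b INT, c INT,
  INDEX (a), INDEX (b), INDEX (c)
);

-- The index on a speeds up finding the rows, but every deleted row
-- must also be removed from the b and c indexes.
DELETE FROM t WHERE a = 1;

-- Same trade-off: fast lookup via a, but the b and c index entries
-- must be rewritten for every modified row.
UPDATE t SET b = b + 1, c = 0 WHERE a = 1;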
I have a table to which I need to add a column. The table has millions of records. For the existing records I have to update the column value (which will be different for each record). Running individual update queries will take a lot of time. Is there a way to achieve this with a minimal amount of locking time for the table?
In your case there are two ways:
if the additional column is a derived column (its values can be derived from existing columns), then a simple UPDATE query will be enough (see the sketch below);
if it's not derived, then write all the values to a file (probably with a script) and import that file as the new column in your table (or export the existing table, modify that file with the new column values, and then import it again).
Of these two, the import is faster and involves less locking.
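A hedged illustration of the first case, assuming a hypothetical table big_table and a new column that can be computed from existing columns:

ALTER TABLE big_table ADD COLUMN full_name VARCHAR(255);

-- One set-based statement instead of millions of individual updates;
-- the column names and the expression are only examples.
UPDATE big_table
SET full_name = CONCAT(first_name, ' ', last_name);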
hope that helps :)
When the hash value cannot be generated in the database, the only thing you can do is run individual updates. You may get some performance improvements by batching several updates into one transaction.
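A minimal sketch of such batching (table, column, and values are purely illustrative):

START TRANSACTION;
UPDATE big_table SET new_col = 'a1b2c3' WHERE id = 1;
UPDATE big_table SET new_col = 'd4e5f6' WHERE id = 2;
-- ... a few thousand more updates per batch ...
COMMIT;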
I have a table called research_words which has a few hundred million rows.
Every day I have tens of millions of new rows to add; about 5% of them are totally new rows, and 95% are updates which have to increment some columns of an existing row. I don't know which is which, so I use:
INSERT INTO research_words
(word1,word2,origyear,cat,numbooks,numpages,numwords)
VALUES
(34272,268706,1914,1,1,1,1)
ON DUPLICATE KEY UPDATE
numbooks=numbooks+1,numpages=numpages+1,numwords=numwords+1
This is an InnoDB table where the primary key is over word1,word2,origyear,cat.
The issue I'm having is that I have to insert the new rows each day, and it's taking longer than 24 hours to insert each day's rows! Obviously I can't have it taking longer than a day to insert the rows for the day. I have to find a way to make the inserts faster.
For other tables I've had great success with ALTER TABLE ... DISABLE KEYS; and LOAD DATA INFILE, which allows me to add billions of rows in less than an hour. That would be great, except that unfortunately I am incrementing columns in this table. I doubt disabling the keys would help either, because surely it needs them to check whether the row exists in order to add it.
My scripts are in PHP, but when I add the rows I do so with an exec call directly to MySQL, passing it a text file of commands instead of sending them from PHP, since it's faster this way.
Any ideas to fix the speed issue here?
Old question, but perhaps worth an answer all the same.
Part of the issue stems from the large number of inserts being run essentially one at a time, with a unique index update after each one.
In these cases, a better technique can be to select n rows into a temp table, LEFT JOIN them to the destination table, calculate the new values (in the OP's situation, IFNULL(dest.numpages+1,1) etc.), and then run two further statements: a plain INSERT for the rows that don't exist yet and an UPDATE for the rows that do (see the sketch below). The updates don't require an index refresh, so they run much faster, and the inserts no longer need the ON DUPLICATE KEY logic.
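A rough sketch of that staged approach, assuming a hypothetical staging table research_words_stage that holds one day's incoming rows (all names are illustrative):

-- 1. Join the staged rows to the destination and precompute the new values.
CREATE TEMPORARY TABLE stage_joined AS
SELECT s.word1, s.word2, s.origyear, s.cat,
       IFNULL(d.numbooks + 1, 1) AS numbooks,
       IFNULL(d.numpages + 1, 1) AS numpages,
       IFNULL(d.numwords + 1, 1) AS numwords,
       d.word1 IS NULL AS is_new
FROM research_words_stage s
LEFT JOIN research_words d
  ON d.word1 = s.word1 AND d.word2 = s.word2
 AND d.origyear = s.origyear AND d.cat = s.cat;

-- 2. Plain inserts for the rows that don't exist yet.
INSERT INTO research_words
  (word1, word2, origyear, cat, numbooks, numpages, numwords)
SELECT word1, word2, origyear, cat, numbooks, numpages, numwords
FROM stage_joined WHERE is_new;

-- 3. Updates, joined on the primary key, for the rows that already exist.
UPDATE research_words d
JOIN stage_joined j
  ON d.word1 = j.word1 AND d.word2 = j.word2
 AND d.origyear = j.origyear AND d.cat = j.cat
SET d.numbooks = j.numbooks,
    d.numpages = j.numpages,
    d.numwords = j.numwords
WHERE NOT j.is_new;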
I have a question: when I try to remove over 100,000 rows from a MySQL table, the server freezes and none of its websites can be accessed anymore!
I waited 2 hours and then restarted the server and restored the account.
I used the following query:
DELETE FROM `pligg_links` WHERE `link_id` > 10000
By contrast,
SELECT * FROM `pligg_links` WHERE `link_id` > 10000
works perfectly.
Is there a better way to do this?
You could delete the rows in smaller sets. A quick script that deletes 1000 rows at a time should see you through.
"Delete from" can be very expensive for large data sets.
I recommend using partitioning.
This may be done slightly differently in PostgreSQL and MySQL, but in PostgreSQL you can create many tables that are "partitions" of a larger table. Queries and so on can still be run against the larger table. This can greatly increase query speed, provided you partition correctly. You can also delete a partition by simply dropping it, which happens very quickly because it is roughly equivalent to dropping a table.
Documentation for table partitioning can be found here:
http://www.postgresql.org/docs/8.3/static/ddl-partitioning.html
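For reference, a hedged sketch of the same idea in MySQL, using RANGE partitioning on link_id (the table definition and partition boundaries are purely illustrative):

CREATE TABLE pligg_links_partitioned (
  link_id INT NOT NULL,
  url VARCHAR(255),
  PRIMARY KEY (link_id)
)
PARTITION BY RANGE (link_id) (
  PARTITION p0 VALUES LESS THAN (10000),
  PARTITION p1 VALUES LESS THAN (20000),
  PARTITION p2 VALUES LESS THAN MAXVALUE
);

-- Dropping a whole partition removes its rows almost instantly,
-- without the row-by-row cost of a DELETE.
ALTER TABLE pligg_links_partitioned DROP PARTITION p1;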
Make sure you have an index on the link_id column.
And try to delete in chunks of around 10,000 rows at a time.
Deleting from a table is a very costly operation.
I ran an OPTIMIZE TABLE query on one table, and then didn't do any operations on that table. Now I'm running an OPTIMIZE TABLE query at the end of every month, but the data in the table may only change once every four or eight months. Does this create any problem for the performance of MySQL queries?
If you don't do DML operations on the table, OPTIMIZE TABLE is useless.
OPTIMIZE TABLE cleans the table of deleted records, sorts the index pages (bringing the physical order of the pages into line with the logical one) and recalculates the statistics.
For the duration of the command, the table is unavailable both for reading and writing, and the command may take long for large tables.
Did you read the manual about OPTIMIZE? And do you have a problem you want to solve using OPTIMIZE? If not, don't use this statement at all.
If the data barely changes over a period of 4-8 months, it should not create any performance issue for the end-of-month report.
However, if the number of rows changed during that 4-8 month period is huge, then you would want to rebuild the indexes / analyze the tables so that queries run fine after the load.
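For example, a minimal hedged sketch after a large load (the table name is illustrative); ANALYZE TABLE just refreshes the index statistics and is much cheaper than a full OPTIMIZE TABLE:

ANALYZE TABLE research_data;
-- or, if a large fraction of the rows was rewritten or deleted:
OPTIMIZE TABLE research_data;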