MySQL performance DELETE or UPDATE? - mysql

I have a MyISAM table with more than 10^7 rows. When adding data to it, I have to update ~10 rows at the end. Is it faster to delete them and then insert the new ones, or is it faster to update those rows? The data to be updated is not part of the index. What about index/data fragmentation?

UPDATE is much faster.
When you UPDATE, the table records are simply rewritten with the new data.
When you DELETE, the indexes must be updated as well (remember, you delete the whole row, not only the columns you need to modify), and data blocks may be moved (if you hit the PCTFREE limit).
And all this must be done again on INSERT.
That's why you should always use
INSERT ... ON DUPLICATE KEY UPDATE
instead of REPLACE.
The former is an UPDATE operation in case of a key violation, while the latter is a DELETE followed by an INSERT.

It is faster to update. You can also use INSERT ... ON DUPLICATE KEY UPDATE:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
For more details, read the update documentation.

Rather than deleting or updating data for the sake of performance, I would consider partitioning.
http://dev.mysql.com/doc/refman/5.1/en/partitioning-range.html
This will allow you to retain the data historically and not degrade performance.

Logically, DELETE + INSERT is two actions and UPDATE is one. Deleting and re-adding rows also changes their IDs when the key is auto_increment, so any relationships pointing at those rows would break or would need updating too. I'd go for UPDATE.
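The ID change described above can be observed directly. The following is only a sketch: the table `item` and its columns are hypothetical and not taken from the question, and it assumes the default auto-increment settings.

```sql
-- Hypothetical table: `id` is the surrogate key, `name` carries the UNIQUE constraint.
CREATE TABLE item (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(32)  NOT NULL,
  qty  INT          NOT NULL,
  UNIQUE KEY uk_name (name)
);

INSERT INTO item (name, qty) VALUES ('widget', 1);
SELECT id, qty FROM item WHERE name = 'widget';   -- note the id

-- REPLACE deletes the conflicting row and inserts a new one,
-- so the row comes back with a freshly generated id.
REPLACE INTO item (name, qty) VALUES ('widget', 2);
SELECT id, qty FROM item WHERE name = 'widget';   -- id has changed

-- ON DUPLICATE KEY UPDATE modifies the existing row in place,
-- so the id seen after the REPLACE is kept.
INSERT INTO item (name, qty) VALUES ('widget', 3)
ON DUPLICATE KEY UPDATE qty = VALUES(qty);
SELECT id, qty FROM item WHERE name = 'widget';   -- same id as before, new qty
```

Any other table that stores `item.id` would be left pointing at a row that no longer exists after the REPLACE, which is the broken-relationship problem mentioned above.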

An UPDATE with WHERE Column='something' should use an index as long as the search criterion is in the index (whether it's a seek or a scan is a completely different issue).
If you run these updates a lot but don't have an index on the criteria column, I would recommend creating one on the column you are filtering by. That should help speed things up.

Related

MYSQL: SELECT or DELETE, which is better in terms of performance to avoid duplication

I have a table with millions of records. I have to make sure records are unique. I wonder which is better: SELECT ... WHERE or DELETE ... WHERE?
Question Update: I want to keep only UNIQUE RECORDS.
Further update
I am running threads, and for some unknown reason they are inserting duplicates into the table despite checking first, most probably because the SELECTs run simultaneously. So I am asking whether checking for a record's existence is costlier than simply deleting the matching rows.
Use INSERT IGNORE to avoid errors about duplicate keys when accidentally inserting the same record twice.
Note: That only checks UNIQUE keys (including the PRIMARY KEY, if specified).
But... that points out that you do not have a UNIQUE key to prevent duplicates in the first place!
So, your next question will be how to add a UNIQUE key when there are already duplicates. Correct.
Trying to discover the duplicates and delete them is complex and tedious, but possibly faster.
This is straightforward:
CREATE TABLE `new` LIKE `real`;   -- REAL is a reserved word, so the placeholder names are quoted
ALTER TABLE `new` ADD UNIQUE ... -- some UNIQUE KEY to avoid duplicates
# stop writes to `real` -- application-specific
INSERT IGNORE INTO `new` SELECT * FROM `real`;
RENAME TABLE `real` TO `old`,
             `new` TO `real`;
# allow writes again.
DROP TABLE `old`;
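Before running the copy, it can help to see how widespread the duplicates are. A minimal sketch, assuming `a` and `b` stand in for whichever columns the new UNIQUE key will cover (the thread does not name them):

```sql
-- `a` and `b` are placeholders for the columns that are supposed to be unique.
-- Lists the 20 most-duplicated key values and how many copies each one has.
SELECT a, b, COUNT(*) AS copies
FROM `real`
GROUP BY a, b
HAVING COUNT(*) > 1
ORDER BY copies DESC
LIMIT 20;
```

This only reports the duplicates; the INSERT IGNORE copy above is still what removes them.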

Performance of VALUES(col_name) function in the UPDATE clause

The question is about SQL legacy code for MySQL database.
It is known that in an INSERT ... ON DUPLICATE KEY UPDATE statement, the VALUES(col_name) function can be used to refer to column values from the INSERT portion instead of passing the exact values again:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE b=VALUES(b), c=VALUES(c)
My legacy code contains a lot of huge inserts in parametrized style (they are used in batch-inserts):
INSERT INTO table (a,b,c, <...dozens of params...>) VALUES (?,?,?,<...dozens of values...>)
ON DUPLICATE KEY UPDATE b=?, c=?, <...dozens of params...>
The question is: would it improve the performance of batch inserts if I changed all these queries to use the VALUES(col_name) function (in the UPDATE portion)?
My queries are executed from Java code using the JDBC driver. So my guess is that for long text values it should significantly reduce the size of the queries. What about MySQL itself? Would it give me a speedup in general?
Batched inserts can run 10 times as fast as inserting one row at a time. The reason is the per-statement overhead (network round trips, etc.).
Another technique is to change from a single batched IODKU into two statements -- one to insert the new rows, one to do the updates. (I don't know if that will run any faster.) Here is a discussion of the two steps, in the context of "normalization".
Another thing to note: If there is an AUTO_INCREMENT involved (not as one of the columns mentioned), then IODKU may "burn" ids for the cases where it does an 'update'. That is, the IODKU (and INSERT IGNORE and a few others) get all the auto_incs that it might need, then proceeds to use the ones it does need and waste the others.
You get into "diminishing returns" if you try to insert more than a few hundred rows in a batch. And you stress the rollback log.
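For reference, a batched statement that combines several rows with VALUES(col_name) might look like the sketch below. The table `t` and its column types are assumptions made for illustration; only the column names a, b, c come from the question.

```sql
-- Hypothetical table: a is the key, b and c are the columns to refresh.
CREATE TABLE t (
  a INT  NOT NULL PRIMARY KEY,
  b TEXT NOT NULL,
  c INT  NOT NULL
);

-- One statement, several rows. Each long text value is sent once,
-- and VALUES(b) refers to the b of whichever row hit the duplicate key.
INSERT INTO t (a, b, c)
VALUES
  (1, 'long text 1', 10),
  (2, 'long text 2', 20),
  (3, 'long text 3', 30)
ON DUPLICATE KEY UPDATE
  b = VALUES(b),
  c = VALUES(c);

-- MySQL 8.0.19 and later also accept a row alias, and VALUES() in this
-- position is deprecated from 8.0.20 on. Equivalent form:
INSERT INTO t (a, b, c)
VALUES
  (1, 'long text 1', 10),
  (2, 'long text 2', 20) AS new
ON DUPLICATE KEY UPDATE
  b = new.b,
  c = new.c;
```

With the parametrized style from the question, each row's values have to be bound twice, which also makes a multi-row form like this impractical.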

Millions of MySQL Insert On Duplicate Key Update - very slow

I have a table called research_words which has some hundred million rows.
Every day I have tens of millions of new rows to add. About 5% of them are totally new rows, and 95% are updates which have to add to some columns in an existing row. I don't know which is which, so I use:
INSERT INTO research_words
(word1,word2,origyear,cat,numbooks,numpages,numwords)
VALUES
(34272,268706,1914,1,1,1,1)
ON DUPLICATE KEY UPDATE
numbooks=numbooks+1,numpages=numpages+1,numwords=numwords+1
This is an InnoDB table where the primary key is over word1,word2,origyear,cat.
The issue I'm having is that I have to insert the new rows each day, and it's taking longer than 24 hours to insert each day's rows! Obviously I can't have it take longer than a day to insert the rows for the day. I have to find a way to make the inserts faster.
For other tables I've had great success with ALTER TABLE ... DISABLE KEYS; and LOAD DATA INFILE, which allows me to add billions of rows in less than an hour. That would be great, except that unfortunately I am incrementing columns in this table. I doubt disabling the keys would help either, because surely it needs them to check whether the row exists in order to add it.
My scripts are in PHP but when I add the rows I do so by an exec call directly to MySQL and pass it a text file of commands, instead of sending them with PHP, since it's faster this way.
Any ideas to fix the speed issue here?
Old question, but perhaps worth an answer all the same.
Part of the issue stems from the large number of inserts being run essentially one at a time, with a unique index update after each one.
In these cases, a better technique might be to take n rows to insert and put them in a temp table, left join them to the destination table, calculate their new values (in OP's situation IFNULL(dest.numpages+1,1) etc.), and then run two further commands: an insert for the rows whose computed fields are 1 and an update for the rows where they are greater. The updates don't require an index refresh, so they run much faster; the inserts don't need the ON DUPLICATE KEY logic.
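One way to sketch the staged approach described above is shown here. It is a variant rather than the answerer's exact recipe: instead of precomputing values with IFNULL, it joins the staging table to the destination twice. The `staging` table name and all column types are assumptions; only the column names and the key come from the question.

```sql
-- Staging table; column types are guesses, the key mirrors the question's primary key.
CREATE TEMPORARY TABLE staging (
  word1    INT      NOT NULL,
  word2    INT      NOT NULL,
  origyear SMALLINT NOT NULL,
  cat      TINYINT  NOT NULL,
  numbooks INT      NOT NULL,
  numpages INT      NOT NULL,
  numwords INT      NOT NULL,
  PRIMARY KEY (word1, word2, origyear, cat)
);

-- Load one batch. Keys repeated inside the batch are summed here, on the
-- small table, so each key reaches the big table only once.
INSERT INTO staging VALUES
  (34272, 268706, 1914, 1, 1, 1, 1),
  (34272, 268706, 1914, 1, 1, 1, 1),
  (34272, 268707, 1914, 1, 1, 1, 1)
ON DUPLICATE KEY UPDATE
  numbooks = numbooks + VALUES(numbooks),
  numpages = numpages + VALUES(numpages),
  numwords = numwords + VALUES(numwords);

-- Step 1: add the staged counts to rows that already exist.
UPDATE research_words AS d
JOIN staging AS s
  ON  d.word1 = s.word1 AND d.word2 = s.word2
  AND d.origyear = s.origyear AND d.cat = s.cat
SET d.numbooks = d.numbooks + s.numbooks,
    d.numpages = d.numpages + s.numpages,
    d.numwords = d.numwords + s.numwords;

-- Step 2: insert the keys that are still missing from the destination.
INSERT INTO research_words
  (word1, word2, origyear, cat, numbooks, numpages, numwords)
SELECT s.word1, s.word2, s.origyear, s.cat, s.numbooks, s.numpages, s.numwords
FROM staging AS s
LEFT JOIN research_words AS d
  ON  d.word1 = s.word1 AND d.word2 = s.word2
  AND d.origyear = s.origyear AND d.cat = s.cat
WHERE d.word1 IS NULL;

DROP TEMPORARY TABLE staging;
```

The update runs before the insert so that rows added in step 2 are not incremented a second time by step 1. Whether this beats the single-statement form on a given data set would need to be measured.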

INSERT INTO ... ON DUPLICATE KEY UPDATE xx - Does it also update the indexes of the table?

I have a table with 30M+ rows, and each index update is expensive.
I sometimes have to update and/or add 5000+ rows in a single insert.
Sometimes all rows are new, sometimes some are new.
I cannot use update - since I don't know which is already in the table, so I use INSERT .. ON DUPLICATE KEY UPDATE for a single column.
This sometimes takes a long time (>5 sec).
Is there a better way to do it? Maybe I did not explain myself clearly enough :)
Are you issuing 5000+ separate insert statements? If so, lock the table while doing the inserts; it'll go a lot faster.
I added BEGIN TRANSACTION and COMMIT around the inserts, and it improved performance by 4x to 10x.
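Both suggestions above can be sketched as follows. The table `kv` is hypothetical; in MySQL a transaction is opened with START TRANSACTION (or plain BEGIN), and the transaction variant assumes a transactional engine such as InnoDB.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE kv (
  id  INT         NOT NULL PRIMARY KEY,
  val VARCHAR(64) NOT NULL
);

-- Variant 1: one transaction around the whole batch.
-- With autocommit, every statement is committed (and flushed) on its own;
-- here that cost is paid once, at COMMIT.
START TRANSACTION;
INSERT INTO kv (id, val) VALUES (1, 'a') ON DUPLICATE KEY UPDATE val = VALUES(val);
INSERT INTO kv (id, val) VALUES (2, 'b') ON DUPLICATE KEY UPDATE val = VALUES(val);
-- ... remaining statements of the batch ...
COMMIT;

-- Variant 2: a table lock around the batch, the usual choice for MyISAM,
-- where the index buffer is then flushed once instead of after every statement.
LOCK TABLES kv WRITE;
INSERT INTO kv (id, val) VALUES (3, 'c') ON DUPLICATE KEY UPDATE val = VALUES(val);
INSERT INTO kv (id, val) VALUES (4, 'd') ON DUPLICATE KEY UPDATE val = VALUES(val);
-- ... remaining statements of the batch ...
UNLOCK TABLES;
```

Combining the rows into a single multi-row INSERT is a further option where the statements all have the same shape.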

What's the difference between delete from table_a and truncate table table_a in MySQL?

What's the difference between DELETE FROM table_a and TRUNCATE TABLE table_a in MySQL?
Truncate is much faster
Truncate resets autoincrements
Truncate is not transaction safe - it will autocommit
Delete doesn't have to remove all rows
Truncate Documentation
Delete Documentation
Delete allows you to use a WHERE clause so only certain rows are deleted. Truncate will remove all rows as well as resetting any auto_increment columns you may have.
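The differences listed so far can be tried out with a short script. This is a sketch only: the column definitions for `table_a` are invented, and the rollback part assumes InnoDB.

```sql
-- Hypothetical definition of table_a.
CREATE TABLE table_a (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  v  INT          NOT NULL
) ENGINE=InnoDB;

INSERT INTO table_a (v) VALUES (10), (20), (30);

-- DELETE accepts a WHERE clause and can be rolled back inside a transaction.
START TRANSACTION;
DELETE FROM table_a WHERE v >= 20;
ROLLBACK;
SELECT COUNT(*) FROM table_a;   -- the rows deleted above are back

-- TRUNCATE removes every row, takes no WHERE clause,
-- and causes an implicit commit, so it cannot be rolled back.
TRUNCATE TABLE table_a;

-- The AUTO_INCREMENT counter was reset by TRUNCATE,
-- so numbering starts over with the next insert.
INSERT INTO table_a (v) VALUES (40);
SELECT id, v FROM table_a;
```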
Truncate:
Works by deallocating all the data pages in the table.
Will delete all data - you cannot restrict it with a WHERE clause.
Deletions are not logged.
Triggers are not fired.
Cannot be used if any foreign keys reference the table.
Resets auto id counters.
Faster.
Delete:
Works by deleting row by row.
Can be restricted with a WHERE clause.
Deletions are logged in the transaction log (if you have logging on obviously) so the delete can be recovered if necessary (depending on your logging settings).
Triggers are fired.
Can be used on tables with foreign keys (dependent on your key cascade settings).
Slower.
Truncate resets the id count, if it auto increments
Not sure about MySQL (which is why my answer is wiki) but I can tell you one difference in at least one other DBMS. The delete from command honors transactions (allowing rollback) and triggers, making it a much slower way of clearing out a table. Truncate, on the other hand, just annihilates the rows without the possibility of rollback and without running any triggers on the deleted rows.
If there are no foreign keys, truncate table drops & recreates the table, which is much faster than deleting individual rows.
A truncate also resets any auto increment counters.
ON DELETE triggers do not fire when a table is truncated.
More information can be found in the MySQL online documentation.
Data deleted with DELETE is stored in a temporary table, so you can use a transaction with the DELETE command, whereas TRUNCATE deletes the data directly, so you cannot recover it with a transaction (rollback).
This is why truncate is faster than delete.
Delete allows a WHERE clause, whereas Truncate doesn't.
Delete:
Deletes rows and the space allocated by MySQL.
Data can be rolled back.
Can be used with a WHERE clause.
Truncate:
Similar to delete, but you can't roll the data back and you can't use a WHERE clause with it.