Millions of MySQL Insert On Duplicate Key Update - very slow

I have a table called research_words which has a few hundred million rows.
Every day I have tens of millions of new rows to add. About 5% of them are genuinely new rows, while the other 95% are updates that increment some columns of an existing row. I can't tell in advance which is which, so I use:
INSERT INTO research_words
(word1,word2,origyear,cat,numbooks,numpages,numwords)
VALUES
(34272,268706,1914,1,1,1,1)
ON DUPLICATE KEY UPDATE
numbooks=numbooks+1,numpages=numpages+1,numwords=numwords+1
This is an InnoDB table where the primary key covers (word1, word2, origyear, cat).
The issue I'm having is that inserting each day's rows is taking longer than 24 hours! Obviously I can't have it take more than a day to insert a day's worth of rows, so I have to find a way to make the inserts faster.
For other tables I've had great success with ALTER TABLE ... DISABLE KEYS; and LOAD DATA INFILE, which let me add billions of rows in under an hour. That would be great here, except that unfortunately I'm incrementing columns in this table. I doubt disabling the keys would help either, because the unique key is surely needed to check whether a row already exists before adding it.
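For reference, that fast-load pattern looks roughly like this (table name, file path, and delimiters are placeholders; note that DISABLE KEYS only affects nonunique indexes on MyISAM tables, so it wouldn't bypass this InnoDB table's primary-key check anyway):
ALTER TABLE other_table DISABLE KEYS;
LOAD DATA INFILE '/path/to/rows.txt' INTO TABLE other_table
    FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
ALTER TABLE other_table ENABLE KEYS;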
My scripts are in PHP, but when I add the rows I do so with an exec call directly to MySQL, passing it a text file of commands, since that's faster than sending them through PHP.
Any ideas to fix the speed issue here?

Old question, but perhaps worth an answer all the same.
Part of the issue stems from the large number of inserts being run essentially one at a time, with a unique-index update after each one.
In these instances, a better technique can be to select n rows to insert into a temp table, LEFT JOIN them to the destination table, and calculate their new values (in the OP's situation, IFNULL(dest.numpages+1,1) etc.). Then run two further commands: an INSERT for the rows whose computed fields are 1 (meaning no existing row was found) and an UPDATE for the rows where they're greater. The updates don't require a unique-index check, so they run much faster, and the inserts don't require the ON DUPLICATE KEY logic.
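A minimal sketch of that approach against the OP's table (the staging table batch and the join conditions are my assumptions; it also assumes existing counters are at least 1, so a computed value of 1 means "no match", and that batch holds one row per key):

CREATE TEMPORARY TABLE batch LIKE research_words;
-- ... load the day's rows into batch here, e.g. with LOAD DATA INFILE ...

-- Compute the new counter values by left-joining the batch to the destination
CREATE TEMPORARY TABLE merged AS
SELECT b.word1, b.word2, b.origyear, b.cat,
       IFNULL(d.numbooks + 1, 1) AS numbooks,
       IFNULL(d.numpages + 1, 1) AS numpages,
       IFNULL(d.numwords + 1, 1) AS numwords
FROM batch b
LEFT JOIN research_words d
       ON d.word1 = b.word1 AND d.word2 = b.word2
      AND d.origyear = b.origyear AND d.cat = b.cat;

-- Brand-new rows: counters computed as 1 because no match was found
INSERT INTO research_words
       (word1, word2, origyear, cat, numbooks, numpages, numwords)
SELECT word1, word2, origyear, cat, numbooks, numpages, numwords
FROM merged WHERE numbooks = 1;

-- Existing rows: a plain UPDATE, with no ON DUPLICATE KEY check needed
UPDATE research_words d
JOIN merged m
       ON d.word1 = m.word1 AND d.word2 = m.word2
      AND d.origyear = m.origyear AND d.cat = m.cat
SET d.numbooks = m.numbooks,
    d.numpages = m.numpages,
    d.numwords = m.numwords
WHERE m.numbooks > 1;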

Related

mysql too many index keys

I'm working on database optimization where a bulk insert from a .csv file (around 3,800 records) runs at an interval of every 15 minutes.
For this, I run a mis.sql file through cron. The file contains nine MySQL queries that perform duplicate removal on the table targeted by the bulk insert, plus inner-join inserts, deletes, and updates (ALTER, DELETE, INSERT & UPDATE).
Recently, a problem has appeared with a query that runs just prior to the bulk insert. The query is:
ALTER IGNORE TABLE pb ADD UNIQUE INDEX(hn, time);
ERROR 1069 (42000): Too many keys specified; max 64 keys allowed
When that error is encountered, all the subsequent queries are skipped. I then checked table pb and found 64 unique indexes with the same cardinality, along with two plain indexes and one primary key.
Trying to remove one of the unique indexes takes a very long time (almost 15 minutes for 979,618 records), yet in the end the index is still not removed.
Is there any solution to this problem?
The first thing: why is there an ALTER TABLE command at all? New data should change the data, not the database design. So while INSERT, UPDATE, and DELETE are valid operations in such a script, ALTER TABLE doesn't belong there. Remove it.
As to deleting the index: that should take only a fraction of a second. There is nothing to build or rebuild; the index is simply removed.
DROP INDEX index_name ON tbl_name;
The only reason I can think of for this taking so long is that there is never even a short time slice in which no inserts, updates, or deletes take place. So maybe you'll have to stop your job for a moment (or run it on an empty file), drop all those unnecessary indexes (keep only one), and start your job again.
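For example (the index names here are assumptions; MySQL auto-names duplicate indexes on the same leading column hn, hn_2, hn_3, and so on):

-- See what's there:
SHOW INDEX FROM pb;
-- Drop each redundant unique index by name, keeping a single one on (hn, time):
DROP INDEX hn_2 ON pb;
DROP INDEX hn_3 ON pb;
-- ... and so on for the rest.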

how many records can be deleted using a single transaction in mysql innodb

I want to delete old records from 10 related tables every 6 months, using primary keys and foreign keys. I am planning to do it in a single transaction block, because in case of any failure I have to roll back the changes. My queries will be something like this:
DELETE FROM PARENT_TABLE WHERE PARENT_ID IN (1, 2, 3,etc);
DELETE FROM CHILD_TABLE1 WHERE PARENT_ID IN (1, 2, 3,etc);
The records to delete will number around 1 million. Is it safe to delete all of these in a single transaction? How will the performance be?
Edit
To be clearer about my question, I will detail my execution plan.
I first retrieve the primary keys of all the records in the parent table that have to be deleted and store them in a temporary table.
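A sketch of that first step (the CREATED_AT cutoff column is my assumption; use whatever marks your records as old):

CREATE TEMPORARY TABLE TEMP_ID_TABLE AS
SELECT PARENT_ID
FROM PARENT_TABLE
WHERE CREATED_AT < NOW() - INTERVAL 6 MONTH;

Then the deletes run inside one transaction: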
START TRANSACTION;
DELETE FROM CHILD_ONE WHERE PARENT_ID IN (SELECT * FROM TEMP_ID_TABLE);
DELETE FROM CHILD_TWO WHERE PARENT_ID IN (SELECT * FROM TEMP_ID_TABLE);
DELETE FROM PARENT_TABLE WHERE PARENT_ID IN (SELECT * FROM TEMP_ID_TABLE);
COMMIT;
ROLLBACK on any failure.
Given that I can have around a million records to delete from all these tables, is it safe to put everything inside a single transaction block?
You can probably succeed. But it is not wise. Something random (e.g., a network glitch) could come along and cause that huge transaction to abort. You might be blocking other activity for a long time. Etc.
Are the "old" records everything older than date X? If so, it would be much more efficient to make use of PARTITIONing for DROPping old rows. We can discuss the details. Oops, you have FOREIGN KEYs, which are incompatible with PARTITIONing. Do all the tables have FKs?
Why do you wait 6 months before doing the delete? Deleting 6K rows a day would have the same effect and be much less invasive and risky.
IN ( SELECT ... ) has terrible performance; use a JOIN instead (see the sketch after these points).
If some of the tables are just normalizations, why bother deleting from them?
Would it work to delete 100 ids per transaction? That would be much safer and less invasive.
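To illustrate the JOIN rewrite with the OP's tables (a sketch; the same pattern applies to each child table):

-- Instead of: DELETE FROM CHILD_ONE WHERE PARENT_ID IN (SELECT * FROM TEMP_ID_TABLE);
DELETE c
FROM CHILD_ONE AS c
JOIN TEMP_ID_TABLE AS t ON t.PARENT_ID = c.PARENT_ID;

DELETE p
FROM PARENT_TABLE AS p
JOIN TEMP_ID_TABLE AS t ON t.PARENT_ID = p.PARENT_ID;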
First of all: create a proper backup AND test it before you start to delete the records.
How many records you can delete at once mostly depends on the configuration (hardware) of your database server. You have to test how many records can be deleted on that specific server without problems. Start with, e.g., 1,000 records, then increase the amount with each iteration for as long as performance remains acceptable. If you have replication, the setup and the slave's performance affect the row count too (too many write requests can cause serious replication delay).
One piece of advice: if possible, remove all foreign keys and indexes (except the primary key and the indexes needed by your WHERE clauses) before you start the delete.
Edit:
If the number of records to be deleted is larger than the number that will remain, consider just copying the surviving records into a new table and then renaming the old and new tables. For the first step, copy the structure of the table using a CREATE TABLE .. LIKE statement, then drop all unnecessary indexes and constraints, copy the records, re-add the indexes, and rename the tables. (Copy the latest new records from the original table into the copy if necessary.) Then you can drop the old table.
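A sketch of that sequence, using the table names from the question (which indexes to drop and re-add depends on your schema):

CREATE TABLE PARENT_TABLE_NEW LIKE PARENT_TABLE;
-- ... drop unnecessary secondary indexes and constraints on PARENT_TABLE_NEW here ...

-- Copy only the rows that should survive (an anti-join against the IDs to delete):
INSERT INTO PARENT_TABLE_NEW
SELECT p.*
FROM PARENT_TABLE AS p
LEFT JOIN TEMP_ID_TABLE AS t ON t.PARENT_ID = p.PARENT_ID
WHERE t.PARENT_ID IS NULL;

-- ... re-add the indexes, then swap the tables:
RENAME TABLE PARENT_TABLE TO PARENT_TABLE_OLD,
             PARENT_TABLE_NEW TO PARENT_TABLE;
DROP TABLE PARENT_TABLE_OLD;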
What I believe: first move the data to another database, then use a single transaction to delete from all 10 tables, which makes it safe to roll back immediately. Delete the data from the live database at a time when user interaction is at its lowest.

MySQL Delete Performance

I'm deleting rows using a crontab entry set to run every hour. For performance and less fragmentation, what is the best way to do this?
Also, should I run optimize table after the delete has finished?
The answer will depend on your data and how many rows you're deleting at a time.
If possible, delete the rows with a single query (rather than one query per row). For example:
DELETE FROM my_table WHERE status = 'rejected';
If possible, use an indexed column in your WHERE clause. This will help it select the rows that need to be deleted without doing a full table scan.
If you want to delete all the data, use TRUNCATE TABLE.
If deleting the data with a single query is causing performance problems, you could try limiting how many rows it deletes (by adding a LIMIT clause) and running the delete process more frequently, as sketched below. This would spread the deletes out over time.
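A sketch of that batched approach (table, column, and batch size are placeholders):

-- Delete at most 1000 matching rows per run; the hourly cron job picks up the rest
DELETE FROM my_table
WHERE status = 'rejected'
LIMIT 1000;

Repeat until the statement reports 0 rows affected.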
Per the documentation, OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns).
Optimizing the table can be very expensive. If you can, try deleting your data and optimizing the table once per day (at night). This will limit any impact to your users.

MySQL/ASP - Delete Duplicate Rows

I have a table with 100,000 rows called 'photoSearch'. When transferring the data from other tables (that took bloody ages and I was bloody tired), I accidentally forgot to remove the test transfer I'd done, which had left 3500 rows in the table before I transferred everything over in one go.
The ID column is 'photoID' (INT), and I need to remove all duplicates that have a photoID of less than 6849. If I could just remove the duplicates, it would be far less painful than deleting the table and starting another transfer.
Has anybody got any suggestions on the most practical and safest way to do this?
UPDATE:
I actually answered my own question. I backed up my table for safety, and then I ran this:
ALTER IGNORE TABLE photoSearch ADD UNIQUE INDEX unique_id_index (photoID);
This removed all 3500 duplicates in under a minute :)
Traditional method
Back up your existing table photoSearch to something like tmp_photoSearch using:
create table tmp_photoSearch select * from photoSearch;
After that, you can massage the data in tmp_photoSearch.
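For example, if the duplicate rows are identical copies of each other, the massage can be folded into the copy itself (a sketch; an alternative to the plain copy above):

CREATE TABLE tmp_photoSearch LIKE photoSearch;  -- same structure, no rows
INSERT INTO tmp_photoSearch
SELECT DISTINCT * FROM photoSearch;             -- collapses exact duplicates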
Once you have the results you expect, swap the tables:
rename table photoSearch to photoSearch_backup, tmp_photoSearch to photoSearch;
To increase insert speed (if the bottleneck is not network transfer), see:
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
To increase performance for MyISAM tables, for both LOAD DATA INFILE and INSERT, enlarge the key cache by increasing the key_buffer_size system variable.
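For example (the 256 MB figure is purely illustrative; size it to your server's RAM):

SHOW VARIABLES LIKE 'key_buffer_size';
SET GLOBAL key_buffer_size = 268435456;  -- 256 MB; requires the SUPER privilege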

MySQL performance DELETE or UPDATE?

I have a MyISAM table with more than 10^7 rows. When adding data to it, I have to update ~10 rows at the end. Is it faster to delete them and then insert the new ones, or is it faster to update those rows in place? The data that should be updated is not part of the index. What about index/data fragmentation?
UPDATE is by far the faster option.
When you UPDATE, the table records are simply rewritten with new data.
When you DELETE, the indexes must be updated (remember, you delete the whole row, not only the columns you need to modify) and data blocks may be moved (if you hit the PCTFREE limit).
And all of this must be done again on INSERT.
That's why you should always use
INSERT ... ON DUPLICATE KEY UPDATE
instead of REPLACE.
The former performs an UPDATE in case of a key violation, while the latter performs a DELETE followed by an INSERT.
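To make the difference concrete (a hypothetical counters table with a unique key on id):

-- REPLACE deletes the existing row, then inserts a fresh one:
REPLACE INTO counters (id, hits) VALUES (1, 1);

-- INSERT ... ON DUPLICATE KEY UPDATE updates the existing row in place:
INSERT INTO counters (id, hits) VALUES (1, 1)
ON DUPLICATE KEY UPDATE hits = hits + 1;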
It is faster to update. You can also use INSERT ... ON DUPLICATE KEY UPDATE:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
For more details, read the UPDATE documentation.
Rather than deleting or updating data for the sake of performance, I would consider partitioning.
http://dev.mysql.com/doc/refman/5.1/en/partitioning-range.html
This will allow you to retain the data historically and not degrade performance.
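A sketch of range partitioning by year (table and partition names are made up; note that the partitioning column must be part of every unique key):

CREATE TABLE events (
    id INT NOT NULL,
    created DATE NOT NULL,
    PRIMARY KEY (id, created)
)
PARTITION BY RANGE (YEAR(created)) (
    PARTITION p2010 VALUES LESS THAN (2011),
    PARTITION p2011 VALUES LESS THAN (2012),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Old data can then be removed almost instantly, without a DELETE:
ALTER TABLE events DROP PARTITION p2010;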
Logically, DELETE+INSERT = 2 actions, while UPDATE = 1 action. Also, deleting and re-inserting changes the IDs that records get from auto_increment, so if those records have relationships, those would be broken or would need updates too. I'd go for UPDATE.
An UPDATE with WHERE column = 'something' should use an index as long as the search criterion is in the index (whether it's a seek or a scan is a completely different issue).
If you run these updates a lot but don't have an index on the criterion column, I would recommend creating an index on that column; it should help speed things up.
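A sketch (table and column names are made up):

-- Index the column used in the WHERE clause so the UPDATE can avoid a table scan:
CREATE INDEX idx_my_table_status ON my_table (status);

UPDATE my_table
SET attempts = attempts + 1
WHERE status = 'pending';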