I am deleting rows on the order of hundreds of thousands from a remote DB. Each delete has its own target, e.g.
DELETE FROM tablename
WHERE (col1=c1val1 AND col2=c2val1) OR (col1=c1val2 AND col2=c2val2) OR ...
This has been almost twice as fast for me as individual queries, but I was wondering if there's a way to speed it up further, as I haven't been working with SQL very long.
Create a temporary table and fill it with all your value pairs, one per row. Name the columns the same as the matching columns in your table.
CREATE TEMPORARY TABLE donotwant (
col1 INT NOT NULL,
col2 INT NOT NULL,
PRIMARY KEY (col1, col2)
);
INSERT INTO donotwant VALUES (c1val1, c2val1), (c1val2, c2val2), ...
Then execute a multi-table delete based on the JOIN between these tables:
DELETE t1 FROM `tablename` AS t1 JOIN `donotwant` USING (col1, col2);
The USING (col1, col2) clause is shorthand for ON t1.col1 = donotwant.col1 AND t1.col2 = donotwant.col2. It works because the columns are named the same in both tables and the join condition requires each column to equal its namesake in the joined table.
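For reference, the same statement written with an explicit ON clause is equivalent:
DELETE t1
FROM `tablename` AS t1
JOIN `donotwant` AS d
  ON t1.col1 = d.col1
 AND t1.col2 = d.col2;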
Generally speaking, the fastest way to do bulk DELETEs is to put the ids to be deleted into a temp table of some sort, then use that as part of the query:
DELETE FROM table
WHERE (col1, col2) IN (SELECT col1, col2
FROM tmp)
Inserting can be done via a standard:
INSERT INTO tmp VALUES (...), (...), ...;
statement, or by using the DB's bulk-load utility.
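In MySQL the bulk-load utility is LOAD DATA INFILE. A minimal sketch, assuming the value pairs live in a client-side CSV file named pairs.csv (hypothetical name) and that local_infile is enabled on both client and server:
-- Load (col1, col2) pairs from the CSV into the temp table.
LOAD DATA LOCAL INFILE 'pairs.csv'
INTO TABLE tmp
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(col1, col2);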
I doubt it makes much difference to performance, but you can write that kind of thing this way:
DELETE
FROM table
WHERE (col1, col2) IN (('c1val1','c2val1'), ('c1val2','c2val2'), ...);
Related
I have two tables, say, table_1 and table_2, with the same columns; for example, they both have col1, col2, ..., col100.
Now, I want to replace all the content in table_1 with the content of table_2. Note that table_1 must remain in place because it is queried by an external user.
Which is the best practice: replace, drop, update, or append? And how do I implement it in MySQL?
You can use INSERT INTO ... SELECT.
Step 1: TRUNCATE TABLE table_name;
Step 2:
INSERT INTO table_name (column names)
SELECT column names
FROM another_table
WHERE condition;
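Applied to the example tables, a minimal sketch (showing only the first three of the hundred columns; list all of them in practice):
-- Step 1: empty the target. Note TRUNCATE commits immediately, so the
-- external user can observe an empty table_1 until step 2 finishes.
TRUNCATE TABLE table_1;

-- Step 2: copy everything over from table_2.
INSERT INTO table_1 (col1, col2, col3)
SELECT col1, col2, col3
FROM table_2;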
What is the best practice to update 2 million rows data in MySQL?
Since I update by primary id, I have to update the rows one by one, which is very slow. Like below:
UPDATE table SET col1=value1 WHERE id=1;
UPDATE table SET col1=value2 WHERE id=2;
UPDATE table SET col1=value3 WHERE id=3;
...
UPDATE table SET col1=valueN WHERE id=N;
Assuming the table is InnoDB (look at the SHOW CREATE TABLE table output), one of the reasons it's slow is an insufficient innodb_buffer_pool_size.
I also assume it's in the default autocommit=1 mode, meaning each SQL statement is a single transaction.
It's generally best to avoid processes that need to change every row in a table.
You could split this into a number of threads doing updates, and that should get through the list quicker. If the table is MyISAM it won't scale this way.
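Since each statement is its own transaction under autocommit=1, a common first step is to group the updates into explicit transactions so the commit cost is paid once per batch rather than once per row. A minimal sketch (the batch size of a few thousand is arbitrary):
START TRANSACTION;
UPDATE table SET col1=value1 WHERE id=1;
UPDATE table SET col1=value2 WHERE id=2;
-- ... a few thousand more single-row updates ...
COMMIT;
-- repeat with the next batch of ids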
A good way to update many rows in one query is an INSERT statement with an ON DUPLICATE KEY UPDATE clause. The statement will update the old row if the insert would cause a duplicate value in a UNIQUE index or PRIMARY KEY. See the documentation.
INSERT INTO table (id, col1, col2, col3)
VALUES (%s, %s, %s, %s)
ON DUPLICATE KEY UPDATE
col1 = VALUES(col1),
col2 = VALUES(col2),
col3 = VALUES(col3);
To update a really big amount of data, like 2 million rows, try splitting it into several queries of several thousand rows each.
Also see the optimization tips in Optimizing INSERT Statements and Bulk Data Loading for InnoDB Tables.
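A sketch of one such batch, with hypothetical literal values standing in for the %s placeholders above:
INSERT INTO table (id, col1, col2, col3)
VALUES
  (1, 'a1', 'b1', 'c1'),
  (2, 'a2', 'b2', 'c2'),
  (3, 'a3', 'b3', 'c3')
  -- ... up to several thousand rows per statement ...
ON DUPLICATE KEY UPDATE
  col1 = VALUES(col1),
  col2 = VALUES(col2),
  col3 = VALUES(col3);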
I have a MySQL 8 RDS (InnoDB) instance that I am trying to insert into / update.
The target table contains approx 120m rows, and I am trying to insert 2.5m rows into it from a CSV. Some of the data in the source may already exist in the target table, which is constrained by a primary key; in that case the row should be updated instead.
Having done some research, I have found that the quickest way seems to be to bulk load from the source into a temporary table, then run an
insert into target_table (col1, col2)
select col1, col2 from source_table a
on duplicate key update col1 = a.col1, col2 = a.col2;
However, this seems to be taking hours.
Is there a best practice to optimise inserts of this sort?
Would it be quicker to do the inserts and the updates separately? Can I disable indexes on the target table (I know this is possible for MyISAM)?
Thanks
I have two tables: one is the main table holding data, and I want to insert data into it from another existing table that has about 13 million records. I'm using this query to insert from the other table, i.e.
insert into table1 ( column1, col2 ...) select col1, col2... from table2;
But, unfortunately, the query fails with a lock wait timeout (Error 1205).
What is the best way to do this in the least time, without hitting the timeout?
If you have a primary key on table2, then you can use that for ordering and inserting in batches:
insert into table1 ( column1, col2 ...)
select col1, col2...
from table2
order by <primary key>
limit 0, 100000
Then repeat this for additional values. (Of course, the 100,000 is arbitrary. A larger value might work. A smaller value might be necessary.)
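One way to make the repetition cheap is to key each batch off the last primary key value already copied, instead of letting the OFFSET grow (a large OFFSET forces the server to scan and discard all the skipped rows). A sketch assuming table2 has an integer primary key id:
insert into table1 (column1, col2)
select col1, col2
from table2
where id > 0          -- 0 for the first batch
order by id
limit 100000;
-- for the next batch, replace 0 with the largest id copied so far,
-- and repeat until the insert affects no rows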
Another possibility is to remove all indexes and insert triggers from table1, try the insert without them, and then add them back after the new data is in the table.
Should I include col3 & col4 in my index on MyTable if this is the only query I intend to run on my database?
Select MyTable.col3, MyTable.col4
From MyTable
Inner Join MyOtherTable
On MyTable.col1 = MyOtherTable.col1
And MyTable.col2 = MyOtherTable.col2;
The tables I'm using have about half a million rows in them. For the purposes of my question, col1 & col2 are a unique set found in both tables.
Here's the example table definition if you really need to know:
CREATE TABLE MyTable
(col1 varchar(10), col2 varchar(10), col3 varchar(10), col4 varchar(10));
CREATE TABLE MyOtherTable
(col1 varchar(10), col2 varchar(10));
So, should it be this?
CREATE INDEX MyIdx ON MyTable (col1, col2);
Or this?
CREATE INDEX MyIdx ON MyTable (col1, col2, col3, col4);
Adding columns col3 and col4 will not help, because you're just pulling those values after finding the rows using columns col1 and col2. The speed would normally come from making sure columns col1 and col2 are indexed.
You should actually split those indexes since you're not using them together:
CREATE INDEX MyIdx1 ON MyTable (col1);
CREATE INDEX MyIdx2 ON MyTable (col2);
I don't think a combined index will help you in this case.
CORRECTION: I think I misspoke, since you intend to use only that query on the two tables and never join the individual columns in isolation. In your case it appears you could get some speedup by putting them together. It would be interesting to benchmark this to see just how much of a speedup you'd get on half a million rows with a combined index versus individual ones. (You should still not include columns col3 and col4 in the index, since you're not joining anything by them.)
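A hypothetical benchmark sketch for that comparison: create the combined index, time the query, then swap in the single-column indexes and time it again.
CREATE INDEX MyIdxCombined ON MyTable (col1, col2);
-- run the join query and note the timing; EXPLAIN shows the index chosen

DROP INDEX MyIdxCombined ON MyTable;
CREATE INDEX MyIdxCol1 ON MyTable (col1);
CREATE INDEX MyIdxCol2 ON MyTable (col2);
-- run the join query again and compare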
A query returning half a million rows joined from two tables is never going to be very fast - because it's returning half a million rows.
An index on col1,col2 seems sufficient (as a secondary index), but depending on what other columns you have, adding (col3,col4) might make it a covering index.
In InnoDB it might be worth making the primary key (col1, col2); the table is then clustered on it, which is something of a win.
But once again, if your query joins 500,000 rows with no other WHERE clause and returns 500,000 rows, it's not going to be fast, because it needs to fetch all of the rows in order to return them.
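A sketch of the clustering suggestion above. Note that MySQL implicitly makes primary-key columns NOT NULL, and the statement fails if either column already contains NULLs or duplicate pairs:
ALTER TABLE MyTable
  ADD PRIMARY KEY (col1, col2);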
I don't think anyone else mentioned it, so I'm adding that you should have a compound (col1,col2) index on both tables:
CREATE INDEX MyIdx ON MyTable (col1, col2);
CREATE INDEX MyOtherIdx ON MyOtherTable (col1, col2);
And another point. An index on (col1,col2,col3,col4) will be helpful if you ever need to use a DISTINCT variation of your query:
Select DISTINCT
MyTable.col3, MyTable.col4
From MyTable
Inner Join MyOtherTable
On MyTable.col1 = MyOtherTable.col1
And MyTable.col2 = MyOtherTable.col2;