I have a MySQL 8 RDS (InnoDB) instance that I am trying to insert into and update.
The target table contains approximately 120m rows, and I am trying to insert 2.5m rows into it from a CSV. Some of the data in the source may already exist in the target table, which is constrained by a primary key, in which case it should be updated.
Having done some research, I have found that the quickest way seems to be to do a bulk load from the source file into a temporary table, then an:
INSERT INTO target_table
SELECT col1, col2 FROM source_table a
ON DUPLICATE KEY UPDATE col1 = a.col1, col2 = a.col2;
However, this seems to be taking hours.
Is there a best practice for optimising inserts of this sort?
Would it be quicker to separate the operation into inserts and updates? Can I disable indexes on the target table (I know this is possible for MyISAM)?
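For reference, the bulk-load step looks roughly like this (file path, table, and column names are placeholders):
CREATE TEMPORARY TABLE staging LIKE target_table;
-- Load the CSV into the staging table; adjust the format clauses to the file.
LOAD DATA LOCAL INFILE '/path/to/data.csv'
INTO TABLE staging
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(col1, col2);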
Thanks
What is the best practice for updating 2 million rows of data in MySQL?
Since I update by primary id, I have to update the rows one by one, which is very slow. Like below:
UPDATE table SET col1=value1 WHERE id=1;
UPDATE table SET col1=value2 WHERE id=2;
UPDATE table SET col1=value3 WHERE id=3;
...
UPDATE table SET col1=valueN WHERE id=N;
Assuming the table is InnoDB (look at the SHOW CREATE TABLE output), one of the reasons it's slow is an insufficient innodb_buffer_pool_size.
I also assume it's running in the default autocommit=1 mode, meaning each SQL statement is a single transaction.
It's generally best to avoid processes that need to change every row in a table.
You could split this into a number of threads doing updates, and that should get through the list quicker. If the table is MyISAM it won't scale this way.
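One simple mitigation is to commit in batches instead of one transaction per statement; a minimal sketch (table name, columns, and values are illustrative):
SET autocommit = 0;
UPDATE mytable SET col1 = value1 WHERE id = 1;
UPDATE mytable SET col1 = value2 WHERE id = 2;
-- ... a few thousand more updates in this batch ...
COMMIT;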
A good way to update many rows in one query is an INSERT statement with an ON DUPLICATE KEY UPDATE clause. The statement will update the old row if there is a duplicate value in a UNIQUE index or PRIMARY KEY. See the documentation.
INSERT INTO table (id, col1, col2, col3)
VALUES (%s, %s, %s, %s)
ON DUPLICATE KEY UPDATE
col1 = VALUES(col1),
col2 = VALUES(col2),
col3 = VALUES(col3);
To update a really big amount of data, like 2 million rows, try to split your data into several queries of a few thousand rows each.
Also see the optimization tips in Optimizing INSERT Statements and Bulk Data Loading for InnoDB Tables.
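As a sketch of such batching, assuming the new data has already been loaded into a staging table first, you can chunk by primary key range (table and column names are illustrative):
INSERT INTO target_table (id, col1, col2, col3)
SELECT id, col1, col2, col3
FROM staging
WHERE id BETWEEN 1 AND 10000  -- advance this range for each batch
ON DUPLICATE KEY UPDATE
    col1 = VALUES(col1),
    col2 = VALUES(col2),
    col3 = VALUES(col3);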
I need to rename a MySQL table and create a new MySQL table at the same time.
There is a critical live table with a large number of records. Records are constantly being inserted into master_table by scripts.
I need to back up the master table and create another master table with the same name at the same time.
The general SQL is like this:
RENAME TABLE master_table TO backup_table;
CREATE TABLE master_table (id INT AUTO_INCREMENT PRIMARY KEY, value INT);
Is there a possibility of records going missing during the execution of the above queries?
Is there any way to avoid missing records? Locking the master table, etc.?
What I do is the following. It results in no downtime, no data loss, and nearly instantaneous execution.
CREATE TABLE mytable_new LIKE mytable;
...possibly update the AUTO_INCREMENT of the new table...
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;
By renaming both tables in one statement, they are swapped atomically. There is no chance for any data to be written "in between" while there is no table to receive the write. If you don't do this atomically, some writes may fail.
RENAME TABLE is virtually instantaneous, no matter how large the table. You don't have to wait for data to be copied.
If the table has an auto-increment primary key, I like to make sure the new table starts with an id value greater than the current id in the old table. Do this before swapping the table names.
SELECT AUTO_INCREMENT FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA='mydatabase' AND TABLE_NAME='mytable';
I like to add a comfortable margin to that value. You want to make sure that the id values inserted into the old table won't exceed the value you queried from INFORMATION_SCHEMA.
Change the new table to use this new value for its next auto-increment:
ALTER TABLE mytable_new AUTO_INCREMENT=<increased value>;
Then promptly execute the RENAME TABLE to swap them. As soon as new rows are inserted to the new, empty table, it will use id values starting with the increased auto-increment value, which should still be greater than the last id inserted into the old table, if you did these steps promptly.
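Putting the steps together, the full sequence is roughly this (the database name, table names, and margin value are placeholders):
-- 1. Create an empty copy with the same structure.
CREATE TABLE mytable_new LIKE mytable;
-- 2. Check the live table's current auto-increment value.
SELECT AUTO_INCREMENT FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA='mydatabase' AND TABLE_NAME='mytable';
-- 3. Start the new table comfortably above it.
ALTER TABLE mytable_new AUTO_INCREMENT = 1500000;  -- queried value plus margin
-- 4. Swap the tables atomically.
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;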
Instead of renaming the master_table and recreating it, you could
just create a backup_table with the data from the master_table for the first backup run.
CREATE TABLE backup_table AS
SELECT * FROM master_table;
If you must add a primary key to the backup table then run this just once, that is for the first backup:
ALTER TABLE backup_table ADD CONSTRAINT pk_backup_table PRIMARY KEY(id);
For future backups do:
INSERT INTO backup_table
SELECT * FROM master_table;
Then you can delete from the master_table all the rows that are already in the backup_table:
DELETE A FROM master_table A
JOIN backup_table B ON A.id = B.id;
Then you can add data to the master_table with this query:
INSERT INTO master_table (`value`) VALUES ('5000'); -- I assume the id field is auto_incrementable
I think this should work perfectly even without locking the master table, and with no missed records.
I'm trying to add a FULLTEXT index to my table. The table contains 3 million records, and it was very difficult to add that index using an ALTER TABLE or CREATE INDEX statement. Therefore the easiest way seems to be to create a new table, add the index first, and then load the data. How can I load the existing table's data into the newly created table? I'm using the XAMPP MySQL database.
I don't know why creating a full text index on an existing table would be difficult. You just do:
create fulltext index idx_table_col on table(col)
Usually, it is faster to add indexes to already loaded tables than to load data into an empty table that has indexes pre-defined.
EDIT:
You can do the load by using insert. The following will insert the first 100,000 rows:
insert into newtable
select *
from oldtable
order by id
limit 0, 100000;
You can put this in a loop (via a stored procedure in MySQL or at the application level). Perhaps this will return faster. Each time you run it, you would change the offset value in limit.
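A sketch of that loop as a stored procedure, assuming an integer primary key id and illustrative table names:
DELIMITER //
CREATE PROCEDURE copy_in_batches()
BEGIN
  DECLARE offset_val INT DEFAULT 0;
  DECLARE batch_size INT DEFAULT 100000;
  DECLARE total_rows INT;
  SELECT COUNT(*) INTO total_rows FROM oldtable;
  WHILE offset_val < total_rows DO
    -- Copy the next slice, advancing the offset each iteration.
    INSERT INTO newtable
    SELECT * FROM oldtable ORDER BY id LIMIT offset_val, batch_size;
    SET offset_val = offset_val + batch_size;
  END WHILE;
END //
DELIMITER ;
CALL copy_in_batches();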
I would expect that the overall time for creating an index would be less than using insert, but for your purposes, you might find this more convenient.
INSERT INTO newTable SELECT * FROM oldTable;
Run this after your new table and the index on it have been created.
This is assuming you want to copy all columns; you can select specific columns as well.
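For example, to copy only specific columns (names are illustrative):
INSERT INTO newTable (col1, col2)
SELECT col1, col2 FROM oldTable;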
I searched the Internet and Stack Overflow for my problem, but couldn't find a good solution.
I have a table (MySQL MyISAM) containing 300,000 rows (one column is a BLOB field).
I must use:
DELETE FROM tablename WHERE id IN (1,4,7,88,568,.......)
There are nearly 30,000 ids in the IN list.
It takes nearly 1 hour. Also, it does not make the .MYD file smaller even though I delete 10% of the rows, so I run an OPTIMIZE TABLE command, which also takes a long time. (I have to use it because disk space matters to me.)
What's a way to improve performance when deleting the data as above, and to recover the space? (Increasing a buffer size? Which one? Or something else?)
With IN, MySQL will scan all the rows in the table and match each record against the IN list. The list of IN predicates will be sorted, and all 300,000 rows in the table will get a binary search against the 30,000 ids.
If you do this with JOIN on a temporary table (no indexes on a temp table), assuming id is indexed, the database will do 30,000 binary lookups on a 300,000 record index.
So, 300,000 binary searches against 30,000 records, or 30,000 binary searches against 300,000 records... which is faster? The second one is faster, by far.
Also, delaying the index rebuilding with DELETE QUICK will result in much faster deletes. All records will simply be marked deleted, both in the data file and in the index, and the index will not be rebuilt.
Then, to recover space and rebuild the indexes at a later time, run OPTIMIZE TABLE.
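A sketch of the combined approach (table and column names are illustrative; it assumes id is indexed in tablename, as above):
-- Collect the ids to delete; per the above, no index on the temp table.
CREATE TEMPORARY TABLE ids_to_delete (id INT NOT NULL);
INSERT INTO ids_to_delete VALUES (1), (4), (7), (88), (568) /* , ... */;
-- Multi-table delete joining against the id list; QUICK skips merging
-- index leaves during the delete (MyISAM).
DELETE QUICK t FROM tablename AS t
JOIN ids_to_delete AS d ON t.id = d.id;
-- Later, reclaim space and rebuild the indexes.
OPTIMIZE TABLE tablename;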
The size of the list in your IN() statement may be the cause. You could add the IDs to a temporary table and join to do the deletes. Also, as you are using MyISAM you can use the DELETE QUICK option to avoid the index hit whilst deleting:
For MyISAM tables, if you use the QUICK keyword, the storage engine
does not merge index leaves during delete, which may speed up some
kinds of delete operations.
I think the best approach to make it faster is to create a new table, insert into it only the rows which you don't want to delete, then swap the new table in for the original and drop the old one.
Something like this:
INSERT INTO NewTable SELECT * FROM My_Table WHERE ... ;
Then you can use RENAME TABLE to rename the copy to the original name
RENAME TABLE My_Table TO My_Table_old, NewTable TO My_Table ;
And then finally drop the original table
DROP TABLE My_Table_old;
Try this:
Create a table named temptable with a single column, id:
CREATE TABLE temptable (id INT NOT NULL);
Insert your ids into it:
INSERT INTO temptable (id) VALUES (1), (4), (7), (88), (568), ...;
Then use a delete join, something like:
DELETE a FROM originaltable AS a
INNER JOIN temptable AS b ON a.id = b.id;
It's just an idea; the query is not tested, so check the syntax before running it.
I am deleting rows on the order of hundreds of thousands from a remote DB. Each delete has its own target, e.g.
DELETE FROM tablename
WHERE (col1=c1val1 AND col2=c2val1) OR (col1=c1val2 AND col2=c2val2) OR ...
This has been almost twice as fast for me as individual queries, but I was wondering if there's a way to speed this up more, as I haven't been working with SQL very long.
Create a temporary table and fill it with all your value pairs, one per row. Name the columns the same as the matching columns in your table.
CREATE TEMPORARY TABLE donotwant (
  col1 INT NOT NULL,
  col2 INT NOT NULL,
  PRIMARY KEY (col1, col2)
);
INSERT INTO donotwant VALUES (c1val1, c2val1), (c1val2, c2val2), ...;
Then execute a multi-table delete based on the JOIN between these tables:
DELETE t1 FROM `tablename` AS t1 JOIN `donotwant` USING (col1, col2);
The USING clause is shorthand for ON t1.col1=donotwant.col1 AND t1.col2=donotwant.col2, assuming the columns are named the same in both tables, and you want the join condition where both columns are equal to their namesake in the joined table.
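Written out with an explicit ON clause, the same delete is:
DELETE t1
FROM `tablename` AS t1
JOIN `donotwant`
  ON t1.col1 = donotwant.col1
 AND t1.col2 = donotwant.col2;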
Generally speaking, the fastest way to do bulk DELETEs is to put the ids to be deleted into a temp table of some sort, then use that as part of the query:
DELETE FROM table
WHERE (col1, col2) IN (SELECT col1, col2
FROM tmp)
Inserting can be done via a standard:
INSERT INTO tmp VALUES (...), (...), ...;
statement, or by using the DB's bulk-load utility.
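In MySQL, for example, the bulk-load route could be LOAD DATA (the file path and format clauses are placeholders, assuming tmp has the two columns used above):
LOAD DATA LOCAL INFILE '/path/to/pairs.csv'
INTO TABLE tmp
FIELDS TERMINATED BY ','
(col1, col2);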
I doubt it makes much difference to performance but you can write that kind of thing this way...
DELETE
FROM table
WHERE (col1, col2) IN (('c1val1','c2val1'), ('c1val2','c2val2'), ...);