How to insert 13 million records with selected columns from one table into another existing table? - mysql

I have two tables: the main table, which already holds data, and another existing table with about 13 million records that I want to insert into it. I'm using an INSERT ... SELECT query, i.e.
insert into table1 ( column1, col2 ...) select col1, col2... from table2;
Unfortunately, the query fails with a lock wait timeout (Error 1205).
What is the best way to do this in the least time without hitting the timeout?

If you have a primary key on table2, then you can use that for ordering and inserting in batches:
insert into table1 ( column1, col2 ...)
select col1, col2...
from table2
order by <primary key>
limit 0, 100000
Then repeat, advancing the offset each time until all rows are copied. (Of course, the 100,000 is arbitrary. A larger value might work. A smaller value might be necessary.)
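For instance, the second batch just advances the offset; a sketch, assuming the primary key is an integer column id and reusing the placeholder column names from the question:
insert into table1 ( column1, col2 ...)
select col1, col2...
from table2
order by id
limit 100000, 100000;
-- third batch: limit 200000, 100000; and so on, until fewer than 100,000 rows come back
Note that a large OFFSET forces MySQL to scan and discard all the preceding rows on every batch; if the key is a single auto-increment column, a range condition such as WHERE id > <last id copied> with a plain LIMIT 100000 avoids that rescan.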
Another possibility is to remove all indexes and insert triggers from table1, run the insert without them, and then add them back after the new data is in the table.
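A sketch of that sequence; the index name idx_table1_col1 and the trigger name trg_table1_ai are hypothetical stand-ins for whatever table1 actually defines:
-- drop secondary indexes and insert triggers before the bulk copy
ALTER TABLE table1 DROP INDEX idx_table1_col1;
DROP TRIGGER IF EXISTS trg_table1_ai;
INSERT INTO table1 (column1, col2)
SELECT col1, col2 FROM table2;
-- rebuild the index and recreate the trigger as originally defined
ALTER TABLE table1 ADD INDEX idx_table1_col1 (column1);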


What is the best practice for updating 2 million rows of data in MySQL?
Since I update by primary id, I have to update the rows one by one, which is very slow. Like below:
UPDATE table SET col1=value1 WHERE id=1;
UPDATE table SET col1=value2 WHERE id=2;
UPDATE table SET col1=value3 WHERE id=3;
UPDATE table SET col1=valueN WHERE id=N;
Assuming the table is InnoDB (look at the SHOW CREATE TABLE table output), one of the reasons it's slow is an insufficient innodb_buffer_pool_size.
I also assume it's running in the default autocommit=1 mode, meaning each SQL statement is a single transaction; a sketch of batching the updates into explicit transactions follows below.
It's generally best to avoid processes that need to change every row in a table.
You could split this into a number of threads doing updates, and that should get through the list quicker. If the table is MyISAM it won't scale this way.
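For illustration, a minimal sketch of grouping the single-row updates into explicit transactions, so each UPDATE no longer pays for its own commit (the batch size of a few thousand is arbitrary):
SET autocommit = 0;
UPDATE table SET col1=value1 WHERE id=1;
UPDATE table SET col1=value2 WHERE id=2;
-- ... a few thousand more single-row updates ...
COMMIT;
-- repeat for the next batch, ending with another COMMIT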
A good way to update many rows in one query is an INSERT statement with an ON DUPLICATE KEY UPDATE clause. The statement will update the old row if the insert would cause a duplicate value in a UNIQUE index or PRIMARY KEY. See the documentation.
INSERT INTO table (id, col1, col2, col3)
VALUES (%s, %s, %s, %s)
ON DUPLICATE KEY UPDATE
col1 = VALUES(col1),
col2 = VALUES(col2),
col3 = VALUES(col3);
To update a really big amount of data, like 2 million rows, try to split your data into several queries of several thousand rows each; an example chunk is sketched below.
Also note the optimization tips in Optimizing INSERT Statements and Bulk Data Loading for InnoDB Tables.
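For example, one such chunk might look like this (a sketch with made-up literal values; the %s placeholders above suggest each chunk is bound from application code):
INSERT INTO table (id, col1, col2, col3)
VALUES (1, 'a1', 'b1', 'c1'),
       (2, 'a2', 'b2', 'c2'),
       (3, 'a3', 'b3', 'c3')
ON DUPLICATE KEY UPDATE
col1 = VALUES(col1),
col2 = VALUES(col2),
col3 = VALUES(col3);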

Removing duplicate rows within a trigger in SQL

So I've made table1 and table2, plus a trigger such that when there's an insert on table1, data gets inserted from table1 into table2. I then go a step further with another trigger: after an insert on table2, it inserts data from table2 into a third table, table3. The triggers are 'FOR EACH ROW', so unfortunately, when a second insert happens on table1, it goes into table2, and table3 reads in the new, second row AND the first row again.
Ideally, to prevent this from happening or at least reduce the impact, it would make sense to remove duplicates at the start or end of the respective trigger, so the tables don't keep filling up with duplicate rows. However, I've not found a way to do this within a trigger so far. Is it even possible? Any help? The tables also have no primary or foreign keys. Thanks in advance.
An example of what I've tried so far:
DELETE FROM table2 WHERE rowid NOT IN (SELECT MIN(rowid) FROM table2 GROUP BY col1, col2, col3, ...);
Though I think this is SQLite syntax? I've seen it working for SQLite databases, whereas here I just get an error saying the column rowid isn't recognised.
I also tried WHERE NOT EXISTS during the insert, which works for not inserting duplicates in the first place. However, the trigger also needs an UPDATE that changes some column values, so this won't work here: after that update, the rows always differ from their initial insert.
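For reference, MySQL tables have no implicit rowid, so the analogous dedup needs an explicit surrogate key. A sketch, assuming table2 can take an extra AUTO_INCREMENT column (note that MySQL will not let a trigger on table2 modify table2 itself, so this has to run outside that trigger):
ALTER TABLE table2 ADD COLUMN row_id INT AUTO_INCREMENT PRIMARY KEY;
-- delete every row that has an earlier (lower row_id) duplicate
DELETE t FROM table2 t
JOIN table2 keep
ON keep.col1 = t.col1 AND keep.col2 = t.col2 AND keep.col3 = t.col3
AND keep.row_id < t.row_id;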

Renumbering auto-increment values in rows in MySQL

I have a table with an auto-incrementing primary key. Because I allow my data to be deleted, there are now numerous gaps in the primary keys. For example, if I have rows 1, 2 and 3, and row 2 gets deleted, I only have rows 1 and 3 left (meaning id 2 is now vacant).
This has proven to give me problems now that I am attempting a full data migration. So I'm wondering if it's possible to simply re-index everything - i.e. run some kind of MySQL UPDATE query that rewrites all the rows' primary keys into a smooth, consecutive sequence.
How do I go about doing this?
I think the easiest way is to create an empty clone of the table and copy all rows from your original one. That way all AUTO_INCREMENT fields are re-assigned.
1) Copy the table:
CREATE TABLE clone_table LIKE original_table;
2) Copy all rows, manually specifying all fields except your AUTO_INCREMENT one (id?) using the INSERT INTO ... SELECT ... syntax:
INSERT INTO clone_table (field1, field2, fieldN)
SELECT field1, field2, fieldN FROM original_table ORDER BY id;
3) Optionally, you can now delete the old table and rename the clone:
DROP TABLE original_table;
RENAME TABLE clone_table TO original_table;

Dropping duplicate MySQL rows based on column data

I have a table called sg with the following columns:
player_uuid, player_name, coins, kills, deaths, and wins
However, I ran into an issue that caused some duplicate rows, and some of those rows have been modified. So, I am wondering how to drop the rows with the older data. That said...
How do I drop the duplicate rows where player_uuid is the same? But I only want to drop the rows where coins, kills, deaths, or wins is smaller than in its duplicate.
Example data: http://i.stack.imgur.com/Xieod.png
In this case, I want to keep the row with 46 deaths and delete the row with 43 deaths.
Failing to come up with a single delete statement due to the way the data is structured, here are 3 delete statements instead, one per column.
The way it works is: determine, per UUID, which row is to be kept (the one holding the max value of the given column), then left join the table back on that result; rows that get no match are not to be kept, so store them in a temporary table and delete everything marked in that temporary table from the main data table (called someTable). The benefit of this approach: if you have more than 1 duplicate (3, 4, 5 rows, and so on), those will also be deleted.
CREATE TEMPORARY TABLE tempTable AS
SELECT a.player_uuid, a.kills
FROM someTable a
LEFT JOIN (SELECT MAX(kills) AS kills, player_uuid, 1 AS keepRow
           FROM someTable
           GROUP BY player_uuid
          ) b ON a.player_uuid = b.player_uuid AND a.kills = b.kills
-- rows below the player's max kills get no join match, so keepRow is NULL for them
WHERE b.keepRow IS NULL;
DELETE a.* FROM someTable a, tempTable b
WHERE a.player_uuid=b.player_uuid AND a.kills=b.kills;
Repeat for the other columns (wins, coins, deaths) by replacing all occurrences of kills with the other column name.
Always test delete code first :)
Also, while you are at it:
Add a unique index to prevent this from happening again:
CREATE UNIQUE INDEX idx_st_nn_1 ON someTable(player_uuid);
When you then try to insert a faulty record, your code will just get an error in return. The best code to handle inserts in that case would be:
INSERT INTO someTable(player_uuid,kills) VALUES ('someplayer',1000)
ON DUPLICATE KEY UPDATE kills=1000;
What also helps is having a time indicator column; then only one delete has to be executed (sketched below):
ALTER TABLE someTable ADD COLUMN (last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP);
Declared like that, the timestamp updates itself on every write, so no code changes are required to use this.
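A sketch of that single delete, assuming last_updated has been populated: keep only each player's most recently updated row.
DELETE t FROM someTable t
JOIN (SELECT player_uuid, MAX(last_updated) AS last_updated
      FROM someTable
      GROUP BY player_uuid
     ) keep ON t.player_uuid = keep.player_uuid
WHERE t.last_updated < keep.last_updated;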

Delete many rows in MySQL

I am deleting rows on the order of hundreds of thousands from a remote DB. Each delete has its own target, e.g.
DELETE FROM tablename
WHERE (col1=c1val1 AND col2=c2val1) OR (col1=c1val2 AND col2=c2val2) OR ...
This has been almost twice as fast for me as individual queries, but I was wondering if there's a way to speed this up more, as I haven't been working with SQL very long.
Create a temporary table and fill it with all your value pairs, one per row. Name the columns the same as the matching columns in your table.
CREATE TEMPORARY TABLE donotwant (
  col1 INT NOT NULL,
  col2 INT NOT NULL,
  PRIMARY KEY (col1, col2)
);
INSERT INTO donotwant VALUES (c1val1, c2val1), (c1val2, c2val2), ...
Then execute a multi-table delete based on the JOIN between these tables:
DELETE t1 FROM `tablename` AS t1 JOIN `donotwant` USING (col1, col2);
The USING clause is shorthand for ON t1.col1=donotwant.col1 AND t1.col2=donotwant.col2, assuming the columns are named the same in both tables, and you want the join condition where both columns are equal to their namesake in the joined table.
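Spelled out with an explicit ON clause, the same delete reads:
DELETE t1
FROM `tablename` AS t1
JOIN `donotwant` AS d
ON t1.col1 = d.col1 AND t1.col2 = d.col2;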
Generally speaking, the fastest way to do bulk DELETEs is to put the ids to be deleted into a temp table of some sort, then use that as part of the query:
DELETE FROM table
WHERE (col1, col2) IN (SELECT col1, col2
FROM tmp)
Inserting can be done via a standard:
INSERT INTO tmp VALUES (...), (...), ...;
statement, or by using the DB's bulk-load utility.
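In MySQL, the bulk-load statement is LOAD DATA INFILE; a sketch, assuming the value pairs sit in a local CSV file (the path is illustrative):
LOAD DATA LOCAL INFILE '/tmp/donotwant.csv'
INTO TABLE tmp
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(col1, col2);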
I doubt it makes much difference to performance but you can write that kind of thing this way...
DELETE
FROM table
WHERE (col1, col2) IN (('c1val1','c2val1'), ('c1val2','c2val2'), ...);