Importing MySQL records with duplicate keys

I have two MySQL databases with identical table structure, each populated with several thousand records. I need to merge the two into a single database but I can't import one into the other because of duplicate IDs. It's a relational database with many linked tables (fields point to other table record IDs).
Edit: The goal is to have data from both databases in one final database, without overwriting data, and updating foreign keys to match with new record IDs.
I'm not sure how to go about merging the two databases. I could write a script, I suppose, but there are many tables and it would take a while. Has anyone else encountered this problem, and what is the best way to go about it?

Just ignore the duplicates. The first time a key is inserted, the row goes in; any later row with the same key is skipped. Note that this discards the second row rather than merging it.
INSERT IGNORE INTO myTable SELECT * FROM myOtherTable;
See the INSERT syntax in the MySQL reference manual.
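To see what ignoring duplicates does to the data, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. SQLite spells the statement INSERT OR IGNORE rather than INSERT IGNORE, and the table names, column names, and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE my_other_table (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO my_table VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO my_other_table VALUES (2, 'carol'), (3, 'dave');
""")

# SQLite's counterpart to MySQL's INSERT IGNORE: a row whose key already
# exists in my_table is skipped instead of raising an error.
conn.execute("INSERT OR IGNORE INTO my_table SELECT * FROM my_other_table")

merged = conn.execute("SELECT id, name FROM my_table ORDER BY id").fetchall()
print(merged)  # the 'carol' row (id 2) is absent because its key was taken
```

If both rows with id 2 are real, distinct records, this approach loses one of them, so it only fits the case where equal IDs mean the same record.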

The trick was to increment the IDs in one database by 1000 (or any offset large enough that they won't overlap with the IDs in the target database), then import it.
Thanks for everyone's answers.

Are the duplicate IDs supposed to correspond to each other? You could create a new table with an auto increment field and save the existing keys as two columns.
That would just be a 'bulk copy' though. If there is some underlying relationship then that would dictate how to combine the data.

If you have two tables A1 and A2 and you want to merge them into AA, you can do this:
INSERT INTO aa SELECT * FROM A1;
INSERT INTO aa SELECT * FROM A2 ON DUPLICATE KEY
UPDATE aa.nonkeyfield1 = A2.nonkeyfield1,
aa.nonkeyfield2 = A2.nonkeyfield2, ....;
This will overwrite rows that have duplicate keys with the A2 data.
A slightly slower method with simpler syntax is:
INSERT INTO aa SELECT * FROM A1;
REPLACE INTO aa SELECT * FROM A2;
This gives the same result, but instead of updating a duplicate row in place it deletes the row that came from A1 and then inserts the row from A2.
If you want to merge a whole database with foreign keys, this will not work, because it will break the links between tables.
If you have a whole database and you do not want to overwrite data:
Import the first database as normal into database A.
Import the second database into a database B.
Set all foreign keys in B to ON UPDATE CASCADE.
Double check this.
Now run the following statements on every table in database B:
SELECT @increment := MAX(pk) FROM A.table1;
UPDATE B.table1 SET pk = pk + @increment WHERE pk IS NOT NULL
ORDER BY pk DESC;
(The WHERE clause is there to stop MySQL from rejecting the UPDATE in safe-updates mode. The ORDER BY pk DESC moves the highest keys first, so a shifted key never collides with a row that has not been moved yet.)
If you write a script with those two lines for each table in your database, you can then insert all the tables from B into database A. Remember to disable foreign key checks during the inserts with:
SET foreign_key_checks = 0;
... do lots of inserts ...
SET foreign_key_checks = 1;
Good luck.
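A rough sketch of the offset idea, using Python's built-in sqlite3 as a stand-in for MySQL. It is a variation on the steps above rather than the same script: instead of updating B in place and relying on ON UPDATE CASCADE, it applies the offset while copying each table, shifting the foreign key column by the parent table's offset. The customers/orders schema and the function name merge_with_offset are assumptions for illustration.

```python
import sqlite3

def merge_with_offset(conn):
    """Copy b.customers and b.orders into main, shifting ids past main's maximum."""
    cust_offset = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM main.customers").fetchone()[0]
    order_offset = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM main.orders").fetchone()[0]
    # Parent table: every id moves up by the customer offset.
    conn.execute(
        "INSERT INTO main.customers (id, name) "
        "SELECT id + ?, name FROM b.customers",
        (cust_offset,),
    )
    # Child table: its own id uses the order offset, while the foreign key
    # uses the customer offset so it still points at the same customer.
    conn.execute(
        "INSERT INTO main.orders (id, customer_id, item) "
        "SELECT id + ?, customer_id + ?, item FROM b.orders",
        (order_offset, cust_offset),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS b")  # plays the role of database B
schema = """
    CREATE TABLE {db}.customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE {db}.orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        item TEXT
    );
"""
conn.executescript(schema.format(db="main") + schema.format(db="b"))
conn.executescript("""
    INSERT INTO main.customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO main.orders VALUES (1, 1, 'book'), (2, 2, 'pen');
    INSERT INTO b.customers VALUES (1, 'carol'), (2, 'dave');
    INSERT INTO b.orders VALUES (1, 2, 'lamp');
""")
merge_with_offset(conn)
```

After the merge, dave's order points at his new customer id rather than at bob. With many tables, the same pattern would need one offset per table and a list of which columns reference which table.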

Create a new database table with an auto-incremented primary key as the first column. Then add the columns from your databases and import each one. Then just drop the old primary key field and rename the new one to match the original primary key name.

Related

How to remove duplicate entries from a database which has over 10000 records with no id field

I have a database like the following with 10K rows. How can I delete duplicates where all fields are the same? I don't want to search for any specific company. Is there a way to find any entries that share all the same field values and delete the extras? Thanks

This command adds a unique key and drops all rows that generate errors (due to the unique key). This removes duplicates.
ALTER IGNORE TABLE table ADD UNIQUE KEY idx1(title);
Note: This command may not work for InnoDB tables in some versions of MySQL. (Thanks to "an anonymous user" for this information.) The IGNORE clause of ALTER TABLE was also removed in MySQL 5.7.4, so this only applies to older servers.
OR
Simply create a new table without duplicates. Sometimes this is actually faster and easier than trying to delete all the offending rows. Just create a new table, insert the unique rows (I used min(id) for the id of the resulting row), rename the two tables, and (once you are satisfied that everything worked correctly) drop the original table.
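The "new table, then rename" route could look roughly like this. It is a sketch using Python's built-in sqlite3 in place of MySQL, with SELECT DISTINCT because the table in the question has no id column; the companies table, its columns, and the function name are assumed for illustration.

```python
import sqlite3

def rebuild_without_duplicates(conn):
    """Build a de-duplicated copy of companies, then swap it in by renaming."""
    conn.executescript("""
        CREATE TABLE companies_new AS
            SELECT DISTINCT company_name, city, state, country FROM companies;
        ALTER TABLE companies RENAME TO companies_old;
        ALTER TABLE companies_new RENAME TO companies;
    """)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE companies (company_name TEXT, city TEXT, state TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?, ?, ?)",
    [
        ("Acme", "Austin", "TX", "US"),
        ("Acme", "Austin", "TX", "US"),    # exact duplicate
        ("Globex", "Boston", "MA", "US"),
    ],
)
conn.commit()

rebuild_without_duplicates(conn)
kept = conn.execute("SELECT COUNT(*) FROM companies").fetchone()[0]
original = conn.execute("SELECT COUNT(*) FROM companies_old").fetchone()[0]
print(kept, original)
# Once the new table looks right: conn.execute("DROP TABLE companies_old")
```

Keeping companies_old around until the counts have been checked is the point of this approach. Note that a table created this way does not carry over indexes or constraints, so those would have to be recreated.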

The query below finds the duplicate entries by grouping on all fields:
SELECT company_name, city, state, country, COUNT(*) AS copies FROM Table GROUP BY company_name, city, state, country HAVING COUNT(*) > 1;
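For a quick check of what a GROUP BY ... HAVING query reports, here is a small sketch with Python's built-in sqlite3 standing in for MySQL. The companies table and its sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE companies (company_name TEXT, city TEXT, state TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?, ?, ?)",
    [
        ("Acme", "Austin", "TX", "US"),
        ("Acme", "Austin", "TX", "US"),
        ("Acme", "Austin", "TX", "US"),
        ("Acme", "Dallas", "TX", "US"),    # same company, different city
        ("Globex", "Boston", "MA", "US"),
    ],
)

# One output row per distinct combination that occurs more than once.
duplicates = conn.execute("""
    SELECT company_name, city, state, country, COUNT(*) AS copies
    FROM companies
    GROUP BY company_name, city, state, country
    HAVING COUNT(*) > 1
""").fetchall()
print(duplicates)
```

This only reports the duplicated combinations and how many copies exist; it does not delete anything.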

r - dbWriteTable or a MySQL Delete query?

I can't seem to find the answer to this anywhere. I am reading a csv into a data frame using the read.csv function. Then I am writing the data frame contents to a mysql table using dbWriteTable. This works great for the initial run to create the table, but each run after this needs to do either an insert or an update depending on whether the record already exists in the table.
The 1st column in the data frame is the primary key, and the other records contain data that might change every time I pull a new copy of the csv. Each time I pull the CSV, if the primary key already exists, I want it to update that record with the new data, and if the primary key does not exist(eg: a new key since the last run), I want it to just insert the record into the table.
This is my current dbWriteTable. This creates the table just fine the 1st time it's run, and also inserts a "Timestamp" column into the table that is set to "on update CURRENT_TIMESTAMP" so that I know when each record was last updated.
dbWriteTable(mydb, value=csvData, name=Table, row.names=FALSE, field.types=list(PrimaryKey="VARCHAR(10)",Column2="VARCHAR(255)",Column3="VARCHAR(255)",Timestamp="TIMESTAMP"), append=TRUE)
Now the next time I run this, I simply want it to update any PrimaryKeys that are already in the table, and add any new ones. I also don't want to lose any records in the event a PrimaryKey disappears from the CSV source.
Is it possible to do this kind of update using dbWriteTable, or some other R function?
If that's not possible, is it possible to just run a mysql query that would delete any duplicate PrimaryKey records and keep just the 1 record with the most current timestamp? So I would run a dbWriteTable to append the new data, and then run a MySQL query to prune out the older records.
Obviously I couldn't define that 1st column as an actual PrimaryKey in the DB as my append/delete solution wouldn't work due to duplicate keys, and that's fine, I can always add an auto increment integer column to the table for the "real" primary key if needed.
Thoughts?

Consider using a temp table (an exact replica of the final table but with fewer records) and then running an INSERT and an UPDATE query into the final table. Together they handle both cases without overlap (plus, primary keys are constraints, so queries will error out if any attempt is made to duplicate one):
records to append if they do not exist, using the LEFT JOIN ... NULL query
records to update if they do exist, using the UPDATE INNER JOIN query
Concerning the former, there is a regular debate among SQL coders over whether LEFT JOIN ... NULL, NOT IN, or NOT EXISTS is the optimal solution, which of course "depends". The LEFT JOIN used here does avoid subqueries, but consider the other avenues if needed.
# DELETE LAST SET OF TEMP DATA
dbSendQuery(mydb, "DELETE FROM tempTable")
# APPEND R DATA FRAME TO TEMP DATA
dbWriteTable(mydb, value=csvData, name="tempTable", row.names=FALSE,
             field.types=list(PrimaryKey="VARCHAR(10)", Column2="VARCHAR(255)",
                              Column3="VARCHAR(255)", Timestamp="TIMESTAMP"),
             append=TRUE, overwrite=FALSE)
# LEFT JOIN ... NULL QUERY TO APPEND NEW RECORDS NOT IN TABLE
# (t is tempTable, f is finalTable; unmatched rows have f.PrimaryKey IS NULL)
dbSendQuery(mydb, "INSERT INTO finalTable (PrimaryKey, Column2, Column3, Timestamp)
                   SELECT t.PrimaryKey, t.Column2, t.Column3, t.Timestamp
                   FROM tempTable t
                   LEFT JOIN finalTable f
                   ON t.PrimaryKey = f.PrimaryKey
                   WHERE f.PrimaryKey IS NULL;")
# UPDATE INNER JOIN QUERY TO UPDATE MATCHING RECORDS
dbSendQuery(mydb, "UPDATE finalTable f
                   INNER JOIN tempTable t
                   ON f.PrimaryKey = t.PrimaryKey
                   SET f.Column2 = t.Column2,
                       f.Column3 = t.Column3,
                       f.Timestamp = t.Timestamp;")
For the most part, the queries above will be compliant in most SQL backends should you ever need to change databases. Some RDBMSs do not support UPDATE INNER JOIN, but equivalent alternatives are available. Finally, the beauty of this route is that all processing is handled in the SQL engine and not in R.
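As an illustration of the staging-table pattern outside R, here is a sketch using Python's built-in sqlite3. SQLite is one of the backends without UPDATE ... INNER JOIN, so the update step uses correlated subqueries as the equivalent alternative. The names final_table, temp_table, pk, col2, col3, and sync_from_staging are placeholders chosen for the example.

```python
import sqlite3

def sync_from_staging(conn):
    """Append staged rows that are new, then update rows that already exist."""
    # LEFT JOIN ... IS NULL: staged rows with no match in the final table.
    conn.execute("""
        INSERT INTO final_table (pk, col2, col3)
        SELECT s.pk, s.col2, s.col3
        FROM temp_table AS s
        LEFT JOIN final_table AS f ON f.pk = s.pk
        WHERE f.pk IS NULL
    """)
    # Stand-in for UPDATE ... INNER JOIN: pull each value by matching key.
    conn.execute("""
        UPDATE final_table
        SET col2 = (SELECT s.col2 FROM temp_table AS s WHERE s.pk = final_table.pk),
            col3 = (SELECT s.col3 FROM temp_table AS s WHERE s.pk = final_table.pk)
        WHERE pk IN (SELECT pk FROM temp_table)
    """)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE final_table (pk TEXT PRIMARY KEY, col2 TEXT, col3 TEXT);
    CREATE TABLE temp_table (pk TEXT PRIMARY KEY, col2 TEXT, col3 TEXT);
    INSERT INTO final_table VALUES ('A', 'old', 'x'), ('B', 'keep', 'y');
    INSERT INTO temp_table VALUES ('A', 'new', 'z'), ('C', 'added', 'w');
""")
sync_from_staging(conn)
rows = conn.execute("SELECT pk, col2, col3 FROM final_table ORDER BY pk").fetchall()
print(rows)
```

Row B is absent from the staged data and is left untouched, which matches the requirement not to lose records that disappear from the CSV.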

Sounds like you're trying to do an upsert.
I'm kind of rusty with MySQL but the general idea is that you need to have a staging table to upload the new CSV, and then in the database itself do the insert/update.
For that you need to use dbSendQuery with INSERT ... ON DUPLICATE KEY UPDATE.
http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html

How can I duplicate a record in MS ACCESS with ASP/VBscript and primary key?

I'm searching for a simple method to duplicate a record (many many columns) in a MSAccess database. I'm working with ASP classic/VBscript pages and ADODB to execute SQL queries.
The record I want to duplicate has a primary key, so I cannot use the standard query
INSERT INTO mytable SELECT * FROM mytable WHERE id=... ---> error
I only found this workaround (a temporary table), but I think it's not the best option:
<%
set RS=Conn.Execute("SELECT * INTO temptable FROM mytable WHERE id="&id)
set RS=Conn.Execute("ALTER TABLE temptable ALTER COLUMN id INTEGER")
set RS=Conn.Execute("SELECT MAX(id) AS maxid FROM mytable")
maxid=RS("maxid")+1
set RS=Conn.Execute("UPDATE temptable SET id="&maxid)
set RS=Conn.Execute("INSERT INTO mytable SELECT * FROM temptable")
set RS=Conn.Execute("DROP TABLE temptable")
%>
I could also specify all column names in my query, but I have a huge number of columns, and they're often modified (I don't want to keep track of all db structure changes)
Any better solution? Thanks

The problem with duplicating the record is that primary keys are unique. If the record must exist in the same table, it has to have a different primary key. I suggest adding another column (or columns) to the table where you can store the original primary key. You can then write a reasonable query that duplicates the record, saves the original primary key, and assigns a new unique key to the copy.
Frequently, the primary key is meaningless, i.e., assigned by the database (autonumber). You can use the technique pointed out by OverMind to access all of the columns (fields).
I used Scripting Dictionaries and routines that copied fields from RecordSets to Scripting Dictionaries, from Scripting Dictionaries to other Scripting Dictionaries, and from Scripting Dictionaries to RecordSets, to avoid straight-line code for copying records from one table to another where the number of fields was large.
I will share the service routines, if you want, but the code that actually moves the data around is proprietary.

Does MySQL have an update/insert combo which inserts if the update fails?

I'm not optimistic that this can be done without a stored procedure, but I'm curious if the following is possible.
I want to write a single query insert/update that updates a row if it finds a match and if not inserts into the table with the values it would have been updating.
So... something like
updateInsert into table_a set n = 'foo' where p='bar';
in the event that there is no row where p='bar' it would automatically insert into table_a set n = 'foo';
EDIT:
Based on a couple of comments I see that I need to clarify that n is not a PRIMARY KEY and the table actually needs the freedom to have duplicate rows. I just have a situation where a specific entry needs to be unique... perhaps I'm just mixing metaphors in a bad way and should pull this out into a separate table where this key is unique.

I would enforce this with the table schema: put a unique multi-column key on the target table and use INSERT IGNORE INTO. A duplicate key would normally raise an error, but with IGNORE the row is skipped with a warning instead.
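For comparison, the "update, and insert only if nothing was updated" behaviour the question describes can be sketched in application code. This is not the INSERT IGNORE approach above but a two-statement alternative, shown with Python's built-in sqlite3; table_a, n, and p come from the question, while the function name update_or_insert is made up.

```python
import sqlite3

def update_or_insert(conn, p, n):
    """Set n on every row where p matches; insert a new row if none matched."""
    cur = conn.execute("UPDATE table_a SET n = ? WHERE p = ?", (n, p))
    if cur.rowcount == 0:
        # No existing row had this p, so create one with the same values.
        conn.execute("INSERT INTO table_a (p, n) VALUES (?, ?)", (p, n))
    conn.commit()

conn = sqlite3.connect(":memory:")
# No unique key: the table is free to hold duplicate rows, as in the question.
conn.execute("CREATE TABLE table_a (p TEXT, n TEXT)")

update_or_insert(conn, "bar", "foo")   # nothing to update, so this inserts
update_or_insert(conn, "bar", "baz")   # row exists, so this updates it
rows = conn.execute("SELECT p, n FROM table_a").fetchall()
print(rows)
```

Two caveats apply. The check and the insert are separate statements, so concurrent writers could both insert. Also, MySQL by default reports zero affected rows when an UPDATE leaves the values unchanged, so the same logic there could insert a duplicate unless the connection is set to report matched rows instead.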

Remove repeat rows from MySQL table

Is there a way to remove all repeat rows from a MySQL database?

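A couple of years ago, someone requested a way to delete duplicates. Subselects make it possible with a query like this in MySQL 4.1. The inner SELECT is wrapped in a derived table because MySQL does not allow a DELETE to read from its own target table directly in a subquery:
DELETE FROM some_table WHERE primaryKey NOT IN
(SELECT keep_id FROM
(SELECT MIN(primaryKey) AS keep_id FROM some_table GROUP BY some_column) AS keepers)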
Of course, you can use MAX(primaryKey) as well if you want to keep the newest record with the duplicate value instead of the oldest record with the duplicate value.
To understand how this works, look at the output of this query:
SELECT some_column, MIN(primaryKey) FROM some_table GROUP BY some_column
As you can see, this query returns the primary key for the first record containing each value of some_column. Logically, then, any key value NOT found in this result set must be a duplicate, and therefore it should be deleted.
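The reasoning above can be tried on a toy table. This sketch uses Python's built-in sqlite3 instead of MySQL, and the sample values are invented. It first lists the key kept for each value, then deletes every row whose key is not in that list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE some_table (primaryKey INTEGER PRIMARY KEY, some_column TEXT)")
conn.executemany(
    "INSERT INTO some_table (some_column) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",)],  # keys 1..5 are assigned in order
)

# The first (lowest) key for each distinct value: these rows survive.
keepers = conn.execute("""
    SELECT some_column, MIN(primaryKey)
    FROM some_table
    GROUP BY some_column
    ORDER BY some_column
""").fetchall()

# Every key not in that set belongs to a later duplicate.
conn.execute("""
    DELETE FROM some_table
    WHERE primaryKey NOT IN (
        SELECT MIN(primaryKey) FROM some_table GROUP BY some_column
    )
""")
conn.commit()

remaining = conn.execute(
    "SELECT primaryKey, some_column FROM some_table ORDER BY primaryKey").fetchall()
print(keepers)
print(remaining)
```

SQLite accepts the subquery on the same table directly, so the statement here is the plain form; other engines may need it phrased differently.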

These questions / answers might interest you :
How to delete duplicate records in mysql database?
How to delete Duplicates in MySQL table.
An idea that's often used when you are working with a big table is to:
Create a new table
Insert into that table the unique records (i.e. only one version of the duplicates in the original table, generally using a SELECT DISTINCT)
Use that new table in your application, or drop the old table and rename the new one to the old name.
The good thing about this approach is that you can verify what's in the new table before dropping the old one. It's always nice to check that sort of thing ^^
