I have a CSV file which I need to load into my database. My modus operandi is a bulk insert. One of the columns has a uniqueness constraint attached to it, but it is not the primary key. If there is a duplicate entry, the import correctly skips the line and does not enter it into the database. (On the command line it indicates Duplicates: n, where n is the total number of duplicates.)
Is there any way I can retrieve the duplicate row numbers? For instance, SHOW WARNINGS and SHOW ERRORS report the last MySQL warnings and errors; is there any way I can retrieve the duplicates from MySQL alone?
Thanks.
You could enter the data into a temporary table first, without the uniqueness constraint, and perform a query to find all the duplicates.
SELECT unique_column, count(*) c
FROM temp_tablename
GROUP BY unique_column
HAVING c > 1;
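If you need the full duplicated rows rather than just the offending values, you can join that result back to the staging table. A sketch, reusing the same placeholder names:

```sql
-- List every complete row whose unique_column value occurs more than once
SELECT t.*
FROM temp_tablename t
JOIN (
    SELECT unique_column
    FROM temp_tablename
    GROUP BY unique_column
    HAVING COUNT(*) > 1
) d ON t.unique_column = d.unique_column;
```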
Then copy it from the temporary table to the real table with:
INSERT IGNORE INTO tablename SELECT * FROM temp_tablename;
I'm trying to insert data in a table which has the following columns :
id, date, client, device, user
I have created a primary key on id and unique key on the combination of client, device, and user.
While inserting data I am getting the following error:
Cause: java.sql.SQLException: Duplicate entry '200-217-Xiaomi-M200' for key 'uk_user_client_device'
I checked the data in the table using the following query:
SELECT user, client, device, COUNT(1) rowcount FROM mytable GROUP BY user, client, device HAVING COUNT(1) > 1;
This returned an empty set, so I am certain there are no duplicate keys in the table.
I also went through the logs and found that the data was inserted into the table at the same time I got this error. So the data was inserted, yet I got the duplicate entry error.
I also confirmed that this doesn't always happen. Sometimes the data is inserted without any issue, and sometimes I get the error and the data is inserted anyway.
I've seen a few questions regarding this with no definitive answer. I'm unable to figure out why this error is being thrown.
Of course this query returns no rows:
SELECT user, client, device, COUNT(1) as rowcount
FROM mytable
GROUP BY user, client, device
HAVING COUNT(1) > 1;
You have specified that client/device/user is a primary key (or at least unique). Hence, there are no duplicates. The database enforces this.
If you attempt to insert a row that would create a duplicate, the database returns an error. The row is not inserted. The database ensures data integrity. Yay!
Your error is saying that the data in the insert either duplicates existing data in the table or, if you are inserting multiple rows, contains duplicates within the insert itself.
You can solve this in multiple ways. A typical method is to ignore the insert using on duplicate key update:
insert into mytable ( . . . )
. . .
on duplicate key update user = values(user); -- this is a no-op
The database prevents duplicates from being inserted. This construct prevents the error; no duplicate rows are inserted, of course.
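Filled in with the columns from the question purely for illustration (the values here are made up), the no-op form might look like:

```sql
INSERT INTO mytable (id, `date`, client, device, `user`)
VALUES (1, '2024-01-01', '217', 'Xiaomi', 'M200')
ON DUPLICATE KEY UPDATE `user` = VALUES(`user`);  -- no-op when the unique key collides
```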
I have to run a series of checks (governed by the table "Checks") and store the results in a table "Check_results" (in a MySQL database).
The table "Checks" contains an identifier (checkno) and an SQL statement (possibly returning many rows with a single value) to be executed.
The table "Check_results" has to contain all the rows returned by the SQL statement, with a reference to checkno and an auto-increment column checkentry for each returned row.
Is it possible to do this?
What I was suggesting was: when your table holds the SQL statements, you should read each record and construct another SQL statement along the lines of:
insert into check_results(checkno, checkresult )
select 1, i.val1-i.val2 from import i;
The SELECT just needs the checkno added into it, and checkentry should be an auto-increment column.
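A possible definition for Check_results (the column types are assumptions; only checkno and checkentry come from the question):

```sql
CREATE TABLE check_results (
    checkentry  INT NOT NULL AUTO_INCREMENT,  -- numbers each returned row automatically
    checkno     INT NOT NULL,                 -- references Checks.checkno
    checkresult VARCHAR(255),                 -- the single value each statement returns
    PRIMARY KEY (checkentry)
);
```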
I have a database like the following with 10K rows. How do I delete duplicates where all fields are the same? I don't want to search for any specific company. Is there a way to find any multiple entries whose fields are all the same and delete them? Thanks
This command adds a unique key, and drops all rows that generate errors (due to the unique key). This removes duplicates.
ALTER IGNORE TABLE table ADD UNIQUE KEY idx1(title);
Note: This command may not work for InnoDB tables for some versions of MySQL. See this post for a workaround. (Thanks to "an anonymous user" for this information.)
OR
Simply create a new table without duplicates. Sometimes this is actually faster and easier than trying to delete all the offending rows. Just create a new table, insert the unique rows (I used MIN(id) for the id of the resulting row), rename the two tables, and (once you are satisfied that everything worked correctly) drop the original table.
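A sketch of those steps, with illustrative table and column names (assuming the table holds id, company_name, city, state, country):

```sql
-- 1. Copy one survivor per duplicate group (MIN(id) picks the keeper)
CREATE TABLE mytable_new LIKE mytable;
INSERT INTO mytable_new
SELECT MIN(id), company_name, city, state, country
FROM mytable
GROUP BY company_name, city, state, country;

-- 2. Swap the tables, keeping the original around until verified
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;

-- 3. Once satisfied that everything worked, drop the original
DROP TABLE mytable_old;
```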
The query below can be used to find the duplicate entries across all fields:
SELECT company_name, city, state, country, COUNT(*) FROM `Table` GROUP BY company_name, city, state, country HAVING COUNT(*) > 1;
I'm trying to merge multiple tables, all sharing the same data structure, into one single table. However, it seems that all it is doing is inserting just enough rows for the target to equal the row count of the source.
Table 1: 20,000 Data Rows
Index Table: 10,000 Data Rows
So if I were to go and insert Table 1 into the Index table using the following:
INSERT IGNORE INTO `database1`.`Index`
SELECT * FROM `database1`.`Table1`;
Using the above, it only inserts 10,000 rows of the available 20,000.
My guess is that the other 10,000 are duplicate values and, since you're using IGNORE on the INSERT, the statement completes without error.
Since you're using INSERT IGNORE, I'm guessing you have 10000 duplicate keys which are being silently thrown away. Didn't you think it odd that you need that?
Depending on your table layout, you'll need some way to rejig your tables to get around the key constraint. E.g., create a new auto-increment key for the table you're inserting into, append the other table, and sort out the duplicates somehow.
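One way that suggestion could look. Everything here is illustrative: uk_example stands in for whatever unique key SHOW CREATE TABLE reveals, and row_id is a new surrogate column:

```sql
-- Swap the blocking unique key for a surrogate auto-increment key,
-- so every source row lands and duplicates can be sorted out later
ALTER TABLE `Index`
    DROP INDEX uk_example,
    ADD COLUMN row_id INT NOT NULL AUTO_INCREMENT,
    ADD KEY (row_id);  -- AUTO_INCREMENT columns must be indexed

-- row_id was appended as the last column, so supply NULL for it
INSERT INTO `database1`.`Index`
SELECT *, NULL FROM `database1`.`Table1`;
```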
It depends on the structures of the tables. Can you show us "show create table ..." on your two tables?
Also, list the feedback you get from the mysql client (e.g., "Records: 100 Duplicates: 0 Warnings: 0") - that helps pinpoint what happened and why.
If there are values with UNIQUE keys in your table, not all of the inserts will succeed.
Did you try REPLACE INTO? It is like INSERT, but it replaces the existing row when a unique field collides.
REPLACE INTO user (id,name) VALUES (12,"John");
# if there is a user with id = 12, it will be replaced
I am wondering if there is a way to do this in one query.
It seems when I was initially populating my DB with dummy data (10k records to work with), somewhere in the mess of it all the script dumped an extra 1,044 duplicate rows. I determined this using:
SELECT x.ID, x.firstname FROM info x
INNER JOIN (SELECT ID FROM info
GROUP BY ID HAVING count(id) > 1) d ON x.ID = d.ID
What I am trying to figure out is through this single query can I add another piece to it that will remove one of the matching dupes from each dupe found?
Also, I realize the ID column should have been set to auto-increment, but it wasn't.
My favorite way of removing duplicates would be:
ALTER IGNORE TABLE info ADD UNIQUE (ID);
To explain a bit further (for reference, take a look here)
UNIQUE - you are adding unique index to ID column.
IGNORE - is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
The query that I use is generally something like
DELETE FROM `table` WHERE id IN (
    SELECT id FROM (
        SELECT MAX(id) AS id
        FROM `table`
        GROUP BY DUPFIELD
        HAVING COUNT(*) > 1
    ) t  -- the derived table avoids MySQL's "can't specify target table for update" error
);
You have to run this several times, since each pass removes only one duplicate row per group, but it's fast.
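If repeated runs are a nuisance, a self-join DELETE clears all surplus rows in one pass (same placeholder names, `table` and DUPFIELD, as above):

```sql
-- For every pair of rows sharing DUPFIELD, delete the one with the larger id;
-- only the smallest id in each group survives
DELETE t1 FROM `table` t1
JOIN `table` t2
  ON t1.DUPFIELD = t2.DUPFIELD
 AND t1.id > t2.id;
```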
The most efficient way is to do it in the following steps:
Step 1: Move the non-duplicates (unique tuples) into a temporary table
CREATE TABLE new_table as
SELECT * FROM old_table WHERE 1 GROUP BY [column to remove duplicates by];
Step 2: Delete the old table. We no longer need the table with all the duplicate entries, so drop it!
DROP TABLE old_table;
Step 3: rename the new_table to the name of the old_table
RENAME TABLE new_table TO old_table;