Fix DB duplicate entries (MySQL bug)

I'm using MySQL 4.1. Some tables have duplicate entries that violate their constraints.
When I try to group rows, MySQL doesn't recognise the rows as being the same.
Example:
Table A has a column "Name" with the UNIQUE property.
The table contains one row with the name 'Hach?' and one row with the same name but with a square at the end instead of the '?' (which I can't reproduce in this text field).
A GROUP BY on these 2 rows returns 2 separate rows.
This causes several problems, including the fact that I can't export and reimport the database. On reimport, an error says that an INSERT has failed because it violates a constraint.
In theory I could try to import, wait for the first error, fix the import script and the original DB, and repeat. In practice, that would take forever.
Is there a way to list all the anomalies, or to force the database to recheck constraints (and list all the values/rows that violate them)?
I can supply the .MYD file if that would help.

To list all the anomalies:
SELECT name, count(*) FROM TableA GROUP BY name HAVING count(*) > 1;
There are a few ways to tackle deleting the dups and your path will depend heavily on the number of dups you have.
See this SO question for ways of removing those from your table.
Here is the solution I provided there:
-- Setup for example
create table people (fname varchar(10), lname varchar(10));
insert into people values ('Bob', 'Newhart');
insert into people values ('Bob', 'Newhart');
insert into people values ('Bill', 'Cosby');
insert into people values ('Jim', 'Gaffigan');
insert into people values ('Jim', 'Gaffigan');
insert into people values ('Adam', 'Sandler');
-- Show table with duplicates
select * from people;
-- Create table with one copy of each duplicated record
create table dups as
select fname, lname, count(*) as cnt
from people
group by fname, lname
having count(*) > 1;
-- Delete all matching duplicate records
delete people from people inner join dups
on people.fname = dups.fname AND
people.lname = dups.lname;
-- Insert single record of each dup back into table
insert into people select fname, lname from dups;
-- Show fixed table
select * from people;
-- Clean up the helper table
drop table dups;

Create a new table, select all rows grouped by the unique key (in the example, the Name column), and insert them into the new table.
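A minimal MySQL sketch of that approach, assuming TableA has an integer primary key `id` alongside the `Name` column (both names are assumptions; adjust them to your schema). It keeps the row with the smallest `id` for each `Name` value. If, as in the question, two values that GROUP BY treats as different still collide in the UNIQUE index, the INSERT will stop with error 1062, which at least names the clashing value.

```sql
-- Hypothetical schema: TableA(id INT PRIMARY KEY, Name VARCHAR(...) UNIQUE, ...).
-- 1. Empty copy with the same columns and indexes (including UNIQUE(Name)).
CREATE TABLE TableA_new LIKE TableA;

-- 2. Copy one row per Name: the row with the smallest id in each group.
INSERT INTO TableA_new
SELECT a.*
FROM TableA AS a
JOIN (SELECT MIN(id) AS keep_id
      FROM TableA
      GROUP BY Name) AS k
  ON a.id = k.keep_id;

-- 3. Swap the tables, keeping the original for comparison.
RENAME TABLE TableA TO TableA_old, TableA_new TO TableA;
```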

To find out what that character is, run the following query:
SELECT HEX(Name) FROM TableName WHERE Name LIKE 'Hach%'
You will see the hexadecimal byte values of each name, including the code of that 'square'.
If that character is 'x', you could update like this (but if the column is UNIQUE, you will get duplicate-key errors for names that become identical):
UPDATE TableName SET Name=TRIM(TRAILING 'x' FROM Name);
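Before running that UPDATE on a UNIQUE column, it can help to list the rows that would collide once the stray character is removed. A sketch, assuming the HEX output showed a trailing carriage return (0x0D) and that TableName has a primary key column `id`; both are assumptions, so substitute the byte you actually found. It also assumes the default sql_mode, where the literal '\r' means a carriage return.

```sql
-- Rows ending in a carriage return whose trimmed Name already exists in another row.
SELECT a.id, a.Name, HEX(a.Name) AS raw_hex, b.id AS clashes_with_id
FROM TableName AS a
JOIN TableName AS b
  ON b.Name = TRIM(TRAILING '\r' FROM a.Name)
 AND b.id <> a.id
WHERE a.Name LIKE '%\r';

-- After resolving those clashes (deleting or renaming rows), strip the character.
UPDATE TableName
SET Name = TRIM(TRAILING '\r' FROM Name)
WHERE Name LIKE '%\r';
```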

I'll assume this is a random MySQL 4.1 bug. Some values just change on their own for no particular reason, even when that violates MySQL constraints, and MySQL simply ignores those violations.
To solve my problem, I will write a program that tries to reinsert every row of data into the same table (to be precise: another table with the same characteristics) and logs every failure.
I will leave the question open for a while in case someone else runs into the same problem and finds a more practical solution.
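The reinsert-and-log plan above can also be done inside MySQL itself. A sketch, assuming the damaged table is TableA with primary key `id` (hypothetical names): copy the structure, reinsert everything with INSERT IGNORE so constraint violations are skipped rather than fatal, then list the rows that did not make it across. For a MyISAM table (the question mentions a .MYD file), running CHECK TABLE first may reveal index corruption, which could explain constraints not being enforced; that is a possibility to rule out, not a diagnosis.

```sql
-- 0. Ask MySQL to verify the table and its indexes.
CHECK TABLE TableA EXTENDED;

-- 1. Empty copy with the same columns and indexes (including UNIQUE(Name)).
CREATE TABLE TableA_check LIKE TableA;

-- 2. Reinsert every row in a fixed order; rows that violate a unique key
--    are skipped and reported as warnings instead of aborting the statement.
INSERT IGNORE INTO TableA_check SELECT * FROM TableA ORDER BY id;
SHOW WARNINGS;  -- capped by max_error_count, so use step 3 for the full list

-- 3. Every row that was rejected, with its raw bytes for inspection.
SELECT a.id, a.Name, HEX(a.Name) AS raw_hex
FROM TableA AS a
LEFT JOIN TableA_check AS c ON c.id = a.id
WHERE c.id IS NULL;
```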

Related

How to use "Insert Ignore Into" Mysql query

I have three tables :
user_details
user_badges
badges
I have a query where I use the "INSERT IGNORE INTO" command to insert the badge id from badges and the user id from user_details into user_badge.
$username = $_SESSION['username'];
$getlife = "SELECT * FROM user_details WHERE username='$username'";
$getlifedb = mysqli_query($db, $getlife);
while ($row = mysqli_fetch_array($getlifedb)) {
    if ($row['fullife'] == '1') {
        $full = "INSERT IGNORE INTO user_badge (user_id, badge_id)
                 SELECT u.user_id, b.badge_id
                 FROM user_details AS u
                 CROSS JOIN badges AS b
                 WHERE u.user_id='$id'
                 AND b.badge_id='8'";
        mysqli_query($db, $full) or die(mysqli_error($db));
    }
}
Below is a picture of the user_badge table:
But every time I run the query, INSERT IGNORE doesn't skip the existing row; it just inserts the same row again. Can someone help me fix this?

INSERT IGNORE only skips a row when inserting it would violate a UNIQUE index (or the PRIMARY KEY). user_id in table user_badge is not set to UNIQUE, which is why the row is duplicated.
If a user can have multiple badges but the same badge must not be duplicated, you need to make the pair UNIQUE: UNIQUE(user_id, badge_id).
Without a UNIQUE index, INSERT IGNORE behaves just like INSERT, since inserting an identical row raises no error for IGNORE to suppress.
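A sketch of the fix for the user_badge table from the question, assuming it has just the user_id and badge_id columns (the index name uq_user_badge is made up). Existing duplicate pairs have to be removed before the index can be added.

```sql
-- List (user_id, badge_id) pairs that are already duplicated; clean these up first,
-- otherwise the ALTER TABLE below fails with error 1062.
SELECT user_id, badge_id, COUNT(*) AS copies
FROM user_badge
GROUP BY user_id, badge_id
HAVING COUNT(*) > 1;

-- Make the pair unique.
ALTER TABLE user_badge ADD UNIQUE KEY uq_user_badge (user_id, badge_id);

-- Run this twice: only the first run inserts a row, the second affects 0 rows.
INSERT IGNORE INTO user_badge (user_id, badge_id) VALUES (1, 8);
```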

From the MySQL documentation: "Use the INSERT IGNORE command rather than the INSERT command. If a record doesn't duplicate an existing record, then MySQL inserts it as usual. If the record is a duplicate, then the IGNORE keyword tells MySQL to discard it silently without generating an error."
I fully agree with @Flush Thunder. If you don't mind having duplicate entries, just use INSERT instead, but from what you're describing, that is not the case.
Presumably each user has a unique id and may have several badges, so it is the (user_id, badge_id) pair that must be unique.

Remove Duplicates in MySQL

I have a database table that was generated by importing several thousand text documents each very large. For some reason, some files were imported multiple times.
I am trying to remove duplicate rows by using following query:
ALTER IGNORE TABLE mytable ADD UNIQUE INDEX myindex (LASTNAME, FIRSTNAME, HOUSENUMBER, STREET, CITY, ZIP, DOB, SEX);
but I was getting an error
1062 - Duplicate entry
Apparently, IGNORE has been deprecated.
How can I remove duplicates from my database?
I guess I have to do a DELETE with a JOIN but I can't figure out the code.
The table is InnoDB and currently has about 40,000,000 rows (there should be about 17,000,000). Each row has a primary key.
Considering the size, I am hesitant to temporarily change the table to MyISAM.

Each row has a primary key
Is it a unique number?
Create an AUX table like this (assuming ID is the PK):
create table mytable_aux as (
select LASTNAME, FIRSTNAME, HOUSENUMBER, STREET, CITY, ZIP, DOB, SEX, MIN(ID) as id
from mytable
group by LASTNAME, FIRSTNAME, HOUSENUMBER, STREET, CITY, ZIP, DOB, SEX);
Then delete everything that is not in aux table:
delete from mytable where id not in (select aux.id from mytable_aux aux) ;
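Since the question asks about a DELETE with a JOIN, here is a variant of the same idea, sketched under the assumption that the primary key column is ID. It aliases MIN(ID) so the kept ids can be referenced by name, indexes them, and uses MySQL's multi-table DELETE with a LEFT JOIN instead of NOT IN. On a 40-million-row InnoDB table this is still one very large transaction, so expect it to take a while.

```sql
-- Keep only the smallest ID of each group of identical rows.
CREATE TABLE mytable_aux AS
SELECT MIN(ID) AS id
FROM mytable
GROUP BY LASTNAME, FIRSTNAME, HOUSENUMBER, STREET, CITY, ZIP, DOB, SEX;

-- Index the kept ids so the join below is a key lookup rather than a scan.
ALTER TABLE mytable_aux ADD INDEX (id);

-- Delete every row whose ID is not one of the kept ids.
DELETE m
FROM mytable AS m
LEFT JOIN mytable_aux AS aux ON aux.id = m.ID
WHERE aux.id IS NULL;

DROP TABLE mytable_aux;
```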

Assuming it is just one table and you have the SQL dump available...
CREATE the table with all its keys and relationships (including a UNIQUE key over the columns that define a duplicate) but no data inserted. Keep the INSERT statements stored in a separate .sql file.
Change all the INSERT statements to INSERT IGNORE.
Import the updated .sql file containing only the INSERT IGNORE statements. The duplicates will be automatically ignored.
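A sketch of what the schema and rewritten INSERT statements might look like. The column types and sample rows are made up for illustration; the UNIQUE key is what lets INSERT IGNORE drop the duplicates. Because a UNIQUE index allows repeated NULLs, rows with a NULL in any of those columns will not be deduplicated. If you can re-dump the data, mysqldump's --insert-ignore option writes INSERT IGNORE statements directly.

```sql
-- Schema only; column types are illustrative.
CREATE TABLE mytable (
  ID          INT NOT NULL PRIMARY KEY,
  LASTNAME    VARCHAR(50),
  FIRSTNAME   VARCHAR(50),
  HOUSENUMBER VARCHAR(10),
  STREET      VARCHAR(100),
  CITY        VARCHAR(50),
  ZIP         VARCHAR(10),
  DOB         DATE,
  SEX         CHAR(1),
  UNIQUE KEY uq_person (LASTNAME, FIRSTNAME, HOUSENUMBER, STREET, CITY, ZIP, DOB, SEX)
);

-- Rewritten data file: the second row matches the first in every unique-key column
-- (only ID differs), so it is silently skipped.
INSERT IGNORE INTO mytable VALUES (1, 'Smith', 'Ann', '12', 'Main St', 'Springfield', '12345', '1970-01-01', 'F');
INSERT IGNORE INTO mytable VALUES (2, 'Smith', 'Ann', '12', 'Main St', 'Springfield', '12345', '1970-01-01', 'F');
```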
Please note that, without comparing manually, you won't be able to figure out which or how many records were ignored.
However, if you're absolutely sure that you really don't need the duplicates, as defined by the unique keys on the table, then this approach works fairly well.
Also, if you'd like to do the same with multiple tables, you'll have to make sure that you CREATE all the tables at the start, define the foreign keys / dependencies AND, most importantly, arrange the new .sql file so that the INSERT statements for the table with no dependencies are loaded first. Likewise, the last set of INSERT statements will be for the table with the most dependencies.
Hope that helps.

If those are the only fields in your table you can always:
create table temp_unique as
select distinct LASTNAME, FIRSTNAME, HOUSENUMBER, STREET, CITY, ZIP, DOB, SEX
from mytable;
then rename (or drop, if you dare) mytable, rename temp_unique to mytable, and create your indexes (make sure to recreate any other indexes, foreign keys, or constraints that the original table had).
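The whole sequence might look like this; RENAME TABLE can swap both names in one atomic statement. The index definition is the one from the question; any other indexes or foreign keys the original table had would need to be recreated as well.

```sql
-- Deduplicated copy (only valid if these are ALL of the table's columns).
CREATE TABLE temp_unique AS
SELECT DISTINCT LASTNAME, FIRSTNAME, HOUSENUMBER, STREET, CITY, ZIP, DOB, SEX
FROM mytable;

-- Swap the tables in one statement; keep the old one until the result is checked.
RENAME TABLE mytable TO mytable_old, temp_unique TO mytable;

-- Recreate the unique index so duplicates cannot come back.
ALTER TABLE mytable
  ADD UNIQUE INDEX myindex (LASTNAME, FIRSTNAME, HOUSENUMBER, STREET, CITY, ZIP, DOB, SEX);
```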
If you're working on a live table you'll have to delete the underlying records one at a time. That's quite a bit different -- add a uid then perform deletes. If that's your situation, let us know, we can refactor.

insert into 2column table, avoiding duplicate records

I have a table with 2 columns, userid and messageid. I am updating it automatically through a php script, but I can't get a working insert statement. I don't mind if there are duplicates of userid, or duplicates of messageid (in fact there should be duplicates of both), I just don't want any duplicate of the same combination of userid and messageid. Is there any way to write a query that will do this for me, or do I have to handle it at the php level?
I've probably tried 20 different queries that I found on here and google, but have not gotten it right. This was the last thing I tried:
INSERT INTO interests_join (userid, interestid)
VALUES (1, 4)
WHERE NOT EXISTS
(SELECT userid, interestid FROM interests_join WHERE userid = 1 AND interestid = 4)

You can add a UNIQUE KEY on the pair of columns; MySQL will then refuse to insert a row that duplicates an existing (userid, interestid) combination.
ALTER TABLE `interests_join` ADD UNIQUE `row` (`userid`, `interestid`);
Then you'll have to check from PHP whether the query succeeded or failed with a duplicate-key error (error 1062). You can't add the key while duplicate rows exist; you have to remove them first.
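Two ways to avoid the error altogether, sketched with the values from the question. INSERT IGNORE relies on the UNIQUE key above. The second form is a working version of the NOT EXISTS attempt: INSERT ... VALUES cannot take a WHERE clause, but INSERT ... SELECT can. It works even without the key, but two concurrent requests can both pass the NOT EXISTS check, so the UNIQUE key remains the real guarantee.

```sql
-- Option 1: skip the duplicate silently (requires the UNIQUE key).
INSERT IGNORE INTO interests_join (userid, interestid) VALUES (1, 4);

-- Option 2: insert only if the combination is not already present.
INSERT INTO interests_join (userid, interestid)
SELECT 1, 4 FROM DUAL
WHERE NOT EXISTS (
  SELECT 1 FROM interests_join WHERE userid = 1 AND interestid = 4
);
```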

Inserting Persons with IDs in one query?

I need to add data to a MySQL database like that:
Person:
pId, nameId, titleId, age
Name:
nameId, name
Title:
titleId, title
I don't want to have any names or titles more than once in the DB, so I didn't see a solution with LAST_INSERT_ID().
My approach looks like that:
INSERT IGNORE INTO Name(name) VALUES ("Peter");
INSERT IGNORE INTO Title(title) VALUES ("Astronaut");
INSERT INTO Person(nameId, titleId, age) VALUES ((SELECT nameId FROM Name WHERE name = "Peter"), (SELECT titleId FROM Title WHERE title = "Astronaut"), 33);
But I guess that's a quite dirty approach!?
If possible, I want to add multiple persons with one query, without storing anything more than once in the DB.
Is this possible in a nice way? Thanks!

You could put title and name as two columns of your table and then:
set one UNIQUE index on each column if you don't want to have two identical titles or two identical names in the DB
or set an UNIQUE index on (title,name) if you don't want to have two entries having both the same name and the same title.
If you really want to have separate tables, you could do as you suggested in your post, but wrapping all your insert statements in a TRANSACTION to allow rollback if you detect a duplicate somewhere.
See Design dilemma: If e-mail address already used, send e-mail "e-mail address already registered", but can't because can't add duplicate to table, which appears to be exactly the same problem, but with name & email instead of name & title.
START TRANSACTION;
INSERT INTO title(value) VALUES ("Prof.");
SELECT LAST_INSERT_ID() INTO @title_id;
-- Instead of using a user-defined variable,
-- you should be able to use the last_insert_id
-- equivalent from the host language MySQL driver.
INSERT INTO username(value) VALUES ("Sylvain");
SELECT LAST_INSERT_ID() INTO @username_id;
-- Instead of using a user-defined variable,
-- you should be able to use the last_insert_id
-- equivalent from the host language MySQL driver.
INSERT INTO account(username_id, title_id) VALUES (@username_id, @title_id);
COMMIT;
See LAST_INSERT_ID()
A third solution would be to SELECT before doing your insert to see whether the entry is already present. Personally, I wouldn't push the check-before-set approach: at the very least it requires an extra query, which is mostly superfluous if you use indexes correctly.
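A different technique from the transaction above, which addresses the question's concern about LAST_INSERT_ID(): with a UNIQUE index on each lookup column, INSERT ... ON DUPLICATE KEY UPDATE combined with LAST_INSERT_ID(expr) makes LAST_INSERT_ID() return the id of the existing row when the value is already present. A sketch, assuming Name.nameId and Title.titleId are AUTO_INCREMENT primary keys and that Name.name and Title.title carry UNIQUE indexes (those indexes are an assumption). With InnoDB, a duplicate attempt may still consume an auto-increment value, so expect gaps in the ids.

```sql
-- Inserts 'Peter' if new; otherwise makes LAST_INSERT_ID() return the existing nameId.
INSERT INTO Name (name) VALUES ('Peter')
  ON DUPLICATE KEY UPDATE nameId = LAST_INSERT_ID(nameId);
SET @name_id = LAST_INSERT_ID();

-- Same pattern for the title.
INSERT INTO Title (title) VALUES ('Astronaut')
  ON DUPLICATE KEY UPDATE titleId = LAST_INSERT_ID(titleId);
SET @title_id = LAST_INSERT_ID();

INSERT INTO Person (nameId, titleId, age) VALUES (@name_id, @title_id, 33);
```

For several persons, the same three statements are repeated per person, ideally inside one transaction.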

Inserting database row with values from another table

Basically, I have two tables: images and servers. When I want to insert a row into the images table, I need to specify a s_id as one of the fields. Problem is, I only have name, which is another field in the servers table. I need to find what s_id belongs to name, and then use that in my INSERT INTO query on the images table.
Maybe this image will help:
http://i.imgur.com/rYXbW.png
I only know the name field from the servers table, and I need to use it to get the s_id field from the servers table. When I have that, I can use it in my INSERT INTO query, as it's a foreign key.
I found this:
http://www.1keydata.com/sql/sqlinsert.html
But it just confused me even more.
One solution would be to run two queries. One to get the s_id, and one to run the insert query. But I'd like to limit the amount of queries I run if there's a reasonable alternative.
Thanks!

You can use the INSERT ... SELECT form, something like this (with real column names and values of course):
INSERT INTO images (s_id, u_id, name, filename, uploaded)
SELECT s_id, ...
FROM servers
WHERE name = 'the name'
I don't know where you're getting the u_id, name, filename, or uploaded column values for images but you can include them as literal values in the SELECT:
INSERT INTO images (s_id, u_id, name, filename, uploaded)
SELECT s_id, 11, 'pancakes', 'pancakes.jpg', '2011-05-28 11:23:42'
FROM servers
WHERE name = 'the name'
This sort of thing will insert multiple rows if servers.name is not unique.
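Two small additions that address that caveat, sketched with the example values above (the index name uq_servers_name is made up): a UNIQUE key on servers.name rules out multiple matches, and ROW_COUNT() tells you whether the INSERT found a server at all, since INSERT ... SELECT simply inserts nothing when no row matches.

```sql
-- Guarantee at most one server per name (fails if duplicates already exist).
ALTER TABLE servers ADD UNIQUE KEY uq_servers_name (name);

INSERT INTO images (s_id, u_id, name, filename, uploaded)
SELECT s_id, 11, 'pancakes', 'pancakes.jpg', '2011-05-28 11:23:42'
FROM servers
WHERE name = 'the name';

-- Rows added by the INSERT above: 0 means no server had that name.
SELECT ROW_COUNT();
```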

You should be able to do something like this, but you'll need to fill in the items in <> with the values you want to insert.
INSERT INTO images (s_id, u_id, name, filename, uploaded)
(SELECT s_id, <u_id>, <name>, <filename>, <uploaded>
FROM imgstore.servers
WHERE name = @server_name)
This is the syntax for SQL Server, but I think it will work with MySQL as well.
Here's an article on INSERT ... SELECT Syntax

Please see my comment above regarding a potential data integrity issue. I am assuming that the name field in your server table has a unique constraint placed on it.
There are a couple of ways that you can approach this INSERT, and I'm sure that some are better than others. I make no claim that my way is the best way, but it should work. I don't know how you're writing this query, so I'm going to use #FieldValue to represent the variable input. My approach is to use a subquery in your insert statement to get the data that you require.
INSERT INTO images (field1, field2... s_id) VALUES ('#field1val', '#field2val'... (SELECT s_id FROM servers WHERE name='#nameval'));
