Check for duplicates in a database and delete them

I have a table structured as follows:
table(A, B)
Together they form the primary key, and they are needed to connect two entries in another table (i.e. they symbolize a friendship between users).
I need to check the table and, if (A,B) exists, delete any (B,A) that also exists (or vice versa).
Since the database is huge, I can't do this manually for every single entry each time.
Of course, I programmed the script that populated the database to check for this situation and avoid it, but we've been using that script on 8 different PCs and so the different dumps may have "reverse duplicates".

The problem has arisen because the relationship you are trying to describe is symmetrical - but the schema models an asymmetric association. The right way to model the problem would be to maintain a table of relationships - then have a table linking users to relationships, e.g.
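CREATE TABLE relationship (
id INT AUTO_INCREMENT PRIMARY KEY -- INT is an assumed column type
);
CREATE TABLE related (
r_id INT NOT NULL,
u_id INT NOT NULL,
PRIMARY KEY (r_id, u_id),
FOREIGN KEY (r_id) REFERENCES relationship (id),
FOREIGN KEY (u_id) REFERENCES user (id)
);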
But to clean up the existing data...an obvious approach would be...
DELETE FROM yourtable d
WHERE A>B AND EXISTS (
SELECT 1
FROM yourtable r
WHERE r.A=d.B
AND r.B =d.A
)
However, if I recall correctly MySQL doesn't like using a subselect in a delete which references the same table as the delete. So....
CREATE TABLE dups AS
SELECT d.A, d.B
FROM yourtable d, yourtable r
WHERE d.A>d.B
AND r.A=d.B
AND r.B =d.A;
then....
DELETE FROM yourtable
WHERE EXISTS (
SELECT 1 FROM dups
WHERE dups.A=yourtable.A
AND dups.B=yourtable.B
)
Not sure if the pushed predicate will still cause a problem, so if that doesn't work....
DELETE FROM yourtable
WHERE CONCAT(A, '/', B) IN (
SELECT CONCAT(A, '/', B) FROM dups
)
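
As a rough, runnable illustration of which rows the cleanup targets, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the sample rows are invented. SQLite accepts the correlated subquery on the same table, so the single-statement form is shown; on MySQL the two-step variant with the dups table may still be needed.

```python
import sqlite3


def remove_reverse_duplicates(conn):
    """Delete (A, B) rows with A > B whose mirror row (B, A) also exists."""
    cur = conn.execute(
        """
        DELETE FROM yourtable
        WHERE A > B
          AND EXISTS (
              SELECT 1 FROM yourtable AS r
              WHERE r.A = yourtable.B AND r.B = yourtable.A
          )
        """
    )
    conn.commit()
    return cur.rowcount


# In-memory SQLite database standing in for the real MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (A INTEGER, B INTEGER, PRIMARY KEY (A, B))")
# Hypothetical sample data: (2, 1) mirrors (1, 2); (6, 5) has no mirror row.
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)", [(1, 2), (2, 1), (3, 4), (6, 5)]
)

removed = remove_reverse_duplicates(conn)
remaining = conn.execute("SELECT A, B FROM yourtable ORDER BY A, B").fetchall()
print(removed, remaining)
```

The A > B filter is what keeps exactly one row of each mirrored pair: only the row with the larger first column is ever a deletion candidate, so a pair can never lose both rows.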

Related

Delete entry that is connected to 2 tables

Table 1 is called athlete and table 2 is called training_session. The primary key of table 1 is ID, and table 2 has the primary key Athelete_id.
I want to delete a person from my database by using his name, which I've called "Pet". However, he is also connected to another table which stores his training sessions. So ID 1 in table 1 is connected to athlete id 1 in table 2.
I have been struggling with this; I tried using INNER JOIN:
DELETE athlete,training_session FROM athlete
INNER JOIN
training_session ON training_session.id = athlete.name
WHERE
athlete.name = "Pet;
Something is wrong with my syntax. Is it correct to use INNER JOIN, or have I misunderstood?

You should have set up foreign key constraints with cascading deletes to simplify the logic; then all you would have needed was to delete from athlete. So I would suggest you add them.
For more info you can take a look at:
http://www.mysqltutorial.org/mysql-on-delete-cascade/
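
To make the cascade idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module in place of MySQL (an assumption for the sake of a self-contained example). The column name athlete_id, the second athlete 'Ann', and the session rows are hypothetical. The ON DELETE CASCADE clause itself is written the same way in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE athlete (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE training_session (
        id INTEGER PRIMARY KEY,
        -- Deleting an athlete removes that athlete's sessions as well.
        athlete_id INTEGER NOT NULL REFERENCES athlete (id) ON DELETE CASCADE
    )
    """
)

# Hypothetical sample data.
conn.execute("INSERT INTO athlete VALUES (1, 'Pet'), (2, 'Ann')")
conn.execute("INSERT INTO training_session VALUES (10, 1), (11, 1), (12, 2)")

# A single-table delete is now enough; no join is required.
conn.execute("DELETE FROM athlete WHERE name = ?", ("Pet",))
conn.commit()

sessions_left = conn.execute(
    "SELECT id, athlete_id FROM training_session ORDER BY id"
).fetchall()
athletes_left = conn.execute("SELECT name FROM athlete").fetchall()
print(sessions_left, athletes_left)
```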

Mysql, Insert new record into table B if foreign key exists in table A

There are a few similar questions on here. None provide a solution. I would like to INSERT a NEW record into table B, but only if a foreign key exists in table A. To be clear, I do not wish to insert the result of a select. I just need to know that the foreign key exists.
INSERT INTO tableB (tableA_ID,code,notes,created) VALUES ('24','1','test',NOW())
SELECT tableA_ID FROM tableA WHERE tableA_ID='24' AND owner_ID='9'
Clearly, the above does not work. But is this even possible? I want to insert the NEW data into tableB, only if the record for the row in tableA exists and belongs to owner_ID.
The queries I have seen so far relate to INSERTING the results from the SELECT query - I do not wish to do that.

Try this:
INSERT INTO tableB (tableA_ID,code,notes,created)
SELECT id, code, notes, created
FROM ( SELECT '24' as id, '1' as code, 'test' as notes, NOW() as created) t
WHERE EXISTS
(
SELECT tableA_ID
FROM tableA
WHERE tableA_ID='24' AND owner_ID='9'
)
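
A minimal sketch of the same guarded insert driven from application code, assuming Python with the standard-library sqlite3 module standing in for MySQL. The helper name insert_if_owned and the sample rows are hypothetical. SQLite allows a SELECT without a FROM clause, so the derived table t from the query above is not needed here, and CURRENT_TIMESTAMP replaces NOW().

```python
import sqlite3


def insert_if_owned(conn, table_a_id, owner_id, code, notes):
    """Insert into tableB only if the tableA row exists and belongs to owner_id.

    Returns True when a row was inserted, False when the guard matched nothing.
    """
    cur = conn.execute(
        """
        INSERT INTO tableB (tableA_ID, code, notes, created)
        SELECT ?, ?, ?, CURRENT_TIMESTAMP
        WHERE EXISTS (
            SELECT 1 FROM tableA WHERE tableA_ID = ? AND owner_ID = ?
        )
        """,
        (table_a_id, code, notes, table_a_id, owner_id),
    )
    conn.commit()
    return cur.rowcount == 1


# In-memory stand-in schema (assumption); only the columns from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (tableA_ID INTEGER PRIMARY KEY, owner_ID INTEGER)")
conn.execute(
    "CREATE TABLE tableB (tableA_ID INTEGER, code TEXT, notes TEXT, created TEXT)"
)
conn.execute("INSERT INTO tableA VALUES (24, 9)")

inserted = insert_if_owned(conn, 24, 9, "1", "test")  # row exists, owner matches
rejected = insert_if_owned(conn, 24, 7, "1", "test")  # row exists, wrong owner
row_count = conn.execute("SELECT COUNT(*) FROM tableB").fetchone()[0]
print(inserted, rejected, row_count)
```

Checking the affected-row count lets the caller tell "inserted" from "guard failed" without a separate SELECT.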

I know this is a fairly old, already answered question, but it now ranks highly in Google search results, and I think an addition may help someone in the future.
In some database designs, you may want to insert a row into a table that has two or more foreign keys. Let's say we have four tables in a chat application:
Users, Threads, Thread_Users and Messages
If we want a User to join a Thread, we need to insert a row into Thread_Users, which has two foreign keys: user_id and thread_id.
Then we can use a query like this to insert the row if both foreign keys exist, and silently do nothing otherwise:
INSERT INTO `thread_users` (thread_id,user_id,status,creation_date)
SELECT 2,3,'pending',1601465161690 FROM (SELECT 1 as nb_threads, 1 as nb_users) as tmp
WHERE tmp.nb_threads = (SELECT count(*) FROM `threads` WHERE threads.id = 2)
AND tmp.nb_users = (SELECT count(*) FROM `users` WHERE users.id = 3)
It's a little verbose but it does the job pretty well.
Application-side, we just have to raise an error if affectedRows = 0 and, if needed, check which of the keys doesn't exist. IMHO, this is better than executing two SELECT queries and THEN the INSERT, especially when the probability of a nonexistent foreign key is very low.

Delete all not referenced (by foreign key) records from a table in mysql

I have an address table which is referenced from 6 other tables (sometimes multiple tables). Some of those tables have around half a million records (and the address table around 750000 records). I want to have a periodic query running which deletes all records that are not referenced from any of the tables.
Using the following sub-queries is not an option, because the query never finishes - the scope is too big.
delete from address where address_id not in (select ...)
and not in (select ...) and not in (select ...) ...
What I was hoping was that I could use the foreign key constraint and simply delete all records for which the foreign key constraint does not stop me (because there is no reference to the record). I could not find a way to do this (or is there one?). Does anybody have another good idea for tackling this problem?

You can try one of these approaches:
DELETE
address
FROM
address
LEFT JOIN other_table ON (address.id = other_table.ref_field)
LEFT JOIN other_table2 ON (address.id = other_table2.ref_field)
WHERE
other_table.id IS NULL AND other_table2.id IS NULL
or
DELETE
FROM address A
WHERE NOT EXISTS (
SELECT 1
FROM other_table B
WHERE B.a_key = A.id
)

I always use this:
DELETE FROM your_table WHERE id NOT IN (SELECT id FROM other_table)

I'd do this by first creating a TEMPORARY TABLE (t) that is a UNION of the IDs in the 6 referencing tables, then run:
DELETE x FROM x LEFT JOIN t USING (ID) WHERE t.ID IS NULL;
Where x is the address table.
See 'Multiple-table syntax' here:
http://dev.mysql.com/doc/refman/5.0/en/delete.html
Obviously, your temporary table should have its PRIMARY KEY on ID. It may take some time to query and join, but I can't see a way round it. It should be optimized, unlike the multiple sub-query version.
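
The temporary-table approach can be sketched as follows, assuming Python with the standard-library sqlite3 module in place of MySQL. The referencing tables customer and supplier and all sample rows are hypothetical, and only two referencing tables are shown instead of six. SQLite has no multiple-table DELETE, so the anti-join is written with NOT EXISTS; on MySQL the DELETE ... LEFT JOIN form can be used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: two referencing tables instead of six.
    CREATE TABLE address (address_id INTEGER PRIMARY KEY, street TEXT);
    CREATE TABLE customer (id INTEGER PRIMARY KEY, address_id INTEGER);
    CREATE TABLE supplier (id INTEGER PRIMARY KEY, address_id INTEGER);
    INSERT INTO address VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO customer VALUES (1, 1), (2, 1), (3, NULL);
    INSERT INTO supplier VALUES (1, 3);

    -- Collect every referenced ID once; UNION removes the duplicates,
    -- so the primary key on the temporary table is never violated.
    CREATE TEMP TABLE t (ID INTEGER PRIMARY KEY);
    INSERT INTO t (ID)
        SELECT address_id FROM customer WHERE address_id IS NOT NULL
        UNION
        SELECT address_id FROM supplier WHERE address_id IS NOT NULL;
    """
)

# Anti-join against the small, indexed temporary table.
cur = conn.execute(
    "DELETE FROM address "
    "WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.ID = address.address_id)"
)
conn.commit()

deleted = cur.rowcount
kept = [
    row[0]
    for row in conn.execute("SELECT address_id FROM address ORDER BY address_id")
]
print(deleted, kept)
```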

Mysql, clean up loose ends

Is there some nice automated way/program for cleaning up my database? I have a couple of tables with relations that in some cases just point at records that don't exist.

Sounds like you are missing some Foreign Key Constraints.
Depending on the ON DELETE option, when a referenced record is deleted the referencing records are either deleted along with it, have their referencing columns set to NULL, or cause the delete to be rejected.
Before creating your constraints, you will have to delete the existing orphaned entries manually using a query like this:
DELETE FROM table_a
WHERE ref_b IS NOT NULL
AND NOT EXISTS ( SELECT 1 FROM table_b WHERE table_b.id = table_a.ref_b )
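
Before running a destructive cleanup like this, it can help to list the orphans first with the same predicate. A minimal sketch, assuming Python with the standard-library sqlite3 module standing in for MySQL; the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_b (id INTEGER PRIMARY KEY);
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, ref_b INTEGER);
    INSERT INTO table_b VALUES (1), (2);
    -- Rows 2 and 5 point at table_b ids that do not exist; row 3 has no reference.
    INSERT INTO table_a VALUES (1, 1), (2, 99), (3, NULL), (4, 2), (5, 42);
    """
)

# The same predicate drives both the dry run and the delete.
orphan_filter = (
    "ref_b IS NOT NULL "
    "AND NOT EXISTS (SELECT 1 FROM table_b WHERE table_b.id = table_a.ref_b)"
)

# Dry run: see what would be removed.
orphans = [
    row[0]
    for row in conn.execute(
        f"SELECT id FROM table_a WHERE {orphan_filter} ORDER BY id"
    )
]

# Actual cleanup.
conn.execute(f"DELETE FROM table_a WHERE {orphan_filter}")
conn.commit()

remaining = [
    row[0] for row in conn.execute("SELECT id FROM table_a ORDER BY id")
]
print(orphans, remaining)
```

Rows whose ref_b is NULL are left alone, matching the ref_b IS NOT NULL condition in the query above.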

How to set a database integrity check on foreign keys referenced fields

I have four Database Tables like these:
Book
ID_Book |ID_Company|Description
BookExtension
ID_BookExtension | ID_Book| ID_Discount
Discount
ID_Discount | Description | ID_Company
Company
ID_Company | Description
Any BookExtension record via foreign keys points indirectly to two different ID_Company fields:
BookExtension.ID_Book references a Book record that contains a Book.ID_Company
BookExtension.ID_Discount references a Discount record that contains a Discount.ID_Company
Is it possible to enforce in SQL Server that any new record in BookExtension must have Book.ID_Company = Discount.ID_Company?
In a nutshell, I want the following query to always return a count of 0:
SELECT count(*) from BookExtension
INNER JOIN Book ON BookExtension.ID_Book = Book.ID_Book
INNER JOIN Discount ON BookExtension.ID_Discount = Discount.ID_Discount
WHERE Book.ID_Company <> Discount.ID_Company
or, in plain English:
I don't want a BookExtension record to reference a Book record of one Company and a Discount record of a different Company!

Unless I've misunderstood your intent, the general form of the SQL statement you'd use is
ALTER TABLE FooExtension
ADD CONSTRAINT your_constraint_name
CHECK (ID_Foo = ID_Bar);
That assumes existing data already conforms to the new constraint. If existing data doesn't conform, you can either fix the data (assuming it needs fixing), or you can limit the scope (probably) of the new constraint by also checking the value of ID_FooExtension. (Assuming you can identify "new" rows by the value of ID_FooExtension.)
Later . . .
Thanks, I did indeed misunderstand your situation.
As far as I know, you can't enforce that constraint the way you want to in SQL Server, because it doesn't allow SELECT queries within a CHECK constraint. (I might be wrong about that in SQL Server 2008.) A common workaround is to wrap a SELECT query in a function, and call the function, but that's not reliable according to what I've learned.
You can do this, though.
Create a UNIQUE constraint on Book (ID_Book, ID_Company). Part of it will look like UNIQUE (ID_Book, ID_Company).
Create a UNIQUE constraint on Discount (ID_Discount, ID_Company).
Add two columns to BookExtension: Book_ID_Company and Discount_ID_Company.
Populate those new columns.
Change the foreign key constraints in BookExtension. You want BookExtension (ID_Book, Book_ID_Company) to reference Book (ID_Book, ID_Company). Make a similar change for the foreign key referencing Discount.
Now you can add a check constraint to guarantee that BookExtension.Book_ID_Company is the same as BookExtension.Discount_ID_Company.
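
The steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses Python with the standard-library sqlite3 module rather than SQL Server, builds the tables from scratch instead of altering existing ones, omits the Description columns, and uses invented sample rows and a hypothetical try_insert helper. The composite UNIQUE, composite FOREIGN KEY and CHECK clauses are the part that carries over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
conn.executescript(
    """
    CREATE TABLE Company (ID_Company INTEGER PRIMARY KEY);
    CREATE TABLE Book (
        ID_Book INTEGER PRIMARY KEY,
        ID_Company INTEGER NOT NULL REFERENCES Company (ID_Company),
        UNIQUE (ID_Book, ID_Company)
    );
    CREATE TABLE Discount (
        ID_Discount INTEGER PRIMARY KEY,
        ID_Company INTEGER NOT NULL REFERENCES Company (ID_Company),
        UNIQUE (ID_Discount, ID_Company)
    );
    CREATE TABLE BookExtension (
        ID_BookExtension INTEGER PRIMARY KEY,
        ID_Book INTEGER NOT NULL,
        Book_ID_Company INTEGER NOT NULL,
        ID_Discount INTEGER NOT NULL,
        Discount_ID_Company INTEGER NOT NULL,
        -- The composite keys force the copied company IDs to be truthful.
        FOREIGN KEY (ID_Book, Book_ID_Company)
            REFERENCES Book (ID_Book, ID_Company),
        FOREIGN KEY (ID_Discount, Discount_ID_Company)
            REFERENCES Discount (ID_Discount, ID_Company),
        -- With truthful copies, a plain row-level CHECK is sufficient.
        CHECK (Book_ID_Company = Discount_ID_Company)
    );
    INSERT INTO Company VALUES (1), (2);
    INSERT INTO Book VALUES (10, 1);
    INSERT INTO Discount VALUES (20, 1), (21, 2);
    """
)


def try_insert(row):
    """Return True if the BookExtension row is accepted, False if rejected."""
    try:
        conn.execute("INSERT INTO BookExtension VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


same_company = try_insert((1, 10, 1, 20, 1))   # book and discount both company 1
cross_company = try_insert((2, 10, 1, 21, 2))  # CHECK fails: 1 <> 2
false_copy = try_insert((3, 10, 1, 21, 1))     # FK fails: discount 21 is company 2
print(same_company, cross_company, false_copy)
```

The third insert shows why both pieces are needed: the CHECK alone could be satisfied by writing a wrong company ID, and the composite foreign key is what rules that out.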

I'm not sure how [in]efficient this would be but you could also use an indexed view to achieve this. It needs a helper table with 2 rows as CTEs and UNION are not allowed in indexed views.
CREATE TABLE dbo.TwoNums
(
Num int primary key
)
INSERT INTO TwoNums SELECT 1 UNION ALL SELECT 2
Then the view definition
CREATE VIEW dbo.ConstraintView
WITH SCHEMABINDING
AS
SELECT 1 AS Col FROM dbo.BookExtension
INNER JOIN dbo.Book ON dbo.BookExtension.ID_Book = Book.ID_Book
INNER JOIN dbo.Discount ON dbo.BookExtension.ID_Discount = Discount.ID_Discount
INNER JOIN dbo.TwoNums ON Num = Num
WHERE dbo.Book.ID_Company <> dbo.Discount.ID_Company
And a unique index on the View
CREATE UNIQUE CLUSTERED INDEX [uix] ON [dbo].[ConstraintView]([Col] ASC)
