Deleting duplicate records in join table - mysql

I have a HABTM association between user and role.
User can be an admin (role_id = 1) or a user (role_id = 2) for roles.
In the join table, roles_users, I have some redundant records. For ex:
I want to remove the duplicate records such as 1:1, 2:4.
Two questions:
Where's the best place to execute the sql script that removes the dups -- migration? script?
What is the sql query to remove the dups?

CREATE TABLE roles_users2 LIKE roles_users; -- this ensures indexes are preserved
INSERT INTO roles_users2 SELECT DISTINCT * FROM roles_users;
DROP TABLE roles_users;
RENAME TABLE roles_users2 TO roles_users;
and for the future, to prevent duplicate rows
ALTER TABLE roles_users ADD UNIQUE INDEX (role_id, user_id);
Or, you can do all of it in one step with ALTER TABLE IGNORE:
ALTER IGNORE TABLE roles_users ADD UNIQUE INDEX (role_id, user_id);
IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.

The simplest is to copy the data into a new table, minus the duplicates:
CREATE TABLE roles_users2 AS
SELECT DISTINCT * FROM roles_users
You can then choose one of the following:
Drop the old table, rename the new table to the old name and add indexes.
Truncate the old table and insert the rows from roles_users2 back into roles_users.

Related

MySQL renaming and create table at the same time

I need to rename MySQL table and create a new MySQL table at the same time.
There is critical live table with large number of records. master_table is always inserted records from scripts.
Need to backup the master table and create a another master table with same name at the same time.
General SQL is is like this.
RENAME TABLE master_table TO backup_table;
Create table master_table (id,value) values ('1','5000');
Is there a possibility to record missing data during the execution of above queries?
Any way to avoid missing record? Lock the master table, etc...
What I do is the following. It results in no downtime, no data loss, and nearly instantaneous execution.
CREATE TABLE mytable_new LIKE mytable;
...possibly update the AUTO_INCREMENT of the new table...
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;
By renaming both tables in one statement, they are swapped atomically. There is no chance for any data to be written "in between" while there is no table to receive the write. If you don't do this atomically, some writes may fail.
RENAME TABLE is virtually instantaneous, no matter how large the table. You don't have to wait for data to be copied.
If the table has an auto-increment primary key, I like to make sure the new table starts with an id value greater than the current id in the old table. Do this before swapping the table names.
SELECT AUTO_INCREMENT FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA='mydatabase' AND TABLE_NAME='mytable';
I like to add some comfortable margin to that value. You want to make sure that the id values inserted to the old table won't exceed the value you queried from INFORMATION_SCHEMA.
Change the new table to use this new value for its next auto-increment:
ALTER TABLE mytable_new AUTO_INCREMENT=<increased value>;
Then promptly execute the RENAME TABLE to swap them. As soon as new rows are inserted to the new, empty table, it will use id values starting with the increased auto-increment value, which should still be greater than the last id inserted into the old table, if you did these steps promptly.
Instead of renaming the master_backup table and recreating it, you could
just create a backup_table with the data from the master_table for the first backup run.
CREATE TABLE backup_table AS
SELECT * FROM master_table;
If you must add a primary key to the backup table then run this just once, that is for the first backup:
ALTER TABLE backup_table ADD CONSTRAINT pk_backup_table PRIMARY KEY(id);
For future backups do:
INSERT INTO backup_table
SELECT * FROM master_table;
Then you can delete all the data in the backup_table found in the master_table like:
DELETE FROM master_table A JOIN
backup_table B ON A.id=B.id;
Then you can add data to the master_table with this query:
INSERT INTO master_table (`value`) VALUES ('5000'); -- I assume the id field is auto_incrementable
I think this should work perfectly even without locking the master table, and with no missing executions.

How to get rid of duplicate results in a table

I have a table that has some duplicate results. For example:
`person_url` `movie_url`
1 2
1 2
2 3
Would become -->
`person_url` `movie_url`
1 2
2 3
I know how to do it by creating a new table,
create table tmp_credits (select distinct * from name);
However, it is a pretty large table and I have a couple indexes on it which will need to be re-created. How would I do this transformation in place, that is, without creating a new table?
You can add a UNIQUE index over your table's columns using the IGNORE keyword:
ALTER IGNORE TABLE name ADD UNIQUE INDEX (person_url, movie_url);
As stated in the manual:
IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
This will also prevent duplicates from being added in the future.
`create table temp
(col1 varchar(20),col2 varchar(20));
INSERT INTO temp VALUES
('1','one'),('2','two'),('2','two');
`select col1,col2 from temp
union
select col1,col2 from temp;
`
Have you considered just putting a semantic layer/view on top of the table that de-dups?
select person_url, movie_url
from name
group by person_url, movie_url

Is there a MySQL command drop all indexes except PRIMARY index?

I have a database table with one Index where the keyname is PRIMARY, Type is BTREE, Unique is YES, Packed is NO, Column is ID, Cardinality is 728, and Collation is A.
I have a script that runs on page load that adds entries to the MySQL database table and also removes duplicates from the Database Table.
Below is the script section that deletes the duplicates:
// Removes Duplicates from the MySQL Database Table
// Removes Duplicates from the MySQL Database Table based on 'Entry_Date' field
mysql_query("Alter IGNORE table $TableName add unique key (Entry_Date)");
// Deletes the added index created by the Removes Duplicates function
mysql_query("ALTER TABLE $TableName DROP INDEX Entry_Date");
Using the Remove Duplicates command above, an additional index is added to the table. The next line command is suppose to delete this added index.
The problem is that sometimes the added index created by the Removes Duplicates command does not get deleted by the following Delete added index command and therefore more indexes are added to the table. These additional indexes prevent the script from adding additional data to the database until I remove the added indexes by hand.
My Question:
Is there a command or short function that I can add to the script that will delete all indexes except the original index mentioned in the beginning of this post?
I did read the following post, but I don't know if this is the correct script to use:
How to drop all of the indexes except primary keys with single query
I don't think so, what you can do is create copies but that wouldn't copy the index. for example if you make
create table1 as (select * from table_2), he will make copy but without index or PK.
After all the comments I think I realize what is happening.
You actually allow duplicates in the database. You just want to clean them some times.
The problem is that the method you have chosen to clean them is through creating a Unique key and using the IGNORE option which causes duplicate lines to get dropped instead of failing the unique key creation. then you drop the unique key so that duplicate rows can be added again. your problem is that sometimes the unique key is not being dropped.
I suggest you delete the duplicates in another way. supposing that your table name is "my_table" and your primary key is my_mey_column then:
delete from my_table where my_key_column not in (select min(my_key_column) from my_table group by Entry_Date)
Edit: the above won't work due to limitation in mysql as pointed by #a_horse_with_no_name
try the three following queries instead:
create temporary table if not exists tmp_posting_data select id from posting_data where 1=2
insert into tmp_posting_data(id) select min(id) from posting_data group by Entry_Date
delete from Posting_Data where id not in (select id FROM tmp_posting_data)
As a final note, try to reconsider the need to allow the rows to be duplicated also as suggested by #a_horse_with_no_name. instead of allowing rows to be entered and then deleted, you can create the unique key once in the database like:
Alter table posting_data add unique key (Entry_Date)
and then, when you are inserting new data from the RSS use the following instead of "insert" use "replace" which will delete the old row if it is a duplicate on the primary key or any unique index
replace into posting_data (......) values(.....)

Mysql delete duplicates

I'm able to display duplicates in my table
table name reportingdetail and column name ReportingDetailID
SELECT DISTINCT ReportingDetailID from reportingdetail group by ReportingDetailID HAVING count(ReportingDetailID) > 1;
+-------------------+
| ReportingDetailID |
+-------------------+
| 664602311 |
+-------------------+
1 row in set (2.81 sec)
Dose anyone know how can I go about deleting duplicates and keep only one record?
I tired the following
SELECT * FROM reportingdetail USING reportingdetail, reportingdetail AS vtable WHERE (reportingdetailID > vtable.id) AND (reportingdetail.reportingdetailID=reportingdetailID);
But it just deleted everything and kept single duplicates records!
The quickest way (that I know of) to remove duplicates in MySQL is by adding an index.
E.g., assuming reportingdetailID is going to be the PK for that table:
mysql> ALTER IGNORE TABLE reportingdetail
-> ADD PRIMARY KEY (reportingdetailID);
From the documentation:
IGNORE is a MySQL extension to standard SQL. It controls how ALTER
TABLE works if there are duplicates on unique keys in the new table or
if warnings occur when strict mode is enabled. If IGNORE is not
specified, the copy is aborted and rolled back if duplicate-key errors
occur. If IGNORE is specified, only the first row is used of rows with
duplicates on a unique key. The other conflicting rows are deleted.
Incorrect values are truncated to the closest matching acceptable
value.
Adding this index will both remove duplicates and prevent any future duplicates from being inserted. If you do not want the latter behavior, just drop the index after creating it.
The following MySQL commands will create a temporary table and populate it with all columns GROUPED by one column name (the column that has duplicates) and order them by the primary key ascending. The second command creates a real table from the temporary table. The third command drops the table that is being used and finally the last command renames the second temporary table to the current being used table name.
Thats a really fast solution. Here are the four commands:
CREATE TEMPORARY TABLE videos_temp AS SELECT * FROM videos GROUP by
title ORDER BY videoid ASC;
CREATE TABLE videos_temp2 AS SELECT * FROM videos_temp;
DROP TABLE videos;
ALTER TABLE videos_temp2 RENAME videos;
This should give you duplicate entries.
SELECT `ReportingDetailID`, COUNT(`ReportingDetailID`) AS Nummber_of_Occurrences FROM reportingdetail GROUP BY `ReportingDetailID` HAVING ( COUNT(`ReportingDetailID`) > 1 )

mySQL find dupes and remove them

I am wondering if there is a way to do this through one query.
Seems when I was initially populating my DB with dummy data to work with 10k records, somewhere in the mess of it all the script dummped an extra 1,044 rows where the rows are duplicates. I determined this using
SELECT x.ID, x.firstname FROM info x
INNER JOIN (SELECT ID FROM info
GROUP BY ID HAVING count(id) > 1) d ON x.ID = d.ID
What I am trying to figure out is through this single query can I add another piece to it that will remove one of the matching dupes from each dupe found?
also I realize the ID column should have been set to auto increment, but it wasn't
My favorite way of removing duplicates would be:
ALTER IGNORE TABLE info ADD UNIQUE (ID);
To explain a bit further (for reference, take a look here)
UNIQUE - you are adding unique index to ID column.
IGNORE - is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
The query that I use is generally something like
Delete from table where id in (
Select Max(id) from table
Group by (DUPFIELD)
Having count (*)>1)
You have to run this several times since it all only remove one duplicated row at a time, but it's fast.
The most efficient way is you do it in below steps:
Step 1: Move the non duplicates (unique tuples) into a temporary table
CREATE TABLE new_table as
SELECT * FROM old_table WHERE 1 GROUP BY [column to remove duplicates by];
Step 2: delete delete the old table.We no longer need the table with all the duplicate entries, so drop it!
DROP TABLE old_table;
Step 3: rename the new_table to the name of the old_table
RENAME TABLE new_table TO old_table;