Deduplicate entries in large table that share same values in multiple columns - mysql

We've got this big 'favorites' table, and we ran in to an issue that uncovered the fact that we dont have a unique constraint on user, favorite_type, and favorite_id. I've made a migration that will add an index on to these 3, but it won't work because we have existing data that has the same set of entries. There's other data in there too (updated_at, created_at, id) that is fine to lose, but makes it an imperfect match.
Is there a way in rails (3.2.x) to do this, or a way in (my)SQL?
I know I could pull all of them, then group by, and map over a delete of all extra elements, but it is a very large table (1mil+) and we can't have long-running migrations.

Copy the table structure to a new table, add the unique constraints, then insert all the records. The duplicates will fail due to the constraint.
CREATE TABLE tableTmp LIKE table;
Add the constraints then insert all the records into the temporary table.
INSERT INTO tableTmp SELECT * FROM table
Verify the entries then drop and rename.
DROP TABLE table;
RENAME TABLE tableTmp TO table;

Related

MySQL renaming and create table at the same time

I need to rename MySQL table and create a new MySQL table at the same time.
There is critical live table with large number of records. master_table is always inserted records from scripts.
Need to backup the master table and create a another master table with same name at the same time.
General SQL is is like this.
RENAME TABLE master_table TO backup_table;
Create table master_table (id,value) values ('1','5000');
Is there a possibility to record missing data during the execution of above queries?
Any way to avoid missing record? Lock the master table, etc...
What I do is the following. It results in no downtime, no data loss, and nearly instantaneous execution.
CREATE TABLE mytable_new LIKE mytable;
...possibly update the AUTO_INCREMENT of the new table...
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;
By renaming both tables in one statement, they are swapped atomically. There is no chance for any data to be written "in between" while there is no table to receive the write. If you don't do this atomically, some writes may fail.
RENAME TABLE is virtually instantaneous, no matter how large the table. You don't have to wait for data to be copied.
If the table has an auto-increment primary key, I like to make sure the new table starts with an id value greater than the current id in the old table. Do this before swapping the table names.
SELECT AUTO_INCREMENT FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA='mydatabase' AND TABLE_NAME='mytable';
I like to add some comfortable margin to that value. You want to make sure that the id values inserted to the old table won't exceed the value you queried from INFORMATION_SCHEMA.
Change the new table to use this new value for its next auto-increment:
ALTER TABLE mytable_new AUTO_INCREMENT=<increased value>;
Then promptly execute the RENAME TABLE to swap them. As soon as new rows are inserted to the new, empty table, it will use id values starting with the increased auto-increment value, which should still be greater than the last id inserted into the old table, if you did these steps promptly.
Instead of renaming the master_backup table and recreating it, you could
just create a backup_table with the data from the master_table for the first backup run.
CREATE TABLE backup_table AS
SELECT * FROM master_table;
If you must add a primary key to the backup table then run this just once, that is for the first backup:
ALTER TABLE backup_table ADD CONSTRAINT pk_backup_table PRIMARY KEY(id);
For future backups do:
INSERT INTO backup_table
SELECT * FROM master_table;
Then you can delete all the data in the backup_table found in the master_table like:
DELETE FROM master_table A JOIN
backup_table B ON A.id=B.id;
Then you can add data to the master_table with this query:
INSERT INTO master_table (`value`) VALUES ('5000'); -- I assume the id field is auto_incrementable
I think this should work perfectly even without locking the master table, and with no missing executions.

How to remove duplicate entries from database which as over 10000 records with no id field

I have database like the following with 10K rows. How to delete duplicate if all fields are same. I don't want to search for any specific company. Is there a way to search and find any multiple entries with all same fields get deleted. Thanks
This command adds a unique key, and drops all rows that generate errors (due to the unique key). This removes duplicates.
ALTER IGNORE TABLE table ADD UNIQUE KEY idx1(title);
Note: This command may not work for InnoDB tables for some versions of MySQL. See this post for a workaround. (Thanks to "an anonymous user" for this information.)
OR
Simply creates a new table without duplicates. Sometimes this is actually faster and easier than trying to delete all the offending rows. Just create a new table, insert the unique rows (I used min(id) for the id of the resulting row), rename the two tables, and (once you are satisfied that everything worked correctly) drop the original table
This below query used to find the duplicate entry using all fields:
Select * from Table group by company_name,city,state,country having count(*)>1;

In mysql how to insert a column with huge data contained in table with no downtime

If a table in MySQL containing suppose 1 million record, how can I add a column at any position with no downtime expected.
MySQL's ALTER TABLE performance can become very frustrating with very large tables. ALTER statements makes a new temporary table, copies records from your existing table into the new table even if the data wouldn't strictly need to be copied, and then replaces the old table with the new table.
Suppose you have a table with one million records and if you try to add 3 columns in it, then it will certainly copy the table 3 times, which means coping 3 million records.
A faster way of adding columns is to create your own new table, then select all of the rows from the existing table into it. You can create the structure from the existing table, then modify the structure however you’d like, then select in the data. Make sure that you select the information into the new table in the same order as the fields are defined.
1. CREATE TABLE new_table LIKE table
2. INSERT INTO new_table SELECT * FROM table
3. RENAME TABLE table = old_table, table = new_table;
If you have foreign key constraints you can handle these foreign keys using
SET FOREIGN_KEY_CHECKS = 0;

mySQL: duplicating multiple records via temporary table, how to preserve autoincrement index?

I wish to duplicate a selection of records in a mySQL table.
The pk of the table is an autoincremented int.
I want to do this with one set of mysql queries (for performance reasons).
It seems like the fastest way to do this is to put the results of the selection into a temporary table,
make any changes needed, and reinsert the records back to the original table, like this:
CREATE TEMPORARY TABLE temp1234 ENGINE=MEMORY SELECT * FROM a_table WHERE column='my selection';
# do updates in temp1234; (altering FK's mainly)
INSERT INTO a_table SELECT * FROM temp1234;
But when I try to do this i get an error for duplicate PKs.
Now, I realise that I could alter the INSERT with SELECT query to exclude the pk/ID column, but as I am proceduraly generating these queries across multiple tables for a large data copying function, i want to avoid having to supply column names.
What is the best way around this problem?

how to delete duplicate records in mysql table

I'm having an issue with finding and deleting duplicate records, I have a table with IDs called CallDetailRecordID which I need to scan and delete records, the reason there are duplicates is that I'm exporting data to special arching engine works with MySQL and it doesn't support indexing.
I tried using "Select DISTINCT" but it dosn't work, is there is another way? I'm hoping I can create a store procedure and have it run weekly to perform clean up.
your help is highly appreciated.
Thank you
CREATE TABLE tmp_table LIKE table
INSERT INTO tmp_table (SELECT * FROM table GROUP BY CallDetailRecordID)
RENAME table TO old_table
RENAME tmp_table to table
Drop the old table if you want, add a LOCK TABLES statement at the beginning to avoid lost inserts.