I have a table called prospective_shop, and one of its columns is 'username'. Username is not set as a primary key, but I want to remove all rows that have a duplicate username. What is the fastest way to do this?
I tried doing the following:
ALTER IGNORE TABLE `prospective_shop` ADD UNIQUE INDEX idx_name (username);
but then it gives me:
Duplicate entry 'calista_shopp' for key 'idx_name'
delete from prospective_shop
where id not in
(
select * from
(
select min(id)
from prospective_shop
group by username
) x
)
You can just delete all records that are not the first one for each unique username. By selecting min(id) for every username group, you make sure those rows are kept and all the rest are deleted.
In MySQL you can't delete from a table that you are selecting from in a subquery of the same statement. You can trick the engine by wrapping the subquery in another subselect, as I did. The x is just an alias name for the derived (temporary) table.
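As a rough, runnable illustration of the keep-the-lowest-id idea, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows and the min_id column alias are made up for the demo, and SQLite does not actually need the extra derived table; it is kept only to mirror the MySQL statement.

```python
import sqlite3

# SQLite stands in for MySQL here; the sample rows are invented for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prospective_shop (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO prospective_shop (username) VALUES (?)",
    [("calista_shopp",), ("other_shop",), ("calista_shopp",), ("calista_shopp",)],
)

# Keep the lowest id per username and delete every other row.
# The derived table (alias x) is the part MySQL requires; SQLite just tolerates it.
conn.execute(
    """
    DELETE FROM prospective_shop
    WHERE id NOT IN (
        SELECT min_id FROM (
            SELECT MIN(id) AS min_id
            FROM prospective_shop
            GROUP BY username
        ) x
    )
    """
)

remaining = conn.execute(
    "SELECT id, username FROM prospective_shop ORDER BY id"
).fetchall()
print(remaining)
```

After the delete, each username appears once, so a unique index on username can then be added without hitting the duplicate-entry error.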
Related
So I have an existing MySQL users table with thousands of records in it. I have noticed duplicate records for users which is a problem that I need to address. I know that the way I need to do this is to somehow make 2 columns unique.
The duplicates are arising with records containing both the same server_id column, and also the same user_id column. These 2 columns are meant to be unique combined. So there should only ever be 1 user_id per server_id.
I have figured out how I can find these duplicates using the following query:
SELECT `server_id`, `user_id`, COUNT(*) AS `duplicates` FROM `guild_users` GROUP BY `server_id`, `user_id` HAVING `duplicates` > 1
From what I have read, I need to delete all duplicates first before I add any constraints. This is one of the things I am unsure about.
Question 1: How would I go about deleting all duplicates but leaving one of each, so the user still exists without the extra copies?
Question 2: What is the best way to prevent duplicates from being created? Should I create a unique constraint on both columns, or do something with primary keys instead?
This assumes your table has a primary key column such as id.
So you can use EXISTS to delete the duplicates and keep just 1:
delete gu from guild_users gu
where exists (
select 1 from guild_users
where
server_id = gu.server_id
and
user_id = gu.user_id
and
id > gu.id
)
After that you can create a unique constraint for the 2 columns:
alter table guild_users
add constraint un_server_user unique
(server_id, user_id);
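The two steps above (delete with EXISTS, then enforce uniqueness) can be tried out with a small sketch like the following. It assumes Python's sqlite3 module in place of MySQL, invented sample rows, and a hypothetical alias named newer. SQLite has no ALTER TABLE ... ADD CONSTRAINT, so a unique index is used for the second step.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE guild_users (id INTEGER PRIMARY KEY, server_id INTEGER, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO guild_users (server_id, user_id) VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 11), (2, 10), (2, 10), (2, 10)],
)

# Delete a row whenever a row with a higher id exists for the same pair,
# so the row with the highest id in each group survives.
conn.execute(
    """
    DELETE FROM guild_users
    WHERE EXISTS (
        SELECT 1 FROM guild_users AS newer
        WHERE newer.server_id = guild_users.server_id
          AND newer.user_id = guild_users.user_id
          AND newer.id > guild_users.id
    )
    """
)

# With the duplicates gone, uniqueness of the pair can be enforced.
conn.execute("CREATE UNIQUE INDEX un_server_user ON guild_users (server_id, user_id)")

# A new duplicate is now refused by the database itself.
try:
    conn.execute("INSERT INTO guild_users (server_id, user_id) VALUES (1, 10)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute(
    "SELECT id, server_id, user_id FROM guild_users ORDER BY id"
).fetchall()
print(rows, rejected)
```

Note that this variant keeps the newest row of each group, whereas the min(id) approach keeps the oldest; which one is right depends on which copy holds the data you want.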
You want to prevent this by adding a unique index:
create unique index unq_guild_users_server_user on guild_users(server_id, user_id);
If you have a primary key, you can delete the duplicates before adding the unique index:
delete g
from guild_users g left join
(select server_id, user_id, max(primary_key) as max_pk
from guild_users
group by server_id, user_id
) su
on g.primary_key = su.max_pk
where su.max_pk is null;
I am trying to delete e-mail duplicates from the table nlt_user.
This query correctly shows the records that have duplicates:
select [e-mail], count([e-mail])
from nlt_user
group by [e-mail]
having count([e-mail]) > 1
Now how can I delete all but one of each set of duplicate records?
Thank you
If your MySQL version is prior to 5.7.4, you can add a UNIQUE index on the e-mail column with the IGNORE keyword.
This will remove all the duplicate e-mail rows:
ALTER IGNORE TABLE nlt_user
ADD UNIQUE INDEX `idx_e-mail` (`e-mail`);
If the version is 5.7.4 or later, you can copy the rows into a new table instead (IGNORE is no longer accepted on ALTER TABLE):
CREATE TABLE nlt_user_new LIKE nlt_user;
ALTER TABLE nlt_user_new ADD UNIQUE INDEX (`e-mail`);
INSERT IGNORE INTO nlt_user_new SELECT * FROM nlt_user;
DROP TABLE nlt_user;
RENAME TABLE nlt_user_new TO nlt_user;
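The copy-into-a-unique-table approach can be sketched as follows, again with Python's sqlite3 module standing in for MySQL. Several things here are assumptions for the demo: the column is called email rather than e-mail, the sample addresses are invented, the new table is declared by hand because SQLite has no CREATE TABLE ... LIKE, and INSERT OR IGNORE is SQLite's counterpart of MySQL's INSERT IGNORE.

```python
import sqlite3

# SQLite stands in for MySQL; column name and rows are invented for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nlt_user (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO nlt_user (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",)],
)

# Declare the copy by hand, with the UNIQUE rule on email built in.
conn.execute("CREATE TABLE nlt_user_new (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

# Rows that would violate the unique rule are skipped silently.
# Ordering by id means the earliest row of each duplicate group is the one kept.
conn.execute("INSERT OR IGNORE INTO nlt_user_new SELECT * FROM nlt_user ORDER BY id")

# Swap the de-duplicated table into place.
conn.execute("DROP TABLE nlt_user")
conn.execute("ALTER TABLE nlt_user_new RENAME TO nlt_user")

deduped = conn.execute("SELECT id, email FROM nlt_user ORDER BY id").fetchall()
print(deduped)
```

Because the original table is dropped, anything attached to it (foreign keys, triggers, grants) would need to be checked before using this on real data.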
Try this:
delete n1 from nlt_user n1
inner join nlt_user n2 on n1.`e-mail` = n2.`e-mail` and n1.id > n2.id;
This keeps the record with the minimum id in each group of duplicates and deletes the remaining duplicate records.
The rank function can be employed to retain only the unique values (in MySQL this needs version 8.0 or later, and the query below assumes a unique id column):
1: Create a new table which contains only unique values
Example: nlt_user_unique
CREATE TABLE nlt_user_unique AS
(SELECT * FROM
(SELECT A.*, RANK() OVER (PARTITION BY email ORDER BY id) RNK -- order by the unique id so only one row per email gets rank 1
FROM nlt_user A) ranked
WHERE RNK = 1);
2: Truncate the original table containing duplicates
TRUNCATE TABLE nlt_user;
3: Insert the unique rows from the table created in step 1 into your table nlt_user
INSERT INTO nlt_user (email)
SELECT email FROM nlt_user_unique;
We have a table business_users with a user_id and business_id and we have duplicates.
How can I write a query that will delete all duplicates except for one?
Completely identical rows
If you want to avoid completely identical rows, as I understood your question at first, then you can select unique rows to a separate table and recreate the table data from that.
CREATE TEMPORARY TABLE tmp SELECT DISTINCT * FROM business_users;
DELETE FROM business_users;
INSERT INTO business_users SELECT * FROM tmp;
DROP TABLE tmp;
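For a quick check of what the four statements above do, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The sample rows are invented, and SQLite needs the AS keyword in CREATE TEMPORARY TABLE ... AS SELECT.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are invented sample data
# containing two sets of completely identical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business_users (user_id INTEGER, business_id INTEGER)")
conn.executemany(
    "INSERT INTO business_users VALUES (?, ?)",
    [(1, 100), (1, 100), (2, 100), (2, 200), (2, 200)],
)

# Copy the distinct rows aside, empty the table, and refill it from the copy.
conn.execute("CREATE TEMPORARY TABLE tmp AS SELECT DISTINCT * FROM business_users")
conn.execute("DELETE FROM business_users")
conn.execute("INSERT INTO business_users SELECT * FROM tmp")
conn.execute("DROP TABLE tmp")

rows = conn.execute(
    "SELECT user_id, business_id FROM business_users ORDER BY user_id, business_id"
).fetchall()
print(rows)
```

This only collapses rows that are identical in every column; rows that share user_id and business_id but differ elsewhere would all survive.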
Be careful if there are any foreign key constraints referencing this table, though, as the temporary deletion of rows might lead to cascaded deletions elsewhere.
Introducing a unique constraint
If you only care about pairs of user_id and business_id, you probably want to avoid introducing duplicates in the future. You can move the existing data to a temporary table, add a constraint, and then move the table data back, ignoring duplicates.
CREATE TEMPORARY TABLE tmp SELECT * FROM business_users;
DELETE FROM business_users;
ALTER TABLE business_users ADD UNIQUE (user_id, business_id);
INSERT IGNORE INTO business_users SELECT * FROM tmp;
DROP TABLE tmp;
The above answer is based on this answer. The warning about foreign keys applies just as it did in the section above.
One-shot removal
If you only want to execute a single query, without modifying the table structure in any way, and you have a primary key id identifying each row, then you can try the following:
DELETE FROM business_users WHERE id NOT IN
(SELECT MIN(id) FROM business_users GROUP BY user_id, business_id);
A similar idea was previously suggested by this answer.
If the above request fails, because you are not allowed to read and delete from a table in the same step, you can again use a temporary table:
CREATE TEMPORARY TABLE tmp
SELECT MIN(id) id FROM business_users GROUP BY user_id, business_id;
DELETE FROM business_users WHERE id NOT IN (SELECT id FROM tmp);
DROP TABLE tmp;
If you want to, you can still introduce a uniqueness constraint after cleaning the data in this fashion. To do so, execute the ALTER TABLE line from the previous section.
Since you have a primary key, you can use that to pick which rows to keep:
delete from business_users
where id not in (
select id from (
select min(id) as id -- Make a list of the primary keys to keep
from business_users
group by user_id, business_id -- Group by your duplicated row definition
) as a -- Derived table to force an implicit temp table
);
In this way, you won't need to create/drop temp tables and such (except the implicit one).
You might want to put a unique constraint on user_id, business_id so you don't have to worry about this again.
I have a huge table of products, but there are a lot of duplicate entries. The table has more than 10 thousand entries, and I want to remove the duplicates without finding and deleting them manually. Please let me know if you can provide a solution for this.
You could use SELECT DISTINCT INTO TempTable, drop the original table, and then rename the temp one.
You should also add primary and unique keys to avoid this sort of thing in the future.
For full-row duplicates, try this (MySQL syntax):
create table mytable_tmp as select distinct * from mytable;
drop table mytable;
alter table mytable_tmp rename to mytable;
The statements below should cover your requirements.
If the table (foo) has a primary key field
First step
Store the key values to keep in a temporary table, and put the columns that define a duplicate in the GROUP BY clause.
For example, to delete duplicate email ids, put the email id column in the GROUP BY clause and select the primary key with
either min(primarykey) or max(primarykey).
CREATE TEMPORARY TABLE temptable AS SELECT min( primarykey ) FROM foo GROUP BY uniquefields;
Second step
Run the delete statement below, substituting your table name and primary key column.
DELETE FROM foo WHERE primarykey NOT IN (SELECT * FROM temptable );
Execute both queries together in your query analyser or DB tool.
If the table (foo) doesn't have a primary key field
step 1
CREATE TABLE temp_table AS SELECT * FROM foo GROUP BY uniquefields; -- uniquefields: the column or columns that define a duplicate
step 2
DELETE FROM foo;
step 3
INSERT INTO foo select * from temp_table;
There are different ways to remove duplicate rows, and which one to use depends on your scenario. The simplest method (MySQL before 5.7.4 only, where ALTER IGNORE is still accepted) is to alter the table and add a unique index on the product name field:
alter ignore table products add unique index `unique_index` (product_name);
You can remove the index after getting all the duplicate rows deleted:
alter table products drop index `unique_index`;
Please let me know if this resolves the issue. If not I can give you alternate solutions for that.
You can add more than one column to a GROUP BY. For example:
SELECT * from tableName GROUP BY prod_name HAVING count(prod_name) > 1
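That will show one row for each product name that occurs more than once. Without the HAVING clause you get one row per product name, which you can write to a new table before dropping the existing one.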
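To see what a GROUP BY ... HAVING query of this kind returns, and how it differs from the same query without HAVING, here is a small sketch that uses Python's sqlite3 module in place of MySQL. The products table layout and the sample names are assumptions made for the demo.

```python
import sqlite3

# SQLite stands in for MySQL; table layout and rows are invented for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, prod_name TEXT)")
conn.executemany(
    "INSERT INTO products (prod_name) VALUES (?)",
    [("lamp",), ("chair",), ("lamp",), ("desk",), ("lamp",), ("desk",)],
)

# With HAVING: only the names that occur more than once, with their counts.
duplicated = conn.execute(
    """
    SELECT prod_name, COUNT(*) AS copies
    FROM products
    GROUP BY prod_name
    HAVING COUNT(*) > 1
    ORDER BY prod_name
    """
).fetchall()

# Without HAVING: one row per distinct name, duplicated or not.
one_per_name = conn.execute(
    "SELECT prod_name FROM products GROUP BY prod_name ORDER BY prod_name"
).fetchall()

print(duplicated)
print(one_per_name)
```

The first result is useful for inspecting the problem; the second is the shape you would want if the goal is to rebuild the table with one row per product.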
I have a table in my database which has duplicate records that I want to delete. I don't want to create a new table with distinct entries for this. What I want is to delete duplicate entries from the existing table without the creation of any new table. Is there any way to do this?
id action
L1_name L1_data
L2_name L2_data
L3_name L3_data
L4_name L4_data
L5_name L5_data
L6_name L6_data
L7_name L7_data
L8_name L8_data
L9_name L9_data
L10_name L10_data
L11_name L11_data
L12_name L12_data
L13_name L13_data
L14_name L14_data
L15_name L15_data
These are all my fields:
id is unique for every row.
L11_data should be unique within its action value.
L11_data holds company names, while action holds the industry name.
So my data contains duplicate company names in L11_data within the same industry.
What I want is one row per company name (with its other data) in each industry stored in action. I hope I have stated my problem clearly enough.
Yes, assuming you have a unique ID field, you can delete all records that are the same except for the ID, but don't have "the minimum ID" for their group of values.
Example query:
DELETE FROM Table
WHERE ID NOT IN
(
SELECT MIN(ID)
FROM Table
GROUP BY Field1, Field2, Field3, ...
)
Notes:
I freely chose "Table" and "ID" as representative names
The list of fields ("Field1, Field2, ...") should include all fields except for the ID
This may be a slow query depending on the number of fields and rows, however I expect it would be okay compared to alternatives
EDIT: In case you don't have a unique ID, my recommendation is to simply add an auto-increment unique ID column. Mainly because it's good design, but also because it will allow you to run the query above.
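ALTER IGNORE TABLE `table` ADD UNIQUE INDEX (your cols);
With IGNORE, only one row of each set of duplicates is kept and the conflicting rows are dropped while the index is built (this works only in MySQL versions before 5.7.4).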
DELETE
FROM table_x a
WHERE rowid < ANY (
SELECT rowid
FROM table_x b
WHERE a.someField = b.someField
AND a.someOtherField = b.someOtherField
)
AND (
a.someField,
a.someOtherField
) IN (
SELECT c.someField,
c.someOtherField
FROM table_x c
GROUP BY c.someField,
c.someOtherField
HAVING count(*) > 1
)
In the above query, the combination of someField and someOtherField must uniquely identify each group of duplicates.