So I have an existing MySQL users table with thousands of records in it. I have noticed duplicate records for users which is a problem that I need to address. I know that the way I need to do this is to somehow make 2 columns unique.
The duplicates are records that share both the same server_id and the same user_id. These 2 columns are meant to be unique in combination, so there should only ever be 1 row for a given user_id within a given server_id.
I have figured out how I can find these duplicates using the following query:
SELECT `server_id`, `user_id`, COUNT(*) AS `duplicates` FROM `guild_users` GROUP BY `server_id`, `user_id` HAVING `duplicates` > 1
From what I have read, I need to delete all duplicates first before I add any constraints. This is one of the things I am unsure about.
Question 1: How would I go about deleting all duplicates while leaving 1 of each, so that the user still exists but the extra copies are gone?
Question 2: What is the best way of avoiding duplicates from being created? Should I create a unique constraint for both of the columns, or do something with primary keys instead?
This assumes your table has a primary key column, such as an id.
So you can use EXISTS to delete the duplicates and keep just 1:
delete gu from guild_users gu
where exists (
select 1 from guild_users
where
server_id = gu.server_id
and
user_id = gu.user_id
and
id > gu.id
)
After that you can create a unique constraint for the 2 columns:
alter table guild_users
add constraint un_server_user unique
(server_id, user_id);
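As a hedged alternative sketch, the same cleanup can be written as a self-join, which avoids relying on a subquery against the table being deleted from. It assumes a primary key column named id (the question does not confirm this) and an InnoDB table, so the delete can be rolled back if the check query still returns rows. Like the EXISTS version, it keeps the row with the highest id in each group.

```sql
START TRANSACTION;

-- Delete every row for which another row with the same
-- (server_id, user_id) pair and a higher id exists.
DELETE older
FROM guild_users AS older
JOIN guild_users AS newer
  ON  newer.server_id = older.server_id
  AND newer.user_id   = older.user_id
  AND newer.id        > older.id;   -- id is an assumed primary key

-- Verification: this should now return no rows.
SELECT server_id, user_id, COUNT(*) AS duplicates
FROM guild_users
GROUP BY server_id, user_id
HAVING COUNT(*) > 1;

COMMIT;   -- or ROLLBACK if the check above still lists duplicates
```

Running the ALTER TABLE only after the check comes back empty avoids a failed constraint creation on a large table.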
You want to prevent this by adding a unique index:
create unique index unq_guild_users_server_user on guild_users(server_id, user_id);
If you have a primary key, you can delete the duplicates before adding the unique index:
delete g
from guild_users g left join
(select server_id, user_id, max(primary_key) as max_pk
from guild_users
group by server_id, user_id
) su
on g.primary_key = su.max_pk
where su.max_pk is null;
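Once the unique index exists, a quick way to see what it does is to try inserting the same pair twice. The sketch below uses made-up values (1, 42) and assumes every other column in guild_users has a default or allows NULL.

```sql
-- First insert of the pair succeeds.
INSERT INTO guild_users (server_id, user_id) VALUES (1, 42);

-- Second insert of the same pair is rejected with error 1062 (duplicate entry).
INSERT INTO guild_users (server_id, user_id) VALUES (1, 42);

-- INSERT IGNORE downgrades the error to a warning and inserts nothing.
INSERT IGNORE INTO guild_users (server_id, user_id) VALUES (1, 42);
```

Keep in mind that INSERT IGNORE also silences other errors, so it is best reserved for cases where skipping a row quietly is really what you want.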
Related
I have a table where I want to store one more key for each user; the following is the table,
I was trying to add app_reminder for each existing user. I did it with the following:
insert into users_settings (user_id, key)
select distinct user_id, 'app_reminder'
from users_settings;
Now I want to add where clause
SELECT DISTINCT user_id, 'app_reminder' WHERE key != 'app_reminder'
to prevent duplicate entries. I tried the query above, but it does not work.
Can someone kindly guide me on this? I would appreciate it.
Thank you
One method simply uses conditional aggregation:
insert into users_settings (user_id, `key`)
select user_id, 'app_reminder'
from users_settings
group by user_id
having sum(`key` = 'app_reminder') = 0;
You might want a more generic solution. If you want to ensure that user/key pairs are never duplicated, then create a unique constraint or index on those columns:
alter table users_settings add constraint unq_users_settings_user_id_key
unique (user_id, `key`);
Then, you can skip inserting the rows using on duplicate key update:
insert into users_settings (user_id, `key`)
select distinct user_id, 'app_reminder'
from users_settings
on duplicate key update user_id = values(user_id);
The update does nothing, because the value is the same. MySQL skips doing the insert and does not return an error.
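If you would rather express the filter as a WHERE clause, which is closer to what the question attempted, a NOT EXISTS subquery is one option. This is only a sketch: the aliases s and r are made up for illustration, and key is quoted with backticks because KEY is a reserved word in MySQL.

```sql
INSERT INTO users_settings (user_id, `key`)
SELECT DISTINCT s.user_id, 'app_reminder'
FROM users_settings AS s
WHERE NOT EXISTS (
    -- Skip users that already have an app_reminder row.
    SELECT 1
    FROM users_settings AS r
    WHERE r.user_id = s.user_id
      AND r.`key` = 'app_reminder'
);
```

The original attempt with key != 'app_reminder' filters individual rows rather than users, so a user with both an app_reminder row and any other key would still be selected. NOT EXISTS checks per user instead.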
I have an address table which is referenced from 6 other tables (sometimes multiple tables). Some of those tables have around half a million records (and the address table has around 750000 records). I want to run a periodic query that deletes all records that are not referenced from any of the tables.
Using sub-queries like the following is not an option, because the query never finishes; the scope is too big.
delete from address where address_id not in (select ...)
and not in (select ...) and not in (select ...) ...
What I was hoping was that I could use the foreign key constraint and simply delete all records for which the constraint does not stop me (because nothing references them). I could not find a way to do this (or is there one?). Does anybody have another good idea to tackle this problem?
You can try one of these approaches:
DELETE
address
FROM
address
LEFT JOIN other_table ON (address.id = other_table.ref_field)
LEFT JOIN other_table2 ON (address.id = other_table2.ref_field)
WHERE
other_table.id IS NULL AND other_table2.id IS NULL
OR
DELETE
FROM address A
WHERE NOT EXISTS (
SELECT 1
FROM other_table B
WHERE B.a_key = A.id
)
I always use this:
DELETE FROM table WHERE id NOT IN (SELECT id FROM OTHER table)
I'd do this by first creating a TEMPORARY TABLE (t) that is a UNION of the IDs in the 6 referencing tables, then run:
DELETE x FROM x LEFT JOIN t USING (ID) WHERE t.ID IS NULL;
Where x is the address table.
See 'Multiple-table syntax' here:
http://dev.mysql.com/doc/refman/5.0/en/delete.html
Obviously, your temporary table should have its PRIMARY KEY on ID. It may take some time to query and join, but I can't see a way round it. It should be optimized, unlike the multiple sub-query version.
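A sketch of the temporary-table approach might look like the following. The referencing table names customer and supplier are hypothetical stand-ins for the 6 real tables, address_id is taken from the question, and the INT UNSIGNED column type is an assumption that should match the real key type.

```sql
-- Collect every address id that is still referenced somewhere.
CREATE TEMPORARY TABLE t (
    ID INT UNSIGNED NOT NULL,   -- assumed type; match address.address_id
    PRIMARY KEY (ID)
);

-- UNION removes duplicates, so the primary key is never violated.
INSERT INTO t (ID)
SELECT address_id FROM customer WHERE address_id IS NOT NULL
UNION
SELECT address_id FROM supplier WHERE address_id IS NOT NULL;
-- ... add one SELECT per referencing table/column

-- Delete addresses that have no match in the collected ids.
DELETE x
FROM address AS x
LEFT JOIN t ON t.ID = x.address_id
WHERE t.ID IS NULL;

DROP TEMPORARY TABLE t;
```

The anti-join tests the temporary table's column for NULL, because a row of the address table only counts as unreferenced when the LEFT JOIN found no partner for it.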
I have a table called prospective_shop, and one of its columns is named 'username'. Username is not set as a primary key, but I want to remove all rows that have a duplicate username. What is the fastest way to do this?
I tried doing the following:
ALTER IGNORE TABLE `prospective_shop` ADD UNIQUE INDEX idx_name (username);
but then it gives me:
Duplicate entry 'calista_shopp' for key 'idx_name'
delete from prospective_shop
where id not in
(
select * from
(
select min(id)
from prospective_shop
group by username
) x
)
You can just delete all records that are not the first ones for every unique username. By selecting min(id) for every username group, you make sure not to delete those but all the rest.
In MySQL you can't delete from a table you are selecting from at the same time. You can trick the engine by using another subselect as I did. The x is just an alias name for the temp table.
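Before running a delete like this on real data, it can help to preview which rows would go, and to add the unique index afterwards so the duplicates cannot come back. The following is a sketch that assumes id is the table's primary key, as the answer does; the alias names are made up.

```sql
-- Preview: rows that are NOT the lowest id for their username,
-- i.e. the rows the DELETE above would remove.
SELECT p.*
FROM prospective_shop AS p
LEFT JOIN (
    SELECT MIN(id) AS keep_id
    FROM prospective_shop
    GROUP BY username
) AS k ON k.keep_id = p.id
WHERE k.keep_id IS NULL;

-- After the DELETE has run, a plain ALTER TABLE (without IGNORE)
-- should no longer hit a duplicate-entry error.
ALTER TABLE prospective_shop ADD UNIQUE INDEX idx_name (username);
```

Note that GROUP BY puts all NULL usernames into one group, so rows with a NULL username are treated as duplicates of each other by this approach.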
I have a problem with my queries in MySQL. My table has 4 columns and it looks something like this:
id_users  id_product  quantity  date
1         2           1         2013
1         2           1         2013
2         2           1         2013
1         3           1         2013
id_users and id_product are foreign keys from different tables.
What I want is to delete just one row:
1 2 1 2013
Which appears twice, so I just want to delete it.
I've tried this query:
delete from orders where id_users = 1 and id_product = 2
But it will delete both of them (since they are duplicated). Any hints on solving this problem?
Add a limit to the delete query
delete from orders
where id_users = 1 and id_product = 2
limit 1
All tables should have a primary key (consisting of one or more columns); duplicate rows don't make sense in a relational database. You can limit the number of deleted rows using LIMIT, though:
DELETE FROM orders WHERE id_users = 1 AND id_product = 2 LIMIT 1
But that only solves your current issue. You should definitely address the bigger issue by defining primary keys.
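You need to specify the number of rows which should be deleted. In your case (and I assume that you only want to keep one), that is the row count minus one. MySQL does not accept a subquery in LIMIT, so compute the count first and pass it in through a prepared statement:
SET @n = (SELECT GREATEST(COUNT(*) - 1, 0) FROM your_table WHERE id_users=1 AND id_product=2);
PREPARE stmt FROM 'DELETE FROM your_table WHERE id_users=1 AND id_product=2 LIMIT ?';
EXECUTE stmt USING @n;
DEALLOCATE PREPARE stmt;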
The best way to design the table is to add an auto-increment column and use it as the primary key. That way, issues like the one above can be avoided.
There are already answers that delete the row using LIMIT. Ideally, you should have a primary key in your table, but if you do not,
here are some other ways:
By creating Unique index
I see id_users and id_product should be unique in your example.
ALTER IGNORE TABLE orders ADD UNIQUE INDEX unique_columns_index (id_users, id_product)
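This deletes the duplicate rows that share the same values in those columns. Note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so it only works on older versions.
If you still get an error even with the IGNORE clause, try this: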
ALTER TABLE orders ENGINE MyISAM;
ALTER IGNORE TABLE orders ADD UNIQUE INDEX unique_columns_index (id_users, id_product);
ALTER TABLE orders ENGINE InnoDB;
By creating table again
If there are multiple rows that have duplicate values, then you can also recreate the table:
RENAME TABLE `orders` TO `orders2`;
CREATE TABLE `orders`
SELECT * FROM `orders2` GROUP BY id_users, id_product;
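A variant of the recreate approach that does not depend on ALTER IGNORE or on grouping by a subset of columns is sketched below. It is meant as an alternative to the statements above rather than a follow-up, and it reuses the index name unique_columns_index from earlier in this answer. Which of the duplicate rows survives is not controlled here.

```sql
-- Keep the old data under a new name.
RENAME TABLE orders TO orders2;

-- Copy the structure (columns and indexes) without the data.
CREATE TABLE orders LIKE orders2;

-- Add the unique index while the new table is still empty.
ALTER TABLE orders
    ADD UNIQUE INDEX unique_columns_index (id_users, id_product);

-- Copy rows back; rows that would violate the index are skipped.
INSERT IGNORE INTO orders SELECT * FROM orders2;

-- Drop orders2 only after checking the result:
-- DROP TABLE orders2;
```

CREATE TABLE ... LIKE does not copy foreign key definitions, so the constraints on id_users and id_product would have to be added to the new table again.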
You should add an auto-increment id for each row; after that, you can delete a row by its id.
That way your table will have a unique id for each row, alongside id_users, id_product, etc.
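Adding such a column might look like the sketch below. It assumes the orders table has no primary key yet, and the id value used in the DELETE is purely illustrative; look it up with the SELECT first.

```sql
-- Add an auto-increment primary key; MySQL numbers the existing rows.
ALTER TABLE orders
    ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

-- Find the ids of the duplicated rows.
SELECT *
FROM orders
WHERE id_users = 1 AND id_product = 2;

-- Delete exactly one of them by its id (2 is a hypothetical value).
DELETE FROM orders WHERE id = 2;
```

With the id in place, each physical row can be addressed on its own, which is what the LIMIT workaround was compensating for.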
We have a table business_users with a user_id and business_id and we have duplicates.
How can I write a query that will delete all duplicates except for one?
Completely identical rows
If you want to avoid completely identical rows, as I understood your question at first, then you can select unique rows to a separate table and recreate the table data from that.
CREATE TEMPORARY TABLE tmp SELECT DISTINCT * FROM business_users;
DELETE FROM business_users;
INSERT INTO business_users SELECT * FROM tmp;
DROP TABLE tmp;
Be careful if there are any foreign key constraints referencing this table, though, as the temporary deletion of rows might lead to cascaded deletions elsewhere.
Introducing a unique constraint
If you only care about pairs of user_id and business_id, you probably want to avoid introducing duplicates in the future. You can move the existing data to a temporary table, add a constraint, and then move the table data back, ignoring duplicates.
CREATE TEMPORARY TABLE tmp SELECT * FROM business_users;
DELETE FROM business_users;
ALTER TABLE business_users ADD UNIQUE (user_id, business_id);
INSERT IGNORE INTO business_users SELECT * FROM tmp;
DROP TABLE tmp;
The above answer is based on this answer. The warning about foreign keys applies just as it did in the section above.
One-shot removal
If you only want to execute a single query, without modifying the table structure in any way, and you have a primary key id identifying each row, then you can try the following:
DELETE FROM business_users WHERE id NOT IN
(SELECT MIN(id) FROM business_users GROUP BY user_id, business_id);
A similar idea was previously suggested by this answer.
If the above request fails, because you are not allowed to read and delete from a table in the same step, you can again use a temporary table:
CREATE TEMPORARY TABLE tmp
SELECT MIN(id) id FROM business_users GROUP BY user_id, business_id;
DELETE FROM business_users WHERE id NOT IN (SELECT id FROM tmp);
DROP TABLE tmp;
If you want to, you can still introduce a uniqueness constraint after cleaning the data in this fashion. To do so, execute the ALTER TABLE line from the previous section.
Since you have a primary key, you can use that to pick which rows to keep:
delete from business_users
where id not in (
select id from (
select min(id) as id -- Make a list of the primary keys to keep
from business_users
group by user_id, business_id -- Group by your duplicated row definition
) as a -- Derived table to force an implicit temp table
);
In this way, you won't need to create/drop temp tables and such (except the implicit one).
You might want to put a unique constraint on user_id, business_id so you don't have to worry about this again.