Delete all duplicate rows in MySQL

I have MySQL data that was imported from a CSV file, and it has many duplicate rows in it.
I picked all the non-duplicates using the DISTINCT feature.
Now I need to delete all the duplicates using an SQL command.
Note that I don't want to keep any copy of a duplicated value; I only want to keep the values that are not duplicated.
Thanks.
For example, if the number 0123332546666 is repeated 11 times (so it appears 12 times), I want to delete all 12 rows.
MySQL table format:
ID, PhoneNumber

Just COUNT the duplicates (with GROUP BY) and filter them with HAVING. Then supply the query result to the DELETE statement. The extra derived table (AS a) is what lets MySQL read from the same table it is deleting from:
DELETE FROM Table1 WHERE PhoneNumber IN (SELECT a.PhoneNumber FROM (
SELECT COUNT(*) AS cnt, PhoneNumber FROM Table1 GROUP BY PhoneNumber HAVING cnt>1
) AS a);
http://sqlfiddle.com/#!9/a012d21/1
complete fiddle:
schema:
CREATE TABLE Table1
(`ID` int, `PhoneNumber` varchar(20)) -- VARCHAR keeps leading zeros; INT would also overflow for 0123332546666
;
INSERT INTO Table1
(`ID`, `PhoneNumber`)
VALUES
(1, 888),
(2, 888),
(3, 888),
(4, 889),
(5, 889),
(6, 111),
(7, 222),
(8, 333),
(9, 444)
;
delete query:
DELETE FROM Table1 WHERE PhoneNumber IN (SELECT a.PhoneNumber FROM (
SELECT COUNT(*) AS cnt, PhoneNumber FROM Table1 GROUP BY PhoneNumber HAVING cnt>1
) AS a);
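As a quick check of which rows the DELETE removes, here is a minimal sketch that runs the same logic against an in-memory SQLite database via Python's standard sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption of this sketch), and because SQLite does not need the extra derived table, the subquery is written directly. PhoneNumber is stored as text, as a real phone number with a leading zero would need.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL; only portable SQL is used.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER, PhoneNumber TEXT)")
conn.executemany(
    "INSERT INTO Table1 (ID, PhoneNumber) VALUES (?, ?)",
    [(1, "888"), (2, "888"), (3, "888"), (4, "889"), (5, "889"),
     (6, "111"), (7, "222"), (8, "333"), (9, "444")],
)

# Remove every row whose PhoneNumber occurs more than once.
# SQLite lets the subquery read the table being deleted from, so the
# MySQL workaround (wrapping it in a derived table) is not needed here.
conn.execute("""
    DELETE FROM Table1
    WHERE PhoneNumber IN (
        SELECT PhoneNumber FROM Table1
        GROUP BY PhoneNumber
        HAVING COUNT(*) > 1
    )
""")

remaining = [row[0] for row in conn.execute("SELECT ID FROM Table1 ORDER BY ID")]
print(remaining)  # [6, 7, 8, 9]
```

Only the numbers that appear exactly once survive, which is what the question asks for.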

You could try using a LEFT JOIN with a subquery that finds the minimum id for each PhoneNumber, and delete the rows that don't match:
delete m
from m_table m
left join (
select min(id) as id, PhoneNumber
from m_table
group by PhoneNumber
) t on t.id = m.id
where t.PhoneNumber is null
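A minimal sketch of this keep-the-lowest-id variant, using Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption). SQLite has no multi-table DELETE ... JOIN, so the same LEFT JOIN is used to pick the ids to remove, and a plain DELETE ... WHERE id IN (...) removes them. The data is the fiddle's sample table, renamed to m_table.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE m_table (id INTEGER, PhoneNumber TEXT)")
conn.executemany(
    "INSERT INTO m_table (id, PhoneNumber) VALUES (?, ?)",
    [(1, "888"), (2, "888"), (3, "888"), (4, "889"), (5, "889"),
     (6, "111"), (7, "222"), (8, "333"), (9, "444")],
)

# The LEFT JOIN matches only the lowest id of each PhoneNumber;
# rows left unmatched (t.PhoneNumber IS NULL) are the extra copies.
conn.execute("""
    DELETE FROM m_table
    WHERE id IN (
        SELECT m.id
        FROM m_table m
        LEFT JOIN (
            SELECT MIN(id) AS id, PhoneNumber
            FROM m_table
            GROUP BY PhoneNumber
        ) t ON t.id = m.id
        WHERE t.PhoneNumber IS NULL
    )
""")

kept = conn.execute("SELECT id, PhoneNumber FROM m_table ORDER BY id").fetchall()
print(kept)  # [(1, '888'), (4, '889'), (6, '111'), (7, '222'), (8, '333'), (9, '444')]
```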
Otherwise, if you want to delete all the duplicates without keeping even a single row, you could use:
delete m
from m_table m
INNER join (
select PhoneNumber
from m_table
group by PhoneNumber
having count(*) > 1
) t on t.PhoneNumber= m.PhoneNumber

Instead of deleting from the table, I would suggest creating a new one:
create table table2 as
select min(id) as id, phonenumber
from table1
group by phonenumber
having count(*) = 1;
Why? Deleting rows has a lot of overhead. If you are bringing the data in from an external source, then treat the first landing table as a staging table and the second as the final table.

Related

Delete duplicate entries while keeping one

I have a table but it has no unique ID or primary key.
It has 3 columns in total.
name  user_id  role_id
ben   1        2
ben   1        2
sam   1        3
I'd like to remove one entry with the name Ben.
So the output would look like this:
name  user_id  role_id
ben   1        2
sam   1        3
Most of the examples show how to delete duplicate entries using an ID or primary key. However, how would I retain one entry while removing the others?
Using the following query, I was able to get the duplicated rows:
SELECT name, user_id, role_id, count(*) FROM some_table
GROUP BY name, user_id, role_id
HAVING count(*) > 1
To clarify, I am looking to delete these rows.
I'd prefer not to create a new table.
If you don't have to worry about other users accessing the table -
CREATE TABLE `new_table` AS
SELECT DISTINCT `name`, `user_id`, `role_id`
FROM `old_table`;
RENAME TABLE
`old_table` TO `backup`,
`new_table` TO `old_table`;
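Here is a minimal sketch of the copy-and-rename approach, using Python's sqlite3 module and an in-memory SQLite database as a stand-in for MySQL (an assumption). SQLite renames one table per ALTER TABLE statement, so MySQL's single RENAME TABLE is split into two steps. The rows are the ones from the question.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_table (name TEXT, user_id INTEGER, role_id INTEGER)")
conn.executemany("INSERT INTO old_table VALUES (?, ?, ?)",
                 [("ben", 1, 2), ("ben", 1, 2), ("sam", 1, 3)])

# Copy one instance of each distinct row into a new table.
conn.execute("""
    CREATE TABLE new_table AS
    SELECT DISTINCT name, user_id, role_id FROM old_table
""")

# Keep the original as a backup and put the deduplicated copy in its place.
conn.execute("ALTER TABLE old_table RENAME TO backup")
conn.execute("ALTER TABLE new_table RENAME TO old_table")

rows = conn.execute("SELECT name, user_id, role_id FROM old_table ORDER BY name").fetchall()
backup_count = conn.execute("SELECT COUNT(*) FROM backup").fetchone()[0]
print(rows, backup_count)  # [('ben', 1, 2), ('sam', 1, 3)] 3
```

Keeping the old table as backup means the swap can be undone if anything went wrong.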
Or you could use your duplicates query to generate one DELETE statement per group of duplicates, each limited so that exactly one row survives -
SELECT
`name`,
`user_id`,
`role_id`,
COUNT(*),
CONCAT('DELETE FROM some_table WHERE name=\'', `name`, '\' AND user_id=\'', `user_id`, '\' AND role_id=\'', `role_id`, '\' LIMIT ', COUNT(*) - 1, ';') AS `delete_stmt`
FROM `some_table`
GROUP BY `name`, `user_id`, `role_id`
HAVING COUNT(*) > 1;
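A fixed LIMIT 1 only removes one copy per group, so for groups with three or more identical rows the limit needs to be COUNT(*) - 1. A minimal sketch that generates such statements, using Python's sqlite3 module and an in-memory SQLite database as a stand-in (an assumption). SQLite concatenates with || where MySQL uses CONCAT(), and the sample data (three identical ben rows) is hypothetical. The generated statements are meant to be run in MySQL, since SQLite builds usually reject DELETE ... LIMIT.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (name TEXT, user_id INTEGER, role_id INTEGER)")
conn.executemany("INSERT INTO some_table VALUES (?, ?, ?)",
                 [("ben", 1, 2), ("ben", 1, 2), ("ben", 1, 2), ("sam", 1, 3)])

# One DELETE per duplicated group; LIMIT COUNT(*) - 1 leaves a single copy.
# Inside SQL string literals, '' is an escaped single quote.
stmts = [row[0] for row in conn.execute("""
    SELECT 'DELETE FROM some_table WHERE name=''' || name
        || ''' AND user_id=''' || user_id
        || ''' AND role_id=''' || role_id
        || ''' LIMIT ' || (COUNT(*) - 1) || ';'
    FROM some_table
    GROUP BY name, user_id, role_id
    HAVING COUNT(*) > 1
""")]
print(stmts)
# ["DELETE FROM some_table WHERE name='ben' AND user_id='1' AND role_id='2' LIMIT 2;"]
```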
Or you could temporarily add a SERIAL column and then remove it after the delete -
ALTER TABLE `some_table` ADD COLUMN `temp_id` SERIAL;
DELETE `t1`.*
FROM `some_table` `t1`
LEFT JOIN (
SELECT MIN(`temp_id`) `min_temp_id`
FROM `some_table`
GROUP BY `name`, `user_id`, `role_id`
) `t2` ON `t1`.`temp_id` = `t2`.`min_temp_id`
WHERE `t2`.`min_temp_id` IS NULL;
ALTER TABLE `some_table` DROP COLUMN `temp_id`;
Note that you are not saving anything by not having a primary key: InnoDB always stores a table as a clustered index, and if you do not define a primary key (or a suitable NOT NULL unique index) it creates a hidden one. So I would first add a primary key:
alter table some_table add id serial primary key;
Then you can easily remove duplicates with:
delete a from some_table a join some_table b on a.name=b.name and a.user_id=b.user_id and a.role_id=b.role_id and b.id < a.id;
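A minimal sketch of the same idea, using Python's sqlite3 module with an in-memory SQLite database standing in for MySQL (an assumption). SQLite has no DELETE ... JOIN, so the self-join is written as a correlated EXISTS: a row is deleted when an identical row with a smaller id exists. INTEGER PRIMARY KEY plays the role of the added serial id.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY auto-assigns ids 1, 2, 3, like the added serial column.
conn.execute("""CREATE TABLE some_table (
    id INTEGER PRIMARY KEY, name TEXT, user_id INTEGER, role_id INTEGER)""")
conn.executemany("INSERT INTO some_table (name, user_id, role_id) VALUES (?, ?, ?)",
                 [("ben", 1, 2), ("ben", 1, 2), ("sam", 1, 3)])

# Delete a row if an identical row with a smaller id exists.
conn.execute("""
    DELETE FROM some_table
    WHERE EXISTS (
        SELECT 1 FROM some_table b
        WHERE b.name = some_table.name
          AND b.user_id = some_table.user_id
          AND b.role_id = some_table.role_id
          AND b.id < some_table.id
    )
""")

rows = conn.execute("SELECT id, name, user_id, role_id FROM some_table ORDER BY id").fetchall()
print(rows)  # [(1, 'ben', 1, 2), (3, 'sam', 1, 3)]
```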
I would take the duplicate records and put them into another table.
CREATE TABLE some_new_table AS
SELECT
name,
user_id,
role_id
FROM some_table
GROUP BY name, user_id, role_id
HAVING count(*) > 1;
Then you can delete those records from your source table
DELETE a
FROM some_table a
INNER JOIN some_new_table b
ON a.name = b.name
AND a.user_id = b.user_id
AND a.role_id = b.role_id
Finally you can then insert the deduped records back into your table.
INSERT INTO some_table
SELECT
name,
user_id,
role_id
FROM some_new_table
If the volume of dupes is very large, you could also just create a new table with the deduped data, truncate or drop the old table, and then insert from or rename the new table.
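A minimal sketch of these three steps, using Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption). CREATE TABLE ... AS SELECT builds the helper table, and because SQLite has no DELETE ... JOIN, step 2 uses a correlated EXISTS in place of the INNER JOIN.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (name TEXT, user_id INTEGER, role_id INTEGER)")
conn.executemany("INSERT INTO some_table VALUES (?, ?, ?)",
                 [("ben", 1, 2), ("ben", 1, 2), ("sam", 1, 3)])

# 1. Save one copy of every duplicated row.
conn.execute("""
    CREATE TABLE some_new_table AS
    SELECT name, user_id, role_id FROM some_table
    GROUP BY name, user_id, role_id
    HAVING COUNT(*) > 1
""")

# 2. Delete every copy of those rows from the source table.
conn.execute("""
    DELETE FROM some_table
    WHERE EXISTS (
        SELECT 1 FROM some_new_table b
        WHERE b.name = some_table.name
          AND b.user_id = some_table.user_id
          AND b.role_id = some_table.role_id
    )
""")

# 3. Put a single copy of each back.
conn.execute("""
    INSERT INTO some_table (name, user_id, role_id)
    SELECT name, user_id, role_id FROM some_new_table
""")

rows = conn.execute("SELECT name, user_id, role_id FROM some_table ORDER BY name").fetchall()
print(rows)  # [('ben', 1, 2), ('sam', 1, 3)]
```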

MySQL SELECT to DELETE: table specified twice

There is a problem when I change SELECT to DELETE:
DELETE
FROM mitarbeiter
WHERE mitarbeiter.pers_nr=
(SELECT mitarbeiter.pers_nr
FROM mitarbeiter
LEFT JOIN kunde ON kunde.betreuer=mitarbeiter.pers_nr
group by mitarbeiter.pers_nr
order by count(*)
limit 1);
The error says...
Table 'mitarbeiter' is specified twice, both as a target for 'DELETE' and as a separate source for data
How can I change that?
You can replace the logic with a JOIN:
DELETE m
FROM mitarbeiter m JOIN
(SELECT m2.pers_nr
FROM mitarbeiter m2 LEFT JOIN
kunde k
ON k.betreuer = m2.pers_nr
GROUP BY m2.pers_nr
ORDER BY COUNT(*)
LIMIT 1
) m2
ON m.pers_nr = m2.pers_nr;
That said, the logic can probably be simplified, but it is strange. You are using COUNT(*) with a LEFT JOIN, so even non-matches in the second table get a count of 1. Knowing your intentions -- with sample data and desired results -- would help others figure out if another approach would work better.
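To see the COUNT(*) effect, here is a minimal sketch using Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption; both follow standard SQL here). The sample rows are hypothetical: 'Idle' has no customers. COUNT(*) counts the NULL-extended row that the LEFT JOIN produces, while COUNT(k.kunde_id) skips NULLs, so only the latter reports zero customers.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mitarbeiter (pers_nr TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE kunde (kunde_id INTEGER PRIMARY KEY, betreuer TEXT)")
conn.executemany("INSERT INTO mitarbeiter VALUES (?)", [("Idle",), ("Worker",)])
conn.executemany("INSERT INTO kunde VALUES (?, ?)", [(1, "Worker"), (2, "Worker")])

counts = conn.execute("""
    SELECT m.pers_nr,
           COUNT(*)          AS star_count,     -- counts the NULL row for 'Idle'
           COUNT(k.kunde_id) AS customer_count  -- ignores NULLs
    FROM mitarbeiter m
    LEFT JOIN kunde k ON k.betreuer = m.pers_nr
    GROUP BY m.pers_nr
    ORDER BY m.pers_nr
""").fetchall()
print(counts)  # [('Idle', 1, 0), ('Worker', 2, 2)]
```

With COUNT(*), 'Idle' and an employee with exactly one customer would look the same, which is probably not what the original query intends.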
Instead of a WHERE clause with a subquery, you could try a JOIN that uses the same subquery as an aliased derived table:
DELETE m.*
FROM mitarbeiter m
INNER JOIN (
SELECT mitarbeiter.pers_nr
FROM mitarbeiter
LEFT JOIN kunde ON kunde.betreuer=mitarbeiter.pers_nr
group by mitarbeiter.pers_nr
order by count(*)
limit 1
) t ON t.pers_nr = m.pers_nr
Another workaround you could try is saving your subquery in a temporary table, then calling on rows from that table.
DROP TEMPORARY TABLE IF EXISTS tempTable; -- TEMPORARY avoids dropping a permanent table with the same name
CREATE TEMPORARY TABLE tempTable
SELECT mitarbeiter.pers_nr
FROM mitarbeiter
LEFT JOIN kunde
ON kunde.betreuer=mitarbeiter.pers_nr
GROUP BY mitarbeiter.pers_nr
ORDER BY COUNT(*)
LIMIT 1;
DELETE FROM mitarbeiter
WHERE mitarbeiter.pers_nr
IN (
SELECT * FROM tempTable
);
DROP TEMPORARY TABLE tempTable; -- cleanup
Warning: this may take a much longer time and/or larger space than using JOIN as in the other solutions.
Welcome to Stack Overflow! Does this help point you in the right direction?
I often find that results are easier to visualize for my feeble brain when I use Common Table Expressions.
Please note that MySQL prior to version 8.0 doesn't support the WITH clause.
CREATE TABLE IF NOT EXISTS mitarbeiter (
pers_nr varchar(10),
PRIMARY KEY (pers_nr)
) DEFAULT CHARSET=utf8;
INSERT INTO mitarbeiter (pers_nr) VALUES
('Scum'),
('Worker'),
('Manager'),
('President');
CREATE TABLE IF NOT EXISTS kunde (
kunde_id int(3) NOT NULL,
betreuer varchar(10) NOT NULL,
PRIMARY KEY (kunde_id)
) DEFAULT CHARSET=utf8;
INSERT INTO kunde (kunde_id, betreuer) VALUES
(1, 'Scum'),
(2, 'Worker'),
(3, 'Worker'),
(4, 'Manager'),
(5, 'Manager'),
(6, 'Manager'),
(7, 'President'),
(8, 'President'),
(9, 'President'),
(10, 'President');
WITH s1
AS
(SELECT betreuer
, count(1) AS kunde_count_by_pers_nr -- JJAUSSI: find the kunde count by pers_nr
FROM kunde
GROUP BY betreuer
),
s2
AS
(SELECT MIN(kunde_count_by_pers_nr) AS kunde_count_by_pers_nr_min -- JJAUSSI: find the lowest kunde_count
FROM s1),
s3
AS
(SELECT s1.betreuer
FROM s1 INNER JOIN s2
ON s1.kunde_count_by_pers_nr = s2.kunde_count_by_pers_nr_min -- JJAUSSI: Retrieve all the betreuer values with the lowest kunde_count
)
SELECT * -- JJAUSSI: Test this result and see if it contains the records you expect to delete
FROM s3;
-- DELETE -- JJAUSSI: Once you are confident in the results from s3, replace the SELECT above with this DELETE (the CTEs only apply to the statement that follows them)
-- FROM mitarbeiter
-- WHERE pers_nr IN (SELECT betreuer
-- FROM s3);
SELECT * -- JJAUSSI: Check for the desired results (we successfully got rid of 'Scum'!)
FROM mitarbeiter;
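When you switch to the DELETE, the WITH clause has to be part of the DELETE statement itself, because a CTE is only visible to the statement it precedes. Here is a minimal sketch using Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL 8.0 (an assumption; both accept WITH ... DELETE), loaded with the same sample data.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL 8.0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mitarbeiter (pers_nr TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE kunde (kunde_id INTEGER PRIMARY KEY, betreuer TEXT NOT NULL)")
conn.executemany("INSERT INTO mitarbeiter VALUES (?)",
                 [("Scum",), ("Worker",), ("Manager",), ("President",)])
conn.executemany("INSERT INTO kunde VALUES (?, ?)",
                 [(1, "Scum"), (2, "Worker"), (3, "Worker"), (4, "Manager"),
                  (5, "Manager"), (6, "Manager"), (7, "President"),
                  (8, "President"), (9, "President"), (10, "President")])

# The CTEs s1..s3 are defined in the same statement as the DELETE that uses s3.
conn.execute("""
    WITH s1 AS (
        SELECT betreuer, COUNT(1) AS kunde_count_by_pers_nr
        FROM kunde GROUP BY betreuer
    ),
    s2 AS (
        SELECT MIN(kunde_count_by_pers_nr) AS kunde_count_by_pers_nr_min FROM s1
    ),
    s3 AS (
        SELECT s1.betreuer
        FROM s1 JOIN s2
          ON s1.kunde_count_by_pers_nr = s2.kunde_count_by_pers_nr_min
    )
    DELETE FROM mitarbeiter WHERE pers_nr IN (SELECT betreuer FROM s3)
""")

left = sorted(row[0] for row in conn.execute("SELECT pers_nr FROM mitarbeiter"))
print(left)  # ['Manager', 'President', 'Worker']
```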

SELECT all EXCEPT results in a subquery

I have a query that joins several tables (3 or 4) and gets me results as expected.
SELECT DISTINCT test_title, stt_id FROM student_tests
LEFT JOIN student_test_answers ON sta_stt_num = stt_id
JOIN tests ON stt_test_id = test_id
WHERE student_test_answer_id IS NULL
I have another query that shows another set of data; it is basically this:
SELECT test_id, COUNT(*) AS theCount FROM tests
JOIN test_questions ON test_id= tq_test_id
WHERE type= 'THE_TYPE'
GROUP BY test_id
HAVING theCount = 1
So basically I want to exclude the results of this second query from the first one. The test_id would be the joining field.
I have tried WHERE NOT EXISTS ( -the above query- ), but that returns no results, which is not correct. I also tried NOT IN ( ).
Is there a better way of doing this?
Try something like this:
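SELECT q1.test_title, q1.stt_id
FROM (
-- q1: your first query, with test_id added so it can be joined
SELECT DISTINCT test_title, stt_id, test_id FROM student_tests
LEFT JOIN student_test_answers ON sta_stt_num = stt_id
JOIN tests ON stt_test_id = test_id
WHERE student_test_answer_id IS NULL
) q1
LEFT JOIN (
SELECT test_id, COUNT(*) AS theCount FROM tests
JOIN test_questions ON test_id= tq_test_id
WHERE type= 'THE_TYPE'
GROUP BY test_id
HAVING theCount = 1
) a ON q1.test_id = a.test_id
WHERE a.test_id IS NULL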
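A minimal sketch of this LEFT JOIN ... IS NULL anti-join, using Python's sqlite3 module and an in-memory SQLite database as a stand-in for MySQL (an assumption). The tables are cut down to the columns the join needs, the first query is simplified to a plain list of tests, and the rows are hypothetical: test 1 has exactly one THE_TYPE question, so it is the one excluded.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (test_id INTEGER PRIMARY KEY, test_title TEXT)")
conn.execute("CREATE TABLE test_questions (tq_test_id INTEGER, type TEXT)")
conn.executemany("INSERT INTO tests VALUES (?, ?)",
                 [(1, "history"), (2, "chemistry"), (3, "reading")])
conn.executemany("INSERT INTO test_questions VALUES (?, ?)",
                 [(1, "THE_TYPE"), (2, "THE_TYPE"), (2, "THE_TYPE")])

# q1 stands in for the first query; a is the second query (the rows to exclude).
# Rows of q1 with no partner in a have a.test_id NULL and are kept.
rows = conn.execute("""
    SELECT q1.test_id, q1.test_title
    FROM (SELECT test_id, test_title FROM tests) q1
    LEFT JOIN (
        SELECT test_id, COUNT(*) AS theCount
        FROM tests JOIN test_questions ON test_id = tq_test_id
        WHERE type = 'THE_TYPE'
        GROUP BY test_id
        HAVING COUNT(*) = 1
    ) a ON q1.test_id = a.test_id
    WHERE a.test_id IS NULL
    ORDER BY q1.test_id
""").fetchall()
print(rows)  # [(2, 'chemistry'), (3, 'reading')]
```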
Here is my answer. A LEFT OUTER JOIN returns every row of the left table (test). If a test has no matches in test_questions, its row is still returned, with NULL in the test_questions columns. So if you then look for rows where test_questions.test_id is NULL, you will get what you are looking for.
I would also be specific in your COUNT() and name the column you want counted instead of using count(*), so that it is clear what you truly want to count.
create database test;
use test;
create table test
(
test_id int,
`the_type` varchar(20)
);
create table test_questions
(
test_question_id int,
test_id int,
`the_type` varchar(20)
);
insert into test values (1, 'history');
insert into test values (2, 'chemistry');
insert into test values (3, 'reading');
insert into test_questions values (1, 1, 'hard question');
insert into test_questions values (2, 1, 'medium question');
insert into test_questions values (3, 2, 'hard question');
insert into test_questions values (4, 2, 'easy question');
select * from test;
select * from test_questions;
select t.test_id, count(distinct t.test_id)
from test t
left outer join test_questions tq on tq.test_id = t.test_id
where
tq.test_id is null
group by
t.test_id
As written in the comment you should be able to do it like this:
SELECT
DISTINCT test_title,
olc_stt_i_num
FROM
student_tests
LEFT JOIN
student_test_answers
ON olc_sta_i_stt_num = olc_stt_i_num
INNER JOIN
ol_class_tests
ON stt_test_num = test_num
WHERE
student_test_answer_id IS NULL
-- added this: replace test_id with the real column
AND test_id NOT IN (
SELECT
test_id
FROM
tests
JOIN
test_questions
ON test_id= tq_test_id
WHERE
type= 'THE_TYPE'
GROUP BY
test_id
HAVING
COUNT(*) = 1
)
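The question does not show the exact NOT EXISTS and NOT IN attempts, but two common pitfalls both produce an empty result: a NOT EXISTS subquery that is not correlated with the outer row, and a NOT IN subquery that returns a NULL. A minimal sketch of both, using Python's sqlite3 module and an in-memory SQLite database as a stand-in for MySQL (an assumption; both follow standard SQL three-valued logic here), with hypothetical tables a and b.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL; tables a and b are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO b VALUES (?)", [(1,), (None,)])

def ids(sql):
    return [row[0] for row in conn.execute(sql)]

# Uncorrelated: b has rows, so NOT EXISTS is false for every row of a.
uncorrelated = ids("SELECT id FROM a WHERE NOT EXISTS (SELECT id FROM b) ORDER BY id")
# Correlated: the subquery is checked against each row of a.
correlated = ids("SELECT id FROM a WHERE NOT EXISTS "
                 "(SELECT 1 FROM b WHERE b.id = a.id) ORDER BY id")
# x NOT IN (1, NULL) is never true (at best unknown), so nothing is returned.
not_in_with_null = ids("SELECT id FROM a WHERE id NOT IN (SELECT id FROM b) ORDER BY id")
# Filtering the NULLs out of the subquery fixes NOT IN.
not_in_filtered = ids("SELECT id FROM a WHERE id NOT IN "
                      "(SELECT id FROM b WHERE id IS NOT NULL) ORDER BY id")
print(uncorrelated, correlated, not_in_with_null, not_in_filtered)  # [] [2, 3] [] [2, 3]
```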
What exactly is the second query? I only see one query here, and no sub-query. Also, a sqlfiddle of your precise schema would be helpful.
Anyhow, I think you want some sort of left excluding join. It looks something like this:
select test.test_id, count(*) as theCount
from tests test
join test_questions tq
on tq.test_id = test.test_id
left join tests excluded_test
on excluded_test.test_id = test.test_id
and << Some condition that explains what excluded_test is >> -- must be in ON, not WHERE
where tq.type = 'THE_TYPE'
and excluded_test.test_id is null
group by test.test_id
EDIT: Yeah, there are definitely a lot of details missing from the original question (some have since been added, but others are still missing). Knowing the full table structure of your example is important here, so it is difficult to provide a good concrete answer.
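For a left excluding join, the condition that describes the excluded rows has to go in the ON clause; in the WHERE clause it rejects the NULL-extended rows and nothing is returned. A minimal sketch of the difference, using Python's sqlite3 module and an in-memory SQLite database as a stand-in for MySQL (an assumption), with a hypothetical retired flag playing the part of that condition.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL; the retired flag is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (test_id INTEGER PRIMARY KEY, retired INTEGER)")
conn.executemany("INSERT INTO tests VALUES (?, ?)", [(1, 0), (2, 1), (3, 0)])

# Condition in ON: non-retired tests find no partner, keep NULLs, and pass IS NULL.
in_on = [row[0] for row in conn.execute("""
    SELECT t.test_id FROM tests t
    LEFT JOIN tests x ON x.test_id = t.test_id AND x.retired = 1
    WHERE x.test_id IS NULL
    ORDER BY t.test_id
""")]

# Condition in WHERE: x.retired = 1 and x.test_id IS NULL can never both hold.
in_where = [row[0] for row in conn.execute("""
    SELECT t.test_id FROM tests t
    LEFT JOIN tests x ON x.test_id = t.test_id
    WHERE x.retired = 1 AND x.test_id IS NULL
""")]
print(in_on, in_where)  # [1, 3] []
```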

MySQL - Update with COUNT, is it possible?

I'm attempting to update a MySQL table to set the column 'processed' to '2' if there are duplicate entries for 'name' and 'address_1', but it's not working. As usual, I think I'm just being a bit of a moron.
Here's what I'm trying:
UPDATE `records`
SET `processed`='2', `count` = (SELECT COUNT(`user`)
FROM `records`
WHERE `name`<>''
AND `address_1`<>'')
WHERE `count`=> '1';
Basically, if there is more than one row with the same 'name' and 'address_1', the 'processed' field needs to be updated to '2'.
You could use a query like this one to return duplicated names and addresses:
SELECT name, address_1, COUNT(*) cnt
FROM records
GROUP BY name, address_1
HAVING COUNT(*)>1
and then join this query to the records table, and update the column processed to 2 where the join succeeds:
UPDATE
records INNER JOIN (SELECT name, address_1, COUNT(*) cnt
FROM records
GROUP BY name, address_1
HAVING COUNT(*)>1) duplicates
ON records.name = duplicates.name
AND records.address_1=duplicates.address_1
SET
`processed`='2',
`count` = duplicates.cnt
WHERE
records.`name`<>''
AND records.`address_1`<>''
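A minimal sketch of the same update, using Python's sqlite3 module and an in-memory SQLite database as a stand-in for MySQL (an assumption). Instead of MySQL's UPDATE ... INNER JOIN, it uses correlated subqueries that count matching name/address pairs. The rows are hypothetical, and the two blank name/address rows show that blanks are skipped even when duplicated.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT, address_1 TEXT, '
             'processed TEXT, "count" INTEGER)')  # "count" quoted to keep it apart from COUNT()
conn.executemany("INSERT INTO records (name, address_1, processed) VALUES (?, ?, '0')",
                 [("Ann", "1 High St"), ("Ann", "1 High St"), ("Bob", "2 Low Rd"),
                  ("", ""), ("", "")])

# Mark every non-blank row whose name/address pair occurs more than once.
conn.execute("""
    UPDATE records
    SET processed = '2',
        "count" = (SELECT COUNT(*) FROM records r2
                   WHERE r2.name = records.name AND r2.address_1 = records.address_1)
    WHERE name <> '' AND address_1 <> ''
      AND (SELECT COUNT(*) FROM records r2
           WHERE r2.name = records.name AND r2.address_1 = records.address_1) > 1
""")

result = conn.execute('SELECT id, processed, "count" FROM records ORDER BY id').fetchall()
print(result)  # [(1, '2', 2), (2, '2', 2), (3, '0', None), (4, '0', None), (5, '0', None)]
```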

WITH clause in MySQL?

Does MySQL support common table expressions? For example, Oracle has the WITH clause:
WITH aliasname
AS
( SELECT COUNT(*) FROM table_name )
SELECT COUNT(*) FROM dept,aliasname
SELECT t.name,
t.num
FROM TABLE t
JOIN (SELECT c.id,COUNT(*) 'num1'
FROM TABLE1 c
WHERE c.column = 'a'
GROUP BY c.id) ta1 ON ta1.id = t.id
JOIN (SELECT d.id,COUNT(*) 'num2'
FROM TABLE2 d
WHERE d.column = 'a'
GROUP BY d.id) ta2 ON ta2.id = t.id
One way is to use a subquery:
SELECT COUNT(*)
FROM dept,
(
SELECT COUNT(*)
FROM table_name
) AS aliasname
Note that the comma between the two tables cross joins them, the same as in the query you posted. If there is any relation between them, you can JOIN them instead.
No, MySQL versions before 8.0 do not support Common Table Expressions (CTEs); the WITH clause was added in MySQL 8.0. On older versions, instead of using WITH tablealias AS (....), you will have to use a subquery.
For example,
WITH totalcount AS
(select userid, count(*) as tot from logins group by userid)
SELECT a.firstname, a.lastname, b.tot
FROM users a
INNER JOIN
totalcount b
on a.userid = b.userid
can be re-written in MySQL as
SELECT a.firstname, a.lastname, b.tot
FROM users a
INNER JOIN
(select userid, count(*) as tot from logins group by userid) b
on a.userid = b.userid
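To confirm that the two forms return the same rows, here is a minimal sketch that runs both against the same data, using Python's sqlite3 module and an in-memory SQLite database (an assumption: SQLite stands in for MySQL 8.0, and both support WITH). The users and logins rows are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)")
conn.execute("CREATE TABLE logins (userid INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "Ada", "Lovelace"), (2, "Alan", "Turing")])
conn.executemany("INSERT INTO logins VALUES (?)", [(1,), (1,), (2,)])

# CTE form.
with_cte = conn.execute("""
    WITH totalcount AS (SELECT userid, COUNT(*) AS tot FROM logins GROUP BY userid)
    SELECT a.firstname, a.lastname, b.tot
    FROM users a INNER JOIN totalcount b ON a.userid = b.userid
    ORDER BY a.userid
""").fetchall()

# Derived-table form, as needed before MySQL 8.0.
derived = conn.execute("""
    SELECT a.firstname, a.lastname, b.tot
    FROM users a
    INNER JOIN (SELECT userid, COUNT(*) AS tot FROM logins GROUP BY userid) b
        ON a.userid = b.userid
    ORDER BY a.userid
""").fetchall()
print(with_cte == derived, derived)  # True [('Ada', 'Lovelace', 2), ('Alan', 'Turing', 1)]
```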
So let's talk about the WITH clause.
The WITH clause and INNER JOIN (or plain JOIN) can serve a similar purpose, but the WITH clause gives you much more latitude, especially in the WHERE clause.
I am going to make a view that returns values such as the count of users, the user name, etc.
First (creating our tables users_inserted and users):
users_inserted table:
CREATE TABLE users_inserted (id BIGINT(10) AUTO_INCREMENT PRIMARY KEY, name VARCHAR(50));
users table:
CREATE TABLE users (id BIGINT(10) AUTO_INCREMENT PRIMARY KEY, name VARCHAR(50), gender TINYINT(1));
Second (inserting some values to work with):
users table:
INSERT INTO users (name, gender) VALUES ('Abolfazl M', 1);
I don't want to insert into users_inserted with a separate query; instead I want to add a TRIGGER that automatically inserts data into the users_inserted table before data is inserted into the users table.
Third (creating the trigger add_uinserted):
DELIMITER $$
CREATE TRIGGER IF NOT EXISTS add_uinserted BEFORE INSERT ON users FOR EACH ROW
BEGIN
IF NEW.name <> '' THEN
INSERT INTO users_inserted (name) VALUES (NEW.name);
ELSE
-- A trigger may not insert into the table that fired it, so fix up the incoming row instead
SET NEW.name = 'Unknown';
INSERT INTO users_inserted (name) VALUES ('Unknown');
END IF;
END$$
DELIMITER ;
Run the query and the trigger will be created. Finally, let's create a view that returns the result of a query with a WITH clause.
CREATE VIEW select_users AS
WITH GetCAll AS (
-- ANY_VALUE() keeps the original "some id" behaviour but is accepted under ONLY_FULL_GROUP_BY
SELECT ANY_VALUE(u1.id) AS Uid, COUNT(u1.name) AS CnAll FROM users u1
)
SELECT u1.name AS NAME, CASE
WHEN u1.gender = 1 THEN 'MALE'
WHEN u1.gender = 0 THEN 'FEMALE'
ELSE 'UNKNOWN'
END AS GENDER, CASE
WHEN u1.id = gca.Uid THEN 'INSERTED TO users_inserted'
ELSE 'NOT INSERTED TO users_inserted'
END AS INSERTED, gca.CnAll
FROM GetCAll AS gca CROSS JOIN users u1;
Once the query has run, the view is created, and selecting from the select_users view shows the data.
Last step (querying the select_users view):
SELECT * FROM select_users;
Thanks for taking a look at my answer, and that's it!