Basically, I have a task of removing duplicates from a database table, which is linked by its id to several other tables...
For each set of repeating rows in the table I need to assign a unique group_id, equal to the MAX(id) of the existing rows in that set. Please help.
My question is in this picture:
https://i.stack.imgur.com/CVYG1.png
You can first group the entities (to find the group ids) and then update the group_id of each entity.
Example:
UPDATE `t`,
    (SELECT `name`, `surname`, MAX(`id`) AS `group_id`
     FROM `t`
     GROUP BY `name`, `surname`) AS `t1`
SET `t`.`group_id` = `t1`.`group_id`
WHERE `t`.`name` = `t1`.`name` AND `t`.`surname` = `t1`.`surname`
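On MySQL 8.0 or newer, a window function can compute the same group id without the GROUP BY subquery. A minimal sketch, assuming the same table `t` as above:

UPDATE `t`
JOIN (
    SELECT `id`, MAX(`id`) OVER (PARTITION BY `name`, `surname`) AS `group_id`
    FROM `t`
) AS `g` ON `g`.`id` = `t`.`id`
SET `t`.`group_id` = `g`.`group_id`;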
I have a table but it has no unique ID or primary key.
It has 3 columns in total.
name   user_id   role_id
ben    1         2
ben    1         2
sam    1         3
I'd like to remove one entry with the name Ben.
So the output would look like this:
name   user_id   role_id
ben    1         2
sam    1         3
Most of the examples show deleting duplicate entries using an ID or primary key. However, how would I retain one entry while removing the other ones?
Using the following query I was able to get the duplicated rows:
SELECT name, user_id, role_id, count(*) FROM some_table
GROUP BY name, user_id, role_id
HAVING count(*) > 1
To clarify, I am looking to delete these rows.
I'd prefer not to create a new table.
If you don't have to worry about other users accessing the table -
CREATE TABLE `new_table` AS
SELECT DISTINCT `name`, `user_id`, `role_id`
FROM `old_table`;
RENAME TABLE
`old_table` TO `backup`,
`new_table` TO `old_table`;
Or you could use your duplicates query to output lots of single row delete queries -
SELECT
`name`,
`user_id`,
`role_id`,
COUNT(*),
CONCAT('DELETE FROM some_table WHERE name=\'', `name`, '\' AND user_id=\'', `user_id`, '\' AND role_id=\'', `role_id`, '\' LIMIT 1;') AS `delete_stmt`
FROM `some_table`
GROUP BY `name`, `user_id`, `role_id`
HAVING COUNT(*) > 1;
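With the sample data above, each output row then carries a ready-to-run statement, for example (hypothetical output for the duplicated 'ben' rows):

DELETE FROM some_table WHERE name='ben' AND user_id='1' AND role_id='2' LIMIT 1;

Since the statement uses LIMIT 1, a group with more than two identical rows would need it run (or its LIMIT raised to) COUNT(*) - 1 times.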
Or you could temporarily add a SERIAL column and then remove it after the delete -
ALTER TABLE `some_table` ADD COLUMN `temp_id` SERIAL;
DELETE `t1`.*
FROM `some_table` `t1`
LEFT JOIN (
SELECT MIN(`temp_id`) `min_temp_id`
FROM `some_table`
GROUP BY `name`, `user_id`, `role_id`
) `t2` ON `t1`.`temp_id` = `t2`.`min_temp_id`
WHERE `t2`.`min_temp_id` IS NULL;
ALTER TABLE `some_table` DROP COLUMN `temp_id`;
Note that you are not saving anything by not having a primary key; MySQL (at least with InnoDB) requires a primary key and will create a hidden one if you do not define one. So I would first add a primary key:
alter table some_table add id serial primary key;
Then you can easily remove duplicates with:
delete a
from some_table a
join some_table b
  on a.name = b.name
  and a.user_id = b.user_id
  and a.role_id = b.role_id
  and b.id < a.id;
I would take the duplicate records and put them into another table.
CREATE TABLE some_new_table AS
SELECT
    name,
    user_id,
    role_id
FROM some_table
GROUP BY name, user_id, role_id
HAVING count(*) > 1;
Then you can delete those records from your source table
DELETE a
FROM some_table a
INNER JOIN some_new_table b
ON a.name = b.name
AND a.user_id = b.user_id
AND a.role_id = b.role_id
Finally you can then insert the deduped records back into your table.
INSERT INTO some_table
SELECT
name,
user_id,
role_id
FROM some_new_table
If the volume of dupes is very large, you could also just create a new table with the deduped data, truncate or drop the old table, and then insert back from (or rename) the new table, as sketched below.
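A minimal sketch of that rebuild variant, reusing the table and column names from above (the new table name is made up for illustration):

CREATE TABLE some_table_dedup AS
SELECT name, user_id, role_id
FROM some_table
GROUP BY name, user_id, role_id;

-- either copy the rows back ...
TRUNCATE TABLE some_table;
INSERT INTO some_table SELECT name, user_id, role_id FROM some_table_dedup;
DROP TABLE some_table_dedup;

-- ... or swap the tables in place instead
-- RENAME TABLE some_table TO some_table_old, some_table_dedup TO some_table;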
I want to design voting system with two tables.
The first table contains the candidates' index and name.
The other one contains an index, the voter, and the index of the candidate whom the voter supports.
One voter can support multiple candidates.
I want an SQL query that shows each candidate's name along with the number of supporters.
So the result looks like
John 12, Bob 8, David 3...
SELECT `name`, COUNT(table2.voter) AS `count`
FROM `table1`
LEFT JOIN `table2`
ON table1.idx = table2.support
ORDER BY COUNT(table2.voter) DESC;
The above query gave only one row with the total number of voters.
Can anyone give me any hints?
SELECT `name`, COUNT(table2.voter) AS `count`
FROM `table1`
LEFT JOIN `table2` ON table1.idx = table2.support
GROUP BY `name`
ORDER BY COUNT(table2.voter) DESC;
You were missing a GROUP BY and hence got only a single aggregated row.
You need to GROUP BY the non-aggregate column (name), otherwise the query will default to one group (the entire result set) and pick an arbitrary name:
SELECT `name`, COUNT(table2.voter) AS `count`
FROM `table1`
LEFT JOIN `table2`
ON table1.idx = table2.support
GROUP BY `name`
ORDER BY count DESC;
You can use column aliases in an ORDER BY, so I have updated that as well.
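For reference, the queries above assume a schema along these lines; only the table and column names come from the question, the types are guesses:

CREATE TABLE table1 (
    idx  INT PRIMARY KEY,    -- candidate index
    name VARCHAR(100)        -- candidate name
);

CREATE TABLE table2 (
    idx     INT PRIMARY KEY, -- vote index
    voter   VARCHAR(100),    -- the voter
    support INT              -- table1.idx of the supported candidate
);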
I have a table full of duplicate data, based on multiple columns. I came up with this query to find all the duplicated rows
select *
from polls
group by server_id, product_id, poll_date
having count(*) > 1;
How do I update these results and set the "updated_by" field to "admin"?
I tried doing this, but it doesn't work for me :(
update polls
set updated_by='admin'
group by server_id, product_id, poll_date
having count(*) > 1;
Thanks for your help
You should be able to join your SELECT with the UPDATE.
UPDATE `polls` AS `p1`
INNER JOIN (
SELECT `server_id`, `product_id`, `poll_date`
FROM `polls`
GROUP BY `server_id`, `product_id`, `poll_date`
HAVING COUNT(*) > 1
) AS `p2`
ON `p2`.`server_id` = `p1`.`server_id`
AND `p2`.`product_id` = `p1`.`product_id`
AND `p2`.`poll_date` = `p1`.`poll_date`
SET `p1`.`updated_by` = 'admin';
Of course it would be better to directly join on the primary key (if you have one).
We have 2 tables called "post" and "post_extra".
The summary structure of the "post" table is: id, postdate, title, description.
And for "post_extra" it is: eid, news_id, rating, views.
The "id" field in the first table is related to the "news_id" field in the second table.
There are more than 100,000 records in the table, and many of them are duplicated. I want to keep only one record, removing the duplicate records in the "post" table that have the same title, and then remove the related records in "post_extra".
I ran this query in phpMyAdmin but the server crashed and I had to restart it.
DELETE e
FROM Post p1, Post p2, Post_extra e
WHERE p1.postdate > p2.postdate
AND p1.title = p2.title
AND e.news_id = p1.id
How can I do this?
Suppose you have a table named 'tables' in which you have the duplicate records.
Firstly, you could GROUP BY the column on which you want to detect duplicates. But I am not doing it with GROUP BY; I am writing a self join instead of a nested query or a temporary table.
SELECT title, COUNT(*) FROM `tables` GROUP BY title HAVING COUNT(*) > 1;
This query returns the duplicated titles along with how many times each one appears.
You don't need to create the temporary table in this case.
To Delete duplicate except one record:
The table should have an auto-increment id column. The possible solution that I've just come across:
DELETE t1 FROM tables t1, tables t2 WHERE t1.id > t2.id AND t1.title = t2.title
if you want to keep the row with the lowest auto increment id value OR
DELETE t1 FROM tables t1, tables t2 WHERE t1.id < t2.id AND t1.title = t2.title
if you want to keep the row with the highest auto increment id value.
You can cross-check your solution by selecting the duplicate records again with the query given above:
SELECT title, COUNT(*) FROM `tables` GROUP BY title HAVING COUNT(*) > 1;
If it returns no rows, your query was successful.
This will keep entries with the lowest id for each title
DELETE p, e
FROM Post p
LEFT JOIN Post_extra e ON e.news_id = p.id
WHERE p.id NOT IN
(
    SELECT * FROM
    (
        SELECT MIN(id)
        FROM Post
        GROUP BY title
    ) x
)
SQLFiddle demo
You can delete duplicate records by creating a temporary table with a unique index on the fields that you need to check for duplicate values, and then issuing an
INSERT IGNORE INTO the temporary table with a SELECT * FROM TableWithDuplicates.
You will get a temporary table without duplicates. Then delete the records from the original table (TableWithDuplicates) by JOINing the two tables.
It should be something like:
CREATE TEMPORARY TABLE `tmp_post` (
    `id` INT(10) NULL,
    `postDate` DATE NULL,
    `title` VARCHAR(50) NULL,
    `description` VARCHAR(50) NULL,
    UNIQUE INDEX `postDate_title_description` (`postDate`, `title`, `description`)
);
INSERT IGNORE INTO tmp_post
SELECT id,postDate,title,description
FROM post ;
DELETE post.*
FROM post
LEFT JOIN tmp_post tmp ON tmp.id = post.id
WHERE tmp.id IS NULL ;
Sorry, I haven't tested this code.
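Also note that the DELETE above only removes rows from post. If the related post_extra rows should go as well (as the question asks), a possible follow-up, assuming news_id references post.id:

DELETE e
FROM post_extra e
LEFT JOIN post p ON p.id = e.news_id
WHERE p.id IS NULL;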
I came across a scenario where I need to "upgrade" a table with data I obtain from another query. I am adding missing values, so I will need to insert, but I can't seem to get it right.
The destination table is the following
CREATE TABLE `documentcounters` (
`UID` int,
`DataChar`,
`SeqNum` ,
`LastSignature`,
`DocumentType`,
`SalesTerminal`,
`Active`,
PRIMARY KEY (`UID`)
) ENGINE=InnoDB
and I am trying to do something like
INSERT INTO documentcounters
SELECT Q1.in_headers, -1,NULL, 17,0,0 FROM
(SELECT DISTINCT(DocumentSeries) as in_headers FROM transactionsheaders )AS Q1
LEFT JOIN
(SELECT DISTINCT(DataChar) as in_counters FROM documentcounters)AS Q2
ON Q1.in_headers=Q2.in_counters WHERE Q2.in_counters IS NULL;
I left UID out because I want the insert statement to create it, but I get a "Column count doesn't match" which makes sense (darn!)
Doing something like
INSERT INTO `documentcounters`
(`DataChar`,`SeqNum`,`LastSignature`,`DocumentType`,`SalesTerminal`,`Active`)
VALUES
(
(SELECT Q1.in_headers FROM
(SELECT DISTINCT(DocumentSeries) as in_headers FROM transactionsheaders )AS Q1
LEFT JOIN
(SELECT DISTINCT(DataChar) as in_counters FROM documentcounters)AS Q2
ON Q1.in_headers=Q2.in_counters WHERE Q2.in_counters IS NULL),-1,NULL,17,0,0
);
yields a "Subquery returns more than 1 row" error.
Any ideas how I can make this work?
Cheers
INSERT INTO `documentcounters`
(`DataChar`,`SeqNum`,`LastSignature`,`DocumentType`,`SalesTerminal`,`Active`)
SELECT Q1.in_headers, -1,NULL, 17,0,0 FROM
(SELECT DISTINCT(DocumentSeries) as in_headers FROM transactionsheaders )AS Q1
LEFT JOIN
(SELECT DISTINCT(DataChar) as in_counters FROM documentcounters)AS Q2
ON Q1.in_headers=Q2.in_counters WHERE Q2.in_counters IS NULL;
This will work if UID is defined as auto_increment.
If you want the INSERT to create the UID values, then UID must be defined as an auto-incrementing column.
CREATE TABLE `documentcounters` (
`UID` INT NOT NULL AUTO_INCREMENT,
...
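Since the table already exists, the column can also be changed in place; a sketch, assuming the current UID values don't conflict with an auto-increment sequence:

ALTER TABLE `documentcounters`
    MODIFY `UID` INT NOT NULL AUTO_INCREMENT;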