duplicate database entries - mysql

I've accidentally managed to add duplicate entries into my database. The database contains a list of telephone numbers and they are routed via the information contained in the value field. The id field is unique per entry, and the UUID and username fields should be identical but shouldn't exist in the table more than once.
Data has been blanked in the screenshot for data protection.
The following command allowed me to identify I had duplicate entries which can be seen in the screenshot above.
select uuid, count(*) from usr_preferences group by uuid having count(*) > 1;
I'm after some help on how I could delete entries where the UUID count is more than one, but one entry must remain. Deleting the duplicates with the highest id numbers would be preferred.
Is there a way to display the results before deleting them?
MySQL version - mysql Ver 14.14 Distrib 5.7.38-41, for Linux (x86_64) using 6.2
Thanks

Could you give the following bit of code a go? Please make sure you have the database backed up before running this.
DELETE b FROM `test` a, `test` b where b.uuid = a.uuid and b.id > a.id;
I've expanded on your test data to make sure it removes both duplicates and triplicates, leaving the lowest ID. You can find my testing at this DB Fiddle.
https://www.db-fiddle.com/f/sUr6V6UP9tZ1Ya8eESid33/0
Hope this sorts your issue.
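Regarding displaying the results before deleting them: the same self-join can be run as a plain SELECT first. A minimal sketch, assuming a cut-down usr_preferences table with hypothetical sample rows (the answer's DELETE uses a fiddle table called `test`; the question's table name is used here):

```sql
-- Hypothetical sample data: uuid 'aaa' appears three times, 'bbb' twice.
CREATE TABLE usr_preferences (
    id INT PRIMARY KEY,
    uuid VARCHAR(36) NOT NULL,
    username VARCHAR(50) NOT NULL
);
INSERT INTO usr_preferences (id, uuid, username) VALUES
    (1, 'aaa', 'alice'), (2, 'aaa', 'alice'), (3, 'bbb', 'bob'),
    (4, 'aaa', 'alice'), (5, 'bbb', 'bob'), (6, 'ccc', 'carol');

-- Preview: every row that has a lower-id row with the same uuid.
-- These are the rows a DELETE with "b.uuid = a.uuid and b.id > a.id" removes.
-- DISTINCT is needed because a triplicate (id 4) matches two lower ids.
SELECT DISTINCT b.id, b.uuid, b.username
FROM usr_preferences AS a
JOIN usr_preferences AS b
  ON b.uuid = a.uuid AND b.id > a.id
ORDER BY b.id;
```

On this sample data it lists ids 2, 4 and 5; once the list looks right, the DELETE can be run with the same join condition.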

Try the following for MySQL v5.7:
set @rn = 0;
set @uuid = null;
delete from usr_preferences where id in
(
    select D.id
    from
    (
        -- rows must be ordered by uuid so that duplicates are adjacent
        select id, uuid,
            case
                when @uuid <> uuid then
                    @rn := 1
                else
                    @rn := @rn + 1
            end as rn,
            @uuid := uuid
        from usr_preferences order by uuid, id
    ) D
    where D.rn > 1
);
Select * From usr_preferences;
See a demo from db-fiddle.
Important Note:
Test the query before using it on your table, and take a backup of your table before running this query on it.
For MySQL v8.0 and above you may try the following:
with cte as
(
select id, row_number() over (partition by uuid order by id) as rn
from usr_preferences
)
delete U From
usr_preferences U join cte C
On U.id = C.id
where C.rn > 1;
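The CTE can also be used to preview the result first: selecting from it instead of deleting shows which row in each uuid group is kept (rn = 1) and which ones would go. A minimal sketch for MySQL 8.0+, assuming a cut-down usr_preferences table with hypothetical sample rows:

```sql
-- Hypothetical sample data: 'aaa' appears three times, 'bbb' twice.
CREATE TABLE usr_preferences (
    id INT PRIMARY KEY,
    uuid VARCHAR(36) NOT NULL
);
INSERT INTO usr_preferences (id, uuid) VALUES
    (1, 'aaa'), (2, 'bbb'), (3, 'aaa'), (4, 'aaa'), (5, 'bbb');

-- Same numbering as the CTE in the DELETE: rn = 1 is the lowest id per uuid
-- (the row that is kept); rows with rn > 1 are the ones the DELETE removes.
WITH cte AS (
    SELECT id, uuid,
           ROW_NUMBER() OVER (PARTITION BY uuid ORDER BY id) AS rn
    FROM usr_preferences
)
SELECT id, uuid, rn
FROM cte
ORDER BY uuid, rn;
```

Here ids 3, 4 and 5 get rn > 1.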

Related

What sql query to use for only deleting duplicate results for wp_comments table?

I need to finish the select query below. The query shows me the count of comments with the same comment_ID. I ultimately just want to delete the duplicates and leave the non-duplicates alone. This is a WordPress database.
screenshot of my current query results
SELECT `comment_ID`, `comment_ID`, count(*) FROM `wp_comments` GROUP BY `comment_ID` HAVING COUNT(*) > 1 ORDER BY `count(*)` ASC
Example of two entries, one of which I need to delete:
First back up your bad table in case you goof something up.
CREATE TABLE wp_comments_bad_backup SELECT * FROM wp_comments;
Do you actually have duplicate records here (duplicates in all columns)? If so, try this:
CREATE TABLE wp_comments_deduped SELECT DISTINCT * FROM wp_comments;
RENAME TABLE wp_comments TO wp_comments_not_deduped;
RENAME TABLE wp_comments_deduped TO wp_comments;
If they don't have exactly the same contents and you don't care which contents you keep from each pair of duplicate rows, try something like this:
CREATE TABLE wp_comments_deduped
SELECT comment_ID,
MAX(comment_post_ID) comment_post_ID,
MAX(comment_author) comment_author,
MAX(comment_author_email) comment_author_email,
MAX(comment_author_url) comment_author_url,
MAX(comment_author_IP) comment_author_IP,
MAX(comment_date) comment_date,
MAX(comment_date_gmt) comment_date_gmt,
MAX(comment_content) comment_content,
MAX(comment_karma) comment_karma,
MAX(comment_approved) comment_approved,
MAX(comment_agent) comment_agent,
MAX(comment_type) comment_type,
MAX(comment_parent) comment_parent,
MAX(user_id) user_id
FROM wp_comments
GROUP BY comment_ID;
RENAME TABLE wp_comments TO wp_comments_not_deduped;
RENAME TABLE wp_comments_deduped TO wp_comments;
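One consequence of the MAX() approach is easiest to see on a small example: each MAX() is evaluated per column, so the surviving row for a comment_ID can mix values from different original rows. A minimal sketch, assuming a hypothetical three-column cut-down of wp_comments:

```sql
-- Hypothetical mini version of wp_comments: comment_ID 7 appears twice
-- with different author/content values.
CREATE TABLE wp_comments (
    comment_ID INT NOT NULL,
    comment_author VARCHAR(50),
    comment_content TEXT
);
INSERT INTO wp_comments VALUES
    (7, 'alice', 'hello'),
    (7, 'bob', 'ahoy'),
    (8, 'carol', 'hi');

-- Each MAX() is computed independently per column, so the row kept for
-- comment_ID 7 can combine values taken from different source rows.
CREATE TABLE wp_comments_deduped AS
SELECT comment_ID,
       MAX(comment_author) AS comment_author,
       MAX(comment_content) AS comment_content
FROM wp_comments
GROUP BY comment_ID;

SELECT * FROM wp_comments_deduped ORDER BY comment_ID;
```

For comment_ID 7 the result is ('bob', 'hello'), which matches neither original row; that is acceptable only if, as stated above, you don't care which contents you keep.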
Then you'll need to double-check whether your deduplicating worked; this should return no rows:
SELECT comment_ID, COUNT(*) num FROM wp_comments GROUP BY comment_ID HAVING COUNT(*) > 1;
Then, once you're happy with it, put back WordPress's indexes.
Pro tip: Use a plugin like Duplicator when you migrate from one WordPress setup to another; its authors have sorted out all this data migration for you.
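I would recommend adding a unique, auto-incrementing key column to the table, called tempid, so that you can distinguish the rows within each duplicate set. Then use the queries below to remove the duplicate copies (keeping the row with the lowest tempid for each comment_ID) and, at the end, remove the tempid column:
-- Assumes comment_ID is no longer an AUTO_INCREMENT column (MySQL allows only one per table).
ALTER TABLE wp_comments ADD COLUMN tempid INT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE;
DELETE FROM wp_comments
WHERE tempid NOT IN (
    -- The extra derived table avoids MySQL's "can't specify target table for update in FROM clause" error
    SELECT keep_id FROM (
        SELECT MIN(tempid) AS keep_id
        FROM wp_comments
        GROUP BY comment_ID
    ) AS keepers
);
ALTER TABLE wp_comments DROP COLUMN tempid;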
I'm not clear on why there appear to be two different fields both named 'comment_ID' from the same table, but I believe this will delete only one of the two identical records. Before running a DELETE statement, however, be sure to make a backup of the original table.
DELETE FROM wp_comments
WHERE comment_ID IN (
    -- Wrapped in a derived table so MySQL allows reading from the table being deleted from
    SELECT comment_ID FROM (
        SELECT comment_ID
        FROM wp_comments
        GROUP BY comment_ID
        HAVING COUNT(*) > 1
    ) AS dups
)
-- LIMIT 1 removes a single row; rerun while duplicates remain
LIMIT 1;

MySQL Query that traverses pointer fields

Due to some legacy code, I have 2 MySQL tables with the below structure (simplified):
Invoice (ID, InvoiceNo, First_Item)
InvoiceItem (ID, Details, Next_Item)
Obviously, there are many InvoiceItems for each Invoice.
The legacy app expects you to load the Invoice row first, then load the first item from the InvoiceItem table using the Invoice's First_Item value. To get each successive InvoiceItem row, you would then follow its Next_Item value until you hit a null value.
Is there a way to write MySQL SQL that would bring back all InvoiceItem(s) for a given Invoice? i.e. follow the Invoice's First_Item and then traverse all the InvoiceItems' Next_Item pointers.
Thanks
Bill.
You want to do a recursive query, but MySQL < 8 does not support it.
This is a solution that works on your small dataset (from sqlfiddle):
select id,
       details,
       next_item
from (select * from invoiceitem
      order by id) inv_itm,
     -- start from the chosen invoice's First_Item (invoice id 1 is a placeholder)
     (select @iis := (select first_item from invoice where id = 1)) init
where find_in_set(id, @iis)
  and not find_in_set(9999999999, @iis)
  and length(@iis := concat(@iis, ',', ifnull(next_item, 9999999999)));
This solution works only if, for each invoice, the ids of the items are in ascending order along the chain.
This solution is inspired by "How to create a MySQL hierarchical recursive query".
You should plan an upgrade to 5.7 or 8.0, because older versions will soon no longer receive security updates.
see https://en.wikipedia.org/wiki/MySQL#Release_history
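On MySQL 8.0 or later, a recursive CTE can follow the pointers directly and does not need the item ids to be in ascending order. A minimal sketch, assuming the simplified table layout from the question (table names lowercased as in the answer) and hypothetical sample data; 'INV-001' stands in for the invoice you want:

```sql
-- Hypothetical sample data; item 12 comes before item 11 in INV-001's chain.
CREATE TABLE invoice (ID INT PRIMARY KEY, InvoiceNo VARCHAR(20), First_Item INT);
CREATE TABLE invoiceitem (ID INT PRIMARY KEY, Details VARCHAR(50), Next_Item INT);
INSERT INTO invoice VALUES (1, 'INV-001', 10), (2, 'INV-002', 20);
INSERT INTO invoiceitem VALUES
    (10, 'Widget', 12), (12, 'Gadget', 11), (11, 'Shipping', NULL),
    (20, 'Service', NULL);

WITH RECURSIVE item_chain (ID, Details, Next_Item, pos) AS (
    -- Anchor: the invoice's first item
    SELECT it.ID, it.Details, it.Next_Item, 1
    FROM invoice AS inv
    JOIN invoiceitem AS it ON it.ID = inv.First_Item
    WHERE inv.InvoiceNo = 'INV-001'
    UNION ALL
    -- Recursive step: follow Next_Item; stops when Next_Item is NULL
    SELECT it.ID, it.Details, it.Next_Item, c.pos + 1
    FROM item_chain AS c
    JOIN invoiceitem AS it ON it.ID = c.Next_Item
)
SELECT ID, Details, pos
FROM item_chain
ORDER BY pos;
```

This returns Widget (10), Gadget (12) and Shipping (11), in chain order. If the Next_Item pointers ever form a loop, MySQL stops the recursion with an error once cte_max_recursion_depth (1000 by default) is exceeded.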

CTE recursive query in select statement

I have two MySQL 8.0 servers, one for local development and another on a Heroku instance; more precisely, on Heroku I'm using a service called JAWSDB.
For my project I have to use the following CTE query, because the structure of the table tree_structure is hierarchical.
The purpose of the query is that, for every row in tree_structure, I have to get all of its children and then count how many users in the user_roles table are present in that particular row and its children.
SELECT mtr.id,
mtr.parent_id,
mtr.name,
mtr.manager_id,
CONCAT(users.nome, ' ', users.cognome) as resp_name,
(
with recursive cte (id, name, parent_id) as (
select id,
name,
parent_id
from tree_structure as tr_rec
where tr_rec.parent_id = mtr.id
and tr_rec.session_id = '2018'
union all
select tr.id,
tr.name,
tr.parent_id
from tree_structure as tr
inner join cte
on tr.parent_id = cte.id
WHERE tr.session_id = '2018'
)
select count(distinct (user_id))
from user_roles as ur_count
where ur_count.structure_id in (select distinct(id) from cte)
) as utenti
FROM tree_structure as mtr
LEFT JOIN users ON mtr.manager_id = users.id
WHERE level = 0
The problem is that it works on my local server, whereas on the Heroku instance it gives me the following error:
Unknown column 'mtr.id' in 'where clause'
Does anyone have any idea what is causing this error?
Thanks in advance, and sorry for my bad English.
You have an ambiguous table reference in the CTE:
SELECT
....
(with recursive cte (id, name, parent_id) as (
....
from tree_structure as tr_rec -- here you have aliased the table
where tr_rec.id = tree_structure.id -- here you refer to the table and its alias
and tr_rec.session_id = '2018'
union all
....
)
....
) as utenti
....
Table tree_structure is used both in the subselect and in the outermost select. Good practice is to give every table reference you use a unique alias.
Also, you have a typo in the condition that should check self-referencing of the hierarchy root node:
where tr_rec.id = tr_rec.parent_id
and tr_rec.session_id = '2018'
OK guys, I found out why the query was wrong. Apparently, MySQL 8.0.14 introduced support for outer references in derived tables (which covers a CTE inside a subquery), i.e. referring to a column of the outer query, such as mtr.id, from within the subquery.
My local version was 8.0.16 but the online version was 8.0.11, so my query didn't work there.
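If upgrading the hosted instance isn't possible, one way around the restriction is to avoid the correlated subquery altogether: run a single recursive CTE that carries each top-level row's id down the tree, then group by it. A minimal sketch, assuming simplified tree_structure and user_roles tables with hypothetical sample data; unlike the original query, it counts users assigned to the top-level row itself as well as its descendants, as the question's description asks:

```sql
-- Hypothetical, simplified tables and data.
CREATE TABLE tree_structure (id INT PRIMARY KEY, parent_id INT, level INT, session_id VARCHAR(4));
CREATE TABLE user_roles (user_id INT, structure_id INT);
INSERT INTO tree_structure VALUES
    (1, NULL, 0, '2018'), (2, 1, 1, '2018'), (3, 2, 2, '2018'),
    (4, NULL, 0, '2018'), (5, 4, 1, '2018');
INSERT INTO user_roles VALUES (100, 1), (101, 2), (101, 3), (102, 3), (103, 5);

-- root_id is carried down from each level-0 row to all of its descendants,
-- so nothing inside the CTE refers to a column of an outer query.
WITH RECURSIVE subtree (root_id, id) AS (
    SELECT id, id
    FROM tree_structure
    WHERE level = 0
    UNION ALL
    SELECT s.root_id, t.id
    FROM subtree AS s
    JOIN tree_structure AS t ON t.parent_id = s.id
    WHERE t.session_id = '2018'
)
SELECT s.root_id, COUNT(DISTINCT ur.user_id) AS utenti
FROM subtree AS s
LEFT JOIN user_roles AS ur ON ur.structure_id = s.id
GROUP BY s.root_id
ORDER BY s.root_id;
```

On this data it returns 3 users for root 1 and 1 user for root 4; the result can then be joined back to tree_structure and users for the name columns.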

Table is specified twice, both as a target for 'UPDATE' and as a separate source for data in mysql

I have the query below in MySQL. I want to check whether the branch id and year of the finance-type rows in branch_master are equal to the branch id and year in manager, and if so, update status in the manager table for that branch id.
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
SELECT m2.branch_id FROM manager as m2
WHERE (m2.branch_id,m2.year) IN (
(
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance'
)
)
)
but I'm getting the error
Table 'm1' is specified twice, both as a target for 'UPDATE' and as a separate source for data
This is a typical MySQL thing and can usually be circumvented by selecting from a derived table, i.e. instead of
FROM manager AS m2
use
FROM (select * from manager) AS m2
The complete statement:
UPDATE manager
SET status = 'Y'
WHERE branch_id IN
(
select branch_id
FROM (select * from manager) AS m2
WHERE (branch_id, year) IN
(
SELECT branch_id, year
FROM branch_master
WHERE type = 'finance'
)
);
The correct answer is in this SO post.
The problem with the accepted answer here is, as has already been mentioned multiple times, that it creates a full copy of the whole table. This is far from optimal and is the most space-hungry option. The idea is to materialize only the subset of data used for the update, so in your case it would look like this:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
SELECT * FROM(
SELECT m2.branch_id FROM manager as m2
WHERE (m2.branch_id,m2.year) IN (
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance')
) t
)
Basically you just encapsulate your previous source for data query inside of
SELECT * FROM (...) t
Try to use the EXISTS operator:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE EXISTS (SELECT 1
FROM (SELECT m2.branch_id
FROM branch_master AS bm
JOIN manager AS m2
WHERE bm.type = 'finance' AND
bm.branch_id = m2.branch_id AND
bm.year = m2.year) AS t
WHERE t.branch_id = m1.branch_id);
Note: The query uses an additional nesting level, as proposed by @Thorsten, as a means to circumvent the "Table is specified twice" error.
Demo here
Try:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
(SELECT DISTINCT branch_id
FROM branch_master
WHERE type = 'finance'))
AND m1.year IN ((SELECT DISTINCT year
FROM branch_master
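WHERE type = 'finance'))
Note that this checks branch_id and year independently, so it can also update a row whose branch_id and year each appear in finance rows of branch_master but never together in the same row.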
The problem I had with the accepted answer is that it creates a copy of the whole table, which wasn't an option for me: I tried to execute it, but after several hours I had to cancel it.
A very fast way, if you have a huge amount of data, is to create a temporary table:
Create TMP table
-- Column types are assumptions; match them to your manager/branch_master columns.
CREATE TEMPORARY TABLE tmp_finance
(branch_id BIGINT NOT NULL,
 year INT NOT NULL,
 PRIMARY KEY (branch_id, year));
Populate TMP table
-- Only the (branch_id, year) pairs of finance rows are copied, not the whole table.
INSERT INTO tmp_finance (branch_id, year)
SELECT DISTINCT branch_id, year
FROM branch_master
WHERE type = 'finance';
Update with join
UPDATE manager AS m
INNER JOIN tmp_finance AS t
    ON m.branch_id = t.branch_id AND m.year = t.year
SET m.status = 'Y';
This is by far the fastest way:
UPDATE manager m
INNER JOIN branch_master b on m.branch_id=b.branch_id AND m.year=b.year
SET m.status='Y'
WHERE b.type='finance'
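Note that with a 1:n relationship a manager row can match several branch_master rows. MySQL updates each matching row only once in a multiple-table UPDATE, so in this case that is no problem. But if the new value depends on the joined row, e.g. "SET m.price = m.price + b.amount", only one of the matching rows is used (and which one is not defined), so you cannot rely on this construction for that.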
Maybe not a solution, but some thoughts about why it doesn't work in the first place:
Reading data from a table and also writing data into that same table is a somewhat ill-defined task. In what order should the data be read and written? Should newly written data be considered when reading it back from the same table? MySQL refusing to execute this isn't just because of a limitation, it's because it's not a well-defined task.
The solutions involving SELECT ... FROM (SELECT * FROM table) AS tmp just dump the entire content of a table into a temporary table, which can then be used in any further outer queries, like for example an update query. This forces the order of operations to be: Select everything first into a temporary table and then use that data (instead of the data from the original table) to do the updates.
However if the table involved is large, then this temporary copying is going to be incredibly slow. No indexes will ever speed up SELECT * FROM table.
I might be having a slow day today... but isn't the original query identical to this one, which shouldn't have any problems?
UPDATE manager as m1
SET m1.status = 'Y'
WHERE (m1.branch_id, m1.year) IN (
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance'
)
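The pairwise (branch_id, year) IN form can be checked on a small example. A minimal sketch, assuming cut-down manager and branch_master tables with hypothetical data in which branches 1 and 2 each have a finance row, but for different years:

```sql
-- Hypothetical, simplified tables and data.
CREATE TABLE manager (branch_id INT, year INT, status CHAR(1) DEFAULT 'N');
CREATE TABLE branch_master (branch_id INT, year INT, type VARCHAR(20));
INSERT INTO manager (branch_id, year) VALUES (1, 2020), (1, 2021), (2, 2020), (2, 2021);
INSERT INTO branch_master VALUES (1, 2020, 'finance'), (2, 2021, 'finance'), (2, 2020, 'hr');

-- Only manager rows whose (branch_id, year) pair matches a finance row are updated;
-- the subquery reads branch_master only, so MySQL does not raise the "specified twice" error.
UPDATE manager
SET status = 'Y'
WHERE (branch_id, year) IN (
    SELECT branch_id, year
    FROM branch_master
    WHERE type = 'finance'
);

SELECT branch_id, year, status FROM manager ORDER BY branch_id, year;
```

Only (1, 2020) and (2, 2021) end up with status 'Y'. A variant that tests branch_id and year with two separate IN lists would update all four rows, because each branch id and each year appears somewhere in the finance rows.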

MySQL add to Columns with Group By doesn't show all rows

I'm using this query:
SELECT sender_userid,receiver_userid,message,message_read,`datetime`,sender_userid + receiver_userid AS message_token FROM messages
WHERE (receiver_userid='1000000172' OR sender_userid='1000000172') AND friendship_status=1 AND receiver_history=1
GROUP BY message_token;
The results are:
If I drop the Group By I get this result:
You will see in the second image there are 2 different 'message_token' results.
Why when I Group By this do I only get one result? Shouldn't it show both?
Is there a way to get both unique 'message_token' results?
After doing a bit of research, I found out that this is a bug in the following MySQL versions: 5.1, 5.5, 5.6, 5.7.
Here's the bug report page, where the developers acknowledge it's not working as it should.
From the same page, the code to reproduce it:
CREATE TABLE t1 (a INT, b VARCHAR(1), INDEX(b,a)) ENGINE=MyISAM;
INSERT INTO t1 VALUES (2,'s'),(5,'h'),(3,'q'),(1,'a'),(3,'v'),
(6,'u'),(7,'s'),(5,'y'),(1,'z'),(5,'i'),(2,'y');
SELECT b, max(a) FROM t1 WHERE b = 'i' OR a = 2 GROUP BY b;
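Separately from the bug, selecting non-aggregated columns such as message alongside GROUP BY lets MySQL return values from any row of the group, and the query is rejected when ONLY_FULL_GROUP_BY is enabled (the default since 5.7.5). One common way to get exactly one row per message_token, here the latest message, is to aggregate first and join back. A minimal sketch, assuming a cut-down messages table with hypothetical data and leaving out the question's WHERE filters:

```sql
-- Hypothetical, simplified messages table and data.
CREATE TABLE messages (
    sender_userid BIGINT,
    receiver_userid BIGINT,
    message VARCHAR(255),
    `datetime` DATETIME
);
INSERT INTO messages VALUES
    (1, 2, 'hi', '2020-01-01 10:00:00'),
    (2, 1, 'hello', '2020-01-01 10:05:00'),
    (1, 3, 'hey', '2020-01-02 09:00:00');

-- Latest message per message_token: aggregate first, then join back,
-- so every selected column comes from the same (latest) row.
SELECT m.sender_userid, m.receiver_userid, m.message, m.`datetime`, t.message_token
FROM messages AS m
JOIN (
    SELECT sender_userid + receiver_userid AS message_token,
           MAX(`datetime`) AS last_datetime
    FROM messages
    GROUP BY sender_userid + receiver_userid
) AS t
  ON m.sender_userid + m.receiver_userid = t.message_token
 AND m.`datetime` = t.last_datetime
ORDER BY t.message_token;
```

This returns one row for token 3 ('hello') and one for token 4 ('hey'). If two messages in the same conversation share the latest datetime, both are returned.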