MySQL: Finding duplicates across multiple fields


Background: my employer has a database powered by a really old version of MySQL (3.23). I have been asked to find duplicate serial numbers and MAC addresses in the database.
I was able to find the duplicate serial numbers, but since this version of MySQL doesn't support subqueries, I had to resort to using a temporary table. These are the two SQL statements I ended up using:
CREATE TEMPORARY TABLE IF NOT EXISTS Inventory_Duplicate_Serials
SELECT Serial
FROM Inventory
WHERE Serial IS NOT NULL
GROUP BY Serial
HAVING COUNT(Serial) > 1;
SELECT DeviceName, Model, Inventory.Serial
FROM Inventory
INNER JOIN Inventory_Duplicate_Serials
ON Inventory.Serial = Inventory_Duplicate_Serials.Serial
ORDER BY Serial;
Now I need to find the duplicate MAC addresses. The problem is the "Inventory" table has three MAC address fields (MAC, MAC2, and MAC3). So, for example, if the value of an item's "MAC" field is the same as the value of another item's "MAC2" field, I need to know about it. How do I go about doing this? Thank you for your time.
UPDATE: Solved. I ended up creating two temporary tables (Inventory_All_MACs and Inventory_Duplicate_MACs). These are the five queries:
CREATE TEMPORARY TABLE IF NOT EXISTS Inventory_All_MACs
SELECT MAC
FROM Inventory
WHERE MAC != '';
CREATE TEMPORARY TABLE IF NOT EXISTS Inventory_All_MACs
SELECT MAC2 AS MAC
FROM Inventory
WHERE MAC2 != '';
CREATE TEMPORARY TABLE IF NOT EXISTS Inventory_All_MACs
SELECT MAC3 AS MAC
FROM Inventory
WHERE MAC3 != '';
CREATE TEMPORARY TABLE IF NOT EXISTS Inventory_Duplicate_MACs
SELECT MAC
FROM Inventory_All_MACs
GROUP BY MAC
HAVING COUNT(MAC) > 1;
SELECT DeviceName, Model, Inventory_Duplicate_MACs.MAC AS DuplicateMAC, Inventory.MAC, MAC2, MAC3
FROM Inventory_Duplicate_MACs
INNER JOIN Inventory
ON Inventory.MAC = Inventory_Duplicate_MACs.MAC
OR Inventory.MAC2 = Inventory_Duplicate_MACs.MAC
OR Inventory.MAC3 = Inventory_Duplicate_MACs.MAC
ORDER BY Inventory_Duplicate_MACs.MAC, DeviceName, Model;
Thanks everybody!

A 'simple' solution that comes to mind is to create a second temporary table that lists all MAC addresses in one column, so that each Inventory row contributes three rows to it, one per MAC column.

CREATE TEMPORARY TABLE IF NOT EXISTS Inventory_Mac
SELECT Mac
FROM Inventory;
INSERT INTO Inventory_Mac
SELECT Mac2
FROM Inventory;
INSERT INTO Inventory_Mac
SELECT Mac3
FROM Inventory;
CREATE TEMPORARY TABLE IF NOT EXISTS Inventory_Duplicate_Mac
SELECT Mac, COUNT(*) AS cnt
FROM Inventory_Mac
GROUP BY Mac
HAVING COUNT(*) > 1;
SELECT DeviceName, Model, im.Mac, i.Mac, i.Mac2, i.Mac3
FROM Inventory_Duplicate_Mac AS im
JOIN Inventory AS i
ON i.Mac = im.Mac
OR i.Mac2 = im.Mac
OR i.Mac3 = im.Mac
ORDER BY im.Mac;
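The approach above (collect every MAC into one column, group, then join back) can be tried end to end. The following is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for MySQL purely so that it runs anywhere; the sample rows are invented. UNION ALL is used for brevity; as far as I know MySQL 3.23 predates UNION, which would explain the separate statements in the posts above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Inventory (DeviceName TEXT, Model TEXT, MAC TEXT, MAC2 TEXT, MAC3 TEXT);
INSERT INTO Inventory VALUES
  ('pc-01', 'A', 'aa:01', '',      ''),
  ('pc-02', 'A', 'aa:02', 'aa:01', ''),      -- MAC2 collides with pc-01's MAC
  ('pc-03', 'B', 'aa:03', '',      'aa:04'),
  ('pc-04', 'B', 'aa:05', NULL,    NULL);

-- Step 1: one row per non-empty MAC, whichever column it came from.
CREATE TEMPORARY TABLE Inventory_All_MACs AS
  SELECT MAC FROM Inventory WHERE MAC <> ''
  UNION ALL SELECT MAC2 FROM Inventory WHERE MAC2 <> ''
  UNION ALL SELECT MAC3 FROM Inventory WHERE MAC3 <> '';

-- Step 2: MACs that occur more than once overall.
CREATE TEMPORARY TABLE Inventory_Duplicate_MACs AS
  SELECT MAC FROM Inventory_All_MACs GROUP BY MAC HAVING COUNT(*) > 1;
""")

# Step 3: list every device that carries a duplicated MAC in any of its columns.
rows = conn.execute("""
SELECT d.MAC, i.DeviceName
FROM Inventory_Duplicate_MACs AS d
JOIN Inventory AS i
  ON i.MAC = d.MAC OR i.MAC2 = d.MAC OR i.MAC3 = d.MAC
ORDER BY d.MAC, i.DeviceName
""").fetchall()
print(rows)
```

Note that a device which lists the same address in two of its own columns would also be reported as a duplicate by this sketch.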

I'm not 100% sure about this answer, but it could be worth trying LEFT JOINs, e.g.:
SELECT address1
FROM addresses
LEFT JOIN Inventory_Duplicate_Addresses ad1
ON addresses.MAC = ad1.mac
LEFT JOIN Inventory_Duplicate_Addresses ad2
ON addresses.MAC = ad2.mac2
LEFT JOIN Inventory_Duplicate_Addresses ad3
ON addresses.MAC = ad3.mac3

Related

duplicate database entries

I've accidentally managed to add duplicate entries into my database. The database contains a list of telephone numbers and they are routed via the information contained in the value field. The id field is unique per entry, and the UUID and username fields should be identical but shouldn't exist in the table more than once.
Data has been blanked in the screenshot for data protection.
The following command allowed me to identify I had duplicate entries which can be seen in the screenshot above.
select uuid, count(*) from usr_preferences group by uuid having count(*) > 1;
I'm after some help on how I could delete entries where the UUID count is more than one, while making sure one entry remains. Deleting the duplicates with the higher id numbers would be preferred.
Is there a way to display the results before deleting them?
MySQL version - mysql Ver 14.14 Distrib 5.7.38-41, for Linux (x86_64) using 6.2
Thanks

Could you give the following bit of code a go? Please make sure you have the database backed up before running this.
DELETE b FROM `test` a, `test` b where b.uuid = a.uuid and b.id > a.id;
I've expanded on your test data to make sure it will remove both duplicates and triplicates, leaving the lowest ID. You can find my testing at this DB Fiddle.
https://www.db-fiddle.com/f/sUr6V6UP9tZ1Ya8eESid33/0
Hope this sorts your issue.
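On the question of displaying the rows before deleting them: the same self-join condition can be run as a SELECT first. Below is a minimal sketch in Python with the standard-library sqlite3 module standing in for MySQL; the sample rows are invented. SQLite has no multi-table DELETE, so the delete step here uses a NOT IN (SELECT MIN(id) ...) form as a substitute that removes the same rows; on MySQL the preview SELECT would be followed by the DELETE from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usr_preferences (id INTEGER PRIMARY KEY, uuid TEXT, username TEXT);
INSERT INTO usr_preferences VALUES
  (1, 'u-a', 'alice'), (2, 'u-b', 'bob'),
  (3, 'u-a', 'alice'), (4, 'u-a', 'alice'), (5, 'u-c', 'carol');
""")

# Preview: every row for which a row with the same uuid and a lower id exists.
# DISTINCT is needed because a triplicate matches more than one lower id.
preview = conn.execute("""
SELECT DISTINCT b.id, b.uuid
FROM usr_preferences AS a
JOIN usr_preferences AS b ON b.uuid = a.uuid AND b.id > a.id
ORDER BY b.id
""").fetchall()
print(preview)

# Delete the same set of rows: everything that is not the lowest id of its uuid.
conn.execute("""
DELETE FROM usr_preferences
WHERE id NOT IN (SELECT MIN(id) FROM usr_preferences GROUP BY uuid)
""")
remaining = conn.execute(
    "SELECT id, uuid FROM usr_preferences ORDER BY id"
).fetchall()
print(remaining)
```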

Try the following for MySQL v5.7:
set @rn=0;
set @uuid=null;
delete from usr_preferences where id in
(
select D.id
from
(
select id, uuid,
case
when @uuid <> uuid then
@rn:=1
else
@rn:=@rn+1
end as rn,
@uuid:=uuid
from usr_preferences order by uuid, id
) D
where D.rn>1
);
Select * From usr_preferences;
See a demo from db-fiddle.
Important Note:
Test the query before using it on your table, and take a backup of your table before running this query on it.
For MySQL v8.0 and above you may try the following:
with cte as
(
select id, row_number() over (partition by uuid order by id) as rn
from usr_preferences
)
delete U From
usr_preferences U join cte C
On U.id = C.id
where C.rn > 1;
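To see which rows the row_number() approach would pick before deleting anything, the numbering can be run on its own. This is a minimal sketch in Python with the standard-library sqlite3 module as a stand-in for MySQL 8.0 (it assumes the bundled SQLite is 3.25 or newer, which is when window functions were added); the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usr_preferences (id INTEGER PRIMARY KEY, uuid TEXT);
INSERT INTO usr_preferences VALUES (1, 'u-a'), (2, 'u-b'), (3, 'u-a'), (4, 'u-a');
""")

# Number the rows within each uuid, lowest id first.
numbered = conn.execute("""
SELECT id, uuid, ROW_NUMBER() OVER (PARTITION BY uuid ORDER BY id) AS rn
FROM usr_preferences
ORDER BY id
""").fetchall()

# Rows with rn > 1 are the ones the DELETE would remove.
to_delete = [row_id for row_id, _uuid, rn in numbered if rn > 1]
print(numbered)
print(to_delete)
```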

Table is specified twice, both as a target for 'UPDATE' and as a separate source for data in mysql

I have the query below in MySQL. I want to check whether the branch id and year of the 'finance' rows in branch_master match the branch id and year in manager and, if they do, update status in the manager table for that branch id:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
SELECT m2.branch_id FROM manager as m2
WHERE (m2.branch_id,m2.year) IN (
(
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance'
)
)
)
but I get this error:
Table 'm1' is specified twice, both as a target for 'UPDATE' and as a separate source for data

This is a typical MySQL restriction and can usually be circumvented by selecting from a derived table, i.e. instead of
FROM manager AS m2
use
FROM (select * from manager) AS m2
The complete statement:
UPDATE manager
SET status = 'Y'
WHERE branch_id IN
(
select branch_id
FROM (select * from manager) AS m2
WHERE (branch_id, year) IN
(
SELECT branch_id, year
FROM branch_master
WHERE type = 'finance'
)
);

The correct answer is in this SO post.
The problem with the accepted answer here is, as was already mentioned multiple times, that it creates a full copy of the whole table. This is far from optimal and uses the most space. The idea is to materialize only the subset of data used for the update, so in your case it would be like this:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
SELECT * FROM(
SELECT m2.branch_id FROM manager as m2
WHERE (m2.branch_id,m2.year) IN (
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance')
) t
)
Basically you just encapsulate your previous source for data query inside of
SELECT * FROM (...) t

Try to use the EXISTS operator:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE EXISTS (SELECT 1
FROM (SELECT m2.branch_id
FROM branch_master AS bm
JOIN manager AS m2
WHERE bm.type = 'finance' AND
bm.branch_id = m2.branch_id AND
bm.year = m2.year) AS t
WHERE t.branch_id = m1.branch_id);
Note: The query uses an additional nesting level, as proposed by @Thorsten, as a means to circumvent the "Table is specified twice" error.
Demo here

Try:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
(SELECT DISTINCT branch_id
FROM branch_master
WHERE type = 'finance'))
AND m1.year IN ((SELECT DISTINCT year
FROM branch_master
WHERE type = 'finance'))

The problem I had with the accepted answer is that it creates a copy of the whole table, which wasn't an option for me. I tried to execute it, but after several hours I had to cancel it.
A very fast way, if you have a huge amount of data, is to create a temporary table:
Create TMP table
CREATE TEMPORARY TABLE tmp_manager
(branch_id bigint auto_increment primary key,
year datetime null);
Populate TMP table
insert into tmp_manager (branch_id, year)
select branch_id, year
from manager;
Update with join
UPDATE manager as m, tmp_manager as tmp_m
inner JOIN manager as man on tmp_m.branch_id = man.branch_id
SET m.status = 'Y'
WHERE m.branch_id = tmp_m.branch_id and m.year = tmp_m.year and m.type = 'finance';

This is by far the fastest way:
UPDATE manager m
INNER JOIN branch_master b on m.branch_id=b.branch_id AND m.year=b.year
SET m.status='Y'
WHERE b.type='finance'
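Note that in a 1:n relationship one manager row can match several branch_master rows. According to the MySQL manual, a multiple-table UPDATE updates each matching row once, even if it matches the conditions multiple times, so in this case that is no problem. Be careful, though, when the new value is taken from the joined table (for example SET m.x = b.x): with several matches you should not rely on which of them supplies the value.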

Maybe not a solution, but some thoughts about why it doesn't work in the first place:
Reading data from a table and also writing data into that same table is somewhat an ill-defined task. In what order should the data be read and written? Should newly written data be considered when reading it back from the same table? MySQL refusing to execute this isn't just because of a limitation, it's because it's not a well-defined task.
The solutions involving SELECT ... FROM (SELECT * FROM table) AS tmp just dump the entire content of a table into a temporary table, which can then be used in any further outer queries, like for example an update query. This forces the order of operations to be: Select everything first into a temporary table and then use that data (instead of the data from the original table) to do the updates.
However if the table involved is large, then this temporary copying is going to be incredibly slow. No indexes will ever speed up SELECT * FROM table.
I might be having a slow day today... but isn't the original query identical to this one, which shouldn't have any problems?
UPDATE manager as m1
SET m1.status = 'Y'
WHERE (m1.branch_id, m1.year) IN (
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance'
)
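Whether the different WHERE clauses in this thread really select the same rows can be checked on a small data set. The sketch below uses Python's standard-library sqlite3 module as a stand-in for MySQL, with invented rows and a hypothetical id column on manager. SQLite does not raise the "specified twice" error, so this only compares results. On this data the query from the question, the (branch_id, year) pair version, and the version with two separate IN conditions each update a different set of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch_master (branch_id INTEGER, year INTEGER, type TEXT);
CREATE TABLE manager (id INTEGER PRIMARY KEY, branch_id INTEGER,
                      year INTEGER, status TEXT);
INSERT INTO branch_master VALUES
  (1, 2020, 'finance'), (2, 2021, 'finance'), (3, 2020, 'hr');
INSERT INTO manager (id, branch_id, year) VALUES
  (1, 1, 2020), (2, 1, 2021), (3, 2, 2019),
  (4, 3, 2020), (5, 1, 2019), (6, 2, 2020);
""")

FINANCE = "SELECT branch_id, year FROM branch_master WHERE type = 'finance'"

WHERE_CLAUSES = {
    # Shape of the query in the question: any row of a branch that has a match.
    "question": "branch_id IN (SELECT m2.branch_id FROM manager AS m2 "
                f"WHERE (m2.branch_id, m2.year) IN ({FINANCE}))",
    # Pair comparison: only rows whose own (branch_id, year) matches.
    "pair": f"(branch_id, year) IN ({FINANCE})",
    # Two independent IN conditions: branch and year are checked separately.
    "separate": "branch_id IN (SELECT branch_id FROM branch_master "
                "WHERE type = 'finance') "
                "AND year IN (SELECT year FROM branch_master "
                "WHERE type = 'finance')",
}

def updated_ids(where):
    """Reset all statuses, run the UPDATE, and return the ids set to 'Y'."""
    conn.execute("UPDATE manager SET status = 'N'")
    conn.execute(f"UPDATE manager SET status = 'Y' WHERE {where}")
    cur = conn.execute("SELECT id FROM manager WHERE status = 'Y' ORDER BY id")
    return [row[0] for row in cur]

results = {name: updated_ids(where) for name, where in WHERE_CLAUSES.items()}
print(results)
```

Which of the three behaviours is wanted depends on whether status should be set per branch or per (branch, year) row, which the question leaves somewhat open.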

Update a MySQL table to double aggregate of a construct, which depends on the table itself

I need to update a table, but to get the new value it seems that I need to create a temporary table. The reason is that I need to calculate sum of the max. Can I do it?
The pseudocode looks like this:
UPDATE users u SET usersContribution = [CREATE TEMPORARY TABLE IF NOT EXISTS tmpTbl3 AS
(SELECT ROUND(max(zz.zachetTimestamp - d.answerDate)) as answerDateDiff
FROM zachet zz
JOIN discussionansw d ON d.zachetid=zz.zachetId and d.usersid=zz.usersId and
zz.zachetTimestamp > d.answerDate
WHERE zz.whoTalk=u.userid and
NOT EXISTS (SELECT * FROM discussionansw
WHERE zachetid=zz.zachetId and usersid=u.userid)
GROUP BY zz.zachetId)]
SELECT SUM(answerDateDiff) FROM tmpTbl3;
I used brackets to mark the part that has to be done first, but which the UPDATE query cannot contain...
I have both max and sum and I do not see a way to avoid a temporary table. But if you can avoid it, I'll be glad to have such a solution.
Here is the answer, which I arrived at with the help of @flaschenpost and this post: SQL Update to the SUM of its joined values
CREATE TEMPORARY TABLE IF NOT EXISTS t0tmpTbl3 AS
(SELECT zz.whoTalk, ROUND(max(zz.zachetTimestamp - d.answerDate)) as answerDateDiff
FROM zachet zz
JOIN discussionansw d ON d.zachetid=zz.zachetId and d.usersid=zz.usersId and
zz.zachetTimestamp > d.answerDate
WHERE
NOT EXISTS (SELECT * FROM discussionansw WHERE zachetid=zz.zachetId and usersid=zz.whoTalk)
GROUP BY zz.zachetId);
UPDATE users u
JOIN (SELECT whoTalk, SUM(answerDateDiff) sumAnswerDateDiff
FROM t0tmpTbl3 GROUP BY whoTalk) t
ON u.usersId=t.whoTalk
SET u.usersContribution=sumAnswerDateDiff;
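The two-level aggregation (a MAX per group first, then a SUM of those maxima per user) can be shown in isolation. The following is a minimal sketch in Python with the standard-library sqlite3 module standing in for MySQL. The answer_diffs table and its diff column are hypothetical simplifications of the zachet/discussionansw join, and the write-back uses a correlated subquery instead of MySQL's UPDATE ... JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified stand-ins for the tables in the question.
CREATE TABLE users (userId INTEGER PRIMARY KEY,
                    usersContribution INTEGER DEFAULT 0);
CREATE TABLE answer_diffs (whoTalk INTEGER, zachetId INTEGER, diff INTEGER);
INSERT INTO users (userId) VALUES (1), (2), (3);
INSERT INTO answer_diffs VALUES
  (1, 10, 5), (1, 10, 9),   -- user 1, zachet 10: max is 9
  (1, 11, 4),               -- user 1, zachet 11: max is 4
  (2, 12, 7), (2, 12, 2);   -- user 2, zachet 12: max is 7

-- Inner aggregate: one MAX per (user, zachet).
CREATE TEMPORARY TABLE tmp_max AS
  SELECT whoTalk, zachetId, MAX(diff) AS answerDateDiff
  FROM answer_diffs
  GROUP BY whoTalk, zachetId;

-- Outer aggregate: SUM of those maxima per user, written back to users.
UPDATE users
SET usersContribution = (SELECT SUM(answerDateDiff)
                         FROM tmp_max
                         WHERE tmp_max.whoTalk = users.userId)
WHERE userId IN (SELECT whoTalk FROM tmp_max);
""")

contrib = conn.execute(
    "SELECT userId, usersContribution FROM users ORDER BY userId"
).fetchall()
print(contrib)
```

User 3 has no rows in tmp_max and keeps the default of 0, because the WHERE clause restricts the update to users that appear there.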

Could you just break it into two Queries?
drop temporary table if exists tmp_maxsumofsomething;
create temporary table tmp_maxsumofsomething
select max(), sum(), ...
from zachet z inner join discussionansw a on ...
group by...
;
update u inner join tmp_maxsumofsomething t on ... set u.... = t...
Temporary tables are visible only in the connection that created them, so concurrent sessions do not interfere with each other.
EDIT: As long as your Queries make any sense, you could try:
DROP TEMPORARY TABLE IF EXISTS tmpTbl3;
CREATE TEMPORARY TABLE tmpTbl3
SELECT zz.whoTalk as userId, ROUND(max(zz.zachetTimestamp - d.answerDate)) as answerDateDiff
FROM zachet zz, discussionansw d
WHERE d.zachetid=zz.zachetId
and d.usersid=zz.usersId and zz.zachetTimestamp > d.answerDate
# What do you mean ? by:
# and (SELECT count(*) FROM discussionansw
# WHERE zachetid=zz.zachetId and usersid=u.userid) = 0
# Think about a reasonable WHERE NOT EXISTS clause!
GROUP BY zz.whoTalk
Then you have your Temp-Table to join to:
update users u
inner join tmpTbl3 tm on u.userId = tm.userId
set u.usersContribution = tm.answerDateDiff
If you are brave enough to write an application needing those queries, you should not be scared to learn a bit more of some concepts of SQL and MySQL. You are here for the exploration of concepts, not to hire Programmers for free.

MySQL INSERT INTO table 1 SELECT table 2 with different column name

I have a table (pdt_1) in database (db_1) and another table (pdt_2) in another database (db_2).
I compared pdt_1 and pdt_2 to find the published pdt_1 products that are not present in pdt_2.
Working code:
SELECT * FROM db_1.pdt_1 AS lm
WHERE lm.product_sku
NOT IN (SELECT DISTINCT product_cip7 FROM db_2.pdt_2)
AND lm.product_publish='Y'
Finally, I need to insert the result of this query into pdt_2.
However, the structures of pdt_1 and pdt_2 are different.
Example:
- column names
- number of columns
I also need an auto_increment id for the pdt_1 products inserted into pdt_2.
I need help.
NB : sorry for my poor english :(

If you want a new table with just the id and product_sku, try:
INSERT INTO new_table # with id and product_sku from first table
SELECT pdt_1.id,
pdt_1.product_sku
FROM db_1.pdt_1
LEFT JOIN db_2.pdt_2
ON pdt_1.product_sku = pdt_2.product_cip7
WHERE pdt_2.product_cip7 IS NULL
AND pdt_1.product_publish = 'Y'
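For the original goal (inserting into pdt_2 itself although the column names differ), an explicit column list on the INSERT maps the selected columns to the target columns by position, and leaving the id column out lets the table generate it. The following is a minimal sketch in Python with the standard-library sqlite3 module standing in for MySQL. The product_name and label columns are hypothetical, both tables live in one database here rather than in db_1 and db_2, and SQLite's AUTOINCREMENT plays the role of MySQL's AUTO_INCREMENT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Columns other than product_sku, product_cip7 and product_publish are hypothetical.
CREATE TABLE pdt_1 (product_sku TEXT, product_name TEXT, product_publish TEXT);
CREATE TABLE pdt_2 (id INTEGER PRIMARY KEY AUTOINCREMENT,
                    product_cip7 TEXT, label TEXT);
INSERT INTO pdt_1 VALUES
  ('1001', 'aspirin', 'Y'), ('1002', 'ibuprofen', 'N'), ('1003', 'zinc', 'Y');
INSERT INTO pdt_2 (product_cip7, label) VALUES ('1001', 'aspirin');

-- The column list maps source columns to target columns by position;
-- id is omitted so that pdt_2 assigns it.
INSERT INTO pdt_2 (product_cip7, label)
SELECT lm.product_sku, lm.product_name
FROM pdt_1 AS lm
LEFT JOIN pdt_2 ON pdt_2.product_cip7 = lm.product_sku
WHERE pdt_2.product_cip7 IS NULL
  AND lm.product_publish = 'Y';
""")

rows = conn.execute(
    "SELECT id, product_cip7, label FROM pdt_2 ORDER BY id"
).fetchall()
print(rows)
```

Only the published product that was missing from pdt_2 is inserted; the unpublished one and the one already present are skipped.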

sql query for deleting rows with NOT IN using 2 columns

I have a table with a composite key composed of 2 columns, say Name and ID. I have some service that gets me the keys (name, id combination) of the rows to keep; the rest I need to delete. If the key were only 1 column, I could use
delete from table_name where name not in (list_of_valid_names)
but how do I make the query so that I can say something like
name not in (valid_names) and id not in(valid_ids)
// this won't work since they separately don't identify a unique record, or will it?

Use MySQL's "multiple value" (row constructor) IN syntax:
delete from table_name
where (name, id) not in (select name, id from some_table where some_condition);
If your list is a literal list, you can still use this approach:
delete from table_name
where (name, id) not in (select 'john', 1 union select 'sally', 2);

Actually, no I retract my comment about needing special juice or being stuck with (AND OR'ing all your options).
Since you have a list of values of what you want to retain, dump that into a temporary table. Then do a delete against the base table for what does not exist in the temporary table (left outer join). I suck at MySQL syntax or I'd cobble together your query. The pseudocode is approximate:
DELETE
B
FROM
BASE B
LEFT OUTER JOIN
#RETAIN R
ON R.key1 = B.key1
AND R.key2 = B.key2
WHERE
R.key1 IS NULL

The NOT EXISTS version:
DELETE
b
FROM
BaseTable b
WHERE
NOT EXISTS
( SELECT
*
FROM
RetainTable r
WHERE
(r.key1, r.key2) = (b.key1, b.key2)
)
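To answer the "or will it?" in the question: two separate NOT IN conditions do not behave like a comparison on the (name, id) pair. The sketch below shows the difference on invented rows, using Python's standard-library sqlite3 module as a stand-in for MySQL; keep_keys is a hypothetical table holding the keys to retain.

```python
import sqlite3

def fresh():
    """Build a new in-memory database with the same sample rows each time."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE table_name (name TEXT, id INTEGER, PRIMARY KEY (name, id));
    CREATE TABLE keep_keys (name TEXT, id INTEGER);
    INSERT INTO table_name VALUES
      ('john', 1), ('john', 2), ('sally', 1), ('sally', 2), ('bob', 3);
    INSERT INTO keep_keys VALUES ('john', 1), ('sally', 2);
    """)
    return conn

def survivors(delete_sql):
    """Run one DELETE on fresh data and return the rows that are left."""
    conn = fresh()
    conn.execute(delete_sql)
    return conn.execute(
        "SELECT name, id FROM table_name ORDER BY name, id"
    ).fetchall()

# Separate conditions: a row is deleted only if BOTH its name and its id are unknown.
separate = survivors("""
DELETE FROM table_name
WHERE name NOT IN (SELECT name FROM keep_keys)
  AND id NOT IN (SELECT id FROM keep_keys)
""")

# Pair comparison: a row is deleted unless its exact (name, id) is in keep_keys.
paired = survivors("""
DELETE FROM table_name
WHERE NOT EXISTS (SELECT 1 FROM keep_keys AS k
                  WHERE k.name = table_name.name AND k.id = table_name.id)
""")
print(separate)
print(paired)
```

With the separate conditions, ('john', 2) and ('sally', 1) survive even though neither pair is in the keep list, because each part appears somewhere in it.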
