MySQL - Duplicate elimination and Preserving Valuable Data? - mysql

Scenario: I have a few duplicate contacts in a table. The duplicates have been identified and I could just delete them, but I don't want to lose data that the duplicate has and the original doesn't. Any tips?
Sample data :
ID  Name  Email  School  Dupe_Flag  Key
1   AAA   a#a            X          1
2   AAB          JKL                1
3   BBB   b#b    MNO     X          2
4   BBC                             2
Desired output :
ID  Name  Email  School  Dupe_Flag  Key
1   AAA   a#a            X          1
2   AAB   a#a    JKL                1
3   BBB   b#b    MNO     X          2
4   BBC   b#b    MNO                2
How are the 2 records related?: They both have the same Key value, and only one of them has Dupe_Flag set; that row is the duplicate.
In the above case ID 1 is going to be deleted, but the email info from ID 1 should be applied to ID 2.
What is the data?: I have a few hundred rows and a few hundred duplicates. Writing an UPDATE statement for each row is cumbersome and not feasible.
Business rules for determining what data takes priority :
If a column in the original/good record (Dupe_Flag is NOT set) has no data and the same column in the corresponding dupe record (same Key value) has data, then that column of the original record should be updated.
Any help/script is really appreciated! Thanks guys :)

Assuming empty values are null, something like this should output the desired data:
SELECT
  a.ID,
  IF(a.Dupe_Flag IS NULL, IF(a.Name IS NULL, b.Name, a.Name), a.Name) AS Name,
  IF(a.Dupe_Flag IS NULL, IF(a.Email IS NULL, b.Email, a.Email), a.Email) AS Email,
  IF(a.Dupe_Flag IS NULL, IF(a.School IS NULL, b.School, a.School), a.School) AS School,
  a.Dupe_Flag,
  a.`Key`
FROM
  `table` a,   -- `table` stands for your table name
  `table` b
WHERE
  a.`Key` = b.`Key` AND
  a.ID != b.ID
GROUP BY
  a.ID
Note that turning this into an UPDATE statement is pretty straightforward.
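One way that conversion might look, as a sketch rather than a tested migration. It assumes empty values are stored as NULL, that each Key has exactly one flagged duplicate, and it uses a hypothetical table name, contacts. COALESCE keeps the good row's value and only falls back to the duplicate's value when the good row has none.

```sql
-- Hypothetical table name: contacts. Assumes blanks are NULL and
-- one flagged duplicate (Dupe_Flag IS NOT NULL) per Key.
UPDATE contacts AS good
JOIN contacts AS dupe
  ON dupe.`Key` = good.`Key`
 AND dupe.Dupe_Flag IS NOT NULL
SET good.Name   = COALESCE(good.Name,   dupe.Name),
    good.Email  = COALESCE(good.Email,  dupe.Email),
    good.School = COALESCE(good.School, dupe.School)
WHERE good.Dupe_Flag IS NULL;

-- Only after checking the result, remove the flagged rows.
DELETE FROM contacts WHERE Dupe_Flag IS NOT NULL;
```

If blanks are stored as empty strings instead of NULL, wrapping each good-row column in NULLIF(col, '') inside the COALESCE should give the same effect.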

I don't know the specifics of this problem but it is probably better to avoid this problem by setting the columns to "unique" so if a query tries to create a duplicate it will fail. I think the elegant solution to this problem is to avoid it at the point of data entry.
I like using this query for tracking down dupes:
select Email, count(*) as cnt from `table` group by Email having count(*) > 1
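Making a column "unique" as suggested above could look roughly like this. The table name contacts and the constraint name uq_contacts_email are made up for illustration; Email is the column from the question.

```sql
-- Hypothetical names: contacts, uq_contacts_email.
-- This fails while duplicate Email values still exist,
-- so it has to run after the cleanup.
ALTER TABLE contacts
  ADD CONSTRAINT uq_contacts_email UNIQUE (Email);
```

Note that a UNIQUE index in MySQL still allows several rows whose Email is NULL, so it only blocks repeated non-NULL values.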

While this uses a bunch of nested SELECTs and isn't really a full solution, it should either spark another idea or push you in the right direction.
select * from
  (select r1.ID, r1.Name,
          coalesce(r1.Email, r2.Email) as Email,
          coalesce(r1.School, r2.School) as School,
          r1.Dupe_Flag, r1.Key
   from (select * from test1 where Dupe_Flag IS NULL) as r1
   left outer join
        (select * from test1 where Dupe_Flag IS NOT NULL) as r2
     on r1.Key = r2.Key)
as results
Yields:
ID  Name  Email  School  Dupe_Flag  Key
2   AAB   a#a    JKL     NULL       1
4   BBC   b#b    MNO     NULL       2
Based on your example data.

The rows are unique, so there's no problem. Please recheck your example data.

Related

SQL Validate a column with the same column

I have the following situation. I have a table with all the article info, and I would like to compare a column with itself, because I have multiple types of article: single products and master products. The only way I can tell them apart is by SKU. For example:
ID | SKU
1 | 11111
2 | 11112
3 | 11113
4 | 11113-5
5 | 11113-8
6 | 11114
7 | 11115
8 | 11115-1-W
9 | 11115-2
10 | 11116
I want to list and/or count only the SKUs that are fully unique. In the example, the SKUs that are unique and have no variants are IDs 1, 2, 6 and 10. I want a query that does not count a SKU such as 11113 when it appears again in the column as part of a variant. So in total I would get 4 unique SKUs, not 6. Please let me know if this is possible.
Assuming the length of master SKUs is 5 characters, try this:
select a.*
from mytable a
-- match only variants ('11113-5'); a plain '%' would also match the master row itself
left join mytable b on b.sku like concat(a.sku, '-%')
where length(a.sku) = 5
and b.sku is null
This query joins master SKUs to their child SKUs but filters out the successful joins, leaving only solitary master SKUs.
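If only the count is needed, a NOT EXISTS variant is one option. This is a sketch that assumes a variant is always the master SKU followed by a hyphen, and that master SKUs themselves never contain a hyphen; mytable and sku are the names used above.

```sql
-- Count master SKUs that have no variant rows.
-- Assumption: variants look like '<master>-<suffix>'.
SELECT COUNT(*) AS standalone_skus
FROM mytable a
WHERE a.sku NOT LIKE '%-%'            -- keep master SKUs only
  AND NOT EXISTS (
        SELECT 1
        FROM mytable b
        WHERE b.sku LIKE CONCAT(a.sku, '-%')   -- any variant of this master
      );
```

With the sample rows from the question, 11113 and 11115 are excluded because they have variants, leaving 11111, 11112, 11114 and 11116.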
You can do this by grouping and counting the unique rows.
First, we will need to take your table and add a new column, MasterSKU. This will be the first five characters of the SKU column. Once we have the MasterSKU, we can then GROUP BY it. This will bundle together all of the rows having the same MasterSKU. Once we are grouping we get access to aggregate functions like COUNT(). We will use that function to count the number of rows for each MasterSKU. Then, we will filter out any rows that have a COUNT() over 1. That will leave you with only the unique rows remaining.
Take that unique list and LEFT JOIN it back into your original table to grab the IDs.
SELECT B.ID, A.MasterSKU
FROM (
    SELECT
        SUBSTRING(SKU, 1, 5) AS MasterSKU,
        COUNT(*) AS MasterSKUCount
    FROM MyTable
    GROUP BY SUBSTRING(SKU, 1, 5)
    HAVING COUNT(*) = 1
) AS A
LEFT JOIN (
    SELECT
        ID,
        SUBSTRING(SKU, 1, 5) AS MasterSKU
    FROM MyTable
) AS B
ON A.MasterSKU = B.MasterSKU
Now, one thing I noticed from your example: the original SKU column really looks like three columns in one. We have multiple values joined with hyphens.
11115-1-W
There may be a reason for it, but most likely this violates first normal form and will make the database hard to query. It's part of the reason why such a complicated query is needed. If the SKU column really represents multiple things then we may want to consider breaking it out into MasterSKU, Version, and Color or whatever each hyphen represents.

Get data from table with id from another table

I have two tables, users_data and users_statistics.
users_data:
id money position uid
1 1000 20921 3
2 3000 8742 0
3 2000 23214 3
users_statistics:
id lastname lastlogin
1 Hans 13.05.2200
2 Uwe 10.03.1900
3 Herbert 13.42.2421
Now I want to SELECT every lastname WHERE uid = 3.
My attempt was:
SELECT `lastname` FROM users_statistics
JOIN users_data USING (id)
WHERE `uid` = 3
This query returns all 3 rows, but why?
In the second row the uid is 0...
I hope someone can help, thanks in advance.
Sounds weird, but now it works...
I only changed uid to users_data.uid:
SELECT `lastname` FROM users_statistics
JOIN users_data USING (id)
WHERE users_data.uid = 3
Now it only gives me the two rows with uid = 3.
If someone can explain this, let me know :D
Your query works for me:
http://sqlfiddle.com/#!9/c5025/1
Also tested under local MySQL 5.5.38

mysql update rows of column in second table from column in first table where another column in first table matches column in second table

My title may be a little confusing, but this is basically what I want to do:
I have two tables:
Table 1 = Site
Columns: SiteID SiteName Address
1 Germany 123 A Street
2 Poland 234 B Street
3 France 354 F Street
4 England 643 C Street
5 Russia 968 G Street
Table 2 = Site_New
Columns: SiteID SiteName Address
1 Germany
2 France
3 Russia
I want to update the Address column in table 2 with the Address from table 1 where SiteName in table 2 = SiteName in table 1. As you can see, there are sites in table 1 that are not in table 2, so I do not care about copying those addresses to table 2.
I was trying this code:
update Site_New set Address = (select Site.Address from Site where Site_New.SiteName=Site.SiteName)
but I was getting error code 1242: "Subquery returns more than 1 row."
Any idea on how this can be done?
You are better off using update/join syntax:
update Site_New sn join
Site s
on sn.SiteName = s.SiteName
set sn.Address = s.Address;
However, based on your sample data, your correlated subquery should not cause such an error.
Perhaps the join should be on SiteId rather than SiteName:
update Site_New sn join
Site s
on sn.SiteId = s.SiteId
set sn.Address = s.Address;
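Error 1242 from the original subquery would be consistent with the real Site table holding the same SiteName more than once, which the sample data does not show. That is only a guess, but it is cheap to check with a grouping query on the names from the question:

```sql
-- Lists every SiteName that occurs more than once in Site.
-- An empty result means duplicate names are not the cause.
SELECT SiteName, COUNT(*) AS occurrences
FROM Site
GROUP BY SiteName
HAVING COUNT(*) > 1;
```

If rows come back, the update/join on SiteName will still run, but which of the duplicate addresses ends up in Site_New is not something to rely on.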
You need to do a select with your update, like so:
UPDATE site_new sn,
( SELECT
    s.address AS _address,      -- take the address from site, not from site_new
    sn1.sitename AS _sitename
  FROM site_new sn1
  JOIN site s ON s.sitename = sn1.sitename
) t
SET sn.address = t._address
WHERE sn.sitename = t._sitename

mySQL circular query

I have two tables, say table 1: Member and table 2: Info. ID is the primary key of Member, and A and B together are the primary key of Info.
A is the person who was introduced and B is the person who introduces. So the entry 002 -> 001 means that 001 introduces 002.
I want a query that shows the names of members who are not involved with 001, meaning neither the people 001 introduces nor the people who introduce 001.
This is what I have so far:
SELECT DISTINCT Info.A
FROM Info
WHERE NOT (A="001" OR B="001")
UNION
SELECT DISTINCT Info.B
FROM Info
WHERE NOT (A="001" OR B="001")
The expected result should be 004 but my query is also including 003. Any suggestions?
You could get all the ids where 001 is involved, then the result is NOT IN those ids.
SELECT * FROM Member
WHERE ID NOT IN (
SELECT IF(A = '001', B, A)
FROM Info WHERE A = '001' OR B = '001'
UNION
SELECT '001'
)

Show only account numbers with different pointTypes

I will try to explain what I was asked to do to the best of my ability.
Let's say that we have developer access to DataBase A, which has many tables inside, but we are mostly concerned about two tables. The first one is called Accounts, and the second one Campaign. Now, inside the Campaign table we have many fields, but the most important are CampaignTypeID, and AccountID.
Inside the Accounts table the most important fields are AccountID and CustomerNumber. In the Accounts table we have many customers who have participated in different campaigns; therefore a single customer can have many different CampaignTypeIDs under their account.
Now here is what I was asked to do: Show one CustomerNumber for each CampaignTypeID (5 types total). (I think that repeated CustomerNumbers are acceptable)
(Ex.)
CampaignTypeID CustomerNumber
1 34535
2 23525
3 23423
4 52355
5 23525
This is the query I used:
SELECT TOP 5 CustomerNumber AS [Customer Number], CampaignTypeID
FROM A.Account a
JOIN A.Campaign c ON a.AccountID = c.AccountID
WHERE CampaignTypeID IN (5)
GROUP BY CustomerNumber, CampaignTypeID
The result of this query would be something like:
(Ex.)
CampaignTypeID CustomerNumber
5 34535
5 23525
5 23423
5 52355
5 23525
That is not exactly what I wanted. At first I plugged all of the CampaignTypeIDs into the WHERE clause, but that only returned repeated CampaignTypeIDs.
(Ex.)
CampaignTypeID CustomerNumber
3 34535
3 23525
4 23423
5 52355
5 56678
3 23525
As you can see, at this point my only option was to enter each CampaignTypeID one by one and copy each result to a spreadsheet. What I showed above was just an example; the real task had 40 different CampaignTypeIDs. It was a very tedious job that I know can be made A LOT more efficient.
If possible I would like to know a more efficient way to complete this task.
Thanks!
UPDATE: Alright, a small thing I should have added. The CampaignTypeIDs are not sequential; they are more like 5,6,7,8,9,10,50,65,110,250,1104,1114. Would this complicate things?
SCHEMA
CREATE TABLE Accounts
(
AccountID int auto_increment primary key,
CustomerID int(20)NOT NULL
);
INSERT INTO Accounts
(CustomerID)
VALUES
(24),(22),(35),(256),(1246),(11),(224),(55),(664),(773),(234),(568),(245),(986),(768);
CREATE TABLE Campaign
(
CampaignID int auto_increment primary key,
CampaignTypeID int(20) NOT NULL,
AccountID int, -- column was missing from the original schema; the foreign key below needs it
CONSTRAINT fk_AccountID FOREIGN KEY(AccountID)
REFERENCES Accounts(AccountID)
);
INSERT INTO Campaign
(CampaignTypeID)
VALUES
(6),(7),(8),(9),(10),(245),(1140),(1150),(1160),(1170),(1180),(1190),(1240),(1250),(1260);
SELECT
    c.CampaignTypeID,
    MIN(a.CustomerNumber) AS CustomerNumber
FROM A.Account a
INNER JOIN A.Campaign c
    ON a.AccountID = c.AccountID
WHERE c.CampaignTypeID BETWEEN 1 AND 5
GROUP BY c.CampaignTypeID
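Since the update to the question says the CampaignTypeIDs are not sequential, a variant without the range filter may be closer to what is needed. This is a sketch using the table and column names from the question's own query (A.Account, A.Campaign); the ORDER BY is only there to make the output easier to read.

```sql
-- One row per CampaignTypeID that actually occurs, whatever its value.
-- MIN() simply picks one CustomerNumber per group (the smallest).
SELECT
    c.CampaignTypeID,
    MIN(a.CustomerNumber) AS CustomerNumber
FROM A.Account a
INNER JOIN A.Campaign c
    ON a.AccountID = c.AccountID
GROUP BY c.CampaignTypeID
ORDER BY c.CampaignTypeID;
```

To limit the result to specific campaign types, a filter such as WHERE c.CampaignTypeID IN (5, 6, 7, 8, 9, 10) can be added before the GROUP BY.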