Remove duplication of combination 2 columns - mysql

I want to remove all duplicate rows where the combination of first name and last name is the same.
Table users:
mysql> select * from users;
+----+------------+-----------+
| id | LastName | FirstName |
+----+------------+-----------+
| 1 | Kowalski | Jan |
| 2 | Malinowski | Marian |
| 3 | Malinowski | Marian |
| 4 | Kowalski | Jan |
| 5 | Malinowski | Marian |
| 6 | Malinowski | Marian |
+----+------------+-----------+
I've created a script:
set @x = 1;
set @previous_name = '';
DELETE FROM users where id IN (SELECT id from (
select id, @previous_name,IF (CONCAT(FirstName, LastName) = @previous_name, @x:= @x + 1, IF(@previous_name:=CONCAT(FirstName, LastName), @x, IF(@x:=1, @x, @x))) as occurance
from users order by CONCAT(FirstName, LastName)
) AS occurance_table where occurance_table.occurance > 1);
but MySQL returns an error:
ERROR 1292 (22007): Truncated incorrect DOUBLE value: 'JanKowalski'
I found a few similar questions, but the solution there was to remove the word AND from the syntax.
I want to prepare the database for adding a unique constraint on the 2 columns, so I want to clear the table of duplicates.
What is the best way to achieve this?

I tried the query mentioned in the answer section.
I believe it does not work. Instead, I modified the query as follows:
DELETE FROM users
WHERE id NOT IN
(
SELECT MIN(a.id)
FROM (SELECT * FROM users) a
GROUP BY a.LastName, a.FirstName
)
Please do correct me if I am wrong. @juergen

There is no need for a script. A single query is enough:
delete u1
from users u1
left join
(
select min(id) as min_id
from users
group by LastName, FirstName
) u2 on u1.id = u2.min_id
where u2.min_id is null
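The subselect gets the lowest user id for each unique combination of last name and first name. Left-joining against it leaves every other row unmatched (min_id is null), and those unmatched rows are the ones that get deleted.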
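As a rough way to try the idea end to end, here is a minimal sketch in Python using the standard sqlite3 module. It is an assumption-laden stand-in, not MySQL: SQLite does not accept the multi-table `delete u1 from ... join` form, so the sketch uses the equivalent `NOT IN (SELECT MIN(id) ...)` shape, and the index name `uq_users_name` is made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT)"
)
conn.executemany(
    "INSERT INTO users (id, LastName, FirstName) VALUES (?, ?, ?)",
    [
        (1, "Kowalski", "Jan"),
        (2, "Malinowski", "Marian"),
        (3, "Malinowski", "Marian"),
        (4, "Kowalski", "Jan"),
        (5, "Malinowski", "Marian"),
        (6, "Malinowski", "Marian"),
    ],
)

# Keep the lowest id of each (LastName, FirstName) group and delete the rest.
conn.execute(
    """
    DELETE FROM users
    WHERE id NOT IN (
        SELECT MIN(id) FROM users GROUP BY LastName, FirstName
    )
    """
)

# With the duplicates gone, a unique index on both columns can be created.
# "uq_users_name" is a hypothetical name.
conn.execute("CREATE UNIQUE INDEX uq_users_name ON users (LastName, FirstName)")

remaining = conn.execute(
    "SELECT id, LastName, FirstName FROM users ORDER BY id"
).fetchall()
print(remaining)
```

On a real MySQL table it is probably worth running the delete inside a transaction, or on a copy first, and checking the remaining row count before adding the constraint.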

Related

How to SELECT top different value using order by matrics

I have a table like this
I want to get, for each table name, the row that has the minimum responsetime.
I have tried this query:
select tablename,
index1,
index2,
min(responsetime)
from tableconf
group by tablename
order by responsetime asc
But it doesn't give what I want.
The output that I want is:
+------------------+------------------+--------+--------------+
| tablename | index1 | index2 | responsetime |
+------------------+------------------+--------+--------------+
| salesorderheader | TotalDue | NULL | 6.1555 |
| salesterritory | Name | NULL | 11.66667 |
| store | BusinessEntityId | Name | 3.6222 |
| previous | previous | NULL | 5.03333 |
| NONE | NONE | NULL | 5.6 |
+------------------+------------------+--------+--------------+
What query should I use to get the output that I want?
Select the minimum response time per table name. Use an IN clause on these to get the rows:
select *
from tableconf
where (tablename, responsetime) in
(
select tablename, min(responsetime)
from tableconf
group by tablename
);
(Edited from previous answer)
I don't know whether every SQL dialect accepts a multi-column (tuple) comparison in the WHERE clause. Another option, building on the highest-voted answer right now, uses a join:
select *
from tableconf t
inner join (
select tablename, min(responsetime) min_rt
from tableconf t2
group by tablename
) t3 on t.tablename = t3.tablename and t.responsetime = t3.min_rt
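To see the join variant run, here is a small sketch using Python's built-in sqlite3 module in place of MySQL. The question's input table was not preserved, so the rows below are hypothetical, and the derived-table alias `m` is just an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableconf "
    "(tablename TEXT, index1 TEXT, index2 TEXT, responsetime REAL)"
)
# Hypothetical sample rows; only the two minimum rows mirror the desired output.
conn.executemany(
    "INSERT INTO tableconf VALUES (?, ?, ?, ?)",
    [
        ("store", "BusinessEntityId", "Name", 3.6222),
        ("store", "Name", None, 4.9),
        ("salesterritory", "Name", None, 11.66667),
        ("salesterritory", "CountryRegionCode", None, 12.5),
    ],
)

# The derived table holds the minimum responsetime per tablename;
# joining back on both columns returns the full row(s) with that minimum.
fastest = conn.execute(
    """
    SELECT t.tablename, t.index1, t.index2, t.responsetime
    FROM tableconf AS t
    INNER JOIN (
        SELECT tablename, MIN(responsetime) AS min_rt
        FROM tableconf
        GROUP BY tablename
    ) AS m ON t.tablename = m.tablename AND t.responsetime = m.min_rt
    ORDER BY t.tablename
    """
).fetchall()
print(fastest)
```

Note that if two rows of the same table tie on the minimum responsetime, both are returned.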

Merge duplicated emails and points mysql

I have a MySQL table which has three columns:
Userid | Email | Points
---------------------------------------------------------
1 | jdoe@company.com | 20
2 | jdoe%40company.com | 25
3 | rwhite@company.com | 14
4 | rwhite%40company.com| 10
What I want to do is delete the duplicate emails and merge their points. I want my table to look like this:
Userid | Email | Points
---------------------------------------------------------
1 | jdoe@company.com | 45
3 | rwhite@company.com | 24
What would my query look like to return my desired table?
Does anyone know how to do this?
Thanks in advance!
Are you looking for something like this?
SELECT MIN(userid) userid, email, SUM(points) points
FROM
(
SELECT userid, REPLACE(email, '%40', '@') email, points
FROM table1
) q
GROUP BY email
Output:
| USERID | EMAIL | POINTS |
|--------|--------------------|--------|
| 1 | jdoe@company.com | 45 |
| 3 | rwhite@company.com | 24 |
Here is SQLFiddle demo
Now if you want to deduplicate your table in-place you can do
-- Fix emails
UPDATE table1
SET email = REPLACE(email, '%40', '@')
WHERE email LIKE '%\%40%';
-- Merge points for duplicate records
UPDATE table1 t JOIN
(
SELECT email, SUM(points) points
FROM table1
GROUP BY email
HAVING COUNT(*) > 1
) q ON t.email = q.email
SET t.points = q.points;
-- Delete all duplicate records except ones with lowest `userid`
DELETE t
FROM table1 t JOIN
(
SELECT MIN(userid) userid, email
FROM table1
GROUP BY email
HAVING COUNT(*) > 1
) q ON t.email = q.email
WHERE t.userid <> q.userid;
Here is SQLFiddle demo
Use this query, assuming you want to match emails as they are, without any modification:
SELECT MIN(userid) AS userid, SUM(points) AS points, email FROM table_name GROUP BY email

INNER JOIN same table

I am trying to get some rows from the same table. It's a user table: user has user_id and user_parent_id.
I need to get the user_id row and user_parent_id row. I have coded something like this:
SELECT user.user_fname, user.user_lname
FROM users as user
INNER JOIN users AS parent
ON parent.user_parent_id = user.user_id
WHERE user.user_id = $_GET[id]
But it doesn't show the results. I want to display user record and its parent record.
I think the problem is in your JOIN condition.
SELECT user.user_fname,
user.user_lname,
parent.user_fname,
parent.user_lname
FROM users AS user
JOIN users AS parent
ON parent.user_id = user.user_parent_id
WHERE user.user_id = $_GET[id]
Edit:
You should probably use LEFT JOIN if there are users with no parents.
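A minimal sketch of the LEFT JOIN variant, run against SQLite through Python's sqlite3 module rather than MySQL/PHP. The sample rows and the helper name `user_with_parent` are hypothetical. The id is passed as a bound parameter instead of being interpolated into the SQL string the way `$_GET[id]` is above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_parent_id INTEGER, "
    "user_fname TEXT, user_lname TEXT)"
)
# Hypothetical sample data: user 1 has no parent, user 2 is a child of user 1.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, None, "Anna", "Nowak"),
        (2, 1, "Piotr", "Nowak"),
    ],
)


def user_with_parent(conn, user_id):
    """Return (user first, user last, parent first, parent last) for one user."""
    return conn.execute(
        """
        SELECT u.user_fname, u.user_lname, p.user_fname, p.user_lname
        FROM users AS u
        LEFT JOIN users AS p ON p.user_id = u.user_parent_id
        WHERE u.user_id = ?
        """,
        (user_id,),
    ).fetchone()


print(user_with_parent(conn, 2))  # child row together with its parent
print(user_with_parent(conn, 1))  # no parent: the parent columns are None
```

With an INNER JOIN the second call would return no row at all, which is why LEFT JOIN is the safer choice when some users have no parent.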
You can also use UNION like
SELECT user_fname ,
user_lname
FROM users
WHERE user_id = $_GET[id]
UNION
SELECT user_fname ,
user_lname
FROM users
WHERE user_parent_id = $_GET[id]
Perhaps this should be the select (if I understand the question correctly)
select user.user_fname, user.user_lname, parent.user_fname, parent.user_lname
... As before
Your query is close, but the join condition is reversed, and you have to select through the alias parent to show the values of the parent row, like this:
select
CONCAT(user.user_fname, ' ', user.user_lname) AS 'User Name',
CONCAT(parent.user_fname, ' ', parent.user_lname) AS 'Parent Name'
from users as user
inner join users as parent on parent.user_id = user.user_parent_id
where user.user_id = $_GET[id];
I don't know how the table is created but try this...
SELECT users1.user_id, users2.user_parent_id
FROM users AS users1
INNER JOIN users AS users2
ON users1.id = users2.id
WHERE users1.user_id = users2.user_parent_id
Let's try to answer this question with a simple scenario involving 3 MySQL tables: datetable, colortable and jointable.
First, here are the values of datetable, whose primary key is the column dateid:
mysql> select * from datetable;
+--------+------------+
| dateid | datevalue |
+--------+------------+
| 101 | 2015-01-01 |
| 102 | 2015-05-01 |
| 103 | 2016-01-01 |
+--------+------------+
3 rows in set (0.00 sec)
Next, the values of our second table, colortable, whose primary key is the column colorid:
mysql> select * from colortable;
+---------+------------+
| colorid | colorvalue |
+---------+------------+
| 11 | blue |
| 12 | yellow |
+---------+------------+
2 rows in set (0.00 sec)
Our third and final table, jointable, has no primary key, and its values are:
mysql> select * from jointable;
+--------+---------+
| dateid | colorid |
+--------+---------+
| 101 | 11 |
| 102 | 12 |
| 101 | 12 |
+--------+---------+
3 rows in set (0.00 sec)
Now the task is to find the dateids that have both color values, blue and yellow.
So, our query is:
mysql> SELECT t1.dateid FROM jointable AS t1 INNER JOIN jointable t2
-> ON t1.dateid = t2.dateid
-> WHERE
-> (t1.colorid IN (SELECT colorid FROM colortable WHERE colorvalue = 'blue'))
-> AND
-> (t2.colorid IN (SELECT colorid FROM colortable WHERE colorvalue = 'yellow'));
+--------+
| dateid |
+--------+
| 101 |
+--------+
1 row in set (0.00 sec)
Hope this helps.

How to query a table (which has multiple rows pertaining to a single entity) and return GROUPED result but only where all conditionals have been met?

Firstly, pardon the incredibly vague/long question, I'm really not sure how to summarise my query without the full explanation.
Ok, I have a single MySQL table with the format like so
some_table
user_id
some_key
some_value
If you imagine that, for each user, there are multiple rows, for example:
1 | skill | html
1 | skill | php
1 | foo | bar
2 | skill | html
3 | skill | php
4 | foo | bar
If I want to find all the users who have listed HTML as a skill I can simply do:
SELECT user_id
FROM some_table
WHERE some_key = 'skill' AND some_value='html'
GROUP BY user_id
Easy enough. This would give me user ID's 1 and 2.
If I want to find all users who have listed HTML or PHP as a skill then I can do:
SELECT user_id
FROM some_table
WHERE (some_key = 'skill' AND some_value='html') OR (some_key = 'skill' AND some_value='php')
GROUP BY user_id
This would give me user IDs 1, 2 and 3.
Now, what I'm struggling to work out is how I can query the same table but this time say "give me all the users who have listed both HTML and PHP as a skill", i.e: just user ID 1.
Any advice, guidance or links to docs massively appreciated.
Thanks.
Here's one way:
SELECT user_id
FROM some_table
WHERE user_id IN (SELECT user_id FROM some_table where (some_key = 'skill' AND some_value='html'))
AND user_id IN (SELECT user_id FROM some_table where (some_key = 'skill' AND some_value='php'))
You need to use a nested query (or a self join, which is a different approach).
I set up the following table.
+-------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| type | char(10) | YES | | NULL | |
| value | char(10) | YES | | NULL | |
+-------+----------+------+-----+---------+-------+
inserted the following values
+------+-------+-------+
| id | type | value |
+------+-------+-------+
| 1 | skill | html |
| 1 | skill | php |
| 2 | skill | html |
| 3 | skill | php |
| 2 | skill | php |
+------+-------+-------+
ran this query
select id
from test
where type = 'skill'
and value = 'html'
and id in (
select id
from test
where type = 'skill'
and value = 'php');
and got
+------+
| id |
+------+
| 1 |
| 2 |
+------+
a self join would be as follows
select e1.id
from test e1, test e2
where e1.id = e2.id
and e2.type = 'skill'
and e2.value = 'html'
and e1.type = 'skill'
and e1.value = 'php'
;
and produce the same result.
so there you have two ways to try it in your code.
I don't know if this is valid for mysql, but should be (works for other db engines):
SELECT php.user_id
FROM some_table php, some_table html
WHERE php.user_id = html.user_id
AND php.some_key = 'skill'
AND html.some_key = 'skill'
AND php.some_value = 'php'
AND html.some_value = 'html';
An alternative, using a HAVING clause (this assumes a user never has the same skill listed twice):
SELECT user_id, count(*)
FROM some_table
WHERE some_key = 'skill'
AND some_value in ('php','html')
GROUP BY user_id
HAVING count(*) = 2;
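A hedged sketch of the HAVING approach, generalised to any list of skills and run on SQLite through Python's sqlite3 module as a stand-in for MySQL. It swaps `count(*)` for `COUNT(DISTINCT some_value)`, so a skill that is accidentally stored twice is not counted twice. The helper name `users_with_all_skills` and the extra duplicate row are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE some_table (user_id INTEGER, some_key TEXT, some_value TEXT)"
)
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?, ?)",
    [
        (1, "skill", "html"),
        (1, "skill", "php"),
        (1, "foo", "bar"),
        (2, "skill", "html"),
        (2, "skill", "html"),  # duplicate row added for illustration only
        (3, "skill", "php"),
        (4, "foo", "bar"),
    ],
)


def users_with_all_skills(conn, skills):
    """Return the ids of users who have every skill in `skills`."""
    wanted = sorted(set(skills))
    placeholders = ", ".join("?" for _ in wanted)
    rows = conn.execute(
        f"""
        SELECT user_id
        FROM some_table
        WHERE some_key = 'skill' AND some_value IN ({placeholders})
        GROUP BY user_id
        HAVING COUNT(DISTINCT some_value) = ?
        ORDER BY user_id
        """,
        (*wanted, len(wanted)),
    ).fetchall()
    return [user_id for (user_id,) in rows]


print(users_with_all_skills(conn, ["html", "php"]))
```

With plain `count(*) = 2`, user 2 would also match here because of the duplicated html row. Only the placeholder list is built with string formatting; the skill values themselves are bound as parameters.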
And a third option is to use inner selects. A slight alternative approach to David's approach:
SELECT user_id FROM some_table
WHERE
some_key = 'skill' AND
some_value = 'html' AND
user_id IN (
SELECT user_id FROM some_table
WHERE
some_key = 'skill' AND
some_value = 'php' AND
user_id IN (
SELECT user_id FROM some_table
WHERE
some_key = 'skill' AND
some_value = 'js' -- AND user_id IN ... for next level, etc.
)
);
The idea is that you can chain the inner selects: for each new property, add a new inner select inside the innermost one.

retrieving duplicates in table and displaying them

I have the following table, where rows are considered duplicates if more than 1 row contains the same 'user_badge_name' and 'user_email'.
user_id | user_name | user_badge_name | user_email
--------------------------------------------------
234 | Kylie | ky001 | kylie@test.com
235 | Francois | FR007 | france@test.com
236 | Maria | MA300 | Marie@test.com
237 | Francine | FR007 | france@test.com
I need to display the user_ids and username of those rows where 'user_badge_name' and 'user_email' are replicated.
I tried the following SQL, but it does not return all user_ids, only the first id:
SELECT user_id, username , COUNT(user_badge_name) AS user_badge_name_Count FROM user GROUP BY user_badge_name HAVING user_badge_name_Count > 1
Any suggestion is most appreciated
select a.user_id, a.user_name
from user as a
inner join
(SELECT user_badge_name, user_email
FROM user
GROUP BY user_badge_name, user_email
HAVING count(*)>1
) as dups
on a.user_badge_name=dups.user_badge_name and a.user_email=dups.user_email
order by a.user_badge_name, a.user_email
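The join against the grouped duplicates can be tried with the question's four rows. This is a sketch on SQLite via Python's sqlite3 module, not MySQL, with `user_id` added to the ORDER BY so that the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT, "
    "user_badge_name TEXT, user_email TEXT)"
)
conn.executemany(
    "INSERT INTO user VALUES (?, ?, ?, ?)",
    [
        (234, "Kylie", "ky001", "kylie@test.com"),
        (235, "Francois", "FR007", "france@test.com"),
        (236, "Maria", "MA300", "Marie@test.com"),
        (237, "Francine", "FR007", "france@test.com"),
    ],
)

# dups holds every (badge, email) pair that occurs more than once;
# joining back to user lists each individual row that shares such a pair.
duplicates = conn.execute(
    """
    SELECT a.user_id, a.user_name
    FROM user AS a
    INNER JOIN (
        SELECT user_badge_name, user_email
        FROM user
        GROUP BY user_badge_name, user_email
        HAVING COUNT(*) > 1
    ) AS dups
      ON a.user_badge_name = dups.user_badge_name
     AND a.user_email = dups.user_email
    ORDER BY a.user_badge_name, a.user_email, a.user_id
    """
).fetchall()
print(duplicates)
```

Grouping on both columns matters: grouping on `user_badge_name` alone would also flag rows that share a badge name but have different emails.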
If you want to see all of the user ids in the same row, then you can use GROUP_CONCAT:
SELECT GROUP_CONCAT(user_id) AS user_ids, GROUP_CONCAT(user_name) AS usernames, COUNT(user_badge_name) AS user_badge_name_Count FROM user GROUP BY user_badge_name, user_email HAVING user_badge_name_Count > 1
That will give you something like this:
user_ids | usernames | user_badge_name_Count
-----------------------------------------------
235,237 | Francois,Francine | 2