Select the rows whose fields are duplicated in MySQL

Assume I have the customer_offer table below.
My question is:
How do I select all the rows whose key value is duplicated in that table?
+---------+-------------+------------+----------+--------+---------------------+
| link_id | customer_id | partner_id | offer_id | key | date_updated |
+---------+-------------+------------+----------+--------+---------------------+
| 1 | 99 | 11 | 14 | mmmmmq | 2011-09-21 12:40:46 |
| 2 | 100 | 11 | 14 | qmmmmq | 2011-09-21 12:40:46 |
| 3 | 101 | 11 | 14 | 8mmmmq | 2011-09-21 12:40:46 |
| 4 | 99 | 11 | 14 | Dmmmmq | 2011-09-21 12:59:28 |
| 5 | 100 | 11 | 14 | Nmmmmq | 2011-09-21 12:59:28 |
+---------+-------------+------------+----------+--------+---------------------+
UPDATE:
Thanks so much for all your answers. Many of them are good, and I now have a solution:

select *
from customer_offer
where `key` in
(select `key` from customer_offer group by `key` having count(*) > 1)
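Update:
As @Scorpi0 mentioned, with a big table it is better to use a join. Newer MySQL optimizers rewrite this kind of IN subquery as a join (a semi-join) automatically. That optimization was announced for MySQL 6.0, which was never released, and shipped in MySQL 5.6 instead.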
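To check the IN-subquery solution without a MySQL server, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite stands in for MySQL here (an assumption; it accepts the same backtick quoting for the reserved word `key`). The rows are hypothetical: the sample table above has no repeated key, so link_id 3 is given the same key as link_id 1.

```python
import sqlite3


def build_demo_db():
    """In-memory customer_offer table filled with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_offer "
        "(link_id INTEGER PRIMARY KEY, customer_id INTEGER, `key` TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer_offer VALUES (?, ?, ?)",
        [
            (1, 99, "mmmmmq"),
            (2, 100, "qmmmmq"),
            (3, 101, "mmmmmq"),  # hypothetical duplicate of link_id 1
            (4, 99, "Dmmmmq"),
        ],
    )
    return conn


def duplicate_key_rows(conn):
    """Return the link_id of every row whose `key` occurs more than once."""
    sql = """
        SELECT link_id FROM customer_offer
        WHERE `key` IN (SELECT `key` FROM customer_offer
                        GROUP BY `key` HAVING COUNT(*) > 1)
        ORDER BY link_id
    """
    return [row[0] for row in conn.execute(sql)]


print(duplicate_key_rows(build_demo_db()))  # [1, 3]
```

In MySQL the same statement works once `key` is backtick-quoted, because KEY is a reserved word there.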

Self join (excluding each row's match with itself):
SELECT DISTINCT c1.* FROM customer_offer c1 inner join customer_offer c2
on c1.`key` = c2.`key` and c1.link_id <> c2.link_id
or group by the field and keep the groups where count > 1:
SELECT `key`, COUNT(*) FROM customer_offer
group by `key`
having COUNT(*) > 1

SELECT DISTINCT c1.*
FROM customer_offer c1
INNER JOIN customer_offer c2
ON c1.`key` = c2.`key`
AND c1.link_id != c2.link_id
This assumes link_id is the primary key.
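The reason for both DISTINCT and the link_id condition can be seen in a small sketch, again using Python's sqlite3 module as a stand-in for MySQL (an assumption). The rows are hypothetical: 'mmmmmq' appears three times, so each of those rows matches two others.

```python
import sqlite3


def build_demo_db():
    """In-memory customer_offer table; the rows are hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_offer (link_id INTEGER PRIMARY KEY, `key` TEXT)")
    conn.executemany(
        "INSERT INTO customer_offer VALUES (?, ?)",
        [(1, "mmmmmq"), (2, "qmmmmq"), (3, "mmmmmq"), (4, "mmmmmq"), (5, "Nmmmmq")],
    )
    return conn


def duplicates_via_self_join(conn):
    """Match each row against every *other* row with the same key. A key that
    appears three times gives each of its rows two matches, hence DISTINCT."""
    sql = """
        SELECT DISTINCT c1.link_id
        FROM customer_offer c1
        INNER JOIN customer_offer c2
                ON c1.`key` = c2.`key`
               AND c1.link_id != c2.link_id
        ORDER BY c1.link_id
    """
    return [row[0] for row in conn.execute(sql)]


def self_join_pairs_without_filter(conn):
    """Without the link_id condition every row also matches itself."""
    sql = """
        SELECT COUNT(*) FROM customer_offer c1
        INNER JOIN customer_offer c2 ON c1.`key` = c2.`key`
    """
    return conn.execute(sql).fetchone()[0]


conn = build_demo_db()
print(duplicates_via_self_join(conn))        # [1, 3, 4]
print(self_join_pairs_without_filter(conn))  # 11: 3*3 pairs plus 2 self-matches
```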

Use a subquery to do the count check and the main query to select the rows. The count check query is simply:
SELECT `key` FROM `customer_offer` GROUP BY `key` HAVING COUNT(*) > 1
Then the outer query uses it by joining to it on `key`:
SELECT customer_offer.* FROM customer_offer
INNER JOIN (SELECT `key` FROM `customer_offer` GROUP BY `key` HAVING COUNT(*) > 1) AS count_check
ON customer_offer.`key` = count_check.`key`

There are many threads on the MySQL forums that explain how to do this. This one explains how to do it in MySQL: http://forums.mysql.com/read.php?10,180556,180567#msg-180567
As a brief example, the code below is from that link, slightly modified to better suit your example. It lists each duplicated key once, together with its count:
SELECT `key`, COUNT(*)
FROM customer_offer
GROUP BY `key`
HAVING COUNT(*) > 1;
You can also use a join, which is my preferred method, as it avoids the separate counting step:
SELECT DISTINCT t.*
FROM this_table t
inner join this_table t1 on t.`key` = t1.`key` and t.link_id <> t1.link_id

SELECT `key`, COUNT(*) AS Occurrences, GROUP_CONCAT(link_id) AS link_ids
FROM customer_offer
GROUP BY `key`
HAVING COUNT(*) > 1;
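A GROUP BY ... HAVING query returns one row per duplicated key, not one row per duplicated record. This minimal sketch shows that with Python's sqlite3 module standing in for MySQL (an assumption) and hypothetical rows where 'mmmmmq' appears three times and 'qmmmmq' twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_offer (link_id INTEGER PRIMARY KEY, `key` TEXT)")
# Hypothetical rows: 'mmmmmq' three times, 'qmmmmq' twice, 'Nmmmmq' once.
conn.executemany(
    "INSERT INTO customer_offer VALUES (?, ?)",
    [(1, "mmmmmq"), (2, "qmmmmq"), (3, "mmmmmq"),
     (4, "qmmmmq"), (5, "mmmmmq"), (6, "Nmmmmq")],
)


def key_occurrences(conn):
    """One (key, occurrences) row per key that appears more than once."""
    sql = """
        SELECT `key`, COUNT(*) AS Occurrences
        FROM customer_offer
        GROUP BY `key`
        HAVING COUNT(*) > 1
        ORDER BY `key`
    """
    return conn.execute(sql).fetchall()


print(key_occurrences(conn))  # [('mmmmmq', 3), ('qmmmmq', 2)]
```

To list the individual rows behind each key, feed this result into the IN subquery or join it back on `key`.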

Related

How to remove duplicate rows in two columns with group by?

I'm trying to remove the duplicate rows using this query, but MySQL returns nothing and fails:
DELETE FROM project_category WHERE prc_id IN (SELECT prc_id
FROM project_category
GROUP BY prc_proid, prc_catid
HAVING COUNT(*) > 1)
I want to remove this duplication:
+--------+-----------+-----------+
| prc_id | prc_proid | prc_catid |
+--------+-----------+-----------+
| 1691 | 207 | 16 |
| 1692 | 207 | 16 |
+--------+-----------+-----------+
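MySQL does not allow a subquery in the WHERE clause to select from the same table the DELETE is modifying (error 1093). Wrap the subquery in a derived table instead. The version below keeps the lowest prc_id of each (prc_proid, prc_catid) pair and deletes every other copy:
DELETE FROM project_category
WHERE prc_id NOT IN (
SELECT keep_id FROM (
SELECT MIN(prc_id) AS keep_id
FROM project_category
GROUP BY prc_proid, prc_catid
) t
)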
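Here is a minimal sketch of a derived-table delete that keeps the lowest prc_id of each (prc_proid, prc_catid) pair. It runs through Python's sqlite3 module as a stand-in for MySQL (an assumption). SQLite does not raise error 1093, so this checks only which rows are removed. Rows 1691 and 1692 come from the question. Rows 1693 and 1700 are hypothetical, added to show a triple and a non-duplicate.

```python
import sqlite3


def dedupe_project_category(conn):
    """Delete every row except the lowest prc_id per (prc_proid, prc_catid).
    The extra derived table t is what MySQL needs to avoid error 1093."""
    conn.execute("""
        DELETE FROM project_category
        WHERE prc_id NOT IN (
            SELECT keep_id FROM (
                SELECT MIN(prc_id) AS keep_id
                FROM project_category
                GROUP BY prc_proid, prc_catid
            ) t
        )
    """)
    return conn.execute("SELECT * FROM project_category ORDER BY prc_id").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project_category "
    "(prc_id INTEGER PRIMARY KEY, prc_proid INTEGER, prc_catid INTEGER)"
)
conn.executemany(
    "INSERT INTO project_category VALUES (?, ?, ?)",
    [(1691, 207, 16), (1692, 207, 16), (1693, 207, 16), (1700, 208, 16)],
)
print(dedupe_project_category(conn))  # [(1691, 207, 16), (1700, 208, 16)]
```

Keeping MIN(prc_id) also removes every extra copy in one pass, even when a pair occurs three or more times.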

Select only the latest record for every employee and for a specific employee in MySQL

I have a MySQL DB and in it there's a table with activity logs of employees.
+-------------------------------------------------+
| log_id | employee_id | date_time | action_type |
+-------------------------------------------------+
| 1 | 1 | 2015/02/03 | action1 |
| 2 | 2 | 2015/02/01 | action1 |
| 3 | 2 | 2017/01/02 | action2 |
| 4 | 3 | 2016/02/12 | action1 |
| 5 | 1 | 2016/10/12 | action2 |
+-------------------------------------------------+
I need two queries. First, get the last action for every employee; from this example table I would need rows 3, 4 and 5 with all columns. Second, get the latest action for one specified employee only.
Any ideas how to achieve this? I'm using Spring Data JPA, but a raw SQL query would also be great.
Thank you in advance.
SELECT x.*
FROM my_table x
JOIN
( SELECT employee_id
, MAX(date_time) date_time
FROM my_table
GROUP
BY employee_id
) y
ON y.employee_id = x.employee_id
AND y.date_time = x.date_time;
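This minimal sketch runs the groupwise-maximum join on the question's five rows. It uses Python's sqlite3 module in place of MySQL (an assumption). The dates are stored as text in their original YYYY/MM/DD form, which sorts correctly as a string; in MySQL the column would more likely be a DATETIME.

```python
import sqlite3

LOG_ROWS = [  # the five rows from the question
    (1, 1, "2015/02/03", "action1"),
    (2, 2, "2015/02/01", "action1"),
    (3, 2, "2017/01/02", "action2"),
    (4, 3, "2016/02/12", "action1"),
    (5, 1, "2016/10/12", "action2"),
]


def build_log_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (log_id INTEGER PRIMARY KEY, employee_id INTEGER, "
        "date_time TEXT, action_type TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", LOG_ROWS)
    return conn


def latest_per_employee(conn):
    """Join each row to its employee's MAX(date_time); only the latest rows match."""
    sql = """
        SELECT x.log_id
        FROM my_table x
        JOIN (SELECT employee_id, MAX(date_time) AS date_time
              FROM my_table
              GROUP BY employee_id) y
          ON y.employee_id = x.employee_id
         AND y.date_time = x.date_time
        ORDER BY x.log_id
    """
    return [row[0] for row in conn.execute(sql)]


print(latest_per_employee(build_log_db()))  # [3, 4, 5]
```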
For your first query, simply (this assumes log_id increases along with date_time):
SELECT t1.*
FROM tableName t1
WHERE t1.log_id = (SELECT MAX(t2.log_id)
FROM tableName t2
WHERE t2.employee_id = t1.employee_id)
For the second one (replace X with the employee's id):
SELECT t1.*
FROM tableName t1
WHERE t1.employee_id=X and t1.log_id = (SELECT MAX(t2.log_id)
FROM tableName t2
WHERE t2.employee_id = t1.employee_id);
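For the second query, the employee id is better bound as a parameter than pasted into the SQL. Below is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption), with a `?` placeholder taking the place of X. Like the answer, it relies on log_id growing with date_time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableName (log_id INTEGER PRIMARY KEY, employee_id INTEGER, "
    "date_time TEXT, action_type TEXT)"
)
conn.executemany(
    "INSERT INTO tableName VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2015/02/03", "action1"),
        (2, 2, "2015/02/01", "action1"),
        (3, 2, "2017/01/02", "action2"),
        (4, 3, "2016/02/12", "action1"),
        (5, 1, "2016/10/12", "action2"),
    ],
)


def latest_for_employee(conn, employee_id):
    """Latest (log_id, action_type) for one employee, or None if they have no rows."""
    sql = """
        SELECT t1.log_id, t1.action_type
        FROM tableName t1
        WHERE t1.employee_id = ?
          AND t1.log_id = (SELECT MAX(t2.log_id)
                           FROM tableName t2
                           WHERE t2.employee_id = t1.employee_id)
    """
    return conn.execute(sql, (employee_id,)).fetchone()


print(latest_for_employee(conn, 1))  # (5, 'action2')
```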
You can get the expected output by doing a self join
select a.*
from demo a
left join demo b on a.employee_id = b.employee_id
and a.date_time < b.date_time
where b.employee_id is null
Note that it may return multiple rows for a single employee if several rows share the same latest date_time. To handle that situation, you might need a CASE statement and another attribute to decide which row should be picked.
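To see the tie problem and one possible fix, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption). Row 6 is hypothetical and ties with row 4 on date_time. Instead of a CASE statement, this sketch uses log_id as the extra attribute: an OR condition in the join treats a row with the same date_time and a higher log_id as later.

```python
import sqlite3

ROWS = [
    (1, 1, "2015/02/03", "action1"),
    (2, 2, "2015/02/01", "action1"),
    (3, 2, "2017/01/02", "action2"),
    (4, 3, "2016/02/12", "action1"),
    (5, 1, "2016/10/12", "action2"),
    (6, 3, "2016/02/12", "action3"),  # hypothetical: ties with log_id 4
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo (log_id INTEGER PRIMARY KEY, employee_id INTEGER, "
    "date_time TEXT, action_type TEXT)"
)
conn.executemany("INSERT INTO demo VALUES (?, ?, ?, ?)", ROWS)


def latest_rows(conn, break_ties):
    """Keep row a when no row b of the same employee is later (b IS NULL).
    With break_ties, an equal date_time and a higher log_id also counts as later."""
    later = "a.date_time < b.date_time"
    if break_ties:
        later = f"({later} OR (a.date_time = b.date_time AND a.log_id < b.log_id))"
    # Only fixed strings are formatted into the SQL here, never user input.
    sql = f"""
        SELECT a.log_id
        FROM demo a
        LEFT JOIN demo b
          ON a.employee_id = b.employee_id AND {later}
        WHERE b.employee_id IS NULL
        ORDER BY a.log_id
    """
    return [row[0] for row in conn.execute(sql)]


print(latest_rows(conn, break_ties=False))  # [3, 4, 5, 6]: employee 3 appears twice
print(latest_rows(conn, break_ties=True))   # [3, 5, 6]
```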

Multi join one to many

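Trades
+----+------------+
| id | Trade_name |
+----+------------+
| 1 | trade1 |
| 2 | trade2 |
| 3 | trade3 |
| 4 | trade4 |
+----+------------+
Users
+------+-----------------------------+-------------------------------+
| Name | Primary_id (FK to trade.id) | secondary_id (FK to trade.id) |
+------+-----------------------------+-------------------------------+
| John | 1 | 2 |
| Alex | 3 | 4 |
+------+-----------------------------+-------------------------------+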
This is my current SQL, which joins trades (aliased t1 and t2) to the primary and secondary trade IDs:
select
`users`.`name` ,
`t1`.`trade_name` AS `Primary_trade`,
`t2`.`trade_name` AS `Secondary_trade`
FROM `users`
right JOIN `trades` `t1` On (`t1`.`trade_id` = `users`.`primary_trade_id`)
right JOIN `trades` `t2` on (`t2`.`trade_id` = `users`.`secondary_trade_id`)
My question is: how do I identify which trades are not used by any user, either as primary or as secondary? I want to see the records for trades that appear in neither the primary nor the secondary column, so I can perform housekeeping.
Thanking you all in advance for your help.
If you need only the trades rows
SELECT t.*
FROM trades t
WHERE NOT EXISTS ( SELECT 'u'
FROM Users u
WHERE u.Primary_id = t.id
OR u.Secondary_id = t.id
)
I think this should work for you:
SELECT * FROM trades WHERE id NOT IN (SELECT Primary_id FROM Users) AND id NOT IN (SELECT Secondary_id FROM Users)
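It selects the trades whose id appears in neither Primary_id nor Secondary_id. Be careful if either column can contain NULL (for example, a user with no secondary trade): id NOT IN (...) can then never be true, so the query returns no rows, whereas the NOT EXISTS version is unaffected.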
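The NULL pitfall is easy to reproduce. This minimal sketch uses Python's sqlite3 module as a stand-in for MySQL (an assumption; both follow standard SQL three-valued logic here). John and Alex come from the question. trade5 and a user Sam with no secondary trade are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, trade_name TEXT)")
conn.execute("CREATE TABLE users (name TEXT, primary_id INTEGER, secondary_id INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [(1, "trade1"), (2, "trade2"), (3, "trade3"), (4, "trade4"), (5, "trade5")],
)
# John and Alex are from the question; trade5 and Sam (no secondary trade) are hypothetical.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("John", 1, 2), ("Alex", 3, 4), ("Sam", 1, None)],
)

NOT_EXISTS = """
    SELECT t.id FROM trades t
    WHERE NOT EXISTS (SELECT 1 FROM users u
                      WHERE u.primary_id = t.id OR u.secondary_id = t.id)
    ORDER BY t.id
"""

NOT_IN = """
    SELECT id FROM trades
    WHERE id NOT IN (SELECT primary_id FROM users)
      AND id NOT IN (SELECT secondary_id FROM users)
    ORDER BY id
"""


def unused_trade_ids(sql):
    return [row[0] for row in conn.execute(sql)]


print(unused_trade_ids(NOT_EXISTS))  # [5]
print(unused_trade_ids(NOT_IN))      # []: Sam's NULL secondary_id makes NOT IN never true
```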

mysql subquery not producing all results

I have two tables: contacts and client_profiles. A contact has many client_profiles, where client_profiles has foreign key contact_id:
contacts:
mysql> SELECT id,first_name, last_name FROM contacts;
+----+-------------+-----------+
| id | first_name | last_name |
+----+-------------+-----------+
| 10 | THERESA | CAMPBELL |
| 11 | donato | vig |
| 12 | fdgfdgf | gfdgfd |
| 13 | some random | contact |
+----+-------------+-----------+
4 rows in set (0.00 sec)
client_profiles:
mysql> SELECT id, contact_id, created_at FROM client_profiles;
+----+------------+---------------------+
| id | contact_id | created_at |
+----+------------+---------------------+
| 6 | 10 | 2014-10-09 17:17:43 |
| 7 | 10 | 2014-10-10 11:38:01 |
| 8 | 10 | 2014-10-10 12:20:41 |
| 9 | 10 | 2014-10-10 12:24:19 |
| 11 | 12 | 2014-10-10 12:35:32 |
+----+------------+---------------------+
I want to get the latest client_profile for each contact, which means there should be two results. I want to use subqueries to achieve this. This is the subquery I came up with:
SELECT `client_profiles`.*
FROM `client_profiles`
INNER JOIN `contacts`
ON `contacts`.`id` = `client_profiles`.`contact_id`
WHERE (client_profiles.id =
(SELECT `client_profiles`.`id` FROM `client_profiles` ORDER BY created_at desc LIMIT 1))
However, this is only returning one result. It should return client_profiles with id 9 and 11.
What is wrong with my subquery?
It looks like you were trying to filter twice on the client_profiles table, once in the JOIN/ON clause and again in the WHERE clause.
Moving everything into a derived table in the FROM clause looks like this:
SELECT `cp`.*
FROM `contacts`
JOIN (
SELECT
`client_profiles`.`id`,
`client_profiles`.`contact_id`,
`client_profiles`.`created_at`
FROM `client_profiles`
ORDER BY created_at DESC
LIMIT 1
) cp ON `contacts`.`id` = `cp`.`contact_id`
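Tell me what you think. Note, though, that LIMIT 1 here applies to the whole client_profiles table rather than to each contact, so this derived table still holds only the single newest profile; one row per contact needs a per-contact MAX(created_at).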
It should be something like this:
SELECT *
FROM `client_profiles`
INNER JOIN `contacts`
ON `contacts`.`id` = `client_profiles`.`contact_id`
GROUP BY `client_profiles`.`contact_id`
ORDER BY created_at desc;
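http://sqlfiddle.com/#!2/a3f21b/9
Be aware that with SELECT * and GROUP BY contact_id, the non-grouped columns come from an indeterminate row in each group, and MySQL rejects the query when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7). The ORDER BY is applied after grouping, so it does not guarantee that the latest profile is the one returned.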
You need to prequery the client_profiles table grouped by contact, using MAX(created_at) to find each contact's latest date. From that, join to contacts to get the person, then join again to client_profiles on the same contact ID, also matching the max date from the prequery:
SELECT
c.id,
c.first_name,
c.last_name,
IDByMaxDate.maxCreate,
cp.id as clientProfileID
from
( select contact_id,
MAX( created_at ) maxCreate
from
client_profiles
group by
contact_id ) IDByMaxDate
JOIN contacts c
ON IDByMaxDate.contact_id = c.id
JOIN client_profiles cp
ON IDByMaxDate.contact_id = cp.contact_id
AND IDByMaxDate.maxCreate = cp.created_at
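This minimal sketch runs both the question's query and the prequery on the question's data. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption). The comparison shows what was wrong: the original subquery is not correlated with each contact, so it picks the single newest profile in the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)")
conn.execute("CREATE TABLE client_profiles (id INTEGER PRIMARY KEY, contact_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [(10, "THERESA", "CAMPBELL"), (11, "donato", "vig"),
     (12, "fdgfdgf", "gfdgfd"), (13, "some random", "contact")],
)
conn.executemany(
    "INSERT INTO client_profiles VALUES (?, ?, ?)",
    [
        (6, 10, "2014-10-09 17:17:43"),
        (7, 10, "2014-10-10 11:38:01"),
        (8, 10, "2014-10-10 12:20:41"),
        (9, 10, "2014-10-10 12:24:19"),
        (11, 12, "2014-10-10 12:35:32"),
    ],
)

# The question's query: the uncorrelated subquery yields one id for the whole table.
UNCORRELATED = """
    SELECT client_profiles.id
    FROM client_profiles
    INNER JOIN contacts ON contacts.id = client_profiles.contact_id
    WHERE client_profiles.id =
        (SELECT id FROM client_profiles ORDER BY created_at DESC LIMIT 1)
"""

# The prequery: MAX(created_at) per contact, then join back to both tables.
PREQUERY = """
    SELECT c.id, cp.id AS clientProfileID
    FROM (SELECT contact_id, MAX(created_at) AS maxCreate
          FROM client_profiles GROUP BY contact_id) IDByMaxDate
    JOIN contacts c ON IDByMaxDate.contact_id = c.id
    JOIN client_profiles cp
      ON IDByMaxDate.contact_id = cp.contact_id
     AND IDByMaxDate.maxCreate = cp.created_at
    ORDER BY c.id
"""

print(conn.execute(UNCORRELATED).fetchall())  # [(11,)]
print(conn.execute(PREQUERY).fetchall())      # [(10, 9), (12, 11)]
```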

Mysql to select rows group by with order by another column

I am trying to select rows from a table using GROUP BY while ignoring the first row of each group when sorted by date. The sorting should be done on a date field, so that the newest entry is ignored and the older ones in the group are returned.
The table looks like
+----+------------+------------+-----------+
| id | updated_on | group_name | list_name |
+----+------------+------------+-----------+
| 1 | 2013-04-03 | g1 | l1 |
| 2 | 2013-03-21 | g2 | l1 |
| 3 | 2013-02-26 | g2 | l1 |
| 4 | 2013-02-21 | g1 | l1 |
| 5 | 2013-02-20 | g1 | l1 |
| 6 | 2013-01-09 | g2 | l2 |
| 7 | 2013-01-10 | g2 | l2 |
| 8 | 2012-12-11 | g1 | l1 |
+----+------------+------------+-----------+
http://www.sqlfiddle.com/#!2/cec99/1
So, basically, I just want to return ids 3, 4, 5, 6 and 8, as those are the older entries for each group_name and list_name combination: group by group_name and list_name, ignore the latest entry in each group, and return the older ones.
I am not able to write the SQL for this problem. I know ORDER BY will not work with GROUP BY. Please help me figure out a solution.
Thanks
And also, is there a way to do this without using subqueries?
Something like the following returns only the rows whose date is older than the newest date in their group:
select a.ID, a.updated_on, a.group_name, list_name
from data a
where
a.updated_on <
(
select max(updated_on)
from data
group by group_name having group_name = a.group_name
);
SQL Fiddle: http://www.sqlfiddle.com/#!2/00d43/10
Update (based on your requirements)
select a.ID, a.updated_on, a.group_name, list_name
from data a
where
a.updated_on <
(
select max(updated_on)
from data
group by group_name, list_name having group_name = a.group_name
and list_name = a.list_name
);
See: http://www.sqlfiddle.com/#!2/cec99/3
Update (using a simple subquery instead of a correlated subquery)
I decided the correlated subquery is too slow, based on: Subqueries vs joins
So I changed it to join against an aliased derived table built from the nested query.
select a.ID, a.updated_on, a.group_name, a.list_name
from data a,
(
select group_name, list_name , max(updated_on) as MAX_DATE
from data
group by group_name, list_name
) as MAXDATE
where
a.list_name = MAXDATE.list_name AND
a.group_name = MAXDATE.group_name AND
a.updated_on < MAXDATE.MAX_DATE
;
SQL Fiddle: http://www.sqlfiddle.com/#!2/5df64/8
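This minimal sketch checks the derived-table version against the question's eight rows, using Python's sqlite3 module as a stand-in for MySQL (an assumption). The column "updated on" is named updated_on, as in the queries. ISO dates stored as text compare correctly as strings.

```python
import sqlite3

# The eight rows from the question.
ROWS = [
    (1, "2013-04-03", "g1", "l1"),
    (2, "2013-03-21", "g2", "l1"),
    (3, "2013-02-26", "g2", "l1"),
    (4, "2013-02-21", "g1", "l1"),
    (5, "2013-02-20", "g1", "l1"),
    (6, "2013-01-09", "g2", "l2"),
    (7, "2013-01-10", "g2", "l2"),
    (8, "2012-12-11", "g1", "l1"),
]


def build_data_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (ID INTEGER PRIMARY KEY, updated_on TEXT, "
        "group_name TEXT, list_name TEXT)"
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", ROWS)
    return conn


def older_than_group_max(conn):
    """Keep rows dated before the newest date of their (group_name, list_name) group."""
    sql = """
        SELECT a.ID
        FROM data a,
             (SELECT group_name, list_name, MAX(updated_on) AS MAX_DATE
              FROM data
              GROUP BY group_name, list_name) AS MAXDATE
        WHERE a.list_name = MAXDATE.list_name
          AND a.group_name = MAXDATE.group_name
          AND a.updated_on < MAXDATE.MAX_DATE
        ORDER BY a.ID
    """
    return [row[0] for row in conn.execute(sql)]


print(older_than_group_max(build_data_db()))  # [3, 4, 5, 6, 8]
```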
You could try using the following query (yes, it has a nested join, but maybe it helps).
SELECT ID FROM
(select d1.ID FROM data d1 LEFT JOIN
data d2 ON (d1.group_name = d2.group_name AND d1.list_name=d2.list_name AND
d1.updated_on > d2.updated_on) WHERE d2.ID IS NULL) data_tmp;
CORRECTION:
SELECT DISTINCT(ID) FROM
(select d1.* FROM data d1 LEFT JOIN
data d2 ON (d1.group_name = d2.group_name AND d1.list_name=d2.list_name AND
d1.updated_on < d2.updated_on) WHERE d2.ID IS NOT NULL) date_tmp;
SELECT DISTINCT y.id
FROM data x
JOIN data y
ON y.group_name = x.group_name
AND y.list_name = x.list_name
AND y.updated_on < x.updated_on;
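This last query answers the "without subqueries" part: a row is older if some row in its group has a later date. The minimal sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption) on the question's rows. It also shows why DISTINCT is needed: a row older than several others is matched once per newer row.

```python
import sqlite3

ROWS = [  # the eight rows from the question, "updated on" named updated_on
    (1, "2013-04-03", "g1", "l1"), (2, "2013-03-21", "g2", "l1"),
    (3, "2013-02-26", "g2", "l1"), (4, "2013-02-21", "g1", "l1"),
    (5, "2013-02-20", "g1", "l1"), (6, "2013-01-09", "g2", "l2"),
    (7, "2013-01-10", "g2", "l2"), (8, "2012-12-11", "g1", "l1"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER PRIMARY KEY, updated_on TEXT, "
    "group_name TEXT, list_name TEXT)"
)
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", ROWS)


def older_rows_self_join(conn, distinct=True):
    """ids of rows that have a newer row (x) in the same group_name/list_name."""
    select = "SELECT DISTINCT y.id" if distinct else "SELECT y.id"
    sql = select + """
        FROM data x
        JOIN data y
          ON y.group_name = x.group_name
         AND y.list_name = x.list_name
         AND y.updated_on < x.updated_on
        ORDER BY y.id
    """
    return [row[0] for row in conn.execute(sql)]


print(older_rows_self_join(conn))                  # [3, 4, 5, 6, 8]
print(older_rows_self_join(conn, distinct=False))  # [3, 4, 5, 5, 6, 8, 8, 8]
```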