How to remove duplicate rows in two columns with group by? - mysql

I'm trying to remove the duplicate rows using this query, but MySQL doesn't return anything and crashes:
DELETE FROM project_category WHERE prc_id IN (SELECT prc_id
FROM project_category
GROUP BY prc_proid, prc_catid
HAVING COUNT(*) > 1)
I want to remove the duplicates:
+--------+-----------+-----------+
| prc_id | prc_proid | prc_catid |
+--------+-----------+-----------+
| 1691 | 207 | 16 |
| 1692 | 207 | 16 |
+--------+-----------+-----------+

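MySQL does not allow a subquery in the WHERE clause of a DELETE to select directly from the table being deleted from (error 1093). Wrap the subquery in a derived table, and aggregate prc_id so that each group yields one well-defined row to delete:
DELETE FROM project_category
WHERE prc_id IN (
    SELECT prc_id FROM (
        SELECT MAX(prc_id) AS prc_id
        FROM project_category
        GROUP BY prc_proid, prc_catid
        HAVING COUNT(*) > 1
    ) t
)
This removes one row per duplicated pair on each run, so run it again if a pair occurs more than twice.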
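A minimal sketch of a variant that keeps the lowest prc_id of every (prc_proid, prc_catid) pair and deletes all the others in one pass. It runs against an in-memory SQLite database through Python's sqlite3 module purely so that it is self-contained; SQLite standing in for MySQL is an assumption, and rows 1690 and 1693 are hypothetical additions to the sample data. The derived-table wrapper is kept because MySQL needs it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project_category (
    prc_id    INTEGER PRIMARY KEY,
    prc_proid INTEGER,
    prc_catid INTEGER
);
-- 1691 and 1692 come from the question; 1690 and 1693 are hypothetical.
INSERT INTO project_category VALUES
    (1690, 206, 16),
    (1691, 207, 16),
    (1692, 207, 16),
    (1693, 207, 16);
""")

# Keep MIN(prc_id) per (prc_proid, prc_catid) and delete every other row.
# The inner derived table "keep" is the wrapper that MySQL requires.
conn.execute("""
DELETE FROM project_category
WHERE prc_id NOT IN (
    SELECT prc_id FROM (
        SELECT MIN(prc_id) AS prc_id
        FROM project_category
        GROUP BY prc_proid, prc_catid
    ) AS keep
)
""")

remaining = conn.execute(
    "SELECT prc_id, prc_proid, prc_catid FROM project_category ORDER BY prc_id"
).fetchall()
print(remaining)
```

Once the duplicates are gone, a unique index on (prc_proid, prc_catid) would stop new ones from being inserted.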

Related

Formatting RIGHT JOIN query with multiple joins

I have SQL query which works:
SELECT table1.bike_id
FROM
(
SELECT bike_id
FROM `bike_filters`
WHERE (`bike_category_id` in (416,11111))
) as table1
RIGHT JOIN (
SELECT bike_id
FROM `bike_filters`
WHERE (`bike_category_id` in (5555,779))
) as table2 ON table1.bike_id = table2.bike_id
GROUP BY bike_id
But I need to add more RIGHT JOINs, maybe 5 or more. How do I form the query correctly? I'm searching in the same table, but joining several sets of records in one query to get the bike_id that fits all conditions.
The purpose of this query is to get the bike_ids that match all the parameters in the query: a bike can have 20 filters, but if the user searches by 5 and the bike matches them, this query should return its bike_id.
Table Structure:
| id | bike_id | bike_category_id |
| 1 | 3 | 416 |
| 2 | 3 | 779 |
| 3 | 3 | 344 |
| 4 | 3 | 332 |
| 5 | 4 | 444 |
| 5 | 5 | 555 |
I need something like this, but this one is incorrect:
SELECT table1.bike_id
FROM
(
SELECT bike_id
FROM `bike_filters`
WHERE (`bike_category_id` IN (416,11111))
) AS table1
RIGHT JOIN (
SELECT bike_id
FROM `bike_filters`
WHERE (`bike_category_id` IN (5555,779))
) AS table2
RIGHT JOIN (
SELECT bike_id
FROM `bike_filters`
WHERE (`bike_category_id` IN (5555,344))
) AS table3
RIGHT JOIN (
SELECT bike_id
FROM `bike_filters`
WHERE (`bike_category_id` IN (5555,332))
) AS table4
GROUP BY bike_id
You can use aggregation, and put all the conditions in the HAVING clause, as follows:
SELECT bike_id
FROM bike_filters
GROUP BY bike_id
HAVING
MAX(bike_category_id in (416,11111)) = 1
AND MAX(bike_category_id in (5555,779)) = 1
This will return all bike_ids that:
have category 416 or 11111
and have category 5555 or 779
You can extend the HAVING clause as per your requirements.
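To illustrate extending the HAVING clause to any number of filter groups, here is a rough sketch that builds one MAX(... IN (...)) = 1 condition per group. It uses Python's sqlite3 with an in-memory database as a stand-in for MySQL (an assumption), and the filter_groups list is a hypothetical search made up from the category ids in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bike_filters (bike_id INTEGER, bike_category_id INTEGER);
-- bike_id / bike_category_id pairs from the table in the question
INSERT INTO bike_filters VALUES
    (3, 416), (3, 779), (3, 344), (3, 332), (4, 444), (5, 555);
""")

# Hypothetical search: a bike must match at least one id from every group.
filter_groups = [(416, 11111), (5555, 779), (5555, 344), (5555, 332)]

# One "MAX(bike_category_id IN (?, ?)) = 1" condition per group.
having = " AND ".join(
    "MAX(bike_category_id IN ({})) = 1".format(", ".join("?" for _ in group))
    for group in filter_groups
)
params = [category for group in filter_groups for category in group]

sql = (
    "SELECT bike_id FROM bike_filters "
    "GROUP BY bike_id HAVING " + having + " ORDER BY bike_id"
)
matching = [row[0] for row in conn.execute(sql, params)]
print(matching)
```

Only the placeholders are generated dynamically; the category ids themselves are passed as bound parameters.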

mysql subquery not producing all results

I have two tables: contacts and client_profiles. A contact has many client_profiles, where client_profiles has foreign key contact_id:
contacts:
mysql> SELECT id,first_name, last_name FROM contacts;
+----+-------------+-----------+
| id | first_name | last_name |
+----+-------------+-----------+
| 10 | THERESA | CAMPBELL |
| 11 | donato | vig |
| 12 | fdgfdgf | gfdgfd |
| 13 | some random | contact |
+----+-------------+-----------+
4 rows in set (0.00 sec)
client_profiles:
mysql> SELECT id, contact_id, created_at FROM client_profiles;
+----+------------+---------------------+
| id | contact_id | created_at |
+----+------------+---------------------+
| 6 | 10 | 2014-10-09 17:17:43 |
| 7 | 10 | 2014-10-10 11:38:01 |
| 8 | 10 | 2014-10-10 12:20:41 |
| 9 | 10 | 2014-10-10 12:24:19 |
| 11 | 12 | 2014-10-10 12:35:32 |
+----+------------+---------------------+
I want to get the latest client_profiles for each contact. That means there should be two results. I want to use subqueries to achieve this. This is the subquery I came up with:
SELECT `client_profiles`.*
FROM `client_profiles`
INNER JOIN `contacts`
ON `contacts`.`id` = `client_profiles`.`contact_id`
WHERE (client_profiles.id =
(SELECT `client_profiles`.`id` FROM `client_profiles` ORDER BY created_at desc LIMIT 1))
However, this is only returning one result. It should return client_profiles with id 9 and 11.
What is wrong with my subquery?
It looks like you were trying to filter twice on the client_profiles table, once in the JOIN/ON clause and again in the WHERE clause.
Moving everything into the JOIN looks like this:
SELECT `cp`.*
FROM `contacts`
JOIN (
SELECT
`client_profiles`.`id`,
`client_profiles`.`contact_id`,
`client_profiles`.`created_at`
FROM `client_profiles`
ORDER BY created_at DESC
LIMIT 1
) cp ON `contacts`.`id` = `cp`.`contact_id`
Tell me what you think.
Should be something like maybe:
SELECT *
FROM `client_profiles`
INNER JOIN `contacts`
ON `contacts`.`id` = `client_profiles`.`contact_id`
GROUP BY `client_profiles`.`contact_id`
ORDER BY created_at desc;
http://sqlfiddle.com/#!2/a3f21b/9
You need to prequery the client_profiles table, grouped by contact, to get each contact's latest date using MAX( created_at ). From that, join to contacts to get the person, then join again to client_profiles on the same contact ID while also matching the max date from the internal prequery:
SELECT
c.id,
c.first_name,
c.last_name,
IDByMaxDate.maxCreate,
cp.id as clientProfileID
from
( select contact_id,
MAX( created_at ) maxCreate
from
client_profiles
group by
contact_id ) IDByMaxDate
JOIN contacts c
ON IDByMaxDate.contact_id = c.id
JOIN client_profiles cp
ON IDByMaxDate.contact_id = cp.contact_id
AND IDByMaxDate.maxCreate = cp.created_at
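The prequery approach can be tried against the sample data from the question. The sketch below uses Python's sqlite3 with an in-memory database in place of MySQL (an assumption) and stores created_at as ISO-formatted text, which sorts correctly under MAX(); the alias names newest and max_created are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE client_profiles (
    id INTEGER PRIMARY KEY, contact_id INTEGER, created_at TEXT
);
INSERT INTO contacts VALUES
    (10, 'THERESA', 'CAMPBELL'), (11, 'donato', 'vig'),
    (12, 'fdgfdgf', 'gfdgfd'), (13, 'some random', 'contact');
INSERT INTO client_profiles VALUES
    (6, 10, '2014-10-09 17:17:43'), (7, 10, '2014-10-10 11:38:01'),
    (8, 10, '2014-10-10 12:20:41'), (9, 10, '2014-10-10 12:24:19'),
    (11, 12, '2014-10-10 12:35:32');
""")

# Step 1 (derived table): latest created_at per contact.
# Step 2: join back to client_profiles on contact and that timestamp.
latest = conn.execute("""
SELECT cp.id, cp.contact_id, cp.created_at
FROM (
    SELECT contact_id, MAX(created_at) AS max_created
    FROM client_profiles
    GROUP BY contact_id
) AS newest
JOIN contacts c ON c.id = newest.contact_id
JOIN client_profiles cp
  ON cp.contact_id = newest.contact_id
 AND cp.created_at = newest.max_created
ORDER BY cp.id
""").fetchall()
print(latest)
```

If two profiles of the same contact share the exact same created_at, both are returned, so a tie-breaker on id would be needed in that case.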

How to expand and loop through a group of

The following SQL command lists the hash values which can be found on multiple objects.
SELECT * FROM (
SELECT
MIN(id) AS id,
hash,
status,
count(*) AS count
FROM foobar
GROUP BY hash
ORDER BY count
) AS t
WHERE count > 1;
...
+------+----------------------------------+--------+-------+
| id | hash | status | count |
+------+----------------------------------+--------+-------+
| 4523 | e4266978b1d99dffbf3a6e0b880a2c5e | 0 | 3 |
| 828 | 9414c7478416b7a40846d66e12df9370 | 0 | 4 |
| 293 | bfc742499fd97c4c8e36f57cdd0fa0e0 | 0 | 5 |
| 244 | ec408e4678789f7983f83a9c330ab8e4 | 0 | 14 |
+------+----------------------------------+--------+-------+
4 rows in set (0.02 sec)
I want to update the status of each item within the groups. For a single group this can be done as follows:
UPDATE foobar SET status = 9 WHERE hash = "e4266978b1d99dffbf3a6e0b880a2c5e";
How can I update the status of each of the 26 individual items? The solution should run on MySQL at least.
You can use
UPDATE foobar
SET status = 9
WHERE hash IN (
SELECT hash FROM (
SELECT
hash,
count(*) AS count
FROM foobar
GROUP BY hash
) AS t
WHERE count > 1
)
It will work in MySQL because MySQL materializes the derived table t (the SELECT with the GROUP BY) into a temporary table before it runs the update.
UPDATE foobar t1
LEFT OUTER JOIN foobar t2 ON t1.id <> t2.id AND t1.hash = t2.hash
SET t1.status = 9
WHERE t2.id IS NOT NULL
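Find all the hashes that are duplicated and then use that as the criteria for your update query. In MySQL the subquery has to be wrapped in a derived table, because an UPDATE cannot select directly from its own target table:
UPDATE foobar
SET status = 9
WHERE hash IN (SELECT hash FROM (SELECT hash
FROM foobar
GROUP BY hash
HAVING COUNT(*) > 1) AS dup)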
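A small, self-contained sketch of the "update every row whose hash is duplicated" idea, run through Python's sqlite3 on an in-memory database rather than MySQL (an assumption). The short hash values are hypothetical stand-ins for the real 32-character hashes, and the derived-table alias dup is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foobar (id INTEGER PRIMARY KEY, hash TEXT, status INTEGER);
-- Hypothetical data: 'aaa' occurs 3 times, 'bbb' once, 'ccc' twice.
INSERT INTO foobar VALUES
    (1, 'aaa', 0), (2, 'aaa', 0), (3, 'aaa', 0),
    (4, 'bbb', 0),
    (5, 'ccc', 0), (6, 'ccc', 0);
""")

# Set status = 9 on every row whose hash appears more than once.
# The derived table "dup" is the wrapper MySQL needs when the subquery
# reads from the table being updated.
conn.execute("""
UPDATE foobar
SET status = 9
WHERE hash IN (
    SELECT hash FROM (
        SELECT hash
        FROM foobar
        GROUP BY hash
        HAVING COUNT(*) > 1
    ) AS dup
)
""")

statuses = conn.execute("SELECT id, status FROM foobar ORDER BY id").fetchall()
print(statuses)
```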

Select the fields that are duplicated in MySQL

Assuming that I have the below customer_offer table.
My question is:
How to select all the rows where the key(s) are duplicated in that table?
+---------+-------------+------------+----------+--------+---------------------+
| link_id | customer_id | partner_id | offer_id | key | date_updated |
+---------+-------------+------------+----------+--------+---------------------+
| 1 | 99 | 11 | 14 | mmmmmq | 2011-09-21 12:40:46 |
| 2 | 100 | 11 | 14 | qmmmmq | 2011-09-21 12:40:46 |
| 3 | 101 | 11 | 14 | 8mmmmq | 2011-09-21 12:40:46 |
| 4 | 99 | 11 | 14 | Dmmmmq | 2011-09-21 12:59:28 |
| 5 | 100 | 11 | 14 | Nmmmmq | 2011-09-21 12:59:28 |
+---------+-------------+------------+----------+--------+---------------------+
UPDATE:
Thanks so much for all your answers. Many of them are good. Now I have a solution:
select *
from customer_offer
where `key` in
(select `key` from customer_offer group by `key` having count(*) > 1)
Update:
As mentioned by @Scorpi0, with a big table it is better to use a join. And from MySQL 6.0 the new optimizer will convert this kind of subquery into a join.
Self join:
SELECT * FROM customer_offer c1 inner join customer_offer c2
on c1.`key` = c2.`key` and c1.link_id <> c2.link_id
or group by the field, then keep the groups where the count is > 1:
SELECT `key`, COUNT(*) FROM customer_offer
group by `key`
having COUNT(*) > 1
SELECT DISTINCT c1.*
FROM customer_offer c1
INNER JOIN customer_offer c2
ON c1.`key` = c2.`key`
AND c1.link_id != c2.link_id
Assuming link_id is a primary key.
Use a sub-query to do the count check, and the main query to select the rows. The count check query is simply:
SELECT `key` FROM `customer_offer` GROUP BY `key` HAVING COUNT(`key`) > 1
Then the outer query will use this by joining into it:
SELECT customer_offer.* FROM customer_offer
INNER JOIN (SELECT `key` FROM `customer_offer` GROUP BY `key` HAVING COUNT(`key`) > 1) AS count_check
ON customer_offer.`key` = count_check.`key`
There are many threads on the MySQL website that explain how to do this. This link explains how to do it using MySQL: http://forums.mysql.com/read.php?10,180556,180567#msg-180567
As a brief example, the code below is from the link, with a slight modification to better suit your example.
SELECT *
FROM tbl
GROUP BY `key`
HAVING COUNT(`key`)>1;
You can also use a join, which is my preferred method, as this removes the slower count method:
SELECT *
FROM this_table t
inner join this_table t1 on t.`key` = t1.`key`
SELECT link_id, `key`, count(`key`) as Occurrences
FROM `table`
GROUP BY `key`
HAVING COUNT(`key`)>1;
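As a rough check of the join-against-a-grouped-subquery approach, the sketch below returns every row whose key occurs more than once. It runs on SQLite through Python's sqlite3 instead of MySQL (an assumption; SQLite also accepts backtick-quoted identifiers). All keys in the sample table from the question are distinct, so the key of link_id 4 is changed to 'mmmmmq' here as a hypothetical duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_offer (
    link_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    partner_id INTEGER,
    offer_id INTEGER,
    `key` TEXT,
    date_updated TEXT
);
-- Sample rows from the question; the key of link_id 4 is altered
-- (hypothetically) so that one key is actually duplicated.
INSERT INTO customer_offer VALUES
    (1,  99, 11, 14, 'mmmmmq', '2011-09-21 12:40:46'),
    (2, 100, 11, 14, 'qmmmmq', '2011-09-21 12:40:46'),
    (3, 101, 11, 14, '8mmmmq', '2011-09-21 12:40:46'),
    (4,  99, 11, 14, 'mmmmmq', '2011-09-21 12:59:28'),
    (5, 100, 11, 14, 'Nmmmmq', '2011-09-21 12:59:28');
""")

# The derived table lists the duplicated keys once each; joining back on
# the key returns every row that carries one of them.
duplicated_rows = conn.execute("""
SELECT c.link_id, c.`key`
FROM customer_offer c
JOIN (
    SELECT `key`
    FROM customer_offer
    GROUP BY `key`
    HAVING COUNT(*) > 1
) AS dup ON dup.`key` = c.`key`
ORDER BY c.link_id
""").fetchall()
print(duplicated_rows)
```

Because the subquery returns each duplicated key only once, the join cannot multiply rows, so no DISTINCT is needed.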

How to improve this query?

I have a table,
CREATE TABLE `PAGELETS` (
`page_key` int(32) unsigned NOT NULL,
`pagelet_serial` int(32) unsigned NOT NULL,
`pagelet_shingle` int(32) unsigned NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8
I would like to:
1) Find all the pagelet_shingles that occur more than once (quantity > 1)
2) Out of these, only output those that have different page_keys
This is the query that produces a semi-correct answer:
SELECT * FROM PAGELETS WHERE pagelet_shingle IN( SELECT pagelet_shingle FROM PAGELETS GROUP BY pagelet_shingle HAVING COUNT(DISTINCT page_key) > 1) ORDER BY pagelet_shingle;
Unfortunately, on a small dataset it takes about 18 seconds;
I have another query,
SELECT dt1.* FROM
(SELECT * FROM PAGELETS
GROUP BY page_key, pagelet_shingle HAVING COUNT(*) = 1)
dt1 JOIN
(SELECT * FROM PAGELETS GROUP BY pagelet_shingle HAVING COUNT(*) > 1)
dt2 USING (pagelet_shingle) ORDER BY pagelet_shingle
given by an expert, which is not technically correct (something to do with not being able to SELECT * with GROUP BY) but produces results A LOT faster. Take the case where
SELECT * FROM PAGELETS WHERE pagelet_shingle=57
+----------+----------------+-----------------+
| page_key | pagelet_serial | pagelet_shingle |
+----------+----------------+-----------------+
| 1 | 99 | 57 |
| 1 | 99 | 57 |
| 2 | 228 | 57 |
| 2 | 228 | 57 |
+----------+----------------+-----------------+
The semi-correct query produces
+----------+----------------+-----------------+
| page_key | pagelet_serial | pagelet_shingle |
+----------+----------------+-----------------+
| 1 | 99 | 57 |
| 1 | 99 | 57 |
| 2 | 228 | 57 |
| 2 | 228 | 57 |
+----------+----------------+-----------------+
while the incorrect query doesn't have pagelet_shingle = 57 in its result set.
My desired result is to have
+----------+----------------+-----------------+
| page_key | pagelet_serial | pagelet_shingle |
+----------+----------------+-----------------+
| 1 | 99 | 57 |
| 2 | 228 | 57 |
+----------+----------------+-----------------+
Each occurring only once.
A pagelet_shingle occurring twice in the same pagelet_serial will be omitted.
So I would like to ask the following:
1) Is there a way to speed up the semi-correct query to reach the speed of the incorrect one,
2) or is there a way to fix the incorrect one to produce the result of the correct one (I don't care about strictness)?
Sounds like SELECT DISTINCT p.* ... would be your choice.
P.S. And I would really recommend the second one! make everything slow (like you just noticed) and should only be used where necessary.
Doesn't this query solve your issue?
SELECT dt1.* FROM
(SELECT DISTINCT * FROM PAGELETS
GROUP BY page_key, pagelet_shingle HAVING COUNT(*) = 1)
dt1 JOIN
(SELECT * FROM PAGELETS GROUP BY pagelet_shingle HAVING COUNT(*) > 1)
dt2 USING (pagelet_shingle) GROUP BY pagelet_shingle
Use GROUP BY and HAVING, e.g.
SELECT *
FROM `pagelets`
GROUP BY `pagelet_shingle`
HAVING COUNT(*) > 1
Additionally, you can do a self join to output all columns, though in MySQL it should work this way (which differs from the SQL standard).
What is
SELECT * FROM PAGELETS GROUP BY pagelet_serial, pagelet_shingle HAVING COUNT(*) > 0
giving you?
Judging from what I read, what you are looking for is:
SELECT DISTINCT p1.page_key, p1.pagelet_serial, p1.pagelet_shingle
FROM PAGELETS p1
JOIN PAGELETS p2 ON p2.page_key = p1.page_key
AND p2.pagelet_serial = p1.pagelet_serial
AND p2.pagelet_shingle <> p1.pagelet_shingle
That query would make full use of an index on (page_key, pagelet_serial) and should complete in tenths of a second, not seconds.
If this was not what you were looking for, please show us what result you would expect if the values in your table were those: (1,2,3),(1,2,3),(1,1,3),(1,1,3),(1,2,4),(1,2,4),(1,1,4),(1,1,4)
Have you tried using exists instead of in ?
Check this out:
http://decipherinfosys.wordpress.com/2007/01/30/in-vs-exists/
Hope this helps
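One way to combine the EXISTS and DISTINCT suggestions is sketched below: keep each distinct row whose pagelet_shingle also appears under a different page_key. It runs on SQLite through Python's sqlite3 rather than MySQL (an assumption), the rows for shingle 60 are hypothetical, and the index name idx_shingle_page is made up. How much the index helps on MySQL is not measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PAGELETS (
    page_key        INTEGER NOT NULL,
    pagelet_serial  INTEGER NOT NULL,
    pagelet_shingle INTEGER NOT NULL
);
-- Hypothetical index so the EXISTS lookup can search by shingle.
CREATE INDEX idx_shingle_page ON PAGELETS (pagelet_shingle, page_key);
-- Shingle 57 rows are from the question; shingle 60 is hypothetical
-- and occurs on one page only, so it must not appear in the result.
INSERT INTO PAGELETS VALUES
    (1, 99, 57), (1, 99, 57), (2, 228, 57), (2, 228, 57),
    (3, 5, 60), (3, 5, 60);
""")

# DISTINCT collapses the repeated rows; EXISTS keeps only shingles that
# are also present under a different page_key.
shared = conn.execute("""
SELECT DISTINCT p1.page_key, p1.pagelet_serial, p1.pagelet_shingle
FROM PAGELETS p1
WHERE EXISTS (
    SELECT 1
    FROM PAGELETS p2
    WHERE p2.pagelet_shingle = p1.pagelet_shingle
      AND p2.page_key <> p1.page_key
)
ORDER BY p1.pagelet_shingle, p1.page_key
""").fetchall()
print(shared)
```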