Delete duplicates with condition - MySQL

I have the table contacts, which contains duplicate records. Its columns are: id, name, is_contacted, created_at.
I need to delete the duplicates but keep the first record (among the duplicates for each name) where is_contacted=1.
If none of a name's duplicates has is_contacted=1, just keep the first one.
This is what I have so far:
DELETE c1 FROM contacts c1
INNER JOIN contacts c2
WHERE
c1.id > c2.id AND
c1.name = c2.name;

Assuming that is_contacted's data type is BOOLEAN, that id is the primary key of the table, and that id is the column that defines the order (i.e. which row should be considered first), use the ROW_NUMBER() window function (MySQL 8.0+) to rank the rows of each name:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY is_contacted DESC, id) rn
FROM contacts
)
DELETE t
FROM contacts t INNER JOIN cte c
ON c.id = t.id
WHERE c.rn > 1;
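ORDER BY is_contacted DESC, id puts the rows with is_contacted = 1 at the top (if they exist), and among rows with the same is_contacted value the lowest id comes first, so the row with rn = 1 is the one to keep.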
For versions of MySQL prior to 8.0, which do not support CTEs and window functions, join the table to a query that uses aggregation to get the id of the row you want to keep:
DELETE t
FROM contacts t
INNER JOIN (
SELECT name,
COALESCE(MIN(CASE WHEN is_contacted THEN id END), MIN(id)) id
FROM contacts
GROUP BY name
) c ON c.name = t.name AND c.id <> t.id;
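To check that both queries keep the same rows, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (it assumes SQLite 3.25 or newer for ROW_NUMBER()). The sample rows are made up, and because SQLite has no DELETE ... JOIN, the rows ranked below 1 are deleted by id through a subquery instead.

```python
import sqlite3

# SQLite stands in for MySQL 8 here; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (
    id INTEGER PRIMARY KEY,
    name TEXT,
    is_contacted INTEGER,  -- 0/1, standing in for MySQL's BOOLEAN
    created_at TEXT
);
INSERT INTO contacts VALUES
    (1, 'alice', 0, '2020-01-01'),
    (2, 'alice', 1, '2020-01-02'),
    (3, 'alice', 1, '2020-01-03'),
    (4, 'bob',   0, '2020-01-01'),
    (5, 'bob',   0, '2020-01-02');
""")

# Pre-8.0 approach: the id to keep per name, computed with aggregation only.
keep_by_aggregate = {row[0] for row in conn.execute("""
    SELECT COALESCE(MIN(CASE WHEN is_contacted THEN id END), MIN(id))
    FROM contacts
    GROUP BY name
""")}

# 8.0+ approach: rank rows per name (contacted first, then lowest id) and
# delete every row ranked below 1, selecting the ids to delete in a subquery.
conn.execute("""
    DELETE FROM contacts
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY name
                                      ORDER BY is_contacted DESC, id) AS rn
            FROM contacts
        ) ranked
        WHERE rn > 1
    )
""")

kept = conn.execute("SELECT id, name FROM contacts ORDER BY id").fetchall()
print(kept)  # [(2, 'alice'), (4, 'bob')]
```

The ranking logic carries over to MySQL 8 unchanged; only the DELETE syntax differs.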

The query below (MySQL 8.0+, which has window functions) returns only the records you want to keep.
You didn't mention what the primary key of your table is, so I don't know how to join this back 1:1 with your whole table.
But if you are not able to determine a primary key, then you can create a new table using this query, drop the original one, and rename the new one to the original name.
SELECT * FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY CASE WHEN is_contacted = 1 THEN 0 ELSE 1 END, created_at) AS RN_
from contacts
) c
WHERE c.RN_ = 1
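The create/drop/rename route for a table without a usable primary key can be sketched the same way, again with Python's sqlite3 module standing in for MySQL and hypothetical data; contacts_dedup is an illustrative name. CREATE TABLE ... AS SELECT copies rows but not keys or indexes, so those would need to be recreated afterwards. On MySQL, RENAME TABLE contacts_dedup TO contacts performs the same swap as the ALTER TABLE used here.

```python
import sqlite3

# Hypothetical contacts table with no primary key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (name TEXT, is_contacted INTEGER, created_at TEXT);
INSERT INTO contacts VALUES
    ('alice', 0, '2020-01-01'),
    ('alice', 1, '2020-01-03'),
    ('alice', 1, '2020-01-02'),
    ('bob',   0, '2020-01-05'),
    ('bob',   0, '2020-01-04');

-- 1. Copy only the first row per name (contacted rows first, then oldest).
CREATE TABLE contacts_dedup AS
SELECT name, is_contacted, created_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY CASE WHEN is_contacted = 1 THEN 0 ELSE 1 END,
                                       created_at) AS rn_
    FROM contacts
) c
WHERE c.rn_ = 1;

-- 2. Swap the deduplicated table in for the original.
DROP TABLE contacts;
ALTER TABLE contacts_dedup RENAME TO contacts;
""")

rows = conn.execute("SELECT * FROM contacts ORDER BY name").fetchall()
print(rows)  # [('alice', 1, '2020-01-02'), ('bob', 0, '2020-01-04')]
```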

Related

How to select only the last date row of a joined Table?

I am joining Table B into A. Table A has the basic information I want to retrieve and also the unique ID.
Table B has multiple rows for each ID, with another column containing dates. I only want to select the latest date from Table B and join it into A.
I found SQL's MAX() function, but then I get an error saying the other fields are not in the GROUP BY clause or in an aggregate function.
This is my (simplified) query:
SELECT
MAX("B"."ENDDATE") AS FINALEND,
"A"."ID",
"A"."COLOR",
"A"."MAKE",
"A"."WHEELS"
FROM "A"
JOIN "B" ON "A"."ID" = "B"."ID"
My expected result is one row per ID with the basic information from Table A and the latest date from all matching rows in Table B. Right now I get one row for every row in B.
Do I need to add every other column to the GROUP BY? Or what am I missing?
Thanks for any input :)
On MySQL 8+, we can use ROW_NUMBER here:
WITH cte AS (
SELECT a.*, b.ENDDATE,
ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY b.ENDDATE DESC) rn
FROM A a
INNER JOIN B b ON b.ID = a.ID
)
SELECT ID, COLOR, MAKE, WHEELS, ENDDATE AS FINALEND
FROM cte
WHERE rn = 1;
On earlier versions of MySQL, we can join to a subquery which finds the latest record for each ID in the B table:
SELECT a.ID, a.COLOR, a.MAKE, a.WHEELS, b1.ENDDATE AS FINALEND
FROM A a
INNER JOIN B b1 ON b1.ID = a.ID
INNER JOIN
(
SELECT ID, MAX(ENDDATE) AS MAXENDDATE
FROM B
GROUP BY ID
) b2
ON b2.ID = b1.ID AND b2.MAXENDDATE = b1.ENDDATE;
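A small sketch that runs both queries through Python's sqlite3 module (standing in for MySQL; window functions need SQLite 3.25+) on made-up rows and checks that they agree. Dates are stored as ISO-8601 text so that MAX() and ORDER BY compare them chronologically. If two B rows tie on the latest date, the join version returns both, while ROW_NUMBER() picks one of them arbitrarily.

```python
import sqlite3

# Hypothetical tables: A has one row per ID, B has many dated rows per ID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (ID INTEGER PRIMARY KEY, COLOR TEXT, MAKE TEXT, WHEELS INTEGER);
CREATE TABLE B (ID INTEGER, ENDDATE TEXT);
INSERT INTO A VALUES (1, 'red', 'Ford', 4), (2, 'blue', 'Fiat', 4);
INSERT INTO B VALUES (1, '2021-01-01'), (1, '2022-06-30'), (1, '2021-12-31'),
                     (2, '2020-05-05');
""")

# Pre-8.0 style: find MAX(ENDDATE) per ID, then join back to that row.
join_version = conn.execute("""
SELECT a.ID, a.COLOR, a.MAKE, a.WHEELS, b1.ENDDATE AS FINALEND
FROM A a
INNER JOIN B b1 ON b1.ID = a.ID
INNER JOIN (
    SELECT ID, MAX(ENDDATE) AS MAXENDDATE
    FROM B
    GROUP BY ID
) b2 ON b2.ID = b1.ID AND b2.MAXENDDATE = b1.ENDDATE
ORDER BY a.ID
""").fetchall()

# 8.0+ style: number each ID's rows by ENDDATE descending and keep rn = 1.
window_version = conn.execute("""
WITH cte AS (
    SELECT a.*, b.ENDDATE,
           ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY b.ENDDATE DESC) rn
    FROM A a
    INNER JOIN B b ON b.ID = a.ID
)
SELECT ID, COLOR, MAKE, WHEELS, ENDDATE AS FINALEND
FROM cte
WHERE rn = 1
ORDER BY ID
""").fetchall()

print(join_version)
# [(1, 'red', 'Ford', 4, '2022-06-30'), (2, 'blue', 'Fiat', 4, '2020-05-05')]
```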

MySQL where if condition

Basically, I have a table in which every customer has a default row with customer_group_id=0.
But some of these customers also belong to customer_group_id=1.
When that happens, a new row is created for the customer with customer_group_id=1, so I now have 2 rows for the same customer with different customer_group_id values.
When I fetch the data, I need to select the customer's row where customer_group_id=1, but if that doesn't exist, return the row with customer_group_id=0 (the default), and do this for every customer.
Does anyone know the fastest way to achieve this?
UPDATE: screenshot showing 2 rows with the same customer_id but different customer_group_id:
I need one or the other, not both, so the hierarchy is: if a row with customer_group_id=1 exists, return that row and ignore the other; otherwise return the default row with customer_group_id=0.
My full query:
SELECT `main_table`.*, `secondTable`.* FROM `customer` AS `main_table`
LEFT JOIN `customer_group` AS `secondTable` ON main_table.customer_id = secondTable.customer_id
WHERE (secondTable.customer_group_id = '1' )
AND (`secondTable`.`is_active` = '1')
If you have only two groups, then aggregation is simple:
select customer_id, max(customer_group_id)
from t
group by customer_id;
In MySQL 8+, you can implement a more general customer prioritization using row_number():
select t.*
from (select t.*,
row_number() over (partition by customer_id order by customer_group_id desc) as seqnum
from t
) t
where seqnum = 1;
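A minimal sketch of the row_number() version, using Python's sqlite3 module in place of MySQL 8 and a hypothetical table t with an extra label column so that the chosen row is visible.

```python
import sqlite3

# Hypothetical table t: a default row (group 0) per customer, plus a
# group-1 row for some customers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (customer_id INTEGER, customer_group_id INTEGER, label TEXT);
INSERT INTO t VALUES
    (1, 0, 'default'), (1, 1, 'group one'),
    (2, 0, 'default'),
    (3, 0, 'default'), (3, 1, 'group one');
""")

# Within each customer_id, the highest customer_group_id gets seqnum = 1.
rows = conn.execute("""
SELECT customer_id, customer_group_id, label
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY customer_id
                                ORDER BY customer_group_id DESC) AS seqnum
      FROM t) t
WHERE seqnum = 1
ORDER BY customer_id
""").fetchall()

print(rows)
# [(1, 1, 'group one'), (2, 0, 'default'), (3, 1, 'group one')]
```

If the priority is not simply "highest group id wins", the ORDER BY inside OVER (...) can use a CASE expression instead, e.g. ORDER BY CASE customer_group_id WHEN 1 THEN 0 ELSE 1 END.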
What you can do is fetch all the rows with customer_group_id=1, and then use UNION ALL to add the customer_group_id=0 rows of customers that have no row with customer_group_id=1, checked with NOT EXISTS:
select * from tablename
where customer_group_id=1
union all
select * from tablename t
where t.customer_group_id=0
and not exists (
select 1 from tablename
where customer_id = t.customer_id and customer_group_id=1
)
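Keep every row with customer_group_id = 1, and keep a row with customer_group_id = 0 only if there is no row with customer_group_id = 1 for the same customer_id. You can do this with a NOT EXISTS subquery; because the g.customer_group_id = 0 test sits inside the subquery, rows with customer_group_id = 1 can never be excluded: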
SELECT c.*, g.*
FROM customer AS c
JOIN customer_group AS g ON c.customer_id = g.customer_id
WHERE NOT EXISTS (
SELECT *
FROM customer_group AS g2
WHERE g2.customer_id = c.customer_id
AND g2.customer_group_id = 1
AND g.customer_group_id = 0
)
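The placement of g.customer_group_id = 0 inside NOT EXISTS is the subtle part, so here is a sketch that runs the query against hypothetical customer and customer_group tables, with Python's sqlite3 module standing in for MySQL (the is_active filter from the question is left out).

```python
import sqlite3

# Hypothetical tables shaped like the question's customer / customer_group.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer_group (customer_id INTEGER, customer_group_id INTEGER);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy');
INSERT INTO customer_group VALUES (1, 0), (1, 1), (2, 0), (3, 0), (3, 1);
""")

rows = conn.execute("""
SELECT c.customer_id, c.name, g.customer_group_id
FROM customer AS c
JOIN customer_group AS g ON c.customer_id = g.customer_id
WHERE NOT EXISTS (
    SELECT *
    FROM customer_group AS g2
    WHERE g2.customer_id = c.customer_id
      AND g2.customer_group_id = 1
      -- For a group-1 row of g this is false, so the subquery is empty
      -- and the row always passes; only default rows can be excluded.
      AND g.customer_group_id = 0
)
ORDER BY c.customer_id
""").fetchall()

print(rows)  # [(1, 'Ann', 1), (2, 'Bo', 0), (3, 'Cy', 1)]
```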

Delete records with multiple conditions

I need to exclude duplicate records in the bidding_price column with the following conditions:
Table: bid_account
Columns to check:
id = PRIMARY KEY
auction_id = ID of each product
bidding_price = inserted value (this must be checked for duplicates for each product)
bid_flag = must always equal the value 'd'
bidding_type = must always equal the value 's'
There will always be equal values in the bidding_price column; what must not exist is equal values for the same product ID (auction_id).
Example of what it should not look like:
auction_id | bidding_price
10         | 0.02
10         | 0.02
11         | 0.02
11         | 0.02
The correct result would be:
auction_id | bidding_price
10         | 0.02
11         | 0.02
I tried with the following command:
DELETE ba
FROM bid_account ba JOIN
(SELECT ba2.auction_id, ba2.bidding_price, MAX(ba2.id) as max_id
FROM bid_account ba2
WHERE ba2.bid_flag = 'd' AND ba2.bidding_type = 's'
GROUP BY ba2.auction_id, ba2.bidding_price
) ba2
ON ba2.auction_id = ba.auction_id AND
ba2.bidding_price = ba.bidding_price AND
ba2.max_id < ba.id
WHERE ba.bid_flag = 'd' AND ba.bidding_type = 's' AND ba.auction_id = ba2.auction_id
The problem is that it deleted multiple records that it should not have deleted; it did not do the validations correctly. How can I do it?
id is the PRIMARY KEY of your table, so you can use MAX(id) as the id of the row to keep in each group, then use NOT IN to delete every row whose id is not one of those MAX(id) values.
You can try this.
DELETE ba FROM bid_account ba
WHERE ba.bid_flag = 'd' AND ba.bidding_type = 's'
AND ba.id NOT IN
(
SELECT max_id FROM
(
SELECT auction_id, bidding_price, MAX(id) max_id
FROM bid_account
WHERE bid_flag = 'd' AND bidding_type = 's'
GROUP BY auction_id, bidding_price
) t
)
sqlfiddle:http://sqlfiddle.com/#!9/0f2e5/1
EDIT
If you want to keep the row with the lowest id instead, use MIN(id) in the subquery of the WHERE clause:
DELETE ba FROM bid_account ba
WHERE ba.bid_flag = 'd' AND ba.bidding_type = 's'
AND ba.id NOT IN
(
SELECT min_id FROM
(
SELECT auction_id, bidding_price, MIN(id) min_id
FROM bid_account
WHERE bid_flag = 'd' AND bidding_type = 's'
GROUP BY auction_id, bidding_price
) t
)
sqlfiddle:http://sqlfiddle.com/#!9/ffe92/1
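A sketch of the NOT IN delete on hypothetical rows, run with Python's sqlite3 module instead of MySQL. It restricts the DELETE itself to bid_flag = 'd' AND bidding_type = 's'; without that outer filter, every row with other values would also fall outside the subquery's id list and be deleted. SQLite does not need the extra derived table t, but it accepts it, so the statement keeps the MySQL shape.

```python
import sqlite3

# Hypothetical bid_account rows covering duplicates, a unique price,
# and a row with a different bid_flag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bid_account (
    id INTEGER PRIMARY KEY, auction_id INTEGER, bidding_price REAL,
    bid_flag TEXT, bidding_type TEXT
);
INSERT INTO bid_account VALUES
    (1, 10, 0.02, 'd', 's'),
    (2, 10, 0.02, 'd', 's'),   -- duplicate of id 1
    (3, 11, 0.02, 'd', 's'),
    (4, 11, 0.02, 'd', 's'),   -- duplicate of id 3
    (5, 11, 0.03, 'd', 's'),   -- unique price, must survive
    (6, 10, 0.02, 'x', 's');   -- other bid_flag, must not be touched
""")

# Keep MAX(id) per (auction_id, bidding_price); delete the other 'd'/'s' rows.
conn.execute("""
DELETE FROM bid_account
WHERE bid_flag = 'd' AND bidding_type = 's'   -- only consider these rows
  AND id NOT IN (
      SELECT max_id FROM (
          SELECT MAX(id) AS max_id
          FROM bid_account
          WHERE bid_flag = 'd' AND bidding_type = 's'
          GROUP BY auction_id, bidding_price
      ) t
  )
""")

remaining = [r[0] for r in conn.execute("SELECT id FROM bid_account ORDER BY id")]
print(remaining)  # [2, 4, 5, 6]
```

Swapping MAX(id) for MIN(id) keeps the lowest id per group instead (ids 1, 3, 5 and 6 would remain).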
You can delete the duplicates by using your PRIMARY KEY, id, which uniquely identifies each record for the delete action.
See demo here: http://sqlfiddle.com/#!9/603d56/1
The inner query is similar to yours: it self-joins the table and selects the id of every 'd'/'s' row that has a duplicate (same auction_id and bidding_price) with a larger id. Deleting the rows whose id is IN that list keeps the row with the highest id in each group and leaves rows without duplicates alone. The ids are selected through a second derived table (tt) to avoid MySQL's error "You can't specify target table 'bid_account' for update in FROM clause".
delete e.*
from bid_account e
where e.id in (
select id from (
select a.id
from bid_account a
join bid_account b
on a.auction_id=b.auction_id
and a.bidding_price=b.bidding_price
and a.bid_flag=b.bid_flag
and a.bidding_type=b.bidding_type
where a.bid_flag='d' and a.bidding_type='s'
and a.id < b.id) tt);
The statement below should give you all the records that you need to keep.
SELECT ba2.auction_id, ba2.bidding_price, MAX(ba2.id) as max_id
FROM bid_account ba2
WHERE ba2.bid_flag = 'd' AND ba2.bidding_type = 's'
GROUP BY ba2.auction_id, ba2.bidding_price;
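A NOT IN clause then identifies all the records that you don't need.
So, DELETE FROM bid_account WHERE id NOT IN (SELECT max_id FROM (#the query above#) t) AND bid_flag = 'd' AND bidding_type = 's'; should delete the duplicate records. (Only the max_id column may be returned to NOT IN, and wrapping the query in the derived table t lets MySQL read from the table being deleted from.)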

UPDATE table using IN and COUNT

I am updating my table, setting a field named "status" based on the condition that the number of distinct years per id (between 1996 and 2008) should be at least 10 and at most 13. The query is as follows:
update myTable set status='Established'
where id IN(select id, count(*) as c
from myTable
where year>=1996 and year<=2008
group by id
having count(distinct year)>=10 and count(distinct year)<=13)
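The problem is that I'm getting error 1241, "Operand should contain 1 column(s)"! Could you please advise how I can solve this? Thanks!
The subquery must return only 1 column. MySQL also refuses to read directly from the table being updated in a WHERE subquery (error 1093), so the id list is wrapped in a derived table:
update myTable set status='Established'
where id IN(select id from (
select id
from myTable
where year>=1996 and year<=2008
group by id
having count(distinct year)>=10 and count(distinct year)<=13) t)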
In MySQL, an update with a join often performs better than an update with a subquery in the where clause.
This version might have better performance:
update myTable join
(select id, count(*) as c
from myTable
where year >= 1996 and year <= 2008
group by id
having count(distinct year) >= 10 and count(distinct year) <= 13
) filter
on myTable.id = filter.id
set status = 'Established';
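A minimal sketch of the single-column IN subquery, run through Python's sqlite3 module as a stand-in for MySQL with hypothetical rows. Unlike MySQL, SQLite lets the subquery read the table being updated directly, so no wrapper is needed here; on MySQL, the join form shown above avoids that restriction.

```python
import sqlite3

# Hypothetical myTable where id repeats once per year of activity.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, year INTEGER, status TEXT)")
rows = [(1, y, None) for y in range(1996, 2008)]   # id 1: 12 distinct years
rows += [(2, y, None) for y in range(1996, 2001)]  # id 2: 5 distinct years
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)

# The IN subquery returns exactly one column (id), which is what error 1241
# is about; the HAVING clause keeps ids with 10 to 13 distinct years.
conn.execute("""
UPDATE myTable SET status = 'Established'
WHERE id IN (
    SELECT id
    FROM myTable
    WHERE year >= 1996 AND year <= 2008
    GROUP BY id
    HAVING COUNT(DISTINCT year) >= 10 AND COUNT(DISTINCT year) <= 13
)
""")

established = conn.execute(
    "SELECT DISTINCT id FROM myTable WHERE status = 'Established'"
).fetchall()
print(established)  # [(1,)]
```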
I will also note that you have a table where a column called id is not unique among the rows. Typically, such a column would be a primary key, so the having clause would always fail (there would only be one row).
update myTable
set status='Established'
where id IN(select id from (
select id from myTable
where year>=1996 and year<=2008
group by id
having count(distinct year)>=10
and count(distinct year)<=13) t)
You are using the IN operator, but your inner query returns two columns, id and count(*); it should return only one column.

Keep all records in "WHERE IN()" clause, even if they are not found

I have the following mysql query:
SELECT id, sum(views) as total_views
FROM table
WHERE id IN (1,2,3)
GROUP BY id
ORDER BY total_views ASC
If only ids 1 and 3 are found in the database, I still want id 2 to appear, with total_views set to 0.
Is there any way to do that? This cannot use any other table.
This query hard-codes the list of possible IDs in a subquery built from UNIONs, and then left joins this set of IDs to the table containing the information to be counted.
This will preserve an ID in your results even if there are no occurrences:
SELECT ids.id, COALESCE(sum(views), 0) as total_views
FROM (
SELECT 1 AS ID
UNION ALL SELECT 2 AS ID
UNION ALL SELECT 3 AS ID
) ids
LEFT JOIN table
ON table.ID = ids.ID
GROUP BY ids.id
ORDER BY total_views ASC
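A runnable sketch of the inline-id-list approach, using Python's sqlite3 module and a hypothetical page_views table in place of table (a reserved word). SUM() over no matching rows yields NULL, so COALESCE(..., 0) is what turns the missing id into a 0.

```python
import sqlite3

# Hypothetical data: id 2 deliberately has no rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page_views (id INTEGER, views INTEGER);
INSERT INTO page_views VALUES (1, 5), (1, 7), (3, 4);
""")

# The UNION ALL subquery supplies every wanted id; the LEFT JOIN keeps
# ids that have no matching rows.
rows = conn.execute("""
SELECT ids.id, COALESCE(SUM(page_views.views), 0) AS total_views
FROM (
    SELECT 1 AS id
    UNION ALL SELECT 2
    UNION ALL SELECT 3
) ids
LEFT JOIN page_views ON page_views.id = ids.id
GROUP BY ids.id
ORDER BY total_views ASC
""").fetchall()

print(rows)  # [(2, 0), (3, 4), (1, 12)]
```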
Alternately, if you had a numbers table, you could do the following query:
SELECT numbers.number, COALESCE(sum(views), 0) as total_views
FROM
numbers
LEFT JOIN table
ON table.ID = numbers.number
WHERE numbers.number IN (1, 2, 3)
GROUP BY numbers.number
ORDER BY total_views ASC
Here's an alternative to Michael's solution (not a bad solution, mind you, even with "a lot" of IDs), as long as you're not querying against a cluster.
create temporary table __ids (
id int unsigned primary key
) engine=MEMORY;
insert into __ids (id) values
(1),
(2),
(3)
;
SELECT __ids.id, COALESCE(sum(views), 0) as total_views
FROM __ids left join table using (id)
GROUP BY __ids.id
ORDER BY total_views ASC
And if your query becomes complex, I could even conceive of it running more efficiently this way. But if I were you, I'd benchmark this option against Michael's ad-hoc UNION'ed table option using real data.
In Michael's answer, if you already have a table with the IDs you care about, you can use it as "ids" in place of Michael's inline data.
Check this fiddle... http://www.sqlfiddle.com/#!2/a9392/3
Select B.ID, COALESCE(sum(A.views), 0) total_views from tableB B
left outer join tableA A
on B.ID = A.ID
group by B.ID
also check
http://www.sqlfiddle.com/#!2/a1bb7/1
Try this: one scalar subquery per requested id, falling back to 0 when that id has no rows:
SELECT 1 AS id, COALESCE((SELECT SUM(views) FROM mytable WHERE id = 1), 0) AS total_views
UNION ALL
SELECT 2, COALESCE((SELECT SUM(views) FROM mytable WHERE id = 2), 0)
UNION ALL
SELECT 3, COALESCE((SELECT SUM(views) FROM mytable WHERE id = 3), 0)
ORDER BY total_views ASC
Does it have to be rows, or could you pivot the data to give you one row with a column for every id?
SELECT
SUM(IF (id=1, views, 0)) views_1,
SUM(IF (id=2, views, 0)) views_2,
SUM(IF (id=3, views, 0)) views_3
FROM table
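The pivot version can be sketched the same way (hypothetical page_views table, Python's sqlite3 standing in for MySQL). CASE WHEN is used here as a portable equivalent of MySQL's IF(). Because each column sums to 0 when no row matches, id 2 still gets a value even though it has no rows, as long as the table itself is not empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page_views (id INTEGER, views INTEGER);
INSERT INTO page_views VALUES (1, 5), (1, 7), (3, 4);
""")

# One output column per requested id; non-matching rows contribute 0.
row = conn.execute("""
SELECT
    SUM(CASE WHEN id = 1 THEN views ELSE 0 END) AS views_1,
    SUM(CASE WHEN id = 2 THEN views ELSE 0 END) AS views_2,
    SUM(CASE WHEN id = 3 THEN views ELSE 0 END) AS views_3
FROM page_views
""").fetchone()

print(row)  # (12, 0, 4)
```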