Remove duplicates only when they occur in pair per group - mysql

In the following data:
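| id_1 | id_2 | colA | colB | colC |
|------|------|------------|------------------|------|
| 1 | 2 | 2022-01-02 | scroll fast | 12 |
| 1 | 2 | 2022-01-02 | scroll fast | 12 |
| 1 | 3 | 2022-01-03 | scroll fast fast | 11 |
| 1 | 3 | 2022-01-03 | scroll fast fast | 11 |
| 1 | 3 | 2022-01-03 | scroll fast fast | 11 |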
I would like to remove duplicates only when they occur in pairs. The output table should keep one of the first two rows, and the remaining rows should be left as is.
The output table would be:
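| id_1 | id_2 | colA | colB | colC |
|------|------|------------|------------------|------|
| 1 | 2 | 2022-01-02 | scroll fast | 12 |
| 1 | 3 | 2022-01-03 | scroll fast fast | 11 |
| 1 | 3 | 2022-01-03 | scroll fast fast | 11 |
| 1 | 3 | 2022-01-03 | scroll fast fast | 11 |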
I have a large dataset, so I cannot filter on specific id_1 and id_2 values. I am looking for a generic solution.
Test data is here at Sqlfiddle.

We can use a combination of COUNT() and ROW_NUMBER(), both as analytic functions:
WITH cte AS (
    SELECT t.*,
           COUNT(*) OVER (PARTITION BY id_1, id_2) AS cnt,
           ROW_NUMBER() OVER (PARTITION BY id_1, id_2 ORDER BY id_1) AS rn
    FROM yourTable t
)
SELECT id_1, id_2, colA, colB, colC
FROM cte
WHERE cnt <> 2 OR rn = 1;
The above logic returns rows appearing only once or appearing 3 times or more. In the case of pair duplicates, it arbitrarily returns one of them.
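To check the COUNT() / ROW_NUMBER() filter against the question's sample rows, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite (3.25 or later, for window functions) is only a stand-in for MySQL 8 here, and the column types in the CREATE TABLE are assumptions; the trailing ORDER BY was added just to make the printed output stable.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL 8 (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (id_1 INT, id_2 INT, colA TEXT, colB TEXT, colC INT)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, "2022-01-02", "scroll fast", 12),
        (1, 2, "2022-01-02", "scroll fast", 12),
        (1, 3, "2022-01-03", "scroll fast fast", 11),
        (1, 3, "2022-01-03", "scroll fast fast", 11),
        (1, 3, "2022-01-03", "scroll fast fast", 11),
    ],
)

# Same logic as the answer: keep every row unless its group has exactly
# two rows, in which case keep only the first one.
query = """
WITH cte AS (
    SELECT t.*,
           COUNT(*) OVER (PARTITION BY id_1, id_2) AS cnt,
           ROW_NUMBER() OVER (PARTITION BY id_1, id_2 ORDER BY id_1) AS rn
    FROM yourTable t
)
SELECT id_1, id_2, colA, colB, colC
FROM cte
WHERE cnt <> 2 OR rn = 1
ORDER BY id_1, id_2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that the groups here are defined by id_1 and id_2 only. If two rows should count as duplicates only when every column matches, the remaining columns would need to be added to both PARTITION BY clauses.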

In Oracle (MySQL has no MERGE statement, and ROWID, ROWNUM and DUAL below are Oracle features), you can use a MERGE statement, count the size of each group using analytic functions, and then delete the second row of a group when the group has exactly 2 rows:
MERGE INTO table_name dst
USING (
SELECT ROWID AS rid,
ROW_NUMBER() OVER (PARTITION BY id_1, id_2, colA, colB, colC ORDER BY ROWNUM)
AS rn,
COUNT(*) OVER (PARTITION BY id_1, id_2, colA, colB, colC) AS cnt
FROM table_name
) src
ON (dst.ROWID = src.rid AND src.rn = 2 AND src.cnt = 2)
WHEN MATCHED THEN
UPDATE SET colC = colC
DELETE WHERE 1 = 1;
Which, for the sample data:
create table table_name (id_1, id_2, colA, colB, colC) AS
SELECT 1, 2, 'A', 'B', 'C' FROM DUAL UNION ALL
SELECT 1, 2, 'A', 'B', 'C' FROM DUAL UNION ALL
SELECT 1, 3, 'D', 'E', 'F' FROM DUAL UNION ALL
SELECT 1, 3, 'D', 'E', 'F' FROM DUAL UNION ALL
SELECT 1, 3, 'D', 'E', 'F' FROM DUAL;
Then after the MERGE the table contains:
| ID_1 | ID_2 | COLA | COLB | COLC |
|------|------|------|------|------|
| 1 | 2 | A | B | C |
| 1 | 3 | D | E | F |
| 1 | 3 | D | E | F |
| 1 | 3 | D | E | F |
fiddle

Related

MySQL select from multiple tables, keep all columns and row without match

I have 2 tables
tableA:
| id | dateA | colA | ... |
|----|---------------------|------|-----|
| 1 | 2022-11-11 12:00:00 | A | |
| 2 | 2022-11-12 12:00:00 | B | |
| 3 | 2022-11-14 12:00:00 | C | |
tableB:
| id | dateB | colB | ... |
|----|---------------------|------|-----|
| 3 | 2022-11-05 12:00:00 | D | |
| 4 | 2022-11-06 12:00:00 | E | |
| 5 | 2022-11-13 12:00:00 | F | |
I want to put all rows from both tables into one result and sort it by the date column.
Wanted result (rows from both tables sorted by column date DESC):
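| id | date | colA | colB | ... | ... |
|----|---------------------|------|------|-----|-----|
| 3 | 2022-11-14 12:00:00 | C | | | |
| 5 | 2022-11-13 12:00:00 | | F | | |
| 2 | 2022-11-12 12:00:00 | B | | | |
| 1 | 2022-11-11 12:00:00 | A | | | |
| 4 | 2022-11-06 12:00:00 | | E | | |
| 3 | 2022-11-05 12:00:00 | | D | | |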
I can combine tables, but tables are "squashed"...
SELECT
COALESCE(a.id, b.id) AS id,
COALESCE(a.dateA, b.dateB) AS date,
a.colA,
b.colB
FROM tableA AS a, tableB AS b
ORDER BY date DESC
Use UNION ALL and ORDER BY. This requires enumerating the columns:
select id, dateA as dateX, colA, null as colB from tableA
union all
select id, dateB, null, colB from tableB
order by dateX desc
UNION ALL combines two datasets. Since the two tables have different columns, both SELECT clauses must be arranged so that they return the same set of columns; NULL values can fill in the "missing" columns in a given dataset.
As for sorting the result set: in MySQL, a single ORDER BY at the end of a UNION ALL query applies to the whole result set (after the union). The query does just that, using dateX, the column alias defined in the first SELECT.
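The UNION ALL approach can be tried against the question's two tables with a short sketch like the one below. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the SQL itself is the same in both), stores the dates as text, and sorts newest first because the question asks for descending order.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (id INT, dateA TEXT, colA TEXT);
CREATE TABLE tableB (id INT, dateB TEXT, colB TEXT);
INSERT INTO tableA VALUES
    (1, '2022-11-11 12:00:00', 'A'),
    (2, '2022-11-12 12:00:00', 'B'),
    (3, '2022-11-14 12:00:00', 'C');
INSERT INTO tableB VALUES
    (3, '2022-11-05 12:00:00', 'D'),
    (4, '2022-11-06 12:00:00', 'E'),
    (5, '2022-11-13 12:00:00', 'F');
""")

# Each branch returns the same four columns; NULL pads the column that
# the other table does not have. The final ORDER BY sorts the whole union.
query = """
SELECT id, dateA AS dateX, colA, NULL AS colB FROM tableA
UNION ALL
SELECT id, dateB, NULL, colB FROM tableB
ORDER BY dateX DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Rows from tableA come back with colB set to None, and rows from tableB with colA set to None, so nothing is "squashed" together.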
In case you want to show the rows that share the same id and date as one row, you can build on top of GMB's answer:
select id, dateX as date, max(colA) as colA, max(colB) as colB
from (
select id, dateA as dateX, colA, null as colB from tableA
union all
select id, dateB, null, colB from tableB
) as q
group by id, dateX
order by dateX desc
See db-fiddle

Select duplicates while concatenating every one except the first

I am trying to write a query that will select all of the numbers in my table, but for numbers with duplicates I want to append something to the end that marks them as duplicates. However, I am not sure how to do this.
Here is an example of the table
TableA
ID Number
1 1
2 2
3 2
4 3
5 4
SELECT statement output would be like this.
Number
1
2
2-dup
3
4
Any insight on this would be appreciated.
If your MySQL version does not support window functions, you can use a correlated subquery to compute a row number, then use CASE WHEN to mark rows with rn > 1 as duplicates.
create table T (ID int, Number int);
INSERT INTO T VALUES (1,1);
INSERT INTO T VALUES (2,2);
INSERT INTO T VALUES (3,2);
INSERT INTO T VALUES (4,3);
INSERT INTO T VALUES (5,4);
Query 1:
select t1.id,
(CASE WHEN rn > 1 then CONCAT(Number,'-dup') ELSE Number END) Number
from (
SELECT *,(SELECT COUNT(*)
FROM T tt
where tt.Number = t1.Number and tt.id <= t1.id
) rn
FROM T t1
)t1
Results:
| id | Number |
|----|--------|
| 1 | 1 |
| 2 | 2 |
| 3 | 2-dup |
| 4 | 3 |
| 5 | 4 |
If you can use window functions, use ROW_NUMBER() partitioned by Number to compute the row number.
select t1.id,
(CASE WHEN rn > 1 then CONCAT(Number,'-dup') ELSE Number END) Number
from (
SELECT *,row_number() over(partition by Number order by id) rn
FROM T t1
)t1
sqlfiddle
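As a quick check of the ROW_NUMBER() variant, the sketch below runs it through Python's built-in sqlite3 module. SQLite (3.25 or later) is an assumed stand-in for MySQL 8, and because older SQLite versions lack CONCAT(), the example uses the || operator and an explicit CAST instead; in MySQL the CONCAT() form from the answer applies.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL 8 (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (ID INT, Number INT);
INSERT INTO T VALUES (1, 1), (2, 2), (3, 2), (4, 3), (5, 4);
""")

# rn numbers the rows within each Number value by ID, so rn > 1 means
# "this value has already appeared on a row with a smaller ID".
query = """
SELECT t1.ID,
       CASE WHEN rn > 1 THEN Number || '-dup'
            ELSE CAST(Number AS TEXT)
       END AS Number
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Number ORDER BY ID) AS rn
    FROM T
) t1
ORDER BY t1.ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```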
I made a list of all the IDs that weren't dups (left join select) and then compared them to the entire list(case when):
select
case when a.id <> b.min_id then cast(a.Number as varchar(6)) + '-dup' else cast(a.Number as varchar(6)) end as Number
from table_a a
left join (select MIN(b.id) min_id, Number from table_a b group by b.number) b on b.number = a.number
I did this in MS SQL 2016, hope it works for you.
This populates the table used:
insert into table_a (ID, Number)
select 1,1
union all
select 2,2
union all
select 3,2
union all
select 4,3
union all
select 5,4

MySQL query with GROUP BY, SUM, MAX and subquery

I have the following table structure:
Id Num1 Num2 Type Num3
1 2 2 1 4
1 3 1 2 5
1 1 1 3 2
2 2 1 1 3
2 0 1 2 2
2 4 3 3 6
I need a query with group by 'Id', sum of 'Num1', sum of 'Num2', max of 'Num3' and the 'Type' related to the MAX of 'Num3'.
So, the desired output is:
Id Sum(Num1) Sum(Num2) type Max(Num3)
1 6 4 2 5
2 6 5 3 6
Without this related 'Type' the query below works fine:
SELECT
Id,
SUM(Num1),
SUM(Num2),
MAX(Num3)
GROUP BY
Id
I tried different methods of subquery but can't make it work yet.
Your problem is a bit of a spin on the greatest value per group problem. In this case, we can use a subquery to find the max Num3 value for each Id. But, in the same subquery we also compute the sum aggregates.
SELECT
t1.Id,
t2.s1,
t2.s2,
t1.Type,
t1.Num3
FROM yourTable t1
INNER JOIN
(
SELECT Id, SUM(Num1) AS s1, SUM(Num2) AS s2, MAX(Num3) AS m3
FROM yourTable
GROUP BY Id
) t2
ON t1.Id = t2.Id AND t1.Num3 = t2.m3;
As a hat tip to MySQL 8+, and to ward off evil spirits, we can also write a query using analytic functions:
SELECT Id, s1, s2, Type, Num3
FROM
(
SELECT
Id,
SUM(Num1) OVER (PARTITION BY Id) s1,
SUM(Num2) OVER (PARTITION BY Id) s2,
Type,
Num3,
MAX(Num3) OVER (PARTITION BY Id) m3
FROM yourTable
) t
WHERE Num3 = m3;
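The join-based query can be checked against the question's sample rows with a small sketch using Python's built-in sqlite3 module (an assumed stand-in for MySQL; the table name yourTable is the answer's placeholder). An ORDER BY was added only to make the output order stable.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (Id INT, Num1 INT, Num2 INT, Type INT, Num3 INT)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 2, 1, 4),
        (1, 3, 1, 2, 5),
        (1, 1, 1, 3, 2),
        (2, 2, 1, 1, 3),
        (2, 0, 1, 2, 2),
        (2, 4, 3, 3, 6),
    ],
)

# The subquery computes the per-Id sums and the maximum Num3; joining it
# back on (Id, Num3) picks up the Type of the row holding that maximum.
query = """
SELECT t1.Id, t2.s1, t2.s2, t1.Type, t1.Num3
FROM yourTable t1
INNER JOIN (
    SELECT Id, SUM(Num1) AS s1, SUM(Num2) AS s2, MAX(Num3) AS m3
    FROM yourTable
    GROUP BY Id
) t2
    ON t1.Id = t2.Id AND t1.Num3 = t2.m3
ORDER BY t1.Id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For Id 2 the Num2 values 1, 1 and 3 sum to 5. Also note that if several rows in a group tie for the maximum Num3, both queries return one row per tie.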

MySQL Select parents and childs in proper order with single query

I have a MySQL table with following data:
ID Name ParentID
1 Foo null
2 Bar null
3 Foo SubA 1
4 Bar SubA 2
5 Foo SubC 1
6 Foo SubB 1
I would like to retrieve all data in the following order:
1 Foo null
3 Foo SubA 1
6 Foo SubB 1
5 Foo SubC 1
2 Bar null
4 Bar SubA 2
Is it possible with MySQL and single query?
If this is a two-level hierarchy, i.e. no grandparents and grandchildren, it's a mere ORDER BY clause:
select id, name, parentid
from mytable
order by coalesce(parentid, id), parentid is not null, name;
This makes use of MySQL's true = 1, false = 0. parentid is not null is 0 for the parent and 1 for the children.
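The ORDER BY trick can be verified on the question's rows with a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumed stand-in for MySQL; it also evaluates `parentid IS NOT NULL` to 0 or 1, so the same expression works unchanged.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INT, name TEXT, parentid INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "Foo", None),
        (2, "Bar", None),
        (3, "Foo SubA", 1),
        (4, "Bar SubA", 2),
        (5, "Foo SubC", 1),
        (6, "Foo SubB", 1),
    ],
)

# Sort keys: 1) the family (parent's id, or own id for a parent),
#            2) parent before children (0 sorts before 1),
#            3) children alphabetically by name.
query = """
SELECT id, name, parentid
FROM mytable
ORDER BY COALESCE(parentid, id), parentid IS NOT NULL, name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```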
You could use recursive CTE (MySQL 8.0+):
-- 2 level hierarchy (parent-child)
WITH RECURSIVE cte AS
(
SELECT tx.*, 1 AS lvl, ID AS grp FROM tx WHERE ParentID IS NULL
UNION ALL
SELECT tx.*, lvl+1, cte.ID FROM tx JOIN cte ON tx.ParentId = cte.Id
)
SELECT ID, Name, ParentId
FROM cte
ORDER BY grp, lvl, Name;
DBFiddle Demo

Minimum values from the table

I need the minimum value across 3 columns and the corresponding name for that minimum value,
like this:
Name val1 val2 val3
a 12 5 4
b 10 9 1
c 7 11 5
d 13 8 2
output:
Name MIN
b 1
I wrote query to find minimum value :
select MIN(less)
from (
select case
when val1<=val2 and val1<=val3 then val1
when val2<=val1 and val2<=val3 then val2
when val3<=val1 and val3<=val2 then val3 end as less from table) as low
I used aliases. I want to display the corresponding name from the table. Please tell me the query.
You can do it using the UNION operator to convert the 3 value columns into a single value column.
SELECT TOP 1 Name, Val AS Min
FROM (
SELECT Name, val1 AS Val
FROM table
UNION
SELECT Name, val2 AS Val
FROM table
UNION
SELECT Name, val3 AS Val
FROM table
) AS sub_query
ORDER BY Val ASC
This solution has the added advantage that it is easier to maintain if the number of columns increases.
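The unpivot-by-union idea can be sketched against the question's data using Python's built-in sqlite3 module. Several things here are assumptions: SQLite stands in for the real database, the table is named T (as in the later answers) because `table` is a reserved word, and `LIMIT 1` replaces SQL Server's `TOP 1`.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Name TEXT, val1 INT, val2 INT, val3 INT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?)",
    [("a", 12, 5, 4), ("b", 10, 9, 1), ("c", 7, 11, 5), ("d", 13, 8, 2)],
)

# Stack the three value columns into one Val column, then take the
# smallest value together with the Name it came from.
query = """
SELECT Name, Val AS MinVal
FROM (
    SELECT Name, val1 AS Val FROM T
    UNION ALL
    SELECT Name, val2 FROM T
    UNION ALL
    SELECT Name, val3 FROM T
) AS sub_query
ORDER BY Val ASC
LIMIT 1
"""
result = conn.execute(query).fetchone()
print(result)
```

UNION ALL is used here instead of UNION since removing duplicate rows is not needed to find the minimum.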
Most Concise
SELECT top 1 Name,col,val
FROM T
UNPIVOT ( val for col in (val1,val2,val3)) unpvt
ORDER BY val
Most Efficient (assuming these columns are indexed)
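-- Each branch is wrapped in a derived table because SQL Server does not
-- allow ORDER BY directly inside a branch of a UNION.
;WITH cte(Name, col, val) AS
(
SELECT * FROM (SELECT TOP 1 Name, 'val1' AS col, val1 AS val FROM T ORDER BY val1) AS t1
UNION ALL
SELECT * FROM (SELECT TOP 1 Name, 'val2' AS col, val2 AS val FROM T ORDER BY val2) AS t2
UNION ALL
SELECT * FROM (SELECT TOP 1 Name, 'val3' AS col, val3 AS val FROM T ORDER BY val3) AS t3
)
SELECT TOP 1 Name, col, val
FROM cte
ORDER BY val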