Incrementing count ONLY for duplicates in MySQL - mysql

Here is my MySQL table. I updated the question by adding an 'id' column to it (as instructed in the comments by others).
id data_id
1 2355
2 2031
3 1232
4 9867
5 2355
6 4562
7 1232
8 2355
I want to add a new column called row_num to assign an incrementing number ONLY for duplicates, as shown below. Order of the results does not matter.
id data_id row_num
3 1232 1
7 1232 2
2 2031 null
1 2355 1
5 2355 2
8 2355 3
6 4562 null
4 9867 null
I followed this answer and came up with the code below, but it also assigns a count of 1 to non-duplicate values. How can I modify it to add a count only for duplicates?
select data_id,row_num
from (
select data_id,
@row:=if(@prev=data_id,@row,0) + 1 as row_num,
@prev:=data_id
from my_table
)t

If you are running MySQL 8.0, you can do this more efficiently with window functions only:
select
data_id,
case when count(*) over(partition by data_id) > 1
then row_number() over(partition by data_id order by data_id)
end row_num
from mytable
When the window count returns more than 1, you know that the current data_id has duplicates, in which case you can use row_number() to assign the incrementing number.
Note that, in the absence of an ordering column that uniquely identifies each record within the group sharing the same data_id, it is undefined which record will actually get each number.
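To try the window-function idea without a MySQL 8.0 server, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite (3.25 or later) is only a stand-in for MySQL here, and that is an assumption of the sketch. The table name my_table and the id column come from the question; ordering by id inside each partition and the helper name number_duplicates are choices made for illustration.

```python
import sqlite3

# Sample data from the question: (id, data_id)
ROWS = [(1, 2355), (2, 2031), (3, 1232), (4, 9867),
        (5, 2355), (6, 4562), (7, 1232), (8, 2355)]

# Window count decides whether a data_id has duplicates;
# row_number() numbers the rows only in that case.
QUERY = """
select
    id,
    data_id,
    case when count(*) over (partition by data_id) > 1
         then row_number() over (partition by data_id order by id)
    end as row_num
from my_table
order by data_id, id
"""

def number_duplicates(rows):
    """Return (id, data_id, row_num); row_num is None for non-duplicates."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
    try:
        conn.execute("create table my_table (id integer primary key, data_id integer)")
        conn.executemany("insert into my_table values (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

for row in number_duplicates(ROWS):
    print(row)
```

Ordering the row_number() window by id makes the numbering deterministic, which addresses the caveat about undefined numbering when no unique ordering column is used.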

I am assuming that id is the column that defines the order on the rows.
In MySQL 8 you can use row_number() to get the number of each data_id and a CASE with EXISTS to exclude the rows which have no duplicate.
SELECT t1.data_id,
CASE
WHEN EXISTS (SELECT *
FROM my_table t2
WHERE t2.data_id = t1.data_id
AND t2.id <> t1.id) THEN
row_number() OVER (PARTITION BY t1.data_id
ORDER BY t1.id)
END row_num
FROM my_table t1;
In older versions you can use a subquery counting the rows with the same data_id but smaller id. With an EXISTS in a HAVING clause you can exclude the rows that have no duplicate.
SELECT t1.data_id,
(SELECT count(*)
FROM my_table t2
WHERE t2.data_id = t1.data_id
AND t2.id < t1.id
HAVING EXISTS (SELECT *
FROM my_table t2
WHERE t2.data_id = t1.data_id
AND t2.id <> t1.id)) + 1 row_num
FROM my_table t1;
db<>fiddle
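For versions without window functions, the counting idea can be sketched as follows. This is a variant, not the exact query above: it moves the EXISTS test into a CASE instead of a HAVING clause, because HAVING without GROUP BY is not accepted by every SQLite version, and SQLite (via Python's sqlite3) is only used here as a stand-in for MySQL. The helper name number_duplicates_no_window is hypothetical.

```python
import sqlite3

# Sample data from the question: (id, data_id)
ROWS = [(1, 2355), (2, 2031), (3, 1232), (4, 9867),
        (5, 2355), (6, 4562), (7, 1232), (8, 2355)]

# For rows that have at least one other row with the same data_id,
# row_num = (number of same-data_id rows with a smaller id) + 1.
QUERY = """
select t1.id,
       t1.data_id,
       case when exists (select 1 from my_table t2
                         where t2.data_id = t1.data_id and t2.id <> t1.id)
            then (select count(*) from my_table t2
                  where t2.data_id = t1.data_id and t2.id < t1.id) + 1
       end as row_num
from my_table t1
order by t1.data_id, t1.id
"""

def number_duplicates_no_window(rows):
    """Return (id, data_id, row_num) without using window functions."""
    conn = sqlite3.connect(":memory:")  # stand-in for an older MySQL (assumption)
    try:
        conn.execute("create table my_table (id integer primary key, data_id integer)")
        conn.executemany("insert into my_table values (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

for row in number_duplicates_no_window(ROWS):
    print(row)
```

The correlated subqueries run once per row, so on large tables an index on (data_id, id) would likely matter.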

Join with a query that returns the number of duplicates.
select t1.data_id, IF(t2.dups > 1, row_num, '') AS row_num
from (
select data_id,
@row:=if(@prev=data_id,@row,0) + 1 as row_num,
@prev:=data_id
from my_table
order by data_id
) AS t1
join (
select data_id, COUNT(*) AS dups
FROM my_table
GROUP BY data_id
) AS t2 ON t1.data_id = t2.data_id

If you want to have the old "order" of the old table, you need much more code
SELECT
data_id, IF (row_num = 1 AND cntid = 1, NULL,row_num)
FROM
(SELECT
@row:=IF(@prev = t1.data_id, @row, 0) + 1 AS row_num,
cntid,
@prev:=t1.data_id data_id
FROM
(SELECT
*
FROM
my_table
ORDER BY data_id) t1
INNER JOIN (SELECT Count(*) cntid,data_id FROM my_table GROUP BY data_id)t2
ON t1.data_id = t2.data_id) t2
data_id | IF (row_num = 1 AND cntid = 1, NULL,row_num)
------: | -------------------------------------------:
1232 | 1
1232 | 2
2031 | null
2355 | 1
2355 | 2
2355 | 3
4562 | null
9867 | null
db<>fiddle here

Related

Group overlapping ranges of data in MySQL

Is there an easy way avoiding the usage of cursors to convert this:
+-------+------+-------+
| Group | From | Until |
+-------+------+-------+
| X | 1 | 3 |
+-------+------+-------+
| X | 2 | 4 |
+-------+------+-------+
| Y | 5 | 7 |
+-------+------+-------+
| X | 8 | 10 |
+-------+------+-------+
| Y | 11 | 12 |
+-------+------+-------+
| Y | 12 | 13 |
+-------+------+-------+
Into this:
+-------+------+-------+
| Group | From | Until |
+-------+------+-------+
| X | 1 | 4 |
+-------+------+-------+
| Y | 5 | 7 |
+-------+------+-------+
| X | 8 | 10 |
+-------+------+-------+
| Y | 11 | 13 |
+-------+------+-------+
So far I've tried to assign an ID to each row and GROUP BY that ID, but I can't get any closer without using cursors.
SELECT `Group`, `From`, `Until`
FROM ( SELECT `Group`, `From`, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY `From`) rn
FROM test t1
WHERE NOT EXISTS ( SELECT NULL
FROM test t2
WHERE t1.`From` > t2.`From`
AND t1.`From` <= t2.`Until`
AND t1.`Group` = t2.`Group` ) ) t3
JOIN ( SELECT `Group`, `Until`, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY `From`) rn
FROM test t1
WHERE NOT EXISTS ( SELECT NULL
FROM test t2
WHERE t1.`Until` >= t2.`From`
AND t1.`Until` < t2.`Until`
AND t1.`Group` = t2.`Group` ) ) t4 USING (`Group`, rn)
fiddle
This should work for any type of overlap (partially overlapping, adjacent, fully contained).
Will not work if From and/or Until is NULL.
Could you add an explanation in English? – ysth
The first subquery finds the starts of the merged ranges (see the fiddle, where it is executed separately): it looks for From values within a group that do not fall in the middle or at the end of any other range (equal start points are allowed).
The second subquery does the same for the Until values of the merged ranges.
Both also number the values they find in ascending order.
The outer query simply joins each range start to its matching end, producing one row per merged range.
If you are using MySQL 8+, you can use row_number() to get the desired result:
Demo
SELECT MIN(`FROM`) START,
MAX(`UNTIL`) END,
`GROUP` FROM (
SELECT A.*,
ROW_NUMBER() OVER(ORDER BY `FROM`) RN_FROM,
ROW_NUMBER() OVER(PARTITION BY `GROUP` ORDER BY `UNTIL`) RN_UNTIL
FROM Table_lag A) X
GROUP BY `GROUP`, (RN_FROM - RN_UNTIL)
ORDER BY START;
You can do this with window functions only, using a gaps-and-islands technique.
The idea is to build groups of consecutive records that have the same group and overlapping ranges, using lag() and a window sum(). You can then aggregate the groups:
select grp, min(c_from) c_from, max(c_until) c_until
from (
select
t.*,
sum(lag_c_until < c_from) over(partition by grp order by c_from) mygrp
from (
select
t.*,
lag(c_until, 1, c_until) over(partition by grp order by c_from) lag_c_until
from mytable t
) t
) t
group by grp, mygrp
The column names you chose conflict with SQL keywords (group, from), so I renamed them to grp, c_from and c_until.
Demo on DB Fiddle - with credits to ysth for creating the fiddle in the first place:
grp | c_from | c_until
:-- | -----: | ------:
X | 1 | 4
Y | 5 | 7
X | 8 | 10
Y | 11 | 13
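The lag() plus running sum() idea can be tried locally with the sketch below, which uses Python's sqlite3 as a stand-in for MySQL 8 (an assumption; SQLite 3.25+ is needed for window functions). The column names grp, c_from and c_until follow the renamed columns above; the helper name merge_ranges, the island and prev_until aliases, and the use of coalesce() in place of the three-argument lag() are illustrative choices.

```python
import sqlite3

# Sample data from the question: (grp, c_from, c_until)
RANGES = [("X", 1, 3), ("X", 2, 4), ("Y", 5, 7),
          ("X", 8, 10), ("Y", 11, 12), ("Y", 12, 13)]

QUERY = """
select grp, min(c_from) as range_start, max(c_until) as range_end
from (
    select t.*,
           -- running count of rows that start after the previous row ended
           sum(prev_until < c_from) over (partition by grp order by c_from) as island
    from (
        select t.*,
               -- end of the previous row; the first row falls back to its own end
               coalesce(lag(c_until) over (partition by grp order by c_from),
                        c_until) as prev_until
        from mytable t
    ) t
) t
group by grp, island
order by min(c_from)
"""

def merge_ranges(ranges):
    """Return merged (grp, range_start, range_end) rows ordered by start."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
    try:
        conn.execute("create table mytable (grp text, c_from integer, c_until integer)")
        conn.executemany("insert into mytable values (?, ?, ?)", ranges)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

for row in merge_ranges(RANGES):
    print(row)
```

Because each row is compared only with the row immediately before it, a short range nested inside a much earlier, longer range could be split into its own island; comparing against a running maximum of c_until is one possible way to handle that case.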
I would use a recursive CTE for this:
with recursive intervals (`Group`, `From`, `Until`) as (
select distinct t1.Group, t1.From, t1.Until
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.Group=t2.Group
and t1.From between t2.From and t2.Until+1
and (t1.From,t1.Until) <> (t2.From,t2.Until)
)
union all
select t1.Group, t1.From, t2.Until
from intervals t1
join Table_lag t2
on t2.Group=t1.Group
and t2.From between t1.From and t1.Until+1
and t2.Until > t1.Until
)
select `Group`, `From`, max(`Until`) as Until
from intervals
group by `Group`, `From`
order by `From`, `Group`;
The anchor expression (select .. where not exists (...)) finds every group and from that won't combine with some earlier from, so it yields one row for each row in our eventual output.
Then the recursive query adds rows for the merged intervals that grow out of each of those rows.
Then just group by group and from (those are awful column names) to get the biggest interval for each starting group/from.
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=9efa508504b80e44b73c952572394b76
Alternatively, you can do it with a straightforward set of joins and subqueries, with no CTE or window functions needed:
select
interval_start_range.grp,
interval_start_range.start,
max(merged.finish) finish
from (
select
interval_start.grp,
interval_start.start,
min(later_interval_start.start) next_start
from (
select distinct t1.grp, t1.start, t1.finish
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.grp=t2.grp
and t1.start between t2.start and t2.finish+1
and (t1.start,t1.finish) <> (t2.start,t2.finish)
)
) interval_start
left join (
select distinct t1.grp, t1.start, t1.finish
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.grp=t2.grp
and t1.start between t2.start and t2.finish+1
and (t1.start,t1.finish) <> (t2.start,t2.finish)
)
) later_interval_start
on interval_start.grp=later_interval_start.grp
and interval_start.start < later_interval_start.start
group by interval_start.grp, interval_start.start
) as interval_start_range
join Table_lag merged
on merged.grp=interval_start_range.grp
and merged.start >= interval_start_range.start
and (interval_start_range.next_start is null or merged.start < interval_start_range.next_start)
group by interval_start_range.grp, interval_start_range.start
order by interval_start_range.start, interval_start_range.grp
(I have renamed the columns here to not need backticks.)
Here there's a select to get the starts of all the reportable intervals, joined to another similar select (you could use a CTE to avoid the redundancy) to find the start of the following reportable interval for the same group (if there is one). That's wrapped in a subquery to get the group, the start value, and the start value of the following reportable interval. Then it just needs to join all the other records that start within that range and pick the maximum ending value.
https://dbfiddle.uk/?rdbms=mysql_5.5&fiddle=151cc933489c299f7beefa99e1959549

select rows with condition in other rows

I want to select, for each ticket, the row with the last status_Id, but only if that ticket also has a row with status_Id = 2.
ticketStatus_Id ticket_Id status_Id
======================================
1 1 1
2 1 2 -
3 1 3 *
4 2 1
5 3 1
6 3 2 - *
7 4 1
8 4 2 -
9 4 3
10 4 4 *
I want to select just rows 3, 6 and 10 (marked with *). Each of their ticket_Ids has a row with status_Id = 2 (rows 2, 6 and 8, marked with -).
In other words: how do I select rows 3, 6 and 10, for ticket_Id 1, 3 and 4, given that each of those tickets has a row with status_Id = 2 (rows 2, 6, 8)?
If you want the complete row, then I would view this as exists:
select t.*
from t
where exists (select 1
from t t2
where t2.ticket_id = t.ticket_id and t2.status_id = 2
) and
t.status_Id = (select max(t2.status_id)
from t t2
where t2.ticket_id = t.ticket_id
);
If you just want the ticket_id and status_id (and not the whole row), I would recommend aggregation:
select ticket_id, max(status_id)
from t
group by ticket_id
having sum(status_id = 2) > 0;
In your case, ticketStatus_Id seems to increase with status_id, so you can use:
select max(ticketStatus_Id) as ticketStatus_Id, ticket_id, max(status_id) as Status_Id
from t
group by ticket_id
having sum(status_id = 2) > 0;
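The aggregation approach can be checked with a small sketch like the one below. It uses Python's sqlite3 as a stand-in for MySQL (an assumption), relies on a comparison such as status_id = 2 evaluating to 0 or 1 in both engines, and adds one extra ticket (5) that never reaches status 2 as a negative case. The helper name last_status_with_2 is hypothetical.

```python
import sqlite3

# (ticketStatus_Id, ticket_Id, status_Id) from the question,
# plus ticket 5 with only status 3 as a negative case.
ROWS = [(1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 1), (5, 3, 1), (6, 3, 2),
        (7, 4, 1), (8, 4, 2), (9, 4, 3), (10, 4, 4), (11, 5, 3)]

# sum(status_id = 2) counts the status-2 rows per ticket;
# tickets with none are filtered out by HAVING.
QUERY = """
select ticket_id, max(status_id) as status_id
from t
group by ticket_id
having sum(status_id = 2) > 0
order by ticket_id
"""

def last_status_with_2(rows):
    """Return (ticket_id, max status_id) for tickets that have a status-2 row."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
    try:
        conn.execute(
            "create table t (ticketStatus_Id integer primary key, "
            "ticket_id integer, status_id integer)")
        conn.executemany("insert into t values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

print(last_status_with_2(ROWS))
```

Tickets 2 and 5 drop out because neither has a row with status_id = 2, even though ticket 5 has a higher status.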
First, for each ticket we get the row with the highest status. We can do this with a self-join. Each row is joined with the row with the next highest status. We select the rows which have no higher status, those will be the highest. Here's a more detailed explanation.
select ts1.*
from ticket_statuses ts1
left outer join ticket_statuses ts2
on ts1.ticket_Id = ts2.ticket_Id
and ts1.status_Id < ts2.status_Id
where ts2.ticketStatus_Id is null
3 1 3
4 2 1
6 3 2
10 4 4
11 5 3
Note that I've added a curve-ball of 11, 5, 3 to ensure we only select tickets with a status of 2, not greater than 2.
Then we can use that as a CTE (or a subquery if you're not using MySQL 8) and select only those tickets that have a status of 2.
with max_statuses as (
select ts1.*
from ticket_statuses ts1
left outer join ticket_statuses ts2
on ts1.ticket_Id = ts2.ticket_Id
and ts1.status_Id < ts2.status_Id
where ts2.ticketStatus_Id is null
)
select ms.*
from max_statuses ms
join ticket_statuses ts
on ms.ticket_id = ts.ticket_id
and ts.status_id = 2;
3 1 3
6 3 2
10 4 4
This approach ensures we select the complete rows with the highest statuses and any extra data they may contain.
dbfiddle
This is basically a "last row per group" problem. You will find some solutions here. My preferred solution would be:
select t.*
from (
select max(ticketStatus_Id) as ticketStatus_Id
from mytable
group by ticket_Id
) tmax
join mytable t using(ticketStatus_Id)
The difference in your question is that you have a condition requiring a specific value within the group. This can be solved with a JOIN within the subquery:
select t.*
from (
select max(t1.ticketStatus_Id) as ticketStatus_Id
from mytable t2
join mytable t1 using(ticket_Id)
where t2.status_Id = 2
group by t2.ticket_Id
) tmax
join mytable t using(ticketStatus_Id)
Result:
| ticketStatus_Id | ticket_Id | status_Id |
| --------------- | --------- | --------- |
| 3 | 1 | 3 |
| 6 | 3 | 2 |
| 10 | 4 | 4 |
View on DB Fiddle
A solution using window functions could be:
select ticketStatus_Id, ticket_Id, status_Id
from (
select *
, row_number() over (partition by ticket_Id order by ticketStatus_Id desc) as rn
, bit_or(status_Id = 2) over (partition by ticket_Id) > 0 as has_status2
from mytable
) x
where has_status2 and rn = 1
A quite expressive way is to use EXISTS and NOT EXISTS subquery conditions:
select t.*
from mytable t
where exists (
select *
from mytable t1
where t1.ticket_Id = t.ticket_Id
and t1.status_Id = 2
)
and not exists (
select *
from mytable t1
where t1.ticket_Id = t.ticket_Id
and t1.ticketStatus_Id > t.ticketStatus_Id
)
SELECT a.*
FROM t a
JOIN
(
SELECT ticket_id, MAX(status_id) max_status_id
FROM t
WHERE status_id >= 2
GROUP BY ticket_id
) b
ON a.ticket_id = b.ticket_id
AND a.status_id = b.max_status_id;
SELECT
MAX(m1.ticketstatus_Id) as ticket_status,
m1.ticket_Id as ticket,
MAX(m1.status_Id) as status
FROM mytable m1
WHERE
m1.ticket_Id in (select m2.ticket_Id from mytable m2 where m2.ticket_Id=m1.ticket_Id and m2.status_Id=2)
GROUP BY m1.ticket_Id

Filling nulls with average between neighbor values with restriction on another column

I have a table with column names "id", "time", "value"
and when "value" is null, I want it to be average between nearest neighbors by "time" column on that id
My problem is exactly that described here select nearest neighbours, but the answer doesn't explain how can I find nearest neighbors with a restriction on another column (id should be the same)
Example:
in second row "value" is missing
id | time | value
-------------------------
11111 | 1 | 5.0
11111 | 10 |
22222 | 7 | 32.6
33333 | 11 | 15.88
11111 | 15 | 20.0
and I want it to be:
id | time | value
-------------------------
11111 | 1 | 5.0
11111 | 10 | 12.5*
22222 | 7 | 32.6
33333 | 11 | 15.88
11111 | 15 | 20.0
as (20.0 + 5.0) / 2 = 12.5
How can it be obtained in MySQL?
Assuming that time defines the order and is unique (a unique column and one that defines the order is necessary for this), one method is to use subqueries getting the top (bottom) value of the records with a smaller (larger) time using ORDER BY and LIMIT.
SELECT t1.id,
t1.time,
coalesce(t1.value,
((SELECT t2.value
FROM elbat t2
WHERE t2.id = t1.id
AND t2.time < t1.time
ORDER BY t2.time DESC
LIMIT 1)
+
(SELECT t2.value
FROM elbat t2
WHERE t2.id = t1.id
AND t2.time > t1.time
ORDER BY t2.time ASC
LIMIT 1)
)
/
2) value
FROM elbat t1;
db<>fiddle
But this can only fill gaps that are one row wide. If there can be larger gaps, you would have to define what the nearest non-null neighbours of those rows are.
Just self-join, but take care of rows that have no NEXT_VALUE:
SELECT ID_,
TIME_,
CASE
WHEN VALUE_ IS NULL THEN (LAST_VALUE + NEXT_VALUE) / 2
ELSE VALUE_
END AS REAL_VALUE
FROM (SELECT ROW_NUMBER () OVER (PARTITION BY ID_ ORDER BY TIME_ DESC)
NOW_ROW_NUM,
ID_,
TIME_,
VALUE_
FROM TESTTABLE)
LEFT JOIN (SELECT (ROW_NUMBER ()
OVER (PARTITION BY ID_ ORDER BY TIME_ DESC))
- 1
LAST_ROW_NUM,
ID_ AS LAST_ID,
VALUE_ AS LAST_VALUE
FROM TESTTABLE)
ON ID_ = LAST_ID AND NOW_ROW_NUM = LAST_ROW_NUM
LEFT JOIN (SELECT (ROW_NUMBER ()
OVER (PARTITION BY ID_ ORDER BY TIME_ DESC))
+ 1
NEXT_ROW_NUM,
ID_ AS NEXT_ID,
VALUE_ AS NEXT_VALUE
FROM TESTTABLE)
ON ID_ = NEXT_ID AND NOW_ROW_NUM = NEXT_ROW_NUM
Just use lead() and lag(). The simplest answer is:
select t.*,
(case when value is null
then ( lag(value) over (partition by id order by time) + lead(value) over (partition by id order by time) ) / 2
else value
end) as new_value
from t;
This does not work for the first or last values. You can instead use:
select t.*,
(case when value is null
then avg(value) over (partition by id order by time rows between 1 preceding and 1 following)
else value
end) as new_value
from t;
This calculates the average based on available data in the preceding and succeeding rows.
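To see the windowed avg() fill the gap from the question, here is a minimal sketch using Python's sqlite3 as a stand-in for MySQL 8 (an assumption; SQLite 3.25+ is required). The table name t and the columns id, time and value follow the answer; the helper name fill_nulls is made up for illustration.

```python
import sqlite3

# Sample data from the question: (id, time, value); None is the missing value.
ROWS = [(11111, 1, 5.0), (11111, 10, None), (22222, 7, 32.6),
        (33333, 11, 15.88), (11111, 15, 20.0)]

# avg() skips NULLs, so for a NULL row the frame
# "1 preceding .. 1 following" averages only its available neighbours.
QUERY = """
select id, time,
       case when value is null
            then avg(value) over (partition by id order by time
                                  rows between 1 preceding and 1 following)
            else value
       end as new_value
from t
order by id, time
"""

def fill_nulls(rows):
    """Return (id, time, new_value) with single-row gaps filled per id."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
    try:
        conn.execute("create table t (id integer, time integer, value real)")
        conn.executemany("insert into t values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

for row in fill_nulls(ROWS):
    print(row)
```

If the missing value is the first or last row for an id, the frame contains only one neighbour and that neighbour's value is used as is; if both neighbours are also NULL, the result stays NULL.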

Select duplicates while concatenating every one except the first

I am trying to write a query that selects all of the numbers in my table, but for numbers that have duplicates I want to append something to the end that marks them as duplicates. However, I am not sure how to do this.
Here is an example of the table
TableA
ID Number
1 1
2 2
3 2
4 3
5 4
SELECT statement output would be like this.
Number
1
2
2-dup
3
4
Any insight on this would be appreciated.
If your MySQL version doesn't support window functions, you can use a correlated subquery to compute a row number, then use CASE WHEN to mark rows where rn > 1 as duplicates.
create table T (ID int, Number int);
INSERT INTO T VALUES (1,1);
INSERT INTO T VALUES (2,2);
INSERT INTO T VALUES (3,2);
INSERT INTO T VALUES (4,3);
INSERT INTO T VALUES (5,4);
Query 1:
select t1.id,
(CASE WHEN rn > 1 then CONCAT(Number,'-dup') ELSE Number END) Number
from (
SELECT *,(SELECT COUNT(*)
FROM T tt
where tt.Number = t1.Number and tt.id <= t1.id
) rn
FROM T t1
)t1
Results:
| id | Number |
|----|--------|
| 1 | 1 |
| 2 | 2 |
| 3 | 2-dup |
| 4 | 3 |
| 5 | 4 |
If you can use window functions, row_number() partitioned by Number gives you the same row number directly.
select t1.id,
(CASE WHEN rn > 1 then CONCAT(Number,'-dup') ELSE Number END) Number
from (
SELECT *,row_number() over(partition by Number order by id) rn
FROM T t1
)t1
sqlfiddle
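The row_number() variant can be sketched with Python's sqlite3 as a stand-in for MySQL 8 (an assumption). Two things differ from the MySQL query and are adaptations for SQLite: the || operator replaces CONCAT, and both CASE branches are cast to text so the column has one type. The helper name mark_duplicates is hypothetical.

```python
import sqlite3

# Sample data from the question: (ID, Number)
ROWS = [(1, 1), (2, 2), (3, 2), (4, 3), (5, 4)]

# Number the rows within each Number by ID; every row after the first
# in its group gets the '-dup' suffix.
QUERY = """
select id,
       case when rn > 1 then cast(Number as text) || '-dup'
            else cast(Number as text)
       end as Number
from (
    select T.*, row_number() over (partition by Number order by ID) as rn
    from T
) numbered
order by id
"""

def mark_duplicates(rows):
    """Return (id, Number) with '-dup' appended to repeated numbers."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
    try:
        conn.execute("create table T (ID integer, Number integer)")
        conn.executemany("insert into T values (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

for row in mark_duplicates(ROWS):
    print(row)
```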
I made a list of all the IDs that weren't dups (left join select) and then compared them to the entire list(case when):
select
case when a.id <> b.min_id then cast(a.Number as varchar(6)) + '-dup' else cast(a.Number as varchar(6)) end as Number
from table_a a
left join (select MIN(b.id) min_id, Number from table_a b group by b.number)b on b.number = a.number
I did this in MS SQL 2016, hope it works for you.
This populates the table used:
insert into table_a (ID, Number)
select 1,1
union all
select 2,2
union all
select 3,2
union all
select 4,3
union all
select 5,4

Mysql derived table

Suppose I have a MySQL table named table with the fields
rank
date
id
The values are like:
10, 2012-01-01, 3
9, 2012-01-04, 3
5, 2012-01-07, 3
3, 2012-01-10, 3
10, 2012-01-01, 4
6, 2012-01-04, 4
7, 2012-01-07, 4
In a single SQL statement, how can I get both the last and the first values, sorted by date and grouped by id?
I know how to get first one or last one
SELECT rank, id FROM
(SELECT rank, id FROM table ORDER BY date DESC) s GROUP BY id;
I would like the returned fields to be something like: lastrank, firstrank and id.
Thank you
Try this:
select id,
max(if(MyOrder = 1, rank, null)) as FirstRank,
max(if(MyOrder = 2, rank, null)) as LastRank
from (
select t1.id, t1.rank, 1 MyOrder from t t1
left join t t2 on
t1.id = t2.id and t1.date > t2.date
where t2.date is null
union
select t1.id, t1.rank, 2 from t t1
left join t t2 on
t1.id = t2.id and t1.date < t2.date
where t2.date is null
) s
group by id
The result of this query, taking your sample data as input, is:
+----+-----------+----------+
| ID | FIRSTRANK | LASTRANK |
+----+-----------+----------+
| 3 | 10 | 3 |
| 4 | 10 | 7 |
+----+-----------+----------+
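The anti-join approach above can be reproduced with the sketch below, which uses Python's sqlite3 as a stand-in for MySQL (an assumption). SQLite has no IF() function in all versions, so the conditional aggregation is written with CASE; the table name t and the alias MyOrder follow the answer, while the helper name first_and_last_rank is made up for illustration.

```python
import sqlite3

# Sample data from the question: (rank, date, id)
ROWS = [(10, "2012-01-01", 3), (9, "2012-01-04", 3), (5, "2012-01-07", 3),
        (3, "2012-01-10", 3), (10, "2012-01-01", 4), (6, "2012-01-04", 4),
        (7, "2012-01-07", 4)]

# A row is the first (last) of its id when no row with the same id
# has an earlier (later) date, i.e. the left join finds no match.
QUERY = """
select id,
       max(case when MyOrder = 1 then rank end) as FirstRank,
       max(case when MyOrder = 2 then rank end) as LastRank
from (
    select t1.id, t1.rank, 1 as MyOrder
    from t t1
    left join t t2 on t1.id = t2.id and t1.date > t2.date
    where t2.date is null
    union
    select t1.id, t1.rank, 2
    from t t1
    left join t t2 on t1.id = t2.id and t1.date < t2.date
    where t2.date is null
) s
group by id
order by id
"""

def first_and_last_rank(rows):
    """Return (id, FirstRank, LastRank) ordered by id."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
    try:
        conn.execute("create table t (rank integer, date text, id integer)")
        conn.executemany("insert into t values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

print(first_and_last_rank(ROWS))
```

The dates are stored as ISO-8601 text here, so plain string comparison orders them correctly; with a real DATE column in MySQL the comparison works on the date values directly.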
I'm not fully sure that I understand your question, but I'm going to try to answer anyway.
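SELECT min(rank), max(rank), id
FROM `table`
GROUP BY id;
When grouping, you can use aggregate functions on the results to get specific samples from the groups. Note that min(rank) and max(rank) return the lowest and highest rank per id, which match the first and last rank by date only if rank changes monotonically over time.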
Try this query -
SELECT
t1.*,
IF(t1.date = t2.min_date, 'FIRSTRANK', 'LASTRANK') rank_type
FROM table_rank t1
JOIN (
SELECT id, MAX(date) max_date, MIN(date) min_date FROM table_rank GROUP BY id
) t2
ON t1.id = t2.id AND (t1.date = t2.min_date OR t1.date = t2.max_date)
Use GROUP_CONCAT(rank ORDER BY date) to build a date-ordered list of ranks per id, then use SUBSTRING_INDEX to extract its first and last elements.