Is there an easy way avoiding the usage of cursors to convert this:
+-------+------+-------+
| Group | From | Until |
+-------+------+-------+
| X | 1 | 3 |
+-------+------+-------+
| X | 2 | 4 |
+-------+------+-------+
| Y | 5 | 7 |
+-------+------+-------+
| X | 8 | 10 |
+-------+------+-------+
| Y | 11 | 12 |
+-------+------+-------+
| Y | 12 | 13 |
+-------+------+-------+
Into this:
+-------+------+-------+
| Group | From | Until |
+-------+------+-------+
| X | 1 | 4 |
+-------+------+-------+
| Y | 5 | 7 |
+-------+------+-------+
| X | 8 | 10 |
+-------+------+-------+
| Y | 11 | 13 |
+-------+------+-------+
So far I've tried to assign an ID to each row and GROUP BY that ID, but I can't get any closer without using cursors.
SELECT `Group`, `From`, `Until`
FROM ( SELECT `Group`, `From`, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY `From`) rn
FROM test t1
WHERE NOT EXISTS ( SELECT NULL
FROM test t2
WHERE t1.`From` > t2.`From`
AND t1.`From` <= t2.`Until`
AND t1.`Group` = t2.`Group` ) ) t3
JOIN ( SELECT `Group`, `Until`, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY `From`) rn
FROM test t1
WHERE NOT EXISTS ( SELECT NULL
FROM test t2
WHERE t1.`Until` >= t2.`From`
AND t1.`Until` < t2.`Until`
AND t1.`Group` = t2.`Group` ) ) t4 USING (`Group`, rn)
fiddle
Must work at any overlapping type (partially overlapped, adjacent, fully included).
Will not work if From and/or Until is NULL.
Could you add an explanation in English? – ysth
1st subquery searches joined ranges starts (see the fiddle - it is executed separately) - it searches for From value in a group which is not in the middle/end of any other range (start point equiality allowed).
2nd subquery do the same for joined ranges Until.
Both additionally enumerates found values ascending.
Outer query simply joins each range start and its finish into one row.
If you are using MYSQL version 8+ then you can use row_number to get the desired result:
Demo
SELECT MIN(`FROM`) START,
MAX(`UNTIL`) END,
`GROUP` FROM (
SELECT A.*,
ROW_NUMBER() OVER(ORDER BY `FROM`) RN_FROM,
ROW_NUMBER() OVER(PARTITION BY `GROUP` ORDER BY `UNTIL`) RN_UNTIL
FROM Table_lag A) X
GROUP BY `GROUP`, (RN_FROM - RN_UNTIL)
ORDER BY START;
You can do this with window functions only, using some gaps-and-island technique.
The idea is to build group of consecutive record having the same group and overlapping ranges, using lag() and a window sum(). You can then aggregate the groups:
select grp, min(c_from) c_from, max(c_until) c_until
from (
select
t.*,
sum(lag_c_until < c_from) over(partition by grp order by c_from) mygrp
from (
select
t.*,
lag(c_until, 1, c_until) over(partition by grp order by c_from) lag_c_until
from mytable t
) t
) t
group by grp, mygrp
The column names you chose conflict with SQL keywords (group, from), so I renamed them to grp, c_from and c_until.
Demo on DB Fiddle - with credits to ysth for creating the fiddle in the first place:
grp | c_from | c_until
:-- | -----: | ------:
X | 1 | 4
Y | 5 | 7
X | 8 | 10
Y | 11 | 13
I would use a recursive CTE for this:
with recursive intervals (`Group`, `From`, `Until`) as (
select distinct t1.Group, t1.From, t1.Until
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.Group=t2.Group
and t1.From between t2.From and t2.Until+1
and (t1.From,t1.Until) <> (t2.From,t2.Until)
)
union all
select t1.Group, t1.From, t2.Until
from intervals t1
join Table_lag t2
on t2.Group=t1.Group
and t2.From between t1.From and t1.Until+1
and t2.Until > t1.Until
)
select `Group`, `From`, max(`Until`) as Until
from intervals
group by `Group`, `From`
order by `From`, `Group`;
The anchor expression (select .. where not exists (...)) finds all the group & from that won't combine with some earlier from (so has one row for each row in our eventual output):
Then the recursive query adds rows for merged intervals for each of our rows.
Then just group by group and from (those are awful column names) to get the biggest
interval for each starting group/from.
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=9efa508504b80e44b73c952572394b76
Alternatively, you can do it with a straightforward set of joins and subqueries, with no CTE or window functions needed:
select
interval_start_range.grp,
interval_start_range.start,
max(merged.finish) finish
from (
select
interval_start.grp,
interval_start.start,
min(later_interval_start.start) next_start
from (
select distinct t1.grp, t1.start, t1.finish
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.grp=t2.grp
and t1.start between t2.start and t2.finish+1
and (t1.start,t1.finish) <> (t2.start,t2.finish)
)
) interval_start
left join (
select distinct t1.grp, t1.start, t1.finish
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.grp=t2.grp
and t1.start between t2.start and t2.finish+1
and (t1.start,t1.finish) <> (t2.start,t2.finish)
)
) later_interval_start
on interval_start.grp=later_interval_start.grp
and interval_start.start < later_interval_start.start
group by interval_start.grp, interval_start.start
) as interval_start_range
join Table_lag merged
on merged.grp=interval_start_range.grp
and merged.start >= interval_start_range.start
and (interval_start_range.next_start is null or merged.start < interval_start_range.next_start)
group by interval_start_range.grp, interval_start_range.start
order by interval_start_range.start, interval_start_range.grp
(I have renamed the columns here to not need backticks.)
Here there's a select to get all the starts of the reportable intervals we will report, joined to another similar select (you could use a CTE to avoid the redundancy) to find the following start of a reportable interval for the same group (if there is one). That's wrapped in a subquery to get the group, the start value, and the start value of the following reportable interval. Then it just needs to join all the other records that start within that range and pick the maximum ending value.
https://dbfiddle.uk/?rdbms=mysql_5.5&fiddle=151cc933489c299f7beefa99e1959549
Related
I have a table with column names "id", "time", "value"
and when "value" is null, I want it to be average between nearest neighbors by "time" column on that id
My problem is exactly that described here select nearest neighbours, but the answer doesn't explain how can I find nearest neighbors with a restriction on another column (id should be the same)
Example:
in second row "value" is missing
id | time | value
-------------------------
11111 | 1 | 5.0
11111 | 10 |
22222 | 7 | 32.6
33333 | 11 | 15.88
11111 | 15 | 20.0
and I want it to be:
id | time | value
-------------------------
11111 | 1 | 5.0
11111 | 10 | 12.5*
22222 | 7 | 32.6
33333 | 11 | 15.88
11111 | 15 | 20.0
as (20.0 + 5.0) / 2 = 12.5
How can it be obtained in MySQL?
Assuming that time defines the order and is unique (a unique column and one that defines the order is necessary for this), one method is to use subqueries getting the top (bottom) value of the records with a smaller (larger) time using ORDER BY and LIMIT.
SELECT t1.id,
t1.time,
coalesce(t1.value,
((SELECT t2.value
FROM elbat t2
WHERE t2.id = t1.id
AND t2.time < t1.time
ORDER BY t2.time DESC
LIMIT 1)
+
(SELECT t2.value
FROM elbat t2
WHERE t2.id = t1.id
AND t2.time > t1.time
ORDER BY t2.time ASC
LIMIT 1)
)
/
2) value
FROM elbat t1;
db<>fiddle
But this only can fill gaps one row wide. If there can be larger gaps you'd have to define what are the next non null neighbours of these rows.
just join self, but be care for no NEXT_VALUE
SELECT ID_,
TIME_,
CASE
WHEN VALUE_ IS NULL THEN (LAST_VALUE + NEXT_VALUE) / 2
ELSE VALUE_
END AS REAL_VALUE
FROM (SELECT ROW_NUMBER () OVER (PARTITION BY ID_ ORDER BY TIME_ DESC)
NOW_ROW_NUM,
ID_,
TIME_,
VALUE_
FROM TESTTABLE)
LEFT JOIN (SELECT (ROW_NUMBER ()
OVER (PARTITION BY ID_ ORDER BY TIME_ DESC))
- 1
LAST_ROW_NUM,
ID_ AS LAST_ID,
VALUE_ AS LAST_VALUE
FROM TESTTABLE)
ON ID_ = LAST_ID AND NOW_ROW_NUM = LAST_ROW_NUM
LEFT JOIN (SELECT (ROW_NUMBER ()
OVER (PARTITION BY ID_ ORDER BY TIME_ DESC))
+ 1
NEXT_ROW_NUM,
ID_ AS NEXT_ID,
VALUE_ AS NEXT_VALUE
FROM TESTTABLE)
ON ID_ = LAST_ID AND NOW_ROW_NUM = NEXT_ROW_NUM
Just use lead() and lag(). The simplest answer is:
selet t.*
(case when value is null
then ( lag(value) over (partition by id order by time) + lead(value) over (partition by id order by time) ) / 2
else value
end) as new_value
from t;
This does not work for the first or last values. You can instead use:
selet t.*
(case when value is null
then ( avg(value) over (partition by id order by time rows between 1 preceding and 1 following)
else value
end) as new_value
from t;
This calculates the average based on available data in the preceding and succeeding rows.
I am trying to write a query that will select all of the numbers in my table, but those numbers with duplicates i want to append something on the end that shows it as a duplicate. However I am not sure how to do this.
Here is an example of the table
TableA
ID Number
1 1
2 2
3 2
4 3
5 4
SELECT statement output would be like this.
Number
1
2
2-dup
3
4
Any insight on this would be appreciated.
if you mysql version didn't support window function. you can try to write a subquery to make row_number then use CASE WHEN to judgement rn > 1 then mark dup.
create table T (ID int, Number int);
INSERT INTO T VALUES (1,1);
INSERT INTO T VALUES (2,2);
INSERT INTO T VALUES (3,2);
INSERT INTO T VALUES (4,3);
INSERT INTO T VALUES (5,4);
Query 1:
select t1.id,
(CASE WHEN rn > 1 then CONCAT(Number,'-dup') ELSE Number END) Number
from (
SELECT *,(SELECT COUNT(*)
FROM T tt
where tt.Number = t1.Number and tt.id <= t1.id
) rn
FROM T t1
)t1
Results:
| id | Number |
|----|--------|
| 1 | 1 |
| 2 | 2 |
| 3 | 2-dup |
| 4 | 3 |
| 5 | 4 |
If you can use window function you can use row_number with window function to make rownumber by Number.
select t1.id,
(CASE WHEN rn > 1 then CONCAT(Number,'-dup') ELSE Number END) Number
from (
SELECT *,row_number() over(partition by Number order by id) rn
FROM T t1
)t1
sqlfiddle
I made a list of all the IDs that weren't dups (left join select) and then compared them to the entire list(case when):
select
case when a.id <> b.min_id then cast(a.Number as varchar(6)) + '-dup' else cast(a.Number as varchar(6)) end as Number
from table_a
left join (select MIN(b.id) min_id, Number from table_a b group by b.number)b on b.number = a.number
I did this in MS SQL 2016, hope it works for you.
This creates the table used:
insert into table_a (ID, Number)
select 1,1
union all
select 2,2
union all
select 3,2
union all
select 4,3
union all
select 5,4
I have a table similar to the following in my database
+----+----+----+---------------------+
| id | a | b | date_created |
+----+----+----+---------------------+
| 1 | 22 | 33 | 2014-07-31 14:38:17 |
| 2 | 11 | 9 | 2014-07-30 14:40:19 |
| 3 | 8 | 4 | 2014-07-29 14:40:34 |
+----+----+----+---------------------+
I'm trying to write a query that subtracts sum(b) from each a. However, the values of b included in sum(b) should be only those that are earlier than (or the same time as) the a they are being subtracted from. In other words, the results returned by the query should be those shown below
22 - (33 + 9 + 4)
11 - (9 + 4)
8 - (4)
is it possible to calculate this in a single query?
select id, a, a - (select sum(b)
from My_TABLE T2
where T2.date_created <= T1.date_created)
from MY_TABLE T1;
Something like this should work:
select t1.a - ifnull( sum(t2.b), 0)
from myTable t1
left outer join myTable t2 on t2.date_created <= t1.date_created
group by t1.a
Note that the table is joined to itself to access two different sets of information.
Edit:
I think you probably want to group by the date_created like:
select t1.date_created, t1.a - ifnull( sum(t2.b), 0)
from myTable t1
left outer join myTable t2 on t2.date_created <= t1.date_created
group by t1.date_created, t1.a
SELECT x.*
, x.a - SUM(y.b)
FROM my_table x
JOIN my_table y
ON y.date_created <= x.date_created
GROUP
BY x.id;
Another alternative
SQL Fiddle Example
SELECT
id,
a - (#total := #total + b) as Total
FROM
(SELECT *, #total:=0
FROM my_table
ORDER BY date_created asc) AS Base
Suppose a table, tableX, like this:
| date | hours |
| 2014-07-02 | 10 |
| 2014-07-03 | 10 |
| 2014-07-07 | 20 |
| 2014-07-08 | 40 |
The dates are 'workdays' -- that is, no weekends or holidays.
I want to find the increase in hours between consecutive workdays, like this:
| date | hours |
| 2014-07-03 | 0 |
| 2014-07-07 | 10 |
| 2014-07-08 | 20 |
The challenge is dealing with the gaps. If there were no gaps, something like
SELECT t1.date1 AS 'first day', t2.date1 AS 'second day', (t2.hours - t1.hours)
FROM tableX t1
LEFT JOIN tableX t2 ON t2.date1 = DATE_add(t1.date1, INTERVAL 1 DAY)
ORDER BY t2.date1;
would get it done, but that doesn't work in this case as there is a gap between 2014-07-03 and 2014-07-07.
Just use a correlated subquery instead. You have two fields, so you can do this with two correlated subqueries, or a correlated subquery with a join back to the table. Here is the first version:
SELECT t1.date1 as `first day`,
(select t2.date1
from tableX t2
where t2.date1 > t.date1
order by t2.date asc
limit 1
) as `next day`,
(select t2.hours
from tableX t2
where t2.date1 > t.date1
order by t2.date asc
limit 1
) - t.hours
FROM tableX t
ORDER BY t.date1;
Another alternative is to rank the data by date and then subtract the hours of the previous workday's date from the hours of the current workday's date.
SELECT
ranked_t1.date1 date,
ranked_t1.hours - ranked_t2.hours hours
FROM
(
SELECT t.*,
#rownum := #rownum + 1 AS rank
FROM (SELECT * FROM tableX ORDER BY date1) t,
(SELECT #rownum := 0) r
) ranked_t1
INNER JOIN
(
SELECT t.*,
#rownum2 := #rownum2 + 1 AS rank
FROM (SELECT * FROM tableX ORDER BY date1) t,
(SELECT #rownum2 := 0) r
) ranked_t2
ON ranked_t2.rank = ranked_t1.rank - 1;
SQL Fiddle demo
Note:
Obviously an index on tableX.date1 would speed up the query.
Instead of a correlated subquery, a join is used in the above query.
Reference:
Mysql rank function on SO
Unfortunately, MySQL doesn't (yet) have analytic functions which would allow you to access the "previous row" or the "next row" of the data stream. However, you can duplicate it with this:
select h2.LogDate, h2.Hours - h1.Hours as Added_Hours
from Hours h1
left join Hours h2
on h2.LogDate =(
select Min( LogDate )
from Hours
where LogDate > h1.LogDate )
where h2.LogDate is not null;
Check it out here. Note the index on the date field. If that field is not indexed, this query will take forever.
Is it possible to select the next lower number from a table without using limit.
Eg: If my table had 10, 3, 2 , 1 I'm trying to select * from table where col > 10.
The result I'm expecting is 3. I know I can use limit 1, but can it be done without that?
Try
SELECT MAX(no) no
FROM table1
WHERE no < 10
Output:
| NO |
------
| 3 |
SQLFiddle
Try this query
SELECT
*
FROM
(SELECT
#rid:=#rid+1 as rId,
a.*
FROM
tbl a
JOIN
(SELECT #rid:=0) b
ORDER BY
id DESC)tmp
WHERE rId=2;
SQL FIDDLE:
| RID | ID | TYPE | DETAILS |
------------------------------------
| 2 | 28 | Twitter | #sqlfiddle5 |
Another approach
select a.* from supportContacts a inner join
(select max(id) as id
from supportContacts
where
id in (select id from supportContacts where id not in
(select max(id) from supportContacts)))b
on a.id=b.id
SQL FIDDLE:
| ID | TYPE | DETAILS |
------------------------------
| 28 | Twitter | #sqlfiddle5 |
Alternatively, this query will always get the second highest number based on the inner where clause.
SELECT *
FROM
(
SELECT t.col,
(
SELECT COUNT(distinct t2.col)
FROM tableName t2
WHERE t2.col >= t.col
) as rank
FROM tablename t
WHERE col <= 10
) xx
WHERE rank = 2 -- <<== means second highest
SQLFiddle Demo
SQLFiddle Demo (supports duplicate values)
If you want to get next lower number from table
you can get it with this query:
SELECT distinct col FROM table1 a
WHERE 2 = (SELECT count(DISTINCT(b.col)) FROM table1 b WHERE a.col >= b.col);
later again if you want to get third lower number you can just pass 3 in place of 2 in where clause
again if you want to get second higher number, just change the condition of where clause in inner query with
a.col <= b.col