MySQL / Hive: Join conditioned rows using windowing or analytic functions

I have two tables which I want to join with a specific logic.
Table_1 ( S_No, ID, Date1, Date2 )
S_No ID Date1 Date2
1 id1 2014-05-01 2014-07-03
2 id1 2015-03-23 2016-06-18
3 id1 2016-06-21 2016-07-29
Table_2 ( S_No_New, ID_New, Date_New )
S_No_New ID_New Date_New
2_1 id1 2014-04-25
2_2 id1 2014-06-14
2_3 id1 2015-01-10
2_4 id1 2015-02-15
2_5 id1 2015-05-17
2_6 id1 2016-04-24
2_7 id1 2016-06-19
2_8 id1 2016-06-25
2_9 id1 2016-07-11
2_10 id1 2016-08-11
2_11 id1 2016-08-16
I want to join the two tables so that, for each row of Table_1, I get a count of the Table_2 rows that fall before Date1 and a count of those that fall between Date1 and Date2. When we move to the next row, only the Table_2 rows that have not been counted so far for the same ID should be used.
If Table_2 has a date entry after the last Date2 entry in Table_1, we need to append a new row with S_No incremented by 1 and the remaining columns filled in as shown in the final output below.
Overall this problem can be split into two parts :
1) Getting the counts column
2) Adding up the extra rows ( S_No "4" in this example )
Please drop an answer if you know solution to either of the two.
Final output :
S_No ID Date1 Date2 Count_pre Count_Between
1 id1 2014-05-01 2014-07-03 1 1
2 id1 2015-03-23 2016-06-18 2 2
3 id1 2016-06-21 2016-07-29 1 2
4 id1 NULL NULL 2 0
Logic :
S_No 1 :
Count_Pre = Dates before 2014-05-01
Count_between = Dates between 2014-05-01 and 2014-07-03
S_No 2 :
Count_Pre = Dates between 2014-07-03 and 2015-03-23
Count_between = Dates between 2015-03-23 and 2016-06-18
and so on
Intermediate table has to look something like this:
S_No ID Date Date2 S_No_New Date_New
1 id1 2014-05-01 2014-07-03 2_1 2014-04-25
1 id1 2014-05-01 2014-07-03 2_2 2014-06-14
2 id1 2015-03-23 2016-06-18 2_3 2015-01-10
2 id1 2015-03-23 2016-06-18 2_4 2015-02-15
2 id1 2015-03-23 2016-06-18 2_5 2015-05-17
2 id1 2015-03-23 2016-06-18 2_6 2016-04-24
3 id1 2016-06-21 2016-07-29 2_7 2016-06-19
3 id1 2016-06-21 2016-07-29 2_8 2016-06-25
3 id1 2016-06-21 2016-07-29 2_9 2016-07-11
4 id1 NULL NULL 2_10 2016-08-11
4 id1 NULL NULL 2_11 2016-08-16
I tried different windowing and analytic functions but couldn't solve this problem.
Is it possible to do this kind of join in Hive (basic SQL)?
NOTE : EDIT 2 : I need to implement this in Hive, which supports all the built-in functions but not MySQL user variables. It supports aggregate, windowing and analytic functions.
EDIT : Changed the date format from dd/mm/yyyy to yyyy-mm-dd

SELECT t.t1s_no,t.date1,t.date2,
sum(case when t.srce = 'P' then 1 else 0 end) as 'prev',
sum(case when t.srce = 'B' then 1 else 0 end) as 'between',
sum(case when t.srce = 'X' then 1 else 0 end) as 'missing'
FROM
(
SELECT S.*,
ROW_NUMBER() OVER (PARTITION BY S.DATE_NEW ORDER BY s.srce ,S.DATE1) RN
FROM
(SELECT 'P' AS SRCE,T1.S_NO T1S_NO,T1.ID T1ID,T1.DATE1 DATE1,T1.DATE2 DATE2,T2.DATE_NEW
FROM TABLE_1 T1
JOIN TABLE_2 T2 ON T2.DATE_NEW < T1.DATE1
UNION
SELECT 'B' AS SRCE,T1.S_NO T1S_NO,T1.ID T1ID,T1.DATE1 DATE1,T1.DATE2 DATE2,T2.DATE_NEW
FROM TABLE_1 T1
JOIN TABLE_2 T2 ON T2.DATE_NEW BETWEEN T1.DATE1 AND T1.DATE2
UNION
SELECT 'X' AS SRCE,4 T1S_NO,T1.ID T1ID,T1.DATE1 DATE1,T1.DATE2 DATE2,T2.DATE_NEW
FROM TABLE_2 T2
left JOIN TABLE_1 T1 ON (T2.DATE_NEW BETWEEN T1.DATE1 AND T1.DATE2) or (t2.date_new < t1.date1)
where t1.date1 is null
) S
) T
WHERE T.RN = 1
group by t.t1s_no,t.date1,t.date2
ORDER BY T.T1S_NO, T.DATE1
;
Result
t1s_no date1 date2 prev between missing
----------- ---------------- ---------------- ----------- ----------- -----------
1 2014-05-01 2014-07-03 1 1 0
2 2015-03-23 2016-06-18 2 2 0
3 2016-06-21 2016-07-29 1 2 0
4 NULL NULL 0 0 2
(4 rows affected)
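For anyone who wants to check the expected counts from the question, here is a minimal sketch that runs against an in-memory SQLite database through Python's sqlite3 module. It is not the query above: instead of ROW_NUMBER over a UNION, it assigns each Table_2 row to the lowest S_No whose Date2 has not yet passed, using MIN over a LEFT JOIN. It assumes S_No increases with Date1 within an ID, matches on ID as well as date, and reports leftover rows under Count_pre as in the question's final output. The names hit, last_no and max_no are made up for the illustration, and porting this to Hive or MySQL is untested.

```python
import sqlite3

def count_pre_and_between():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Table_1 (S_No INTEGER, ID TEXT, Date1 TEXT, Date2 TEXT);
        CREATE TABLE Table_2 (S_No_New TEXT, ID_New TEXT, Date_New TEXT);
        INSERT INTO Table_1 VALUES
            (1, 'id1', '2014-05-01', '2014-07-03'),
            (2, 'id1', '2015-03-23', '2016-06-18'),
            (3, 'id1', '2016-06-21', '2016-07-29');
        INSERT INTO Table_2 VALUES
            ('2_1', 'id1', '2014-04-25'), ('2_2', 'id1', '2014-06-14'),
            ('2_3', 'id1', '2015-01-10'), ('2_4', 'id1', '2015-02-15'),
            ('2_5', 'id1', '2015-05-17'), ('2_6', 'id1', '2016-04-24'),
            ('2_7', 'id1', '2016-06-19'), ('2_8', 'id1', '2016-06-25'),
            ('2_9', 'id1', '2016-07-11'), ('2_10', 'id1', '2016-08-11'),
            ('2_11', 'id1', '2016-08-16');
    """)
    query = """
        WITH assigned AS (
            -- hit = first interval (lowest S_No) that ends on or after Date_New;
            -- NULL when the date lies after every Date2 for that ID
            SELECT t2.ID_New AS ID, t2.S_No_New, t2.Date_New,
                   MIN(t1.S_No) AS hit
            FROM Table_2 t2
            LEFT JOIN Table_1 t1
                   ON t1.ID = t2.ID_New AND t2.Date_New <= t1.Date2
            GROUP BY t2.ID_New, t2.S_No_New, t2.Date_New
        ),
        last_no AS (
            SELECT ID, MAX(S_No) AS max_no FROM Table_1 GROUP BY ID
        )
        SELECT COALESCE(a.hit, l.max_no + 1) AS S_No,
               a.ID, t1.Date1, t1.Date2,
               SUM(CASE WHEN a.hit IS NULL OR a.Date_New < t1.Date1
                        THEN 1 ELSE 0 END) AS Count_pre,
               SUM(CASE WHEN a.Date_New >= t1.Date1
                        THEN 1 ELSE 0 END) AS Count_Between
        FROM assigned a
        JOIN last_no l ON l.ID = a.ID
        LEFT JOIN Table_1 t1 ON t1.ID = a.ID AND t1.S_No = a.hit
        GROUP BY COALESCE(a.hit, l.max_no + 1), a.ID, t1.Date1, t1.Date2
        ORDER BY 1
    """
    return con.execute(query).fetchall()

if __name__ == "__main__":
    for row in count_pre_and_between():
        print(row)
```

The rows come back in the shape of the "Final output" table in the question, with the appended row carrying NULL (None in Python) for Date1 and Date2.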

Related

Daily Sales from Total Sales

I have a database that looks like this:
ID Sale_Date(YYYY-MM-DD) Total_Volume
123 2022-01-01 0
123 2022-01-02 2
123 2022-01-03 5
456 2022-04-06 38
456 2022-04-07 40
456 2022-04-08 45
I want to derive a daily sale column from Total_Volume, which is just the total volume on date x minus the total volume on date x-1, for each ID.
ID Sale_Date(YYYY-MM-DD) Total_Volume Daily_Sale
123 2022-01-01 0 0
123 2022-01-02 2 2
123 2022-01-03 5 3
456 2022-04-06 38 38
456 2022-04-07 40 2
456 2022-04-08 45 5
My initial attempt used a rank function and a self join, but that didn't turn out correct.
with x as (
select
distinct t1.ID,
t1.Sale_Date,
t1.Total_volume,
rank() over (partition by ID order by Sale_Date) as ranker
from t t1 order by t1.Sale_Date)
select t2.ID, t2.ranker, t2.Sale_date, t1.Total_volume, t1.Total_volume - t2.Total_volume as Daily_sale
from x t1, x t2 where t1.ID = t2.ID and t2.ranker = t1.ranker-1 order by t1.ID;
You should use:
the LAG window function to retrieve the previous "Total_Volume" value within each ID
the COALESCE function to fall back to "Total_Volume" on the first row of each ID, where LAG returns NULL
Subtract the previous Total_Volume from the current Total_Volume; on the first row of each ID the difference is NULL, so COALESCE returns Total_Volume instead.
SELECT *,
COALESCE(`Total_Volume`
-LAG(`Total_Volume`) OVER(PARTITION BY `ID`
ORDER BY `Sale_Date(YYYY-MM-DD)`), `Total_Volume`) AS `Daily_Sale`
FROM tab
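A variant of the same idea, sketched with Python's sqlite3 module so it can be run without a MySQL server (SQLite 3.25+ is needed for window functions). It passes a default of 0 as the third argument of LAG instead of wrapping the subtraction in COALESCE. The column is called Sale_Date here rather than `Sale_Date(YYYY-MM-DD)`, which is an assumption made to keep the example short.

```python
import sqlite3

def daily_sales():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tab (ID INTEGER, Sale_Date TEXT, Total_Volume INTEGER);
        INSERT INTO tab VALUES
            (123, '2022-01-01', 0), (123, '2022-01-02', 2), (123, '2022-01-03', 5),
            (456, '2022-04-06', 38), (456, '2022-04-07', 40), (456, '2022-04-08', 45);
    """)
    query = """
        SELECT ID, Sale_Date, Total_Volume,
               -- LAG(expr, 1, 0): previous row's value, or 0 on the first row of an ID
               Total_Volume - LAG(Total_Volume, 1, 0)
                   OVER (PARTITION BY ID ORDER BY Sale_Date) AS Daily_Sale
        FROM tab
        ORDER BY ID, Sale_Date
    """
    return con.execute(query).fetchall()

if __name__ == "__main__":
    for row in daily_sales():
        print(row)
```

Both forms give the same Daily_Sale values for this sample; the default-argument form only differs if Total_Volume itself can be NULL.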

Mysql finding count of previous date of occurences for each record

I have a table to store id, sid with a date time.
id is used as primary key and no meaning in data.
sid is used to identify entity.
eg.
id sid date
--------------------
1 1 2020-01-12
2 2 2020-01-01
3 1 2019-12-31
4 2 2019-12-31
5 1 2019-12-31
6 1 2019-11-01
7 3 2019-11-01
8 3 2018-12-21
9 2 2018-12-21
Then I would like to query for each record, count occurrences in the same table with the previous date of current date, and with the same sid, like:
id sid date previous_count
----------------------------------
1 1 2020-01-12 2
2 2 2020-01-01 1
3 1 2019-12-31 1
4 2 2019-12-31 1
5 1 2019-12-31 1
6 1 2019-11-01 0
7 3 2019-11-01 1
8 3 2018-12-21 0
9 2 2018-12-21 0
Explanation:
for row 1, since sid 1 has two records in 2019-12-31, which is the previous date of 2020-01-12 for sid 1 in the table, it has 2 in previous_count;
while in row 2, since sid 2 has only 1 record in 2019-12-31, which is the previous date of 2020-01-01 for sid 2, it has 1 in previous_count.
Thanks
You are looking for dense_rank() - 1:
select t.*,
(dense_rank() over (partition by sid order by date) - 1) as previous_count
from t
order by id;
In older versions of MySQL, you could use variables or a correlated subquery:
select t.*,
(select count(distinct t2.date)
from t t2
where t2.sid = t.sid and t2.date < t.date
) as previous_count
from t
order by id;
EDIT:
Ahh, I think I may have misunderstood the problem. I think this does what you want:
select t.*, lag(cnt, 1, 0) over (partition by sid order by date) as previous_count
from (select t.*,
count(*) over (partition by sid, date) as cnt
from t
) t
order by id;
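One caveat with LAG over the individual rows: when a sid has several rows on the same date (ids 3 and 5 in the sample), one of those rows has the other as its predecessor, so it picks up the count of its own date. A possible way around this, sketched here with Python's sqlite3 module (SQLite 3.25+), is to count per (sid, date) first, apply LAG to those per-date counts, and join the result back to the rows. The CTE names daily and prev are made up for the illustration.

```python
import sqlite3

def previous_counts():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE t (id INTEGER PRIMARY KEY, sid INTEGER, date TEXT);
        INSERT INTO t VALUES
            (1, 1, '2020-01-12'), (2, 2, '2020-01-01'), (3, 1, '2019-12-31'),
            (4, 2, '2019-12-31'), (5, 1, '2019-12-31'), (6, 1, '2019-11-01'),
            (7, 3, '2019-11-01'), (8, 3, '2018-12-21'), (9, 2, '2018-12-21');
    """)
    query = """
        WITH daily AS (
            -- one row per (sid, date) with the number of records on that date
            SELECT sid, date, COUNT(*) AS cnt
            FROM t
            GROUP BY sid, date
        ),
        prev AS (
            -- count on the previous distinct date for the same sid, 0 if none
            SELECT sid, date,
                   LAG(cnt, 1, 0) OVER (PARTITION BY sid ORDER BY date) AS previous_count
            FROM daily
        )
        SELECT t.id, t.sid, t.date, p.previous_count
        FROM t
        JOIN prev p ON p.sid = t.sid AND p.date = t.date
        ORDER BY t.id
    """
    return con.execute(query).fetchall()

if __name__ == "__main__":
    for row in previous_counts():
        print(row)
```

On the sample data this gives the previous_count column from the question for all nine ids.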

How to get difference based on date in mysql

I have the two tables shown below:
Table1
ID Unique_Value
T-1 OI-45
T-4 OI-45
T-8 OI-45
T-7 OI-46
T-6 OI-49
Table2
ID Date Value
T-1 2018-01-01 15:13:22 10
T-4 2018-03-15 18:10:45 15
T-8 2018-05-12 05:17:43 25
T-7 2018-04-01 15:13:22 12
T-6 2018-06-01 15:13:22 18
I have joined Table2 with Table1 on ID to get the Unique_Value. Grouping by Unique_Value and ordering by Date in descending order, I want to get the difference between the Value of a particular ID and the previous Value.
Required Output would be:
ID Date Value Unique_Value Difference
T-1 2018-01-01 15:13:22 10 OI-45 [Null]
T-4 2018-03-15 18:10:45 15 OI-45 5
T-8 2018-05-12 05:17:43 25 OI-45 10
T-7 2018-04-01 15:13:22 12 OI-46 [Null]
T-6 2018-06-01 15:13:22 18 OI-49 [Null]
I have tried using LEAD and LAG but it didn't work.
You can try the query below using the lag() function; it works on MySQL 8.0+.
select id,Date,value,Unique_Value,case when prevval is null then null else value-prevval end as Difference
from
(
select t1.Id,t1.Unique_Value,t2.Date,t2.value,lag(t2.value,1) over(partition by t1.Unique_Value order by t2.Date) as prevval
from table1 t1 inner join table2 t2 on t1.id=t2.id
)A
For MySQL 5.7 you can try the following:
SET @quot=0, @latest=0, @comp='';
select id, Unique_Value,d,value,case when latest=1 then null else c end as difference
from
(
select id,Unique_Value,d,value,c,IF(@comp<>Unique_Value,1,0) as LATEST,@comp:=Unique_Value as company from
(
select t1.Id,t1.Unique_Value,value,t2.d,value-@quot as c,@quot:=value
from t1 inner join t2 on t1.id=t2.id
order by t1.Unique_Value,t2.d
)A order by Unique_Value,d
)B
OUTPUT:
id d value Unique_Value Difference
T-1 2018-01-01 10 OI-45
T-4 2018-03-15 15 OI-45 5
T-8 2018-05-12 25 OI-45 10
T-7 2018-04-01 12 OI-46
T-6 2018-06-01 18 OI-49
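As a side note on the window-function version: subtracting a NULL already yields NULL in SQL, so the CASE around the difference is not strictly needed. A minimal sketch using Python's sqlite3 module (SQLite 3.25+ assumed, with the table and column names from the question) shows the shorter form.

```python
import sqlite3

def value_differences():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (ID TEXT, Unique_Value TEXT);
        CREATE TABLE table2 (ID TEXT, Date TEXT, Value INTEGER);
        INSERT INTO table1 VALUES
            ('T-1', 'OI-45'), ('T-4', 'OI-45'), ('T-8', 'OI-45'),
            ('T-7', 'OI-46'), ('T-6', 'OI-49');
        INSERT INTO table2 VALUES
            ('T-1', '2018-01-01 15:13:22', 10),
            ('T-4', '2018-03-15 18:10:45', 15),
            ('T-8', '2018-05-12 05:17:43', 25),
            ('T-7', '2018-04-01 15:13:22', 12),
            ('T-6', '2018-06-01 15:13:22', 18);
    """)
    query = """
        SELECT t2.ID, t2.Date, t2.Value, t1.Unique_Value,
               -- LAG is NULL on the first row of each Unique_Value,
               -- and Value - NULL is NULL, so no CASE is required
               t2.Value - LAG(t2.Value)
                   OVER (PARTITION BY t1.Unique_Value ORDER BY t2.Date) AS Difference
        FROM table1 t1
        JOIN table2 t2 ON t1.ID = t2.ID
        ORDER BY t1.Unique_Value, t2.Date
    """
    return con.execute(query).fetchall()

if __name__ == "__main__":
    for row in value_differences():
        print(row)
```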

Mysql find value that follows another

Hard for me to put in a coherent statement but I can give a sample set
ID STATUS DATE
1 A 2016-01-01
2 A 2016-01-01
2 B 2016-01-02
3 C 2016-01-13
4 D 2016-01-14
5 A 2016-01-15
5 B 2016-01-16
6 A 2016-01-17
7 C 2016-01-18
8 B 2016-01-19
9 B 2016-01-20
I want an sql statement that can determine two things:
1) How many items go from STATUS = A to a STATUS = B, with the same ID
2) I only want to show the rows with the aforementioned statuses - as follows:
ID STATUS DATE
2 A 2016-01-01
2 B 2016-01-02
5 A 2016-01-15
5 B 2016-01-16
COUNT(distinct ID) of that result should return 2 in this case
Any help would be appreciated
Join the table with itself, matching rows with the row after them with the same id.
SELECT t1.id, t1.status AS start_status, t1.date AS start_date,
t2.status AS end_status, t2.date AS end_date
FROM yourTable AS t1
JOIN yourTable AS t2 ON t1.id = t2.id AND t1.date = date_sub(t2.date, interval 1 day)
WHERE t1.status = 'A' AND t2.status = 'B'
This will show both rows together, e.g.
id start_status start_date end_status end_date
2 A 2016-01-01 B 2016-01-02
5 A 2016-01-15 B 2016-01-16
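The self join above assumes the B row is dated exactly one day after the A row. If that does not hold, or if the two rows should be listed separately as in the question, one alternative is to look at the neighbouring status with LAG and LEAD. The following is a sketch using Python's sqlite3 module (SQLite 3.25+ assumed); the CTE name marked and the columns prev_status and next_status are made up for the illustration.

```python
import sqlite3

def a_then_b_rows():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE yourTable (id INTEGER, status TEXT, date TEXT);
        INSERT INTO yourTable VALUES
            (1, 'A', '2016-01-01'), (2, 'A', '2016-01-01'), (2, 'B', '2016-01-02'),
            (3, 'C', '2016-01-13'), (4, 'D', '2016-01-14'), (5, 'A', '2016-01-15'),
            (5, 'B', '2016-01-16'), (6, 'A', '2016-01-17'), (7, 'C', '2016-01-18'),
            (8, 'B', '2016-01-19'), (9, 'B', '2016-01-20');
    """)
    query = """
        WITH marked AS (
            SELECT id, status, date,
                   LAG(status)  OVER (PARTITION BY id ORDER BY date) AS prev_status,
                   LEAD(status) OVER (PARTITION BY id ORDER BY date) AS next_status
            FROM yourTable
        )
        -- keep an A row directly followed by B, and a B row directly preceded by A
        SELECT id, status, date
        FROM marked
        WHERE (status = 'A' AND next_status = 'B')
           OR (status = 'B' AND prev_status = 'A')
        ORDER BY id, date
    """
    return con.execute(query).fetchall()

if __name__ == "__main__":
    rows = a_then_b_rows()
    for row in rows:
        print(row)
    # number of distinct ids that went from A to B
    print(len({r[0] for r in rows}))
```

Counting the distinct ids in that result gives the number of items that moved from A to B.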

Select row of second table only if not exists in first one

I have two tables t1 and t2 with the same structure
id INT
userid INT
date DATETIME
The first table contains my data, while the second is a kind of helper table that contains rows for 10 fixed dates with userid = -1
What I need is a SELECT that gives me all rows from t1 with userid=X, merged with all rows from t2 whose date is not already among the t1 results.
Pseudo code
SELECT id, date
FROM t1, t2
WHERE (t1.userid=:id OR t2.userid=-1) AND t2.date NOT IN t1.date
Sample:
t1:
id userid date
1 1 2015-12-01
2 1 2015-12-02
3 1 2015-12-03
4 2 2015-12-01
5 2 2015-12-02
t2:
id userid date
1 -1 2015-12-01
2 -1 2015-12-02
3 -1 2015-12-03
4 -1 2015-12-04
5 -1 2015-12-05
Expected output for userid=1:
1 1 2015-12-01
2 1 2015-12-02
3 1 2015-12-03
4 -1 2015-12-04
5 -1 2015-12-05
Thanks for your help
I'd use a UNION ALL for this. Note that the filters have to be on userid, not on id.
SELECT
id, userid, date
FROM
t1
WHERE
t1.userid=:id
UNION ALL
(SELECT
id, userid, date
FROM
t2
WHERE
t2.userid=-1
AND t2.date NOT IN (SELECT date FROM t1 WHERE t1.userid=:id))
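A small runnable check of this approach on the sample data, sketched with Python's sqlite3 module. It filters on userid, selects userid so the output matches the expected three columns, and swaps NOT IN for NOT EXISTS, which is a substitution on my part: NOT IN returns no rows at all if the subquery ever produces a NULL date, while NOT EXISTS does not have that problem. The parameter name uid is made up for the example.

```python
import sqlite3

def rows_for_user(uid):
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE t1 (id INTEGER, userid INTEGER, date TEXT);
        CREATE TABLE t2 (id INTEGER, userid INTEGER, date TEXT);
        INSERT INTO t1 VALUES
            (1, 1, '2015-12-01'), (2, 1, '2015-12-02'), (3, 1, '2015-12-03'),
            (4, 2, '2015-12-01'), (5, 2, '2015-12-02');
        INSERT INTO t2 VALUES
            (1, -1, '2015-12-01'), (2, -1, '2015-12-02'), (3, -1, '2015-12-03'),
            (4, -1, '2015-12-04'), (5, -1, '2015-12-05');
    """)
    query = """
        SELECT id, userid, date
        FROM t1
        WHERE userid = :uid
        UNION ALL
        -- helper rows whose date the user does not already have
        SELECT id, userid, date
        FROM t2
        WHERE userid = -1
          AND NOT EXISTS (SELECT 1
                          FROM t1
                          WHERE t1.userid = :uid AND t1.date = t2.date)
        ORDER BY date
    """
    return con.execute(query, {"uid": uid}).fetchall()

if __name__ == "__main__":
    for row in rows_for_user(1):
        print(row)
```

For userid=1 this returns the three t1 rows followed by the two helper rows for 2015-12-04 and 2015-12-05.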