In a previous question, I got an answer with a query that puts last year's revenue in a new column. This worked perfectly, but now I have a follow-up question (please see the previous question for context).
The query used to get this data (with thanks to Mikhail):
#standardSQL
SELECT
a.date, a.location, a.revenue,
DATE_SUB(a.date, INTERVAL 1 YEAR) date_last_year,
IFNULL(b.revenue, 0) revenue_last_year
FROM `project.dataset.table` a
LEFT JOIN `project.dataset.table` b
ON a.location = b.location
AND DATE_SUB(a.date, INTERVAL 1 YEAR) = b.date
The simplified outcome looks as follows (limited to 1 location):
date revenue revenue_last_year
2019-01-31 1471,2577 2185,406
2019-01-30 1291,1111 4723,7439
2019-01-29 2178,6532 2263,5283
2019-01-28 1531,8021 0
2019-01-26 1578,1247 2446,6234
2019-01-25 1299,644 1522,4537
2019-01-24 788,2669 1979,104
2019-01-23 787,441 4117,7927
2019-01-22 2437,2951 1876,2479
2019-01-21 1071,0476 0
2019-01-19 2291,0456 2289,8657
The follow-up question is about working with weekdays from last year. As you can see, revenue_last_year contains some 0 values. That's because location A was closed on that day last year. However, to make an accurate day-by-day comparison, we need to find the revenue for the nearest earlier day on which the location was open.
For more detail, the table below shows the last 10 days of January this year, with two extra columns holding last year's dates and revenues that I found manually:
date revenue revenue_last_year manual_date_last_year manual_revenue_last_year
2019-01-31 1471,2577 2185,406 2018-01-31 2185,406
2019-01-30 1291,1111 4723,7439 2018-01-30 4723,7439
2019-01-29 2178,6532 2263,5283 2018-01-29 2263,5283
2019-01-28 1531,8021 0 2018-01-27 2843,6616
2019-01-26 1578,1247 2446,6234 2018-01-26 2446,6234
2019-01-25 1299,644 1522,4537 2018-01-25 1522,4537
2019-01-24 788,2669 1979,104 2018-01-24 1979,104
2019-01-23 787,441 4117,7927 2018-01-23 4117,7927
2019-01-22 2437,2951 1876,2479 2018-01-22 1876,2479
2019-01-21 1071,0476 0 2018-01-20 2561,4086
2019-01-19 2291,0456 2289,8657 2018-01-19 2289,8657
Please note the differences in dates.
What would be a good way to solve this? Would I need to adjust the query to work with weekdays, and how would you approach this?
Below is for BigQuery Standard SQL
#standardSQL
SELECT
a.date, a.location, ANY_VALUE(a.revenue) revenue,
ARRAY_AGG(
STRUCT(b.date AS date_last_year, b.revenue AS revenue_last_year)
ORDER BY b.date DESC LIMIT 1
)[OFFSET(0)].*
FROM `project.dataset.table` a
CROSS JOIN `project.dataset.table` b
WHERE a.location = b.location
AND b.date BETWEEN DATE_SUB(DATE_SUB(a.date, INTERVAL 1 YEAR), INTERVAL 7 DAY) AND DATE_SUB(a.date, INTERVAL 1 YEAR)
GROUP BY a.date, a.location
You can test and play with the above using dummy/sample data (the same as in my answer to your previous question), as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2018-02-20' `date`, 'A' location, 1 revenue UNION ALL
SELECT '2018-02-21', 'A', 3 UNION ALL
SELECT '2019-02-20', 'A', 5 UNION ALL
SELECT '2019-02-21', 'A', 7 UNION ALL
SELECT '2019-02-22', 'A', 9 UNION ALL
SELECT '2018-02-20', 'B', 2 UNION ALL
SELECT '2018-02-22', 'B', 4 UNION ALL
SELECT '2019-02-20', 'B', 6 UNION ALL
SELECT '2019-02-21', 'B', 8 UNION ALL
SELECT '2019-02-22', 'B', 10
)
SELECT
a.date, a.location, ANY_VALUE(a.revenue) revenue,
ARRAY_AGG(
STRUCT(b.date AS date_last_year, b.revenue AS revenue_last_year)
ORDER BY b.date DESC LIMIT 1
)[OFFSET(0)].*
FROM `project.dataset.table` a
CROSS JOIN `project.dataset.table` b
WHERE a.location = b.location
AND b.date BETWEEN DATE_SUB(DATE_SUB(a.date, INTERVAL 1 YEAR), INTERVAL 7 DAY) AND DATE_SUB(a.date, INTERVAL 1 YEAR)
GROUP BY a.date, a.location
-- ORDER BY a.date, a.location
with the result:
Row date location revenue date_last_year revenue_last_year
1 2019-02-20 A 5 2018-02-20 1
2 2019-02-20 B 6 2018-02-20 2
3 2019-02-21 A 7 2018-02-21 3
4 2019-02-21 B 8 2018-02-20 2
5 2019-02-22 A 9 2018-02-21 3
6 2019-02-22 B 10 2018-02-22 4
Note the differences in dates :o)
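If you want to check the look-back logic outside BigQuery, here is a minimal sketch in Python using the standard sqlite3 module. It assumes SQLite as a stand-in for BigQuery and a hypothetical table name, sales. SQLite has no ARRAY_AGG, so a correlated MAX() subquery picks the latest date from last year inside the same 7-day window. With the same dummy rows, it should reproduce the result table above.

```python
import sqlite3

# Same dummy rows as the BigQuery example above; dates stored as ISO text.
ROWS = [
    ("2018-02-20", "A", 1), ("2018-02-21", "A", 3),
    ("2019-02-20", "A", 5), ("2019-02-21", "A", 7), ("2019-02-22", "A", 9),
    ("2018-02-20", "B", 2), ("2018-02-22", "B", 4),
    ("2019-02-20", "B", 6), ("2019-02-21", "B", 8), ("2019-02-22", "B", 10),
]

# SQLite stand-in for ARRAY_AGG(... ORDER BY b.date DESC LIMIT 1): the correlated
# MAX() finds the latest date last year within the 7-day look-back window.
# Rows with no candidate (all 2018 rows here) drop out, as with the CROSS JOIN.
QUERY = """
SELECT a.date, a.location, a.revenue,
       b.date AS date_last_year, b.revenue AS revenue_last_year
FROM sales a
JOIN sales b
  ON b.location = a.location
 AND b.date = (
       SELECT MAX(c.date)
       FROM sales c
       WHERE c.location = a.location
         AND c.date BETWEEN date(a.date, '-1 year', '-7 days')
                        AND date(a.date, '-1 year'))
ORDER BY a.date, a.location
"""


def last_year_open_day(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (date TEXT, location TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in last_year_open_day(ROWS):
        print(row)
```

The 7-day window matches the answer's choice. It assumes a location is never closed for more than a week. Widen the '-7 days' modifier if that does not hold.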
I am trying to write a query to get the last 4 weeks (Mon-Sun) of data, with each week's data grouped separately.
Within a week, amt should be summed for rows with the same name, and each distinct name should get its own row. To see an example of what I am looking for, I have included the desired input and output below.
This is my table:
date amt name
2022-04-29 5 a
2022-04-28 10 b
2022-04-25 11 a
2022-04-23 15 b
2022-04-21 20 b
2022-04-16 20 a
2022-04-11 10 a
2022-04-10 5 b
2022-04-05 5 b
I want output like this:
date                      sum(amt)  name
2022-04-25 to 2022-04-29  16        a
2022-04-25 to 2022-04-29  10        b
2022-04-18 to 2022-04-24  35        b
2022-04-11 to 2022-04-17  30        a
2022-04-04 to 2022-04-10  10        b
I would appreciate any pointers or best practices I should follow to achieve this.
You can use DATE_ADD with WEEKDAY to get the first and last day of each week:
SELECT
CASE WHEN
weekofyear(`date`) = weekofyear(NOW())
THEN 'current week'
ELSE
CONCAT(date_format(DATE_ADD(`date`, interval - WEEKDAY(`date`) day), '%Y-%m-%d'),' to ',date_format(DATE_ADD(DATE_ADD(`date`, interval -WEEKDAY(`date`) day), interval 6 day), '%Y-%m-%d'))
END 'date',
SUM(amt)
FROM T
GROUP BY
CASE WHEN
weekofyear(`date`) = weekofyear(NOW())
THEN 'current week'
ELSE
CONCAT(date_format(DATE_ADD(`date`, interval - WEEKDAY(`date`) day), '%Y-%m-%d'),' to ',date_format(DATE_ADD(DATE_ADD(`date`, interval -WEEKDAY(`date`) day), interval 6 day), '%Y-%m-%d'))
END
sqlfiddle
EDIT
I saw you edited your question; you can just add name to the GROUP BY:
SELECT
CONCAT(date_format(DATE_ADD(`date`, interval - WEEKDAY(`date`) day), '%Y-%m-%d'),' to ',date_format(DATE_ADD(DATE_ADD(`date`, interval -WEEKDAY(`date`) day), interval 6 day), '%Y-%m-%d')) 'date',
SUM(amt),
name
FROM T
GROUP BY
CONCAT(date_format(DATE_ADD(`date`, interval - WEEKDAY(`date`) day), '%Y-%m-%d'),' to ',date_format(DATE_ADD(DATE_ADD(`date`, interval -WEEKDAY(`date`) day), interval 6 day), '%Y-%m-%d')),
name
ORDER BY 1 desc
sqlfiddle
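The week bucketing can be sketched in plain Python to check the expected totals. This is only an illustration, assuming weeks start on Monday (date.weekday() returns 0 for Monday, like MySQL's WEEKDAY()) and that all rows already fall inside the wanted 4-week window. Like the MySQL query, it labels each full Monday-to-Sunday week, so the latest week comes out as 2022-04-25 to 2022-05-01.

```python
from collections import defaultdict
from datetime import date, timedelta

# Rows from the question's table: (date, amt, name).
ROWS = [
    ("2022-04-29", 5, "a"), ("2022-04-28", 10, "b"), ("2022-04-25", 11, "a"),
    ("2022-04-23", 15, "b"), ("2022-04-21", 20, "b"), ("2022-04-16", 20, "a"),
    ("2022-04-11", 10, "a"), ("2022-04-10", 5, "b"), ("2022-04-05", 5, "b"),
]


def week_label(d):
    """Monday-to-Sunday label, like DATE_ADD(date, INTERVAL -WEEKDAY(date) DAY)."""
    start = d - timedelta(days=d.weekday())  # weekday() is 0 for Monday
    end = start + timedelta(days=6)
    return f"{start.isoformat()} to {end.isoformat()}"


def weekly_sums(rows):
    """Sum amt per (week, name); newest week first, names ascending within a week."""
    totals = defaultdict(int)
    for day, amt, name in rows:
        totals[(week_label(date.fromisoformat(day)), name)] += amt
    result = [(week, total, name) for (week, name), total in totals.items()]
    result.sort(key=lambda r: r[2])                # name ascending
    result.sort(key=lambda r: r[0], reverse=True)  # week descending; sort is stable
    return result


if __name__ == "__main__":
    for row in weekly_sums(ROWS):
        print(row)
```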
This is in SQL Server and is just a rough attempt. Hopefully it is of some help.
with cteWeekStarts
as
(
select
n,dateadd(week,-n,DATEADD(week, DATEDIFF(week, -1, getdate()), -1)) as START_DATE
from
(values (1),(2),(3),(4)) as t(n)
), cteStartDatesAndEndDates
as
(
select *,dateadd(day,-1,lead(c.start_date) over (order by c.n desc)) as END_DATE
from cteWeekStarts as c
)
,cteSalesSumByDate
as
(
select s.SalesDate,sum(s.salesvalue) as sum_amt from
tblSales as s
group by s.SalesDate
)
select c3.n as WeekNum,c3.START_DATE,isnull(c3.END_DATE,
dateadd(day,6,c3.start_date)) as END_DATE,
(select sum(c2.sum_amt) from cteSalesSumByDate as c2 where c2.SalesDate
between c3.START_DATE and c3.END_DATE) as AMT
from cteStartDatesAndEndDates as c3
order by c3.n desc
I'm looking for a SQL query that returns all free time intervals within a given range, for a table with two datetime columns (DATE_FROM, DATE_TILL). Requirement: the entries never overlap, and two consecutive entries count as adjacent when they are exactly 1 second apart.
I have found a solution, but it doesn't meet all my requirements, especially the one where I want to pass a given start and end datetime so that the missing intervals at the edges are also calculated.
Here is my data table:
ROW_ID LOCATION_ID DATE_FROM DATE_TILL
1 193 2019-02-01 00:00:00 2019-12-31 23:59:59
2 193 2020-02-01 00:00:00 2020-12-31 23:59:59
3 193 2021-01-01 00:00:00 2021-12-31 23:59:59
4 193 2022-01-01 00:00:00 2022-12-31 23:59:59
5 204 2020-01-01 00:00:00 2021-12-31 23:59:59
And this is my SQL query, adapted to my requirements from another solution on this platform:
SELECT DATE_ADD(DATE_TILL,INTERVAL 1 SECOND) AS GAP_FROM, DATE_SUB(DATE_FROM,INTERVAL 1 SECOND) AS GAP_TILL
FROM
(
SELECT DISTINCT DATE_FROM, ROW_NUMBER() OVER (ORDER BY DATE_FROM) RN
FROM overlappingtable T1
WHERE
LOCATION_ID = 193 AND
NOT EXISTS (
SELECT *
FROM overlappingtable T2
WHERE T1.DATE_FROM > T2.DATE_FROM AND T1.DATE_FROM < T2.DATE_TILL
)
) T1
JOIN (
SELECT DISTINCT DATE_TILL, ROW_NUMBER() OVER (ORDER BY DATE_TILL) RN
FROM overlappingtable T1
WHERE
LOCATION_ID = 193 AND
NOT EXISTS (
SELECT *
FROM overlappingtable T2
WHERE T1.DATE_TILL > T2.DATE_FROM AND T1.DATE_TILL < T2.DATE_TILL
)
) T2
ON T1.RN - 1 = T2.RN
WHERE
DATE_ADD(DATE_TILL,INTERVAL 1 SECOND) < DATE_FROM
This query gives me this result:
GAP_FROM GAP_TILL
2020-01-01 00:00:00 2020-01-31 23:59:59
Which is great: this is the free interval between non-overlapping entries that I need to return.
But I want to add two parameters to this query that define the overall range for these entries, one for the start date and one for the end date. For this example:
startdate = '2019-01-01 00:00:00'
enddate = '9999-12-31 23:59:59'
For LOCATION_ID = 193, I am missing the gap between the startdate ('2019-01-01 00:00:00') and the DATE_FROM of the first entry ('2019-02-01 00:00:00'), as well as the gap between the last entry and the enddate.
The result I would like to return for LOCATION_ID = 193 should look like this:
GAP_FROM GAP_TILL
2019-01-01 00:00:00 2019-01-31 23:59:59
2020-01-01 00:00:00 2020-01-31 23:59:59
2023-01-01 00:00:00 9999-12-31 23:59:59
I'm really new to SQL. I can understand this query, but I can't extend it to use these overall range bounds and return the missing gaps.
Thanks in advance
For clarity, I would recommend finding the initial gaps, the middle ones, and the ending ones in separate CTEs, as shown below in the b, m, and e CTEs. Then a simple UNION ALL can combine all of them:
with
p (loc_id, start_date, end_date) as (
select 193, '2019-01-01 00:00:00', '9999-12-31 23:59:59'
),
r as (
select location_id, date_from,
date_add(date_till, interval 1 second) as date_till,
lead(date_from) over(partition by location_id order by date_from) as next_from
from overlappingtable t
cross join p
where t.location_id = p.loc_id
),
b as (
select p.start_date as gap_from, r.date_from as gap_till
from (select * from r order by date_from limit 1) r
cross join p
where p.start_date < r.date_from
),
m as (
select date_till, next_from
from r
where date_till < next_from
),
e as (
select r.date_till, p.end_date
from (select * from r order by date_till desc limit 1) r
cross join p
where r.date_till < p.end_date
)
select * from b
union all select * from m
union all select * from e
order by gap_from
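Result (note that gap_till is exclusive in the first two rows: it is the first second of the next entry, so subtract 1 second if you need the inclusive 23:59:59 form):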
gap_from gap_till
-------------------- -------------------
2019-01-01 00:00:00 2019-02-01 00:00:00
2020-01-01 00:00:00 2020-02-01 00:00:00
2023-01-01 00:00:00 9999-12-31 23:59:59
See the running example at DB Fiddle.
The initial CTE p includes the parameters of the query (loc_id, start_date, end_date) and is added for clarity.
You could join to a sub-query with the start & end datetimes.
Then compare to the previous & next datetimes per location_id.
The previous or next datetimes can be found via the LAG & LEAD functions.
WITH CTE_UNDERLAPS AS
(
SELECT t.*
, LAG(DATE_TILL) OVER (PARTITION BY LOCATION_ID ORDER BY DATE_FROM, DATE_TILL) AS PREV_DATE_TILL
, LEAD(DATE_FROM) OVER (PARTITION BY LOCATION_ID ORDER BY DATE_FROM, DATE_TILL) AS NEXT_DATE_FROM
, l.*
FROM overlappingtable t
JOIN (
SELECT
CAST('2019-01-01 00:00:00' AS DATETIME) AS START_DATETIME
, CAST('9999-12-31 23:59:59' AS DATETIME) AS END_DATETIME
) l ON DATE_FROM >= START_DATETIME AND DATE_TILL <= END_DATETIME
)
SELECT LOCATION_ID
, COALESCE(DATE_ADD(PREV_DATE_TILL,INTERVAL 1 SECOND), START_DATETIME) AS DATE_FROM
, DATE_SUB(DATE_FROM,INTERVAL 1 SECOND) AS DATE_TILL
FROM CTE_UNDERLAPS
WHERE COALESCE(DATE_ADD(PREV_DATE_TILL,INTERVAL 1 SECOND), START_DATETIME) < DATE_FROM
UNION
SELECT LOCATION_ID
, DATE_ADD(DATE_TILL,INTERVAL 1 SECOND) AS DATE_FROM
, COALESCE(DATE_SUB(NEXT_DATE_FROM,INTERVAL 1 SECOND), END_DATETIME) AS DATE_TILL
FROM CTE_UNDERLAPS
WHERE DATE_ADD(DATE_TILL,INTERVAL 1 SECOND) < COALESCE(NEXT_DATE_FROM, END_DATETIME)
ORDER BY LOCATION_ID, DATE_FROM, DATE_TILL
LOCATION_ID DATE_FROM DATE_TILL
193 2019-01-01 00:00:00 2019-01-31 23:59:59
193 2020-01-01 00:00:00 2020-01-31 23:59:59
193 2023-01-01 00:00:00 9999-12-31 23:59:59
204 2019-01-01 00:00:00 2019-12-31 23:59:59
204 2022-01-01 00:00:00 9999-12-31 23:59:59
Demo on db<>fiddle here
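The previous/next comparison can also be sketched in plain Python, which makes the edge cases easy to test. This is a minimal sketch, assuming inclusive, non-overlapping (DATE_FROM, DATE_TILL) pairs per location and treating entries exactly 1 second apart as adjacent. The function name free_gaps is hypothetical. With the question's data and range, it should give the same rows as the result above.

```python
from datetime import datetime, timedelta

ONE_SECOND = timedelta(seconds=1)
RANGE_START = datetime(2019, 1, 1)
RANGE_END = datetime(9999, 12, 31, 23, 59, 59)

# Rows from the question's table, grouped by LOCATION_ID.
TABLE = {
    193: [
        (datetime(2019, 2, 1), datetime(2019, 12, 31, 23, 59, 59)),
        (datetime(2020, 2, 1), datetime(2020, 12, 31, 23, 59, 59)),
        (datetime(2021, 1, 1), datetime(2021, 12, 31, 23, 59, 59)),
        (datetime(2022, 1, 1), datetime(2022, 12, 31, 23, 59, 59)),
    ],
    204: [(datetime(2020, 1, 1), datetime(2021, 12, 31, 23, 59, 59))],
}


def free_gaps(intervals, range_start, range_end):
    """Return inclusive (gap_from, gap_till) pairs inside [range_start, range_end]."""
    gaps = []
    cursor = range_start  # first second not yet known to be booked
    for date_from, date_till in sorted(intervals):
        if date_till < cursor:
            continue  # entry ends before the part of the range still to check
        if date_from > range_end:
            break  # entry starts after the range
        if date_from > cursor:
            # like COALESCE(PREV_DATE_TILL + 1 second, START_DATETIME) < DATE_FROM
            gaps.append((cursor, date_from - ONE_SECOND))
        if date_till >= range_end:
            return gaps  # booked through the end of the range
        cursor = date_till + ONE_SECOND
    if cursor <= range_end:
        gaps.append((cursor, range_end))  # trailing gap up to END_DATETIME
    return gaps


if __name__ == "__main__":
    for location_id, intervals in TABLE.items():
        for gap in free_gaps(intervals, RANGE_START, RANGE_END):
            print(location_id, gap[0], gap[1])
```

Returning early when an entry reaches the end of the range also avoids computing 9999-12-31 23:59:59 plus 1 second, which would overflow Python's datetime.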
I want to duplicate a (revenue) column and shift it by one year in order to make YoY comparisons. I'm currently trying to use LEAD on the values in a BigQuery table based on a specific date to achieve this, but I'm stuck.
I used DATE_ADD to create a new column to get the date of last year but now I want to get a column next to it with the revenue based on the DATE_ADD date.
One problem is that not all locations include the same date, that's why it's harder to make the shift.
There is no way to format a table properly here, so the intended result was shown in an image. Basically, revenue_last_year should be filled with the value of the revenue column for the DATE_ADD date and the same location.
The query below is as far as I've been able to go:
SELECT
Date,
location,
revenue,
DATE_ADD(date, INTERVAL -1 YEAR) AS DateAdd,
LEAD(revenue, ##OFFSET## ) OVER (PARTITION BY location ORDER BY date DESC) AS revenue_last_year
FROM
`dataset.table1`
Does anyone have a suggestion on how to relate the offset value to the right date? Or should I approach this in a completely different way?
Below is for BigQuery Standard SQL
#standardSQL
SELECT
a.date, a.location, a.revenue,
DATE_SUB(a.date, INTERVAL 1 YEAR) date_last_year,
IFNULL(b.revenue, 0) revenue_last_year
FROM `project.dataset.table` a
LEFT JOIN `project.dataset.table` b
ON a.location = b.location
AND DATE_SUB(a.date, INTERVAL 1 YEAR) = b.date
You can test and play with the above using dummy data, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2018-02-20' `date`, 'A' location, 1 revenue UNION ALL
SELECT '2018-02-20', 'B', 2 UNION ALL
SELECT '2018-02-21', 'A', 3 UNION ALL
SELECT '2018-02-22', 'B', 4 UNION ALL
SELECT '2019-02-20', 'A', 5 UNION ALL
SELECT '2019-02-20', 'B', 6 UNION ALL
SELECT '2019-02-21', 'A', 7 UNION ALL
SELECT '2019-02-21', 'B', 8 UNION ALL
SELECT '2019-02-22', 'A', 9 UNION ALL
SELECT '2019-02-22', 'B', 10
)
SELECT
a.date, a.location, a.revenue,
DATE_SUB(a.date, INTERVAL 1 YEAR) date_last_year,
IFNULL(b.revenue, 0) revenue_last_year
FROM `project.dataset.table` a
LEFT JOIN `project.dataset.table` b
ON a.location = b.location
AND DATE_SUB(a.date, INTERVAL 1 YEAR) = b.date
-- ORDER BY a.date, a.location
with the result:
Row date location revenue date_last_year revenue_last_year
1 2018-02-20 A 1 2017-02-20 0
2 2018-02-20 B 2 2017-02-20 0
3 2018-02-21 A 3 2017-02-21 0
4 2018-02-22 B 4 2017-02-22 0
5 2019-02-20 A 5 2018-02-20 1
6 2019-02-20 B 6 2018-02-20 2
7 2019-02-21 A 7 2018-02-21 3
8 2019-02-21 B 8 2018-02-21 0
9 2019-02-22 A 9 2018-02-22 0
10 2019-02-22 B 10 2018-02-22 4
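Here is a minimal sketch of the same self-join run locally with Python's sqlite3 module, assuming SQLite as a stand-in for BigQuery and a hypothetical table name, sales. In SQLite, date(x, '-1 year') plays the role of DATE_SUB(x, INTERVAL 1 YEAR). For this dummy data, the output should match the table above.

```python
import sqlite3

ROWS = [
    ("2018-02-20", "A", 1), ("2018-02-20", "B", 2), ("2018-02-21", "A", 3),
    ("2018-02-22", "B", 4), ("2019-02-20", "A", 5), ("2019-02-20", "B", 6),
    ("2019-02-21", "A", 7), ("2019-02-21", "B", 8), ("2019-02-22", "A", 9),
    ("2019-02-22", "B", 10),
]

# LEFT JOIN keeps every row of a; IFNULL turns "no row last year" into 0.
QUERY = """
SELECT a.date, a.location, a.revenue,
       date(a.date, '-1 year') AS date_last_year,
       IFNULL(b.revenue, 0) AS revenue_last_year
FROM sales a
LEFT JOIN sales b
  ON a.location = b.location
 AND date(a.date, '-1 year') = b.date
ORDER BY a.date, a.location
"""


def revenue_with_last_year(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (date TEXT, location TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in revenue_with_last_year(ROWS):
        print(row)
```

One difference matters for leap days. SQLite normalizes 2020-02-29 minus one year to 2019-03-01, while BigQuery's DATE_SUB returns the last day of the month (2019-02-28).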
My table:
rating date
4 12/02/2013
3 12/02/2013
2.5 12/01/2013
3 12/01/2013
4.5 21/11/2012
5 10/11/2012
If I give 3 as input, I want the average rating for each of the last three months (02, 01, 12).
I tried using GROUP BY, but I get this result:
rating month
3.5 02
2.75 01
Month 12 has no ratings, so there is no output row for it.
My desired result:
rating month
3.5 02
2.75 01
0 12
The problem is that you want to return months that do not exist. If you do not have a calendar table with dates, then you will want to use something like the following:
select d.mth Month,
coalesce(avg(t.rating), 0) Rating
from
(
select 1 mth union all
select 2 mth union all
select 3 mth union all
select 4 mth union all
select 5 mth union all
select 6 mth union all
select 7 mth union all
select 8 mth union all
select 9 mth union all
select 10 mth union all
select 11 mth union all
select 12 mth
) d
left join yourtable t
on d.mth = month(t.date)
where d.mth in (1, 2, 12)
group by d.mth
See SQL Fiddle with Demo
SELECT coalesce(avg(rating), 0.0) avg_rating, req_month
FROM yourTable
RIGHT JOIN
(SELECT month(now()) AS req_month
UNION
SELECT month(now() - INTERVAL 1 MONTH) AS req_month
UNION
SELECT month(now() - INTERVAL 2 MONTH) AS req_month) tmpView
ON month(yourTable.date) = tmpView.req_month
-- filter in ON (not WHERE) so months without ratings are kept by the outer join
AND yourTable.date >= ( (curdate() - INTERVAL day(curdate()) - 1 DAY) - INTERVAL 2 MONTH)
GROUP BY tmpView.req_month;
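Both answers use the same trick. They start from a list of the wanted months and LEFT JOIN the data onto it, so an empty month still produces a row that COALESCE turns into 0. Below is a minimal, runnable sketch of that idea using Python's sqlite3 module. It assumes the dates are stored as ISO 'YYYY-MM-DD' text (the question shows dd/mm/yyyy, which would need converting first). It also fixes the "current" month at February 2013 so the output is reproducible. The helper names are hypothetical. Unlike matching on the month number alone, it matches on year and month, so December 2012 is not mixed with other Decembers.

```python
import sqlite3

# Rows from the question, with dates rewritten as ISO 'YYYY-MM-DD' text (assumption).
RATINGS = [
    (4.0, "2013-02-12"), (3.0, "2013-02-12"),
    (2.5, "2013-01-12"), (3.0, "2013-01-12"),
    (4.5, "2012-11-21"), (5.0, "2012-11-10"),
]


def last_n_months(year, month, n):
    """Return 'YYYY-MM' strings for the given month and the n - 1 months before it."""
    months = []
    for _ in range(n):
        months.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return months


def monthly_average(rows, months):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (rating REAL, date TEXT)")
    conn.executemany("INSERT INTO yourtable VALUES (?, ?)", rows)
    # The list of wanted months drives the LEFT JOIN, so empty months survive.
    conn.execute("CREATE TABLE wanted (ym TEXT)")
    conn.executemany("INSERT INTO wanted VALUES (?)", [(m,) for m in months])
    result = conn.execute(
        """
        SELECT w.ym, COALESCE(AVG(t.rating), 0) AS rating
        FROM wanted w
        LEFT JOIN yourtable t ON strftime('%Y-%m', t.date) = w.ym
        GROUP BY w.ym
        ORDER BY w.ym DESC
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # "Input 3" with February 2013 as the current month.
    print(monthly_average(RATINGS, last_n_months(2013, 2, 3)))
```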
I have a query like this:
SELECT EXTRACT(MONTH FROM d.mydate) AS synmonth, SUM(apcp) AS apcptot
FROM t_synop_data2 d
WHERE d.mydate
BETWEEN '2011-01-01' AND '2011-12-31'
AND d.idx_synop = '06712'
GROUP BY synmonth
This query sums all rain (apcp) per month, like this:
1 32.8 => from 2011.01.01 to 2011.01.31
2 27.2 => from 2011.02.01 to 2011.02.28
3 21.0
4 21.8
5 88.5
6 131.4
7 118.6
8 57.1
9 80.9
10 84.6
11 1.1
12 143.5 => from 2011.12.01 to 2011.12.31
That's what I want, but with a little difference.
The difference is that I need to sum apcp from day 2 of each month to day 1 of the next month, and then return a result like the one above.
1 132.8 => from 2011.01.02 to 2011.02.01
2 27.2 => from 2011.02.02 to 2011.03.01
3 21.0
4 21.8
5 88.5
6 131.4
7 118.6
8 57.1
9 80.9
10 84.6
11 1.1
12 143.5 => from 2011.12.02 to 2012.01.01
I tried things with add_date(), extract() and date_format(), but without success.
Thank you for your answer
Vince
Here is the query:
SELECT EXTRACT(MONTH FROM ADDDATE(d.mydate,-1) ) AS synmonth
, SUM(apcp) AS apcptot
FROM t_synop_data2 AS d
WHERE ADDDATE(d.mydate,-1) BETWEEN '2011-01-01' AND '2011-12-31'
AND d.idx_synop = '06712'
GROUP BY synmonth
You can check the result by adding two columns like this:
SELECT EXTRACT(MONTH FROM ADDDATE(d.mydate,-1) ) AS synmonth
, SUM(apcp) AS apcptot
, MIN(d.mydate) AS date_min
, MAX(d.mydate) AS date_max
FROM t_synop_data2 AS d
WHERE ADDDATE(d.mydate,-1) BETWEEN '2011-01-01' AND '2011-12-31'
AND d.idx_synop = '06712'
GROUP BY synmonth
You can group by EXTRACT(MONTH FROM d.mydate - INTERVAL 1 DAY). Use the same expression in the SELECT list, and shift the date range by one day as well:
SELECT EXTRACT(MONTH FROM d.mydate - INTERVAL 1 DAY) AS synmonth, SUM(apcp) AS apcptot
FROM t_synop_data2 d
WHERE d.mydate
BETWEEN '2011-01-02' AND '2012-01-01'
AND d.idx_synop = '06712'
GROUP BY EXTRACT(MONTH FROM d.mydate - INTERVAL 1 DAY)
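Both answers rely on the same idea. They shift every date back one day before taking the month, so day 1 of a month is counted with the previous month. Here is a minimal Python sketch of that bucketing. The readings are hypothetical and only show which month each day lands in.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical readings: (day, apcp).
SAMPLE = [
    (date(2011, 1, 1), 1.0),  # counted with December 2010, so dropped
    (date(2011, 1, 2), 2.0),  # January
    (date(2011, 2, 1), 3.0),  # still January
    (date(2011, 2, 2), 4.0),  # February
    (date(2012, 1, 1), 5.0),  # December 2011
]


def shifted_monthly_totals(readings, year):
    """Sum apcp per month, where month m runs from day 2 of m to day 1 of m + 1."""
    totals = defaultdict(float)
    for day, apcp in readings:
        shifted = day - timedelta(days=1)  # same idea as d.mydate - INTERVAL 1 DAY
        if shifted.year == year:  # keeps 2012-01-01 (December), drops 2011-01-01
            totals[shifted.month] += apcp
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    print(shifted_monthly_totals(SAMPLE, 2011))
```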