I have a table that looks like this:
What I need: for each day, show the accumulated (moving) number of new Droppers in the last 5 days, inclusive.
Split by US vs. non-US geos.
Report columns:
DataTimstamp - upper boundary of the 5-day time frame
Total - number of new Droppers within the time frame
Region_US - number of new Droppers where country = 'US'
Region_rest - number of new Droppers where country <> 'US'
This is my code:
Create view new_droppers_per_date as
Select date, country, count(dropperid) as num_new_customers
From (
Select dropperid, country, min(cast(LoginTimestamp as date)) as date
From droppers) as t1
group by date, country
Select DataTimstamp, region_us, region_rest from (
(Select date as DataTimstamp, sum(num_new_customers) over (order by date range between 5 preceding and 1 preceding) as total
From new_droppers_per_date) as t1 inner join
(Select date, sum(num_new_customers) over (order by date range between 5 preceding and 1 preceding) as region_us
From new_droppers_per_date where country = 'us') as t2 on t1.date = t2.date inner join
(Select date, sum(num_new_customers) over (order by date range between 5 preceding and 1 preceding) as region_rest
From new_droppers_per_date where country <> 'us') as t3 on t2.date = t3.date)
I was wondering if there is an easier or smarter way to do this without using so many joins and a view.
Thank you for the help :)
Here is one way to do it using window functions. First assign a flag to the first login of each DropperId, then aggregate by day and count the number of new logins. Finally, make a window sum() with a range frame that spans over the last 5 days.
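select
    LoginDay,
    -- the frame below covers the current day plus the 5 days before it (6 calendar days);
    -- use interval 4 day if the 5-day window should include the current day
    sum(CntNewDroppers) over(
        order by LoginDay
        range between interval 5 day preceding and current row
    ) RunningCntNewDroppers
from (
    select
        date(LoginTimestamp) LoginDay,
        sum(rn = 1) CntNewDroppers  -- rn = 1 marks each dropper's first login
    from (
        select
            LoginTimestamp,
            row_number() over(partition by DropperId order by LoginTimestamp) rn
        from mytable
    ) t
    group by date(LoginTimestamp)
) t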
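The query above returns only the overall total. As a way to cross-check a SQL result that also carries the US / non-US split from the question, here is a minimal Python sketch of the same logic run outside the database. The function name `rolling_new_droppers`, the tuple layout of `logins`, and the sample rows are assumptions for illustration. It also assumes that a dropper's region is the country of their first login, and that "last 5 days inclusive" means the current day plus the four days before it.

```python
from collections import Counter
from datetime import date, timedelta


def rolling_new_droppers(logins, window_days=5):
    """logins: list of (dropper_id, country, login_date) tuples (hypothetical layout).

    Returns {day: (total, region_us, region_rest)} for every day that has a login.
    """
    # First login per dropper: the counterpart of row_number() ... rn = 1.
    first = {}
    for dropper_id, country, day in logins:
        if dropper_id not in first or day < first[dropper_id][0]:
            first[dropper_id] = (day, country)

    # Number of new droppers per (day, is_us).
    new_per_day = Counter((day, country == "US") for day, country in first.values())

    report = {}
    for day in sorted({d for _, _, d in logins}):
        us = rest = 0
        for back in range(window_days):  # the day itself plus the days before it
            d = day - timedelta(days=back)
            us += new_per_day[(d, True)]
            rest += new_per_day[(d, False)]
        report[day] = (us + rest, us, rest)
    return report


# Made-up sample data.
logins = [
    (1, "US", date(2024, 1, 1)),
    (1, "US", date(2024, 1, 3)),
    (2, "DE", date(2024, 1, 2)),
    (3, "US", date(2024, 1, 6)),
    (2, "DE", date(2024, 1, 7)),
]
report = rolling_new_droppers(logins)
for day, (total, region_us, region_rest) in report.items():
    print(day, total, region_us, region_rest)
```

In SQL the same split is usually done with conditional sums inside one aggregation rather than with one join per region.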
Related
I have two tables, "Gate_Logs" and "Employee".
The "Gate_Logs" table has three columns.
Employee ID - A 4 Digit Unique Number assigned to every employee
Status – In or Out
Timestamp – A recorded timestamp
The "Employee" Table has
Employee ID
Level
Designation
Joining Date
Reporting Location
Reporting Location ID - Single Digit ID
I want to find out which employee had the highest weekly work time over the past year, and I am trying to get this data for each individual location. I want to look at the cumulative highest. Let's say Employee X at Location L worked 60 hours in a particular week, which was the highest at that location, so X will be the person I wanted to query.
Please provide any pointers on how I can proceed with this, have been stuck at it for a while.
SQL version 8.0.27
You can use the window function LAG to pair In/Out records. The query is built from three CTEs:
periods - pairs each record with the previous record of the same employee
sumup_weekly - computes the weekly work time for each employee
rank_weekly - ranks employees per location per week
Finally, select the rows with rank one.
WITH periods AS (
SELECT
`employee_id`,
`status` to_status,
`timestamp` to_timestamp,
LAG(`status`) OVER w AS fr_status,
LAG(`timestamp`) OVER w AS fr_timestamp
FROM gate_log
WINDOW w AS (PARTITION BY `employee_id` ORDER BY `timestamp` ASC)
),
sumup_weekly AS (
SELECT
`employee_id`,
WEEKOFYEAR(fr_timestamp) week,
SUM(TIMESTAMPDIFF(SECOND, fr_timestamp, to_timestamp)) seconds
FROM periods
WHERE fr_status = 'In' AND to_status = 'Out'
GROUP BY `employee_id`, `week`
),
rank_weekly AS (
SELECT
e.`employee_id`,
e.`location_id`,
w.`week`,
SEC_TO_TIME(w.`seconds`) work_hours,
RANK() OVER(PARTITION BY e.`location_id`, w.`week` ORDER BY w.`seconds` DESC) rank_hours
FROM sumup_weekly w
JOIN employee e ON w.`employee_id` = e.`employee_id`
)
SELECT *
FROM rank_weekly
WHERE rank_hours = 1
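To sanity-check the query on a handful of rows, the three steps (pair In/Out, sum per week, pick the top per location) can be sketched in Python. This is an illustration under assumptions, not the answer's method: the function name `top_weekly_worker`, the tuple layout, and the sample rows are made up, weeks are keyed by ISO year and week, and on a tie only the first employee seen is kept, whereas RANK() would return all tied rows.

```python
from collections import defaultdict
from datetime import datetime


def top_weekly_worker(gate_logs, location_of):
    """gate_logs: list of (employee_id, status, timestamp); location_of: {employee_id: location_id}.

    Returns {(location_id, iso_year, iso_week): (employee_id, seconds)}.
    """
    by_employee = defaultdict(list)
    for emp, status, ts in gate_logs:
        by_employee[emp].append((ts, status))

    weekly = defaultdict(float)
    for emp, events in by_employee.items():
        events.sort()
        # Compare each event with the previous one, like LAG() over the employee's timeline.
        for (prev_ts, prev_status), (ts, status) in zip(events, events[1:]):
            if prev_status == "In" and status == "Out":
                year, week, _ = prev_ts.isocalendar()
                weekly[(emp, year, week)] += (ts - prev_ts).total_seconds()

    best = {}
    for (emp, year, week), seconds in weekly.items():
        key = (location_of[emp], year, week)
        if key not in best or seconds > best[key][1]:
            best[key] = (emp, seconds)
    return best


# Made-up sample data: two employees at location 1, one at location 2.
gate_logs = [
    (1001, "In", datetime(2024, 1, 1, 9, 0)),
    (1001, "Out", datetime(2024, 1, 1, 17, 0)),
    (1002, "In", datetime(2024, 1, 2, 8, 0)),
    (1002, "Out", datetime(2024, 1, 2, 18, 0)),
    (1003, "In", datetime(2024, 1, 3, 9, 0)),
    (1003, "Out", datetime(2024, 1, 3, 12, 0)),
]
location_of = {1001: 1, 1002: 1, 1003: 2}
best = top_weekly_worker(gate_logs, location_of)
print(best)
```

Keying by year and week keeps the same week number in two different years apart, which matters when the data spans a full year.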
I'm trying to fetch the current day's records in half-hour intervals, with the record count for each interval.
My output comes out as expected, but if there are no records in a particular period, say 7:00 - 7:30, I don't get a row with a zero count for that period.
My query is as follows:
SELECT time_format( FROM_UNIXTIME(ROUND(UNIX_TIMESTAMP(start_time)/(30* 60)) * (30*60)) , '%H:%i')
thirtyHourInterval , COUNT(bot_id) AS Count FROM bot_activity
WHERE start_time BETWEEN CONCAT(CURDATE(), ' 00:00:00') AND CONCAT(CURDATE(), ' 23:59:59')
GROUP BY ROUND(UNIX_TIMESTAMP(start_time)/(30* 60))
For reference, my output:
We need a source for that 7:30 row; a row source for all the time values.
If we have a clock table that contains all of the time values we want to return, such that we can write a query that returns that first column, the thirty minute interval values we want to return,
as an example:
SELECT c.hhmm AS thirty_minute_interval
FROM clock c
WHERE c.hhmm ...
ORDER BY c.hhmm
then we can do an outer join to the results of the query with the missing rows:
SELECT c.hhmm AS _thirty_minute_interval
, IFNULL(r._cnt_bot,0) AS _cnt_bot
FROM clock c
LEFT
JOIN ( -- query with missing rows
SELECT time_format(...) AS thirtyMinuteInterval
, COUNT(...) AS _cnt_bot
FROM bot_activity
WHERE
GROUP BY time_format(...)
) r
ON r.thirtyMinuteInterval = c.hhmm
WHERE c.hhmm ...
ORDER BY c.hhmm
The point is that the SELECT will not generate "missing" rows from a source where they don't exist; we need a source for them. We don't necessarily have to have a separate clock table; we could have an inline view generate the rows. But we do need to be able to SELECT those values from a source.
(Note that bot_id in the original query is indeterminate; the value will be from some row in the collapsed set of rows, but there is no guarantee which one. If we add ONLY_FULL_GROUP_BY to sql_mode, the query will throw an error, as most other relational databases will when non-aggregate expressions in the SELECT list don't appear in the GROUP BY or aren't functionally dependent on the GROUP BY.)
EDIT
In place of a clock table, we can use an inline view. For small sets, we could use something like this.
SELECT c.tmi
FROM ( -- thirty minute interval
SELECT CONVERT(0,TIME) + INTERVAL h.h+r.h HOUR + INTERVAL m.mm MINUTE AS tmi
FROM ( SELECT 0 AS h UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 11
) h
CROSS JOIN ( SELECT 0 AS h UNION ALL SELECT 12 ) r
CROSS JOIN ( SELECT 0 AS mm UNION ALL SELECT 30 ) m
ORDER BY tmi
) c
ORDER
BY c.tmi
(Inline view c is a stand-in for a clock table; it returns time values on thirty-minute boundaries.)
That's kind of ugly. We can see where if we had a rowsource of just integer values, we could make this much simpler. But if we pick that apart, we can see how to extend the same pattern to generate fifteen minute intervals, or shorten it to generate two hour intervals.
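The remark about a row source of plain integers can be illustrated outside SQL. The Python sketch below builds the 48 half-hour slots from the integers 0..47 and fills in zero for slots without rows, which is what the LEFT JOIN with IFNULL does in the query. The function name `half_hour_counts` and the sample timestamps are assumptions, and the sketch floors each timestamp to its slot, whereas the question's query uses ROUND and so assigns to the nearest boundary.

```python
from collections import Counter
from datetime import datetime


def half_hour_counts(start_times):
    """Return a list of ("HH:MM", count) for all 48 half-hour slots of a day."""
    # Floor each timestamp to its half-hour slot, e.g. 07:42 -> "07:30".
    seen = Counter(f"{t.hour:02d}:{30 * (t.minute // 30):02d}" for t in start_times)
    # The "clock": every slot of the day, generated from the integers 0..47.
    clock = [f"{n // 2:02d}:{30 * (n % 2):02d}" for n in range(48)]
    # Counterpart of LEFT JOIN + IFNULL(cnt, 0): slots without rows get 0.
    return [(slot, seen.get(slot, 0)) for slot in clock]


# Made-up sample data: nothing falls into the 07:30 slot.
start_times = [
    datetime(2024, 1, 1, 7, 5),
    datetime(2024, 1, 1, 7, 20),
    datetime(2024, 1, 1, 8, 45),
]
counts = half_hour_counts(start_times)
for slot, cnt in counts[14:18]:
    print(slot, cnt)
```

Changing the 30 and the 48 together gives other interval sizes, for example 15 and 96 for quarter-hour slots.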
I need an Amazon Redshift SQL query to count how many times a particular day of the month falls between two dates.
Date format - YYYY-MM-DD
For example - Start date = 2019-06-14, End date = 2019-10-09, Day - 2nd of every month
I want to count how many times the 2nd of the month falls between 2019-06-14 and 2019-10-09.
The result for this example should be 4, since the 2nd falls between 2019-06-14 and 2019-10-09 four times.
I tried Redshift's DATE_DIFF and months_between functions but could not build the logic, because I can't work out what the math should be.
It seems to me that you want to select from a calendar table; that is how you can solve your problem. You'll notice that the query looks a little hacky because Redshift does not support any functions to generate sequences, which leaves you with creating sequence tables yourself (see seq_10 and seq_1000). Once you have a sequence, you can easily create a calendar with all the information you need (e.g. day_of_month).
Here is the query that answers your question:
WITH seq_10 as (
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1
), seq_1000 as (
select
row_number() over () - 1 as n
from
seq_10 a cross join
seq_10 b cross join
seq_10 c
), calendar as (
select '2018-01-01'::date + n as date,
extract(day from date) as day_of_month,
extract(dow from date) as day_of_week
from seq_1000
)
select count(*) from calendar
where day_of_month = 2
and date between '2019-06-14' and '2019-10-09'
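The expected count of 4 for the example in the question can be confirmed with a short Python sketch that walks the same calendar day by day. The function name `count_day_of_month` is an assumption for illustration; both boundaries are treated as inclusive, matching BETWEEN in the query.

```python
from datetime import date, timedelta


def count_day_of_month(start, end, day_of_month):
    """Count the dates in [start, end] whose day-of-month equals day_of_month."""
    days = (end - start).days + 1  # inclusive range, like SQL BETWEEN
    return sum(
        1 for n in range(days) if (start + timedelta(days=n)).day == day_of_month
    )


# The example from the question: the 2nd falls on Jul 2, Aug 2, Sep 2 and Oct 2.
result = count_day_of_month(date(2019, 6, 14), date(2019, 10, 9), 2)
print(result)
```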
I am trying to show the 'top 5' worked hours per month.
I have the following query:
SELECT
concat(m.firstname, " ",m.lastname) AS name,
SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(pl.end_activity,pl.start_activity)))) AS activity,
month(start_activity) AS month,
year(start_activity) AS year
FROM
log AS pl
INNER JOIN
employee AS m
ON
m.employee = pl.employee
GROUP BY
name,
year,
month
ORDER BY
year,
month,
activity
I tried limit 0,5, but it gives me only the first 5 records overall. How can I show the top 5 records for each month?
In MySQL version 8.0.2 and above, we can use window functions. The Row_Number() window function assigns row numbers within a partition defined by the concatenation of year and month. Rows within each partition are ordered by activity, descending.
We can then use this result set as a derived table and keep only row numbers up to 5. This gives us 5 rows per month, those with the top activity values.
SELECT dt.*
FROM
(
SELECT
concat(m.firstname, " ",m.lastname) AS name,
SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(pl.end_activity,pl.start_activity)))) AS activity,
month(start_activity) AS month,
year(start_activity) AS year,
ROW_NUMBER() OVER (PARTITION BY CONCAT(year(start_activity), month(start_activity))
ORDER BY SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(pl.end_activity,pl.start_activity)))) DESC) AS row_no
FROM
log AS pl
INNER JOIN
employee AS m
ON
m.employee = pl.employee
GROUP BY
name,
year,
month
) AS dt
WHERE dt.row_no <= 5
ORDER BY
dt.year,
dt.month,
dt.activity
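The "number the rows within each month, then keep the first five" idea can also be sketched in Python to check the SQL output on a small sample. The function name `top_per_month`, the tuple layout of `log`, and the sample rows are assumptions for illustration; the example uses n=2 only to keep the data short.

```python
from collections import defaultdict
from datetime import datetime


def top_per_month(log, n=5):
    """log: list of (name, start, end) tuples (hypothetical layout).

    Returns {(year, month): [(seconds, name), ...]} with at most n entries,
    highest total first.
    """
    # Total worked seconds per (year, month, name): the GROUP BY step.
    totals = defaultdict(float)
    for name, start, end in log:
        totals[(start.year, start.month, name)] += (end - start).total_seconds()

    # Partition by (year, month), order by total descending, keep the first n.
    ranked = defaultdict(list)
    for (year, month, name), seconds in totals.items():
        ranked[(year, month)].append((seconds, name))
    return {ym: sorted(rows, reverse=True)[:n] for ym, rows in ranked.items()}


# Made-up sample data.
log = [
    ("A", datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 10)),
    ("A", datetime(2024, 1, 3, 9), datetime(2024, 1, 3, 10)),
    ("B", datetime(2024, 1, 4, 9), datetime(2024, 1, 4, 12)),
    ("C", datetime(2024, 1, 5, 9), datetime(2024, 1, 5, 10)),
    ("C", datetime(2024, 2, 1, 9), datetime(2024, 2, 1, 13)),
]
top = top_per_month(log, n=2)
print(top)
```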
I have a table that looks like this, and I want to count the entries registered within each six-hour period and display the period with the maximum number of entries.
Time
09:42:29
10:37:28
15:18:49
15:28:34
16:43:51
18:14:10
18:26:06
18:26:14
For each element in the Time column, I want to take a 6-hour period starting at that element and count how many entries in the column fall within that period.
For example, 09:42:29 has an end of period of 15:42:29 and a count of 4 (09:42:29, 10:37:28, 15:18:49, 15:28:34).
Do this for each element in the Time column; the element with the maximum count is the starting time of the period, and the start and end of that period should be displayed.
Please help me write a MySQL query for this. Thank you!
Hope it helps
select
T.TimeStart,
T.TimeEnd,
COUNT(*)
from (
select
T.Time TimeStart,
date_add(T.Time,INTERVAL 6 HOUR) TimeEnd
from TimeTable T
) T
inner join TimeTable T2 on
T2.Time between T.TimeStart and T.TimeEnd
group by
T.TimeStart,
T.TimeEnd
The code below is for MSSQL; it works as expected there and should give you some guidance on how the example above could be used.
WITH TimeTable([Time]) AS (
select
CONVERT(DATETIME,a.a)
from (
values
('09:42:29'),
('10:37:28'),
('15:18:49'),
('15:28:34'),
('16:43:51'),
('18:14:10'),
('18:26:06'),
('18:26:14'))a(a)
)
select
convert(time(7),T.TimeStart)TimeStart,
convert(time(7),T.TimeEnd)TimeEnd,
COUNT(*) [Ocorrences]
from (
select
T.Time TimeStart,
DATEADD(HOUR,6,T.Time) TimeEnd
from TimeTable T
) T
inner join TimeTable T2 on
T2.Time between T.TimeStart and T.TimeEnd
group by
T.TimeStart,
T.TimeEnd
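Both queries list the count for every starting time; picking the period with the maximum count is left as a last step. As a cross-check on the question's sample data, here is a minimal Python sketch that does the counting and keeps the busiest window. The function name `busiest_window` is an assumption, the times are attached to an arbitrary date (as the MSSQL version does with CONVERT(DATETIME, ...)), and on a tie the earliest starting time wins.

```python
from datetime import datetime, timedelta


def busiest_window(times, hours=6):
    """Return (start, end, count) for the window with the most entries."""
    best = None
    for start in times:
        end = start + timedelta(hours=hours)
        # Both ends inclusive, like BETWEEN in the queries above.
        count = sum(1 for t in times if start <= t <= end)
        if best is None or count > best[2]:
            best = (start, end, count)
    return best


# The sample values from the question, on an arbitrary date.
raw = ["09:42:29", "10:37:28", "15:18:49", "15:28:34",
       "16:43:51", "18:14:10", "18:26:06", "18:26:14"]
times = [datetime.strptime("2024-01-01 " + r, "%Y-%m-%d %H:%M:%S") for r in raw]
start, end, count = busiest_window(times)
print(start.time(), end.time(), count)
```

In the SQL versions the same selection would typically be an ORDER BY on the count, descending, limited to one row.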