Use case: I have a cron job that runs every 5 minutes, checks some statistics, and inserts them into the database table `stats`.
**Structure**
`time` as DATETIME (index)
`skey` as VARCHAR(50) (index)
`value` as BIGINT
Primary key: (`time`, `skey`)
Now I want to create a graph that displays the daily average as it progresses over the day - e.g. a graph for playing users:
from 0-1 I have 10 playing users (avg value from 0-1 is now 10)
from 1-2 I have 6 playing users (avg value is now 8 => (10+6) / 2)
from 2-3 I have 14 playing users (avg value is now 10 => (10+6+14) / 3)
and the next day it starts over from scratch.
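The intended calculation is just a cumulative average that resets each day; as a plain-Python sketch of the numbers above:

```python
# Running daily average: after each hour, the average of all hourly
# values seen so far that day.
def running_daily_average(hourly_values):
    averages = []
    total = 0
    for i, v in enumerate(hourly_values, start=1):
        total += v
        averages.append(total / i)
    return averages

# Hours 0-1, 1-2, 2-3 from the example: 10, 6, 14 playing users.
print(running_daily_average([10, 6, 14]))  # [10.0, 8.0, 10.0]
```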
I already have queries running, but they take 3.5+ seconds.
First attempt:
SELECT *
, (SELECT AVG(value)
FROM stats as b
WHERE b.skey = stats.skey
AND b.time <= stats.time
AND DATE(b.time) = DATE(stats.time))
FROM stats
ORDER
BY stats.time DESC
Second attempt:
SELECT *
, (SELECT AVG(b.value)
FROM stats as b
WHERE b.skey = stats.skey
AND DATE(b.time) = DATE(stats.time)
AND b.time <= stats.time) as avg
FROM stats
WHERE skey = 'playingUsers'
GROUP
BY HOUR(stats.time)
, DATE(stats.time)
My first try was to get each entry and calculate the average.
The second try was to group by hour (like in my example).
Either way, the performance does not change.
Is there any way to boost performance in MySQL, or do I have to change the whole logic behind it?
DB Fiddle:
https://www.db-fiddle.com/f/krFmR1yPsmnPny2zi5NJGv/4
I suggest separating the calculation of the average per hour from the calculation of the day's average, and calculating these values only once per hour via grouping.
If you are on MySQL 8, I suggest using a CTE as follows:
WITH HOURLY AS (
  SELECT b.DATE_
       , b.HOUR_
       , AVG(b.value) AS avg_per_hour
  FROM (SELECT s.value, DATE(s.time) AS DATE_, HOUR(s.time) AS HOUR_
        FROM stats s
        WHERE skey = 'playingUsers') b
  GROUP BY b.DATE_, b.HOUR_
)
SELECT *
     , (SELECT AVG(b.avg_per_hour)
        FROM HOURLY AS b
        WHERE b.DATE_ = HOURLY.DATE_
          AND b.HOUR_ <= HOURLY.HOUR_) AS avg
FROM HOURLY
ORDER BY DATE_ DESC, HOUR_ DESC
This statement runs in under 300 ms in the given fiddle.
The calculation corresponds to the algorithm you described in the table above.
However, the results differ from those of the statements you presented.
I have two tables, "Gate_Logs" and "Employee".
The "Gate_Logs" table has three columns.
Employee ID - A 4 Digit Unique Number assigned to every employee
Status – In or Out
Timestamp – A recorded timestamp
The "Employee" Table has
Employee ID
Level
Designation
Joining Date
Reporting Location
Reporting Location ID - Single Digit ID
I want to find out which employee had the highest weekly work time over the past year, and I am trying to get this data for each individual location. I want to look at the cumulative highest. Let's say Employee X at Location L worked 60 hours in a particular week, which was the highest at that location, so X will be the person I wanted to query.
Please provide any pointers on how I can proceed with this; I have been stuck on it for a while.
SQL version 8.0.27
You can use the window function LAG to pair In/Out records:
periods - pair In/Out records
sumup_weekly - compute weekly work hours for each employee
rank_weekly - rank employees per location per week
and finally select the rank-one rows.
WITH periods AS (
SELECT
`employee_id`,
`status` to_status,
`timestamp` to_timestamp,
LAG(`status`) OVER w AS fr_status,
LAG(`timestamp`) OVER w AS fr_timestamp
FROM gate_log
WINDOW w AS (PARTITION BY `employee_id` ORDER BY `timestamp` ASC)
),
sumup_weekly AS (
SELECT
`employee_id`,
WEEKOFYEAR(fr_timestamp) week,
SUM(TIMESTAMPDIFF(SECOND, fr_timestamp, to_timestamp)) seconds
FROM periods
WHERE fr_status = 'In' AND to_status = 'Out'
GROUP BY `employee_id`, `week`
),
rank_weekly AS (
SELECT
e.`employee_id`,
e.`location_id`,
w.`week`,
SEC_TO_TIME(w.`seconds`) work_hours,
RANK() OVER(PARTITION BY e.`location_id`, w.`week` ORDER BY w.`seconds` DESC) rank_hours
FROM sumup_weekly w
JOIN employee e ON w.`employee_id` = e.`employee_id`
)
SELECT *
FROM rank_weekly
WHERE rank_hours = 1
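As a rough, runnable sketch of the pairing step, here is the same LAG idea in Python with SQLite standing in for MySQL (the table layout and sample timestamps are made up for illustration; `strftime('%s', ...)` replaces TIMESTAMPDIFF):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gate_log (employee_id INTEGER, status TEXT, ts TEXT);
INSERT INTO gate_log VALUES
  (1001, 'In',  '2022-01-03 09:00:00'),
  (1001, 'Out', '2022-01-03 17:30:00'),
  (1001, 'In',  '2022-01-04 09:15:00'),
  (1001, 'Out', '2022-01-04 18:00:00');
""")

# Pair each row with the previous row per employee, then keep only
# In -> Out transitions and sum the elapsed seconds.
rows = conn.execute("""
WITH periods AS (
  SELECT employee_id,
         status AS to_status,
         ts     AS to_ts,
         LAG(status) OVER w AS fr_status,
         LAG(ts)     OVER w AS fr_ts
  FROM gate_log
  WINDOW w AS (PARTITION BY employee_id ORDER BY ts)
)
SELECT employee_id,
       SUM(strftime('%s', to_ts) - strftime('%s', fr_ts)) AS seconds
FROM periods
WHERE fr_status = 'In' AND to_status = 'Out'
GROUP BY employee_id
""").fetchall()
print(rows)  # [(1001, 62100)] -> 8.5 h + 8.75 h of work
```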
I'm trying to fetch records for the current day in half-hour intervals, with the matching record count for each period.
My output comes as expected, but if there are no records in a particular time period, say 7:00-7:30, I don't get that row with a zero count.
My query is as follows:
SELECT time_format(FROM_UNIXTIME(ROUND(UNIX_TIMESTAMP(start_time)/(30*60)) * (30*60)), '%H:%i') AS thirtyHourInterval
     , COUNT(bot_id) AS Count
FROM bot_activity
WHERE start_time BETWEEN CONCAT(CURDATE(), ' 00:00:00') AND CONCAT(CURDATE(), ' 23:59:59')
GROUP BY ROUND(UNIX_TIMESTAMP(start_time)/(30*60))
For reference, my output:
We need a source for that 7:30 row; a row source for all of the time values.
Suppose we have a clock table that contains all of the time values we want to return, so that we can write a query returning that first column, the thirty-minute interval values we want, for example:
SELECT c.hhmm AS thirty_minute_interval
FROM clock c
WHERE c.hhmm ...
ORDER BY c.hhmm
then we can outer join that to the results of the query with the missing rows:
SELECT c.hhmm AS _thirty_minute_interval
, IFNULL(r._cnt_bot,0) AS _cnt_bot
FROM clock c
LEFT
JOIN ( -- query with missing rows
SELECT time_format(...) AS thirtyMinuteInterval
, COUNT(...) AS _cnt_bot
FROM bot_activity
WHERE
GROUP BY time_format(...)
) r
ON r.thirtyMinuteInterval = c.hhmm
WHERE c.hhmm ...
ORDER BY c.hhmm
The point is that the SELECT will not generate "missing" rows from a source where they don't exist; we need a source for them. We don't necessarily have to have a separate clock table; we could have an inline view generate the rows. But we do need to be able to SELECT those values from some source.
(Note that bot_id in the original query is indeterminate; the value will be from some row in the collapsed set of rows, but there is no guarantee which one. If we add ONLY_FULL_GROUP_BY to sql_mode, the query will throw an error, like most other relational databases do when non-aggregate expressions in the SELECT list don't appear in the GROUP BY and aren't functionally dependent on it.)
EDIT
In place of a clock table, we can use an inline view. For small sets, we could use something like this:
SELECT c.tmi
FROM ( -- thirty minute interval
SELECT CONVERT(0,TIME) + INTERVAL h.h+r.h HOUR + INTERVAL m.mm MINUTE AS tmi
FROM ( SELECT 0 AS h UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 11
) h
CROSS JOIN ( SELECT 0 AS h UNION ALL SELECT 12 ) r
CROSS JOIN ( SELECT 0 AS mm UNION ALL SELECT 30 ) m
ORDER BY tmi
) c
ORDER
BY c.tmi
(The inline view c is a stand-in for a clock table; it returns time values on thirty-minute boundaries.)
That's kind of ugly. If we had a row source of just integer values, we could make this much simpler. But if we pick it apart, we can see how to extend the same pattern to generate fifteen-minute intervals, or shorten it to generate two-hour intervals.
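The same zero-filling idea can be sketched outside SQL: generate every half-hour slot as the row source, then attach counts to it, defaulting to zero (the sample counts here are invented):

```python
from collections import Counter

# Row source: every thirty-minute boundary in a day (the "clock table").
slots = [f"{h:02d}:{m:02d}" for h in range(24) for m in (0, 30)]

# Pretend these came from the GROUP BY on bot_activity; 07:00 is missing.
counts = Counter({"06:30": 4, "07:30": 2})

# The "LEFT JOIN": every slot appears, missing ones get a zero count.
report = [(slot, counts.get(slot, 0)) for slot in slots]
print(report[13:16])  # [('06:30', 4), ('07:00', 0), ('07:30', 2)]
```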
I have a table that looks like this:
What I need is: for each day, show the accumulated (moving) number of new Droppers over the last 5 days inclusive,
split into US vs. non-US geos.
Report columns:
DataTimstamp - upper boundary of a 5-day time frame
Total - number of new Droppers within the time frame
Region_US - number of new Droppers where country = 'US'
Region_rest - number of new Droppers where country <> 'US'
This is my code:
CREATE VIEW new_droppers_per_date AS
SELECT date, country, COUNT(dropperid) AS num_new_customers
FROM (SELECT dropperid, country, MIN(CAST(LoginTimestamp AS date)) AS date
      FROM droppers
      GROUP BY dropperid, country) AS t1
GROUP BY date, country;

SELECT DataTimstamp, region_us, region_rest
FROM (
  (SELECT date AS DataTimstamp,
          SUM(num_new_customers) OVER (ORDER BY date RANGE BETWEEN 5 PRECEDING AND 1 PRECEDING) AS total
   FROM new_droppers_per_date) AS t1
  INNER JOIN
  (SELECT date,
          SUM(num_new_customers) OVER (ORDER BY date RANGE BETWEEN 5 PRECEDING AND 1 PRECEDING) AS region_us
   FROM new_droppers_per_date
   WHERE country = 'us') AS t2 ON t1.date = t2.date
  INNER JOIN
  (SELECT date,
          SUM(num_new_customers) OVER (ORDER BY date RANGE BETWEEN 5 PRECEDING AND 1 PRECEDING) AS region_rest
   FROM new_droppers_per_date
   WHERE country <> 'us') AS t3 ON t2.date = t3.date
)
I was wondering if there is an easier/smarter way to do this without using so many joins and views.
Thanks for the help :)
Here is one way to do it using window functions. First assign a flag to the first login of each DropperId, then aggregate by day and count the number of new logins. Finally, compute a window SUM() with a range frame that spans the last 5 days.
select
LoginDay,
sum(CntNewDroppers) over(
order by LoginDay
range between interval 5 day preceding and current row
) RunningCntNewDroppers
from (
select
date(LoginTimestamp) LoginDay,
sum(rn = 1) CntNewDroppers
from (
select
LoginTimestamp,
row_number() over(partition by DropperId order by LoginTimestamp) rn
from mytable
) t
) t
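As a quick plain-Python sketch of the same logic (flag first logins, count per day, then a 5-day running sum; the sample rows are invented):

```python
from datetime import date, timedelta

# (dropper_id, login_date) rows; the first login per id marks a new Dropper.
logins = [
    (1, date(2022, 3, 1)), (2, date(2022, 3, 1)),
    (1, date(2022, 3, 2)), (3, date(2022, 3, 4)),
]

# Flag first logins (the ROW_NUMBER() = 1 rows) and count them per day.
seen, new_per_day = set(), {}
for dropper_id, day in sorted(logins, key=lambda r: r[1]):
    if dropper_id not in seen:
        seen.add(dropper_id)
        new_per_day[day] = new_per_day.get(day, 0) + 1

# Running count over the last 5 days inclusive (the RANGE frame).
def running(day):
    return sum(c for d, c in new_per_day.items()
               if day - timedelta(days=5) <= d <= day)

print(running(date(2022, 3, 4)))  # 3
```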
I have a table containing:
Balance, Client_ID, Date
This table has ~25 million rows. Most days, a service runs and creates a new row for each client, with today's date and the client's balance.
Inside a date range, let's say 01/01/2016 to 12/05/2016, I need to get the first and the last row per client.
*The service does not run every day, so Date = 12/05/2016 will not work. If today's balance equals yesterday's balance, no row is inserted (this saves about 90% of the data, which, if I calculate correctly, would otherwise be ~300 million rows).
To do this, I run these two queries:
Get the first date: 6.9433851242065 seconds
SELECT * FROM (SELECT * FROM daily
WHERE TIME >= '01/01/2016' AND TIME < '13/05/2016') dates
GROUP BY Client_ID
Get the last date: 32.034277915955 seconds
SELECT * FROM (SELECT * FROM daily
WHERE TIME >= '01/01/2016' AND TIME < '13/05/2016'
ORDER BY Date DESC) dates
GROUP BY Client_ID
The first query has no ORDER BY because the service mentioned above always inserts rows in the right order, which makes it much faster (7 s vs. 32 s).
How can I make both queries faster, or at least the second one?
Query description:
Get the row where the date is the first date after 01/01/2016
Get the row where the date is the last date before 13/05/2016
EDIT: The accepted answer gives me the following timings; ASC and DESC are mine, 'combined' is the suggested answer.

Run 1:
dates_ASC: 33.300458192825
dates_DESC: 8.9232740402222
dates_combined: 8.4357199668884

Run 2:
dates_ASC: 5.4825110435486
dates_DESC: 10.173403978348
dates_combined: 2.7024359703064

Run 3:
dates_ASC: 15.090759038925
dates_DESC: 29.375104904175
dates_combined: 3.2885720729828
Pick each client's min and max time in a derived table. Join with that table:
select *
from daily d1
join (select Client_ID, max(TIME) as maxtime, min(TIME) as mintime
from daily
WHERE TIME >= '01/01/2016' AND TIME < '13/05/2016'
group by Client_ID) d2
on d1.Client_ID = d2.Client_ID and d1.TIME in (d2.mintime, d2.maxtime)
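A small runnable check of this pattern, using SQLite in place of MySQL (the sample rows are invented, and the date literals are rewritten as ISO strings so lexical comparison matches chronological order):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily (Balance REAL, Client_ID INTEGER, Time TEXT);
INSERT INTO daily VALUES
  (100, 1, '2016-01-05'), (140, 1, '2016-03-01'), (90, 1, '2016-05-10'),
  (500, 2, '2016-02-02'), (450, 2, '2016-04-20');
""")

# First and last row per client inside the range, in one pass:
# a derived table picks each client's min/max time, then we join back.
rows = conn.execute("""
SELECT d1.Client_ID, d1.Time, d1.Balance
FROM daily d1
JOIN (SELECT Client_ID, MIN(Time) AS mintime, MAX(Time) AS maxtime
      FROM daily
      WHERE Time >= '2016-01-01' AND Time < '2016-05-13'
      GROUP BY Client_ID) d2
  ON d1.Client_ID = d2.Client_ID
 AND d1.Time IN (d2.mintime, d2.maxtime)
ORDER BY d1.Client_ID, d1.Time
""").fetchall()
print(rows)
# [(1, '2016-01-05', 100.0), (1, '2016-05-10', 90.0),
#  (2, '2016-02-02', 500.0), (2, '2016-04-20', 450.0)]
```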
Try the first query as:
SELECT * FROM daily WHERE TIME >= '01/01/2016' AND TIME < '13/05/2016' ORDER BY TIME ASC LIMIT 1
And the second query as:
SELECT * FROM daily WHERE TIME >= '01/01/2016' AND TIME < '13/05/2016' ORDER BY TIME DESC LIMIT 1
I'm trying to take the average of two selects and return it, to see what percentage of time has gone by since the last order. It is part of an early-alert system, if you're wondering why.
Select 1 takes 5 of the (almost) most current datetimes, records 2-6 for example, to get the average spacing of some of the last ordered products.
Select 2 (I don't think it's working) takes the current order and NOW() to see how much time has gone by since the last order.
The return should be the average of the time between those recent orders and the time since the last order placed. I evidently have some bugs in my code and cannot get it to work.
I'm hoping some obvious stuff sticks out, like how I botched the UNION or muddled the second TIMESTAMPDIFF().
SELECT (spread + recent) / 2 AS lapse FROM
(SELECT TIMESTAMPDIFF(MINUTE, MIN(created_at), MAX(created_at) )
/
(COUNT(DISTINCT(created_at)) -1)
FROM ( SELECT created_at
FROM dbname.sales_flat_order_status_history
ORDER BY created_at DESC LIMIT 5 OFFSET 1
)
AS created_at) AS spread,
UNION
(SELECT TIMESTAMPDIFF(MINUTE, MIN(created_at), NOW() )
/
(COUNT(DISTINCT(created_at)) -1)
FROM ( SELECT created_at
FROM dbname.sales_flat_order_status_history
ORDER BY created_at DESC LIMIT 1
)
AS created_at) AS recent;
ORDER BY lapse LIMIT 1;
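For what it's worth, here is my reading of the intended math untangled in Python rather than a UNION (the sample timestamps are invented; "spread" averages the gaps among orders 2-6, "recent" is the time since the newest order):

```python
from datetime import datetime, timedelta

# Most recent order timestamps, newest first (stand-ins for
# sales_flat_order_status_history.created_at).
now = datetime(2022, 6, 1, 12, 0)
created_at = [now - timedelta(minutes=m) for m in (30, 90, 150, 210, 270, 330)]

# "spread": average minutes between the 5 orders before the newest
# (rows 2-6), i.e. the span divided by the number of gaps.
older = created_at[1:6]
spread = (max(older) - min(older)).total_seconds() / 60 / (len(older) - 1)

# "recent": minutes elapsed since the newest order.
recent = (now - created_at[0]).total_seconds() / 60

lapse = (spread + recent) / 2
print(spread, recent, lapse)  # 60.0 30.0 45.0
```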