Query data to create statistical chart - mysql

I have a table called Warning with these fields: id, title, ddetail, date_created, type, website_id. I want to query data for a given time range and handle the following cases (my idea):
If the start and end of the range fall in the same month and year, I will query weekly warning counts for that month (e.g. 01/07/2019 - 31/07/2019).
If both are in the same year, I will query monthly warning counts, meaning I will count the number of warnings from the start month to the end month (e.g. 1/2019 - 9/2019).
The last case is similar to the one above, but the period crosses a year boundary (e.g. 12/2018 - 3/2019).
Example for above cases:
Case 1, Time: 01/07/2019 - 31/07/2019, the result:
01/07/2019 - 07/07/2019: 3 warnings
08/07/2019 - 14/07/2019: 0 warnings
15/07/2019 - 21/07/2019: 1 warning
22/07/2019 - 28/07/2019: 2 warnings
29/07/2019 - 31/07/2019: 0 warnings
Case 2, Time: 1/2019 - 6/2019
1/2019: 1 warning
2/2019: 3 warnings
3/2019: 0 warnings
4/2019: 1 warning
5/2019: 2 warnings
6/2019: 0 warnings
This is my approach, but I can't write the SQL queries. I need three SQL queries for the three cases; please also correct my approach if possible.

To begin with, we know that we need to select warnings, then group them by a range of times. I am going to group by calendar week to start with.
All the warnings in a month:
SELECT `id` FROM warning WHERE date_created >= '2019-07-01' AND date_created < '2019-08-01';
To get HOW MANY warnings were in the month, it's almost the same:
SELECT count(`id`) FROM warning WHERE date_created >= '2019-07-01' AND date_created < '2019-08-01';
That will return one row with a single value in it. Not very interesting yet. To find out how many warnings happened each (calendar) week, you can group the results by the week.
SELECT count(`id`) AS num_warnings, WEEK(date_created) AS weeknum
FROM warning
WHERE `date_created` >= '2019-07-01' AND date_created < '2019-08-01'
GROUP BY weeknum;
This will give you the number of warnings in each calendar week. If the month started on a Friday, the first week covers only the last couple of days of that calendar week, so its count will be low.
To query for seven-day intervals starting on the first of the month, things get a lot more complicated. (Also, obviously the last "week" won't be a full seven days.)
To help, I first referred to SELECT / GROUP BY - segments of time (10 seconds, 30 seconds, etc) which talks about grouping by a number of seconds. A week is 60*60*24*7 seconds, so the answer can be converted pretty easily - but there's a catch we will get to.
SELECT count(`id`) AS num_warnings, UNIX_TIMESTAMP(date_created) DIV 604800 AS weeknum
FROM warning
WHERE `date_created` >= '2019-07-01' AND date_created < '2019-08-01'
GROUP BY weeknum;
This takes the timestamp of the warning, divides it by the number of seconds in a week, and chops off the decimal. So every 604800 seconds the result increases by 1. Almost there, but here's the catch: this tells you how many weeks it has been since January 1, 1970 (the Unix epoch), and you want to know how many weeks it has been since the first of the month. Put another way, you want zero to be at the start of the month, not in 1970.
SELECT count(`id`) AS num_warnings,
(UNIX_TIMESTAMP(date_created) - UNIX_TIMESTAMP('2019-07-01')) DIV 604800 AS weeknum
FROM warning
WHERE `date_created` >= '2019-07-01' AND date_created < '2019-08-01'
GROUP BY weeknum;
That's pretty much it for dividing a month by weeks. I know almost nothing of Django, so I can't help you with the code that would generate the query.
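One thing the GROUP BY query cannot do is return a row for a week with no warnings, which the expected output in the question includes. A minimal sketch of the same bucket arithmetic in application code, filling in the empty weeks, might look like this. The function name `weekly_counts` and the sample dates are made up for illustration; the dates were chosen to reproduce the counts in the question.

```python
from collections import Counter
from datetime import date, timedelta

def weekly_counts(created_dates, month_start, month_end):
    """Count warnings per seven-day bucket, where bucket 0 starts on month_start."""
    # Same idea as the DIV 604800 query, but in whole days instead of seconds.
    hits = Counter(
        (d - month_start).days // 7
        for d in created_dates
        if month_start <= d <= month_end
    )
    n_buckets = (month_end - month_start).days // 7 + 1
    rows = []
    for i in range(n_buckets):
        first = month_start + timedelta(days=7 * i)
        last = min(first + timedelta(days=6), month_end)  # last bucket may be short
        rows.append((first, last, hits.get(i, 0)))  # 0 for buckets with no rows
    return rows

# Hypothetical sample data, not taken from a real table.
sample = [date(2019, 7, 1), date(2019, 7, 3), date(2019, 7, 7),
          date(2019, 7, 16), date(2019, 7, 22), date(2019, 7, 28)]

for first, last, n in weekly_counts(sample, date(2019, 7, 1), date(2019, 7, 31)):
    print(f"{first:%d/%m/%Y} - {last:%d/%m/%Y}: {n}")
```

With the sample above, the five buckets hold 3, 0, 1, 2 and 0 warnings, and the last bucket runs from 29/07/2019 to 31/07/2019.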
But what about dividing a year by months? At first it seems like a similar problem, but there's a catch: How many seconds are there in a month?
The answer for grouping by month over a year is actually more like the original solution above for dividing by week. It works because the year always starts at the beginning of a month:
SELECT count(`id`) AS num_warnings, MONTH(date_created) AS monthNum
FROM warning
WHERE `date_created` >= '2019-01-01' AND date_created < '2020-01-01'
GROUP BY monthNum;
Should get you close to where you want to go.
The two queries are different enough that you will want to recognize the different cases in your Django code and build the appropriate query.
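As a rough sketch of how the calling code might recognize the cases, the helpers below pick a granularity from the start and end dates and list every (year, month) bucket in the range. Both function names are hypothetical. Listing year and month together matters for the third case in the question (12/2018 - 3/2019): grouping by MONTH() alone would not distinguish the same month in different years, so in SQL you would likely group by YEAR(date_created) as well.

```python
from datetime import date

def pick_granularity(start, end):
    """Return 'week' when start and end share a month, otherwise 'month'."""
    same_month = (start.year, start.month) == (end.year, end.month)
    return "week" if same_month else "month"

def month_buckets(start, end):
    """List every (year, month) pair from start to end, inclusive."""
    # Number the months consecutively so year boundaries need no special case.
    first = start.year * 12 + (start.month - 1)
    last = end.year * 12 + (end.month - 1)
    return [(m // 12, m % 12 + 1) for m in range(first, last + 1)]

print(pick_granularity(date(2019, 7, 1), date(2019, 7, 31)))
print(month_buckets(date(2018, 12, 1), date(2019, 3, 31)))
```

The list of buckets can also be used to fill in months with zero warnings, since the grouped query returns no row for them.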

Related

Generating time series reports

I'm trying to work out how to create a solution that will allow me to query a table that has a timestamp, and in return get time series data. The request consists of start/end date & time, granularity type (minute, hour, day, week, month and year) and granularity value. Using something like
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) DIV 60)
in a query to get results per minute, or DIV 300 for every five minutes, works fine. The problem lies further up: the number of seconds in a month or a year is not constant, so the same calculation will be inaccurate. I've stumbled upon generate_series in PGSQL (a MySQL alternative) and am stuck trying to tie them together. How do I calculate a count of rows, for example, for two days, at a 15 minute granularity? It's a complex question that I'll probably have to break down further.
I have already visited #1 and #2, but they are incomplete.
To me it seems that rounding will only be allowed up to a certain level and I'd have to restrict it (i.e. for a 2 month period there cannot be an hourly breakdown).
EDIT
It gave me the wrong impression - I would not have to calculate monthly figures based on seconds using the query like:
SELECT DATE_FORMAT(MIN(created_at),'%d/%m/%Y %H:%i:%s') as date,
COUNT(*) AS count FROM guests
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) / 300)
It's only going to do grouping based on minimum value. But the question still stands - is the best approach really to go through the time period using the granularity value and "slice" the data that way without losing too much accuracy?
It seems that the only approach is to run sub-queries for a set of data (i.e. for a period of two months, generate 15 minute intervals timestamps, group the data into them and produce an aggregate) without dividing the original timestamp to produce the rounded approximation.
Let's say you have a gigantic table measure with two columns datestamp and temp.
Let's say you want to see the temperature every six minutes (10x per hour) for the last week. You can do this sort of thing. We'll get to defining trunc in a moment.
SELECT trunc(datestamp) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INTERVAL 7 DAY
GROUP BY trunc(datestamp)
ORDER BY trunc(datestamp)
That works for any reasonable definition of trunc. In this case trunc(t) returns the beginning of the six-minute period in which t occurs. So, trunc('1942-12-07 08:45:17') gives 1942-12-07 08:42:00.
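To make the truncation rule concrete, here is a small sketch of the same idea outside the database. The Python function `trunc_n_minutes` is a stand-in written for illustration, not part of MySQL; it assumes n divides 60 evenly.

```python
from datetime import datetime

def trunc_n_minutes(t, n=6):
    """Return the start of the n-minute period containing t (n should divide 60)."""
    # Drop the minutes past the last multiple of n, and zero the smaller units.
    return t.replace(minute=t.minute - t.minute % n, second=0, microsecond=0)

print(trunc_n_minutes(datetime(1942, 12, 7, 8, 45, 17)))  # 1942-12-07 08:42:00
```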
Here's a query that works for every six minute interval.
SELECT DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6) MINUTE datestamp,
AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INTERVAL 7 DAY
GROUP BY DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6) MINUTE
ORDER BY 1
This uses inbuilt date arithmetic rather than unix timestamp arithmetic.
You can use a stored function to make this easier to read.
DELIMITER $$
DROP FUNCTION IF EXISTS TRUNC_N_MINUTES$$
CREATE
FUNCTION TRUNC_N_MINUTES(datestamp DATETIME, n INT)
RETURNS DATETIME DETERMINISTIC NO SQL
COMMENT 'truncate to N minute boundary. For example,
TRUNC_N_MINUTES(sometime, 15) gives the nearest
preceding quarter hour'
RETURN DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD n) MINUTE$$
DELIMITER ;
Then your query will say
SELECT TRUNC_N_MINUTES(datestamp, 6) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INTERVAL 7 DAY
GROUP BY TRUNC_N_MINUTES(datestamp, 6)
ORDER BY TRUNC_N_MINUTES(datestamp, 6)
If you want to summarize by 5, 10, 15, or 20 minute boundaries (20 gives three items per hour), simply use that number in place of 6.
You'll need different trunc() functions for hours, etc.
The trunc() function for daily summaries is DATE(datestamp).
For monthly summaries it is LAST_DAY(datestamp). For example,
SELECT LAST_DAY(datestamp) month_ending, AVG(temp) temp
FROM measure
GROUP BY LAST_DAY(datestamp)
ORDER BY LAST_DAY(datestamp)
yields a month-by-month summary.
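The question also asks how to get a count for every 15 minute slot over two days. A GROUP BY on a truncated timestamp only returns slots that contain rows, so one option is to generate the full list of slots in application code and fill in zeros. The sketch below assumes the timestamps have already been fetched; `time_series` is a hypothetical helper, and it only handles fixed-width steps (minutes, hours, days, weeks), not months or years.

```python
from collections import Counter
from datetime import datetime, timedelta

def time_series(timestamps, start, end, step):
    """Count timestamps per bucket of width `step` over the half-open range [start, end)."""
    # timedelta // timedelta yields an integer bucket index.
    counts = Counter((t - start) // step for t in timestamps if start <= t < end)
    n_buckets = -((start - end) // step)  # ceiling division on timedeltas
    return [(start + i * step, counts.get(i, 0)) for i in range(n_buckets)]

start = datetime(2024, 1, 1)
end = start + timedelta(days=2)
# Made-up sample rows for illustration.
rows = [start + timedelta(minutes=5), start + timedelta(minutes=14),
        start + timedelta(minutes=15)]
series = time_series(rows, start, end, timedelta(minutes=15))
print(len(series), series[:2])
```

Two days at 15 minute granularity gives 192 buckets; here the first holds two rows and the second holds one.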

MySQL find date range with count

In MySQL, it is fairly easy to find the number of records that exist within some time interval.
SELECT COUNT(*) FROM records WHERE create_date > '2018-01-01 01:15:00' AND create_date < '2018-01-01 02:15:00'
But I want to do the opposite, sort of. Rather than providing a time interval and getting a count of records, I want to provide a count of records and check whether an X minute time interval exists in which more than Y records were created. Getting the exact time interval is not essential, only whether one exists or not. At a higher level, I am attempting to identify if there was any X minute "surge" when more than Y records were created during the course of a day.
For example, in the past 24 hours was there any 1 hour interval where a "surge" of more than 50 new records occurred?
I have already ruled out dividing the 24 hours into blocks of 1 hour intervals and checking each block. This does not work because the "surge" could span two sequential 1 hour blocks, such as 25 records at the end of the 01:00:00 block and 25 records at the beginning of the 02:00:00 block.
This should do it:
SELECT COUNT(*)
FROM records r1
WHERE
(SELECT COUNT(*) FROM records r2
WHERE ABS(UNIX_TIMESTAMP(r1.create_date) - UNIX_TIMESTAMP(r2.create_date)) < X) > Y
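What this does is count how many records have more than Y records created within X seconds before or after them. Note that because the comparison looks X seconds in both directions, the window around each record is effectively 2X seconds wide, and each record also counts itself.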
So basically it will return >=1 if there are any, 0 if not.
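For comparison, the same check can be sketched in application code with a sliding window over the sorted timestamps. This is a swapped-in technique, not what the query above does: it looks only forward from each record, so the window is exactly as wide as requested. The name `has_surge` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta

def has_surge(timestamps, window, threshold):
    """True if more than `threshold` timestamps fall within some span shorter than `window`."""
    ts = sorted(timestamps)
    left = 0
    for right, t in enumerate(ts):
        # Advance the left edge until ts[left] is less than `window` before t.
        while t - ts[left] >= window:
            left += 1
        if right - left + 1 > threshold:
            return True
    return False

base = datetime(2018, 1, 1, 1, 0, 0)
# 26 records late in the 01:00 block and 25 early in the 02:00 block.
late = [base + timedelta(minutes=50, seconds=i) for i in range(26)]
early = [base + timedelta(hours=1, seconds=i) for i in range(25)]
print(has_surge(late + early, timedelta(hours=1), 50))
```

Neither hourly block exceeds 50 on its own in this sample, yet the 51 records sit within about ten minutes of each other, which is the case that fixed one-hour blocks miss.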
So if you wanted to sort by hours you would want to group the records. Here I'm using the built-in functions that return parts of a timestamp, year(), month(), dayofmonth(), hour(). Since you can't use an aggregate function in the where clause I had to use having to limit by the count requirement.
select date(create_date),
hour(create_date),
count(*) as surge from records
where create_date > curdate() - interval 1 day
group by year(create_date), month(create_date),
dayofmonth(create_date), hour(create_date)
having count(*) > 50;
Another method to accomplish your goal might be to select the count of records and group by the interval in question. In this case I'm adding an hour to the create_date to get your 1 hour suggested interval. Anytime the count is greater than 50 it returns a row. Notice I'm also grouping by the hour. This is to prevent multiple starts for a "surge" within the same hour:
select create_date,count(*) as surge from records
group by year(create_date), month(create_date),
dayofmonth(create_date),hour(create_date),
(create_date + interval 1 hour - create_date) having count(*) > 50;
The problem with this however is that some surges may last longer than 1 hour, but it should give you the moment the "surge" started.

Efficient SELECT query to find records within a month

I have a MySQL DB table with multiple date type fields. I need to do different SELECT queries on this table but I am not sure which way is the best to find records from the same month.
I know I can do the following:
SELECT *
FROM table
WHERE MONTH(somedate) = 5
AND YEAR(somedate) = 2015
But I keep reading that isn't efficient and that I should go with using actual dates, i.e.
SELECT *
FROM table
WHERE somedate BETWEEN '2015-05-01' AND '2015-05-31'
However, all I would have is the month and the year as variables coming in from PHP. How do I easily and quickly calculate the last day of the month if I go with second option?
Don't calculate the last day of the month. Calculate the first day of the next month instead.
Your query can be like this
WHERE t.mydatetimecol >= '2015-05-01'
AND t.mydatetimecol < '2015-05-01' + INTERVAL 1 MONTH
Note that we're doing a less than comparison, not a "less than or equal to"... this is very convenient for comparing TIMESTAMP and DATETIME columns, which can include a time portion.
Note that a BETWEEN comparison is a "less than or equal to". To get a comparison equivalent to the query above, we'd need to do
WHERE t.mydatetimecol
BETWEEN '2015-05-01' AND '2015-05-01' + INTERVAL 1 MONTH + INTERVAL -1 SECOND
(This assumes that the resolution of DATETIME and TIMESTAMP is down to a second. In other databases, such as SQL Server, the resolution is finer than a second, so there we'd have the potential of missing a row with value of '2015-05-31 23:59:59.997'. We don't have a problem like that with the less than the first day of the next month comparison... < '2015-06-01')
No need to do the month or date math yourself, let MySQL do it for you. If you muck with adding 1 to the month, you have to handle the rollover from December to January, and increment the year. MySQL has all that already builtin.
date('t', strtotime("$year-$month-01")) will give days in the month
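If the bounds do have to be computed in application code rather than with INTERVAL 1 MONTH, the half-open range can be built without ever working out the last day of the month. A minimal sketch follows; the question uses PHP, and `month_range` is a made-up name for illustration.

```python
from datetime import date

def month_range(year, month):
    """Return (first_day, first_day_of_next_month) for a half-open date range."""
    first = date(year, month, 1)
    # Roll over from December to January of the following year.
    nxt = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return first, nxt

lower, upper = month_range(2015, 5)
print(lower, upper)  # 2015-05-01 2015-06-01
```

The two values would then be bound as parameters to a condition of the form `somedate >= lower AND somedate < upper`.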

How to get the count of specific days from week between a date range in mysql?

I have a table "task_table" containing columns-
Task_id, Start_date, End_date
And I have one more "configuration" table which has the records that tell which days of the week are working days.
This table has two columns -
week_day, isHoliday
and this table contains seven records as week_days are the Monday,Tuesday.....Sunday , and each record has an entry as 1 or 0. If a day is a holiday in any organization then there will be 0 against that day. Like if an organisation has holidays on Wednesday and Friday every week then there will be 0 against Wednesday and Friday only.
Now I want to make a SQL query to get the Task_id, Start_date, End_date, and the count of total days consumed on each task. (These days are the days between task start_date and end_date excluding the holiday days as configured in "configuration" table.)
I don't have time to fully answer this question now, but what I would do is:
Get the date as at the start of the Start_date week, and the date as at the end of the End_date week (you can get this by date_adding an amount of days according to the day of the week).
Then you want to date diff them, divide by seven, multiply by two, and remove any that you would have added (e.g. if the start date was Thursday then you'll need to remove one from the result, as you will have counted one for the Wednesday immediately prior).
I'll write the code out tomorrow (it's late here - something like 14 hours from now or so) if no one else has suggested a better answer.
Edit: Right, didn't properly read the question, but the tactic still applies with a little fiddling. Speaking of which, here is the fiddle of my solution.
It boils down to the following code:
set @holidaysPerWeek = (select sum(isHoliday) from configuration);
select
Task_id,
((dateDiff(
DATE_ADD(End_Date, INTERVAL 7 - DayOfWeek(End_Date) DAY),
DATE_ADD(Start_Date, INTERVAL -DayOfWeek(Start_Date) + 1 Day)) + 1) / 7)
* @holidaysPerWeek
- (select sum(isHoliday) from configuration where week_day > DayOfWeek(End_Date))
- (select sum(isHoliday) from configuration where week_day < DayOfWeek(Start_Date)),
DayOfWeek(End_Date)
from task_table;
This does exactly what I was saying before, but with a variable number of "weekends" spread throughout the week, by first selecting the number of holidays for if the full weeks were covered, then removing holidays that were before or after the start and end dates respectively.
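The full-weeks-plus-remainder idea can be checked against a small sketch outside the database. This version counts working days directly instead of counting holidays, and it assumes the configuration has been reduced to a set of weekday numbers; `working_days` and that representation are assumptions for illustration, not the schema from the question.

```python
from datetime import date, timedelta

def working_days(start, end, holiday_weekdays):
    """Count days in [start, end] whose weekday is not in holiday_weekdays.

    holiday_weekdays uses date.weekday() numbering: Monday is 0, Sunday is 6.
    """
    total_days = (end - start).days + 1
    full_weeks, leftover = divmod(total_days, 7)
    # Every full week contains each weekday exactly once.
    count = full_weeks * (7 - len(holiday_weekdays))
    # The leftover days start right after the last full week.
    tail_start = start + timedelta(days=7 * full_weeks)
    for i in range(leftover):
        if (tail_start + timedelta(days=i)).weekday() not in holiday_weekdays:
            count += 1
    return count

# Holidays on Wednesday (2) and Friday (4), as in the question's example.
print(working_days(date(2019, 7, 1), date(2019, 7, 31), {2, 4}))  # 22
```

July 2019 has five Wednesdays and four Fridays, so 31 - 9 = 22 working days remain.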

How do I select two weeks ago in MYSQL?

I have a report that is driven by a sql query that looks like this:
SELECT batch_log.userid,
batches.operation_id,
SUM(TIME_TO_SEC(ramses.batch_log.time_elapsed)),
SUM(ramses.tasks.estimated_nonrecurring + ramses.tasks.estimated_recurring),
DATE(start_time)
FROM batch_log
JOIN batches ON batch_log.batch_id=batches.id
JOIN ramses.tasks ON ramses.batch_log.batch_id=ramses.tasks.batch_id
JOIN protocase.tblusers on ramses.batch_log.userid = protocase.tblusers.userid
WHERE DATE(ramses.batch_log.start_time) > "2011-02-01"
AND ramses.batch_log.time_elapsed > "00:03:00"
AND DATE(ramses.batch_log.start_time) < now()
AND protocase.tblusers.active = 1
AND protocase.tblusers.userid NOT in ("ksnow","smanning", "dstapleton")
GROUP BY userid, batches.operation_id, date(start_time)
ORDER BY start_time, userid ASC
Since this is to be compared with the time from the current payperiod it causes an error.
Our pay periods start on a Sunday, the first pay period was 2011-02-01 and our last pay period started the 4th of this month. How do I put that into my where statement to strip the most recent pay period out of the query?
EDIT: So now I'm using date_sub(now(), INTERVAL 2 WEEK) but I really need a particular day of the week(SUNDAY) since it is wednesday it's chopping it off at wednesday.
You want to use DATE_SUB.
Specifically:
select DATE_SUB(curdate(), INTERVAL 2 WEEK)
gets you two weeks ago. Insert the DATE_SUB ... part into your sql and you're good to go.
Edit per your comment:
Check out DAYOFWEEK:
and you can do something along the lines of:
DATE_SUB(DATE_SUB(curdate(), INTERVAL 2 WEEK), INTERVAL DAYOFWEEK(curdate()) - 1 DAY)
(I don't have a MySQL instance to test it on, but essentially this subtracts the number of days since Sunday. DAYOFWEEK returns 1 for Sunday through 7 for Saturday.)
The question isn't quite clear, especially after the edit: is the "pay period" two weeks long, or do you just want the last two weeks counted back from last Sunday? I assume that the period is two weeks. In that case you first need to know how many days the latest period (which you want to ignore, as it isn't over yet) has been going on. To get that number of days you can use an expression like
DATEDIFF(today, FirstPeriod) % 14
where FirstPeriod is 2011-02-01. And now you strip that number of days from the current date in the query using date_sub(). The exact expression depends on how the period is defined but you should get the idea...
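As a sketch of that arithmetic, assuming fixed 14 day periods counted from the first period's start date, the start of the current (unfinished) period can be computed like this. The function name `current_period_start` is hypothetical.

```python
from datetime import date, timedelta

def current_period_start(today, first_period, period_days=14):
    """Return the first day of the pay period that contains `today`."""
    # Days elapsed inside the current, unfinished period.
    days_into_period = (today - first_period).days % period_days
    return today - timedelta(days=days_into_period)

first = date(2011, 2, 1)
print(current_period_start(date(2011, 2, 20), first))  # 2011-02-15
```

Rows with a start_time earlier than the returned date belong to completed periods, so that date could serve as the upper bound in the WHERE clause.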