Grouping MySQL results by 7 day increments - mysql

Hoping someone might be able to assist me with this.
Assume I have the table listed below. Hosts can show up multiple times on the same date, usually with different backupsizes.
+------------------+--------------+
| Field | Type |
+------------------+--------------+
| startdate | date |
| host | varchar(255) |
| backupsize | float(6,2) |
+------------------+--------------+
How could I find the sum total of backupsize for 7 day increments starting with the earliest date, through the last date? I don't mind if the last few days get cut off because they don't fall into a 7 day increment.
Desired output (preferred):
+------------+----------+----------+----------+-----
|Week of | system01 | system02 | system03 | ...
+------------+----------+----------+----------+-----
| 2014/07/30 | 2343.23 | 232.34 | 989.34 |
+------------+----------+----------+----------+-----
| 2014/08/06 | 2334.7 | 874.13 | 234.90 |
+------------+----------+----------+----------+-----
| ... | ... | ... | ... |
OR
+------------+------------+------------+------
|Host | 2014/07/30 | 2014/08/06 | ...
+------------+------------+------------+------
| system01 | 2343.23 | 2334.7 | ...
+------------+------------+------------+-------
| system02 | 232.34 | 874.13 | ...
+------------+------------+------------+-------
| system03 | 989.34 | 234.90 | ...
+------------+------------+------------+-------
| ... | ... | ... |
Date format is not a concern, just as long as it gets identified somehow. Also, the order of the hosts is not a concern either. Thanks!

The simplest way is to get the earliest date and count the number of days from it:
select x.minsd + interval floor(datediff(lb.startdate, x.minsd) / 7) * 7 day as `Week of`,
lb.host,
sum(lb.backupsize) as backupsize
from listedbelow lb cross join
(select min(startdate) as minsd from listedbelow) x
group by `Week of`, lb.host
order by 1;
This produces one row per week and host. You can pivot the results as you see fit.
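To make the bucketing arithmetic concrete, here is a minimal Python sketch of the same idea (not the MySQL query itself). The function name `weekly_totals` and the sample rows are made up for illustration; rows are assumed to be (startdate, host, backupsize) tuples.

```python
from collections import defaultdict
from datetime import date, timedelta

def weekly_totals(rows):
    """Sum backupsize per (week start, host), counting weeks from the earliest startdate."""
    first_day = min(startdate for startdate, _host, _size in rows)
    totals = defaultdict(float)
    for startdate, host, size in rows:
        # Integer division plays the role of FLOOR(DATEDIFF(...) / 7) in the query.
        week_index = (startdate - first_day).days // 7
        week_start = first_day + timedelta(days=7 * week_index)
        totals[(week_start, host)] += size
    return dict(totals)

# Made-up sample rows: (startdate, host, backupsize)
sample_rows = [
    (date(2014, 7, 30), "system01", 10.0),
    (date(2014, 8, 5), "system01", 2.5),   # 6 days after the first date: still bucket 0
    (date(2014, 8, 6), "system01", 4.0),   # 7 days after the first date: opens bucket 1
    (date(2014, 7, 31), "system02", 1.0),
]
result = weekly_totals(sample_rows)
```

The key point is that the day difference is taken as "row date minus earliest date", so the bucket index is never negative.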

I'll assume that what you want is the sum of backupsize grouped by host and by the seven-day interval you describe.
My solution would be something like this:
You need to define the first date, and then "create" a column with the date you want (the end of the seven-day period)
Then I would group it.
I think temporary tables and little tricks with temp variables are the best way to tackle this, so:
drop table if exists temp_data;
create temporary table temp_data
select a.*
-- The @d variable will hold the date that you'll use later to group the data.
, @d := case
-- If the current "host" value is the same as the previous one, then...
when @host_prev = a.host then
-- ... if @d is not null and the row is within the seven-day period,
-- then leave the value of @d intact; otherwise, add 7 days to it.
case
when @d is not null and a.startdate <= @d then @d
-- The coalesce() function returns its first non-null argument
-- (just as a precaution)
else date_add(coalesce(@d, a.startdate), interval 7 day)
end
-- If the current "host" value is not the same as the previous one,
-- then take the current date (the first date of the "new" host) and add
-- seven days to it.
else date_add(a.startdate, interval 7 day)
end as date_group
-- This is needed to perform the comparison in the "case" piece above
, @host_prev := a.host as host2
from
(select @host_prev := '', @d := null) as init -- Initialize the variables
, yourtable as a
-- IMPORTANT: This will only work if you order the data properly
order by a.host, a.startdate;
-- Add indexes to the temp table, to make things faster
alter table temp_data
add index h(host),
add index dg(date_group)
-- OPTIONAL: You can drop the "host2" column (it is no longer needed)
-- , drop column host2
;
Now, you can get the grouped data:
select a.host, a.date_group, sum(a.backupsize) as backupsize
from temp_data as a
group by a.host, a.date_group;
This will give you the unpivoted data. If you want to build a pivot table with it, I recommend you take a look at this article, and/or read this question and its answers. In short, you'll have to build a "dynamic" SQL statement as a string, prepare a statement from it, and execute it.
Of course, if you want to group this by week, there's a simpler approach:
drop table if exists temp_data2;
create temporary table temp_data2
select a.*
-- The following will give you the end-of-week date
, date_add(a.startdate, interval (6 - weekday(a.startdate)) day) as date_group
from yourtable as a;
alter table temp_data2
add index h(host),
add index dg(date_group);
select a.host, a.date_group, sum(a.backupsize) as backupsize
from temp_data2 as a
group by a.host, a.date_group;
I leave the pivot part to you.

So I was able to work out a solution that fits my needs: a procedure I put together from concepts in your recommended solutions and some other solutions I found on this site. The procedure sums by 7-day increments and also does the pivot.
DELIMITER $$
CREATE PROCEDURE `weekly_capacity_by_host`()
BEGIN
SELECT MIN(startdate) into @start_date FROM testtable;
SET @SQL = NULL;
SELECT
GROUP_CONCAT(DISTINCT
CONCAT(
'SUM(if(host=''',host,''', backupsize, 0)) as ''',host,''''
)
) INTO @SQL
FROM testtable;
SET @SQL = CONCAT('SELECT 1 + DATEDIFF(startdate, ''',@start_date,''') DIV 7 AS week_num
, ''',@start_date,''' + INTERVAL (DATEDIFF(startdate, ''',@start_date,''') DIV 7) WEEK AS week_start,
', @SQL,'
FROM testtable group by week_num'
);
PREPARE stmt FROM @SQL;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;
Output appears as follows:
mysql> call weekly_capacity_by_host;
+----------+------------+----------+----------+----------+----------+
| week_num | week_start | server01 | server02 | server03 | server04 |
+----------+------------+----------+----------+----------+----------+
| 1 | 2014-06-11 | 1231.08 | 37.30 | 12.04 | 68.17 |
| 2 | 2014-06-18 | 1230.98 | 37.30 | 11.76 | 68.13 |
| 3 | 2014-06-25 | 1243.12 | 37.30 | 8.85 | 68.59 |
| 4 | 2014-07-02 | 1234.73 | 37.30 | 11.77 | 67.80 |
| 5 | 2014-07-09 | 341.32 | 0.04 | 0.14 | 4.94 |
+----------+------------+----------+----------+----------+----------+
5 rows in set (0.03 sec)
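For readers who would rather pivot in application code than in dynamic SQL, the following is a rough Python sketch of what the generated SUM(if(host=..., backupsize, 0)) columns compute. The function name `pivot_by_host` and the sample rows are hypothetical; input rows are assumed to be (week, host, total) tuples in long form.

```python
from collections import defaultdict

def pivot_by_host(long_rows):
    """Turn (week, host, total) rows into one dict per week with a key per host."""
    hosts = sorted({host for _week, host, _total in long_rows})
    # Every week starts with 0.0 for every host, like the ELSE branch of SUM(IF(...)).
    table = defaultdict(lambda: dict.fromkeys(hosts, 0.0))
    for week, host, total in long_rows:
        table[week][host] += total
    return hosts, dict(sorted(table.items()))

# Made-up long-form rows: (week_start, host, summed backupsize)
long_rows = [
    ("2014-06-11", "server01", 5.0),
    ("2014-06-11", "server02", 1.5),
    ("2014-06-18", "server01", 2.0),
]
hosts, wide = pivot_by_host(long_rows)
```

A host with no rows in a given week ends up as 0.0 in that week, which matches what the procedure's generated columns would return.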

Related

Inserting random data from a list

These are my table columns:
ID || Date || Description || Priority
My goal is to insert random test data of 2000 rows with date ranging between (7/1/2019 - 7/1/2020) and randomize the priority from list (High, Medium, Low).
I know how to insert random numbers but I am stuck with the date and the priority fields.
If I need to write code, any pointers on how to do it?
Just to be clear: my problem is randomizing and inserting values from a given list.
CREATE TABLE mytable (
id SERIAL PRIMARY KEY,
date DATE NOT NULL,
description TEXT,
priority ENUM('High','Medium','Low') NOT NULL
);
INSERT INTO mytable (date, priority)
SELECT '2019-07-01' + INTERVAL FLOOR(RAND()*365) DAY,
ELT(1+FLOOR(RAND()*3), 'High', 'Medium', 'Low')
FROM DUAL;
The fake table DUAL is a special keyword. You can select from it, and it always returns exactly one row. But it has no real columns with data, so you can only select expressions.
Do this INSERT a few times and you get:
mysql> select * from mytable;
+----+------------+-------------+----------+
| id | date | description | priority |
+----+------------+-------------+----------+
| 1 | 2019-10-20 | NULL | Medium |
| 2 | 2020-05-17 | NULL | High |
| 3 | 2020-06-25 | NULL | Low |
| 4 | 2020-05-06 | NULL | Medium |
| 5 | 2019-09-30 | NULL | High |
| 6 | 2019-08-06 | NULL | Low |
| 7 | 2020-02-21 | NULL | High |
| 8 | 2019-11-10 | NULL | High |
| 9 | 2019-07-30 | NULL | High |
+----+------------+-------------+----------+
Here's a trick to use the number of rows in the table itself to insert the same number of rows, basically doubling the number of rows:
INSERT INTO mytable (date, priority)
SELECT '2019-07-01' + INTERVAL FLOOR(RAND()*365) DAY,
ELT(1+FLOOR(RAND()*3), 'High', 'Medium', 'Low')
FROM mytable;
By just changing FROM DUAL to FROM mytable, I go from selecting one row to selecting as many rows as the table currently holds. But the values I insert are still random expressions, not the values already in those rows. So I get new rows with new random values.
Then repeat this INSERT as many times as you want to double the number of rows.
Read also about the ELT() function.
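If you would rather generate the rows in application code and insert them afterwards, here is a small Python sketch of the same two expressions. The names `random_rows` and `PRIORITIES` are made up, and the fixed seed is only there to make the run repeatable.

```python
import random
from datetime import date, timedelta

PRIORITIES = ("High", "Medium", "Low")

def random_rows(count, first_day, day_span, rng):
    """Build (date, priority) pairs with a random day offset and a random list entry."""
    rows = []
    for _ in range(count):
        # Same idea as '2019-07-01' + INTERVAL FLOOR(RAND()*365) DAY
        day = first_day + timedelta(days=rng.randrange(day_span))
        # Same idea as ELT(1+FLOOR(RAND()*3), 'High', 'Medium', 'Low')
        priority = PRIORITIES[rng.randrange(len(PRIORITIES))]
        rows.append((day, priority))
    return rows

rows = random_rows(2000, date(2019, 7, 1), 365, random.Random(42))
```

Each tuple can then be passed to a parameterized INSERT with whatever MySQL client library you use.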
You seem to be looking for something like this. A basic random sample is:
select t.*
from t
where date >= '2019-07-01' and date < '2020-07-01'
order by random()
fetch first 2000 rows only;
Of course, the function for random() varies by database, as does the logic for limiting rows. This should get about the same distribution of priorities as in the original data.
If you want the rows to come by priority first, then use:
select t.*
from t
where date >= '2019-07-01' and date < '2020-07-01'
order by (case when priority = 'High' then 1 when priority = 'Medium' then 2 else 3 end),
random()
fetch first 2000 rows only;

How to check missing timestamp in MySql

In my SQL database (MySql), I want to record the price history of an asset.
I have a table with a timestamp as the primary key and the price as the value. It has only two columns: timestamp and price.
There should be one price point per second recorded.
Sometimes, there are missing price points. (When the server goes down)
Here is an example of the timestamp column.
**timestamp**
1581431400
1581431401
1581431402
1581431403
1581431405
1581431406 //missing 4 rows price points after this
1581431410
1581431411
1581431412
1581431413
1581431414
1581431415 //missing 3 rows price points after this
1581431418
1581431419
1581431420
Given two timestamps, how to run a SQL query that will fetch the timestamp ranges where the data exists without querying the entire database?
For example, let's say the two UNIX timestamps are 1 and 2000000000.
What is the SQL query I should run to return the following ranges:
[
[1581431400,1581431406],
[1581431410,1581431415],
[1581431418,1581431420]
]
Here is my answer (a hack). You can use a query like this; note that it reports the missing ranges rather than the ranges that exist.
SELECT CONCAT( '[',GROUP_CONCAT('\n',
'[', res.missing_from, '],'
,'[', res.missing_to -1,']') , '\n]') AS missing
FROM (
SELECT m.ts+1 AS missing_from,
(SELECT ts FROM mytable WHERE ts > m.ts ORDER BY ts LIMIT 1 ) as missing_to
FROM mytable m
LEFT JOIN mytable mf ON m.ts+1 = mf.ts
WHERE
mf.ts IS NULL
) AS res
WHERE res.missing_to - res.missing_from > 0;
SAMPLE
mysql> SELECT * FROM mytable;
+------------+
| ts |
+------------+
| 1581431400 |
| 1581431401 |
| 1581431402 |
| 1581431403 |
| 1581431405 |
| 1581431406 |
| 1581431410 |
| 1581431411 |
| 1581431412 |
| 1581431413 |
| 1581431414 |
| 1581431415 |
| 1581431418 |
| 1581431419 |
| 1581431420 |
+------------+
15 rows in set (0.00 sec)
TEST
mysql> SELECT CONCAT( '[',GROUP_CONCAT('\n',
'[', res.missing_from, '],'
,'[', res.missing_to -1,']') , '\n]') AS missing
FROM (
SELECT m.ts+1 AS missing_from,
(SELECT ts FROM mytable WHERE ts > m.ts ORDER BY ts LIMIT 1 ) as missing_to
FROM mytable m
LEFT JOIN mytable mf ON m.ts+1 = mf.ts
WHERE
mf.ts IS NULL
) AS res
WHERE res.missing_to - res.missing_from > 0;
+-------------------------------------------------------------------------------------+
| missing |
+-------------------------------------------------------------------------------------+
| [
[1581431404],[1581431404],
[1581431407],[1581431409],
[1581431416],[1581431417]
] |
+-------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
I would simply use window functions (available in MySQL 8.0 and later):
select min(timestamp), max(timestamp)
from (select timestamp, row_number() over (order by timestamp) as seqnum
from t
) t
group by (timestamp - seqnum);
I'm not sure what "without querying the entire database?" is supposed to mean. This reads the table -- as any such query would need to -- but does not need to query anything else in the database.
This illustrates what happens:
timestamp seqnum diff
1581431400 1 1581431399
1581431401 2 1581431399
1581431402 3 1581431399
1581431403 4 1581431399
1581431405 5 1581431400
1581431406 6 1581431400
1581431410 7 1581431403
1581431411 8 1581431403
The last column is identifying adjacent timestamps that differ by "1". That is what is aggregated in the outer query.
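The same "timestamp minus row number" trick can be sketched in Python to see why it works. This is only an illustration under the assumption that timestamps are unique (they are the primary key); the function name `existing_ranges` is made up.

```python
from itertools import groupby

def existing_ranges(timestamps):
    """Return [first, last] for each run of consecutive one-second timestamps."""
    ordered = sorted(timestamps)
    ranges = []
    # enumerate(..., start=1) plays the role of ROW_NUMBER();
    # ts - seqnum stays constant for as long as the timestamps are consecutive.
    numbered = enumerate(ordered, start=1)
    for _diff, run in groupby(numbered, key=lambda pair: pair[1] - pair[0]):
        run = [ts for _seqnum, ts in run]
        ranges.append([run[0], run[-1]])
    return ranges

# The timestamps from the question
sample = [
    1581431400, 1581431401, 1581431402, 1581431403, 1581431405, 1581431406,
    1581431410, 1581431411, 1581431412, 1581431413, 1581431414, 1581431415,
    1581431418, 1581431419, 1581431420,
]
ranges = existing_ranges(sample)
```

With the question's sample data this yields four ranges, because 1581431404 is also missing from the list.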

How to order records by hours from 7 to 6:55?

How to sort records in mysqli by a custom start time:
For example, in the table below I want to sort by date, with the exception that the 'start' of every day is to be 7AM and not 12AM.
This seems like a trivial order by on the concatenated values of date and time
drop table if exists t;
create table t
(id int auto_increment primary key, time time, dt date);
insert into t(dt,time) values
('2019-10-15' ,'05:00:00'),
('2019-10-16' ,'06:55:00'),
('2019-10-15' ,'22:00:00'),
('2019-10-15' ,'07:55:00');
select * from t
order by concat(dt,time);
+----+----------+------------+
| id | time | dt |
+----+----------+------------+
| 1 | 05:00:00 | 2019-10-15 |
| 4 | 07:55:00 | 2019-10-15 |
| 3 | 22:00:00 | 2019-10-15 |
| 2 | 06:55:00 | 2019-10-16 |
+----+----------+------------+
4 rows in set (0.00 sec)
If this is not what you want then you need to put a bit more work into the question.
As mentioned by you in example https://imgur.com/a/IS7tupp, you need your tasklist to start at 7AM.
Please check and see if this works for you.
Sqlfiddle link : http://sqlfiddle.com/#!9/4b725c/1
drop table if exists t;
create table t (id int auto_increment primary key, time time);
insert into t(time) values
('00:05:00'),
('05:50:00'),
('07:00:00'),
('13:00:00'),
('19:00:00'),
('23:55:00');
select * from t
order by
case when time < '07:00:00' then 1 end asc, time;
@skelwa do you know how to also order by a second column, the date?
I have this SQL and I need to add ordering by date as well. For example, I need the records from 2019-10-24 first and then those from 2019-10-25, each running from 7:00 to 6:55.
select * from t
order by
case when time < '07:00:00' then 1 end asc, time;
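One possible reading of that follow-up is "sort by date first, and within each date list 07:00 to 23:59 before 00:00 to 06:59". Under that assumption, the sort key can be sketched in Python as below; the name `seven_am_key` and the sample rows are made up, and rows are assumed to be (date, time) pairs.

```python
from datetime import date, time

def seven_am_key(row):
    """Sort by date, then put times from 07:00 onward before the early-morning times."""
    day, clock = row
    # False sorts before True, so clock >= 07:00 comes first within each date.
    return (day, clock < time(7, 0), clock)

rows = [
    (date(2019, 10, 25), time(6, 55)),
    (date(2019, 10, 24), time(0, 5)),
    (date(2019, 10, 24), time(19, 0)),
    (date(2019, 10, 25), time(7, 0)),
    (date(2019, 10, 24), time(7, 0)),
]
ordered = sorted(rows, key=seven_am_key)
```

In SQL terms this corresponds to putting the date column in front of the existing two ORDER BY expressions.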

Calculating total length of a union of time intervals presented at a table

I wish to calculate the total time length of the union of time intervals presented at a table.
For example, given the following:
mysql> SELECT * FROM Temp;
+------+---------------------+---------------------+
| id | start | end |
+------+---------------------+---------------------+
| 1 | 2010-01-01 10:00:00 | 2010-01-01 11:00:00 |
| 2 | 2010-01-01 12:00:00 | 2010-01-01 14:00:00 |
| 3 | 2010-01-01 13:00:00 | 2010-01-01 15:00:00 |
+------+---------------------+---------------------+
I would like to somehow select the total time length, which in this case is 4 hours (the total time length of the union of the intervals (10:00, 11:00) and (12:00, 15:00)).
I do not care if the output will be in seconds (either INT or FLOAT), or in any other sensible format.
It may be worth mentioning that I am not sure about the order of the data in the table. It is not guaranteed that either the start datetimes or the end datetimes are sorted in any manner. I also can't say anything about a "typical" datetime interval - it may be more than one day, for example.
I can say, however, that any single time interval is of non-negative length. That is, for any record, the end date is at least as late as the start date.
I know how to accomplish that task in simple programming languages (such as Python); I just wonder if there's a sensible way to do so in pure MySQL. If not, I'll just select everything and process it in some other programming language. Hence, "it is impossible to accomplish this in MySQL without some very serious effort" may also serve as a legitimate answer to this question...
I've seen this question which is similar, but is about tsql. The solution presented there is using syntax which is unknown to MySQL such as cross apply, and my attempts to translate it have failed.
As requested, here are queries to create an example data:
CREATE TABLE Temp (id INT, start DATETIME, end DATETIME);
INSERT INTO Temp (id, start, end) VALUES (1, '2010-01-01 10:00', '2010-01-01 11:00');
INSERT INTO Temp (id, start, end) VALUES (2, '2010-01-01 13:00', '2010-01-01 14:00');
INSERT INTO Temp (id, start, end) VALUES (3, '2010-01-01 11:00', '2010-01-01 16:00');
So the data will be as follows:
+------+---------------------+---------------------+
| id | start | end |
+------+---------------------+---------------------+
| 1 | 2010-01-01 10:00:00 | 2010-01-01 11:00:00 |
| 2 | 2010-01-01 13:00:00 | 2010-01-01 14:00:00 |
| 3 | 2010-01-01 11:00:00 | 2010-01-01 16:00:00 |
+------+---------------------+---------------------+
The result on this example data should be 6 hours.
Disclaimer
This is probably best done outside of SQL
For Those That Like Painful Queries
You could create a query that attempts to decide whether there is a row elsewhere in the table that overlaps the end column. If there is not then try to find out how much time is in between the end column and the nearest start column aka gap.
Then take the maximum end from the whole table, subtract the minimum start from the whole table and finally subtract the total of the gap columns:
select
unix_timestamp(maxEnd)-unix_timestamp(minSt)-sum(case when hasEndOverlap=0 then gap else 0 end) as unionSecs,
(unix_timestamp(maxEnd)-unix_timestamp(minSt)-sum(case when hasEndOverlap=0 then gap else 0 end))/(60*60) as unionHrs
from
(
select c.id,c.`start`,c.`end`,
c.minSt,c.maxEnd,
c.hasEndOverlap,
@prevSt,
unix_timestamp(@prevSt)-unix_timestamp(c.`end`) as gap,
@prevSt := c.`start`
from
(
select t.id,t.`start`,t.`end`,
a.minSt,a.maxEnd,
case when min(te.id) is null and t.`end` != a.maxEnd then 0 else 1 end as hasEndOverlap
from Temp t
left outer join Temp te on t.`end` >= te.`start` and t.`end` <= te.`end` and t.id != te.id
join (select min(`start`) as minSt,max(`end`) as maxEnd from test.`Temp`) a
group by t.id,t.`start`,t.`end`
) c
join (select @prevSt := '1970-01-01') r
order by c.`end` desc
) d
group by minSt,maxEnd
;
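For comparison, the "outside of SQL" route mentioned in the disclaimer is short. The sketch below is a plain sort-and-merge sweep in Python, not a translation of the query above; the function name `union_length` is made up, and intervals are assumed to be (start, end) datetime pairs with end >= start.

```python
from datetime import datetime, timedelta

def union_length(intervals):
    """Total length covered by a set of possibly overlapping (start, end) intervals."""
    total = timedelta(0)
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # A gap before this interval: close the merged block collected so far.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total

# The example data from the question's CREATE/INSERT statements
example = [
    (datetime(2010, 1, 1, 10), datetime(2010, 1, 1, 11)),
    (datetime(2010, 1, 1, 13), datetime(2010, 1, 1, 14)),
    (datetime(2010, 1, 1, 11), datetime(2010, 1, 1, 16)),
]
covered = union_length(example)
```

Sorting first means the input order of the rows does not matter, which addresses the concern in the question about unsorted data.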

MySQL: Creating buckets on the fly

I have a MySQL table that stores network utilization every five minutes, and I now want to use this data for graphing. Is there a way to specify just the start time, the end time, and the number of buckets/samples I need, and have MySQL return them?
My table
+---------------------+-----+
| Tstamp | QID |
+---------------------+-----+
| 2010-12-10 15:05:39 | 20 |
| 2010-12-10 15:06:09 | 26 |
| 2010-12-10 15:06:14 | 27 |
| 2010-12-10 15:06:18 | 28 |
| 2010-12-10 15:06:23 | 40 |
| 2010-12-10 15:10:38 | 20 |
| 2010-12-10 15:11:12 | 26 |
| 2010-12-10 15:11:17 | 27 |
| 2010-12-10 15:11:21 | 28 |
------ SNIP ------
So can I specify I need 20 samples from the last 24 hours.
Thanks!
Harsh
You can convert your DATETIME to a UNIX_TIMESTAMP, and play with division and modulo...
Here is a sample query you can use. Notice it does not work if the number of requested samples in the given time range is more than half of the available records for that range (which would mean the bucket size is one).
-- Configuration
SET @samples = 4;
SET @start = '2011-05-06 19:44:00';
SET @end = '2011-05-06 20:46:50';
--
SET @bucket = (SELECT FLOOR(count(*)/@samples) as bucket_size FROM table1
WHERE Tstamp BETWEEN @start AND @end);
SELECT
SUM(t.QID), FLOOR((t.ID-1)/@bucket) as bucket
FROM (SELECT QID , @r:=@r+1 as ID
FROM table1
JOIN (SELECT @r:=0) r
WHERE Tstamp BETWEEN @start AND @end
ORDER BY Tstamp) as t
GROUP BY bucket
HAVING count(t.QID) = @bucket
ORDER BY bucket;
P.S. I believe there is a more elegant way to do this, but since no one has provided a working query I hope this helps.
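To see what the query computes, here is a small Python sketch of the same row-count bucketing. It assumes the values are already filtered to the time range and sorted by Tstamp; the function name `bucket_sums` is made up, and the sample values are the QID column from the question.

```python
def bucket_sums(values, samples):
    """Sum consecutive, equally sized groups of rows; drop an incomplete last group."""
    bucket_size = len(values) // samples  # FLOOR(COUNT(*) / @samples)
    if bucket_size == 0:
        return []  # more samples requested than rows available
    sums = []
    # Stepping only over full buckets mirrors HAVING COUNT(...) = @bucket.
    for start in range(0, len(values) - bucket_size + 1, bucket_size):
        sums.append(sum(values[start:start + bucket_size]))
    return sums

qids = [20, 26, 27, 28, 40, 20, 26, 27, 28]
sums = bucket_sums(qids, 4)
```

Note that buckets here hold an equal number of rows, not an equal span of time, so unevenly spaced readings produce buckets that cover different durations.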