If I have this dataset below:
Timestamp Clicks
1:40:11 5
2:40:13 10
3:42:56 20
4:42:23 30
7:45:59 23
9:45:34 24
10:47:23 24
12:47:12 24
So the minutes in the data above range from 40 to 47, but 41, 43, 44, and 46 never occur.
I want the average number of clicks per minute over that range (40-47), with a value of zero for the minutes that have no rows (41, 43, 44, and 46).
So the result should be like this:
Minute Clicks
40 8
41 0
42 25
43 0
44 0
45 24
46 0
47 24
Any ideas on how to achieve something like this?
There are only 60 possible minute values, so you can create a table with 60 rows, one for each minute:
[table serie]
minute
0
1
2
3
4
5
…
Then use a LEFT JOIN in a simple query like this:
select a.minute, IFNULL(avg(b.Clicks), 0) as avg_click from serie a
left join my_dataset b on a.`minute`*1 = SUBSTRING(b.Timestamp,-5,2)*1
group by a.minute
SUBSTRING(b.Timestamp,-5,2) takes the minute by counting from the end of the string, which avoids picking the wrong characters when the hour has only one digit.
Multiplying both sides by 1 forces a numeric comparison instead of a string comparison.
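The same idea can be sketched outside SQL. This is a minimal Python illustration, not the answer's method: the function name `average_clicks_per_minute` is hypothetical, the rows are copied from the question, and the averages are left unrounded.

```python
from collections import defaultdict

# Sample rows copied from the question: (timestamp, clicks).
rows = [
    ("1:40:11", 5), ("2:40:13", 10), ("3:42:56", 20), ("4:42:23", 30),
    ("7:45:59", 23), ("9:45:34", 24), ("10:47:23", 24), ("12:47:12", 24),
]

def average_clicks_per_minute(rows):
    """Average clicks per minute value, with 0 for minutes that have no rows."""
    clicks_by_minute = defaultdict(list)
    for timestamp, clicks in rows:
        # The minute is the middle field, whatever the width of the hour.
        minute = int(timestamp.split(":")[1])
        clicks_by_minute[minute].append(clicks)

    first, last = min(clicks_by_minute), max(clicks_by_minute)
    result = {}
    # Walking the full range plays the role of the 60-row series table.
    for minute in range(first, last + 1):
        values = clicks_by_minute.get(minute)
        result[minute] = sum(values) / len(values) if values else 0
    return result

averages = average_clicks_per_minute(rows)
for minute, value in averages.items():
    print(minute, value)
```

Iterating over the whole range instead of over the observed minutes is what produces the zero rows, just as the LEFT JOIN from the series table does.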
I would start with something like this (SQL Server syntax):
declare @StartTime DateTime = (select MIN(Timestamp) from tablename)
declare @EndTime DateTime = (select MAX(Timestamp) from tablename)
declare @CurrentMinute DateTime = @StartTime
declare @ResultTable table (Minute int, Clicks int)
while @CurrentMinute <= @EndTime
begin
    -- one row per minute; isnull turns a missing average into 0
    insert into @ResultTable (Minute, Clicks)
    select DatePart(Minute, @CurrentMinute),
        isnull((select avg(Clicks) from tablename where DatePart(Minute, Timestamp) = DatePart(Minute, @CurrentMinute)), 0)
    -- advance to the next minute, otherwise the loop never ends
    set @CurrentMinute = DateAdd(Minute, 1, @CurrentMinute)
end
select * from @ResultTable
This works by finding the lowest and highest times, initializing @CurrentMinute to the start time, and looping until the end time, inserting one row into the result table for every minute. If no row matches a minute, the subquery returns NULL and isnull inserts 0 clicks for that row. Note that this yields one row per minute value only when all timestamps fall within the same hour; over a longer span the same minute values repeat.
Related
I come from MS SQL Server and I'm relatively new to MySQL / MariaDB 10 (at least beyond a simple "SELECT * FROM [Table]"). I have searched Google and StackOverflow for several hours, but I haven't found a solution to my problem yet. If it's relevant in any way: I use MySQL Workbench for writing my code.
The Background
I have a new data logging project for saving and displaying data from several temperature and humidity sensors within the house. I save it in the following table:
ID Time Device Temperature Humidity
1 2022-01-09 13:34:00 1 20.1 52.3
2 2022-01-09 13:35:00 1 20.0 52.3
3 2022-01-09 13:36:00 1 20.1 52.4
4 2022-01-09 13:37:00 1 20.1 52.5
5 2022-01-09 13:38:00 1 20.0 52.5
6 2022-01-09 13:39:00 1 20.1 52.6
I query the data needed for a chart using a stored procedure. Because the temperature values are rounded to 0.1°C, they naturally tend to flip back and forth by 0.1 when the temperature is fairly stable. So I thought of a moving average to smooth the values over the last 10 minutes, which works perfectly with an AVG window function.
Here is a simplified version of my procedure:
CREATE PROCEDURE `stpGetSensorData`(sensorId INT, startDate VARCHAR(8))
BEGIN
DECLARE FromDate DATE;
SET FromDate = STR_TO_DATE(startDate, '%Y%m%d');
SELECT
L.ID,
L.Time,
L.Device,
AVG(L.Temperature) OVER (ORDER BY L.Time ROWS BETWEEN 10 PRECEDING AND 0 FOLLOWING) AS Temperature,
AVG(L.Humidity) OVER (ORDER BY L.Time ROWS BETWEEN 10 PRECEDING AND 0 FOLLOWING) AS Humidity
FROM
LoggedData AS L
WHERE
Device = sensorId
AND Time < DATE_ADD(FromDate, INTERVAL 1 DAY)
AND Time >= FromDate
ORDER BY Time DESC;
END
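For reference, the frame ROWS BETWEEN 10 PRECEDING AND 0 FOLLOWING averages each row together with up to ten rows before it. A minimal Python sketch of that computation, with the number of preceding rows as an ordinary parameter (the name `trailing_average` and the sample values are hypothetical, and this illustrates the arithmetic only, not a MySQL solution):

```python
def trailing_average(values, preceding):
    """Average each value with up to `preceding` values before it."""
    averages = []
    for i in range(len(values)):
        # Near the start the window is shorter, as with a ROWS frame.
        window = values[max(0, i - preceding): i + 1]
        averages.append(sum(window) / len(window))
    return averages

temperatures = [20.1, 20.0, 20.1, 20.1, 20.0, 20.1]
smoothed = trailing_average(temperatures, preceding=2)
print(smoothed)
```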
The challenge
Now I thought I would let the end user decide on the size of the window, i.e. an average over the last 5, 10, 30, 60, ... minutes. But when I try to use a parameter in the window function, it leads to the error: "averageRows is not valid at this position".
Here is the code:
CREATE PROCEDURE `stpGetSensorData`(sensorId INT, startDate VARCHAR(8), averageRows INT)
BEGIN
DECLARE FromDate DATE;
SET FromDate = STR_TO_DATE(startDate, '%Y%m%d');
SELECT
L.ID,
L.Time,
L.Device,
AVG(L.Temperature) OVER (ORDER BY L.Time ROWS BETWEEN averageRows PRECEDING AND 0 FOLLOWING) AS Temperature,
AVG(L.Humidity) OVER (ORDER BY L.Time ROWS BETWEEN averageRows PRECEDING AND 0 FOLLOWING) AS Humidity
FROM
LoggedData AS L
WHERE
Device = sensorId
AND Time < DATE_ADD(FromDate, INTERVAL 1 DAY)
AND Time >= FromDate
ORDER BY Time DESC;
END
I guess it's possible to solve this using Dynamic SQL, but I try to avoid Dynamic SQL wherever possible, and I thought there must be a 'normal' solution as well that I'm just too blind to see.
Any smart ideas?
I've written a stored procedure to iterate over every week for three years. It doesn't work though and returns a vague error message.
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '' at line 18
DELIMITER $$
CREATE PROCEDURE loop_three_years()
BEGIN
declare y INT default 2016;
declare m int default 4;
declare d int default 20;
WHILE y <= 2019 DO
WHILE YEARWEEK(concat(y, '-', m, '-', d)) <= 53 DO
WHILE m < 12 DO
WHILE (m = 2 and d <= 29) OR (d <=30 and m in(4, 6,9,11)) OR ( m in(1,3,5,7,8,10,12) AND d <= 31) DO
set d = d + 7;
SELECT YEARWEEK(concat(y, '-', m, '-', d));
END WHILE;
set d=1;
END WHILE;
set m = 1;
SET y = y + 1;
END WHILE;
END
$$
When I run the parts individually they work, so I'm not sure what is wrong with the way I reassembled them. I'm also not sure whether there's a better way to do this. (The SELECT is just for testing; it will be an INSERT in the real code.)
Slightly altered from a previous solution.
You can build your own dynamic calendar / list using ANY other table in your system that has at least as many records as you need to fake row numbers. The query below uses MySQL @ variables, which work like an inline program and declaration. I can start the list with a given date, such as your 2016-04-20, and then add 1 week on each iteration using date-based functions. There is no need for me to know or care whether a month has 28, 29 (leap year), 30 or 31 days.
The table reference below of "AnyTableThatHasAtLeast156Records" is just that.. Any table in your database that has at least 156 records (52 weeks per year, 3 years)
select
YEARWEEK( @startDate ) WeekNum,
@startDate as StartOfWeek,
@startDate := date_add( @startDate, interval 1 week ) EndOfWeek
from
( select @startDate := '2016-04-20') sqlv,
AnyTableThatHasAtLeast156Records
limit
156
This will give you a list of 156 records (provided your "anyTable…" has at least 156 records). If you need to join this to some other transaction table, you can do so by using the above as a JOIN table. The benefit here: since I included the start and end date of each week, those can be part of your join condition.
For example:
record WeekNum StartOfWeek EndOfWeek
1 ?? 2016-04-20 2016-04-27
2 ?? 2016-04-27 2016-05-04
3 ?? 2016-05-04 2016-05-11
4 ?? 2016-05-11 2016-05-18... etc
Because each row adds 1 week to the starting point, every range runs from one weekday to the same weekday a week later (e.g. Monday to Monday). The JOIN condition below uses LESS THAN the EndOfWeek, so it includes transactions UP TO but not including the ending date, such as a transaction at 2016-04-26 11:59:59 PM (hence less than 2016-04-27, as 04/27 is the beginning of the next week's cycle of transactions).
select
Cal.WeekNum,
YT.YourColumns
from
YourTransactionTable YT
JOIN ( aboveCalendarQuery ) Cal
on YT.TransactionDate >= Cal.StartOfWeek
AND YT.TransactionDate < Cal.EndOfWeek
where
whatever else
You could even do sum() with group by such as by WeekNum if that is what you intend.
Hopefully this is a more accurate and efficient way to build your calendar and to link it to transactions if you need to.
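The calendar and the half-open join condition can also be sketched in Python. This is only an illustration of the idea under stated assumptions: `weekly_calendar` and `week_index` are hypothetical names, the week is identified by its position in the list, and YEARWEEK is not reproduced.

```python
from datetime import date, timedelta

def weekly_calendar(start, weeks):
    """Return (start_of_week, end_of_week) pairs, one per week."""
    calendar = []
    for _ in range(weeks):
        end = start + timedelta(weeks=1)
        calendar.append((start, end))
        start = end  # the next week begins where this one ends
    return calendar

def week_index(calendar, day):
    """Find the week whose half-open range [start, end) contains `day`."""
    for index, (start, end) in enumerate(calendar, start=1):
        if start <= day < end:
            return index
    return None  # the date falls outside the calendar

calendar = weekly_calendar(date(2016, 4, 20), 156)
print(calendar[0])
print(week_index(calendar, date(2016, 4, 26)))
print(week_index(calendar, date(2016, 4, 27)))
```

Date arithmetic handles month lengths and leap years, so nothing in the loop needs to know how many days a month has.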
Response from comment.
You could instead join to a ( select 1 union select 2 union … select 156 ), but that is your choice. The ONLY reason for the "AnyTable…" is that I am sure any reasonable database with transactions easily has 156 records or more. Its sole purpose is to supply a row for each iteration so that the rows can be created dynamically.
It is also much more sound than the looping mechanism you ran into to begin with. There is nothing wrong with that, especially for learning purposes, but if there is a more efficient way, doesn't that make more sense?
Per feedback from comment
I don't know exactly what the other table you are trying to insert into looks like, but yes, you can use this for all 3000 things. Provide more of what you are trying to do and I can adjust. In the meantime, something like this:
insert into YourOtherTable
( someField,
AnotherField,
WeekNum
)
select
x.someField,
x.AnotherField,
z.WeekNum
from
Your3000ThingTable x
JOIN (select
YEARWEEK( @startDate ) WeekNum,
@startDate as StartOfWeek,
@startDate := date_add( @startDate, interval 1 week ) EndOfWeek
from
( select @startDate := '2016-04-20') sqlv,
AnyTableThatHasAtLeast156Records
limit
156 ) z
on 1=1
where
x.SomeCondition...
By joining the select of 156 records on 1=1 (which is always true), it will return 156 entries for each record in Your3000ThingTable. So, if you have an inventory item table with
Item Name
1 Thing1
2 Thing2
3 Thing3
Your final insert would be
Item Name WeekNum
1 Thing1 1
1 Thing1 2
1 Thing1 ...
1 Thing1 156
2 Thing2 1
2 Thing2 2
2 Thing2 ...
2 Thing2 156
3 Thing3 1
3 Thing3 2
3 Thing3 ...
3 Thing3 156
And to pre-confirm what you THINK would happen, just try the select/join on 1=1 and you'll see all the records the query WOULD be inserting into your destination table.
In my MySQL database, I insert 100 rows into a table every minute. The column named 'time' is of type DATETIME and contains the date and time of the insertion (excluding seconds).
I'm looking for an efficient way to fetch rows from that table - in a specified timeframe.
For example, if my timeframe is 15 minutes, then the following rows would be fetched:
3-11-18 13:00:00
3-11-18 13:15:00
3-11-18 13:30:00
3-11-18 13:45:00
3-11-18 14:00:00
etc...
For timeframe of one hour it will fetch
3-11-18 13:00:00
3-11-18 14:00:00
Currently, I'm using the LIKE operator to do that. For example, 15 minutes timeframe query looks like this:
SELECT * FROM my_table WHERE time LIKE '%:15:00' OR time LIKE '%:30:00' OR time LIKE '%:45:00' OR time LIKE '%:00:00';
But these queries run very slowly.
What can I do to improve performance?
You can try to add a generated column for minutes and set an index on it:
alter table my_table add column minute tinyint unsigned as (minute(time));
alter table my_table add index (minute);
A query like
select * from my_table where minute(time) = 0;
would use that index.
But I'm not sure if it helps with a query like
select * from my_table where minute(time) in (0, 15, 30, 45);
The reason is that the selectivity of the WHERE condition is not very good here. A full table scan that skips 14 of 15 rows can be faster than an index search plus a second lookup in the clustered index. In that case there is nothing you can do except create a covering index. But an index that covers SELECT * will probably be harmful with 100 inserts per minute.
Use MOD to filter by the timeframe requirement.
For timeframe of 15 minutes:
SELECT *
from my_table
where mod(minute(time),15) = 0
For 30 minutes timeframe:
mod(minute(time),30) = 0
For one hour (60-minutes) timeframe:
mod(minute(time),60) = 0
Or
minute(time) = 0
Generalized WHERE clause for 15, 30 or 60 minutes timeframe:
mod( minute(time), <timeframeInMinutes>) = 0
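The modulo test is easy to check outside the database. A minimal Python sketch, assuming one row per minute over an hour; `on_timeframe` is a hypothetical helper and the date is arbitrary. As with the WHERE clause above, it only makes sense for timeframes that divide 60 evenly.

```python
from datetime import datetime, timedelta

def on_timeframe(timestamp, timeframe_minutes):
    """True when the minute part is a multiple of the timeframe."""
    return timestamp.minute % timeframe_minutes == 0

# One timestamp per minute from 13:00 to 14:00 inclusive.
start = datetime(2018, 11, 3, 13, 0)
rows = [start + timedelta(minutes=i) for i in range(61)]

quarter_hours = [ts for ts in rows if on_timeframe(ts, 15)]
hours = [ts for ts in rows if on_timeframe(ts, 60)]

for ts in quarter_hours:
    print(ts)
```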
I use the following query to calculate the age of people from their dob and then group the ages in ten year intervals to make a frequency chart.
I'd prefer the user to be able to choose the bin size instead of always having to use 10 years, eg maybe group the ages in 5 year intervals, 20 year intervals or any arbitrary range.
How should I re-write the query so that the bin size (currently 10 years) can be passed as a parameter or maybe picked up from another table that is pre-populated with the bins just before running the query?
Obviously I won't be able to use a hard coded CASE in the same way, if at all. Can it be done?
SELECT
CASE
WHEN age = -1 THEN 0 -- null dob
WHEN age >= 0 AND age < 11 THEN 1
WHEN age >= 11 AND age < 21 THEN 2
WHEN age >= 21 AND age < 31 THEN 3
WHEN age >= 31 AND age < 41 THEN 4
WHEN age >= 41 AND age < 51 THEN 5
WHEN age >= 51 AND age < 61 THEN 6
WHEN age >= 61 AND age < 71 THEN 7
WHEN age >= 71 AND age < 81 THEN 8
WHEN age >= 81 AND age < 91 THEN 9
WHEN age >= 91 AND age < 101 THEN 10
WHEN age > 100 THEN 11
END AS Age_Group,
COUNT(age) AS Number_In_Group
FROM
-- this sub query calculates the age from the dob
-- returning -1 if dob is null
(SELECT
IFNULL(
DATE_FORMAT(NOW(), '%Y')
- DATE_FORMAT(dob, '%Y')
- (DATE_FORMAT(NOW(), '00-%m-%d') < DATE_FORMAT(dob, '00-%m-%d'))
, - 1
) AS age
FROM
people
) AS table_age
GROUP BY Age_Group
This produces the following typical output
Age_Group Number_In_Group
0 55
2 1
3 37
4 47
5 51
6 112
7 139
8 70
9 30
10 6
I think I may have solved this myself so am posting my answer in case it helps anyone else. (or if somebody has a better way!)
The trick seems to be to use a stored procedure (so that the bin increment can be passed in as a parameter).
The procedure first makes a temp table for the bins and then populates it in a while loop using the desired bin_size parameter to set the minimum and maximum ages for each bin.
Finally it does a normal grouped count, grouping on the bin label by joining the table with date of births in it (from which it calculates ages) to the temp table with the frequency bins in it. A left join is used to ensure that all bin labels get returned, even if the count is zero.
Seems to run surprisingly fast as well, eg 0.03s using 1000 dates of birth and a bin frequency of 15 years.
usage:
eg to get a frequency table in steps of 15 years call this procedure using
CALL age_frequency_count(15);
This is the code
DELIMITER $$
CREATE PROCEDURE age_frequency_count(IN bin_size INT)
BEGIN
DECLARE bin_min_age INT; -- minimum age for bin
DECLARE bin_max_age INT; -- maximum age for bin
DECLARE bin_label VARCHAR(8); -- label for bin
-- #########################################
-- make a temporary table for the bins if it doesn't exist
CREATE TEMPORARY TABLE IF NOT EXISTS tbl_bins
( minage INT, maxage INT, agegroup VARCHAR(30) ) ;
-- #########################################
-- empty it, in case it did already exist
DELETE FROM tbl_bins;
-- #########################################
-- loop round, populating temp table using bin_size up to around 100 yrs old
SET bin_min_age = 0;
WHILE (bin_min_age + bin_size) < 100 DO
SET bin_max_age = bin_min_age + bin_size ;
SET bin_label = CONCAT(bin_min_age ,' to ' , bin_max_age);
INSERT INTO tbl_bins VALUES (bin_min_age, bin_max_age, bin_label);
SET bin_min_age = bin_max_age;
END WHILE;
-- now insert bin for any age above around 100 yrs old (up to 200 yrs old)
INSERT INTO tbl_bins VALUES (bin_min_age, 200, CONCAT('over ', bin_max_age));
-- and a bin for any ages of -1 that were generated due to a null dob
INSERT INTO tbl_bins VALUES (-1, -1, 'unknown');
-- #########################################
-- finally select the age counts grouped into bins by joining the 'people'
-- table containing the dob to the bins table we just made and populated
SELECT
agegroup,
COUNT(age) AS NumberInGroup
FROM
tbl_bins
LEFT JOIN -- LEFT so that we will still get zero counts if necessary
(SELECT -- next few lines calculate the age from the dob in the people table
IFNULL(
DATE_FORMAT(NOW(), '%Y')
- DATE_FORMAT(member_dob, '%Y')
- (DATE_FORMAT(NOW(), '00-%m-%d') < DATE_FORMAT(member_dob, '00-%m-%d') )
, - 1 -- if dob is null set the age to -1
) AS age
FROM people
) AS tbl_ages
ON
(tbl_ages.age > tbl_bins.minage AND tbl_ages.age <= tbl_bins.maxage) -- normal bin
OR (tbl_ages.age = 0 AND tbl_bins.minage = 0) -- age 0 belongs in the first bin
OR (tbl_ages.age = -1 AND tbl_bins.maxage = -1) -- unknown dob (age of -1)
GROUP BY agegroup;
-- #########################################
END$$
DELIMITER ;
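The binning logic of the procedure can be sketched in Python to see which ages land in which bin. This is an illustration under assumptions, not a translation of the procedure: `build_bins` and `count_by_bin` are hypothetical names, a missing date of birth is represented by `None` instead of -1, and age 0 is placed in the first bin.

```python
def build_bins(bin_size, limit=100, ceiling=200):
    """Build (min_age, max_age, label) bins up to roughly `limit` years."""
    bins = []
    low = 0
    while low + bin_size < limit:
        high = low + bin_size
        bins.append((low, high, f"{low} to {high}"))
        low = high
    # One final bin catches everything above the last regular bin.
    bins.append((low, ceiling, f"over {low}"))
    return bins

def count_by_bin(ages, bin_size):
    """Count ages per bin label, keeping bins whose count is zero."""
    bins = build_bins(bin_size)
    counts = {label: 0 for _, _, label in bins}
    counts["unknown"] = 0
    for age in ages:
        if age is None:  # no date of birth recorded
            counts["unknown"] += 1
            continue
        for low, high, label in bins:
            # Bins are (low, high]; age 0 is assigned to the first bin.
            if low < age <= high or (age == 0 and low == 0):
                counts[label] += 1
                break
    return counts

counts = count_by_bin([0, 15, 16, 95, None], bin_size=15)
for label, number in counts.items():
    print(label, number)
```

Starting the counts dictionary from the bin list is the counterpart of the LEFT JOIN: every label is reported even when no age falls into it.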
I want to get the number of registrations for a time period (say a week), which isn't that hard to do, but I was wondering whether it is in any way possible in MySQL to return a zero for days that have no registrations.
An example:
DATA:
ID_Profile datCreate
1 2009-02-25 16:45:58
2 2009-02-25 16:45:58
3 2009-02-25 16:45:58
4 2009-02-26 10:23:39
5 2009-02-27 15:07:56
6 2009-03-05 11:57:30
SQL:
SELECT
DAY(datCreate) as RegistrationDate,
COUNT(ID_Profile) as NumberOfRegistrations
FROM tbl_profile
WHERE DATE(datCreate) > DATE_SUB(CURDATE(),INTERVAL 9 DAY)
GROUP BY RegistrationDate
ORDER BY datCreate ASC;
In this case the result would be:
RegistrationDate NumberOfRegistrations
25 3
26 1
27 1
5 1
Obviously I'm missing a couple of days in between. Currently I'm solving this in my php code, but I was wondering if MySQL has any way to automatically return 0 for the missing days/rows. This would be the desired result:
RegistrationDate NumberOfRegistrations
25 3
26 1
27 1
28 0
1 0
2 0
3 0
4 0
5 1
This way we can use MySQL to handle the number of days in a month instead of relying on PHP code to calculate how many days each month has, since MySQL has this functionality built in.
Thanks in advance
No, but one workaround would be to create a single-column table with a date primary key, preloaded with dates for each day. You'd have dates from your earliest starting point right through to some far off future.
Now, you can LEFT JOIN your statistical data against it - then you'll get nulls for those days with no data. If you really want a zero rather than null, use IFNULL(colname, 0)
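The effect of joining against a calendar can be sketched in Python. This is a minimal illustration using the registration timestamps from the question; `registrations_per_day` is a hypothetical name, and the date range is chosen to match the desired output.

```python
from collections import Counter
from datetime import date, datetime, timedelta

# datCreate values copied from the question.
created = [
    datetime(2009, 2, 25, 16, 45, 58),
    datetime(2009, 2, 25, 16, 45, 58),
    datetime(2009, 2, 25, 16, 45, 58),
    datetime(2009, 2, 26, 10, 23, 39),
    datetime(2009, 2, 27, 15, 7, 56),
    datetime(2009, 3, 5, 11, 57, 30),
]

def registrations_per_day(created, first_day, last_day):
    """Return (day, count) for every day in the range, including zeros."""
    # Strip the time part so that all rows of one day count together.
    counts = Counter(ts.date() for ts in created)
    result = []
    day = first_day
    while day <= last_day:
        result.append((day, counts[day]))  # a Counter yields 0 for missing days
        day += timedelta(days=1)
    return result

per_day = registrations_per_day(created, date(2009, 2, 25), date(2009, 3, 5))
for day, count in per_day:
    print(day.day, count)
```

The loop over consecutive dates stands in for the preloaded calendar table, and it crosses the month boundary without any knowledge of how many days February has.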
Thanks to Paul Dixon I found the solution. Anyone interested in how I solved this read on:
First create a stored procedure I found somewhere to populate a table with all dates from this year.
CREATE Table calendar(dt date not null);
CREATE PROCEDURE sp_calendar(IN start_date DATE, IN end_date DATE, OUT result_text TEXT)
BEGIN
SET @begin = 'INSERT INTO calendar(dt) VALUES ';
SET @date = start_date;
SET @max = SUBDATE(end_date, INTERVAL 1 DAY);
SET @temp = '';
REPEAT
SET @temp = concat(@temp, '(''', @date, '''), ');
SET @date = ADDDATE(@date, INTERVAL 1 DAY);
UNTIL @date > @max
END REPEAT;
SET @temp = concat(@temp, '(''', @date, ''')');
SET result_text = concat(@begin, @temp);
END
call sp_calendar('2009-01-01', '2010-01-01', @z);
select @z;
Then change the query to add the left join:
SELECT
DAY(dt) as RegistrationDate,
COUNT(ID_Profile) as NumberOfRegistrations
FROM calendar
LEFT JOIN
tbl_profile ON calendar.dt = DATE(tbl_profile.datCreate)
WHERE dt BETWEEN DATE_SUB(CURDATE(),INTERVAL 6 DAY) AND CURDATE()
GROUP BY RegistrationDate
ORDER BY dt ASC
And we're done.
Thanks all for the quick replies and solution.