I'm working on a ranged date query, and trying to adjust the rules for the loop but I have a bit of a problem:
Take the following:
DROP PROCEDURE
IF EXISTS test;
CREATE PROCEDURE test ( IN start_date DATE ) BEGIN
DECLARE group_name VARCHAR ( 10 ) DEFAULT 'clientA';
DECLARE service_name VARCHAR ( 10 ) DEFAULT 'serviceA';
WHILE ( start_date < CURDATE( ) && SUBDATE( start_date, INTERVAL - 2 WEEK ) < CURDATE( ) ) DO
SELECT start_date AS 'Start Day', SUBDATE( start_date, INTERVAL - 2 WEEK ) AS 'End Day';
SET start_date = SUBDATE( start_date, INTERVAL - 2 WEEK );
END WHILE;
END;
This selects a start and end date from a starting point up to today:
CALL test ( '2019-08-29' );
Returns 5 results:
08/29 & 09/12
09/12 & 09/26
09/26 & 10/10
10/10 & 10/24
10/24 & 11/7
This is what I want, but rather than 5 separate result sets I want each of these as rows in one result set. I think the best way to do this is via a sub-query, with the inner query running the loop and doing the selects and the outer query serving as a wrapper to combine them into one set.
I have the following code:
DROP PROCEDURE
IF EXISTS test;
CREATE PROCEDURE test ( IN start_date DATE ) BEGIN
DECLARE group_name VARCHAR ( 10 ) DEFAULT 'clientA';
DECLARE service_name VARCHAR ( 10 ) DEFAULT 'serviceA';
SELECT * FROM (WHILE ( start_date < CURDATE( ) && SUBDATE( start_date, INTERVAL - 2 WEEK ) < CURDATE( ) ) DO
SELECT start_date AS 'Start Day', SUBDATE( start_date, INTERVAL - 2 WEEK ) AS 'End Day';
SET start_date = SUBDATE( start_date, INTERVAL - 2 WEEK );
END WHILE;
)
END;
But that gives me:
1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'AS 'Result' FROM (
WHILE ( start_date < CURDATE( ) && SUBDATE( start_date, IN' at line 4
I feel like there's something small wrong with my syntax here but I'm having difficulty understanding exactly what. Any guidance would be great!
You can create a dynamic list of dates by using MySQL @ variables and joining to any table that has at least as many rows as you expect in the result set (e.g. whether you need 5, 10 or 1000 records in the dynamic result).
select
-- whatever latest date is BECOMES the Begin Date
@beginDT BeginDate,
-- now, add 2 weeks to the @beginDT variable and save as the END Date
@beginDT := date_add( @beginDT, interval 2 week ) EndDate
from
-- pick any table that has as many 2-week cycles as you expect.
-- ex: if you wanted 1 yr, you would need any table with 26 or 27 records
AnyTableWithManyRecords,
-- start the query with your starting date, alias sqlvars is just place-holder
-- and will only prepare the variable and be one row for rest of query
( select @beginDT := '2019-08-29' ) sqlvars
having
-- having will stop until your maximum date of interest
BeginDate < curdate()
-- but limit to 100 so you don't query against table of millions of records.
-- how many records do you REALLY need to go through... again, 26 biweekly = 1 year
-- this limit of 100 would allow for almost 4 yrs worth
limit 100;
Then, if you wanted data from some other table, you could join to the above result set as a derived table, such as:
select
SOT.WhateverColumns
from
( above query ) MyDates
JOIN SomeOtherTable SOT
on MyDates.BeginDate <= SOT.SomeDate
AND SOT.SomeDate < MyDates.EndDate
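On MySQL 8.0 or later, a recursive CTE is another way to get all the two-week windows as rows of a single result set, without user variables or a helper table. The following is only a sketch under that version assumption; the CTE name periods and its column names are made up for illustration, and inside the stored procedure the date literal would be replaced by the start_date parameter.

```sql
-- Sketch only: assumes MySQL 8.0+ (recursive CTE support).
WITH RECURSIVE periods (start_day, end_day) AS (
  -- anchor row: the first two-week window, beginning at the chosen start date
  SELECT CAST('2019-08-29' AS DATE),
         DATE_ADD(CAST('2019-08-29' AS DATE), INTERVAL 2 WEEK)
  UNION ALL
  -- each further window starts where the previous one ended
  SELECT end_day,
         DATE_ADD(end_day, INTERVAL 2 WEEK)
  FROM periods
  -- stop once the next window would no longer end before today
  WHERE DATE_ADD(end_day, INTERVAL 2 WEEK) < CURDATE()
)
SELECT start_day AS `Start Day`,
       end_day   AS `End Day`
FROM periods
-- also drops the anchor row if even the first window ends today or later
WHERE end_day < CURDATE();
```

The two conditions together mirror the WHILE test in the question: a window is returned only if it ends before the current date.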
I have a table with temperatures.
Sometimes the message was not received and the information is missing.
I need to fill the missing rows with NULL for every hour.
CREATE TABLE temp_total(
id int(6) NOT NULL PRIMARY KEY,
stamp timestamp NOT NULL,
room_temp decimal(3,1) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE temp_total
ADD UNIQUE KEY stamp(stamp),
MODIFY id int(6) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=1;
INSERT INTO temp_total(stamp, room_temp) VALUES
('2019-07-21 19:00:00', '23.4'),
('2019-07-21 22:00:00', '22.7'),
('2019-07-23 02:00:00', '22.5'),
('2019-07-23 06:00:00', '22.4');
The expected result is an array of 36 rows.
I found this query to work fine.
SELECT stamp INTO @deb FROM temp_total ORDER BY stamp ASC LIMIT 1;
SELECT stamp INTO @fin FROM temp_total ORDER BY stamp DESC LIMIT 1;
WITH RECURSIVE all_hours(dt) AS (
SELECT @deb dt
UNION ALL
SELECT dt + INTERVAL 1 HOUR FROM all_hours
WHERE dt + INTERVAL 1 HOUR < @fin + INTERVAL 1 HOUR
)
-- INSERT IGNORE INTO temp_total(stamp, room_temp)
SELECT d.dt stamp, t.room_temp
FROM all_hours d
LEFT JOIN temp_total t ON t.stamp = d.dt
ORDER BY d.dt;
I want to use the result of the SELECT with INSERT, but when I uncomment the INSERT line I get this message:
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'INSERT INTO temp_total(stamp, room_temp)
SELECT d.dt stamp, t.room_temp
...' at line 7
DbFiddle
You are almost there. The INSERT clause has to come before the WITH clause; with that small change in the syntax the query works as expected:
INSERT IGNORE INTO temp_total(stamp, room_temp)
WITH RECURSIVE all_hours(dt) AS (
SELECT @deb dt
UNION ALL
SELECT dt + INTERVAL 1 HOUR FROM all_hours
WHERE dt + INTERVAL 1 HOUR < @fin + INTERVAL 1 HOUR
)
SELECT d.dt stamp, t.room_temp
FROM all_hours d
LEFT JOIN temp_total t ON t.stamp = d.dt
ORDER BY d.dt;
See running example at db<>fiddle.
I am writing queries for some KPIs (Key Performance Indicators) to track user engagement. One such KPI is "Churn Rate", which I am calculating for a given month by:
Churn rate = (Total users deleted in month)/(Total users on the 1st of month)
I am using a users table with the following columns:
created_at, deleted_at
My process is to get all relevant months of user activity (in this case, based on "created_at" column, since we are getting several new users per month. We also have an activity log table which might technically be more accurate to use but doesn't go back as far) and then loop over them in a stored procedure. For each month, I'm calculating who was deleted that month and who was active on the first of that month (created on or before the 1st of the month and either not deleted or deleted after the first of that month). Then I'm dividing them to find churn rate and inserting into a temporary table. Here is my stored procedure:
DROP PROCEDURE ChurnRate;
DELIMITER $$
CREATE PROCEDURE ChurnRate()
BEGIN
DECLARE start_date DATETIME;
DECLARE end_date DATETIME;
DECLARE cur_date DATETIME;
DECLARE current_month VARCHAR(255);
DECLARE end_month VARCHAR(255);
DECLARE deleted_count BIGINT;
DECLARE active_user_count BIGINT;
DECLARE churn_rate FLOAT;
SELECT created_at FROM users ORDER BY created_at ASC LIMIT 1 INTO start_date;
SELECT created_at FROM users ORDER BY created_at DESC LIMIT 1 INTO end_date;
SET cur_date = start_date;
SET current_month = SUBSTR(cur_date,1,7);
SET end_month = SUBSTR(end_date,1,7);
DROP TEMPORARY TABLE IF EXISTS churn_table;
CREATE TEMPORARY TABLE churn_table
(
user_month VARCHAR(255),
deleted_count BIGINT,
active_user_count BIGINT,
churn_rate FLOAT
);
loop_label: LOOP
SELECT COUNT(U.id) FROM users AS U WHERE SUBSTR(U.deleted_at,1,7) = current_month INTO deleted_count;
SELECT COUNT(U.id) FROM users AS U
WHERE (U.deleted_at >= DATE_ADD(DATE_ADD(LAST_DAY(cur_date),INTERVAL 1 DAY),INTERVAL -1 MONTH) OR U.deleted_at IS NULL)
AND SUBSTR(U.created_at,1,7) <= current_month
INTO active_user_count;
INSERT INTO churn_table (user_month, deleted_count, active_user_count, churn_rate) VALUES (current_month, deleted_count, active_user_count, (deleted_count/active_user_count));
SET cur_date = DATE_ADD(cur_date, INTERVAL 1 MONTH);
SET current_month = SUBSTR(cur_date,1,7);
IF current_month <= end_month THEN
ITERATE loop_label;
END IF;
LEAVE loop_label;
END LOOP;
SELECT * FROM churn_table;
END$$
DELIMITER ;
CALL ChurnRate();
Here is a sample of some data that was produced:
user_month   churn_rate_percentage
----------------------------------
2019-12      0
2020-01      0.0396982
2020-02      0
2020-03      0
2020-04      0
2020-05      0.112116
2020-06      0.59691
2020-07      0.26689
2020-08      0.144374
2020-09      0.141767
2020-10      0.125
2020-11      0.272904
2020-12      0.14937
My problem is this: I am using an API that requires this to be a select query. I have previously tried writing select queries for this, but they have been flawed. Grouping by "deleted_at" will not work because we will not show months for which no users have been deleted. Grouping by "created_at" and using subqueries ends up being extremely slow, as we have about 50k users. Is there a clean, efficient way to write this as a select query without affecting performance?
If there is not, I will have to write a cron job to run this procedure and export the data.
Thank you
You shouldn't use loops in SQL; that is often an indication you are doing something wrong.
Here is how to do this in a single query:
-- recursive CTE to create the list of months of interest
WITH RECURSIVE base_months (d, y, m) AS
(
-- anchor: first day of the month in which the earliest user was created
SELECT CAST(DATE_FORMAT(MIN(created_at), '%Y-%m-01') AS DATE),
YEAR(MIN(created_at)), MONTH(MIN(created_at))
FROM users
UNION ALL
SELECT DATE_ADD(d, INTERVAL 1 MONTH), YEAR(DATE_ADD(d, INTERVAL 1 MONTH)), MONTH(DATE_ADD(d, INTERVAL 1 MONTH))
FROM base_months
-- stop once the first day of the current month has been generated
WHERE DATE_ADD(d, INTERVAL 1 MONTH) <= CURDATE()
)
SELECT
b.y AS year,
b.m AS month,
COUNT(u.created_at) AS total_user,
SUM(CASE WHEN MONTH(u.deleted_at) = b.m AND YEAR(u.deleted_at) = b.y THEN 1 ELSE 0 END) AS left_this_month
FROM base_months b
-- for each month join to the users table
JOIN users u ON u.created_at < b.d AND (u.deleted_at > b.d OR u.deleted_at IS NULL)
GROUP BY b.y, b.m
If this isn't clear: first we use a recursive CTE to get all the months and years of interest. You could instead use a non-recursive query on the table with a GROUP BY if you only want to include the creation months that actually appear in the table, but I think that would give you odd results, since months in which no users were created would not be included.
Then I join that back to the users table with filters on the join to only include the rows we want to count for the given year and month. We use group by and aggregation functions to find the results.
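The same idea can be carried one step further so that the churn ratio itself comes out of a single SELECT. What follows is a minimal sketch, not a tested drop-in: it assumes MySQL 8.0+, the users table with the id, created_at and deleted_at columns from the question, and it counts deletions only among users who were active on the first of the month, which differs slightly from the procedure in the question. The CTE name months and the column first_day are illustrative.

```sql
-- Sketch only: assumes MySQL 8.0+ and users(id, created_at, deleted_at).
WITH RECURSIVE months (first_day) AS (
  -- anchor: first day of the month of the earliest created_at
  SELECT CAST(DATE_FORMAT(MIN(created_at), '%Y-%m-01') AS DATE)
  FROM users
  UNION ALL
  -- one row per following month, up to the current month
  SELECT DATE_ADD(first_day, INTERVAL 1 MONTH)
  FROM months
  WHERE DATE_ADD(first_day, INTERVAL 1 MONTH) <= CURDATE()
)
SELECT
  DATE_FORMAT(m.first_day, '%Y-%m') AS user_month,
  -- active users whose deleted_at falls before the next month starts
  COALESCE(SUM(u.deleted_at < DATE_ADD(m.first_day, INTERVAL 1 MONTH)), 0) AS deleted_count,
  -- users that existed on the first day of the month
  COUNT(u.id) AS active_user_count,
  -- NULLIF yields NULL instead of dividing by zero for months without users
  COALESCE(SUM(u.deleted_at < DATE_ADD(m.first_day, INTERVAL 1 MONTH)), 0)
    / NULLIF(COUNT(u.id), 0) AS churn_rate
FROM months AS m
-- LEFT JOIN keeps months in which nobody was active or deleted
LEFT JOIN users AS u
  ON u.created_at <= m.first_day
 AND (u.deleted_at >= m.first_day OR u.deleted_at IS NULL)
GROUP BY m.first_day
ORDER BY m.first_day;
```

Because the month list drives the query, months with no deletions still appear with a churn rate of 0, which was the weakness of grouping by deleted_at.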
Looping is likely to be terribly slow.
Is this how you decide if a user exists on Nov 1, 2020?
WHERE created_at < '2020-11'
AND deleted_at > '2020-11'
Hence, a COUNT(*) with that test would give that count?
For deletions for that month:
WHERE LEFT(deleted_at, 7) = '2020-11'
Putting those together into a single query for all months:
SELECT LEFT(created_at, 7) AS yyyymm,
( SELECT COUNT(*)
FROM users
WHERE created_at < yyyymm
AND deleted_at > yyyymm
) AS active_users,
( SELECT COUNT(*)
FROM users
WHERE LEFT(deleted_at, 7) = yyyymm
) AS deleted_users
FROM users
GROUP BY yyyymm
ORDER BY yyyymm
That gives you 3 columns; check it out. To get the churn:
SELECT LEFT(created_at, 7) AS yyyymm,
( SELECT ... ) / ( SELECT ... ) AS churn
FROM users
GROUP BY yyyymm
ORDER BY yyyymm
So far I have been able to collect_set() everyone that is active with no problem:
with aux as(
select date
,collect_set(user_id) over(
partition by feature
order by cast(timestamp(date) as float)
range between (-90*60*60*24) following and 0 preceding
) as user_id
,feature
--
from (
select date
,feature
,collect_set(user_id) as user_id
--
from table
--
group by date, feature
)
)
--
select date
,array_distinct(flatten(user_id))
,feature
--
from aux
The problem is that now I have to keep only users whose accounts were created more than 90 days before each date.
I tried this and it didn't work:
select date
,collect_set(case when user_created_at < date - interval 90 day
then user_id end) over(
partition by feature
order by cast(timestamp(date) as float)
range between (-90*60*60*24) following and 0 preceding
) as teste
,feature
from table
The reason it didn't work is that the filter inside collect_set() is applied to each row using only that row's own date, instead of filtering all the users in the 90-day window against the date at the end of the window, so the result contains more users than expected.
How can I get this right?
For reference, I'm using this query to verify whether the result is correct:
select
count(distinct user_id) as total
,count(distinct case when user_created_at < date('2020-04-30') - interval 90 day then user_id end)
,count(distinct case when user_created_at >= date('2020-04-30') - interval 90 day then user_id end)
--
from table
--
where 1=1
and date >= date('2020-04-30') - interval 90 day
and date <= '2020-04-30'
and feature = 'a_feature'
A pretty ugly workaround, but:
select data
,feature
,collect_set(cus.client_id) as client
from (
select data
,explode(array_distinct(flatten(cliente))) as client
,feature
from(
select data
,collect_set(client_id) over(
partition by feature
order by cast(timestamp(data) as float)
range between (-90*60*60*24) following and 0 preceding
) as cliente
,feature
from (
select data
,feature
,collect_set(client_id) as cliente
from da_pandora.ds_transaction dtr
--
group by data, feature
)
)
)as dtr
left join costumer as cus
on cus.client_id = dtr.client and date(client_created_at) < data - interval 90 day
group by data, feature
This keeps inserting rows that already exist, although it shouldn't.
BEGIN
INSERT INTO ohrm_attendance_raw_data (punch_time, device_id, card_number)
SELECT punch_time, device_id, card_number
FROM ohrm_attendance_master
WHERE ohrm_attendance_master.punch_time >= DATE_SUB(now(), INTERVAL 1 MONTH)
AND NOT EXISTS (
SELECT 1 FROM ohrm_attendance_record WHERE ohrm_attendance_record.punch_in_user_time = ohrm_attendance_master.punch_time)
AND NOT EXISTS (
SELECT 1 FROM ohrm_attendance_record WHERE ohrm_attendance_record.punch_out_user_time = punch_time);
end
It looks like your punch_time field is a DATETIME type or something similar, so I think the problem is that you are comparing two datetime values. MySQL and other RDBMSs compare them including hours, minutes, seconds and milliseconds, so the comparison can come out false even when the dates are the same. You can truncate the value to a date or give it some format:
With DATE function:
BEGIN
INSERT INTO ohrm_attendance_raw_data (punch_time, device_id, card_number)
SELECT punch_time, device_id, card_number
FROM ohrm_attendance_master
WHERE ohrm_attendance_master.punch_time >= DATE_SUB(now(), INTERVAL 1 MONTH)
AND NOT EXISTS (
SELECT 1 FROM ohrm_attendance_record WHERE DATE(ohrm_attendance_record.punch_in_user_time) = DATE(ohrm_attendance_master.punch_time))
AND NOT EXISTS (
SELECT 1 FROM ohrm_attendance_record WHERE DATE(ohrm_attendance_record.punch_out_user_time) = DATE(punch_time));
END
With DATE_FORMAT function:
BEGIN
INSERT INTO ohrm_attendance_raw_data (punch_time, device_id, card_number)
SELECT punch_time, device_id, card_number
FROM ohrm_attendance_master
WHERE ohrm_attendance_master.punch_time >= DATE_SUB(now(), INTERVAL 1 MONTH)
AND NOT EXISTS (
SELECT 1 FROM ohrm_attendance_record WHERE DATE_FORMAT(ohrm_attendance_record.punch_in_user_time, '%d-%b-%Y') = DATE_FORMAT(ohrm_attendance_master.punch_time, '%d-%b-%Y'))
AND NOT EXISTS (
SELECT 1 FROM ohrm_attendance_record WHERE DATE_FORMAT(ohrm_attendance_record.punch_out_user_time, '%d-%b-%Y') = DATE_FORMAT(punch_time,'%d-%b-%Y'));
END
I'm trying to count how many results there are in each month.
This is my query :
SELECT
COUNT(*) as nb,
CONCAT(MONTH(t.date),0x3a,YEAR(t.date)) as period
FROM table1 t
WHERE t.criteria = 'value'
GROUP BY MONTH(t.date)
ORDER BY YEAR(t.date)
My Result:
nb period
---------------
7 6:2009
46 8:2009
2 10:2009
1 11:2009
14 1:2009
9 9:2010
161 7:2010
5 2:2010
88 3:2010
28 4:2010
4 5:2011
2 12:2011
The problem is, I'm sure that I have results between 5:2011 and 12:2011, and in every other period since 2009.
Is this a problem with my query or with the MySQL configuration?
Thanks a lot
You have to group by both the year and the month. Otherwise your April 2012 rows are grouped with April 2011 (and April 2010 ...) rows as well.
SELECT
COUNT(*) AS nb,
CONCAT(MONTH(t.date), ':', YEAR(t.date)) AS period
FROM table1 AS t
WHERE t.criteria = 'value'
GROUP BY YEAR(t.date)
, MONTH(t.date) ;
(and is there a reason you used 0x3a and not ':'?)
You could also use some other DATE and TIME functions of MySQL so there are fewer function calls per row and probably a more efficient query:
SELECT
COUNT(*) AS nb,
DATE_FORMAT(t.date, '%m:%Y') AS period
FROM table1 AS t
WHERE t.criteria = 'value'
GROUP BY EXTRACT( YEAR_MONTH FROM t.date) ;
For several queries, it's useful to have a permanent Calendar table in your database (with all dates or all year-months) or even several Calendar tables. Example:
CREATE TABLE CalendarYear
( Year SMALLINT UNSIGNED NOT NULL
, PRIMARY KEY (Year)
) ENGINE = InnoDB ;
INSERT INTO CalendarYear
(Year)
VALUES
(1900), (1901), ..., (2099) ;
CREATE TABLE CalendarMonth
( Month TINYINT UNSIGNED NOT NULL
, PRIMARY KEY (Month)
) ENGINE = InnoDB ;
INSERT INTO CalendarMonth
(Month)
VALUES
(1), (2), ..., (12) ;
Those can also help us make the one we'll need here:
CREATE TABLE CalendarYearMonth
( Year SMALLINT UNSIGNED NOT NULL
, Month TINYINT UNSIGNED NOT NULL
, FirstDay DATE NOT NULL
, NextMonth_FirstDay DATE NOT NULL
, PRIMARY KEY (Year, Month)
) ENGINE = InnoDB ;
INSERT INTO CalendarYearMonth
(Year, Month, FirstDay, NextMonth_FirstDay)
SELECT
y.Year
, m.Month
, MAKEDATE(y.Year, 1) + INTERVAL (m.Month-1) MONTH
, MAKEDATE(y.Year, 1) + INTERVAL (m.Month) MONTH
FROM
CalendarYear AS y
CROSS JOIN
CalendarMonth AS m ;
Then you can use the Calendar tables to write more complex queries, like the variation you want (with missing months) and probably more efficiently. Tested in SQL-Fiddle:
SELECT
COUNT(t.date) AS nb,
CONCAT(cal.Month, ':', cal.Year) AS period
FROM
CalendarYearMonth AS cal
JOIN
( SELECT
YEAR(MIN(date)) AS min_year
, MONTH(MIN(date)) AS min_month
, YEAR(MAX(date)) AS max_year
, MONTH(MAX(date)) AS max_month
FROM table1
WHERE criteria = 'value'
) AS mm
ON (cal.Year, cal.Month) >= (mm.min_year, mm.min_month)
AND (cal.Year, cal.Month) <= (mm.max_year, mm.max_month)
LEFT JOIN
table1 AS t
ON t.criteria = 'value'
AND t.date >= cal.FirstDay
AND t.date < cal.NextMonth_FirstDay
GROUP BY
cal.Year, cal.Month ;
You must also GROUP BY the year:
GROUP BY MONTH(t.date), YEAR(t.date)
Your original query uses YEAR(t.date) in the SELECT clause outside of any aggregate function without grouping by it; as a result, you get at most 12 groups (one for each possible month), and for each group (which may contain dates across many years) a "random" year is chosen by MySQL for selection. Strictly speaking, this is meaningless and the query should never have been allowed to execute. But MySQL... sigh.
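MySQL can be told to reject such queries: the ONLY_FULL_GROUP_BY SQL mode does exactly that, and it is part of the default sql_mode from MySQL 5.7.5 onwards. A small sketch using the table1 names from the question (the column aliases m and y are just for illustration):

```sql
-- Note: this replaces all other SQL modes, but only for the current session.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- Rejected with error 1055: YEAR(t.date) is neither grouped nor aggregated.
SELECT MONTH(t.date) AS m,
       YEAR(t.date)  AS y,
       COUNT(*)      AS nb
FROM table1 AS t
WHERE t.criteria = 'value'
GROUP BY MONTH(t.date);

-- Accepted: every non-aggregated expression also appears in GROUP BY.
SELECT MONTH(t.date) AS m,
       YEAR(t.date)  AS y,
       COUNT(*)      AS nb
FROM table1 AS t
WHERE t.criteria = 'value'
GROUP BY YEAR(t.date), MONTH(t.date);
```

With the mode enabled, the mistake in the question surfaces as an error instead of silently producing merged months.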