How to group mysql results in weeks - mysql

I have a table like this:
I need to sum how many messages were delivered per msisdn in last 8 weeks(but for each week) from date entered. Here is what I came up with:
SELECT count(*) as ukupan_broj, SUM(IF (sent_messages.delivered = 1,1,0 )) as broj_dostavljenih,
count(*) - SUM(IF (sent_messages.delivered = 1,1,0 )) as non_billed,
SUM(IF (sent_messages.delivered = 1,1,0 )) / count(*) as ratio,
`sent_messages`.`msisdn`,
MONTH(`sent_messages`.`datetime`) AS MONTH, WEEK(`sent_messages`.`datetime`) AS WEEK,
DATE_FORMAT(`sent_messages`.`datetime`, '%Y-%m-%d') AS DATE
FROM `sent_messages`
INNER JOIN `received_messages` on `received_messages`.`uniqueid`=`sent_messages`.`originalID`
and `received_messages`.`msisdn`=`sent_messages`.`msisdn`
WHERE `sent_messages`.`datetime` >= '2016-12-12'
AND `sent_messages`.`originalID` = `received_messages`.`uniqueid`
AND `sent_messages`.`datetime` <= '2017-12-30'
AND `sent_messages`.`datetime` >= `received_messages`.`datetime`
AND `sent_messages`.`datetime` <= ( `received_messages`.`datetime` + INTERVAL 2 HOUR )
AND `sent_messages`.`type` = 'PAID'
GROUP BY WEEK
ORDER BY DATE ASC
And because I'm grouping it by WEEK, my result is showing sum of all delivered, undelivered etc. but not per msisdn. Here is how result looks like:
And when I add msisdn in GROUP BY clause I don't get the result the way I need it.
And I need it like this:
Please help me to write optimized query to fetch these results for each msisdn per last 8 weeks, because I'm stuck.

WEEK(...) has a problem near the first of the year. Instead, you could use TO_DAYS:
WHERE datetime > CURDATE() - INTERVAL 8 WEEK -- for the last 8 weeks
GROUP BY MOD(TO_DAYS(datetime), 7) -- group by week
That is quite simple, but there is a bug in it. It only works if today is the last day of a "week". And if date%7 lands on the desired day of week.
WHERE datetime > CURDATE() - INTERVAL 9 WEEK -- for the last 8 weeks
GROUP BY MOD(TO_DAYS(datetime) - 3, 7) -- group by week
Is the first cut at fixing the bugs -- 9-week interval will include the current partial week and the partial week 8 weeks ago. The "- 3" (or whatever number works) will align your "week" to start on Monday or Sunday or whatever.
SUM(IF (sent_messages.delivered = 1,1,0 )) can be shortened to SUM(delivered = 1) or even SUM(delivered) if that column only has 0 or 1 values.

Related

SQL how to count distinct id in changing time ranges

I want to count the distinct number of fd_id over the time between today and yesterday, between today and 3 days ago, between today and 5 days ago, between today and 7 days ago, between today and 15 days ago, between today and 30 days ago.
My data table looks like the following:
user_id. fd_id. date
1. 123a. 20201010
1. 123a. 20201011
1. 124a. 20201011
...
and the desired result is of the following format:
user_id count_fd_id_1d count_fd_id_3d ... count_fd_id_30d
Specifically, I know I can do the following 6 times and join them together (some column bind method):
select user_id, count(distinct fd_id) as count_fd_id_1d
from table
where date <= today and date >= today-1 (#change this part for different dates)
select user_id, count(distinct fd_id) as count_fd_id_3d
from table
where date <= today and date >= today-3 (#change this part for different dates)
...
I am wondering how I may do this in one shot without running almost identical code for 6 times.
You can use conditional aggregation:
select user_id,
count(distinct case when date >= current_date - 1 day and date < current_date then fd_id end) as cnt_1d,
count(distinct case when date >= current_date - 3 day and date < current_date then fd_id end) as cnt_3d,
...
from mytable
goup by user_id
You can play around with the date expressions to set the ranges you want. The above works on entire days, and does not include the current day.
If the date column in the the table really does look like that (not in date/datetime format), I think you need to use STR_TO_DATE() to convert it to date format then uses DATEDIFF to check the date differences. Consider this example query:
SELECT user_id,
MAX(CASE WHEN ddiff=1 THEN cn END) AS count_fd_id_1d,
MAX(CASE WHEN ddiff=2 THEN cn END) AS count_fd_id_2d,
MAX(CASE WHEN ddiff=3 THEN cn END) AS count_fd_id_3d,
MAX(CASE WHEN ddiff=4 THEN cn END) AS count_fd_id_4d,
MAX(CASE WHEN ddiff=5 THEN cn END) AS count_fd_id_5d
FROM (SELECT user_id,
DATEDIFF(CURDATE(), STR_TO_DATE(DATE,'%Y%m%d')) ddiff,
COUNT(DISTINCT fd_id) cn
FROM mytable
GROUP BY user_id, ddiff) A
GROUP BY user_id;
At the moment, if you check date value simply by using direct subtraction, you'll get incorrect result. For example:
*your current date value - how many days:
'20201220' - 30 = '20201190' <-- this is not correct.
*if you convert the date value and using the same subtraction:
STR_TO_DATE('20201220','%Y%m%d') - 30 = '20201190' <-- still get incorrect.
*convert date value then uses INTERVAL for the date subtraction:
STR_TO_DATE('20201220','%Y%m%d') - INTERVAL 30 DAY = '2020-11-20'
OR
DATE_SUB(STR_TO_DATE('20201220','%Y%m%d'),INTERVAL 30 DAY) = '2020-11-20'
*IF your date column is storing standard date format value, then omit STR_TO_DATE
'2020-12-20' - INTERVAL 30 DAY = '2020-11-20'
OR
DATE_SUB('2020-12-20',INTERVAL 30 DAY) = '2020-11-20'
Check out more date manipulation in MySQL.
For the question, I made a fiddle with a bunch of testing.

Aggregating table data in MySQL, is there an easier way to do this?

I'm trying to write a query that aggregates data from a table.
Essentially I have a long list of devices that have been inventoried and eventually installed over the last couple of years.
I want to find the average amount of time between when the device was received and when it was installed, and then have that data sorted by the month the device was installed. BUT in each month's row, I also want to include the data from the previous months.
So essentially what I want to see is: (sorry for terrible formatting)
MonthInstalled | TimeToInstall | Total#Devices
-----------------+---------------+----------------------------
Jan | 10 Days | 5
Feb(=Jan+Feb) | 15 Days | 18 (5 in Jan + 13 in Feb)
Mar(=Jan+Feb+Mar)| 13 Days | 25 (5 + 13 + 7)
...
The query I currently have written looks like this:
INSERT INTO DevicesInstall
SELECT ROUND(AVG(DATEDIFF(dvc.dt_install , dvc.dt_receive)), 1) AS 'Install',
COUNT(dvc.dvc_model) AS 'Total Devices',
MAX(dvc.dt_install) AS 'Date',
loc.loc_campus AS 'Campus'
FROM dvc_info dvc, location loc
WHERE dvc.dvc_loc_bin = loc.loc_bin
AND dvc.dt_install < '20160201'
;
Although this is functional, I have to iterate this for each month manually, so it is not scale-able. Is there a way to condense this at all?
We can return the dates using an inline view (derived table), and then join to the dvc_info table, so we can get the "cumulative" results.
To get the results for:
Jan
Jan+Feb
Jan+Feb+Mar
We need to return three copies of the rows for Jan, and two copies of the rows for Feb, and then collapse the those rows into an appropriate group.
The loc_campus is being included in the SELECT list... not clear why that is needed. If we want results "by campus", then we need to include that expression in the GROUP BY clause. Otherwise, the value returned for that non-aggregate is indeterminate... we will get a value for some row "in the group", but it could be any row.
Something like this:
SELECT d.dt AS `before_date`
, loc.loc_campus AS `Campus`
, ROUND(AVG(DATEDIFF(dvc.dt_install,dvc.dt_receive)),1) AS `Install`
, COUNT(dvc.dvc_model) AS `Total Devices`
, MAX(dvc.dt_install) AS `latest_dt_install`
FROM ( SELECT '2016-01-01' + INTERVAL 1 MONTH AS dt
UNION ALL SELECT '2016-01-01' + INTERVAL 2 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 3 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 4 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 5 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 6 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 7 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 8 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 9 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 10 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 11 MONTH
UNION ALL SELECT '2016-01-01' + INTERVAL 12 MONTH
) d
CROSS
JOIN location loc
LEFT
JOIN dvc_info dvc
ON dvc.dvc_loc_bin = loc.loc_bin
AND dvc.dt_install < d.dt
GROUP
BY d.dt
, loc.loc_campus
ORDER
BY d.dt
, loc.loc_campus
Note that the value returned for d.dt will be the "up until" date. We're going to get '2016-02-01' returned for the January results. If we want to return a value of January date, we can use an expression in the SELECT list...
SELECT DATE_FORMAT(d.dt + INTERVAL -1 MONTH,'%Y-%m') AS `month`
Lots of options on query alternatives.
But it looks like the "big hump" is that to get cumulative results, we need to return multiple copies of the dvc_info rows, so the rows can be collapsed into each "grouping".
I recommend working on just the SELECT first. And get that tested working, before monkeying around to turn it into an INSERT ... SELECT.
FOLLOWUP
We can use any query as an inline view (derived table d) that returns a set of dates we want.
e.g.
FROM ( SELECT DATE_FORMAT(m.install_dt,'%Y-%m-01') + INTERVAL 1 MONTH AS dt
FROM dvc_install m
WHERE m.install_dt >= '2016-01-01'
GROUP BY DATE_FORMAT(m.install_dt,'%Y-%m-01') + INTERVAL 1 MONTH
) d
Note that with this approach, if there are no install_dt in February, we won't get back a row for February. Using the static UNION ALL SELECT approach allows us to get back "zero" counts, i.e. to return rows for months where there isn't an install_dt in that month. (But that's the answer to a different question... how do I get back a "zero" count for February when there aren't any rows for Februrary?)
Alternatively, if we have a calendar table e.g. cal that contains a list of the dates we want, we could just reference the table in place of the inline view, or the inline view query could get rows from that.
FROM ( SELECT cal.dt
FROM cal cal
WHERE cal.dt >= '2016-01-01'
AND cal.dt <= NOW()
AND DATE_FORMAT(cal.dt,'%d') = '01'
) d

How to decrease SQL Query from last 30 days until now?

I am trying to get the amount of data for the last 30 days.
SELECT ( Now() - interval 1 month ),
Count(flightid) AS count
FROM flight
WHERE flightstatus = 0
AND flightvisibility = 1
AND flightvaliddate > Now()
AND flightvaliddate >= ( Now() - interval 1 month )
Right now this is working ok and it's giving me only 1 row that corresponds to the same day of last month.
What I would like is to get the remaining data from each day until now. How can I do this?
I am using MySQL.
The condition in the WHERE clause is wrong.
And since you want day wise data of last thirty days till now then you must have to use GROUP BY.
SELECT
DATE(flightvalidate) AS flightValidateDate,
Count(flightid) AS count
FROM
flight
WHERE
flightstatus = 0
AND flightvisibility = 1
AND DATE(flightvaliddate) >= CURDATE() - INTERVAL 1 MONTH
GROUP BY flightValidateDate
ORDER BY flightvalidate

MySql Select record before x days every month

I have a table tbl_subscriptions and columns like this "id, user_name, join_date(date)", I want to select the users before 7 days every month based on join_date so that I can send them notifications to continue their subscription for the next month. I have records like this
1, user1, 2014-05-02
2, user2, 2014-05-04
3, user3, 2014-06-12
4, user4, 2014-06-20
4, user5, 2014-07-24
If today is 2014-07-28, then I want to get records 1 and 2. I tried below query
SELECT *,
datediff( date_format(date, '2014-07-%d'), now() ) as daysLeft
FROM tbl_subscriptions
HAVING daysLeft >= 0
AND daysLeft < 7
the problem with above sql is that it is selecting the record of the current month only, plz suggest any better query.
Does this do what you want?
SELECT s.*, datediff(date, curdate()) as daysLeft
FROM tbl_subscriptions s
WHERE date >= curdate() and date < curdate() + interval 7 day;
EDIT:
I see. These are recurrent subscriptions and you want to find the next ones. The following logic should work:
select s.*,
(case when day(date) >= day('2014-07-28')
then day(date) - day('2014-07-28')
else day(date) + day(last_day('2014-07-28')) - day('2014-07-28')
end) as diff
from tbl_subscriptions s
having diff <= 7;
Here is the SQL Fiddle.
Ok, first of all I dont know what is subscription renewal period. And idea of only checking date (and not the whole period) doesnt make sense to me.
But this will get you your required output.
SELECT *,
day(date) days,
day(last_day('2014-07-28')) as lastday,
day('2014-07-28') today, day(last_day('2014-07-28'))-day('2014-07-28') as diff
FROM tbl_subscriptions
having days <= (7-diff) or (days > today and days <= today+7)
And here goes the demo (schema thanks to one of the deleted answer) ->
http://sqlfiddle.com/#!2/3cc4f

MySQL group by week

I have a large number of records with a transaction datetime field going back several years. I would like to do a comparative analysis between the same timespan this year and last. How can I group by week over a 3 month range?
I'm running into problems using the YEARWEEK and WEEK functions because of the day the year 2012 starts of versus the day 2011 starts on.
Given that I have records with datetimes everyday from Jan 1st to the current day, and records with the same datetimes from the prior year, how can I group by week so the output is sums with dates like: 01/01/2011, 01/08/2011, 01/15/2011, etc., and 01/01/2012, 01/08/2012, 01/15/2012, etc.?
My query so far is as follows:
SELECT
DATE_FORMAT(A.transaction_date, '%Y-%m-%d') as date,
ROUND(sum(A.quantity), 3) AS quantity,
ROUND(sum(A.total_amount), 3) AS amount,
A.product_code,
D.fuel_type_code,
D.fuel_type_name,
C.customer_code,
C.customer_name
FROM
cl_transactions AS A
INNER JOIN
card AS B ON A.card_number=B.card_number
INNER JOIN
customer AS C ON B.customer_code=C.customer_code
INNER JOIN
fuel_type AS D ON A.fuel_type=D.fuel_type_code
WHERE
((A.transaction_date >= DATE_FORMAT(NOW() - INTERVAL 3 MONTH, '%Y-%m-01')) OR (A.transaction_date - INTERVAL 1 YEAR >= DATE_FORMAT(NOW() - INTERVAL 15 MONTH, '%Y-%m-01') AND A.transaction_date <= NOW() - INTERVAL 1 YEAR))
GROUP BY
A.transaction_date, fuel_type_code;
I would essentially like something that achieves the following pseudo-query:
GROUP BY
STARTING FROM THE OLDEST DATE (A.transaction_date + INTERVAL 6 DAY)
I started with an inner query using sqlvariables to build out from/to ranges for this year and last year of each respective start of year/month/day (ex: 2012-01-01 and 2011-01-01 respectively). From that, I'm also pre-formatting the date for final output so you have ONE master date basis for display reflecting that of whatever the "this year" week would be.
From that, I do a join to the transaction table where the transaction date is BETWEEN the respective start of current week and start of next week. Since date/time stamps include hour minute, 2012-01-01 by itself is implied as 12:00:00am (midnight) of the day. and between will go UP TO 7 days later 12:00:00 am. And that date will become the start date of the following week.
So, by joining on the date being between EITHER last yr or this yr time period, its the same group qualification. So the field selection does a ROUND( SUM( IF() )) per respective last year or this year. if the incoming transaction date is LESS than the current year's week start, then it must be a record from prior year, otherwise its for the current year. So, respectively, add the value itself, or zero as it applies.
So now, you have the group by. The week that it qualified for was already prepared from the inner query via "ThisYearWeekOf" formatted column, regardless of the otherwise computed "YEARWEEK()" or "WEEK()". The date ranges took care of that qualification for us.
Finally, I added the fuel-type as a join and included that as the group by. You have to group by all non-aggregate columns for proper SQL, although MySQL lets you get by by just grabbing the first entry for the given group if it is NOT so specified in group by.
To close, I DID include the information for the customer as you didn't have it in the group by and did not appear to be applicable... it would just arbitrarily grab one. However, I've added it to the group by, so now your records will show at the per customer level, per product and fuel type, how much sales and quantity between this year and last.
SELECT
JustWeekRange.ThisYearWeekOf,
CTrans.product_code,
FT.fuel_type_code,
FT.fuel_type_name,
C.customer_code,
C.customer_name,
ROUND( SUM( IF( CTrans.transaction_date < JustWeekRange.ThisYrWeekStart, CTrans.Quantity, 0 )), 3) as LastYrQty,
ROUND( SUM( IF( CTrans.transaction_date < JustWeekRange.ThisYrWeekStart, CTrans.total_amount, 0 )), 3) as LastYrAmt,
ROUND( SUM( IF( CTrans.transaction_date < JustWeekRange.ThisYrWeekStart, 0, CTrans.Quantity )), 3) as ThisYrQty,
ROUND( SUM( IF( CTrans.transaction_date < JustWeekRange.ThisYrWeekStart, 0, CTrans.total_amount )), 3) as ThisYrAmt,
FROM
( SELECT
DATE_FORMAT(#ThisYearDate, '%Y-%m-%d') as ThisYearWeekOf,
#LastYearDate as LastYrWeekStart,
#ThisYearDate as ThisYrWeekStart,
#LastYearDate := date_add( #LastYearDate, interval 7 day ) LastYrStartOfNextWeek,
#ThisYearDate := date_add( #ThisYearDate, interval 7 day ) ThisYrStartOfNextWeek
FROM
(select #ThisYearDate := '2012-01-01',
#LastYearDate := '2011-01-01' ) sqlvars,
cl_transactions justForLimit
HAVING
ThisYrWeekStart < '2012-04-01'
LIMIT 15 ) JustWeekRange
JOIN cl_transactions AS CTrans
ON CTrans.transaction_date BETWEEN
JustWeekRange.LastYrWeekStart AND JustWeekRange.LastYrStartOfNextWeek
OR CTrans.transaction_date BETWEEN
JustWeekRange.ThisYrWeekStart AND JustWeekRange.ThisYrStartOfNextWeek
JOIN fuel_type FT
ON CTrans.fuel_type = FT.fuel_type_code
JOIN card
ON CTrans.card_number = card.card_number
JOIN customer AS C
ON card.customer_code = C.customer_code
GROUP BY
JustWeekRange.ThisYearWeekOf,
CTrans.product_code,
FT.fuel_type_code,
FT.fuel_type_name,
C.customer_code,
C.customer_name