SQL how to count distinct id in changing time ranges - mysql

I want to count the distinct number of fd_id over the time between today and yesterday, between today and 3 days ago, between today and 5 days ago, between today and 7 days ago, between today and 15 days ago, between today and 30 days ago.
My data table looks like the following:
user_id  fd_id  date
1        123a   20201010
1        123a   20201011
1        124a   20201011
...
and the desired result is of the following format:
user_id count_fd_id_1d count_fd_id_3d ... count_fd_id_30d
Specifically, I know I can do the following 6 times and join them together (some column bind method):
select user_id, count(distinct fd_id) as count_fd_id_1d
from table
where date <= today and date >= today-1 (#change this part for different dates)
select user_id, count(distinct fd_id) as count_fd_id_3d
from table
where date <= today and date >= today-3 (#change this part for different dates)
...
I am wondering how I can do this in one shot, without running almost identical code 6 times.

You can use conditional aggregation:
select user_id,
count(distinct case when date >= current_date - interval 1 day and date < current_date then fd_id end) as cnt_1d,
count(distinct case when date >= current_date - interval 3 day and date < current_date then fd_id end) as cnt_3d,
...
from mytable
group by user_id
You can play around with the date expressions to set the ranges you want. The above works on entire days, and does not include the current day.
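As a rough, runnable illustration of the conditional aggregation idea, here is a minimal sketch in Python using the built-in sqlite3 module as a stand-in for MySQL. The sample rows, the ISO date format, and the pinned "today" value are assumptions made for the example, and SQLite's date(x, '-N day') replaces MySQL's current_date - interval N day.

```python
import sqlite3

# Hypothetical sample data: (user_id, fd_id, date as ISO text).
rows = [
    (1, "123a", "2020-10-10"),
    (1, "123a", "2020-10-11"),
    (1, "124a", "2020-10-11"),
    (1, "125a", "2020-10-05"),
    (2, "200b", "2020-10-08"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (user_id INTEGER, fd_id TEXT, date TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

# "Today" is pinned to a fixed date so the result is reproducible.
today = "2020-10-12"

# One pass over the table: each CASE yields fd_id only inside its window,
# and COUNT(DISTINCT ...) ignores the NULLs produced outside the window.
query = """
SELECT user_id,
       COUNT(DISTINCT CASE WHEN date >= date(:today, '-1 day')
                            AND date < :today THEN fd_id END) AS cnt_1d,
       COUNT(DISTINCT CASE WHEN date >= date(:today, '-3 day')
                            AND date < :today THEN fd_id END) AS cnt_3d,
       COUNT(DISTINCT CASE WHEN date >= date(:today, '-7 day')
                            AND date < :today THEN fd_id END) AS cnt_7d
FROM mytable
GROUP BY user_id
ORDER BY user_id
"""
result = conn.execute(query, {"today": today}).fetchall()
print(result)  # [(1, 2, 2, 3), (2, 0, 0, 1)]
```

Each additional window is one more COUNT(DISTINCT CASE ...) column, so the six ranges from the question need six expressions but only a single scan and a single GROUP BY.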

If the date column in the table really does look like that (not in date/datetime format), I think you need to use STR_TO_DATE() to convert it to date format and then use DATEDIFF() to check the date differences. Consider this example query:
SELECT user_id,
MAX(CASE WHEN ddiff=1 THEN cn END) AS count_fd_id_1d,
MAX(CASE WHEN ddiff=2 THEN cn END) AS count_fd_id_2d,
MAX(CASE WHEN ddiff=3 THEN cn END) AS count_fd_id_3d,
MAX(CASE WHEN ddiff=4 THEN cn END) AS count_fd_id_4d,
MAX(CASE WHEN ddiff=5 THEN cn END) AS count_fd_id_5d
FROM (SELECT user_id,
DATEDIFF(CURDATE(), STR_TO_DATE(DATE,'%Y%m%d')) ddiff,
COUNT(DISTINCT fd_id) cn
FROM mytable
GROUP BY user_id, ddiff) A
GROUP BY user_id;
At the moment, if you check the date value by direct subtraction, you'll get an incorrect result. For example:
*your current date value minus a number of days:
'20201220' - 30 = '20201190' <-- this is not correct.
*if you convert the date value and use the same subtraction:
STR_TO_DATE('20201220','%Y%m%d') - 30 = '20201190' <-- still incorrect.
*convert the date value, then use INTERVAL for the date subtraction:
STR_TO_DATE('20201220','%Y%m%d') - INTERVAL 30 DAY = '2020-11-20'
OR
DATE_SUB(STR_TO_DATE('20201220','%Y%m%d'),INTERVAL 30 DAY) = '2020-11-20'
*if your date column stores values in a standard date format, omit STR_TO_DATE:
'2020-12-20' - INTERVAL 30 DAY = '2020-11-20'
OR
DATE_SUB('2020-12-20',INTERVAL 30 DAY) = '2020-11-20'
Check out more date manipulation in MySQL.
For the question, I made a fiddle with a bunch of testing.
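The difference between numeric subtraction and interval subtraction can be reproduced outside MySQL as well. The following is a small sketch in Python, assuming the same YYYYMMDD string as in the example above; datetime.strptime plays the role of STR_TO_DATE and timedelta plays the role of INTERVAL 30 DAY.

```python
from datetime import datetime, timedelta

raw = "20201220"  # date stored as a YYYYMMDD string, as in the question

# Plain numeric subtraction treats the value as an integer, not a date.
naive = int(raw) - 30

# Parsing first and subtracting a 30-day interval gives a real calendar date.
parsed = datetime.strptime(raw, "%Y%m%d").date()
shifted = parsed - timedelta(days=30)

print(naive)                # 20201190 (not a valid date)
print(shifted.isoformat())  # 2020-11-20
```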

Related

In mysql how to get users that have no event X in past 14 days and neither in the future

Consider I have the following table and current date is 2022-09-01:
Requirement: I want to get all users that have no event_name like cbt care in the past 14 days and onwards into the future.
I have this query:
SELECT * FROM test_table
WHERE event_name LIKE "%cbt care%"
AND start_date <= DATE_SUB(NOW(), INTERVAL 14 DAY)
;
Which returns:
The issue is that user_id = x does have a cbt care event on 2022-09-10, which is 9 days ahead of the current date (2022-09-01).
How can I return only the users that satisfy the requirement above?
SELECT user_id,
COUNT(CASE WHEN event_name LIKE '%cbt care%' AND start_date
> CURDATE() - INTERVAL 14 day THEN 1 END) AS count_recent
FROM test_table
GROUP BY user_id
HAVING count_recent = 0;
https://www.db-fiddle.com/f/64j7L1VZsVdLYqmcQ2NrvV/0
The CASE expression returns 1 for each row with the conditions you described (a specific event name and a start date after the date 14 days ago, which includes all of the future dates too). For rows that don't match that condition, the CASE returns NULL. There's an implicit ELSE NULL in any CASE expression.
COUNT(<expr>), like many set functions, ignores NULLs. It will only count the occurrences of non-NULL values. So if the count returns 0, then the CASE returned only NULLs, which means there are no recent or future 'cbt care' events for that user.
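To see the COUNT-ignores-NULL behaviour in action, here is a minimal sketch using Python's sqlite3 as a stand-in for MySQL. The rows are hypothetical (the original table is not shown), "today" is pinned to 2022-09-01 as in the question, and SQLite's date(x, '-14 day') replaces CURDATE() - INTERVAL 14 DAY.

```python
import sqlite3

# Hypothetical rows: (id, user_id, event_name, start_date).
rows = [
    (0, "a", "cbt care", "2022-06-01 20:00:00"),  # older than 14 days
    (1, "x", "cbt care", "2022-07-01 20:00:00"),  # older than 14 days
    (2, "x", "cbt care", "2022-09-10 20:00:00"),  # in the future
    (3, "y", "intake", "2022-08-30 09:00:00"),    # recent, but another event
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table "
    "(id INTEGER, user_id TEXT, event_name TEXT, start_date TEXT)"
)
conn.executemany("INSERT INTO test_table VALUES (?, ?, ?, ?)", rows)

today = "2022-09-01"  # pinned so the example is reproducible

# The CASE yields 1 for recent or future 'cbt care' rows and NULL otherwise;
# COUNT skips the NULLs, so a count of 0 means "no such event for this user".
query = """
SELECT user_id,
       COUNT(CASE WHEN event_name LIKE '%cbt care%'
                   AND start_date > date(:today, '-14 day')
                  THEN 1 END) AS count_recent
FROM test_table
GROUP BY user_id
HAVING count_recent = 0
ORDER BY user_id
"""
result = conn.execute(query, {"today": today}).fetchall()
print(result)  # [('a', 0), ('y', 0)]
```

User x is dropped because of the future event on 2022-09-10, even though that user also has an older event that a plain WHERE filter would have matched.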
select id
,user_id
,event_name
,start_date
from (
select *
,count(case when datediff(curdate(),start_date) <= 14 and event_name like "%cbt care%" then 1 end) over (partition by user_id) as cw
from t
) t
where cw = 0
id  user_id  event_name  start_date
0   a        cbt care    2022-06-01 20:00:00
Fiddle

mySQLi --> Is it possible to create a query that would calculate the sum between dates

I have a mySQL table that contains at least the following field:
ID | user | date | balance
I would like to know if it is possible to sum the column balance group by user for dates that are between the following cases:
between TODAY and TODAY + 30days
between TODAY+31days and TODAY+60days
above TODAY+61days
Query:
SELECT user, SUM(balance) AS sumBalance
FROM table
WHERE ... CASE << this is where I think I need help...
GROUP BY user
It may not be possible to do all of that at once in the same query. If not, I can execute three separate queries and fill an array.
Thanks for your help!
It seems like you want conditional aggregation:
select
user,
sum(case when date between current_date and current_date + interval 30 day then balance end) balance1,
sum(case when date between current_date + interval 31 day and current_date + interval 60 day then balance end) balance2,
sum(case when date >= current_date + interval 61 day then balance end) balance3
from mytable
where date >= current_date
group by user
This gives you one record per user (which, by the way, is a MySQL keyword, hence not a good choice for a column name), with 3 additional columns that contain the sum of balance for the three distinct date ranges.
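A minimal sketch of the same bucketing, run through Python's sqlite3 as a stand-in for MySQL. The column names user_name and due_date, the sample rows, and the pinned "today" are all assumptions for illustration; SQLite's date(x, '+N day') replaces current_date + interval N day.

```python
import sqlite3

# Hypothetical rows: (user_name, due_date, balance).
rows = [
    ("ann", "2024-01-10", 100),  # within 30 days
    ("ann", "2024-01-31", 50),   # exactly today + 30 days
    ("ann", "2024-02-15", 70),   # 31 to 60 days ahead
    ("ann", "2024-04-01", 30),   # 61 or more days ahead
    ("bob", "2024-03-01", 20),   # exactly today + 60 days
    ("bob", "2023-12-25", 999),  # in the past, removed by the WHERE clause
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (user_name TEXT, due_date TEXT, balance INTEGER)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

today = "2024-01-01"  # pinned so the example is reproducible

query = """
SELECT user_name,
       SUM(CASE WHEN due_date BETWEEN :today AND date(:today, '+30 day')
                THEN balance END) AS balance1,
       SUM(CASE WHEN due_date BETWEEN date(:today, '+31 day')
                                  AND date(:today, '+60 day')
                THEN balance END) AS balance2,
       SUM(CASE WHEN due_date >= date(:today, '+61 day')
                THEN balance END) AS balance3
FROM mytable
WHERE due_date >= :today
GROUP BY user_name
ORDER BY user_name
"""
result = conn.execute(query, {"today": today}).fetchall()
print(result)  # [('ann', 150, 70, 30), ('bob', None, 20, None)]
```

A bucket with no matching rows comes back as NULL (None here) rather than 0; adding ELSE 0 inside each CASE, or wrapping the SUM in COALESCE, would turn those into zeros if that is preferred.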

Getting Max date from the given two dates in mysql

I have a table with below structure and data.
id ean Control_date qty
1 4046228081410 26.05.2017 568
2 4046228081410 05.06.2017 900
My expected result would be like below
2 4046228081410 05.06.2017 1468
To achieve this I am using the query below:
SELECT EAN,Control_date,SUM(Qty) AS Qty FROM mytable WHERE
(STR_TO_DATE(`Control_date`,'%d.%m.%Y') <= STR_TO_DATE('03.06.2017','%d.%m.%Y')
OR
STR_TO_DATE(`Control_date`, '%d.%m.%Y') <= DATE_ADD(STR_TO_DATE('03.06.2017', '%d.%m.%Y'), INTERVAL 7 DAY))
AND ean = 4046228081410
Here I need to sum the qty where the control date is earlier than today's date, or later than today's date but less than today's date + 7 days. The 2nd control date, 05.06.2017, is greater than today's date and less than (03.06.2017 + 7 days).
But I always get the row where the control date is less than today's date:
1 4046228081410 26.05.2017 1468
But i need data with control date 05.06.2017.
Any help would be greatly appreciated.
You should really fix your date formats. If you stored the value as a date, the query would simply be:
SELECT EAN, MAX(Control_date), SUM(Qty) AS Qty
FROM mytable
WHERE Control_date < CURDATE() + INTERVAL 7 DAY AND
Control_date >= CURDATE() AND
ean = 4046228081410
GROUP BY ean;
Note: You can use a constant such as '2017-06-03' if you want a constant date. However, your question specifically says the current date.
Just because you have bogus date formats stored in your data doesn't mean you have to use the same format in queries. The expression '2017-06-03' (or DATE('2017-06-03')) is simpler than the more complex STR_TO_DATE() expression.
In your case, bite the bullet and output the date in a correct format, so you can do:
SELECT EAN, MAX(STR_TO_DATE(`Control_date`, '%d.%m.%Y')), SUM(Qty) AS Qty
FROM mytable
WHERE STR_TO_DATE(`Control_date`, '%d.%m.%Y') < CURDATE() + INTERVAL 7 DAY AND
STR_TO_DATE(`Control_date`, '%d.%m.%Y') >= CURDATE() AND
ean = 4046228081410
GROUP BY ean;
You have a typo in your SQL statement. If you do not want to include today's date, use strict less-than and greater-than comparisons:
SELECT EAN,Control_date,SUM(Qty) AS Qty FROM mytable WHERE
(STR_TO_DATE(`Control_date`,'%d.%m.%Y') < STR_TO_DATE('03.06.2017','%d.%m.%Y')
OR
STR_TO_DATE(`Control_date`, '%d.%m.%Y') > DATE_ADD(STR_TO_DATE('03.06.2017', '%d.%m.%Y'), INTERVAL 7 DAY))
AND ean = 4046228081410
A couple of issues in your statement.
Date conditions are redundant
Need a MAX function on Control_date
Need GROUP BY on EAN
SELECT
EAN,
MAX(STR_TO_DATE(Control_date, '%d.%m.%Y')) AS control_date,
SUM(Qty) AS Qty
FROM mytable
WHERE
STR_TO_DATE(Control_date, '%d.%m.%Y') <= DATE_ADD(STR_TO_DATE('03.06.2017', '%d.%m.%Y'), INTERVAL 7 DAY)
AND
ean = 4046228081410
GROUP BY EAN

How to group mysql results in weeks

I have a table like this:
I need to sum how many messages were delivered per msisdn in the last 8 weeks (broken down by week) from the date entered. Here is what I came up with:
SELECT count(*) as ukupan_broj, SUM(IF (sent_messages.delivered = 1,1,0 )) as broj_dostavljenih,
count(*) - SUM(IF (sent_messages.delivered = 1,1,0 )) as non_billed,
SUM(IF (sent_messages.delivered = 1,1,0 )) / count(*) as ratio,
`sent_messages`.`msisdn`,
MONTH(`sent_messages`.`datetime`) AS MONTH, WEEK(`sent_messages`.`datetime`) AS WEEK,
DATE_FORMAT(`sent_messages`.`datetime`, '%Y-%m-%d') AS DATE
FROM `sent_messages`
INNER JOIN `received_messages` on `received_messages`.`uniqueid`=`sent_messages`.`originalID`
and `received_messages`.`msisdn`=`sent_messages`.`msisdn`
WHERE `sent_messages`.`datetime` >= '2016-12-12'
AND `sent_messages`.`originalID` = `received_messages`.`uniqueid`
AND `sent_messages`.`datetime` <= '2017-12-30'
AND `sent_messages`.`datetime` >= `received_messages`.`datetime`
AND `sent_messages`.`datetime` <= ( `received_messages`.`datetime` + INTERVAL 2 HOUR )
AND `sent_messages`.`type` = 'PAID'
GROUP BY WEEK
ORDER BY DATE ASC
And because I'm grouping it by WEEK, my result is showing sum of all delivered, undelivered etc. but not per msisdn. Here is how result looks like:
And when I add msisdn in GROUP BY clause I don't get the result the way I need it.
And I need it like this:
Please help me to write optimized query to fetch these results for each msisdn per last 8 weeks, because I'm stuck.
WEEK(...) has a problem near the first of the year. Instead, you could use TO_DAYS:
WHERE datetime > CURDATE() - INTERVAL 8 WEEK -- for the last 8 weeks
GROUP BY FLOOR(TO_DAYS(datetime) / 7) -- group by week
That is quite simple, but there is a bug in it. It only works if today is the last day of a "week", and if TO_DAYS(datetime) % 7 = 0 lands on the desired day of the week.
WHERE datetime > CURDATE() - INTERVAL 9 WEEK -- the last 8 weeks plus partial weeks
GROUP BY FLOOR((TO_DAYS(datetime) - 3) / 7) -- group by week
This is the first cut at fixing the bugs: the 9-week interval will include the current partial week and the partial week 8 weeks ago. The "- 3" (or whatever number works) will align your "week" to start on Monday or Sunday or whatever.
SUM(IF (sent_messages.delivered = 1,1,0 )) can be shortened to SUM(delivered = 1) or even SUM(delivered) if that column only has 0 or 1 values.
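The day-number approach to weekly grouping can be sketched in Python, where date.toordinal() serves as a running day number in the same spirit as TO_DAYS (the two use different zero points, so the alignment offset below applies only to this Python sketch). The dates are hypothetical. Integer division by 7 produces one bucket per 7-day block, whereas taking the remainder modulo 7 would produce the position within the week instead.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical delivery dates: ten consecutive days starting on a Monday.
start = date(2024, 1, 1)  # a Monday
days = [start + timedelta(days=i) for i in range(10)]


def week_bucket(d, offset=1):
    # Integer division by 7 yields one value per 7-day block; the offset
    # shifts where each block starts (offset=1 aligns blocks to Monday
    # for Python ordinals).
    return (d.toordinal() - offset) // 7


def weekday_bucket(d):
    # Modulo 7 yields the position within the week, not the week itself.
    return d.toordinal() % 7


per_week = Counter(week_bucket(d) for d in days)
per_weekday = Counter(weekday_bucket(d) for d in days)

print(list(per_week.values()))  # [7, 3]: one full week, then three days
print(len(per_weekday))         # 7: one bucket per day of the week
```

To get per-msisdn figures as asked in the question, the week bucket would be combined with msisdn in the grouping key.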

How to count records as "0" that have value "0" when using WHERE x = 1

I have a table structure that looks like this:
I have a perfectly working query that counts how many records there have been per day the last 30 days. It looks likes this:
SELECT DATE(timestamp) AS date, COUNT(id) AS emails FROM emails WHERE timestamp >= now() - interval 1 month GROUP BY DATE(timestamp)
This outputs the following which is perfectly fine:
However, the next thing seems too difficult for me to imagine. Now I want to count how many records there have been per day the last 30 days BUT only where newsletter = 1.
I've tried to put a WHERE statement looking like this:
SELECT DATE(timestamp) AS date, COUNT(*) AS emails, nyhedsbrev FROM emails WHERE timestamp >= now() - interval 1 month AND nyhedsbrev = 1 GROUP BY DATE(timestamp)
... And that outputs the following:
The problem is that it's omitting the records with newsletter = 0, so I can't compare my first query against the new one, as the dates don't match. I know that is because I use WHERE newsletter = 1.
Instead of omitting the record, I want a query that just puts a "0" for that date. How can I do this? The final query should output this:
You should be able to simply use SUM() and IF() to get the desired output:
SELECT
DATE(timestamp) AS date,
COUNT(*) AS emails,
SUM(IF(nyhedsbrev > 0, 1, 0)) as nyhedsbrev_count
FROM
emails
WHERE
timestamp >= now() - interval 1 month
GROUP BY
DATE(timestamp)
SQLFiddle DEMO
Edit: You might even be able to simplify it, since it's a boolean, and simply use SUM(nyhedsbrev), but this REQUIRES that nyhedsbrev is only 0 or 1:
SELECT
DATE(timestamp) AS date,
COUNT(*) AS emails,
SUM(nyhedsbrev) as nyhedsbrev_count
FROM
emails
WHERE
timestamp >= now() - interval 1 month
GROUP BY
DATE(timestamp)
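The effect of moving the condition out of WHERE and into the aggregate can be checked with a small sketch using Python's sqlite3 as a stand-in for MySQL. The rows are hypothetical, the one-month filter is left out so the result does not depend on the current date, and a CASE expression replaces MySQL's IF().

```python
import sqlite3

# Hypothetical rows: (timestamp, nyhedsbrev flag).
rows = [
    ("2024-03-01 08:00:00", 1),
    ("2024-03-01 09:00:00", 0),
    ("2024-03-02 10:00:00", 0),  # a day with no newsletter sign-ups
    ("2024-03-03 11:00:00", 1),
    ("2024-03-03 12:00:00", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (timestamp TEXT, nyhedsbrev INTEGER)")
conn.executemany("INSERT INTO emails VALUES (?, ?)", rows)

# Filtering in WHERE removes whole days that have no matching rows.
filtered = conn.execute("""
    SELECT date(timestamp) AS day, COUNT(*) AS emails
    FROM emails
    WHERE nyhedsbrev = 1
    GROUP BY date(timestamp)
    ORDER BY day
""").fetchall()

# Counting conditionally keeps every day and reports 0 where nothing matches.
conditional = conn.execute("""
    SELECT date(timestamp) AS day,
           COUNT(*) AS emails,
           SUM(CASE WHEN nyhedsbrev > 0 THEN 1 ELSE 0 END) AS nyhedsbrev_count
    FROM emails
    GROUP BY date(timestamp)
    ORDER BY day
""").fetchall()

print(filtered)     # [('2024-03-01', 1), ('2024-03-03', 2)]
print(conditional)  # [('2024-03-01', 2, 1), ('2024-03-02', 1, 0), ('2024-03-03', 2, 2)]
```

Note that a day with no emails at all still produces no row in either query; covering such days would need a separate list of dates to join against.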
Possibly best to get a list of the dates and then left join that against sub queries to get the counts you require.
Something like this
SELECT Sub1.date, Sub2.emails, IFNULL(Sub3.emails, 0)
FROM (SELECT DISTINCT DATE(timestamp) AS date
FROM emails
WHERE timestamp >= now() - interval 1 month) Sub1
LEFT OUTER JOIN (SELECT DATE(timestamp) AS date, COUNT(id) AS emails
FROM emails WHERE timestamp >= now() - interval 1 month
GROUP BY DATE(timestamp)) Sub2
ON Sub1.date = Sub2.date
LEFT OUTER JOIN (SELECT DATE(timestamp) AS date, COUNT(*) AS emails
FROM emails
WHERE timestamp >= now() - interval 1 month AND nyhedsbrev = 1
GROUP BY DATE(timestamp)) Sub3
ON Sub1.date = Sub3.date
(you can probably optimise one subselect of this away, but I have done it in full to make it obvious how it is working)
Assuming newsletter is boolean 1/0 values then this might give you the table that you want:
SELECT DATE(timestamp) AS date, COUNT(*) AS emails, nyhedsbrev
FROM emails WHERE timestamp >= now() - interval 1 month GROUP BY DATE(timestamp),nyhedsbrev ;
Just adding another GROUP BY parameter.