SQL selecting average score over range of dates - mysql

I have 3 tables:
doctors (id, name) -> has_many:
patients (id, doctor_id, name) -> has_many:
health_conditions (id, patient_id, note, created_at)
Every day, each patient gets a new health condition with a note from 1 to 10, where 10 means good health (full recovery, if you will).
What I want to extract is the following 3 statistics for the last 30 days (month):
- how many patients got better
- how many patients got worse
- how many patients remained the same
These statistics are global, so for now I don't need them per doctor, although I could extract that given the right query.
The trick is that the query needs to compare the current health_condition note with the average of the past days (this month without today), so it has to extract today's note and an average of the other days excluding today.
I don't think the query needs to define who went up/down/same since I can loop and decide that. Just today vs. rest of the month will be sufficient I guess.
Here's what I have so far, which obviously doesn't work because it only returns one result due to the limit applied:
SELECT
p.id,
p.name,
hc.latest,
hcc.average
FROM
pacients p
INNER JOIN (
SELECT
id,
pacient_id,
note as LATEST
FROM
health_conditions
GROUP BY pacient_id, id
ORDER BY created_at DESC
LIMIT 1
) hc ON(hc.pacient_id=p.id)
INNER JOIN (
SELECT
id,
pacient_id,
avg(note) AS average
FROM
health_conditions
GROUP BY pacient_id, id
) hcc ON(hcc.pacient_id=p.id AND hcc.id!=hc.id)
WHERE
date_part('epoch',date_trunc('day', hcc.created_at))
BETWEEN
(date_part('epoch',date_trunc('day', hc.created_at)) - (30 * 86400))
AND
date_part('epoch',date_trunc('day', hc.created_at))
The query has all the logic it needs to distinguish between what is latest and average but that limit kills everything. I need that limit to extract the latest result which is used to compare with past results.

Something like this, assuming created_at is of type date:
select p.name,
hc.note as current_note,
av.avg_note
from patients p
join health_conditions hc on hc.patient_id = p.id
join (
select patient_id,
avg(note) as avg_note
from health_conditions hc2
where created_at between current_date - 30 and current_date - 1
group by patient_id
) av on av.patient_id = hc.patient_id
where hc.created_at = current_date;
This is PostgreSQL syntax. I'm not sure whether MySQL supports date arithmetic the same way.
Edit:
This should get you the most recent note for each patient, plus the average for the last 30 days:
select p.name,
hc.created_at as last_note_date,
hc.note as current_note,
t.avg_note
from patients p
join health_conditions hc
on hc.patient_id = p.id
and hc.created_at = (select max(created_at)
from health_conditions hc2
where hc2.patient_id = hc.patient_id)
join (
select patient_id,
avg(note) as avg_note
from health_conditions hc3
where created_at between current_date - 30 and current_date - 1
group by patient_id
) t on t.patient_id = hc.patient_id
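
The "latest row per patient plus trailing average" pattern above can be tried out without a database server. The following is a minimal sketch only: it uses Python's built-in sqlite3 as a stand-in for MySQL/PostgreSQL, so the date functions differ (`date(:today, '-30 days')` instead of `current_date - 30`), the helper name `latest_vs_average` is hypothetical, and `created_at` is assumed to be stored as a `YYYY-MM-DD` string.

```python
import sqlite3

def latest_vs_average(rows, today):
    """rows: (patient_id, note, created_at) tuples; today: 'YYYY-MM-DD'.

    Returns {patient_id: (latest_note, average_of_previous_30_days)}.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE health_conditions "
        "(patient_id INTEGER, note INTEGER, created_at TEXT)"
    )
    con.executemany("INSERT INTO health_conditions VALUES (?, ?, ?)", rows)
    sql = """
        SELECT hc.patient_id, hc.note, t.avg_note
        FROM health_conditions hc
        JOIN (
            -- average over the 30 days before today (today excluded)
            SELECT patient_id, AVG(note) AS avg_note
            FROM health_conditions
            WHERE created_at >= date(:today, '-30 days')
              AND created_at <  :today
            GROUP BY patient_id
        ) t ON t.patient_id = hc.patient_id
        -- keep only each patient's most recent row
        WHERE hc.created_at = (SELECT MAX(hc2.created_at)
                               FROM health_conditions hc2
                               WHERE hc2.patient_id = hc.patient_id)
        ORDER BY hc.patient_id
    """
    result = {pid: (note, avg)
              for pid, note, avg in con.execute(sql, {"today": today})}
    con.close()
    return result
```

Patients with no rows in the 30-day window drop out because of the inner join; a left join would keep them with a NULL average.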

SELECT SUM(delta < 0) AS worsened,
SUM(delta = 0) AS no_change,
SUM(delta > 0) AS improved
FROM (
SELECT patient_id,
SUM(IF(DATE(created_at) = CURDATE(),note,NULL))
- AVG(IF(DATE(created_at) < CURDATE(),note,NULL)) AS delta
FROM health_conditions
WHERE DATE(created_at) BETWEEN CURDATE() - INTERVAL 1 MONTH AND CURDATE()
GROUP BY patient_id
) t
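
To see how the conditional aggregation above collapses everything into three counters, here is a small runnable sketch. It is an approximation rather than the MySQL query itself: it runs on Python's built-in sqlite3, replaces `IF(...)` with `CASE WHEN`, takes "today" as a parameter instead of `CURDATE()`, and assumes `created_at` holds plain `YYYY-MM-DD` strings. The function name `trend_counts` is made up for the example.

```python
import sqlite3

def trend_counts(rows, today):
    """rows: (patient_id, note, created_at) tuples; today: 'YYYY-MM-DD'.

    Returns (worsened, no_change, improved) for patients that have a note today.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE health_conditions "
        "(patient_id INTEGER, note INTEGER, created_at TEXT)"
    )
    con.executemany("INSERT INTO health_conditions VALUES (?, ?, ?)", rows)
    sql = """
        SELECT SUM(delta < 0), SUM(delta = 0), SUM(delta > 0)
        FROM (
            SELECT patient_id,
                   -- today's note minus the average of the earlier days;
                   -- NULL when the patient has no note today
                   SUM(CASE WHEN created_at = :today THEN note END)
                 - AVG(CASE WHEN created_at < :today THEN note END) AS delta
            FROM health_conditions
            WHERE created_at BETWEEN date(:today, '-1 month') AND :today
            GROUP BY patient_id
        ) t
    """
    counts = con.execute(sql, {"today": today}).fetchone()
    con.close()
    return counts
```

A comparison such as `delta < 0` evaluates to 1, 0 or NULL, so summing it counts the matching rows; patients whose delta is NULL are ignored by all three sums.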

Related

MySQL: get number of lost users

I have a Users table (id, name, created_at) and a Transaction table(id, user_id, created_at, amount).
For each month, I would like to know the number of users who did not have any transaction in the 3 months interval before that month.
For example, for April 2022, the query would return number of users who did not have a transaction in January 2022, February 2022 and March 2022. And so on for every month.
Can I do this with a single MySQL query, and without PHP loop?
If I wanted it for April 2022 only, then I guess this would do the trick:
SELECT count(distinct(users.id)) FROM users
INNER JOIN transactions
on users.id = transactions.user_id
WHERE transactions.user_id NOT IN
(SELECT user_id FROM transactions WHERE created_at > "2022-01-01" AND created_at < "2022-04-01" );
How to get it for all months?
In a normal situation, you would have a calendar table that, for example, stores the start of every month over a wide period of time, like calendar(start_of_month).
From there on, you can cross join the calendar with the users table to generate all possible combinations of months and users (with respect to the user's creation time). The last step is to check each user/month tuple for transactions in the preceding 3 months.
select c.start_of_month, count(*) as cnt_inactive_users
from calendar c
cross join users u
where not exists (
select 1
from transactions t
where t.user_id = u.id
and t.created_at >= c.start_of_month - interval 3 month
and t.created_at < c.start_of_month
)
and c.start_of_month >= '2021-01-01' and c.start_of_month < '2022-01-01'
group by c.start_of_month
order by c.start_of_month
This gives you one row per month that has at least one "inactive" user, with the corresponding count.
You control the range of months the query covers with the condition on c.start_of_month (as an example, the above gives you all of 2021).
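
The calendar approach can be checked on a toy data set. This is a hedged sketch, not the MySQL query verbatim: it runs on Python's built-in sqlite3, so `interval 3 month` becomes `date(..., '-3 months')`, all dates are assumed to be `YYYY-MM-DD` strings, and the helper name `inactive_users_per_month` is hypothetical. The `u.created_at < c.start_of_month` filter is my reading of "with respect to the user's creation time" and is an assumption.

```python
import sqlite3

def inactive_users_per_month(users, transactions, months):
    """users: (id, created_at); transactions: (user_id, created_at);
    months: list of 'YYYY-MM-01' strings that fill the calendar table.

    Returns [(start_of_month, inactive_user_count), ...].
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT);
        CREATE TABLE transactions (user_id INTEGER, created_at TEXT);
        CREATE TABLE calendar (start_of_month TEXT);
    """)
    con.executemany("INSERT INTO users VALUES (?, ?)", users)
    con.executemany("INSERT INTO transactions VALUES (?, ?)", transactions)
    con.executemany("INSERT INTO calendar VALUES (?)", [(m,) for m in months])
    sql = """
        SELECT c.start_of_month, COUNT(*)
        FROM calendar c
        CROSS JOIN users u
        WHERE u.created_at < c.start_of_month      -- user already existed
          AND NOT EXISTS (
              SELECT 1
              FROM transactions t
              WHERE t.user_id = u.id
                AND t.created_at >= date(c.start_of_month, '-3 months')
                AND t.created_at <  c.start_of_month
          )
        GROUP BY c.start_of_month
        ORDER BY c.start_of_month
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows
```

Months in which every existing user was active produce no row at all, which matches the behaviour described above.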
SELECT count(*)
FROM users
WHERE NOT EXISTS (
SELECT NULL
FROM transactions
WHERE users.id = transactions.user_id
AND created_at > '2022-01-01' AND created_at < '2022-04-01'
);

Can I combine separate month and year column for this query?

I currently am trying to track the number of messages sent by month as well as the volume's percent change in comparison to one year prior.
Here is my current query:
Select
a.mo,
a.ye,
a.Messages,
((a.Messages - b.Messages) / b.Messages) as "% Change"
from(
select
MONTH(post_date) as mo,
count(*) as "Messages",
YEAR(post_date) as ye
from
pm_messages
WHERE
post_date > "2018-01-01 00:00:00"
group by
year(post_date),
month(post_date)
) a
left join (
select
MONTH(post_date) as mo,
YEAR(post_date) as ye,
count(*) as "Messages"
from
pm_messages
group by
year(post_date),
month(post_date)
) b on a.mo = b.mo
and a.ye -1 = b.ye
This works great; however, it places month and year in separate columns, which has been messing up the graphs I am working with. When I try to pull month and year into one column as I've done in other queries on the same table, i.e. using:
SELECT DATE_FORMAT(`post_date`,'%M %Y')
My query does not work.
Does anyone know how I can change my current query so it still calculates the change from a year prior but has month and year come up as one column, as opposed to (Month | Year | Messages | % Change)?
Thanks!!
You can use EXTRACT instead of the separate YEAR() and MONTH() functions:
EXTRACT(YEAR_MONTH from post_date)
Of course, you then have to group by this expression instead of by year and month. For example:
select
EXTRACT(YEAR_MONTH from post_date) yearmonth,
count(*) as "Messages"
from
pm_messages
group by
EXTRACT(YEAR_MONTH from post_date)
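
One way the single year-month key could be combined with the year-over-year comparison from the question is sketched below. In MySQL, `EXTRACT(YEAR_MONTH FROM ...)` yields a number such as 201901, so the same month one year earlier is that key minus 100. The sketch runs on Python's built-in sqlite3 as a stand-in, where `CAST(strftime('%Y%m', post_date) AS INTEGER)` plays the role of `EXTRACT(YEAR_MONTH ...)`; the function name `monthly_change` and the `since` parameter are assumptions for illustration.

```python
import sqlite3

def monthly_change(post_dates, since):
    """post_dates: list of 'YYYY-MM-DD ...' strings; since: key like 201901.

    Returns [(yearmonth, messages, relative_change_vs_year_before), ...].
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE pm_messages (post_date TEXT)")
    con.executemany("INSERT INTO pm_messages VALUES (?)",
                    [(d,) for d in post_dates])
    sql = """
        WITH monthly AS (
            SELECT CAST(strftime('%Y%m', post_date) AS INTEGER) AS yearmonth,
                   COUNT(*) AS messages
            FROM pm_messages
            GROUP BY CAST(strftime('%Y%m', post_date) AS INTEGER)
        )
        SELECT a.yearmonth,
               a.messages,
               -- * 1.0 forces a fractional result; NULL if no prior-year row
               (a.messages - b.messages) * 1.0 / b.messages AS change
        FROM monthly a
        LEFT JOIN monthly b ON b.yearmonth = a.yearmonth - 100
        WHERE a.yearmonth >= :since
        ORDER BY a.yearmonth
    """
    rows = con.execute(sql, {"since": since}).fetchall()
    con.close()
    return rows
```

The numeric key is convenient for the join; for display it can still be turned into a label such as "January 2019" in the outer query or in the charting layer.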
If you have data for every month, you can use lag():
select year(post_date) as ye, month(post_date) as mo,
count(*) as Messages,
lag(count(*)) over (partition by month(post_date) order by year(post_date)) as prev_year
from pm_messages
where post_date >= '2018-01-01'
group by year(post_date), month(post_date)

Returning the next-to-last entry using MySQL

A little info: people check-in but they don't check out. Each check-in creates an auto-incremented entry into the _checkins table with a timestamp, MemberID, etc.
Here's the data the query needs to return:
- Member info (name, picture, ID, etc.)
- The number of check-ins they've had in the last 30 days
- The time since their last check-in, which must be less than 2 hours for them to be on the list
- The date of their last check-in NOT COUNTING TODAY (in other words, the next-to-last "Created" entry in the _checkins table)
I have it all working except the last part. I feel like LIMIT is going to be part of the solution but I just can't find a way to implement it correctly.
Here's what I've got so far:
SELECT m.ImageURI, m.ID, m.FirstName, m.LastName,
ROUND(time_to_sec(timediff(NOW(), MAX(ci.Created))) / 3600, 1) as
'HoursSinceCheckIn', CheckIns
FROM _checkins ci LEFT JOIN _members m ON ci.MemberID = m.ID
INNER JOIN(SELECT MemberID, COUNT(DISTINCT ID) as 'CheckIns'
FROM _checkins
WHERE(
Created BETWEEN NOW() - INTERVAL 30 DAY AND NOW()
)
GROUP BY MemberID
) lci ON ci.MemberID=lci.MemberID
WHERE(
ci.Created BETWEEN NOW() - INTERVAL 30 DAY AND NOW()
AND TIMESTAMPDIFF(HOUR, ci.Created, NOW()) < 2
AND ci.Reverted = 0
)
GROUP BY m.ID
ORDER BY CheckIns ASC
You can simplify greatly (and make your code safer, as well):
SELECT _Members.ImageURI, _Members.ID, _Members.FirstName, _Members.LastName,
ROUND(TIME_TO_SEC(TIMEDIFF(NOW(), _FilteredCheckins.lastCheckin)) / 3600, 1) AS hoursSinceCheckIn, _FilteredCheckins.checkIns,
(SELECT MAX(_Checkins.created)
FROM _Checkins
WHERE _Checkins.memberId = _Members.ID
AND _Checkins.created < _FilteredCheckins.lastCheckin) AS previousCheckin
FROM _Members
JOIN (SELECT memberId, COUNT(*) AS checkIns, MAX(created) AS lastCheckin
FROM _Checkins
WHERE created >= NOW() - INTERVAL 30 DAY
GROUP BY memberId
HAVING lastCheckin >= NOW() - INTERVAL 2 HOUR) _FilteredCheckins
ON _FilteredCheckins.memberId = _Members.ID
ORDER BY _FilteredCheckins.checkIns ASC
We're counting all checkins in the last 30 days, including the most recent, but that's trivially adjustable.
I'm assuming _Checkins.id is unique (it should be), so COUNT(DISTINCT ID) can be simplified to COUNT(*). If this isn't the case you'll need to put it back.
(Side note: please don't use BETWEEN, especially with date/time types)
(humorous side note: I keep mentally reading this as "chickens"....)
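
The core of the approach above (aggregate once per member, then fetch the check-in strictly before the latest one with a correlated subquery) can be exercised on sample data. This is only a sketch under assumptions: it runs on Python's built-in sqlite3 instead of MySQL, passes the current time in as `now` rather than calling `NOW()`, stores timestamps as `YYYY-MM-DD HH:MM:SS` strings, and omits the member columns; `recent_members` is a hypothetical name.

```python
import sqlite3

def recent_members(checkins, now):
    """checkins: (MemberID, Created) tuples; now: 'YYYY-MM-DD HH:MM:SS'.

    Returns [(MemberID, checkIns, lastCheckin, previousCheckin), ...]
    for members whose latest check-in is within the last 2 hours.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE _checkins (MemberID INTEGER, Created TEXT)")
    con.executemany("INSERT INTO _checkins VALUES (?, ?)", checkins)
    sql = """
        SELECT f.MemberID, f.checkIns, f.lastCheckin,
               -- most recent check-in strictly before the latest one
               (SELECT MAX(c.Created)
                FROM _checkins c
                WHERE c.MemberID = f.MemberID
                  AND c.Created < f.lastCheckin) AS previousCheckin
        FROM (
            SELECT MemberID, COUNT(*) AS checkIns, MAX(Created) AS lastCheckin
            FROM _checkins
            WHERE Created >= datetime(:now, '-30 days')
            GROUP BY MemberID
            HAVING MAX(Created) >= datetime(:now, '-2 hours')
        ) f
        ORDER BY f.checkIns
    """
    rows = con.execute(sql, {"now": now}).fetchall()
    con.close()
    return rows
```

If "not counting today" must be strict, the subquery condition could compare against the start of the current day instead of `lastCheckin`.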

MySQL listing all entries within x days of first entry

I have a table orders with the columns id, user_id, created_on and paid_amount. I'm trying to find the entries for each user_id within the first 7 days of their first order. Here's what I have so far:
SELECT user_id, created_on, paid_amount FROM orders WHERE created_on BETWEEN min(created_on) AND DATE_ADD(MIN(created_on), INTERVAL 7 DAY) GROUP BY user_id
I'm guessing that the problem lies in the fact that the BETWEEN condition is applied to a single value instead of the whole table? How could I fix this?
My ultimate goal is to find out the average amount spent by all users within their first 7 days, but I think I can figure out the rest of the steps myself.
This will give you the records from the first 7 days for each user_id:
SELECT orders.* FROM orders
INNER JOIN (
select user_id, min(created_on) as mindt from orders group by user_id
) t
ON orders.user_id = t.user_id AND orders.created_on <= DATE_ADD(t.mindt, INTERVAL 7 DAY)
ORDER BY user_id, created_on
For the average paid_amount per user over their first 7 days, use this:
SELECT orders.user_id, avg(paid_amount) FROM orders
INNER JOIN (
select user_id, min(created_on) as mindt from orders group by user_id
) t
ON orders.user_id = t.user_id AND orders.created_on <= DATE_ADD(t.mindt, INTERVAL 7 DAY)
group by orders.user_id
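
A quick way to sanity-check the "join against each user's first order date" idea is to run it on a handful of rows. The sketch below uses Python's built-in sqlite3 in place of MySQL, so `DATE_ADD(mindt, INTERVAL 7 DAY)` becomes `date(mindt, '+7 days')`; `created_on` is assumed to be a `YYYY-MM-DD` string and `first_week_average` is a made-up helper name.

```python
import sqlite3

def first_week_average(orders):
    """orders: (user_id, created_on, paid_amount) tuples.

    Returns [(user_id, average paid_amount within 7 days of first order), ...].
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders (user_id INTEGER, created_on TEXT, paid_amount REAL)"
    )
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    sql = """
        SELECT o.user_id, AVG(o.paid_amount)
        FROM orders o
        JOIN (
            -- first order date per user
            SELECT user_id, MIN(created_on) AS mindt
            FROM orders
            GROUP BY user_id
        ) t ON o.user_id = t.user_id
           AND o.created_on <= date(t.mindt, '+7 days')
        GROUP BY o.user_id
        ORDER BY o.user_id
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows
```

Dropping the `GROUP BY o.user_id` would give a single average across all users' first-week orders, which is the ultimate goal mentioned in the question.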

MySQL Limit and Order Left Join

I have two tables: Processes and Validations; p and v respectively.
For each process there are many validations.
The aim is to:
- Retrieve the latest validation for each process.
- Generate a dynamic date (Due_Date) as to when the next validation is due (being 365 days after the latest validation date).
- Filter the results to any due dates that fall in the current month.
In short: I want to see which processes are due to be validated in the current month.
I'm 99% there with the query code. Having read through some posts on here I'm fairly certain I'm on the right track. My problem is that my query still returns all of the results for each process, instead of the top 1.
FYI: The processes table uses "Process_ID" as a primary key; whereas the Validations Table uses "Validation_Process_ID" as a foreign key.
Code at present :
Select p.Process_ID,
p.Process_Name,
v.Validation_Date,
Date_Add(v.Validation_Date, Interval 365 Day) as Due_Date
From processes_active p
left JOIN processes_validations v
on p.Process_ID = (select v.validation_process_id
from processes_validations
order by validation_date desc
limit 1)
Having Month(Due_Date) = Month(Now()) and Year(Due_Date) = Year(Now())
Any help would be thoroughly appreciated! I'm probably pretty close just can't sort that final section!
Thanks
Your current query is wrong: the subquery returns the single latest record in the whole validations table instead of the latest one per process id.
You should decompose to get what you need.
1) compute the latest validation for each process in the validation table:
SELECT validation_process_id, MAX(validation_date) AS maxdate
FROM processes_validations
GROUP BY validation_process_id
2) For each process in the process table, get the latest validation, and compute the next validation date (use interval 1 YEAR and not 365 DAY... think leap years)
SELECT p.Process_ID, p.Process_Name, v.maxdate,
Date_Add(v.maxdate, Interval 1 year) as Due_Date
FROM processes_active p
LEFT JOIN
(
SELECT validation_process_id, MAX(validation_date) AS maxdate
FROM processes_validations
GROUP BY validation_process_id
) v
ON p.Process_ID = v.validation_process_id
3) Filter to keep only the due dates in the current month. This can be done with a WHERE on query 2; I only use a nested query here to make it easier to follow
SELECT * FROM
(
SELECT p.Process_ID, p.Process_Name, v.maxdate,
Date_Add(v.maxdate, Interval 1 year) as Due_Date
FROM processes_active p
LEFT JOIN
(
SELECT validation_process_id, MAX(validation_date) AS maxdate
FROM processes_validations
GROUP BY validation_process_id
) v
ON p.Process_ID = v.validation_process_id
) T
WHERE Month(Due_Date) = Month(Now()) and Year(Due_Date) = Year(Now())
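
The three steps can be run end to end on sample rows. What follows is a sketch under stated assumptions, not a drop-in MySQL query: it uses Python's built-in sqlite3, so `Date_Add(maxdate, Interval 1 year)` becomes `date(maxdate, '+1 year')` and the month/year comparison is done with `strftime('%Y-%m', ...)`; the current date is passed in as `today`, dates are `YYYY-MM-DD` strings, and `due_this_month` is a hypothetical name.

```python
import sqlite3

def due_this_month(processes, validations, today):
    """processes: (Process_ID, Process_Name); validations:
    (Validation_Process_ID, Validation_Date); today: 'YYYY-MM-DD'.

    Returns [(Process_ID, Process_Name, latest_validation, Due_Date), ...].
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE processes_active (Process_ID INTEGER, Process_Name TEXT);
        CREATE TABLE processes_validations
            (Validation_Process_ID INTEGER, Validation_Date TEXT);
    """)
    con.executemany("INSERT INTO processes_active VALUES (?, ?)", processes)
    con.executemany("INSERT INTO processes_validations VALUES (?, ?)", validations)
    sql = """
        SELECT p.Process_ID, p.Process_Name, v.maxdate,
               date(v.maxdate, '+1 year') AS Due_Date
        FROM processes_active p
        LEFT JOIN (
            -- step 1: latest validation per process
            SELECT Validation_Process_ID, MAX(Validation_Date) AS maxdate
            FROM processes_validations
            GROUP BY Validation_Process_ID
        ) v ON p.Process_ID = v.Validation_Process_ID
        -- step 3: keep only due dates in the same month and year as today
        WHERE strftime('%Y-%m', date(v.maxdate, '+1 year'))
              = strftime('%Y-%m', :today)
        ORDER BY p.Process_ID
    """
    rows = con.execute(sql, {"today": today}).fetchall()
    con.close()
    return rows
```

Processes with no validation at all come out of the left join with a NULL due date and are removed by the WHERE clause, so an inner join would return the same rows here.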