Suppose I have a Client with many Payments. How do I query for all clients that have no payment records in the last 90 days?
clients
=======
id integer
name string
payments
========
id integer
client_id integer
created_at datetime
Essentially the inverse of:
select *
from clients
inner join payments on payments.client_id = clients.id
where payments.created_at > utc_timestamp() - interval 90 day
Hopefully more efficient than:
select *
from clients
where id not in (
select clients.id
from clients
inner join payments on payments.client_id = clients.id
where payments.created_at > utc_timestamp() - interval 90 day
)
Ensure there's an index on payments(client_id), or even better, payments(client_id, created_at).
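To check that the planner actually uses such an index, you can inspect the query plan. A minimal sketch, assuming SQLite via Python's sqlite3 as a stand-in for MySQL (in MySQL you would run EXPLAIN instead); the index name idx_payments_client_created is hypothetical:

```python
import sqlite3

# NOT EXISTS form of the question's query; ? is the "90 days ago" cutoff.
QUERY = """
    SELECT c.id
    FROM clients c
    WHERE NOT EXISTS (
        SELECT 1
        FROM payments p
        WHERE p.client_id = c.id
          AND p.created_at > ?
    )
"""


def plan_details(with_index: bool) -> list:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN for QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE clients  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE payments (id INTEGER PRIMARY KEY,
                               client_id INTEGER,
                               created_at TEXT);
    """)
    if with_index:
        # Equality column first, range column second, so one index
        # seek covers both conditions of the subquery.
        conn.execute("CREATE INDEX idx_payments_client_created "
                     "ON payments (client_id, created_at)")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY,
                        ("2024-01-01 00:00:00",)).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable description.
    return [row[-1] for row in rows]


if __name__ == "__main__":
    for detail in plan_details(with_index=True):
        print(detail)
```

The column order is the point of the composite index: the subquery looks up one client_id and then needs only a range of created_at values inside it.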
As an alternative way to write your query, you could try NOT EXISTS, like:
select *
from clients c
where not exists
(
select *
from payments p
where p.client_id = c.id
and p.created_at > utc_timestamp() - interval 90 day
)
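A runnable sketch of the NOT EXISTS version, assuming SQLite via Python's sqlite3 as a stand-in for MySQL. datetime(?, '-90 days') plays the role of utc_timestamp() - interval 90 day, the reference time is passed in so the output is repeatable, and the clients and payments are made up:

```python
import sqlite3


def clients_without_recent_payments(conn, now):
    """Clients with no payment in the 90 days before `now` (NOT EXISTS anti-join)."""
    sql = """
        SELECT c.id, c.name
        FROM clients c
        WHERE NOT EXISTS (
            SELECT 1
            FROM payments p
            WHERE p.client_id = c.id
              AND p.created_at > datetime(?, '-90 days')
        )
        ORDER BY c.id
    """
    return conn.execute(sql, (now,)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE clients  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE payments (id INTEGER PRIMARY KEY,
                               client_id INTEGER,
                               created_at TEXT);
        INSERT INTO clients VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
        -- Ann paid recently, Bob only long ago, Cid never.
        INSERT INTO payments (client_id, created_at) VALUES
            (1, '2024-03-01 12:00:00'),
            (2, '2023-06-01 12:00:00');
    """)
    result = clients_without_recent_payments(conn, "2024-03-31 00:00:00")
    conn.close()
    return result


if __name__ == "__main__":
    print(demo())
```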
Or an exclusive left join:
select *
from clients c
left join
payments p
on p.client_id = c.id
and p.created_at > utc_timestamp() - interval 90 day
where p.client_id is null
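The exclusive LEFT JOIN and the NOT EXISTS form should return the same clients. The sketch below, again using SQLite through Python's sqlite3 with made-up rows, runs both and compares them:

```python
import sqlite3

LEFT_JOIN_SQL = """
    SELECT c.id
    FROM clients c
    LEFT JOIN payments p
           ON p.client_id = c.id
          AND p.created_at > datetime(:now, '-90 days')
    WHERE p.client_id IS NULL
    ORDER BY c.id
"""

NOT_EXISTS_SQL = """
    SELECT c.id
    FROM clients c
    WHERE NOT EXISTS (
        SELECT 1 FROM payments p
        WHERE p.client_id = c.id
          AND p.created_at > datetime(:now, '-90 days'))
    ORDER BY c.id
"""


def run_both(now="2024-03-31 00:00:00"):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE clients  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE payments (id INTEGER PRIMARY KEY,
                               client_id INTEGER,
                               created_at TEXT);
        INSERT INTO clients VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
        -- Ann has two recent payments: the LEFT JOIN yields two rows for her,
        -- but both are discarded by IS NULL, so no duplicates reach the result.
        INSERT INTO payments (client_id, created_at) VALUES
            (1, '2024-03-01 12:00:00'),
            (1, '2024-03-15 12:00:00'),
            (2, '2023-06-01 12:00:00');
    """)
    params = {"now": now}
    left = [row[0] for row in conn.execute(LEFT_JOIN_SQL, params)]
    not_exists = [row[0] for row in conn.execute(NOT_EXISTS_SQL, params)]
    conn.close()
    return left, not_exists


if __name__ == "__main__":
    print(run_both())
```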
If both are slow, add the explain extended output to your question, so we can see why.
select *
from clients
left join payments on
payments.client_id = clients.id and
payments.created_at > utc_timestamp() - interval 90 day
where payments.client_id is null
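The placement of the date condition matters here. This sketch (SQLite via Python's sqlite3, made-up rows) contrasts the condition in ON, as written above, with the same condition moved into WHERE:

```python
import sqlite3

SETUP = """
    CREATE TABLE clients  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payments (id INTEGER PRIMARY KEY,
                           client_id INTEGER,
                           created_at TEXT);
    INSERT INTO clients VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO payments (client_id, created_at) VALUES
        (1, '2024-03-01 12:00:00'),   -- recent
        (2, '2023-06-01 12:00:00');   -- older than 90 days
"""

DATE_COND = "p.created_at > datetime(:now, '-90 days')"


def inactive_client_ids(condition_in_on, now="2024-03-31 00:00:00"):
    if condition_in_on:
        # As in the answer: the date filter restricts which payments may match.
        join = f"LEFT JOIN payments p ON p.client_id = c.id AND {DATE_COND}"
        where = "p.client_id IS NULL"
    else:
        # Pitfall: the same filter in WHERE is applied after the join.
        join = "LEFT JOIN payments p ON p.client_id = c.id"
        where = f"p.client_id IS NULL AND {DATE_COND}"
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    sql = f"SELECT c.id FROM clients c {join} WHERE {where} ORDER BY c.id"
    ids = [row[0] for row in conn.execute(sql, {"now": now})]
    conn.close()
    return ids


if __name__ == "__main__":
    print(inactive_client_ids(True), inactive_client_ids(False))
```

In the WHERE version, matched rows fail IS NULL and the NULL-extended rows fail the date comparison (created_at is NULL there), so the query returns no clients at all.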
How to use a parsed date in a WHERE clause with two tables, for example:
SELECT *
FROM companies
INNER JOIN acquisitions ON companies.id = acquisitions.company_id
WHERE companies.created_at >= acquisitions.delivery_date
companies.created_at is a DATE column and acquisitions.delivery_date is a DATETIME column.
If I do this, one record is skipped:
companies.created_at = '2021-04-16'
acquisitions.delivery_date = '2021-04-16 10:00:00'
We see that created_at is not greater than or equal to delivery_date, BUT both are on the same day. So how can I convert to a date and then compare? I've tried date(acquisitions.delivery_date) and cast(acquisitions.delivery_date as DATE), and neither worked.
https://www.db-fiddle.com/f/jB1PyysytiusorEJoHuwx8/0
SELECT *
FROM companies
INNER JOIN acquisitions
ON companies.id= acquisitions.company_id
WHERE companies.created_at >= date(acquisitions.delivery_date);
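A small check of why DATE() fixes the comparison, assuming SQLite via Python's sqlite3 as a stand-in for MySQL. SQLite stores these values as text and compares them as strings, whereas MySQL promotes the DATE to midnight, but both treat '2021-04-16' as earlier than '2021-04-16 10:00:00'. The two rows are made up to match the question:

```python
import sqlite3


def matching_company_ids(use_date_function: bool):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE companies    (id INTEGER PRIMARY KEY, created_at TEXT);
        CREATE TABLE acquisitions (id INTEGER PRIMARY KEY,
                                   company_id INTEGER,
                                   delivery_date TEXT);
        INSERT INTO companies VALUES (1, '2021-04-16'), (2, '2021-04-16');
        INSERT INTO acquisitions (company_id, delivery_date) VALUES
            (1, '2021-04-16 10:00:00'),   -- same day: should match
            (2, '2021-04-17 09:00:00');   -- next day: should not match
    """)
    # Either compare against the raw DATETIME or against its date part.
    rhs = "date(a.delivery_date)" if use_date_function else "a.delivery_date"
    sql = f"""
        SELECT c.id
        FROM companies c
        JOIN acquisitions a ON c.id = a.company_id
        WHERE c.created_at >= {rhs}
        ORDER BY c.id
    """
    ids = [row[0] for row in conn.execute(sql)]
    conn.close()
    return ids


if __name__ == "__main__":
    print(matching_company_ids(False), matching_company_ids(True))
```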
SELECT *
FROM companies
INNER JOIN acquisitions
ON companies.id= acquisitions.company_id
WHERE companies.created_at + INTERVAL 1 DAY > acquisitions.delivery_date
I'd prefer this variant because companies should have fewer rows than acquisitions, so the expression is evaluated on the smaller side, and the query will be slightly faster than with WHERE companies.created_at >= DATE(acquisitions.delivery_date).
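The two conditions are equivalent for a DATE on the left and a DATETIME on the right. A quick sketch that evaluates both over a few boundary timestamps, using SQLite via Python's sqlite3 as a stand-in (datetime(x, '+1 day') takes the place of + INTERVAL 1 DAY):

```python
import sqlite3

CREATED_AT = "2021-04-16"
DELIVERIES = [
    "2021-04-15 23:59:59",  # previous day
    "2021-04-16 00:00:00",  # same day, midnight
    "2021-04-16 10:00:00",  # same day (the row from the question)
    "2021-04-16 23:59:59",  # same day, last second
    "2021-04-17 00:00:00",  # next day, midnight
]


def compare_predicates():
    conn = sqlite3.connect(":memory:")
    results = []
    for delivery in DELIVERIES:
        # First column:  created_at >= DATE(delivery_date)
        # Second column: created_at + 1 day > delivery_date; datetime() renders
        # the shifted date as 'YYYY-MM-DD 00:00:00', i.e. midnight of the next day.
        row = conn.execute(
            "SELECT ? >= date(?), datetime(?, '+1 day') > ?",
            (CREATED_AT, delivery, CREATED_AT, delivery),
        ).fetchone()
        results.append(row)
    conn.close()
    return results


if __name__ == "__main__":
    for delivery, row in zip(DELIVERIES, compare_predicates()):
        print(delivery, row)
```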
Is it possible to multiply some values in a column but not others, depending on whether a condition is met? I don't want to create another column.
Query I am working with:
SELECT
name ,
ROUND(SUM(orderline_sales.amount * orderline_sales.price) * orders_sales.discount * customers.annual_discount) AS total_revenue
FROM
orderline_sales
JOIN
orders_sales ON orders_sales.id = orderline_sales.orders_sales_id
JOIN
employee ON orders_sales.empoyee_id = employee.id
JOIN
customers ON orders_sales.customer_id = customers.id
WHERE
date BETWEEN DATE_SUB(CURRENT_DATE, INTERVAL 365 DAY) AND CURRENT_DATE
GROUP BY employee.name
ORDER BY total_revenue DESC
LIMIT 1;
The orders_sales table contains a date attribute and has a 1:n relationship with orderline_sales. I only want to multiply the SUM result by customers.annual_discount if the YEAR of the order is later than 2017. How would I go about doing this?
You can use CASE. Put it inside the SUM, so that the year is checked for each order line, and multiply by 1 instead of customers.annual_discount for orders from 2017 or earlier:
SELECT
employee.name,
ROUND(SUM(orderline_sales.amount * orderline_sales.price *
orders_sales.discount *
CASE WHEN YEAR(orders_sales.date) > 2017
THEN customers.annual_discount
ELSE 1
END)) AS total_revenue
FROM orderline_sales
JOIN
orders_sales ON orders_sales.id = orderline_sales.orders_sales_id
JOIN
employee ON orders_sales.empoyee_id = employee.id
JOIN
customers ON orders_sales.customer_id = customers.id
WHERE
orders_sales.date BETWEEN DATE_SUB(CURRENT_DATE, INTERVAL 365 DAY) AND CURRENT_DATE
GROUP BY employee.name
ORDER BY total_revenue DESC
LIMIT 1;
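A runnable sketch of the per-row CASE idea, assuming SQLite via Python's sqlite3 (strftime('%Y', ...) stands in for MySQL's YEAR()). The rows are invented, and the 365-day filter is left out so the output does not depend on today's date:

```python
import sqlite3


def revenue_per_employee():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE customers (id INTEGER PRIMARY KEY, annual_discount REAL);
        -- column names follow the question, including 'empoyee_id'
        CREATE TABLE orders_sales (id INTEGER PRIMARY KEY, empoyee_id INTEGER,
                                   customer_id INTEGER, discount REAL, date TEXT);
        CREATE TABLE orderline_sales (id INTEGER PRIMARY KEY,
                                      orders_sales_id INTEGER,
                                      amount INTEGER, price REAL);
        INSERT INTO employee  VALUES (1, 'Eva'), (2, 'Tom');
        INSERT INTO customers VALUES (1, 0.5);
        INSERT INTO orders_sales VALUES
            (1, 1, 1, 1.0, '2017-06-01'),   -- 2017: annual discount NOT applied
            (2, 1, 1, 1.0, '2018-06-01'),   -- 2018: annual discount applied
            (3, 2, 1, 1.0, '2019-06-01');
        INSERT INTO orderline_sales (orders_sales_id, amount, price) VALUES
            (1, 2, 50.0), (2, 1, 100.0), (3, 1, 80.0);
    """)
    sql = """
        SELECT e.name,
               ROUND(SUM(ol.amount * ol.price * o.discount *
                         CASE WHEN CAST(strftime('%Y', o.date) AS INTEGER) > 2017
                              THEN c.annual_discount
                              ELSE 1
                         END), 2) AS total_revenue
        FROM orderline_sales ol
        JOIN orders_sales o ON o.id = ol.orders_sales_id
        JOIN employee e     ON o.empoyee_id = e.id
        JOIN customers c    ON o.customer_id = c.id
        GROUP BY e.name
        ORDER BY total_revenue DESC
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(revenue_per_employee())
```

Eva's 2017 line contributes its full 100.0, while her 2018 line is halved by the annual discount, giving 150.0.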
Could you help me calculate the percentage of users who made payments?
I've got two tables:
activity
user_id login_time
201 01.01.2017
202 01.01.2017
255 04.01.2017
255 05.01.2017
256 05.01.2017
260 15.03.2017
payments
user_id payment_date
200 01.01.2017
202 01.01.2017
255 05.01.2017
I tried this query, but it calculates the wrong percentage:
SELECT activity.login_time, (select COUNT(distinct payments.user_id)
from payments where payments.payment_time between '2017-01-01' and
'2017-01-05') / COUNT(distinct activity.user_id) * 100
AS percent
FROM payments INNER JOIN activity ON
activity.user_id = payments.user_id and activity.login_time between
'2017-01-01' and '2017-01-05'
GROUP BY activity.login_time;
I need this result:
01.01.2017 100%
02.01.2017 0%
03.01.2017 0%
04.01.2017 0%
05.01.2017 50%
If you want the ratio of users who have made payments to those with activity, just summarize each table individually:
select p.cnt / a.cnt
from (select count(distinct user_id) as cnt from activity a) a cross join
(select count(distinct user_id) as cnt from payments) p;
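A sketch of this overall ratio on the question's sample data, with the dates rewritten as ISO strings and SQLite via Python's sqlite3 standing in for MySQL. Note that user 200 counts as a payer even though there is no activity row for that user:

```python
import sqlite3


def payer_ratio():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE activity (user_id INTEGER, login_time TEXT);
        CREATE TABLE payments (user_id INTEGER, payment_date TEXT);
        INSERT INTO activity VALUES
            (201, '2017-01-01'), (202, '2017-01-01'), (255, '2017-01-04'),
            (255, '2017-01-05'), (256, '2017-01-05'), (260, '2017-03-15');
        INSERT INTO payments VALUES
            (200, '2017-01-01'), (202, '2017-01-01'), (255, '2017-01-05');
    """)
    # 1.0 * forces real division; in SQLite integer / integer truncates,
    # unlike MySQL's / operator.
    (ratio,) = conn.execute("""
        SELECT 1.0 * p.cnt / a.cnt
        FROM (SELECT COUNT(DISTINCT user_id) AS cnt FROM activity) a
        CROSS JOIN (SELECT COUNT(DISTINCT user_id) AS cnt FROM payments) p
    """).fetchone()
    conn.close()
    return ratio


if __name__ == "__main__":
    print(payer_ratio())  # 3 distinct payers / 5 distinct active users
```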
EDIT:
You need a table with all dates in the range. That is the biggest problem.
Then I would recommend:
SELECT d.dte,
( ( SELECT COUNT(DISTINCT p.user_id)
FROM payments p
WHERE p.payment_date >= d.dte and p.payment_date < d.dte + INTERVAL 1 DAY
) /
NULLIF( (SELECT COUNT(DISTINCT a.user_id)
FROM activity a
WHERE a.login_time >= d.dte and a.login_time < d.dte + INTERVAL 1 DAY
), 0
)
) as ratio
FROM (SELECT date('2017-01-01') dte UNION ALL
SELECT date('2017-01-02') dte UNION ALL
SELECT date('2017-01-03') dte UNION ALL
SELECT date('2017-01-04') dte UNION ALL
SELECT date('2017-01-05') dte
) d;
Notes:
This returns NULL on days where there is no activity. That makes more sense to me than 0.
This uses logic on the dates that works for both dates and date/time values.
The logic for dates can make use of an index, which can be important for this type of query.
I don't recommend using LEFT JOINs here: joining both tables to the dates multiplies the rows, which can make the query expensive.
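A runnable sketch of the per-day ratio on the question's data, assuming SQLite via Python's sqlite3. Instead of the UNION ALL list it generates the days with a recursive CTE, which is a swapped-in technique rather than part of the answer; the correlated subqueries and NULLIF follow the answer:

```python
import sqlite3

DATA = """
    CREATE TABLE activity (user_id INTEGER, login_time TEXT);
    CREATE TABLE payments (user_id INTEGER, payment_date TEXT);
    INSERT INTO activity VALUES
        (201, '2017-01-01'), (202, '2017-01-01'), (255, '2017-01-04'),
        (255, '2017-01-05'), (256, '2017-01-05'), (260, '2017-03-15');
    INSERT INTO payments VALUES
        (200, '2017-01-01'), (202, '2017-01-01'), (255, '2017-01-05');
"""

SQL = """
    WITH RECURSIVE d(dte) AS (          -- one row per day in the range
        SELECT date(:first_day)
        UNION ALL
        SELECT date(dte, '+1 day') FROM d WHERE dte < date(:last_day)
    )
    SELECT d.dte,
           1.0 * (SELECT COUNT(DISTINCT p.user_id)
                  FROM payments p
                  WHERE p.payment_date >= d.dte
                    AND p.payment_date < date(d.dte, '+1 day'))
           / NULLIF((SELECT COUNT(DISTINCT a.user_id)
                     FROM activity a
                     WHERE a.login_time >= d.dte
                       AND a.login_time < date(d.dte, '+1 day')), 0) AS ratio
    FROM d
    ORDER BY d.dte
"""


def daily_ratios(first_day="2017-01-01", last_day="2017-01-05"):
    conn = sqlite3.connect(":memory:")
    conn.executescript(DATA)
    rows = conn.execute(SQL, {"first_day": first_day,
                              "last_day": last_day}).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for day, ratio in daily_ratios():
        print(day, ratio)
```

Days without activity come back as None (NULL), matching the first note.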
First you need a table with all days in the range. Since the range is small you can build an ad hoc derived table using UNION ALL. Then left join the payments and activities. Group by the day and calculate the percentage using the count()s.
SELECT x.day,
concat(CASE count(DISTINCT a.user_id)
WHEN 0 THEN
0
ELSE
count(DISTINCT p.user_id)
/
count(DISTINCT a.user_id)
END
*
100,
'%')
FROM (SELECT cast('2017-01-01' AS date) day
UNION ALL
SELECT cast('2017-01-02' AS date) day
UNION ALL
SELECT cast('2017-01-03' AS date) day
UNION ALL
SELECT cast('2017-01-04' AS date) day
UNION ALL
SELECT cast('2017-01-05' AS date) day) x
LEFT JOIN payments p
ON p.payment_date = x.day
LEFT JOIN activity a
ON a.login_time = x.day
GROUP BY x.day;
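A sketch of the LEFT JOIN approach on the question's sample data, with SQLite via Python's sqlite3 standing in for MySQL. It returns the percentage as a number instead of a formatted string, and reports 0 for days without activity, as in the expected output:

```python
import sqlite3


def daily_percentages():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE activity (user_id INTEGER, login_time TEXT);
        CREATE TABLE payments (user_id INTEGER, payment_date TEXT);
        INSERT INTO activity VALUES
            (201, '2017-01-01'), (202, '2017-01-01'), (255, '2017-01-04'),
            (255, '2017-01-05'), (256, '2017-01-05'), (260, '2017-03-15');
        INSERT INTO payments VALUES
            (200, '2017-01-01'), (202, '2017-01-01'), (255, '2017-01-05');
    """)
    # The two LEFT JOINs multiply rows per day; COUNT(DISTINCT ...) undoes that.
    rows = conn.execute("""
        SELECT x.day,
               CASE COUNT(DISTINCT a.user_id)
                    WHEN 0 THEN 0
                    ELSE 100.0 * COUNT(DISTINCT p.user_id)
                         / COUNT(DISTINCT a.user_id)
               END AS pct
        FROM (SELECT '2017-01-01' AS day
              UNION ALL SELECT '2017-01-02'
              UNION ALL SELECT '2017-01-03'
              UNION ALL SELECT '2017-01-04'
              UNION ALL SELECT '2017-01-05') x
        LEFT JOIN payments p ON p.payment_date = x.day
        LEFT JOIN activity a ON a.login_time = x.day
        GROUP BY x.day
        ORDER BY x.day
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for day, pct in daily_percentages():
        print(day, f"{pct:g}%")
```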
I have 3 tables:
doctors (id, name) -> has_many:
patients (id, doctor_id, name) -> has_many:
health_conditions (id, patient_id, note, created_at)
Every day, each patient gets a new health condition record with a note from 1 to 10, where 10 means good health (full recovery, if you will).
What I want to extract is the following 3 statistics for the last 30 days (month):
- how many patients got better
- how many patients got worse
- how many patients remained the same
These statistics are global, so for now I don't need per-doctor statistics, which I could extract anyway given the right query.
The trick is that the query needs to extract the current health_condition note and compare it with the average of the past days (this month, excluding today), so it needs both today's note and the average of the other days.
I don't think the query needs to decide who went up, down, or stayed the same, since I can loop over the results and decide that. Just today vs. the rest of the month will be sufficient, I guess.
Here's what I have so far, which obviously doesn't work, because the LIMIT makes it return only one result:
SELECT
p.id,
p.name,
hc.latest,
hcc.average
FROM
pacients p
INNER JOIN (
SELECT
id,
pacient_id,
note as LATEST
FROM
health_conditions
GROUP BY pacient_id, id
ORDER BY created_at DESC
LIMIT 1
) hc ON(hc.pacient_id=p.id)
INNER JOIN (
SELECT
id,
pacient_id,
avg(note) AS average
FROM
health_conditions
GROUP BY pacient_id, id
) hcc ON(hcc.pacient_id=p.id AND hcc.id!=hc.id)
WHERE
date_part('epoch',date_trunc('day', hcc.created_at))
BETWEEN
(date_part('epoch',date_trunc('day', hc.created_at)) - (30 * 86400))
AND
date_part('epoch',date_trunc('day', hc.created_at))
The query has all the logic it needs to distinguish between the latest note and the average, but that LIMIT kills everything: it keeps one row for the whole subquery, not one row per patient. I need the LIMIT to extract the latest result, which is compared with the past results.
Something like this, assuming created_at is of type date:
select p.name,
hc.note as current_note,
av.avg_note
from patients p
join health_conditions hc on hc.patient_id = p.id
join (
select patient_id,
avg(note) as avg_note
from health_conditions hc2
where created_at between current_date - 30 and current_date - 1
group by patient_id
) av on av.patient_id = hc.patient_id
where hc.created_at = current_date;
This is PostgreSQL syntax. I'm not sure whether MySQL supports date arithmetic the same way.
Edit:
This should get you the most recent note for each patient, plus the average for the last 30 days:
select p.name,
hc.created_at as last_note_date,
hc.note as current_note,
t.avg_note
from patients p
join health_conditions hc
on hc.patient_id = p.id
and hc.created_at = (select max(created_at)
from health_conditions hc2
where hc2.patient_id = hc.patient_id)
join (
select patient_id,
avg(note) as avg_note
from health_conditions hc3
where created_at between current_date - 30 and current_date - 1
group by patient_id
) t on t.patient_id = hc.patient_id
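A runnable sketch of the latest-note query, assuming SQLite via Python's sqlite3 and created_at stored as 'YYYY-MM-DD' text. The three patients and their notes are invented, and "today" is passed in as a parameter instead of current_date:

```python
import sqlite3

DATA = """
    CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE health_conditions (id INTEGER PRIMARY KEY, patient_id INTEGER,
                                    note INTEGER, created_at TEXT);
    INSERT INTO patients VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO health_conditions (patient_id, note, created_at) VALUES
        (1, 5, '2024-03-29'), (1, 5, '2024-03-30'), (1, 8, '2024-03-31'),
        (2, 6, '2024-03-29'), (2, 8, '2024-03-30'), (2, 4, '2024-03-31'),
        (3, 7, '2024-03-29'), (3, 7, '2024-03-30'), (3, 7, '2024-03-31');
"""

SQL = """
    SELECT p.name,
           hc.created_at AS last_note_date,
           hc.note       AS current_note,
           t.avg_note
    FROM patients p
    JOIN health_conditions hc
      ON hc.patient_id = p.id
     AND hc.created_at = (SELECT MAX(created_at)          -- latest note per patient
                          FROM health_conditions hc2
                          WHERE hc2.patient_id = hc.patient_id)
    JOIN (SELECT patient_id, AVG(note) AS avg_note        -- previous 30 days, not today
          FROM health_conditions
          WHERE created_at BETWEEN date(:today, '-30 days')
                               AND date(:today, '-1 day')
          GROUP BY patient_id) t
      ON t.patient_id = hc.patient_id
    ORDER BY p.id
"""


def latest_vs_average(today="2024-03-31"):
    conn = sqlite3.connect(":memory:")
    conn.executescript(DATA)
    rows = conn.execute(SQL, {"today": today}).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_vs_average():
        print(row)
```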
SELECT SUM(delta < 0) AS worsened,
SUM(delta = 0) AS no_change,
SUM(delta > 0) AS improved
FROM (
SELECT patient_id,
SUM(IF(DATE(created_at) = CURDATE(),note,NULL))
- AVG(IF(DATE(created_at) < CURDATE(),note,NULL)) AS delta
FROM health_conditions
WHERE DATE(created_at) BETWEEN CURDATE() - INTERVAL 1 MONTH AND CURDATE()
GROUP BY patient_id
) t
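A sketch of the same bucketing in SQLite via Python's sqlite3, with CASE WHEN in place of MySQL's IF() and a fixed "today" parameter instead of CURDATE(). The sample notes are invented so that one patient improves, one worsens, and one stays the same:

```python
import sqlite3

DATA = """
    CREATE TABLE health_conditions (id INTEGER PRIMARY KEY, patient_id INTEGER,
                                    note INTEGER, created_at TEXT);
    INSERT INTO health_conditions (patient_id, note, created_at) VALUES
        (1, 5, '2024-03-29'), (1, 5, '2024-03-30'), (1, 8, '2024-03-31'),
        (2, 6, '2024-03-29'), (2, 8, '2024-03-30'), (2, 4, '2024-03-31'),
        (3, 7, '2024-03-29'), (3, 7, '2024-03-30'), (3, 7, '2024-03-31');
"""

SQL = """
    SELECT SUM(delta < 0) AS worsened,     -- comparisons evaluate to 0 or 1
           SUM(delta = 0) AS no_change,
           SUM(delta > 0) AS improved
    FROM (
        SELECT patient_id,
               SUM(CASE WHEN created_at = :today THEN note END)
             - AVG(CASE WHEN created_at < :today THEN note END) AS delta
        FROM health_conditions
        WHERE created_at BETWEEN date(:today, '-30 days') AND :today
        GROUP BY patient_id
    ) t
"""


def trend_counts(today="2024-03-31"):
    conn = sqlite3.connect(":memory:")
    conn.executescript(DATA)
    row = conn.execute(SQL, {"today": today}).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    print(trend_counts())  # (worsened, no_change, improved)
```

A patient with no note today (or none before today) gets a NULL delta and is not counted in any bucket.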
I currently have the following query:
SELECT a.schedID,
a.start AS eventDate, b.div_id AS divisionID, b.div_name AS divisionName
FROM schedules a
INNER JOIN divisions b ON b.div_id = a.div_id
WHERE date_format(a.start, '%Y-%m-%d') >= '2010-01-01'
AND DATE_ADD(a.start, INTERVAL 5 DAY) <= CURDATE()
AND NOT EXISTS (SELECT results_id FROM results e WHERE e.schedID = a.schedID)
ORDER BY eventDate ASC;
I'm basically trying to find any schedules that do not have any results 5 days after the schedule date. My current query has major performance issues, and it also times out inconsistently. Is there a different way to write the query? I'm at a mental roadblock. Any help is appreciated.
Without anticipating too much about the outcome, I would suggest the following leads:
* Try to remove the date_format, as it adds one function call per row and prevents an index on a.start from being used. I don't know the type of your column a.start, but comparing it directly with '2010-01-01' should be possible.
* Same for DATE_ADD: you could probably move it to the other side of the comparison, like:
a.start <= DATE_SUB(CURDATE(), INTERVAL 5 DAY)
That way the value can be computed once rather than for each row; you could even compute it up front and pass it in as a parameter.
* The NOT EXISTS can be expensive; it seems to me you could replace it with a LEFT JOIN, like:
schedules a LEFT JOIN results e ON a.schedID = e.schedID WHERE e.schedID IS NULL
* Double-check that all join columns are indexed.
Good luck
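Putting these suggestions together, a sketch under stated assumptions: SQLite via Python's sqlite3 stands in for MySQL, start is stored as 'YYYY-MM-DD' text, the rows and index names are invented, and the 5-day cutoff is computed once in Python and passed in as a parameter:

```python
import sqlite3
from datetime import date, timedelta


def overdue_without_results(today):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE divisions (div_id INTEGER PRIMARY KEY, div_name TEXT);
        CREATE TABLE schedules (schedID INTEGER PRIMARY KEY,
                                div_id INTEGER, start TEXT);
        CREATE TABLE results   (results_id INTEGER PRIMARY KEY, schedID INTEGER);
        -- Indexes on the join/filter columns (hypothetical names).
        CREATE INDEX idx_results_sched   ON results (schedID);
        CREATE INDEX idx_schedules_start ON schedules (start);
        INSERT INTO divisions VALUES (1, 'North');
        INSERT INTO schedules VALUES
            (1, 1, '2024-01-01'),   -- has a result
            (2, 1, '2024-01-02'),   -- no result, old enough: expected
            (3, 1, '2024-03-30'),   -- no result, but less than 5 days old
            (4, 1, '2009-12-31');   -- before 2010-01-01
        INSERT INTO results VALUES (100, 1);
    """)
    # Cutoff computed once, up front, instead of DATE_ADD(...) on every row.
    cutoff = (today - timedelta(days=5)).isoformat()
    sql = """
        SELECT a.schedID, a.start AS eventDate, b.div_name
        FROM schedules a
        JOIN divisions b ON b.div_id = a.div_id
        LEFT JOIN results e ON e.schedID = a.schedID
        WHERE a.start >= '2010-01-01'   -- bare column, so an index can be used
          AND a.start <= ?
          AND e.schedID IS NULL         -- anti-join: no matching result
        ORDER BY eventDate
    """
    rows = conn.execute(sql, (cutoff,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(overdue_without_results(date(2024, 3, 31)))
```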
Maybe something like:
SELECT
a.schedID, a.start AS eventDate, b.div_id AS divisionID, b.div_name AS divisionName
FROM
schedules a
INNER JOIN divisions b ON b.div_id = a.div_id
WHERE
date_format(a.start, '%Y-%m-%d') >= '2010-01-01'
AND NOT EXISTS (
SELECT
*
FROM
results e
INNER JOIN schedules a2 ON e.schedID = a2.schedID
WHERE
DATE_ADD(a2.start, INTERVAL 5 DAY) <= CURDATE()
AND a2.id = a.id
)
ORDER BY eventDate ASC;
I don't know if MySQL is the same as Oracle, but are you converting a date to a string here and then comparing it with the string '2010-01-01'? Can you convert 2010-01-01 to a date instead, so that an index on a.start can be used if there is one?
Also, does this query definitely return the right answer?
You mention that you want schedules without results 5 days after the schedule date, but it looks like you are asking for anything in the last 5 days?
a.start >= 1-Jan-10 and start date + 5 days is before today
Try this query:
SELECT a.schedID,
a.start AS eventDate,
b.div_id AS divisionID,
b.div_name AS divisionName
FROM (SELECT * FROM schedules s WHERE DATE(s.start) >= '2010-01-01' AND DATE_ADD(s.start, INTERVAL 5 DAY) <= CURDATE()) a
INNER JOIN divisions b
ON b.div_id = a.div_id
LEFT JOIN (SELECT results_id, schedID FROM results) e
ON e.schedID = a.schedID
WHERE e.results_id IS NULL
ORDER BY eventDate ASC;
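For the anti-join to work, the test on the LEFT JOINed side has to be IS NULL: for a schedule without results, e.results_id is NULL, and NULL = '' is never true. A quick sketch in SQLite via Python's sqlite3 with invented rows:

```python
import sqlite3


def unmatched_schedules(condition):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE schedules (schedID INTEGER PRIMARY KEY, start TEXT);
        CREATE TABLE results   (results_id INTEGER PRIMARY KEY, schedID INTEGER);
        INSERT INTO schedules VALUES (1, '2024-01-01'), (2, '2024-01-02');
        INSERT INTO results VALUES (100, 1);   -- only schedule 1 has a result
    """)
    # `condition` is one of two fixed strings used below, never user input.
    sql = f"""
        SELECT a.schedID
        FROM schedules a
        LEFT JOIN (SELECT results_id, schedID FROM results) e
               ON e.schedID = a.schedID
        WHERE {condition}
        ORDER BY a.schedID
    """
    ids = [row[0] for row in conn.execute(sql)]
    conn.close()
    return ids


if __name__ == "__main__":
    print(unmatched_schedules("e.results_id = ''"))     # finds nothing
    print(unmatched_schedules("e.results_id IS NULL"))  # finds schedule 2
```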