Optimise query with Group By and Count - mysql

I would like advice on optimising the query below. It currently takes about 20 seconds on the first run, depending on the number of users selected and the date range.
The purpose is to return a count of each code, per user.
There are 27 codes and each user has an am and pm attendance record.
SELECT user,
code,
COUNT(code) AS total
FROM attendances AS attendances
WHERE attendances.user IN ('abc123', 'abc456', 'abc789')
AND (
attendances.date >= '2019-10-06' AND attendances.date <= '2019-10-11'
)
GROUP BY user, code
Table Definition
Indexes
Each of these fields has an index
location
source_id
date
user_recorded
code
user
EXPLAIN Result

For this query:
SELECT user, code,
COUNT(code) AS total
FROM attendances a
WHERE a.user IN ('abc123', 'abc456', 'abc789') AND
a.date >= '2019-10-06' AND a.date <= '2019-10-11'
GROUP BY user, code;
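You have a problem: a single index on both user and date will not be used effectively for this query (in MySQL), because the filter combines an IN list with a date range.
One method is to use UNION ALL, one branch per user, with an index on (user, date, code):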
(SELECT user, code, COUNT(*) AS total
FROM attendances a
WHERE a.user = 'abc123' AND
a.date >= '2019-10-06' AND a.date <= '2019-10-11'
GROUP BY user, code
) UNION ALL
(SELECT user, code, COUNT(*) AS total
FROM attendances a
WHERE a.user = 'abc456' AND
a.date >= '2019-10-06' AND a.date <= '2019-10-11'
GROUP BY user, code
) UNION ALL
(SELECT user, code, COUNT(*) AS total
FROM attendances a
WHERE a.user = 'abc789' AND
a.date >= '2019-10-06' AND a.date <= '2019-10-11'
GROUP BY user, code
);
This can make direct use of the index and should be much faster.
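As a rough check that the UNION ALL rewrite returns the same rows as the IN version, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL. The sample rows and the index name idx_user_date_code are made up for illustration. The branches are joined without parentheses because SQLite does not accept parenthesised SELECTs in a compound query. SQLite's planner is not MySQL's, so this only illustrates that the results match, not the speed-up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendances (user TEXT, date TEXT, code TEXT)")
# Composite index suggested in the answer; the index name is hypothetical.
conn.execute("CREATE INDEX idx_user_date_code ON attendances (user, date, code)")
conn.executemany(
    "INSERT INTO attendances VALUES (?, ?, ?)",
    [
        ("abc123", "2019-10-07", "P"),
        ("abc123", "2019-10-07", "P"),
        ("abc123", "2019-10-08", "L"),
        ("abc456", "2019-10-09", "P"),
        ("abc456", "2019-10-20", "P"),  # outside the date range
        ("abc789", "2019-10-10", "A"),
        ("zzz000", "2019-10-10", "P"),  # user not in the selection
    ],
)

users = ["abc123", "abc456", "abc789"]
start, end = "2019-10-06", "2019-10-11"

# Original form: one query with an IN list.
placeholders = ", ".join("?" for _ in users)
in_sql = (
    "SELECT user, code, COUNT(*) AS total FROM attendances "
    "WHERE user IN (" + placeholders + ") AND date >= ? AND date <= ? "
    "GROUP BY user, code"
)
in_result = sorted(conn.execute(in_sql, users + [start, end]).fetchall())

# Rewritten form: one equality branch per user, glued with UNION ALL.
branch = (
    "SELECT user, code, COUNT(*) AS total FROM attendances "
    "WHERE user = ? AND date >= ? AND date <= ? "
    "GROUP BY user, code"
)
union_sql = " UNION ALL ".join([branch] * len(users))
params = [p for u in users for p in (u, start, end)]
union_result = sorted(conn.execute(union_sql, params).fetchall())

print(in_result)
print(union_result)
```

Building the branches from the user list keeps the query manageable when the number of selected users changes.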

Related

Getting the last and oldest price of a grouped row

I have this query and I want to select the currentprice (the most recent price sorted by time) and the oldprice (the last row sorted by time) as separate columns in the same row. I figured out how to select the currentprice, but how can I select both in the same query?
In the end I want to make a calculation for the percentage of gain or drop with ROUND((latestprice - oldprice) / oldprice * 100, 2) as gain_ratio
WITH tmp AS (
SELECT TrackID, ID, price, MAX(Time) as maxtime, MIN(Time) as mintime
FROM track
WHERE Time > NOW() - INTERVAL 1 HOUR
GROUP BY ID
)
SELECT T.TrackID, T.ID, tmp.Price as currentprice, T.Time
FROM track AS T
JOIN tmp ON T.ID = tmp.ID
WHERE T.Time = tmp.maxtime;
I'm really struggling to grasp how to write a CTE query, even though I have read the documentation several times.
Have you tried changing your WHERE clause to the following?
WHERE T.Time = tmp.maxtime or T.Time = tmp.mintime
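The OR condition returns two rows per ID. If both prices are wanted in one row, one possible approach (not from the answer above) is to join track to the CTE twice, once on maxtime and once on mintime. Below is a minimal sketch using Python's sqlite3 as a stand-in for MySQL. The sample rows, the aliases newest and oldest, and the fixed cutoff parameter that replaces NOW() - INTERVAL 1 HOUR are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE track (TrackID INTEGER PRIMARY KEY, ID INTEGER, price REAL, Time TEXT)"
)
conn.executemany(
    "INSERT INTO track (ID, price, Time) VALUES (?, ?, ?)",
    [
        (1, 50.0, "2024-01-01 09:00:00"),   # older than the cutoff, ignored
        (1, 100.0, "2024-01-01 10:00:00"),
        (1, 110.0, "2024-01-01 10:30:00"),
        (1, 125.0, "2024-01-01 10:50:00"),
        (2, 80.0, "2024-01-01 10:10:00"),
        (2, 60.0, "2024-01-01 10:40:00"),
    ],
)

# Fixed cutoff standing in for NOW() - INTERVAL 1 HOUR.
cutoff = "2024-01-01 09:55:00"

sql = """
WITH tmp AS (
    SELECT ID, MAX(Time) AS maxtime, MIN(Time) AS mintime
    FROM track
    WHERE Time > ?
    GROUP BY ID
)
SELECT tmp.ID,
       newest.price AS currentprice,
       oldest.price AS oldprice,
       ROUND((newest.price - oldest.price) / oldest.price * 100, 2) AS gain_ratio
FROM tmp
JOIN track AS newest ON newest.ID = tmp.ID AND newest.Time = tmp.maxtime
JOIN track AS oldest ON oldest.ID = tmp.ID AND oldest.Time = tmp.mintime
ORDER BY tmp.ID
"""
price_rows = conn.execute(sql, (cutoff,)).fetchall()
for row in price_rows:
    print(row)
```

This assumes each ID has at most one row per timestamp. If two rows share the same Time, the joins would return duplicates.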

How to Group a table and get results for a row based on the previous rows' data

I have a lookup table that relates dates and people associated with those dates:
id, user_id,date
1,1,2014-11-01
2,2,2014-11-01
3,1,2014-11-02
4,3,2014-11-02
5,1,2014-11-03
I can group these by date(day):
SELECT DATE_FORMAT(
MIN(date),
'%Y/%m/%d 00:00:00 GMT-0'
) AS date,
COUNT(*) as count
FROM user_x_date
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) / 43200)
But how can I get the number of unique users, i.e. those that have not shown up previously? For instance, this would be a valid result:
unique, non-unique, date
2,0,2014-11-01
1,1,2014-11-02
0,1,2014-11-03
Is this possible without having to rely on a scripting language to keep track of this data?
I think this query will do what you want, at least it seems to work for your limited sample data.
The idea is to use a correlated sub-query to check if the user_id has occurred on a date before the date of the current row and then do some basic arithmetic to determine number of unique/non-unique users for each date.
Please give it a try.
select
sum(u) - sum(n) as "unique",
sum(n) as "non-unique",
date
from (
select
date,
count(user_id) u,
case when exists (
select 1
from Table1 i
where i.user_id = o.user_id
and i.date < o.date
) then 1 else 0
end n
from Table1 o
group by date, user_id
) q
group by date
order by date;
Sample SQL Fiddle
I didn't include the id column in the sample fiddle as it's not needed (or used) to produce the result and won't change anything.
This is the relevant question: "But how can I get the number of unique users that have not shown up previously?"
Calculate the first time a person shows up, and then use that for the aggregation:
SELECT date, count(*) as FirstVisit
FROM (SELECT user_id, MIN(date) as date
FROM user_x_date
GROUP BY user_id
) x
GROUP BY date;
I would then use this as a subquery for another aggregation:
SELECT v.date, v.NumVisits, COALESCE(fv.FirstVisit, 0) as NumFirstVisit
FROM (SELECT date, count(*) as NumVisits
FROM user_x_date
GROUP BY date
) v LEFT JOIN
(SELECT date, count(*) as FirstVisit
FROM (SELECT user_id, MIN(date) as date
FROM user_x_date
GROUP BY user_id
) x
GROUP BY date
) fv
ON v.date = fv.date;
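To check the first-visit approach against the sample data from the question, here is a minimal sketch using Python's sqlite3 as a stand-in for MySQL. The output column names new_users and returning_users are my own. The returning count is derived as visits minus first visits, which assumes a user appears at most once per date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_x_date (id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO user_x_date VALUES (?, ?, ?)",
    [
        (1, 1, "2014-11-01"),
        (2, 2, "2014-11-01"),
        (3, 1, "2014-11-02"),
        (4, 3, "2014-11-02"),
        (5, 1, "2014-11-03"),
    ],
)

sql = """
SELECT v.date,
       COALESCE(fv.FirstVisit, 0) AS new_users,
       v.NumVisits - COALESCE(fv.FirstVisit, 0) AS returning_users
FROM (SELECT date, COUNT(*) AS NumVisits
      FROM user_x_date
      GROUP BY date
     ) v LEFT JOIN
     (SELECT date, COUNT(*) AS FirstVisit
      FROM (SELECT user_id, MIN(date) AS date
            FROM user_x_date
            GROUP BY user_id
           ) x
      GROUP BY date
     ) fv
     ON v.date = fv.date
ORDER BY v.date
"""
visit_rows = conn.execute(sql).fetchall()
for row in visit_rows:
    print(row)
```

The three rows match the unique / non-unique counts given as the expected result in the question.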

SQL Query to group and add time between consecutive rows

Need help with SQL Query (MySQL)
Say I have a table with data as..
The table has the latitude and longitude logged for a person at some time intervals (TIME column), and the DISTANCE_TRAVELLED column holds the distance travelled since the previous record.
If I want to know how many minutes a person was not moving (i.e. DISTANCE_TRAVELLED <= 0.001), what query should I use?
Can we also group the data by date? Basically, I want to know how many minutes the person was idle on a specific day.
You need to get the previous time for each record. I like to do this using a correlated subquery:
select t.*,
(select t2.time
from table t2
where t2.device = t.device and t2.time < t.time
order by time desc
limit 1
) as prevtime
from table t;
Now you can get the number of minutes not moved, as something like:
select t.*, TIMESTAMPDIFF(MINUTE, prevtime, time) as minutes
from (select t.*,
(select t2.time
from table t2
where t2.device = t.device and t2.time < t.time
order by time desc
limit 1
) as prevtime
from table t
) t
The rest of what you request is just adding the appropriate where clause or group by clause. For instance:
select device, date(time), sum(TIMESTAMPDIFF(MINUTE, prevtime, time)) as minutes
from (select t.*,
(select t2.time
from table t2
where t2.device = t.device and t2.time < t.time
order by time desc
limit 1
) as prevtime
from table t
) t
where distance_travelled <= 0.001
group by device, date(time)
EDIT:
For performance, create an index on table(device, time).
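For anyone who wants to try the idea end to end, here is a minimal sketch using Python's sqlite3 as a stand-in for MySQL. The table name track_log, the index name and the sample rows are made up for illustration. SQLite has no TIMESTAMPDIFF, so the minutes are computed from strftime('%s', ...) epoch seconds instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE track_log (device TEXT, time TEXT, distance_travelled REAL)"
)
# Index recommended in the answer; the name is hypothetical.
conn.execute("CREATE INDEX idx_device_time ON track_log (device, time)")
conn.executemany(
    "INSERT INTO track_log VALUES (?, ?, ?)",
    [
        ("A", "2020-01-01 10:00:00", 0.5),
        ("A", "2020-01-01 10:05:00", 0.0),  # idle for 5 minutes
        ("A", "2020-01-01 10:15:00", 0.0),  # idle for 10 minutes
        ("A", "2020-01-01 10:20:00", 0.3),
        ("A", "2020-01-02 09:00:00", 0.2),
        ("A", "2020-01-02 09:30:00", 0.0),  # idle for 30 minutes
    ],
)

sql = """
SELECT device,
       date(time) AS day,
       SUM((CAST(strftime('%s', time) AS INTEGER)
            - CAST(strftime('%s', prevtime) AS INTEGER)) / 60) AS minutes
FROM (SELECT t.*,
             (SELECT t2.time
              FROM track_log t2
              WHERE t2.device = t.device AND t2.time < t.time
              ORDER BY t2.time DESC
              LIMIT 1
             ) AS prevtime
      FROM track_log t
     ) t
WHERE distance_travelled <= 0.001
GROUP BY device, date(time)
ORDER BY device, day
"""
idle_rows = conn.execute(sql).fetchall()
for row in idle_rows:
    print(row)
```

Each idle interval is attributed to the day of the later record, so an idle period that spans midnight is counted entirely on the second day.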

MySQL query with join or subquery

I have such a schema and queries:
http://sqlfiddle.com/#!2/7b032/3
Separately, I have these queries:
SELECT COUNT(*) AS 'times', userid, name
FROM main
WHERE comedate <= DATE_SUB(CURDATE(),
INTERVAL 5 DAY)
GROUP BY userid ORDER BY times DESC LIMIT 0,2;
SELECT * FROM details WHERE 1;
I need to join them by comparing the userid columns of both tables.
I need an output having these columns:
"times, userid, name, age, location"
Also order, group and limits should be considered.
I would be happy if you can write one query with JOIN and one query with subquery.
I have a 60k table and I will compare the performances.
How about this:
select x.times,
x.userid,
x.name,
d.age,
d.location
from
(
SELECT COUNT(*) AS 'times', userid, name
FROM main
WHERE comedate <= DATE_SUB(CURDATE(),
INTERVAL 5 DAY)
GROUP BY userid
) x
left join details d
on x.userid = d.userid
see SQL Fiddle with Demo
edit:
select x.times,
x.userid,
x.name,
d.age,
d.location
from
(
SELECT COUNT(*) AS 'times', userid, name
FROM main
WHERE comedate <= DATE_SUB(CURDATE(),
INTERVAL 5 DAY)
GROUP BY userid
ORDER BY times DESC
LIMIT 0,2
) x
left join details d
on x.userid = d.userid
see SQL Fiddle with demo

SQL selecting average score over range of dates

I have 3 tables:
doctors (id, name) -> has_many:
patients (id, doctor_id, name) -> has_many:
health_conditions (id, patient_id, note, created_at)
Every day, each patient gets a new health condition record with a note from 1 to 10, where 10 means good health (full recovery, if you like).
What I want to extract is the following 3 statistics for the last 30 days (month):
- how many patients got better
- how many patients got worse
- how many patients remained the same
These statistics are global so I don't care right now of statistics per doctor which I could extract given the right query.
The trick is that the query needs to extract the current health_condition note and compare with the average of past days (this month without today) so one needs to extract today's note and an average of the other days excluding this one.
I don't think the query needs to define who went up/down/same since I can loop and decide that. Just today vs. rest of the month will be sufficient I guess.
Here's what I have so far which obv. doesn't work because it only returns one result due to the limit applied:
SELECT
p.id,
p.name,
hc.latest,
hcc.average
FROM
pacients p
INNER JOIN (
SELECT
id,
pacient_id,
note as LATEST
FROM
health_conditions
GROUP BY pacient_id, id
ORDER BY created_at DESC
LIMIT 1
) hc ON(hc.pacient_id=p.id)
INNER JOIN (
SELECT
id,
pacient_id,
avg(note) AS average
FROM
health_conditions
GROUP BY pacient_id, id
) hcc ON(hcc.pacient_id=p.id AND hcc.id!=hc.id)
WHERE
date_part('epoch',date_trunc('day', hcc.created_at))
BETWEEN
(date_part('epoch',date_trunc('day', hc.created_at)) - (30 * 86400))
AND
date_part('epoch',date_trunc('day', hc.created_at))
The query has all the logic it needs to distinguish between what is latest and average but that limit kills everything. I need that limit to extract the latest result which is used to compare with past results.
Something like this, assuming created_at is of type date:
select p.name,
hc.note as current_note,
t.avg_note
from patients p
join health_conditions hc on hc.patient_id = p.id
join (
select patient_id,
avg(note) as avg_note
from health_conditions hc2
where created_at between current_date - 30 and current_date - 1
group by patient_id
) t on t.patient_id = hc.patient_id
where hc.created_at = current_date;
This is PostgreSQL syntax. I'm not sure if MySQL supports date arithmetic the same way.
Edit:
This should get you the most recent note for each patient, plus the average for the last 30 days:
select p.name,
hc.created_at as last_note_date,
hc.note as current_note,
t.avg_note
from patients p
join health_conditions hc
on hc.patient_id = p.id
and hc.created_at = (select max(created_at)
from health_conditions hc2
where hc2.patient_id = hc.patient_id)
join (
select patient_id,
avg(note) as avg_note
from health_conditions hc3
where created_at between current_date - 30 and current_date - 1
group by patient_id
) t on t.patient_id = hc.patient_id;
SELECT SUM(delta < 0) AS worsened,
SUM(delta = 0) AS no_change,
SUM(delta > 0) AS improved
FROM (
SELECT patient_id,
SUM(IF(DATE(created_at) = CURDATE(),note,NULL))
- AVG(IF(DATE(created_at) < CURDATE(),note,NULL)) AS delta
FROM health_conditions
WHERE DATE(created_at) BETWEEN CURDATE() - INTERVAL 1 MONTH AND CURDATE()
GROUP BY patient_id
) t
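The last query computes, per patient, today's note minus the average of the earlier notes in the window, then counts how many deltas are negative, zero or positive. Here is a minimal sketch of the same idea using Python's sqlite3 as a stand-in for MySQL. The sample rows are made up, a fixed :today parameter replaces CURDATE(), and CASE WHEN replaces MySQL's IF(). As in the original, it assumes at most one note per patient per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE health_conditions "
    "(id INTEGER PRIMARY KEY, patient_id INTEGER, note INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO health_conditions (patient_id, note, created_at) VALUES (?, ?, ?)",
    [
        (1, 4, "2024-03-10 08:00:00"),
        (1, 6, "2024-03-12 08:00:00"),
        (1, 8, "2024-03-15 08:00:00"),  # today: 8 vs average 5, improved
        (2, 7, "2024-03-11 08:00:00"),
        (2, 7, "2024-03-15 08:00:00"),  # today: 7 vs average 7, no change
        (3, 9, "2024-03-13 08:00:00"),
        (3, 7, "2024-03-14 08:00:00"),
        (3, 5, "2024-03-15 08:00:00"),  # today: 5 vs average 8, worsened
        (4, 5, "2024-03-14 08:00:00"),  # no note today, delta is NULL
    ],
)

sql = """
SELECT SUM(delta < 0) AS worsened,
       SUM(delta = 0) AS no_change,
       SUM(delta > 0) AS improved
FROM (
    SELECT patient_id,
           SUM(CASE WHEN date(created_at) = :today THEN note END)
           - AVG(CASE WHEN date(created_at) < :today THEN note END) AS delta
    FROM health_conditions
    WHERE date(created_at) BETWEEN date(:today, '-1 month') AND :today
    GROUP BY patient_id
) t
"""
summary = conn.execute(sql, {"today": "2024-03-15"}).fetchone()
print(summary)
```

Patients without a note today, or without any earlier note in the window, get a NULL delta and are not counted in any of the three totals.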