There are 3 tables: persontbl1 and persontbl2 (7500 rows each) and schedule (~3000 active schedules, i.e. schedule.status = 0). The person tables hold data for the same persons in a one-to-one relationship, and an INNER JOIN between the two takes less than a second. The schedule table contains data about persons to be interviewed; not all persons have schedules in it. With the LEFT JOIN added, the query takes around 45 seconds, which is causing all sorts of issues.
SELECT persontbl1._CREATION_DATE, persontbl2._TOP_LEVEL_AURI,
persontbl2.RESP_CNIC, persontbl2.RESP_CNIC_NAME,
persontbl1.MOB_NUMBER1, persontbl1.MOB_NUMBER2,
schedule.id, schedule.call_datetime, schedule.enum_id,
schedule.enum_change, schedule.status
FROM persontbl1
INNER JOIN persontbl2 ON (persontbl2._TOP_LEVEL_AURI = persontbl1._URI)
AND (AGR_CONTACT=1)
LEFT JOIN SCHEDULE ON (schedule.survey_id = persontbl1._URI)
AND (SCHEDULE.status=0)
AND (DATE(SCHEDULE.call_datetime) <= CURDATE())
ORDER BY schedule.call_datetime IS NULL DESC, persontbl1._CREATION_DATE ASC
Here is the explain for query:
Schedule Table structure:
Schedule Table indexes:
Please let me know if any further information is required.
Thanks.
Edit: Added fully qualified table names and their columns.
You should just replace this line:
AND (DATE(SCHEDULE.call_datetime) <= CURDATE())
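with this one:
AND SCHEDULE.call_datetime <= '2015-04-18 00:00:00'
so MySQL no longer has to evaluate DATE() on every row and compares the bare column with the constant '2015-04-18 00:00:00' instead. Note that this is not exactly the same condition: to keep rows from later on the current day, as DATE(SCHEDULE.call_datetime) <= CURDATE() does, compare with midnight of the following day using <.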
To check whether performance improves, try this query:
SELECT persontbl1._CREATION_DATE, persontbl2._TOP_LEVEL_AURI,
persontbl2.RESP_CNIC, persontbl2.RESP_CNIC_NAME,
persontbl1.MOB_NUMBER1, persontbl1.MOB_NUMBER2,
schedule.id, schedule.call_datetime, schedule.enum_id,
schedule.enum_change, schedule.status
FROM persontbl1
INNER JOIN persontbl2 ON (persontbl2._TOP_LEVEL_AURI = persontbl1._URI)
AND (AGR_CONTACT=1)
LEFT JOIN SCHEDULE ON (schedule.survey_id = persontbl1._URI)
AND (SCHEDULE.status=0)
AND (SCHEDULE.call_datetime <= '2015-02-01 00:00:00')
ORDER BY schedule.call_datetime IS NULL DESC, persontbl1._CREATION_DATE ASC
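The difference between wrapping the column in a date function and comparing the bare column with a constant can be checked on a toy table. This is only a sketch: it uses Python's sqlite3 module as a stand-in for MySQL, SQLite's date() and datetime() in place of MySQL's DATE() and CURDATE(), and a made-up four-row schedule table. In MySQL the half-open form would be written as s.call_datetime < CURDATE() + INTERVAL 1 DAY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schedule (id INTEGER PRIMARY KEY, call_datetime TEXT)")
conn.executemany(
    "INSERT INTO schedule (id, call_datetime) VALUES (?, ?)",
    [
        (1, "2015-04-17 23:59:59"),
        (2, "2015-04-18 00:00:00"),
        (3, "2015-04-18 15:30:00"),  # later on the "current" day
        (4, "2015-04-19 00:00:00"),
    ],
)

today = "2015-04-18"  # hypothetical stand-in for CURDATE()


def ids(sql, params):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql, params)]


# Original form: a function is applied to the column on every row.
wrapped = ids(
    "SELECT id FROM schedule WHERE date(call_datetime) <= ? ORDER BY id",
    (today,),
)

# Bare column compared with midnight of the same day: row 3 is lost.
same_day_midnight = ids(
    "SELECT id FROM schedule WHERE call_datetime <= ? ORDER BY id",
    (today + " 00:00:00",),
)

# Bare column with a half-open bound at midnight of the next day.
next_day = ids(
    "SELECT id FROM schedule "
    "WHERE call_datetime < datetime(?, '+1 day') ORDER BY id",
    (today,),
)

print(wrapped, same_day_midnight, next_day)
```

The half-open bound returns the same rows as the wrapped form while leaving the column bare, which is what lets an index on that column be considered.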
EDIT 1: You said it was fast enough without the LEFT JOIN part, so you can then try:
SELECT persontbl1._CREATION_DATE, persontbl2._TOP_LEVEL_AURI,
persontbl2.RESP_CNIC, persontbl2.RESP_CNIC_NAME,
persontbl1.MOB_NUMBER1, persontbl1.MOB_NUMBER2,
s.id, s.call_datetime, s.enum_id,
s.enum_change, s.status
FROM persontbl1
INNER JOIN persontbl2 ON (persontbl2._TOP_LEVEL_AURI = persontbl1._URI)
AND (AGR_CONTACT=1)
LEFT JOIN
(SELECT *
FROM SCHEDULE
WHERE status=0
AND call_datetime <= '2015-02-01 00:00:00'
) s
ON s.survey_id = persontbl1._URI
ORDER BY s.call_datetime IS NULL DESC, persontbl1._CREATION_DATE ASC
I'm guessing that AGR_CONTACT comes from p1. This is the query you want to optimize:
SELECT p1._CREATION_DATE, _TOP_LEVEL_AURI, RESP_CNIC, RESP_CNIC_NAME,
MOB_NUMBER1, MOB_NUMBER2,
s.id, s.call_datetime, s.enum_id, s.enum_change, s.status
FROM persontbl1 p1 INNER JOIN
persontbl2 p2
ON (p2._TOP_LEVEL_AURI = p1._URI) AND (p1.AGR_CONTACT = 1) LEFT JOIN
SCHEDULE s
ON (s.survey_id = p1._URI) AND
(s.status = 0) AND
(DATE(s.call_datetime) <= CURDATE())
ORDER BY s.call_datetime IS NULL DESC, p1._CREATION_DATE ASC;
The best indexes for this query are: persontbl1(AGR_CONTACT, _URI), persontbl2(_TOP_LEVEL_AURI), and schedule(survey_id, status, call_datetime).
The use of date() around the date time is not recommended. In general, that precludes the use of indexes. However, in this case, you have a left join, so it doesn't make a difference. That column is not being used for filtering anyway. The index on schedule is only for covering the on clause.
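Why a condition in the ON clause of a LEFT JOIN does not filter the left-hand rows can be seen on a small example. This is a sketch only, run with Python's sqlite3 module rather than MySQL; the person table and its three rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (uri TEXT PRIMARY KEY);
CREATE TABLE schedule (id INTEGER PRIMARY KEY, survey_id TEXT, status INTEGER);
INSERT INTO person VALUES ('a'), ('b'), ('c');
INSERT INTO schedule VALUES (1, 'a', 0), (2, 'b', 1);
""")

# Condition in the ON clause: it only decides which schedule rows match.
# Every person is still returned, with NULL when nothing matches.
on_rows = conn.execute("""
    SELECT p.uri, s.id
    FROM person p
    LEFT JOIN schedule s ON s.survey_id = p.uri AND s.status = 0
    ORDER BY p.uri
""").fetchall()

# Same condition in the WHERE clause: it is applied after the join,
# so persons without an active schedule are removed.
where_rows = conn.execute("""
    SELECT p.uri, s.id
    FROM person p
    LEFT JOIN schedule s ON s.survey_id = p.uri
    WHERE s.status = 0
    ORDER BY p.uri
""").fetchall()

print(on_rows)
print(where_rows)
```

With the condition in ON, all three persons come back; moving it to WHERE leaves only the person who has an active schedule.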
I want to find the Daily Active Users; how these are calculated differs from application to application. In my case, I have multiple tables where a user could have had an activity.
I've been able to do a LEFT JOIN in one of the tables, but I don't know how to incorporate the rest of the tables to get the activity that happened the last 30 days.
SELECT
DATE_FORMAT(user_video_plays.created_at, '%Y-%m-%d') AS date,
count(*)
FROM
`users`
INNER JOIN `subscriptions` ON `users`.`id` = `subscriptions`.`user_id`
LEFT JOIN `user_video_plays` ON `users`.`id` = `user_video_plays`.`user_id`
WHERE
`users`.`deleted_at` IS NULL
AND `subscriptions`.`chargebee_status` <> 'cancelled'
AND `user_video_plays`.`created_at` BETWEEN '2022-10-01 00:00:00' AND '2022-10-31 23:59:59'
GROUP BY
DATE_FORMAT(user_video_plays.created_at, '%Y-%m-%d')
I have 2 more tables where the user could have activity: forum_posts and forum_post_replies. How can I incorporate them into my query so I get the activity grouped by day?
I've prepared a DB fiddle with the structure and some sample data, as well as my query: https://www.db-fiddle.com/f/ppRaWP7SPDURm8dePyAkEr/0
Thank you
UPDATE 1: Looking at @Luuk's answer, I realized that we also somehow need to make this unique. In the following fiddle, I've simplified the data, but user_video_plays has 3 plays from the same user and that shouldn't count as 3 but as one: https://dbfiddle.uk/ZszSND-H - I think this is easy in my single-table query, with a unique, but I should take this into consideration with the 3 extra tables.
I have added forum_posts:
SELECT
DATE_FORMAT(user_video_plays.created_at, '%Y-%m-%d') AS date,
count(*) countUsers,
count(`user_video_plays`.`user_id`) videoPlays,
count(`forum_posts`.`user_id`) forumPosts
FROM
`users`
INNER JOIN `subscriptions` ON `users`.`id` = `subscriptions`.`user_id`
LEFT JOIN `user_video_plays` ON `users`.`id` = `user_video_plays`.`user_id`
AND `user_video_plays`.`created_at` BETWEEN '2022-10-01 00:00:00' AND '2022-10-31 23:59:59'
LEFT JOIN `forum_posts` ON `users`.`id` = `forum_posts`.`user_id`
AND `forum_posts`.`created_at` BETWEEN '2022-10-01 00:00:00' AND '2022-10-31 23:59:59'
WHERE
`users`.`deleted_at` IS NULL
AND `subscriptions`.`chargebee_status` <> 'cancelled'
GROUP BY
DATE_FORMAT(user_video_plays.created_at, '%Y-%m-%d')
NOTE: I moved AND user_video_plays.created_at BETWEEN .... from the WHERE-clause to the ON-clause of the LEFT JOIN.
for the output, see: DBFIDDLE
Can you do the other table yourself, following this example?
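One possible way to cover all three activity tables and count each user only once per day is to stack the tables with UNION ALL and then use COUNT(DISTINCT user_id). The sketch below is not from the fiddle: it runs on SQLite through Python's sqlite3 module, the sample rows are invented, the users and subscriptions filters are left out to keep it short, and SQLite's date() stands in for DATE_FORMAT(..., '%Y-%m-%d').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_video_plays (user_id INTEGER, created_at TEXT);
CREATE TABLE forum_posts (user_id INTEGER, created_at TEXT);
CREATE TABLE forum_post_replies (user_id INTEGER, created_at TEXT);

-- user 1 plays three videos on the same day: must count once
INSERT INTO user_video_plays VALUES
    (1, '2022-10-01 09:00:00'),
    (1, '2022-10-01 10:00:00'),
    (1, '2022-10-01 11:00:00'),
    (2, '2022-10-02 08:00:00');
INSERT INTO forum_posts VALUES
    (1, '2022-10-01 12:00:00'),
    (3, '2022-10-01 13:00:00');
INSERT INTO forum_post_replies VALUES
    (2, '2022-10-02 09:00:00'),
    (3, '2022-10-03 07:00:00');
""")

daily_active_users = conn.execute("""
    SELECT date(created_at) AS day,
           COUNT(DISTINCT user_id) AS active_users
    FROM (
        SELECT user_id, created_at FROM user_video_plays
        UNION ALL
        SELECT user_id, created_at FROM forum_posts
        UNION ALL
        SELECT user_id, created_at FROM forum_post_replies
    ) AS activity
    WHERE created_at >= '2022-10-01' AND created_at < '2022-11-01'
    GROUP BY day
    ORDER BY day
""").fetchall()

print(daily_active_users)
```

Because the distinct count is taken over the stacked rows, a user who is active in several tables on the same day, or several times in one table, still counts as one.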
I'm trying to run a count query on a 2-table join. The e_amazing_client table has a million rows and m_user has just 50 rows, BUT the count query is taking forever!
SELECT COUNT(`e`.`id`) AS `count`
FROM `e_amazing_client` AS `e`
LEFT JOIN `user` AS `u` ON `e`.`cx_hc_user_id` = `u`.`id`
WHERE ((`e`.`date_created` >= '2018-11-11') AND (`e`.`date_created` >= '2018-11-18')) AND (`e`.`id` >= 1)
I don't know what is wrong with this query?
First, I'm guessing that this is sufficient:
SELECT COUNT(*) AS `count`
FROM e_amazing_client e
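WHERE e.date_created >= '2018-11-18' AND e.id >= 1;
If user has only 50 rows, I doubt it is creating duplicates, so the join can be dropped. The two comparisons on date_created are redundant: together they reduce to the later bound, date_created >= '2018-11-18'.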
For this query, try creating an index on e_amazing_client(date_created, id).
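The claim that the LEFT JOIN to user cannot change the count can be checked on a toy data set. This is a sketch under assumptions: it uses SQLite via Python's sqlite3 module, it assumes user.id is the primary key (so each client row matches at most one user), and the rows are invented. The WHERE clause is copied from the question unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY);
CREATE TABLE e_amazing_client (
    id INTEGER PRIMARY KEY,
    cx_hc_user_id INTEGER,
    date_created TEXT
);
CREATE INDEX e_date_id ON e_amazing_client (date_created, id);

INSERT INTO user VALUES (1), (2);
INSERT INTO e_amazing_client VALUES
    (1, 1,    '2018-11-10'),
    (2, 2,    '2018-11-12'),
    (3, NULL, '2018-11-18'),  -- no user at all
    (4, 99,   '2018-11-20');  -- user id that does not exist
""")

where = """
    WHERE e.date_created >= '2018-11-11'
      AND e.date_created >= '2018-11-18'
      AND e.id >= 1
"""

# Count with the LEFT JOIN, as in the question.
with_join = conn.execute(
    "SELECT COUNT(e.id) FROM e_amazing_client e "
    "LEFT JOIN user u ON e.cx_hc_user_id = u.id" + where
).fetchone()[0]

# Count on the single table only.
without_join = conn.execute(
    "SELECT COUNT(*) FROM e_amazing_client e" + where
).fetchone()[0]

print(with_join, without_join)
```

A LEFT JOIN keeps every left-hand row, and a join on a unique key cannot multiply them, so both counts agree.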
Maybe you wanted this:
SELECT COUNT(`e`.`id`) AS `count`
FROM `e_amazing_client` AS `e`
LEFT JOIN `user` AS `u` ON `e`.`cx_hc_user_id` = `u`.`id`
WHERE ((`e`.`date_created` >= '2018-11-11') AND (`e`.`date_created` <= '2018-11-18')) AND (`e`.`id` >= 1)
to check between dates?
Also, do you really need
AND (`e`.`id` >= 1)
If id is what an id is usually in a table, is there a case to be <1?
Your query is pulling ALL records on or after the date bound, because the only other condition in your WHERE clause is ID >= 1; there is no clause in there for a specific user. Your original query also has a second condition on the date, >= 2018-11-18. You MAY have meant that you only wanted the count WITHIN the week 11/11 to 11/18, in which case the conditions SHOULD have been >= 11-11 and <= 11-18.
As for the count, you are getting ALL people (assuming no entry has an ID less than 1) and thus a count within that date range. If you want it per user as you indicated you need to group by the cx_hc_user_id (user) column to see who has the most, or make the user part of the WHERE clause to get one person.
SELECT
e.cx_hc_user_id,
count(*) countPerUser
from
e_amazing_client e
WHERE
e.date_created >= '2018-11-11'
AND e.date_created <= '2018-11-18'
group by
e.cx_hc_user_id
You can order by the count descending to get the user with the highest count, but I'm still not sure what you are asking.
I have a performance issue with the query below on MySQL. The query involves 5 tables. When I apply the ORDER BY and LIMIT, the results are retrieved in 0.3 secs, but without them I get the results in 0.01 secs. I have tried changing the query, but that did not work. Could someone please help me with this query so I can get the results in the desired time (<0.3 secs)?
Below are the details.
m_todos = 286579 (records)
m_pat = 214858 (records)
users = 119 (records)
m_programs = 26 (records)
role = 4 (records)
SELECT *
FROM (
SELECT t.*,
mp.name as A_name,
u.first_name, u.last_name,
p.first, p.last, p.zone, p.language,p.handling,
r.name,
u2.first_name AS created_first_name,
u2.last_name AS created_last_name
FROM m_todos t
INNER JOIN role r ON t.role_id=r.id
INNER JOIN m_pat p ON t.patient_id = p.id
LEFT JOIN users u2 ON t.created_id=u2.id
LEFT JOIN m_programs mp ON t.prog_id=mp.id
LEFT JOIN users u ON t.user_id=u.id
WHERE t.role_id !='9'
AND t.completed = '0000-00-00 00:00:00'
) C
ORDER BY priority DESC, due ASC
LIMIT 0,10
Get rid of the outer SELECT; move the ORDER BY and LIMIT in.
Indexes:
t: (completed)
t: (priority, due)
I assume priority and due are in t?? Please be explicit in the query. It could make a huge difference.
If the following works, it should speed things up a lot: Start by finding the t.id without all the JOINs:
SELECT id
FROM m_todos
WHERE role_id !='9'
AND completed = '0000-00-00 00:00:00'
ORDER BY priority DESC, due ASC
LIMIT 10
That will benefit from this covering composite index:
INDEX(completed, role_id, priority, due, id)
Debug that. Then use it in the rest:
SELECT t.*, the-other-stuff
FROM ( that-query ) AS t1
JOIN m_todos AS t USING(id)
then-the-rest-of-the-JOINs
ORDER BY priority DESC, due ASC -- yes, again
If you don't need all of t.*, it may be beneficial to spell out the actual columns needed.
The reason for this to run much faster is that the 10 rows are found efficiently by looking only at the one table. The original code was shoveling around a lot more rows than 10 and they included all the columns of t, plus columns from the other tables.
My version does only 10 lookups for all the extra stuff.
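The "find the ids first, then join" pattern can be tried on a cut-down schema. This is a sketch, not the real tables: it runs on SQLite through Python's sqlite3 module, keeps only m_todos and m_pat with a few invented rows, uses LIMIT 2 instead of 10, and writes the join back to m_todos with ON rather than USING(id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE m_pat (id INTEGER PRIMARY KEY, first TEXT);
CREATE TABLE m_todos (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER,
    role_id INTEGER,
    completed TEXT,
    priority INTEGER,
    due TEXT
);
CREATE INDEX todo_pick ON m_todos (completed, role_id, priority, due);

INSERT INTO m_pat VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO m_todos VALUES
    (1, 1, 2, '0000-00-00 00:00:00', 1, '2020-01-05'),
    (2, 2, 2, '0000-00-00 00:00:00', 3, '2020-01-09'),
    (3, 1, 9, '0000-00-00 00:00:00', 9, '2020-01-01'),  -- excluded role
    (4, 2, 2, '2020-01-02 10:00:00', 8, '2020-01-02'),  -- already completed
    (5, 1, 2, '0000-00-00 00:00:00', 3, '2020-01-03');
""")

# Step 1 and 2 combined: pick the ids from m_todos alone, then join only
# those rows to the other tables and repeat the ORDER BY on the outside.
deferred = conn.execute("""
    SELECT t.id, p.first
    FROM (
        SELECT id
        FROM m_todos
        WHERE role_id != 9
          AND completed = '0000-00-00 00:00:00'
        ORDER BY priority DESC, due ASC
        LIMIT 2
    ) AS t1
    JOIN m_todos AS t ON t.id = t1.id
    JOIN m_pat AS p ON t.patient_id = p.id
    ORDER BY t.priority DESC, t.due ASC
""").fetchall()

# Reference: join everything first, then sort and limit.
direct = conn.execute("""
    SELECT t.id, p.first
    FROM m_todos AS t
    JOIN m_pat AS p ON t.patient_id = p.id
    WHERE t.role_id != 9
      AND t.completed = '0000-00-00 00:00:00'
    ORDER BY t.priority DESC, t.due ASC
    LIMIT 2
""").fetchall()

print(deferred, direct)
```

Both forms return the same rows here. One thing to keep in mind: if an INNER JOIN to another table can drop some of the chosen ids, the deferred form may return fewer rows than the LIMIT.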
I'm working on a query, and I'm a bit stuck.
Here's my query:
SELECT *
FROM
routine_actions AS ra
JOIN
routines AS r ON r.id = ra.routine_id
JOIN
account_routines AS ar ON ar.routine_id = r.id
JOIN
accounts AS a ON a.id = ar.account_id
WHERE
(ra.last_run + INTERVAL ra.interval_minutes MINUTE <= NOW() OR ra.last_run IS NULL)
AND
r.created_at + INTERVAL r.runtime_days DAY > NOW()
What I'm trying to do:
An account has many routines. Only one routine can be used at a time, and that's the routine with the highest priority. The table that contains the priority column is account_routines because accounts can reuse routines and specify a different priority.
A higher number equals a higher priority. Currently the query is pulling all routines from all accounts. But I only need one routine with the highest priority from each account.
How is this possible? I don't need the solution, just where to look so I can figure out how to solve this problem.
I think you need to locate the max(priority) and include that in the joins to limit the rows returned. e.g.
SELECT
*
FROM routine_actions AS ra
JOIN routines AS r ON r.id = ra.routine_id
JOIN account_routines AS ar ON ar.routine_id = r.id
JOIN (
SELECT
account_id
, MAX(priority) AS max_priority
FROM account_routines
GROUP BY
account_id
) AS mxr ON ar.priority = mxr.max_priority
AND ar.account_id = mxr.account_id
JOIN accounts AS a ON a.id = ar.account_id
WHERE ...
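The max-per-group join can be tried in isolation on account_routines alone. This is a sketch with invented rows, run on SQLite through Python's sqlite3 module; the other tables and the WHERE conditions from the question are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account_routines (
    account_id INTEGER,
    routine_id INTEGER,
    priority INTEGER
);
-- routine 10 is reused by both accounts with different priorities
INSERT INTO account_routines VALUES
    (1, 10, 1),
    (1, 11, 5),
    (2, 10, 3),
    (2, 12, 2);
""")

top_routines = conn.execute("""
    SELECT ar.account_id, ar.routine_id, ar.priority
    FROM account_routines AS ar
    JOIN (
        SELECT account_id, MAX(priority) AS max_priority
        FROM account_routines
        GROUP BY account_id
    ) AS mxr
      ON ar.account_id = mxr.account_id
     AND ar.priority = mxr.max_priority
    ORDER BY ar.account_id
""").fetchall()

print(top_routines)
```

Each account keeps only its highest-priority routine. If two routines of one account share the same top priority, both rows are returned, so a tie-breaker would be needed in that case.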
I have the following query:
SELECT
fruit.date,
fruit.name,
fruit.reason,
fruit.id,
fruit.notes,
food.name
FROM
fruit
LEFT JOIN
food_fruits AS ff ON fruit.fruit_id = ff.fruit_id AND ff.type='fruit'
LEFT JOIN
food USING (food_id)
LEFT JOIN
fruits_sour AS fs ON fruit.id = fs.fruit_id
WHERE
(fruit.date < DATE_SUB(NOW(), INTERVAL 180 DAY))
AND (fruit.`status` = 'Rotten')
AND (fruit.location = 'USA')
AND (fruit.size = 'medium')
AND (fs.fruit_id IS NULL)
ORDER BY food.name asc
LIMIT 15 OFFSET 0
And all the indexes you could ever want, including the following which are being used:
fruit - fruit_filter (size, status, location, date)
food_fruits - food_type (type)
food - food (id)
fruits_sour - fruit_id (fruit_id)
I even have indexes which I thought would work better which are not being used:
food_fruits - fruit_key (fruit_id, type)
food - id_name (food_id, name)
The ORDER BY clause is causing a temporary table and filesort to be used, unfortunately. Without that, the query runs lickety-split. How can I get this query to not need to filesort? What am I missing?
EDIT:
The Explain:
The reason for this is that your ORDER BY clause sorts on a field that is not part of the index used for this query. The engine can run the query using the fruit_filter index, but then it has to sort on a different field, and that's when filesort comes into play (which basically means "sort without using an index", thanks for the reminder in the comments).
I don't know what times you are getting, but if the difference is large, I would create a temporary table with the intermediate results and sort it afterwards.
(By the way, I am not sure why you use LEFT JOIN instead of INNER JOIN, or why you use food_fruits - answered in the comments)
UPDATE.
Maybe try a subquery approach (untested), which splits the sorting from the pre-filtering:
SELECT
fr.date,
fr.name,
fr.reason,
fr.id,
fr.notes,
food.name
FROM
(
SELECT
fruit.date,
fruit.name,
fruit.reason,
fruit.id,
fruit.fruit_id,
fruit.notes
FROM
fruit
LEFT JOIN
fruits_sour AS fs ON fruit.id = fs.fruit_id
WHERE
(fruit.date < DATE_SUB(NOW(), INTERVAL 180 DAY))
AND (fruit.`status` = 'Rotten')
AND (fruit.location = 'USA')
AND (fruit.size = 'medium')
AND (fs.fruit_id IS NULL)
) as fr
LEFT JOIN
food_fruits AS ff ON fr.fruit_id = ff.fruit_id AND ff.type='fruit'
LEFT JOIN
food USING (food_id)
ORDER BY food.name asc
LIMIT 15 OFFSET 0
Your ORDER BY ... LIMIT clauses require some sorting, you know. The trick to optimizing performance is to ORDER BY ... LIMIT the minimal set of columns, and then build your full result set based on the chosen fifteen rows. So let's try for a minimal set of columns in a subquery.
SELECT fruit.id,
food.name
FROM fruit
LEFT JOIN food_fruits AS ff ON fruit.fruit_id = ff.fruit_id
AND ff.type='fruit'
LEFT JOIN food USING (food_id)
LEFT JOIN fruits_sour AS fs ON fruit.id = fs.fruit_id
WHERE fruit.date < DATE_SUB(NOW(), INTERVAL 180 DAY)
AND fruit.`status` = 'Rotten'
AND fruit.location = 'USA'
AND fruit.size = 'medium'
AND fs.fruit_id IS NULL
ORDER BY food.name ASC
LIMIT 15 OFFSET 0
This query gives you the fifteen top ids and their names.
I would add id to the end of your existing fruit_filter index to give (size, status, location, date, id). That will make it into a compound covering index, and allow your filtering query to be satisfied entirely from the index.
Other than that, it's going to be hard to optimize this using more or different indexes because so much of the query is driven by other factors, like the LEFT JOIN ... IS NULL join-fail criterion you have applied.
Then you can join this subquery to your fruit table to pull the full result set.
That will look like this when it's all done.
SELECT fruit.date,
fruit.name,
fruit.reason,
fruit.id,
fruit.notes,
list.name
FROM fruit
JOIN (
SELECT fruit.id,
food.name
FROM fruit
LEFT JOIN food_fruits AS ff ON fruit.fruit_id = ff.fruit_id
AND ff.type='fruit'
LEFT JOIN food USING (food_id)
LEFT JOIN fruits_sour AS fs ON fruit.id = fs.fruit_id
WHERE fruit.date < DATE_SUB(NOW(), INTERVAL 180 DAY)
AND fruit.`status` = 'Rotten'
AND fruit.location = 'USA'
AND fruit.size = 'medium'
AND fs.fruit_id IS NULL
ORDER BY food.name ASC
LIMIT 15 OFFSET 0
) AS list ON fruit.id = list.id
ORDER BY list.name
Do you see how this goes? In the subquery you sling around just enough data to identify which tiny subset of rows you want to retrieve. Then, you join that subquery to your main table to pull out all your data. Limiting the row length in the stuff you have to sort helps performance because MySQL can sort it in its sort buffer, rather than having to do a more elaborate and slower sort / merge operation. (But, you can't tell from EXPLAIN whether it will do this or not.)
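The LEFT JOIN ... IS NULL criterion used in these queries is an anti-join: it keeps only the fruit rows that have no match in fruits_sour. A small sketch of what it does, run on SQLite through Python's sqlite3 module with invented rows and only the two tables involved, next to the equivalent NOT EXISTS form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fruit (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fruits_sour (fruit_id INTEGER);
CREATE INDEX fruit_id ON fruits_sour (fruit_id);

INSERT INTO fruit VALUES (1, 'apple'), (2, 'lemon'), (3, 'pear');
INSERT INTO fruits_sour VALUES (2);
""")

# Anti-join written as LEFT JOIN ... IS NULL, as in the question.
left_join_is_null = [row[0] for row in conn.execute("""
    SELECT fruit.id
    FROM fruit
    LEFT JOIN fruits_sour AS fs ON fruit.id = fs.fruit_id
    WHERE fs.fruit_id IS NULL
    ORDER BY fruit.id
""")]

# The same anti-join written with NOT EXISTS.
not_exists = [row[0] for row in conn.execute("""
    SELECT fruit.id
    FROM fruit
    WHERE NOT EXISTS (
        SELECT 1 FROM fruits_sour AS fs WHERE fs.fruit_id = fruit.id
    )
    ORDER BY fruit.id
""")]

print(left_join_is_null, not_exists)
```

Both forms return the fruits that are not listed as sour. Which one the optimizer handles better depends on the engine and version, so it is worth comparing the EXPLAIN output for each.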