MySQL query is slow and I don't know how to optimize it
I think the hardware is fine: 60 GB RAM, 10 cores, SSD.
Hi, I'm having a big issue with this query running slowly on MySQL. The query is below:
# Thread_id: 1165100 Schema: back-Alvo-11-07-19 QC_hit: No
# Query_time: 9.015205 Lock_time: 0.000188 Rows_sent: 1 Rows_examined: 2616880
# Rows_affected: 0
SET timestamp=1568549358;
SELECT count(*) as total_rows FROM(
(SELECT m.*
FROM phpfox_channel_video AS m
INNER JOIN phpfox_channel_category AS mc
ON(mc.category_id = mc.category_id)
INNER JOIN phpfox_channel_category_data AS mcd
ON(mcd.video_id = m.video_id)
WHERE m.in_process = 0 AND m.view_id = 0
AND m.module_id = 'videochannel'
AND m.item_id = 0 AND m.privacy IN(0)
AND mcd.category_id = 17
GROUP BY m.video_id
ORDER BY m.time_stamp DESC
LIMIT 12
)) AS m
JOIN phpfox_user AS u
ON(u.user_id = m.user_id);
This query is running very slowly: as you can see, it takes 9 seconds.
Online advice on optimizing queries always talks about adding indexes,
but as you can see from the EXPLAIN statement below, I already have indexes.
Do you have any idea where I should look to improve the speed of this query? I'm not a DB guy and am having a hard time with this. This is a website with 400,000 videos.
Thanks
The EXPLAIN output shows that no index is being used on table phpfox_channel_video AS m, and that a temporary index is being used on table phpfox_channel_category AS mc. In other words, MySQL is not using an existing index there but building one first, which takes considerable time.
Also, the index for table phpfox_channel_category_data AS mcd could be better.
The indexes you need are:
CREATE INDEX idx_cat_data_video_id ON phpfox_channel_category_data
(category_id, video_id);
CREATE INDEX idx_channel_cat_id ON phpfox_channel_category (category_id);
CREATE INDEX idx_video_mult ON phpfox_channel_video
(in_process, view_id, module_id, item_id, privacy, video_id, time_stamp);
Don't fetch m.* if you are only going to do COUNT(*).
If phpfox_channel_category is a many-to-many mapping table, follow the tips in http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
m needs INDEX(in_process, view_id, module_id, item_id, privacy) in any order.
Avoid the GROUP BY:
INNER JOIN phpfox_channel_category AS mc ON(mc.category_id = mc.category_id)
INNER JOIN phpfox_channel_category_data AS mcd ON(mcd.video_id = m.video_id)
AND mcd.category_id = 17
GROUP BY m.video_id
--> (something like)
AND EXISTS(
SELECT 1
FROM phpfox_channel_category AS mc
JOIN phpfox_channel_category_data AS mcd
ON mcd.category_id = mc.category_id
WHERE mcd.category_id = 17
AND mcd.video_id = m.video_id
)
Let's make sure that we are optimizing the right query. I suggest we check this condition in the ON clause:
mc.category_id = mc.category_id
We know that's going to be TRUE for every row in mc with a non-NULL value of category_id. We could express that condition as:
mc.category_id IS NOT NULL
This means the join is almost a cross join; every row returned from m matched with every row from mc. That is, we could get an equivalent result writing:
FROM phpfox_channel_video m
JOIN phpfox_channel_category mc
ON mc.category_id IS NOT NULL
I suspect that's not actually the result we're after. I think we were meaning to match to m.category_id. But that's just a guess.
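To see the effect of that ON condition concretely, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table contents are invented for illustration; only the table and column names come from the question. It counts the rows produced by the original condition and by the IS NOT NULL form.

```python
import sqlite3

# Toy stand-ins for the real tables; the rows are invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE phpfox_channel_video (video_id INTEGER PRIMARY KEY);
    CREATE TABLE phpfox_channel_category (category_id INTEGER);
    INSERT INTO phpfox_channel_video VALUES (1), (2), (3);
    INSERT INTO phpfox_channel_category VALUES (10), (11), (12), (NULL);
""")

# The condition from the question: it compares mc with itself, not with m.
joined = con.execute("""
    SELECT COUNT(*)
    FROM phpfox_channel_video m
    JOIN phpfox_channel_category mc
      ON mc.category_id = mc.category_id
""").fetchone()[0]

# The equivalent form: every video pairs with every non-NULL category.
non_null = con.execute("""
    SELECT COUNT(*)
    FROM phpfox_channel_video m
    JOIN phpfox_channel_category mc
      ON mc.category_id IS NOT NULL
""").fetchone()[0]

print(joined, non_null)
```

With 3 videos and 3 non-NULL categories, both counts come out as 3 x 3 rows, which is the row multiplication the GROUP BY then has to collapse again.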
If video_id column is PRIMARY KEY or UNIQUE KEY on m, we can avoid the potentially expensive GROUP BY operation by avoiding the joins that create duplicated rows, by using EXISTS with correlated subqueries. If we can avoid generating an intermediate result with duplicate values of video_id, then we can avoid the need to do the GROUP BY.
Also, for the inline view query, rather than return all columns * we can return just the expressions that we need. In the outer query, the only column referenced is user_id.
So we could write something like this:
SELECT COUNT(*) AS total_rows
FROM (
SELECT m.user_id
FROM phpfox_channel_video m
WHERE EXISTS ( SELECT 1
FROM phpfox_channel_category mc
WHERE mc.category_id = m.category_id
-- mc.category_id = mc.category_id -- <original
)
AND EXISTS ( SELECT 1
FROM phpfox_channel_category_data mcd
WHERE mcd.video_id = m.video_id
AND mcd.category_id = 17
)
AND m.in_process = 0
AND m.view_id = 0
AND m.module_id = 'videochannel'
AND m.item_id = 0
AND m.privacy IN (0)
ORDER BY m.time_stamp DESC
LIMIT 12
) d
JOIN phpfox_user u
ON u.user_id = d.user_id
For tuning, the optimal index for m has the columns with equality predicates as its leading columns, followed by the time_stamp column. That way the ORDER BY can be satisfied by returning rows in index order, which avoids a "Using filesort" operation. It looks like the reason we need the rows ordered is for the LIMIT clause.
... ON phpfox_channel_video (in_process, view_id, item_id, module_id
, time_stamp, video_id, ... )
For the other two tables, we want indexes with leading columns that have equality predicates:
... ON phpfox_channel_category_data (video_id, category_id, ...)
... ON phpfox_channel_category ( category_id, ... )
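The "equality columns first, then the sort column" rule can be tried out in a small sketch. This one uses Python's sqlite3 module, so it shows SQLite's planner rather than MySQL's: SQLite reports "USE TEMP B-TREE FOR ORDER BY" where MySQL's EXPLAIN would report "Using filesort". The reduced column list and the index name idx_video_eq_ts are assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE phpfox_channel_video (
        video_id   INTEGER PRIMARY KEY,
        in_process INTEGER,
        view_id    INTEGER,
        time_stamp INTEGER
    )
""")

QUERY = """
    SELECT video_id
    FROM phpfox_channel_video
    WHERE in_process = 0 AND view_id = 0
    ORDER BY time_stamp DESC
    LIMIT 12
"""

def plan(connection):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_without_index = plan(con)

# Equality-predicate columns lead; the ORDER BY column comes right after them.
con.execute("""
    CREATE INDEX idx_video_eq_ts
    ON phpfox_channel_video (in_process, view_id, time_stamp)
""")
plan_with_index = plan(con)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

On MySQL the same comparison would be made by running EXPLAIN on the query before and after creating the index and checking whether "Using filesort" disappears from the Extra column.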
NOTES:
(It's not entirely clear why we need an inline view, and we are delaying the join from the user_id reference. Then again, the point of the entire query isn't really obvious to me; I'm just providing a re-write, given the provided SQL, with the change to the condition category_id.)
The above assumed that category_id column exists in m, and that it's a one-to-many relationship.
But if that's not true... if the mcd table is actually a junction table, resolving a many-to-many relationship between video and category, such that the join condition was meant to be
mcd.category_id = mc.category_id
^
Then we would want to replace the WHERE EXISTS and AND EXISTS in the query above with a single correlated subquery. Something like this:
SELECT COUNT(*) AS total_rows
FROM (
SELECT m.user_id
FROM phpfox_channel_video m
WHERE EXISTS ( SELECT 1
FROM phpfox_channel_category mc
JOIN phpfox_channel_category_data mcd
ON mcd.category_id = mc.category_id
WHERE mcd.video_id = m.video_id
AND mcd.category_id = 17
)
AND m.in_process = 0
AND m.view_id = 0
AND m.module_id = 'videochannel'
AND m.item_id = 0
AND m.privacy IN (0)
ORDER BY m.time_stamp DESC
LIMIT 12
) d
JOIN phpfox_user u
ON u.user_id = d.user_id
Related
Inner query:
select up.user_id, up.id as utility_pro_id from utility_pro as up
join utility_pro_zip_code as upz ON upz.utility_pro_id = up.id and upz.zip_code_id=1
where up.available_for_survey=1 and up.user_id not in (select bjr.user_id from book_job_request as bjr where
((1583821800000 between bjr.start_time and bjr.end_time) and (1583825400000 between bjr.start_time and bjr.end_time)))
Divided in two queries:
select up.user_id, up.id as utility_pro_id from utility_pro as up
join utility_pro_zip_code as upz ON upz.utility_pro_id = up.id and upz.zip_code_id=1
Select bjr.user_id as userId from book_job_request as bjr where bjr.user_id in :userIds and (:startTime between bjr.start_time and bjr.end_time) and (:endTime between bjr.start_time and bjr.end_time)
Note:
As I understand it, when a single query is executed with the inner query, it will scan all the data in book_job_request, whereas with multiple queries only rows with the specified user IDs will be checked.
Any better option for the same operation, other than these two, would also be appreciated.
I expect that the query is supposed to be more like this:
SELECT up.user_id
, up.id utility_pro_id
FROM utility_pro up
JOIN utility_pro_zip_code upz
ON upz.utility_pro_id = up.id
LEFT
JOIN book_job_request bjr
ON bjr.user_id = up.user_id
AND bjr.end_time >= 1583821800000
AND bjr.start_time <= 1583825400000
WHERE up.available_for_survey = 1
AND upz.zip_code_id = 1
AND bjr.user_id IS NULL
For further help with optimisation (i.e. which indexes to provide) we'd need SHOW CREATE TABLE statements for all relevant tables, as well as the EXPLAIN for the above.
Another possibility:
SELECT up.user_id , up.id utility_pro_id
FROM utility_pro up
JOIN utility_pro_zip_code upz ON upz.utility_pro_id = up.id
WHERE up.available_for_survey = 1
AND upz.zip_code_id = 1
AND NOT EXISTS( SELECT 1 FROM book_job_request
WHERE user_id = up.user_id
AND end_time >= 1583821800000
AND start_time <= 1583825400000 )
Recommended indexes (for my NOT EXISTS and for Strawberry's LEFT JOIN):
book_job_request: (user_id, start_time, end_time)
upz: (zip_code_id, utility_pro_id)
up: (available_for_survey, user_id, id)
The column order given is important. And, no, the single-column indexes you currently have are not as good.
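One point worth checking is that the rewrites above test for any overlap with the requested window, while the question's NOT IN subquery only excludes bookings that contain both endpoints. The following sketch, run on SQLite through Python's sqlite3 module with invented rows, compares the three forms. User 101 has a booking that only partly overlaps the window, user 102 has one that covers it completely, and user 103 has none.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE utility_pro (id INTEGER PRIMARY KEY, user_id INTEGER,
                              available_for_survey INTEGER);
    CREATE TABLE utility_pro_zip_code (utility_pro_id INTEGER, zip_code_id INTEGER);
    CREATE TABLE book_job_request (user_id INTEGER, start_time INTEGER,
                                   end_time INTEGER);
    INSERT INTO utility_pro VALUES (1, 101, 1), (2, 102, 1), (3, 103, 1);
    INSERT INTO utility_pro_zip_code VALUES (1, 1), (2, 1), (3, 1);
    -- Invented bookings: 101 overlaps the window partly, 102 covers it fully.
    INSERT INTO book_job_request VALUES
        (101, 1583820000000, 1583823000000),
        (102, 1583820000000, 1583830000000);
""")
window = {"s": 1583821800000, "e": 1583825400000}

BASE = """
    SELECT up.user_id
    FROM utility_pro up
    JOIN utility_pro_zip_code upz ON upz.utility_pro_id = up.id
"""

# Form from the question: both endpoints must fall inside one booking.
containment = [r[0] for r in con.execute(BASE + """
    WHERE up.available_for_survey = 1 AND upz.zip_code_id = 1
      AND up.user_id NOT IN (
          SELECT bjr.user_id FROM book_job_request bjr
          WHERE :s BETWEEN bjr.start_time AND bjr.end_time
            AND :e BETWEEN bjr.start_time AND bjr.end_time)
    ORDER BY up.user_id""", window)]

# NOT EXISTS with the overlap test.
not_exists = [r[0] for r in con.execute(BASE + """
    WHERE up.available_for_survey = 1 AND upz.zip_code_id = 1
      AND NOT EXISTS (
          SELECT 1 FROM book_job_request
          WHERE user_id = up.user_id
            AND end_time >= :s AND start_time <= :e)
    ORDER BY up.user_id""", window)]

# LEFT JOIN ... IS NULL with the same overlap test.
left_join = [r[0] for r in con.execute(BASE + """
    LEFT JOIN book_job_request bjr
           ON bjr.user_id = up.user_id
          AND bjr.end_time >= :s AND bjr.start_time <= :e
    WHERE up.available_for_survey = 1 AND upz.zip_code_id = 1
      AND bjr.user_id IS NULL
    ORDER BY up.user_id""", window)]

print(containment, not_exists, left_join)
```

The two overlap forms agree with each other, but they treat the partially booked user 101 as unavailable, whereas the original predicate still returns that user. Which behaviour is wanted depends on the business rule, so it is worth confirming before swapping one query for the other.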
I installed a plug in and I have done optimisation on the back end (SSD, single column indexing for columns called in GROUP BY & WHERE)
but when running this query
SELECT u.user_id, u.profile_page_id, u.server_id AS user_server_id, u.user_name, u.full_name, u.gender, u.user_image, u.is_invisible, u.user_group_id, u.language_id, u.birthday, u.country_iso, m.*
FROM(
(SELECT m.*
FROM phpfox_channel_video AS m
INNER JOIN phpfox_channel_category AS mc
ON(mc.category_id = mc.category_id)
INNER JOIN phpfox_channel_category_data AS mcd
ON(mcd.video_id = m.video_id)
WHERE m.in_process = 0 AND m.view_id = 0 AND m.module_id = 'videochannel' AND m.item_id = 0 AND m.privacy IN(0) AND mcd.category_id = 17
GROUP BY m.video_id
ORDER BY m.time_stamp DESC
)) AS m
JOIN phpfox_user AS u
ON(u.user_id = m.user_id)
ORDER BY m.time_stamp DESC
LIMIT 24;
it takes 20 seconds, while changing it to this instead
SELECT u.user_id, u.profile_page_id, u.server_id AS user_server_id, u.user_name, u.full_name, u.gender, u.user_image, u.is_invisible, u.user_group_id, u.language_id, u.birthday, u.country_iso, m.*
FROM(
(SELECT m.*
FROM phpfox_channel_video AS m
INNER JOIN phpfox_channel_category_data AS mcd
ON(mcd.video_id = m.video_id AND mcd.category_id = 17)
WHERE m.in_process = 0 AND m.view_id = 0 AND m.module_id = 'videochannel' AND m.item_id = 0 AND m.privacy IN(0)
GROUP BY m.video_id
ORDER BY m.time_stamp DESC
)) AS m
JOIN phpfox_user AS u
ON(u.user_id = m.user_id)
ORDER BY m.time_stamp DESC
LIMIT 24;
This runs about 5-6 seconds
The phpfox_channel_video table contains 2 million rows (and will keep growing quickly; it's a social media site and users can upload files too), so caching isn't very useful (but it is activated).
Any hints on how to optimise this? I have minimal experience with MariaDB/MySQL, as I've been accustomed to MS SQL for big data and to creating my own structure. Is there a recommended method that doesn't need much altering of the tables (adding tables is OK)?
Or do I need to restructure the PHP and the tables to get the query below 1 second?
Thank you!
I found these links
http://mysql.rjweb.org/doc.php/memory &
http://mysql.rjweb.org/doc.php/ricksrots#indexing
Are they still relevant ?
The EXPLAIN results are attached.
As for indexes, the current configuration has every column set as an index key on all the tables involved in the query above.
Would a printout of my current server configuration be helpful? Thanks!
INNER JOIN phpfox_channel_category AS mc ON(mc.category_id = mc.category_id)
Is almost useless.
You don't use any columns of mc for other purposes.
This JOIN is performed.
This JOIN verifies that there is a corresponding row in mc.
This JOIN will bloat the temp table if there are multiple corresponding rows.
Bloat leads to wasted work in the GROUP BY.
Similarly, your second query does not use mcd.
Please use different aliases for derived tables. It is hard to follow the multiple uses of m.
This is totally useless:
ORDER BY m.time_stamp DESC
MySQL/MariaDB is free to ignore an ORDER BY in a derived table. A table is defined to be an unordered set of rows. Ordering can only be done at the end.
Suggested index
m: INDEX(item_id, module_id, view_id, in_process, -- any order; tested with '='
privacy, -- sometimes has a list?
video_id) -- last
mcd: INDEX(category_id, video_id) -- in either order
There is a more logical way to do this, and possibly faster:
INNER JOIN phpfox_channel_category_data AS mcd
ON mcd.video_id = m.video_id
AND mcd.category_id = 17
Remove that, and remove the GROUP BY m.video_id, then add this to the WHERE:
AND EXISTS( SELECT 1 FROM phpfox_channel_category_data AS mcd
WHERE mcd.video_id = m.video_id
AND mcd.category_id = 17 )
(The index mentioned above still applies.)
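As a rough check that the EXISTS form returns the same videos as JOIN plus GROUP BY, here is a small sketch on SQLite via Python's sqlite3 module. The rows are invented, including a deliberately duplicated mapping row for video 1, to show where duplicates come from when the plain join is used without GROUP BY.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE phpfox_channel_video (video_id INTEGER PRIMARY KEY,
                                       time_stamp INTEGER);
    CREATE TABLE phpfox_channel_category_data (video_id INTEGER,
                                               category_id INTEGER);
    INSERT INTO phpfox_channel_video VALUES (1, 300), (2, 200), (3, 100);
    -- Invented data: video 1 is mapped to category 17 twice.
    INSERT INTO phpfox_channel_category_data VALUES
        (1, 17), (1, 17), (2, 17), (3, 5);
""")

JOIN_SQL = """
    SELECT m.video_id
    FROM phpfox_channel_video m
    INNER JOIN phpfox_channel_category_data mcd
            ON mcd.video_id = m.video_id
           AND mcd.category_id = 17
"""

# Plain join: one output row per matching mapping row, so video 1 repeats.
join_rows = [r[0] for r in con.execute(JOIN_SQL + " ORDER BY m.video_id")]

# Join plus GROUP BY: the duplicates are built first and then collapsed.
grouped_rows = [r[0] for r in con.execute(
    JOIN_SQL + " GROUP BY m.video_id ORDER BY m.video_id")]

# EXISTS: each video is tested once, so no duplicates are ever produced.
exists_rows = [r[0] for r in con.execute("""
    SELECT m.video_id
    FROM phpfox_channel_video m
    WHERE EXISTS (SELECT 1
                  FROM phpfox_channel_category_data mcd
                  WHERE mcd.video_id = m.video_id
                    AND mcd.category_id = 17)
    ORDER BY m.video_id
""")]

print(join_rows, grouped_rows, exists_rows)
```

The sketch only demonstrates that the result sets match. Whether EXISTS is actually faster on the real tables still has to be confirmed with EXPLAIN and timing on MySQL/MariaDB.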
Note that I have perhaps eliminated two "filesorts": one for the GROUP BY and one for the ORDER BY. Another note: EXPLAIN does not always show how many filesorts there really are. (But EXPLAIN FORMAT=JSON SELECT ... does.)
I managed to clean up the query, after checking the table, turns out that
WHERE m.in_process = 0
AND m.view_id = 0
AND m.module_id = 'videochannel'
AND m.item_id = 0
AND m.privacy IN(0)
Doesn't need to be run, because every row in the table matches that condition (for the current case of this website). So I just optimized those long queries, and managed to get below 1 second now.
I am attaching results of two Explain statements for an old query and the newer version of that query.
Do you see anything that does not make sense or looks wrong? The query became slow (4.5 seconds) after I added the tm, tsa and tcd tables.
Before those three tables were added to the query it was extremely fast (0.001 seconds). Here is what the explain looked like
tm table has four columns (tm_id (PK), owner_id, manager_id, status), tcd has three columns (tm_id, cd_id, created_date). tm_id and cd_id make a composite primary key and there is another index on cd_id. Same is the case with tsa with three columns (tm_id, smpa_id, created_date) with tm_id and smpa_id being a composite primary key and smpa_id has another index.
What could be the reason for such slowness?
old query:
SELECT upcm_id, COUNT( * )
FROM user_post_content_master AS upcm
JOIN content_deck AS cd ON cd.cd_id = upcm.cd_id
JOIN social_media_post_account AS smpa ON smpa.smpa_id = upcm.smpa_id
JOIN post_content_master AS pcm ON pcm.pcm_id = upcm.pcm_id
WHERE smpa.user_id =2196
AND upcm.upcm_post_date >=1545891957
AND upcm.upcm_status =1
AND upcm.upcm_post_date >=1546560000
AND upcm.upcm_post_date <=1546732799
GROUP BY upcm.upcm_id
ORDER BY upcm.upcm_post_date ASC
New Query:
SELECT upcm_id, COUNT( * )
FROM user_post_content_master AS upcm
JOIN content_deck AS cd ON cd.cd_id = upcm.cd_id
JOIN social_media_post_account AS smpa ON smpa.smpa_id = upcm.smpa_id
JOIN post_content_master AS pcm ON pcm.pcm_id = upcm.pcm_id
JOIN team_content_deck AS tcd ON ( tcd.cd_id = upcm.cd_id )
JOIN team_social_account AS tsa ON tsa.smpa_id = upcm.smpa_id
JOIN team_members AS tm ON tm.team_member_id = tsa.team_member_id
AND tm.team_member_id = tcd.team_member_id
AND tm.owner_id =2196
AND tm.manager_id =2196
AND tm.status =1
WHERE smpa.user_id =2196
AND upcm.upcm_post_date >=1545891957
AND upcm.upcm_status =1
AND upcm.upcm_post_date >=1546560000
AND upcm.upcm_post_date <=1546732799
GROUP BY upcm.upcm_id
ORDER BY upcm.upcm_post_date ASC
If I remove the conditions from the tm table, it is fast again. Nothing changed in the joins though.
EXPLAIN SELECT upcm_id, COUNT( * )
FROM user_post_content_master AS upcm
JOIN content_deck AS cd ON cd.cd_id = upcm.cd_id
JOIN social_media_post_account AS smpa ON smpa.smpa_id = upcm.smpa_id
JOIN post_content_master AS pcm ON pcm.pcm_id = upcm.pcm_id
JOIN team_content_deck AS tcd ON ( tcd.cd_id = upcm.cd_id )
JOIN team_social_account AS tsa ON tsa.smpa_id = upcm.smpa_id
JOIN team_members AS tm ON tm.team_member_id = tsa.team_member_id
AND tm.team_member_id = tcd.team_member_id
WHERE smpa.user_id =2196
AND upcm.upcm_post_date >=1545891957
AND upcm.upcm_status =1
AND upcm.upcm_post_date >=1546560000
AND upcm.upcm_post_date <=1546732799
GROUP BY upcm.upcm_id
ORDER BY upcm.upcm_post_date ASC
The difference is most likely due to the key selected for upcm: the old query used upcm_post_date and the new query uses cd_id.
There is not enough data to be sure, but judging from the name, cd_id seems to have a much lower cardinality than upcm_post_date.
Update (Extracted from my comments below):
One possible reason is the table order MySQL chose for the query: content_deck comes before user_post_content_master. Because MySQL uses a nested-loop algorithm for JOIN, user_post_content_master ends up in an inner loop of the join.
When the tm.owner_id condition is present, there is a constant lookup, which leads the MySQL optimizer to decide that it wins over a range scan.
The book High Performance MySQL has a chapter on query optimization. It describes a technique called join decomposition, i.e., splitting one big join query into several smaller ones. One extra benefit is that you can cache some common data.
I am not sure whether an index hint would help in this case (either hinting or forcing MySQL to use upcm_post_date for upcm): SELECT * FROM user_post_content_master USE INDEX (upcm_post_date)
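Join decomposition as mentioned above could look roughly like the following. This is a sketch only: it uses Python's sqlite3 module with invented rows, keeps just two of the tables from the question, and leaves out the team tables. The first query fetches the account IDs for the user, and the second fetches posts for those IDs in the date range.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE social_media_post_account (smpa_id INTEGER PRIMARY KEY,
                                            user_id INTEGER);
    CREATE TABLE user_post_content_master (upcm_id INTEGER PRIMARY KEY,
                                           smpa_id INTEGER,
                                           upcm_post_date INTEGER,
                                           upcm_status INTEGER);
    -- Invented rows for illustration.
    INSERT INTO social_media_post_account VALUES (1, 2196), (2, 2196), (3, 42);
    INSERT INTO user_post_content_master VALUES
        (100, 1, 1546600000, 1),
        (101, 2, 1546700000, 1),
        (102, 3, 1546600000, 1),
        (103, 1, 1540000000, 1);
""")

# Step 1: small lookup query; its result could be cached by the application.
account_ids = [r[0] for r in con.execute(
    "SELECT smpa_id FROM social_media_post_account "
    "WHERE user_id = ? ORDER BY smpa_id", (2196,))]

# Step 2: range query on the big table, restricted to those accounts.
post_ids = []
if account_ids:  # an empty IN () list is not valid SQL in MySQL
    placeholders = ", ".join("?" for _ in account_ids)
    post_ids = [r[0] for r in con.execute(
        "SELECT upcm_id FROM user_post_content_master "
        f"WHERE smpa_id IN ({placeholders}) "
        "AND upcm_status = 1 "
        "AND upcm_post_date BETWEEN ? AND ? "
        "ORDER BY upcm_post_date",
        (*account_ids, 1546560000, 1546732799))]

print(account_ids, post_ids)
```

The trade-off is an extra round trip to the server in exchange for simpler plans per query, so it is worth measuring against the single-join version rather than assuming it is faster.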
The query below is very slow (it takes around 1 second), even though it only searches approximately 2500 records (plus the inner joined tables).
If I remove the ORDER BY, the query runs in much less time (0.05 seconds or less).
Alternatively, if I remove the nested select below "# used to select where no ProfilePhoto specified", it also runs fast, but I need both of these included.
I have indexes (or primary keys) on: tPhoto_PhotoID, PhotoID, p.Enabled, CustomerID, tCustomer_CustomerID, ProfilePhoto (bool), u.UserName, e.PrivateEmail, m.tUser_UserID, Enabled, Active, m.tMemberStatuses_MemberStatusID, e.tCustomerMembership_MembershipID, e.DateCreated
(Do I have too many indexes? My understanding is to add them anywhere I use WHERE or ON.)
The Query :
SELECT e.CustomerID,
e.CustomerName,
e.Location,
SUBSTRING_INDEX(e.CustomerProfile,' ', 25) AS Description,
IFNULL(p.PhotoURL, PhotoTable.PhotoURL) AS PhotoURL
FROM tCustomer e
LEFT JOIN (tCustomerPhoto ep INNER JOIN tPhoto p ON (ep.tPhoto_PhotoID = p.PhotoID AND p.Enabled=1))
ON e.CustomerID = ep.tCustomer_CustomerID AND ep.ProfilePhoto = 1
# used to select where no ProfilePhoto specified
LEFT JOIN ((SELECT pp.PhotoURL, epp.tCustomer_CustomerID
FROM tPhoto pp
LEFT JOIN tCustomerPhoto epp ON epp.tPhoto_PhotoID = pp.PhotoID
GROUP BY epp.tCustomer_CustomerID) AS PhotoTable) ON e.CustomerID = PhotoTable.tCustomer_CustomerID
INNER JOIN tUser u ON u.UserName = e.PrivateEmail
INNER JOIN tmembers m ON m.tUser_UserID = u.UserID
WHERE e.Enabled=1
AND e.Active=1
AND m.tMemberStatuses_MemberStatusID = 2
AND e.tCustomerMembership_MembershipID != 6
ORDER BY e.DateCreated DESC
LIMIT 12
I have similar queries, but they run much faster.
Any opinions would be appreciated.
Until we get more clarity on your question (how this compares with the other queries that work, etc.), try EXPLAIN {YourSelectQuery} in the MySQL client and check its output for ways to improve performance.
How can I improve my existing query to display the correct lookup value if the second lookup.id was used already. Would this be better if I use derived tables? sub-queries? Can someone teach me please?
PROBLEM:
RECORDS  TYPE  TYPE_DESC  PROCESS_ID  STATUS  QUEUE_DESC
1        1     Queued     55          4       Queued
1        2     Cancelled  84          7       Cancelled
MY GOAL:
RECORDS  TYPE  TYPE_DESC  PROCESS_ID  STATUS  QUEUE_DESC
1        1     Initial    55          4       Queued
1        2     Follow Up  84          7       Cancelled
Existing query:
SELECT
COUNT(q.id) as records,
q.type,
l.description AS type_desc,
q.process_id,
q.status,
l.description AS queue_desc
FROM
queues q,
lookups l
WHERE
l.id = q.status
GROUP BY q.status;
To better understand my problem, please see sqlfiddle entry:
http://sqlfiddle.com/#!2/6b7d10/6
Thanks
You have to join "lookups" table twice.
SELECT COUNT(q.id) AS records
,q.type
,l1.description AS type_desc
,q.process_id
,q.status
,l2.description AS queue_desc
FROM queues q
,lookups l1
,lookups l2
WHERE l1.id = q.type
and l2.id = q.status
GROUP BY q.status;
that's all.
Try this:
select count(q.id) as records,
q.type, a.description, q.process_id, q.status,
b.description as qdesc
from
queues q
inner join lookups a on q.type = a.id
inner join lookups b on q.status = b.id
group by q.status
You need to join with lookups twice - once to get the type and again to get the status. Note the use of explicit join syntax with ON clause.
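A minimal sketch of the difference, run on SQLite through Python's sqlite3 module. The rows are reconstructed from the PROBLEM and MY GOAL tables in the question and are an assumption about what the sqlfiddle contains. The GROUP BY lists every non-aggregated column so the query does not depend on loose GROUP BY handling.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE lookups (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE queues (id INTEGER PRIMARY KEY, type INTEGER,
                         process_id INTEGER, status INTEGER);
    -- Rows reconstructed from the tables shown in the question.
    INSERT INTO lookups VALUES (1, 'Initial'), (2, 'Follow Up'),
                               (4, 'Queued'), (7, 'Cancelled');
    INSERT INTO queues VALUES (1, 1, 55, 4), (2, 2, 84, 7);
""")

# One join: both description columns come from the status lookup row.
single_join = con.execute("""
    SELECT COUNT(q.id), q.type, l.description, q.process_id,
           q.status, l.description
    FROM queues q
    JOIN lookups l ON l.id = q.status
    GROUP BY q.status, q.type, q.process_id, l.description
    ORDER BY q.type
""").fetchall()

# Two joins: one alias resolves the type, the other resolves the status.
double_join = con.execute("""
    SELECT COUNT(q.id), q.type, lt.description, q.process_id,
           q.status, ls.description
    FROM queues q
    JOIN lookups lt ON lt.id = q.type
    JOIN lookups ls ON ls.id = q.status
    GROUP BY q.status, q.type, q.process_id, lt.description, ls.description
    ORDER BY q.type
""").fetchall()

print(single_join)
print(double_join)
```

The single-join result reproduces the PROBLEM table, and the double-join result matches the MY GOAL table.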
I think this is the query you want:
SELECT COUNT(q.id) as records,
q.type,
lt.description AS type_desc,
q.process_id,
q.status,
ls.description AS queue_desc
FROM queues q join
lookups ls
on ls.id = q.status and ls.key = 'status' join
lookups lt
on lt.id = q.type and lt.key = 'type'
GROUP BY q.status;
Note that this ensures that the key type matches the values for the joins.