I have seen a few posts regarding slow queries, but none had the answer I'm hoping for.
I've been staring at this query for ages and for some reason can't see what's making it so slow; even a date range such as 2022-01-01 to 2022-12-21 takes 80 seconds or more.
So here is the query:
SELECT
    accounts.first_name,
    accounts.last_name,
    accounts.email,
    (
        SELECT COUNT(ID)
        FROM customer_migration_details
        WHERE date_opened BETWEEN '2022-01-01' AND '2022-12-31'
          AND customer_migration_details.Assigned_to = accounts.email
        GROUP BY `accounts`.`email`
        LIMIT 1
    ) AS 'New Customers'
FROM
    customer_migration_details
    RIGHT JOIN accounts ON customer_migration_details.Assigned_to = accounts.email
WHERE
    date_opened BETWEEN '2022-01-01' AND '2022-12-31'
    AND customer_migration_details.Assigned_to = accounts.email
    AND accounts.role LIKE '%Sales%'
GROUP BY
    `accounts`.`email`
Here are the results. But here is the annoying part:
Showing rows 0 - 7 (8 total, Query took 109.5797 seconds.)
There's got to be something I'm missing, maybe in the subquery, that's causing this to take so long.
Suggested indexes (acc = accounts, cmd = customer_migration_details):
acc: INDEX(email)
cmd: INDEX(Assigned_to, date_opened)
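In DDL form that would be something like this (a sketch; the index names are arbitrary):
ALTER TABLE accounts
    ADD INDEX idx_acc_email (email);
ALTER TABLE customer_migration_details
    ADD INDEX idx_cmd_assigned_opened (Assigned_to, date_opened);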
Having GROUP BY acc.email in the subquery seems wrong. And it may be unnecessary in the outer query.
Do not say COUNT(x) unless you need to avoid counting rows with x IS NULL. Instead, say simply COUNT(*).
If date_opened is a DATETIME, then you have excluded all but one second of New Year's Eve.
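One safe pattern, whatever the column type, is a half-open range:
SELECT COUNT(*)
FROM customer_migration_details
WHERE date_opened >= '2022-01-01'
  AND date_opened <  '2023-01-01'   -- includes all of 2022-12-31, even for DATETIME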
LIKE with an initial wildcard is a performance problem. Are there multiple "roles" with "Sales" in them?
My brain gets scrambled when I see RIGHT JOIN. Can it be turned around to be a LEFT JOIN? Anyway, it seems to be an INNER JOIN.
Please provide EXPLAIN SELECT ...
Use a JOIN with GROUP BY or use a correlated sub-query, but not both at the same time.
SELECT
accounts.first_name,
accounts.last_name,
accounts.email,
COUNT(customer_migration_details.id) AS new_customers
FROM
accounts
LEFT JOIN
customer_migration_details
ON customer_migration_details.assigned_to = accounts.email
AND customer_migration_details.date_opened BETWEEN '2022-01-01' AND '2022-12-31'
WHERE
accounts.role LIKE '%Sales%'
GROUP BY
accounts.email
Or...
SELECT
accounts.first_name,
accounts.last_name,
accounts.email,
(
SELECT
COUNT(ID)
FROM
customer_migration_details
WHERE
date_opened BETWEEN '2022-01-01' AND '2022-12-31'
AND assigned_to = accounts.email
)
AS new_customers
FROM
accounts
WHERE
accounts.role LIKE '%Sales%'
Notes:
It's bad practice to put spaces, etc., in column names, so I changed New Customers to new_customers.
LIKE '%Sales%' can't use an index, so will scan each and every account row.
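If the set of roles is known, one way around the leading wildcard (a sketch, assuming you are free to change the schema; the table and column names here are illustrative) is to split the roles into a child table, so the filter becomes an exact, indexable match:
-- hypothetical one-row-per-role table
CREATE TABLE account_roles (
    email VARCHAR(255) NOT NULL,
    role  VARCHAR(64)  NOT NULL,
    PRIMARY KEY (email, role)
);

-- the filter becomes an indexed equality test instead of a table scan
SELECT a.first_name, a.last_name, a.email
FROM accounts a
JOIN account_roles r ON r.email = a.email
WHERE r.role = 'Sales';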
I have run several tests to optimize the query below, but none of them helped.
What I tried:
Add extra indexes
Change the query logic by checking other attributes as well in the IN clause
Test the suggestions of online query optimization tools (EverSQL etc.)
Indexes I am using:
radacct (`_accttime`);
radacct (`username`);
radacct (`acctstoptime`,`_accttime`);
Complete query:
(SELECT *
FROM `radacct`
WHERE (radacct._accttime > NOW() - INTERVAL 1.2 HOUR)
AND radacct.acctstoptime IN
(SELECT MAX(radacct.acctstoptime)
FROM `radacct`
GROUP BY radacct.username) )
UNION
(SELECT *
FROM `radacct`
WHERE (radacct._accttime >= DATE_SUB(NOW(), INTERVAL 2 MONTH)
AND radacct.acctstoptime IS NULL) )
When I execute the SELECT statements above by themselves, they only take a few milliseconds.
The issue is with the IN clause; it is the combined query that takes ages.
As I see it, your problem is the dependent subquery in your IN clause. Apparently the optimizer doesn't realize that the subquery is effectively independent of the outer row (also, the query might be suboptimal). Essentially, the subquery is executed once for each row, which is bad.
Now we have to find out which part makes it dependent, because it isn't really. My first try would be to give it a different alias:
IN (SELECT MAX(inner.acctstoptime) FROM radacct AS `inner` GROUP BY inner.username)
If that isn't enough to make it independent, make it a full-blown join (INNER, so that non-joined rows [= non-max rows] are discarded from the result):
INNER JOIN (
    SELECT MAX(inner.acctstoptime) AS maxstoptime, inner.username
    FROM `radacct` AS `inner`
    GROUP BY inner.username
) sub ON (sub.maxstoptime = radacct.acctstoptime)
Hope that does the trick.
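If the derived table itself is slow, an index that matches the grouping may help; with an index on (username, acctstoptime), MySQL can often resolve each user's MAX directly from the index:
ALTER TABLE radacct
    ADD INDEX idx_user_stoptime (username, acctstoptime);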
One caveat: since your result has rows of users with their max acctstoptime, it might, on rare occasions, contain more than one row for a user: a row whose acctstoptime isn't the max for THAT user can still match the max of another user. In the join variant, you can just add another condition to the ON clause. In the IN subquery, you would drop the explicit GROUP BY and add WHERE radacct.username = inner.username (which would indeed make it an explicitly dependent subquery, but the optimizer might be able to handle that).
Update (due to miscommunication ...): the resulting complete query with the join:
(SELECT DISTINCT radacct.*
FROM radacct
INNER JOIN (
SELECT MAX(inner.acctstoptime) as maxstoptime, inner.username
FROM `radacct` AS `inner`
GROUP BY inner.username
) sub ON (sub.maxstoptime=radacct.acctstoptime)
WHERE (_accttime > NOW() - INTERVAL 1.2 HOUR)
)
UNION
(SELECT *
FROM `radacct`
WHERE (_accttime >= DATE_SUB(NOW(),INTERVAL 2 MONTH)
AND acctstoptime IS NULL)
)
You may still add the username comparison in the ON clause.
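Concretely, the join clause would then read:
INNER JOIN (
    SELECT MAX(inner.acctstoptime) AS maxstoptime, inner.username
    FROM `radacct` AS `inner`
    GROUP BY inner.username
) sub ON (sub.maxstoptime = radacct.acctstoptime
      AND sub.username = radacct.username)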
What this query does: it removes the IN selector and forces an intermediate result for the join (for each username, the max acctstoptime). The join then attaches a normal row to an intermediate-result row if and only if its acctstoptime is the max for some user (or for THAT user, if you add the username comparison). A row that doesn't have a max acctstoptime, and thus no join "partner", is discarded from the result (that is what the INNER is for; a LEFT JOIN was somewhat insufficient), leaving only the rows with a max acctstoptime in the first part of the union.
The query below returns the post-wise sum of likes:
SELECT
    tblPost.Post,
    SUM(tblPost.LikeCount),
    CASE WHEN tblPost.Time
        BETWEEN (SELECT CONVERT(VARCHAR(10),
                        DATEADD(DD, DATEDIFF(DD, 0, GETDATE()), -60), 120))
            AND CONVERT(date, GETUTCDATE())
        THEN 'Last 60 Days'
        ELSE 'More Than 1 Year'
    END AS "date type"
FROM tblPost
INNER JOIN tblProfile ON (tblProfile.ID = tblPost.UID)
INNER JOIN tblWatchList ON (tblWatchList.ID = tblProfile.UID)
WHERE dbo.tblPost.Time
    BETWEEN (SELECT CONVERT(VARCHAR(10),
                    DATEADD(DD, DATEDIFF(DD, 0, GETDATE()), -60), 120))
        AND CONVERT(date, GETUTCDATE())
GROUP BY tblPost.Post, tblPost.Time
This is my query and it is working fine, but I want to rewrite it. How can I describe it... in my query I am grouping by two columns (tblPost.Post, tblPost.Time), and that is exactly where I'm getting a problem. I want to rewrite this query in such a way that I can group my result only by tblPost.Post.
Please help me.
Your WHERE clause already settles the two options you've presented for tblPost.Time - you explicitly state you're only ever going to retrieve "Last 60 Days" - so why bother having a whole CASE expression in the query?
And you're joining tables that aren't even represented in the output. So start by cleaning your query up, use some aliases, and drop what you don't need:
SELECT P.Post, SUM(P.LikeCount)
FROM dbo.tblPost P
WHERE P.Time BETWEEN (SELECT CONVERT(VARCHAR(10), DATEADD(DD, DATEDIFF(DD, 0, GETDATE()), -60), 120))
                 AND CONVERT(date, GETUTCDATE())
GROUP BY P.Post
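As an aside, the scalar subquery around CONVERT is redundant, and the window can be expressed with plain date arithmetic (a sketch; it keeps the same 60-day window, and the TotalLikes alias is mine):
SELECT P.Post, SUM(P.LikeCount) AS TotalLikes
FROM dbo.tblPost P
WHERE P.Time BETWEEN DATEADD(DAY, -60, CAST(GETDATE() AS date))
                 AND CAST(GETUTCDATE() AS date)
GROUP BY P.Post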
I have the following query, which I developed from a hint found online for a problem with getting the row holding the maximum value per group; but it's running really slowly.
Having looked online, I'm seeing that WHERE ... IN (SELECT ... GROUP BY) is probably the issue but, to be honest, I'm struggling to find a way around it:
SELECT *
FROM tbl_berths a
JOIN tbl_active_trains b on a.train_uid=b.train_uid
WHERE (a.train_id, a.TimeStamp) in (
SELECT a.train_id, max(a.TimeStamp)
FROM tbl_berths a
GROUP BY a.train_id
)
I'm thinking I possibly need a derived table, but my experience in this area is zero and it's just not working out!
You can move that into a derived table (a subquery in the FROM clause) and also select only the required columns instead of all (*):
SELECT a.train_uid
FROM tbl_berths a
JOIN tbl_active_trains b on a.train_uid = b.train_uid
JOIN (SELECT train_id, max(TimeStamp) as TimeStamp
      FROM tbl_berths
      GROUP BY train_id) T
    on a.train_id = T.train_id
    and a.TimeStamp = T.TimeStamp
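To keep that derived table cheap, it may also help (assuming MySQL, and that these columns aren't already indexed together) to index the grouping columns:
ALTER TABLE tbl_berths
    ADD INDEX idx_train_time (train_id, TimeStamp);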
I have a MySQL query that works fine when I use a WHERE clause, but when I don't use one it never returns any output and finally times out.
I used the EXPLAIN command to check the performance of the query, and in both cases EXPLAIN shows the same number of rows used in the join.
I have attached the image of the EXPLAIN output.
Below is the query.
I couldn't figure out what the problem is here.
Any help is highly appreciated.
Thanks.
SELECT
MCI.CLIENT_ID AS CLIENT_ID, MCI.NAME AS CLIENT_NAME, MCI.PRIMARY_CONTACT AS CLIENT_PRIMARY_CONTACT,
MCI.ADDED_BY AS SP_ID, CONCAT(MUD_SP.FIRST_NAME, ' ', MUD_SP.LAST_NAME) AS SP_NAME,
MCI.FK_PROSPECT_ID AS PROSPECT_ID, MCI.DATE_ADDED AS ADDED_ON,
(SELECT GROUP_CONCAT(LT.TAG_TEXT SEPARATOR ', ')
FROM LK_TAG LT
INNER JOIN M_OBJECT_TAG_MAPPING MOTM
ON LT.PK_ID = MOTM.FK_TAG_ID
WHERE MOTM.FK_OBJECT_ID = MCI.FK_PROSPECT_ID
AND MOTM.OBJECT_TYPE = 1
AND MOTM.IS_ACTIVE = 1
) AS TAGS,
IFNULL(SUM(GET_DIGITS(MMR.RCP_AMOUNT)), 0) AS REVENUE_SO_FAR,
IFNULL(SUM(GET_DIGITS(MMR.RCP_RUPEES)), 0) AS REVENUE_INR,
COUNT(DISTINCT PMI_MONTHLY.PROJECT_ID) AS MONTHLY,
COUNT(DISTINCT PMI_FIXED.PROJECT_ID) AS FIXED,
COUNT(DISTINCT PMI_HOURLY.PROJECT_ID) AS HOURLY,
COUNT(DISTINCT PMI_ANNUAL.PROJECT_ID) AS ANNUAL,
COUNT(DISTINCT PMI_CURRENTLY_RUNNING.PROJECT_ID) AS CURRENTLY_RUNNING_PROJECTS,
COUNT(DISTINCT PMI_YET_TO_START.PROJECT_ID) AS YET_TO_START_PROJECTS,
COUNT(DISTINCT PMI_TECH_SALES_CLOSED.PROJECT_ID) AS TECH_SALES_CLOSED_PROJECTS
FROM
M_CLIENT_INFO MCI
INNER JOIN M_USER_DETAILS MUD_SP
ON MCI.ADDED_BY = MUD_SP.PK_ID
LEFT OUTER JOIN M_MONTH_RECEIPT MMR
ON MMR.CLIENT_ID = MCI.CLIENT_ID
LEFT OUTER JOIN M_PROJECT_INFO PMI_FIXED
ON PMI_FIXED.CLIENT_ID = MCI.CLIENT_ID AND PMI_FIXED.PROJECT_TYPE = 1
LEFT OUTER JOIN M_PROJECT_INFO PMI_MONTHLY
ON PMI_MONTHLY.CLIENT_ID = MCI.CLIENT_ID AND PMI_MONTHLY.PROJECT_TYPE = 2
LEFT OUTER JOIN M_PROJECT_INFO PMI_HOURLY
ON PMI_HOURLY.CLIENT_ID = MCI.CLIENT_ID AND PMI_HOURLY.PROJECT_TYPE = 3
LEFT OUTER JOIN M_PROJECT_INFO PMI_ANNUAL
ON PMI_ANNUAL.CLIENT_ID = MCI.CLIENT_ID AND PMI_ANNUAL.PROJECT_TYPE = 4
LEFT OUTER JOIN M_PROJECT_INFO PMI_CURRENTLY_RUNNING
ON PMI_CURRENTLY_RUNNING.CLIENT_ID = MCI.CLIENT_ID AND PMI_CURRENTLY_RUNNING.STATUS = 4
LEFT OUTER JOIN M_PROJECT_INFO PMI_YET_TO_START
ON PMI_YET_TO_START.CLIENT_ID = MCI.CLIENT_ID AND PMI_YET_TO_START.STATUS < 4
LEFT OUTER JOIN M_PROJECT_INFO PMI_TECH_SALES_CLOSED
ON PMI_TECH_SALES_CLOSED.CLIENT_ID = MCI.CLIENT_ID AND PMI_TECH_SALES_CLOSED.STATUS > 4
WHERE YEAR(MCI.DATE_ADDED) = '2012'
GROUP BY MCI.CLIENT_ID ORDER BY CLIENT_NAME ASC
Yes, as many people have said, the key is that when you have the WHERE clause, the MySQL engine filters the table M_CLIENT_INFO, probably dramatically.
You can reproduce the effect of removing the WHERE clause by adding this WHERE clause instead:
where 1 = 1
You will see that performance degrades in the same way, because MySQL will try to fetch all the data.
Remove the WHERE clause and all columns from the SELECT, and add a COUNT to see how many records you get (see the sketch after this list). If the number is reasonable, say up to 10k, then do the following:
put back the SELECT columns related to M_CLIENT_INFO
do not include the nested one, "TAGS"
remove all your joins
run your query without the WHERE clause and gradually re-include the joins
This way you'll find out which join causes the timeout.
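The first step could look like this (a sketch):
-- step 1: row count without the WHERE clause and without the joins
SELECT COUNT(*) FROM M_CLIENT_INFO;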
I would try the following. First, MySQL has a keyword "STRAIGHT_JOIN" which tells the optimizer to join the tables in the order you've specified. Since all your LEFT JOINs are child-related (like lookup tables), you don't want MySQL to try to interpret one of them as the primary basis of the query.
SELECT STRAIGHT_JOIN ... rest of query.
Next, your M_PROJECT_INFO table. I don't know how many columns it has, but you appear to be concentrating on just a few of them in your DISTINCT aggregates. I would make sure you have a covering index on these elements to help the query, via an index on
( Client_ID, Project_Type, Status, Project_ID )
This way the engine can apply the criteria and resolve the DISTINCTs entirely from the index instead of having to go back to the raw data pages.
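As DDL (the index name is arbitrary):
ALTER TABLE M_PROJECT_INFO
    ADD INDEX idx_project_cover (CLIENT_ID, PROJECT_TYPE, STATUS, PROJECT_ID);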
Third, your M_CLIENT_INFO table. Ensure it has an index covering your criteria, your GROUP BY, AND your ORDER BY, and change your ORDER BY from the aliased "CLIENT_NAME" to the actual column of the table so it matches the index:
( Date_Added, Client_ID, `Name` )
I have "name" in ticks as it is also a reserved word and helps clarify the column, not the keyword.
Next, the WHERE clause. Whenever you apply a function to an indexed column, the index can't be used well, especially on date/time fields... You might want to change your WHERE clause to
WHERE MCI.Date_Added between '2012-01-01' and '2012-12-31 23:59:59'
so the BETWEEN range covers the entire year and the index can be better utilized.
Finally, if the above don't help, I would consider splitting your query up. The GROUP_CONCAT inline select for the TAGS might be a bit of a killer for you. You might want to get all the distinct elements for the grouping per client first, THEN get those details... Something like
select
    PQ.*,
    group_concat(...) tags
from
    ( the entire primary part of the query ) as PQ
    left join yourGroupConcatTableBasis on key columns
In the following query, I show the latest status of each sale (by stage, in this case stage 3). The query is based on a subquery over the sale's status history:
SELECT v.id_sale,
IFNULL((
SELECT (CASE WHEN IFNULL( vec.description, '' ) = ''
THEN ve.name
ELSE vec.description
END)
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
WHERE veh.id_sale = v.id_sale
AND vec.id_stage = 3
ORDER BY veh.id_record DESC
LIMIT 1
), 'x') sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
WHERE 1 =1
AND v.flag =1
AND v.id_quarters =4
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
The query takes 0.0057 s and returns 1011 records.
Because I have to filter the sales by the name of the state, which would mean repeating the subquery in a WHERE clause, I decided to rewrite the same query using joins. In this case, I'm using the MAX function to obtain the latest status:
SELECT
v.id_sale,
IFNULL(veh3.State3,'x') AS sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
LEFT JOIN (
SELECT veh.id_sale,
(CASE WHEN IFNULL(vec.description,'') = ''
THEN ve.name
ELSE vec.description END) AS State3
FROM t_record veh
INNER JOIN (
SELECT id_sale, MAX(id_record) AS max_rating
FROM (
SELECT veh.id_sale, id_record
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign AND vec.id_stage = 3
) m
GROUP BY id_sale
) x ON x.max_rating = veh.id_record
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
) veh3 ON veh3.id_sale = v.id_sale
WHERE v.flag = 1
AND v.id_quarters = 4
This query shows the same results (1011), but the problem is that it takes 0.0753 sec.
Reviewing the possibilities, I found the factor that makes the difference in the speed of the query:
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
If I remove this clause, both queries take the same time... Why does it work better? Is there any way to use this clause with the joins? I hope for your help.
EDIT
Here are the EXPLAIN results for each query, respectively (q1 and q2, attached as images).
Interesting - so that little statement basically determines whether there is a match between t_record.id_sale and t_sale.id_sale.
Why is this making your query run faster? Because WHERE conditions are applied before the subselects in the SELECT list are evaluated; if there is no record to go with the sale, it doesn't bother processing the subselect, which is netting you some time. So that's why it works better.
Is it going to work in your join syntax? I don't really know without having your tables to test against, but you can always just apply it to the end and find out. Add the keyword EXPLAIN to the beginning of your query and you will get a plan of execution, which will help you optimize things. Probably the best way to get better results with your join syntax is to add some indexes to your tables.
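Applying it to your join version would just mean extending its WHERE clause, e.g.:
WHERE v.flag = 1
  AND v.id_quarters = 4
  AND EXISTS (
      SELECT 1
      FROM t_record
      WHERE id_sale = v.id_sale
      LIMIT 1
  )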
But I ask you: is this even necessary? You have a query returning in under eight hundredths of a second. Unless it is being run thousands of times an hour, it is not really taxing your DB at all, and your time is probably better spent making improvements elsewhere in your application.