MySql is null vs is not null performance - mysql

I have a query where I am basically doing a left outer join and checking if the joined value is null
select count(T1.code)
from ( select code
from asset
where type = 'meter'
and creation_time <= '2022-04-29 00:00:00'
and (deactivation_time > '2022-04-28 00:00:00' or deactivation_time is null )
group by code
) as T1
left join ( select asset_code
from amr_midnight_data
where server_time between '2022-04-28 00:00:00' and '2022-04-29 00:00:00'
group by asset_code
) as T2 on T1.code = T2.asset_code
Where T2.asset_code is null;
This query takes 3 seconds to execute, but if I replace the is null at the end with is not null, it takes less then a second. Why is there a performance difference here and what alternatives do I have to make my original query faster?

Look at the EXPLAIN. A guess... Changing to IS NOT NULL lets the Optimizer change LEFT JOIN to JOIN, which lets it start with amr_midnight_data which might optimize better.
I think that the LEFT JOIN ( SELECT ... ) .. IS [NOT] NULL can be replaced with
WHERE [NOT] EXISTS ( SELECT 1 FROM amr_midnight_data
WHERE asset_code = T1.code
AND server_time >= '2022-04-28'
AND server_time < '2022-04-28' + INTERVAL 1 DAY )
That would like to have INDEX(asset_code, server_time)
EXISTS is faster than SELECT .. GROUP BY because it can stop as soon as one matching row is found.
asset would probably benefit from INDEX(type, creation_time) or (to make it "covering"):
INDEX(time, creation_time, deactivation_time, code)
If you wish to discuss further, please provide SHOW CREATE TABLE for both tables and EXPLAIN for each SELECT.

Related

Query taking lot of time to execute

I am trying to run a query to get data one time from a client database to our database but a query is taking a lot of time to execute, when I change the order by from primary key user_appoint.id to user_appoint.u_id below is my query
SELECT
CONCAT('D',user_appoint.`id`) AS ApptId,
user_appoint.`u_id`,
tbl_questions.CandAns,
tbl_questions.ExamAns,
tbl_questions.QueNote,
CONCAT("[",GROUP_CONCAT(CONCAT('"',`tbl_investigations`.`test_id`,'":"',tbl_investigations.`result`,'"')),"]") AS CandInv,
CONCAT("[",GROUP_CONCAT(CONCAT('"',`tbl_investigations`.`test_id`,'":"',tbl_investigations.`comments`,'"')),"]") AS IntComm,
IF(tbl_questions.LastUpdatedDateTime>MAX(tbl_investigations.`ModifiedAt`),tbl_questions.LastUpdatedDateTime,MAX(tbl_investigations.`ModifiedAt`)) AS LastUpdatedDateTime,
CONCAT('D',user_appoint.`id`) AS UniqueId
FROM user_appoint
LEFT JOIN tbl_investigations ON tbl_investigations.`appt_id`=user_appoint.`id` AND tbl_investigations.`ModifiedAt`>'2011-01-01 00:00:00'
LEFT JOIN tbl_questions ON tbl_questions.`appt_id` =user_appoint.`id` AND tbl_questions.`LastUpdatedDateTime`>'2011-01-01 00:00:00'
GROUP BY user_appoint.`id`
HAVING LastUpdatedDateTime>'2011-01-01 00:00:00'
ORDER BY user_appoint.`u_id`
LIMIT 0, 2000;
user_appoint.u_id is properly indexed.
Please check the explain plan of your query. And its better to always share explain plan with your original question.
explain format=json
SELECT CONCAT('D',user_appoint.id) AS ApptId, user_appoint.u_id,
tbl_questions.CandAns, tbl_questions.ExamAns, tbl_questions.QueNote,
CONCAT("[",GROUP_CONCAT(CONCAT('"',tbl_investigations.test_id,'":"',tbl_investigations.result,'"')),"]")
AS CandInv,
CONCAT("[",GROUP_CONCAT(CONCAT('"',tbl_investigations.test_id,'":"',tbl_investigations.comments,'"')),"]")
AS IntComm,
IF(tbl_questions.LastUpdatedDateTime>MAX(tbl_investigations.ModifiedAt),tbl_questions.LastUpdatedDateTime,MAX(tbl_investigations.ModifiedAt))
AS LastUpdatedDateTime, CONCAT('D',user_appoint.id) AS UniqueId FROM
user_appoint LEFT JOIN tbl_investigations ON
tbl_investigations.appt_id=user_appoint.id AND
tbl_investigations.ModifiedAt>'2011-01-01 00:00:00' LEFT JOIN
tbl_questions ON tbl_questions.appt_id =user_appoint.id AND
tbl_questions.LastUpdatedDateTime>'2011-01-01 00:00:00' GROUP BY
user_appoint.id HAVING LastUpdatedDateTime>'2011-01-01 00:00:00'
ORDER BY user_appoint.u_id LIMIT 0, 2000;
On looking at your query,I could see lot of concat,aggregate function and join is being performed in single query.
These operations will be performed for all 2000 records as you have set limit on query execution.
This might have caused query to slow down its execution.
You have 2 identical columns with different aliases
CONCAT('D',user_appoint.`id`) AS ApptId,
CONCAT('D',user_appoint.`id`) AS UniqueId
(changed) Assuming NULLs may occur in these date columns then comparing the max() values will overcome any adverse impacts by NULL:
if(max(tbl_questions.lastupdateddatetime) > max(tbl_investigations.`modifiedat`) , max(tbl_questions.lastupdateddatetime), max(tbl_investigations.`modifiedat`)) AS LastUpdatedDateTime
Try this:
SELECT *
FROM (
SELECT
Concat('D', user_appoint.`id`) AS ApptId
, user_appoint.`u_id`
, tbl_questions.candans
, tbl_questions.examans
, tbl_questions.quenote
, Concat("[", Group_concat(Concat('"', `tbl_investigations`.`test_id`, '":"', tbl_investigations.`result`, '"')), "]") AS CandInv
, Concat("[", Group_concat(Concat('"', `tbl_investigations`.`test_id`, '":"', tbl_investigations.`comments`, '"')), "]") AS IntComm
, if(max(tbl_questions.lastupdateddatetime) > max(tbl_investigations.`modifiedat`) , max(tbl_questions.lastupdateddatetime), max(tbl_investigations.`modifiedat`) ) AS LastUpdatedDateTime
, Concat('D', user_appoint.`id`) AS UniqueId
FROM user_appoint
LEFT JOIN tbl_investigations
ON tbl_investigations.`appt_id` = user_appoint.`id`
AND tbl_investigations.`modifiedat` > '2011-01-01 00:00:00'
LEFT JOIN tbl_questions
ON tbl_questions.`appt_id` = user_appoint.`id`
AND tbl_questions.`lastupdateddatetime` > '2011-01-01 00:00:00'
GROUP BY user_appoint.`id`
HAVING lastupdateddatetime > '2011-01-01 00:00:00'
) d
ORDER BY `u_id`
LIMIT 0, 2000
;
HOWEVER
You are using a non-current and non-standard form of GROUP BY clause. MySQL started life allowing this bizarre situation where you could select many columns but only group by one of those. This is completely non-standard for SQL.
In recent versions of MySQL the default settings have changed and using just one column in the GROUP BY clause will cause an error.
So, you may have to change the way you perform the grouping to
GROUP BY
user_appoint.`id`
, user_appoint.`u_id`
, tbl_questions.candans
, tbl_questions.examans
, tbl_questions.quenote
If none of these improve performance please provide the execution plan (as text).

How to decrease Query time of this Query with InnoDB engine

I am trying to optimize this query. Now it takes 28 seconds.
AS used to be missing in my query. After adding, query time dropped 20%
SELECT
g.id,
g.adresid,
g.senaryoid,
g.olayid,
g.gonderilecegitarih
FROM
(
SELECT
adresid
FROM
expose2.800_emsenaryolar_emgidenbulten
WHERE
olayid = '3320'
) AS s
RIGHT JOIN expose2.800_emsenaryolar_emgidenbulten AS g ON s.adresid = g.adresid
WHERE
s.adresid IS NULL
AND g.olayid = '2784'
AND g.durum = '1'
AND g.gonderilecegitarih < DATE_SUB(
'2015-05-13 15:40:15',
INTERVAL 1 DAY
)
If you use s.adresid IS NULL condition in subquery it will join faster then more rows ...
SELECT
g.id,
g.adresid,
g.senaryoid,
g.olayid,
g.gonderilecegitarih
FROM (
SELECT adresid FROM expose2.800_emsenaryolar_emgidenbulten WHERE olayid = '3320' and s.adresid IS NULL
) AS s
RIGHT JOIN expose2.800_emsenaryolar_emgidenbulten AS g ON s.adresid = g.adresid
AND g.olayid = '2784'
AND g.durum = '1'
AND g.gonderilecegitarih < DATE_SUB(
'2015-05-13 15:40:15',
INTERVAL 1 DAY
)
still this query optimized using self join.
For added speed, add this composite index to g:
INDEX(olayid, durum, gonderilecegitarih)
Please provide SHOW CREATE TABLE 800_emsenaryolar_emgidenbulten; I want to verify that you also have an index on adresid.

Let me know if i can optimize this query?

does anybody have any suggestions to optimize this query :
SELECT COUNT(client_id) FROM (
SELECT client_id FROM `cdr` WHERE
(DATE(start) BETWEEN '2014-04-21' AND '2014-04-25') AND
`service` = 'test'
GROUP BY client_id
HAVING SUM(duration) > 300
)as t1
The problem is , inner query scans millions of rows and returns thousands of rows and it makes main query lazy.
Thanks.
Try below query, avoid functions in searching as it does not use index and kill the performance. "start" columns must be indexed.
SELECT COUNT(client_id) FROM (
SELECT client_id FROM `cdr` WHERE
(start BETWEEN '2014-04-21 00:00:00' AND '2014-04-25 23:59:59') AND
`service` = 'test'
GROUP BY client_id
HAVING SUM(duration) > 300
)as t1
Why not this?I have read somewhere that comparing dates directly with < or > works faster than Between.
SELECT Count(client_id) FROM `cdr`
WHERE DATE(start) >= '2014-04-21'
AND DATE(start) <= '2014-04-25'
AND `service` = 'test'
GROUP BY client_id
HAVING SUM(duration) > 300
What was logic behind having sub query in your sql?
you can use direct query instead of sub query like as
SELECT COUNT(client_id) FROM `cdr`
WHERE (DATE(start) BETWEEN '2014-04-21' AND '2014-04-25')
AND `service` = 'test'
GROUP BY client_id
HAVING SUM(duration) > 300

MySQL Query optimization with JOIN and COUNT

I have the following MySQL Query:
SELECT t1.id, t1.releaseid, t1.site, t1.date, t2.pos FROM `tracers` as t1
LEFT JOIN (
SELECT `releaseid`, `date`, COUNT(*) AS `pos`
FROM `tracers` GROUP BY `releaseid`
) AS t2 ON t1.releaseid = t2.releaseid AND t2.date <= t1.date
ORDER BY `date` DESC , `pos` DESC LIMIT 0 , 100
The idea being to select a release and count how many other sites had also released it prior to the recorded date, to get the position.
Explain says:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t1 ALL NULL NULL NULL NULL 498422 Using temporary; Using filesort
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 91661
2 DERIVED tracers index NULL releaseid 4 NULL 498422
Any suggestions on how to eliminate the Using temporary; Using filesort ? It takes a loooong time. The indexes I have thought of and tried haven't helped anything.
Try adding an index on tracers.releaseid and one on tracers.date
make sure you have an index on releaseid.
flip your JOIN, the sub-query must be on the left side in the LEFT JOIN.
put the ORDER BY and LIMIT clauses inside the sub-query.
Try having two indices, one on (date) and one on (releaseid, date).
Another thing is that your query does not seem to be doing what you describe it does. Does it actually count correctly?
Try rewriting it as:
SELECT t1.id, t1.releaseid, t1.site, t1.`date`
, COUNT(*) AS pos
FROM tracers AS t1
JOIN tracers AS t2
ON t2.releaseid = t1.releaseid
AND t2.`date` <= t1.`date`
GROUP BY t1.releaseid
ORDER BY t1.`date` DESC
, pos DESC
LIMIT 0 , 100
or as:
SELECT t1.id, t1.releaseid, t1.site, t1.`date`
, ( SELECT COUNT(*)
FROM tracers AS t2
WHERE t2.releaseid = t1.releaseid
AND t2.`date` <= t1.`date`
) AS pos
FROM tracers AS t1
ORDER BY t1.`date` DESC
, pos DESC
LIMIT 0 , 100
This answer below maybe not change explain output, however if your major problem is sorting data, which it identified by removing order clause will makes your query run faster, try to sort your subquery join table first and your query will be:
SELECT t1.id, t1.releaseid, t1.site, t1.date, t2.pos FROM `tracers` as t1
LEFT JOIN (
SELECT `releaseid`, `date`, COUNT(*) AS `pos`
FROM `tracers` GROUP BY `releaseid`
ORDER BY `pos` DESC -- additional order
) AS t2 ON t1.releaseid = t2.releaseid AND t2.date <= t1.date
ORDER BY `date` DESC , `pos` DESC LIMIT 0 , 100
Note: My db version is mysql-5.0.96-x64, maybe in another version you get different result.

Sub query or Join which is the optimal solution?

i've 2 tables, 1 user table and a table where the
outgoing emails are queued. I want to select the users
that are not online for a certain amount of time
and send them an email. I also want that, if they
already received such an email in the last 7 days
or have an scheduled email for the next 7 days, that
they are not selected.
I have 2 queries, which i think would be great if
they are working with subqueries.
As an area of which i'm not an expert in, i would
like to kindly invite you to either,
Build a subquery of the second query
Make a JOIN and exclude the second query results.
I would be far more then happy :)
Thank you for reading
SELECT
`user_id`
FROM
`user`
WHERE
DATEDIFF( CURRENT_DATE(), date_seen ) >= 7
The results of the second query should be excluded
from the query above.
SELECT
`mail_queue_id`,
`mail_id`,
`user_id`,
`status`,
`date_scheduled`,
`date_processed`
FROM
`mail_queue`
WHERE
(
DATEDIFF( CURRENT_DATE(), date_scheduled ) >= 7
OR
DATEDIFF( date_scheduled, CURRENT_DATE() ) <= 7
)
AND
(
`mail_id` = 'inactive_week'
AND
(
`status` = 'AWAITING'
OR
`status` = 'DELIVERED'
)
)
SOLUTION
SELECT
`user_id`
FROM
`user` as T1
WHERE
DATEDIFF( CURRENT_DATE(), date_seen ) >= 7
AND NOT EXISTS
(
SELECT
`user_id`
FROM
`mail_queue` as T2
WHERE
T2.`user_id` = T1.`user_id`
AND
(
DATEDIFF( CURRENT_DATE(), date_scheduled ) >= 7
OR
DATEDIFF( date_scheduled, CURRENT_DATE() ) <= 7
AND
(
`mail_id` = 'inactive_week'
AND
(
`status` = 'AWAITING'
OR
`status` = 'DELIVERED'
)
)
)
)
YOu can select the users who match the first criterion (not having logged on in the past seven days) and then "AND" that criterion to another clause using "NOT EXISTS", aliasing the same table:
select * from T where {first criterion}
and not exists
(
select * from T as T2 where T2.userid = T.userid
and ABS( DATEDIFF(datescheduled, CURRENT_DATE()) ) <=7
)
I'm not familiar with the nuances of the mysql DATEDIFF, i.e. whether it matters which date value appears in which position, but the absolute value would make it so that if the user had been sent a notice in the past 7 days or is scheduled to receive a notice in the next seven days, they would satisfy the condition, and thereby fail the NOT EXISTS condition, excluding that user from your final set.