Can a SELECT statement trigger a deadlock on a table in MySQL? - mysql

The SQL below is inside a MySQL stored procedure.
A cron job runs the procedure once a day at midnight to populate the report table with the result.
The procedure takes around 2 minutes to run.
Please note that table1 has millions of records.
I scheduled it for midnight because there are INSERT/UPDATE transactions during the day, but unfortunately a few transactions also run at night.
If other transactions are running while the procedure runs, a deadlock error occurs on table1.
My questions are:
Why does a SELECT statement cause a deadlock on table1?
How can I avoid deadlocks in this kind of situation?
DROP TABLE IF EXISTS report;
CREATE TABLE IF NOT EXISTS report AS (
SELECT
DISTINCT
companies.id company_id,
(
SELECT
SUM(`message_count`) single_phone
FROM
`table1`
WHERE
`table1`.`company_id` = companies.id
AND
`status` != 'error'
) AS single_phone,
(
SELECT
SUM(`message_count`)
FROM
`table1`
WHERE
`table1`.`company_id` = companies.id
AND
`status` != 'not error'
) AS log,
(
SELECT
SUM(`message_count`)
FROM
`table1`
WHERE
`table1`.`company_id` = companies.id
AND
`status` != 'error'
) AS log_monthly,
(
SELECT
SUM(`number_of_sms`) AS aggregate
FROM
`messages`
WHERE
`messages`.`company_id` = companies.id
) AS p_monthly
FROM
companies
INNER JOIN company_users ON companies.id = company_users.company_id
WHERE
company_users.confirmed = 1
AND
company_users.deleted_at IS NULL
);

Thank you very much for the help, but I have found the problem. Yes, this procedure causes the deadlock on the table, but the actual cause of the issue is that I had put ->everyMinute() in my Laravel Kernel for the schedule run, and there is also a cron job, configured by another developer, that runs the same thing every minute. Together they ran the schedule every minute, and that is the real cause of the deadlock problem. I have changed my Kernel schedule to ->dailyAt('02:00'); and now the problem is solved.

Your field-level queries should be done ONCE in the FROM clause, so the pre-aggregates are computed ONCE per company ID, and left-joined in case a given company does NOT have qualifying records in a given category. Additionally, your query for single_phone is the same as the one for log_monthly, but there are no criteria on the dates of activity to separate a single month from the overall total of everything. So I added a WHERE clause for that filtering, but I am only GUESSING that such a date column exists.
This query might substantially improve your performance. By moving the per-company COLUMN-based queries into their own subqueries via LEFT JOIN, each will be SUM()med and grouped by company ONCE, and then joined for the final result. COALESCE() is used so that if no such counts exist, the value returned will be 0 instead of NULL.
-- Set the date boundaries up front, in case you do need to filter by a month as a couple of your
-- column result names imply; nothing else in your query indicates filtering by a given month.
SET @yesterday := DATE_SUB( CURDATE(), INTERVAL 1 DAY );
SET @beginOfThatMonth := DATE_SUB( @yesterday, INTERVAL DAYOFMONTH( @yesterday ) - 1 DAY );
DROP TABLE IF EXISTS report;
CREATE TABLE IF NOT EXISTS report AS (
-- DISTINCT because a company with several confirmed users would otherwise repeat
SELECT DISTINCT
c.id company_id,
coalesce( PhoneSum.Msgs, 0 ) as Single_Phone,
coalesce( PhoneLog.Msgs, 0 ) as Log,
coalesce( MonthLog.Msgs, 0 ) as Log_Monthly,
coalesce( SMSSummary.Aggregate, 0 ) as p_monthly
from
companies c
INNER JOIN company_users cu
ON c.id = cu.company_id
AND cu.confirmed = 1
AND cu.deleted_at IS NULL
LEFT JOIN
-- no join to company_users inside the subqueries: it would count each row once per
-- confirmed user, and the outer join above already restricts the companies
( SELECT
t.company_id,
SUM( t.message_count ) Msgs
FROM
table1 t
where
t.status != 'error'
GROUP BY
t.company_id ) AS PhoneSum
on c.id = PhoneSum.company_id
LEFT JOIN
( SELECT
t.company_id,
SUM( t.message_count ) Msgs
FROM
table1 t
where
t.status != 'not error'
GROUP BY
t.company_id ) AS PhoneLog
on c.id = PhoneLog.company_id
LEFT JOIN
( SELECT
t.company_id,
SUM( t.message_count ) Msgs
FROM
table1 t
where
t.status != 'error'
-- this only gets counts of activity for the month that yesterday falls in,
-- since you are running at night and need the day before the current one
AND t.SomeDateFieldOnTable1 >= @beginOfThatMonth
GROUP BY
t.company_id ) AS MonthLog
on c.id = MonthLog.company_id
LEFT JOIN
( SELECT
m.company_id,
SUM( m.number_of_sms ) aggregate
FROM
messages m
where
m.SomeDateFieldOnMessagesTable >= @beginOfThatMonth
GROUP BY
m.company_id ) AS SMSSummary
on c.id = SMSSummary.company_id
);
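The pre-aggregate-then-LEFT-JOIN idea can be tried on a toy data set. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows and the derived-table alias ok are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption: this query
# shape behaves the same way on both engines).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (id INTEGER PRIMARY KEY);
CREATE TABLE table1 (company_id INTEGER, status TEXT, message_count INTEGER);
INSERT INTO companies (id) VALUES (1), (2), (3);
-- hypothetical sample rows
INSERT INTO table1 (company_id, status, message_count) VALUES
    (1, 'sent', 10), (1, 'sent', 5), (1, 'error', 99),
    (2, 'error', 7);
""")

# The derived table is aggregated once per company, then left-joined so that
# companies without qualifying rows still appear; COALESCE turns NULL into 0.
rows = conn.execute("""
SELECT c.id, COALESCE(ok.msgs, 0) AS single_phone
FROM companies c
LEFT JOIN (
    SELECT company_id, SUM(message_count) AS msgs
    FROM table1
    WHERE status != 'error'
    GROUP BY company_id
) AS ok ON ok.company_id = c.id
ORDER BY c.id
""").fetchall()

print(rows)
```

Company 2 has only 'error' rows and company 3 has no rows at all, so both come back with 0 rather than NULL.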

Related

GroupBy clause removing all null column values

I have written the following query, in which I use a GROUP BY clause on the server column:
select s.server, MAX(s.ipAddress) as ipAddress,
MAX(r.stacks->>"$[0].name") as stackName,
MAX(a.aMessage) as aMessage
from environments e
inner join servers s
on e.objectId = s.environmentId
inner join resources r
on e.objectId = r.environmentId
inner join audits a
on a.id = (select max(a.id) from audits a where a.logObjId = s.cAudit)
WHERE dateSubmitted BETWEEN NOW() - INTERVAL 90 DAY AND NOW()
Group by s.server
ORDER BY dateSubmitted;
However, the server column may contain NULL values alongside a valid ipAddress and stackName.
How can I modify the query so that rows with a NULL server are not left out?
Expected sample data:
server  ipAddress  stackName  aMessage
NULL    NULL       Stack A    Searching for IP pool
NULL    NULL       Stack B    Message XYZ
NULL    NULL       Stack A    Message ABC
It seems the INNER JOINs used to join the tables remove the rows with NULL values from the result, so I modified the query to start from servers and use LEFT JOINs. Try this one and check whether all the rows of the servers table now appear, including those with a NULL server.
select s.server, MAX(s.ipAddress) as ipAddress,
MAX(r.stacks->>"$[0].name") as stackName,
MAX(a.aMessage) as aMessage
from servers s
left join environments e
on e.objectId = s.environmentId
left join resources r
on e.objectId = r.environmentId
left join audits a
on a.id = (select max(a.id) from audits a where a.logObjId = s.cAudit)
WHERE dateSubmitted BETWEEN NOW() - INTERVAL 90 DAY AND NOW()
Group by s.server
ORDER BY dateSubmitted;

Improving the performance of sql joined count query

In my application, users can create campaigns for sending messages. When a campaign tries to send a message, one of three things can happen:
The message is suppressed and not let through
The message can't reach the recipient and is considered failed
The message is successfully delivered
To keep track of this, I have the following table:
My problem is that when the application has processed a lot of messages (more than 10 million), the query I use for showing campaign statistics for the user slows down by a considerable margin (~ 15 seconds), even when there are only a few (~ 10) campaigns being displayed for the user.
Here is the query I'm using:
select `campaigns`.*, (select count(*) from `processed_messages`
where `campaigns`.`id` = `processed_messages`.`campaign_id` and `status` = 'sent') as `messages_sent`,
(select count(*) from `processed_messages` where `campaigns`.`id` = `processed_messages`.`campaign_id` and `status` = 'failed') as `messages_failed`,
(select count(*) from `processed_messages` where `campaigns`.`id` = `processed_messages`.`campaign_id` and `status` = 'supressed') as `messages_supressed`
from `campaigns` where `user_id` = 1 and `campaigns`.`deleted_at` is null order by `updated_at` desc;
So my question is: how can I make this query run faster? I believe there should be some way of not having to use sub-queries multiple times but I am not very experienced with MySQL syntax yet.
You should write this as a single join, using conditional aggregation:
SELECT
c.*,
COUNT(CASE WHEN pm.status = 'sent' THEN 1 END) AS messages_sent,
COUNT(CASE WHEN pm.status = 'failed' THEN 1 END) AS messages_failed,
COUNT(CASE WHEN pm.status = 'suppressed' THEN 1 END) AS messages_suppressed
FROM campaigns c
LEFT JOIN processed_messages pm
ON c.id = pm.campaign_id
WHERE
c.user_id = 1 AND
c.deleted_at IS NULL
GROUP BY
c.id
ORDER BY
c.updated_at DESC;
It should be noted that at first glance, doing SELECT c.* appears to be a violation of the GROUP BY rules which say that only columns which appear in the GROUP BY clause can be selected. However, assuming that campaigns.id is the primary key column, then there is nothing wrong with selecting all columns from this table, provided that we aggregate by the primary key.
Edit:
If the above query does not run on your MySQL server version, failing with an error message about ONLY_FULL_GROUP_BY, then use this version:
SELECT c1.*, c2.messages_sent, c2.messages_failed, c2.messages_suppressed
FROM campaigns c1
INNER JOIN
(
SELECT
c.id,
COUNT(CASE WHEN pm.status = 'sent' THEN 1 END) AS messages_sent,
COUNT(CASE WHEN pm.status = 'failed' THEN 1 END) AS messages_failed,
COUNT(CASE WHEN pm.status = 'suppressed' THEN 1 END) AS messages_suppressed
FROM campaigns c
LEFT JOIN processed_messages pm
ON c.id = pm.campaign_id
WHERE
c.user_id = 1 AND
c.deleted_at IS NULL
GROUP BY
c.id
) c2
ON c1.id = c2.id
ORDER BY
c1.updated_at DESC;
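To check that conditional aggregation returns the same counts as the three correlated subqueries, here is a small sketch. It assumes Python's built-in sqlite3 as a stand-in for MySQL, and the sample rows are invented; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE campaigns (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE processed_messages (campaign_id INTEGER, status TEXT);
-- hypothetical sample rows
INSERT INTO campaigns (id, user_id) VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO processed_messages (campaign_id, status) VALUES
    (1, 'sent'), (1, 'sent'), (1, 'failed'), (2, 'suppressed'), (3, 'sent');
""")

# Original shape: one correlated subquery per status.
subquery_sql = """
SELECT c.id,
    (SELECT COUNT(*) FROM processed_messages pm
      WHERE pm.campaign_id = c.id AND pm.status = 'sent'),
    (SELECT COUNT(*) FROM processed_messages pm
      WHERE pm.campaign_id = c.id AND pm.status = 'failed'),
    (SELECT COUNT(*) FROM processed_messages pm
      WHERE pm.campaign_id = c.id AND pm.status = 'suppressed')
FROM campaigns c
WHERE c.user_id = 1
ORDER BY c.id
"""

# Rewritten shape: a single join with conditional aggregation.
# COUNT ignores NULL, so a CASE without ELSE counts only matching rows.
joined_sql = """
SELECT c.id,
    COUNT(CASE WHEN pm.status = 'sent' THEN 1 END),
    COUNT(CASE WHEN pm.status = 'failed' THEN 1 END),
    COUNT(CASE WHEN pm.status = 'suppressed' THEN 1 END)
FROM campaigns c
LEFT JOIN processed_messages pm ON pm.campaign_id = c.id
WHERE c.user_id = 1
GROUP BY c.id
ORDER BY c.id
"""

by_subquery = conn.execute(subquery_sql).fetchall()
by_join = conn.execute(joined_sql).fetchall()
print(by_subquery)
print(by_join)
```

Both statements should print the same rows; whether the join is actually faster on 10 million messages still depends on an index on processed_messages(campaign_id, status), which is not shown in the question.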

SQL request optimization

I have an SQL query that takes 100% of my VM's CPU while it runs. I want to know how to optimize it:
SELECT g.name AS hostgroup
, h.name AS hostname
, a.host_id
, s.display_name AS servicename
, a.service_id
, a.entry_time AS ack_time
, ( SELECT ctime
FROM logs
WHERE logs.host_id = a.host_id
AND logs.service_id = a.service_id
AND logs.ctime < a.entry_time
AND logs.status IN (1, 2, 3)
AND logs.type = 1
ORDER BY logs.log_id DESC
LIMIT 1) AS start_time
, ar.acl_res_name AS timeperiod
, a.state AS state
, a.author
, a.acknowledgement_id AS ack_id
FROM centstorage.acknowledgements a
LEFT JOIN centstorage.hosts h ON a.host_id = h.host_id
LEFT JOIN centstorage.services s ON a.service_id = s.service_id
LEFT JOIN centstorage.hosts_hostgroups p ON a.host_id = p.host_id
LEFT JOIN centstorage.hostgroups g ON g.hostgroup_id = p.hostgroup_id
LEFT JOIN centreon.hostgroup_relation hg ON a.host_id = hg.host_host_id
LEFT JOIN centreon.acl_resources_hg_relations hh ON hg.hostgroup_hg_id = hh.hg_hg_id
LEFT JOIN centreon.acl_resources ar ON hh.acl_res_id = ar.acl_res_id
WHERE ar.acl_res_name != 'All Resources'
AND YEAR(FROM_UNIXTIME( a.entry_time )) = YEAR(CURDATE())
AND MONTH(FROM_UNIXTIME( a.entry_time )) = MONTH(CURDATE())
AND a.service_id is not null
ORDER BY a.acknowledgement_id ASC
The problem is this part:
(SELECT ctime FROM logs
WHERE logs.host_id = a.host_id
AND logs.service_id = a.service_id
AND logs.ctime < a.entry_time
AND logs.status IN (1, 2, 3)
AND logs.type = 1
ORDER BY logs.log_id DESC
LIMIT 1) AS start_time
The logs table is really huge, and some friends told me to use a buffer table/database, but I'm pretty new to these things and I don't know how to do it.
Here is an EXPLAIN EXTENDED of the query:
It seems that it examines only 2 rows of the logs table, so why does it take so much time? (There are 560000 rows in the logs table.)
Here are all the indexes of those tables:
centstorage.acknowledgements :
centstorage.hosts :
centstorage.services :
centstorage.hosts_hostgroups :
centstorage.hostgroups :
centreon.hostgroup_relation :
centreon.acl_resources_hg_relations :
centreon.acl_resources :
For SQL Server, you can define the maximum degree of parallelism of your query using MAXDOP.
For example, you can add this at the end of your query:
option (maxdop 2)
I'm pretty sure there's an equivalent in MySql.
You can try this approach if execution time is not critical.
Create a temporary table from the WHERE condition on acknowledgements; its schema has the columns required in the final result and those used in the JOINs with all 7 of your tables.
CREATE TEMPORARY TABLE __tempacknowledgements AS SELECT '' AS hostgroup
, '' AS hostname
, a.host_id
, '' AS servicename
, a.service_id
, a.entry_time AS ack_time
, '' AS start_time
, '' AS timeperiod
, a.state AS state
, a.author
, a.acknowledgement_id AS ack_id
FROM centstorage.acknowledgements a
WHERE YEAR(FROM_UNIXTIME( a.entry_time )) = YEAR(CURDATE())
AND MONTH(FROM_UNIXTIME( a.entry_time )) = MONTH(CURDATE())
AND a.service_id IS NOT NULL
ORDER BY a.acknowledgement_id ASC;
Or create it with proper column definitions, so the placeholder columns get suitable types and lengths.
Update the fields from each of the tables that were LEFT JOINed; you can use an inner JOIN in the UPDATE. You will need 7 different UPDATE statements. Two examples are given below.
UPDATE __tempacknowledgements a JOIN centstorage.hosts h USING(host_id)
SET a.hostname=h.name;
UPDATE __tempacknowledgements a JOIN centstorage.services s USING(service_id)
SET a.servicename=s.display_name;
In a similar way, update start_time with ctime from logs using a JOIN with logs; this is the 8th UPDATE statement.
Then select from the temp table.
Drop the temp table.
A stored procedure can be written for this.
Turn LEFT JOIN into JOIN unless you have a real need for LEFT.
AND YEAR(FROM_UNIXTIME( a.entry_time )) = YEAR(CURDATE())
AND MONTH(FROM_UNIXTIME( a.entry_time )) = MONTH(CURDATE())
AND a.service_id is not null
Do you have any rows where a.service_id IS NULL? If not, get rid of that test.
As already mentioned, that date comparison does not optimize. Since entry_time is a unix timestamp, here is what to use instead:
AND a.entry_time >= UNIX_TIMESTAMP(CONCAT(LEFT(CURDATE(), 7), '-01'))
AND a.entry_time < UNIX_TIMESTAMP(CONCAT(LEFT(CURDATE(), 7), '-01') + INTERVAL 1 MONTH)
And add one of these (depending on my above comment):
INDEX(entry_time)
INDEX(service_id, entry_time)
The correlated subquery is hard to optimize. This index (on logs) may help:
INDEX(type, host_id, service_id, status)
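WHERE ... IN can be a time killer!
Instead of
logs.status IN (1, 2, 3)
you could try
logs.status=1 or logs.status=2 or logs.status=3
though for a short list of constants MySQL generally handles both forms the same way, so measure before relying on this.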
I have SLIGHTLY reformatted the query for my own readability and to better see the relations between the tables; otherwise ignore that part.
SELECT
g.name AS hostgroup,
h.name AS hostname,
a.host_id,
s.display_name AS servicename,
a.service_id,
a.entry_time AS ack_time,
( SELECT
ctime
FROM
logs
WHERE
logs.host_id = a.host_id
AND logs.service_id = a.service_id
AND logs.ctime < a.entry_time
AND logs.status IN (1, 2, 3)
AND logs.type = 1
ORDER BY
logs.log_id DESC
LIMIT 1) AS start_time,
ar.acl_res_name AS timeperiod,
a.state AS state,
a.author,
a.acknowledgement_id AS ack_id
FROM
centstorage.acknowledgements a
LEFT JOIN centstorage.hosts h
ON a.host_id = h.host_id
LEFT JOIN centstorage.services s
ON a.service_id = s.service_id
LEFT JOIN centstorage.hosts_hostgroups p
ON a.host_id = p.host_id
LEFT JOIN centstorage.hostgroups g
ON p.hostgroup_id = g.hostgroup_id
LEFT JOIN centreon.hostgroup_relation hg
ON a.host_id = hg.host_host_id
LEFT JOIN centreon.acl_resources_hg_relations hh
ON hg.hostgroup_hg_id = hh.hg_hg_id
LEFT JOIN centreon.acl_resources ar
ON hh.acl_res_id = ar.acl_res_id
WHERE
ar.acl_res_name != 'All Resources'
AND YEAR(FROM_UNIXTIME( a.entry_time )) = YEAR(CURDATE())
AND MONTH(FROM_UNIXTIME( a.entry_time )) = MONTH(CURDATE())
AND a.service_id is not null
ORDER BY
a.acknowledgement_id ASC
I would first recommend starting with your "acknowledgements" table and having an index on at least ( entry_time, acknowledgement_id ). Next, update your WHERE clause. Because you are running a function to convert the unix timestamp to a date and then extracting the YEAR (and MONTH), I don't believe it can use the index, as it has to compute that for every row. To alleviate that, remember that a unix timestamp is nothing but a number representing seconds from a specific point in time. If you are looking for a specific month, pre-compute the starting and ending unix times and query for that range. Something like...
and a.entry_time >= UNIX_TIMESTAMP( '2015-10-01' )
and a.entry_time < UNIX_TIMESTAMP( '2015-11-01' )
This way, it accounts for all seconds within the month up to 23:59:59 on Oct 31, just before November 1st.
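The boundary arithmetic can be checked with a short sketch. It assumes UTC, whereas MySQL's UNIX_TIMESTAMP() uses the session time zone, so the actual numbers on your server may differ. The helper month_bounds and the index name idx_entry_time are hypothetical, and sqlite3 stands in for MySQL.

```python
import sqlite3
from datetime import datetime, timezone


def month_bounds(year, month):
    """Return (start, end) Unix timestamps for a month in UTC; end is exclusive."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    return int(start.timestamp()), int(end.timestamp())


start, end = month_bounds(2015, 10)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE acknowledgements ("
    "acknowledgement_id INTEGER PRIMARY KEY, entry_time INTEGER)"
)
conn.execute("CREATE INDEX idx_entry_time ON acknowledgements (entry_time)")

# One row on each side of both boundaries.
conn.executemany(
    "INSERT INTO acknowledgements VALUES (?, ?)",
    [(1, start - 1), (2, start), (3, end - 1), (4, end)],
)

# Half-open range on the raw column: no function is applied to entry_time,
# so an index on entry_time remains usable.
ids = [
    row[0]
    for row in conn.execute(
        "SELECT acknowledgement_id FROM acknowledgements "
        "WHERE entry_time >= ? AND entry_time < ? "
        "ORDER BY acknowledgement_id",
        (start, end),
    )
]
print(start, end, ids)
```

Only the rows at the first second and the last second of October are selected; the last second of September and the first second of November fall outside the half-open range.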
Then, without my glasses to see all the images more clearly, and short on time this morning, I would ensure you have at least the following indexes on each table respectively:
table                        index
logs                         ( host_id, service_id, type, status, ctime, log_id )
acknowledgements             ( entry_time, acknowledgement_id, host_id, service_id )
hosts                        ( host_id, name )
services                     ( service_id, display_name )
hosts_hostgroups             ( host_id, hostgroup_id )
hostgroups                   ( hostgroup_id, name )
hostgroup_relation           ( host_host_id, hostgroup_hg_id )
acl_resources_hg_relations   ( hg_hg_id, acl_res_id )
acl_resources                ( acl_res_id, acl_res_name )
Finally, your correlated sub-query field is going to be a killer as it is processed for every row, but hopefully the other index optimization ideas will help performance.

mysql join query that needs to return negative set

Suppose I have 2 tables, user and task. The task table has user_id and status columns, with status having the possible values "complete" and "not complete".
Now I want to retrieve the users who have not completed even 1 task.
The crudest way is to first find the users who have at least 1 complete task and then run a "not in" query.
Are there better ways to achieve this without an "in" subquery? Please note that the data set is huge and I can't afford to hold a lock on the task table for a long time!
SELECT *
FROM users u
WHERE NOT EXISTS (
SELECT * FROM tasks t
WHERE t.user_id = u.user_id
AND t.status = 'complete'
);
When task.user_id cannot contain NULL (i.e. has a NOT NULL constraint), LEFT JOIN with IS NULL is your best choice:
SELECT user.* FROM user
LEFT JOIN task ON (task.user_id = user.id AND task.status = 'complete')
WHERE task.user_id IS NULL
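The two anti-join forms above can be compared on a small data set. This is a sketch only, using Python's built-in sqlite3 as a stand-in for MySQL; the sample rows are invented and the table names follow the second answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY);
CREATE TABLE task (user_id INTEGER NOT NULL, status TEXT NOT NULL);
-- hypothetical sample rows
INSERT INTO user (id) VALUES (1), (2), (3);
INSERT INTO task (user_id, status) VALUES
    (1, 'complete'), (1, 'not complete'),   -- user 1 completed something
    (2, 'not complete');                    -- user 2 completed nothing
                                            -- user 3 has no tasks at all
""")

# Form 1: NOT EXISTS with a correlated subquery.
not_exists_ids = [r[0] for r in conn.execute("""
SELECT u.id
FROM user u
WHERE NOT EXISTS (
    SELECT 1 FROM task t
    WHERE t.user_id = u.id AND t.status = 'complete'
)
ORDER BY u.id
""")]

# Form 2: LEFT JOIN restricted to completed tasks, keeping unmatched users.
left_join_ids = [r[0] for r in conn.execute("""
SELECT u.id
FROM user u
LEFT JOIN task t ON t.user_id = u.id AND t.status = 'complete'
WHERE t.user_id IS NULL
ORDER BY u.id
""")]

print(not_exists_ids, left_join_ids)
```

Note that the status filter sits in the ON clause of the LEFT JOIN; moving it to the WHERE clause would discard the unmatched rows the query is looking for.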
Try this one. The query below lists the users that have completed at least 1 task:
SELECT u.*,
COUNT( CASE WHEN t.`status`= 'Completed' THEN t.`status` END ) AS completed ,
COUNT( CASE WHEN t.`status`= 'Not completed' THEN t.`status` END ) AS Not_completed
FROM `user` u
LEFT JOIN `task` t ON (u.id =t.user_id)
GROUP BY u.id HAVING completed > 0
And this one lists the users who haven't completed even 1 task:
SELECT u.*,
COUNT( CASE WHEN t.`status`= 'Completed' THEN t.`status` END ) AS completed ,
COUNT( CASE WHEN t.`status`= 'Not completed' THEN t.`status` END ) AS Not_completed
FROM `user` u
LEFT JOIN `task` t ON (u.id =t.user_id)
GROUP BY u.id HAVING completed = 0

Slow MySQL query with subquery from table

I am trying to bring back a string based on an IF statement but it is extremely slow.
It has something to do with the first subquery but I am unsure of how to rearrange this as to bring back the same results but faster.
Here is my SQL:
SELECT IF
(
(
SELECT COUNT(*)
FROM
(
SELECT DISTINCT enquiryId, type
FROM parts_enquiries, parts_service_types AS pst
WHERE parts_enquiries.serviceTypeId = pst.id
) AS parts
WHERE parts.enquiryId = enquiries.id
) > 1, 'Mixed',
(
SELECT DISTINCT type
FROM parts_enquiries, parts_service_types AS pst
WHERE parts_enquiries.serviceTypeId = pst.id AND enquiryId = enquiries.id
)
) AS partTypes
FROM enquiries,
entities
WHERE enquiries.entityId = entities.id
How can I make it faster?
I have modified my original query below, but I am getting the error that subquery returns more than one row:
SELECT
(SELECT
CASE WHEN COUNT(DISTINCT type) > 1 THEN 'Mixed' ELSE `type` END AS type
FROM parts_enquiries
INNER JOIN parts_service_types AS pst ON parts_enquiries.serviceTypeId = pst.id
INNER JOIN enquiries ON parts_enquiries.enquiryId = enquiries.id
INNER JOIN entities ON enquiries.entityId = entities.id
GROUP BY enquiryId) AS partTypes
FROM enquiries,
entities
WHERE enquiries.entityId = entities.id
Please have a look if this query yields the same results:
SELECT
enquiryId,
CASE WHEN COUNT(DISTINCT type) > 1 THEN 'Mixed' ELSE `type` END AS type
FROM parts_enquiries
INNER JOIN parts_service_types AS pst ON parts_enquiries.serviceTypeId = pst.id
INNER JOIN enquiries ON parts_enquiries.enquiryId = enquiries.id
INNER JOIN entities ON enquiries.entityId = entities.id
GROUP BY enquiryId
But N.B.'s comment is still valid. To see whether an index is used, and other information, we need to see the EXPLAIN output and the table definitions.
This should get you what you want.
I would first pre-query your parts enquiries and parts service types, looking for both the distinct count and the MINIMUM of the part 'type', grouped by the enquiry ID.
Then run your IF() against that result. If the distinct count is > 1, then 'Mixed'. If there is only one, the MIN() simply returns the description of that one value, which is what you want anyhow.
SELECT
E.ID,
IF ( PreQuery.DistTypes > 1, 'Mixed', PreQuery.FirstType ) as PartType
from
Enquiries E
JOIN ( SELECT
PE.EnquiryID,
COUNT( DISTINCT PE.ServiceTypeID ) as DistTypes,
MIN( PST.Type ) as FirstType
from
Parts_Enquiries PE
JOIN Parts_Service_Types PST
ON PE.ServiceTypeID = PST.ID
group by
PE.EnquiryID ) as PreQuery
ON E.ID = PreQuery.EnquiryID
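As a quick check of the pre-query approach, here is a minimal sketch on invented data. It assumes Python's built-in sqlite3 as a stand-in for MySQL, and it uses CASE where the answer uses MySQL's IF(), since that is the portable spelling; table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enquiries (id INTEGER PRIMARY KEY);
CREATE TABLE parts_service_types (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE parts_enquiries (enquiryId INTEGER, serviceTypeId INTEGER);
-- hypothetical sample rows
INSERT INTO enquiries (id) VALUES (1), (2), (3);
INSERT INTO parts_service_types (id, type) VALUES (10, 'Repair'), (20, 'Service');
INSERT INTO parts_enquiries (enquiryId, serviceTypeId) VALUES
    (1, 10), (1, 20),   -- two different types
    (2, 20), (2, 20);   -- the same type twice
                        -- enquiry 3 has no parts
""")

# Aggregate once per enquiry, then join; no per-row correlated subquery.
rows = conn.execute("""
SELECT e.id,
       CASE WHEN pq.dist_types > 1 THEN 'Mixed' ELSE pq.first_type END AS part_type
FROM enquiries e
JOIN (
    SELECT pe.enquiryId,
           COUNT(DISTINCT pe.serviceTypeId) AS dist_types,
           MIN(pst.type) AS first_type
    FROM parts_enquiries pe
    JOIN parts_service_types pst ON pe.serviceTypeId = pst.id
    GROUP BY pe.enquiryId
) AS pq ON pq.enquiryId = e.id
ORDER BY e.id
""").fetchall()

print(rows)
```

Because the derived table is inner-joined, an enquiry with no parts (enquiry 3 here) does not appear; switch to a LEFT JOIN if such enquiries should be listed with a NULL part type.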