Can someone optimize this SQL query? - mysql

I am currently working on a project that has 2 very large sql tables Users and UserDocuments having around million and 2-3 millions records respectively. I have a query that will return the count of all the documents that each indvidual user has uploaded provided the document is not rejected.
A user can have multiple documents against his/her id.
My current query:-
SELECT
u.user_id,
u.name,
u.date_registered,
u.phone_no,
t1.docs_count,
t1.last_uploaded_on
FROM
Users u
JOIN(
SELECT user_id,
MAX(updated_at) AS last_uploaded_on,
SUM(CASE WHEN STATUS != 2 THEN 1 ELSE 0 END) AS docs_count
FROM
UserDocuments
WHERE
user_id IN(
SELECT
user_id
FROM
Users
WHERE
region_id = 1 AND city_id = 8 AND user_type = 1 AND user_suspended = 0 AND is_enabled = 1 AND verification_status = -1
) AND document_id IN('1', '2', '3', '4', '10', '11')
GROUP BY
user_id
ORDER BY
user_id ASC
) t1
ON
u.user_id = t1.user_id
WHERE
docs_count < 6 AND region_id = 1 AND city_id = 8 AND user_type = 1 AND user_suspended = 0 AND is_enabled = 1 AND verification_status = -1
LIMIT 1000, 100
Currently the query is taking very long around 20 secs to return data with indexes. can someone suggest some tweaks in the follwing query to gain some more preformance out of it.

SELECT
u.user_id,
max( u.name ) name,
max( u.date_registered ) date_registered,
max( phone_no ) phone_no,
MAX(d.updated_at) last_uploaded_on,
SUM(CASE WHEN d.STATUS != 2
THEN 1 ELSE 0 END) docs_count
FROM
Users u
JOIN UserDocuments d
ON u.user_id = d.user_id
AND d.document_id IN ('1', '2', '3', '4', '10', '11')
WHERE
u.region_id = 1
AND u.city_id = 8
AND u.user_type = 1
AND u.user_suspended = 0
AND u.is_enabled = 1
AND u.verification_status = -1
GROUP BY
u.user_id
HAVING
SUM(CASE WHEN d.STATUS != 2
THEN 1 ELSE 0 END) < 6
ORDER BY
u.user_id ASC
LIMIT
1000, 100
Have indexes on your tables as
user ( region_id, city_id, user_type, user_suspended, is_enabled, verification_status )
UserDocuments ( user_id, document_id, status, updated_at )
You are adding extra querying from the user table to both the inner and outer joins which might be killing it. Having an index on your critical "WHERE" components by user will pre-filter that set out. Only from that will it join to the UserDocuments table. By having the outer query get the counts() at the top level query.
Since the users name, registered and phone dont change per user, applying max() to each respectively prevents the need of adding those columns to the group by clause.
The index on the documents table on only the columns needed to confirm status and document_id and when last updated. This prevents the engine from having to go to the raw data pages as it can get the qualifying details directly from the index parts saving you time too.

LIMIT without ORDER BY does not make sense.
An ORDER BY in a 'derived table' is ignored.
Will you really have thousands of result rows? (I see the "offset of 1000".)
Use JOIN instead of IN ( SELECT ... )
What indexes do you have? I suggest INDEX(region_id, city_id, user_id)
CASE WHEN d.STATUS != 2 THEN 1 ELSE 0 END can be shortened to d.status != 2.
How many different values of status are there? If only two, then flip the test to d.status = 1`.

Related

MySQL - Slow Query when adding multiple derived tables - Optimization

For my query, the two derived tables at the bottom are causing a crazy slow up for this query. The query, as is, takes about 45-55 seconds to execute.. NOW, when i remove just one of those derived tables (it does not matter which one) the query goes down to 0.1 - 0.3 seconds. My questions; Is there an issue with having multiple derived tables? Is there a better way to execute this? My indexes all seem to be correct, I will also include the explain from this query.
select t.name as team, u.name as "REP NAME",
count(distinct activity.id) as "TOTAL VISITS",
count(distinct activity.account_id) as "UNIQUE VISITS",
count(distinct placement.id) as "COMMITMENTS ADDED",
CASE WHEN
count(distinct activity.account_id) = 0 THEN (count(distinct
placement.id) / 1)
else (cast(count(distinct placement.id) as decimal(10,2)) /
cast(count(distinct activity.account_id) as decimal(10,2)))
end as "UNIQUE VISIT TO COMMITMENT %",
case when o.mode='basic' then count(distinct placement.id) else
count(distinct(case when placement.commitmentstatus='fullfilled'
then placement.id else 0 end))
end as "COMMITMENTS FULFILLED",
case when o.mode='basic' then 1 else
(CASE WHEN
count(distinct placement.id) = 0 THEN (count(distinct(case when
placement.commitmentstatus='fullfilled' then placement.id else 0
end)) / 1)
else (cast(count(distinct(case when
placement.commitmentstatus='fullfilled' then placement.id else 0
end)) as decimal(10,2)) / cast(count(distinct placement.id) as
decimal(10,2)))
end) end as "COMMITMENT TO FULFILLMENT %"
from lpmysqldb.users u
left join lpmysqldb.teams t on t.team_id=u.team_id
left join lpmysqldb.organizations o on o.id=t.org_id
left join (select * from lpmysqldb.activity where
org_id='555b918ae4b07b6ac5050852' and completed_at>='2018-05-01' and
completed_at<='2018-06-01' and tag='visit' and accountname is not
null and (status='active' or status='true' or status='1')) as
activity on activity.user_id=u.id
left join (select * from lpmysqldb.placements where
orgid='555b918ae4b07b6ac5050852' and placementdate>='2018-05-01' and
placementdate<='2018-06-01' and (status IN ('1','active','true') or
status is null)) as placement on placement.userid=u.id
where u.org_id='555b918ae4b07b6ac5050852'
and (u.status='active' or u.status='true' or u.status='1')
and istestuser!='1'
group by u.org_id, t.name, u.id, u.name, o.mode
order by count(distinct activity.id) desc
Thank you for assistance!
I have edited below with changing the two bottom joins from joining on subqueries to joining on the table directly. Still yielding the same result.
This is a SLIGHTLY restructured query of your same. Might be simplified as the last two subqueries are all pre-aggregated for your respective counts and count distincts so you can use those column names directly instead of showing all the count( distinct ) embedded throughout the query.
I also tried to simplify the division by multiplying a given count by 1.00 to force decimal-based precision as result.
select
t.name as team,
u.name as "REP NAME",
Activity.DistIdCnt as "TOTAL VISITS",
Activity.UniqAccountCnt as "UNIQUE VISITS",
Placement.DistIdCnt as "COMMITMENTS ADDED",
Placement.DistIdCnt /
CASE WHEN Activity.UniqAccountCnt = 0
THEN 1.00
ELSE Activity.UniqAccountCnt * 1.00
end as "UNIQUE VISIT TO COMMITMENT %",
case when o.mode = 'basic'
then Placement.DistIdCnt
else Placement.DistFulfillCnt
end as "COMMITMENTS FULFILLED",
case when o.mode = 'basic'
then 1
else ( Placement.DistFulfillCnt /
CASE when Placement.DistIdCnt = 0
then 1.00
ELSE Placement.DistIdCnt * 1.00
END TRANSACTION )
END as "COMMITMENT TO FULFILLMENT %"
from
lpmysqldb.users u
left join lpmysqldb.teams t
on u.team_id = t.team_id
left join lpmysqldb.organizations o
on t.org_id = o.id
left join
( select
user_id,
count(*) as AllRecs,
count( distinct id ) DistIdCnt,
count( distinct account_id) as UniqAccountCnt
from
lpmysqldb.activity
where
org_id = '555b918ae4b07b6ac5050852'
and completed_at>='2018-05-01'
and completed_at<='2018-06-01'
and tag='visit'
and accountname is not null
and status IN ( '1', 'active', 'true')
group by
user_id ) activity
on u.id = activity.user_id
left join
( select
userid,
count(*) AllRecs,
count(distinct id) as DistIdCnt,
count(distinct( case when commitmentstatus = 'fullfilled'
then id
else 0 end )) DistFulfillCnt
from
lpmysqldb.placements
where
orgid = '555b918ae4b07b6ac5050852'
and placementdate >= '2018-05-01'
and placementdate <= '2018-06-01'
and ( status is null OR status IN ('1','active','true')
group by
userid ) as placement
on u.id = placement.userid
where
u.org_id = '555b918ae4b07b6ac5050852'
and u.status IN ( 'active', 'true', '1')
and istestuser != '1'
group by
u.org_id,
t.name,
u.id,
u.name,
o.mode
order by
activity.DistIdCnt desc
FINALLY, your inner queries are querying for ALL users. If you have a large count of users that are NOT active, you MIGHT exclude those users from each inner query by adding those join/criteria there too such as...
( ...
from
lpmysqldb.placements
JOIN lpmysqldb.users u2
on placements.userid = u2.id
and u2.status IN ( 'active', 'true', '1')
and u2.istestuser != '1'
where … ) as placement

Alternatives to joins in mysql

I have a query as follows that retrieves the status for different stores in a table and displays it as different columns.
SELECT a.Store_ID,b.total as order_completed,c.total as order_cancelled,d.total as order_processed,e.total as order_failed FROM ORDER_HISTORY a
-> LEFT OUTER JOIN(select Store_ID,count(*) as total from ORDER_HISTORY where Status = 57 group by Store_ID)b on a.Store_ID = b.Store_ID
-> LEFT OUTER JOIN(select Store_ID,count(*) as total from ORDER_HISTORY where Status = 53 group by Store_ID)c on a.Store_ID = c.Store_ID
-> LEFT OUTER JOIN(select Store_ID,count(*) as total from ORDER_HISTORY where Status = 52 group by Store_ID)d on a.Store_ID = d.Store_ID
-> LEFT OUTER JOIN(select Store_ID,count(*) as total from ORDER_HISTORY where Status = 62 group by Store_ID)e on a.Store_ID = e.Store_ID
-> group by a.Store_ID;
Can anybody suggest an alternative to using joins as it affects the performance of db operations.
Create an index on ORDER_HISTORY over (Store_ID, Status), then this should be plenty fast.
SELECT
Store_ID,
status,
COUNT(*) as total
FROM
ORDER_HISTORY
GROUP BY
Store_ID,
status;
Then use your application to display the few resulting rows data in columns. Should not be hard to implement.
Another approach would be (same index as above):
SELECT
Store_ID,
SUM(CASE WHEN Status = 57 THEN 1 ELSE 0 END) AS order_completed,
SUM(CASE WHEN Status = 53 THEN 1 ELSE 0 END) AS order_cancelled,
SUM(CASE WHEN Status = 52 THEN 1 ELSE 0 END) AS order_processed,
SUM(CASE WHEN Status = 62 THEN 1 ELSE 0 END) AS order_processed
FROM
ORDER_HISTORY
GROUP BY
Store_ID;
Replace NULL values as appropriate.
Try using a trigger. Same as a stored procedure that executes when an event occurs within the database. maybe it will help you.

How to split SQL query results into columns based on two WHERE conditions and two calculated COUNT fields?

I have the following (simplified) database schema:
Persons:
[Id] [Name]
-------------------
1 'Peter'
2 'John'
3 'Anna'
Items:
[Id] [ItemName] [ItemStatus]
-------------------
10 'Cake' 1
20 'Dog' 2
ItemDocuments:
[Id] [ItemId] [DocumentName] [Date]
-------------------
101 10 'CakeDocument1' '2016-01-01 00:00:00'
201 20 'DogDocument1' '2016-02-02 00:00:00'
301 10 'CakeDocument2' '2016-03-03 00:00:00'
401 20 'DogDocument2' '2016-04-04 00:00:00'
DocumentProcessors:
[PersonId] [DocumentId]
-------------------
1 101
1 201
2 301
I have also set up an SQL fiddle to play with: http://www.sqlfiddle.com/#!3/e6082
The relation logic is the following: every Person can work on zero or infinite number of ItemDocuments (many-to-many); each ItemDocument belongs to exactly one Item (one-to-many). Item has status 1 - Active, 2 - Closed
What I need is a report that fulfills the following requirements:
for each person in Persons table, display count of Items that have ItemDocuments related to this person
the counts should be split in two columns by ItemStatus
the query should be filterable by two optional date periods (using two BETWEEN conditions on ItemDocuments.Date field) and the Item counts should also be split into two periods
if a Person does not have any ItemDocuments assigned, it still should be shown in the results with all count values set to 0
if a Person has more than one ItemDocument for an Item, the Item still should be counted only once
Essentially, here is how the results should look like if I use both periods to NULL (to read all the data):
[PersonName] [Active Items for period 1] [Closed Items for period 1] [Active Items for period 2] [Closed Items for period 2]
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
'Peter' 1 1 1 1
'John' 1 0 1 0
'Anna' 0 0 0 0
While I can create an SQL query for each requirement separately, I have a problem to understand how to combine all of them together into one.
For example, I can split ItemStatus counts in two columns using
COUNT(CASE WHEN t.ItemStatus = 1 THEN 1 ELSE NULL END) AS Active,
COUNT(CASE WHEN t.ItemStatus = 2 THEN 1 ELSE NULL END) AS Closed
and I can filter by two periods (with max/min date constants from MS SQL server specification to avoid NULLs for optional period dates) using
between coalesce(#start1, '1753-01-01') and coalesce(#end1, '9999-12-31')
between coalesce(#start2, '1753-01-01') and coalesce(#end2, '9999-12-31')
but how to combine all of this together, considering also JOINs between tables?
Is there any technique, join or MS SQL Server specific approach to do this in efficient way?
My first attempt seems to work as required but it looks like ugly subquery duplications multiple times:
DECLARE #start1 DATETIME, #start2 DATETIME, #end1 DATETIME, #end2 DATETIME
-- SET #start2 = '2017-01-01'
SELECT
p.Name,
(SELECT COUNT(1)
FROM Items i
WHERE i.ItemStatus = 1 AND EXISTS(
SELECT 1
FROM DocumentProcessors AS dcp
INNER JOIN ItemDocuments AS idc ON dcp.DocumentId = idc.Id
WHERE dcp.PersonId = p.Id AND idc.ItemId = i.Id
AND idc.Date BETWEEN COALESCE(#start1, '1753-01-01') AND COALESCE(#end1, '9999-12-31')
)
) AS Active1,
(SELECT COUNT(*)
FROM Items i
WHERE i.ItemStatus = 2 AND EXISTS(
SELECT 1
FROM DocumentProcessors AS dcp
INNER JOIN ItemDocuments AS idc ON dcp.DocumentId = idc.Id
WHERE dcp.PersonId = p.Id AND idc.ItemId = i.Id
AND idc.Date BETWEEN COALESCE(#start1, '1753-01-01') AND COALESCE(#end1, '9999-12-31')
)
) AS Closed1,
(SELECT COUNT(1)
FROM Items i
WHERE i.ItemStatus = 1 AND EXISTS(
SELECT 1
FROM DocumentProcessors AS dcp
INNER JOIN ItemDocuments AS idc ON dcp.DocumentId = idc.Id
WHERE dcp.PersonId = p.Id AND idc.ItemId = i.Id
AND idc.Date BETWEEN COALESCE(#start2, '1753-01-01') AND COALESCE(#end2, '9999-12-31')
)
) AS Active2,
(SELECT COUNT(*)
FROM Items i
WHERE i.ItemStatus = 2 AND EXISTS(
SELECT 1
FROM DocumentProcessors AS dcp
INNER JOIN ItemDocuments AS idc ON dcp.DocumentId = idc.Id
WHERE dcp.PersonId = p.Id AND idc.ItemId = i.Id
AND idc.Date BETWEEN COALESCE(#start2, '1753-01-01') AND COALESCE(#end2, '9999-12-31')
)
) AS Closed2
FROM Persons p
I'm not absolutely sure if I really got what you want, but you might try this
WITH AllData AS
(
SELECT p.Id AS PersonId
,p.Name AS Person
,id.Date AS DocDate
,id.DocumentName AS DocName
,i.ItemName AS ItemName
,i.ItemStatus AS ItemStatus
,CASE WHEN id.Date BETWEEN COALESCE(#start1, '1753-01-01') AND COALESCE(#end1, '9999-12-31') THEN 1 ELSE 0 END AS InPeriod1
,CASE WHEN id.Date BETWEEN COALESCE(#start2, '1753-01-01') AND COALESCE(#end2, '9999-12-31') THEN 1 ELSE 0 END AS InPeriod2
FROM Persons AS p
LEFT JOIN DocumentProcessors AS dp ON p.Id=dp.PersonId
LEFT JOIN ItemDocuments AS id ON dp.DocumentId=id.Id
LEFT JOIN Items AS i ON id.ItemId=i.Id
)
SELECT PersonID
,Person
,COUNT(CASE WHEN ItemStatus = 1 AND InPeriod1 = 1 THEN 1 ELSE NULL END) AS ActiveIn1
,COUNT(CASE WHEN ItemStatus = 2 AND InPeriod1 = 1 THEN 1 ELSE NULL END) AS ClosedIn1
,COUNT(CASE WHEN ItemStatus = 1 AND InPeriod2 = 1 THEN 1 ELSE NULL END) AS ActiveIn2
,COUNT(CASE WHEN ItemStatus = 2 AND InPeriod2 = 1 THEN 1 ELSE NULL END) AS ClosedIn2
FROM AllData
GROUP BY PersonID,Person

How can I select the rest of the row with DISTINCT?

I have a table where I keep messages and one where I keep users.
I want to get all the users that interactioned (send or received a message) with user_id 1.
This query works:
http://sqlfiddle.com/#!2/6a2f3/1
EDIT:
SELECT DISTINCT
(CASE WHEN `user_to_id` = 1 THEN `user_from_id` ELSE `user_to_id` END) `user_id`,
users.*
FROM `messages`
INNER JOIN users
ON (CASE WHEN `user_to_id` = 1 THEN `user_from_id` ELSE `user_to_id` END) = users.user_id
WHERE `user_to_id` = 1 OR `user_from_id` = 1
ORDER BY `time` DESC
But if I add to SELECT the message column, it returns duplicate records:
http://sqlfiddle.com/#!2/6a2f3/2
EDIT:
SELECT DISTINCT
(CASE WHEN `user_to_id` = 1 THEN `user_from_id` ELSE `user_to_id` END) `user_id`,
`messages`.`message`,
users.*
FROM `messages`
INNER JOIN users
ON (CASE WHEN `user_to_id` = 1 THEN `user_from_id` ELSE `user_to_id` END) = users.user_id
WHERE `user_to_id` = 1 OR `user_from_id` = 1
ORDER BY `time` DESC
How can I fix that?
And also, I see that it orders the results after the "DISTINCT" selection was made. The first query should return the results inverted because the row with message_id 2 has time 3.
Is there a way I can order them before the "DISTINCT"?
EDIT 2: I wasn't clear about the question. I want to select only the last message for a matched user_id.
Do you want something like this?
SELECT *
FROM (
SELECT
users.*,
(SELECT `message` from messages
WHERE
(CASE WHEN `user_to_id` = 1 THEN `user_from_id` ELSE `user_to_id` END) = users.user_id
AND (`user_to_id` = 1 OR `user_from_id` = 1)
ORDER BY `time` DESC limit 1
) AS message
FROM users
) a
WHERE message IS NOT NULL
SQL Fiddle
It's not returning duplicate records, you have two records with User_ID = 2.
I'm confused by what you want them to be ordered by. If you want to order them in the inverted order, just remove 'DESC'

MySQL select subqueries

This is what I have at the moment.
$db =& JFactory::getDBO();
$query = $db->getQuery(true);
$query->select('`#__catalog_commit`.`id` as id, `#__catalog_commit`.`date` as date, COUNT(`#__catalog_commit_message`.`commit_id`) as count,
(SELECT COUNT(`#__catalog_commit_message`.`type`) as count_notice FROM `#__catalog_commit_message` WHERE `#__catalog_commit_message`.`type` = 1 GROUP BY `#__catalog_commit_message`.`type`) as count_notice,
(SELECT COUNT(`#__catalog_commit_message`.`type`) as count_warning FROM `#__catalog_commit_message` WHERE `#__catalog_commit_message`.`type` = 2 GROUP BY `#__catalog_commit_message`.`type`) as count_warning,
(SELECT COUNT(`#__catalog_commit_message`.`type`) as count_error FROM `#__catalog_commit_message` WHERE `#__catalog_commit_message`.`type` = 3 GROUP BY `#__catalog_commit_message`.`type`) as count_error');
$query->from('#__catalog_commit_message');
$query->leftjoin('`#__catalog_commit` ON `#__catalog_commit`.`id` = `#__catalog_commit_message`.`commit_id`');
$query->group('`#__catalog_commit_message`.`commit_id`');
$query->order('`#__catalog_commit`.`id` DESC');
What I have is 2 tables with the following structures:
catalog_commit
==============
id
date
catalog_commit_message
======================
id
commit_id
type
message
Basically I want to have the count of each different types of messages per group items. In what I have it actually select every rows (Which is normal) but I'm looking for a way (nicier if possible) to have the count per messages type within the query.
EDIT: Just wanted to add that it's a JModelList.
From what I gather, this should be your query:
SELECT c.id
,c.date
,count(cm.commit_id) as ct_total
,sum(CASE WHEN cm.type = 1 THEN 1 ELSE 0 END) AS count_notice
,sum(CASE WHEN cm.type = 2 THEN 1 ELSE 0 END) AS count_warning
,sum(CASE WHEN cm.type = 3 THEN 1 ELSE 0 END) AS count_error
FROM catalog_commit c
LEFT JOIN catalog_commit_message cm ON cm.commit_id = c.id
GROUP BY c.id, c.date
ORDER BY c.id DESC
You had the order of your tables reversed in the LEFT JOIN. Also, you had weird subqueries in the SELECT list.