I'm struggling to make a query efficient enough. I'm using the Doctrine2 ORM (the query is built with QueryBuilder), and part of my query runs very slowly: it takes about 4 s on a table of 5,000 rows.
This is the relevant part of db schema:
TABLE user
id (primary)
... (plenty of other columns, not relevant to the query)
TABLE slot
id (primary)
user_id (foreign for user)
date (datetime)
And this is what my query looks like (it's the basic version; there are a lot of filters to be applied, but those work fine for now):
SELECT
    u.id AS uid,
    COUNT(DISTINCT s_order.id) AS sclr_1,
    COUNT(DISTINCT s_filter.id) AS sclr_2
FROM
    user u
    LEFT JOIN slot s_order ON (s_order.user_id = u.id)
    LEFT JOIN slot s_filter ON (s_filter.user_id = u.id)
WHERE
    (s_order.date BETWEEN ? AND ?)
    AND (s_filter.date BETWEEN ? AND ?)
    AND (u.deleted_at IS NULL)
    AND u.userType IN ('2')
GROUP BY
    u.id
HAVING
    sclr_2 > 0
ORDER BY
    sclr_1 DESC
LIMIT 12
Let me explain what I'm trying to achieve here:
I need to filter users who have any slots between 1 week ago and 1 week ahead, then order them by the count of slots available between now and 1 week ahead. The part of the query causing issues is the LEFT JOIN of s_filter, and I'm wondering whether there's a way to improve the performance of that part.
Any help is really appreciated; even if it's only plain SQL, I'll try to convert it to DQL myself!
#UPDATE
Just an additional piece of info that I forgot: the LIMIT in the query is there for pagination purposes!
#UPDATE 2
After a while of tweaking the query I figured out that I can use a JOIN for filtering instead of LEFT JOIN + COUNT, so my query now looks like this:
SELECT
u.id AS uid, COUNT(DISTINCT s_order.id) AS ordinal
FROM
langu_user u
LEFT JOIN
slot s_order ON (s_order.user_id = u.id) AND s_order.date BETWEEN '2017-02-03 14:03:22' AND '2017-02-10 14:03:22'
JOIN
slot s_filter ON (s_filter.user_id = u.id) AND s_filter.date BETWEEN '2017-01-27 14:03:22' AND '2017-02-10 14:03:22'
WHERE
u.deleted_at IS NULL
AND u.userType IN ('2')
GROUP BY u.id
ORDER BY ordinal DESC
LIMIT 12
And it went down from 4.1-4.3 s to ~3.6 s.
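One thing worth checking, independent of the query shape: whether slot has a composite index covering both join conditions. A minimal sketch, assuming no such index (and no index with this name) exists yet:

-- With (user_id, date) indexed, both ON clauses (the user match and the
-- date range) can be resolved from the index instead of scanning slot per user.
ALTER TABLE slot ADD INDEX idx_slot_user_date (user_id, date);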
I need your help deciding which query to use, since we are facing a performance issue with MySQL joins and subqueries.
The problem is that I'm trying to find each user's 'first order date', where the orders must fit certain conditions:
order_status = 1 (completed) or order_status = 2 (canceled)
The tables are tb_order and tb_user; all columns that contain a 'time' use Unix timestamps.
The result I need looks like this:
order_id | user_id | user_1st_order_date
---------|---------|--------------------
1        | 47      | 1666876594
2        | 982     | 1667095997
Option 1: JOIN
Select
o.id as 'order_id',
u.id as 'user_id',
ox.create_time as 'user_1st_order_date'
from
tb_order o
left join tb_user u on o.user_id = u.id
/* here I have about 10 joins */
left join
(
select
ux.id,
ox.create_time
from
tb_user ux
left join tb_order ox on ox.user_id = ux.id
where
( ox.order_status = 1 or ox.order_status = 2 )
/* Orders can be (completed) or (canceled) */
group by
ux.id
) x on x.id = u.id
/* The thought here is that by using group by `ux.id` I will get the
user's earliest completed or canceled order and its `create_time`,
which can then be used to `join` the order info */
where
o.create_time != 0
and
( o.order_status = 1 or o.order_status = 2 )
group by
o.id
Option 2: Subquery
Select
o.id as 'order_id',
u.id as 'user_id',
(
select
ox.create_time
from
tb_order ox
where
(ox.order_status = 1 or ox.order_status = 2)
and
ox.user_id = u.id
order by
ox.id asc
limit 1
) as 'user_1st_order_date'
from
tb_order o
left join tb_user u on o.user_id = u.id
/* here I have about 10 joins */
where
o.create_time != 0
and
( o.order_status = 1 or o.order_status = 2 )
group by
o.id
/* Option 1 stopped working somehow yesterday and started to give me the latest order time instead, and I don't know why. Though I can get the correct date back by putting 'Min()' in front of ox.create_time */
left join
(
select
ux.id,
Min(ox.create_time)
Both worked but I'm trying to find the most efficient one since I'll use this on a daily basis to update our data source for Tableau Online.
Many thanks in advance.
Just looking at query 1, you have set out a crazy set of table relationships.
Starting with the Select in parentheses, you have a Left Join that implies there are users without orders. That's OK, but your Where filter is based solely on order status, which is NULL when there is no order, so all such users will be filtered out. There is no useful purpose being served by joining the tb_user table and it can be omitted from that subquery.
In the outer query, the Left Join of tb_order to tb_user implies there are orders without users, but then joining the subquery using u.id instead of o.user_id guarantees that nothing from the subquery will be usable in that case. Once again, there is no purpose served in bringing tb_user in there either.
To get the desired result set you set out above, you can vastly simplify things by looking only at the tb_order table like Option 3 below:
Option 3
Select order_id, user_id, user_1st_order_date
From (
    Select id AS order_id, user_id, order_status
         , min(Case When order_status In (1,2) Then create_time End)
               Over (Partition By user_id
                     Rows Between unbounded preceding And unbounded following)
           AS user_1st_order_date
    From tb_order
) x
Where order_status In (1,2)
Order By order_id
This can be further simplified by moving the Where order_status In (1,2) inside the inner query and removing the Case expression around create_time, but it's less adaptable for use within other queries.
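For reference, a sketch of that simplified form (same tb_order columns as above; assumes MySQL 8+ for the window function):

Select id AS order_id, user_id
     , min(create_time) Over (Partition By user_id) AS user_1st_order_date
From tb_order
Where order_status In (1,2)
Order By order_id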
The goal is to load a list of chats that the user sending the request is a member of. Some of the chats are group chats (more than two members), and there I want to show the profile pictures of the users who wrote the last three messages.
The first query to load meta data like the title and the timestamp of the chat is:
SELECT Chat_Users.ID_Chat, Chats.title, Chats.lastMessageAt
FROM Chat_Users
JOIN Chats ON Chats.ID = Chat_Users.ID_Chat
GROUP BY Chat_Users.ID_Chat
HAVING COUNT(Chat_Users.ID_Chat) = 2
AND MAX(Chat_Users.ID_User = $userID) > 0
ORDER BY Chats.lastMessageAt DESC
LIMIT 20
The query to load the last three profile pictures from one of the chats loaded with the query above is:
SELECT GROUP_CONCAT(innerTable.profilePictures SEPARATOR ', ') AS 'ppUrls',
innerTable.ID_Chat
FROM
(
SELECT Chat_Users.ID_Chat, Users.profilePictureUrl AS profilePictures
FROM Users
JOIN Chat_Users ON Chat_Users.ID_User = Users.ID
JOIN Chat_Messages ON Chat_Messages.ID_Chat = Chat_Users.ID_Chat
WHERE Chat_Users.ID_Chat = $chatID
ORDER BY Chat_Messages.timestamp DESC
LIMIT 3
) innerTable
GROUP BY innerTable.ID_Chat
Both work separately, but I want to merge them so I don't have to run the second query in a loop, for performance reasons. Unfortunately I have no idea how this can be achieved, because the second query needs the $chatID, which it only gets from the first query.
So to clarify the desired result: The list with the profile picture urls (second query) should be just another column in the result of the first query.
I hope it is explained in a reasonably understandable way. Any help would be much appreciated.
Edit: Sample data from the affected tables (Chats, Chat_Users, Chat_Messages, Users) was included but is not reproduced here.
This fulfils the brief, however it requires a view because MySQL 5.x doesn't support the WITH clause.
It's long and clunky and I've tried to shorten it, but this is as good as I can get; hopefully someone will pop up in the comments with a way to make it shorter!
The view:
CREATE VIEW last_interaction AS
SELECT
id_chat,
id_user,
MAX(timestamp) AS timestamp
FROM chat_messages
GROUP BY id_user, id_chat
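As a quick sanity check, the view can be queried on its own for a single chat (the 42 below is just a placeholder chat id):

SELECT id_user, timestamp
FROM last_interaction
WHERE id_chat = 42
ORDER BY timestamp DESC
LIMIT 3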
The query:
SELECT
Chat_Users.ID_Chat,
Chats.title,
Chats.lastMessageAt,
urls.pps AS profilePictureUrls
FROM Chat_Users
JOIN Chats ON Chats.ID = Chat_Users.ID_Chat
JOIN (
SELECT
lo.id_chat,
GROUP_CONCAT(users.profilePictureUrl) AS pps
FROM last_interaction lo
JOIN users ON users.id = lo.id_user
WHERE (
    SELECT COUNT(*) -- the number of more recent interactions in the same chat
    FROM last_interaction li
    WHERE li.id_chat = lo.id_chat
      AND (li.timestamp > lo.timestamp
           OR (li.timestamp = lo.timestamp AND li.id_user > lo.id_user))
) < 3
GROUP BY id_chat
) urls ON urls.id_chat = Chats.id
GROUP BY Chat_Users.ID_Chat
HAVING COUNT(Chat_Users.ID_Chat) > 2
AND MAX(Chat_Users.ID_User = $userID)
ORDER BY Chats.lastMessageAt DESC
LIMIT 20
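One small note: GROUP_CONCAT does not guarantee any particular order by default, so if the three URLs should come back most-recent first, an ORDER BY can go inside the aggregate. A sketch of how that line in the subquery could look:

GROUP_CONCAT(users.profilePictureUrl ORDER BY lo.timestamp DESC SEPARATOR ', ') AS pps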
I have a performance issue with the query below on MySQL. The query has 5 tables involved. When I apply the ORDER BY and LIMIT, the results are retrieved in 0.3 secs, but without the ORDER BY and LIMIT I was able to get the results in 0.01 secs. I have tried changing the query, but that did not work. Could someone please help me with this query so I can get the results in the desired time (<0.3 secs)?
Below are the details.
m_todos = 286579 (records)
m_pat = 214858 (records)
users = 119 (records)
m_programs = 26 (records)
role = 4 (records)
SELECT *
FROM (
SELECT t.*,
mp.name as A_name,
u.first_name, u.last_name,
p.first, p.last, p.zone, p.language,p.handling,
r.name,
u2.first_name AS created_first_name,
u2.last_name AS created_last_name
FROM m_todos t
INNER JOIN role r ON t.role_id=r.id
INNER JOIN m_pat p ON t.patient_id = p.id
LEFT JOIN users u2 ON t.created_id=u2.id
LEFT JOIN m_programs mp ON t.prog_id=mp.id
LEFT JOIN users u ON t.user_id=u.id
WHERE t.role_id !='9'
AND t.completed = '0000-00-00 00:00:00'
) C
ORDER BY priority DESC, due ASC
LIMIT 0,10
Get rid of the outer SELECT; move the ORDER BY and LIMIT in.
Indexes:
t: (completed)
t: (priority, due)
I assume priority and due are in t?? Please be explicit in the query. It could make a huge difference.
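In DDL form, assuming both columns live on m_todos and no equivalent indexes exist yet, that would be something like:

ALTER TABLE m_todos
    ADD INDEX idx_completed (completed),
    ADD INDEX idx_priority_due (priority, due);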
If the following works, it should speed things up a lot: Start by finding the t.id without all the JOINs:
SELECT id
FROM m_todos
WHERE role_id !='9'
AND completed = '0000-00-00 00:00:00'
ORDER BY priority DESC, due ASC
LIMIT 10
That will benefit from this covering composite index:
INDEX(completed, role_id, priority, due, id)
Debug that. Then use it in the rest:
SELECT t.*, the-other-stuff
FROM ( that-query ) AS t1
JOIN m_todos AS t USING(id)
then-the-rest-of-the-JOINs
ORDER BY priority DESC, due ASC -- yes, again
If you don't need all of t.*, it may be beneficial to spell out the actual columns needed.
The reason this runs much faster is that the 10 rows are found efficiently by looking only at the one table. The original code was shoveling around a lot more rows than 10, and they included all the columns of t plus columns from the other tables.
My version does only 10 lookups for all the extra stuff.
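Putting the pieces together with the JOINs from the original statement, the rewritten query would look roughly like this (a sketch; it assumes priority and due do live on m_todos):

SELECT t.*,
       mp.name AS A_name,
       u.first_name, u.last_name,
       p.first, p.last, p.zone, p.language, p.handling,
       r.name,
       u2.first_name AS created_first_name,
       u2.last_name AS created_last_name
FROM (
    SELECT id
    FROM m_todos
    WHERE role_id != '9'
      AND completed = '0000-00-00 00:00:00'
    ORDER BY priority DESC, due ASC
    LIMIT 10
) AS t1
JOIN m_todos AS t USING(id)
INNER JOIN role r ON t.role_id = r.id
INNER JOIN m_pat p ON t.patient_id = p.id
LEFT JOIN users u2 ON t.created_id = u2.id
LEFT JOIN m_programs mp ON t.prog_id = mp.id
LEFT JOIN users u ON t.user_id = u.id
ORDER BY t.priority DESC, t.due ASC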
I have been researching this for hours and the best code that I have come up with is this, from an example I found on Stack Overflow. I have been through several derivations, but the following is the only query that returns the correct data; the problem is it takes over 139 s (more than 2 minutes) to return only 30 rows of data. I'm stuck. (life_p is a 'likes' table.)
SELECT
logos.id,
logos.in_gallery,
logos.active,
logos.pubpriv,
logos.logo_name,
logos.logo_image,
coalesce(cc.Count, 0) as CommentCount,
coalesce(lc.Count, 0) as LikeCount
FROM logos
left outer join(
select comments.logo_id, count( * ) as Count from comments group by comments.logo_id
) cc on cc.logo_id = logos.id
left outer join(
select life_p.logo_id, count( * ) as Count from life_p group by life_p.logo_id
) lc on lc.logo_id = logos.id
WHERE logos.active = '1'
AND logos.pubpriv = '0'
GROUP BY logos.id
ORDER BY logos.in_gallery desc
LIMIT 0, 30
I'm not sure what's wrong. If I do them singularly, meaning I remove the coalesce and one of the joins:
SELECT
logos.id,
logos.in_gallery,
logos.active,
logos.pubpriv,
logos.logo_name,
logos.logo_image,
count( * ) as lc
FROM logos
left join life_p on life_p.logo_id = logos.id
WHERE logos.active = '1'
AND logos.pubpriv = '0'
GROUP BY logos.id
ORDER BY logos.in_gallery desc
LIMIT 0, 30
that runs in less than half a second (200-300 ms)...
Here is a link to the explain: https://logopond.com/img/explain.png
MySQL has a peculiar quirk that allows a group by clause that does not list all non-aggregating columns. This is NOT a good thing and you should always specify ALL non-aggregating columns in the group by clause.
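If you want MySQL itself to enforce that rule, the ONLY_FULL_GROUP_BY SQL mode does exactly that (newer versions enable it by default). Shown here for the current session only:

SELECT @@SESSION.sql_mode; -- check whether ONLY_FULL_GROUP_BY is already listed
SET SESSION sql_mode = CONCAT(@@SESSION.sql_mode, ',ONLY_FULL_GROUP_BY');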
Note: when counting over joined tables, it is useful to know that the COUNT() function ignores NULLs. So for a LEFT JOIN where NULLs can occur, don't use COUNT(*); instead, count a column from within the joined table, and only rows from that table will be counted. From these points I would suggest the following query structure.
SELECT
logos.id
, logos.in_gallery
, logos.active
, logos.pubpriv
, logos.logo_name
, logos.logo_image
, COALESCE(COUNT(cc.logo_id), 0) AS CommentCount
, COALESCE(COUNT(lc.logo_id), 0) AS LikeCount
FROM logos
LEFT OUTER JOIN comments cc ON cc.logo_id = logos.id
LEFT OUTER JOIN life_p lc ON lc.logo_id = logos.id
WHERE logos.active = '1'
AND logos.pubpriv = '0'
GROUP BY
logos.id
, logos.in_gallery
, logos.active
, logos.pubpriv
, logos.logo_name
, logos.logo_image
ORDER BY logos.in_gallery DESC
LIMIT 0, 30
If you continue to have performance issues then use a execution plan and consider adding indexes to suit.
You can create some indexes on the joining fields:
ALTER TABLE table ADD INDEX idx__tableName__fieldName (field)
In your case it will be something like:
ALTER TABLE comments ADD INDEX idx__comments__logo_id (logo_id);
ALTER TABLE life_p ADD INDEX idx__life_p__logo_id (logo_id);
I don't really like it, because I've always read that subqueries are bad and that joins perform better under stress, but in this particular case a subquery seems to be the only way to pull the correct data in under half a second consistently. Thanks for the suggestions, everyone.
SELECT
logos.id,
logos.in_gallery,
logos.active,
logos.pubpriv,
logos.logo_name,
logos.logo_image,
(Select COUNT(comments.logo_id) FROM comments
WHERE comments.logo_id = logos.id) AS coms,
(Select COUNT(life_p.logo_id) FROM life_p
WHERE life_p.logo_id = logos.id) AS floats
FROM logos
WHERE logos.active = '1' AND logos.pubpriv = '0'
ORDER BY logos.in_gallery desc
LIMIT ". $start .",". $pageSize ."
Also, you can create a mapping table to speed up your query. Try:
CREATE TABLE mapping_comments AS
SELECT
comments.logo_id,
count(*) AS Count
FROM
comments
GROUP BY
comments.logo_id
Then change your code
left outer join(
should become
inner join mapping_comments as mp on mp.logo_id = logos.id
Then each time a new comment is added to the comments table you need to update your mapping table, OR you can create a trigger/stored procedure to do it automatically whenever the comments table changes.
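A sketch of that automatic update as an AFTER INSERT trigger (it assumes mapping_comments is given a unique key on logo_id first, which CREATE TABLE ... AS SELECT does not add by itself):

-- one-time: unique key so the upsert below works
ALTER TABLE mapping_comments ADD UNIQUE KEY uq_mapping_comments_logo (logo_id);

-- keep the per-logo count in sync on every new comment
CREATE TRIGGER trg_comments_after_insert
AFTER INSERT ON comments
FOR EACH ROW
    INSERT INTO mapping_comments (logo_id, `Count`)
    VALUES (NEW.logo_id, 1)
    ON DUPLICATE KEY UPDATE `Count` = `Count` + 1;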
In the following query, I show the latest status of the sale (by stage, in this case number 3). The query is based on a subquery against the status history of the sale:
SELECT v.id_sale,
IFNULL((
SELECT (CASE WHEN IFNULL( vec.description, '' ) = ''
THEN ve.name
ELSE vec.description
END)
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
WHERE veh.id_sale = v.id_sale
AND vec.id_stage = 3
ORDER BY veh.id_record DESC
LIMIT 1
), 'x') sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
WHERE 1 =1
AND v.flag =1
AND v.id_quarters =4
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
The query takes 0.0057 s and returns 1,011 records.
Because I have to filter the sales by the name of the state, and that would mean repeating the subquery in a WHERE clause, I have decided to rewrite the same query using joins. In this case, I'm using the MAX function to obtain the latest status:
SELECT
v.id_sale,
IFNULL(veh3.State3,'x') AS sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
LEFT JOIN (
SELECT veh.id_sale,
(CASE WHEN IFNULL(vec.description,'') = ''
THEN ve.name
ELSE vec.description END) AS State3
FROM t_record veh
INNER JOIN (
SELECT id_sale, MAX(id_record) AS max_rating
FROM(
SELECT veh.id_sale, id_record
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign AND vec.id_stage = 3
) m
GROUP BY id_sale
) x ON x.max_rating = veh.id_record
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
) veh3 ON veh3.id_sale = v.id_sale
WHERE v.flag = 1
AND v.id_quarters = 4
This query returns the same results (1,011 rows), but the problem is that it takes 0.0753 s.
Reviewing the possibilities, I have found the factor that makes the difference in query speed:
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
If I remove this clause, both queries take the same time... Why does it work better? Is there any way to use this clause with the joins? I hope for your help.
EDIT: The EXPLAIN output for each query (q1 and q2) was attached but is not reproduced here.
Interesting, so that little statement basically determines if there is a match between t_record.id_sale and t_sale.id_sale.
Why is this making your query run faster? Because WHERE conditions are applied before subselects in the SELECT list, so if there is no record to go with the sale, it doesn't bother processing the subselect, which is netting you some time. So that's why it works better.
Is it going to work in your join syntax? I don't really know without having your tables to test against but you can always just apply it to the end and find out. Add the keyword EXPLAIN to the beginning of your query and you will get a plan of execution which will help you optimize things. Probably the best way to get better results in your join syntax is to add some indexes to your tables.
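For completeness, applying it to the end of the join version from the question would just mean extending its WHERE clause, e.g. (tail of the query only):

WHERE v.flag = 1
  AND v.id_quarters = 4
  AND EXISTS (
      SELECT '1'
      FROM t_record
      WHERE id_sale = v.id_sale
      LIMIT 1
  )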
But I ask you, is this even necessary? You have a query returning in less than 8 hundredths of a second. Unless this query is getting run thousands of times an hour, this is not really taxing your DB at all, and your time is probably better spent making improvements elsewhere in your application.