I have this users table with:
id : int (255)
name: char (100)
last_comment_target: int(100)
last_comment_date: datetime
This table has around 1.3mil rows.
PKEY and BTREE is on id, last_comment_target, and last_comment_date.
And, I am trying to perform a range query:
SELECT * FROM users
WHERE id IN (1,2,3,5,...[around 5000 ids])
AND last_comment_target > 0
ORDER BY last_comment_dt DESC LIMIT 0,20;
Sometimes the query can take as long as 3 seconds. I wonder if there are better ways to optimize this query. Or, if this query can be rewritten.
Thank you so much for your help.
SELECT u.*
FROM
users u
JOIN (
SELECT 1 id
UNION ALL
SELECT 2 id
UNION ALL
:
:
SELECT 5000 id
) ids ON ids.id = u.id
WHERE
last_comment_target > 0
ORDER BY
last_comment_dt DESC
LIMIT 0, 20;
Thanks everyone that has contributed.
#Karolis seems to point out that an alternative using join instead of range
So, basically:
SELECT * FROM users WHERE id IN (1,2,3,...[5000 ids]) AND last_comment_target > 0
yields in EXPLAIN statement a type of RANGE. The 5000 ids can be generated from another table.
When I switched the above to:
SELECT *
FROM users u
INNER JOIN user_friends uf ON u.id = uf.to_id
AND u.last_comment_target > 0
AND uf.from_id = [id];
It yields in EXPLAIN statement two types: ref and eq_ref which is faster than range in this query.
The query execution is reduced from 3+ seconds to around 0.2x seconds.
So, lesson learned from my end: TRY to use JOIN instead of RANGE if you have a table that you can derive from.
Related
I have a query that is slow... i want to display the last 12 newest members near me(near the logged user) and my dev database has 150k rows.
It took over 1 second and the explain query tells me that 30k rows are filtered
So 30k filtered for 150k rows in my developpment DB... my server online is much bigger thant this....
Here my query :
SELECT profils.*,
Users.username,
( SELECT count(*)
from profilsphotos pp
where pp.iduser=Profils.iduser
) as nbpics,
ATAN2(SQRT(POW(COS(RADIANS(50.78961000)) * SIN(RADIANS(Y(gm_coor) - 4.64956000)),
2) + POW(COS(RADIANS(X(gm_coor))) * SIN(RADIANS(50.78961000)) - SIN(RADIANS(X(gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(gm_coor) - 4.64956000)),
2)), (SIN(RADIANS(X(gm_coor))) * SIN(RADIANS(50.78961000)) + COS(RADIANS(X(gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(gm_coor) - 4.64956000)))
) * 6372.795 AS distance
from Users
inner join Profils ON Users.id=Profils.iduser
where Profils.Actif=1
and profils.idsexe=2
and profils.idlookingfor=1
and Profils.iduser<>1
HAVING distance<400
order by Users.id desc, distance asc
limit 12
Note that i add an index on those four fields: actif,idsexe,idlookingfor and iduser
What wrong with my query ?
Thanks a lot !
Pascal
I would extract the subquery from the SELECT clause to a temporary table, index it and join to it, instead of executing it for every record in the select clause (30K times).
So the steps are: create a temp table, index it, run the optimized query.
First, create the relevant indexes for the query:
ALTER TABLE
`Profils`
ADD
INDEX `profils_idx_actif_iduser` (`Actif`, `iduser`);
ALTER TABLE
`Users`
ADD
INDEX `users_idx_id_username` (`id`, `username`);
ALTER TABLE
`profils`
ADD
INDEX `profils_idx_idsexe_idlookingfor` (`idsexe`, `idlookingfor`);
ALTER TABLE
`profilsphotos`
ADD
INDEX `profilsphotos_idx_iduser` (`iduser`);
Now, create the temp table and index it:
-- Transformed subquery to a temp table to improve performance
CREATE TEMPORARY TABLE IF NOT EXISTS temp1 AS SELECT
count(*) AS nbpics,
iduser
FROM
profilsphotos pp
WHERE
1 = 1
GROUP BY
iduser
ORDER BY
NULL;
ALTER TABLE
`temp1`
ADD
INDEX `temp1_idx_iduser_nbpics` (`iduser`, `nbpics`);
Now try to run this query instead of the original one and see if it runs faster:
SELECT
optimizedSub1.*,
temp1.nbpics
FROM
(SELECT
Users.username,
ATAN2(SQRT(POW(COS(RADIANS(50.78961000)) * SIN(RADIANS(Y(Profils.gm_coor) - 4.64956000)),
2) + POW(COS(RADIANS(X(Profils.gm_coor))) * SIN(RADIANS(50.78961000)) - SIN(RADIANS(X(Profils.gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(Profils.gm_coor) - 4.64956000)),
2)),
(SIN(RADIANS(X(Profils.gm_coor))) * SIN(RADIANS(50.78961000)) + COS(RADIANS(X(Profils.gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(Profils.gm_coor) - 4.64956000)))) * 6372.795 AS distance
FROM
Users
INNER JOIN
Profils
ON Users.id = Profils.iduser
WHERE
Profils.Actif = 1
AND profils.idsexe = 2
AND profils.idlookingfor = 1
AND Profils.iduser <> 1
HAVING
distance < 400
ORDER BY
Users.id DESC,
distance ASC LIMIT 12) AS optimizedSub1
LEFT JOIN
temp1
ON temp1.iduser = optimizedSub1.iduser
Profils needs
INDEX(Actif, idsexe, idlookingfor) -- in any order
Perhaps distance should be first?..
order by Users.id desc, distance asc
What is Y(gm_coor)? If is a Stored Function, we need to know more. What table has gm_coor? After that, maybe we can discuss a "bounding box" as a partial speedup.
Make another nesting of SELECTs and move the computation of nbpics to it. Currently, the COUNT(*) is being performed 30K times. After the change, it will be only 12 times.
Reformulation
SELECT p2.*,
u.username,
( SELECT COUNT(*)
FROM profilsphotos pp
where pp.iduser = p2.iduser
) as nbpics,
x.distance
FROM
( SELECT p1.id, -- assuming this the PK of Profils
(...) AS distance
FROM Profils AS p1
WHERE p1.Actif=1
and p1.idsexe=2
and p1.idlookingfor=1
and p1.iduser<>1
HAVING distance < 400
ORDER BY distance
LIMIT 12
) AS x
JOIN profils AS p2 USING(id)
JOIN Users AS u ON u.id = p2.iduser;
I'm working on a project involving words and its translations. One of the queries a translator must task frequently (once every 10 sec or so) is:
SELECT * FROM pwords p
LEFT JOIN words w ON p.id = w.wordid
WHERE w.code IS NULL
OR (w.code <> "USER1" AND w.code <> "USER2")
ORDER BY rand() LIMIT 10
To receive a word to be translated, which the user has not translated already. In this case we want to disallow words input by USER2
The pwords table has around 66k entries and the words table has around 55k entries.
This query takes about 500 seconds to complete, whereas if I remove the IS NULL the query takes 0.0245 ms. My question here is: is there a way to optimize this query? I really need to squeeze the numbers.
The scenario is: USER1 does not want any database entries from USER2 in the words table. It does not want it's own database entries from the same table. Therefore I need to have the IS NULL or a similar method to get entries from all users except USER1 and USER2, either from other users or NULL entries.
tl;dr So my question is: is there a way to make this query run faster? Is "IS NULL" optimizable?
Any and all help is greatly appreciated.
You can try using subquery in order to filter the rows (use WHERE statement) as soon as possible:
SELECT *
FROM pwords p
LEFT JOIN
(SELECT *
FROM words w
WHERE (w.code <> "USER1" AND w.code <> "USER2")) subq
ON p.id = subq.wordid
WHERE w.code IS NULL
ORDER BY rand() LIMIT 10
another (and maybe a more efficient option) is using NOT EXISTS statement:
SELECT *
FROM pwords p
WHERE NOT EXISTS
(SELECT *
FROM words w
WHERE p.id=w.wordid AND (w.code <> "USER1" AND w.code <> "USER2"))
ORDER BY rand() LIMIT 10
I'm struggling to make a query efficient enough. I'm using Doctrine2 ORM (the query is build with QueryBuilder) and part of my query is running very slow - takes about 4s with table of 5000 rows.
This is the relevant part of db schema:
TABLE user
id (primary)
... (plenty of rows, not relevant to the query)
TABLE slot
id (primary)
user_id (foreign for user)
date (datetime)
And this is how my query looks like (it's the basic version, there's a lot of filters to be applied, but these work like fine for now)
SELECT
u.id AS uid,
COUNT(DISTINCT s_order.id) AS sclr_1,
COUNT(DISTINCT s_filter.id) AS sclr_2
FROM
user u
LEFT JOIN slot s_order ON (s_order.user_id = u.id)
LEFT JOIN slot s_filter ON (s_filter.user_id = u.id)
WHERE
(
(
(
s_order.date BETWEEN ?
AND ?
)
AND (
s_filter.date BETWEEN ?
AND ?
)
)
AND (u.deleted_at IS NULL)
)
AND u.userType IN ('2')
GROUP BY
u.id
HAVING
sclr_2 > 0
ORDER BY
sclr_1 DESC
LIMIT
12
Let me explain what I'm trying to achieve here:
I need to filter users who has any slots between 1 week ago and 1 week ahead, then order them by count of slots available between now and 1 week ahead. The part of query causing issues is LEFT JOIN of s_filter and I'm wondering whether perhaps there's a way to improve the performance of that query?
Any help appreciated really, even if it's only plain SQL I'll try to convert it to DQL myself!
#UPDATE
Just an additional info that I forgot, the LIMIT in query is for pagination purposes!
#UPDATE 2
After a while of tweaking the query I figured out that I can use JOIN for filtering instead of LEFT JOIN + COUNT, so my query does look like that now:
SELECT
u.id AS uid, COUNT(DISTINCT s_order.id) AS ordinal
FROM
langu_user u
LEFT JOIN
slot s_order ON (s_order.user_id = u.id) AND s_order.date BETWEEN '2017-02-03 14:03:22' AND '2017-02-10 14:03:22'
JOIN
slot s_filter ON (s_filter.user_id = u.id) AND s_filter.date BETWEEN '2017-01-27 14:03:22' AND '2017-02-10 14:03:22'
WHERE
u.deleted_at IS NULL
AND u.userType IN ('2')
GROUP BY u.id
ORDER BY ordinal DESC
LIMIT 12
And it went down from 4.1-4.3s to 3.6~
My MySQL query is loading very slow (over 30 secs), I was wondering what tweaks I can make to optimize it.
The query should return the last post with the string "?" of all threads.
SELECT FeedbackId, ParentFeedbackId, PageId, FeedbackTitle, FeedbackText, FeedbackDate
FROM ReaderFeedback AS c
LEFT JOIN (
SELECT max(FeedbackId) AS MaxFeedbackId
FROM ReaderFeedback
WHERE ParentFeedbackId IS NOT NULL
GROUP BY ParentFeedbackId
) AS d ON d.MaxFeedbackId = c.FeedbackId
WHERE ParentFeedbackId IS NOT NULL
AND FeedbackText LIKE '%?%'
GROUP BY ParentFeedbackId
ORDER BY d.MaxFeedbackId DESC LIMIT 50
Before discuss this problem, I have formatted your SQL:
SELECT feedbackid,
parentfeedbackid,
pageid,
feedbacktitle,
feedbacktext,
feedbackdate
FROM readerfeedback AS c
LEFT JOIN (SELECT Max(feedbackid) AS MaxFeedbackId
FROM readerfeedback
WHERE parentfeedbackid IS NOT NULL
GROUP BY parentfeedbackid) AS d
ON d.maxfeedbackid = c.feedbackid
WHERE parentfeedbackid IS NOT NULL
AND feedbacktext LIKE '%?%'
GROUP BY parentfeedbackid
ORDER BY d.maxfeedbackid DESC
LIMIT 50
Since there is an Inefficient query criteria in your SQL:
feedbacktext LIKE '%?%'
Which is not able to take benefit from Index and needs a full scan, I suggest you to add a new field
isQuestion BOOLEAN
to your table, and then add logic in your program to assign this field when insert/update a feedbacktext.
Finally your can query based on this field and take benefit from index.
Firstly your SQL is not valid. The outer Group by is not valid.
According to the SQL the second group by is not needed. I moved the 2 where into inner SQL, as well as the limit, wonder if the following is quicker:
SELECT FeedbackId, ParentFeedbackId, PageId, FeedbackTitle, FeedbackText, FeedbackDate
FROM ReaderFeedback AS c
JOIN (
SELECT max(FeedbackId) AS MaxFeedbackId
FROM ReaderFeedback
WHERE ParentFeedbackId IS NOT NULL
AND FeedbackText LIKE '%?%'
GROUP BY ParentFeedbackId
ORDER BY 1 DESC LIMIT 50
) AS d ON d.MaxFeedbackId = c.FeedbackId
Please have a look at your table structure, see if there is any normalisation be downed for speed concern.
I have the following query…
SELECT DISTINCT * FROM
vPAS_Posts_Users
WHERE (post_user_id =:id AND post_type != 4)
AND post_updated >:updated
GROUP BY post_post_id
UNION
SELECT DISTINCT vPAS_Posts_Users.* FROM PAS_Follow
JOIN vPAS_Posts_Users ON
( PAS_Follow.folw_followed_user_id = vPAS_Posts_Users.post_user_id )
WHERE (( PAS_Follow.folw_follower_user_id =:id AND PAS_Follow.folw_deleted = 0 )
OR ( post_type = 4 AND post_passed_on_by = PAS_Follow.folw_follower_user_id
AND post_user_id !=:id ))
AND post_updated >:updated
GROUP BY post_post_id ORDER BY post_posted_date DESC LIMIT :limit
Where :id = 7, :updated = 0.0 and :limit=40 for example
My issue is that the query is taking about a minute to return results. Is there anything in this query that I can do to speed up the result?
I am using RDS
********EDIT*********
I was asked to run the query with an EXPLAIN the result is below
********EDIT**********
View Definitition
CREATE ALGORITHM=UNDEFINED DEFINER=`MySQLUSer`#`%` SQL SECURITY DEFINER VIEW `vPAS_Posts_Users`
AS SELECT
`PAS_User`.`user_user_id` AS `user_user_id`,
`PAS_User`.`user_country` AS `user_country`,
`PAS_User`.`user_city` AS `user_city`,
`PAS_User`.`user_company` AS `user_company`,
`PAS_User`.`user_account_type` AS `user_account_type`,
`PAS_User`.`user_account_premium` AS `user_account_premium`,
`PAS_User`.`user_sign_up_date` AS `user_sign_up_date`,
`PAS_User`.`user_first_name` AS `user_first_name`,
`PAS_User`.`user_last_name` AS `user_last_name`,
`PAS_User`.`user_avatar_url` AS `user_avatar_url`,
`PAS_User`.`user_cover_image_url` AS `user_cover_image_url`,
`PAS_User`.`user_bio` AS `user_bio`,
`PAS_User`.`user_telephone` AS `user_telephone`,
`PAS_User`.`user_dob` AS `user_dob`,
`PAS_User`.`user_sector` AS `user_sector`,
`PAS_User`.`user_job_type` AS `user_job_type`,
`PAS_User`.`user_unique` AS `user_unique`,
`PAS_User`.`user_deleted` AS `user_deleted`,
`PAS_User`.`user_updated` AS `user_updated`,
`PAS_Post`.`post_post_id` AS `post_post_id`,
`PAS_Post`.`post_language_id` AS `post_language_id`,
`PAS_Post`.`post_type` AS `post_type`,
`PAS_Post`.`post_promoted` AS `post_promoted`,
`PAS_Post`.`post_user_id` AS `post_user_id`,
`PAS_Post`.`post_posted_date` AS `post_posted_date`,
`PAS_Post`.`post_latitude` AS `post_latitude`,
`PAS_Post`.`post_longitude` AS `post_longitude`,
`PAS_Post`.`post_location_name` AS `post_location_name`,
`PAS_Post`.`post_text` AS `post_text`,
`PAS_Post`.`post_media_url` AS `post_media_url`,
`PAS_Post`.`post_image_height` AS `post_image_height`,
`PAS_Post`.`post_link` AS `post_link`,
`PAS_Post`.`post_link_title` AS `post_link_title`,
`PAS_Post`.`post_unique` AS `post_unique`,
`PAS_Post`.`post_deleted` AS `post_deleted`,
`PAS_Post`.`post_updated` AS `post_updated`,
`PAS_Post`.`post_original_post_id` AS `post_original_post_id`,
`PAS_Post`.`post_original_type` AS `post_original_type`,
`PAS_Post`.`post_passed_on_by` AS `post_passed_on_by`,
`PAS_Post`.`post_passed_on_caption` AS `post_passed_on_caption`,
`PAS_Post`.`post_passed_on_fullname` AS `post_passed_on_fullname`,
`PAS_Post`.`post_passed_on_avatar_url` AS `post_passed_on_avatar_url`
FROM (`PAS_User` join `PAS_Post` on((`PAS_User`.`user_user_id` = `PAS_Post`.`post_user_id`)));
try this query:
SELECT *
FROM
vPAS_Posts_Users
WHERE
post_user_id =:id
AND post_type != 4
AND post_updated > :updated
UNION
SELECT u.*
FROM vPAS_Posts_Users u
JOIN PAS_Follow f ON f.folw_followed_user_id = u.post_user_id
WHERE
u.post_updated > :updated
AND ( (f.folw_follower_user_id = :id AND f.folw_deleted = 0)
OR (u.post_type = 4 AND u.post_passed_on_by = f.folw_follower_user_id AND u.post_user_id != :id)
)
ORDER BY u.post_posted_date DESC;
LIMIT :limit
Other improvements
Indices:
Be sure you have indices on the following columns:
PAS_User.user_user_id
PAS_Post.post_user_id
PAS_Post.post_type
PAS_Post.post_updated
PAS_Follow.folw_followed_user_id
PAS_Follow.folw_deleted
PAS_Post.post_passed_on_by
After that is done, please 1- check the performance again (SQL_NO_CACHE) and 2- extract another explain plan so we can adjust the query.
EXPLAIN Results
Here are the some suggestions for the query and view first of all using the UNION for the two result sets which might makes your query to work slow instead you can use the UNION ALL
Why i am referring you to use UNION ALL
Reason is both UNION ALL and UNION use temporary table for result generation.The difference in execution speed comes from the fact UNION requires internal temporary table with index (to skip duplicate rows) while UNION ALL will create table without such index.This explains the slight performance improvement when using UNION ALL.
UNION on its own will remove any duplicate records so no need to use the DISTINCT clause, try to only one GROUP BY of the whole result set by subqueries this will also minimize the execution time rather then grouping results in each subquery.
Make sure you have added the right indexes on the columns especially the columns used in the WHERE,ORDER BY, GROUP BY, the data types should be appropriate for each column with respect to the nature of data in it like post_posted_date should be datetime,date with an index also.
Here is the rough idea for the query
SELECT q.* FROM (
SELECT * FROM
vPAS_Posts_Users
WHERE (post_user_id =:id AND post_type != 4)
AND post_updated >:updated
UNION ALL
SELECT vPAS_Posts_Users.* FROM PAS_Follow
JOIN vPAS_Posts_Users ON
( PAS_Follow.folw_followed_user_id = vPAS_Posts_Users.post_user_id
AND vPAS_Posts_Users.post_updated >:updated)
WHERE (( PAS_Follow.folw_follower_user_id =:id AND PAS_Follow.folw_deleted = 0 )
OR ( post_type = 4 AND post_passed_on_by = PAS_Follow.folw_follower_user_id
AND post_user_id !=:id ))
) q
GROUP BY q.post_post_id ORDER BY q.post_posted_date DESC LIMIT :limit
References
Difference Between Union vs. Union All – Optimal Performance Comparison
Optimize Mysql Union
MySQL Performance Blog
From your explain I can see that most of your table don't have any key except for the primary one, I would suggest you to add some extra key on the columns you're going to join, for example on: PAS_Follow.folw_followed_user_id and vPAS_Posts_Users.post_user_id, just this will result in a big performance boost.
Bye,
Gnagno