Slow query with HAVING on a calculated field - MySQL

I have a slow query. I want to display the 12 newest members near me (near the logged-in user), and my dev database has 150k rows.
It takes over 1 second, and EXPLAIN tells me that 30k rows are filtered.
So 30k rows are filtered out of 150k in my development DB, and my production server is much bigger than this.
Here is my query:
SELECT profils.*,
Users.username,
( SELECT count(*)
from profilsphotos pp
where pp.iduser=Profils.iduser
) as nbpics,
ATAN2(SQRT(POW(COS(RADIANS(50.78961000)) * SIN(RADIANS(Y(gm_coor) - 4.64956000)),
2) + POW(COS(RADIANS(X(gm_coor))) * SIN(RADIANS(50.78961000)) - SIN(RADIANS(X(gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(gm_coor) - 4.64956000)),
2)), (SIN(RADIANS(X(gm_coor))) * SIN(RADIANS(50.78961000)) + COS(RADIANS(X(gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(gm_coor) - 4.64956000)))
) * 6372.795 AS distance
from Users
inner join Profils ON Users.id=Profils.iduser
where Profils.Actif=1
and profils.idsexe=2
and profils.idlookingfor=1
and Profils.iduser<>1
HAVING distance<400
order by Users.id desc, distance asc
limit 12
Note that I added an index on these four fields: actif, idsexe, idlookingfor and iduser.
What is wrong with my query?
Thanks a lot !
Pascal

I would extract the subquery from the SELECT clause to a temporary table, index it and join to it, instead of executing it for every record in the select clause (30K times).
So the steps are: create a temp table, index it, run the optimized query.
First, create the relevant indexes for the query:
ALTER TABLE
`Profils`
ADD
INDEX `profils_idx_actif_iduser` (`Actif`, `iduser`);
ALTER TABLE
`Users`
ADD
INDEX `users_idx_id_username` (`id`, `username`);
ALTER TABLE
`profils`
ADD
INDEX `profils_idx_idsexe_idlookingfor` (`idsexe`, `idlookingfor`);
ALTER TABLE
`profilsphotos`
ADD
INDEX `profilsphotos_idx_iduser` (`iduser`);
Now, create the temp table and index it:
-- Transformed subquery to a temp table to improve performance
CREATE TEMPORARY TABLE IF NOT EXISTS temp1 AS SELECT
count(*) AS nbpics,
iduser
FROM
profilsphotos pp
WHERE
1 = 1
GROUP BY
iduser
ORDER BY
NULL;
ALTER TABLE
`temp1`
ADD
INDEX `temp1_idx_iduser_nbpics` (`iduser`, `nbpics`);
Now try to run this query instead of the original one and see if it runs faster:
SELECT
optimizedSub1.*,
COALESCE(temp1.nbpics, 0) AS nbpics -- 0 when the user has no photos, matching the original COUNT(*)
FROM
(SELECT
Profils.*, -- includes iduser, which the outer join to temp1 needs
Users.username,
ATAN2(SQRT(POW(COS(RADIANS(50.78961000)) * SIN(RADIANS(Y(Profils.gm_coor) - 4.64956000)),
2) + POW(COS(RADIANS(X(Profils.gm_coor))) * SIN(RADIANS(50.78961000)) - SIN(RADIANS(X(Profils.gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(Profils.gm_coor) - 4.64956000)),
2)),
(SIN(RADIANS(X(Profils.gm_coor))) * SIN(RADIANS(50.78961000)) + COS(RADIANS(X(Profils.gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(Profils.gm_coor) - 4.64956000)))) * 6372.795 AS distance
FROM
Users
INNER JOIN
Profils
ON Users.id = Profils.iduser
WHERE
Profils.Actif = 1
AND profils.idsexe = 2
AND profils.idlookingfor = 1
AND Profils.iduser <> 1
HAVING
distance < 400
ORDER BY
Users.id DESC,
distance ASC LIMIT 12) AS optimizedSub1
LEFT JOIN
temp1
ON temp1.iduser = optimizedSub1.iduser
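The pre-aggregation idea above can be tried on a toy data set. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL and a cut-down, hypothetical version of the profils and profilsphotos tables. It also shows why a LEFT JOIN to the pre-computed counts usually wants a COALESCE: a user with no photos has no row in temp1, so the join yields NULL where the original correlated COUNT(*) returned 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down versions of the tables from the question.
    CREATE TABLE profils (iduser INTEGER PRIMARY KEY, actif INTEGER);
    CREATE TABLE profilsphotos (id INTEGER PRIMARY KEY, iduser INTEGER);
    INSERT INTO profils VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO profilsphotos (iduser) VALUES (1), (1), (3);

    -- Aggregate once instead of running COUNT(*) for every profile row.
    CREATE TEMPORARY TABLE temp1 AS
        SELECT iduser, COUNT(*) AS nbpics
        FROM profilsphotos
        GROUP BY iduser;
""")

# User 2 has no photos, so temp1 has no row for it; COALESCE turns NULL into 0.
rows = conn.execute("""
    SELECT p.iduser, COALESCE(t.nbpics, 0) AS nbpics
    FROM profils AS p
    LEFT JOIN temp1 AS t ON t.iduser = p.iduser
    WHERE p.actif = 1
    ORDER BY p.iduser
""").fetchall()
print(rows)
```

Whether the temporary table pays off on the real data depends on how often the counts change; the sketch only checks that the join returns the same counts as the correlated subquery would.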

Profils needs
INDEX(Actif, idsexe, idlookingfor) -- in any order
Perhaps distance should be first in the ORDER BY?
order by Users.id desc, distance asc
What is Y(gm_coor)? If it is a stored function, we need to know more. Which table has gm_coor? After that, maybe we can discuss a "bounding box" as a partial speedup.
Make another nesting of SELECTs and move the computation of nbpics to the outer level. Currently, the COUNT(*) is being performed 30K times. After the change, it will be performed only 12 times.
Reformulation
SELECT p2.*,
u.username,
( SELECT COUNT(*)
FROM profilsphotos pp
where pp.iduser = p2.iduser
) as nbpics,
x.distance
FROM
( SELECT p1.id, -- assuming this is the PK of Profils
(...) AS distance
FROM Profils AS p1
WHERE p1.Actif=1
and p1.idsexe=2
and p1.idlookingfor=1
and p1.iduser<>1
HAVING distance < 400
ORDER BY distance
LIMIT 12
) AS x
JOIN profils AS p2 USING(id)
JOIN Users AS u ON u.id = p2.iduser;
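The "bounding box" mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name bounding_box is made up here, latitude is taken to be X(gm_coor) and longitude Y(gm_coor) as the distance expression in the question suggests, and the search circle is assumed not to reach a pole or cross the 180-degree meridian.

```python
import math

EARTH_RADIUS_KM = 6372.795  # same constant the distance expression in the question uses


def bounding_box(lat_deg, lon_deg, radius_km):
    """Return (min_lat, max_lat, min_lon, max_lon) in degrees.

    Hypothetical helper; assumes the circle touches neither a pole
    nor the 180-degree meridian.
    """
    r = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    lat = math.radians(lat_deg)
    if abs(lat) + r >= math.pi / 2:
        raise ValueError("circle reaches a pole; this sketch does not handle that")
    dlat = math.degrees(r)
    # Widest longitude offset reached by any point on the circle.
    dlon = math.degrees(math.asin(math.sin(r) / math.cos(lat)))
    return (lat_deg - dlat, lat_deg + dlat, lon_deg - dlon, lon_deg + dlon)


# The logged-in user's position from the question, with the 400 km radius.
min_lat, max_lat, min_lon, max_lon = bounding_box(50.78961, 4.64956, 400)
print(min_lat, max_lat, min_lon, max_lon)
```

The four bounds could then be used as plain range comparisons in the WHERE clause to discard most rows cheaply. The HAVING distance < 400 test still has to stay, because the corners of the box are farther away than 400 km.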

Related

Couchbase different result of nested loop WHERE clause

I am facing a weird issue in Couchbase: I executed the following two queries:
SELECT *
FROM (
SELECT *
FROM (
SELECT *
FROM ssb_lineorder
LIMIT 10000) AS cte0
INNER JOIN ssb_ddate ON cte0.ssb_lineorder.lo_orderdate = ssb_ddate.d_datekey) AS cte1
JOIN ssb_part USE NL ON cte1.cte0.ssb_lineorder.lo_partkey = ssb_part.p_partkey
WHERE ssb_part.p_size > 10
and
SELECT *
FROM (
SELECT *
FROM (
SELECT *
FROM (
SELECT *
FROM ssb_lineorder
LIMIT 10000) AS cte0
INNER JOIN ssb_ddate ON cte0.ssb_lineorder.lo_orderdate = ssb_ddate.d_datekey) AS cte1
JOIN ssb_part USE NL ON cte1.cte0.ssb_lineorder.lo_partkey = ssb_part.p_partkey ) AS cte2
WHERE cte2.ssb_part.p_size > 10
These two are exactly the same except for the final WHERE clause. According to my knowledge of relational DBMSs, the results should be exactly the same, but I am getting different results: 1 for the first query, 7972 for the second query.
I am wondering whether I have misunderstood the N1QL mechanism?
There should not be any difference.
A LIMIT inside a subquery without ORDER BY can cause inconsistent results, but 1 vs 7972 is way off.
As this is data dependent, you need to debug it.
Execute the query in the UI, go to the Plan Text tab, and look at ItemsIn# and ItemsOut# of each operator to see where things go wrong.
Also add predicates to reduce the data and see what is wrong.
As there is no OUTER JOIN, try the following.
CREATE INDEX ix1 ON ssb_part(p_size, p_partkey);
CREATE INDEX ix2 ON ssb_lineorder(lo_partkey, lo_orderdate);
CREATE INDEX ix3 ON ssb_ddate(d_datekey);
SELECT *
FROM ssb_part AS sp
JOIN ssb_lineorder AS sl ON sp.p_partkey = sl.lo_partkey
JOIN ssb_ddate AS sd ON sl.lo_orderdate = sd.d_datekey
WHERE sp.p_size > 10;
SELECT *
FROM ssb_part AS sp
JOIN ssb_lineorder AS sl USE HASH (PROBE) ON sp.p_partkey = sl.lo_partkey
JOIN ssb_ddate AS sd USE HASH (PROBE) ON sl.lo_orderdate = sd.d_datekey
WHERE sp.p_size > 10 ;

Poor performance on joining two large tables

I have been struggling for several days with poor performance when joining two large tables. Maybe someone has a hint for me.
One table is "broker_stock_data", which holds information about a client's purchases and sales of stocks (this table is currently small but will grow in the future). To show the client the current price of his stocks, there is the table "stock_data", which holds historical prices for a large number of stocks (currently around 2 million rows and growing). It is a MariaDB/MySQL database and the table uses InnoDB.
Here is some information about my tables (screenshots not reproduced here): the broker_stock_data table, the stock_data table, the EXPLAIN output for the SELECT, and the schema of the stock_data table.
With that in place, I need to get the latest price for each stock owned by a client. To do that I have the following query.
SELECT
`brokerStockData`.`id` AS `id`,
`brokerStockData`.`name` AS `name`,
`brokerStockData`.`symbol` AS `symbol`,
`brokerStockData`.`wkn` AS `wkn`,
`brokerStockData`.`modifyDate` AS `modifyDate`,
`brokerStockData`.`addDate` AS `addDate`,
`webApiConfig`.`id` AS `webApiConfigId`,
`webApiConfig`.`name` AS `webApiConfigName`,
`importError`.`msg` AS `importErrorMessage`,
SUM(`brokerStockData`.`purchaseAmount`) AS `purchaseAmount`,
stockData.stock_data_close AS `stockDataClose`,
stockData.stock_data_date AS `purchaseDate`,
stockData.stock_data_close * SUM(purchaseAmount) - SUM(purchasePrice * purchaseAmount) / SUM(purchaseAmount) * SUM(purchaseAmount) AS `difference`,
(
(stockData.stock_data_close * purchaseAmount - SUM(purchasePrice * purchaseAmount) / SUM(purchaseAmount) * purchaseAmount) / (SUM(purchasePrice * purchaseAmount) / SUM(purchaseAmount) * purchaseAmount)
)
* 100 AS `yield`,
SUM(purchasePrice * purchaseAmount) / SUM(purchaseAmount) AS `avgPurchasePrice`
FROM
`broker_stock_data` `brokerStockData`
INNER JOIN
`broker` `broker`
ON `broker`.`id` = `brokerStockData`.`brokerId`
INNER JOIN
`user` `user`
ON `user`.`id` = `broker`.`userId`
INNER JOIN
`webapi_configuration` `webApiConfig`
ON `webApiConfig`.`id` = `brokerStockData`.`webApiConfigId`
LEFT JOIN
(
SELECT
`stockData`.`date` AS `stock_data_date`,
`stockData`.`symbol` AS `stock_data_symbol`,
`stockData`.`close` AS `stock_data_close`,
`stockData`.`webApiConfigId` AS `stock_data_webApiConfigId`
FROM
`stock_data` `stockData`
WHERE
`stockData`.`date` IN
(
SELECT
MAX(`stockDataSQ`.`date`)
FROM
`stock_data` `stockDataSQ`
WHERE
`stockDataSQ`.`symbol` = `stockData`.`symbol`
GROUP BY
`stockDataSQ`.`symbol`
)
GROUP BY
`stockData`.`symbol`
)
`stockData`
ON `brokerStockData`.`symbol` = stock_data_symbol
AND `webApiConfig`.`id` = stock_data_webApiConfigId
LEFT JOIN
`import_log` `importError`
ON `importError`.`symbol` = `brokerStockData`.`symbol`
WHERE
`user`.`id` = 2
AND `broker`.`id` = 2
AND `brokerStockData`.`symbol` != ""
GROUP BY
`brokerStockData`.`symbol`
ORDER BY
`brokerStockData`.`name` ASC LIMIT 12
The problematic part is the LEFT JOIN on the stock_data table. Any ideas on how to speed this up?
UPDATE
Changed the query, since I had copied a modified version of mine :/
UPDATE2
Updated the EXPLAIN screenshot with the new query, sorry ;)
LEFT JOIN ( SELECT ... ) is likely to be very inefficient. Can you get rid of the LEFT?
Because of the LIMIT 12, a common trick to improve performance is to do the minimum work to find the 12 ids first. Then do whatever JOINs are needed to find all the other data.
After you have done that, I'll look at the indexes.
GROUP BY
`brokerStockData`.`symbol`
ORDER BY
`brokerStockData`.`name` ASC
Because the lists are different, it requires sorting twice. Change to this:
GROUP BY
`brokerStockData`.`name`, `brokerStockData`.`symbol`
ORDER BY
`brokerStockData`.`name` ASC, `brokerStockData`.`symbol` ASC
Assuming symbol and name are 1:1, you will get the same results, but faster.
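For the "latest price per stock" part of the question, one common alternative to the correlated IN (SELECT MAX(...)) subquery is to compute the latest date per symbol once and join back to it. The sketch below is not taken from the answers above; it assumes SQLite through Python's sqlite3 module as a stand-in for MariaDB, a simplified stock_data table without webApiConfigId, and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical version of stock_data.
    CREATE TABLE stock_data (symbol TEXT, date TEXT, close REAL);
    CREATE INDEX stock_data_symbol_date ON stock_data (symbol, date);
    INSERT INTO stock_data VALUES
        ('AAA', '2020-01-01', 10.0),
        ('AAA', '2020-01-02', 11.0),
        ('BBB', '2020-01-01', 20.0),
        ('BBB', '2020-01-03', 22.0);
""")

# Find the newest date per symbol once, then join back to fetch that row.
latest = conn.execute("""
    SELECT sd.symbol, sd.date, sd.close
    FROM stock_data AS sd
    JOIN (
        SELECT symbol, MAX(date) AS max_date
        FROM stock_data
        GROUP BY symbol
    ) AS m ON m.symbol = sd.symbol AND m.max_date = sd.date
    ORDER BY sd.symbol
""").fetchall()
print(latest)
```

On the real table the grouped subquery would presumably benefit from a composite index on (symbol, date), but that is an assumption to verify with EXPLAIN rather than a measured result.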

order by makes query slow

I have two tables:
video (ID, TITLE, ..., UPLOADED_DATE)
join_video_category (ID (not used), ID_VIDEO, ID_CATEGORY)
rows in video: 4 500 000
rows in join_video_category: 5 800 000
One video can have many categories.
I have a query that works perfectly, taking 20 ms at most to return results:
SELECT * FROM video WHERE ID IN
(SELECT ID_VIDEO FROM join_video_category WHERE ID_CATEGORY=11)
LIMIT 1000;
This query fetches 1000 videos; the order is not important.
BUT when I want to get the 10 latest videos from a category, my query takes around 30-40 seconds:
SELECT * FROM video WHERE ID IN
(SELECT ID_VIDEO FROM join_video_category WHERE ID_CATEGORY=11)
ORDER BY UPLOADED_DATE DESC LIMIT 10;
I have indexes on ID_CATEGORY, ID_VIDEO and UPLOADED_DATE, and a primary key on ID in both video and join_video_category.
I have tested it with a JOIN, with the same result.
First, you are comparing two very different queries. The first returns videos as soon as it encounters them. The second has to read all the matching videos and then sort them.
Try rewriting this as a JOIN:
SELECT v.*
FROM video v JOIN
join_video_category vc
ON v.id = vc.id_video
WHERE vc.ID_CATEGORY = 11
ORDER BY v.UPLOADED_DATE DESC
LIMIT 10;
That may or may not help. You have a lot of data and so you might have a lot of videos for a given category. If so, a where clause that gets more recent data might really help:
SELECT v.*
FROM video v JOIN
join_video_category vc
ON v.id = vc.id_video
WHERE vc.ID_CATEGORY = 11 AND v.UPLOADED_DATE >= '2015-01-01'
ORDER BY v.UPLOADED_DATE DESC
LIMIT 10;
Finally, if that doesn't work, consider adding something like UPLOADED_DATE into join_video_category. Then, this query should blaze:
select vc.ID_VIDEO
from join_video_category vc
where vc.ID_CATEGORY = 11
order by vc.UPLOADED_DATE desc
limit 10;
with an index on join_video_category(id_category, uploaded_date, id_video).
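The effect of that composite index can be observed on a small scale. The following is a sketch only, assuming SQLite (via Python's sqlite3 module) rather than MySQL, a join_video_category table that already carries the denormalized uploaded_date column, and generated sample rows. It checks that the query plan needs no separate sort step, which SQLite reports as a temporary B-tree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE join_video_category ("
    " id_video INTEGER, id_category INTEGER, uploaded_date TEXT)"
)
# Equality column first, then the sort column, then the column to return.
conn.execute(
    "CREATE INDEX jvc_cat_date_video"
    " ON join_video_category (id_category, uploaded_date, id_video)"
)
# Made-up sample data: 100 videos spread over 3 categories.
rows = [(v, v % 3, "2015-01-%02d" % (v % 28 + 1)) for v in range(1, 101)]
conn.executemany("INSERT INTO join_video_category VALUES (?, ?, ?)", rows)

query = (
    "SELECT id_video, uploaded_date FROM join_video_category"
    " WHERE id_category = 2 ORDER BY uploaded_date DESC LIMIT 10"
)
# The fourth column of EXPLAIN QUERY PLAN holds the human-readable step.
plan = [r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + query)]
latest = conn.execute(query).fetchall()
print(plan)
print(latest)
```

MySQL's optimizer and EXPLAIN output differ from SQLite's, so the plan printed here is only an analogy; on the real server the thing to look for is the absence of "Using filesort" in EXPLAIN.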
solution #1:
Replacing "in" with "exists" may improve the performance; please try the query below.
SELECT * FROM video WHERE exists
(SELECT * FROM join_video_category WHERE ID_CATEGORY=11 AND join_video_category.ID_VIDEO = video.ID)
ORDER BY UPLOADED_DATE DESC LIMIT 10;
solution #2:
1) create a temp table
CREATE TABLE TEMP_TABLE AS SELECT * FROM join_video_category WHERE ID_CATEGORY=11;
2) use the temp table in solution #1
SELECT * FROM video WHERE exists
(SELECT * FROM temp_table WHERE temp_table.ID_VIDEO = video.ID)
ORDER BY UPLOADED_DATE DESC LIMIT 10;
Good Luck!!
If it is 1:Many, don't use an extra table between Video and Category. However, your row counts imply that it is Many:Many.
If it is 1:Many, simply have the category_id in the Video table, then simplify all the queries.
If it is Many:Many, then be sure to use this pattern for the junction table:
CREATE TABLE map_video_category (
video_id ...,
category_id ...,
PRIMARY KEY(video_id, category_id), -- both ids, one direction
INDEX (category_id, video_id) -- both ids, the other direction
) ENGINE=InnoDB; -- significantly better than MyISAM on INDEX handling here
The ID that you mentioned is a waste. The composite keys are optimal for all situations, and will improve performance in most situations.
Do not use IN ( SELECT ... ); the optimizer, especially in older MySQL versions, does a poor job of optimizing it. Change to a JOIN, LEFT JOIN, EXISTS, or some other construct.

Slow Execution of MySQL Select Query

I have the following query…
SELECT DISTINCT * FROM
vPAS_Posts_Users
WHERE (post_user_id =:id AND post_type != 4)
AND post_updated >:updated
GROUP BY post_post_id
UNION
SELECT DISTINCT vPAS_Posts_Users.* FROM PAS_Follow
JOIN vPAS_Posts_Users ON
( PAS_Follow.folw_followed_user_id = vPAS_Posts_Users.post_user_id )
WHERE (( PAS_Follow.folw_follower_user_id =:id AND PAS_Follow.folw_deleted = 0 )
OR ( post_type = 4 AND post_passed_on_by = PAS_Follow.folw_follower_user_id
AND post_user_id !=:id ))
AND post_updated >:updated
GROUP BY post_post_id ORDER BY post_posted_date DESC LIMIT :limit
Where :id = 7, :updated = 0.0 and :limit = 40, for example.
My issue is that the query takes about a minute to return results. Is there anything in this query that I can change to speed it up?
I am using RDS.
********EDIT*********
I was asked to run the query with EXPLAIN; the result is below.
********EDIT**********
View Definition
CREATE ALGORITHM=UNDEFINED DEFINER=`MySQLUSer`@`%` SQL SECURITY DEFINER VIEW `vPAS_Posts_Users`
AS SELECT
`PAS_User`.`user_user_id` AS `user_user_id`,
`PAS_User`.`user_country` AS `user_country`,
`PAS_User`.`user_city` AS `user_city`,
`PAS_User`.`user_company` AS `user_company`,
`PAS_User`.`user_account_type` AS `user_account_type`,
`PAS_User`.`user_account_premium` AS `user_account_premium`,
`PAS_User`.`user_sign_up_date` AS `user_sign_up_date`,
`PAS_User`.`user_first_name` AS `user_first_name`,
`PAS_User`.`user_last_name` AS `user_last_name`,
`PAS_User`.`user_avatar_url` AS `user_avatar_url`,
`PAS_User`.`user_cover_image_url` AS `user_cover_image_url`,
`PAS_User`.`user_bio` AS `user_bio`,
`PAS_User`.`user_telephone` AS `user_telephone`,
`PAS_User`.`user_dob` AS `user_dob`,
`PAS_User`.`user_sector` AS `user_sector`,
`PAS_User`.`user_job_type` AS `user_job_type`,
`PAS_User`.`user_unique` AS `user_unique`,
`PAS_User`.`user_deleted` AS `user_deleted`,
`PAS_User`.`user_updated` AS `user_updated`,
`PAS_Post`.`post_post_id` AS `post_post_id`,
`PAS_Post`.`post_language_id` AS `post_language_id`,
`PAS_Post`.`post_type` AS `post_type`,
`PAS_Post`.`post_promoted` AS `post_promoted`,
`PAS_Post`.`post_user_id` AS `post_user_id`,
`PAS_Post`.`post_posted_date` AS `post_posted_date`,
`PAS_Post`.`post_latitude` AS `post_latitude`,
`PAS_Post`.`post_longitude` AS `post_longitude`,
`PAS_Post`.`post_location_name` AS `post_location_name`,
`PAS_Post`.`post_text` AS `post_text`,
`PAS_Post`.`post_media_url` AS `post_media_url`,
`PAS_Post`.`post_image_height` AS `post_image_height`,
`PAS_Post`.`post_link` AS `post_link`,
`PAS_Post`.`post_link_title` AS `post_link_title`,
`PAS_Post`.`post_unique` AS `post_unique`,
`PAS_Post`.`post_deleted` AS `post_deleted`,
`PAS_Post`.`post_updated` AS `post_updated`,
`PAS_Post`.`post_original_post_id` AS `post_original_post_id`,
`PAS_Post`.`post_original_type` AS `post_original_type`,
`PAS_Post`.`post_passed_on_by` AS `post_passed_on_by`,
`PAS_Post`.`post_passed_on_caption` AS `post_passed_on_caption`,
`PAS_Post`.`post_passed_on_fullname` AS `post_passed_on_fullname`,
`PAS_Post`.`post_passed_on_avatar_url` AS `post_passed_on_avatar_url`
FROM (`PAS_User` join `PAS_Post` on((`PAS_User`.`user_user_id` = `PAS_Post`.`post_user_id`)));
Try this query:
SELECT *
FROM
vPAS_Posts_Users
WHERE
post_user_id =:id
AND post_type != 4
AND post_updated > :updated
UNION
SELECT u.*
FROM vPAS_Posts_Users u
JOIN PAS_Follow f ON f.folw_followed_user_id = u.post_user_id
WHERE
u.post_updated > :updated
AND ( (f.folw_follower_user_id = :id AND f.folw_deleted = 0)
OR (u.post_type = 4 AND u.post_passed_on_by = f.folw_follower_user_id AND u.post_user_id != :id)
)
-- ORDER BY on a UNION result must use unqualified column names
ORDER BY post_posted_date DESC
LIMIT :limit;
Other improvements
Indices:
Be sure you have indices on the following columns:
PAS_User.user_user_id
PAS_Post.post_user_id
PAS_Post.post_type
PAS_Post.post_updated
PAS_Follow.folw_followed_user_id
PAS_Follow.folw_deleted
PAS_Post.post_passed_on_by
After that is done, please (1) check the performance again (with SQL_NO_CACHE) and (2) extract another EXPLAIN plan so we can adjust the query.
EXPLAIN Results
Here are some suggestions for the query and the view. First of all, using UNION to combine the two result sets may make your query slow; you can use UNION ALL instead.
Why I recommend UNION ALL
Both UNION ALL and UNION use a temporary table to generate the result. The difference in execution speed comes from the fact that UNION requires an internal temporary table with an index (to skip duplicate rows), while UNION ALL creates the table without such an index. This explains the slight performance improvement when using UNION ALL.
UNION on its own removes duplicate records, so there is no need for DISTINCT. Also try to apply a single GROUP BY to the whole result set, wrapping the union in a subquery; this should take less time than grouping the results in each subquery.
Make sure you have added the right indexes, especially on the columns used in WHERE, ORDER BY and GROUP BY. The data types should be appropriate for the data in each column; for example, post_posted_date should be a DATETIME or DATE, also with an index.
Here is the rough idea for the query
SELECT q.* FROM (
SELECT * FROM
vPAS_Posts_Users
WHERE (post_user_id =:id AND post_type != 4)
AND post_updated >:updated
UNION ALL
SELECT vPAS_Posts_Users.* FROM PAS_Follow
JOIN vPAS_Posts_Users ON
( PAS_Follow.folw_followed_user_id = vPAS_Posts_Users.post_user_id
AND vPAS_Posts_Users.post_updated >:updated)
WHERE (( PAS_Follow.folw_follower_user_id =:id AND PAS_Follow.folw_deleted = 0 )
OR ( post_type = 4 AND post_passed_on_by = PAS_Follow.folw_follower_user_id
AND post_user_id !=:id ))
) q
GROUP BY q.post_post_id ORDER BY q.post_posted_date DESC LIMIT :limit
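The difference between UNION, UNION ALL, and UNION ALL followed by a single outer GROUP BY can be seen on a toy example. This is a minimal sketch assuming SQLite through Python's sqlite3 module; the tables own_posts and followed_posts are hypothetical stand-ins for the two halves of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the two branches of the union.
    CREATE TABLE own_posts (post_id INTEGER);
    CREATE TABLE followed_posts (post_id INTEGER);
    INSERT INTO own_posts VALUES (1), (2);
    INSERT INTO followed_posts VALUES (2), (3);
""")

# UNION removes the duplicate post 2.
union_rows = conn.execute("""
    SELECT post_id FROM own_posts
    UNION
    SELECT post_id FROM followed_posts
    ORDER BY post_id
""").fetchall()

# UNION ALL keeps both copies of post 2.
union_all_rows = conn.execute("""
    SELECT post_id FROM own_posts
    UNION ALL
    SELECT post_id FROM followed_posts
    ORDER BY post_id
""").fetchall()

# One GROUP BY over the combined rows collapses the duplicates again.
grouped_rows = conn.execute("""
    SELECT q.post_id
    FROM (
        SELECT post_id FROM own_posts
        UNION ALL
        SELECT post_id FROM followed_posts
    ) AS q
    GROUP BY q.post_id
    ORDER BY q.post_id
""").fetchall()

print(union_rows, union_all_rows, grouped_rows)
```

The sketch only demonstrates the row semantics; whether UNION ALL plus an outer GROUP BY is actually faster on the real view has to be measured there.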
From your EXPLAIN I can see that most of your tables don't have any key except the primary one. I would suggest adding extra keys on the columns you join on, for example PAS_Follow.folw_followed_user_id and PAS_Post.post_user_id (the column behind vPAS_Posts_Users.post_user_id); this alone should result in a big performance boost.
Bye,
Gnagno

MySQL Query Needs Optimizations

I have this users table with:
id : int (255)
name: char (100)
last_comment_target: int(100)
last_comment_date: datetime
This table has around 1.3 million rows.
The primary key and BTREE indexes are on id, last_comment_target, and last_comment_date.
And, I am trying to perform a range query:
SELECT * FROM users
WHERE id IN (1,2,3,5,...[around 5000 ids])
AND last_comment_target > 0
ORDER BY last_comment_dt DESC LIMIT 0,20;
Sometimes the query can take as long as 3 seconds. I wonder if there are better ways to optimize this query. Or, if this query can be rewritten.
Thank you so much for your help.
SELECT u.*
FROM
users u
JOIN (
SELECT 1 id
UNION ALL
SELECT 2 id
UNION ALL
-- ... one "SELECT n id UNION ALL" per remaining id ...
SELECT 5000 id
) ids ON ids.id = u.id
WHERE
last_comment_target > 0
ORDER BY
last_comment_dt DESC
LIMIT 0, 20;
Thanks to everyone who has contributed.
@Karolis seems to point to an alternative that uses a join instead of a range.
So, basically:
SELECT * FROM users WHERE id IN (1,2,3,...[5000 ids]) AND last_comment_target > 0
yields a type of range in the EXPLAIN output. The 5000 ids can be generated from another table.
When I switched the above to:
SELECT *
FROM users u
INNER JOIN user_friends uf ON u.id = uf.to_id
AND u.last_comment_target > 0
AND uf.from_id = [id];
It yields two types in the EXPLAIN output, ref and eq_ref, which are faster than range in this query.
The query execution time is reduced from 3+ seconds to around 0.2x seconds.
So, the lesson learned on my end: TRY to use a JOIN instead of a RANGE if you have a table that you can derive the ids from.
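That the JOIN form returns the same rows as the long IN list can be checked on a small example. The sketch below assumes SQLite through Python's sqlite3 module instead of MySQL, a cut-down users table, a user_friends table shaped like the one in the query above, and made-up rows; it compares results only and says nothing about timing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down, hypothetical versions of the two tables.
    CREATE TABLE users (
        id INTEGER PRIMARY KEY, name TEXT,
        last_comment_target INTEGER, last_comment_dt TEXT);
    CREATE TABLE user_friends (
        from_id INTEGER, to_id INTEGER, PRIMARY KEY (from_id, to_id));
    INSERT INTO users VALUES
        (1, 'a', 5, '2020-01-01'), (2, 'b', 0, '2020-01-02'),
        (3, 'c', 7, '2020-01-03'), (4, 'd', 9, '2020-01-04');
    INSERT INTO user_friends VALUES (9, 1), (9, 2), (9, 3), (8, 4);
""")

# Original approach: fetch the ids first, then pass them as an IN list.
friend_ids = [r[0] for r in conn.execute(
    "SELECT to_id FROM user_friends WHERE from_id = ?", (9,))]
placeholders = ",".join("?" * len(friend_ids))
in_rows = conn.execute(
    f"SELECT id FROM users WHERE id IN ({placeholders})"
    " AND last_comment_target > 0"
    " ORDER BY last_comment_dt DESC LIMIT 20",
    friend_ids,
).fetchall()

# Rewritten approach: let the database join to the table holding the ids.
join_rows = conn.execute(
    "SELECT u.id FROM users AS u"
    " JOIN user_friends AS uf ON u.id = uf.to_id"
    " WHERE uf.from_id = ? AND u.last_comment_target > 0"
    " ORDER BY u.last_comment_dt DESC LIMIT 20",
    (9,),
).fetchall()

print(in_rows, join_rows)
```

Besides the plan difference described above, the JOIN form avoids sending thousands of literal ids to the server on every request.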