Complex MySQL Select Left Join Optimization Indexing - mysql

I have a very complex query that is running and finding locations of members joining the subscription details and sorting by distance.
Can someone provide instruction on the correct indexes and cardinality I should add to make this load faster.
Right now on 1 million records it takes 75 seconds and I know it can be improved.
Thank you.
SELECT SQL_CALC_FOUND_ROWS (((acos(sin((33.987541*pi()/180)) * sin((users_data.lat*pi()/180))+cos((33.987541*pi()/180)) * cos((users_data.lat*pi()/180)) * cos(((-118.472153- users_data.lon)* pi()/180))))*180/pi())*60*1.1515) as distance,subscription_types.location_limit as location_limit,users_data.user_id,users_data.last_name,users_data.filename,users_data.user_id,users_data.phone_number,users_data.city,users_data.state_code,users_data.zip_code,users_data.country_code,users_data.quote,users_data.subscription_id,users_data.company,users_data.position,users_data.profession_id,users_data.experience,users_data.account_type,users_data.verified,users_data.nationwide,IF(listing_type = 'Company', company, last_name) as name
FROM `users_data`
LEFT JOIN `users_reviews` ON users_data.user_id=users_reviews.user_id AND users_reviews.review_status='2'
LEFT JOIN users_locations ON users_locations.user_id=users_data.user_id
LEFT JOIN subscription_types ON users_data.subscription_id=subscription_types.subscription_id
WHERE users_data.active='2'
AND subscription_types.searchable='1'
AND users_data.state_code='CA'
AND users_data.country_code='US'
GROUP BY users_data.user_id
HAVING distance <= '50'
OR location_limit='all'
OR users_data.nationwide='1'
ORDER BY subscription_types.search_priority ASC, distance ASC
LIMIT 0,10
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE users_reviews system user_id,review_status NULL NULL NULL 0 const row not found
1 SIMPLE users_locations system user_id NULL NULL NULL 0 const row not found
1 SIMPLE users_data ref subscription_id,active,state_code,country_code state_code 47 const 88241 Using where; Using temporary; Using filesort
1 SIMPLE subscription_types ALL PRIMARY,searchable NULL NULL NULL 4 Using where; Using join buffer

You query is not that complex. You have only one join, on a table subscription_types which is certainly a little table with no more than a few hundred rows.
Where are your indexes ? The best way to improve your query is to create indexes on the field you are filtering, like active, country_code, state_code and searchable
Have you create the foreign key on users_data.subscription_id ? You need an index on that too.
ForceIndex is useless, let the RDBMS determine the best indexes to chose.
Left Join is useless too, because the line subscription_types.searchable='1' will remove the unmatch correspondance
The order on search_priority implies that you need indexes on this columns too
The filtering in the HAVING can make the indexes not used. You don't need to put these filters in the HAVING. If I understand your table schema, this is not really the aggregate that is filtered.
Your table contains 1 million rows, but how much rows are returned, without the limit? With the right indexes, the query should execute under a second.
SELECT ...
FROM `users_data`
INNER JOIN subscription_types
ON users_data.subscription_id = subscription_types.subscription_id
WHERE users_data.active='2'
AND users_data.country_code='US'
AND users_data.state_code='NY'
AND subscription_types.searchable='1'
AND (distance <= '50' OR location_limit='all' OR users_data.nationwide='1')
GROUP BY users_data.user_id
ORDER BY subscription_types.search_priority ASC, distance ASC
LIMIT 0,10

Related

Laravel Join tables and group by sum query too slow

I am using Laravel query builder to get desired results from database. The following query if working perfectly but taking too much time to get results. Can you please help me with this?
select
`amz_ads_sp_campaigns`.*,
SUM(attributedUnitsOrdered7d) as order7d,
SUM(attributedUnitsOrdered30d) as order30d,
SUM(attributedSales7d) as sale7d,
SUM(attributedSales30d) as sale30d,
SUM(impressions) as impressions,
SUM(clicks) as clicks,
SUM(cost) as cost,
SUM(attributedConversions7d) as attributedConversions7d,
SUM(attributedConversions30d) as attributedConversions30d
from
`amz_ads_sp_product_targetings`
inner join `amz_ads_sp_report_product_targetings` on `amz_ads_sp_product_targetings`.`campaignId` = `amz_ads_sp_report_product_targetings`.`campaignId`
inner join `amz_ads_sp_campaigns` on `amz_ads_sp_report_product_targetings`.`campaignId` = `amz_ads_sp_campaigns`.`campaignId`
where
(
`amz_ads_sp_product_targetings`.`user_id` = ?
and `amz_ads_sp_product_targetings`.`profileId` = ?
)
group by
`amz_ads_sp_product_targetings`.`campaignId`
Result of Explain SQL
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE amz_ads_sp_report_product_targetings ALL campaignId NULL NULL NULL 50061 Using temporary; Using filesort
1 SIMPLE amz_ads_sp_campaigns ref campaignId campaignId 8 pr-amz-ppc.amz_ads_sp_report_product_targetings.ca... 1
1 SIMPLE amz_ads_sp_product_targetings ref campaignId campaignId 8 pr-amz-ppc.amz_ads_sp_report_product_targetings.ca... 33 Using where
Your query could benefit from several indices to cover the WHERE clause as well as the join conditions:
CREATE INDEX idx1 ON amz_ads_sp_product_targetings (
user_id, profileId, campaignId);
CREATE INDEX idx2 ON amz_ads_sp_report_product_targetings (
campaignId);
CREATE INDEX idx3 ON amz_ads_sp_campaigns (campaignId);
The first index idx1 covers the entire WHERE clause, which might let MySQL throw away many records on the initial scan of the amz_ads_sp_product_targetings table. It also includes the campaignId column, which is needed for the first join. The second and third indices cover the join columns of each respective table. This might let MySQL do a more rapid lookup during the join process.
Note that selecting amz_ads_sp_campaigns.* is not valid unless the campaignId of that table be the primary key. Also, there isn't much else we can do speed up the query, as SUM, by its nature, requires touching every record in order to come up the result sum.

Large SQL database - solving efficiency

I have this following SQL query, which, when I originally coded it, was exceptionally fast, it now takes over 1 second to complete:
SELECT counted/scount as ratio, [etc]
FROM
playlists
LEFT JOIN (
select AID, PLID FROM (SELECT AID, PLID FROM p_s ORDER BY `order` asc, PLSID desc)as g GROUP BY PLID
) as t USING(PLID)
INNER JOIN (
SELECT PLID, count(PLID) as scount from p_s LEFT JOIN audio USING(AID) WHERE removed='0' and verified='1' GROUP BY PLID
) as g USING(PLID)
LEFT JOIN (
select AID, count(AID) as counted FROM a_p_all WHERE ".time()." - playtime < 2678400 GROUP BY AID
) as r USING(AID)
LEFT JOIN audio USING (AID)
LEFT JOIN members USING (UID)
WHERE scount > 4 ORDER BY ratio desc
LIMIT 0, 20
I have identified the problem, the a_p_all table has over 500k rows. This is slowing down the query. I have come up with a solution:
Create a smaller temporary table, that only stores the data necessary, and deletes anything older than is needed.
However, is there a better method to use? Optimally I wouldn't need a temporary table; what do sites such as YouTube/Facebook do for large tables to keep query times fast?
edit
This is the EXPLAIN table for the query in the answer from #spencer7593
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 20
1 PRIMARY u eq_ref PRIMARY PRIMARY 8 q.AID 1 Using index
1 PRIMARY m eq_ref PRIMARY PRIMARY 8 q.UID 1 Using index
3 DERIVED <derived6> ALL NULL NULL NULL NULL 20
6 DERIVED t ALL NULL NULL NULL NULL 21
5 DEPENDENT SUBQUERY s ALL NULL NULL NULL NULL 49 Using where; Using filesort
4 DEPENDENT SUBQUERY c ALL NULL NULL NULL NULL 49 Using where
4 DEPENDENT SUBQUERY o eq_ref PRIMARY PRIMARY 8 database.c.AID 1 Using where
2 DEPENDENT SUBQUERY a ALL NULL NULL NULL NULL 510594 Using where
Two "big rock" issues stand out to me.
Firstly, this predicate
WHERE ".time()." - playtime < 2678400
(I'm assuming that this isn't the actual SQL being submitted to the database, but that what's being sent to the database is something like this...
WHERE 1409192073 - playtime < 2678400
such that we want only rows where playtime is within the past 31 days (i.e. within 31*24*60*60 seconds of the integer value returned by time().
This predicate can't make use of a range scan operation on a suitable index on playtime. MySQL evaluates the expression on the left side for every row in the table (every row that isn't excluded by some other predicate), and the result of that expression is compared to the literal on the right.
To improve performance, rewrite the predicate that so that the comparison is made on the bare column. Compare the value stored in the playtime column to an expression that needs to be evaluated one time, for example:
WHERE playtime > 1409192073 - 2678400
With a suitable index available, MySQL can perform a "range" scan operation, and efficiently eliminate a boatload of rows that don't need to be evaluated.
The second "big rock" is the inline views, or "derived tables" in MySQL parlance. MySQL is much different than other databases in how inline views are processed. MySQL actually runs that innermost query, and stores the result set as a temporary MyISAM table, and then the outer query runs against the MyISAM table. (The name that MySQL uses, "derived table", makes sense when we understand how MySQL processes the inline view.) Also, MySQL does not "push" predicates down, from an outer query down into the view queries. And on the derived table, there are no indexes created. (I believe MySQL 5.7 is changing that, and does sometimes create indexes, to improve performance.) But large "derived tables" can have a significant performance impact.
Also, the LIMIT clause gets applied last in the statement processing; that's after all the rows in the resultset are prepared and sorted. Even if you are returning only 20 rows, MySQL still prepares the entire resultset; it just doesn't transfer them to the client.
Lots of the column references are not qualified with the table name or alias, so we don't know, for example, which table (p_s or audio) contains the removed and verified columns.
(We know it can't be both, if MySQL isn't throwing a "ambiguous column" error. But MySQL has access to the table definitions, where we don't. MySQL also knows something about the cardinality of the columns, in particular, which columns (or combination of columns) are UNIQUE, and which columns can contain NULL values, etc.
Best practice is to qualify ALL column references with the table name or (preferably) a table alias. (This makes it much easier on the human reading the SQL, and it also avoids a query from breaking when a new column is added to a table.)
Also, the query as a LIMIT clause, but there's no ORDER BY clause (or implied ORDER BY), which makes the resultset indeterminate. We don't have any guaranteed which will be the "first" rows returned.
EDIT
To return only 20 rows from playlists (out of thousands or more), I might try using correlated subqueries in the SELECT list; using a LIMIT clause in an inline view to winnow down the number of rows that I'd need to run the subqueries for. Correlated subqueries can eat your lunch (and your lunchbox too) in terms of performance with large sets, due to the number of times those need to be run.
From what I can gather, you are attempting to return 20 rows from playlists, picking up the related row from member (by the foreign key in playlists), finding the "first" song in the playlist; getting a count of times that "song" has been played in the past 31 days (from any playlist); getting the number of times a song appears on that playlist (as long as it's been verified and hasn't been removed... the outerness of that LEFT JOIN is negated by the predicates on the removed and verified columns, if either of those columns is from the audio table...).
I'd take a shot with something like this, to compare performance:
SELECT q.*
, ( SELECT COUNT(1)
FROM a_p_all a
WHERE a.playtime < 1409192073 - 2678400
AND a.AID = q.AID
) AS counted
FROM ( SELECT p.PLID
, p.UID
, p.[etc]
, ( SELECT COUNT(1)
FROM p_s c
JOIN audio o
ON o.AID = c.AID
AND o.removed='0'
AND o.verified='1'
WHERE c.PLID = p.PLID
) AS scount
, ( SELECT s.AID
FROM p_s s
WHERE s.PLID = p.PLID
ORDER BY s.order ASC, s.PLSID DESC
LIMIT 1
) AS AID
FROM ( SELECT t.PLID
, t.[etc]
FROM playlists t
ORDER BY NULL
LIMIT 20
) p
) q
LEFT JOIN audio u ON u.AID = q.AID
LEFT JOIN members m ON m.UID = q.UID
LIMIT 0, 20
UPDATE
Dude, the EXPLAIN output is showing that you don't have suitable indexes available. To get any decent chance at performance with the correlated subqueries, you're going to want to add some indexes, e.g.
... ON a_p_all (AID, playtime)
... ON p_s (PLID, order, PLSID, AID)

optimize mysql indexes

I have a slow (1.4 second) query that has been bugging me for a while so I just thought I'd put it up and see if anyone can help me optimize my indexes to speed it up:
select sql_calc_found_rows t.id, q.im_id, concat(t.si_id, ' ', t.de), q.date, q.das, q.dac, u.name, q.ac, q.st
from t300q q
left join t300 t on t.id = q.con_id
left join users u on u.id = q.user_id
order by q.date desc limit 0,100
sql explain results:
SIMPLE q ALL 89126 Using filesort
SIMPLE t eq_ref PRIMARY PRIMARY 4 db.q.con_id 1
SIMPLE u eq_ref PRIMARY PRIMARY 4 db.q.user_id 1
session stats:
Handler_read_first = 0
Handler_read_key = 177934
Handler_read_next = 23
Handler_read_prev = 679
Handler_read_rnd = 15
Handler_read_rnd_next = 89127
and I have the following indexes:
t.id - primary key
q.con_id |
q.date | - all form a single index
q.user_id |
u.id - primary key
as you can see from the handler stats the size of table q is 89126 rows.
It's not a massive problem but I would like to get the speed down below 1 second for this query if possible.
The query is slow because you don't have a date index. The compound index cannot be used because the date is in the middle. Either move the date to be the first field in the existing index or create a stand alone index.
BTW mysql uses only equality for the first two columns of a 3 column index. The last column can use ranged queries.
Namely:
WHERE x=? AND y=? order by z;
will use an index of columns (x,y,z) (since z can be ranged).
Try moving 'date' to the 3rd column and rewriting the query.
If that doesn't work, then mysql isn't being smart enough to treat con_id and user_id in the join.. Perhaps you could rewrite it so those join conditions happen in the where clause.
try to trigger the OPTIMIZE or ANALYZE on your database but make sure that you trigger this on the time on which there are only few request or much better if there are no request that are being done on the server to avoid any problems to arise you may see more informations about this statements on this links:
http://dev.mysql.com/doc/refman/5.6/en/analyze-table.html
http://dev.mysql.com/doc/refman/4.1/en/optimize-table.html

MySQL db is using "Using where; Using temporary; Using filesort" when sorting by date

I have a database with a bunch of records and when I load up the page with the following SQL its really slow.
SELECT goal.title, max(updates.date_updated) as update_sort
FROM `goal`
LEFT OUTER JOIN `update` `updates` ON (`updates`.`goal_id`=`goal`.`goal_id`)
WHERE (goal.private=0)
GROUP BY updates.goal_id
ORDER BY update_sort desc
LIMIT 12
When I do an explain it says its not using any keys and that its searching every row. Also telling me its using "Using where; Using temporary; Using filesort".
Any help is much appreciated
Thanks
It needs to be grouped by goal_id because of the MAX() in the select is returning only one row.
What i'm trying to do is return the MAX date_updated row from the updates table for each goal and then sort it by that column.
Current indices are on goal.private and update.goal_id
output of EXPLAIN (can't upload images so have to put it here sorry if it isnt clear:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE goal ref private private 1 const 27 Using temporary; Using filesort
1 SIMPLE updates ref goal_id goal_id 4 goal.goal_id 1
SELECT u.date_updated, g.title
FROM updates u
JOIN goal g
ON g.goal_id = u.goal_id
WHERE u.id =
(
SELECT ui.id
FROM updates ui
WHERE ui.goal_id = u.goal_id
ORDER BY
ui.goal_id, ui.date_updated, ui.id
LIMIT 1
)
AND g.private = 0
ORDER BY
u.date_update, u.id
LIMIT 12
Create two indexes on updates for the query to work fast:
updates (goal_id, date_updated, id)
updates (date_updated, id)
My guess is that the MAX() function causes this behaviour. It needs to look at every record to decide what rows to select. What kind of grouping are you trying to accomplish?

Strange Performance Issues with INNER JOIN vs. LEFT JOIN

I was using a query that looked similar to this one:
SELECT `episodes`.*, IFNULL(SUM(`views_sum`.`clicks`), 0) as `clicks`
FROM `episodes`, `views_sum`
WHERE `views_sum`.`index` = "episode" AND `views_sum`.`key` = `episodes`.`id`
GROUP BY `episodes`.`id`
... which takes ~0.1s to execute. But it's problematic, because some episodes don't have a corresponding views_sum row, so those episodes aren't included in the result.
What I want is NULL values when a corresponding views_sum row doesn't exist, so I tried using a LEFT JOIN instead:
SELECT `episodes`.*, IFNULL(SUM(`views_sum`.`clicks`), 0) as `clicks`
FROM `episodes`
LEFT JOIN `views_sum` ON (`views_sum`.`index` = "episode" AND `views_sum`.`key` = `episodes`.`id`)
GROUP BY `episodes`.`id`
This query produces the same columns, and it also includes the few rows missing from the 1st query.
BUT, the 2nd query takes 10 times as long! A full second.
Why is there such a huge discrepancy between the execution times when the result is so similar? There's nowhere near 10 times as many rows — it's like 60 from the 1st query, and 70 from the 2nd. That's not to mention that the 10 additional rows have no views to sum!
Any light shed would be greatly appreciated!
(There are indexes on episodes.id, views_sum.index, and views_sum.key.)
EDIT:
I copy-pasted the SQL from above, and here are the EXPLAINs, in order:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE views_sum ref index,key index 27 const 6532 Using where; Using temporary; Using filesort
1 SIMPLE episodes eq_ref PRIMARY PRIMARY 4 db102914_itw.views_sum.key 1 Using where
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE episodes ALL NULL NULL NULL NULL 70 Using temporary; Using filesort
1 SIMPLE views_sum ref index,key index 27 const 6532
Here's the query I ultimately came up with, after many, many iterations. (The SQL_NO_CACHE flag is there so I can test execution times.)
SELECT SQL_NO_CACHE e.*, IFNULL(SUM(vs.`clicks`), 0) as `clicks`
FROM `episodes` e
LEFT JOIN
(SELECT * FROM `views_sum` WHERE `index` = "episode") vs
ON vs.`key` = e.`id`
GROUP BY e.`id`
Because the ON condtion views_sum.index = "episode" is static, i.e., isn't dependent on the row it's joined to, I was able to get a massive performance boost by first using a subquery to limit the views_sum table before joining.
My query now takes ~0.2s. And what's even better, the time doesn't grow as you increase the offset of the query (unlike my first LEFT JOIN attempt). It stays the same, even if you do a sort on the clicks column.
You should have a combined index on views_sum.index and views_sum.key. I suspect you will always use both fields together if i look at the names. Also, I would rewrite the first query to use a proper INNER JOIN clause instead of a filtered cartesian product.
I suspect the performance of both queries will be much closer together if you do this. And, more importantly, much faster than they are now.
edit: Thinking about it, I would probably add a third column to that index: views_sum.clicks, which probably can be used for the SUM. But remember that multi-column indexes can only be used left to right.
It's all about the indexes. You'll have to play around with it a bit or post your database schema on here. Just as a rough guess i'd say you should make sure you have an index on views_sum.key.
Normally, a LEFT JOIN will be slower than an INNER JOIN or a CROSS JOIN because it has to view the first table differently. Put another way, the difference in time isn't related to the size of the result, but the full size of the left table.
I also wonder if you're asking MySQL to figure things out for you that you should be doing yourself. Specifically, that SUM() function would normally require a GROUP BY clause.