MySQL Query Optimization using multiple joins

I'm having trouble optimizing a query and could use some help. I'm currently pulling in events in a system that has to join several other tables to make sure the event is supposed to display, etc... The query was running smoothly (around 480ms) until I introduced another table in the mix. The query is as follows:
SELECT
keyword_terms,
`esf`.*,
`venue`.`name` AS venue_name,
...
`venue`.`zip`, ase.region_id,
(DATE(NOW()) BETWEEN...AND ase.region_id IS NULL) as featured,
getDistance(`venue`.`lat`, `venue`.`lng`, 36.073, -79.7903) as distance,
`network_exclusion`.`id` as net_exc_id
FROM (`event_search_flat` esf)
# Problematic part of query (pulling in the very next date for the event)
LEFT JOIN (
SELECT event_id, MIN(TIMESTAMP(CONCAT(event_date.date, ' ', event_date.end_time))) AS next_date FROM event_date WHERE
event_date.date >= CURDATE() OR (event_date.date = CURDATE() AND TIME(event_date.end_time) >= TIME(NOW()))
GROUP BY event_id
) edate ON edate.event_id=esf.object_id
# Pull in associated ad space
LEFT JOIN `ad_space` ads ON `ads`.`data_type`=`esf`.`data_type` AND ads.object_id=esf.object_id
# and make sure it is featured within region
LEFT JOIN `ad_space_exclusion` ase ON ase.ad_space_id=ads.id AND region_id =5
# Get venue details
LEFT JOIN `venue` ON `esf`.`venue_id`=`venue`.`id`
# Make sure this event should be listed
LEFT JOIN `network_exclusion` ON network_exclusion.data_type=esf.data_type
AND network_exclusion.object_id=esf.object_id
AND network_exclusion.region_id=5
WHERE `esf`.`event_type` IN ('things to do')
AND (`edate`.`next_date` >= '2013-07-18 16:23:53')
GROUP BY `esf`.`esf_id`
HAVING `net_exc_id` IS NULL
AND `distance` <= 40
ORDER BY DATE(edate.next_date) asc,
`distance` asc
LIMIT 6
It seems that the issue lies with the event_date table, but I'm unsure how to optimize this query (I tried various views, indexes, etc... to no avail). I ran EXPLAIN and received the following: http://cl.ly/image/3r3u1o0n2A46 .
At the moment, the query is taking 6.6 seconds. Any help would be greatly appreciated.

You may be able to get Using index on the event_date subquery by creating a compound index over (event_id, date, end_time). That may turn the subquery into an index-only query, which should speed it up slightly.
The subquery might be better written as the following, without GROUP BY:
SELECT event_id, TIMESTAMP(CONCAT(event_date.date, ' ', event_date.end_time)) AS next_date
FROM event_date
WHERE event_date.date >= CURDATE()
OR (event_date.date = CURDATE() AND TIME(event_date.end_time) >= TIME(NOW()))
ORDER BY next_date LIMIT 1
I'm more concerned that your EXPLAIN shows so many tables with type=ALL. That means it has to read every row from those tables and compare them to rows in other tables. You can get an idea of how much work it's doing by multiplying the values in the rows column. Basically, it's making billions of row comparisons to resolve the joins. As the tables grow, this query will get a lot worse.
Using LEFT [OUTER] JOIN has a specific purpose, and if you really mean to use INNER JOIN you should do that, because using an outer join where it doesn't belong can mess up the optimization. Use an outer join like A LEFT JOIN B only if you want rows in A that may not have matching rows in B.
For example, I assume based on column naming convention that LEFT JOIN venue ON esf.venue_id=venue.id should be an inner join, because there should always be a venue referenced by esf.venue_id (unless esf.venue_id is sometimes null).
event_search_flat should have a compound index with columns used in the WHERE clause first, then columns to join to other tables: (event_type, object_id, data_type, event_id)
ad_space should have a compound index for the join: (data_type, object_id). Does this need to be an inner join too?
ad_space_exclusion should have a compound index for the join: (ad_space_id, region_id)
network_exclusion should have a compound index for the join: (data_type, object_id, region_id)
venue is okay because it's doing a primary key lookup already.
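As a rough sketch, the compound indexes suggested above could be created like this. The index names are made up for illustration, and the column lists are copied from the suggestions as written, so adjust them to the real schema before running anything.

```sql
-- Hypothetical index names; column lists taken from the suggestions above.
-- Supports the "next date" subquery and may make it index-only.
ALTER TABLE event_date
  ADD INDEX idx_event_date_next (event_id, `date`, end_time);

-- WHERE column first, then the columns used to join to other tables.
ALTER TABLE event_search_flat
  ADD INDEX idx_esf_type_join (event_type, object_id, data_type, event_id);

-- Join indexes for the remaining tables.
ALTER TABLE ad_space
  ADD INDEX idx_ads_join (data_type, object_id);
ALTER TABLE ad_space_exclusion
  ADD INDEX idx_ase_join (ad_space_id, region_id);
ALTER TABLE network_exclusion
  ADD INDEX idx_ne_join (data_type, object_id, region_id);
```

After adding them, re-running EXPLAIN on the original query is the simplest way to see whether the type=ALL entries have gone away.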

Related

How to optimize query which has to get last 5 rows with joined table

I have two tables, events and user_device. Both have a common field, device_id.
The user_device table has the fields user_id and device_id. Basically, it holds all devices that belong to users.
The events table holds all events that belong to devices.
Now I want to get the last 5 alerts for a specific user.
So I have made a query by joining both tables like below.
SELECT *
FROM events
LEFT JOIN user_device ON user_device.deviceid=events.deviceid
WHERE user_device.userid=101
ORDER BY events.id DESC
LIMIT 5
events table has more than 4 million records. This query takes 30 seconds to return the results.
If I remove ORDER BY, the query takes only two seconds.
How can I optimize this?

First: don't use SELECT *. Instead, give the names of the columns you want.
Second: You're looking for an equality match on user_device.userid. So you need an index on user_device starting with the userid column. You're then employing the value of deviceid in that same table. So, create this index. It's called a covering index.
ALTER TABLE user_device ADD INDEX x_user_device (userid, deviceid);
Third: You're looking up rows in events by deviceid, then ordering by id. So you need another covering index on those two columns.
ALTER TABLE events ADD INDEX x_device_id (deviceid, id);
Fourth: you mention a column from your LEFT JOINed table in a WHERE clause. That converts the LEFT JOIN to an ordinary inner JOIN. So use JOIN.
Fifth: SELECT * ... ORDER BY ... LIMIT is a notorious performance antipattern. Why? It has to order a whole mess of records, just to discard all but a few. Try this instead. First get the relevant events.id values with a subquery.
SELECT events.id
FROM events
JOIN user_device ON user_device.deviceid=events.deviceid
WHERE user_device.userid=101
ORDER BY events.id DESC
LIMIT 5
Test that subquery. It should give you five relevant event id values, and it should do it very quickly indeed. Then use this subquery to look up the details you need from your two tables:
SELECT events.*, user_device.* /* not optimal. list only the columns you need */
FROM (
SELECT events.id
FROM events
JOIN user_device ON user_device.deviceid=events.deviceid
WHERE user_device.userid=101
ORDER BY events.id DESC
LIMIT 5
) sel
JOIN events ON sel.id = events.id
JOIN user_device ON events.deviceid = user_device.deviceid
This is called the deferred join query pattern. It does all the ordering on just a pile of id values, then pulls out only a few records.
This should help you keep performance in check as your database grows.
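One way to check whether the two indexes are actually being picked up is to run EXPLAIN on the inner query on its own. This is only a sketch: the exact plan depends on the MySQL version and on the data distribution.

```sql
-- Inspect the plan of the id-only subquery from the deferred join.
-- In the output, look at the "key" column for x_user_device and x_device_id,
-- and at the "Extra" column for "Using index" (index-only access).
EXPLAIN
SELECT events.id
FROM events
JOIN user_device ON user_device.deviceid = events.deviceid
WHERE user_device.userid = 101
ORDER BY events.id DESC
LIMIT 5;
```

If the key column is NULL for either table, the corresponding index is not being used and the column order in the index is worth revisiting.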

The fact that this takes longer than usual is most likely due to a lack of indexes on the tables. Adding indexes on deviceid and userid will help a lot with the speed of the query.
SELECT *
FROM events
LEFT JOIN user_device ON user_device.deviceid=events.deviceid
WHERE user_device.userid=101
ORDER BY events.id DESC
LIMIT 5
The columns used in the join and the WHERE clause (user_device.deviceid, events.deviceid and user_device.userid) need indexes for quick lookups. The ORDER BY doesn't require an index.

mysql query is taking too much time to execute

Hello everyone, I am working on a database through phpMyAdmin. Whenever I try to execute this query it takes more than 10 minutes to show results. Is there any way to speed it up? Please respond.
The query is
SELECT ib.*, b.brand_name, m.model_name,
s.id as sale_id, br.branch_code,br.branch_name,r.rentry_date,r.id as rid
from in_book ib
left join brand b on ib.brand_id=b.id
left join model m on ib.vehicle_id=m.id
left join re_entry r on r.in_book_id=ib.id
left join sale s on ib.id=s.in_book_id
left join branch br on ib.branch_id=br.id
where ib.id !=''
and ib.branch_id='65'
group by ib.id
order by r.id ASC,
count(r.in_book_id) DESC ,
ib.purchaes_date ASC,
ib.id ASC
there are almost 7 tables

Make sure you have an index on every key you use to join the tables.
From http://dev.mysql.com/doc/refman/5.5/en/optimization-indexes.html:
The best way to improve the performance of SELECT operations is to create indexes on one or more of the columns that are tested in the query. The index entries act like pointers to the table rows, allowing the query to quickly determine which rows match a condition in the WHERE clause, and retrieve the other column values for those rows. All MySQL data types can be indexed.
This of course also applies to the JOIN conditions.

You don't list any existing indexes, so I would start with the following suggested indexes:
table       index
in_book     ( branch_id, id, brand_id, vehicle_id )
brand       ( id, brand_name )
model       ( id, model_name )
re_entry    ( in_book_id, id, rentry_date )
sale        ( in_book_id, id )
branch      ( id )
Also, MySQL has a special keyword, STRAIGHT_JOIN, which tells the engine to join the tables in the order you have listed them. Although you are doing LEFT JOINs, I don't think it will matter, as the secondary tables all appear to be lookup tables and in_book is your primary table. But just to try it, it would be:
SELECT STRAIGHT_JOIN (...rest of query...)
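A minimal sketch of the index suggestions for the tables that are joined on a non-key column. The index names are hypothetical, and it assumes id is already the primary key of brand, model and branch, in which case those three lookups need no extra index. The column rentry_date is spelled as it appears in the question's query.

```sql
-- Hypothetical index names; assumes `id` is the primary key on every table.
-- Filter column (branch_id) first, then the columns used for joins.
ALTER TABLE in_book
  ADD INDEX idx_in_book_branch (branch_id, id, brand_id, vehicle_id);

-- Child tables are joined on in_book_id, so that column leads each index.
ALTER TABLE re_entry
  ADD INDEX idx_re_entry_book (in_book_id, id, rentry_date);
ALTER TABLE sale
  ADD INDEX idx_sale_book (in_book_id, id);
```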

how can I make this query more efficient?

edit: here is a simplified version of the original query (runs in 3.6 secs on a products table of 475K rows)
SELECT p.*, shop FROM products p JOIN
users u ON p.date >= u.prior_login and u.user_id = 22 JOIN
shops s ON p.shop_id = s.shop_id
ORDER BY shop, date, product_id;
this is the explain plan
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE u const PRIMARY,prior_login,user_id PRIMARY 4 const 1 Using temporary; Using filesort
1 SIMPLE s ALL PRIMARY NULL NULL NULL 90
1 SIMPLE p ref shop_id,date,shop_id_2,shop_id_3 shop_id 4 bitt3n_minxa.s.shop_id 5338 Using where
the bottleneck seems to be ORDER BY date,product_id. Removing these two orderings, the query runs in 0.06 seconds. (Removing either one of the two (but not both) has virtually no effect, query still takes over 3 seconds.) I have indexes on both product_id and date in the products table. I have also added an index on (product,date) with no improvement.
newtover suggests the problem is the fact that the INNER JOIN users u1 ON products.date >= u1.prior_login requirement is preventing use of the index on products.date
Two variations of the query that execute in ~0.006 secs (as opposed to 3.6 secs for the original) have been suggested to me (not from this thread).
this one uses a subquery, which appears to force the order of the joins
SELECT p.*, shop
FROM
(
SELECT p.*
FROM products p
WHERE p.date >= (select prior_login FROM users where user_id = 22)
) as p
JOIN shops s
ON p.shop_id = s.shop_id
ORDER BY shop, date, product_id;
this one uses the WHERE clause to do the same thing (although the presence of SQL_SMALL_RESULT doesn't change the execution time, 0.006 secs without it as well)
SELECT SQL_SMALL_RESULT p . * , shop
FROM products p
INNER JOIN shops s ON p.shop_id = s.shop_id
WHERE p.date >= (
SELECT prior_login
FROM users
WHERE user_id =22 )
ORDER BY shop, DATE, product_id;
My understanding is that these queries work much faster on account of reducing the relevant number of rows of the product table before joining it to the shops table. I am wondering if this is correct.

Use the EXPLAIN statement to see the execution plan. Also you can try adding an index to products.date and u1.prior_login.
Also please just make sure you have defined your foreign keys and they are indexed.
Good luck.

We do need an explain plan... but
Be very careful with SELECT * FROM table WHERE id IN (SELECT id FROM another_table). This is a notorious performance trap. Generally these can be replaced by a join. The following query might run, although I haven't tested it.
SELECT shop,
shops.shop_id AS shop_id,
products.product_id AS product_id,
brand,
title,
price,
image AS image,
image_width,
image_height,
0 AS sex,
products.date AS date,
fav1.favorited AS circle_favorited,
fav2.favorited AS session_user_favorited,
u2.username AS circle_username
FROM products
LEFT JOIN favorites fav2
ON fav2.product_id = products.product_id
AND fav2.user_id = 22
AND fav2.current = 1
INNER JOIN shops
ON shops.shop_id = products.shop_id
INNER JOIN users u1
ON products.date >= u1.prior_login AND u1.user_id = 22
LEFT JOIN favorites fav1
ON products.product_id = fav1.product_id
LEFT JOIN friends f1
ON f1.star_id = fav1.user_id
LEFT JOIN users u2
ON fav1.user_id = u2.user_id
WHERE f1.fan_id = 22 OR fav1.user_id = 22
ORDER BY shop,
DATE,
product_id,
circle_favorited

The fact that the query is slow because of the ordering is rather obvious, since it is hard to find an index that would apply to the ORDER BY in this case. The main problem is the products.date >= comparison, which prevents any index from being used for the ORDER BY. And since you have a lot of data to output, MySQL starts using temporary tables for sorting.
What I would do is try to force MySQL to output data in the order of an index which already has the required order, and remove the ORDER BY clause.
I am not at a computer to test, but here is how I would do it:
I would do all inner joins
then I would LEFT JOIN to a subquery which makes all computations on favorites ordered by product_id, circle_favorited (which would provide the last ordering criterion).
So, the question is how to make the data be sorted on shop, date, product_id
I am going to write about it a bit later =)
UPD1:
You should probably read something on how B-tree indexes work in MySQL. There is a good article on mysqlperformanceblog.com about it (I am currently writing from a mobile and don't have the link at hand). In short, you seem to be talking about one-column indexes, which arrange pointers to rows based on the sorted values of a single column. Compound indexes store an order based on several columns. Indexes are mostly used to operate on clearly defined ranges of their values, to obtain as much information as possible before retrieving data from the rows they point at. Indexes usually do not know about other indexes on the same table; as a result they are rarely merged. When there is no more information to take from the index, MySQL starts to operate directly on the data.
That is, an index on date cannot make use of the index on product_id, but an index on (date, product_id) can get some more information on product_id after a condition on date (for example, sort on product_id for a specific date match).
Nevertheless, a range condition on date (>=) breaks this. That is what I was talking about.
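To make the point about compound indexes and range conditions concrete, here is a small sketch. The index name and the date literal are invented for illustration; only the products table and its date and product_id columns come from the question.

```sql
-- Hypothetical compound index on the two columns discussed above.
ALTER TABLE products
  ADD INDEX idx_date_product (`date`, product_id);

-- Equality on the first column: within a single date the index entries are
-- already sorted by product_id, so the ORDER BY can be served by the index.
EXPLAIN
SELECT product_id
FROM products
WHERE `date` = '2010-06-01'
ORDER BY product_id;

-- Range on the first column: entries are sorted by date first, so product_id
-- is only sorted within each date and ORDER BY product_id needs a separate sort.
EXPLAIN
SELECT product_id
FROM products
WHERE `date` >= '2010-06-01'
ORDER BY product_id;
```

Comparing the Extra column of the two plans (in particular whether "Using filesort" appears) should show the difference.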
UPD2:
As I understand it, the problem can be reduced to the following (most of the time is spent on it):
SELECT p.*, shop
FROM products p
JOIN users u ON p.`date` >= u.prior_login and u.user_id = 22
JOIN shops s ON p.shop_id = s.shop_id
ORDER BY shop, `date`, product_id;
Now add an index (user_id, prior_login) on users and (date) on products, and try the following query:
SELECT STRAIGHT_JOIN p.*, shop
FROM (
SELECT product_id, shop
FROM users u
JOIN products p
ON user_id = 22 AND p.`date` >= prior_login
JOIN shops s
ON p.shop_id = s.shop_id
ORDER BY shop, p.`date`, product_id
) as s
JOIN products p USING (product_id);
If I am correct, the query should return the same result but quicker. It would be nice if you posted the result of EXPLAIN for the query.
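The two indexes suggested for this rewrite could be created roughly as follows. The index names are hypothetical, and the question mentions that an index on products.date already exists, in which case the second statement can be skipped.

```sql
-- Hypothetical index names; columns as suggested in the answer.
ALTER TABLE users
  ADD INDEX idx_users_login (user_id, prior_login);
ALTER TABLE products
  ADD INDEX idx_products_date (`date`);

-- Plan of the reduced query, as a baseline to compare the rewrite against.
EXPLAIN
SELECT p.*, shop
FROM products p
JOIN users u ON p.`date` >= u.prior_login AND u.user_id = 22
JOIN shops s ON p.shop_id = s.shop_id
ORDER BY shop, `date`, product_id;
```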

How can I make these two queries into one?

I have two tables, one for downloads and one for uploads. They are almost identical, apart from a few columns that differ. I want to generate a list of stats for each date for each item in the table.
I use these two queries but have to merge the data in PHP after running them. I would like to instead run them in a single query, where it would return the columns from both queries in each row grouped by the date. Sometimes there isn't any download data, only upload data, and in all my previous tries it skipped the row if it couldn't find log data in both tables.
How do I merge these two queries into one, where it would display data even if it's just available in one of the tables?
SELECT DATE(upload_date_added) as upload_date, SUM(upload_size) as upload_traffic, SUM(upload_files) as upload_files
FROM packages_uploads
WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY upload_date
ORDER BY upload_date DESC
SELECT DATE(download_date_added) as download_date, SUM(download_size) as download_traffic, SUM(download_files) as download_files
FROM packages_downloads
WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY download_date
ORDER BY download_date DESC
I want to get result rows like this:
date, upload_traffic, upload_files, download_traffic, download_files
All help appreciated!

Your two queries can be combined with the UNION clause, along with an extra field to identify uploads and downloads on separate lines:
SELECT
'Uploads' AS TransmissionType,
DATE(upload_date_added) AS TransmissionDate,
SUM(upload_size) AS TransmissionTraffic,
SUM(upload_files) AS TransmittedFileCount
FROM
packages_uploads
WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY DATE(upload_date_added)
UNION
SELECT
'Downloads',
DATE(download_date_added),
SUM(download_size),
SUM(download_files)
FROM packages_downloads
WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY DATE(download_date_added)
-- a single ORDER BY after the last SELECT sorts the combined result
ORDER BY TransmissionDate DESC;
Give it a Try !!!
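Building on the UNION idea, one possible way to get a single row per date with both upload and download columns (the shape asked for in the question) is to pad each side with zeros and aggregate once more on the outside. This is an untested sketch; stat_date and the alias t are names introduced here, everything else comes from the question.

```sql
SELECT
    t.stat_date             AS `date`,
    SUM(t.upload_traffic)   AS upload_traffic,
    SUM(t.upload_files)     AS upload_files,
    SUM(t.download_traffic) AS download_traffic,
    SUM(t.download_files)   AS download_files
FROM (
    -- Upload rows, with zeros in the download columns
    SELECT DATE(upload_date_added) AS stat_date,
           upload_size  AS upload_traffic,
           upload_files AS upload_files,
           0 AS download_traffic,
           0 AS download_files
    FROM packages_uploads
    WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
    UNION ALL
    -- Download rows, with zeros in the upload columns
    SELECT DATE(download_date_added),
           0,
           0,
           download_size,
           download_files
    FROM packages_downloads
    WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
) AS t
GROUP BY t.stat_date
ORDER BY t.stat_date DESC;
```

Because the two sides are stacked rather than joined, a date that appears in only one table still produces a row, with zeros for the missing side. UNION ALL is used instead of UNION so that identical rows are not collapsed before summing.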

What you're asking can only work for rows that have the same add date for upload and download. In this case I think this SQL should work:
SELECT
DATE(u.upload_date_added) as date,
SUM(u.upload_size) as upload_traffic,
SUM(u.upload_files) as upload_files,
SUM(d.download_size) as download_traffic,
SUM(d.download_files) as download_files
FROM
packages_uploads u, packages_downloads d
WHERE u.upload_date_added = d.download_date_added
AND u.upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY date
ORDER BY date DESC

Without knowing the schema it is hard to give an exact answer, so please see the following as a concept, not a direct answer.
You could try a left join. I'm not sure if the table package exists, but the following may be food for thought:
SELECT
p.id,
up.date as upload_date,
dwn.date as download_date
FROM
package p
LEFT JOIN package_uploads up ON
( up.package_id = p.id AND up.upload_date = 'etc' )
LEFT JOIN package_downloads dwn ON
( dwn.package_id = p.id AND dwn.download_date = 'etc' )
The above will select all the packages and attempt to join and where the value does not join it will return null.

There are a number of ways you can do this. You can join using a primary key and a foreign key. If you do not have a relationship between the tables, these are the options:
LEFT JOIN / LEFT OUTER JOIN
Returns all records from the left table and the matched
records from the right table. The result is NULL from the
right side when there is no match.
RIGHT JOIN / RIGHT OUTER JOIN
Returns all records from the right table and the matched
records from the left table. The result is NULL from the left
side when there is no match.
FULL OUTER JOIN
Returns all records when there is a match in either the left or the right table. Not supported by MySQL.
UNION
Is used to combine the result sets of two or more SELECT statements.
Each SELECT statement within the UNION must have the same number of
columns. The columns must also have similar data types. The columns in
each SELECT statement must also be in the same order.
INNER JOIN
Selects records that have matching values in both tables. This is good for your situation.
INTERSECT
Not supported by MySQL before 8.0.31.
NATURAL JOIN
Joins on all columns that have the same name in both tables.
Since you don't need to update this data, you can create a view from the combined tables, so that your PHP needs fewer queries. Note that such a view cannot be updated. You did not mention any relationship between the tables, so I have to go with UNION.
Like this,
CREATE VIEW checkStatus
AS
SELECT
DATE(upload_date_added) as upload_date,
SUM(upload_size) as upload_traffic,
SUM(upload_files) as upload_files
FROM packages_uploads
WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY upload_date
UNION
SELECT
DATE(download_date_added) as download_date,
SUM(download_size) as download_traffic,
SUM(download_files) as download_files
FROM packages_downloads
WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY download_date
-- one ORDER BY for the whole UNION; it uses the first SELECT's column names
ORDER BY upload_date DESC
Then anywhere you want to select you just need one line:
SELECT * FROM checkStatus

A better way to build this MySQL statement with subselects

I have five tables in my database. Members, items, comments, votes and countries. I want to get 10 items. I want to get the count of comments and votes for each item. I also want the member that submitted each item, and the country they are from.
After posting here and elsewhere, I started using subselects to get the counts, but this query is taking 10 seconds or more!
SELECT `items_2`.*,
(SELECT COUNT(*)
FROM `comments`
WHERE (comments.Script = items_2.Id)
AND (comments.Active = 1))
AS `Comments`,
(SELECT COUNT(votes.Member)
FROM `votes`
WHERE (votes.Script = items_2.Id)
AND (votes.Active = 1))
AS `votes`,
`countrys`.`Name` AS `Country`
FROM `items` AS `items_2`
INNER JOIN `members` ON items_2.Member=members.Id AND members.Active = 1
INNER JOIN `members` AS `members_2` ON items_2.Member=members.Id
LEFT JOIN `countrys` ON countrys.Id = members.Country
GROUP BY `items_2`.`Id`
ORDER BY `Created` DESC
LIMIT 10
My question is whether this is the right way to do this, if there's better way to write this statement OR if there's a whole different approach that will be better. Should I run the subselects separately and aggregate the information?

Yes, you can rewrite the subqueries as aggregate joins (see below), but I am almost certain that the slowness is due to missing indices rather than to the query itself. Use EXPLAIN to see what indices you can add to make your query run in a fraction of a second.
For the record, here is the aggregate join equivalent.
SELECT `items_2`.*,
c.cnt AS `Comments`,
v.cnt AS `votes`,
`countrys`.`Name` AS `Country`
FROM `items` AS `items_2`
INNER JOIN `members` ON items_2.Member=members.Id AND members.Active = 1
INNER JOIN `members` AS `members_2` ON items_2.Member=members.Id
LEFT JOIN (
SELECT Script, COUNT(*) AS cnt
FROM `comments`
WHERE Active = 1
GROUP BY Script
) AS c
ON c.Script = items_2.Id
LEFT JOIN (
SELECT votes.Script, COUNT(*) AS cnt
FROM `votes`
WHERE Active = 1
GROUP BY Script
) AS v
ON v.Script = items_2.Id
LEFT JOIN `countrys` ON countrys.Id = members.Country
GROUP BY `items_2`.`Id`
ORDER BY `Created` DESC
LIMIT 10
However, because you are using LIMIT 10, you are almost certainly as well off (or better off) with the subqueries that you currently have than with the aggregate join equivalent I provided above for reference.
This is because a bad optimizer (and MySQL's is far from stellar) could, in the case of the aggregate join query, end up performing the COUNT(*) aggregation work for the full contents of the Comments and Votes table before wastefully throwing everything but 10 values (your LIMIT) away, whereas in the case of your original query it will, from the start, only look at the strict minimum as far as the Comments and Votes tables are concerned.
More precisely, using subqueries in the way that your original query does typically results in what is called nested loops with index lookups. Using aggregate joins typically results in merge or hash joins with index scans or table scans. The former (nested loops) are more efficient than the latter (merge and hash joins) when the number of loops is small (10 in your case.) The latter, however, get more efficient when the former would result in too many loops (tens/hundreds of thousands or more), especially on systems with slow disks but lots of memory.
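If the slowness does come from missing indexes, a sketch of indexes that would let each correlated subquery do an index lookup per item might look like this. The index names are hypothetical, and it assumes that the unqualified Created column in the ORDER BY belongs to the items table.

```sql
-- Hypothetical index names. Each count subquery filters on Script and Active,
-- so a compound index on both lets it count from the index alone.
ALTER TABLE comments
  ADD INDEX idx_comments_script_active (Script, Active);
ALTER TABLE votes
  ADD INDEX idx_votes_script_active (Script, Active);

-- Assumption: Created is a column of items. An index on it may help
-- ORDER BY Created DESC LIMIT 10 avoid sorting the whole table.
ALTER TABLE items
  ADD INDEX idx_items_created (Created);
```

Running EXPLAIN on the original query before and after adding these should show whether the subqueries switch from table scans to index lookups.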
