It is common to use a SELECT within a SELECT to reduce the number of queries, but in my testing this leads to slow queries (which is obviously harmful for MySQL performance). I had a simple query:
SELECT something
FROM posts
WHERE id IN (
SELECT tag_map.id
FROM tag_map
INNER JOIN tags
ON tags.tag_id=tag_map.tag_id
WHERE tag IN ('tag1', 'tag2', 'tag3', 'tag4', 'tag5', 'tag6')
)
This leads to slow queries: query time 3-4 s, lock time about 0.000090 s, with about 200 rows examined.
If I split the SELECT queries, each of them is quite fast, but this increases the number of queries, which is not good at high concurrency.
Is this the usual situation, or is something wrong with my query?
In MySQL, a subquery like this gets handled as a "correlated" (dependent) subquery: the optimizer ties the inner SELECT to each row of the outer SELECT, even though the inner query does not reference the outer one. The outcome is that your inner query is executed once per outer row, which is very slow.
You should refactor this query; whether you join twice or use two queries is mostly irrelevant. Joining twice would give you:
SELECT something
FROM posts
INNER JOIN tag_map ON tag_map.id = posts.id
INNER JOIN tags ON tags.tag_id = tag_map.tag_id
WHERE tags.tag IN ('tag1', ...)
For more information, see the MySQL manual on converting subqueries to JOINs.
Tip: EXPLAIN SELECT will show you how the optimizer plans to handle your query. If you see DEPENDENT SUBQUERY, you should refactor; these are mega-slow.
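The rewrite can be sanity-checked with a small sketch. It runs against an in-memory SQLite database from Python rather than MySQL, so it only compares which rows the two forms return, not how fast MySQL executes them. The sample rows and the title column are made up for illustration. One thing it shows: the JOIN form returns a post once per matching tag, so DISTINCT may be needed to get the same rows as the IN form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts   (id INTEGER PRIMARY KEY, title TEXT);  -- title is hypothetical
    CREATE TABLE tags    (tag_id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE tag_map (id INTEGER, tag_id INTEGER);  -- id refers to posts.id, as in the question
    INSERT INTO posts   VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO tags    VALUES (10, 'tag1'), (20, 'tag2'), (30, 'other');
    INSERT INTO tag_map VALUES (1, 10), (1, 20), (2, 30), (3, 20);
""")

subquery_form = """
    SELECT posts.id FROM posts
    WHERE id IN (SELECT tag_map.id FROM tag_map
                 INNER JOIN tags ON tags.tag_id = tag_map.tag_id
                 WHERE tag IN ('tag1', 'tag2'))
    ORDER BY posts.id
"""
join_form = """
    SELECT {distinct} posts.id FROM posts
    INNER JOIN tag_map ON tag_map.id = posts.id
    INNER JOIN tags ON tags.tag_id = tag_map.tag_id
    WHERE tags.tag IN ('tag1', 'tag2')
    ORDER BY posts.id
"""

in_rows = [r[0] for r in conn.execute(subquery_form)]
# Post 1 carries both tag1 and tag2, so the plain JOIN lists it twice.
join_rows = [r[0] for r in conn.execute(join_form.format(distinct=""))]
distinct_rows = [r[0] for r in conn.execute(join_form.format(distinct="DISTINCT"))]

print(in_rows, join_rows, distinct_rows)
```

Whether the duplicates matter depends on what "something" is in the real query.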
You could improve it by using the following:
SELECT something
FROM posts
INNER JOIN tag_map ON tag_map.id = posts.id
INNER JOIN tags
ON tags.tag_id=tag_map.tag_id
WHERE <tablename>.tag IN ('tag1', 'tag2', 'tag3', 'tag4', 'tag5', 'tag6')
Just make sure you select only what you need and do not use *. Also state which table holds the tag column, so that you can substitute it for <tablename>.
A join filters the results. The first join keeps the rows that satisfy the first ON condition, and the second join then produces the final result from the rows that satisfy the second ON condition.
SELECT something
FROM posts
INNER JOIN tag_map ON tag_map.id = posts.id
INNER JOIN tags ON tags.tag_id = tag_map.tag_id AND tags.tag IN ('tag1', 'tag2', 'tag3', 'tag4', 'tag5', 'tag6');
Using a join here reduces the query's running time, which also puts less load on the server.
I am studying select queries for MySQL join functions.
I am slightly confused by the query below. As I understand it, the statement joins attributes from multiple tables with the ON clause and then filters the result set with the WHERE clause.
Is this correct? What other functionality does this provide? Are there better alternatives?
The tables, attributes, and schema are not relevant to this question, specifically just the ON and WHERE interaction. Thanks in advance for any insight you can provide, appreciated.
SELECT DISTINCT Movies.title
FROM Rentals
INNER JOIN Customers
INNER JOIN Copies
INNER JOIN Movies ON Rentals.customerNum=Customers.customerNum
AND Rentals.barcode=Copies.barcode
AND Copies.movieNum=Movies.movieNum
WHERE Customers.givenName='Chad'
AND Customers.familyName='Black';
INNER JOIN (and the outer joins) are binary operators that should be followed by an ON clause. Your particular syntax works in MySQL, but will not work in any other database (because it is missing two ON clauses).
I would recommend writing the query as:
SELECT DISTINCT m.title
FROM Movies m JOIN
Copies co
ON co.movieNum = m.movieNum JOIN
Rentals r
ON r.barcode = co.barcode JOIN
Customers c
ON c.customerNum = r.customerNum
WHERE c.givenName = 'Chad' AND
c.familyName = 'Black';
You should always put the JOIN conditions in the ON clause, with one ON per JOIN. This also introduces table aliases, which make the query easier to write and to read.
The WHERE clause has additional filtering conditions. These could also be in ON clauses, but I think the query reads better with them in the WHERE clause. You can glance at the query and see: "We are getting something from a bunch of tables for Chad Black".
Ordinary inner JOIN operations only generate result rows for table rows matching their ON condition. They suppress any rows that don't match. That means you can move the contents of ON clauses to your WHERE clause and get the same result set. Still, don't do that; JOINs are easier to understand when they have ON clauses.
If you use LEFT JOIN, a kind of outer join, you get rows from the first table you mention that don't match any rows in the second table according to the ON clause.
SELECT a.name, b.name
FROM a
LEFT JOIN b ON a.a_id = b.a_id
gives you a result set containing all rows of a, with NULL values in b.name where the ON condition did not match.
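A minimal sketch of both points, using Python's built-in sqlite3 module as a stand-in for MySQL. The tables a and b follow the example above; their rows are invented. It checks that an inner join gives the same rows whether the condition sits in ON or in WHERE, and that LEFT JOIN keeps the unmatched row of a with a NULL (None in Python) for b.name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (b_id INTEGER PRIMARY KEY, a_id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (1, 1, 'b1');  -- nothing in b points at a2
""")

# Inner join with the condition in the ON clause.
inner_on = conn.execute(
    "SELECT a.name, b.name FROM a INNER JOIN b ON a.a_id = b.a_id ORDER BY a.a_id"
).fetchall()

# Same condition moved to the WHERE clause: same rows, but harder to read.
inner_where = conn.execute(
    "SELECT a.name, b.name FROM a, b WHERE a.a_id = b.a_id ORDER BY a.a_id"
).fetchall()

# LEFT JOIN keeps every row of a; b.name is NULL where nothing matched.
left_rows = conn.execute(
    "SELECT a.name, b.name FROM a LEFT JOIN b ON a.a_id = b.a_id ORDER BY a.a_id"
).fetchall()

print(inner_on, inner_where, left_rows)
```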
I have a performance problem with a query that has ORDER BY and GROUP BY. I have checked similar problems on SO but did not find a solution to this :(
I have something like this in my db schema:
pattern has many pattern_file belongs to project_template which belongs to project
Now I want to get projects filtered by some data (additional tables that I join), ordered for example by projects.priority and grouped by patterns.id. I have tried many things, and to get the desired result I came up with this query:
SELECT DISTINCT `projects`.* FROM `projects`
INNER JOIN `project_templates` ON `project_templates`.`project_id` = `projects`.`id`
INNER JOIN `pattern_files` ON `pattern_files`.`id` = `project_templates`.`pattern_file_id`
INNER JOIN `patterns` ON `patterns`.`id` = `pattern_files`.`pattern_id`
...[ truncated ]
INNER JOIN (SELECT DISTINCT projects.id FROM `projects` INNER JOIN `project_templates` ON `project_templates`.`project_id` = `projects`.`id`
INNER JOIN `pattern_files` ON `pattern_files`.`id` = `project_templates`.`pattern_file_id`
INNER JOIN `patterns` ON `patterns`.`id` = `pattern_files`.`pattern_id`
...[ truncated ]
WHERE [here my conditions] ORDER BY [here my order]) P
ON P.id = projects.id
WHERE [here my conditions]
GROUP BY patterns.id
ORDER BY [here my order]
From my research, I have to INNER JOIN with a subquery to get around the "ORDER BY before GROUP BY" problem; I then put the same conditions on the outer query for performance. I had to repeat the ORDER BY in the outer query too, otherwise the result would come back in the default order.
Now there is a real performance problem: I have about 6k projects, and when I run this query without any conditions it takes about 15 s :/ When I narrow the result by specifying conditions, the time drops drastically. I read somewhere that the subquery is run for every row of the outer query, which could be true judging by the execution time :/
Could you please give some advice on how I can optimize the query? I do not work much with SQL, so maybe I have approached this the wrong way from the very beginning?
P.S. I have tried WHERE projects.id IN (Select project.id FROM projects ....), and that removed the performance issue but also lost the "ORDER BY before GROUP BY" behaviour.
EDIT.
I want to retrieve a list of projects, but I also want to filter and order it, and finally I want patterns.id to be unique (that is why I use GROUP BY).
The ORDER BY in your inner query (P) doesn't make sense (any inner sort will only have an arbitrary effect).
@Solarflare Unfortunately it does. GROUP BY will take the first row from each grouped result, and the join preserves the order. Well, I believe this is specific to MySQL. Furthermore, to keep the order from the subquery I could use ORDER BY NULL in the outer query :-)
Also, select projects.* ... group by pattern.id is fishy (although MySQL, in contrast to every other DBMS, allows you to do this).
So we can assume I retrieve only projects.id, but from the docs:
MySQL extends the use of GROUP BY to permit selecting fields that are not mentioned in the GROUP BY clause
I have a query that looks like this:
select `adverts`.*
from `adverts`
inner join `advert_category` on `advert_category`.`advert_id` = `adverts`.`id`
inner join `advert_location` on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
and `advert_category`.`category_id` = ?
order by `updated_at` desc
The problem here is that I have a huge database and this query is absolutely ravaging it.
What I really need is to do the first join, and then apply the where clause. This will whittle down my result from something like 100k rows to less than 10k; then I want to do the other join, in order to whittle down the result again so I can get the advert_location on the category items.
Doing it as is just isn't viable.
So, how do I go about using a join and a where condition, and then after getting that response doing a further join with a where condition?
Thanks
This is your query, written a bit simpler so I can read it:
select a.*
from adverts a inner join
advert_category ac
on ac.advert_id = a.id inner join
advert_location al
on al.advert_id = a.id
where al.location_id = ? and
ac.category_id = ?
order by a.updated_at desc;
I am speculating that advert_category and advert_location have multiple rows per advert. In that case, you are getting a Cartesian product for each advert.
A better way to write the query uses exists:
select a.*
from adverts a
where exists (select 1
from advert_location al
where al.advert_id = a.id and al.location_id = ?
) and
exists (select 1
from advert_category ac
where ac.advert_id = a.id and ac.category_id = ?
)
order by a.updated_at desc;
For this version, you want indexes on advert_location(advert_id, location_id), advert_category(advert_id, category_id), and probably adverts(updated_at, id).
You can write the 1st join in a Derived Table including a WHERE-condition and then do the 2nd join (but a decent optimizer might resolve the Derived Table again and do what it thinks is best based on statistics):
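The difference between the two forms can be seen in a small sketch, run here on an in-memory SQLite database from Python instead of MySQL. The data is invented, and it deliberately assumes the mapping tables contain repeated rows for the same advert; that is the situation in which the double join multiplies rows while EXISTS returns each advert once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE adverts (id INTEGER PRIMARY KEY, updated_at TEXT);
    CREATE TABLE advert_category (advert_id INTEGER, category_id INTEGER);
    CREATE TABLE advert_location (advert_id INTEGER, location_id INTEGER);
    INSERT INTO adverts VALUES (1, '2020-01-01'), (2, '2020-01-02');
    -- Assumption for the demo: advert 1 is linked twice to category 5 and twice to location 7.
    INSERT INTO advert_category VALUES (1, 5), (1, 5), (2, 6);
    INSERT INTO advert_location VALUES (1, 7), (1, 7), (2, 7);
""")
params = {"loc": 7, "cat": 5}

# Double join: 2 category rows x 2 location rows for advert 1.
join_ids = [r[0] for r in conn.execute("""
    SELECT a.id FROM adverts a
    INNER JOIN advert_category ac ON ac.advert_id = a.id
    INNER JOIN advert_location al ON al.advert_id = a.id
    WHERE al.location_id = :loc AND ac.category_id = :cat
    ORDER BY a.updated_at DESC
""", params)]

# EXISTS only asks whether a matching row exists, so each advert appears once.
exists_ids = [r[0] for r in conn.execute("""
    SELECT a.id FROM adverts a
    WHERE EXISTS (SELECT 1 FROM advert_location al
                  WHERE al.advert_id = a.id AND al.location_id = :loc)
      AND EXISTS (SELECT 1 FROM advert_category ac
                  WHERE ac.advert_id = a.id AND ac.category_id = :cat)
    ORDER BY a.updated_at DESC
""", params)]

print(join_ids, exists_ids)
```

If each (advert_id, category_id) and (advert_id, location_id) pair is unique in the real tables, both forms return the same rows and the choice is only about performance.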
select adverts.*
from
(
select `adverts`.*
from `adverts`
inner join `advert_category`
on `advert_category`.`advert_id` =`adverts`.`id`
where `advert_category`.`category_id` = ?
) as adverts
inner join `advert_location`
on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
order by `updated_at` desc
MySQL will reorder inner joins for you during optimization, regardless of how you wrote them in your query. Inner join is the same in either direction (in algebra this is called commutative), so this is safe to do.
You can see the result of join reordering if you use EXPLAIN on your query.
If you don't like the order MySQL chose for your joins, you can override it with this kind of syntax:
from `adverts`
straight_join `advert_category` ...
https://dev.mysql.com/doc/refman/5.7/en/join.html says:
STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer processes the tables in a suboptimal order.
Once the optimizer has decided on the join order, it always does one join at a time, in that order. This is called a nested-loop join.
There isn't really any way to "do the join then do the where clause". Conditions are combined together when looking up rows for joined tables. But this is a good thing, because you can then create a compound index that helps match rows based on both join conditions and where conditions.
PS: When asking a query optimization question, you should include the EXPLAIN output, and also run SHOW CREATE TABLE <tablename> for each table and include the result. Then we don't have to guess at the columns and indexes in your tables.
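As a rough illustration of the compound-index idea, here is a sketch using Python's sqlite3 module. The index names and column orders are assumptions chosen for this query, not something taken from the asker's schema, and the plan printed is SQLite's (EXPLAIN QUERY PLAN), which is not the same as MySQL's EXPLAIN output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE adverts (id INTEGER PRIMARY KEY, updated_at TEXT);
    CREATE TABLE advert_category (advert_id INTEGER, category_id INTEGER);
    CREATE TABLE advert_location (advert_id INTEGER, location_id INTEGER);
    -- Hypothetical compound indexes: the WHERE column and the join column together.
    CREATE INDEX idx_ac_category_advert ON advert_category (category_id, advert_id);
    CREATE INDEX idx_al_advert_location ON advert_location (advert_id, location_id);
    INSERT INTO adverts VALUES (1, '2020-01-01'), (2, '2020-01-02'), (3, '2020-01-03');
    INSERT INTO advert_category VALUES (1, 5), (2, 5), (3, 6);
    INSERT INTO advert_location VALUES (1, 7), (2, 8), (3, 7);
""")

query = """
    SELECT a.id FROM adverts a
    INNER JOIN advert_category ac ON ac.advert_id = a.id
    INNER JOIN advert_location al ON al.advert_id = a.id
    WHERE al.location_id = ? AND ac.category_id = ?
    ORDER BY a.updated_at DESC
"""

# Parameters are (location_id, category_id), in the order the placeholders appear.
matching_ids = [row[0] for row in conn.execute(query, (7, 5))]

# The last column of each EXPLAIN QUERY PLAN row is a human-readable plan step.
plan_steps = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (7, 5))]
for step in plan_steps:
    print(step)
```

On MySQL the equivalent check would be EXPLAIN on the real query against the real tables; which column order works best depends on the data.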
I have two tables:
Shop_Products
Shop_Products_Egenskaber_Overruling
I want to select all records in Shop_Products_Egenskaber_Overruling which have a related record in Shop_Products, that is, a record with an equal ProductNum.
This works for me with the statement below, but I don't think a CROSS JOIN is the best approach for large record sets. When using the statement in web controls, it becomes pretty slow, even with only 1000 records. Is there a better way to accomplish this?
SELECT Shop_Products.*, Shop_Products_Egenskaber_Overruling.*
FROM Shop_Products CROSS JOIN
Shop_Products_Egenskaber_Overruling
WHERE Shop_Products.ProductNum = Shop_Products_Egenskaber_Overruling.ProductNum
Any optimizing suggestions?
Best regards.
You can do it this way, but I am not sure it guarantees better performance:
SELECT Shop_Products.*, Shop_Products_Egenskaber_Overruling.*
FROM Shop_Products
INNER JOIN Shop_Products_Egenskaber_Overruling on Shop_Products.ProductNum = Shop_Products_Egenskaber_Overruling.ProductNum
You are actually looking for an INNER JOIN.
SELECT
SP.*,
SPEO.*
FROM Shop_Products SP
INNER JOIN Shop_Products_Egenskaber_Overruling SPEO
ON SP.ProductNum = SPEO.ProductNum
This should perform better than your CROSS JOIN, because the condition that looks for records with an equal ProductNum is part of the JOIN condition and the WHERE clause is eliminated.
Logically, a WHERE clause is evaluated AFTER a JOIN. In your case, all possible combinations are created by the CROSS JOIN and then filtered by the conditions in the WHERE clause.
By using an INNER JOIN you do the filtering in the first step.
A cross join is slower because it produces all combinations, which are then filtered by the WHERE predicate, so you can use INNER JOIN for better performance. Still, I think it would be useful to check the execution plan of this query anyway, because in Oracle, for example, there is no difference between the WHERE and INNER JOIN solutions (see "Inner join vs Where").
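A small sketch, using Python's sqlite3 module rather than the asker's database, confirms that the two spellings return the same rows; whether they also perform the same is up to the optimizer, which is why checking the execution plan is worthwhile. The Name and Egenskab columns and all rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Shop_Products (ProductNum TEXT, Name TEXT);
    CREATE TABLE Shop_Products_Egenskaber_Overruling (ProductNum TEXT, Egenskab TEXT);
    INSERT INTO Shop_Products VALUES ('P1', 'chair'), ('P2', 'table');
    INSERT INTO Shop_Products_Egenskaber_Overruling VALUES ('P1', 'red'), ('P3', 'blue');
""")

# Original form: CROSS JOIN, then filter in WHERE.
cross_rows = conn.execute("""
    SELECT sp.ProductNum, speo.Egenskab
    FROM Shop_Products sp
    CROSS JOIN Shop_Products_Egenskaber_Overruling speo
    WHERE sp.ProductNum = speo.ProductNum
    ORDER BY sp.ProductNum
""").fetchall()

# Suggested form: INNER JOIN with the condition in ON.
inner_rows = conn.execute("""
    SELECT sp.ProductNum, speo.Egenskab
    FROM Shop_Products sp
    INNER JOIN Shop_Products_Egenskaber_Overruling speo
        ON sp.ProductNum = speo.ProductNum
    ORDER BY sp.ProductNum
""").fetchall()

print(cross_rows, inner_rows)
```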
Try using INNER JOIN
SELECT Produkter.*, Egenskaber.*
FROM Shop_Products Produkter
INNER JOIN Shop_Products_Egenskaber_Overruling Egenskaber ON Produkter.ProductNum=Egenskaber.ProductNum
I also named them in Norwegian..
I have something similar to the following:
SELECT c.id
FROM contact AS c
WHERE c.id IN (SELECT s.contact_id
FROM sub_table AS s
LEFT JOIN contact_sub AS c2 ON (s.id = c2.sub_field)
WHERE c2.phone LIKE '535%')
ORDER BY c.name
The problem is that the query takes a very, very long time (>2 minutes), but if I take the subquery, run it separately, implode the ids and insert them into the main query, it runs in well under 1 second, including the data retrieval and implosion.
I have checked the explains on both methods and keys are being used appropriately and the same ways. The subquery doesn't return more than 200 IDs.
What could be causing the subquery method to take so much longer?
BTW, I know the query above can be written with joins, but the query I have can't be--this is just a simplified version.
Using MySQL 5.0.22.
Sounds suspiciously like MySQL bug #32665: Query with dependent subquery is too slow.
What happens if you try it like this?
SELECT c.id
FROM contact AS c
INNER JOIN (SELECT s.contact_id
FROM sub_table AS s
LEFT JOIN contact_sub AS c2 ON (s.id = c2.sub_field)
WHERE c2.phone LIKE '535%') subq ON subq.contact_id=c.id
ORDER BY c.name
This assumes that s.contact_id is unique in the subquery's result. You can add DISTINCT to the subquery if it is not.
I always use uncorrelated subqueries this way rather than using the IN operator in the where clause.
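The uniqueness caveat is easy to see in a sketch. This one uses Python's sqlite3 module with invented rows, so it shows only the result sets, not MySQL 5.0 timings. Contact 1 has two sub_table rows whose phone numbers both match, so the join against the derived table repeats it unless DISTINCT is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sub_table (id INTEGER PRIMARY KEY, contact_id INTEGER);
    CREATE TABLE contact_sub (sub_field INTEGER, phone TEXT);
    INSERT INTO contact VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO sub_table VALUES (100, 1), (101, 1), (102, 2);
    INSERT INTO contact_sub VALUES (100, '5351111'), (101, '5352222'), (102, '6000000');
""")

in_form = """
    SELECT c.id FROM contact AS c
    WHERE c.id IN (SELECT s.contact_id FROM sub_table AS s
                   LEFT JOIN contact_sub AS c2 ON (s.id = c2.sub_field)
                   WHERE c2.phone LIKE '535%')
    ORDER BY c.name
"""
derived_form = """
    SELECT c.id FROM contact AS c
    INNER JOIN (SELECT {distinct} s.contact_id FROM sub_table AS s
                LEFT JOIN contact_sub AS c2 ON (s.id = c2.sub_field)
                WHERE c2.phone LIKE '535%') subq ON subq.contact_id = c.id
    ORDER BY c.name
"""

in_ids = [r[0] for r in conn.execute(in_form)]
# Without DISTINCT the derived table holds contact_id 1 twice.
derived_ids = [r[0] for r in conn.execute(derived_form.format(distinct=""))]
derived_distinct_ids = [r[0] for r in conn.execute(derived_form.format(distinct="DISTINCT"))]

print(in_ids, derived_ids, derived_distinct_ids)
```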
Have you checked the Execution Plan for the query? This will usually show you the problem.
Can't you do another join instead of a subquery?
SELECT c.id
FROM contact AS c
JOIN sub_table AS s on c.id = s.contact_id
LEFT JOIN contact_sub AS cs ON (s.id = cs.sub_field)
WHERE cs.phone LIKE '535%'
ORDER BY c.name
Since the subquery is referring to a field sub_field in the outer select, it has to be run once for each row in the outer table - the results for the inner query will change with each row in the outer table.
It's a correlated subquery. It runs once for each row in the outer select. (I think. You have two tables with the same correlation name, I'm assuming that's a typo. That you say it can't be rewritten as a join means it's correlated. )
Ok, I'm going to give you something to try. You say that the subquery is not correlated, and that you still can't join on it. And that if you take the output of the subquery, and lexically substitute that for the subquery, the main query runs much faster.
So try this: make the subquery into a view: create view foo followed by the text of the subquery. Then rewrite the main query to get rid of the "IN" clause and instead join to the view.
How's the timing on that?