Better way to accomplish Nested SQL Query? - mysql

Right now I'm implementing the following SQL query for an iPhone app, and I'm using HTTP GET. The query does not contain joins, so is it efficient enough?
SELECT
menu_name
FROM Menus
WHERE
menu_id IN (
SELECT
menus_id
FROM Restaurants_Menus
WHERE Restaurants_id = '$restaurantID'
)

Only you can answer if it is efficient enough. If it meets your needs, then it is fine. However, it may be faster if you use a JOIN:
SELECT
Menus.menu_name
FROM
Menus
JOIN Restaurants_Menus ON Menus.menu_id = Restaurants_Menus.menus_id
WHERE Restaurants_Menus.Restaurants_id = '$restaurantID'
You can run them both with EXPLAIN to determine where indexes are being used and judge the query execution time. If Restaurants_Menus is not a large table, and Restaurants_id is a primary key, the two queries are not likely to differ much in execution time.
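To check that the two forms return the same rows before comparing their plans, a small experiment like the one below may help. It is only a sketch: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, and the sample rows, the composite primary key and the helper name menu_names are assumptions, not part of the original schema. It also binds the restaurant id through a ? placeholder rather than interpolating '$restaurantID' into the SQL string.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Menus (menu_id INTEGER PRIMARY KEY, menu_name TEXT);
    CREATE TABLE Restaurants_Menus (
        Restaurants_id INTEGER,
        menus_id INTEGER,
        PRIMARY KEY (Restaurants_id, menus_id)  -- hypothetical key
    );
    INSERT INTO Menus VALUES (1, 'Breakfast'), (2, 'Lunch'), (3, 'Dinner');
    INSERT INTO Restaurants_Menus VALUES (10, 1), (10, 3), (20, 2);
""")

IN_QUERY = """
    SELECT menu_name
    FROM Menus
    WHERE menu_id IN (
        SELECT menus_id FROM Restaurants_Menus WHERE Restaurants_id = ?
    )
    ORDER BY menu_name
"""

JOIN_QUERY = """
    SELECT Menus.menu_name
    FROM Menus
    JOIN Restaurants_Menus ON Menus.menu_id = Restaurants_Menus.menus_id
    WHERE Restaurants_Menus.Restaurants_id = ?
    ORDER BY Menus.menu_name
"""

def menu_names(query, restaurant_id):
    # The ? placeholder binds the value instead of pasting it into the SQL text.
    return [row[0] for row in conn.execute(query, (restaurant_id,))]

in_result = menu_names(IN_QUERY, 10)
join_result = menu_names(JOIN_QUERY, 10)
print(in_result, join_result)
```

The two results only match in general because each (Restaurants_id, menus_id) pair is unique here; if the link table allowed duplicate pairs, the JOIN would return a menu name once per duplicate while the IN form would not.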

Related

explain plan meanings in mysql

I ran EXPLAIN on a query, but I am confused about what its output really means.
explain extended
select *
from (select type_id from con_consult_type cct
where cct.consult_id = (select id
from con_consult
where id = 1))
cctt left join con_type ct on cctt.type_id = ct.id;
The result is:
I googled and found that "derived" means a temporary table, but what is the SQL of that temporary table? Is it the cctt table?
And is step 2 the result of cctt left join con_type ct on cctt.type_id = ct.id?
FK_CONSULT_TO_CONSULT_TYPE is the foreign key from consult_id to the id column of con_consult.
How is the index used in this SQL?
Does it get all the rows of cctt and then filter them with the index?
Please help me understand what the EXPLAIN output means.
This is a bad query for learning the basics of the EXPLAIN output; there is simply too much happening with all the subqueries and joins.
I can give a rundown of some of the essentials:
'rows' column: Less is better. It is the engine's estimate of how many rows it has to examine; anything less than a couple of hundred is good, and it generally indicates how well the engine can find your data through the indexes.
'possible_keys' and 'key': If 'rows' is big, you may have to tweak your keys to give the engine some help finding your data.
'type': The join type, i.e. how the table is accessed.
To answer some of your questions:
'sql of the temporary table': it's the first subquery in your SQL.
With FK_CONSULT_TO_CONSULT_TYPE you don't have to do anything; the engine has already picked this up as an index, which is what the EXPLAIN output is saying.
Queries are broken into three essential steps: select data, filter, and join. Each row in the EXPLAIN output details one or more of these operations. It may not necessarily relate to a specific section of your SQL, as the engine may have combined various parts into one.
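To see how a plan changes once an index is available, you can compare the plan before and after creating one. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in, so the output format is not the same as MySQL's EXPLAIN columns described above; the index name idx_consult_id and the helper plan_details are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); only the idea carries over.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE con_consult_type (consult_id INTEGER, type_id INTEGER)"
)

QUERY = "SELECT type_id FROM con_consult_type WHERE consult_id = 1"

def plan_details(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable
    # description of one step of the plan.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Without an index the engine has to scan the whole table.
plan_without_index = plan_details(QUERY)

# Hypothetical index on the filtered column.
conn.execute("CREATE INDEX idx_consult_id ON con_consult_type (consult_id)")

# With the index in place the engine can search through it instead.
plan_with_index = plan_details(QUERY)

print(plan_without_index)
print(plan_with_index)
```

In MySQL the equivalent check would be to look at whether the 'key' column of the EXPLAIN output names the index and whether 'rows' drops.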

EXISTS vs ALL, ANY, SOME

I'm trying to understand the difference between EXISTS and ALL in MySQL. Let me give you an example:
SELECT *
FROM table1
WHERE NOT EXISTS (
SELECT *
FROM table2
WHERE table2.val < table1.val
);
SELECT *
FROM table1
WHERE val <= ALL( SELECT val FROM table2 );
A quote from MySQL docs:
Traditionally, an EXISTS subquery starts with SELECT *, but it could
begin with SELECT 5 or SELECT column1 or anything at all. MySQL
ignores the SELECT list in such a subquery, so it makes no difference. [1]
Reading this, it seems to me that MySQL should be able to translate both queries to the same relational algebra expression. Both queries are just a simple comparison between values from two tables. However, that doesn't seem to be the case. I tried both queries and the second one performs much better than the first one.
How exactly are these queries handled by the optimizer?
Why can't the optimizer make the first query perform like the second one?
Is it always more efficient to use an ALL/ANY/SOME condition?
The queries in your question are not equivalent, so they will have different execution plans regardless of how well they're optimized. If you used NOT val > ANY(...) then it would be equivalent.
You should always use EXPLAIN to see the execution plan of a query and realize that the execution plan can change as your data changes. Testing and understanding the execution plan will help you determine which methods perform better. There is no hard and fast rule for ALL/ANY/SOME and they're often optimized down to an EXISTS.

SQL: difference between INNER JOIN and INNER SELECT in particular case

I have started to learn SQL, and I find that we can often achieve the same result with either JOINs or inner SELECT statements.
Question1 (broad): When are JOINs faster than inner selects, and vice versa?
Question2 (narrow): Can you explain what causes the performance difference between the three queries below?
P.S. There is a very nice site which calculates query performance, but I can't understand its estimation results.
Query1:
SELECT DISTINCT maker
FROM Product pro INNER JOIN Printer pri
on pro.model = pri.model
Query2:
SELECT DISTINCT maker
FROM Product
WHERE model IN (
SELECT model FROM Printer
)
Query3:
SELECT distinct maker
FROM Product pro, Printer pri
WHERE pro.model = pri.model
When the server evaluates a JOIN, it matches the join condition by scanning only the columns needed to look up the value in the other table and filters out everything else; this is usually done with a dedicated operation.
When you have a subquery, the server needs to evaluate the plan for the subquery before matching the join condition, so unless the subquery pays for that extra effort by filtering out a lot of noise, you get better performance without it.
Servers are quite smart: they try to shave off everything they don't need in order to evaluate the join. Then they try to use every index they can to get the best performance, where "best" means the best plan they can find in a limited amount of time, so that planning time itself doesn't kill performance.
Added after the comment of the OP
The O(n) estimate depends on the complexity of the query and the subquery. If you are interested in how the query plan is built, you'll have to dig through the documentation of your database of choice, and you probably won't find much if the DB is not open source.
In layman's terms:
a simple join is evaluated on one level, the main query plan;
a subquery is evaluated on two levels, the subquery plan and the main query plan.
Some DB IDEs can display a visual representation of the total plan, which usually helps in understanding some of these points (I don't know if MySQL has that).
Query1 is faster in general, but the RDBMS could optimize Query2 to give approximately the same performance.
If the IN subquery is rather complicated and depends on the main table(s), it could be executed for each retrieved row to check the condition.
Normally an INNER JOIN joins values from two different tables, whereas an inner SELECT selects a particular value from a different table and uses the result to produce a single output.
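Whatever their relative speed, the three queries from the question should return the same set of makers, which is easy to confirm on a toy data set before comparing plans. This is a minimal sketch using Python's sqlite3 module as a stand-in for the asker's database; the sample rows and the primary keys on model are assumptions.

```python
import sqlite3

# In-memory SQLite database with made-up rows (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (maker TEXT, model TEXT PRIMARY KEY);
    CREATE TABLE Printer (model TEXT PRIMARY KEY);
    INSERT INTO Product VALUES
        ('A', 'p1'), ('A', 'p2'), ('B', 'x1'), ('C', 'p3');
    INSERT INTO Printer VALUES ('p1'), ('p2'), ('p3');
""")

QUERIES = {
    # Query1: explicit INNER JOIN
    "inner_join": """
        SELECT DISTINCT maker
        FROM Product pro INNER JOIN Printer pri ON pro.model = pri.model
    """,
    # Query2: IN with a non-correlated subquery
    "in_subquery": """
        SELECT DISTINCT maker
        FROM Product
        WHERE model IN (SELECT model FROM Printer)
    """,
    # Query3: comma join with the condition in WHERE
    "comma_join": """
        SELECT DISTINCT maker
        FROM Product pro, Printer pri
        WHERE pro.model = pri.model
    """,
}

# Sort each result so the comparison does not depend on row order.
results = {
    name: sorted(row[0] for row in conn.execute(sql))
    for name, sql in QUERIES.items()
}
print(results)
```

Query1 and Query3 are the same join written in two syntaxes, so any performance difference the site reports between them is likely down to the estimator rather than the SQL.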

Performance of nested queries

I want to ask a question about database queries, specifically the case where the WHERE clause of one query depends on the result of another query. For example:
select ? from ? where ? = (select ? from ?)
This is a simple example, so it is easy to write. But for more complex cases, I want to know which approach performs best. A join? Separate queries? Nested queries, or something else?
Thank you for answers.
Best Regards.
You should test it. These things depend a lot on the details of the query and of the indices it can use.
In my experience JOINs tend to be faster than nested queries in MySQL. In some cases MySQL isn't very smart and appears to run the subquery for every row produced by the outer query.
You can read more about these things in the official documentation:
Optimizing subqueries: http://dev.mysql.com/doc/refman/5.6/en/optimizing-subqueries.html
Rewriting subqueries as joins: http://dev.mysql.com/doc/refman/5.6/en/rewriting-subqueries.html
This is case dependent. If the inner query returns very few rows, you should go for the nested query. For a non-correlated subquery, the inner query is executed first and its result set is then used in the outer query.
Joins, meanwhile, are conceptually a filtered Cartesian product, which can again be a heavy operation when no index supports the join condition.
As Mitch and Joni stated, it depends. But generally a join will offer the best performance. You're trying to avoid running the nested query for each row of the outer query. A good query optimizer may do this for you anyway, by interpreting what you're trying to do and essentially "fixing" your mistake. But with the vast majority of queries, you should be writing it as a join in the first place. That way you're being explicit about what you're trying to do and you're fully understanding yourself what is being done, and what the most efficient way to do the work is.
I expect the joins to be quicker, mainly because you have an equivalence and an explicit JOIN. Still, use EXPLAIN to see the differences in how the SQL engine interprets them.
I would not expect these to be very different. Where you can get real, large performance gains from using joins instead of subqueries is with correlated subqueries.
Since almost everyone is saying that joins will give the best performance, I logged in just to share the exact opposite experience.
Some days back I was writing a query over 3-4 tables which held a huge amount of data. I wrote a big SQL query with joins and it took around 2-3 hours to execute. Then I restructured it: I created a nested select query, put as many WHERE constraints as I could inside the nested one, and made it as strict as possible. Performance improved by more than 90%; it now takes less than 4 minutes to run.
This is just my experience, and maybe joins are better in theory. I just wanted to share it. It's better to try out different things; learning more about the tables, their indexes, etc. helps a lot.
Update:
And I just found out what I did is actually suggested in this optimization reference page of MySQL. http://dev.mysql.com/doc/refman/5.6/en/optimizing-subqueries.html
Pasting it here for quick reference:
Replace a join with a subquery. For example, try this:
SELECT DISTINCT column1 FROM t1 WHERE t1.column1 IN (SELECT column1 FROM t2);
Instead of this:
SELECT DISTINCT t1.column1 FROM t1, t2 WHERE t1.column1 = t2.column1;
Move clauses from outside to inside the subquery. For example, use this query:
SELECT * FROM t1 WHERE s1 IN (SELECT s1 FROM t1 UNION ALL SELECT s1 FROM t2);
Instead of this query:
SELECT * FROM t1 WHERE s1 IN (SELECT s1 FROM t1) OR s1 IN (SELECT s1 FROM t2);
For another example, use this query:
SELECT (SELECT column1 + 5 FROM t1) FROM t2;
Instead of this query:
SELECT (SELECT column1 FROM t1) + 5 FROM t2;

Should criteria be duplicated on subqueries

I have a query which actually runs two queries on a table. I query the whole table, a datediff and then a subquery which tells me the sum of hours each unit spent in certain operational steps. The main query limits the results to the REP depot so technically I don't need to put that same criteria on the subquery since repair_order is unique.
Would it be faster, slower or no difference to apply the depot filter on the subquery?
SELECT
*,
DATEDIFF(date_shipped, date_received) as htg_days,
(SELECT SUM(t3.total_days) FROM report_tables.cycle_time_days as t3 WHERE t1.repair_order=t3.repair_order AND (operation='MFG' OR operation='ENG' OR operation='ENGH' OR operation='HOLD') GROUP BY t3.repair_order) as subt_days
FROM
report_tables.cycle_time_days as t1
WHERE
YEAR(t1.date_shipped)=2010
AND t1.depot='REP'
GROUP BY
repair_order
ORDER BY
date_shipped;
I run into this with a lot of situations but I never know if it would be better to put the filter in the sub query, main query or both.
In this example, it would actually alter the query if you moved your WHERE clause to filter by REP into the subquery. So it wouldn't be about performance at that point, it would be about getting the same result set. In general, though, if you will get the same exact result set by moving a WHERE clause elsewhere in a complex query, it is better to do so at the most atomic level possible, ie, in the subquery. Then the subquery returns a smaller result set to the main query before the main query has to process it.
The answer to your question will vary depending on your schema, the complexity of your queries, the reliability of your data, etc. A general rule of thumb is to try to process the least amount of data possible, which generally means filtering it at the lowest level possible as well.
When you want to optimize a query the absolute number one place to start is to use the EXPLAIN output to see what optimizations the query parser was able to figure out and check to see what the weakest link is in the query plan. Resolve that, rinse, repeat.
You can also use EXPLAIN's "extended" keyword to see the actual query it built to run, which will reveal more about how it uses your criteria. In some cases, it will optimize away duplicate conditions between parent and subqueries. In other cases, it may push the conditions down from the parent into the subquery. In some cases, for (too) complex queries, I've seen it repeat a condition that was only specified once in the query. Thankfully, you don't have to guess; MySQL's explain plan will reveal all, albeit sometimes in cryptic ways.
I usually use a derived table as a "driver or aggregating" query, then join that result back onto whatever table I want to pull data from:
select
t1.*,
datediff(t1.date_shipped, t1.date_received) as htg_days,
subt_days.total_days
from
cycle_time_days as t1
inner join
(
-- aggregating/driver query
select
repair_order,
sum(total_days) as total_days
from
cycle_time_days
where
year(date_shipped) = 2010 and depot = 'REP' and
operation in ('MFG','ENG','ENGH','HOLD') -- covering index on date, depot, op ???
group by
repair_order -- indexed ??
having
total_days > 14 -- added for demonstration purposes
order by
total_days desc limit 10
) as subt_days on t1.repair_order = subt_days.repair_order
order by
t1.date_shipped;
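To compare the correlated-subquery form from the question with the derived-table form above, it can help to run both against a tiny data set first and confirm they agree. The sketch below is a simplified version under stated assumptions: it uses Python's sqlite3 module as a stand-in for MySQL, keeps only four columns of cycle_time_days, uses made-up rows, and leaves out the date filter, HAVING and LIMIT clauses.

```python
import sqlite3

# Trimmed-down, hypothetical version of the cycle_time_days table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cycle_time_days (
        repair_order INTEGER, depot TEXT, operation TEXT, total_days INTEGER
    );
    INSERT INTO cycle_time_days VALUES
        (1, 'REP', 'MFG', 3), (1, 'REP', 'SHIP', 1), (1, 'REP', 'ENG', 2),
        (2, 'REP', 'HOLD', 5), (3, 'ABC', 'MFG', 7);
""")

# Correlated scalar subquery: evaluated per outer row, depot filter outside.
CORRELATED = """
    SELECT DISTINCT t1.repair_order,
           (SELECT SUM(t3.total_days)
            FROM cycle_time_days AS t3
            WHERE t3.repair_order = t1.repair_order
              AND t3.operation IN ('MFG', 'ENG', 'ENGH', 'HOLD')) AS subt_days
    FROM cycle_time_days AS t1
    WHERE t1.depot = 'REP'
    ORDER BY t1.repair_order
"""

# Derived table: aggregate once with all filters inside, then join back.
DERIVED = """
    SELECT DISTINCT t1.repair_order, s.total_days
    FROM cycle_time_days AS t1
    INNER JOIN (
        SELECT repair_order, SUM(total_days) AS total_days
        FROM cycle_time_days
        WHERE depot = 'REP'
          AND operation IN ('MFG', 'ENG', 'ENGH', 'HOLD')
        GROUP BY repair_order
    ) AS s ON s.repair_order = t1.repair_order
    ORDER BY t1.repair_order
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
derived_rows = conn.execute(DERIVED).fetchall()
print(correlated_rows, derived_rows)
```

The two forms agree here because every REP repair order has at least one row in the listed operations. A repair order with none would still appear in the correlated form with a NULL sum, but would be dropped by the inner join.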