Optimize SQL select query with join - mysql

I have two tables:
Shop_Products
Shop_Products_Egenskaber_Overruling
I want to select all records in Shop_Products_Egenskaber_Overruling which has a related record in
Shop_Products. This Means a record with an equal ProductNum.
This Works for me with the statement below, but I don't think a CROSS JOIN is the best approach for large record sets. When using the statement in web controls, it becomes pretty slow, even with only 1000 records. Is there a better way to accomplish this?
SELECT Shop_Products.*, Shop_Products_Egenskaber_Overruling.*
FROM Shop_Products CROSS JOIN
Shop_Products_Egenskaber_Overruling
WHERE Shop_Products.ProductNum = Shop_Products_Egenskaber_Overruling.ProductNum
Any optimizing suggestions?
Best regards.

You can do it that way but not sure it will ensure an optimization
SELECT Shop_Products.*, Shop_Products_Egenskaber_Overruling.*
FROM Shop_Products
INNER JOIN Shop_Products_Egenskaber_Overruling on Shop_Products.ProductNum = Shop_Products_Egenskaber_Overruling.ProductNum

You are actually looking for an INNER JOIN.
SELECT
SO.*,
SPEO.*
FROM SHOP_PRODUCTS SP
INNER JOIN Shop_Products_Egenskaber_Overruling SPEO
ON SP.ProductNum = SPEO.ProductNum
This will have improved performance over your CROSS-JOIN, because the condition to look for records with equal ProductNum is implicit in the JOIN condition and the WHERE clause is eliminated.
WHERE clauses always execute AFTER a JOIN. In your case, all possible combinations are created by the CROSS JOIN and then filtered by the conditions in the WHERE clause.
By using an INNER JOIN you are doing the filtering in the first step.

Cross join is slower, because it produce all combinations, which filtred after by where predicate. So you can use INNER JOIN for better performance. But I think It would be useful if you check execution plan of this query anyway, because in Oracle there is no difference between where and inner join solutions Inner join vs Where

Try using INNER JOIN
SELECT Produkter.*, Egenskaber.*
FROM Shop_Products Produkter
INNER JOIN Shop_Products_Egenskaber_Overruling Egenskaber ON Produkter.ProductNum=Egenskaber.ProductNum
Jag namngav aven dem pa Norska..

Related

SQL inner join multiple tables with one query

I've a query like below,
SELECT
c.testID,
FROM a
INNER JOIN b ON a.id=b.ID
INNER JOIN c ON b.r_ID=c.id
WHERE c.test IS NOT NULL;
Can this query be optimized further?, I want inner join between three tables to happen only if it meets the where clause.
Where clause works as filter on the data what appears after all JOINs,
whereas if you use same restriction to JOIN clause itself then it will be optimized in sense of avoiding filter after join. That is, join on filtered data instead.
SELECT c.testID,
FROM a
INNER JOIN b ON a.id = b.ID
INNER JOIN c ON b.r_ID = c.id AND c.test IS NOT NULL;
Moreover, you must create an index for the column test in table c to speed up the query.
Also, learn EXPLAIN command to the queries for best results.
Try the following:
SELECT
c.testID
FROM c
INNER JOIN b ON c.test IS NOT NULL AND b.r_ID=c.testID
INNER JOIN a ON a.id=b.r_ID;
I changed the order of the joins and conditions so that the first statement to be evaluated is c.test IS NOT NULL
Disclaimer: You should use the explain command in order to see the execution.
I'm pretty sure that even the minor change I just did might have no difference due to the MySql optimizer that work on all queries.
See the MySQL Documentation: Optimizing Queries with EXPLAIN
Three queries Compared
Have a look at the following fiddle:
https://www.db-fiddle.com/f/fXsT8oMzJ1H31FwMHrxR3u/0
I ran three different queries and in the end, MySQL optimized and ran them the same way.
Three Queries:
EXPLAIN SELECT
c.testID
FROM c
INNER JOIN b ON c.test IS NOT NULL AND b.r_ID=c.testID
INNER JOIN a ON a.id=b.r_ID;
EXPLAIN SELECT c.testID
FROM a
INNER JOIN b ON a.id = b.r_id
INNER JOIN c ON b.r_ID = c.testID AND c.test IS NOT NULL;
EXPLAIN SELECT
c.testID
FROM a
INNER JOIN b ON a.id=b.r_ID
INNER JOIN c ON b.r_ID=c.testID
WHERE c.test IS NOT NULL;
All tables should have a PRIMARY KEY. Assuming that id is the PRIMARY KEY for the tables that it is in, then you need these secondary keys for maximal performance:
c: INDEX(test, test_id, id) -- `test` must be first
b: INDEX(r_ID)
Both of those are useful and "covering".
Another thing to note: b and a is virtually unused in the query, so you may as well write the query this way:
SELECT c.testID,
FROM c
WHERE c.test IS NOT NULL;
At that point, all you need is INDEX(test, testID).
I suspect you "simplified" your query by leaving out some uses of a and b. Well, I simplified it from there, just as the Optimizer should have done. (However, elimination of tables is an optimization that it does not do; it figures that is something the user would have done.)
On the other hand, b and a are not totally useless. The JOIN verify that there are corresponding rows, possibly many such rows, in those tables. Again, I think you had some other purpose.

MySQL, functional difference between ON and WHERE in specific statement

I am studying select queries for MySQL join functions.
I am slightly confused on the below query. I understand the below statement to join attributes from multiple tables with the ON clause, and then filter the results set with the WHERE clause.
Is this correct? What other functionality does this provide? Are there better alternatives?
The tables, attributes, and schema are not relevant to this question, specifically just the ON and WHERE interaction. Thanks in advance for any insight you can provide, appreciated.
SELECT DISTINCT Movies.title
FROM Rentals
INNER JOIN Customers
INNER JOIN Copies
INNER JOIN Movies ON Rentals.customerNum=Customers.customerNum
AND Rentals.barcode=Copies.barcode
AND Copies.movieNum=Movies.movieNum
WHERE Customers.givenName='Chad'
AND Customers.familyName='Black';
INNER JOIN (and the outer joins) are binary operators that should be followed by an ON clause. Your particular syntax works in MySQL, but will not work in any other database (because it is missing two ON clauses).
I would recommend writing the query as:
SELECT DISTINCT m.title
FROM Movies m JOIN
Copies co
ON co.movieNum = m.movieNum JOIN
Rentals r
ON r.barcode = co.barcode JOIN
Customers c
ON c.customerNum = r.customerNum
WHERE c.givenName = 'Chad' AND
c.familyName = 'Black';
You should always put the JOIN conditions in the ON clause, with one ON per JOIN. This also introduces table aliases, which make the query easier to write and to read.
The WHERE clause has additional filtering conditions. These could also be in ON clauses, but I think the query reads better with them in the WHERE clause. You can glance at the query and see: "We are getting something from a bunch of tables for Chad Black".
Ordinary inner JOIN operations only generate result rows for table rows matching their ON condition. They suppress any rows that don't match. That means you can move the contents of ON clauses to your WHERE clause and get the same result set. Still, don't do that; JOINs are easier to understand when they have ON clauses.
If you use LEFT JOIN, a kind of outer join, you get rows from the first table you mention that don't match any rows in the second table according to the ON clause.
SELECT a.name, b.name
FROM a
LEFT JOIN b ON a.a_id = b.a_id
gives you are result set containing all rows of a with NULL values in b.name indicating that the ON condition did not match.

Is this a left join or right join, inner or outer?

Sitting here at work writing SQL for the first time in a while. I know all the vendiagrams of all the joins, but I was wondering what type of join I just wrote. Is this even a typical join since I'm not using a 'JOIN' keyword?
select * from table1 t, table2 a where a.column1 = -10300 AND t.column1 = a.column2;
Thanks,
k9
This performs the same as an inner join because of the "t.column1 = a.column2". If that was left out, it would be a cross apply.
Here's some more info: http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/
This is what would be considered an older style inner join.
In MySQL writing JOIN unqualified implies INNER JOIN. In other words the INNER in INNER JOIN is optional.
Official documentation about JOIN here
This is an old oracle-style inner join. It will give you all rows from both tables where the column matches, which is pretty much what an inner join is.
FYI, this used to be standard practice in Oracle SQL before everything went to ANSI-standard SQL.
Its Inner join as you used Inner Join logic in Where.

Using SELECT within SELECT in mysql query

It is common to use SELECT within SELECT to reduce the number of queries; but as I examined this leads to slow query (which is obviously harmful for mysql performance). I had a simple query as
SELECT something
FROM posts
WHERE id IN (
SELECT tag_map.id
FROM tag_map
INNER JOIN tags
ON tags.tag_id=tag_map.tag_id
WHERE tag IN ('tag1', 'tag2', 'tag3', 'tag4', 'tag5', 'tag6')
)
This leads to slow queries of "query time 3-4s; lock time about 0.000090s; with about 200 rows examined".
If I split the SELECT queries, each of them will be quite fast; but this will increase the number of queries which is not good at high concurrency.
Is it the usual situation, or something is wrong with my coding?
In MySQL, doing a subquery like this is a "correlated query". This means that the results of the outer SELECT depend on the result of the inner SELECT. The outcome is that your inner query is executed once per row, which is very slow.
You should refactor this query; whether you join twice or use two queries is mostly irrelevant. Joining twice would give you:
SELECT something
FROM posts
INNER JOIN tag_map ON tag_map.id = posts.id
INNER JOIN tags ON tags.tag_id = tag_map.tag_id
WHERE tags.tag IN ('tag1', ...)
For more information, see the MySQL manual on converting subqueries to JOINs.
Tip: EXPLAIN SELECT will show you how the optimizer plans on handling your query. If you see DEPENDENT SUBQUERY you should refactor, these are mega-slow.
You could improve it by using the following:
SELECT something
FROM posts
INNER JOIN tag_map ON tag_map.id = posts.id
INNER JOIN tags
ON tags.tag_id=tag_map.tag_id
WHERE <tablename>.tag IN ('tag1', 'tag2', 'tag3', 'tag4', 'tag5', 'tag6')
Just make sure you only select what you need and do not use *; also state in which table you have the tag column so you can substitute <tablename>
Join does filtering of results. First join will keep results having 1st ON condition satisfied and then 2nd condition gives final result on 2nd ON condition.
SELECT something
FROM posts
INNER JOIN tag_map ON tag_map.id = posts.id
INNER JOIN tags ON tags.tag_id = tag_map.tag_id AND tags.tag IN ('tag1', 'tag2', 'tag3', 'tag4', 'tag5', 'tag6');
You can see these discussions on stack overflow :
question1
question2
Join helps to decrease time complexity and increases stability of server.
Information for converting sub queries to joins:
link1
link2
link3

What would cause a query to run slowly when used a subquery, but not when run separately?

I have something similar to the following:
SELECT c.id
FROM contact AS c
WHERE c.id IN (SELECT s.contact_id
FROM sub_table AS s
LEFT JOIN contact_sub AS c2 ON (s.id = c2.sub_field)
WHERE c2.phone LIKE '535%')
ORDER BY c.name
The problem is that the query takes a very very very long time (>2minutes), but if I take the subquery, run it separately, implode the ids and insert them into the main query, it runs in well less than 1 second, including the data retrival and implosion.
I have checked the explains on both methods and keys are being used appropriately and the same ways. The subquery doesn't return more than 200 IDs.
What could be causing the subquery method to take so much longer?
BTW, I know the query above can be written with joins, but the query I have can't be--this is just a simplified version.
Using MySQL 5.0.22.
Sounds suspiciously like MySQL bug #32665: Query with dependent subquery is too slow.
What happens if you try it like this?
SELECT c.id
FROM contact AS c
INNER JOIN (SELECT s.contact_id
FROM sub_table AS s
LEFT JOIN contact_sub AS c2 ON (s.id = c2.sub_field)
WHERE c2.phone LIKE '535%') subq ON subq.contact_id=c.id
ORDER BY c.name
Assuming that the result of s.contact_id is unique. You can add distinct to the subquery if it is not.
I always use uncorrelated subqueries this way rather than using the IN operator in the where clause.
Have you checked the Execution Plan for the query? This will usually show you the problem.
Can't you do another join instead of a subquery?
SELECT c.id
FROM contact AS c
JOIN sub_table AS s on c.id = s.contact_id
LEFT JOIN contact_sub AS cs ON (s.id = cs.sub_field)
WHERE cs.phone LIKE '535%'
ORDER BY c.name
Since the subquery is referring to a field sub_field in the outer select, it has to be run once for each row in the outer table - the results for the inner query will change with each row in the outer table.
It's a correlated subquery. It runs once for each row in the outer select. (I think. You have two tables with the same correlation name, I'm assuming that's a typo. That you say it can't be rewritten as a join means it's correlated. )
Ok, I'm going to give you something to try. You say that the subquery is not correlated, and that you still can't join on it. And that it you take the output of the subquery, and lexically substitute that for the subquery, the main query runs much faster.
So try this: make the subquery into a view: create view foo followed by the text of the subquery. Then rewrite the main query to get rid of the "IN" clause and instead join to the view.
How's the timing on that?