Want to understand the performance of joins in MySQL

There are different types of join in MySQL, such as:
1. JOIN
2. LEFT JOIN
3. RIGHT JOIN
4. INNER JOIN
5. LEFT OUTER JOIN
6. RIGHT OUTER JOIN
I want to know which one performs better in a query, and how to decide which one is suitable for a given query. JOIN and INNER JOIN fetch the same data, so which one is suitable in that case?

I will repeat what @Dai said in the comments: joins should be chosen based on which operation you need, not on performance. The answers to this question cover what the different types of joins are. In particular I like this visual explanation.
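To make the "choose by operation" point concrete, here is a minimal sketch. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, and the customers and orders tables are hypothetical. JOIN and INNER JOIN are synonyms, as are LEFT JOIN and LEFT OUTER JOIN; what differs between the inner and the left variants is which rows come back.

```python
import sqlite3

# Hypothetical tables, for illustration only (SQLite stands in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0);
""")


def run(join_keyword):
    """Run the same query with a different join keyword."""
    sql = f"""
        SELECT c.name, o.total
        FROM customers AS c
        {join_keyword} orders AS o ON o.customer_id = c.id
        ORDER BY c.id
    """
    return conn.execute(sql).fetchall()


plain_join = run("JOIN")
inner_join = run("INNER JOIN")
left_join = run("LEFT JOIN")
left_outer_join = run("LEFT OUTER JOIN")

print(plain_join)  # only customers that have an order
print(left_join)   # every customer; total is None when there is no order
```

Bob has no order, so he only appears in the LEFT JOIN result. That difference in meaning, not speed, is what should drive the choice.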
Analyzing why a query is slow is usually done with EXPLAIN. It will tell you the plan for the query, and from it you can determine things like whether it's doing a full table scan and which columns might need to be indexed. Here is a good writeup of how to use EXPLAIN.
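As a rough illustration of reading a plan, the sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. This is an assumption made to keep the example runnable; in MySQL you would prefix the query with EXPLAIN and the output columns look different. The orders table and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT * FROM orders WHERE customer_id = 42"

plan_before = plan(query)  # no usable index yet: expect a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)   # the planner can now search via the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, but the shift from a scan to an index search is the kind of change to look for.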

Related

MySQL - SELECT, JOIN

A few months ago I was programming a simple application in PHP with another guy. We needed to perform a SELECT from multiple tables based on a userid and another value that had to be taken from the row selected by userid.
My first idea was to create multiple SELECTs and parse all the output in the PHP script (with all that mysql_num_rows() and similar functions for checking), but then the guy told me he'd do that. "Okay, no problem!" I thought, just that much less for me to write. Well, what a surprise when I found out he did it with just one SQL statement:
SELECT
d.uid AS uid, p.pasmo_cas AS pasmo, d.pasmo AS id_pasmo ...
FROM
table_values AS d, sectors AS p
WHERE
d.userid='$userid' and p.pasmo_id=d.pasmo
ORDER BY
datum DESC, p.pasmo_id DESC
(shortened piece of the statement (...))
Mostly I need to know the differences between this method (is it the right way to do this?) and JOIN - when should I use which one?
Also any references to explanations and examples of these two would come in pretty handy (not from the MySQL ref though - I'm really a novice in this kind of stuff and it's written pretty roughly there.)
The comma (,) notation was superseded by the explicit JOIN syntax in the ANSI-92 standard, and so is in one sense now 20 years out of date.
Also, when doing OUTER JOINs and other more complex queries, the JOIN notation is much more explicit, readable, and (in my opinion) debuggable.
As a general principle, avoid the comma notation and use JOIN.
In terms of precedence, a JOIN's ON clause happens before the WHERE clause. This allows things like a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL to check for cases where there is NOT a matching row in b.
Using the comma notation is similar to processing the WHERE and ON conditions at the same time.
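The LEFT JOIN ... WHERE b.id IS NULL pattern mentioned above can be sketched as follows. This is a minimal example using SQLite through Python's sqlite3 module as a stand-in for MySQL; the tables a and b mirror the names in the answer and their contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2);
""")

# ON is applied while joining, so rows of a without a match keep NULLs for b.
# WHERE runs afterwards and keeps only those unmatched rows.
unmatched = conn.execute("""
    SELECT a.id
    FROM a
    LEFT JOIN b ON a.id = b.id
    WHERE b.id IS NULL
    ORDER BY a.id
""").fetchall()

print(unmatched)  # ids present in a but missing from b
```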
This definitely looks like the ideal scenario for a join, so you can avoid returning more data than you actually need. This: http://www.w3schools.com/sql/sql_join.asp or this: http://en.wikipedia.org/wiki/Join_(SQL) should help you get started with joins. I'm also happy to help you write the statement if you can give me a brief outline of the columns / data in each table (primarily I need two matching columns to join on).
The use of the WHERE clause is a valid approach, but as @Dems noted, it has been superseded by the JOIN syntax.
However, I would argue that in some cases, use of the WHERE clauses to achieve joins can be more readable and understandable than using JOINs.
You should make yourself familiar with both methods of joining tables.
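For comparison, here is a sketch of the statement from the question written both ways. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL. The column types, the sample rows, and the assumption that datum lives in table_values are all made up for illustration, and a bound parameter replaces the '$userid' interpolation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema: only the column names come from the question.
    CREATE TABLE sectors (pasmo_id INTEGER PRIMARY KEY, pasmo_cas TEXT);
    CREATE TABLE table_values (
        uid INTEGER PRIMARY KEY, userid INTEGER, pasmo INTEGER, datum TEXT
    );
    INSERT INTO sectors VALUES (1, '08:00'), (2, '09:00');
    INSERT INTO table_values VALUES
        (100, 7, 1, '2012-01-01'),
        (101, 7, 2, '2012-01-02'),
        (102, 8, 1, '2012-01-03');
""")

# Comma notation: the join condition sits in WHERE next to the filter.
comma_sql = """
    SELECT d.uid AS uid, p.pasmo_cas AS pasmo, d.pasmo AS id_pasmo
    FROM table_values AS d, sectors AS p
    WHERE d.userid = ? AND p.pasmo_id = d.pasmo
    ORDER BY d.datum DESC, p.pasmo_id DESC
"""

# JOIN notation: the join condition moves to ON, the filter stays in WHERE.
join_sql = """
    SELECT d.uid AS uid, p.pasmo_cas AS pasmo, d.pasmo AS id_pasmo
    FROM table_values AS d
    INNER JOIN sectors AS p ON p.pasmo_id = d.pasmo
    WHERE d.userid = ?
    ORDER BY d.datum DESC, p.pasmo_id DESC
"""

comma_rows = conn.execute(comma_sql, (7,)).fetchall()
join_rows = conn.execute(join_sql, (7,)).fetchall()
print(comma_rows == join_rows)
```

Both forms return the same rows; the JOIN form simply separates "how the tables relate" from "which rows I want".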

How does JOIN work in MySQL?

Although the question title duplicates many discussions, I did not find an answer to this question:
Consider a simple join for normalized tables of tags as
SELECT tags.tag
FROM tags
INNER JOIN tag_map
ON tags.tag_id=tag_map.tag_id
WHERE article_id=xx
Does JOIN work on the entire tags and tag_map tables and then filter the resulting (joined) table with the WHERE clause to find rows for the article id,
OR will JOIN only join the rows of tag_map in which article_id=xx?
The latter method should be considerably faster!
It will do the former; to my knowledge, WHERE clauses are explicitly applied to the resulting joined table. (Disclaimer: MySQL may optimize this in some cases; I don't know.)
To force the latter behaviour and execute the WHERE first, you can add an extra filter to your JOIN ON statement:
SELECT tags.tag
FROM tags
INNER JOIN tag_map
ON tags.article_id=xx
AND tags.tag_id=tag_map.tag_id
WHERE article_id=xx
The joins work on ONLY those records qualified by the WHERE clause of the first table returning records. That said, you are doing a join to tag_map, but your WHERE clause does not specify which alias article_id is associated with. It's typically better to always qualify your fields with the table name or alias they are coming from.
So, if article_id is coming from TAGS, then it will first look at that list as the primary set of records, and optimized with index if one so exists and return a small set. From that, the join is applied to the tag_map and will grab all records that match the join "ON" condition.
Just to clarify something. If the JOIN was applied FIRST, before the WHERE clause optimization, queries would take forever. The join basically PREPARES the relationship before the record selection actually occurs. Hence, the execution plan that shows the indexes that would be used.
It depends on the engine. Earlier versions of many database engines would generate the join results first and then filter them. Newer versions generate an execution plan that achieves the fastest results. You would have to test with your database engine and review the execution plans for your version/database to find out what is best.
Assuming it is simple or inner join:
The answer is: in the relational model, the first answer is correct. It creates a table that contains every row from the first table crossed with every row from the second, so if you have N rows in the first and M in the second, it will create a table with N×M rows and then eliminate those where the conditions do not match.
Now, that is the mathematical model. In an implementation, depending on the engine, it will use some smarter way, typically choosing the table that seems faster and joining from there, hopefully using an indexed join field. But this differs between engines: there is a lot of documentation on that (google it), and some people, the poster of this answer included, are paid to optimize join queries...
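The N×M model described above can be checked directly. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for MySQL, with made-up rows and the assumption that article_id belongs to tag_map. It only shows that the logical result is the same; it says nothing about how an engine actually executes the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE tag_map (article_id INTEGER, tag_id INTEGER);
    INSERT INTO tags VALUES (1, 'mysql'), (2, 'join'), (3, 'index');
    INSERT INTO tag_map VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")

# The mathematical model: every row of tags paired with every row of tag_map.
cross = conn.execute(
    "SELECT tags.tag_id, tag_map.tag_id FROM tags CROSS JOIN tag_map"
).fetchall()

# Filtering that cross product ...
filtered = conn.execute("""
    SELECT tags.tag
    FROM tags CROSS JOIN tag_map
    WHERE tags.tag_id = tag_map.tag_id AND tag_map.article_id = 1
    ORDER BY tags.tag
""").fetchall()

# ... gives the same rows as the INNER JOIN from the question.
joined = conn.execute("""
    SELECT tags.tag
    FROM tags
    INNER JOIN tag_map ON tags.tag_id = tag_map.tag_id
    WHERE tag_map.article_id = 1
    ORDER BY tags.tag
""").fetchall()

print(len(cross), filtered, joined)
```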
In case of MYSQL (just noticed the tag) you can use following syntax:
EXPLAIN [EXTENDED] SELECT select_options
as explained here, and MySQL will tell you how it would execute such a query. It is faster than reading the documentation.
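Here is a hedged sketch of asking the engine for its plan on the tags query. It uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module rather than MySQL's EXPLAIN, so the output format differs from what MySQL prints. The sample rows, the index name idx_tag_map_article, and the assumption that article_id belongs to tag_map are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE tag_map (article_id INTEGER, tag_id INTEGER);
    CREATE INDEX idx_tag_map_article ON tag_map (article_id);
    INSERT INTO tags VALUES (1, 'mysql'), (2, 'join'), (3, 'index');
    INSERT INTO tag_map VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")

filter_in_where = """
    SELECT tags.tag FROM tags
    INNER JOIN tag_map ON tags.tag_id = tag_map.tag_id
    WHERE tag_map.article_id = ?
    ORDER BY tags.tag
"""
filter_in_on = """
    SELECT tags.tag FROM tags
    INNER JOIN tag_map ON tags.tag_id = tag_map.tag_id
                      AND tag_map.article_id = ?
    ORDER BY tags.tag
"""


def plan(sql, params):
    """Return the detail column of SQLite's plan for a parameterized query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


where_rows = conn.execute(filter_in_where, (1,)).fetchall()
on_rows = conn.execute(filter_in_on, (1,)).fetchall()
where_plan = plan(filter_in_where, (1,))
on_plan = plan(filter_in_on, (1,))

print(where_plan)
print(on_plan)
```

In this SQLite run the planner narrows tag_map through the article_id index instead of joining whole tables first, wherever the filter is written. Whether your MySQL version does the same is exactly what its own EXPLAIN output will tell you.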
You can always check Execution plan to see how your query gets executed step by step. In MySQL I don't know whether it can be presented graphically using any third party tools (as you can on MS SQL out of the box with Management Studio) but you can still check it using explain language constructs. Check documentation.
Not knowing your table schema
If article_id is of table tags then tag_map table isn't scanned at all unless join column in FK table is nullable.
If article_id is indexed (ie. primary key) then index is being scanned...
etc...
What I'd like to say is that we'd need your table schema definition to tell you some details. We can't know how your schema works.

Does a nested select clause decrease database performance?

I usually write a select clause inside the select clause to avoid joins in the from clause. But I am not sure whether this is good coding practice or whether it will degrade database performance. Below is a query that involves multiple tables, written with nested select clauses and without any join statement. Please let me know if I am making a mistake or if it is OK. At the moment, I am getting accurate results.
SELECT * ,
(select POrderNo from PurchaseOrderMST POM
where POM.POrderID=CET.POrderID)as POrderNo,
(select SiteName from SiteTRS ST where ST.SiteID=CET.SiteID)as SiteName,
(select ParticularName from ParticularMST PM where
PM.ParticularID=CET.ParticularID)as ParticulerName
FROM ClaimExpenseTRS CET
WHERE ClaimID=@ClaimID
I'd use joins for this because it is best practice to do so and will be better for the query optimizer.
But for the learning just try to execute the script with join and without and see what happens on the query plan and the execution time. Usually this answers your questions right away.
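A small experiment along those lines might look like this. It is a sketch on SQLite through Python's sqlite3 module, not on the asker's actual database: the schema is a cut-down, assumed version of the tables in the question and the rows are invented. It compares the results of the two forms; for plans and timings you would use your own engine's tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed, simplified schema based on the names in the question.
    CREATE TABLE SiteTRS (SiteID INTEGER PRIMARY KEY, SiteName TEXT);
    CREATE TABLE ParticularMST (ParticularID INTEGER PRIMARY KEY, ParticularName TEXT);
    CREATE TABLE ClaimExpenseTRS (
        ExpenseID INTEGER PRIMARY KEY, ClaimID INTEGER,
        SiteID INTEGER, ParticularID INTEGER
    );
    INSERT INTO SiteTRS VALUES (1, 'North');
    INSERT INTO ParticularMST VALUES (1, 'Travel'), (2, 'Meals');
    INSERT INTO ClaimExpenseTRS VALUES (1, 5, 1, 1), (2, 5, 9, 2), (3, 6, 1, 2);
""")

nested_sql = """
    SELECT CET.ExpenseID,
           (SELECT SiteName FROM SiteTRS ST
             WHERE ST.SiteID = CET.SiteID) AS SiteName,
           (SELECT ParticularName FROM ParticularMST PM
             WHERE PM.ParticularID = CET.ParticularID) AS ParticularName
    FROM ClaimExpenseTRS CET
    WHERE CET.ClaimID = ?
    ORDER BY CET.ExpenseID
"""

# A scalar subquery yields NULL when nothing matches, so LEFT JOIN is the
# equivalent join type here (an INNER JOIN would drop expense 2).
join_sql = """
    SELECT CET.ExpenseID, ST.SiteName, PM.ParticularName
    FROM ClaimExpenseTRS CET
    LEFT JOIN SiteTRS ST ON ST.SiteID = CET.SiteID
    LEFT JOIN ParticularMST PM ON PM.ParticularID = CET.ParticularID
    WHERE CET.ClaimID = ?
    ORDER BY CET.ExpenseID
"""

nested_rows = conn.execute(nested_sql, (5,)).fetchall()
join_rows = conn.execute(join_sql, (5,)).fetchall()
print(nested_rows)
print(join_rows)
```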
Your solution is just fine.
As long as you are only using 1 column for each "joined" table, and has no multiple matching rows, it is fine. In some cases, even better than joining.
(The db engine can change the direction of a join at any time unless you use tricks to force a given direction, which can cause performance surprises. This is called query optimization, but as long as you really know your database, you should be the one to decide how the query should run.)
I think you should JOIN indeed.
Right now you're creating your own JOIN with where and select statements.

What's better: joins or multiple sub-select statements as part of one query

Performance wise, what is better?
If I have 3 or 4 join statements in my query or use embedded select statements to pull the same information from my database as part of one query?
I would say joins are better because:
They are easier to read.
You have more control over whether you want to do an inner, left/right outer join or full outer join
join statements cannot be so easily abused to create query abominations
with joins it is easier for the query optimizer to create a fast query (if the inner select is simple, it might work out the same, but with more complicated stuff joins will work better).
embedded selects can only simulate a left/right outer join.
Sometimes you cannot do stuff using joins, in that case (and only then) you'll have to fall back on an inner select.
It rather depends on your database: sizes of tables particularly, but also the memory parameters and sometimes even how the tables are indexed.
On older versions of MySQL, there was a real possibility of a query with a sub-select being considerably slower than a query structured with a join that returned the same results. (In the MySQL 4.1 days, I have seen the difference be greater than an order of magnitude.) As a result, I prefer to build queries with joins.
That said, there are some types of queries that are extremely difficult to build with a join and a sub-select is the only way to really do it.
Assuming the database engine does absolutely no optimization, I would say it depends on how consistent you need your data to be. If you're doing multiple SELECT statements on a busy database, where the data you are looking at may change rapidly, you may run into issues where your data does not match up, between queries.
Assuming your data contains no inter-dependencies, then multiple queries will work fine. However, if your data requires consistency, use a single query.
This viewpoint boils down to keeping your data transactionally safe. Consider the situation where you have to pull a total of all accounts receivable, which is kept in a separate table from the monetary transaction amounts. If someone were to add another transaction in between your two queries, the accounts receivable total would not match the sum of the transaction amounts.
Most databases will optimize both queries below into the same plan, so whether you do:
select A.a1, B.b1 from A left outer join B on A.id = B.a_id
or
select A.a1, (select B.b1 from B where B.a_id = A.id) as b1 from A
It ends up being the same. However, in most cases for non-trivial queries you'd better stick with joins whenever you can, especially since some types of joins (such as an inner join) are not possible to achieve using sub-selects.

Inner Join vs. Natural Join, speed-wise?

I'm having a discussion with a colleague of mine, and we're stuck on the point of whether an inner join is faster than a natural join simply due to the fact that the ON clause is specified in an inner join, and thus the SQL engine need not compare table structures to determine common columns.
Please share your thoughts/answers.
It is going to depend on the DBMS, and to some extent on the tables joined, and the difference is generally not going to be measurable. With an INNER JOIN, there will be two copies of the common columns; with a NATURAL JOIN, there will be just one copy of the common columns.
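The "two copies versus one copy" point is easy to see with SELECT *. The following sketch uses SQLite through Python's sqlite3 module as a stand-in (MySQL also supports NATURAL JOIN), and the people and addresses tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE addresses (id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO people VALUES (1, 'Ann');
    INSERT INTO addresses VALUES (1, 'Oslo');
""")

inner = conn.execute(
    "SELECT * FROM people INNER JOIN addresses ON people.id = addresses.id"
)
inner_columns = [col[0] for col in inner.description]
inner_rows = inner.fetchall()

natural = conn.execute("SELECT * FROM people NATURAL JOIN addresses")
natural_columns = [col[0] for col in natural.description]
natural_rows = natural.fetchall()

print(inner_columns, inner_rows)      # the shared id column appears twice
print(natural_columns, natural_rows)  # the shared id column appears once
```

Listing the wanted columns explicitly instead of using SELECT * removes this difference for the INNER JOIN as well.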
If the intermediate results have to be written to disk or sorted, there will be more data to be managed, and there is slightly more data to be returned to the client over the client-server connection, so an INNER JOIN might take slightly longer than a NATURAL JOIN. If the common columns are big enough (long character fields, for example), this size difference could be significant.
The difference in time to prepare the two types of join is completely negligible. Both have to read the information for all the columns in both tables involved in the join, and the processing of that data is substantially similar.
During execution, the copying of the data to form the intermediate results might be slightly more complex for a NATURAL JOIN, because you don't simply take a copy of each row, but you don't necessarily take a full copy of each row for an INNER JOIN either, so the difference is again negligible. And, as before, there is slightly less data to be copied in aggregate for the NATURAL JOIN, so there is still a small advantage to it.
So, overall, a NATURAL JOIN might be slightly quicker than an INNER JOIN, but the difference is still likely to be negligible.
I can't imagine it being so expensive to find common columns that you could support that hypothesis.
In the exceptional case a natural join can be as speedy as an inner join.