MySQL using select with 2 queries, subquery or join? - mysql

Related to my last question (MySQLi performance, multiple (separate) queries vs subqueries) I came across another question.
Sometimes I'm using a subquery to select the value from another table (eg. the username connected to an ID), but I'm not sure about the select-in-select, because it doesn't seem to be very clean and I'm not sure about the performance.
The subquery could look like this:
SELECT
(SELECT `user_name` FROM `users`
WHERE `user_id` = table2.user_id) AS `user_name`
, `value1`
, `value2`
FROM
`table2`
....
Would it be "better" to use a separate query for the result from table1 and another for table2 (doubles the connections, but no need to cross tables), or should I even use a JOIN to get the results in a single query?
I don't have much experience with JOINS and subqueries yet, so I'm not sure if a JOIN would be "too much" in this case, because I really just need one name connected to an ID (or maybe count the number of rows from a table), or if it doesn't matter, because the select-in-select is treated like some kind of JOIN, too..
Solution with JOIN could look like this:
SELECT
users.user_name , table2.value1, table2.value2
FROM
`table2`
INNER JOIN
`users`
ON
users.user_id = table2.user_id
....
And if I should prefer JOIN, which one would be best in this case: left join, inner join or something else?

The fact that you are asking whether to use an inner join or a left join suggests that you haven't worked with them much yet.
The two serve entirely different purposes. An inner join returns columns from two or more tables only for rows where the join columns have matching values. A left join is used when you want every row from the table on the left of the join clause to be returned, even when there is no matching row in the other table. Which one you need depends on your application. If one table holds the names of players and another holds details of the penalties they have paid, you will almost certainly want a left join, so that players without a penalty (and thus without a record in the second table) are still included.
As for subquery versus join: joins can be much faster when used properly. By "properly" I mean that there are indexes on the join columns, the tables are listed in increasing order of row count (generally; there may be exceptions), the join columns have similar data types, and so on. If all these conditions hold, a join is the better option.
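To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs without a server). The sample rows are invented; the table and column names follow the question. It compares the select-in-select, a LEFT JOIN and an INNER JOIN when one row of table2 points at a user that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, user_id INTEGER, value1 TEXT);
    INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
    -- user_id 99 has no matching row in users
    INSERT INTO table2 VALUES (1, 1, 'a'), (2, 2, 'b'), (3, 99, 'c');
""")

# Correlated scalar subquery, as in the question.
subquery_rows = conn.execute("""
    SELECT (SELECT user_name FROM users
            WHERE users.user_id = table2.user_id) AS user_name,
           value1
    FROM table2
    ORDER BY table2.id
""").fetchall()

# LEFT JOIN keeps every row of table2, filling in NULL where no user matches.
left_rows = conn.execute("""
    SELECT users.user_name, table2.value1
    FROM table2
    LEFT JOIN users ON users.user_id = table2.user_id
    ORDER BY table2.id
""").fetchall()

# INNER JOIN drops the row whose user_id has no match.
inner_rows = conn.execute("""
    SELECT users.user_name, table2.value1
    FROM table2
    INNER JOIN users ON users.user_id = table2.user_id
    ORDER BY table2.id
""").fetchall()

print(subquery_rows)
print(left_rows)
print(inner_rows)
```

In this sketch the subquery behaves like the LEFT JOIN, not like the INNER JOIN, so the choice matters only when some IDs may have no match. Which form is faster on a real MySQL server is a separate question that EXPLAIN can help answer.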

Related

Why does LIKE '%%' depend on unrelated joins?

I have a huge query which runs quite well by itself. It has a lot of join statements. So, its structure looks like this:
SELECT ... FROM mytable t
LEFT OUTER JOIN mytable2 t2 ON t2.attr = t.attr1
LEFT OUTER JOIN mytable3 t3 ON t3.attr = t.attr3
...
LEFT OUTER JOIN mytableN tN ON tN.attr = t.attrN
It runs in about a millisecond. But if I add a LIKE condition:
SELECT ... FROM mytable t
LEFT OUTER JOIN mytable2 t2 ON t2.attr = t.attr1
LEFT OUTER JOIN mytable3 t3 ON t3.attr = t.attr3
...
LEFT OUTER JOIN mytableN tN ON tN.attr = t.attrN
WHERE tK.attrP LIKE '%Something%'
then it almost never finishes. I could not wait for it to end and had to stop it manually. At the same time, if I rewrite the query like this:
SELECT ... FROM mytable t
LEFT OUTER JOIN mytablek tK ON tK.attr = t.attr1
WHERE tK.attrP LIKE '%Something%'
then it runs in a flash again. Why is that? I see no logic in all those extra joins, which have nothing to do with the field attrP, affecting the speed of the query. I think I know how to optimize this query, but still, the more I work with MySQL, the less I like it. Hundreds of times I have struggled with something that had no reasonable explanation.
EDIT
Well, I thought that I knew how to optimize it - to use inner join in this way:
SELECT ... FROM mytable t
... bunch of joins
INNER JOIN mytablek tK ON tK.attr = t.attr1 AND tK.attrP LIKE '%Something%'
... bunch of joins
But this has no effect.
EDIT
Well, I found a solution: use MATCH ... AGAINST. Unfortunately, this solution is not universal. In fact, MATCH ... AGAINST throws an error when you try to search in a field returned by a subquery. Poor MySQL.
Adding WHERE tK.attrP LIKE '%Something%' probably removes records from the result set. We don't know how many, though. Maybe 1%, maybe 99%.
We don't even know if we only joined mytable with mytableK and used that clause, what percentage of the records would be affected. Would it be worth joining these tables first and with the supposedly few records left, do the other joins using just loops to get those other tables' records? Or should we better join everything first using great join algorithms on the tables and only at last filter with LIKE?
We don't know and the dbms doesn't know either.
But you notice that the dbms is fast on the pure joins, but slow when it applies the LIKE clause. So hint the dbms to do one thing first and the other later:
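SELECT *
FROM
(
-- the inner SELECT list must expose tK.attrP under the alias tK_attrP
SELECT ... FROM mytable t
LEFT OUTER JOIN mytable2 t2 ON t2.attr = t.attr1
LEFT OUTER JOIN mytable3 t3 ON t3.attr = t.attr3
...
LEFT OUTER JOIN mytableN tN ON tN.attr = t.attrN
) AS joined -- MySQL requires an alias for every derived table
WHERE tK_attrP LIKE '%Something%';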
The SQL engine has many options when running the query. One of the strengths of the language is the optimizer, which chooses the "best" way to run a given query. Of course, what the engine thinks is best is not necessarily the best.
The second point is that your condition is turning the left joins to inner joins. So, you might as well write the query that way (for clarity).
With that background, there are two possible answers to your question. The first is that when you run the other queries, you are noting when results first appear. This is the "time-to-first-row" measurement. However, the rows that match your more complicated query are at the end of the input. And MySQL needs to process all the non-matching rows to find the matching ones. This would be particularly true if some of the intermediate results create a cartesian product for a given row in the first table.
Another possibility is that the execution plan changes. Because the left joins are really inner joins, MySQL has a lot of flexibility in rewriting them.
My first recommendation is to put the join to table mytablek first, rather than last. Perhaps that will help MySQL find the best optimization.
The second would be to use a subquery:
(select tk.*
from mytablek tk
where tk.attrP LIKE '%Something%'
) tk
This may force the engine to whittle down the rows quickly and point the optimizer in a better direction.
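As a rough illustration of the "filter first, then join" idea, the sketch below uses Python's sqlite3 module in place of MySQL (an assumption made so it can run standalone) with invented sample rows. It only shows that the pre-filtered derived table returns the same rows as filtering after the join. Whether it is actually faster on MySQL depends on the optimizer and should be checked with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable  (id INTEGER PRIMARY KEY, attr1 INTEGER);
    CREATE TABLE mytablek (attr INTEGER, attrP TEXT);
    INSERT INTO mytable  VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO mytablek VALUES (10, 'xx Something xx'),
                                (20, 'other'),
                                (30, 'Something else');
""")

# Filter mytablek inside a derived table, then join the survivors.
filtered_first = conn.execute("""
    SELECT t.id, tk.attrP
    FROM (SELECT * FROM mytablek WHERE attrP LIKE '%Something%') AS tk
    JOIN mytable AS t ON t.attr1 = tk.attr
    ORDER BY t.id
""").fetchall()

# Join first and filter at the end, as in the slow query from the question.
filtered_last = conn.execute("""
    SELECT t.id, tk.attrP
    FROM mytable AS t
    LEFT OUTER JOIN mytablek AS tk ON tk.attr = t.attr1
    WHERE tk.attrP LIKE '%Something%'
    ORDER BY t.id
""").fetchall()

print(filtered_first)
print(filtered_last)
```

Both queries return the same two rows here, which also illustrates the earlier point that the WHERE condition on tk effectively turns the LEFT OUTER JOIN into an inner join.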

What's the difference between comma separated joins and join on syntax in MySQL? [duplicate]

I have a table Person with a column id that references a column id in table Worker.
What is the difference between these two queries? They yield the same results.
SELECT *
FROM Person
JOIN Worker
ON Person.id = Worker.id;
and
SELECT *
FROM Person,
Worker
WHERE Person.id = Worker.id;
There is no difference at all.
The first form makes the query more readable and makes it clear which join corresponds to which condition.
The queries are logically equivalent. The comma operator is equivalent to an [INNER] JOIN operator.
The comma is the older style join operator. The JOIN keyword was added later, and is favored because it also allows for OUTER join operations.
It also allows for the join predicates (conditions) to be separated from the WHERE clause into an ON clause. That improves (human) readability.
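A quick way to convince yourself of the equivalence is to run both forms against the same data. The sketch below does that with Python's sqlite3 module instead of MySQL (an assumption for the sake of a self-contained example). The sample rows and the name and job columns are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Worker (id INTEGER PRIMARY KEY, job TEXT);
    INSERT INTO Person VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO Worker VALUES (1, 'welder'), (3, 'driver');
""")

# Join predicate in an ON clause.
join_rows = conn.execute("""
    SELECT * FROM Person
    JOIN Worker ON Person.id = Worker.id
    ORDER BY Person.id
""").fetchall()

# Same predicate in the WHERE clause of a comma join.
comma_rows = conn.execute("""
    SELECT * FROM Person, Worker
    WHERE Person.id = Worker.id
    ORDER BY Person.id
""").fetchall()

print(join_rows)
print(comma_rows)
```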
FOLLOWUP
This answer says that the two queries in the question are equivalent. We shouldn't mix old-school comma syntax for join operation with the newer JOIN keyword syntax in the same query. If we do mix them, we need to be aware of a difference in the order of precedence.
excerpt from MySQL Reference Manual
https://dev.mysql.com/doc/refman/5.6/en/join.html
INNER JOIN and , (comma) are semantically equivalent in the absence of a join condition: both produce a Cartesian product between the specified tables (that is, each and every row in the first table is joined to each and every row in the second table).
However, the precedence of the comma operator is less than that of INNER JOIN, CROSS JOIN, LEFT JOIN, and so on. If you mix comma joins with the other join types when there is a join condition, an error of the form Unknown column 'col_name' in 'on clause' may occur. Information about dealing with this problem is given later in this section.
Besides better readability, there is one more case where explicitly joined tables are better than comma-separated tables.
Let's look at an example:
CREATE TABLE table1
(
ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY, -- AUTO_INCREMENT is MySQL's counterpart of SQL Server's IDENTITY(1, 1)
Name VARCHAR(50)
);
CREATE TABLE table2
(
ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
ID_Table1 INT NOT NULL
);
The following query returns every column of both tables (as a Cartesian product of their rows):
SELECT
*
FROM table1, table2
The following query returns only the columns of the first table, under the table alias 'table2':
SELECT
*
FROM table1 table2
If you accidentally leave out the comma in a comma-separated join, the second table name silently becomes an alias for the first table. This won't happen in every case, but there is a chance of something like it.
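The missing-comma pitfall can be reproduced with a small sketch. It uses Python's sqlite3 module rather than MySQL (an assumption so that it runs standalone) and invented rows. The column types are simplified, but the alias behaviour it shows is the one described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE table2 (ID INTEGER PRIMARY KEY, ID_Table1 INTEGER NOT NULL);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table2 VALUES (1, 1), (2, 1), (3, 2);
""")

# With the comma: Cartesian product of both tables.
cur = conn.execute("SELECT * FROM table1, table2")
with_comma_cols = [d[0] for d in cur.description]
with_comma_rows = cur.fetchall()

# Without the comma: "table2" is parsed as an alias for table1.
cur = conn.execute("SELECT * FROM table1 table2")
without_comma_cols = [d[0] for d in cur.description]
without_comma_rows = cur.fetchall()

print(len(with_comma_cols), len(with_comma_rows))
print(len(without_comma_cols), len(without_comma_rows))
```

The second query runs without any error and simply returns the two rows of table1, which is what makes the typo easy to miss.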
Using JOINS makes the code easier to read, since it's self-explanatory.
There is no difference in speed (I tested it), and the execution plan is the same.
If the query optimizer is doing its job right, there should be no difference between those queries. They are just two ways to specify the same desired result.
The SELECT * FROM table1, table2, etc. form is fine for a couple of tables, but it becomes much harder to read as the number of tables increases.
The JOIN syntax makes it explicit which criteria affect which tables (by giving each join its own condition). Also, the comma form is the older standard.
To the database, though, they end up being the same.

Querying to find if some columns are in array

I have a complex nested query inside a join. Is it possible to match several columns against that query instead of repeating the query in the join? I.e.:
select * from
A left join B on a.xid=b.xid and
(a.userid or b.userid) in (select userid from A where..)
^^^ don't want to duplicate the nested-query...
There is a nested query that should match several columns from the parent-query (as seen in the example above). The simple way is to duplicate the nested query several times. ie-
select * from A
left join B
on a.xid=b.xid
and a.userid in (select userid from ...)
and b.userid in (Select userid from ....)
BUT, since the subquery is a bit complicated, I don't want MySQL to run it twice, but rather only once, and then match it against several of the parent query's columns.
If your subquery is working properly and you have the query cache turned on (it was removed in MySQL 8.0), you probably won't have to worry about performance. If the issue is that the query is overly complex, you could use a stored procedure for this query: put the results of the subquery into a temporary table and join to it.
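A minimal sketch of the temp-table approach follows, using Python's sqlite3 module instead of MySQL (an assumption so the example is runnable as-is). The active column, the wanted_users table name and all sample rows are hypothetical stand-ins for the elided subquery. One caveat: the MySQL manual documents a restriction on referring to a TEMPORARY table more than once in the same query, so on MySQL a regular scratch table may be needed for this exact shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (xid INTEGER, userid INTEGER, active INTEGER);
    CREATE TABLE B (xid INTEGER, userid INTEGER);
    INSERT INTO A VALUES (1, 100, 1), (2, 200, 0), (3, 300, 1);
    INSERT INTO B VALUES (1, 300), (2, 100), (3, 999);

    -- Run the (supposedly expensive) subquery once and keep its result.
    -- "WHERE active = 1" is a placeholder for the real condition.
    CREATE TEMP TABLE wanted_users AS
        SELECT DISTINCT userid FROM A WHERE active = 1;
""")

# Match both a.userid and b.userid against the stored result.
rows = conn.execute("""
    SELECT a.xid, a.userid, b.userid
    FROM A AS a
    LEFT JOIN B AS b
      ON a.xid = b.xid
     AND a.userid IN (SELECT userid FROM wanted_users)
     AND b.userid IN (SELECT userid FROM wanted_users)
    ORDER BY a.xid
""").fetchall()

print(rows)
```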
There are lots of ways to approach this.

Difference between these two joining table approaches?

Consider we have two tables, Users and Posts. user_id is the foreign key in Posts table and is primary key in Users table.
What's the difference between these two SQL queries?
select user.name, post.title
from users as user, posts as post
where post.user_id = user.user_id;
vs.
select user.name, post.title
from users as user join posts as post
using (user_id);
Other than syntax, for this small snippet, they work exactly the same. But if at all possible, always write new queries using ANSI JOINs.
Semantically, the comma notation produces a Cartesian product between two tables, that is, a matrix of all records from table A combined with all records from table B, so two tables with 4 and 6 records respectively produce 24 records. Using the WHERE clause, you can then pick the rows you actually want from this Cartesian product. MySQL doesn't actually build this huge matrix, but semantically this is what it means.
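The 4 x 6 = 24 arithmetic can be checked directly. This sketch uses Python's sqlite3 module as a stand-in for MySQL (an assumption for the sake of a runnable example), and the generated rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
)

# 4 users and 6 posts; every post belongs to one of the 4 users.
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 5)],
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(i, (i % 4) + 1, f"post{i}") for i in range(1, 7)],
)

# Comma join without a condition: every user paired with every post.
cartesian_count = conn.execute(
    "SELECT COUNT(*) FROM users, posts"
).fetchone()[0]

# The WHERE clause picks the matching pairs out of that product.
filtered_count = conn.execute(
    "SELECT COUNT(*) FROM users, posts WHERE posts.user_id = users.user_id"
).fetchone()[0]

print(cartesian_count, filtered_count)
```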
A JOIN syntax is the ANSI standard that more clearly defines how tables interact. By putting the ON clause next to the JOIN, it makes it clear what links the two tables together.
Functionally, they will perform the same for your two queries. The difference comes in when you start using other [OUTER] JOIN types.
For MySQL specifically, comma-notation does have one difference
STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order.
However, it would not be wise to bank on this difference.
where post.user_id = user.user_id
Here you are filtering with a condition in the WHERE clause.
from users as user join posts as post using (user_id)
Here you are joining the two tables on the foreign key.
In the end the result is the same, but JOIN is the better choice for more advanced queries.
In MySQL JOIN syntax, CROSS JOIN, INNER JOIN, and JOIN are all the same. A comma-separated table list is a JOIN.
The MySQL manual on page https://dev.mysql.com/doc/refman/5.5/en/join.html makes this point about the difference between the two approaches:
However, the precedence of the comma operator is less than that of
INNER JOIN, CROSS JOIN, LEFT JOIN, and so on. If you mix comma joins
with the other join types when there is a join condition, an error of
the form Unknown column 'col_name' in 'on clause' may occur.

MySQL Join clause vs WHERE clause

What's the difference between writing a condition in the following two ways?
SELECT * FROM table1 INNER JOIN table2 ON (
table2.col1 = table1.col2 AND
table2.member_id = 4
)
and
SELECT * FROM table1 INNER JOIN table2 ON (
table2.col1 = table1.col2
)
WHERE table2.member_id = 4
I've compared them both with basic queries and EXPLAIN EXTENDED and don't see a difference. I'm wondering if someone here has discovered a difference in a more complex/processing-intensive environment.
With an INNER join the two approaches give identical results and should produce the same query plan.
However there is a semantic difference between a JOIN (which describes a relationship between two tables) and a WHERE clause (which removes rows from the result set). This semantic difference should tell you which one to use. While it makes no difference to the result or to the performance, choosing the right syntax will help other readers of your code understand it more quickly.
Note that there can be a difference if you use an outer join instead of an inner join. For example, if you change INNER to LEFT and the join condition fails you would still get a row if you used the first method but it would be filtered away if you used the second method (because NULL is not equal to 4).
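The outer-join case is easy to see on a tiny data set. The sketch below uses Python's sqlite3 module rather than MySQL (an assumption so that it runs without a server), with invented rows in the question's table1 and table2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, col2 INTEGER);
    CREATE TABLE table2 (col1 INTEGER, member_id INTEGER);
    INSERT INTO table1 VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO table2 VALUES (10, 4), (20, 5);
""")

# Condition inside ON: unmatched table1 rows survive with NULLs.
on_rows = conn.execute("""
    SELECT table1.id, table2.member_id
    FROM table1
    LEFT JOIN table2
      ON table2.col1 = table1.col2 AND table2.member_id = 4
    ORDER BY table1.id
""").fetchall()

# Condition in WHERE: the NULL-extended rows are filtered away afterwards.
where_rows = conn.execute("""
    SELECT table1.id, table2.member_id
    FROM table1
    LEFT JOIN table2 ON table2.col1 = table1.col2
    WHERE table2.member_id = 4
    ORDER BY table1.id
""").fetchall()

print(on_rows)
print(where_rows)
```

With the condition in ON, all three table1 rows come back. With the condition in WHERE, only the one row that really has member_id = 4 remains.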
If you are trying to optimize and know your data, adding the "STRAIGHT_JOIN" keyword can tremendously improve performance. You have an inner join ON..., so, just to confirm: you want only the records where table1 and table2 are joined, and only those where table2's member_id equals some value, in this case 4.
I would change the query to make table2 the primary table of the select, since it has an explicit "member_id" condition that an index could use to limit rows, and then join to table1 like this:
select STRAIGHT_JOIN
t1.*
from
table2 t2,
table1 t1
where
t2.member_id = 4
and t2.col1 = t1.col2
So the query would pre-qualify only the member_id = 4 records, then match between table1 and table2. If table2 had 50,000 records and table1 had 400,000, listing table2 first means it is processed first. The member_id = 4 condition cuts the row count down, and the join to table1 cuts it further.
I know for a fact that STRAIGHT_JOIN works, as I've used it many times with government data of 14+ million records linked to over 15 lookup tables, where the engine got confused trying to think for me on the critical table. One such query ran for 24+ hours before hanging. Adding "STRAIGHT_JOIN" and prioritizing the "primary" table in the query brought it down to a correct final result set in under 2 hours.
There's not really much of a difference in the situation you describe. In a situation with multiple complex joins, my understanding is that the first is somewhat preferable, as it reduces the complexity a little; that said, the difference is going to be small. Overall, you shouldn't notice much of a difference in most if not all situations.
With an inner join, it makes almost* no difference; if you switch to outer join, all the difference in the world.
*I say "almost" because optimizers are quirky beasts and it isn't impossible that under some circumstances, it might do a better job optimizing the former or the latter. Do not attempt to take advantage of this behavior.