Why does LIKE '%%' depend on unrelated joins? - mysql

I have a huge query which runs quite well by itself. It has a lot of join statements. So, its structure looks like this:
SELECT ... FROM mytable t
LEFT OUTER JOIN mytable2 t2 ON t2.attr = t.attr1
LEFT OUTER JOIN mytable3 t3 ON t3.attr = t.attr3
...
LEFT OUTER JOIN mytableN tN ON tN.attr = t.attrN
It runs in about a millisecond. But if I add a LIKE condition:
SELECT ... FROM mytable t
LEFT OUTER JOIN mytable2 t2 ON t2.attr = t.attr1
LEFT OUTER JOIN mytable3 t3 ON t3.attr = t.attr3
...
LEFT OUTER JOIN mytableN tN ON tN.attr = t.attrN
WHERE tK.attrP LIKE '%Something%'
then it almost never finishes. I could not wait for it to end and had to stop it manually. At the same time, if I reduce the query to just this:
SELECT ... FROM mytable t
LEFT OUTER JOIN mytablek tK ON tK.attr = t.attr1
WHERE tK.attrP LIKE '%Something%'
then it again runs like a flash. Why is that? I see no logic in all those extra joins, which have nothing to do with the field attrP, having any effect on the speed of the query. I guess I know how to optimize this query, but still, the more I work with MySQL, the less I like it. Hundreds of times I have struggled with something that had no reasonable explanation.
EDIT
Well, I thought that I knew how to optimize it - to use inner join in this way:
SELECT ... FROM mytable t
... bunch of joins
INNER JOIN mytablek tK ON tK.attr = t.attr1 AND tK.attrP LIKE '%Something%'
... bunch of joins
But this has no effect.
EDIT
Well, I found a solution: use MATCH ... AGAINST. Unfortunately, this solution is not universal. In fact, MATCH ... AGAINST throws an error when you try to search in a field returned by a subquery. Poor MySQL.

Adding WHERE tK.attrP LIKE '%Something%' probably removes records from the result set. We don't know how many, though. Maybe 1%, maybe 99%.
We don't even know if we only joined mytable with mytableK and used that clause, what percentage of the records would be affected. Would it be worth joining these tables first and with the supposedly few records left, do the other joins using just loops to get those other tables' records? Or should we better join everything first using great join algorithms on the tables and only at last filter with LIKE?
We don't know and the dbms doesn't know either.
But you notice that the dbms is fast on the pure joins, but slow when it applies the LIKE clause. So hint the dbms to do one thing first and the other later:
SELECT *
FROM
(
SELECT ... FROM mytable t
LEFT OUTER JOIN mytable2 t2 ON t2.attr = t.attr1
LEFT OUTER JOIN mytable3 t3 ON t3.attr = t.attr3
...
LEFT OUTER JOIN mytableN tN ON tN.attr = t.attrN
) AS joined
WHERE tK_attrP LIKE '%Something%';

The SQL engine has many options when running the query. One of the strengths of the language is the optimizer, which chooses the "best" way to run a given query. Of course, what the engine thinks is best is not necessarily the best.
The second point is that your condition is turning the left joins into inner joins. So, you might as well write the query that way (for clarity).
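To make the point about left joins turning into inner joins concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the tables t and tk with their sample rows are hypothetical; the NULL semantics it relies on are standard SQL, so the same reasoning should carry over.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; t, tk and their rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t  (id INTEGER, attr1 INTEGER);
    CREATE TABLE tk (attr INTEGER, attrP TEXT);
    INSERT INTO t  VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO tk VALUES (10, 'Something here'), (20, 'other');
""")

# LEFT JOIN keeps row 3 with tk.attrP = NULL, but NULL LIKE '...' is not
# true, so the WHERE clause throws that row away again.
left_with_where = conn.execute("""
    SELECT t.id, tk.attrP
    FROM t
    LEFT OUTER JOIN tk ON tk.attr = t.attr1
    WHERE tk.attrP LIKE '%Something%'
    ORDER BY t.id
""").fetchall()

# The same query written as an INNER JOIN.
inner_join = conn.execute("""
    SELECT t.id, tk.attrP
    FROM t
    INNER JOIN tk ON tk.attr = t.attr1
    WHERE tk.attrP LIKE '%Something%'
    ORDER BY t.id
""").fetchall()

print(left_with_where)  # [(1, 'Something here')]
print(inner_join)       # [(1, 'Something here')]
```

Both queries return the same rows, which is why an optimizer is free to treat the outer join as an inner join once such a WHERE condition is present.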
With that background, there are two possible answers to your question. The first is that when you run the other queries, you are noting when results first appear. This is the "time-to-first-row" measurement. However, the rows that match your more complicated query are at the end of the input. And MySQL needs to process all the non-matching rows to find the matching ones. This would be particularly true if some of the intermediate results create a cartesian product for a given row in the first table.
Another possibility is that the execution plan changes. Because the left joins are really inner joins, MySQL has a lot of flexibility in rewriting them.
My first recommendation is to put the join to table mytablek first, rather than last. Perhaps that will help MySQL find the best optimization.
The second would be to use a subquery in the from clause in place of the table:
(select tk.*
from mytablek tk
where tk.attrP LIKE '%Something%'
) tk
This may force the engine to whittle down the rows quickly and point the optimizer in a better direction.

Related

How to improve a MySQL query that have an internal join

I'm using a MySQL DB and, because of a 3rd-party client, we have a query that takes a long time. The 'problem' is that an outer select wraps an internal join that has no 'where' filter of its own; the 'where' is applied only in the "outer" section. This causes a join of 2 very big tables instead of a join of 2 much smaller subsets of them. (I can't control this, it is the way it is done: I must define the join for them, and they just add where clauses to it using this structure.) Note that if the 'where' clauses were inside the internal join, the join would be much smaller and the whole query would be faster.
I've considered implementing the internal join using a view, but it gives the same performance. All fields compared by the join are indexed.
I was told that it can be improved with some DB's configuration tweaking, but no one could say what exactly.
Here is a paraphrase of the query (it takes from many seconds up to a minute to execute):
SELECT a.*,
SUM(b.p1) p1
FROM
(SELECT a.*,
b.p1
FROM a
LEFT OUTER JOIN b ON a.some_value = b.some_value)
WHERE a.some_value = 'x'
Just to explain, if I could write the query myself I would have written it like this (takes ~200ms to execute):
SELECT a.*,
SUM(b.p1) p1
FROM a
LEFT OUTER JOIN b ON a.some_value = b.some_value
WHERE a.some_value = 'x'
Any idea how I can improve that?
Your personal rewrite would be ok; however, adding the AND b.y to the where clause turns your LEFT join into an INNER JOIN. The AND b.y should be part of the join's ON clause to retain left-join qualification.
For indexes, table A should have index on (x, b_id) and table B have a covering index on (id, y, p1)

MySQL using select with 2 queries, subquery or join?

Related to my last question (MySQLi performance, multiple (separate) queries vs subqueries) I came across another question.
Sometimes I'm using a subquery to select the value from another table (eg. the username connected to an ID), but I'm not sure about the select-in-select, because it doesn't seem to be very clean and I'm not sure about the performance.
The subquery could look like this:
SELECT
(SELECT `user_name` FROM `users`
WHERE `user_id` = table2.user_id) AS `user_name`
, `value1`
, `value2`
FROM
`table2`
....
Would it be "better" to use a separate query for the result from table1 and another for table2 (doubles the connections, but no need to cross tables), or should I even use a JOIN to get the results in a single query?
I don't have much experience with JOINS and subqueries yet, so I'm not sure if a JOIN would be "too much" in this case, because I really just need one name connected to an ID (or maybe count the number of rows from a table), or if it doesn't matter, because the select-in-select is treated like some kind of JOIN, too..
Solution with JOIN could look like this:
SELECT
users.user_name , table2.value1, table2.value2
FROM
`table2`
INNER JOIN
`users`
ON
users.user_id = table2.user_id
....
And if I should prefer JOIN, which one would be best in this case: left join, inner join or something else?
The very fact that you are asking whether to use inner join or left join indeed shows that you haven't done much work with them.
The purposes of these two are entirely different. Inner join is used to return columns from two or more tables where some columns have matching values. Left join is used when you want the rows from the table on the left of the join clause to be returned even when there is no matching row in the other table. It depends on your application. If one table has names of players, and another table contains details of penalties paid by them, then you will most certainly want to use left join, to account for players without a penalty, and thus without a record in the 2nd table.
Regarding whether to use subquery or join, joins can be much faster when properly used. By properly I mean: there are indices on the join columns, the tables are generally specified in increasing order of the number of rows they contain (there might be exceptions), the join columns have similar data types, etc. If all these conditions hold, join would be the better option.
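As a small illustration of how the select-in-select relates to the two join types, the sketch below runs all three forms on made-up data. It uses Python's sqlite3 module in place of MySQL (an assumption made purely so the example is runnable), and the sample rows, including a table2 row whose user_id has no entry in users, are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE table2 (user_id INTEGER, value1 TEXT);
    INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO table2 VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")
# Note: table2 row 'c' points at user_id 3, which does not exist in users.

subquery = conn.execute("""
    SELECT (SELECT user_name FROM users
            WHERE users.user_id = table2.user_id) AS user_name,
           value1
    FROM table2
    ORDER BY value1
""").fetchall()

left_join = conn.execute("""
    SELECT users.user_name, table2.value1
    FROM table2
    LEFT JOIN users ON users.user_id = table2.user_id
    ORDER BY table2.value1
""").fetchall()

inner_join = conn.execute("""
    SELECT users.user_name, table2.value1
    FROM table2
    INNER JOIN users ON users.user_id = table2.user_id
    ORDER BY table2.value1
""").fetchall()

print(subquery)    # [('alice', 'a'), ('bob', 'b'), (None, 'c')]
print(left_join)   # [('alice', 'a'), ('bob', 'b'), (None, 'c')]
print(inner_join)  # [('alice', 'a'), ('bob', 'b')]
```

On this data the scalar subquery behaves like the LEFT JOIN (the unmatched row is kept with a NULL name), while the INNER JOIN drops it. Which result is wanted depends on the application, as described above.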

mysql SELECT "FROM t1, t2" -or- "FROM t1 JOIN t2 ON"

What is the difference between these queries? Which one should I prefer and why?
SELECT
t1.*,
t2.x
FROM
t1,
t2
WHERE
t2.`id` = t1.`id`
or
SELECT
t1.*,
t2.x
FROM
t1
INNER JOIN # LEFT JOIN ?
t2
ON t2.`id` = t1.`id`
Does using commas have the same effect as using LEFT JOINs?
That's embarrassing. It's the first time in years that I have asked myself about this. I have always used the first version, but now I'm feeling like I missed some lines in my first SQL introduction. ;)
The "comma" syntax is equivalent to the INNER JOIN syntax. The query optimizer should be running the same query for you regardless of how you ask for it. There's a general recommendation to do your joins using the JOIN language and do your filtering using the WHERE language, as it makes your intention clearer.
The queries are the same; the first is just shorthand syntax for the second. LEFT JOIN is something entirely different.
INNER JOIN only shows the records common to both tables. LEFT JOIN takes all records from the left table and matches them to those of the right table. If no records in the right table match, NULL will be selected in their place.
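The three variants can be compared side by side with a short sketch. It runs the queries through Python's sqlite3 module rather than MySQL (an assumption made for the sake of a self-contained example), and the rows in t1 and t2 are made up so that one t1 row has no partner in t2.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER, x TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1, 'x1'), (2, 'x2');
""")

comma_join = conn.execute("""
    SELECT t1.id, t2.x
    FROM t1, t2
    WHERE t2.id = t1.id
    ORDER BY t1.id
""").fetchall()

inner_join = conn.execute("""
    SELECT t1.id, t2.x
    FROM t1
    INNER JOIN t2 ON t2.id = t1.id
    ORDER BY t1.id
""").fetchall()

left_join = conn.execute("""
    SELECT t1.id, t2.x
    FROM t1
    LEFT JOIN t2 ON t2.id = t1.id
    ORDER BY t1.id
""").fetchall()

print(comma_join)  # [(1, 'x1'), (2, 'x2')]
print(inner_join)  # [(1, 'x1'), (2, 'x2')]
print(left_join)   # [(1, 'x1'), (2, 'x2'), (3, None)]
```

The comma form and the INNER JOIN form agree, while the LEFT JOIN additionally returns the t1 row with id 3, padded with NULL.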
Inner joins are left joins that skip empty (NULL) results in the second table.
Query optimizers have more possibilities when everything is done using the where clause. Some RDBMSs don't support commands like natural join and other joins, so using "where" is something one can really rely on.
The first form becomes a Cartesian product if the WHERE condition is left out, which usually produces undesired results on a large database.
Using explicit joins instead will produce exactly what you want.

What are the differences between these query JOIN types and are there any caveats?

I have multiple queries (from different section of my site) i am executing
Some are like this:
SELECT field, field1
FROM table1, table2
WHERE table1.id = table2.id
AND ....
and some are like this:
SELECT field, field1
FROM table1
JOIN table2
USING (id)
WHERE ...
AND ....
and some are like this:
SELECT field, field1
FROM table1
LEFT JOIN table2
ON (table1.id = table2.id)
WHERE ...
AND ....
Which of these queries is better, or slower/faster or more standard?
The first two queries are equivalent; in the MySQL world the USING keyword is almost the same as saying table1.id = table2.id (see the documentation: USING is part of the SQL:2003 spec and there are some differences in NULL values).
You could easily write them as:
SELECT field, field1
FROM table1
INNER JOIN table2 ON (table1.id = table2.id)
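One visible difference between USING and ON is the column list that SELECT * produces. The sketch below shows this with Python's sqlite3 module standing in for MySQL (an assumption for runnability; MySQL likewise drops the redundant column for USING). The sample row is made up.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample row is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, field TEXT);
    CREATE TABLE table2 (id INTEGER, field1 TEXT);
    INSERT INTO table1 VALUES (1, 'f');
    INSERT INTO table2 VALUES (1, 'f1');
""")

def column_names(sql):
    """Return the names of the result columns of a query."""
    cursor = conn.execute(sql)
    return [col[0] for col in cursor.description]

using_cols = column_names("SELECT * FROM table1 JOIN table2 USING (id)")
on_cols = column_names(
    "SELECT * FROM table1 JOIN table2 ON (table1.id = table2.id)"
)

print(using_cols)  # ['id', 'field', 'field1']
print(on_cols)     # ['id', 'field', 'id', 'field1']
```

With USING the shared id column appears once, while ON keeps one id per table. When the select list names its columns explicitly, as in the queries above, this difference does not show up.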
The third query is a LEFT JOIN. This will select all the matching rows in both tables, and will also return all the rows in table1 that have no matches in table2. For these rows, the columns in table2 will be represented by NULL values.
I like Jeff Atwood's visual explanation of these
Now, on to what is better or worse. The answer is, it depends. They are for different things. If some rows in table1 have no match in table2, then a left join will return more rows than an inner join. But the performance of the queries will be affected by many factors, like table size, the types of the columns, and what the database is doing at the same time.
Your first concern should be to use the query you need to get the data out. You might honestly want to know what rows in table1 have no match in table2; in this case you'd use a LEFT JOIN. Or you might only want rows that match - the INNER JOIN.
As Krister points out, you can use the EXPLAIN keyword to tell you how the database will execute each kind of query. This is very useful when trying to figure out just why a query is slow, as you can see where the database spends all of its time.
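As a rough sketch of what inspecting a plan looks like, the snippet below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. This is an assumption made so the example can run anywhere: MySQL's EXPLAIN prints a different, tabular output, and the people table and idx_people_name index are hypothetical. The idea of reading off whether an index is used is the same.

```python
import sqlite3

# Assumption: SQLite's EXPLAIN QUERY PLAN stands in for MySQL's EXPLAIN;
# the table and index names are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE INDEX idx_people_name ON people (name);
""")

def plan(sql):
    """Return the human-readable plan steps for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last field of each plan row is the textual description.
    return " | ".join(row[3] for row in rows)

# An equality test can use the index on name.
eq_plan = plan("SELECT * FROM people WHERE name = 'Smith'")

# A pattern with a leading wildcard cannot, so the table is scanned.
like_plan = plan("SELECT * FROM people WHERE name LIKE '%mit%'")

print(eq_plan)
print(like_plan)
```

The exact wording of the plan text varies between SQLite versions, so no expected output is shown. The equality query should mention idx_people_name, while the LIKE query should report a scan of the table.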
Personally, I prefer using left joins in my queries. You can run into issues with null records or duplicates, but that can be resolved with a simple modification using an outer clause. It's my understanding that a join is a bit more resource intensive, but this is up for debate and might be based on personal preference.
just my $.02.
The third example, using ON (field1=field2) is the more common, and seems to be the more commonly accepted standard.
I don't know about the performance difference, you would have to run some EXPLAIN queries to see what MySQL actually ends up doing with them all really.
I do know though that the first, with WHERE being used to join them all, is much less readable on anything other than trivial queries. Once you have some complex conditions in a query, it's confusing to have "join conditions" all muddled in with "selection conditions".

MySQL Join clause vs WHERE clause

What's the difference in a clause done the two following ways?
SELECT * FROM table1 INNER JOIN table2 ON (
table2.col1 = table1.col2 AND
table2.member_id = 4
)
I've compared them both with basic queries and EXPLAIN EXTENDED and don't see a difference. I'm wondering if someone here has discovered a difference in a more complex/processing-intensive environment.
SELECT * FROM table1 INNER JOIN table2 ON (
table2.col1 = table1.col2
)
WHERE table2.member_id = 4
With an INNER join the two approaches give identical results and should produce the same query plan.
However there is a semantic difference between a JOIN (which describes a relationship between two tables) and a WHERE clause (which removes rows from the result set). This semantic difference should tell you which one to use. While it makes no difference to the result or to the performance, choosing the right syntax will help other readers of your code understand it more quickly.
Note that there can be a difference if you use an outer join instead of an inner join. For example, if you change INNER to LEFT and the join condition fails you would still get a row if you used the first method but it would be filtered away if you used the second method (because NULL is not equal to 4).
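The outer-join case can be checked with a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption for runnability; the behavior shown follows from standard SQL NULL handling), and the sample rows are made up so that one table1 row matches a table2 row whose member_id is not 4.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col2 INTEGER);
    CREATE TABLE table2 (col1 INTEGER, member_id INTEGER);
    INSERT INTO table1 VALUES (1), (2);
    INSERT INTO table2 VALUES (1, 4), (2, 5);
""")

# First method: the member_id condition is part of the ON clause.
filter_in_on = conn.execute("""
    SELECT table1.col2, table2.member_id
    FROM table1
    LEFT JOIN table2 ON (
        table2.col1 = table1.col2 AND
        table2.member_id = 4
    )
    ORDER BY table1.col2
""").fetchall()

# Second method: the member_id condition is in the WHERE clause.
filter_in_where = conn.execute("""
    SELECT table1.col2, table2.member_id
    FROM table1
    LEFT JOIN table2 ON (
        table2.col1 = table1.col2
    )
    WHERE table2.member_id = 4
    ORDER BY table1.col2
""").fetchall()

print(filter_in_on)     # [(1, 4), (2, None)]
print(filter_in_where)  # [(1, 4)]
```

With the condition in ON, the row with col2 = 2 survives with a NULL member_id; with the condition in WHERE, it is filtered away.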
If you are trying to optimize and know your data, adding the clause "STRAIGHT_JOIN" can tremendously improve performance. You have an inner join ON... So, just to confirm, you want only records where table1 and table2 are joined, but only for table 2 member ID = some value, in this case 4.
I would change the query to have table 2 as the primary table of the select as it has an explicit "member_id" that could be optimized by an index to limit rows, then joining to table 1 like
select STRAIGHT_JOIN
t1.*
from
table2 t2,
table1 t1
where
t2.member_id = 4
and t2.col1 = t1.col2
So the query would pre-qualify only the member_id = 4 records, then match between table 1 and 2. So if table 2 had 50,000 records and table 1 had 400,000 records, table2, being listed first, will be processed first. Limiting to member_id = 4 reduces the rows further, and joining to table1 reduces them further still.
I know for a fact the straight_join works as I've implemented it many times dealing with gov't data of 14+ million records linking to over 15 lookup tables where the engine got confused trying to think for me on the critical table. One such query was taking 24+ hours before hanging... Adding the "STRAIGHT_JOIN" and prioritizing what the "primary" table was in the query dropped it to a final correct result set in under 2 hours.
There's not really much of a difference in the situation you describe; in a situation with multiple complex joins, my understanding is that the first is somewhat preferable, as it will reduce the complexity somewhat; that said, it's going to be a small difference. Overall, you shouldn't notice much of a difference in most if not all situations.
With an inner join, it makes almost* no difference; if you switch to outer join, all the difference in the world.
*I say "almost" because optimizers are quirky beasts and it isn't impossible that under some circumstances, it might do a better job optimizing the former or the latter. Do not attempt to take advantage of this behavior.