SQL Join condition optimization - mysql

I had an issue with a SQL query that was slow and lead me to a 500 Internal Server Error.
After playing a bit with the query I had found out that moving join conditions outside makes the query work and the error vanish.
Now I wonder:
is the second query is equivalent to the first one ?
what was my bug mistake, why it was so slow ?
Here the queries.
Original (slow) SQL:
SELECT *
FROM `TABLE_1` AS `main_table`
INNER JOIN `TABLE_2` AS `w`
ON main_table.entity_id = w.product_id
LEFT JOIN `TABLE_3` AS `ur`
ON main_table.entity_id = ur.product_id
AND ur.category_id IS NULL
AND ur.store_id = 1
AND ur.is_system = 1
WHERE (w.website_id = '1')
Faster SQL:
SELECT *
FROM `TABLE_1` AS `main_table`
INNER JOIN `TABLE_2` AS `w`
ON main_table.entity_id = w.product_id
LEFT JOIN `TABLE_3` AS `ur`
ON main_table.entity_id = ur.product_id
WHERE (w.website_id = '1')
AND ur.category_id IS NULL
AND ur.store_id = 1
AND ur.is_system = 1

Is the 2nd query equivalent to the 1st?
Depending on the data in TABLE_3, the 2nd query is NOT equivalent to the first, since you're using a LEFT JOIN.
Consider what happens when you have a record in TABLE_1, with an entity_id that does not match any rows in TABLE_3: Your first query would still return this record from TABLE_1, with NULL values for all the columns of TABLE_3. The 2nd query applies filters to the columns from TABLE_3, but since these are all NULLs, the record now gets filtered out.
In general, query 2 would thus return fewer records than query 1 (unless, of course, all your entity_id's match a product_id from TABLE_3).
What was my bug mistake, why it was so slow?
Both queries are valid SQL, so none of them should give you a 500 Internal Server Error. This does not even look like a MySQL error, so maybe the bug arose elsewhere? Perhaps you have a web application that is unable to handle the NULL's returned from the 1st query? It's impossible to answer the question without more details about the error.
As for the speed of the queries, this depends a lot on what indexes are defined. Generally, I would not expect the 2nd query to be faster than the 1st, but that depends. As I mentioned above, the 1st query might return more records than the 2nd. If you're not querying the database directly, but instead through some application, could it be the application slowing down your query?

Related

MySQL 5.7 vs. 8 difference in LEFT JOIN processing

We are checking if we can upgrade our project database from MySQL 5.7 to v.8. The system is 7 years old and has tons of code... Today we got a slightly strange bug which did not appear on 5.7 (I wonder why). The buggy request is the following:
SELECT TableA.Amount, SUM(TableB.Amount) AS Amount2
FROM
TableA LEFT JOIN TableB ON TableA.ReservID = TableB.ReservID
WHERE
TableB.InvoiceID IS NULL
AND TableB.InvoiceStatusID = 2
AND TableB.PersonID = 389
AND TableB.PersonTypeID = 1
AND TableA.ReservID = 4657;
There is one record in TableA and no records in TableB for the given conditions.
I know that WHERE conditions are applied after joining the tables. So it is not a suprise for me that the query return NULL, NULL on MySQL8. But our developer (who's still sure that this query is Ok) just showed me that it returns 67667.65, NULL on MySQL 5.7!
So I got 2 questions at ones. 1. Why it works on 5.7 when all data must be filtered out by the WHERE conditions on non-existent (all null in joint table) Table2 fields? 2. Is there a way to make MySQL8 work in the same 'tolerant' way as I am sure there are many such 'genius' queries all over our old code?
The problem in your query is not the (left) join. While it makes it less clear to the reader that your left join is treated as a join, having the comparisons in the where clause is completely valid sql. Every database will treat your left join correctly as a join, and I don't think that MySQL (5.7 or 8.0) would give you a different result if you replace left join with a join, as the internal representation would not change.
Your query has a problem with the aggregation. select colA, sum(colB) without using group by colA will leave the value of colA unclear, see MySQL Handling of GROUP BY:
SELECT name, MAX(age) FROM t;
Without GROUP BY, there is a single group and it is nondeterministic which name value to choose for the group.
MySQL is about the only database system that will even allow you to run this query, a very special behaviour that generates a lot of questions on stackoverflow. Most other databases will complain about that column listed in the select - exactly for the reason you face right now: they don't really know what to return.
So the value you get for tablea.amount is basically random. While it usually depends on how MySQL executes the query internally (so it could depend on some optimization setting), it looks unlikely that you can convince MySQL 8 to return the number value there - and especially to make sure it is consistent in all similar queries you may have.
And I want to emphasize: the value null in MySQL 8 is also not deterministic - it could be something different.
To make a proper query, use an aggregate function for tablea.amount too, and depending on your requirements, fix your left join, e.g.
SELECT MAX(TableA.Amount) AS Amount,
SUM(TableB.Amount) AS Amount2
FROM TableA LEFT JOIN TableB
ON TableA.ReservID = TableB.ReservID
AND TableB.InvoiceID IS NULL
AND TableB.InvoiceStatusID = 2
AND TableB.PersonID = 389
AND TableB.PersonTypeID = 1
WHERE TableA.ReservID = 4657
This should give you the behaviour from MySQL 5.7, e.g. <value for amount>, null. If you use join, you should get null, null. Both cases will be deterministic.
Although using just SELECT TableA.Amount, SUM(...) ... LEFT JOIN ... (with a proper left join) will return the amount instead of null for MySQL 8 too, it is still not valid sql! MySQL will only allow it because ReservID = 4657 limits it to a single row in TableA using the primary key. So if you have to check all queries anyway, fix it properly.
i dont know the reason, why its working on 5.7. but to get your expected result you can do:
SELECT
TableA.Amount,
SUM(TableB.Amount) AS Amount2
FROM TableA
LEFT JOIN TableB
ON TableA.ReservID = TableB.ReservID
AND TableB.InvoiceID IS NULL
AND TableB.InvoiceStatusID = 2
AND TableB.PersonID = 389
AND TableB.PersonTypeID = 1
WHERE TableA.ReservID = 4657;

Query performance issue with multiple left joins

In mysql v8.x I have a table with about 7000 records in it. I'm trying to create a single query combines two subqueries of the same table.
I thought I could achieve this by left joining on the subqueries and then matching on any records that have values for these as shown in the example below (note: this effect happens when my_table has just just an id column).
The query seems to work quickly when the subqueries return records but not when the subqueries return empty (which I've recreated in the example below with WHERE FALSE). When this happens there is a situation where executing these queries on their own that each take a millisecond or so, takes 12 seconds.
My understanding is that these these joins should return the same number of rows as the source table and as such there shouldn't be such a big difference. I'm interested in understanding how the join works in this type of case and why it's producing such a difference in execution time.
SELECT my_table.* FROM accessory_requests
LEFT JOIN
( SELECT my_table.id
FROM my_table
WHERE FALSE
) as join1
ON join1.id = my_table.id
LEFT JOIN
( SELECT my_table.id
FROM my_table
WHERE FALSE
) as join2
ON join2.id = my_table.id
WHERE join1.id IS NOT NULL OR join2.id IS NOT NULL;
Your query is all messed up and it is not really clear what you are trying to do.
However, I can comment on your performance issues. MySQL has a tendency to materialize subqueries in the FROM clause. That means that a new copy of the table is created. In doing so, indexes are lost on the table. So, eliminate the subqueries in the FROM clause.
If you ask another question with sample data, desired results, and a decent explanation, then it might be possible to help with a more efficient form of the query. I suspect you just want not exists, but that is a rather large leap from this question.
combines two subqueries of the same table.
What do you mean?
If you want to take the rows from each subquery, then simply do
( SELECT ... ) -- what you are calling the first subquery
UNION
( SELECT ... ) -- 2nd
Also,
LEFT JOIN ( ... ) as join1 ON ...
WHERE join1.id IS NOT NULL;
is probably the same as simply
JOIN ( ... ) as join1 ON ...
If by "combining" you mean to have multiple columns, then see the tag [pivot-table].

How to optimize a MySQL update which contains an "in" subquery?

How do I optimize the following update because the sub-query is being executed for each row in table a?
update
a
set
col = 1
where
col_foreign_id not in (select col_foreign_id in b)
You could potentially use an outer join where there are no matching records instead of your not in:
update table1 a
left join table2 b on a.col_foreign_id = b.col_foreign_id
set a.col = 1
where b.col_foreign_id is null
This should use a simple select type rather than a dependent subquery.
Your current query (or the one that actually works since the example in the OP doesn't look like it would) is potentially dangerous in that a NULL in b.col_foreign_id would cause nothing to match, and you'd update no rows.
not exists would also be something to look at if you want to replace not in.
I can't tell you that this will make your query any faster, but there is some good info here. You'll have to test in your environment.
Here's a SQL Fiddle illuminating the differences between in, exists, and outer join (check the rows returned, null handling, and execution plans).

MySQL Query not returning

I'm unfamiliar with MySQL, and I don't know where to begin with solving this query issue.
SELECT *
FROM `rmedspa`.`clients` c
inner join `rmedspa`.`tickets` t on c.clientid = t.clientid
where c.fldclass is not null
AND t.ticketID > 0
This query returns just fine in MySQL Workbench, in 30 seconds, and the IDE is limiting the query results to 1000 records. The database is not on my own machine, but on a server that is in a different location (in other words, it's going out to the internet and it's slow). If I add an order by at the end, the query never returns.
SELECT *
FROM `rmedspa`.`clients` c
inner join `rmedspa`.`tickets` t on c.clientid = t.clientid
where c.fldclass is not null
AND t.ticketID > 0
ORDER BY t.ticketid
There are "many" tickets for 1 client. t.ticketid is an int. clientid is an int, too.
I don't know where to begin to find out why the ORDER BY is causing this query to never return. It doesn't fail, it just doesn't return.
This Post is a short review of previous comments plus some hints on SQL-Query for you.
Query building
Whenever the column names of a join condition match, you can rewrite the join statement as follows:
INNER JOIN {database}.{table} t USING({joinColumn})
Check your indexes - in your case: Are the columns of the join condition and the where statement indexed?
mySQL result
The ORDER BY statement often requires a reordering of the result set in memory. If you expected large results (many rows) and/or large rows (many columns) but see no result the mySQL Server is not properly configured for that case - How MySQL Uses Memory, Optimizing Queries with EXPLAIN

MySQL Subquery Causing Server to Hang

I'm trying to execute the following query
SELECT * FROM person
WHERE id IN
( SELECT user_id FROM participation
WHERE activity_id = '1' AND application_id = '1'
)
The outer query returns about 4000 responses whilst the inner returns 29. When executed on my web server nothing happened and when I tested it locally mysql ended up using 100% CPU and still achieved nothing. Could the size be the cause?
Specifically it causes the server to hang forever, I'm fairly sure the web server I ran the query on is in the process of crashing due to it (woops).
why don't you use an inner join for this query? i think that would be faster (and easier to read) - and maybe it solves your problem (but i can't find a failure in your query).
EDIT: the inner-join-solution would look like this:
SELECT
person.*
FROM
person
INNER JOIN
participation
ON
person.id = participation.user_id
WHERE
participation.activity_id = '1'
AND
participation.application_id = '1'
How many rows are there in participation table and what indexes are there?
A multi-column index on (user_id, activity_id, application_id) could help here.
Re comments: IN isn't slow. Subqueries within IN can be slow, if they're correlated to outer query.