MySQL - not equal joins not working properly

I'm having trouble getting a query to work properly. I feel that this should be easy but for some reason I can't get it correct.
I have two tables joined by an ID field. I'm trying to get all the records that are in t1 and don't show up in t2.
This works currently:
select * from at_templates a
left join at_vault b on a.id = b.template
where b.at_id is null
BUT, I also want to put another condition in the query to limit the data to a subset and it is not working:
select * from at_templates a
left join at_vault b on a.id = b.template
where b.at_id != 1
The second query comes up empty but I want the same results as the first, based upon the input of at_id.
Any ideas?

Your working example implies that the "first table" you want to see records from is a and the "second table" you want to use to exclude records is b. If you are excluding all records that exist in b, then you can't further limit the result set by any value like b.at_id because there are no values associated with b in your result set.
Additionally, if the condition b.at_id is null is true, the condition b.at_id != 1 will never be true because an inequality comparison with null will always return null. (The reason for this is that null is not a value; it is a placeholder indicating the absence of a value.)
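The NULL behaviour described above is easy to check for yourself. Here is a minimal sketch; it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made only so the snippet runs without a server), since comparisons with NULL follow the same three-valued logic in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A comparison with NULL evaluates to NULL (None in Python), not true or false.
null_ne_one, null_eq_null, null_is_null = conn.execute(
    "SELECT NULL != 1, NULL = NULL, NULL IS NULL"
).fetchone()

# WHERE keeps a row only when the condition is true, so a NULL result drops it.
rows_kept = conn.execute(
    "SELECT x FROM (SELECT 1 AS x) WHERE NULL != 1"
).fetchall()

print(null_ne_one, null_eq_null, null_is_null)
print(rows_kept)
```

Only IS NULL gives a definite answer here; the other two comparisons come back as NULL, and the filtered query returns no rows.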
If you want to get the same results from both queries, based on a comparison between some user input parameter and the field b.at_id (and noting that your second query currently returns an empty set), you might be able to use MySQL's null-safe equality operator in the following way:
SELECT
*
FROM
at_templates AS a
LEFT JOIN
at_vault AS b ON a.id = b.template
WHERE NOT b.at_id <=> 1;
This is a MySQL extension, not a standard syntax; unfortunately the ANSI SQL standard syntax, IS [NOT] DISTINCT FROM, doesn't appear to be widely supported. Some alternate ways to rewrite this condition are discussed in How to rewrite IS DISTINCT FROM and IS NOT DISTINCT FROM?.
Keep in mind that if in the future you have some values of b.at_id that are not 1, this query would return those rows as well, and not just the rows returned by your first query.
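To see the difference between the plain and the null-safe comparison, including the caveat above, here is a rough sketch. It runs on Python's sqlite3 module rather than MySQL (an assumption for the sake of a self-contained example), so it uses SQLite's IS NOT operator, which plays the same role as MySQL's NOT (b.at_id <=> 1). The table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE at_templates (id INTEGER PRIMARY KEY);
    CREATE TABLE at_vault (at_id INTEGER, template INTEGER);
    INSERT INTO at_templates (id) VALUES (1), (2), (3);
    -- Hypothetical data: template 1 belongs to at_id 1, template 3 to at_id 2,
    -- and template 2 has no row in at_vault at all.
    INSERT INTO at_vault (at_id, template) VALUES (1, 1), (2, 3);
""")

JOIN = (
    "SELECT a.id, b.at_id FROM at_templates a "
    "LEFT JOIN at_vault b ON a.id = b.template "
)

# Plain inequality: for the unmatched template 2, b.at_id is NULL,
# so NULL != 1 evaluates to NULL and the row is filtered out.
plain = conn.execute(JOIN + "WHERE b.at_id != 1 ORDER BY a.id").fetchall()

# Null-safe inequality (SQLite spelling of MySQL's NOT (b.at_id <=> 1)):
# the unmatched template 2 is kept, and so is template 3 with at_id = 2.
null_safe = conn.execute(JOIN + "WHERE b.at_id IS NOT 1 ORDER BY a.id").fetchall()

print(plain)
print(null_safe)
```

With this data the plain version returns only template 3, while the null-safe version returns templates 2 and 3, which shows both the fix and the extra rows mentioned in the caveat.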

Related

MySQL 5.7 vs. 8 difference in LEFT JOIN processing

We are checking if we can upgrade our project database from MySQL 5.7 to v.8. The system is 7 years old and has tons of code... Today we got a slightly strange bug which did not appear on 5.7 (I wonder why). The buggy request is the following:
SELECT TableA.Amount, SUM(TableB.Amount) AS Amount2
FROM
TableA LEFT JOIN TableB ON TableA.ReservID = TableB.ReservID
WHERE
TableB.InvoiceID IS NULL
AND TableB.InvoiceStatusID = 2
AND TableB.PersonID = 389
AND TableB.PersonTypeID = 1
AND TableA.ReservID = 4657;
There is one record in TableA and no records in TableB for the given conditions.
I know that WHERE conditions are applied after joining the tables. So it is not a surprise for me that the query returns NULL, NULL on MySQL 8. But our developer (who's still sure that this query is OK) just showed me that it returns 67667.65, NULL on MySQL 5.7!
So I have two questions at once. 1. Why does it work on 5.7, when all data should be filtered out by the WHERE conditions on the non-existent (all NULL in the joined table) TableB fields? 2. Is there a way to make MySQL 8 work in the same 'tolerant' way, as I am sure there are many such 'genius' queries all over our old code?
The problem in your query is not the (left) join. While it makes it less clear to the reader that your left join is effectively treated as an inner join, having the comparisons in the where clause is completely valid SQL. Because the where clause filters on TableB columns, every database will correctly treat your left join as an inner join, and I don't think that MySQL (5.7 or 8.0) would give you a different result if you replaced left join with join, as the internal representation would not change.
Your query has a problem with the aggregation. select colA, sum(colB) without using group by colA will leave the value of colA unclear, see MySQL Handling of GROUP BY:
SELECT name, MAX(age) FROM t;
Without GROUP BY, there is a single group and it is nondeterministic which name value to choose for the group.
MySQL is about the only database system that will even allow you to run this query, a very special behaviour that generates a lot of questions on stackoverflow. Most other databases will complain about that column listed in the select - exactly for the reason you face right now: they don't really know what to return.
So the value you get for tablea.amount is basically random. While it usually depends on how MySQL executes the query internally (so it could depend on some optimization setting), it looks unlikely that you can convince MySQL 8 to return the number value there - and especially to make sure it is consistent in all similar queries you may have.
And I want to emphasize: the value null in MySQL 8 is also not deterministic - it could be something different.
To make a proper query, use an aggregate function for tablea.amount too, and depending on your requirements, fix your left join, e.g.
SELECT MAX(TableA.Amount) AS Amount,
SUM(TableB.Amount) AS Amount2
FROM TableA LEFT JOIN TableB
ON TableA.ReservID = TableB.ReservID
AND TableB.InvoiceID IS NULL
AND TableB.InvoiceStatusID = 2
AND TableB.PersonID = 389
AND TableB.PersonTypeID = 1
WHERE TableA.ReservID = 4657
This should give you the behaviour from MySQL 5.7, i.e. <value for amount>, null. If you use join, you should get null, null. Both cases will be deterministic.
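The effect of moving the TableB conditions from WHERE into ON can be reproduced with a small sketch. It uses Python's sqlite3 module instead of MySQL (an assumption, so that it runs standalone), with one row in TableA and an empty TableB as in the question; the column types are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ReservID INTEGER PRIMARY KEY, Amount REAL);
    CREATE TABLE TableB (ReservID INTEGER, InvoiceID INTEGER, InvoiceStatusID INTEGER,
                         PersonID INTEGER, PersonTypeID INTEGER, Amount REAL);
    INSERT INTO TableA VALUES (4657, 67667.65);  -- TableB stays empty
""")

SELECT = """
    SELECT MAX(TableA.Amount) AS Amount, SUM(TableB.Amount) AS Amount2
    FROM TableA LEFT JOIN TableB ON TableA.ReservID = TableB.ReservID
"""
B_FILTER = """
    TableB.InvoiceID IS NULL AND TableB.InvoiceStatusID = 2
    AND TableB.PersonID = 389 AND TableB.PersonTypeID = 1
"""

# TableB conditions inside ON: the TableA row survives the join.
in_on = conn.execute(
    SELECT + " AND " + B_FILTER + " WHERE TableA.ReservID = 4657"
).fetchone()

# TableB conditions inside WHERE: the NULL-extended row is filtered out,
# so both aggregates run over an empty set.
in_where = conn.execute(
    SELECT + " WHERE " + B_FILTER + " AND TableA.ReservID = 4657"
).fetchone()

print(in_on)
print(in_where)
```

The first query returns the amount with a NULL sum, the second returns NULL, NULL, and both results are deterministic because every selected column is aggregated.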
Although using just SELECT TableA.Amount, SUM(...) ... LEFT JOIN ... (with a proper left join) will return the amount instead of null for MySQL 8 too, it is still not valid sql! MySQL will only allow it because ReservID = 4657 limits it to a single row in TableA using the primary key. So if you have to check all queries anyway, fix it properly.
I don't know why it works on 5.7, but to get your expected result you can do:
SELECT
TableA.Amount,
SUM(TableB.Amount) AS Amount2
FROM TableA
LEFT JOIN TableB
ON TableA.ReservID = TableB.ReservID
AND TableB.InvoiceID IS NULL
AND TableB.InvoiceStatusID = 2
AND TableB.PersonID = 389
AND TableB.PersonTypeID = 1
WHERE TableA.ReservID = 4657;

MySQL aggregate function to filter nulls and conform with ONLY_FULL_GROUP_BY

I have a single record which joins to N other tables, and extracts a single column from each of them. I would like to put all N of those extracted columns in a single record.
After constructing the diagram below it seems like I can get to the second step easily, and then I should be able to use an aggregate function to filter out the NULL's. I have looked around for something like GROUP_COALESCE, but I couldn't find something which accomplishes this.
I have a fiddle here which unfortunately works, because MySQL will let you select columns which aren't in the GROUP BY without an aggregate at your own peril http://sqlfiddle.com/#!9/304992/1/0.
Is there a way I can make sure that it always selects the column from the record, if the record exists?
The end result should be one record per group, and each column would contain the value which was inside the only row successfully joined for that group.
If I followed you correctly, you can just use aggregate functions on the columns coming from the joined tables. Aggregate functions ignore null values, so, since you have two null values and one non-null value for each column and each group, this will return the expected output (while conforming to the ONLY_FULL_GROUP_BY option).
SELECT
group_table_id,
MAX(t1.v) t1_v,
MAX(t2.v) t2_v,
MAX(t3.v) t3_v
FROM group_table
LEFT JOIN t1 ON t1.group_id = group_table_id
LEFT JOIN t2 ON t2.group_id = group_table_id
LEFT JOIN t3 ON t3.group_id = group_table_id
GROUP BY group_table_id
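The key point, that aggregate functions skip NULLs, can be illustrated with a short sketch. It uses Python's sqlite3 module (an assumption, to keep it self-contained) and a hypothetical table joined_rows that stands in for the intermediate step in the question, where each row carries a single non-NULL value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical intermediate result: one row per successful join,
    -- with NULL in the columns that belong to the other tables.
    CREATE TABLE joined_rows (group_id INTEGER, t1_v TEXT, t2_v TEXT, t3_v TEXT);
    INSERT INTO joined_rows VALUES
        (1, 'a',  NULL, NULL),
        (1, NULL, 'b',  NULL),
        (1, NULL, NULL, 'c'),
        (2, 'x',  NULL, NULL);
""")

# MAX ignores NULLs, so each column collapses to its single non-NULL value.
# A column with only NULLs in a group stays NULL.
collapsed = conn.execute("""
    SELECT group_id, MAX(t1_v), MAX(t2_v), MAX(t3_v)
    FROM joined_rows
    GROUP BY group_id
    ORDER BY group_id
""").fetchall()

print(collapsed)
```

Group 1 collapses to a single row holding 'a', 'b' and 'c'; group 2 keeps 'x' and leaves the other two columns NULL. Every selected column is either in the GROUP BY or aggregated, which is what ONLY_FULL_GROUP_BY asks for.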

LEFT JOIN returns everything with NULL

I have two tables which I am joining through a left join. Both the tables are empty. But when I run the query, MySQL returns a row with all NULLs.
I have tried several queries like
SELECT products.*,SUM(pq_quantity) as quantity
FROM `products` LEFT JOIN `products_quantities` ON `pq_product_idFk` = `p_id`
WHERE `p_volusion_id` = '37808'
OR
SELECT products.*,SUM(pq_quantity) as quantity
FROM `products` LEFT JOIN `products_quantities` ON `pq_product_idFk` = `p_id`
WHERE `p_volusion_id` = '37808' AND p_id IS NOT NULL
OR
SELECT products.*,SUM(pq_quantity) as quantity
FROM `products` LEFT JOIN `products_quantities` ON `pq_product_idFk` = `p_id` AND `p_volusion_id` = '37808' AND p_id IS NOT NULL
NONE of the above queries seem to work as I just want the result that is not NULL.
Thanks
Both the tables are empty. But when I run the query, mysql returns a row with all NULLS.
Using GROUP BY aggregate functions in the SELECT clause normally calls for a GROUP BY clause as well. However, if it is not present, the SQL standard specifies that a single group is to be created using all the rows filtered by the WHERE clause.
Because of the * used in the SELECT clause, all the queries you posted are invalid SQL.
A query that contains a GROUP BY clause does not return rows from tables. It creates rows using the values extracted from the tables. First it creates groups (and sub-groups) using the expressions from the GROUP BY clause. All the rows from a group have the same value for the first expression specified in the GROUP BY clause.
If there are two or more expressions in the GROUP BY clause, each group is split into sub-groups using the second expression then each sub-group is further split into sub-sub-groups using the third expression (if exists) and so on.
From each such group of rows (after the last split), the database engine generates one new row and puts it into the result set. If the SELECT clause contains expressions that are neither arguments of a GROUP BY aggregate function nor present in the GROUP BY clause then, most probably, these expressions will have more than one value in a subgroup. This is why the query is invalid SQL. Up to version 5.7.5, MySQL accepts such invalid queries but reserves the right to return any value it wants (from the group) for the offending expressions.
Back to your question: as explained above, even without a GROUP BY clause, your query is processed as if it had one, and one group is created from all the rows filtered by the WHERE clause.
It is an empty group, but this doesn't prevent the database engine from generating a row from it. Since there are no values to use to compute SUM(pq_quantity), NULL is the logical value it returns in the columns of the result set.
NULL is a special value that means the absence of any value or an unknown value. It makes perfect sense in your case. You don't have any values in the tables, so there is no way one could compute SUM(pq_quantity). Its value is not available (i.e. NULL).
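The single implicit group can be observed directly. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption, so that it runs without a server) with two empty tables shaped like the ones in the question; the column types are guesses. It also shows that adding an explicit GROUP BY, one possible way to get no row at all, produces an empty result because there are then no groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (p_id INTEGER PRIMARY KEY, p_volusion_id TEXT);
    CREATE TABLE products_quantities (pq_product_idFk INTEGER, pq_quantity INTEGER);
""")

FROM = """
    FROM products LEFT JOIN products_quantities ON pq_product_idFk = p_id
    WHERE p_volusion_id = '37808'
"""

# No GROUP BY: one implicit (empty) group, hence exactly one result row.
no_group_by = conn.execute("SELECT COUNT(*), SUM(pq_quantity)" + FROM).fetchall()

# Explicit GROUP BY: no input rows means no groups, hence no result rows.
with_group_by = conn.execute(
    "SELECT p_id, SUM(pq_quantity)" + FROM + " GROUP BY p_id"
).fetchall()

print(no_group_by)
print(with_group_by)
```

The first query yields one row with a count of 0 and a NULL sum; the grouped query yields nothing.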

a comma between SELECT statements

I have this query:
SELECT (@a:=@a+1) AS priority
FROM (SELECT t1.name FROM t1 LIMIT 100) x, (SELECT @a:=0) r
a few questions:
1 - What is the comma doing between the SELECTS? I have never seen a comma between commands, and I don't know what it means
2 - why is the second SELECT given a name?
3 - why is the second SELECT inside brackets?
4 - Performance-wise: Does it select the first 100 rows from t1, and then assign them a number? What is going on here??
It is performing a CROSS JOIN (a cartesian product of the rows) but without the explicit syntax. The following 2 queries produce identical results:
SELECT *
FROM TableA, TableB
SELECT *
FROM TableA
CROSS JOIN TableB
The query in the question uses 2 "derived tables" instead. I would encourage you to use the explicit join syntax CROSS JOIN and never use just commas. The biggest issue with using just commas is you have no idea if the Cartesian product is deliberate or accidental.
Both "derived tables" have been given an alias - and that is a good thing. How else would you reference some item of the first or second "derived table"? e.g. Imagine they were both queries that had the column ID in them, you would then be able to reference x.ID or r.ID
Regarding what the overall query is doing. First note that the second query is just a single row (1 row). So even though the syntax produces a CROSS JOIN it does not expand the total number of rows because 100 * 1 = 100. In effect the subquery "r" is adding a "placeholder" @a (initially at value zero) on every row. Once that @a belongs on each row, then you can increment the value by 1 for each row, and as a result you get that column producing a row number.
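The equivalence of the comma and CROSS JOIN, and the fact that a one-row derived table does not multiply the row count, can be checked with a quick sketch. It runs on Python's sqlite3 module (an assumption for self-containment). SQLite has no MySQL-style user variables, so the second derived table just selects a constant under the made-up name seed instead of initialising a variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (name TEXT);
    INSERT INTO t1 VALUES ('a'), ('b'), ('c');
""")

# Comma syntax: an implicit cross join of two aliased derived tables.
comma = conn.execute("""
    SELECT x.name, r.seed
    FROM (SELECT name FROM t1 LIMIT 100) x, (SELECT 0 AS seed) r
    ORDER BY x.name
""").fetchall()

# The same query with the join written out explicitly.
explicit = conn.execute("""
    SELECT x.name, r.seed
    FROM (SELECT name FROM t1 LIMIT 100) x
    CROSS JOIN (SELECT 0 AS seed) r
    ORDER BY x.name
""").fetchall()

print(comma == explicit, len(comma))
```

Both forms return the same three rows (3 * 1 = 3), each with the constant from r attached, which is the role the one-row subquery plays in the original query.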
x and r are effectively anonymous views produced by the SELECT statements. If you imagine that instead of using SELECTs in brackets, you defined a view using the select statement and then referred to the view, the syntax would be clear.
The selects are given names so that you can refer to these names in WHERE conditions, joins or in the list of fields to select.
That is the syntax. You have to have brackets.
Yes, it selects the first 100 rows. I am not sure what you mean by "gives them a number".

How to optimize a MySQL update which contains an "in" subquery?

How do I optimize the following update because the sub-query is being executed for each row in table a?
update
a
set
col = 1
where
col_foreign_id not in (select col_foreign_id in b)
You could potentially use an outer join where there are no matching records instead of your not in:
update table1 a
left join table2 b on a.col_foreign_id = b.col_foreign_id
set a.col = 1
where b.col_foreign_id is null
This should use a simple select type rather than a dependent subquery.
Your current query (or the one that actually works since the example in the OP doesn't look like it would) is potentially dangerous in that a NULL in b.col_foreign_id would cause nothing to match, and you'd update no rows.
not exists would also be something to look at if you want to replace not in.
I can't tell you that this will make your query any faster, but there is some good info here. You'll have to test in your environment.
Here's a SQL Fiddle illuminating the differences between in, exists, and outer join (check the rows returned, null handling, and execution plans).
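If you want to reproduce the NULL pitfall and the two alternatives locally, here is a minimal sketch. It uses Python's sqlite3 module rather than MySQL (an assumption, so it runs standalone), with made-up data in tables a and b. SQLite does not accept MySQL's UPDATE ... LEFT JOIN syntax, so the anti-join is shown as a SELECT and the update itself uses NOT EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (col_foreign_id INTEGER, col INTEGER DEFAULT 0);
    CREATE TABLE b (col_foreign_id INTEGER);
    INSERT INTO a (col_foreign_id) VALUES (1), (2), (3);
    INSERT INTO b (col_foreign_id) VALUES (1), (NULL);  -- note the NULL
""")

# NOT IN against a list containing NULL is never true, so nothing matches.
not_in = conn.execute("""
    SELECT col_foreign_id FROM a
    WHERE col_foreign_id NOT IN (SELECT col_foreign_id FROM b)
""").fetchall()

# Anti-join: rows of a with no match in b; the NULL in b is harmless here.
anti_join = conn.execute("""
    SELECT a.col_foreign_id FROM a
    LEFT JOIN b ON a.col_foreign_id = b.col_foreign_id
    WHERE b.col_foreign_id IS NULL
    ORDER BY a.col_foreign_id
""").fetchall()

# NOT EXISTS finds the same rows and can drive the UPDATE directly.
updated = conn.execute("""
    UPDATE a SET col = 1
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.col_foreign_id = a.col_foreign_id)
""").rowcount

print(not_in, anti_join, updated)
```

With this data NOT IN selects nothing, while the anti-join and NOT EXISTS both pick the two unmatched rows. Whether either is faster on your tables is something only your own execution plans can tell you.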