MySQL subquery row comparison logic

I have a question about subquery logic. As I understand it, SQL always evaluates the subquery first and then the outer query. However, an example from the official documentation does not seem to support that.
The example below is from the official MySQL documentation:
https://dev.mysql.com/doc/refman/8.0/en/subqueries.html
Here is another example, which again is impossible with a join
because it involves aggregating for one of the tables.
It finds all rows in table t1 containing a value that occurs twice in a given column:
SELECT * FROM t1 AS t
WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id);
As I understand it, when SQL evaluates the subquery
(SELECT COUNT(*) FROM t1 WHERE t1.id = t.id)
it returns a scalar computed over the table. Comparing that scalar with 2 is then a simple true-or-false question, so the query should return either every row in t1 or nothing.
Obviously that is not what the official documentation describes. According to the documentation, it is more like checking t1 row by row: for each row the subquery result is compared with the scalar 2, and this repeats until the end of the table.
Why does it work like that? I have some difficulty understanding the logic.
Any further explanation would help.

There is no difficulty.
It all depends on how many rows you get back. The subquery references t.id from the outer query, so it is a correlated subquery: it is evaluated for every row of t1 and checks whether the table has exactly 2 rows with that value, as you can see here https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=d70942c0321f9310c3b89dbc9435b418
(SELECT COUNT(*) FROM t1 WHERE t1.id = t.id)
returns 1 value (1 column, 1 row), for example 3, which is compared to your 2 with =
As long as the subquery returns only 1 value, you can compare it with a single value using =
For multiple rows in the result set you need IN, like:
SELECT * FROM t1 AS t
WHERE cis IN (SELECT t1.id FROM t1 WHERE t1.id = t.id);
And of course you can compare multiple columns (here too you need IN if the subquery returns multiple rows):
SELECT * FROM t1 AS t
WHERE (2,2) = (SELECT COUNT(*),SUM(abc = 'test') FROM t1 WHERE t1.id = t.id);
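
The row-by-row behaviour described above can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table contents (including the label column) are made up for illustration; only the two queries mirror the ones discussed in this thread.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption of this sketch);
# the rows and the "label" column are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")],
)

# The subquery mentions t.id, a column of the outer query, so its value
# depends on which outer row is being tested.
rows = conn.execute(
    """
    SELECT t.id, t.label,
           (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id) AS n
    FROM t1 AS t
    ORDER BY t.id, t.label
    """
).fetchall()
for row in rows:
    print(row)

# Only the outer rows whose own count equals 2 pass the filter.
matches = conn.execute(
    """
    SELECT * FROM t1 AS t
    WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id)
    ORDER BY t.label
    """
).fetchall()
print(matches)
```

The first query shows that the subquery yields a different count per outer row, which is why the filter keeps some rows and drops others instead of being all-or-nothing.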

Related

Create a new variable in SQL by groupby

I have 2 SQL tables as follows:
First table t1:
Second table t2:
I need to calculate the count of the "Number" column grouped by the "Name" column from t1 and merge it with t2.
I wrote the following code, but it does not seem to work:
select *
from (
select Name, count(Number) as count
from t1
group by Name ) as a
join ( select *
from t2 ) as b
on a.Name = b.Name;
Can anyone figure out what is wrong? Thank you very much.
I think you want to use SUM() instead of COUNT().
SUM() adds up the integer values, while COUNT() counts the number of occurrences.
And as also stated in the comments, multiple columns with the same name will create conflicts, so you have to select the wanted columns explicitly (that is usually a good idea anyway).
You could get the result you want with this query:
select
SUM(Number),
t1.Name,
(select val1 FROM t2 WHERE t2.Name = t1.Name LIMIT 1) as val1
FROM t1
GROUP BY t1.Name
Example in sqlfiddle: http://sqlfiddle.com/#!9/04dddf/7
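
To make the SUM() versus COUNT() difference concrete, here is a minimal sketch using Python's sqlite3 module in place of MySQL. The original tables were not preserved in the question, so the rows below are invented purely for illustration.

```python
import sqlite3

# Hypothetical data: the real t1/t2 contents are not shown in the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (Name TEXT, Number INTEGER);
    CREATE TABLE t2 (Name TEXT, val1 TEXT);
    INSERT INTO t1 VALUES ('a', 1), ('a', 2), ('b', 5);
    INSERT INTO t2 VALUES ('a', 'x'), ('b', 'y');
    """
)

# COUNT() counts rows per group, SUM() adds the values per group.
result = conn.execute(
    """
    SELECT t1.Name,
           COUNT(Number) AS cnt,
           SUM(Number)   AS total,
           (SELECT val1 FROM t2 WHERE t2.Name = t1.Name LIMIT 1) AS val1
    FROM t1
    GROUP BY t1.Name
    ORDER BY t1.Name
    """
).fetchall()
print(result)
```

Naming each selected column explicitly, as done here, also avoids the duplicate Name column that select * produces when both sides of the join expose it.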

MySQL - NOT IN takes time when the nested select query is dynamic but not if it's constant

T1 contains about 30 million rows,
T2 contains about 100k rows
select a from T2 gives ('a1','a2','a3',...); (100k rows)
When I use the 100k constant values directly inside the IN list, the query returns its result in 80 milliseconds. But when I use a nested select in the query, it takes forever.
select a,b from T1 where a in ('a1','a2','a3', ...); (Constant Values inside in block)
select a,b from T1 where a in (select a from T2); (Query instead of values)
Any idea why this is happening? Also, is there a better way to do it?
Since T1 contains 30 million rows, Left Join also takes a lot of time.
My actual query is:
select a,b from t1 where (a,b) not in (select a,b from t2) and a in (select a from t1);
There is a third, better, way:
SELECT ...
FROM T1
JOIN T2 ON T1.a = T2.a;
(And be sure that there is an index on a in each table.)
IN ( SELECT ... ) is notoriously slow; avoid it.
The subquery is more expensive in MySQL because MySQL hasn't optimized that type of query very well. MySQL doesn't notice that the subquery is invariant; in other words, the subquery on T2 has the same result regardless of which row of T1 is being searched. You and I can see clearly that MySQL should execute the subquery once and use its result to evaluate each row of T1.
But in fact, MySQL naively assumes that the subquery is a correlated subquery that may have a different result for each row in T1. So it is forced to run the subquery many times.
You clarified in a comment that your query is actually:
select a,b from t1
where (a,b) not in (select a,b from t2)
and a in (select a from t1);
You should also know that MySQL does not optimize tuple comparison at all. Even if it should use an index, it will do a table-scan. I think they're working on fixing that in MySQL 8.
Your second term is unnecessary, because the subquery is selecting from t1, so obviously any value of a in t1 exists in t1. Did you mean to put t2 in the subquery? I'll assume that you just made an error typing.
Here's how I would write your query:
select a, b from t1
left outer join (select distinct a, b from t2) as nt2
on t1.a = nt2.a and t1.b = nt2.b
where nt2.a is null;
In these cases, MySQL treats the subquery differently because it appears in the FROM clause. It runs each subquery once and stores the results in temp tables. Then it evaluates the rows of t1 against the data in the temp tables.
The use of distinct is to make sure to do the semi-join properly; if there are multiple matching rows in t2, we don't want multiple rows of output from the query.
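
As a rough check of the rewritten query's logic (not of its performance), the sketch below runs it with Python's sqlite3 module standing in for MySQL. The rows are made up, and it assumes the columns contain no NULLs, since NOT IN and an outer join treat NULLs differently.

```python
import sqlite3

# Hypothetical sample rows; SQLite is a stand-in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (a TEXT, b INTEGER);
    CREATE TABLE t2 (a TEXT, b INTEGER);
    INSERT INTO t1 VALUES ('a1', 1), ('a2', 2), ('a3', 3);
    INSERT INTO t2 VALUES ('a2', 2), ('a2', 2), ('a4', 4);
    """
)

# Keep the t1 rows for which the outer join found no partner in t2.
anti = conn.execute(
    """
    SELECT t1.a, t1.b
    FROM t1
    LEFT OUTER JOIN (SELECT DISTINCT a, b FROM t2) AS nt2
      ON t1.a = nt2.a AND t1.b = nt2.b
    WHERE nt2.a IS NULL
    ORDER BY t1.a
    """
).fetchall()
print(anti)
```

Only the (a, b) pairs of t1 that have no counterpart in t2 come back, which is the result the NOT IN form was meant to produce.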

MySQL WHERE EXISTS evaluating to true for all records

I'm trying to run a query that retrieves all records in a table that exist in a subquery.
However, it is returning all records instead of just the ones that I am expecting.
Here is the query:
SELECT DISTINCT x FROM T1 WHERE EXISTS
(SELECT * FROM T1 NATURAL JOIN T2 WHERE T2.y >= 3.0);
I've tried testing the subquery and it returns the correct number of records that meet my constraint.
But when I run the entire query it returns records that should not exist in the subquery.
Why is EXISTS evaluating true for all the records in T1?
You need a correlated subquery, not a join in the subquery. It is unclear what the right correlation clause is, but something like this:
SELECT DISTINCT x
FROM T1
WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.COL = T1.COL AND T2.y >= 3.0);
Your query has a regular subquery. Whenever it returns at least one row, then the exists is true. So, there must be at least one matching row. This version "logically" runs the subquery for each row in the outer T1.
Q: Why is EXISTS evaluating true for all the records in T1?
A: Because the subquery returns a row, entirely independent of anything in the outer query.
The EXISTS predicate is simply checking whether the subquery is returning a row or not, and returning a boolean TRUE or FALSE.
You'd get the same result with:
SELECT DISTINCT x FROM T1 WHERE EXISTS (SELECT 1)
(The only difference would be if that subquery didn't return at least one row, then you'd get no rows returned in the outer query.)
There's no correlation between the rows returned by the subquery and the rows in the outer query.
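
A small sketch of that difference, using Python's sqlite3 module as a stand-in for MySQL. The shared id column and the sample rows are assumptions for illustration; the question does not show the real schema.

```python
import sqlite3

# Hypothetical schema: T1 and T2 share an "id" column.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE T1 (id INTEGER, x TEXT);
    CREATE TABLE T2 (id INTEGER, y REAL);
    INSERT INTO T1 VALUES (1, 'p'), (2, 'q'), (3, 'r');
    INSERT INTO T2 VALUES (1, 3.5), (2, 1.0);
    """
)

# Uncorrelated: the subquery returns a row no matter which outer row is
# being tested, so EXISTS is true for every row of T1.
uncorrelated = conn.execute(
    """
    SELECT DISTINCT x FROM T1 WHERE EXISTS
      (SELECT * FROM T1 NATURAL JOIN T2 WHERE T2.y >= 3.0)
    ORDER BY x
    """
).fetchall()

# Correlated: the subquery refers to the outer row through T1.id, so it
# is true only for outer rows that have a qualifying partner in T2.
correlated = conn.execute(
    """
    SELECT DISTINCT x FROM T1 WHERE EXISTS
      (SELECT 1 FROM T2 WHERE T2.id = T1.id AND T2.y >= 3.0)
    ORDER BY x
    """
).fetchall()

print(uncorrelated)
print(correlated)
```

With this data the first query returns every x, while the second returns only the x whose id has a T2 row with y >= 3.0.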
I expect that there's another question you want to ask. And the answer to that really depends on what result set you are wanting to return.
If you want to return rows from T1 that have some "matching" row in T2, you could use an EXISTS predicate with a correlated subquery.
Or, you could also use a join operation to return an equivalent result, for example:
SELECT DISTINCT T1.x
FROM T1
NATURAL
JOIN T2
WHERE T2.y >= 3.0
It isn't working because there is no correlation between the outer query and the subquery being used. Below there is a correlation in the form of and T1.id = T2.id
SELECT DISTINCT x
FROM T1
WHERE EXISTS ( SELECT 1 FROM T2 WHERE T2.y >= 3.0 and T1.id = T2.id)
;
But, without knowing the data I'd hope you do NOT need to use "distinct" in that query, and this would produce the same result:
SELECT x
FROM T1
WHERE EXISTS ( SELECT 1 FROM T2 WHERE T2.y >= 3.0 and T1.id = T2.id)
;
An alternative, which probably would require distinct, is a variation of the second half of your query:
SELECT DISTINCT x FROM T1 NATURAL JOIN T2 WHERE T2.y >= 3.0
You can use an INNER JOIN to get where you're trying to go:
SELECT DISTINCT T1.X
FROM T1
INNER JOIN T2
ON T2.COL = T1.COL
WHERE T2.Y >= 3.0
Share and enjoy.

UPDATE with nested query in PostgreSQL with unwanted results

Because of its geographic capabilities I'm migrating my database from MySQL to PostgreSQL/PostGIS, and SQL that used to be trivial is now painfully slow to get right.
In this case I use a nested query that returns two columns, an ID in the first and a count in the second, and I want to write those counts into table1.
EDIT: This is the original MySQL working code that I need to be working in PostgreSQL:
UPDATE table1 INNER JOIN (
SELECT id, COUNT(*) AS cnt
FROM table2
GROUP BY id
) AS c ON c.id = table1.id
SET table1.cnt = c.cnt
In PostgreSQL, the result is that every row gets the same count, namely the first count returned by the nested select.
In MySQL this would be solved easily.
How would this work in PostgreSQL?
Thank you!
UPDATE table1 dst
SET cnt = src.cnt
FROM (SELECT id, COUNT (*) AS cnt
FROM table2
GROUP BY id) as src
WHERE src.id = dst.id
;

SQL "IN" combined with "=" in WHERE clause

I'm struggling with someone else's code. What might the WHERE clause do in the following (MySQL) statement?
SELECT * FROM t1, t2 WHERE t1.id = t2.id IN (1,2,3)
It's not providing the desired result in my case, but I'm trying to figure what the original author intended.
Can anyone provide an example of the use of a WHERE clause like this?
This condition starts from the right, evaluates t2.id IN (1,2,3), gets the result (0 or 1), and uses it for join with t1.id. All rows of t2 with id from the IN list are joined to the row in t1 that has id of one; all other rows of t2 are joined with the row in t1 that has id of zero. Here is a small demo on sqlfiddle.com: link.
It is hard to imagine that that was the intent of the author, however: I think a more likely check was for both items to be in the list, and also being equal to each other. The equality to each other is important, because it looks like the author wanted to join the two tables.
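
Both readings can be compared in a short sketch. It uses Python's sqlite3 module as a stand-in for MySQL and invented rows, and it writes the parentheses out explicitly so the result does not depend on either engine's operator precedence.

```python
import sqlite3

# Hypothetical sample data; SQLite is a stand-in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER);
    INSERT INTO t1 VALUES (0), (1), (2), (3);
    INSERT INTO t2 VALUES (1), (2), (5);
    """
)

# Grouping described in the answer: t2.id IN (...) yields 0 or 1, and
# that 0/1 value is then compared with t1.id.
as_written = conn.execute(
    """
    SELECT t1.id, t2.id FROM t1, t2
    WHERE t1.id = (t2.id IN (1, 2, 3))
    ORDER BY t1.id, t2.id
    """
).fetchall()

# Likely intent: ids are equal to each other and also in the list.
likely_intent = conn.execute(
    """
    SELECT t1.id, t2.id FROM t1 JOIN t2 ON t1.id = t2.id
    WHERE t1.id IN (1, 2, 3)
    ORDER BY t1.id
    """
).fetchall()

print(as_written)
print(likely_intent)
```

The first result pairs t1 rows whose id is 0 or 1 with t2 rows by list membership, which is rarely what anyone wants; the second is an ordinary equi-join restricted to the listed ids.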
A more modern way of doing joins is with ANSI SQL syntax. Here is the equivalent of your query in ANSI SQL:
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id IN (1,2,3)