avoid subquery by using "join" and "group by" - mysql

I'm trying to avoid a subquery but I'm not able to get the right result:
This is the original query:
SELECT T1.IdL, T1.REG, T1.YearIn, T1.URL,
(SELECT Count(*) FROM T2 WHERE T1.IdL = T2.IdL) AS IdL_Count
FROM T1
The following is an attempt to avoid the subquery, but it doesn't work because the rows with no records in T2 are missing:
SELECT T1.IdL, T1.REG, T1.YearIn, T1.URL, Count(*) AS IdL_Count
FROM T1 INNER JOIN T2 USING(IdL)
GROUP BY IdL
So I tried a LEFT JOIN, but then I get the wrong IdL_Count: 1 instead of 0.
Is there a way to avoid subquery or not?

There is no way to avoid the subquery, at least not without knowing the table structure (indexes etc.).
But this query should perform much better:
SELECT T1.IdL, T1.REG, T1.YearIn, T1.URL, coalesce(T3.count, 0) AS IdL_Count
FROM T1
LEFT JOIN (SELECT IdL, count(*) as count FROM T2 GROUP BY IdL) T3 on T3.IdL = T1.IdL
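A minimal sketch of the derived-table approach, using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are hypothetical, and the alias `count` is renamed `cnt` to avoid confusion with the function name. T2 is aggregated once, LEFT JOINed to T1, and COALESCE turns the NULL for unmatched T1 rows into 0:

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables T1 and T2 (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (IdL INTEGER PRIMARY KEY, REG TEXT);
    CREATE TABLE T2 (IdL INTEGER);
    INSERT INTO T1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO T2 VALUES (1), (1), (2);  -- IdL 3 has no rows in T2
""")

# Aggregate T2 once in a derived table, then LEFT JOIN it; COALESCE maps the
# NULL produced for unmatched T1 rows to 0.
rows = conn.execute("""
    SELECT T1.IdL, COALESCE(T3.cnt, 0) AS IdL_Count
    FROM T1
    LEFT JOIN (SELECT IdL, COUNT(*) AS cnt FROM T2 GROUP BY IdL) AS T3
           ON T3.IdL = T1.IdL
    ORDER BY T1.IdL
""").fetchall()
print(rows)  # [(1, 2), (2, 1), (3, 0)]
```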

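I think if you use COUNT(T2.somefield) (any non-nullable column of T2, e.g. T2.IdL) instead of COUNT(*) with the LEFT JOIN, it will return 0 where expected: COUNT(column) skips NULLs, whereas COUNT(*) also counts the NULL-filled row that the LEFT JOIN produces for a T1 row with no match. If that does not work, you can use SUM(IF(T2.IdL IS NULL, 0, 1)) AS IdL_Count instead.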
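To see the difference between COUNT(*) and COUNT(column) over a LEFT JOIN, here is a small sqlite3 sketch with hypothetical data. MySQL's IF() is written as the equivalent standard CASE expression, which both MySQL and SQLite accept:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (IdL INTEGER PRIMARY KEY);
    CREATE TABLE T2 (IdL INTEGER);
    INSERT INTO T1 VALUES (1), (2), (3);
    INSERT INTO T2 VALUES (1), (1), (2);  -- IdL 3 has no match
""")

query = """
    SELECT T1.IdL,
           COUNT(*)      AS count_star,  -- also counts the NULL-extended row
           COUNT(T2.IdL) AS count_col,   -- skips NULLs, so unmatched rows give 0
           SUM(CASE WHEN T2.IdL IS NULL THEN 0 ELSE 1 END) AS sum_case
    FROM T1
    LEFT JOIN T2 ON T2.IdL = T1.IdL
    GROUP BY T1.IdL
    ORDER BY T1.IdL
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 2, 2, 2)
# (2, 1, 1, 1)
# (3, 1, 0, 0)
```

Only the COUNT(*) column reports 1 for the unmatched IdL 3, which is exactly the symptom described in the question.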

Related

How to optimize mysql on left join

I'll try to explain at a very high level.
I have two complex SELECT queries (for the sake of example, I've reduced them to the following):
SELECT id, t3_id FROM t1;
SELECT t3_id, MAX(added) as last FROM t2 GROUP BY t3_id;
Query 1 returns 16k rows and query 2 returns 15k.
Each query individually takes less than 1 second to compute.
However, I need to sort the results by the added column from query 2. When I try to use a LEFT JOIN:
SELECT
t1.id, t1.t3_Id
FROM
t1
LEFT JOIN
(SELECT t3_id, MAX(added) as last FROM t2 GROUP BY t3_id) AS t_t2
ON t_t2.t3_id = t1.t3_id
GROUP BY t1.t3_id
ORDER BY t_t2.last
However, the execution time goes up to over 1 minute.
I'd like to understand the reason:
what causes such a huge increase?
NOTE:
All the columns used on every table have been indexed, e.g.:
table t1 has an index on id, t3_Id
table t2 has an index on t3_id and added
EDIT1
After @Tim Biegeleisen's suggestion, I changed the query to the following, and it now executes in about 16 seconds. If I remove the ORDER BY, the query executes in less than 1 second, so the ORDER BY is the sole reason for the slowdown.
SELECT
t1.id, t1.t3_Id
FROM
t1
LEFT JOIN
t2 ON t2.t3_id = t1.t3_id
GROUP BY t1.t3_id
ORDER BY MAX(t2.added)
Even though table t2 has an index on column t3_id, when you join t1 you are actually joining to a derived table, which either can't use the index, or can't use it completely effectively. Since t1 has 16K rows and you are doing a LEFT JOIN, this means the database engine will need to scan the entire derived table for each record in t1.
You should use MySQL's EXPLAIN to see what the exact execution strategy is, but my suspicion is that the derived table is what is slowing you down.
The correct query should be:
SELECT
t1.id,
t1.t3_Id,
MAX(t2.added) as last
FROM t1
LEFT JOIN t2 on t1.t3_Id = t2.t3_Id
GROUP BY t1.t3_Id -- group on the t1 side; grouping by t2.t3_id merges all unmatched t1 rows into one NULL group
ORDER BY last;
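The choice of GROUP BY column matters with a LEFT JOIN. This sqlite3 sketch (hypothetical data) runs the same join grouped by t2.t3_id and by t1.t3_id. MIN(t1.id) is used instead of a bare t1.id so the select list would also be valid under MySQL's ONLY_FULL_GROUP_BY mode:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, t3_id INTEGER);
    CREATE TABLE t2 (t3_id INTEGER, added TEXT);
    INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO t2 VALUES (10, '2020-01-01'), (10, '2020-02-01');  -- 20 and 30 unmatched
""")

def grouped_by(column):
    # column is one of two fixed identifiers below, never user input.
    sql = f"""
        SELECT {column} AS grp, MIN(t1.id) AS id, MAX(t2.added) AS last
        FROM t1
        LEFT JOIN t2 ON t1.t3_id = t2.t3_id
        GROUP BY {column}
        ORDER BY grp
    """
    return conn.execute(sql).fetchall()

by_t2 = grouped_by("t2.t3_id")  # unmatched t1 rows collapse into one NULL group
by_t1 = grouped_by("t1.t3_id")  # one group per t1.t3_id value
print(by_t2)  # [(None, 2, None), (10, 1, '2020-02-01')]
print(by_t1)  # [(10, 1, '2020-02-01'), (20, 2, None), (30, 3, None)]
```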
This happens because a temporary table is generated for each record.
I think you could try to sort everything after the records are available. Maybe:
select * from (
select t1.t3_id, t1.max_id, t2.last from
(select t3_id, max(id) as max_id from t1 group by t3_id) as t1
left join (select t3_id,max(added) as last from t2 group by t3_id) as t2
on t1.t3_id = t2.t3_id ) as xx
order by last
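The "aggregate first, then join and sort" idea can be sketched with sqlite3 and hypothetical data: each side is reduced to one row per t3_id before the join, so the final ORDER BY only sorts the small joined result. Whether this is faster in MySQL depends on the real plan, which EXPLAIN would show:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, t3_id INTEGER);
    CREATE TABLE t2 (t3_id INTEGER, added TEXT);
    INSERT INTO t1 VALUES (1, 10), (2, 10), (3, 20), (4, 30);
    INSERT INTO t2 VALUES (10, '2021-05-01'), (20, '2021-01-01'), (20, '2021-03-01');
""")

# One row per t3_id on each side, join the two small results, then sort.
rows = conn.execute("""
    SELECT a.t3_id, a.max_id, b.last
    FROM (SELECT t3_id, MAX(id) AS max_id FROM t1 GROUP BY t3_id) AS a
    LEFT JOIN (SELECT t3_id, MAX(added) AS last FROM t2 GROUP BY t3_id) AS b
           ON a.t3_id = b.t3_id
    ORDER BY b.last
""").fetchall()
print(rows)  # [(30, 4, None), (20, 3, '2021-03-01'), (10, 2, '2021-05-01')]
```

NULLs sort first in ascending order in both SQLite and MySQL, so the t3_id with no t2 rows comes first.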

MySQL WHERE EXISTS evaluating to true for all records

I'm trying to run a query that retrieves all records in a table that exist in a subquery.
However, it is returning all records instead of just the ones that I am expecting.
Here is the query:
SELECT DISTINCT x FROM T1 WHERE EXISTS
(SELECT * FROM T1 NATURAL JOIN T2 WHERE T2.y >= 3.0);
I've tried testing the subquery and it returns the correct number of records that meet my constraint.
But when I run the entire query, it returns records that should not exist in the subquery.
Why is EXISTS evaluating true for all the records in T1?
You need a correlated subquery, not a join in the subquery. It is unclear what the right correlation clause is, but something like this:
SELECT DISTINCT x
FROM T1
WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.COL = T1.COL AND T2.y >= 3.0);
Your query has an uncorrelated subquery. Whenever it returns at least one row, EXISTS is true for every row of T1. In the correlated version, there must be at least one matching row for the current T1 row: it "logically" runs the subquery once for each row of the outer T1.
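The difference can be reproduced with sqlite3. The schema is an assumption: both tables share a hypothetical id column, which is what the NATURAL JOIN joins on and what the correlated version compares:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (id INTEGER, x TEXT);
    CREATE TABLE T2 (id INTEGER, y REAL);
    INSERT INTO T1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO T2 VALUES (1, 3.5), (2, 1.0);  -- only id 1 has y >= 3.0
""")

# Uncorrelated: the subquery ignores the outer row and returns a row,
# so EXISTS is TRUE for every row of T1.
uncorrelated = conn.execute("""
    SELECT DISTINCT x FROM T1
    WHERE EXISTS (SELECT * FROM T1 NATURAL JOIN T2 WHERE T2.y >= 3.0)
    ORDER BY x
""").fetchall()

# Correlated: T2.id = T1.id ties the subquery to the current outer row.
correlated = conn.execute("""
    SELECT DISTINCT x FROM T1
    WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.id = T1.id AND T2.y >= 3.0)
    ORDER BY x
""").fetchall()

print(uncorrelated)  # [('a',), ('b',), ('c',)]
print(correlated)    # [('a',)]
```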
Q: Why is EXISTS evaluating true for all the records in T1?
A: Because the subquery returns a row, entirely independent of anything in the outer query.
The EXISTS predicate is simply checking whether the subquery is returning a row or not, and returning a boolean TRUE or FALSE.
You'd get the same result with:
SELECT DISTINCT x FROM T1 WHERE EXISTS (SELECT 1)
(The only difference is that if your subquery returned no rows at all, you'd get no rows from the outer query.)
There's no correlation between the rows returned by the subquery and the rows in the outer query.
I expect that there's another question you want to ask. And the answer to that really depends on what result set you are wanting to return.
If you are wanting to return rows from T1 that have some "matching" row in T2, you could use an EXISTS predicate with a correlated subquery.
Or, you could also use a join operation to return an equivalent result, for example:
SELECT DISTINCT T1.x
FROM T1
NATURAL
JOIN T2
WHERE T2.y >= 3.0
It isn't working because there is no correlation between the outer query and the subquery. Below, the correlation is the predicate T1.id = T2.id:
SELECT DISTINCT x
FROM T1
WHERE EXISTS ( SELECT 1 FROM T2 WHERE T2.y >= 3.0 and T1.id = T2.id)
;
But, without knowing the data, I'd hope you do NOT need DISTINCT in that query (EXISTS returns each qualifying T1 row at most once), and this would produce the same result:
SELECT x
FROM T1
WHERE EXISTS ( SELECT 1 FROM T2 WHERE T2.y >= 3.0 and T1.id = T2.id)
;
An alternative, which probably would require DISTINCT, is a variation of the subquery in your original query:
SELECT DISTINCT x FROM T1 NATURAL JOIN T2 WHERE T2.y >= 3.0
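Why the join version needs DISTINCT while the correlated EXISTS does not: a join emits one row per matching T2 row. A small sqlite3 sketch with hypothetical data where id 1 matches twice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (id INTEGER PRIMARY KEY, x TEXT);
    CREATE TABLE T2 (id INTEGER, y REAL);
    INSERT INTO T1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO T2 VALUES (1, 3.0), (1, 4.0), (2, 0.5);  -- id 1 matches twice
""")

# EXISTS returns each qualifying T1 row once, however many T2 rows match.
exists_rows = conn.execute("""
    SELECT x FROM T1
    WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.y >= 3.0 AND T1.id = T2.id)
""").fetchall()

# The join repeats 'a' once per matching T2 row ...
join_rows = conn.execute(
    "SELECT x FROM T1 NATURAL JOIN T2 WHERE T2.y >= 3.0").fetchall()

# ... so DISTINCT is needed to get the same result as EXISTS.
join_distinct = conn.execute(
    "SELECT DISTINCT x FROM T1 NATURAL JOIN T2 WHERE T2.y >= 3.0").fetchall()

print(exists_rows, join_rows, join_distinct)
# [('a',)] [('a',), ('a',)] [('a',)]
```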
You can use an INNER JOIN to get where you're trying to go:
SELECT DISTINCT T1.X
FROM T1
INNER JOIN T2
ON T2.COL = T1.COL
WHERE T2.Y >= 3.0

Create a VIEW where a record in t1 is not present in t2? Confirmation on Union/Left Join/Inner Join?

I am trying to make a view of records in t1 where the source id from t1 is not in t2.
Like... "what records are not present in the other table?"
Do I need to include t2 in the FROM clause? Thanks
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1
WHERE t1.fee_source_id NOT IN (
SELECT t1.fee_source_id
FROM t1 INNER JOIN t2 ON t1.fee_source_id = t2.fee_source
)
ORDER BY t1.aif_id DESC
You're looking to effect an anti-join, for which there are three possibilities in MySQL:
Using NOT IN:
SELECT fee_source_id, company_name, document
FROM t1
WHERE fee_source_id NOT IN (SELECT fee_source FROM t2)
ORDER BY aif_id DESC
Using EXISTS:
SELECT fee_source_id, company_name, document
FROM t1
WHERE NOT EXISTS (
SELECT * FROM t2 WHERE t2.fee_source = t1.fee_source_id LIMIT 1
)
ORDER BY aif_id DESC
Using JOIN:
SELECT t1.fee_source_id, t1.company_name, t1.document
FROM t1 LEFT JOIN t2 ON t2.fee_source = t1.fee_source_id
WHERE t2.fee_source IS NULL
ORDER BY t1.aif_id DESC
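The three anti-join forms can be checked against each other with sqlite3 (hypothetical data). The sketch also shows one well-known difference that matters when choosing between them: if t2.fee_source contains a NULL, NOT IN returns no rows at all, because `x NOT IN (..., NULL)` is never TRUE, while NOT EXISTS and LEFT JOIN / IS NULL are unaffected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (aif_id INTEGER PRIMARY KEY, fee_source_id INTEGER,
                     company_name TEXT, document TEXT);
    CREATE TABLE t2 (fee_source INTEGER);
    INSERT INTO t1 VALUES (1, 100, 'A', 'd1'), (2, 200, 'B', 'd2'), (3, 300, 'C', 'd3');
    INSERT INTO t2 VALUES (100), (300);
""")

queries = {
    "not_in": """SELECT fee_source_id FROM t1
                 WHERE fee_source_id NOT IN (SELECT fee_source FROM t2)
                 ORDER BY aif_id DESC""",
    "not_exists": """SELECT fee_source_id FROM t1
                     WHERE NOT EXISTS (SELECT 1 FROM t2
                                       WHERE t2.fee_source = t1.fee_source_id)
                     ORDER BY aif_id DESC""",
    "left_join": """SELECT t1.fee_source_id FROM t1
                    LEFT JOIN t2 ON t2.fee_source = t1.fee_source_id
                    WHERE t2.fee_source IS NULL
                    ORDER BY t1.aif_id DESC""",
}
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
print(results)  # all three: [(200,)]

# Pitfall: a single NULL in t2.fee_source empties the NOT IN result.
conn.execute("INSERT INTO t2 VALUES (NULL)")
with_null = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
print(with_null)  # not_in: [], not_exists and left_join: [(200,)]
```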
According to @Quassnoi's analysis:
Summary
MySQL can optimize all three methods to do a sort of NESTED LOOPS ANTI JOIN.
It will take each value from t_left and look it up in the index on t_right.value. In case of an index hit or an index miss, the corresponding predicate will immediately return FALSE or TRUE, respectively, and the decision to return the row from t_left or not will be made immediately without examining other rows in t_right.
However, these three methods generate three different plans which are executed by three different pieces of code. The code that executes EXISTS predicate is about 30% less efficient than those that execute index_subquery and LEFT JOIN optimized to use Not exists method.
That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.
However, I'm not entirely sure how this analysis reconciles with the MySQL manual section on Optimizing Subqueries with EXISTS Strategy which (to my reading) suggests that the second approach above should be more efficient than the first.
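Another option (similar to an anti-join) is a set difference. MINUS is Oracle syntax and MySQL does not support it, but MySQL 8.0.31 and later support the standard EXCEPT operator:
SELECT D1.deptno, D1.dname
FROM dept D1
EXCEPT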
SELECT D2.deptno, D2.dname
FROM dept D2, emp E2
WHERE D2.deptno = E2.deptno
ORDER BY 1;
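A sketch of the set-difference form in sqlite3, which also supports EXCEPT. The dept/emp rows are hypothetical; EXCEPT removes duplicates, so departments with several employees do not need extra handling:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH'), (40, 'OPERATIONS');
    INSERT INTO emp VALUES (1, 10), (2, 20), (3, 20);
""")

# All departments minus the departments that have at least one employee.
rows = conn.execute("""
    SELECT D1.deptno, D1.dname FROM dept D1
    EXCEPT
    SELECT D2.deptno, D2.dname FROM dept D2 JOIN emp E2 ON D2.deptno = E2.deptno
    ORDER BY 1
""").fetchall()
print(rows)  # [(40, 'OPERATIONS')]
```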

MySQL correlated subquery SUM() ORDER BY

Is there any way to optimize the following query:
SELECT
t1.id,
(SELECT SUM(col1) FROM table_name_two t2 WHERE t2.name LIKE CONCAT('%',t1.name)) AS col1_count
FROM
table_name_one t1
ORDER BY
col1_count DESC
Using ORDER BY col1_count DESC takes a long time.
Thanks.
Just make a normal join with your comparison in the join's on clause:
SELECT
t1.id,
SUM(t2.col1) AS col1_count
FROM table_name_one t1
LEFT JOIN table_name_two t2 on t2.name LIKE CONCAT('%', t1.name)
GROUP BY 1
ORDER BY 2 DESC
It should be much faster this way: it's basically one query instead of n queries, although the LIKE comparison with a leading % still can't use an index.
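A quick sqlite3 check (hypothetical data) that the join + GROUP BY rewrite returns the same rows as the correlated subquery. Older SQLite versions lack CONCAT(), so the sketch uses SQLite's || operator; in MySQL keep CONCAT('%', t1.name), because || means logical OR there by default:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_name_one (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_name_two (name TEXT, col1 INTEGER);
    INSERT INTO table_name_one VALUES (1, 'foo'), (2, 'bar'), (3, 'baz');
    INSERT INTO table_name_two VALUES ('xfoo', 5), ('foo', 2), ('ybar', 1);
""")

# Original form: one correlated SUM per row of table_name_one.
correlated = conn.execute("""
    SELECT t1.id,
           (SELECT SUM(col1) FROM table_name_two t2
             WHERE t2.name LIKE '%' || t1.name) AS col1_count
    FROM table_name_one t1
    ORDER BY col1_count DESC, t1.id
""").fetchall()

# Rewrite: a single LEFT JOIN aggregated with GROUP BY.
joined = conn.execute("""
    SELECT t1.id, SUM(t2.col1) AS col1_count
    FROM table_name_one t1
    LEFT JOIN table_name_two t2 ON t2.name LIKE '%' || t1.name
    GROUP BY t1.id
    ORDER BY col1_count DESC, t1.id
""").fetchall()

print(correlated)  # [(1, 7), (2, 1), (3, None)]
print(joined)      # [(1, 7), (2, 1), (3, None)]
```

In both forms a row with no match gets NULL rather than 0; wrap the SUM in COALESCE(..., 0) if 0 is wanted.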

sql query in "WHERE IN" clause

select * from tablename
where id in(select id from tablename2 where condition UNION select -1)
Is it OK to add SELECT -1? I thought that if the inner query returns nothing, it would give an error. Is it feasible or not?
IMHO, an inner select is far from ideal (slow).
Based on your posted SQL, an inner join will do the trick:
select *
from tablename as t1
inner join tablename2 as t2
on t1.id=t2.id
where condition; -- your condition
If you have to do it with a subquery, then the correct way would probably be:
SELECT *
FROM tablename AS t1
WHERE EXISTS
(SELECT id
FROM tablename2 AS t2
WHERE t2.id = t1.id AND conditions)
Neither form gives an error if the inner query returns nothing; you just get an empty result set, so the UNION SELECT -1 is unnecessary.
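A sqlite3 sketch of both points, with hypothetical tables and a hypothetical flag column standing in for "condition": an IN subquery that matches nothing simply yields an empty result, and the EXISTS form needs the t2.id = t1.id correlation, otherwise it only tests whether tablename2 has any qualifying row at all:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablename (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE tablename2 (id INTEGER, flag INTEGER);
    INSERT INTO tablename VALUES (1, 'a'), (2, 'b');
    INSERT INTO tablename2 VALUES (1, 0), (2, 0);  -- no row has flag = 1
""")

# IN over an empty subquery result is just FALSE: no error, no rows.
in_rows = conn.execute("""
    SELECT * FROM tablename
    WHERE id IN (SELECT id FROM tablename2 WHERE flag = 1)
""").fetchall()

# Correlated EXISTS: each outer row is checked against its own id.
exists_rows = conn.execute("""
    SELECT * FROM tablename AS t1
    WHERE EXISTS (SELECT 1 FROM tablename2 AS t2
                  WHERE t2.id = t1.id AND t2.flag = 0)
    ORDER BY t1.id
""").fetchall()

print(in_rows)      # []
print(exists_rows)  # [(1, 'a'), (2, 'b')]
```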