Subqueries with EXISTS vs IN - MySQL - mysql

Below two queries are subqueries. Both are the same and both works fine for me. But the problem is Method 1 query takes about 10 secs to execute while Method 2 query takes under 1 sec.
I was able to convert method 1 query to method 2 but I don't understand what's happening in the query. I have been trying to figure it out myself. I would really like to learn what's the difference between below two queries and how does the performance gain happen ? what's the logic behind it ?
I'm new to these advance techniques. I hope someone will help me out here. Given that I read the docs which does not give me a clue.
Method 1 :
SELECT
*
FROM
tracker
WHERE
reservation_id IN (
SELECT
reservation_id
FROM
tracker
GROUP BY
reservation_id
HAVING
(
method = 1
AND type = 0
AND Count(*) > 1
)
OR (
method = 1
AND type = 1
AND Count(*) > 1
)
OR (
method = 2
AND type = 2
AND Count(*) > 0
)
OR (
method = 3
AND type = 0
AND Count(*) > 0
)
OR (
method = 3
AND type = 1
AND Count(*) > 1
)
OR (
method = 3
AND type = 3
AND Count(*) > 0
)
)
Method 2 :
SELECT
*
FROM
`tracker` t
WHERE
EXISTS (
SELECT
reservation_id
FROM
`tracker` t3
WHERE
t3.reservation_id = t.reservation_id
GROUP BY
reservation_id
HAVING
(
METHOD = 1
AND TYPE = 0
AND COUNT(*) > 1
)
OR
(
METHOD = 1
AND TYPE = 1
AND COUNT(*) > 1
)
OR
(
METHOD = 2
AND TYPE = 2
AND COUNT(*) > 0
)
OR
(
METHOD = 3
AND TYPE = 0
AND COUNT(*) > 0
)
OR
(
METHOD = 3
AND TYPE = 1
AND COUNT(*) > 1
)
OR
(
METHOD = 3
AND TYPE = 3
AND COUNT(*) > 0
)
)

An Explain Plan would have shown you why exactly you should use Exists. Usually the question comes Exists vs Count(*). Exists is faster. Why?
With regard to challenges present by NULL: when subquery returns Null, for IN the entire query becomes Null. So you need to handle that as well. But using Exist, it's merely a false. Much easier to cope. Simply IN can't compare anything with Null but Exists can.
e.g. Exists (Select * from yourtable where bla = 'blabla'); you get true/false the moment one hit is found/matched.
In this case IN sort of takes the position of the Count(*) to select ALL matching rows based on the WHERE because it's comparing all values.
But don't forget this either:
EXISTS executes at high speed against IN : when the subquery results is very large.
IN gets ahead of EXISTS : when the subquery results is very small.
Reference to for more details:
subquery using IN.
IN - subquery optimization
Join vs. sub-query.

Method 2 is fast because it is using EXISTS operator, where I MySQL do not load any results.
As mentioned in your docs link as well, that it omits whatever is there in SELECT clause. It only checks for the first value that matches the criteria, once found it sets the condition TRUE and moves for further processing.
On the other side Method 1 has IN operator which loads all possible values and then matches it. Condition is set TRUE only when exact match is found which is time consuming process.
Hence your method 2 is fast.
Hope it helps...

The EXISTS operator is a Boolean operator that returns either true or false. The EXISTS operator is often used the in a subquery to test for an “exist” condition.
SELECT
select_list
FROM
a_table
WHERE
[NOT] EXISTS(subquery);
If the subquery returns any row, the EXISTS operator returns true, otherwise, it returns false.
In addition, the EXISTS operator terminates further processing immediately once it finds a matching row. Because of this characteristic, you can use the EXISTS operator to improve the performance of the query in some cases.
The NOT operator negates the EXISTS operator. In other words, the NOT EXISTS returns true if the subquery returns no row, otherwise it returns false.
You can use SELECT *, SELECT column, SELECT a_constant, or anything in the subquery. The results are the same because MySQL ignores the select_list that appears in the SELECT clause.
The reason is that the EXISTS operator works based on the “at least found” principle. It returns true and stops scanning table once at least one matching row found.
On the other hands, when the IN operator is combined with a subquery, MySQL must process the subquery first and then uses the result of the subquery to process the whole query.
The general rule of thumb is that if the subquery contains a large volume of data, the EXISTS operator provides a better performance.
However, the query that uses the IN operator will perform faster if the result set returned from the subquery is very small.
For detail explanations and examples: MySQL EXISTS - mysqltutorial.org

The second Method is faster because you've got this like there "WHERE t3.reservation_id = t.reservation_id". In the first case your subquery has to do a full scan into the table to verify the information. However at the 2o Method the subquery knows exactly what it is looking for and once it is found is checked the having condition then.

Their Official Documentation.SubQuery Optimization with Exists

Related

Can a mysql query be extended as child of another query?

Lets assume we have got query1 as follow :
select * from users where status = 1
this will output some results,I can cache these data, now the second query is :
select * from users where status = 1 and point >= 50
as you see the second query is somehow the child of first query, it returns a subset of last query data and has common code as well, is there a way which I can speed up my second query by using first query results and shorten my code using the first query code?
Yes, you use nested queries:
select x.*
from
(
select * from users
where status = 1
) as x
where x.point >= 50;

sub query returns more than 1 - issue with passing values into subquery

I am running the following query which keep stating that more then one row is given:
select filestorage.id
from `filestorage`
where (SELECT LEFT(filestorage_inst_data.value, length(filestorage_inst_data.value - 1)) as seconds
FROM filestorage_inst_data
WHERE filestorage_inst_data.parameter = "Time" AND filestorage_inst_data.filestorage_id = filestorage.id) <= 3600
For some reason, the only very first value is passed into the subquery. Also, if I do set a limit within the subquery than the data is fetched fine, it's just I don't see why query would fetch multiple results?
Try this:
SELECT filestorage.id
FROM filestorage f
WHERE EXISTS(SELECT 1 FROM filestorage_inst_data fid
WHERE fid.parameter = 'Time'
AND fid.filestorage_id = f.id
AND CAST(LEFT(fid.value, length(fid.value - 1)) AS UNSIGNED) <= 3600)
You have to pass a specific one row when giving a select statement on where clause. select clause that, the one you using on where clause must return one unique row. for example.
"SELECT * FROM user WHERE role_id=(SELECT role_id FROM user_role WHERE role_name='Admin');"

Set a list in a variable in subquery - MYSQL

My problem is the following, I want set a list of ID in a variable, then use this variable in a subquery. The problem is that WorkBench (my GUI) return the following error : "subquery returning multiple rows". It seems to me that's what I want.
Please explain me where I am wrong.
This is my query :
set #listID := (select ID_VOIE as ID from voies
where ORIGINE = 'XXX'
group by CODE_INSEE, CODE_VOIE
having count(*) > 1);
select substring(v.CODE_INSEE,1,2), count(*) from voies v
where v.ID_VOIE in (#listID)
group by substring(vs.CODE_INSEE,1,2);
The thing is I'm blocked with the "group by", I want do a groupd by after a first group by, that's why I can't (or at least i didn't find a way) write the request with a single WHERE clause.
The thing is I know that I can put the whole request directly in my subquery instead of using variable but :
It can let me use this trick in another requests that needed this behaviour (DRY concept !)
I'm not sure but the subquery will be executed in each turn of my loop, and that will be very inefficient
So I seek 2 possible ways : a way that let me use a list in a variable in a subquery OR a way that let me use "group by" twice in a single query.
Thanks you in advance for your answers (oh and sorry for my english, this is not my maternal language).
Unless you need that variable for something else, you should be able to skip it entirely as follows:
SELECT
SUBSTRING(v.CODE_INSEE,1,2),
COUNT(*)
FROM
voies v
WHERE
v.ID_VOIE in
(SELECT
ID_VOIE as ID
FROM
voies
WHERE
ORIGINE = 'XXX'
GROUP BY
CODE_INSEE,
CODE_VOIE
HAVING COUNT(*) > 1)
GROUP BY
SUBSTRING(vs.CODE_INSEE,1,2);
As you say, the subquery will be executed for all rows. To avoid that, a variable would be best, but MySQL doesn't support table variables. Instead, you can use a temporary table:
IF EXISTS DROP TABLE myTempTable;
CREATE TEMPORARY TABLE myTempTable (ID_VOIE int); -- I don't know the datatype
INSERT INTO myTempTable (ID_VOIE)
SELECT DISTINCT -- using distinct so I can join instead of use IN.
ID_VOIE as ID from voies
WHERE
ORIGINE = 'XXX'
GROUP BY
CODE_INSEE, CODE_VOIE
HAVING COUNT(*) > 1
And now you can do this:
SELECT
SUBSTRING(v.CODE_INSEE,1,2), COUNT(*)
FROM
voies v
JOIN myTempTable tt ON
v.ID_VOIE = tt.ID_VOIE
GROUP BY SUBSTRING(vs.CODE_INSEE,1,2);

Is there any way to execute a SELECT only if a condition in a different table is met?

Is there any way to do that in a single query? Or do I have to manage it externally? It is not a JOIN of any kind.
SELECT
IF (
(SELECT indicator FROM configuration_table) = 1,
(SELECT series_id FROM series_table LIMIT 1),
''
) as if_exp
FROM
series_table
This executes but returns the first ID over and over, and if I take out the LIMIT 1, it doesn't work as it expects only one result. But what I need is that, if this condition is met:
(SELECT indicator FROM configuration_table) = 1,
Then I need all this data returned:
SELECT series_id, series_code, series_name FROM series_table
Is it possible somehow? Should I be doing two queries and managing the data from php? Thank you very much.
The easiest way would be:
IF ((SELECT indicator FROM configuration_table) = 1) THEN
SELECT series_id, series_code, series_name FROM series_table
END IF
You did not show us what to do, when the condition is false. We do not know the relationship between configuration_table and series_table, so we can't find a way to make it in a single query.
I have copied this answer from IF Condition Perform Query, Else Perform Other Query this answer.
SELECT CASE WHEN ( (SELECT indicator FROM configuration_table) = 1 )
THEN
SELECT series_id, series_code, series_name FROM series_table
ELSE
<QUERY B>
END
Here Query B should replaced by your desired query.

MySQL SELECT 1 vs SELECT `field_id` AND COUNT 1 vs COUNT (*) or COUNT (`field_id`) Performance wise

I have a very simple question.
I want to know if a certain database row exists.
I generally use :
SELECT 1 FROM `my_table` WHERE `field_x` = 'something'
Then I fetch the result with :
$row = self::$QueryObject->fetch();
And check if any results :
if(isset($row[1]) === true){
return(true);
}
You can do this also with :
COUNT 1 FROM `my_table` WHERE `field_x` = 'something'
And similar to COUNT * FROMmy_tableandCOUNT field_id FROM `my_table
But I was wondering.. How does this relate to performance?
Are there any cons to using SELECT 1 or COUNT 1??
My feeling says that select INTEGER 1 means the lowest load.
But is this actually true??
Can anyone enlighten me?
Actually all your solutions are suboptimal :) What you do with your queries is reading every row there is to be found, even if you add limit. Do it like this:
SELECT EXISTS ( SELECT 1 FROM `my_table` WHERE `field_x` = 'something');
EXISTS returns 1 if something was found, 0 if not. It stops searching as soon as an entry was found. What you select in the subquery doesn't matter, you can even select null.
Also keep in mind, that COUNT(*) or COUNT(1) are very different from COUNT(column_name). COUNT(*) counts every row, while COUNT(column_name) only count the rows that are not null.
If you add the LIMIT 1 to the end of the query then SELECT works better than COUNT especially when you have a large table.