Different result of not exists and not in - mysql

I hava two very similar queries, however they return different result.
The first one:
select * from products p
where p.val >= 999999 and
not exists
(select * from products p2 where p2.val < 999999 and p.user_id = p2.user_id);
The second one:
select * from products p
where p.val >= 999999 and
p.user_id not in (select user_id from products p2 where p2.val < 999999);
The first one gave me the right answer, while the second one gave me no (zero) result. Is it possible that this happened because the subquery in the second query gave too many results?

Beware of nulls!
If there's a NULL in the sub-query, the NOT IN will not work as most people expect.
The issue will be clearer if you translate NOT IN (...) either to NOT (... OR ...) or to NOT ... AND NOT ... and apply the three-valued logic to the resulting expression.
To illustrate this with an example, let's say the condition is NOT IN (1, 2, NULL) and the row being checked has a value of 3.
Using NOT (... OR ...) you get this:
NOT (3=1 OR 3=2 OR 3=NULL)
The first two conditions in the brackets are false the last one is unknown. Based on the three-valued logic, the result of the disjunction will be unknown. Inversion of an unknown is also unknown, according to that same logic. The result of unknown in a WHERE clause is treated same as the result of false, i.e. a no-match. So, here you are.
Now, if you rewrite the NOT IN with NOT ... AND NOT ..., this is what you get:
NOT 3=1 AND NOT 3=2 AND NOT 3=NULL
The first two terms are true, the last one is unknown (3=NULL is unknown, its inversion is unknown as well). Again, the three-valued logic says the final result is unknown in this case. Here you are again.
So, when a row has a value that is not in the subset but the subset also contains nulls, either do not use NOT IN or filter out the nulls.

Related

How to return row with diffrent value if "where" has not been met in MySQL

I have problem with receiving rows if id from one table dont match second one.
If zamowienia.id_telefon is null or dont match i dont recive whole row.
I want to instead get column crm2018.telefon.numer with "0" or null value. Please help :)
I tried something like that but its obvious syntax eror:
SELECT
crm2018.zamowienia.*,
crm2018.telefon.numer
FROM
crm2018.zamowienia
JOIN crm2018.telefon
WHERE
if (zamowienia.id_telfon != "0") zamowienia.id_telefon = telefon.id_telefon else crm2018.telefon.numer as "0"
Here's working code but with missing rows.
SELECT
crm2018.zamowienia.*,
crm2018.telefon.numer
FROM
crm2018.zamowienia
JOIN crm2018.telefon
WHERE
zamowienia.id_telefon = telefon.id_telefon
Just use LEFT JOIN instead of (INNER) JOIN.
Accordingly, you need to move the join condition from the WHERE clause to the ON clause of the join, to avoid filtering out unmatched records. Please note that as it is, your query has a JOIN without ON clause : this is a syntax error in all SQL dialects.
Finally, I would recommend using table aliases in the query : this makes it easier to read and to maintain.
SELECT
z.*,
t.numer
FROM
crm2018.zamowienia AS z
LEFT JOIN crm2018.telefon AS t
ON z.id_telefon = t.id_telefon
When no record is available in crm2018.telefon for the given crm2018.zamowienia, the record will still be displayed, with all columns coming from crm2018.telefon showing NULL values.
If needed, you can turn NULL values to 0 with the COALESCE() function, like :
COALESCE(t.numer, 0)

Optimize derived table in select

I have sql query:
SELECT tsc.Id
FROM TEST.Services tsc,
(
select * from DICT.Change sp
) spc
where tsc.serviceId = spc.service_id
and tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
and tsc.startDate > GREATEST(spc.StartTime, spc.startDate)
group by tsc.Id;
This query is very, very slow.
Explain:
Can this be optimized? How to rewrite this subquery for another?
What is the point of this query? Why the CROSS JOIN operation? Why do we need to return multiple copies of id column from Services table? And what are we doing with the millions of rows being returned?
Absent a specification, an actual set of requirements for the resultset, we're just guessing at it.
To answer your questions:
Yes, the query could be "optimized" by rewriting it to the resultset that is actually required, and do it much more efficiently than the monstrously hideous SQL in the question.
Some suggestions: ditch the old-school comma syntax for the join operation, and use the JOIN keyword instead.
With no join predicates, it's a "cross" join. Every row matched from one side matched to every row from the right side.) I recommend including the CROSS keyword as an indication to future readers that the absence of an ON clause (or, join predicates in the WHERE clause) is intentional, and not an oversight.
I'd also avoid an inline view, unless there is a specific reason for one.
UPDATE
The query in the question is updated to include some predicates. Based on the updated query, I would write it like this:
SELECT tsc.id
FROM TEST.Services tsc
JOIN DICT.Change spc
ON tsc.serviceid = spc.service_id
AND tsc.startdate > spc.starttime
AND tsc.startdate > spc.starttdate
AND ( tsc.planid = spc.plan_id
OR ( tsc.planid IS NOT NULL AND spc.plan_id = -1 )
)
Ensure that the query is making use of suitable index by looking at the output of EXPLAIN to see the execution plan, in particular, which indexes are being used.
Some notes:
If there are multiple rows from spc that "match" a row from tsc, the query will return duplicate values of tsc.id. (It's not clear why or if we need to return duplicate values. IF we need to count the number of copies of each tsc,id, we could do that in the query, returning distinct values of tsc.id along with a count. If we don't need duplicates, we could return just a distinct list.
GREATEST function will return NULL if any of the arguments are null. If the condition we need is "a > GREATEST(b,c)", we can specify "a > b AND a > c".
Also, this condition:
tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
can be re-written to return an equivalent result (I'm suspicious about the actual specification, and whether this original condition actually satisfies that adequately. Without example data and sample of expected output, we have to rely on the SQL as the specification, so we honor that in the rewrite.)
If we don't need to return duplicate values of tsc.id, assuming id is unique in TEST.Services, we could also write
SELECT tsc.id
FROM TEST.Services tsc
WHERE EXISTS
( SELECT 1
FROM DICT.Change spc
ON spc.service_id = tsc.serviceid
AND spc.starttime < tsc.startdate
AND spc.starttdate < tsc.startdate
AND ( ( spc.plan_id = tsc.planid )
OR ( spc.plan_id = -1 AND tsc.planid IS NOT NULL )
)
)

Group rows but keep values where not null

I am trying to group rows in MySQL but end up with a wrong result.
My DB looks like this:
I'm using this query:
SELECT
r_id, va_id,va_klasse,va_periode,
1va_mer,1va_hjem,1va_mot,1va_bil,1va_fit,1va_hand,1va_med,1va_fra,
2va_mer,2va_hjem,2va_trae,2va_bil,2va_sty,2va_mus,2va_med,2va_fra,
3va_mer,3va_hjem,3va_mot,3va_bil,3va_pima,3va_nat,3va_med,3va_fra,
va_lock, va_update
FROM o6hxd_valgfag
WHERE va_klasse IN('7A','7B','7C','8A','8B','8C','9A','9B','9C')
GROUP BY va_id
ORDER BY va_klasse,va_name
This produces a wrong result, where one row is returned with only the first three numbers 123 and not the ones from row two and three.
What I would like is a result where the numbers 123, 321 and 132 are gathered in one line.
I can explain more detailed if this isn't sufficient.
If across those fields there should only be ever one value, you should really have them all in the same record and go about fixing it to insert and update the same record.
Ie I am aware that you database isn't designed correctly
However
To dig you out, you could give this a crack, I suppose.
SELECT
r_id, va_id,va_klasse,va_periode,
MAX(1va_mer),MAX(1va_hjem),MAX(1va_mot),MAX(1va_bil),MAX(1va_fit),MAX(1va_hand),MAX(1va_med),MAX(1va_fra),
MAX(2va_mer),MAX(2va_hjem),MAX(2va_trae),MAX(2va_bil),MAX(2va_sty),MAX(2va_mus),MAX(2va_med),MAX(2va_fra),
MAX(3va_mer),MAX(3va_hjem),MAX(3va_mot),MAX(3va_bil),MAX(3va_pima),MAX(3va_nat),MAX(3va_med),MAX(3va_fra),
va_lock, va_update
FROM o6hxd_valgfag
WHERE va_klasse IN('7A','7B','7C','8A','8B','8C','9A','9B','9C')
GROUP BY va_id
ORDER BY va_klasse,va_name
Your query will not work as intended. Think about this use-case:
what if for row1 (r_id =9), the fields 2va_sty, 2va_mus, 2va_med are not empty and has values?
In such case what should your desired output be? It certainly cannot be the numbers 123, 321 and 132 gathered in one line. Group by is usually used if you want to use aggregate functions executed against a certain field value, in your case va_id.
Not a solution to your problem but i think a better query would be like this (because of the not named columns in the group by clause https://dev.mysql.com/doc/refman/5.5/en/group-by-handling.html):
SELECT
aa.r_id, aa.va_id, aa.va_klasse, aa.va_periode,
aa.1va_mer, aa.1va_hjem, aa.1va_mot, aa.1va_bil, aa.1va_fit, aa.1va_hand, aa.1va_med, aa.1va_fra,
aa.2va_mer, aa.2va_hjem, aa.2va_trae, aa.2va_bil, aa.2va_sty,2va_mus, aa.2va_med, aa.2va_fra,
aa.3va_mer, aa.3va_hjem, aa.3va_mot, aa.3va_bil, aa.3va_pima, aa.3va_nat, aa.3va_med, aa.3va_fra,
aa.va_lock, aa.va_update
FROM o6hxd_valgfag AS aa
INNER JOIN (
SELECT va_id
FROM o6hxd_valgfag
GROUP BY va_id
) AS _aa
ON aa.va_id = _aa.va_id
WHERE aa.va_klasse IN ('7A','7B','7C','8A','8B','8C','9A','9B','9C')
ORDER BY aa.va_klasse, aa.va_name;

Using the right MYSQL JOIN

I'm trying to get all the data from the match table, along with the currently signed up gamers of each type, experienced or not.
Gamers
(PK)Gamer_Id
Gamer_firstName,
Gamer_lastName,
Gamer experience(Y/N)
Gamer_matches
(PK)FK GamerId,
(PK)FK MatchId,
Gamer_score
Match
(PK)Match_Id,
ExperiencedGamers_needed,
InExperiencedGamers_needed
I've tried this query along with many others but it doesn't work, is it a bad join?
SELECT M.MatchId,M.ExperiencedGamers_needed,M.InExperiencedGamers_needed,
(SELECT COUNT(GM.GamerId)
FROM Gamers G, Gamers_matches GM
WHERE G.GamerId = GM.GamerId
AND G.experience = "Y"
AND GM.MatchId = M.MatchId
GROUP BY GM.MatchId)AS ExpertsSignedUp,
(SELECT COUNT(GM.GamerId)
FROM Gamers G, Gamers_matches GM
WHERE G.GamerId = GM.GamerId
AND G.experience = "N"
AND GM.MatchId = M.MatchId
GROUP BY GM.MatchId) AS NovicesSignedUp
FROM MATCHES M
What you've written is called a correlated subquery which forces SQL to re-execute the subquery for each row fetched from Matches. It can be made to work, but it's pretty inefficient. In some complex queries it may be necessary, but not in this case.
I would solve this query this way:
SELECT M.MatchId, M.ExperiencedGamers_needed,M.InExperiencedGamers_needed,
SUM(G.experience = 'Y') AS ExpertsSignedUp,
SUM(G.experience = 'N') AS NovicesSignedUp
FROM MATCHES M
LEFT OUTER JOIN (Gamer_matches GM
INNER JOIN Gamers G ON G.GamerId = GM.GamerId)
ON M.MatchId = GM.MatchId
GROUP BY M.MatchId;
Here it outputs only one row per Match because of the GROUP BY at the end.
There's no subquery to re-execute many times, it's just joining Matches to the respective rows in the other tables once. But I use an outer join in case a Match has zero players of eithe type signed up.
Then instead of using COUNT() I use a trick of MySQL and use SUM() with a boolean expression inside the SUM() function. Boolean expressions in MySQL always return 0 or 1. The SUM() of these is the same as the COUNT() where the expression returns true. This way I can get the "count" of both experts and novices only scanning the Gamers table once.
P.S. MySQL is working in a non-standard way to return 0 or 1 from a boolean expression. Standard ANSI SQL does not support this, nor do many other brands of RDBMS. Standardly, a boolean expression returns a boolean, not an integer.
But you can use a more verbose expression if you need to write standard SQL for portability:
SUM(CASE G.experience WHEN 'Y' THEN 1 WHEN 'N' THEN 0 END) AS ExpertsSignedUp

SQL Join only returning 1 row

Not quite sure what I'm missing, but my SQL statement is only returning one row.
SELECT
tl.*,
(tl.topic_total_rating/tl.topic_rates) as topic_rating,
COUNT(pl.post_id) - 1 as reply_count,
MIN(pl.post_time) AS topic_time,
MAX(pl.post_time) AS topic_bump
FROM topic_list tl
JOIN post_list pl
ON tl.topic_id=pl.post_parent
WHERE
tl.topic_board_link = %i
AND topic_hidden != 1
ORDER BY %s
I have two tables (post_list and topic_list), and post_list's post_parent links to a topic_list's topic_id.
Instead of returning all the topics (where their board's topic_board_link is n), it only returns one topic.
You would normally need a GROUP BY clause in there. MySQL has different rules from Standard SQL on the subject of when GROUP BY is needed. This is therefore closer to Standard SQL:
SELECT tl.*,
(tl.topic_total_rating/tl.topic_rates) AS topic_rating,
COUNT(pl.post_id) - 1 AS reply_count,
MIN(pl.post_time) AS topic_time,
MAX(pl.post_time) AS topic_bump
FROM topic_list AS tl
JOIN post_list AS pl ON tl.topic_id = pl.post_parent
WHERE tl.topic_board_link = ? -- %i
AND tl.topic_hidden != 1
GROUP BY tl.col1, ..., topic_rating
ORDER BY ? -- %s
In Standard SQL, you would have to list every column in topic_list, plus the non-aggregate value topic_rating (and you might have to list the expression rather than the display label or column alias in the select list).
You also have a restriction condition on 'topic_board_link' which might be limiting your result set to one group. You cannot normally use a placeholder in the ORDER BY clause, either.