Odd parentheses behavior in Where clause - mysql

SETUP:
MySQL 5.7.14 (Google SQL)
DESCRIPTION:
In the following scenario it appears I am getting some false matches in my where clause where I'm NOT using parentheses. But adding the parentheses DOES yield the correct results.
This Query DOES return results with tsd.StatusID = 3 (wrong):
SELECT
tsee.ID, tsd.StatusID
FROM TSShiftDetails tsd
JOIN TSShiftEmployees tse
ON tse.ShiftID = tsd.ID
JOIN TSShiftEmpEntries tsee
ON tsee.ShiftEmpID = tse.ID
WHERE tsee.CCID IN (4590) OR tsee.CCID LIKE null
AND tsd.StatusID != 3
While this query DOES NOT return results with tsd.StatusID = 3 (correct):
SELECT
tsee.ID, tsd.StatusID
FROM TSShiftDetails tsd
JOIN TSShiftEmployees tse
ON tse.ShiftID = tsd.ID
JOIN TSShiftEmpEntries tsee
ON tsee.ShiftEmpID = tse.ID
WHERE (tsee.CCID IN (4590) OR tsee.CCID LIKE null)
AND tsd.StatusID != 3
QUESTION:
I feel I completely understand why the query WITH the parentheses works. My question is: WHY is the one without parentheses returning records with StatusID = 3? I would think that, without any parentheses to impose an order, the AND tsd.StatusID != 3 clause would be applied to every match regardless of the preceding OR.
What Ya'll think? Am I misunderstanding, or is MySQL behaving inconsistently here?
P.S.
FYI, Yes there is a front end application reason for the need to have the Where clause formatted this way. eg. tsee.CCID IN (4590) as opposed to tsee.CCID =4590

The explanation has nothing to do with LIKE NULL or IN ( ).
Boolean expressions follow an order of operator precedence, just like arithmetic.
In arithmetic, you may remember that multiplication has higher precedence than addition:
A + B * C
Without parentheses, this works exactly like:
A + (B * C)
If you want the addition to be evaluated first, you must use parentheses to override the default operator precedence:
(A + B) * C
Similarly, in boolean expressions, AND has higher precedence than OR.
A OR B AND C
Works like:
A OR (B AND C)
If you want the OR to be evaluated first, you must use parentheses to override the default operator precedence:
(A OR B) AND C
How does this explain what you're seeing?
WHERE tsee.CCID IN (4590) OR tsee.CCID LIKE null
AND tsd.StatusID != 3
This works as if you had done:
WHERE tsee.CCID IN (4590) OR (tsee.CCID LIKE null
AND tsd.StatusID != 3)
So if it finds a row with CCID 4590, that row satisfies the whole WHERE clause, because true OR (anything) is still true.
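The precedence rule above can be checked with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (both give AND higher precedence than OR), and the table entries and its three rows are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER, ccid INTEGER, status_id INTEGER)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [(1, 4590, 1), (2, 4590, 3), (3, 1234, 1)],
)

def ids(where_clause):
    """Return the ids matching the given WHERE clause, in id order."""
    sql = f"SELECT id FROM entries WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

# No parentheses: AND binds tighter, so the status filter only guards the LIKE.
no_parens = ids("ccid IN (4590) OR ccid LIKE NULL AND status_id != 3")
# The same condition with the implicit grouping written out.
implicit = ids("ccid IN (4590) OR (ccid LIKE NULL AND status_id != 3)")
# Explicit parentheses force the OR to be evaluated first.
with_parens = ids("(ccid IN (4590) OR ccid LIKE NULL) AND status_id != 3")

print(no_parens, implicit, with_parens)
```

Row 2 (CCID 4590, StatusID 3) shows up in the first two results but not in the third, which matches the behaviour described in the question.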

MySQL LEFT JOIN with WHERE function-call produces wrong result

From MySQL 5.7 I am executing a LEFT JOIN, and the WHERE clause calls a user-defined function of mine. It fails to find a matching row which it should find.
[Originally I simplified my actual code a bit for the purpose of this post. However in view of a user's proposed response, I post the actual code as it may be relevant.]
My user function is:
CREATE FUNCTION `jfn_rent_valid_email`(
rent_mail_to varchar(1),
agent_email varchar(45),
contact_email varchar(60)
)
RETURNS varchar(60)
BEGIN
IF rent_mail_to = 'A' AND agent_email LIKE '%#%' THEN
RETURN agent_email;
ELSEIF contact_email LIKE '%#%' THEN
RETURN contact_email;
ELSE
RETURN NULL;
END IF;
END
My query is:
SELECT r.RentCode, r.MailTo, a.AgentEmail, co.Email,
jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email)
AS ValidEmail
FROM rents r
LEFT JOIN contacts co ON r.RentCode = co.RentCode -- this produces one match
LEFT JOIN link l ON r.RentCode = l.RentCode -- there will be no match in `link` on this
LEFT JOIN agents a ON l.AgentCode = a.AgentCode -- there will be no match in `agents` on this
WHERE r.RentCode = 'ZAKC17' -- this produces one match
AND (jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) IS NOT NULL)
This produces no rows.
However, when a.AgentEmail IS NULL, if I only change from
AND (jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) IS NOT NULL)
to
AND (jfn_rent_valid_email(r.MailTo, NULL, co.Email) IS NOT NULL)
it does correctly produce a matching row:
RentCode, MailTo, AgentEmail, Email, ValidEmail
ZAKC17, N, <NULL>, name#email, name#email
So, when a.AgentEmail is NULL (from non-matching LEFT JOINed row), why in the world does passing it to the function as a.AgentEmail act differently from passing it as a literal NULL?
[BTW: I believe I have used this kind of construct under MS SQL server in the past and it has worked as I would expect. Also, I can reverse the test of AND (jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) IS NOT NULL) to AND (jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) IS NULL) yet I still get no match. It's as though any reference to a.... as a parameter to the function causes no matching row...]
Most likely this is an issue with the optimizer turning the LEFT JOIN into an INNER JOIN. The optimizer may do this when it believes that the WHERE condition is always false for the generated NULL row (which in this case it is not).
You can take a look at the query plan with the EXPLAIN command; you will likely see a different table order depending on the query variation.
If the actual logic of the function is to check all emails with one function call, you may have better luck with using a function that takes just one email address as parameter and use that for each email-column.
You can try without the function in the WHERE clause:
SELECT r.RentCode, r.MailTo, a.AgentEmail, co.Email,
jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email)
AS ValidEmail
FROM rents r
LEFT JOIN contacts co ON r.RentCode = co.RentCode -- this produces one match
LEFT JOIN link l ON r.RentCode = l.RentCode -- there will be no match in `link` on this
LEFT JOIN agents a ON l.AgentCode = a.AgentCode -- there will be no match in `agents` on this
WHERE r.RentCode = 'ZAKC17' -- this produces one match
AND ((r.MailTo='A' AND a.AgentEmail LIKE '%#%') OR co.Email LIKE '%#%' )
Or wrap the function in a subquery:
SELECT q.RentCode, q.MailTo, q.AgentEmail, q.Email, q.ValidEmail
FROM (
SELECT r.RentCode, r.MailTo, a.AgentEmail, co.Email,
jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) AS ValidEmail
FROM rents r
LEFT JOIN contacts co ON r.RentCode = co.RentCode -- this produces one match
LEFT JOIN link l ON r.RentCode = l.RentCode -- there will be no match in `link` on this
LEFT JOIN agents a ON l.AgentCode = a.AgentCode -- there will be no match in `agents` on this
WHERE r.RentCode = 'ZAKC17' -- this produces one match
) as q
WHERE q.ValidEmail IS NOT NULL
Changing the call to the function in the WHERE clause to read
jfn_rent_valid_email(r.MailTo, IFNULL(a.AgentEmail, NULL), IFNULL(co.Email, NULL)) IS NOT NULL
solves the issue.
It appears that the optimizer feels it can incorrectly guess that the function will return NULL in the non-match LEFT JOIN case if a plain reference to a.AgentEmail is passed as any parameter. But if the column reference is inside any kind of expression the optimizer ducks out. Wrapping it inside a "dummy", seemingly pointless IFNULL(column, NULL) is thus enough to restore correct behaviour.
I am marking this as the accepted solution because it is by far the simplest workaround, requiring the least code change/complete query rewrite.
However, full credit is due to @slaakso's post here in this topic for analysing the problem. Note that he states that the behaviour has been fixed/altered in MySQL 8 such that this workaround is unnecessary, so it may only be necessary in MySQL 5.7 or earlier.
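As a sanity check of what the query is expected to return, here is a rough sketch that ports the function to Python and registers it with SQLite via sqlite3's create_function. SQLite is only a stand-in (it does not reproduce the MySQL 5.7 optimizer behaviour), and the table definitions and the single sample row are assumptions reconstructed from the question.

```python
import sqlite3

def jfn_rent_valid_email(rent_mail_to, agent_email, contact_email):
    """Python port of the question's stored function; None plays the role of NULL."""
    if rent_mail_to == "A" and agent_email is not None and "#" in agent_email:
        return agent_email
    if contact_email is not None and "#" in contact_email:
        return contact_email
    return None

conn = sqlite3.connect(":memory:")
conn.create_function("jfn_rent_valid_email", 3, jfn_rent_valid_email)

# Hypothetical minimal schema: one rent, one contact, no link and no agent rows.
conn.executescript("""
    CREATE TABLE rents (RentCode TEXT, MailTo TEXT);
    CREATE TABLE contacts (RentCode TEXT, Email TEXT);
    CREATE TABLE link (RentCode TEXT, AgentCode TEXT);
    CREATE TABLE agents (AgentCode TEXT, AgentEmail TEXT);
    INSERT INTO rents VALUES ('ZAKC17', 'N');
    INSERT INTO contacts VALUES ('ZAKC17', 'name#email');
""")

rows = conn.execute("""
    SELECT r.RentCode, r.MailTo, a.AgentEmail, co.Email,
           jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) AS ValidEmail
    FROM rents r
    LEFT JOIN contacts co ON r.RentCode = co.RentCode
    LEFT JOIN link l ON r.RentCode = l.RentCode
    LEFT JOIN agents a ON l.AgentCode = a.AgentCode
    WHERE r.RentCode = 'ZAKC17'
      AND jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) IS NOT NULL
""").fetchall()

print(rows)
```

Here the NULL-extended agents row is passed to the function as NULL and the row survives the WHERE clause, which is the result the question expected from MySQL.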

Optimize derived table in select

I have an SQL query:
SELECT tsc.Id
FROM TEST.Services tsc,
(
select * from DICT.Change sp
) spc
where tsc.serviceId = spc.service_id
and tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
and tsc.startDate > GREATEST(spc.StartTime, spc.startDate)
group by tsc.Id;
This query is very, very slow.
Explain:
Can this be optimized? How can I rewrite this subquery in another way?
What is the point of this query? Why the CROSS JOIN operation? Why do we need to return multiple copies of id column from Services table? And what are we doing with the millions of rows being returned?
Absent a specification, an actual set of requirements for the resultset, we're just guessing at it.
To answer your questions:
Yes, the query could be "optimized" by rewriting it to the resultset that is actually required, and do it much more efficiently than the monstrously hideous SQL in the question.
Some suggestions: ditch the old-school comma syntax for the join operation, and use the JOIN keyword instead.
With no join predicates, it's a "cross" join: every row from the left side is matched to every row from the right side. I recommend including the CROSS keyword as an indication to future readers that the absence of an ON clause (or join predicates in the WHERE clause) is intentional, and not an oversight.
I'd also avoid an inline view, unless there is a specific reason for one.
UPDATE
The query in the question is updated to include some predicates. Based on the updated query, I would write it like this:
SELECT tsc.id
FROM TEST.Services tsc
JOIN DICT.Change spc
ON tsc.serviceid = spc.service_id
AND tsc.startdate > spc.starttime
AND tsc.startdate > spc.startdate
AND ( tsc.planid = spc.plan_id
OR ( tsc.planid IS NOT NULL AND spc.plan_id = -1 )
)
Ensure that the query is making use of suitable index by looking at the output of EXPLAIN to see the execution plan, in particular, which indexes are being used.
Some notes:
If there are multiple rows from spc that "match" a row from tsc, the query will return duplicate values of tsc.id. (It's not clear why or if we need to return duplicate values. If we need to count the number of copies of each tsc.id, we could do that in the query, returning distinct values of tsc.id along with a count. If we don't need duplicates, we could return just a distinct list.)
GREATEST function will return NULL if any of the arguments are null. If the condition we need is "a > GREATEST(b,c)", we can specify "a > b AND a > c".
Also, this condition:
tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
can be re-written to return an equivalent result (I'm suspicious about the actual specification, and whether this original condition actually satisfies that adequately. Without example data and sample of expected output, we have to rely on the SQL as the specification, so we honor that in the rewrite.)
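One way to gain some confidence in that equivalence is to compare the two predicates over a small grid of values, NULL included. The sketch below is an assumption-laden check, not a proof: it runs on SQLite through Python's sqlite3 module, and it spells MySQL's IF(cond, a, b) as a CASE expression because SQLite is only standing in here. The parameter names :t (for tsc.PlanId) and :s (for spc.plan_id) are made up.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")

# :t stands for tsc.PlanId, :s for spc.plan_id (hypothetical parameter names).
# CASE replaces MySQL's IF(), which this stand-in engine is not assumed to have.
ORIGINAL = ":t = CASE WHEN :s = -1 THEN :t ELSE :s END"
REWRITE = "(:t = :s OR (:t IS NOT NULL AND :s = -1))"

def matches(predicate, t, s):
    """True when the predicate would let a row through a WHERE clause."""
    row = conn.execute(f"SELECT 1 WHERE {predicate}", {"t": t, "s": s}).fetchone()
    return row is not None

values = [None, -1, 1, 2]
mismatches = [
    (t, s)
    for t, s in product(values, repeat=2)
    if matches(ORIGINAL, t, s) != matches(REWRITE, t, s)
]

print(mismatches)
```

An empty list means both forms accept and reject the same combinations in this sample, including the cases where either side is NULL.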
If we don't need to return duplicate values of tsc.id, assuming id is unique in TEST.Services, we could also write
SELECT tsc.id
FROM TEST.Services tsc
WHERE EXISTS
( SELECT 1
FROM DICT.Change spc
WHERE spc.service_id = tsc.serviceid
AND spc.starttime < tsc.startdate
AND spc.startdate < tsc.startdate
AND ( ( spc.plan_id = tsc.planid )
OR ( spc.plan_id = -1 AND tsc.planid IS NOT NULL )
)
)

Different result of not exists and not in

I have two very similar queries; however, they return different results.
The first one:
select * from products p
where p.val >= 999999 and
not exists
(select * from products p2 where p2.val < 999999 and p.user_id = p2.user_id);
The second one:
select * from products p
where p.val >= 999999 and
p.user_id not in (select user_id from products p2 where p2.val < 999999);
The first one gave me the right answer, while the second one gave me no (zero) result. Is it possible that this happened because the subquery in the second query gave too many results?
Beware of nulls!
If there's a NULL in the sub-query, the NOT IN will not work as most people expect.
The issue will be clearer if you translate NOT IN (...) either to NOT (... OR ...) or to NOT ... AND NOT ... and apply the three-valued logic to the resulting expression.
To illustrate this with an example, let's say the condition is NOT IN (1, 2, NULL) and the row being checked has a value of 3.
Using NOT (... OR ...) you get this:
NOT (3=1 OR 3=2 OR 3=NULL)
The first two conditions in the brackets are false; the last one is unknown. Based on the three-valued logic, the result of the disjunction will be unknown. Inversion of an unknown is also unknown, according to that same logic. An unknown result in a WHERE clause is treated the same as false, i.e. a no-match. So, here you are.
Now, if you rewrite the NOT IN with NOT ... AND NOT ..., this is what you get:
NOT 3=1 AND NOT 3=2 AND NOT 3=NULL
The first two terms are true, the last one is unknown (3=NULL is unknown, its inversion is unknown as well). Again, the three-valued logic says the final result is unknown in this case. Here you are again.
So, when a row has a value that is not in the subset but the subset also contains nulls, either do not use NOT IN or filter out the nulls.
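The effect is easy to reproduce. The following sketch uses Python's sqlite3 module as a stand-in database (the three-valued logic involved is standard SQL), with a made-up products table in which one low-value row has a NULL user_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, user_id INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        (1, 10, 999999),  # user 10 has only a high-value product
        (2, 20, 999999),  # user 20 has a high-value product ...
        (3, 20, 5),       # ... and a low-value one
        (4, None, 5),     # low-value product whose user_id is NULL
    ],
)

def ids(sql):
    """Run a query that selects p.id and return the ids in order."""
    return [row[0] for row in conn.execute(sql + " ORDER BY p.id")]

base = "SELECT p.id FROM products p WHERE p.val >= 999999 AND "

not_exists = ids(base + "NOT EXISTS (SELECT * FROM products p2 "
                 "WHERE p2.val < 999999 AND p.user_id = p2.user_id)")
not_in = ids(base + "p.user_id NOT IN (SELECT user_id FROM products p2 "
             "WHERE p2.val < 999999)")
not_in_filtered = ids(base + "p.user_id NOT IN (SELECT user_id FROM products p2 "
                      "WHERE p2.val < 999999 AND user_id IS NOT NULL)")

print(not_exists, not_in, not_in_filtered)
```

The single NULL in the subquery is enough to make the plain NOT IN return nothing, while NOT EXISTS and the NULL-filtered NOT IN agree.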

Using the right MYSQL JOIN

I'm trying to get all the data from the match table, along with the currently signed up gamers of each type, experienced or not.
Gamers
(PK)Gamer_Id
Gamer_firstName,
Gamer_lastName,
Gamer experience(Y/N)
Gamer_matches
(PK)FK GamerId,
(PK)FK MatchId,
Gamer_score
Match
(PK)Match_Id,
ExperiencedGamers_needed,
InExperiencedGamers_needed
I've tried this query along with many others, but it doesn't work. Is it a bad join?
SELECT M.MatchId,M.ExperiencedGamers_needed,M.InExperiencedGamers_needed,
(SELECT COUNT(GM.GamerId)
FROM Gamers G, Gamers_matches GM
WHERE G.GamerId = GM.GamerId
AND G.experience = "Y"
AND GM.MatchId = M.MatchId
GROUP BY GM.MatchId)AS ExpertsSignedUp,
(SELECT COUNT(GM.GamerId)
FROM Gamers G, Gamers_matches GM
WHERE G.GamerId = GM.GamerId
AND G.experience = "N"
AND GM.MatchId = M.MatchId
GROUP BY GM.MatchId) AS NovicesSignedUp
FROM MATCHES M
What you've written is called a correlated subquery which forces SQL to re-execute the subquery for each row fetched from Matches. It can be made to work, but it's pretty inefficient. In some complex queries it may be necessary, but not in this case.
I would solve this query this way:
SELECT M.MatchId, M.ExperiencedGamers_needed,M.InExperiencedGamers_needed,
SUM(G.experience = 'Y') AS ExpertsSignedUp,
SUM(G.experience = 'N') AS NovicesSignedUp
FROM MATCHES M
LEFT OUTER JOIN (Gamer_matches GM
INNER JOIN Gamers G ON G.GamerId = GM.GamerId)
ON M.MatchId = GM.MatchId
GROUP BY M.MatchId;
Here it outputs only one row per Match because of the GROUP BY at the end.
There's no subquery to re-execute many times; it's just joining Matches to the respective rows in the other tables once. But I use an outer join in case a Match has zero players of either type signed up.
Then instead of using COUNT() I use a trick of MySQL and use SUM() with a boolean expression inside the SUM() function. Boolean expressions in MySQL always return 0 or 1. The SUM() of these is the same as the COUNT() where the expression returns true. This way I can get the "count" of both experts and novices only scanning the Gamers table once.
P.S. MySQL is working in a non-standard way to return 0 or 1 from a boolean expression. Standard ANSI SQL does not support this, nor do many other brands of RDBMS. In the standard, a boolean expression returns a boolean, not an integer.
But you can use a more verbose expression if you need to write standard SQL for portability:
SUM(CASE G.experience WHEN 'Y' THEN 1 WHEN 'N' THEN 0 END) AS ExpertsSignedUp
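To see the two forms side by side, here is a small sketch run on SQLite through Python's sqlite3 module. SQLite is a stand-in that happens to share MySQL's 0/1 result for comparisons; the simplified table and column names, the sample rows, and the plain chain of LEFT JOINs (instead of the nested join above) are all assumptions made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (match_id INTEGER);
    CREATE TABLE gamers (gamer_id INTEGER, experience TEXT);
    CREATE TABLE gamer_matches (gamer_id INTEGER, match_id INTEGER);
    INSERT INTO matches VALUES (100), (200);            -- match 200 has no sign-ups
    INSERT INTO gamers VALUES (1, 'Y'), (2, 'N'), (3, 'Y');
    INSERT INTO gamer_matches VALUES (1, 100), (2, 100), (3, 100);
""")

rows = conn.execute("""
    SELECT m.match_id,
           SUM(g.experience = 'Y') AS experts_bool,   -- comparison yields 0 or 1
           SUM(g.experience = 'N') AS novices_bool,
           SUM(CASE g.experience WHEN 'Y' THEN 1 WHEN 'N' THEN 0 END) AS experts_case
    FROM matches m
    LEFT JOIN gamer_matches gm ON gm.match_id = m.match_id
    LEFT JOIN gamers g ON g.gamer_id = gm.gamer_id
    GROUP BY m.match_id
    ORDER BY m.match_id
""").fetchall()

print(rows)
```

In this sample the boolean and CASE forms give the same expert count for match 100. For match 200, which has nobody signed up, every sum comes back as NULL rather than 0, so wrapping each SUM in COALESCE(..., 0) may be worth considering if zeros are wanted.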

SQL Join only returning 1 row

Not quite sure what I'm missing, but my SQL statement is only returning one row.
SELECT
tl.*,
(tl.topic_total_rating/tl.topic_rates) as topic_rating,
COUNT(pl.post_id) - 1 as reply_count,
MIN(pl.post_time) AS topic_time,
MAX(pl.post_time) AS topic_bump
FROM topic_list tl
JOIN post_list pl
ON tl.topic_id=pl.post_parent
WHERE
tl.topic_board_link = %i
AND topic_hidden != 1
ORDER BY %s
I have two tables (post_list and topic_list), and post_list's post_parent links to a topic_list's topic_id.
Instead of returning all the topics (where their board's topic_board_link is n), it only returns one topic.
You would normally need a GROUP BY clause in there. MySQL has different rules from Standard SQL on the subject of when GROUP BY is needed. This is therefore closer to Standard SQL:
SELECT tl.*,
(tl.topic_total_rating/tl.topic_rates) AS topic_rating,
COUNT(pl.post_id) - 1 AS reply_count,
MIN(pl.post_time) AS topic_time,
MAX(pl.post_time) AS topic_bump
FROM topic_list AS tl
JOIN post_list AS pl ON tl.topic_id = pl.post_parent
WHERE tl.topic_board_link = ? -- %i
AND tl.topic_hidden != 1
GROUP BY tl.col1, ..., topic_rating
ORDER BY ? -- %s
In Standard SQL, you would have to list every column in topic_list, plus the non-aggregate value topic_rating (and you might have to list the expression rather than the display label or column alias in the select list).
You also have a restriction condition on 'topic_board_link' which might be limiting your result set to one group. You cannot normally use a placeholder in the ORDER BY clause, either.
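The one-row symptom can be reproduced with a small sketch. It uses Python's sqlite3 module as a stand-in: like MySQL without ONLY_FULL_GROUP_BY, SQLite accepts non-aggregated columns next to aggregates, whereas MySQL with ONLY_FULL_GROUP_BY enabled would be expected to reject the first query instead. The two-column tables and their rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topic_list (topic_id INTEGER, title TEXT);
    CREATE TABLE post_list (post_id INTEGER, post_parent INTEGER, post_time INTEGER);
    INSERT INTO topic_list VALUES (1, 'a'), (2, 'b');
    INSERT INTO post_list VALUES (1, 1, 10), (2, 1, 20), (3, 2, 15);
""")

# Aggregates without GROUP BY: all joined rows collapse into a single group.
ungrouped = conn.execute("""
    SELECT tl.topic_id, COUNT(pl.post_id) - 1 AS reply_count
    FROM topic_list tl
    JOIN post_list pl ON tl.topic_id = pl.post_parent
""").fetchall()

# With GROUP BY: one row per topic, each with its own aggregates.
grouped = conn.execute("""
    SELECT tl.topic_id,
           COUNT(pl.post_id) - 1 AS reply_count,
           MIN(pl.post_time) AS topic_time,
           MAX(pl.post_time) AS topic_bump
    FROM topic_list tl
    JOIN post_list pl ON tl.topic_id = pl.post_parent
    GROUP BY tl.topic_id
    ORDER BY tl.topic_id
""").fetchall()

print(len(ungrouped), grouped)
```

Without GROUP BY the reply count covers every post on the board and only one row comes back; grouping by the topic key restores one row per topic.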