MySQL LEFT JOIN with WHERE function-call produces wrong result

In MySQL 5.7 I am executing a LEFT JOIN whose WHERE clause calls one of my user-defined functions. It fails to find a row that it should find.
[Originally I simplified my actual code a bit for the purpose of this post. However, in view of a user's proposed answer, I am posting the actual code, as it may be relevant.]
My user function is:
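DELIMITER $$
-- Returns the agent's address (only when MailTo = 'A'), otherwise the
-- contact's address; NULL when neither contains an '@'
CREATE FUNCTION `jfn_rent_valid_email`(
rent_mail_to varchar(1),
agent_email varchar(45),
contact_email varchar(60)
)
RETURNS varchar(60)
BEGIN
IF rent_mail_to = 'A' AND agent_email LIKE '%@%' THEN
RETURN agent_email;
ELSEIF contact_email LIKE '%@%' THEN
RETURN contact_email;
ELSE
RETURN NULL;
END IF;
END $$
DELIMITER ;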
My query is:
SELECT r.RentCode, r.MailTo, a.AgentEmail, co.Email,
jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email)
AS ValidEmail
FROM rents r
LEFT JOIN contacts co ON r.RentCode = co.RentCode -- this produces one match
LEFT JOIN link l ON r.RentCode = l.RentCode -- there will be no match in `link` on this
LEFT JOIN agents a ON l.AgentCode = a.AgentCode -- there will be no match in `agents` on this
WHERE r.RentCode = 'ZAKC17' -- this produces one match
AND (jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) IS NOT NULL)
This produces no rows.
However, when a.AgentEmail IS NULL, if I change only
AND (jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) IS NOT NULL)
to
AND (jfn_rent_valid_email(r.MailTo, NULL, co.Email) IS NOT NULL)
it does correctly produce a matching row:
RentCode, MailTo, AgentEmail, Email, ValidEmail
ZAKC17, N, <NULL>, name@email, name@email
So, when a.AgentEmail is NULL (from non-matching LEFT JOINed row), why in the world does passing it to the function as a.AgentEmail act differently from passing it as a literal NULL?
[BTW: I believe I have used this kind of construct under MS SQL Server in the past, and it worked as I would expect. Also, if I reverse the test from AND (jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) IS NOT NULL) to AND (jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) IS NULL), I still get no match. It's as though any reference to a column of a as a parameter to the function causes no matching row...]

Most likely this is an issue with the optimizer turning the LEFT JOIN into an INNER JOIN. The optimizer may do this when it believes that the WHERE condition is always false for the NULL-complemented row the LEFT JOIN generates (which in this case it is not).
You can look at the query plan with the EXPLAIN command. You will likely see a different table order depending on the query variation.
If the actual logic of the function is to check all emails with one function call, you may have better luck using a function that takes just one email address as a parameter and calling it for each email column.
You can try the query without the function call in the WHERE clause:
SELECT r.RentCode, r.MailTo, a.AgentEmail, co.Email,
jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email)
AS ValidEmail
FROM rents r
LEFT JOIN contacts co ON r.RentCode = co.RentCode -- this produces one match
LEFT JOIN link l ON r.RentCode = l.RentCode -- there will be no match in `link` on this
LEFT JOIN agents a ON l.AgentCode = a.AgentCode -- there will be no match in `agents` on this
WHERE r.RentCode = 'ZAKC17' -- this produces one match
AND ((r.MailTo='A' AND a.AgentEmail LIKE '%@%') OR co.Email LIKE '%@%' )
Or wrap the function in a subquery:
SELECT q.RentCode, q.MailTo, q.AgentEmail, q.Email, q.ValidEmail
FROM (
SELECT r.RentCode, r.MailTo, a.AgentEmail, co.Email,
jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) AS ValidEmail
FROM rents r
LEFT JOIN contacts co ON r.RentCode = co.RentCode -- this produces one match
LEFT JOIN link l ON r.RentCode = l.RentCode -- there will be no match in `link` on this
LEFT JOIN agents a ON l.AgentCode = a.AgentCode -- there will be no match in `agents` on this
WHERE r.RentCode = 'ZAKC17' -- this produces one match
) as q
WHERE q.ValidEmail IS NOT NULL
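Both rewrites should return the single row the question expects. One way to cross-check that result is to use an engine that does not share MySQL 5.7's optimizer behaviour. The sketch below uses SQLite through Python's standard sqlite3 module. It makes three assumptions:

- SQLite does not reproduce the MySQL bug. It only shows the correct answer.
- jfn_rent_valid_email is a Python port of the function, registered with create_function.
- The sample data is the minimum implied by the question: one rent, one contact, and empty link and agents tables.

```python
import sqlite3


def jfn_rent_valid_email(rent_mail_to, agent_email, contact_email):
    """Python port of the MySQL function; None plays the role of SQL NULL."""
    # IF rent_mail_to = 'A' AND agent_email LIKE '%@%' THEN RETURN agent_email
    if rent_mail_to == "A" and agent_email is not None and "@" in agent_email:
        return agent_email
    # ELSEIF contact_email LIKE '%@%' THEN RETURN contact_email
    if contact_email is not None and "@" in contact_email:
        return contact_email
    # ELSE RETURN NULL
    return None


conn = sqlite3.connect(":memory:")
# Make the Python port callable from SQL under the same name
conn.create_function("jfn_rent_valid_email", 3, jfn_rent_valid_email)
conn.executescript("""
    CREATE TABLE rents    (RentCode TEXT, MailTo TEXT);
    CREATE TABLE contacts (RentCode TEXT, Email TEXT);
    CREATE TABLE link     (RentCode TEXT, AgentCode TEXT);
    CREATE TABLE agents   (AgentCode TEXT, AgentEmail TEXT);
    INSERT INTO rents    VALUES ('ZAKC17', 'N');
    INSERT INTO contacts VALUES ('ZAKC17', 'name@email');
""")
# link and agents stay empty, so both of their LEFT JOINs yield NULLs

BASE = """
    SELECT r.RentCode, r.MailTo, a.AgentEmail, co.Email,
           jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) AS ValidEmail
    FROM rents r
    LEFT JOIN contacts co ON r.RentCode = co.RentCode
    LEFT JOIN link l      ON r.RentCode = l.RentCode
    LEFT JOIN agents a    ON l.AgentCode = a.AgentCode
    WHERE r.RentCode = 'ZAKC17'
"""

# Filter through the function call, as in the question
with_function = conn.execute(
    BASE + " AND jfn_rent_valid_email(r.MailTo, a.AgentEmail, co.Email) IS NOT NULL"
).fetchall()
# Filter with the inlined condition from the rewrite above
inlined = conn.execute(
    BASE + " AND ((r.MailTo = 'A' AND a.AgentEmail LIKE '%@%') OR co.Email LIKE '%@%')"
).fetchall()

print(with_function)  # [('ZAKC17', 'N', None, 'name@email', 'name@email')]
```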

Changing the call to the function in the WHERE clause to read
jfn_rent_valid_email(r.MailTo, IFNULL(a.AgentEmail, NULL), IFNULL(co.Email, NULL)) IS NOT NULL
solves the issue.
It appears that the optimizer incorrectly assumes the function will return NULL for the non-matching LEFT JOIN row whenever a plain column reference such as a.AgentEmail is passed as any parameter. If the column reference is inside any kind of expression, the optimizer does not make that assumption. Wrapping it in a "dummy", seemingly pointless IFNULL(column, NULL) is therefore enough to restore the correct behaviour.
I am marking this as the accepted solution because it is by far the simplest workaround, requiring the least code change and no complete query rewrite.
However, full credit is due to @slaakso's post in this topic for analysing the problem. Note that he states that the behaviour has been fixed/altered in MySQL 8 such that this workaround is unnecessary, so it may only be needed in MySQL 5.7 or earlier.

Related

How can I return a NULL value if the AND condition produces no results?

I have a pretty lengthy SQL statement that pulls data based on a user's input. One of the parameters is a date range, and currently, if the date does not exist, nothing is returned. I'm trying to figure out how to place the AND condition within an IF so that if the date does not exist, the data is still returned with a NULL. I have looked into IFNULL and CASE but can't seem to figure out a way to implement it properly.
SELECT pName,pNum,pPhase,pStart,pEnd,pComp,pHoursBudgeted,Zee_Kray_A
FROM hourmap
JOIN projects ON projects.pID = hourmap.ProjectID
JOIN schedule ON schedule.id = hourmap.ScheduleID
WHERE (pManager LIKE '%' or pManager is Null)
AND (pNum LIKE '%90668%' or pNum is Null)
AND (year_week LIKE '2020-W01' or year_week is Null)
;
In the SQL above, if no row has the week given in the last condition (2020-W01), nothing is returned. How can I place this final condition in an IF statement or otherwise make it conditional?
Use LEFT JOIN and move the conditions on the left-joined columns into the ON clause:
SELECT pName,pNum,pPhase,pStart,pEnd,pComp,pHoursBudgeted,Zee_Kray_A
FROM hourmap
LEFT JOIN projects ON projects.pID = hourmap.ProjectID
LEFT JOIN schedule ON schedule.id = hourmap.ScheduleID
AND (pManager LIKE '%' or pManager is Null)
AND (pNum LIKE '%90668%' or pNum is Null)
AND (year_week LIKE '2020-W01' or year_week is Null)
;
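The difference between filtering a left-joined column in WHERE and filtering it in ON can be seen in a small sketch. It runs through SQLite via Python's sqlite3 rather than MySQL, and rests on these assumptions:

- Only hourmap and schedule are kept. projects is omitted.
- The sample rows are invented.
- Both queries use LEFT JOIN, so only the placement of the week condition differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hourmap  (ProjectID INTEGER, ScheduleID INTEGER, Zee_Kray_A REAL);
    CREATE TABLE schedule (id INTEGER, year_week TEXT);
    INSERT INTO hourmap  VALUES (1, 100, 8.0), (1, 101, 4.0);
    INSERT INTO schedule VALUES (100, '2020-W02'), (101, '2020-W03');
""")

# Condition in WHERE: rows whose (non-NULL) week does not match are discarded,
# so a week with no data returns nothing at all.
in_where = conn.execute("""
    SELECT h.ScheduleID, s.year_week
    FROM hourmap h
    LEFT JOIN schedule s ON s.id = h.ScheduleID
    WHERE (s.year_week LIKE '2020-W01' OR s.year_week IS NULL)
    ORDER BY h.ScheduleID
""").fetchall()

# Condition in ON: every hourmap row is kept, with year_week NULL
# wherever no schedule row matched the requested week.
in_on = conn.execute("""
    SELECT h.ScheduleID, s.year_week
    FROM hourmap h
    LEFT JOIN schedule s ON s.id = h.ScheduleID
                        AND (s.year_week LIKE '2020-W01' OR s.year_week IS NULL)
    ORDER BY h.ScheduleID
""").fetchall()

print(in_where)  # []
print(in_on)     # [(100, None), (101, None)]
```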

SQL: Something wrong with inheriting variables for NULL next-row values

I'm trying to carry a value forward from the previous row (matching on subscription_id and checking whether subscription_status IS NULL), but something goes wrong and I get incorrect values.
Take a look at the screenshot.
If I'm not mistaken, this is also called the "last non-null" puzzle, but the solutions I found for other databases use a window function with IGNORE NULLS.
But I'm using MySQL 8.x, and it doesn't support IGNORE NULLS.
Sorry, but the SQL fiddle doesn't show the correct text values for the variables in my code :(
https://www.db-fiddle.com/f/wHanqoSCHKJHus5u6BU4DB/4
Or you can see the problem here:
SET @history_subscription_status = NULL;
SET @history_subscription_id = 0;
SELECT
c.date,
c.user_id,
c.subscription_id,
sd.subscription_status,
(@history_subscription_id := c.subscription_id) as 'historical_sub_id',
(@history_subscription_status := CASE
WHEN @history_subscription_id = c.subscription_id AND sd.subscription_status IS NULL
THEN @history_subscription_status
ELSE
sd.subscription_status
END
) as 'historical'
FROM
calendar c
LEFT JOIN
subscription_data sd ON sd.date = c.date AND sd.user_id = c.user_id AND sd.subscription_id = c.subscription_id
ORDER BY
c.user_id,
c.subscription_id,
c.date
I expect the results of this query to look like this:
IMPORTANT: I'm going to use this code on a lot of data (about 1 million rows), so it is very important for me to avoid an additional SELECT or subquery that could slow down execution.
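For reference, one common way to solve the "last non-null" puzzle without IGNORE NULLS takes two steps:

1. Number the groups with a running COUNT() of the non-NULL statuses.
2. Spread each group's single non-NULL value with MAX().

Unlike user variables, whose evaluation order in a SELECT is undefined according to the MySQL manual, this does not depend on the order in which rows are processed.

This is a sketch under several assumptions:

- It runs through SQLite via Python's sqlite3. Window functions need SQLite 3.25 or later. MySQL 8.0 also supports COUNT() and MAX() as window functions.
- The table layouts and sample rows are hypothetical, guessed from the query above.
- It uses one CTE, which the question hoped to avoid.
- Its performance on a million rows has not been measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (date TEXT, user_id INTEGER, subscription_id INTEGER);
    CREATE TABLE subscription_data (date TEXT, user_id INTEGER,
                                    subscription_id INTEGER, subscription_status TEXT);
    INSERT INTO calendar VALUES
        ('2021-01-01', 1, 10), ('2021-01-02', 1, 10), ('2021-01-03', 1, 10),
        ('2021-01-04', 1, 10), ('2021-01-05', 1, 10),
        ('2021-01-01', 2, 20), ('2021-01-02', 2, 20), ('2021-01-03', 2, 20);
    INSERT INTO subscription_data VALUES
        ('2021-01-02', 1, 10, 'active'),
        ('2021-01-04', 1, 10, 'cancelled'),
        ('2021-01-01', 2, 20, 'trial');
""")

LAST_NON_NULL = """
WITH joined AS (
    SELECT c.date, c.user_id, c.subscription_id, sd.subscription_status,
           -- running count of non-NULL statuses: it only increases on rows that
           -- carry a status, so a status and the NULL rows after it share grp
           COUNT(sd.subscription_status) OVER (
               PARTITION BY c.user_id, c.subscription_id ORDER BY c.date
           ) AS grp
    FROM calendar c
    LEFT JOIN subscription_data sd
           ON sd.date = c.date AND sd.user_id = c.user_id
          AND sd.subscription_id = c.subscription_id
)
SELECT date, user_id, subscription_id, subscription_status,
       -- each group holds at most one non-NULL status; MAX ignores the NULLs
       MAX(subscription_status) OVER (
           PARTITION BY user_id, subscription_id, grp
       ) AS historical
FROM joined
ORDER BY user_id, subscription_id, date
"""

rows = conn.execute(LAST_NON_NULL).fetchall()
for row in rows:
    print(row)
```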

Optimize derived table in select

I have this SQL query:
SELECT tsc.Id
FROM TEST.Services tsc,
(
select * from DICT.Change sp
) spc
where tsc.serviceId = spc.service_id
and tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
and tsc.startDate > GREATEST(spc.StartTime, spc.startDate)
group by tsc.Id;
This query is very, very slow.
Explain:
Can this be optimized? How can I rewrite this subquery?
What is the point of this query? Why the CROSS JOIN operation? Why do we need to return multiple copies of id column from Services table? And what are we doing with the millions of rows being returned?
Absent a specification, an actual set of requirements for the resultset, we're just guessing at it.
To answer your questions:
Yes, the query could be "optimized" by rewriting it to return the resultset that is actually required, and to do so much more efficiently than the monstrously hideous SQL in the question.
Some suggestions: ditch the old-school comma syntax for the join operation, and use the JOIN keyword instead.
With no join predicates, it's a "cross" join: every row from one side is matched to every row from the other side. I recommend including the CROSS keyword as an indication to future readers that the absence of an ON clause (or of join predicates in the WHERE clause) is intentional, and not an oversight.
I'd also avoid an inline view, unless there is a specific reason for one.
UPDATE
The query in the question is updated to include some predicates. Based on the updated query, I would write it like this:
SELECT tsc.id
FROM TEST.Services tsc
JOIN DICT.Change spc
ON tsc.serviceid = spc.service_id
AND tsc.startdate > spc.starttime
AND tsc.startdate > spc.startdate
AND ( tsc.planid = spc.plan_id
OR ( tsc.planid IS NOT NULL AND spc.plan_id = -1 )
)
Ensure that the query is making use of suitable indexes by looking at the output of EXPLAIN to see the execution plan, in particular which indexes are being used.
Some notes:
If there are multiple rows from spc that "match" a row from tsc, the query will return duplicate values of tsc.id. (It's not clear why, or whether, we need to return duplicate values.) If we need to count the number of copies of each tsc.id, we could do that in the query, returning distinct values of tsc.id along with a count. If we don't need duplicates, we could return just a distinct list.
The GREATEST function returns NULL if any of its arguments is NULL. If the condition we need is "a > GREATEST(b,c)", we can specify "a > b AND a > c".
Also, this condition:
tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
can be re-written to return an equivalent result (I'm suspicious about the actual specification, and whether this original condition actually satisfies that adequately. Without example data and sample of expected output, we have to rely on the SQL as the specification, so we honor that in the rewrite.)
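Both equivalence claims can be checked exhaustively over small value sets that include NULL and -1: the IF() condition versus its rewrite, and GREATEST() versus the split comparison. The sketch keeps only the rows for which each predicate is TRUE and compares the two sets. It is a SQLite check via Python's sqlite3, not MySQL, with two substitutions:

- CASE stands in for MySQL's IF().
- SQLite's multi-argument max() stands in for GREATEST(). Like GREATEST(), it returns NULL when any argument is NULL.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (t INTEGER, s INTEGER)")  # t ~ tsc.PlanId, s ~ spc.plan_id
conn.execute("CREATE TABLE triples (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO pairs VALUES (?, ?)",
                 itertools.product([None, -1, 5, 7], repeat=2))
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)",
                 itertools.product([None, 1, 2, 3], repeat=3))


def matching(table, predicate):
    """Set of rows for which the predicate is TRUE (FALSE and NULL rows are filtered out)."""
    return set(conn.execute(f"SELECT * FROM {table} WHERE {predicate}").fetchall())


# Original: tsc.PlanId = IF(spc.plan_id = -1, tsc.PlanId, spc.plan_id), with CASE for IF()
plan_original = matching("pairs", "t = CASE WHEN s = -1 THEN t ELSE s END")
# Rewrite from the answer
plan_rewritten = matching("pairs", "t = s OR (t IS NOT NULL AND s = -1)")

# Original: a > GREATEST(b, c), with SQLite's multi-argument max() for GREATEST()
greatest_original = matching("triples", "a > max(b, c)")
greatest_split = matching("triples", "a > b AND a > c")

print(plan_original == plan_rewritten, greatest_original == greatest_split)  # True True
```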
If we don't need to return duplicate values of tsc.id, assuming id is unique in TEST.Services, we could also write
SELECT tsc.id
FROM TEST.Services tsc
WHERE EXISTS
( SELECT 1
FROM DICT.Change spc
WHERE spc.service_id = tsc.serviceid
AND spc.starttime < tsc.startdate
AND spc.startdate < tsc.startdate
AND ( ( spc.plan_id = tsc.planid )
OR ( spc.plan_id = -1 AND tsc.planid IS NOT NULL )
)
)
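The duplicate-versus-distinct point can be illustrated by running the JOIN form and the EXISTS form against the same data. This is a sketch in SQLite via Python's sqlite3, with these assumptions:

- The tables services and change_log are hypothetical stand-ins for TEST.Services and DICT.Change.
- Dates are stored as ISO text so that string comparison orders them correctly.
- The rows are invented so that two change rows match one service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE services   (id INTEGER PRIMARY KEY, serviceid INTEGER,
                             planid INTEGER, startdate TEXT);
    CREATE TABLE change_log (service_id INTEGER, plan_id INTEGER,
                             starttime TEXT, startdate TEXT);
    INSERT INTO services VALUES (1, 10, 5, '2021-06-01'), (2, 20, 7, '2021-06-01');
    -- two rows match service 1 (one via plan_id = -1, one via equal plan ids);
    -- none matches service 2
    INSERT INTO change_log VALUES (10, -1, '2021-01-01', '2021-01-01'),
                                  (10,  5, '2021-02-01', '2021-02-01'),
                                  (20,  9, '2021-01-01', '2021-01-01');
""")

PREDICATE = """
    spc.service_id = tsc.serviceid
    AND spc.starttime < tsc.startdate
    AND spc.startdate < tsc.startdate
    AND (spc.plan_id = tsc.planid OR (spc.plan_id = -1 AND tsc.planid IS NOT NULL))
"""

# JOIN form: one output row per matching change_log row
joined = conn.execute(
    f"SELECT tsc.id FROM services tsc JOIN change_log spc ON {PREDICATE} ORDER BY tsc.id"
).fetchall()
# EXISTS form: each service id at most once
exists = conn.execute(
    f"SELECT tsc.id FROM services tsc "
    f"WHERE EXISTS (SELECT 1 FROM change_log spc WHERE {PREDICATE}) ORDER BY tsc.id"
).fetchall()

print(joined)  # [(1,), (1,)]
print(exists)  # [(1,)]
```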

Showing zero if SQL COUNT is NULL

I have the following query (executed through PHP). How can I make it show zeros when the result is NULL and therefore not shown?
select count(schicht) as front_lcfruh,
kw,
datum
from dienstplan
left join codes on dienstplan.schicht = codes.lcfruh
left join personal on personal.perso_id = dienstplan.perso_id
where codes.lcfruh != ''
and personal.status = 'rezeption'
and dienstplan.kw = '$kw'
group by dienstplan.datum
I'm not entirely sure I understand the question, but I think you want this:
select count(codes.lcfruh) as front_lcfruh,
dienstplan.kw,
dienstplan.datum
from dienstplan
left join codes on dienstplan.schicht = codes.lcfruh and codes.lcfruh <> ''
left join personal on personal.perso_id = dienstplan.perso_id
and personal.status = 'rezeption'
where dienstplan.kw = $kw
group by dienstplan.datum, dienstplan.kw
If schicht comes from dienstplan, there will always be a row for it (as that is the driving table). If I understand you correctly, you want a 0 if no matching rows are found. Therefore you need to count a column of the joined table.
Edit:
The condition where codes.lcfruh != '' turns the outer join back into an inner join, because any "outer" row will have lcfruh as NULL, and any comparison with NULL yields "unknown", so those rows are removed from the final result. If you want to exclude rows in the codes table where lcfruh is an empty string, you need to move that condition into the JOIN clause (see above).
And two more things: get used to prefixing your columns in a query with more than one table. That avoids ambiguity and makes the query more robust against changes. You should also understand the difference between number literals and string literals: 1 is a number, '1' is a string. It's a bad habit to use string literals where numbers are expected. MySQL is pretty forgiving, as it always tries to "somehow" work, but if you ever use another DBMS you might get errors you don't understand.
Additionally your usage of group by is wrong and will lead to "random" values being returned. Please see these blog posts to understand why:
http://rpbouman.blogspot.de/2007/05/debunking-group-by-myths.html
http://www.mysqlperformanceblog.com/2006/09/06/wrong-group-by-makes-your-queries-fragile/
Every other DBMS will reject your query the way it is written now (and MySQL will as well in case you turn on a more ANSI compliant mode)
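The outer-join-to-inner-join effect described in the edit can be reproduced with a small sketch. It uses SQLite via Python's sqlite3 instead of MySQL; both treat NULL <> '' as unknown. Only dienstplan and codes are kept, and the sample rows are invented. The first query filters the joined column in WHERE, as in the question. The second moves that filter into ON and counts the joined column, as the answer suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dienstplan (datum INTEGER, kw INTEGER, schicht TEXT);
    CREATE TABLE codes      (lcfruh TEXT);
    INSERT INTO dienstplan VALUES (1, 5, 'F1'), (1, 5, 'S2'), (2, 5, 'S2');
    INSERT INTO codes      VALUES ('F1');
""")

# Filter on the outer-joined column in WHERE: NULL <> '' is not TRUE,
# so unmatched rows disappear and datum 2 is lost entirely.
in_where = conn.execute("""
    SELECT dp.datum, COUNT(co.lcfruh)
    FROM dienstplan dp
    LEFT JOIN codes co ON dp.schicht = co.lcfruh
    WHERE co.lcfruh <> '' AND dp.kw = 5
    GROUP BY dp.datum
    ORDER BY dp.datum
""").fetchall()

# Same filter moved into ON: unmatched rows survive with NULLs,
# and COUNT(co.lcfruh) counts them as 0.
in_on = conn.execute("""
    SELECT dp.datum, COUNT(co.lcfruh)
    FROM dienstplan dp
    LEFT JOIN codes co ON dp.schicht = co.lcfruh AND co.lcfruh <> ''
    WHERE dp.kw = 5
    GROUP BY dp.datum
    ORDER BY dp.datum
""").fetchall()

print(in_where)  # [(1, 1)]
print(in_on)     # [(1, 1), (2, 0)]
```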
If you have no matching rows, MySQL will return an empty set (here I defined the columns arbitrarily, just to run the query):
mysql> CREATE TABLE dienstplan (kw varchar(10), datum integer, schicht integer, perso_id integer);
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE codes (lcfruh varchar(2));
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE personal (perso_id integer, status varchar(5));
Query OK, 0 rows affected (0.00 sec)
mysql>
mysql> select count(schicht) as front_lcfruh,
-> kw,
-> datum
-> from dienstplan
-> left join codes on dienstplan.schicht = codes.lcfruh
-> left join personal on personal.perso_id = dienstplan.perso_id
-> where codes.lcfruh != ''
-> and personal.status = 'rezeption'
-> and dienstplan.kw = '$kw'
-> group by dienstplan.datum;
Empty set (0.00 sec)
You can check how many rows were returned by the query. In this case you will get zero.
So you could modify your code like this (a sketch using PDO; it assumes $pdo is an open connection in PDO::ERRMODE_EXCEPTION mode, so errors raise a PDOException instead of being checked by hand):
$rows = $pdo->query($query)->fetchAll(PDO::FETCH_ASSOC);
if (count($rows) === 0)
{
// No rows. We return a default tuple
$tuple = array(
'front_lcfruh' => 0,
'kw' => $kw,
'datum' => null
);
handleTuple($tuple);
} else {
foreach ($rows as $tuple) {
handleTuple($tuple);
}
}
Where handleTuple() is the function that formats or otherwise manipulates the returned rows.
SELECT dp.datum
, MIN(kw) -- To avoid adding it to GROUP BY clause
, count(schicht) AS front_lcfruh
FROM dienstplan dp
LEFT JOIN codes co ON dp.schicht = co.lcfruh
LEFT JOIN personal pe ON pe.perso_id = dp.perso_id
WHERE dp.kw = '$kw'
AND COALESCE(co.lcfruh, 'X') <> '' -- Handle NULLs, too
AND COALESCE(pe.status , 'rezeption' ) = 'rezeption' -- Handle NULLs, too
GROUP BY dp.datum
;
COUNT() can NEVER return NULL.
Lorenzo said (see comments) that I should provide references to help the SO community. Based on that feedback, I've edited my answer below.
You should consult the documentation of the COUNT() function for MySQL here.
You should also consider reading this article at Stack Overflow entitled "can COUNT(*) ever return NULL".
Or this one entitled "When does COUNT(*) return NULL"
Or this one entitled "COUNT(*) returns NULL"
Or this one entitled "Return NULL if COUNT(*) is zero"
Or this article on Oracle's forums entitled "COUNT never returns NULL?"
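The claim that COUNT() never returns NULL, and the empty-set behaviour shown earlier, can be checked side by side. This is a sketch in SQLite via Python's sqlite3, with an invented one-row dienstplan table. It shows three things:

- An aggregate without GROUP BY always returns one row, with COUNT() = 0 when nothing matches.
- With GROUP BY, no matching rows means no groups at all.
- Wrapping COUNT() in COALESCE changes nothing, because there is no NULL to replace.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dienstplan (datum INTEGER, schicht TEXT)")
conn.execute("INSERT INTO dienstplan VALUES (1, NULL)")

# No GROUP BY: an aggregate over zero rows still yields one row, and COUNT gives 0.
no_group = conn.execute(
    "SELECT COUNT(schicht) FROM dienstplan WHERE datum = 99").fetchall()
# With GROUP BY: zero input rows means zero groups, hence an empty result.
grouped = conn.execute(
    "SELECT datum, COUNT(schicht) FROM dienstplan WHERE datum = 99 GROUP BY datum").fetchall()
# COALESCE cannot help: the row is missing, not NULL.
coalesced = conn.execute(
    "SELECT datum, COALESCE(COUNT(schicht), 0) FROM dienstplan "
    "WHERE datum = 99 GROUP BY datum").fetchall()
# COUNT(column) skips NULLs, giving 0 (not NULL) for an all-NULL group.
all_null = conn.execute(
    "SELECT datum, COUNT(schicht) FROM dienstplan GROUP BY datum").fetchall()

print(no_group, grouped, coalesced, all_null)  # [(0,)] [] [] [(1, 0)]
```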
select COALESCE(count(schicht),0) as front_lcfruh,kw,datum
from dienstplan
left join codes on dienstplan.schicht = codes.lcfruh
left join personal on personal.perso_id = dienstplan.perso_id
where codes.lcfruh != ''
and personal.status = 'rezeption'
and dienstplan.kw = '$kw'
group by dienstplan.datum
just replace
count(schicht)
with
COALESCE(count(schicht),0)

Access LookUp table

I need to create a lookup table in Access where each abbreviation is related to a value, and if the abbreviation (in the main table) is null, I want to show "Unknown".
I got the values working, but I can't seem to get the nulls to show up.
My lookup table looks like this:
REQUEST REQUEST_TEXT
------------------------
A Approve
D Disapprove
NULL N/A
But when I do a count by request, it only shows me values for A and D, although I know there are some blanks in there as well.
What am I doing wrong?
This should be easier if you change tblLookup.
REQUEST REQUEST_TEXT
------------------------
A Approve
D Disapprove
U Unknown
Then, in tblMain, change the REQUEST field to Required = True and Default Value = "U". When new records are added, they will have U for REQUEST unless the user changes it to A or D.
Then a query which JOINs the 2 tables on REQUEST should get you what I think you want.
SELECT m.REQUEST, l.REQUEST_TEXT
FROM tblMain AS m
INNER JOIN tblLookup AS l
ON l.REQUEST = m.REQUEST;
You should also create a relationship between the 2 tables, and select the option to enforce referential integrity in order to prevent the users from adding a spurious value such as "X" for REQUEST.
Edit:
If changing tblMain structure is off the table, and if you're doing this from within an Access session, you can use the Nz() function on a LEFT JOIN.
SELECT m.REQUEST, Nz(l.REQUEST_TEXT, "Unknown")
FROM tblMain AS m
LEFT JOIN tblLookup AS l
ON l.REQUEST = m.REQUEST;
If you're doing this from outside an Access session, like from ASP, the Nz() function will not be available. So you can substitute an IIf() expression for Nz().
SELECT m.REQUEST, IIf(l.REQUEST_TEXT Is Null, "Unknown", l.REQUEST_TEXT)
FROM tblMain AS m
LEFT JOIN tblLookup AS l
ON l.REQUEST = m.REQUEST;
Edit2: You can't directly JOIN with Null values. However with the "Unknown" row I suggested for tblLookup, you could use a JOIN which includes Nz for tblMain.REQUEST
SELECT m.id, m.request, l.request_text
FROM tblMain AS m
INNER JOIN tblLookup AS l
ON Nz(m.request,"U") = l.request;
If you want to leave tblLookup REQUEST as Null for REQUEST_TEXT = Unknown, I suppose you could use Nz on both sides of the JOIN expression. However, this whole idea of joining Nulls makes me cringe. I would fix the tables instead.
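The Nz()-based ideas above can be sanity-checked outside Access with a sketch. It uses SQLite via Python's sqlite3, and makes these assumptions:

- COALESCE stands in for Nz(), which exists only inside an Access session.
- tblLookup includes the suggested 'U' / 'Unknown' row.
- The tblMain rows are invented.

The sketch counts by request text three ways:

- A plain INNER JOIN, which drops the NULL requests.
- The JOIN on Nz(m.request, "U").
- The LEFT JOIN with an "Unknown" default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblLookup (REQUEST TEXT, REQUEST_TEXT TEXT);
    CREATE TABLE tblMain   (id INTEGER, REQUEST TEXT);
    INSERT INTO tblLookup VALUES ('A', 'Approve'), ('D', 'Disapprove'), ('U', 'Unknown');
    INSERT INTO tblMain   VALUES (1, 'A'), (2, 'D'), (3, NULL), (4, 'A'), (5, NULL);
""")

# Plain INNER JOIN: NULL = 'A' is never TRUE, so the blank requests vanish.
inner_counts = conn.execute("""
    SELECT l.REQUEST_TEXT, COUNT(*)
    FROM tblMain AS m
    INNER JOIN tblLookup AS l ON l.REQUEST = m.REQUEST
    GROUP BY l.REQUEST_TEXT ORDER BY l.REQUEST_TEXT
""").fetchall()

# Map NULL to 'U' on the tblMain side (COALESCE standing in for Nz).
nz_counts = conn.execute("""
    SELECT l.REQUEST_TEXT, COUNT(*)
    FROM tblMain AS m
    INNER JOIN tblLookup AS l ON COALESCE(m.REQUEST, 'U') = l.REQUEST
    GROUP BY l.REQUEST_TEXT ORDER BY l.REQUEST_TEXT
""").fetchall()

# LEFT JOIN and default the missing text to 'Unknown'.
left_counts = conn.execute("""
    SELECT COALESCE(l.REQUEST_TEXT, 'Unknown') AS txt, COUNT(*)
    FROM tblMain AS m
    LEFT JOIN tblLookup AS l ON l.REQUEST = m.REQUEST
    GROUP BY txt ORDER BY txt
""").fetchall()

print(inner_counts)  # [('Approve', 2), ('Disapprove', 1)]
print(nz_counts)     # [('Approve', 2), ('Disapprove', 1), ('Unknown', 2)]
```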