having trouble understanding CASE WHEN statement in SQL - mysql

SELECT state,
COUNT(CASE WHEN elevation >= 2000 THEN 1 ELSE NULL END) AS count_high_elevation_airports
FROM airports
GROUP BY state;
In the above statement, what is THEN 1, and what does the '1' signify?
How does that value '1' after THEN affect the output?

First, note that these three expressions are equivalent:
CASE WHEN elevation >= 2000 THEN 1 ELSE NULL END
IF(elevation >= 2000, 1, NULL)
((elevation >= 2000) OR NULL)
If elevation >= 2000, the expression evaluates to 1; otherwise it evaluates to NULL.
"1" is conventionally used as boolean true, and you could substitute the MySQL literal TRUE in the above expressions with equivalent results... but that isn't what the "1" is for, here.
When used with COUNT(), in cases like this, the only real significance of 1 is that it is not NULL.
This is important, because -- contrary to popular belief -- COUNT() does not count rows. It counts values.
What's the difference? NULL is not technically a value. Instead, it is a marker that signifies the absence of a value, thus COUNT(expr) only counts rows where expr is not null.
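A minimal sketch of that difference, using Python's built-in sqlite3 module as a stand-in for MySQL (COUNT(*) versus COUNT(expr) treats NULL the same way in both). The table t and its values are hypothetical:

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; table t is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(5,), (None,), (10,)])

# COUNT(*) counts rows; COUNT(x) counts only rows where x is not NULL.
rows, values = conn.execute("SELECT COUNT(*), COUNT(x) FROM t").fetchone()
print(rows, values)  # 3 2
```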
By using an expression like the one here, you're asking the server to count the rows with elevation >= 2000, and you do this by giving COUNT() a NULL for the rows you don't want counted and a non-null value for the rows you do.
Aggregate (GROUP BY) functions operate on values -- and NULL, again, is not a value in this sense.
Another aggregate function that makes this rationale perhaps even clearer is AVG(). If you had 3 rows, with values 5, NULL, and 10, what's the average? If you said 7.5, that's correct: the average of these 3 rows is (5 + 10) ÷ 2 = 7.5, because the 3 rows have only two values. NULL is not 0; otherwise the average would be (5 + 0 + 10) ÷ 3 = 5, which it is not.
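The same arithmetic can be checked with a small sketch, again using SQLite as a stand-in for MySQL on the three hypothetical rows 5, NULL and 10. COALESCE is used only to force the NULL to 0 for comparison:

```python
import sqlite3

# Three hypothetical rows: 5, NULL, 10 (SQLite standing in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(5,), (None,), (10,)])

# AVG() ignores the NULL: (5 + 10) / 2.
avg_skipping_null = conn.execute("SELECT AVG(x) FROM t").fetchone()[0]
# Forcing NULL to 0 changes the divisor: (5 + 0 + 10) / 3.
avg_null_as_zero = conn.execute("SELECT AVG(COALESCE(x, 0)) FROM t").fetchone()[0]
print(avg_skipping_null, avg_null_as_zero)  # 7.5 5.0
```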
So, that's how and why this works.
How does that value '1' after THEN affect the output?
It really doesn't. You could just as easily have written COUNT(CASE WHEN elevation >= 2000 THEN 'cat videos are funny' ELSE NULL END) because, just like the literal 1, the literal string 'cat videos are funny' is not null, and non-null values (anything not null) are what COUNT() counts.
A novice might try to accomplish this task with COUNT(elevation >= 2000), but that gives the wrong answer, because the 0 (false) for rows where elevation is < 2000 is not null, so these rows would still be counted.
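A minimal sketch of that pitfall, assuming a hypothetical airports table and using SQLite as a stand-in for MySQL (both evaluate a comparison to 1, 0 or NULL):

```python
import sqlite3

# Hypothetical sample data; only two elevations are >= 2000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airports (state TEXT, elevation INTEGER)")
conn.executemany(
    "INSERT INTO airports VALUES (?, ?)",
    [("CO", 5400), ("CO", 1200), ("CO", 2100), ("FL", 10)],
)

naive, correct = conn.execute(
    """
    SELECT COUNT(elevation >= 2000),  -- 0 is not NULL, so every row counts
           COUNT(CASE WHEN elevation >= 2000 THEN 1 ELSE NULL END)
    FROM airports
    """
).fetchone()
print(naive, correct)  # 4 2
```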
You may then ask, "why not just use COUNT(*) ... WHERE elevation >= 2000?" Good question. The reasons vary, but if you GROUP BY state and there are states with no rows matching WHERE, those states would be entirely eliminated from the results, which is often not what you want. This query includes them, with a count of zero.
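To see the difference, here is a small sketch on hypothetical data (SQLite standing in for MySQL) where one state has no high-elevation airports at all:

```python
import sqlite3

# Hypothetical data: FL has no airport at or above 2000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airports (state TEXT, elevation INTEGER)")
conn.executemany(
    "INSERT INTO airports VALUES (?, ?)",
    [("CO", 5400), ("CO", 1200), ("FL", 10), ("FL", 60)],
)

# Conditional aggregation: every state appears, FL with a count of 0.
conditional = conn.execute(
    """
    SELECT state, COUNT(CASE WHEN elevation >= 2000 THEN 1 ELSE NULL END)
    FROM airports
    GROUP BY state
    ORDER BY state
    """
).fetchall()

# Filtering first: FL has no matching rows, so it vanishes from the result.
filtered = conn.execute(
    """
    SELECT state, COUNT(*)
    FROM airports
    WHERE elevation >= 2000
    GROUP BY state
    ORDER BY state
    """
).fetchall()

print(conditional)  # [('CO', 1), ('FL', 0)]
print(filtered)     # [('CO', 1)]
```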
Note that ((elevation >= 2000) OR NULL), the third example expression at the top, doesn't actually need the parentheses. I included them because this form is not necessarily intuitive at first glance. The natural precedence of operations will cause it to be evaluated correctly if written simply as elevation >= 2000 OR NULL. This expression is equivalent to the other two because elevation >= 2000 first evaluates to 1 if true, 0 if false, or NULL if elevation is null. Then the lower-precedence OR is evaluated, and you get one of these: 1 OR NULL => 1 ... 0 OR NULL => NULL ... NULL OR NULL => NULL. And you may actually be awarded a SQL wizard badge by the elders of the Internet at the point when writing queries with COUNT(elevation >= 2000 OR NULL) comes naturally to you.
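A quick sketch that evaluates the CASE form and the OR NULL form side by side for an elevation above the threshold, below it, and missing. SQLite is used as a stand-in for MySQL here, since it applies the same three-valued OR; MySQL's IF() form is not available in SQLite and is left out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

results = []
for elevation in (2500, 1000, None):
    # Both expressions receive the same bound elevation value.
    row = conn.execute(
        "SELECT CASE WHEN ? >= 2000 THEN 1 ELSE NULL END, (? >= 2000 OR NULL)",
        (elevation, elevation),
    ).fetchone()
    results.append(row)

print(results)  # [(1, 1), (None, None), (None, None)]
```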

The CASE expression simply returns 1 if elevation is greater than or equal to 2000; otherwise it returns NULL. This is useful because COUNT() skips NULLs, so only the matching rows are counted.
The resulting count for each state is returned in the column count_high_elevation_airports.

Related

MySQL CASE function behaves strangely and inconsistently

We are using MySQL 8 as the database for our Java application.
We have a query with the following format:
select
id,
group_concat(NAME ORDER BY ID separator ',,') AS Code,
CASE
WHEN MAX(p.VARIABLEfactor) = 1 THEN MAX(i.factor) ELSE MAX(p.factor) END AS factor
from MA_TABLE
join TABLE_P p on (...)
join TABLE_I i on (...)
group by id
The query worked fine in our development environments, but after deploying at the client the factor column comes back null.
We have run the same query in the client environment from MySQL Workbench and we can see that the factor column is getting well populated.
After some debugging, we changed:
CASE
WHEN MAX(p.VARIABLEfactor) = 1 THEN MAX(i.factor) ELSE MAX(p.factor) END AS factor
to
MAX(CASE WHEN p.VARIABLEfactor = 1 THEN i.factor ELSE p.factor END) AS factor,
and the query worked correctly.
Any help here please?
From your explanation I gather that you don't understand the difference between your two CASE expressions. But they are very different. Let's look at an example for one ID:
ID  | VARIABLEfactor | i.factor | p.factor
-----------------------------------------
100 | 0              | null     | 10
100 | 1              | null     | 20
Your expression
CASE WHEN MAX(p.VARIABLEfactor) = 1 THEN MAX(i.factor) ELSE MAX(p.factor) END
looks at the maximum VARIABLEfactor, which is 1, so the THEN case applies and the maximum i.factor is returned. This is null, as all i.factor are null.
Your expression
MAX(CASE WHEN p.VARIABLEfactor = 1 THEN i.factor ELSE p.factor END)
looks at each row's VARIABLEfactor. For the first row this is 0, so the ELSE case applies and p.factor 10 is used. For the second row the VARIABLEfactor is 1, so its i.factor null gets used. Of these you take the maximum, which is 10.
To recap: The first expression is just a CASE expression on the aggregation results. It returns null here. The second expression is a conditional aggregation. It returns 10 for the sample data.
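The two sample rows above can be reproduced in a small sketch. It uses SQLite as a stand-in for MySQL and a single hypothetical table, joined, in place of the real MA_TABLE/TABLE_P/TABLE_I join, with i.factor and p.factor renamed to i_factor and p_factor:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the already-joined rows from the example.
conn.execute(
    "CREATE TABLE joined (id INTEGER, variablefactor INTEGER,"
    " i_factor INTEGER, p_factor INTEGER)"
)
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?, ?)",
    [(100, 0, None, 10), (100, 1, None, 20)],
)

case_on_aggregates, aggregate_of_case = conn.execute(
    """
    SELECT
      -- CASE evaluated once, on the aggregated values of the group
      CASE WHEN MAX(variablefactor) = 1 THEN MAX(i_factor) ELSE MAX(p_factor) END,
      -- CASE evaluated per row, then aggregated (conditional aggregation)
      MAX(CASE WHEN variablefactor = 1 THEN i_factor ELSE p_factor END)
    FROM joined
    GROUP BY id
    """
).fetchone()
print(case_on_aggregates, aggregate_of_case)  # None 10
```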

How to not COUNT a value in SSRS matrix when value is NULL

I cannot make my expression NOT count a NULL value in my SSRS matrix.
In my SSRS matrix, I have 2 columns: one for AppraisalCompany and a count under the SubmittedDate column. In my report, this is what is happening:
Per Derrick's suggestion here is the change I made in the ColumnGroup properties for the SubmittedDate:
Here is my expression change in the ColumnGroup properties:
Unfortunately I got this error:
I'm suspicious of your Dataset; I'm not entirely sure how you're getting a null value to return 1 in the COUNT. I have been unable to reproduce your results.
Dataset Query
SELECT 'Drive In' AS AppraisalCompany, NULL AS SubmittedDate
UNION
SELECT 'Photo App - English', 'Dec-18'
Next I created a Row Group on AppraisalCompany and a Column Group on SubmittedDate.
I filtered the column group to remove the null grouping, using the expression =IsNothing(Fields!SubmittedDate.Value), operator <>, and Value true.
In the textbox in the matrix I used [Count(SubmittedDate)].
OUTPUT
Appraisal Company | Dec-18
-------------------------------
Drive In | 0
Photo App - English | 1
For a Decimal (number) data type, Nothing and 0 are the same. You can test this.
Put a tablix into your report with the years 2017 to 2019. Then put the year in a column of the tablix in a number format, and write the following expression in the detail textbox:
=CDec(IIF(CDec(Fields!Year.Value) = 2017, 0, Nothing))
After executing your report you will notice that every value in the year column is 0.
The same goes for the check. Both of these expressions will always return Yes; the first checks for 0 and the second for Nothing:
=IIF(CDec(IIF(CDec(Fields!Year.Value) = 2017, 0, Nothing)) = 0, "Yes", "No")
=IIF(CDec(IIF(CDec(Fields!Year.Value) = 2017, 0, Nothing)) = Nothing, "Yes", "No")
But remember, your textbox/column has to be in a number format.
So if you want to return Nothing and you display it in a number format textbox, it will show you a 0.
With this in mind, it makes sense that Count() counts a row whether its value is 0 or Nothing. So basically this will do the trick:
'Count
=Sum(IIF(Fields!YourValue.Value = Nothing, 0, 1))

Selecting a value based on how another field was generated

I'm selecting some data:
select c.*,
coalesce(s.column1, ...),
coalesce(s.column2, ...)
FROM
(SELECT ...)
Basically, if s.column1 or s.column2 is null then I am putting in some logic to take the average of that column and use it instead.
I want another field so I can tell whether or not that value was computed using the average, perhaps a boolean? Let's say the average for column1 was 120; the table would look like:
column1 column2 avg
54 10 0
200 40 0
120 180 1
499 160 0
This allows me to see that the third row was generated using the avg of all rows as it was initially null.
How could the logic for the avg column work?
Your question seems fairly moot to me because:
The AVG function ignores NULL values by default, so the average using the overall average for NULL slots is the same as leaving out those slots entirely, and
If you just want to mark the rows which had a NULL value, you can use a CASE expression.
So, to get what you want, just use this:
SELECT
column1,
column2,
CASE WHEN column1 IS NULL THEN 1 ELSE 0 END AS avg
FROM yourTable;
And know that SELECT AVG(column1) FROM yourTable would return the same value whether NULL rows were omitted, or the overall average were used.
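A minimal sketch of both points, with SQLite standing in for MySQL and hypothetical values for yourTable. It fills the NULL with the overall average via COALESCE, flags the filled row with the CASE expression (named avg_flag here rather than avg), and checks that the average does not change:

```python
import sqlite3

# Hypothetical data; row 3 has a missing column1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (column1 REAL, column2 REAL)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(54, 10), (200, 40), (None, 180), (499, 160)],
)

# AVG() skips the NULL row: (54 + 200 + 499) / 3 = 251.0
overall_avg = conn.execute("SELECT AVG(column1) FROM yourTable").fetchone()[0]

# Fill the gap with the overall average and flag the filled row.
rows = conn.execute(
    """
    SELECT COALESCE(column1, (SELECT AVG(column1) FROM yourTable)) AS filled_column1,
           column2,
           CASE WHEN column1 IS NULL THEN 1 ELSE 0 END AS avg_flag
    FROM yourTable
    ORDER BY rowid
    """
).fetchall()

# Averaging the filled column gives the same value as AVG() over the raw column.
filled_avg = sum(r[0] for r in rows) / len(rows)
print(rows)  # [(54.0, 10.0, 0), (200.0, 40.0, 0), (251.0, 180.0, 1), (499.0, 160.0, 0)]
print(overall_avg, filled_avg)  # 251.0 251.0
```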

IF with SUM produces incorrect results in MySQL

I am getting incorrect results when I try to insert a SUM into an IF clause. The results are correct when I use COUNT, but either incorrect or NULL when I use SUM. I have been able to produce the correct results for each statement through another query (as a means of validating the formula). What is the syntax to get correct results for a SUM within an IF statement? Based on another StackOverflow question, I attempted to fix the formula, but it produced an error.
SELECT
IF (Artist LIKE '%Hillsong%' , 'Hillsong', NULL ) as Artist,
COUNT(IF(CCD BETWEEN 28 AND 730,1,NULL)) AS 1_CC, -- result 191, which is correct
IF(CCD BETWEEN 28 AND 730, SUM(CC28),NULL) AS 2_CC, -- result NULL, should be 610 xx
COUNT(IF(CCD > 28,1,NULL)) AS 3_CC, -- result 684, which is correct
COUNT(IF(CCD < 730,1,NULL)) AS 4_CC, -- result 502, which is correct
IF(CCD > 28,SUM(CC28),NULL) AS 5_CC, -- result 2253, should be 1882 xx
--- SUM(IF(CCD > 28,CC28,NULL) AS 6_CC, -- my attempt to fix, creates error
IF(CCD < 730,SUM(CC28),NULL) AS 7_CC -- result NULL, should be 981 xx
FROM praisecharts_reporting.large_sales_report
GROUP BY 1;
As a frame of reference, I am a music publisher, and I am trying to get results for all songs where the artist includes "Hillsong", where the Chord Chart (CC) has been available between 28 and 730 days (CCD BETWEEN 28 AND 730). The COUNT should tell me how many song titles qualify, and SUM should tell me the total unit sales for all songs that qualify.
I figured out the answer to my problem. Moving the condition inside the SUM with CASE WHEN fixes it. Below is the solution to my query above, and it produces correct results all around:
SELECT
IF (Artist LIKE '%Hillsong%' , 'Hillsong', NULL ) as Artist,
COUNT(IF(CCD BETWEEN 28 AND 730,1,NULL)) AS 1_CC,
SUM(CASE WHEN CCD BETWEEN 28 AND 730 THEN CC28 ELSE 0 END) AS 2_CC,
COUNT(IF(CCD > 28,1,NULL)) AS c3_CC,
COUNT(IF(CCD < 730,1,NULL)) AS c4_CC,
SUM(CASE WHEN CCD > 28 THEN CC28 ELSE 0 END) AS 5_CC,
SUM(CASE WHEN CCD < 730 THEN CC28 ELSE 0 END) AS 6_CC
FROM praisecharts_reporting.large_sales_report
GROUP BY 1;
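Putting the condition inside SUM() makes it apply row by row. In the original IF(CCD ..., SUM(CC28), NULL) form, CCD is a non-aggregated column, so MySQL tests it against a single indeterminate row of each group, or rejects the query when ONLY_FULL_GROUP_BY is enabled. A minimal sketch of the per-row form, using SQLite as a stand-in for MySQL and a hypothetical sales table, also shows the one case where ELSE 0 and ELSE NULL differ: when no row matches.

```python
import sqlite3

# Hypothetical rows: ccd = days the chart has been available, cc28 = units sold.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (ccd INTEGER, cc28 INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(10, 5), (100, 7), (500, 3), (900, 4)],
)

in_window, over_28, under_730, none_zero, none_null = conn.execute(
    """
    SELECT
      SUM(CASE WHEN ccd BETWEEN 28 AND 730 THEN cc28 ELSE 0 END),
      SUM(CASE WHEN ccd > 28 THEN cc28 ELSE 0 END),
      SUM(CASE WHEN ccd < 730 THEN cc28 ELSE 0 END),
      SUM(CASE WHEN ccd > 5000 THEN cc28 ELSE 0 END),    -- no match: 0
      SUM(CASE WHEN ccd > 5000 THEN cc28 ELSE NULL END)  -- no match: NULL
    FROM sales
    """
).fetchone()
print(in_window, over_28, under_730, none_zero, none_null)  # 10 14 15 0 None
```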
You were on the right track with your attempt to use SUM(IF(...)):
--- SUM(IF(CCD > 28,CC28,NULL) AS 6_CC, -- my attempt to fix, creates error
If you look closely, you are missing a closing parenthesis. Pay attention to the error you received: most likely it was a syntax error right after AS 6_CC,.
Just add a closing parenthesis after NULL:
SUM(IF(CCD > 28,CC28,NULL)) AS 6_CC,
Try using SUM(IF(...)) on your other columns as well. Let me know if this works.
With SUM(IF(...)) you have a true/false result, which is as simple as it gets (you can nest IFs inside, but that quickly becomes unreadable).
With SUM(CASE ...) you can handle more outcomes by providing more WHEN conditions, much like a switch statement.
Without the underlying data it would be hard to verify the results. But your NULL result may be due to an underlying NULL in the data or to the condition clause.
Try replacing SUM(CC28) with SUM(IFNULL(CC28,0)).
Adding an integer to NULL with + yields NULL, hence you might be getting NULL in the second set. (The SUM() aggregate itself skips NULLs, and only returns NULL when there are no non-NULL values to add.)
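A short sketch contrasting the + operator with the SUM() aggregate, using SQLite as a stand-in for MySQL and a hypothetical single-column table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cc28 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(4,), (None,), (6,)])

# SUM() skips the NULL row: 4 + 6.
aggregate_sum = conn.execute("SELECT SUM(cc28) FROM t").fetchone()[0]
# The + operator propagates NULL.
plus_null = conn.execute("SELECT 17 + NULL").fetchone()[0]
# SUM() over only NULLs (or no rows at all) is NULL.
all_null_sum = conn.execute("SELECT SUM(cc28) FROM t WHERE cc28 IS NULL").fetchone()[0]
print(aggregate_sum, plus_null, all_null_sum)  # 10 None None
```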

SQL: Sum returns null

So I have two tables: academy_attempt and module_attempt.
I am attempting to add two values from each of these tables together:
round(((select
sum(`academy_attempt`.`score`)
from
`academy_attempt`
where
((`academy_attempt`.`module_type_id` in (3 , 4, 5, 6))
and (`academy_attempt`.`user_id` = `U`.`id`))) + (select
sum(ifnull(`module_attempt`.`score`, 0))
from
`module_attempt`
where
((`module_attempt`.`module_type_id` in (3 , 4, 5, 6))
and (`module_attempt`.`user_id` = `U`.`id`)))),
2) AS `total_score`
In academy_attempt the WHERE condition is met, and on its own it returns the right amount in one row. However, module_attempt does not have any rows that match the WHERE condition and therefore returns null.
Sadly this does not turn into 0, and since I'm guessing you can't do the operation 17 + null = 17, it instead returns null.
To counter this I attempted an IFNULL statement, as you can see above, but sadly this did not fix the problem.
You have to apply the IFNULL() higher up, because SUM() over zero matching rows returns NULL, so the IFNULL() inside the SUM() never gets a row to act on:
SELECT (...
) + IFNULL((SELECT SUM(`module_attempt`.`score`) ...), 0) AS total_score
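A minimal sketch of why the outer IFNULL() works, using SQLite as a stand-in for MySQL. The literal user_id = 1 replaces the correlated U.id from the question, and the rows are hypothetical (one academy score of 17, no matching module rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("academy_attempt", "module_attempt"):
    conn.execute(f"CREATE TABLE {table} (user_id INTEGER, module_type_id INTEGER, score REAL)")
conn.execute("INSERT INTO academy_attempt VALUES (1, 3, 17)")
# module_attempt has no rows for user 1, so its SUM() is NULL.

without_fix, with_fix = conn.execute(
    """
    SELECT
      -- IFNULL inside SUM(): no rows to act on, so the subquery is still NULL
      (SELECT SUM(score) FROM academy_attempt WHERE user_id = 1 AND module_type_id IN (3, 4, 5, 6))
      + (SELECT SUM(IFNULL(score, 0)) FROM module_attempt WHERE user_id = 1 AND module_type_id IN (3, 4, 5, 6)),
      -- IFNULL around the whole subquery turns its NULL into 0
      ROUND(
        (SELECT SUM(score) FROM academy_attempt WHERE user_id = 1 AND module_type_id IN (3, 4, 5, 6))
        + IFNULL((SELECT SUM(score) FROM module_attempt WHERE user_id = 1 AND module_type_id IN (3, 4, 5, 6)), 0),
        2)
    """
).fetchone()
print(without_fix, with_fix)  # None 17.0
```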
NULL represents an unknown value, so naturally adding an unknown value to a number still results in an unknown value, albeit a different unknown value (hence NULL = NULL does not evaluate to true).
I think you actually want the COALESCE function, which returns the first non-null argument. Thus, you can wrap your null value with this function, and sum it as normal. COALESCE( NULL, 0 ) will return 0, and COALESCE(1,0) will return 1
https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html#function_coalesce