IF with SUM produces incorrect results in MySQL - mysql

I am getting incorrect results when I try to use a SUM inside an IF clause. The results are correct when I use COUNT, but either incorrect or NULL when I use SUM. I have been able to produce the correct results for each statement through another query (as a means of validating the formula). What is the syntax to get correct results for a SUM within an IF statement? Based on another StackOverflow question, I attempted to fix the formula, but it produced an error.
SELECT
IF (Artist LIKE '%Hillsong%' , 'Hillsong', NULL ) as Artist,
COUNT(IF(CCD BETWEEN 28 AND 730,1,NULL)) AS 1_CC, -- result 191, which is correct
IF(CCD BETWEEN 28 AND 730, SUM(CC28),NULL) AS 2_CC, -- result NULL, should be 610 xx
COUNT(IF(CCD > 28,1,NULL)) AS 3_CC, -- result 684, which is correct
COUNT(IF(CCD < 730,1,NULL)) AS 4_CC, -- result 502, which is correct
IF(CCD > 28,SUM(CC28),NULL) AS 5_CC, -- result 2253, should be 1882 xx
--- SUM(IF(CCD > 28,CC28,NULL) AS 6_CC, -- my attempt to fix, creates error
IF(CCD < 730,SUM(CC28),NULL) AS 7_CC -- result NULL, should be 981 xx
FROM praisecharts_reporting.large_sales_report
GROUP BY 1;
As a frame of reference, I am a music publisher, and I am trying to get results for all songs where the artist includes "Hillsong", where the Chord Chart (CC) has been available between 28 and 730 days (CCD BETWEEN 28 AND 730). The COUNT should tell me how many song titles qualify, and SUM should tell me the total unit sales for all songs that qualify.

I figured out the answer to my problem. The syntax for putting an IF clause inside a SUM is to use CASE WHEN. Below is the solution to my query above, and it produces correct results all around:
SELECT
IF (Artist LIKE '%Hillsong%' , 'Hillsong', NULL ) as Artist,
COUNT(IF(CCD BETWEEN 28 AND 730,1,NULL)) AS 1_CC,
SUM(CASE WHEN CCD BETWEEN 28 AND 730 THEN CC28 ELSE 0 END) AS 2_CC,
COUNT(IF(CCD > 28,1,NULL)) AS c3_CC,
COUNT(IF(CCD < 730,1,NULL)) AS c4_CC,
SUM(CASE WHEN CCD > 28 THEN CC28 ELSE 0 END) AS 5_CC,
SUM(CASE WHEN CCD < 730 THEN CC28 ELSE 0 END) AS 6_CC
FROM praisecharts_reporting.large_sales_report
GROUP BY 1;

You were on the right track with your attempt to use SUM - IF:
--- SUM(IF(CCD > 28,CC28,NULL) AS 6_CC, -- my attempt to fix, creates error
If you look closely, you are missing a closing parenthesis. Pay attention to the error message: most likely you received a syntax error right after AS 6_CC,.
Just add a closing parenthesis after NULL:
SUM(IF(CCD > 28,CC28,NULL)) AS 6_CC,
Try to use SUM - IF on your other columns as well. Let me know if this works.
With SUM - IF you get a single true / false branch, which is as simple as it can be (you can nest IFs inside, but that makes the query unreadable).
With SUM - CASE you have an option to have more results by providing more conditions, just like a SWITCH statement.
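As a rough illustration of the conditional aggregation discussed above, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, and the sales table and its sample rows are made up for the example; only the column names CCD and CC28 come from the question.

```python
import sqlite3


def conditional_totals(rows):
    """rows: iterable of (ccd, cc28) pairs (hypothetical sample data).

    Returns (count, total) for the rows where 28 <= ccd <= 730.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (ccd INTEGER, cc28 INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT
            -- the condition is tested per row, inside the aggregate
            COUNT(CASE WHEN ccd BETWEEN 28 AND 730 THEN 1 END),
            SUM(CASE WHEN ccd BETWEEN 28 AND 730 THEN cc28 ELSE 0 END)
        FROM sales
        """
    ).fetchone()
    conn.close()
    return result


sample = [(10, 5), (28, 7), (400, 3), (900, 100)]
print(conditional_totals(sample))  # (2, 10)
```

The point of the pattern is that the condition sits inside the aggregate, so it is evaluated once per row rather than once per group.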

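Without the underlying data it is hard to verify the results, but your NULL result may be due to an underlying NULL in the data or to the condition clause.
Try replacing SUM(CC28) with SUM(IFNULL(CC28,0)).
Adding an integer and NULL with + yields NULL, but SUM() itself skips NULL values and returns NULL only when it has no non-NULL values to add. So IFNULL changes the result only when every CC28 in the group is NULL; otherwise the NULL in the second set comes from the IF condition being false and returning its NULL branch.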
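To check how NULL behaves under + and under SUM(), here is a small sketch. It assumes SQLite (via Python's sqlite3) rather than MySQL, and the table t with its three rows is hypothetical.

```python
import sqlite3


def null_arithmetic_demo():
    """Compare NULL handling of the + operator and the SUM() aggregate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (cc28 INTEGER)")  # hypothetical table
    conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (2,)])

    # Scalar addition: any NULL operand makes the result NULL.
    plus_null = conn.execute("SELECT 1 + NULL").fetchone()[0]
    # Aggregate: NULL inputs are skipped.
    summed = conn.execute("SELECT SUM(cc28) FROM t").fetchone()[0]
    # Aggregate over NULLs only: nothing to add, so the result is NULL.
    all_null = conn.execute(
        "SELECT SUM(cc28) FROM t WHERE cc28 IS NULL"
    ).fetchone()[0]
    # IFNULL turns that last case into 0.
    coalesced = conn.execute(
        "SELECT SUM(IFNULL(cc28, 0)) FROM t WHERE cc28 IS NULL"
    ).fetchone()[0]
    conn.close()
    return plus_null, summed, all_null, coalesced


print(null_arithmetic_demo())  # (None, 3, None, 0)
```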

Related

How do I fix this SQL query (MariaDB) to produce the correct output?

I have been trying to extract some data from a database and produce the correct output. The following is what I am looking for
id | ref_no | Order time | discount code | total | items | donations
This is my SQL query so far:
SELECT
order.id,
order.ref_no,
order.cdate AS "Order time",
order.promocode AS "discount code",
order.orig_total as "total",
GROUP_CONCAT(order_item_ref.item_id ORDER BY order_item_ref.item_id) AS "items",
(CASE WHEN order_item_ref.item_id = "99"
THEN order_item_ref.quantity ELSE "0" END) AS "donations"
FROM `order`
INNER JOIN `order_item_ref`
ON order.id = order_item_ref.order_id
WHERE order.deleted = "0"
GROUP BY order.id;
Currently it doesn't quite work. The donation column is supposed to show 0 when an order does not contain an item with the item number 99 or the actual amount (order_item_ref.quantity) if it does. However the query only works sometimes to produce something like the following:
If I drop the GROUP BY and GROUP_CONCAT parts, the query works as intended; however, I need to keep the grouping intact.
How can I fix the query so the output is correct?
Edit: the output for line 48 is correct. The output for line 49 is not. There should be a non-zero number for the right hand column which should come from order_item_ref.quantity as per the SQL query above.
Since the question has been reopened I can post my own answer. After the explanation made by KIKO Software, I was looking for a way to aggregate (CASE WHEN order_item_ref.item_id = "99" THEN order_item_ref.quantity ELSE "0" END) AS "donations" and found that I was able to do it by simply summing over the output like so:
SUM((CASE WHEN order_item_ref.item_id = "99" THEN order_item_ref.quantity ELSE "0" END)) AS "donations"
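A minimal sketch of that fix, assuming SQLite (via Python's sqlite3) in place of MariaDB and a cut-down order_item_ref table with made-up rows; the order ids and quantities are hypothetical.

```python
import sqlite3


def donations_per_order(items):
    """items: iterable of (order_id, item_id, quantity) tuples.

    Returns [(order_id, donations), ...], where donations is the summed
    quantity of item 99 within each order (0 if the order has none).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_item_ref "
        "(order_id INTEGER, item_id INTEGER, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO order_item_ref VALUES (?, ?, ?)", items)
    rows = conn.execute(
        """
        SELECT order_id,
               -- aggregate the CASE so every row of the group is inspected
               SUM(CASE WHEN item_id = 99 THEN quantity ELSE 0 END) AS donations
        FROM order_item_ref
        GROUP BY order_id
        ORDER BY order_id
        """
    ).fetchall()
    conn.close()
    return rows


sample = [(1, 5, 1), (1, 7, 2), (2, 5, 1), (2, 99, 25)]
print(donations_per_order(sample))  # [(1, 0), (2, 25)]
```

Without the SUM, the CASE would be evaluated against a single row picked from each group, which is why the original query only sometimes showed the donation.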

MySQL CASE function behaves strangely and inconsistently

We are using MySQL 8 as our java application DB.
We have a query with the following format:
select
id,
group_concat(NAME ORDER BY ID separator ',,') AS Code,
CASE
WHEN MAX(p.VARIABLEfactor) = 1 THEN MAX(i.factor) ELSE MAX(p.factor) END AS factor
from MA_TABLE
join TABLE_P p on (...)
join TABLE_I i on (...)
group by id
The query worked fine in our development environments, but after deployment at the client the factor column comes back null.
We ran the same query in the client environment from MySQL Workbench, and there the factor column is populated correctly.
After some debugging, we changed:
CASE
WHEN MAX(p.VARIABLEfactor) = 1 THEN MAX(i.factor) ELSE MAX(p.factor) END AS factor
to
MAX(
CASE WHEN p.VARIABLEfactor = 1 THEN i.factor ELSE p.factor END ) AS factor,
and the query worked correctly.
Any help here please?
From your explanation I gather that you don't understand the difference between your two CASE expressions, but they are very different. Let's look at an example for one ID:
ID | VARIABLEfactor | i.factor | p.factor
100 | 0 | null | 10
100 | 1 | null | 20
Your expression
CASE WHEN MAX(p.VARIABLEfactor) = 1 THEN MAX(i.factor) ELSE MAX(p.factor) END
looks at the maximum VARIABLEfactor, which is 1, so the THEN case applies and the maximum i.factor is returned. This is null, as all i.factor are null.
Your expression
MAX(CASE WHEN p.VARIABLEfactor = 1 THEN i.factor ELSE p.factor END)
looks at each row's VARIABLEfactor. For the first row this is 0, so the ELSE case applies and p.factor 10 is used. For the second row the VARIABLEfactor is 1, so its i.factor null gets used. Of these you take the maximum, which is 10.
To recap: The first expression is just a CASE expression on the aggregation results. It returns null here. The second expression is a conditional aggregation. It returns 10 for the sample data.
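The two behaviours can be reproduced with the sample rows from this answer. This is only a sketch: it assumes SQLite (via Python's sqlite3) instead of MySQL 8, and it flattens the three-table join into one hypothetical table named joined with columns variablefactor, i_factor and p_factor.

```python
import sqlite3


def compare_case_forms():
    """Return (case_on_aggregates, conditional_aggregation) for ID 100."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE joined "
        "(id INTEGER, variablefactor INTEGER, i_factor INTEGER, p_factor INTEGER)"
    )
    conn.executemany(
        "INSERT INTO joined VALUES (?, ?, ?, ?)",
        [(100, 0, None, 10), (100, 1, None, 20)],
    )
    row = conn.execute(
        """
        SELECT
            -- CASE applied to the aggregated results of the whole group
            CASE WHEN MAX(variablefactor) = 1
                 THEN MAX(i_factor) ELSE MAX(p_factor) END,
            -- CASE applied per row, then aggregated
            MAX(CASE WHEN variablefactor = 1 THEN i_factor ELSE p_factor END)
        FROM joined
        GROUP BY id
        """
    ).fetchone()
    conn.close()
    return row


print(compare_case_forms())  # (None, 10)
```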

SQL : SELECT SUM WHERE CONDITION

I'm having some trouble with an SQL query:
I have a table like this (table data image).
I would like to create a view from this table to get :
Time_A : SUM of a column (total_time_taken) WHERE column (is_radiant)=1
Time_B : SUM of the same column (total_time_taken) WHERE column (is_radiant)=0
Time_AB : SUM of the column (total_time_taken) WHERE column (is_radiant)=0 OR (is_radiant)=1
SELECT
SUM(`matchpickban_radiant`.`total_time_taken`) AS `draft_time_radiant`,
SUM(`matchpickban_dire`.`total_time_taken`) AS `draft_time_dire`
FROM
(`matchpickban` AS `matchpickban_radiant`
JOIN `matchpickban` AS `matchpickban_dire` ON ((`matchpickban_dire`.`idmatchpickban` = `matchpickban_radiant`.`idmatchpickban`)))
WHERE
`matchpickban_radiant`.`is_radiant` = 1
AND `matchpickban_dire`.`is_radiant` = 0
I can run this query without a syntax error, but the result is NULL because no row can have is_radiant equal to 0 and equal to 1 at the same time, obviously...
Also, I don't know if it's possible to JOIN the table to itself as I did (matchpickban JOIN matchpickban).
If the syntax is correct, I need to move my WHERE condition elsewhere but don't know how. Is it possible to replace it with two IF statements (IF is_radiant=0 SUM(...))?
Thanks for reading and helping me with this issue!
If you need more info about the table or the query, I will provide whatever you need.
No need for a self-join or complex logic: you can just use conditional aggregation, which consists of using conditional expressions within aggregate functions.
In MySQL, you could go:
select
sum(is_radiant * total_time_taken) time_a,
sum((1 - is_radiant) * total_time_taken) time_b,
sum(total_time_taken) time_ab
from matchpickban
where is_radiant in (0, 1)
This works because is_radiant is made of 0/1 values only - so this simplifies the logic. A more canonical way to phrase the conditional sums would be:
sum(case when is_radiant = 1 then total_time_taken else 0 end) time_a,
sum(case when is_radiant = 0 then total_time_taken else 0 end) time_b,
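Both phrasings can be compared side by side. The sketch below assumes SQLite (via Python's sqlite3) as a stand-in for MySQL and uses a two-column version of matchpickban with invented rows.

```python
import sqlite3


def draft_times(rows):
    """rows: iterable of (is_radiant, total_time_taken) pairs.

    Returns (time_a, time_b, time_ab, time_a_mult, time_b_mult), where the
    last two use the multiplication shortcut instead of CASE.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE matchpickban (is_radiant INTEGER, total_time_taken INTEGER)"
    )
    conn.executemany("INSERT INTO matchpickban VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT
            SUM(CASE WHEN is_radiant = 1 THEN total_time_taken ELSE 0 END),
            SUM(CASE WHEN is_radiant = 0 THEN total_time_taken ELSE 0 END),
            SUM(total_time_taken),
            SUM(is_radiant * total_time_taken),
            SUM((1 - is_radiant) * total_time_taken)
        FROM matchpickban
        WHERE is_radiant IN (0, 1)
        """
    ).fetchone()
    conn.close()
    return result


sample = [(1, 30), (0, 20), (1, 15), (0, 5)]
print(draft_times(sample))  # (45, 25, 70, 45, 25)
```

The multiplication form relies on is_radiant holding only 0 or 1; the CASE form keeps working if the flag is ever stored differently.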

SQL Inner Join w/ sub query to return difference w/criteria

What am I doing:
I'm attempting to take two tables, one with 2016 data and one with 2015 data, and subtract the cells in each column to display only the differences greater than or equal to 10,000, rounded to the nearest 100th place, in a new table.
The Issue:
I am able to get the new table to pop up with the correct amounts displayed for the subtraction part only. I'm not able to add any additional criteria to filter the results to display the >= 10000 or the rounding to the 100th spot.
After some research, it looks like my JOIN needs a subquery to display what I would like, but I've been messing around with it for hours now and I can't get it to display anything when I add a subquery. Any assistance would be great. Here is what I have that works without the >= 10000 and rounding:
SELECT
`prioryeardata`.location,
`currentdata`.`2010` - `prioryeardata`.`2010` AS '2010_Difference',
`currentdata`.`2011` - `prioryeardata`.`2011` AS '2011_Difference',
`currentdata`.`2012` - `prioryeardata`.`2012` AS '2012_Difference',
`currentdata`.`2013` - `prioryeardata`.`2013` AS '2013_Difference',
`currentdata`.`2014` - `prioryeardata`.`2014` AS '2014_Difference',
`currentdata`.`2015` - `prioryeardata`.`2015` AS '2015_Difference'
FROM `prioryeardata`
JOIN `currentdata`
ON `prioryeardata`.location = `currentdata`.location;
Have a look at the query below; it may help (using SQL Server):
select location,Round([2010_Difference],3) [2010_Difference],Round([2011_Difference],3)[2011_Difference]
,Round([2012_Difference],3)[2012_Difference],Round([2013_Difference],3)[2013_Difference]
,Round([2014_Difference],3)[2014_Difference],Round([2015_Difference],3)[2015_Difference] from
( SELECT
prioryeardata.location,
currentdata.year2010 - prioryeardata.year2010 AS [2010_Difference],
currentdata.year2011 - prioryeardata.year2011 AS [2011_Difference],
currentdata.year2012 - prioryeardata.year2012 AS [2012_Difference],
currentdata.year2013 - prioryeardata.year2013 AS [2013_Difference],
currentdata.year2014 - prioryeardata.year2014 AS [2014_Difference],
currentdata.year2015 - prioryeardata.year2015 AS [2015_Difference]
FROM prioryeardata
JOIN currentdata
ON prioryeardata.location = currentdata.location
) t where t.[2015_Difference]>=10000 --or .......
Edit
select location,Round([2010_Difference],3) [2010_Difference],Round([2011_Difference],3)[2011_Difference]
,Round([2012_Difference],3)[2012_Difference],Round([2013_Difference],3)[2013_Difference]
,Round([2014_Difference],3)[2014_Difference],Round([2015_Difference],3)[2015_Difference]
from
(select t.location
,case when [2010_Difference]>10000 then [2010_Difference] Else 0 End as [2010_Difference]
,case when [2011_Difference]>10000 then [2011_Difference] Else 0 End as [2011_Difference]
,case when [2012_Difference]>10000 then [2012_Difference] Else 0 End as [2012_Difference]
,case when [2013_Difference]>10000 then [2013_Difference] Else 0 End as [2013_Difference]
,case when [2014_Difference]>10000 then [2014_Difference] Else 0 End as [2014_Difference]
,case when [2015_Difference]>10000 then [2015_Difference] Else 0 End as [2015_Difference]
from
( SELECT
prioryeardata.location,
currentdata.year2010 - prioryeardata.year2010 AS [2010_Difference],
currentdata.year2011 - prioryeardata.year2011 AS [2011_Difference],
currentdata.year2012 - prioryeardata.year2012 AS [2012_Difference],
currentdata.year2013 - prioryeardata.year2013 AS [2013_Difference],
currentdata.year2014 - prioryeardata.year2014 AS [2014_Difference],
currentdata.year2015 - prioryeardata.year2015 AS [2015_Difference]
FROM prioryeardata
JOIN currentdata
ON prioryeardata.location = currentdata.location
) t where t.[2010_Difference]>=10000 or t.[2011_Difference]>=10000 or t.[2012_Difference]>=10000
or t.[2013_Difference]>=10000 or t.[2014_Difference]>=10000 or t.[2015_Difference]>=10000
)tt
If you want cells to show blank instead of a value, use a pattern like this under your SELECT:
CASE WHEN `currentdata`.`2015` - `prioryeardata`.`2015` >= 10000 THEN
`currentdata`.`2015` - `prioryeardata`.`2015` ELSE NULL END AS '2015_Difference'
Strictly speaking, the ELSE NULL is unnecessary; I just put it in for your learning benefit.
If you want to show only rows where the difference is at least 10,000, put this at the end of your query:
WHERE
`currentdata`.`2015` - `prioryeardata`.`2015` >= 10000
If you want to show only rows where all years were at least 10,000, add similar filters for the other years, separated by AND. If you want to show rows where any year was at least 10,000, separate them with OR.
To round values to the nearest 100 (i.e. 12345 becomes 12300) I believe you would use
ROUND(12345,-2)
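The filter and the rounding can be combined as in the following sketch. It makes several assumptions: SQLite (via Python's sqlite3) replaces MySQL, the year column is renamed to a hypothetical y2015, and the data is invented. SQLite's ROUND treats a negative digit count as 0, so the sketch emulates ROUND(x, -2) by dividing by 100, rounding, and multiplying back.

```python
import sqlite3


def large_differences(prior, current, threshold=10000):
    """prior/current: iterables of (location, y2015) pairs.

    Returns [(location, rounded_difference), ...] for locations whose
    difference is at least `threshold`, rounded to the nearest 100.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prioryeardata (location TEXT, y2015 INTEGER)")
    conn.execute("CREATE TABLE currentdata (location TEXT, y2015 INTEGER)")
    conn.executemany("INSERT INTO prioryeardata VALUES (?, ?)", prior)
    conn.executemany("INSERT INTO currentdata VALUES (?, ?)", current)
    rows = conn.execute(
        """
        SELECT p.location,
               -- emulate MySQL's ROUND(diff, -2)
               CAST(ROUND((c.y2015 - p.y2015) / 100.0) AS INTEGER) * 100
        FROM prioryeardata AS p
        JOIN currentdata AS c ON p.location = c.location
        WHERE c.y2015 - p.y2015 >= ?
        ORDER BY p.location
        """,
        (threshold,),
    ).fetchall()
    conn.close()
    return rows


prior = [("A", 1000), ("B", 5000)]
current = [("A", 13345), ("B", 9000)]
print(large_differences(prior, current))  # [('A', 12300)]
```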

Having trouble understanding CASE WHEN statement in SQL

SELECT state,
COUNT(CASE WHEN elevation >= 2000 THEN 1 ELSE NULL END) as count_high_elevation_airports
FROM airports
GROUP BY state;
In the above statement, what is the THEN 1 and what does '1' signify?
How does that value '1' after THEN affect the output?
First, note that these three expressions are equivalent:
CASE WHEN elevation >= 2000 THEN 1 ELSE NULL END
IF(elevation >= 2000, 1, NULL)
((elevation >= 2000) OR NULL)
If elevation >= 2000, the expression evaluates as "1", otherwise the expression evaluates as NULL.
"1" is conventionally used as boolean true, and you could substitute the MySQL literal TRUE in the above expressions with equivalent results... but that isn't what the "1" is for, here.
When used with COUNT(), in cases like this, the only real significance of 1 is that it is not NULL.
This is important, because -- contrary to popular belief -- COUNT() does not count rows. It counts values.
What's the difference? NULL is not technically a value. Instead, it is a marker that signifies the absence of a value, thus COUNT(expr) only counts rows where expr is not null.
By using an expression like the one here, you're asking the server to count the rows with elevation >= 2000, and you do this by giving COUNT() a NULL for rows you want not to be counted... and a non-null value for rows you do.
Aggregate (GROUP BY) functions operate on values -- and NULL, again, is not a value in this sense.
Another aggregate function that makes this rationale perhaps even more clear is AVG(). If you had 3 rows... with values 5, NULL, and 10... what's the average? If you said 7.5, that's correct: the average of these 3 rows is (5 + 10) ÷ 2 = 7.5 because the 3 rows have only two values. NULL is not 0, otherwise the average would be (5 + 0 + 10) ÷ 3 = 5, which it is not.
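The AVG() arithmetic is easy to check. The following sketch assumes SQLite (via Python's sqlite3) in place of MySQL, with a throwaway table t that is not part of the question.

```python
import sqlite3


def average_ignoring_nulls(values):
    """Return (AVG(v), AVG with NULL forced to 0, COUNT(v), COUNT(*))."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (v REAL)")  # hypothetical table
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    result = conn.execute(
        "SELECT AVG(v), AVG(IFNULL(v, 0)), COUNT(v), COUNT(*) FROM t"
    ).fetchone()
    conn.close()
    return result


print(average_ignoring_nulls([5, None, 10]))  # (7.5, 5.0, 2, 3)
```

COUNT(v) reports 2 while COUNT(*) reports 3, which is the same rows-versus-values distinction that AVG() follows.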
So, that's how and why this works.
How does that value '1' after THEN affect the output?
It really doesn't. You could just as easily have said COUNT(CASE WHEN elevation >= 2000 THEN 'cat videos are funny' ELSE NULL END) because, just like the literal 1, the literal string 'cat videos are funny' is also not null, and non-null values -- anything not null -- are what COUNT() will count.
A novice might try to accomplish this task with COUNT(elevation >= 2000), but that gives the wrong answer, because the 0 (false) for rows where elevation is < 2000 is not null, so these rows would still be counted.
You may then ask, "why not just use COUNT(*) ... WHERE elevation >= 2000?" Good question. The reasons vary, but if you GROUP BY state and there are states with no rows matching WHERE, those states would be entirely eliminated from the results, which is often not what you want. This query includes them, with a count of zero.
Note that ((elevation >= 2000) OR NULL), the third example expression at the top, doesn't actually need the parentheses. I included them because this form is not necessarily intuitive at first glance. The natural precedence of operations will cause this to be evaluated correctly if written simply elevation >= 2000 OR NULL. This expression is equivalent to the other two because elevation >= 2000 first evaluates to 1 if true, 0 if false, or NULL if elevation is null. Then the lower-precedence OR is evaluated, and you get one of these: 1 OR NULL => 1 ... 0 OR NULL => NULL ... NULL OR NULL => NULL... and you may actually be awarded a SQL wizard badge by the elders of the Internet at the point when writing queries with COUNT(elevation >= 2000 OR NULL) comes naturally to you.
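The counting variants discussed in this answer can be compared in one sketch. It assumes SQLite (via Python's sqlite3) rather than MySQL, and the airports rows are invented; IF() is left out because it is MySQL-specific.

```python
import sqlite3


def count_high_airports(airports):
    """airports: iterable of (state, elevation) pairs.

    Returns (grouped, filtered):
      grouped  -> [(state, count_case, count_or_null, count_bare), ...]
      filtered -> [(state, count), ...] using WHERE instead of CASE
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE airports (state TEXT, elevation INTEGER)")
    conn.executemany("INSERT INTO airports VALUES (?, ?)", airports)
    grouped = conn.execute(
        """
        SELECT state,
               COUNT(CASE WHEN elevation >= 2000 THEN 1 ELSE NULL END),
               COUNT(elevation >= 2000 OR NULL),
               COUNT(elevation >= 2000)  -- counts 0 as well, so it overcounts
        FROM airports
        GROUP BY state
        ORDER BY state
        """
    ).fetchall()
    filtered = conn.execute(
        """
        SELECT state, COUNT(*)
        FROM airports
        WHERE elevation >= 2000  -- states with no match disappear entirely
        GROUP BY state
        ORDER BY state
        """
    ).fetchall()
    conn.close()
    return grouped, filtered


sample = [("CO", 5000), ("CO", 6000), ("CO", 1500), ("FL", 10), ("FL", 50)]
grouped, filtered = count_high_airports(sample)
print(grouped)   # [('CO', 2, 2, 3), ('FL', 0, 0, 2)]
print(filtered)  # [('CO', 2)]
```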
The CASE expression simply returns 1 if elevation is >= 2000; otherwise it returns NULL, which COUNT() does not count.
The resulting count is then returned as count_high_elevation_airports.