Replacing UNION ALL to increase speed when summarizing multiple columns - mysql

I have a dataset with columns V1, V2, V3, V4 ... V200, and I would like to return a table with each column name and how many NULL, zero and below-zero values that column has. My current code looks like:
SELECT 'V1' AS column_name, SUM(CASE WHEN V1 IS NULL THEN 1 ELSE 0 END) AS n_null, SUM(CASE WHEN V1 = 0 THEN 1 ELSE 0 END) AS n_zero, SUM(CASE WHEN V1 < 0 THEN 1 ELSE 0 END) AS n_below_zero FROM mytable UNION ALL
...
SELECT 'V200' AS column_name, SUM(CASE WHEN V200 IS NULL THEN 1 ELSE 0 END) AS n_null, SUM(CASE WHEN V200 = 0 THEN 1 ELSE 0 END) AS n_zero, SUM(CASE WHEN V200 < 0 THEN 1 ELSE 0 END) AS n_below_zero FROM mytable
Is there a faster way? I suspect that 200 UNION ALL branches is not the fastest approach.
I am running on Databricks, so Spark SQL.

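It should be much faster to scan the table once and aggregate all rows into a single row containing all the counts. The UNION ALL version runs 200 separate aggregations, each of which typically reads the whole table. Here is how to do it in one pass: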
SELECT
SUM(CASE WHEN V1 IS NULL THEN 1 ELSE 0 END) AS v1_null_cnt,
SUM(CASE WHEN V1 = 0 THEN 1 ELSE 0 END) AS v1_zero_cnt,
SUM(CASE WHEN V1 < 0 THEN 1 ELSE 0 END) AS v1_nega_cnt,
SUM(CASE WHEN V2 IS NULL THEN 1 ELSE 0 END) AS v2_null_cnt,
SUM(CASE WHEN V2 = 0 THEN 1 ELSE 0 END) AS v2_zero_cnt,
SUM(CASE WHEN V2 < 0 THEN 1 ELSE 0 END) AS v2_nega_cnt,
...
SUM(CASE WHEN V200 < 0 THEN 1 ELSE 0 END) AS v200_nega_cnt
FROM mytable;
Once you have this result row, you can unpivot it to get one row per table column, if you prefer that layout.
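With 200 columns, writing the 600 SUM() expressions by hand is tedious, so the query is usually generated. The sketch below assumes Python's standard sqlite3 module with an in-memory SQLite database standing in for Databricks; the helper names build_summary_sql and summarize and the sample rows are hypothetical. It builds the single-scan query for any list of columns, runs it once, and unpivots the one result row in Python:

```python
import sqlite3


def build_summary_sql(table, columns):
    """Build one SELECT that counts NULL, zero and negative values for every
    column in a single pass over `table` (hypothetical helper)."""
    parts = []
    for col in columns:
        parts.append(f"SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END) AS {col}_null_cnt")
        parts.append(f"SUM(CASE WHEN {col} = 0 THEN 1 ELSE 0 END) AS {col}_zero_cnt")
        parts.append(f"SUM(CASE WHEN {col} < 0 THEN 1 ELSE 0 END) AS {col}_nega_cnt")
    return "SELECT\n  " + ",\n  ".join(parts) + f"\nFROM {table}"


def summarize(conn, table, columns):
    """Run the single-scan query, then unpivot its one row into
    (column_name, n_null, n_zero, n_below_zero) tuples."""
    row = conn.execute(build_summary_sql(table, columns)).fetchone()
    # The row holds three counts per column, in the order the columns were listed.
    return [(col, *row[3 * i:3 * i + 3]) for i, col in enumerate(columns)]


# Usage with made-up sample data (in-memory SQLite as a stand-in for Spark SQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (V1 REAL, V2 REAL, V3 REAL)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [(None, 0, -1), (0, 0, 2), (-5, None, 3)])
summary = summarize(conn, "mytable", ["V1", "V2", "V3"])
print(summary)  # [('V1', 1, 1, 1), ('V2', 1, 2, 0), ('V3', 0, 0, 1)]
```

The column names are interpolated into the SQL text, so they should come from a trusted list such as the table's schema. In Spark SQL itself, the unpivot step could instead be done in SQL with the stack() generator function.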

Related

Why are my various CASE WHEN functions returning the same values?

I'm trying to write a query that returns counts based on the value of a feedback field that ranges from 0 to 5 (0 means the task was not rated).
I want:
the count of all rated rows (anything rated 1 or greater)
the count of all rows rated 1 (rate = 1)
the count of rows rated 1 that are also the first iteration of a given task (rate = 1 and iteration = 0)
I have written this query, but I am getting the same value for all three counts:
select
DATE_FORMAT(created_at,'%M') as Month,
COUNT(CASE WHEN rate > 0 THEN 1 ELSE 0 END) AS total,
COUNT(CASE WHEN rate = 1 THEN 1 ELSE 0 END) AS Rated_1,
COUNT(CASE WHEN client_feedback = 1 AND `index` = 0 THEN 1 ELSE 0 END) AS first_iteration_rated_1
from tablexxx
where created_at between date('2022-04-01') and date('2022-10-01')
GROUP BY Month
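Use SUM() instead of COUNT(), or change the ELSE branch.
COUNT(expr) counts every row where expr is not NULL, so it counts the 0s as well as the 1s; that is why all three columns return the same number.
You can take one of two approaches (note that INDEX is a reserved word in MySQL, so a column named index has to be quoted with backticks):
Method 1: return NULL in the ELSE branch of the CASE, so COUNT() skips the rows that don't match: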
select
DATE_FORMAT(created_at,'%M') as Month,
COUNT(CASE WHEN rate > 0 THEN 1 ELSE null END) AS total,
COUNT(CASE WHEN rate = 1 THEN 1 ELSE null END) AS Rated_1,
COUNT(CASE WHEN client_feedback = 1 AND `index` = 0 THEN 1 ELSE null END) AS first_iteration_rated_1
from tablexxx
where created_at between date('2022-04-01') and date('2022-10-01')
GROUP BY Month
Method 2: keep ELSE 0 and use SUM() instead of COUNT(), so the 1s and 0s are added up:
select
DATE_FORMAT(created_at,'%M') as Month,
SUM(CASE WHEN rate > 0 THEN 1 ELSE 0 END) AS total,
SUM(CASE WHEN rate = 1 THEN 1 ELSE 0 END) AS Rated_1,
SUM(CASE WHEN client_feedback = 1 AND `index` = 0 THEN 1 ELSE 0 END) AS first_iteration_rated_1
from tablexxx
where created_at between date('2022-04-01') and date('2022-10-01')
GROUP BY Month
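The difference between the original query and the two methods can be checked on a tiny table. This sketch assumes Python's standard sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (COUNT() and SUM() handle NULL and 0 the same way here); the feedback table and its rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feedback (rate INTEGER)")
conn.executemany("INSERT INTO feedback VALUES (?)", [(0,), (1,), (1,), (3,), (5,)])


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


# Original query: COUNT() also counts the 0s, because 0 is not NULL.
count_else_0 = scalar("SELECT COUNT(CASE WHEN rate = 1 THEN 1 ELSE 0 END) FROM feedback")
# Method 1: ELSE NULL, so COUNT() skips the non-matching rows.
count_else_null = scalar("SELECT COUNT(CASE WHEN rate = 1 THEN 1 ELSE NULL END) FROM feedback")
# Method 2: SUM() adds up the 1s and 0s.
sum_else_0 = scalar("SELECT SUM(CASE WHEN rate = 1 THEN 1 ELSE 0 END) FROM feedback")

print(count_else_0, count_else_null, sum_else_0)  # 5 2 2
```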

Can I get the average of a column in a MySQL DB based on the value of another column in one query?

I have a table of phone call activity for a client. One column holds the length of the call (in seconds) and another holds a "first time call" flag (true/false). I'd like to get the average call length of first-time calls separately from the average length of the other calls. Is this doable in a single MySQL query?
SELECT location,
count(*) AS total,
sum(case when firstCall = 'true' then 1 else 0 end) AS firstCall,
sum(case when answered = 'Yes' then 1 else 0 end) AS answered,
sum(case when tags like '%Lead%' then 1 else 0 end) as lead,
sum(case when tags like '%arbage%' then 1 else 0 end) as garbage,
avg(case when duration........firstTime = True???)
FROM staging
GROUP BY location
One option is to divide a conditional sum of durations by a conditional count:
SELECT location,
count(*) AS total,
sum(case when firstCall = 'true' then 1 else 0 end) AS firstCall,
sum(case when answered = 'Yes' then 1 else 0 end) AS answered,
sum(case when tags like '%Lead%' then 1 else 0 end) as lead,
sum(case when tags like '%arbage%' then 1 else 0 end) as garbage,
sum(case when firstCall='true' then duration else 0 end)/sum(case when firstCall = 'true' then 1 else 0 end) as first_call_true_average,
sum(case when firstCall='false' then duration else 0 end)/sum(case when firstCall = 'false' then 1 else 0 end) as first_call_false_average
FROM staging
GROUP BY location
I would phrase this as:
select
location,
count(*) as total,
sum(firstcall = 'true' ) as cnt_firstcall,
sum(answered = 'Yes' ) as cnt_answered,
sum(tags like '%Lead%' ) as cnt_lead,
sum(tags like '%arbage%') as cnt_garbage,
avg(case when firstcall = 'true' then duration end) as avg_first_call_duration,
avg(case when firstcall = 'false' then duration end) as avg_non_first_call_duration
from staging
group by location
Rationale:
MySQL interprets true/false conditions as 1/0 in a numeric context, which greatly shortens the conditional sum()s.
avg() ignores NULL values, and a CASE expression without an ELSE branch returns NULL for non-matching rows, so a simple CASE is enough to compute the conditional averages.
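Both points can be seen on a small sample. The sketch below assumes Python's standard sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (SQLite also evaluates comparisons to 1/0 and skips NULLs in AVG()); the staging rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (location TEXT, firstcall TEXT, duration INTEGER)")
conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", [
    ("A", "true", 60), ("A", "true", 120), ("A", "false", 30),
    ("B", "false", 10), ("B", "false", 20),
])

rows = conn.execute("""
    SELECT location,
           -- the comparison yields 1 or 0, so SUM() counts matching rows
           SUM(firstcall = 'true') AS cnt_firstcall,
           -- no ELSE: non-matching rows become NULL and AVG() ignores them
           AVG(CASE WHEN firstcall = 'true' THEN duration END) AS avg_first_call_duration,
           AVG(CASE WHEN firstcall = 'false' THEN duration END) AS avg_non_first_call_duration
    FROM staging
    GROUP BY location
    ORDER BY location
""").fetchall()

print(rows)  # [('A', 2, 90.0, 30.0), ('B', 0, None, 15.0)]
```

Location B has no first calls, so its first-call average comes back as NULL (None in Python) rather than as a misleading 0.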

Multiple counts across multiple columns

I am new to SQL. I want to count something like:
Select count(*) from table where col1= x and col2=x and Col3=x.
I need to count how often the same value appears in each of several columns.
Any help will be appreciated.
You can use conditional aggregation:
Select sum(case when col1='x' then 1 else 0 end) as count_col1,
sum(case when col2='x' then 1 else 0 end) as count_col2,
sum(case when col3='x' then 1 else 0 end) as count_col3
from tab;
If you also want the sum of these counts, wrap the query above in a derived table and add the counts up:
Select q.*,
q.count_col1 + q.count_col2 + q.count_col3 whole_sum
from
(
Select sum(case when col1='x' then 1 else 0 end) as count_col1,
sum(case when col2='x' then 1 else 0 end) as count_col2,
sum(case when col3='x' then 1 else 0 end) as count_col3
from tab
) q
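To see what the conditional counts return, and how they differ from the AND query in the question, here is a sketch that assumes Python's standard sqlite3 module with an in-memory SQLite database as a stand-in for MySQL; the tab rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany("INSERT INTO tab VALUES (?, ?, ?)",
                 [("x", "x", "y"), ("x", "y", "y"), ("y", "x", "x")])

# Per-column counts plus their total, as in the answer above.
per_column = conn.execute("""
    SELECT q.*, q.count_col1 + q.count_col2 + q.count_col3 AS whole_sum
    FROM (
        SELECT SUM(CASE WHEN col1 = 'x' THEN 1 ELSE 0 END) AS count_col1,
               SUM(CASE WHEN col2 = 'x' THEN 1 ELSE 0 END) AS count_col2,
               SUM(CASE WHEN col3 = 'x' THEN 1 ELSE 0 END) AS count_col3
        FROM tab
    ) q
""").fetchone()

# The AND version from the question answers a different question:
# it only counts rows where all three columns equal 'x'.
all_three = conn.execute(
    "SELECT COUNT(*) FROM tab WHERE col1 = 'x' AND col2 = 'x' AND col3 = 'x'"
).fetchone()[0]

print(per_column, all_three)  # (2, 2, 1, 5) 0
```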

UPDATE a field with sum of other fields with conditions MYSQL

My table structure is as follows:
counter1 | counter1_status | counter2 | counter2_status | counter3 | counter3_status | valid_counter
5        | 0               | 6        | 1               | 3        | 1               | XXXX
I want a single UPDATE query that sets valid_counter to 6 + 3 = 9.
Because counter1_status = 0, counter1 should not be added.
I tried the following query, but it gives an error.
UPDATE counter_table
SET valid_contact =
SUM((CASE WHEN counter1_status=1 THEN counter1 ELSE 0 END)
+ (CASE WHEN counter2_status=1 THEN counter2 ELSE 0 END)
+ (CASE WHEN counter3_status=1 THEN counter3 ELSE 0 END))
I can get the sum with a SELECT query without any error, but the UPDATE query fails.
If you want to store the conditional sum of counter1, counter2 and counter3 in the valid_counter field, you can use:
UPDATE counter_table
SET valid_counter =
(CASE WHEN counter1_status=1 THEN counter1 ELSE 0 END)
+ (CASE WHEN counter2_status=1 THEN counter2 ELSE 0 END)
+ (CASE WHEN counter3_status=1 THEN counter3 ELSE 0 END)
where id=5
Alternatively, compute the sum per id in a derived table and join it back to the table being updated:
UPDATE counter_table AS c1 JOIN
(SELECT id, SUM((CASE WHEN counter1_status=1 THEN counter1 ELSE 0 END)
+ (CASE WHEN counter2_status=1 THEN counter2 ELSE 0 END)
+ (CASE WHEN counter3_status=1 THEN counter3 ELSE 0 END)) AS m
FROM counter_table
GROUP BY id) AS c2
USING (id)
SET c1.`valid_counter` = c2.m;
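Finally found the cause of the error:
SUM() should be removed from the query.
SUM() is an aggregate function: it adds up one expression across many rows, so MySQL rejects it in the SET clause of an UPDATE, and it does not accept several arguments such as SUM(a,b,c).
To add values within a single row, simply use a + b + c.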
UPDATE counter_table
SET valid_counter =
((CASE WHEN counter1_status=1 THEN counter1 ELSE 0 END)
+ (CASE WHEN counter2_status=1 THEN counter2 ELSE 0 END)
+ (CASE WHEN counter3_status=1 THEN counter3 ELSE 0 END))
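The row-wise version can be tried on the sample row from the question. This sketch assumes Python's standard sqlite3 module with an in-memory SQLite database as a stand-in for MySQL; the id column (which some answers rely on) and the second row are assumptions added for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE counter_table (
    id INTEGER PRIMARY KEY,
    counter1 INTEGER, counter1_status INTEGER,
    counter2 INTEGER, counter2_status INTEGER,
    counter3 INTEGER, counter3_status INTEGER,
    valid_counter INTEGER)""")
conn.executemany("INSERT INTO counter_table VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    (5, 5, 0, 6, 1, 3, 1, None),  # the row from the question: expect 6 + 3 = 9
    (6, 1, 1, 2, 1, 4, 0, None),  # hypothetical second row: expect 1 + 2 = 3
])

# Plain + instead of SUM(): each row is updated from its own columns only.
conn.execute("""
    UPDATE counter_table
    SET valid_counter = (CASE WHEN counter1_status = 1 THEN counter1 ELSE 0 END)
                      + (CASE WHEN counter2_status = 1 THEN counter2 ELSE 0 END)
                      + (CASE WHEN counter3_status = 1 THEN counter3 ELSE 0 END)
""")

result = conn.execute("SELECT id, valid_counter FROM counter_table ORDER BY id").fetchall()
print(result)  # [(5, 9), (6, 3)]
```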

Update certain columns based on date. Best way to approach this issue

I have a table I need to update. It has one column per month. If someone's start date is 2/13/2015, I need to update the February column with some data from a calculation.
So if Start_Date is 2/17/2015, the February column (FTEFeb) needs to be populated; if Start_Date is 3/09/2015, the FTEMar column needs to be updated, and so on.
I was thinking a CASE statement would work, but the column in the SET clause would be different depending on the value of Start_Date.
I've looked online for anything similar and nothing came up.
Thanks in advance!
You can use a CASE expression.
Instead of using CASE to pick which column to update, write one CASE per column: each column gets the value when the month matches, and 0 (or whatever your business rule requires) otherwise. Try this:
UPDATE myTable
SET
FTEJan = (CASE WHEN MONTH(startDate) = 1 THEN 1 ELSE 0 END),
FTEFeb = (CASE WHEN MONTH(startDate) = 2 THEN 1 ELSE 0 END),
FTEMar = (CASE WHEN MONTH(startDate) = 3 THEN 1 ELSE 0 END),
FTEApr = (CASE WHEN MONTH(startDate) = 4 THEN 1 ELSE 0 END),
FTEMay = (CASE WHEN MONTH(startDate) = 5 THEN 1 ELSE 0 END),
FTEJun = (CASE WHEN MONTH(startDate) = 6 THEN 1 ELSE 0 END),
FTEJul = (CASE WHEN MONTH(startDate) = 7 THEN 1 ELSE 0 END),
FTEAug = (CASE WHEN MONTH(startDate) = 8 THEN 1 ELSE 0 END),
FTESep = (CASE WHEN MONTH(startDate) = 9 THEN 1 ELSE 0 END),
FTEOct = (CASE WHEN MONTH(startDate) = 10 THEN 1 ELSE 0 END),
FTENov = (CASE WHEN MONTH(startDate) = 11 THEN 1 ELSE 0 END),
FTEDec = (CASE WHEN MONTH(startDate) = 12 THEN 1 ELSE 0 END);
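Writing twelve nearly identical assignments by hand is error-prone, and ELSE 0 overwrites whatever the other month columns held before. The sketch below generates the SET clause instead and uses ELSE FTExxx so that only the matching month changes; that is a variation on the answer, not what it does. It assumes Python's standard sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, so MONTH(startDate) is replaced by SQLite's strftime('%m', startDate) and dates are stored as 'YYYY-MM-DD' text; build_update_sql and the sample rows are hypothetical:

```python
import sqlite3

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def build_update_sql(value_expr="1"):
    """Build one UPDATE with a CASE per month column (hypothetical helper).
    A column is set to value_expr only when startDate falls in its month;
    otherwise it keeps its current value."""
    sets = [
        f"FTE{m} = CASE WHEN CAST(strftime('%m', startDate) AS INTEGER) = {i} "
        f"THEN {value_expr} ELSE FTE{m} END"
        for i, m in enumerate(MONTHS, start=1)
    ]
    return "UPDATE myTable SET " + ",\n  ".join(sets)


conn = sqlite3.connect(":memory:")
month_cols = ", ".join(f"FTE{m} INTEGER DEFAULT 0" for m in MONTHS)
conn.execute(f"CREATE TABLE myTable (id INTEGER PRIMARY KEY, startDate TEXT, {month_cols})")
conn.executemany("INSERT INTO myTable (id, startDate) VALUES (?, ?)",
                 [(1, "2015-02-17"), (2, "2015-03-09")])
conn.execute(build_update_sql())

rows = conn.execute("SELECT id, FTEFeb, FTEMar FROM myTable ORDER BY id").fetchall()
print(rows)  # [(1, 1, 0), (2, 0, 1)]
```

In MySQL the generated statement would use MONTH(startDate) instead, and value_expr can be replaced by whatever calculation produces the FTE value.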