How to use GROUP BY and COUNT in SQL in this scenario - mysql

I am working on a voting system. There is a survey form, and I record the data like this:
survey
--------------
id
q1
q2
q3
where q1 means question 1, with possible values 1, 2, 3 (meaning the user selected the first, second, ... choice), while q2 means question 2, with possible values 1, 2, 3, 4, 5, etc.
I would like a query that, for each question, counts the total number of each choice, so the query result should look like this, e.g.:
q1 , first choice , 10
q1 , second, 50
q1 , third , 20...and so on
I thought of the approach below, but is there a more elegant one? Thanks
(SELECT COUNT(1) as q1_third FROM survey WHERE q1 = 3)...

One way to do it is with CASE expressions inside a COUNT() or SUM(), as shown below.
SELECT
SUM(CASE WHEN q1 = 1 THEN 1 ELSE 0 END) as q1_first,
SUM(CASE WHEN q1 = 2 THEN 1 ELSE 0 END) as q1_second,
SUM(CASE WHEN q1 = 3 THEN 1 ELSE 0 END) as q1_third
FROM survey

Maybe this will be more elegant:
select 'q1', q1, count(*) as n from survey group by q1
union all
select 'q2', q2, count(*) as n from survey group by q2
union all
select 'q3', q3, count(*) as n from survey group by q3
order by 1, 2
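As a rough sketch of how the two answers above differ in output shape, the snippet below runs both against an in-memory SQLite database from Python. The sample rows are made up for illustration, and SQLite stands in for MySQL here; the SQL itself is plain enough that it should behave the same way on both.

```python
import sqlite3

# Hypothetical sample data: each row is one submitted survey form.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE survey (id INTEGER PRIMARY KEY, q1 INTEGER, q2 INTEGER, q3 INTEGER)"
)
conn.executemany(
    "INSERT INTO survey (q1, q2, q3) VALUES (?, ?, ?)",
    [(1, 5, 2), (2, 5, 2), (2, 4, 1), (3, 1, 2)],
)

# Wide form: a single row with one column per (question, choice) pair.
wide = conn.execute("""
    SELECT SUM(CASE WHEN q1 = 1 THEN 1 ELSE 0 END) AS q1_first,
           SUM(CASE WHEN q1 = 2 THEN 1 ELSE 0 END) AS q1_second,
           SUM(CASE WHEN q1 = 3 THEN 1 ELSE 0 END) AS q1_third
    FROM survey
""").fetchone()

# Long form: one row per (question, choice) pair that occurs in the data.
long_rows = conn.execute("""
    SELECT 'q1' AS question, q1 AS choice, COUNT(*) AS n FROM survey GROUP BY q1
    UNION ALL
    SELECT 'q2', q2, COUNT(*) FROM survey GROUP BY q2
    UNION ALL
    SELECT 'q3', q3, COUNT(*) FROM survey GROUP BY q3
    ORDER BY 1, 2
""").fetchall()

print(wide)
for row in long_rows:
    print(row)
```

Note that the long form only lists choices that were actually picked at least once, whereas the wide form reports 0 for a choice nobody selected.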


How to parse <first_value> aggregate in a group by statement [SNOWFLAKE] SQL

How do you rewrite this code correctly in Snowflake?
select account_code, date,
sum(box_revenue_recognition_amount) as box_revenue_recognition_amount
, sum(case when box_flg = 1 then box_sku_quantity end) as box_sku_quantity
, sum(box_revenue_recognition_refund_amount) as box_revenue_recognition_refund_amount
, sum(box_discount_amount) as box_discount_amount
, sum(box_shipping_amount) as box_shipping_amount
, sum(box_cogs) as box_cogs
, max(invoice_number) as invoice_number
, max(order_number) as order_number
, min(box_refund_date) as box_refund_date
, first (case when order_season_rank = 1 then box_type end) as box_type
, first (case when order_season_rank = 1 then box_order_season end) as box_order_season
, first (case when order_season_rank = 1 then box_product_name end) as box_product_name
, first (case when order_season_rank = 1 then box_coupon_code end) as box_coupon_code
, first (case when order_season_rank = 1 then revenue_recognition_reason end) as revenue_recognition_reason
from dedupe_sub_user_day
group by account_code, date
I have tried to apply the window syntax as explained in the Snowflake first_value documentation, to no avail; it fails with the SQL compilation error: ... is not a valid group by expression
select account_code, date,
first_value(case when order_season_rank = 1 then box_type end) over (order by box_type ) as box_type,
first_value(case when order_season_rank = 1 then box_order_season end) over (order by box_order_season ) as box_order_season,
first_value(case when order_season_rank = 1 then box_product_name end) over (order by box_product_name ) as box_product_name,
first_value(case when order_season_rank = 1 then box_coupon_code end) over (order by box_coupon_code ) as box_coupon_code,
first_value(case when order_season_rank = 1 then revenue_recognition_reason end) over (order by revenue_recognition_reason ) as revenue_recognition_reason
, sum(box_revenue_recognition_amount) as box_revenue_recognition_amount
, sum(case when box_flg = 1 then box_sku_quantity end) as box_sku_quantity
, sum(box_revenue_recognition_refund_amount) as box_revenue_recognition_refund_amount
, sum(box_discount_amount) as box_discount_amount
, sum(box_shipping_amount) as box_shipping_amount
, sum(box_cogs) as box_cogs
, max(invoice_number) as invoice_number
, max(order_number) as order_number
, min(box_refund_date) as box_refund_date
from dedupe_sub_user_day
group by 1,2
FIRST_VALUE is not an aggregate function but a window function, so you get an error when you use it together with a GROUP BY. If you want to use it with a GROUP BY, put an ANY_VALUE around it.
Here is some data, in a CTE, that I will use below:
with data(id, seq, val) as (
select * from values
(1, 1, 10),
(1, 2, 11),
(1, 3, 12),
(1, 4, 13),
(2, 1, 20),
(2, 2, 21),
(2, 3, 22)
)
To show that FIRST_VALUE is a window function, we can just use it:
select *
,first_value(val)over(partition by id order by seq) as first_val
from data
ID |SEQ |VAL |FIRST_VAL
1 |1 |10 |10
1 |2 |11 |10
1 |3 |12 |10
1 |4 |13 |10
2 |1 |20 |20
2 |2 |21 |20
2 |3 |22 |20
So if we GROUP BY id, to avoid an error we have to wrap the FIRST_VALUE in an aggregate function. Given that the values are all equal within a group, ANY_VALUE is a good pick, and it seems it needs to be in another layer of SQL:
select id
,count(*) as count
,any_value(first_val) as first_val
from (
select *
,first_value(val)over(partition by id order by seq) as first_val
from data
)
group by 1
order by 1;
ID |COUNT |FIRST_VAL
1 |4 |10
2 |3 |20
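The layering described above (window function in an inner query, aggregate in the outer one) can be tried locally with a small sketch like the following. It uses SQLite from Python rather than Snowflake, which is an assumption made purely for runnability: SQLite has no ANY_VALUE, so MIN is swapped in as the outer aggregate (safe here because every row in a group carries the same first_val), and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, seq INTEGER, val INTEGER)")
# Same rows as the CTE above.
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 2, 11), (1, 3, 12), (1, 4, 13),
     (2, 1, 20), (2, 2, 21), (2, 3, 22)],
)

rows = conn.execute("""
    SELECT id,
           COUNT(*)       AS cnt,
           MIN(first_val) AS first_val  -- stand-in for Snowflake's ANY_VALUE
    FROM (
        -- Inner layer: the window function runs per row, before any grouping.
        SELECT id,
               FIRST_VALUE(val) OVER (PARTITION BY id ORDER BY seq) AS first_val
        FROM data
    ) AS t
    GROUP BY id
    ORDER BY id
""").fetchall()

print(rows)
```

The inner query produces one first_val per input row; only the outer query collapses rows, which is why the two steps cannot share a single SELECT with GROUP BY.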
MAX can also be used, in combination with ROW_NUMBER(), to pick the wanted value:
select id
,count(*) as count
,max(first_val) as first_val
from (
select *
,row_number() over (partition by id order by seq) as rn
,iff(rn=1, val, null) as first_val
from data
)
group by 1
order by 1;
This is arguably more complex than the ANY_VALUE solution, though I feel the performance would be better. If both have the same order of magnitude of performance, I would always choose whichever is more readable to you and your team over a small performance difference.
With the way you've written your case statement, it leads me to believe that there is only one row with order_season_rank = 1 when grouping by account_code and date.
If that is true, then you can use several of Snowflake's aggregate functions and you will get what you want. Rather than trying to get the first value, you could use min, max, any_value, mode (or really any aggregate function that will ignore nulls) to return the only non-null value in the aggregation.
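A minimal sketch of that idea, assuming (as stated above) that each account_code/date group has exactly one row with order_season_rank = 1. The table layout is trimmed to a few columns and the rows are invented for illustration; SQLite from Python is used only so the example runs anywhere, and MAX is one of several aggregates that would work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dedupe_sub_user_day (
        account_code TEXT, date TEXT, order_season_rank INTEGER,
        box_type TEXT, box_cogs REAL
    )
""")
# Hypothetical rows: account A has two rows on the same day, only one ranked 1.
conn.executemany(
    "INSERT INTO dedupe_sub_user_day VALUES (?, ?, ?, ?, ?)",
    [("A", "2021-01-01", 1, "premium", 5.0),
     ("A", "2021-01-01", 2, "basic", 3.0),
     ("B", "2021-01-02", 1, "basic", 2.0)],
)

rows = conn.execute("""
    SELECT account_code,
           date,
           SUM(box_cogs) AS box_cogs,
           -- The CASE yields NULL for every row except rank 1, and MAX
           -- ignores NULLs, so the single non-NULL value is returned.
           MAX(CASE WHEN order_season_rank = 1 THEN box_type END) AS box_type
    FROM dedupe_sub_user_day
    GROUP BY account_code, date
    ORDER BY account_code, date
""").fetchall()

print(rows)
```

If more than one row per group could have rank 1, MAX would silently pick the largest value, so the uniqueness assumption is worth checking first.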
According to the linked reference, first() is only supported by MS Access, but you've tagged the question with MySQL and Snowflake. Could you confirm which DBMS you are using?
Moving the first_value() function outside the aggregation seems to work fine.

SQL - PIVOT for one column and add new column

I am fairly new to SQL. I have got this input table as
TypeId  EventDescription  FeedHeader  FeedHeaderValue
------------------------------------------------------
166     Financial         AllocRule   130
166     Financial         DealID      0
175     Partner Capital   InvestorID  OV_P1
175     Investment        Querter     Q1
175     Investment        DealID      offset
175     Investment        InvestorID  OV_P2
I need an output as follows
Financial   value   Partner Capital   value   Investment   value
-----------------------------------------------------------------
AllocRule   130     InvestorID        OV_P1   Querter      Q1
DealID      0                                 DealID       offset
                                              InvestorID   OV_P2
I'm not sure if that is even possible. I tried using PIVOT, but it's not giving the desired output:
select
[Financial] as FinancialHeader
, [Partner Capital] as PartnerCapitalHeader
, [Investment] as Investmentheader
from
(
select EventDescription, FeedHeader
from [Feeder]
) x
pivot
(
MAX(FeedHeader)
for EventDescription in([Financial], [Partner Capital], [Investment])
)p
Another approach I tried:
Select
Min(Case [EventDescription] When 'Financial' Then [FeedHeader] End)
Financial,
Min(Case [EventDescription] When 'Financial' Then [FeedHeaderValue] End)
value,
Min(Case [EventDescription] When 'Partner Capital' Then [FeedHeader]
End) PartnerCapital,
Min(Case [EventDescription] When 'Partner Capital' Then
[FeedHeaderValue] End) value,
Min(Case [EventDescription] When 'Investment' Then [FeedHeader] End)
Investment,
Min(Case [EventDescription] When 'Investment' Then [FeedHeaderValue] End)
value
From [Feeder]
Group By EventDescription
Is there another way to do it?
I was curious and did some research on PIVOT on SO and Google, and finally got lucky (at least, that's what I think now).
The key point here is that you create new EventDescription values by appending 1 or 2 to the end, depending on how many columns you want to PIVOT.
Without doing this, the pivot query won't work properly and, in my experience with this task, leads to an error.
select max([Financial]) as FinancialHeader
, max([Financial1]) as FinancialHeaderValue
, max([Partner Capital]) as PartnerCapitalHeader
, max([Partner Capital1]) as PartnerCapitalHeaderValue
, max([Investment]) as InvestmentHeader
, max([Investment1]) as InvestmentHeaderValue
from
(select EventDescription,
EventDescription+'1' as EventDescription1,
FeedHeader,
FeedHeaderValue,
row_number() over (partition by EventDescription order by EventDescription) rn
from [testtable]
) x
pivot
(
MAX(FeedHeader)
for EventDescription in([Financial], [Partner Capital], [Investment])
) p
pivot
(
MAX(FeedHeaderValue)
for EventDescription1 in([Financial1], [Partner Capital1] , [Investment1] )
) v
group by [RN]
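For comparison, the same side-by-side layout can also be sketched without PIVOT, using ROW_NUMBER() plus conditional aggregation. This is a swapped-in technique, not the answer's method, and it runs on SQLite (3.25 or newer) from Python rather than SQL Server. The Seq column is a hypothetical addition that records insertion order, since the original table has nothing to order rows by within an EventDescription.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Feeder (
        Seq INTEGER PRIMARY KEY,  -- hypothetical: preserves insertion order
        TypeId INTEGER, EventDescription TEXT,
        FeedHeader TEXT, FeedHeaderValue TEXT
    )
""")
conn.executemany(
    "INSERT INTO Feeder (TypeId, EventDescription, FeedHeader, FeedHeaderValue) "
    "VALUES (?, ?, ?, ?)",
    [(166, "Financial", "AllocRule", "130"),
     (166, "Financial", "DealID", "0"),
     (175, "Partner Capital", "InvestorID", "OV_P1"),
     (175, "Investment", "Querter", "Q1"),
     (175, "Investment", "DealID", "offset"),
     (175, "Investment", "InvestorID", "OV_P2")],
)

rows = conn.execute("""
    SELECT MAX(CASE WHEN EventDescription = 'Financial' THEN FeedHeader END),
           MAX(CASE WHEN EventDescription = 'Financial' THEN FeedHeaderValue END),
           MAX(CASE WHEN EventDescription = 'Partner Capital' THEN FeedHeader END),
           MAX(CASE WHEN EventDescription = 'Partner Capital' THEN FeedHeaderValue END),
           MAX(CASE WHEN EventDescription = 'Investment' THEN FeedHeader END),
           MAX(CASE WHEN EventDescription = 'Investment' THEN FeedHeaderValue END)
    FROM (
        -- Number the rows within each EventDescription; rn becomes the
        -- output row that each header/value pair lands on.
        SELECT EventDescription, FeedHeader, FeedHeaderValue,
               ROW_NUMBER() OVER (PARTITION BY EventDescription ORDER BY Seq) AS rn
        FROM Feeder
    ) AS x
    GROUP BY rn
    ORDER BY rn
""").fetchall()

for row in rows:
    print(row)
```

Cells with no matching row come back as NULL (None in Python), which corresponds to the blanks in the desired output.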

Get all users who placed requests in last three years

I would like to get the list of all users who placed requests in each of the last three years.
Requests( request_id, request_day, user_id, userprofile_id )
Am I doing it right?
SELECT user_id
FROM requests
WHERE EXTRACT(YEAR FROM request_day) IN ( 2014,2015,2016 )
GROUP BY user_id
HAVING COUNT(*) = 3;
Your current approach is almost correct. All you need to do is count the number of distinct years rather than the number of rows:
SELECT user_id
FROM requests
WHERE YEAR(request_day) IN (2014, 2015, 2016)
GROUP BY user_id
HAVING COUNT(DISTINCT YEAR(request_day)) = 3;
If the DISTINCT count of years is 3, then it implies that a user has all three of the years in your WHERE IN clause.
Note that another way to do this would be conditional aggregation:
SELECT user_id
FROM requests
GROUP BY user_id
HAVING SUM(CASE WHEN YEAR(request_day) = 2014 THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN YEAR(request_day) = 2015 THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN YEAR(request_day) = 2016 THEN 1 ELSE 0 END) > 0
This approach would scale better if your query were to get more complex.
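To see why COUNT(*) = 3 is not enough, here is a small sketch on invented data, run on SQLite from Python. SQLite has no YEAR() function, so strftime('%Y', ...) is used in its place; that substitution, the date format, and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (request_id INTEGER PRIMARY KEY, request_day TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO requests (request_day, user_id) VALUES (?, ?)",
    [("2014-03-01", 1), ("2015-06-10", 1), ("2016-01-20", 1),   # one per year
     ("2014-02-02", 2), ("2014-09-09", 2), ("2015-05-05", 2),   # 3 rows, 2 years
     ("2014-01-01", 3), ("2015-01-01", 3), ("2016-01-01", 3), ("2016-07-07", 3)],
)

def user_ids(having):
    """Run the grouped query with the given HAVING clause and return user ids."""
    sql = f"""
        SELECT user_id FROM requests
        WHERE strftime('%Y', request_day) IN ('2014', '2015', '2016')
        GROUP BY user_id
        HAVING {having}
        ORDER BY user_id
    """
    return [r[0] for r in conn.execute(sql)]

naive = user_ids("COUNT(*) = 3")
distinct_years = user_ids("COUNT(DISTINCT strftime('%Y', request_day)) = 3")
conditional = user_ids("""
    SUM(CASE WHEN strftime('%Y', request_day) = '2014' THEN 1 ELSE 0 END) > 0 AND
    SUM(CASE WHEN strftime('%Y', request_day) = '2015' THEN 1 ELSE 0 END) > 0 AND
    SUM(CASE WHEN strftime('%Y', request_day) = '2016' THEN 1 ELSE 0 END) > 0
""")

print(naive, distinct_years, conditional)
```

The row-count version wrongly admits user 2 (three requests, but none in 2016) and wrongly drops user 3 (four requests across all three years); both corrected versions agree on users 1 and 3.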

Nested SQL Query for count of months

I am new to SQL and would like to know how to approach writing a query for this question.
Let's say we have these fields:
date_created date_unsubscribed subscriberid
How would I write a SQL query that lists, by month, how many people subscribed to the list, how many unsubscribed from it, and how many net subscribers there were (new subscribers minus unsubscribers)?
All in a single query...
Here's one option using conditional aggregation and union all:
select month(dt),
count(case when subscribe = 1 then 1 end) subscribecount,
count(case when subscribe = -1 then 1 end) unsubscribecount,
sum(subscribe) overallcount
from (
select date_created as dt, 1 as subscribe
from yourtable
union all
select date_unsubscribed, -1
from yourtable
where date_unsubscribed is not null
) t
group by month(dt)
The subquery creates a list of dates with a flag for subscribe or unsubscribe. Then you can use count with case to determine the appropriate number of subscribers/unsubscribers.
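The union-all idea above can be tried out with a sketch like this one, run on SQLite from Python. The five sample rows and the table name yourtable are placeholders, and strftime('%m', ...) stands in for MONTH(), which SQLite lacks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (date_created TEXT, date_unsubscribed TEXT, subscriberid INTEGER)")
# Hypothetical data: three sign-ups in April, two in May, one unsubscribe in May.
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [("2015-04-28", None, 1),
     ("2015-04-28", None, 2),
     ("2015-04-28", "2015-05-10", 3),
     ("2015-05-10", None, 4),
     ("2015-05-10", None, 5)],
)

rows = conn.execute("""
    SELECT CAST(strftime('%m', dt) AS INTEGER)         AS mth,
           COUNT(CASE WHEN subscribe = 1  THEN 1 END)  AS subscribecount,
           COUNT(CASE WHEN subscribe = -1 THEN 1 END)  AS unsubscribecount,
           SUM(subscribe)                              AS overallcount
    FROM (
        -- One event row per subscription (+1) and per unsubscription (-1).
        SELECT date_created AS dt, 1 AS subscribe FROM yourtable
        UNION ALL
        SELECT date_unsubscribed, -1 FROM yourtable
        WHERE date_unsubscribed IS NOT NULL
    ) AS t
    GROUP BY CAST(strftime('%m', dt) AS INTEGER)
    ORDER BY mth
""").fetchall()

print(rows)
```

Grouping by month alone merges the same month from different years, so for data spanning more than one year the year would need to be added to the grouping as well.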
You could write a sum(case) (a sum with conditions) to aggregate - assuming the date_created column is never null. For instance:
ORACLE:
SELECT
TO_CHAR(DATE_CREATED,'MM-YYYY') CREATE_MONTH
,SUM(CASE WHEN date_unsubscribed is not null then 1 else 0 end) unsubscribed
,SUM(CASE WHEN date_unsubscribed is null then 1 else 0 end) subscribed
,COUNT(SUBSCRIBER_ID)
FROM
--YOURTABLENAME
--WHERE
--WHATEVER OTHER CONDITIONS YOU HAVE APPLY
GROUP BY TO_CHAR(DATE_CREATED,'MM-YYYY')
MYSQL:
SELECT
DATE_FORMAT(DATE_CREATED,'%m-%Y') CREATE_MONTH
,SUM(CASE WHEN date_unsubscribed is not null then 1 else 0 end) unsubscribed
,SUM(CASE WHEN date_unsubscribed is null then 1 else 0 end) subscribed
,COUNT(SUBSCRIBER_ID)
FROM
-- YOURTABLENAME
-- WHERE
-- WHATEVER OTHER CONDITIONS YOU HAVE APPLY
GROUP BY DATE_FORMAT(DATE_CREATED,'%m-%Y')
Oracle solution
Here is a query using the PIVOT operator, which was created exactly for this kind of work, and ROLLUP to get the net number. This is just for illustration; I assume the year is a user or application input (bind variable :year, set to 2015 for the output), and I show the summary for January through June.
with
test_data ( date_created, date_unsubscribed, subscriber_id ) as (
select date '2015-05-10', null , 330053448 from dual union all
select date '2015-04-28', null , 330053457 from dual union all
select date '2015-05-10', null , 330053466 from dual union all
select date '2015-04-28', null , 220053475 from dual union all
select date '2015-04-28', date '2015-05-10', 330053484 from dual
),
prep ( type, val, mth ) as (
select 'Subscribed' , 1, extract(month from date_created) from test_data
where extract(year from date_created) = :year
union all
select 'Unsubscribed', -1, extract(month from date_unsubscribed) from test_data
where extract(year from date_unsubscribed) = :year
)
select nvl(type, 'Net Subscr') as description,
nvl(sum(jan), 0) as jan, nvl(sum(feb), 0) as feb, nvl(sum(mar), 0) as mar,
nvl(sum(apr), 0) as apr, nvl(sum(may), 0) as may, nvl(sum(jun), 0) as jun
from prep
pivot (
sum(val)
for mth in (1 as jan, 2 as feb, 3 as mar, 4 as apr, 5 as may, 6 as jun)
)
group by rollup(type)
order by case type when 'Subscribed' then 1 when 'Unsubscribed' then 2 else 3 end
;
DESCRIPTION JAN FEB MAR APR MAY JUN
------------ ---------- ---------- ---------- ---------- ---------- ----------
Subscribed 0 0 0 3 2 0
Unsubscribed 0 0 0 0 -1 0
Net Subscr 0 0 0 3 1 0
3 rows selected.

Selecting in GroupBy

I have the following query:
select YEAR(t1.date) as 'year'
, MONTHNAME(t1.date) as 'month'
, COUNT(*) as total
, if (t1.Sex = 1, 'male','female') as sex
from outpatients t1
where YEAR(t1.date) = 2015
group by MONTH(t1.date), t1.Sex
order by t1.date, t1.Sex
So the output will look like this:
I would like to write a query that shows female and male as columns.
So the output would look like this:
I can't find a way to group the data like that.
You can use SUM with a CASE expression to count males and females in separate columns. Group by month only (not by Sex), and test Sex against 1, as your IF() does:
select YEAR(t1.date) as year, MONTHNAME(t1.date) as month, COUNT(*) as total
, SUM(CASE WHEN t1.Sex = 1 THEN 1 ELSE 0 END) AS male
, SUM(CASE WHEN t1.Sex = 1 THEN 0 ELSE 1 END) AS female
from outpatients t1
where YEAR(t1.date) = 2015
group by YEAR(t1.date), MONTH(t1.date), MONTHNAME(t1.date)
order by MONTH(t1.date)
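A quick way to check the shape of the result is the sketch below, which runs the same conditional aggregation on SQLite from Python. It assumes Sex = 1 means male and anything else female (as in the question's IF), groups by month only, and uses strftime in place of MySQL's YEAR()/MONTH(); the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outpatients (date TEXT, Sex INTEGER)")
conn.executemany(
    "INSERT INTO outpatients VALUES (?, ?)",
    [("2015-01-05", 1), ("2015-01-20", 2), ("2015-01-21", 2),
     ("2015-02-03", 1),
     ("2014-12-31", 1)],  # filtered out by the year condition
)

rows = conn.execute("""
    SELECT CAST(strftime('%m', date) AS INTEGER)      AS month,
           COUNT(*)                                   AS total,
           SUM(CASE WHEN Sex = 1 THEN 1 ELSE 0 END)   AS male,
           SUM(CASE WHEN Sex = 1 THEN 0 ELSE 1 END)   AS female
    FROM outpatients
    WHERE strftime('%Y', date) = '2015'
    -- Group by month only; adding Sex here would split the rows again.
    GROUP BY CAST(strftime('%m', date) AS INTEGER)
    ORDER BY month
""").fetchall()

print(rows)
```

Each month now yields a single row, with male + female adding up to total.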