Count Distinct from SQL Aggregation - mysql

I have a table that looks like this:
store_id  cust_id  amount  indicator
1         1000     2.05    A
1         1000     3.10    A
1         2000     3.10    A
2         1000     5.10    B
2         2000     6.00    B
2         1000     1.05    A
What I'm trying to do is find the percent of sales with indicators A, B for each store by only looking at unique customer IDs (i.e., the two sales to customer 1000 at store 1 would only count once). Something like this:
store_id  pct_sales_A  pct_sales_B  pct_sales_AB
1         1.0          0.00         0.00
2         0.0          0.50         0.50
I know that I can use a subquery to find the counts of each transaction type, but I'm having trouble only counting the distinct customer IDs. Here's an (incorrect) approach for the pct_sales_A column:
SELECT
store_id,
COUNT(DISTINCT(CASE WHEN txns_A>0 AND txns_B=0 THEN cust_ID ELSE NULL))/COUNT(*) AS pct_sales_A --this is wrong
FROM (SELECT store_id, cust_id,
COUNT(CASE WHEN indicator='A' THEN amount ELSE 0 END) as txns_A,
COUNT(CASE WHEN indicator='B' THEN amount ELSE 0 END) as txns_B
FROM t1
GROUP BY store_id, cust_id
)
GROUP BY store_id;

You can use conditional aggregation with COUNT(DISTINCT):
SELECT store_id,
       COUNT(DISTINCT CASE WHEN indicator = 'A' THEN cust_id END) * 1.0 / COUNT(DISTINCT cust_id) AS ratio_a,
       COUNT(DISTINCT CASE WHEN indicator = 'B' THEN cust_id END) * 1.0 / COUNT(DISTINCT cust_id) AS ratio_b
FROM t1
GROUP BY store_id;
Based on your comment, you need two levels of aggregation:
SELECT store_id,
AVG(has_a) as ratio_a,
AVG(has_b) as ratio_b,
AVG(has_a * has_b) as ratio_ab
FROM (SELECT store_id, cust_id,
MAX(CASE WHEN indicator = 'A' THEN 1.0 ELSE 0 END) as has_a,
MAX(CASE WHEN indicator = 'B' THEN 1.0 ELSE 0 END) as has_b
FROM t1
GROUP BY store_id, cust_id
) sc
GROUP BY store_id;

I think you want two levels of conditional aggregation:
select
store_id,
avg(has_a = 1 and has_b = 0) pct_sales_a,
avg(has_a = 0 and has_b = 1) pct_sales_b,
avg(has_a + has_b = 2) pct_sales_ab
from (
select
store_id,
cust_id,
max(indicator = 'A') has_a,
max(indicator = 'B') has_b
from t1
group by store_id, cust_id
) t
group by store_id
Demo on DB Fiddle:
store_id | pct_sales_a | pct_sales_b | pct_sales_ab
-------: | ----------: | ----------: | -----------:
1 | 1.0000 | 0.0000 | 0.0000
2 | 0.0000 | 0.5000 | 0.5000
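For anyone who wants to check the two-level aggregation locally, here is a minimal sketch that runs it against the question's sample rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, so MySQL shortcuts such as max(indicator = 'A') are spelled out with CASE for portability). The names store_indicator_shares and SAMPLE_SALES are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (store_id, cust_id, amount, indicator)
SAMPLE_SALES = [
    (1, 1000, 2.05, "A"),
    (1, 1000, 3.10, "A"),
    (1, 2000, 3.10, "A"),
    (2, 1000, 5.10, "B"),
    (2, 2000, 6.00, "B"),
    (2, 1000, 1.05, "A"),
]


def store_indicator_shares(rows):
    """Return (store_id, pct_a_only, pct_b_only, pct_both) per store.

    Hypothetical helper: SQLite stands in for MySQL here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t1 (store_id INT, cust_id INT, amount REAL, indicator TEXT)"
    )
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT store_id,
               AVG(CASE WHEN has_a = 1 AND has_b = 0 THEN 1.0 ELSE 0.0 END) AS pct_sales_a,
               AVG(CASE WHEN has_a = 0 AND has_b = 1 THEN 1.0 ELSE 0.0 END) AS pct_sales_b,
               AVG(CASE WHEN has_a = 1 AND has_b = 1 THEN 1.0 ELSE 0.0 END) AS pct_sales_ab
        FROM (
            -- Inner level: one row per (store, customer) with 0/1 flags
            SELECT store_id, cust_id,
                   MAX(CASE WHEN indicator = 'A' THEN 1 ELSE 0 END) AS has_a,
                   MAX(CASE WHEN indicator = 'B' THEN 1 ELSE 0 END) AS has_b
            FROM t1
            GROUP BY store_id, cust_id
        ) AS sc
        GROUP BY store_id
        ORDER BY store_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(store_indicator_shares(SAMPLE_SALES))
```

The inner query collapses repeat purchases so each customer counts once per store; the outer AVG over 0/1 values is then the share of customers in each category.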

Related

How to do an arithmetic operation with aliased column in SQL

I have a database table like the one below:
id  received_by  sent_by  amount  product_id
1   1            2        10      1
2   1            3        12      1
3   2            1        5       1
4   3            1        8       2
Here, received_by and sent_by are the IDs of the users receiving and sending the product, respectively. I want to calculate the total amount of each product held by a single user by subtracting the sent amount from the received amount.
My current query looks like this:
select
product_id,
(received - sent) as quantity,
case(when received_by = 1 then amount end) as received,
case(when sent_by = 1 then amount end) as sent
group by
product_id;
Here I get an error that Unknown column 'received' in 'field list'.
How can I calculate each user's inventory/stock?
You can't reference a column alias (such as received or sent) in the same SELECT list that defines it.
You also need the aggregate function SUM(), since you are grouping by product_id.
One way to do it is with a subquery:
select *, (received - sent) as quantity
from (
select product_id,
sum(case when received_by = 1 then amount else 0 end) as received,
sum(case when sent_by = 1 then amount else 0 end) as sent
from tablename
where 1 in (received_by, sent_by)
group by product_id
) t
Or:
select product_id,
sum(case when received_by = 1 then amount else -amount end) as quantity,
sum(case when received_by = 1 then amount else 0 end) as received,
sum(case when sent_by = 1 then amount else 0 end) as sent
from tablename
where 1 in (received_by, sent_by)
group by product_id
First of all, your query is missing the FROM clause before GROUP BY. Secondly, you cannot use column aliases (received, sent) in the same SELECT statement that defines them.
create table mytable(id int, received_by int, sent_by int, amount int, product_id int);
insert into mytable values(1, 1, 2, 10, 1);
insert into mytable values(2, 1, 3, 12, 1);
insert into mytable values(3, 2, 1, 5, 1);
insert into mytable values(4, 3, 1, 8, 2);
Query:
select product_id, (coalesce(received,0)-coalesce(sent,0)) as Quantity, coalesce(received,0) received, coalesce(sent,0) sent
from
( select
product_id,
sum(case when received_by = 1 then amount end) as received,
sum(case when sent_by = 1 then amount end) as sent
from mytable
group by
product_id
)t;
Output:
|product_id | Quantity | received | sent|
|-----------|----------|----------|-----|
| 1 | 17 | 22 | 5|
| 2 | -8 | 0 | 8|
db<>fiddle here
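As a quick local check of the subquery approach, the sketch below runs it for an arbitrary user ID. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module instead of MySQL, and the helper name stock_for_user and the constant SAMPLE_TRANSFERS are invented here.

```python
import sqlite3

# Sample rows from the question: (id, received_by, sent_by, amount, product_id)
SAMPLE_TRANSFERS = [
    (1, 1, 2, 10, 1),
    (2, 1, 3, 12, 1),
    (3, 2, 1, 5, 1),
    (4, 3, 1, 8, 2),
]


def stock_for_user(rows, user_id):
    """Return (product_id, quantity, received, sent) rows for one user.

    Hypothetical helper: SQLite stands in for MySQL here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INT, received_by INT, sent_by INT, "
        "amount INT, product_id INT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        -- The aliases are defined in the derived table, so the outer
        -- SELECT is allowed to use them in arithmetic.
        SELECT product_id, received - sent AS quantity, received, sent
        FROM (
            SELECT product_id,
                   SUM(CASE WHEN received_by = :uid THEN amount ELSE 0 END) AS received,
                   SUM(CASE WHEN sent_by = :uid THEN amount ELSE 0 END) AS sent
            FROM mytable
            WHERE :uid IN (received_by, sent_by)
            GROUP BY product_id
        ) AS t
        ORDER BY product_id
    """
    result = conn.execute(query, {"uid": user_id}).fetchall()
    conn.close()
    return result


print(stock_for_user(SAMPLE_TRANSFERS, 1))  # [(1, 17, 22, 5), (2, -8, 0, 8)]
```

Using ELSE 0 inside SUM() avoids the NULLs that would otherwise need COALESCE in the outer query.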

Sum hours value, count and display based on hours using SQL

I have 2 tables which are Teacher and Activities.
CREATE TABLE teacher (
TeacherId INT, BranchId VARCHAR(5));
INSERT INTO teacher VALUES
("1121","A"),
("1132","A"),
("1141","A"),
("2120","B"),
("2122","B");
CREATE TABLE activities (
ID INT, TeacherID INT, Hours INT);
INSERT INTO activities VALUES
(1,1121,2),
(2,1121,1),
(3,1132,1),
(4,1141,NULL),
(5,2120,NULL),
(6,2122,NULL);
NULL indicates no activities and will be converted to 0 in the output table. I want to write a query that sums the hours and counts how many activities there are based on teacher hours, as in the following table:
+-----------+------------+------------+
| Hours | A | B |
+-----------+------------+------------+
| 0 | 1 | 2 |
| 1 | 1 | 0 |
| 2 | 0 | 0 |
| 3 | 1 | 0 |
+-----------+------------+------------+
Edited: Sorry, I don't know how to elaborate accurately, but here is the fiddle I received from another member: https://www.db-fiddle.com/f/mmtuZquKyUqdhPvTFN9qaF/1
Edit: One last modification is needed: to sum the hours and count them based on branch ID and teacher ID, as in the expected output.
Expected output here (red text): https://drive.google.com/file/d/1wyZ_aX5hz_7I1Ncf5sXLpstYk6FT8PMg/view?usp=sharing
We can handle this via the use of a calendar table of hours joined to an aggregation subquery:
SELECT
t1.Hours,
SUM(CASE WHEN t2.BranchId = 'A' THEN t2.cnt ELSE 0 END) AS A,
SUM(CASE WHEN t2.BranchId = 'B' THEN t2.cnt ELSE 0 END) AS B
FROM (SELECT 0 AS Hours UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) t1
LEFT JOIN
(
SELECT t.BranchId, COALESCE(a.Hours, 0) AS Hours, COUNT(*) AS cnt
FROM Teacher t
LEFT JOIN Activities a ON a.TeacherId = t.TeacherId
GROUP BY t.BranchId, COALESCE(a.Hours, 0)
) t2
ON t1.Hours = t2.Hours
GROUP BY
t1.Hours
ORDER BY
t1.Hours
This is basically a JOIN and aggregation . . . but you need to start with all the hours you want:
SELECT h.Hours,
COALESCE(SUM(t.BranchId = 'A'), 0) AS A,
COALESCE(SUM(t.BranchId = 'B'), 0) AS B
FROM (SELECT 0 AS Hours UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
) h LEFT JOIN
activities a
ON h.hours = COALESCE(a.hours, 0) LEFT JOIN
teacher t
ON t.TeacherId = a.TeacherId
GROUP BY h.Hours
ORDER BY h.Hours;
Here is a db<>fiddle.
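The expected table in the question (one teacher at 3 hours for branch A, none at 2) seems to bucket each teacher by their total hours rather than by individual activity rows. Under that reading, which is an assumption on my part, a per-teacher SUM has to happen before the pivot. A rough sketch, run through Python's built-in sqlite3 module as a stand-in for MySQL; the function name teachers_per_total_hours and the two constants are hypothetical:

```python
import sqlite3

# Sample data from the question
TEACHERS = [(1121, "A"), (1132, "A"), (1141, "A"), (2120, "B"), (2122, "B")]
ACTIVITIES = [
    (1, 1121, 2),
    (2, 1121, 1),
    (3, 1132, 1),
    (4, 1141, None),
    (5, 2120, None),
    (6, 2122, None),
]


def teachers_per_total_hours(teachers, activities):
    """Return (hours, branch_a_count, branch_b_count) for hours 0..3.

    Hypothetical helper: SQLite stands in for MySQL here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE teacher (TeacherId INT, BranchId TEXT)")
    conn.execute("CREATE TABLE activities (ID INT, TeacherID INT, Hours INT)")
    conn.executemany("INSERT INTO teacher VALUES (?, ?)", teachers)
    conn.executemany("INSERT INTO activities VALUES (?, ?, ?)", activities)
    query = """
        SELECT h.Hours,
               SUM(CASE WHEN tt.BranchId = 'A' THEN 1 ELSE 0 END) AS A,
               SUM(CASE WHEN tt.BranchId = 'B' THEN 1 ELSE 0 END) AS B
        FROM (SELECT 0 AS Hours UNION ALL SELECT 1
              UNION ALL SELECT 2 UNION ALL SELECT 3) AS h
        LEFT JOIN (
            -- First level: total hours per teacher, NULL treated as 0
            SELECT t.TeacherId, t.BranchId,
                   COALESCE(SUM(a.Hours), 0) AS total_hours
            FROM teacher AS t
            LEFT JOIN activities AS a ON a.TeacherID = t.TeacherId
            GROUP BY t.TeacherId, t.BranchId
        ) AS tt ON tt.total_hours = h.Hours
        GROUP BY h.Hours
        ORDER BY h.Hours
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(teachers_per_total_hours(TEACHERS, ACTIVITIES))
```

With the sample data this gives the rows of the expected table: (0, 1, 2), (1, 1, 0), (2, 0, 0), (3, 1, 0).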

SQL counting occurrences of conditional statements using values from 2 different rows

I have a table with fields including time (UTC) and accountID.
accountID | time | ...
1 |12:00 |....
1 |12:01 |...
1 |13:00 |...
2 |14:00 |...
I need to make an sql query to return the accountID with a new field counting 'category' where 'category' can be 'a' or 'b'. If there is a row entry from the same accountID that has a positive time difference of 1 minute or less, category 'a' needs to be incremented, otherwise 'b'. The results from the above table would be
accountID| cat a count| cat b count
1 | 1 | 2
2 | 0 | 1
What approaches can I take to compare values between different rows and output occurrences of comparison outcomes?
Thanks
To compute these categories you'll need to pre-compute, in a "table expression", whether each row has a close neighbor. For example:
select
accountid,
sum(case when cnt > 0 then 1 else 0 end) as cat_a_count,
sum(case when cnt = 0 then 1 else 0 end) as cat_b_count
from (
select
accountid, tim,
( select count(*)
from t b
where b.accountid = t.accountid
and b.tim <> t.tim
and b.tim between t.tim and addtime(t.tim, '00:01:00')
) as cnt
from t
) x
group by accountid
Result:
accountid cat_a_count cat_b_count
--------- ----------- -----------
1 1 2
2 0 1
For reference, the data script I used is:
create table t (
accountid int,
tim time
);
insert into t (accountid, tim) values
(1, '12:00'),
(1, '12:01'),
(1, '13:00'),
(2, '14:00');
Use lag() and conditional aggregation:
select accountid,
sum(prev_time >= time - interval 1 minute) as a_count,
sum(prev_time < time - interval 1 minute or prev_time is null) as b_count
from (select t.*,
lag(time) over (partition by accountid order by time) as prev_time
from t
) t
group by accountid;

SUM values or take MAX to SUM depending on value

I have a query to select some shippingcost and I want to sum them up in a special way.
Sample Data:
supplierID | articleID | sumUP | shippingCost
10 | 100 | 1 | 20
10 | 101 | 1 | 15
20 | 200 | 0 | 15
20 | 201 | 0 | 10
30 | 300 | 0 | 10
=============================================
Sum should be: 60
What I want to achieve is to sum up all shippingCost values, but since sumUP is 0 for supplierID 20 and 30, I just want the maximum value for each of those suppliers.
so
supplier 10 should have 35 (sum of values)
supplier 20 should have 15 (maximum value)
supplier 30 should have 10 (maximum value)
in sum it should be 60.
I tried a lot of complex queries but always got stuck at the point where I have to decide whether to sum or take the max, and then sum everything afterwards.
Is this even possible with a MySQL statement? (With subqueries in it, of course.)
Any suggestions how to solve this?
First group by supplierid to get the sum and the max of shippingcost for each supplier and then use conditional aggregation on the results:
select
sum((t.sumup = 0) * maxshippingcost + (t.sumup = 1) * sumshippingcost) total
from (
select supplierid,
max(sumup) sumup,
max(shippingcost) maxshippingcost,
sum(shippingcost) sumshippingcost
from tablename
group by supplierid
) t
Or with a CASE expression:
select
sum(
case t.sumup
when 0 then maxshippingcost
when 1 then sumshippingcost
end
) total
from (
select supplierid,
max(sumup) sumup,
max(shippingcost) maxshippingcost,
sum(shippingcost) sumshippingcost
from tablename
group by supplierid
) t
Use a case expression to either return the SUM() or the MAX():
select supplierID,
case when max(sumUP) = 1 then sum(shippingCost) else max(shippingCost) end
from tablename
group by supplierID
EDIT BY Dwza
As forpas mentioned, this statement just gives me the result that needs to be summed up. The total statement could look like:
select sum(my.result) from
(select supplierID,
case when max(sumUP) = 1 then sum(shippingCost) else max(shippingCost) end as result
from tablename
group by supplierID) as my
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(supplier_ID INT NOT NULL
,articleID INT NOT NULL PRIMARY KEY
,sum_UP INT NOT NULL
,shippingCost INT NOT NULL
);
INSERT INTO my_table VALUES
(10,100,1,20),
(10,101,1,15),
(20,200,0,15),
(20,201,0,10),
(30,300,0,10);
SELECT SUM(x) total
FROM
( SELECT supplier_id, MAX(shippingcost) x FROM my_table WHERE sum_up = 0 GROUP BY supplier_id
UNION ALL -- plain UNION would drop duplicate (supplier_id, shippingcost) rows
SELECT supplier_id, shippingcost FROM my_table WHERE sum_up = 1
) a;
+-------+
| total |
+-------+
| 60 |
+-------+
I use case and group by
select supplier_id,
case
when sum_up = 0
then max(shipping_cost)
when sum_up = 1
then sum(shipping_cost) end as total
from table_name
group by supplier_id, sum_up;
The result as follows:
supplier_id, total
20 15
10 35
30 10
Now, I can sum it
select sum(total)
from (
select supplier_id,
case
when sum_up = 0
then max(shipping_cost)
when sum_up = 1
then sum(shipping_cost) end as total
from table_name
group by supplier_id, sum_up
) a;
SELECT sum(A.SumShipping) as TotalSum
FROM (SELECT supplierID, if(max(sumup) = 1, sum(shippingcost), max(shippingcost))
as SumShipping FROM tablename group by supplierID) as A;
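To sanity-check the "sum or max per supplier, then sum everything" idea against the sample data, here is a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the names total_shipping and SAMPLE_COSTS are invented for this example.

```python
import sqlite3

# Sample rows from the question: (supplierID, articleID, sumUP, shippingCost)
SAMPLE_COSTS = [
    (10, 100, 1, 20),
    (10, 101, 1, 15),
    (20, 200, 0, 15),
    (20, 201, 0, 10),
    (30, 300, 0, 10),
]


def total_shipping(rows):
    """Sum per-supplier costs, using SUM when sumUP = 1 and MAX otherwise.

    Hypothetical helper: SQLite stands in for MySQL here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablename (supplierID INT, articleID INT, "
        "sumUP INT, shippingCost INT)"
    )
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT SUM(per_supplier.cost) AS total
        FROM (
            -- MAX(sumUP) makes the flag an aggregate, so the query stays
            -- valid when only supplierID is in GROUP BY
            SELECT supplierID,
                   CASE WHEN MAX(sumUP) = 1 THEN SUM(shippingCost)
                        ELSE MAX(shippingCost)
                   END AS cost
            FROM tablename
            GROUP BY supplierID
        ) AS per_supplier
    """
    (total,) = conn.execute(query).fetchone()
    conn.close()
    return total


print(total_shipping(SAMPLE_COSTS))  # 60
```

Two articles with the same cost from a sumUP = 1 supplier are both counted here, since the aggregation works on rows rather than on distinct values.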

How to select count where time between a list of date ranged?

There is a table named "record".
userid | date
------ | ------
1 | 2017-08-21
1 | 2017-08-22
2 | 2017-08-22
1 | 2017-08-23
3 | 2017-08-23
Now I need three SQL queries to count distinct userid values:
select count(DISTINCT userid) from record where date between '2017-08-21' and '2017-08-23';
result : 3
select count(DISTINCT userid) from record where date between '2017-08-22' and '2017-08-23';
result : 3
select count(DISTINCT userid) from record where date between '2017-08-23' and '2017-08-23';
result : 2
I want to get all three counts in a single query. Could someone help me with this, please?
SELECT distinct date_field,
(select count(distinct userid) from record r1 where r1.date_field between r2.date_field and '2017-08-23')
FROM record r2
Or this more efficient version suggested by Shelden
SELECT date_field,
(select count(distinct userid) from record r1 where r1.date_field between r2.date_field and '2017-08-23')
FROM (select distinct date_field from record) r2
Use conditional aggregation:
select count(distinct case when date between '2017-08-21' and '2017-08-23' then userid end),
       count(distinct case when date between '2017-08-22' and '2017-08-23' then userid end),
       count(distinct case when date between '2017-08-23' and '2017-08-23' then userid end)
from record r;
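The conditional-aggregation query can be generalized to any list of date ranges by generating one COUNT(DISTINCT CASE ...) column per range. The sketch below is one possible way to do that, assuming Python's built-in sqlite3 module in place of MySQL and dates stored as ISO-8601 text; distinct_users_per_range and SAMPLE_RECORDS are hypothetical names.

```python
import sqlite3

# Sample rows from the question: (userid, date)
SAMPLE_RECORDS = [
    (1, "2017-08-21"),
    (1, "2017-08-22"),
    (2, "2017-08-22"),
    (1, "2017-08-23"),
    (3, "2017-08-23"),
]


def distinct_users_per_range(rows, ranges):
    """Return one distinct-user count per (start, end) range, in one scan.

    Hypothetical helper: SQLite stands in for MySQL here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE record (userid INT, date TEXT)")
    conn.executemany("INSERT INTO record VALUES (?, ?)", rows)
    # One conditional COUNT(DISTINCT ...) column per requested range;
    # rows outside a range yield NULL, which COUNT ignores.
    columns = ", ".join(
        "COUNT(DISTINCT CASE WHEN date BETWEEN ? AND ? THEN userid END)"
        for _ in ranges
    )
    params = [bound for date_range in ranges for bound in date_range]
    result = conn.execute(f"SELECT {columns} FROM record", params).fetchone()
    conn.close()
    return result


print(
    distinct_users_per_range(
        SAMPLE_RECORDS,
        [
            ("2017-08-21", "2017-08-23"),
            ("2017-08-22", "2017-08-23"),
            ("2017-08-23", "2017-08-23"),
        ],
    )
)  # (3, 3, 2)
```

The range bounds are passed as bound parameters, so only the number of columns is built into the SQL string.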