Conditional Window Functions - mysql

I have a sales table that looks like this:
store_id cust_id txn_id txn_date amt industry
200 1 1 20180101 21.01 1000
200 2 2 20200102 20.01 1000
200 2 3 20200103 19 1000
200 3 4 20180103 19 1000
200 4 5 20200103 21.01 1000
300 2 6 20200104 1.39 2000
300 1 7 20200105 12.24 2000
300 1 8 20200105 25.02 2000
400 2 9 20180106 103.1 1000
400 2 10 20200107 21.3 1000
Here's the code to generate this sample table:
CREATE TABLE sales(
store_id INT,
cust_id INT,
txn_id INT,
txn_date bigint,
amt float,
industry INT);
INSERT INTO sales VALUES(200,1,1,20180101,21.01,1000);
INSERT INTO sales VALUES(200,2,2,20200102,20.01,1000);
INSERT INTO sales VALUES(200,2,3,20200103,19.00,1000);
INSERT INTO sales VALUES(200,3,4,20180103,19.00,1000);
INSERT INTO sales VALUES(200,4,5,20200103,21.01,1000);
INSERT INTO sales VALUES(300,2,6,20200104,1.39,2000);
INSERT INTO sales VALUES(300,1,7,20200105,12.24,2000);
INSERT INTO sales VALUES(300,1,8,20200105,25.02,2000);
INSERT INTO sales VALUES(400,2,9,20180106,103.1,1000);
INSERT INTO sales VALUES(400,2,10,20200107,21.3,1000);
What I would like to do is create a new table, results that answers the question: what percentage of my VIP customers have, since January 3rd 2020, shopped i) at my store only; ii) at my store and at other stores in the same industry; iii) at only other stores in the same industry? Define a VIP customer to be someone who has shopped at a given store at least once since 2019.
Here's the target output table:
store industry pct_my_store_only pct_both pct_other_stores_only
200 1000 0.5 0.5 0.0
300 2000 0.5 0.5 0.0
400 1000 0.0 1.0 0.0
I'm trying to use window functions to accomplish this. Here's what I have so far:
CREATE TABLE results as
SELECT s.store_id, s.industry,
COUNT(DISTINCT (CASE WHEN s.txn_date>=20200103 THEN s.cust_id END)) * 1.0 / sum(count(DISTINCT (CASE WHEN s.txn_date>=20200103 THEN s.cust_id END))) OVER (PARTITION BY s.industry) AS pct_my_store_only
...AS pct_both
...AS pct_other_stores_only
FROM sales s
WHERE sales.txn_date>=20190101
GROUP BY s.store_id, s.industry;
The above does not seem to be correct; how can I correct this?

Join the distinct (store_id, industry) pairs to each customer's concatenated lists of distinct stores and industries, then use the window function avg() together with find_in_set() to compute, for each store, the fraction of its customers that falls into each category:
with
stores as (
select distinct store_id, industry
from sales
where txn_date >= 20190103
),
customers as (
select cust_id,
group_concat(distinct store_id) stores,
group_concat(distinct industry) industries
from sales
where txn_date >= 20190103
group by cust_id
),
cte as (
select *,
avg(concat(s.store_id) = concat(c.stores)) over (partition by s.store_id, s.industry) pct_my_store_only,
avg(find_in_set(s.store_id, c.stores) = 0) over (partition by s.industry) pct_other_stores_only
from stores s inner join customers c
on find_in_set(s.industry, c.industries) and find_in_set(s.store_id, c.stores)
)
select distinct store_id, industry,
pct_my_store_only,
1 - pct_my_store_only - pct_other_stores_only pct_both,
pct_other_stores_only
from cte
order by store_id, industry
Results:
> store_id | industry | pct_my_store_only | pct_both | pct_other_stores_only
> -------: | -------: | ----------------: | -------: | --------------------:
> 200 | 1000 | 0.5000 | 0.5000 | 0.0000
> 300 | 2000 | 0.5000 | 0.5000 | 0.0000
> 400 | 1000 | 0.0000 | 1.0000 | 0.0000
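The logic of the query can be cross-checked outside the database. The following is a minimal Python sketch, not part of the original answer, that mirrors what the query computes: it uses the same 20190103 cutoff, and it treats a customer as "my store only" when the set of stores they visited since the cutoff is exactly that one store. The names `sales`, `CUTOFF`, `stores_by_cust` and `result` are illustrative assumptions.

```python
from collections import defaultdict

# (store_id, cust_id, txn_date, industry) copied from the sample rows
sales = [
    (200, 1, 20180101, 1000), (200, 2, 20200102, 1000),
    (200, 2, 20200103, 1000), (200, 3, 20180103, 1000),
    (200, 4, 20200103, 1000), (300, 2, 20200104, 2000),
    (300, 1, 20200105, 2000), (300, 1, 20200105, 2000),
    (400, 2, 20180106, 1000), (400, 2, 20200107, 1000),
]
CUTOFF = 20190103  # same date filter as the query

recent = [row for row in sales if row[2] >= CUTOFF]

# Equivalent of group_concat(distinct store_id) per customer
stores_by_cust = defaultdict(set)
for store, cust, _, _ in recent:
    stores_by_cust[cust].add(store)

result = {}
for store, industry in sorted({(r[0], r[3]) for r in recent}):
    # The join keeps only customers who shopped at this store
    shoppers = [c for c, visited in stores_by_cust.items() if store in visited]
    only = sum(stores_by_cust[c] == {store} for c in shoppers) / len(shoppers)
    # pct_other_stores_only is always 0 here, because every joined
    # customer has shopped at the store by construction
    result[(store, industry)] = (only, 1 - only, 0.0)

for key, pcts in result.items():
    print(key, pcts)
```

Because the join condition already requires the customer to have shopped at the store, the "other stores only" share can never be non-zero in this formulation.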

Related

Running total in two tables

Consider two tables: invoices and payments. The invoices table contains records of invoices raised, and the payments table contains records of payments received.
invoices
id | date | cname | amount
1 | 2021-12-12 | cname1 | 10000
2 | 2021-12-13 | cname2 | 5000
3 | 2022-01-15 | cname1 | 7000
4 | 2022-01-16 | cname2 | 1000
payments
id | date | cname | amount
1 | 2022-01-05 | cname1 | 5000
2 | 2022-01-07 | cname2 | 5000
3 | 2022-02-05 | cname1 | 10000
4 | 2022-02-06 | cname2 | 1000
CALCULATE RUNNING BALANCE
Q) Extend the SQL query to do invoice / payment matching as follows (as of 28/2/2022)
matching
date | document_id | cname | amount | due
2021-12-12 00:00:00 | 1 | cname1 | 10000 | 10000
2022-01-05 00:00:00 | 1 | cname1 | -5000 | 5000
2022-01-15 00:00:00 | 3 | cname1 | 7000 | 12000
2022-02-05 00:00:00 | 3 | cname1 | -10000 | 2000
2021-12-13 00:00:00 | 2 | cname2 | 5000 | 5000
2022-01-07 00:00:00 | 2 | cname2 | -5000 | 0
2022-01-16 00:00:00 | 4 | cname2 | 1000 | 1000
2022-02-06 00:00:00 | 4 | cname2 | -1000 | 0
You can union both tables, negating the amounts from the second one, and then a simple running total per customer will produce the result you want. For example:
select
date,
id as document_id,
cname,
amount,
sum(amount) over(partition by cname order by date) as due
from (
select id, date, cname, amount from invoices
union all select id, date, cname, -amount from payments
) x
order by cname, date
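As a quick check of the union-plus-running-total idea, here is a hedged sketch that runs an equivalent query through Python's built-in sqlite3 module (used only as a convenient stand-in for MySQL 8; window functions need SQLite 3.25 or newer). The running total is partitioned by cname, since the expected due column carries the balance across all of a customer's documents. The variable names `conn` and `rows` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (id INT, date TEXT, cname TEXT, amount INT);
CREATE TABLE payments (id INT, date TEXT, cname TEXT, amount INT);
INSERT INTO invoices VALUES
  (1,'2021-12-12','cname1',10000),(2,'2021-12-13','cname2',5000),
  (3,'2022-01-15','cname1',7000),(4,'2022-01-16','cname2',1000);
INSERT INTO payments VALUES
  (1,'2022-01-05','cname1',5000),(2,'2022-01-07','cname2',5000),
  (3,'2022-02-05','cname1',10000),(4,'2022-02-06','cname2',1000);
""")

rows = conn.execute("""
SELECT date, id AS document_id, cname, amount,
       SUM(amount) OVER (PARTITION BY cname ORDER BY date) AS due
FROM (
  SELECT id, date, cname, amount FROM invoices
  UNION ALL
  SELECT id, date, cname, -amount FROM payments  -- payments reduce the balance
) x
ORDER BY cname, date
""").fetchall()

for row in rows:
    print(row)
```

With the sample data, the due column for cname1 goes 10000, 5000, 12000, 2000, matching the expected output in the question.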
SELECT `date`,
documentId,
cname, amount,
due FROM (SELECT `date`,
documentId,
cname,
amount,
(CASE WHEN @running_customer='' THEN @running_balance:=amount
WHEN @running_customer=cname THEN @running_balance:=@running_balance+amount ELSE @running_balance:=amount END) due,
@running_customer:=cname
FROM (SELECT `date`, id AS documentId, cname, amount FROM `invoices` i
UNION ALL
SELECT `date`, id AS documentId, cname, amount*-1 AS amount FROM `payments` p) final
JOIN (SELECT @running_customer:='') rc
JOIN (SELECT @running_balance:=0) rb
ORDER BY cname, `date`) finalResult
You need to use user variables with the assignment operator (:=) for this kind of problem.

Income and Expense calculation

I need to show income and expenses per day.
Income and expenses are stored in different tables.
I need to show them in the format below. For example,
when a date such as 17/08/2019 has two incomes in the table,
I need to sum the income for that date and show it in the result alongside the same day's expenses.
I have tried some queries, but they are not working.
Date | Income | Expense | Profit
Select SUM(d.amount)
, SUM(e.amount)
, d.date
, e.date
FROM due d
JOIN expenses e
ON d.date = e.date
Expense table -table-name : expenses
id | date | details | amount
1 13-08-2019 daily 50
2 17-08-2019 cleaning 50
3 17-08-2019 cleaning 50
4 18-08-2019 Tea 150
5 18-08-2019 other 50
Income table -table-name : due
id | date | amount
4 12-08-2019 150
5 13-08-2019 100
6 18-08-2019 450
7 18-08-2019 50
result will be:
id | date | Income | Expense | Profit
1 12-08-2019 150 NULL 150
2 13-08-2019 100 50 50
3 17-08-2019 NULL 100 -100
4 18-08-2019 500 200 300
In the future, I'd suggest posting some table details by using SHOW CREATE TABLE table_name which will allow us to better assist you.
You should be able to use a union and some grouping to get what you are after:
SELECT
Date,
SUM(Income) as Income,
SUM(Expense) as Expense,
SUM(Income) - SUM(Expense) as Profit
FROM (
SELECT
due.date as Date,
due.amount as Income,
0 as Expense
FROM due
UNION ALL
SELECT
expenses.date as Date,
0 as Income,
expenses.amount as Expense
FROM expenses
) AS t
GROUP BY Date
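To see the union-and-group approach on the sample data, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. It assumes the dates are stored in ISO format (YYYY-MM-DD) so that they sort correctly; the variable names `conn` and `daily` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE due (id INT, date TEXT, amount INT);
CREATE TABLE expenses (id INT, date TEXT, details TEXT, amount INT);
INSERT INTO due VALUES
  (4,'2019-08-12',150),(5,'2019-08-13',100),
  (6,'2019-08-18',450),(7,'2019-08-18',50);
INSERT INTO expenses VALUES
  (1,'2019-08-13','daily',50),(2,'2019-08-17','cleaning',50),
  (3,'2019-08-17','cleaning',50),(4,'2019-08-18','Tea',150),
  (5,'2019-08-18','other',50);
""")

daily = conn.execute("""
SELECT date,
       SUM(income) AS income,
       SUM(expense) AS expense,
       SUM(income) - SUM(expense) AS profit
FROM (
  -- each source contributes 0 for the column it does not own
  SELECT date, amount AS income, 0 AS expense FROM due
  UNION ALL
  SELECT date, 0 AS income, amount AS expense FROM expenses
) AS t
GROUP BY date
ORDER BY date
""").fetchall()

for row in daily:
    print(row)
```

Because each branch of the union supplies 0 for the missing side, days without income or without expenses show 0 rather than NULL, and the profit is still computed for them.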

SQL Query get total value based on different unit price,quantity at different time

I have a transaction table that looks like this (let's call it T); quantity is the total quantity in stock at a given unit price:
id | transaction_time | item | unit_price | quantity | subtotal
1 2012-5-15 A 1.00 15 15.00
2 2012-5-15 A 3.00 15 45.00
3 2012-5-15 B 1.00 10 10.00
4 2012-6-10 A 2.00 15 30.00
5 2012-6-15 A 2.00 10 20.00
I need to get the total value of each item in stock over time...however, same items are based on different unit price. The result for A for example is:
transaction_time | item | quantity | subtotal
2012-5-15 A 30 60.00
2012-6-10 A 45 90.00
2012-6-15 A 40 80.00
On 2012-5-15, we have 15 of item A at price 1.00 and 15 of item A at price 3.00, so the total quantity is 30 and the subtotal is 15*1+15*3=60.
On 2012-6-10, we have 15 more of item A at price 2, so the total quantity becomes 30+15=45 and the subtotal becomes 60+15*2=90.
On 2012-6-15, we have 10 of item A at price 2, so the stock of item A at price 2 goes down from 15 to 10. The total quantity becomes 40, and the subtotal goes down by 2*5 to 80.
I tried
select transaction_time,sum(quantity),sum(subtotal)
where id in(select max(id) from T group by unit_price,item)
group by item
having item=A
This only gives me the last line
2012-6-15 A 40 80.00
You need first to identify all possible unit_price values for the specific item:
SELECT DISTINCT unit_price
FROM t
WHERE item = 'A'
Output:
unit_price
----------
1
3
2
You also need to identify all possible transaction_times:
SELECT DISTINCT transaction_time
FROM t
WHERE item = 'A';
Output:
transaction_time
----------------
2012-05-15
2012-06-10
2012-06-15
Now perform a CROSS JOIN between the above two sets
SELECT *
FROM (
SELECT DISTINCT transaction_time
FROM t
WHERE item = 'A') AS times
CROSS JOIN (
SELECT DISTINCT unit_price
FROM t
WHERE item = 'A') AS up
ORDER BY times.transaction_time
to get:
transaction_time unit_price
----------------------------
2012-05-15 3
2012-05-15 2
2012-05-15 1
2012-06-10 3
2012-06-10 2
2012-06-10 1
2012-06-15 1
2012-06-15 3
2012-06-15 2
Now use the above and add a correlated subquery to get the latest known quantity of item 'A' for each unit_price as of each transaction_time:
SELECT transaction_time, unit_price,
(SELECT quantity
FROM t
WHERE t.item = 'A'
AND t.unit_price = up.unit_price
AND t.transaction_time <= times.transaction_time
ORDER BY transaction_time DESC LIMIT 1) AS quantity
FROM (
SELECT DISTINCT transaction_time
FROM t
WHERE item = 'A') AS times
CROSS JOIN (
SELECT DISTINCT unit_price
FROM t
WHERE item = 'A') AS up
ORDER BY times.transaction_time
Output:
transaction_time unit_price quantity
----------------------------------------
15.05.2012 00:00:00 1 15
15.05.2012 00:00:00 3 15
15.05.2012 00:00:00 2 NULL
10.06.2012 00:00:00 1 15
10.06.2012 00:00:00 3 15
10.06.2012 00:00:00 2 15
15.06.2012 00:00:00 1 15
15.06.2012 00:00:00 3 15
15.06.2012 00:00:00 2 10
The final result is simply a matter of performing a GROUP BY on the above:
SELECT transaction_time,
'A' AS item,
SUM(quantity) AS quantity,
SUM(quantity*unit_price) AS subtotal
FROM (
SELECT transaction_time, unit_price,
(SELECT quantity
FROM t
WHERE t.item = 'A'
AND t.unit_price = up.unit_price
AND t.transaction_time <= times.transaction_time
ORDER BY transaction_time DESC LIMIT 1) AS quantity
FROM (
SELECT DISTINCT transaction_time
FROM t
WHERE item = 'A') AS times
CROSS JOIN (
SELECT DISTINCT unit_price
FROM t
WHERE item = 'A') AS up) AS x
GROUP BY transaction_time
Output:
transaction_time item quantity subtotal
----------------------------------------------
15.05.2012 A 30 60
10.06.2012 A 45 90
15.06.2012 A 40 80
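The cross-join approach can be reproduced end to end with Python's sqlite3 module, used here only as a stand-in for MySQL. This is a sketch under the assumption that transaction_time is stored as ISO text (YYYY-MM-DD) so that the <= comparison orders dates correctly; the names `conn` and `stock` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INT, transaction_time TEXT, item TEXT,
                unit_price REAL, quantity INT);
INSERT INTO t VALUES
  (1,'2012-05-15','A',1.0,15),(2,'2012-05-15','A',3.0,15),
  (3,'2012-05-15','B',1.0,10),(4,'2012-06-10','A',2.0,15),
  (5,'2012-06-15','A',2.0,10);
""")

stock = conn.execute("""
SELECT transaction_time,
       SUM(quantity) AS quantity,
       SUM(quantity * unit_price) AS subtotal
FROM (
  SELECT times.transaction_time AS transaction_time,
         up.unit_price AS unit_price,
         -- latest known quantity at this price, as of this date
         (SELECT t.quantity FROM t
          WHERE t.item = 'A'
            AND t.unit_price = up.unit_price
            AND t.transaction_time <= times.transaction_time
          ORDER BY t.transaction_time DESC LIMIT 1) AS quantity
  FROM (SELECT DISTINCT transaction_time FROM t WHERE item = 'A') AS times
  CROSS JOIN (SELECT DISTINCT unit_price FROM t WHERE item = 'A') AS up
) AS x
GROUP BY transaction_time
ORDER BY transaction_time
""").fetchall()

for row in stock:
    print(row)
```

Price levels that have no transaction yet on a given date yield NULL from the subquery, and SUM simply skips them, which is why the first date only counts the 1.00 and 3.00 stock.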
The following query (somewhat complex, possibly slow, and in need of optimization) also works:
SELECT tr_sub.cur_tt, tr_sub.item, sum(tr.quantity), sum(tr.quantity*tr.unit_price)
FROM
(SELECT tr1.transaction_time as cur_tt, max(tr2.transaction_time) as prev_tt, tr1.item as item,
IF (tr1.unit_price=tr2.unit_price, tr1.unit_price, tr2.unit_price) as t_p
FROM transactions tr1 LEFT JOIN transactions tr2 ON
tr1.transaction_time>=tr2.transaction_time AND tr1.item=tr2.item
GROUP BY tr1.item, tr1.transaction_time, t_p
) as tr_sub INNER JOIN transactions tr ON
tr_sub.prev_tt=tr.transaction_time
AND tr_sub.item=tr.item
AND tr_sub.t_p=tr.unit_price
GROUP BY tr_sub.item, tr_sub.cur_tt
ORDER BY tr_sub.cur_tt, tr_sub.item

How to combine this MySQL queries?

I have 2 tables like this:
table:prices
id, project_id, real_min_price, real_max_price
1 | 100100 | 500 | 2000
2 | 100100 | 900 | 3000
3 | 100100 | 2500 | 3200
4 | 100100 | 320 | 3900
table:gifts
id, project_id, min_price, max_price, gift
1 | 100100 | 0 | 1000 | 10
2 | 100100 | 1001 | 2000 | 20
3 | 100100 | 2001 | 3000 | 30
4 | 100100 | 3001 | 4000 | 40
5 | 100100 | 4001 | 5000 | 50
6 | 100100 | 5001 | 6000 | 60
$ID = 100100;
// find highest price
SELECT MAX(real_max_price) FROM `prices` WHERE project_id='$ID';
$MAX_PROJECT_PRICE = $dbo->getOne();
-- returns 3900
// find the limit row which between min-max columns of this value
SELECT gift FROM `gifts` WHERE project_id='$ID'
AND max_price>='$MAX_PROJECT_PRICE' ORDER BY max_price ASC LIMIT 1;
$MAX_GIFT = $dbo->getOne();
-- finds the 4th row of the gifts table and returns 40
// remove the other gift rows higher than the MAX_GIFT value
DELETE FROM `gifts` WHERE project_id='$ID' AND gift>'$MAX_GIFT';
-- deleted 5th and 6th rows.
In this scenario, it finds the max price as "3900", so the 5th and 6th rows of the gifts table are removed.
But this approach is really bad; it should be done in one query, but how?
Even leaving the DELETE query aside, maybe we can at least combine the first
2 queries to find the MAX_GIFT value.
To combine the first two queries you can do.
Query
SELECT
gift
FROM
gifts
WHERE
project_id = 100100
AND max_price >= (SELECT MAX(real_max_price)
FROM prices
WHERE project_id = 100100)
ORDER BY
max_price ASC
LIMIT 1;
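And to combine all three queries into one (the inner query is wrapped in a derived table because MySQL does not allow a subquery to select directly from the table being deleted from):
Query
DELETE FROM
gifts
WHERE
project_id = 100100
AND
gift > (SELECT
gift
FROM
(SELECT
gift
FROM
gifts
WHERE
project_id = 100100
AND
max_price >= (SELECT
MAX(real_max_price)
FROM
prices
WHERE
project_id = 100100
)
ORDER BY
max_price ASC
LIMIT 1
) AS max_gift
)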
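The single-statement delete can be tried against the sample data with Python's sqlite3 module, used here as a stand-in for MySQL. This is a sketch, not the answerer's exact code: the inner lookup is wrapped in a derived table (aliased `max_gift`, a name chosen for illustration), which is the form MySQL needs when the subquery reads from the table being deleted from, and the project id is passed as a bound parameter instead of being interpolated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prices (id INT, project_id INT, real_min_price INT, real_max_price INT);
CREATE TABLE gifts (id INT, project_id INT, min_price INT, max_price INT, gift INT);
INSERT INTO prices VALUES (1,100100,500,2000),(2,100100,900,3000),
  (3,100100,2500,3200),(4,100100,320,3900);
INSERT INTO gifts VALUES (1,100100,0,1000,10),(2,100100,1001,2000,20),
  (3,100100,2001,3000,30),(4,100100,3001,4000,40),
  (5,100100,4001,5000,50),(6,100100,5001,6000,60);
""")

conn.execute("""
DELETE FROM gifts
WHERE project_id = :pid
  AND gift > (
    SELECT gift FROM (
      -- first gift band whose max_price covers the highest real price
      SELECT gift FROM gifts
      WHERE project_id = :pid
        AND max_price >= (SELECT MAX(real_max_price)
                          FROM prices WHERE project_id = :pid)
      ORDER BY max_price ASC
      LIMIT 1
    ) AS max_gift
  )
""", {"pid": 100100})

remaining = [g for (g,) in conn.execute("SELECT gift FROM gifts ORDER BY id")]
print(remaining)
```

With the sample data the highest real_max_price is 3900, the matching gift band is 40, and the rows with gifts 50 and 60 are deleted.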
Here is a solution to run the delete command based on your queries conditions:
delete from gifts where gift in
(
select gift from
(
select gift, @row:=@row+1 as rw
from gifts, (select @row:=0) a
where (select max(real_max_price)
from prices
where project_id=100100 limit 1) < max_price
and project_id = 100100
order by gift
) sub
where rw>1
);
Sample data and test.
create table prices (
id integer,
project_id integer,
real_min_price integer,
real_max_price integer
);
create table gifts (
id integer,
project_id integer,
min_price integer,
max_price integer,
gift integer
);
insert into prices values
(1,100100,500,2000),
(2,100100,900,3000),
(3,100100,2500,3200),
(4,100100,320,3900);
insert into gifts values
(1 , 100100 , 0 , 1000 , 10),
(2 , 100100 , 1001 , 2000 , 20),
(3 , 100100 , 2001 , 3000 , 30),
(4 , 100100 , 3001 , 4000 , 40),
(5 , 100100 , 4001 , 5000 , 50),
(6 , 100100 , 5001 , 6000 , 60);
After running the query I provided, your gifts table will contain:
1 100100 0 1000 10
2 100100 1001 2000 20
3 100100 2001 3000 30
4 100100 3001 4000 40

Single query to retrieve multiple values from multiple tables

Expenses table
1/1/2016 exp1 2000
13/1/2016 exp11 2500
1/2/2016 exp2 1500
1/3/2016 exp3 1000
10/3/2016 exp1 2000
Income table
1/1/2016 income1 2500
1/2/2016 income2 3500
1/3/2016 income3 1500
10/3/2016 income3 1000
1/4/2016 income4 5000
From a single query, I need the totals grouped by month. This is what I need:
Expenses Incomes Month
4500 2500 Jan
1500 3500 Feb
3000 2500 Mar
0 5000 April
I need this query to supply the data for a Google graph.
Terrible data structure and format, but not impossible:
SELECT
IFNULL(exp.Expenses,0) Expenses,
IFNULL(inc.Incomes,0) Incomes,
inc.`monthname` `Month`
FROM
(
SELECT
SUM(i.amount) Incomes,
MONTHNAME(STR_TO_DATE(i.`date`, '%d/%m/%Y')) `monthname`,
MONTH(STR_TO_DATE(i.`date`, '%d/%m/%Y')) `month`
FROM
incomes i
GROUP BY
MONTHNAME(STR_TO_DATE(i.`date`, '%d/%m/%Y')),
MONTH(STR_TO_DATE(i.`date`, '%d/%m/%Y'))
) inc
LEFT JOIN
(
SELECT
SUM(e.amount) Expenses,
MONTHNAME(STR_TO_DATE(e.`date`, '%d/%m/%Y')) `monthname`,
MONTH(STR_TO_DATE(e.`date`, '%d/%m/%Y')) `month`
FROM
expenses e
GROUP BY
MONTHNAME(STR_TO_DATE(e.`date`, '%d/%m/%Y')),
MONTH(STR_TO_DATE(e.`date`, '%d/%m/%Y'))
) exp
ON exp.`month` = inc.`month`
ORDER BY
inc.`month`
Output:
+----------+---------+----------+
| Expenses | Incomes | Month |
+----------+---------+----------+
| 4500 | 2500 | January |
| 1500 | 3500 | February |
| 3000 | 2500 | March |
| 0 | 5000 | April |
+----------+---------+----------+
4 rows in set
Anyway, you should seriously think about how to improve and normalize your data.
In my solution, I give the number of the month rather than text. I'll leave it to you to convert it to text (using a CASE expression, for example) if you wish:
SELECT
sum(expense) AS total_expense, sum(income) AS total_income, trans_month
FROM (
SELECT
month(trans_date) AS trans_month,
0 AS income,
sum(amount) AS expense
FROM expense
GROUP BY month(trans_date)
UNION ALL
SELECT
month(trans_date) AS trans_month,
sum(amount) AS income,
0 AS expense
FROM income
GROUP BY month(trans_date)
) AS a
GROUP BY trans_month;
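The union-based monthly rollup can be checked with Python's sqlite3 module as a stand-in for MySQL. This sketch assumes the dates have been normalized to ISO text (YYYY-MM-DD) in a column named trans_date, and it uses SQLite's strftime('%m', ...) where the MySQL query uses month(); the table layout and the names `conn` and `monthly` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expense (trans_date TEXT, name TEXT, amount INT);
CREATE TABLE income (trans_date TEXT, name TEXT, amount INT);
INSERT INTO expense VALUES
  ('2016-01-01','exp1',2000),('2016-01-13','exp11',2500),
  ('2016-02-01','exp2',1500),('2016-03-01','exp3',1000),
  ('2016-03-10','exp1',2000);
INSERT INTO income VALUES
  ('2016-01-01','income1',2500),('2016-02-01','income2',3500),
  ('2016-03-01','income3',1500),('2016-03-10','income3',1000),
  ('2016-04-01','income4',5000);
""")

monthly = conn.execute("""
SELECT SUM(expense) AS total_expense,
       SUM(income) AS total_income,
       trans_month
FROM (
  SELECT CAST(strftime('%m', trans_date) AS INTEGER) AS trans_month,
         0 AS income, amount AS expense
  FROM expense
  UNION ALL
  SELECT CAST(strftime('%m', trans_date) AS INTEGER) AS trans_month,
         amount AS income, 0 AS expense
  FROM income
) AS a
GROUP BY trans_month
ORDER BY trans_month
""").fetchall()

for row in monthly:
    print(row)
```

One design point in favor of the union: a month that appears in only one of the two tables (April has income but no expenses here) still produces a row, whereas a LEFT JOIN driven from one table would drop months that exist only in the other.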