Combine columns from 2 queries - mysql

I have two queries that group by week, quarter and year using added_on. The problem comes when I try to combine them: the first query returns weeks 1,2,3,4,5,...,25 because it has transactions in every week of the year, while the second returns only 1,4,5,8,15,25 because some weeks have no transactions. I need both to cover 1,2,3,4,5,6,7,...,25. Is there any way to achieve this?
transaction table
id
value
added_on
currency_id
1
100
2020/01/20
2
320
2020/2/10
currency table
id  type
1   USD
2   EUR
My query looks like this:
SELECT
week_usd,
week_eur,
total_usd,
total_eur,
quarter_year
FROM
(SELECT
WEEK(transaction.added_on) AS week_usd,
SUM(transaction.value) AS total_usd,
CONCAT(QUARTER(transaction.added_on)," ", YEAR(transaction.added_on)) AS quarter_year
FROM transaction
JOIN currency ON transaction.currency_id = currency.id
WHERE
currency.type = 'USD'
GROUP BY 1,3
) AS table1,
(SELECT
WEEK(transaction.added_on) AS week_eur,
SUM(transaction.value) AS total_eur,
CONCAT(QUARTER(transaction.added_on)," ", YEAR(transaction.added_on)) AS quarter_year
FROM transaction
JOIN currency ON transaction.currency_id = currency.id
WHERE
currency.type = 'EUR'
GROUP BY 1,3
) AS table2
The problem with my query is that it displays like this:
week_usd  week_eur  total_usd  total_eur  quarter_year
0         1         100        150        1 2020
1         1         100        150        1 2020
2         1         100        150        1 2020
3         1         100        150        1 2020
4         1         100        150        1 2020
5         1         100        150        1 2020
6         1         100        150        1 2020
7         1         100        150        1 2020
8         1         100        150        1 2020

You first need to build a unique list of (quarter_year, week) pairs, then left join each currency's totals onto it:
Select date_list.quarter_year
, date_list.week
, USD.total_usd
, EUR.total_eur
from
( select distinct CONCAT(QUARTER(transaction.added_on)," ", YEAR(transaction.added_on)) AS quarter_year
, WEEK(transaction.added_on) week
from transaction
) as date_list
Left Join (
select CONCAT(QUARTER(transaction.added_on)," ", YEAR(transaction.added_on)) AS quarter_year
, WEEK(transaction.added_on) week
, SUM(transaction.value) AS total_usd
from transaction
join currency on transaction.currency_id = currency.id
where currency.type = 'USD'
group by 1,2
) as USD on date_list.quarter_year = USD.quarter_year and date_list.week = USD.week
Left Join (
select CONCAT(QUARTER(transaction.added_on)," ", YEAR(transaction.added_on)) AS quarter_year
, WEEK(transaction.added_on) week
, SUM(transaction.value) AS total_eur
from transaction
join currency on transaction.currency_id = currency.id
where currency.type = 'EUR'
group by 1,2
) as EUR on date_list.quarter_year = EUR.quarter_year and date_list.week = EUR.week ;
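The same pattern can be tried end to end with Python's built-in sqlite3 module. This is only a sketch under assumptions: the four transaction rows are hypothetical, and because SQLite has no WEEK() or QUARTER(), the week comes from strftime('%W') (Monday-based, so the numbers can differ from MySQL's default WEEK() mode) and the quarter is derived from the month.

```python
import sqlite3

def weekly_totals():
    """Distinct (quarter_year, week) list, left joined to per-currency totals."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE currency (id INTEGER PRIMARY KEY, type TEXT);
        CREATE TABLE "transaction" (id INTEGER PRIMARY KEY, value INTEGER,
                                    added_on TEXT, currency_id INTEGER);
        INSERT INTO currency VALUES (1, 'USD'), (2, 'EUR');
        -- Hypothetical rows: USD in three consecutive weeks, EUR in only one.
        INSERT INTO "transaction" VALUES
            (1, 100, '2020-01-06', 1),
            (2,  50, '2020-01-13', 1),
            (3, 150, '2020-01-13', 2),
            (4,  70, '2020-01-20', 1);
    """)
    rows = con.execute("""
        WITH tx AS (
            SELECT t.value, c.type,
                   CAST(strftime('%W', t.added_on) AS INTEGER) AS week,
                   ((CAST(strftime('%m', t.added_on) AS INTEGER) + 2) / 3)
                       || ' ' || strftime('%Y', t.added_on) AS quarter_year
            FROM "transaction" t
            JOIN currency c ON c.id = t.currency_id
        ),
        date_list AS (SELECT DISTINCT quarter_year, week FROM tx),
        usd AS (SELECT quarter_year, week, SUM(value) AS total
                FROM tx WHERE type = 'USD' GROUP BY quarter_year, week),
        eur AS (SELECT quarter_year, week, SUM(value) AS total
                FROM tx WHERE type = 'EUR' GROUP BY quarter_year, week)
        SELECT d.quarter_year, d.week, usd.total, eur.total
        FROM date_list d
        LEFT JOIN usd ON usd.quarter_year = d.quarter_year AND usd.week = d.week
        LEFT JOIN eur ON eur.quarter_year = d.quarter_year AND eur.week = d.week
        ORDER BY d.quarter_year, d.week
    """).fetchall()
    con.close()
    return rows

for row in weekly_totals():
    print(row)
```

Weeks with USD but no EUR transactions come back with None (NULL) in the EUR column instead of being dropped or multiplied. Note that a week with no transactions in any currency still does not appear, because the date list itself is built from the transaction table.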

Related

Cohort Analysis in SQL - Total Returning Users for that Day regardless if the user had visited in the previous days

I have the following data (base_data):
visit_date user_id
11/12/2021 Jake
11/12/2021 Amy
12/12/2021 Holt
12/12/2021 Jake
13/12/2021 Amy
13/12/2021 Jake
14/12/2021 Jake
14/12/2021 Holt
Two users visit on the 11th and only one of them visits again on the 12th. Hence, with the 11th as the first day, Day_1 = 2 and Day_2 = 1.
According to my query, I get the following result after pivoting rcohortday as row and day_number as column:
Date Day_1 Day_2 Day_3 Day_4
11/12/2021 2 1 2 1
12/12/2021 1 0 1
13/12/2021 0 0
14/12/2021 1
However, the 12/12/2021 row doesn't consider the user that arrived on the 1st and the 2nd day. I want it to consider totals for that day regardless if the user had visited in the previous days or not.
My desired result would be:
Date Day_1 Day_2 Day_3 Day_4
11/12/2021 2 1 2 1
12/12/2021 2 1 1
13/12/2021 2 1
14/12/2021 1
Let me know if you need any more clarity, especially with the examples.
The following is my query:
with user_cohorts as (
SELECT user_id
, MIN(DATETRUNC(to_date(visit_date, 'yyyymmdd'),'dd')) as cohortday
FROM base_data
GROUP BY user_id
),
visit_day as (
SELECT user_id
, (DATEDIFF(to_date(visit_date, 'yyyymmdd'),cohortday, 'dd')+1) as day_number
, count(distinct user_id) as user_count
FROM base_data
LEFT JOIN user_cohorts USING(user_id)
GROUP BY user_id, DATEDIFF(to_date(visit_date, 'yyyymmdd'),cohortday, 'dd')
),
cohort_size as (
SELECT count(*) as user_count
, cohortday
FROM user_cohorts
GROUP BY cohortday
ORDER BY cohortday
),
retention_table as (
SELECT c.cohortday as rcohortday
, o.day_number
, sum(user_count) as user_count
FROM visit_day o
LEFT JOIN user_cohorts c USING (user_id)
group by c.cohortday
, o.day_number
)
select * from retention_table
I am using MaxCompute SQL, which is an Alibaba Cloud technology. It's similar to MySQL.

Creating an overdraft statement

I'm currently stuck on how to create a statement that shows the daily overdraft for a particular council.
I have the following tables: councils, users, markets, market_transactions, user_deposits.
Market transactions run daily and reduce the user's account balance. When the account balance falls below 0, the user goes into overdraft (negative). When users make a deposit, their account balance increases.
I have included the tables below to show how transactions and deposits are stored.
If I reverse today's transactions I can get the account balance a user had yesterday, but formulating a query that returns the daily OD amount is where I'm stuck.
USERS
user_id  name   account_bal
1        Wells  -5
2        James  100
3        Joy    10
4        Mumbi  -300
DEPOSITS
id  user_id  amount  date
1   1        5       2021-04-26
2   3        10      2021-04-26
3   3        5       2021-04-25
4   4        5       2021-04-25
TRANSACTIONS
id  user_id  amount_tendered  date
1   1        5                2021-04-27
2   2        10               2021-04-26
3   3        15               2021-04-26
4   4        50               2021-04-25
The relationships are as follows:
COUNCILS
council_id  name
1           a
2           b
3           c
MARKETS
market_id  name  council_id
1          x     3
2          y     1
3          z     2
MARKET_USER_LINK
id  market_id  user_id
1   1          3
2   2          2
3   3          1
I'm running this SQL query to get the total amount each user has spent, so that I can subtract it from the user's current account balance.
I don't know if I can use this to work out the account balance for each day.
SELECT u.user_id, total_spent, total_deposits,m.council_id
FROM users u
JOIN market_user_link ul ON ul.user_id= u.user_id
LEFT JOIN markets m ON ul.market_id =m.market_id
LEFT JOIN councils c ON m.council_id =c.council_id
LEFT JOIN (
SELECT user_id, SUM(amount_tendered) AS total_spent
FROM transactions
WHERE DATE(date) BETWEEN DATE('2021-02-01') AND DATE(NOW())
GROUP BY user_id
) t ON t.user_id= u.user_id
ORDER BY user_id, total_spent ASC
The result looks like this when run:
| user_id | total_spent | council_id |
|-------------|----------------|------------|
| 1 | 50.00 | 1 |
| 2 | 2.00 | 3 |
I was hoping to reverse transactions and deposits done to get the account balance for a day then get the sum of users with an account balance < 0... But this has just failed to work.
The goal is to produce a query that shows daily overdraft (Only SUM the total account balance of users with account balance below 0 ) for a particular council.
Expected Result
date        council_id  o_d_amount
2021-04-24  1           -300.00
2021-04-24  2           -60.00
2021-04-24  3           -900.00
2021-04-25  1           -600.00
2021-04-25  2           -100.00
2021-04-25  3           -1200.00
This is actually not that hard, but the way you asked makes it hard to follow.
Also, your expected result should match the data you provided.
Edited: the previous solution was wrong. It counted withdrawals and deposits more than once when a user had more than one event on the same date.
Start by computing the net amount exchanged per user on each day:
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d
group by user_id, date
order by user_id, date;
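As a quick check of this first step, here is a minimal sketch that runs the same UNION ALL aggregation on the DEPOSITS and TRANSACTIONS rows from the question, using Python's built-in sqlite3 module. SQLite stands in for the real database here, which is an assumption; the statement itself is plain SQL.

```python
import sqlite3

def daily_net():
    """Net amount moved per user per day: deposits positive, transactions negative."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE deposits (id INTEGER, user_id INTEGER, amount INTEGER, date TEXT);
        CREATE TABLE transactions (id INTEGER, user_id INTEGER,
                                   amount_tendered INTEGER, date TEXT);
        -- Rows copied from the question's sample tables.
        INSERT INTO deposits VALUES
            (1, 1, 5, '2021-04-26'), (2, 3, 10, '2021-04-26'),
            (3, 3, 5, '2021-04-25'), (4, 4, 5, '2021-04-25');
        INSERT INTO transactions VALUES
            (1, 1, 5, '2021-04-27'), (2, 2, 10, '2021-04-26'),
            (3, 3, 15, '2021-04-26'), (4, 4, 50, '2021-04-25');
    """)
    rows = con.execute("""
        SELECT user_id, date, SUM(amount) AS exchanged_on_day
        FROM (
            SELECT user_id, date, amount FROM deposits
            UNION ALL
            SELECT user_id, date, -amount_tendered FROM transactions
        ) d
        GROUP BY user_id, date
        ORDER BY user_id, date
    """).fetchall()
    con.close()
    return rows

for row in daily_net():
    print(row)
```

User 3, for example, deposits 10 and spends 15 on 2021-04-26, so that day nets to -5 in a single row.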
The next query gives the state of each account only on days that had deposits or withdrawals.
To get results for all days (and not just those with account movement), replace the cross join subquery with a table of all the dates you want (see, for example, "Get all dates between two dates in SQL Server").
select dates.date, c.council_id, u.name username
, u.account_bal - sum(case when e.date >= dates.date then e.exchanged_on_day else 0 end) as amount_on_start_of_day
, u.account_bal - sum(case when e.date > dates.date then e.exchanged_on_day else 0 end) as amount_on_end_of_day
from councils c
inner join markets m on c.council_id=m.council_id
inner join market_user_link mul on m.market_id=mul.market_id
inner join users u on mul.user_id=u.user_id
left join (
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d group by user_id, date
) e on u.user_id=e.user_id -- exchange on each day
cross join (select distinct date from (select date from deposits union select date from transactions) datesInternal) dates -- all days that had a transaction
group by dates.date, c.council_id, u.name, u.account_bal
order by dates.date desc, c.council_id, u.name;
From there you can rearrange to get the result you want.
select date, council_id
, sum(case when amount_on_start_of_day<0 then amount_on_start_of_day else 0 end) o_d_amount_start
, sum(case when amount_on_end_of_day<0 then amount_on_end_of_day else 0 end) o_d_amount_end
from (
select dates.date, c.council_id, u.name username
, u.account_bal - sum(case when e.date >= dates.date then e.exchanged_on_day else 0 end) as amount_on_start_of_day
, u.account_bal - sum(case when e.date > dates.date then e.exchanged_on_day else 0 end) as amount_on_end_of_day
from councils c
inner join markets m on c.council_id=m.council_id
inner join market_user_link mul on m.market_id=mul.market_id
inner join users u on mul.user_id=u.user_id
left join (
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d group by user_id, date
) e on u.user_id=e.user_id -- exchange on each day
cross join (select distinct date from (select date from deposits union select date from transactions) datesInternal) dates -- all days that had a transaction
group by dates.date, c.council_id, u.name, u.account_bal
) result
group by date, council_id
order by date;
You can check it on https://www.db-fiddle.com/f/msScT6B5F7FjU2aQXVr2da/6
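The roll-back idea (current balance minus everything that happened after a given day) can be sketched on the question's sample rows with Python's sqlite3 module. This is a simplified illustration, not the query above: it assumes account_bal is the current closing balance, uses SQLite in place of the real database, and sums the overdraft per date across all users rather than per council, since user 4 has no market link in the question's data.

```python
import sqlite3

def overdraft_per_day():
    """End-of-day balance = current balance minus all later movements;
    overdraft = sum of the negative end-of-day balances on each date."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE users (user_id INTEGER, name TEXT, account_bal INTEGER);
        CREATE TABLE deposits (id INTEGER, user_id INTEGER, amount INTEGER, date TEXT);
        CREATE TABLE transactions (id INTEGER, user_id INTEGER,
                                   amount_tendered INTEGER, date TEXT);
        -- Rows copied from the question's sample tables.
        INSERT INTO users VALUES
            (1, 'Wells', -5), (2, 'James', 100), (3, 'Joy', 10), (4, 'Mumbi', -300);
        INSERT INTO deposits VALUES
            (1, 1, 5, '2021-04-26'), (2, 3, 10, '2021-04-26'),
            (3, 3, 5, '2021-04-25'), (4, 4, 5, '2021-04-25');
        INSERT INTO transactions VALUES
            (1, 1, 5, '2021-04-27'), (2, 2, 10, '2021-04-26'),
            (3, 3, 15, '2021-04-26'), (4, 4, 50, '2021-04-25');
    """)
    rows = con.execute("""
        WITH moves AS (
            SELECT user_id, date, amount FROM deposits
            UNION ALL
            SELECT user_id, date, -amount_tendered FROM transactions
        ),
        dates AS (SELECT DISTINCT date FROM moves),
        balances AS (
            SELECT d.date AS date, u.user_id,
                   u.account_bal
                     - COALESCE(SUM(CASE WHEN m.date > d.date THEN m.amount END), 0)
                     AS end_of_day
            FROM users u
            CROSS JOIN dates d
            LEFT JOIN moves m ON m.user_id = u.user_id
            GROUP BY d.date, u.user_id, u.account_bal
        )
        SELECT date,
               SUM(CASE WHEN end_of_day < 0 THEN end_of_day ELSE 0 END) AS overdraft
        FROM balances
        GROUP BY date
        ORDER BY date
    """).fetchall()
    con.close()
    return rows

for row in overdraft_per_day():
    print(row)
```

On 2021-04-26 only user 4 is negative (-300); user 1 sits at exactly 0 that day because the 5 spent on 2021-04-27 has been added back.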
Basically, the query maps users to councils, calculates the overdraft periods for each user, then aggregates over councils. I assume the starting balance is dated at the start of the month, '2021-04-01' (it could be an ending balance as well, see below); change it as needed. I also assume that a negative starting balance counts as an overdraft. For simplicity and debugging, the query is divided into a number of steps.
with uc as (
select distinct m.council_id, mul.user_id
from markets m
join market_user_link mul on m.market_id = mul.market_id
),
user_running_total as (
select user_id, date,
coalesce(lead(date) over(partition by user_id order by date) - interval 1 day, date) nxt,
sum(sum(s)) over(partition by user_id order by date) rt
from (
select user_id, date, -amount_tendered s
from transactions
union all
select user_id, date, amount
from deposits
union all
select user_id, se.d, se.s
from users
cross join lateral (
select date(NOW() + interval 1 day) d, 0 s
union all
select '2021-04-01' d, account_bal
) se
) t
group by user_id, date
),
user_overdraft as (
select user_id, date, nxt, least(rt, 0) ovd
from user_running_total
where date <= date(NOW())
),
dates as (
select date
from user_overdraft
union
select nxt
from user_overdraft
),
council__overdraft as (
select uc.council_id, d.date, sum(uo.ovd) total_overdraft, lag(sum(uo.ovd), 1, sum(uo.ovd) - 1) over(partition by uc.council_id order by d.date) prev_ovd
from uc
cross join dates d
join user_overdraft uo on uc.user_id = uo.user_id and d.date between uo.date and uo.nxt
group by uc.council_id, d.date
)
select council_id, date, total_overdraft
from council__overdraft
where total_overdraft <> prev_ovd
order by date, council_id
In fact, council__overdraft is already quite usable; the last step just compacts the output by excluding intermediate dates on which the overdraft does not change.
With the following sample data:
users
user_id name account_bal
1 Wells -5
2 James 100
3 Joy 10
4 Mumbi -300
deposits, ordered by date, with an extra row added for the last date
id user_id amount date
3 3 5 2021-04-25
4 4 5 2021-04-25
1 1 5 2021-04-26
2 3 10 2021-04-26
5 3 73 2021-05-06
transactions, ordered by date (note the added row, to illustrate the running total in action)
id user_id amount_tendered date
5 4 50 2021-04-25
2 2 10 2021-04-26
3 3 15 2021-04-26
1 1 5 2021-04-27
4 3 17 2021-04-27
councils
council_id name
1 a
2 b
3 c
markets
market_id name council_id
1 x 3
2 y 1
3 z 2
market_user_link
id market_id user_id
1 1 3
2 2 2
3 3 1
4 3 4
The query output is:
council_id  date        overdraft
1           2021-04-01  0
2           2021-04-01  -305
3           2021-04-01  0
2           2021-04-25  -350
2           2021-04-26  -345
2           2021-04-27  -350
3           2021-04-27  -7
3           2021-05-06  0
Alternatively, provided the users table is holding a closing (NOW()) balance, replace user_running_total CTE with the following code
user_running_total as (
select user_id, date,
coalesce(lead(date) over(partition by user_id order by date) - interval 1 day, date) nxt,
coalesce(sum(sum(s)) over(partition by user_id order by date desc
rows between unbounded preceding and 1 preceding), sum(s)) rt
from (
select user_id, date, amount_tendered s
from transactions
union all
select user_id, date, -amount
from deposits
union all
select user_id, se.d, se.s
from users
cross join lateral (
select date(NOW() + interval 1 day) d, account_bal s
union all
select '2021-04-01' d, 0
) se
) t
where DATE(date) between date '2021-04-01' and date(NOW() + interval 1 day)
group by user_id, date
),
This way the query starts with the closing balance, dated the day after today, and rolls the running total back in reverse date order to '2021-04-01' as the starting date.
Output
council_id  date        overdraft
1           2021-04-01  0
2           2021-04-01  -260
3           2021-04-01  -46
2           2021-04-25  -305
3           2021-04-25  -41
2           2021-04-26  -300
3           2021-04-26  -46
2           2021-04-27  -305
3           2021-04-27  -63
3           2021-05-06  0
db-fiddle both versions

Optimising a JOIN to the latest data in a second table in MySQL

We are having a problem optimising a set of queries that all follow the same pattern.
The basic scenario is that we have a table (hours) that records the weekly hours worked by an individual, against project codes.
We have a second table (rates) which records the historical hourly rates for each individual.
We want to report the total salary (hours * rate) for each project in the hours table.
The query that returns all the hours we are interested in is:
SELECT hours_job, SUM(hour_value) AS total_hours
FROM hours_table
WHERE hours_job_status = "1"
GROUP BY hours_job
We need to join that to the rates table on the latest rate_date. However, in edge cases an individual may have more than one rate on a particular date, and in those cases we want the MAX rate.
This is our current attempt, which is extremely slow:
SELECT hours_job, SUM(hours_value * rate_value) AS salary
FROM hours_table
JOIN rates_table ON rate_person_id = hours_person_id
WHERE hours_job_active = "1"
AND rate_date = (
SELECT MAX(rate_date)
FROM rates_table
WHERE hours_person_id = rate_person_id
AND hours_week >= rate_date
AND rate_active = 1
)
AND rate_value = (
SELECT MAX(rate_value)
FROM rates_table
WHERE hours_person_id = rate_person_id
AND hours_week >= rate_date
AND rate_active = 1
)
GROUP BY hours_job
ORDER BY hours_job;
As suggested, here is a snippet of the data and the expected result:
hours_table
hours_id hours_person_id hours_week hours_job hours_value hours_job_active
1 1 "2020-06-12" 100 20 1
2 1 "2020-06-12" 101 10 1
3 1 "2020-06-12" 102 10 1
4 2 "2020-06-12" 100 30 1
5 2 "2020-06-12" 102 10 1
rates_table
rate_id rate_person_id rate_date rate_value rate_type rate_active
1 1 "2020-04-01" 25.00 A 1
2 1 "2019-04-01" 20.00 A 1
3 1 "2018-04-01" 18.00 A 1
4 2 "2020-04-01" 20.00 A 1
5 2 "2020-04-01" 18.00 Y 1
Expected result would be
hours_job salary
100 1100 ((20*25) + (30*20))
101 250 (10*25)
102 450 ((10*25) + (20*10))
It's been a long time since I've had one of these problems to optimize, and without a test db I can't make sure this works, but have you tried something like this?
SELECT hours_job, SUM(hour_value * MAX(rate_value)) AS salary
FROM hours_table
JOIN rates_table ON rate_person_id = hours_person_id
WHERE hours_job_active = "1"
AND hours_week >= rate_date
AND rate_active = 1
GROUP BY hours_job, rates_date
ORDER BY hours_job;
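One caveat with the query above: MySQL rejects an aggregate nested directly inside another, such as SUM(... MAX(...)), with an "Invalid use of group function" error. As a hedged alternative, the sketch below picks the rate with a correlated subquery (latest active rate_date on or before hours_week, then the MAX rate on that date) and checks it against the question's sample data. It runs on Python's built-in sqlite3 module rather than MySQL, and it only demonstrates that the result is correct; whether it is faster on the real tables is untested.

```python
import sqlite3

def salary_per_job():
    """Total salary per job using each person's latest applicable rate."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE hours_table (hours_id INTEGER, hours_person_id INTEGER,
            hours_week TEXT, hours_job INTEGER, hours_value INTEGER,
            hours_job_active INTEGER);
        CREATE TABLE rates_table (rate_id INTEGER, rate_person_id INTEGER,
            rate_date TEXT, rate_value REAL, rate_type TEXT, rate_active INTEGER);
        -- Rows copied from the question's sample data.
        INSERT INTO hours_table VALUES
            (1, 1, '2020-06-12', 100, 20, 1), (2, 1, '2020-06-12', 101, 10, 1),
            (3, 1, '2020-06-12', 102, 10, 1), (4, 2, '2020-06-12', 100, 30, 1),
            (5, 2, '2020-06-12', 102, 10, 1);
        INSERT INTO rates_table VALUES
            (1, 1, '2020-04-01', 25.00, 'A', 1), (2, 1, '2019-04-01', 20.00, 'A', 1),
            (3, 1, '2018-04-01', 18.00, 'A', 1), (4, 2, '2020-04-01', 20.00, 'A', 1),
            (5, 2, '2020-04-01', 18.00, 'Y', 1);
    """)
    rows = con.execute("""
        SELECT h.hours_job,
               SUM(h.hours_value * (
                   -- highest rate on the latest rate_date not after hours_week
                   SELECT MAX(r.rate_value)
                   FROM rates_table r
                   WHERE r.rate_person_id = h.hours_person_id
                     AND r.rate_active = 1
                     AND r.rate_date = (
                         SELECT MAX(r2.rate_date)
                         FROM rates_table r2
                         WHERE r2.rate_person_id = h.hours_person_id
                           AND r2.rate_active = 1
                           AND r2.rate_date <= h.hours_week)
               )) AS salary
        FROM hours_table h
        WHERE h.hours_job_active = 1
        GROUP BY h.hours_job
        ORDER BY h.hours_job
    """).fetchall()
    con.close()
    return rows

for row in salary_per_job():
    print(row)
```

On the sample rows this yields 1100, 250 and 450 for jobs 100, 101 and 102, matching the expected result in the question.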

Select join on subquery from multiple table

This is the existing code I have for my select query.
Table named material:
mat_id mat_name supplier_id stock_in stock_released Balance date
1 alloy 4 30 0 30 feb13
2 steel 2 15 0 15 feb13
3 alloy 2 0 3 15 feb14
SELECT m.`mat_id`, m.`mat_name`, m.`stock_in`, m.`stock_released`,
(select sum(stock_in) - sum(stock_released)
from material m2
where m2.mat_name = m.mat_name and
m2.mat_id <= m.mat_id
) as balance,
m.`date`
FROM `material` m
ORDER BY m.`mat_id` ASC;
How can I add the supplier name from the supplier table to my query? How do I join it?
sup_id sup_name
2 rain
4 george
Try this SQL:
SELECT m.`mat_id`, m.`mat_name`, m.`stock_in`, m.`stock_released`,
(select sum(stock_in) - sum(stock_released)
from material m2
where m2.mat_name = m.mat_name and
m2.mat_id <= m.mat_id
) as balance,
m.`date`,
s.`sup_name`
FROM `material` m
LEFT JOIN `supplier` s on s.sup_id = m.supplier_id
ORDER BY m.`mat_id` ASC;

Automatically change variable in where clause

The code I use gives me the correct information based on a date in a where clause. I want the same information for other dates, so right now I have to change the date by hand, run the code, copy and paste the result somewhere else, and start over with a new date. That is a lot of work if I want the information for every day of the year. Is it possible to change the date in the where clause automatically, and what is the best or easiest way to do that?
Select t4.Count, t4.Status
From(
SELECT count(l.VoerID) as Count, l.Datum, l.Status, l.LogID
FROM (
SELECT k.VoerID, k.Datum, MAX(k.LogID) AS LogID
FROM DB.LogStatus k
Where Datum < '2013-07-01'
GROUP BY k.VoerID
) m
JOIN DB.LogStatus l
ON l.VoerID = m.VoerID AND l.LogID = m.LogID
Where status in ('B','IN1','IN2','V','Reserv')
Group by Status
)t4
EDIT:
Original table, filtered to one VoerID (the table contains thousands of VoerIDs):
LogID Datum UserID Status Time VoerID
1299772 2013-04-17 259 N 14:09:11 50174
1319774 2013-05-23 68 B 11:19:17 50174
1320038 2013-05-23 197 IN1 16:53:30 50174
1322002 2013-05-28 68 IN2 09:22:32 50174
1325052 2013-05-31 161 G 09:00:59 50174
1325166 2013-05-31 10 400 09:15:12 50174
1325182 2013-05-31 10 V 09:30:07 50174
1325208 2013-05-31 10 V 09:45:06 50174
1325406 2013-05-31 10 Reserv 11:45:06 50174
1325522 2013-05-31 10 Reserv 12:15:06 50174
1325954 2013-05-31 10 Reserv 15:15:13 50174
1328474 2013-06-05 10 Reserv 13:15:06 50174
1329230 2013-06-06 10 Reserv 09:45:03 50174
1329244 2013-06-06 10 Archived 10:00:08 50174
1329268 2013-06-06 10 Archived 10:15:08 50174
1330286 2013-06-07 10 Archived 10:15:06 50174
I want to know the status of the VoerID on the first of every month. So on 2013-05-01 status = N, on 2013-06-01 status = Reserv, and from 2013-07-01 it is Archived.
The above is for one VoerID. I want to count the number of VoerIDs per first of month and per status, based on the last LogID before the first of the next month.
Finally, once I have the information, I want to turn it into a crosstab and chart in MS Excel:
1-1-2013 1-2-2013 1-3-2013 1-4-2013 1-5-2013
N 20 22 24 26 28
B 23 21,5 20 18,5 17
IN1 12 15 18 21 24
IN2 15 7 14 18 25
V 800 1000 1200 1400 1600
Reserv 50 63 76 89 102
Archived 100000 101220 102440 103660 104880
You could cross join against all the days of the year, then group by that day.
Something like this:
SELECT COUNT(l.VoerID) as COUNT, m.aDate, l.Status
FROM
(
SELECT Sub1.aDate, k.VoerID, MAX(k.LogID) AS LogID
FROM DB.LogStatus k
CROSS JOIN
(
SELECT DATE_ADD('2013-01-01', INTERVAL units.i + tens.i * 10 + hundreds.i * 100 DAY) AS aDate -- return the first day of the year + all the numbers from 0 to 999
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units -- Select units of days
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens -- select tens
CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) hundreds -- select hundreds
WHERE DATE_ADD('2013-01-01', INTERVAL units.i + tens.i * 10 + hundreds.i * 100 DAY) <= '2013-12-31' -- limit the dates to the days of the specific year
) Sub1
WHERE k.Datum < Sub1.aDate -- This should give multiple copies of each record, one for every date that Datum is earlier than
GROUP BY Sub1.aDate, k.VoerID -- Group by date and id, to get the max log id for each date and id
) m
JOIN DB.LogStatus l
ON l.VoerID = m.VoerID AND l.LogID = m.LogID -- Join on the row whose log id is the max log id
WHERE status in ('x','y','z')
GROUP BY m.aDate, Status
EDIT - or for each month:-
SELECT COUNT(l.VoerID) as COUNT, m.aDate, l.Status
FROM
(
SELECT Sub1.aDate, k.VoerID, MAX(k.LogID) AS LogID
FROM DB.LogStatus k
CROSS JOIN
(
SELECT DATE_ADD('2013-01-01', INTERVAL units.i MONTH) AS aDate -- return the first day of each month of the year
FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 10 UNION SELECT 11) units -- month offsets 0 to 11
) Sub1
WHERE k.Datum < Sub1.aDate -- This should give multiple copies of each record, one for every date that Datum is earlier than
GROUP BY Sub1.aDate, k.VoerID -- Group by date and id, to get the max log id for each date and id
) m
JOIN DB.LogStatus l
ON l.VoerID = m.VoerID AND l.LogID = m.LogID -- Join on the row whose log id is the max log id
WHERE status in ('x','y','z')
GROUP BY m.aDate, Status
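To see the per-month variant at work, here is a small sketch on a few of the log rows from the question (VoerID 50174), run through Python's built-in sqlite3 module. The assumptions: SQLite stands in for MySQL, the month list is a hard-coded VALUES clause instead of the generated DATE_ADD series, and the status filter is left out so every status is counted.

```python
import sqlite3

def status_at_month_start():
    """Count VoerIDs per (month start, status), using the last log row
    recorded before each month start."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE LogStatus (LogID INTEGER, Datum TEXT, Status TEXT, VoerID INTEGER);
        -- A subset of the question's rows for VoerID 50174.
        INSERT INTO LogStatus VALUES
            (1299772, '2013-04-17', 'N',        50174),
            (1319774, '2013-05-23', 'B',        50174),
            (1325954, '2013-05-31', 'Reserv',   50174),
            (1329230, '2013-06-06', 'Reserv',   50174),
            (1330286, '2013-06-07', 'Archived', 50174);
    """)
    rows = con.execute("""
        WITH months(aDate) AS (
            VALUES ('2013-05-01'), ('2013-06-01'), ('2013-07-01')
        ),
        latest AS (
            -- one row per (month start, VoerID): the newest log before that date
            SELECT m.aDate, k.VoerID, MAX(k.LogID) AS LogID
            FROM LogStatus k
            JOIN months m ON k.Datum < m.aDate
            GROUP BY m.aDate, k.VoerID
        )
        SELECT latest.aDate, l.Status, COUNT(l.VoerID) AS cnt
        FROM latest
        JOIN LogStatus l ON l.VoerID = latest.VoerID AND l.LogID = latest.LogID
        GROUP BY latest.aDate, l.Status
        ORDER BY latest.aDate, l.Status
    """).fetchall()
    con.close()
    return rows

for row in status_at_month_start():
    print(row)
```

For this single VoerID the statuses come out as N on 2013-05-01, Reserv on 2013-06-01 and Archived on 2013-07-01, which matches what the question describes.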