I have a couple of relatively straightforward tables (examples below).
One has account details:
AccountNo | CurrentBalance* | ReferredBalance
12345 | £1254.25 | 1500.00
* Current balance refreshes hourly, so it is not static.
Another one has payments:
Accountno | TranasctionNo | TransDate | Amount |
123456 | 558745489 | 01/01/2015 | £25.99 |
123456 | 558745490 | 01/02/2015 | £25.99 |
123456 | 558745491 | 01/02/2015 | £25.99 |
I’ve been tasked with keeping a rolling balance based on payments received, including a pre-transaction amount.
So, for example, I need the output to look like this:
AccountNo | TransactionDate | PreTransactionBalance | Amount | Current Balance|
123456 | 01/01/2015 | 1254.25 | 25.99 | 1228.26|
123456 | 01/02/2015 | 1228.26 | 25.99 | 1202.27|
123456 | 01/03/2015 | 1202.27 | 25.99 | 1176.28|
123456 | 01/03/2015 | 1176.28 | -100 | 1276.28|
I’ve added the negative in as it will need to calculate debits as well as credits.
I can't quite work out how to get the rolling pre-transaction totals to work. Hopefully this is clear enough!
You can use SUM(amount) OVER() to get rolling amount based on the TransactionNo and then calculate pre transaction and current balance from this value.
Sample Data
DECLARE @Account TABLE
(
AccountNo VARCHAR(10) ,
CurrentBalance MONEY,
ReferredBalance MONEY
)
DECLARE @AccountPayment TABLE
(
Accountno VARCHAR(10),TranasctionNo VARCHAR(10),TransDate DATE,Amount MONEY
)
insert into @Account VALUES
('12345','£1254.25','1500.00');
insert into @AccountPayment values
('12345','558745489','01/01/2015','£25.99'),
('12345','558745490','01/02/2015','£25.99'),
('12345','558745491','01/02/2015','£25.99'),
('12345','558745492','01/02/2015','-100');
Query
SELECT AP.AccountNo,
TransDate,
-- PARTITION BY restarts the running total for each account
A.CurrentBalance - SUM(amount) OVER(PARTITION BY AP.Accountno ORDER BY TranasctionNo) + amount as PreTransactionBalance ,
amount,
A.CurrentBalance - SUM(amount) OVER(PARTITION BY AP.Accountno ORDER BY TranasctionNo) Current_Balance
FROM @AccountPayment AP
INNER JOIN @Account A
ON AP.Accountno = A.Accountno
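The arithmetic behind that running total can be checked outside the database. The following is only a sketch in Python: the helper name rolling_balances is made up for illustration, and it assumes, as the query does, that CurrentBalance is the balance before the first listed payment and that the amounts arrive in TranasctionNo order.

```python
from decimal import Decimal
from itertools import accumulate


def rolling_balances(opening_balance, amounts):
    """Return (pre_balance, amount, post_balance) for each payment, in order.

    Hypothetical helper mirroring CurrentBalance - SUM(amount) OVER (ORDER BY ...).
    """
    rows = []
    for amount, running in zip(amounts, accumulate(amounts)):
        post = opening_balance - running            # balance after this payment
        rows.append((post + amount, amount, post))  # add the payment back for the "pre" balance
    return rows


# Sample payments from the question; a negative amount increases the balance.
amounts = [Decimal("25.99"), Decimal("25.99"), Decimal("25.99"), Decimal("-100")]
for row in rolling_balances(Decimal("1254.25"), amounts):
    print(*row)
```

Decimal is used here instead of float so that the pence add up exactly, much as MONEY does on the SQL Server side.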
Edit
It seems ORDER BY is not supported within SUM() OVER() in SQL Server 2008 / SQL Server 2008 R2. As per MSDN:
ORDER BY Clause cannot be used with aggregate window functions.
We can use CROSS APPLY like this.
SELECT AP.AccountNo,
TransDate,
A.CurrentBalance - pre_amount as PreTransactionBalance ,
amount,
A.CurrentBalance - pre_amount - amount Current_Balance
FROM @AccountPayment AP
INNER JOIN @Account A
ON AP.Accountno = A.Accountno
CROSS APPLY
(
select ISNULL(sum(amount),0) as pre_amount
from @AccountPayment ap1
where ap1.Accountno = ap.Accountno and ap1.TranasctionNo < ap.TranasctionNo
) as b
Output
AccountNo TransDate PreTransactionBalance amount Current_Balance
12345 2015-01-01 1254.25 25.99 1228.26
12345 2015-01-02 1228.26 25.99 1202.27
12345 2015-01-02 1202.27 25.99 1176.28
12345 2015-01-02 1176.28 -100.00 1276.28
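For anyone who wants to try the "sum of all earlier transactions" idea without a SQL Server instance, here is a rough equivalent using Python's built-in sqlite3 module. SQLite has no CROSS APPLY, so a correlated scalar subquery is swapped in for it. The lowercase table and column names and the choice to store amounts as integer pence are assumptions made for this sketch, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (accountno TEXT, current_balance INTEGER);          -- pence (assumption)
CREATE TABLE account_payment (accountno TEXT, transactionno TEXT, amount INTEGER);
INSERT INTO account VALUES ('12345', 125425);
INSERT INTO account_payment VALUES
  ('12345', '558745489', 2599),
  ('12345', '558745490', 2599),
  ('12345', '558745491', 2599),
  ('12345', '558745492', -10000);
""")

# For each payment, subtract the sum of all earlier payments on the same account.
rows = conn.execute("""
SELECT transactionno, pre_balance, amount, pre_balance - amount AS current_balance
FROM (
    SELECT ap.transactionno, ap.amount,
           a.current_balance - COALESCE(
               (SELECT SUM(p.amount)
                FROM account_payment p
                WHERE p.accountno = ap.accountno
                  AND p.transactionno < ap.transactionno), 0) AS pre_balance
    FROM account_payment ap
    JOIN account a ON a.accountno = ap.accountno
) AS t
ORDER BY transactionno
""").fetchall()

for row in rows:
    print(row)
```

Like the CROSS APPLY version, this re-scans earlier payments for every row, so it is likely to be slower than a window function on large tables.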
MySQL newbie here.
I have a table (named 'audit_webservice_aua') like this:
+---------+------------------------------------+-------------------+------------------------+
| auditId | device_code | response_status | request_date
+---------+------------------------------------+-------------------+------------------------+
| 10001 | 0007756-gyy66-4c6e-a59d-xxxccyyyt1 | P | 2020-03-02 00:00:08.785
| 10002 | 0007756-gyy66-4c6e-a59d-xxxccyyyt2 | F | 2020-04-06 00:00:08.785
| 10003 | 0007756-gyy66-4c6e-a59d-xxxccyyyt3 | F | 2020-04-01 00:01:08.785
| 10004 | 0007756-gyy66-4c6e-a59d-xxxccyyyt1 | P | 2020-05-02 00:02:08.785
| 10005 | 0007756-gyy66-4c6e-a59d-xxxccyyyt1 | P | 2020-05-09 00:03:08.785
| 10006 | 0007756-gyy66-4c6e-a59d-xxxccyyyt2 | P | 2020-05-09 01:00:08.785
| 10007 | 0007756-gyy66-4c6e-a59d-xxxccyyyt7 | F | 2020-06-06 02:00:08.785
+---------+------------------------------------+-------------------+------------------------+
Every time a new request is made, the above table stores the requesting device_code, the response_status and the request time.
I need a result set that contains each device_code with its total_trans, total_successful and total_failure, along with the date, for each day between two given dates.
The query I have written is as follows:
SELECT DATE_FORMAT(aua.request_date,'%b') as month ,
YEAR(aua.request_date) as year,
DATE_FORMAT(aua.request_date,'%Y-%m-%d') as date,
(select count(aua.audit_id) )as total_trans ,
(select count(aua.audit_id) where aua.response_status <> 'P') as total_failure ,
(select count(aua.audit_id) where aua.response_status = 'P') as total_successful ,
aua.device_code as deviceCode
FROM audit_webservice_aua aua where DATE_FORMAT(aua.request_date,'%Y-%m-%d') between '2020-04-16' and '2020-07-17'
group by dates,deviceCode ;
In the above code I'm trying to get results between '2020-03-02' and '2020-06-06', but the count I'm getting is not correct.
Any help would be appreciated.
Thank you in advance.
I think you just want conditional aggregation:
SELECT DATE_FORMAT(aua.request_date,'%b') as month ,
YEAR(aua.request_date) as year,
DATE_FORMAT(aua.request_date, '%Y-%m-%d') as date,
COUNT(aua.audit_id) as total_trans ,
SUM(aua.response_status <> 'P') as total_failure,
SUM(aua.response_status = 'P') as total_successful,
aua.device_code as deviceCode
FROM audit_webservice_aua aua
WHERE DATE_FORMAT(aua.request_date, '%Y-%m-%d') between '2020-04-16' and '2020-07-17'
GROUP BY month, year, date, deviceCode ;
I would also advise you to change the WHERE clause to a half-open range on the raw column, so that an index on request_date can be used:
WHERE aua.request_date >= '2020-04-16' AND
aua.request_date < '2020-07-18'
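Conditional aggregation and the half-open date range can be tried out with Python's built-in sqlite3 module. This is a sketch rather than the MySQL query itself: SQLite's date() stands in for DATE_FORMAT, the dates are stored as text, and the rows are the ones shown in the question.

```python
import sqlite3

PREFIX = "0007756-gyy66-4c6e-a59d-xxxccyyy"  # common part of the sample device codes

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_webservice_aua "
    "(audit_id INTEGER, device_code TEXT, response_status TEXT, request_date TEXT)"
)
conn.executemany(
    "INSERT INTO audit_webservice_aua VALUES (?, ?, ?, ?)",
    [
        (10001, PREFIX + "t1", "P", "2020-03-02 00:00:08.785"),
        (10002, PREFIX + "t2", "F", "2020-04-06 00:00:08.785"),
        (10003, PREFIX + "t3", "F", "2020-04-01 00:01:08.785"),
        (10004, PREFIX + "t1", "P", "2020-05-02 00:02:08.785"),
        (10005, PREFIX + "t1", "P", "2020-05-09 00:03:08.785"),
        (10006, PREFIX + "t2", "P", "2020-05-09 01:00:08.785"),
        (10007, PREFIX + "t7", "F", "2020-06-06 02:00:08.785"),
    ],
)

# A comparison evaluates to 1 or 0, so SUM(condition) counts the matching rows.
rows = conn.execute("""
SELECT date(request_date) AS day,
       device_code,
       COUNT(*)                     AS total_trans,
       SUM(response_status = 'P')   AS total_successful,
       SUM(response_status <> 'P')  AS total_failure
FROM audit_webservice_aua
WHERE request_date >= '2020-04-16' AND request_date < '2020-07-18'
GROUP BY date(request_date), device_code
ORDER BY day, device_code
""").fetchall()

for row in rows:
    print(row)
```

The upper bound is exclusive and set to the day after the last wanted date, so rows with a time component on 2020-07-17 are still included.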
I am using MySQL to do some data analysis on subscribers, and I would like to work out the daily active subscribers since the service launched.
I have a subscription table like the one below:
id | subscriptiondate | unsubscriptiondate
---|------------------|--------------------
1 | 2020-02-12 | null
---|------------------|--------------------
2 | 2020-03-20 | 2020-04-01
---|------------------|--------------------
3 | 2020-03-10 | null
---|------------------|--------------------
4 |2020-04-02 | null
and I expect a result like:
date | active_user
-----------|---------------------------
2020-02-12 | 1
-----------|------------------
2020-03-10 | 2
-----------|------------------
2020-03-20 | 3
-----------|------------------
2020-04-02 | 3
A subscriber opted out on 2020-04-01, which is why we have 3 active subscribers on 2020-04-02.
Here is my SQL script. Could someone check it and help me achieve my goal?
SELECT
COUNT(distinct id) AS active_user,
date(subscriptiondate) as day
FROM
subscriptions
WHERE
subscriptiondate in (select subscriptiondate from subscriptions where subscriptiondate <=date(subscriptiondate))
AND (unsubscriptiondate is NULL or unsubscriptiondate>date(subscriptiondate))
GROUP BY
day
ORDER BY day ASC
You can "unpivot" the table and aggregate with a cumulative sum (this needs window functions, so MySQL 8.0 or later):
select date, sum(inc) as change_on_date,
sum(sum(inc)) over (order by date) as active_on_day
from ((select subscriptiondate as date, 1 as inc from subscriptions
) union all
(select unsubscriptiondate, -1 from subscriptions
where unsubscriptiondate is not null -- skip subscribers who never opted out
)
) s
group by date;
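The same +1 / -1 bookkeeping can be sketched in plain Python to see what the cumulative sum produces. The function name active_by_date is made up for illustration, and the rows are the four subscriptions from the question.

```python
from collections import Counter
from itertools import accumulate

# (subscriptiondate, unsubscriptiondate) pairs; None means still subscribed.
subscriptions = [
    ("2020-02-12", None),
    ("2020-03-20", "2020-04-01"),
    ("2020-03-10", None),
    ("2020-04-02", None),
]


def active_by_date(subs):
    """Return [(date, active_count)] using a running sum of +1/-1 events."""
    change = Counter()
    for start, end in subs:
        change[start] += 1          # a subscription adds one active user
        if end is not None:
            change[end] -= 1        # an unsubscription removes one
    dates = sorted(change)          # ISO date strings sort chronologically
    totals = accumulate(change[d] for d in dates)
    return list(zip(dates, totals))


for day, active in active_by_date(subscriptions):
    print(day, active)
```

Note that the unsubscription date shows up as a row of its own (2020-04-01 with 2 active users), which the expected output in the question does not list.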
I have two tables (Invoices and Taxes) in MySQL:
Invoices:
- id
- account_id
- issued_at
- total
- gross_amount
- country
Taxes:
- id
- invoice_id
- tax_name
- tax_rate
- taxable_amount
- tax_amount
I'm trying to retrieve a report like this:
rep_month | country | total_amount | tax_name | tax_rate(%) | taxable_amount | tax_amount
--------------------------------------------------------------------------------------
2017-01-01 | ES | 1000 | TAX1 | 21 | 700 | 147
2017-01-01 | ES | 1000 | TAX2 | -15 | 700 | 105
2016-12-01 | FR | 100 | TAX4 | 20 | 30 | 6
2016-12-01 | FR | 100 | B2B | 0 | 70 | 0
2017-01-01 | GB | 2500 | TAX3 | 20 | 1000 | 200
The idea behind this is that an invoice has a has_many relation with taxes, so an invoice may or may not have taxes. The report should show the total amount collected (total_amount) for a given country (regardless of whether it includes taxes)
and indicate which part of that total amount is taxable (taxable_amount) for a specific tax.
My current approach is this one:
SELECT
DATE_FORMAT(invoices.issued_at, '%Y-%m-01') AS rep_month,
invoices.country AS country,
( SELECT sum(docs.gross_amount)
FROM invoices AS docs
WHERE docs.country = invoices.country
AND DATE_FORMAT(docs.issued_at, '%Y-%m-01') = rep_month
) AS total_amount,
taxes.tax_name AS tax_name,
taxes.tax_rate AS tax_rate,
SUM(taxes.taxable_amount) AS taxable_amount,
SUM(taxes.tax_amount) AS tax_amount
FROM invoices
JOIN taxes ON invoices.id = taxes.invoice_id
AND invoices.issued_at BETWEEN '2016-01-01' AND '2017-12-31'
GROUP BY account_id, rep_month, country, tax_name, tax_rate
ORDER BY country desc
Well, this works but for a real dataset (thousands of records) it's really slow as the select subquery for retrieving the total_amount is being run for each row of the report.
I cannot make a LEFT JOIN taxes with a direct SUM(gross_amount) as the GROUP BY groups by tax name and rate and I need to show the total collected per country regardless if the amount was taxed or not. Is there a faster alternative to this?
I don't know the exact use case for this query, but the problem is that you are trying to compute the entire report in one go, every time it is requested.
Ideally, you should run the query you have once, store its result in a separate summary table, and then query that summary table whenever you need the report. When new rows are added to the Invoices table, you can update the summary table either on every insert or periodically via a cron job.
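One way the summary-table idea could look is sketched below with Python's built-in sqlite3 module. The table name monthly_country_totals is hypothetical, strftime stands in for MySQL's DATE_FORMAT, the invoices table is trimmed to the columns the report needs, and the two invoices are invented sample rows (one taxed, one untaxed) so that total_amount and taxable_amount differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (id INTEGER, account_id INTEGER, issued_at TEXT,
                       gross_amount INTEGER, country TEXT);
CREATE TABLE taxes (id INTEGER, invoice_id INTEGER, tax_name TEXT, tax_rate INTEGER,
                    taxable_amount INTEGER, tax_amount INTEGER);
INSERT INTO invoices VALUES (1, 10, '2017-01-05', 700, 'ES'),
                            (2, 11, '2017-01-20', 300, 'ES');   -- untaxed invoice
INSERT INTO taxes VALUES (1, 1, 'TAX1', 21, 700, 147);

-- Hypothetical summary table: one row per month and country,
-- rebuilt periodically instead of being recomputed for every report row.
CREATE TABLE monthly_country_totals AS
SELECT strftime('%Y-%m-01', issued_at) AS rep_month,
       country,
       SUM(gross_amount) AS total_amount
FROM invoices
GROUP BY strftime('%Y-%m-01', issued_at), country;
""")

# The report now joins to the precomputed totals instead of a correlated subquery.
report = conn.execute("""
SELECT m.rep_month, m.country, m.total_amount,
       t.tax_name, t.tax_rate,
       SUM(t.taxable_amount) AS taxable_amount,
       SUM(t.tax_amount)     AS tax_amount
FROM invoices i
JOIN taxes t ON t.invoice_id = i.id
JOIN monthly_country_totals m
  ON m.country = i.country
 AND m.rep_month = strftime('%Y-%m-01', i.issued_at)
GROUP BY m.rep_month, m.country, m.total_amount, t.tax_name, t.tax_rate
ORDER BY m.country DESC
""").fetchall()

for row in report:
    print(row)
```

The totals are aggregated once per month and country, so the untaxed invoice still counts towards total_amount even though it has no row in taxes.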
I would like to find out the average number of days between orders grouping by account_id in the database.
Let's say I have the following table named 'orders' with this data.
id account_id account_name order_date
1 555 Acme Fireworks 2015-06-15
2 342 Kent Brewery 2015-09-12
3 555 Acme Fireworks 2015-09-15
4 342 Kent Brewery 2015-10-12
5 342 Kent Brewery 2015-11-12
6 342 Kent Brewery 2015-12-12
7 555 Acme Fireworks 2015-12-15
8 900 Plastic Inc. 2015-12-20
I would like a query to produce the following results
account_id account_name average_days_between_orders
342 Kent Brewery 30.333
555 Acme Fireworks 91.5
900 Plastic Inc. (unsure of what value would go here since there's 1 order only)
I checked the following questions to get an idea, but still couldn't figure out the problem:
Average difference between two dates, grouped by a third field?
Thanks!
You need a query that produces the difference between each purchase and the previous purchase for a given account (null if there is no previous purchase), and then takes the average of these values.
I would self-join the above table to get for each order the maximum order date of any previous order in a subquery. In the avg() function calculate the difference between the calculated date and the current order date:
SELECT o3.account_id, o3.account_name, avg(diff) as average_days_between_orders
FROM
(select o1.id,
o1.account_id,
o1.account_name,
datediff(o1.order_date, max(o2.order_date)) as diff
from orders o1
left join orders o2 on o1.account_id=o2.account_id and o1.id>o2.id
group by o1.id, o1.account_id, o1.account_name, o1.order_date) o3
GROUP BY o3.account_id, o3.account_name
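The self-join can be tried against the sample rows with Python's built-in sqlite3 module. This is a sketch under one substitution: SQLite has no datediff(), so the difference of two julianday() values is used instead. Like the query above, it assumes that a higher id always means a later order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, account_id INTEGER, account_name TEXT, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 555, "Acme Fireworks", "2015-06-15"),
        (2, 342, "Kent Brewery", "2015-09-12"),
        (3, 555, "Acme Fireworks", "2015-09-15"),
        (4, 342, "Kent Brewery", "2015-10-12"),
        (5, 342, "Kent Brewery", "2015-11-12"),
        (6, 342, "Kent Brewery", "2015-12-12"),
        (7, 555, "Acme Fireworks", "2015-12-15"),
        (8, 900, "Plastic Inc.", "2015-12-20"),
    ],
)

# Inner query: days since the latest earlier order (NULL for an account's first order).
# Outer query: AVG ignores the NULLs, so single-order accounts end up with NULL.
rows = conn.execute("""
SELECT account_id, account_name, AVG(diff) AS average_days_between_orders
FROM (SELECT o1.id, o1.account_id, o1.account_name,
             julianday(o1.order_date) - julianday(MAX(o2.order_date)) AS diff
      FROM orders o1
      LEFT JOIN orders o2
        ON o1.account_id = o2.account_id AND o1.id > o2.id
      GROUP BY o1.id, o1.account_id, o1.account_name, o1.order_date) AS gaps
GROUP BY account_id, account_name
ORDER BY account_id
""").fetchall()

for row in rows:
    print(row)
```

Plastic Inc. comes back with a NULL (None in Python) average, which answers the open question about what to show for an account with a single order.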
As an alternative to joins, you can use a user defined variable in the subquery or a correlated subquery in the select list to calculate the differences. You can check mysql running total solutions to get a hang of this solution, such as this SO topic. Specifically, check out the solution provided by Andomar.
If your orders table is huge, then the alternative approaches described in that topic may be better from a performance point of view.
Note: Please test it carefully and use it as you wish. I couldn't find an easy query for it, and I don't guarantee it works in all cases :) If you just want the answer, the complete query is shown at the end.
The goal is that I'll try to get a table with start and end dates in one row, and then I'll simply calculate average difference between two dates. Something like this.
id | account_id | account_name | start_date | end_date
------------------------------------------------------------
1 | 342 | Kent Brewery | 2015-09-12 | 2015-10-12
2 | 342 | Kent Brewery | 2015-10-12 | 2015-11-12
3 | 342 | Kent Brewery | 2015-11-12 | 2015-12-12
4 | 555 | Acme Fireworks | 2015-06-15 | 2015-09-15
5 | 555 | Acme Fireworks | 2015-09-15 | 2015-12-15
I'll create few temporary tables to make it a bit more clear. First query for start_date:
QUERY:
create temporary table uniq_start_dates
select (@sid := @sid + 1) id, tmp_uniq_start_dates.*
from
(select distinct o1.account_id, o1.account_name, o1.order_date start_date
from orders o1
join orders o2 on o1.account_id=o2.account_id and o1.order_date < o2.order_date
order by o1.account_id, o1.order_date) tmp_uniq_start_dates
join (select @sid := 0) AS sid_generator
OUTPUT: temporary table - uniq_start_dates
id | account_id | account_name | start_date
-----------------------------------------------
1 | 342 | Kent Brewery | 2015-09-12
2 | 342 | Kent Brewery | 2015-10-12
3 | 342 | Kent Brewery | 2015-11-12
4 | 555 | Acme Fireworks | 2015-06-15
5 | 555 | Acme Fireworks | 2015-09-15
Do the same thing for end_date:
QUERY:
create temporary table uniq_end_dates
select (@eid := @eid + 1) id, tmp_uniq_end_dates.*
from
(select distinct o2.account_id, o2.account_name, o2.order_date end_date
from orders o1
join orders o2 on o1.account_id=o2.account_id and o1.order_date < o2.order_date
order by o2.account_id, o2.order_date) tmp_uniq_end_dates
join (select @eid := 0) AS eid_generator
OUTPUT: temporary table - uniq_end_dates
id | account_id | account_name | end_date
-----------------------------------------------
1 | 342 | Kent Brewery | 2015-10-12
2 | 342 | Kent Brewery | 2015-11-12
3 | 342 | Kent Brewery | 2015-12-12
4 | 555 | Acme Fireworks | 2015-09-15
5 | 555 | Acme Fireworks | 2015-12-15
Notice that I created a new auto-incrementing id for each temporary table so that I can join them back into one table (like the very first table). Let's join uniq_start_dates and uniq_end_dates.
QUERY:
create temporary table uniq_start_end_dates
select uniq_start_dates.*, uniq_end_dates.end_date
from uniq_start_dates
join uniq_end_dates using (id)
OUTPUT: temporary table - uniq_start_end_dates
(the same one as the first table)
Now for the easy part. Just aggregate and take the average difference between the dates.
QUERY:
select account_id, account_name, avg(timestampdiff(day, start_date, end_date)) average_days
from uniq_start_end_dates
group by account_id, account_name
OUTPUT:
account_id | account_name | average_days
--------------------------------------------
342 | Kent Brewery | 30.3333
555 | Acme Fireworks | 91.5000
As you may notice, Plastic Inc. is not in the result. If you want it listed with a null average_days, here it is:
QUERY:
select all_accounts.account_id, all_accounts.account_name, accounts_with_average_days.average_days
from
(select distinct account_id, account_name from orders) all_accounts
left join
(select account_id, account_name, avg(timestampdiff(day, start_date, end_date)) average_days
from uniq_start_end_dates
group by account_id, account_name) accounts_with_average_days
using (account_id, account_name)
OUTPUT:
account_id | account_name | average_days
--------------------------------------------
342 | Kent Brewery | 30.3333
555 | Acme Fireworks | 91.5000
900 | Plastic Inc. | null
Here is a complete messy query:
select all_accounts.account_id, all_accounts.account_name, accounts_with_average_days.average_days
from
(select distinct account_id, account_name from orders) all_accounts
left join
(select uniq_start_dates.account_id, uniq_start_dates.account_name, avg(timestampdiff(day, start_date, end_date)) average_days
from
(select (@sid := @sid + 1) id, tmp_uniq_start_dates.*
from
(select distinct o1.account_id, o1.account_name, o1.order_date start_date from orders o1
join orders o2 on o1.account_id=o2.account_id and o1.order_date < o2.order_date order by o1.account_id, o1.order_date) tmp_uniq_start_dates join (select @sid := 0) AS sid_generator
) uniq_start_dates
join
(select (@eid := @eid + 1) id, tmp_uniq_end_dates.*
from
(select distinct o2.account_id, o2.account_name, o2.order_date end_date from orders o1
join orders o2 on o1.account_id=o2.account_id and o1.order_date < o2.order_date order by o2.account_id, o2.order_date) tmp_uniq_end_dates join (select @eid := 0) AS eid_generator
) uniq_end_dates
using (id)
) uniq_end_dates
using (id)
group by uniq_start_dates.account_id, uniq_start_dates.account_name) accounts_with_average_days
using (account_id, account_name)
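The start/end pairing that the temporary tables build can be sketched in plain Python, which may help when checking the SQL result. The function name average_gaps is made up for illustration; it pairs each distinct order date with the next one for the same account and averages the gaps.

```python
from datetime import date
from itertools import groupby

# (account_id, order_date) for the sample orders.
orders = [
    (555, date(2015, 6, 15)),
    (342, date(2015, 9, 12)),
    (555, date(2015, 9, 15)),
    (342, date(2015, 10, 12)),
    (342, date(2015, 11, 12)),
    (342, date(2015, 12, 12)),
    (555, date(2015, 12, 15)),
    (900, date(2015, 12, 20)),
]


def average_gaps(rows):
    """Map account_id to the average number of days between consecutive orders."""
    result = {}
    # set() drops duplicate dates, like the DISTINCT in the subqueries.
    for account, group in groupby(sorted(set(rows)), key=lambda r: r[0]):
        dates = [d for _, d in group]
        # zip(dates, dates[1:]) yields the (start_date, end_date) pairs.
        gaps = [(end - start).days for start, end in zip(dates, dates[1:])]
        result[account] = sum(gaps) / len(gaps) if gaps else None
    return result


print(average_gaps(orders))
```

An account with a single order has no pairs at all, so it is reported as None, matching the null row for Plastic Inc.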
I have two MySQL tables
Payments:
| employeeID | period_begin | period_end |
and
Services:
| serviceID | date | employeeID |
I need to find all serviceIDs performed by a given employee whose date is not between any of the period ranges described in Payments. So, for example if I have the following records on Payments and Services for employee 10000:
Payments:
| employeeID | period_begin | period_end |
...
| 10000 | 2013-05-01 | 2013-05-16 |
| 10000 | 2013-05-17 | 2013-06-02 |
| 10000 | 2013-07-01 | 2013-07-16 |
| 10000 | 2013-07-17 | 2013-08-02 |
...
Services:
| serviceID | date | employeeID |
...
| 2001 | 2013-01-01 | 10000 |
| 2002 | 2013-05-15 | 10000 |
| 2003 | 2013-06-01 | 10000 |
| 2004 | 2013-07-10 | 10000 |
| 2005 | 2013-08-01 | 10000 |
...
The output should be
2001,
2003,
2005
because the dates for services 2002, 2004 are in one of the intervals in the Payments table.
Any ideas? I'm having trouble checking that a service's date is not covered by any of the intervals recorded in the Payments table. I'm currently joining Services and Payments on employeeID and stating the date condition there, but I'm not getting the right answer; I should probably be joining on a different condition:
select distinct serviceID from Services as X left join Payments as Y on
(X.employeeID=Y.employeeID AND (X.date < Y.period_begin OR X.date > Y.period_end))
where X.employeeID='10000';
is not working.
... AND X.date < Y.period_begin and X.date > Y.period_end
Is obviously impossible (the date cannot be before the start date and after the end date...)
You probably want to write:
... AND (X.date < Y.period_begin or X.date > Y.period_end)
Please wrap the "OR" expression in parentheses. I think this is important regarding operator precedence, and it improves readability.
EDIT: As suggested by @BigToach in a comment, you could use NOT BETWEEN ... AND ... (if the AND word is not too confusing in that context):
... AND (X.date NOT BETWEEN Y.period_begin AND Y.period_end)
Concerning the main problem which is "select the services not in any payment range" you could use a LEFT JOIN to keep only the row that does not have a corresponding payment range:
SELECT Services.* FROM Services
LEFT JOIN
(SELECT serviceID,period_begin FROM Services
JOIN Payments
USING (employeeID)
WHERE employeeID = @EID
AND the_date BETWEEN Payments.period_begin AND Payments.period_end
) AS X
USING (serviceID)
WHERE employeeID = @EID
AND period_begin is NULL;
Or, use a subquery -- somewhat more readable, but usually less efficient:
SELECT Services.* FROM Services
WHERE employeeID = @EID
AND serviceID NOT IN (SELECT serviceID FROM Services JOIN Payments
USING (employeeID)
WHERE employeeID = @EID
AND the_date BETWEEN Payments.period_begin AND Payments.period_end);
See http://sqlfiddle.com/#!2/a3557/11
As indicated by BigToach and Sylvain Leroux, there was clearly a logic typo confusing an AND with an OR. However, once fixed, that query still doesn't give the right answer. I managed to get the right solution by first finding all serviceIDs that are contained in some interval, and then excluding those from the list of all serviceIDs:
select distinct serviceID from Services as X left join Payments as Y on
(X.employeeID=Y.employeeID) where X.employeeID='10000'
and X.serviceID not in (
select serviceID from Services as Z join Payments as W on
(Z.employeeID=W.employeeID and
(Z.date between W.period_begin and W.period_end))
where Z.employeeID='10000'
);
This query, however, is not very pretty, as I'm really doing two queries, but it's basically the same thing as Sylvain Leroux's first answer, perhaps a bit more human-readable. Maybe there is yet another way we all haven't seen yet. Sylvain's subquery is indeed very nice.
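One further option, not taken from any of the answers above, is a correlated NOT EXISTS. The sketch below uses Python's built-in sqlite3 module and, to keep it short, loads only the first and third pay periods from the question along with all five services.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Payments (employeeID INTEGER, period_begin TEXT, period_end TEXT);
CREATE TABLE Services (serviceID INTEGER, date TEXT, employeeID INTEGER);
INSERT INTO Payments VALUES
  (10000, '2013-05-01', '2013-05-16'),
  (10000, '2013-07-01', '2013-07-16');
INSERT INTO Services VALUES
  (2001, '2013-01-01', 10000),
  (2002, '2013-05-15', 10000),
  (2003, '2013-06-01', 10000),
  (2004, '2013-07-10', 10000),
  (2005, '2013-08-01', 10000);
""")

# Keep a service only if no pay period of the same employee contains its date.
unpaid = [
    row[0]
    for row in conn.execute(
        """
        SELECT s.serviceID
        FROM Services s
        WHERE s.employeeID = ?
          AND NOT EXISTS (SELECT 1
                          FROM Payments p
                          WHERE p.employeeID = s.employeeID
                            AND s.date BETWEEN p.period_begin AND p.period_end)
        ORDER BY s.serviceID
        """,
        (10000,),
    )
]

print(unpaid)
```

This expresses "not in any interval" directly, whereas joining on "outside this interval" matches a service once for every period that does not contain it.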