Daily consumption delta based on purchase dates - mysql

I need to make a (Tableau) daily graph depicting consumption dynamics against previous day grouped by those clients who increased consumption, decreased consumption, and net change overall. Sample is below.
Calculation logic for sample: for every day for every client calculate difference vs previous day for that client, sum those above 0, sum those below 0, sum total.
The sample was made manually from a relatively small data set.
The real table has over 2 mil rows, and is not very consistent in that clients start buying at different days, may skip various periods buying nothing.
Initial table structure is like that:
client_id date sales
1 2018-09-01 4
1 2018-09-02 5
1 2018-09-04 3
2 2018-09-01 2
2 2018-09-02 2
While calculating table difference per date is simple, calculating pure growth and pure churn is hard, because the date row is not continuous for all clients.
I thought of adding the delta_to_previous column to each row when loading the initial dataset from the data storage, like:
WITH orders AS (
SELECT client_id,
date,
SUM(sales) as sales
FROM dwh_orders
GROUP BY client_id, date
)
SELECT
client_id,
date,
sales,
LAG(sales, 1) OVER (
PARTITION BY client_id
ORDER BY date
) as prev_date_order_value,
sales - LAG(sales, 1) OVER (
PARTITION BY client_id
ORDER BY date
) as prev_date_order_delta
FROM
orders;
Then for each date I can just show sum of positive values, negative values, total.
Problem, this approach will show consumption change at the next date of purchase, and if client buys 5 items on March 1 and then 5 on May 1, there will be no change for him at all. What it should do is show -5 for March 2 and +5 for May 1.
I am a bit puzzled at the optimal approach to this. The general solution could also use some review probably.
If someone dealt with a similar problem, I could really use your advice.
If you are experienced with sql, I could use your advice on how to convert the initial dataset (see sample above) into something like
client_id date sales delta
1 2018-09-01 4 0
1 2018-09-02 5 1
1 2018-09-03 0 -5
1 2018-09-04 3 3
2 2018-09-01 2 0
2 2018-09-02 2 0
If you know a bit about Tableau, I could use help on building graphs like this using its tools.

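-- Written in SQL Server syntax. On MySQL 8.0+ use WITH RECURSIVE, replace
-- dateadd(day, 1, dte) with dte + interval 1 day, and drop the OPTION clause
-- (MySQL limits recursion through cte_max_recursion_depth instead).
with cdates as (
-- one row per client per calendar day between first and last purchase
select client_id, min(date) as dte, max(date) as maxd
from dwh_orders
group by client_id
union all
select client_id, dateadd(day, 1, dte), maxd
from cdates
where dte < maxd
),
cd as (
select client_id, date, sum(sales) as sales
from dwh_orders
group by client_id, date
)
select cdates.client_id, cdates.dte as date,
coalesce(cd.sales, 0) as sales,
(coalesce(cd.sales, 0) -
lag(coalesce(cd.sales, 0)) over (partition by cdates.client_id order by cdates.dte)
) as delta
from cdates left join
cd
on cdates.client_id = cd.client_id and
cdates.dte = cd.date
option (maxrecursion 0);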
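The gap-filling idea can be tried end to end on the question's sample rows. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, so the date arithmetic (`date(dte, '+1 day')`) is SQLite's spelling, and the `COALESCE(..., 0)` around the delta is an added assumption to get 0 on each client's first day, as in the desired output. The per-day growth/churn totals are then summed in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dwh_orders (client_id INTEGER, date TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO dwh_orders VALUES (?, ?, ?)",
    [(1, "2018-09-01", 4), (1, "2018-09-02", 5), (1, "2018-09-04", 3),
     (2, "2018-09-01", 2), (2, "2018-09-02", 2)],
)

GAP_FILL_SQL = """
WITH RECURSIVE cdates(client_id, dte, maxd) AS (
    -- anchor: first and last purchase day per client
    SELECT client_id, MIN(date), MAX(date) FROM dwh_orders GROUP BY client_id
    UNION ALL
    -- step: one more calendar day until the last purchase day is reached
    SELECT client_id, date(dte, '+1 day'), maxd FROM cdates WHERE dte < maxd
),
cd AS (
    SELECT client_id, date, SUM(sales) AS sales
    FROM dwh_orders
    GROUP BY client_id, date
)
SELECT cdates.client_id,
       cdates.dte,
       COALESCE(cd.sales, 0) AS sales,
       -- days without purchases count as 0; the first day has no predecessor
       COALESCE(
           COALESCE(cd.sales, 0)
             - LAG(COALESCE(cd.sales, 0)) OVER (
                   PARTITION BY cdates.client_id ORDER BY cdates.dte),
           0) AS delta
FROM cdates
LEFT JOIN cd ON cd.client_id = cdates.client_id AND cd.date = cdates.dte
ORDER BY cdates.client_id, cdates.dte
"""
rows = conn.execute(GAP_FILL_SQL).fetchall()

# Per day: sum of positive deltas (growth) and of negative deltas (churn).
daily = {}
for _client, day, _sales, delta in rows:
    growth, churn = daily.get(day, (0, 0))
    daily[day] = (growth + max(delta, 0), churn + min(delta, 0))

for day in sorted(daily):
    growth, churn = daily[day]
    print(day, growth, churn, growth + churn)
```

Note that the series for each client stops at that client's last purchase date, so a drop to zero after the final purchase is not shown unless the upper bound is extended.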

Related

Adding multiple columns' rows together

I'm trying to arrange data in a table. The table has the following columns:
Customer Name, Amount, Day. The customer names are not distinct, the amount is an amount represented by dollars and the Day is over the course of 365 days.
I'm trying to arrange the amount paid per quarter, regardless of the customer name.
This is a homework assignment and I've tried this code
SELECT day as 'Quarter', SUM(amount) as 'Total Earnings'
FROM invoices
WHERE day BETWEEN 0 and 90
GROUP BY day
I'm running into 3 problems. I did the above code just to test that it would work for one quarter before i tried to tackle the whole year.
The first problem is that I need the day 'value' to be 'First' and I'm not sure how to do that at all.
Secondly, it is totaling the amounts, but not 0-90, it's totaling 1, 2, 3... 89, 90. Rather than a single row with the total 'amounts' for days 0-90.
Lastly, I'm not sure how to do another sum for the other quarters (91-180, 181-270, 271-365). I'm assuming possibly subqueries, but I'm not sure how to do that while using WHERE/BETWEEN.
My output should be something like:
Quarter | Total Earnings
-------------------------
First | 111111111
Second | 111111111
Third | 111111111
Fourth | 111111111
SELECT 'first' AS quarter, SUM(amount) AS total_earnings
FROM invoices where day between 0 AND 90
UNION ALL
SELECT 'second' AS quarter, SUM(amount) AS total_earnings
FROM invoices where day between 91 AND 180
UNION ALL
SELECT 'third' AS quarter, SUM(amount) AS total_earnings
FROM invoices where day between 181 AND 270
UNION ALL
SELECT 'fourth' AS quarter, SUM(amount) AS total_earnings
FROM invoices where day >= 271
This will get you the expected results. The GROUP BY you were using groups by day rather than by quarter.
You could use a CASE to find what quarter a day is in and then group by that. Something like this:
SELECT `quarter` AS 'Quarter',
SUM(amount) AS 'Total Earnings'
FROM (
SELECT CASE WHEN t.`day` < (365/4)
THEN 'First'
WHEN t.`day` < (365/4)*2
THEN 'Second'
WHEN t.`day` < (365/4)*3
THEN 'Third'
ELSE 'Fourth'
END AS `quarter`,
t.*
FROM invoices t
) t2
GROUP BY `quarter`;
You could of course replace the 365/ whatever with just a number of days, or set a variable for the number of days in a year, like SET @days_in_year = 365;. I'm just manually calculating to give a quick explanation of what the number is. Note that 365/4 is 91.25, so these cut-offs differ slightly from the 0-90, 91-180, 181-270 ranges in the question.
With a CASE statement you can evaluate the Quarter and then you can group by Quarter:
SELECT
case
when day BETWEEN 0 and 90 then 'First'
when day BETWEEN 91 and 180 then 'Second'
when day BETWEEN 181 and 270 then 'Third'
else 'Fourth'
end Quarter,
SUM(amount) as `Total Earnings`
FROM invoices
GROUP BY Quarter
Change the day ranges as you like.
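One detail the CASE answers leave open is row order: grouping by a text label gives no guarantee that 'First' comes before 'Fourth'. A small sketch, run through Python's sqlite3 module rather than MySQL and using made-up invoice rows, shows one way to order the quarters by the earliest day they contain (`ORDER BY MIN(day)`). The sample data and the `customer_name` column name are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (customer_name TEXT, amount INTEGER, day INTEGER)")
# Hypothetical rows, not taken from the question.
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [("Ann", 100, 5), ("Bob", 50, 90), ("Ann", 20, 91),
     ("Cy", 30, 200), ("Bob", 40, 300), ("Cy", 10, 365)],
)

QUARTER_SQL = """
SELECT CASE
           WHEN day BETWEEN 0 AND 90 THEN 'First'
           WHEN day BETWEEN 91 AND 180 THEN 'Second'
           WHEN day BETWEEN 181 AND 270 THEN 'Third'
           ELSE 'Fourth'
       END AS quarter,
       SUM(amount) AS total_earnings
FROM invoices
GROUP BY quarter
ORDER BY MIN(day)   -- chronological, not alphabetical
"""
quarter_totals = conn.execute(QUARTER_SQL).fetchall()
for quarter, total in quarter_totals:
    print(quarter, total)
```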

Getting daily counts for events that don't happen every day

I have a customer table in which a new row is inserted when a customer signup occurs.
Problem
I want to know the total number of signup per day for a given date range.
For example, find the total number of signup each day from 2015-07-01 to 2015-07-10
customer table
sample data [relevant columns shown]
customerid username created
1 mrbean 2015-06-01
2 tom 2015-07-01
3 jerry 2015-07-01
4 bond 2015-07-02
5 superman 2015-07-10
6 tintin 2015-08-01
7 batman 2015-08-01
8 joker 2015-08-01
Required Output
created signup
2015-07-01 2
2015-07-02 1
2015-07-03 0
2015-07-04 0
2015-07-05 0
2015-07-06 0
2015-07-07 0
2015-07-08 0
2015-07-09 0
2015-07-10 1
Query used
SELECT
DATE(created) AS created, COUNT(1) AS signup
FROM
customer
WHERE
DATE(created) BETWEEN '2015-07-01' AND '2015-07-10'
GROUP BY DATE(created)
ORDER BY DATE(created)
I am getting the following output:
created signup
2015-07-01 2
2015-07-02 1
2015-07-10 1
What modification should I make in the query to get the required output?
You're looking for a way to get all the days listed, even those days that aren't represented in your customer table. This is a notorious pain in the neck in SQL. That's because in its pure form SQL lacks the concept of a contiguous sequence of anything ... cardinal numbers, days, whatever.
So, you need to introduce a table containing a source of contiguous cardinal numbers, or dates, or something, and then LEFT JOIN your existing data to that table.
There are a few ways of doing that. One is to create yourself a calendar table with a row for every day in the present decade or century or whatever, then join to it. (That table won't be very big compared to the capability of a modern database.)
Let's say you have that table, and it has a column named date. Then you'd do this.
SELECT calendar.date AS created,
IFNULL(a.customer_count, 0) AS customer_count
FROM calendar
LEFT JOIN (
SELECT COUNT(*) AS customer_count,
DATE(created) AS created
FROM customer
GROUP BY DATE(created)
) a ON calendar.date = a.created
WHERE calendar.date BETWEEN start AND finish
ORDER BY calendar.date
Notice a couple of things. First, the LEFT JOIN from the calendar table to your data set. If you use an ordinary JOIN the missing data in your data set will suppress the rows from the calendar.
Second, the IFNULL in the top-level SELECT to turn the missing, null, values from your dataset into zero values. (In MySQL, ISNULL takes a single argument and only tests for NULL; the two-argument form is SQL Server's.)
Now, you ask, where can I get that calendar table? I respectfully suggest you look that up, and ask another question if you can't figure it out.
I wrote a little essay on this, which you can find here: http://www.plumislandmedia.net/mysql/filling-missing-data-sequences-cardinal-integers/
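If a permanent calendar table is not available, the date sequence can also be generated on the fly. The sketch below is one possible way, not the essay's method: it builds the days with a recursive CTE and LEFT JOINs the signups to it. It runs on Python's sqlite3 module as a stand-in, so `date(day, '+1 day')` is SQLite's spelling; on MySQL 8.0+ the same shape would use `day + INTERVAL 1 DAY`. The CTE name `calendar` and its column `day` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customerid INTEGER, username TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "mrbean", "2015-06-01"), (2, "tom", "2015-07-01"),
     (3, "jerry", "2015-07-01"), (4, "bond", "2015-07-02"),
     (5, "superman", "2015-07-10"), (6, "tintin", "2015-08-01"),
     (7, "batman", "2015-08-01"), (8, "joker", "2015-08-01")],
)

CALENDAR_SQL = """
WITH RECURSIVE calendar(day) AS (
    SELECT :start
    UNION ALL
    SELECT date(day, '+1 day') FROM calendar WHERE day < :finish
)
SELECT calendar.day AS created,
       COUNT(customer.customerid) AS signup   -- counts 0 when no row matched
FROM calendar
LEFT JOIN customer ON date(customer.created) = calendar.day
GROUP BY calendar.day
ORDER BY calendar.day
"""
signups = conn.execute(
    CALENDAR_SQL, {"start": "2015-07-01", "finish": "2015-07-10"}
).fetchall()
for created, signup in signups:
    print(created, signup)
```

Counting `customer.customerid` instead of `*` is what turns unmatched calendar days into 0, because COUNT skips the NULLs produced by the LEFT JOIN.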
Create a table with a calendar and join it in your query.
DECLARE @MinDate DATE = '2015-07-01',
@MaxDate DATE = '2015-07-10';
Create Table tblTempDates
(created date, signup int)
insert into tblTempDates
SELECT TOP (DATEDIFF(DAY, @MinDate, @MaxDate) + 1)
Date = DATEADD(DAY, ROW_NUMBER() OVER(ORDER BY a.object_id) - 1, @MinDate), 0 As Signup
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;
Create Table tblTempQueryDates
(created date, signup int)
INSERT INTO tblTempQueryDates
SELECT
created AS created, COUNT(*) AS signup
FROM
customer
WHERE
created BETWEEN @MinDate AND @MaxDate
GROUP BY created
UPDATE tblTempDates
SET tblTempDates.signup = tblTempQueryDates.signup
FROM tblTempDates INNER JOIN
tblTempQueryDates ON tblTempDates.created = tblTempQueryDates.created
select * from tblTempDates
order by created
Drop Table tblTempDates
Drop Table tblTempQueryDates
Not pretty, but it gives you what you want.

SQL querying - counting from two tables by weekday

I have the following two (MySQL) tables called "Jobs" and "Employees_Jobs":
Jobs:
job_id job_creation_date
1 2016-01-01
2 2016-01-02
Employees_Jobs (job applications):
EJ_job_id EJ_creation_date
1 2016-01-02
2 2016-01-02
2 2016-01-03
I want MySQL returning the number of jobs created, and the number of job applications created per day of the week; taking the above data it should return:
weekday num_of_jobs_entered num_of_applications_entered
Friday 1 0
Saturday 1 2
Sunday 0 1
I now have the following query:
SELECT
DAYNAME(job_creation_date) as weekday,
(SELECT COUNT(*) FROM Jobs) as num_of_jobs_entered,
(SELECT COUNT(*) FROM Employees_Jobs) as num_of_applications_entered
FROM
dual
GROUP BY
weekday
ORDER BY
weekday;
What am I doing wrong?
Thanks!
Try this:
SELECT wd, COUNT(jcnt), COUNT(ecnt) FROM
(SELECT DAYNAME(job_creation_date) wd, 1 jcnt, null ecnt FROM Jobs UNION ALL
SELECT DAYNAME(EJ_creation_date), null, 1 FROM Employees_Jobs ) a
GROUP BY wd
See here for a working example.
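The UNION ALL approach can be checked against the question's sample rows with a short script. This is a sketch using Python's sqlite3 module in place of MySQL; SQLite has no DAYNAME(), so the weekday number from `strftime('%w', ...)` (0 = Sunday) is mapped to a name in Python. The point it illustrates is that COUNT() skips NULLs, so each source table only contributes to its own column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Jobs (job_id INTEGER, job_creation_date TEXT)")
conn.execute("CREATE TABLE Employees_Jobs (EJ_job_id INTEGER, EJ_creation_date TEXT)")
conn.executemany("INSERT INTO Jobs VALUES (?, ?)",
                 [(1, "2016-01-01"), (2, "2016-01-02")])
conn.executemany("INSERT INTO Employees_Jobs VALUES (?, ?)",
                 [(1, "2016-01-02"), (2, "2016-01-02"), (2, "2016-01-03")])

WEEKDAY_SQL = """
SELECT wd,
       COUNT(jcnt) AS num_of_jobs_entered,          -- NULLs are not counted
       COUNT(ecnt) AS num_of_applications_entered
FROM (
    SELECT strftime('%w', job_creation_date) AS wd, 1 AS jcnt, NULL AS ecnt
    FROM Jobs
    UNION ALL
    SELECT strftime('%w', EJ_creation_date), NULL, 1
    FROM Employees_Jobs
) AS a
GROUP BY wd
ORDER BY wd
"""
NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
         "Thursday", "Friday", "Saturday"]
weekday_counts = [
    (NAMES[int(wd)], jobs, apps)
    for wd, jobs, apps in conn.execute(WEEKDAY_SQL)
]
for row in weekday_counts:
    print(*row)
```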
EDIT
If I understand you correctly you want to divide the above calculated counts of job offers and requirements by the current week number of the current year (which will make sense of course only if all considered job counts have also occurred in the current year -> some filtering might become necessary).
However, without the filtering you could do
SELECT week_day, Week(CURDATE()) weekCurrdate,
COUNT(num_of_jobs_entered)/Week(CURDATE()) avg_of_jobs_entered,
COUNT(num_of_applications_entered)/Week(CURDATE()) avg_num_of_applications_entered
FROM (
SELECT DAYNAME(job_creation_date) week_day, 1 num_of_jobs_entered,
null num_of_applications_entered FROM Jobs UNION ALL
SELECT DAYNAME(EJ_creation_date), null, 1 FROM Employees_Jobs ) A
GROUP BY week_day;
IFNULL() is not needed here, because COUNT() in the outer select already skips NULL values. Since the current week number is 1 at the moment, the result from this query will (presently!) be identical to that of the previous query, see modified fiddle here.

How to aggregate (sum) values over month in jasperreports (each month should be the sum of all month before)

I want to do a report with jasperreports that aggregates our contracts over the month but adding all new and old contracts to the month. The database is a mysql database.
My SELECT would look like this with example data below:
SELECT month(contract_date), amount
FROM contracts
WHERE year(contract_date)=2013
GROUP BY month(contract_date)
1.1.2013 300
1.1.2013 500
1.2.2013 250
1.3.2013 250
Now i get:
1 800
2 250
3 250
...
But i would like to have:
1 800
2 1050
3 1300
...
So each month should contain its own amount plus the amounts of all months before it.
I don't mind whether I do this in SQL or with jasperreports/iReport, so any solution is welcome.
Is there any way i can do this?
MySQL before 8.0 doesn't have CTEs, which is inconvenient, but a view will do in a pinch.
create view MonthlyTotals as
select Month( ContractDate ) as ContractMonth, Sum( ContractQty ) as TotalQty
from contracts
group by ContractMonth;
Now we can join the view with itself, maintaining a running total of the month and all previous months:
select t1.ContractMonth, t1.TotalQty, Sum( t2.TotalQty ) as RunningTotal
from MonthlyTotals t1
join MonthlyTotals t2
on t2.ContractMonth <= t1.ContractMonth
group by t1.ContractMonth, t1.TotalQty;
The output matches your desired output, as seen at SQL Fiddle.
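On engines with window functions (MySQL 8.0 and later among them), the self-join can be replaced by a running SUM() OVER. The following sketch is an alternative to the view, not the answer's method. It runs on Python's sqlite3 module with the question's four sample contracts, so the month and year are extracted with SQLite's `strftime`; on MySQL that part would be `MONTH(contract_date)` and `YEAR(contract_date)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (contract_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?)",
    [("2013-01-01", 300), ("2013-01-01", 500),
     ("2013-02-01", 250), ("2013-03-01", 250)],
)

RUNNING_TOTAL_SQL = """
SELECT contract_month,
       month_total,
       -- default window frame: everything up to and including this month
       SUM(month_total) OVER (ORDER BY contract_month) AS running_total
FROM (
    SELECT CAST(strftime('%m', contract_date) AS INTEGER) AS contract_month,
           SUM(amount) AS month_total
    FROM contracts
    WHERE strftime('%Y', contract_date) = '2013'
    GROUP BY contract_month
) AS monthly
ORDER BY contract_month
"""
running = conn.execute(RUNNING_TOTAL_SQL).fetchall()
for month, month_total, running_total in running:
    print(month, month_total, running_total)
```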

Calculate salary of tutor based on distinct sittings using mysql

I have the following table denoting a tutor teaching pupils in small groups. Each pupil has an entry in the database. A pupil may be alone or in a group. I wish to calculate the tutor's "salary" as follows: payment is based on time spent, which means that each sitting (with one or more pupils) is counted only once - distinct sittings! The start and end times are unix times.
start end attendance
1359882000 1359882090 1
1359867600 1359867690 0
1359867600 1359867690 1
1359867600 1359867690 0
1360472400 1360477800 1
1360472400 1360477800 1
1359867600 1359867690 1
1359914400 1359919800 1
1360000800 1360006200 1
1360000800 1360006200 0
1360000800 1360006200 1
This is what I tried: with no success - I can't get the right duration (number of hours for all distinct sittings)
SELECT YEAR(FROM_UNIXTIME(start)) AS year,
MONTHNAME(STR_TO_DATE(MONTH(FROM_UNIXTIME(start)), '%m')) AS month,
COUNT(DISTINCT start) AS sittings,
SUM(TRUNCATE((end-start)/3600, 1)) as duration
FROM schedules
GROUP BY
YEAR(FROM_UNIXTIME(start)),
MONTH(FROM_UNIXTIME(start))
Thanks for your proposals / support!
EDIT: Required results
Rate = 25
Year Month Sittings Duration Bounty
2013 February 2 2.2 2.2*25
2013 April 4 12.0 12.0*25
You could probably do something with subqueries, I've had a play with SQL fiddle, how does this look for you. Link to sql fiddle : http://sqlfiddle.com/#!2/50718c/3
SELECT
YEAR(d.date) AS year,
MONTH(d.date) AS month,
COUNT(*) AS sittings,
SUM(d.duration) AS duration_secs
FROM (
SELECT
DATE(FROM_UNIXTIME(s.start)) AS date,
s.attendance,
end-start AS duration
FROM schedules s
) d
GROUP BY
year,
month
I couldn't really see where attendance comes into this at present, you didn't specify. The inner query is responsible for taking the schedules, extracting a start date, and a duration (in seconds).
The outer query then uses these derived values but groups them up to get the sums. You could elaborate from here i.e. maybe you only want to select where attendance > 0, or maybe you want to multiply by attendance.
In this next example I have done this, calculating the duration in hours instead, and calculating the applicable duration for rows where attendance > 0, along with the appropriate bounty, assuming bounty == hours * rate : http://sqlfiddle.com/#!2/50718c/21
SELECT
YEAR(d.date) AS year,
MONTH(d.date) AS month,
COUNT(*) AS sittings,
SUM(d.duration) AS duration,
SUM(
IF(d.attendance>0,1,0)
) AS sittingsWorthBounty,
SUM(
IF(d.attendance>0,d.duration,0)
) AS durationForBounty,
SUM(
IF(d.attendance>0,d.bounty,0)
) AS bounty
FROM (
SELECT
DATE(FROM_UNIXTIME(s.start)) AS date,
s.attendance,
(end-start)/3600 AS duration,
(end-start)/3600 * @rate AS bounty
FROM schedules s,
(SELECT @rate := 25) v
) d
GROUP BY
year,
month
The key point here, is that in the subquery you do all the calculation per-row. The main query then is responsible for grouping up the results and getting your totals. The IF statements in the outer query could easily be moved into the subquery instead, for example. I just included them like this so you could see where the values came from.
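The queries above count one row per pupil, so a sitting with three pupils is counted three times. If each sitting should be paid once, one option is to de-duplicate on (start, end) in the subquery before aggregating. The sketch below does that on the question's sample rows using Python's sqlite3 module as a stand-in for MySQL. It assumes that every distinct sitting is paid regardless of the attendance flag, that months are taken in UTC, and that the rate is 25; none of these is confirmed by the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "start" and "end" are quoted because END is a keyword in SQLite.
conn.execute('CREATE TABLE schedules ("start" INTEGER, "end" INTEGER, attendance INTEGER)')
conn.executemany(
    "INSERT INTO schedules VALUES (?, ?, ?)",
    [(1359882000, 1359882090, 1), (1359867600, 1359867690, 0),
     (1359867600, 1359867690, 1), (1359867600, 1359867690, 0),
     (1360472400, 1360477800, 1), (1360472400, 1360477800, 1),
     (1359867600, 1359867690, 1), (1359914400, 1359919800, 1),
     (1360000800, 1360006200, 1), (1360000800, 1360006200, 0),
     (1360000800, 1360006200, 1)],
)

DISTINCT_SITTINGS_SQL = """
SELECT CAST(strftime('%Y', "start", 'unixepoch') AS INTEGER) AS year,
       CAST(strftime('%m', "start", 'unixepoch') AS INTEGER) AS month,
       COUNT(*) AS sittings,
       SUM("end" - "start") / 3600.0 AS hours
FROM (
    -- one row per sitting, however many pupils were booked into it
    SELECT DISTINCT "start", "end" FROM schedules
) AS s
GROUP BY year, month
"""
RATE = 25  # assumed hourly rate, taken from "Rate = 25" in the question

salary = [
    (year, month, sittings, hours, hours * RATE)
    for year, month, sittings, hours in conn.execute(DISTINCT_SITTINGS_SQL)
]
for row in salary:
    print(*row)
```

In MySQL the same shape would use `FROM_UNIXTIME(start)` with `YEAR()` and `MONTH()`, which follow the session time zone rather than UTC.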