Getting daily counts for events that don't happen every day - mysql

I have a customer table in which a new row is inserted when a customer signup occurs.
Problem
I want to know the total number of signups per day for a given date range.
For example, find the total number of signups on each day from 2015-07-01 to 2015-07-10.
customer table
sample data [relevant columns shown]
customerid username created
1 mrbean 2015-06-01
2 tom 2015-07-01
3 jerry 2015-07-01
4 bond 2015-07-02
5 superman 2015-07-10
6 tintin 2015-08-01
7 batman 2015-08-01
8 joker 2015-08-01
Required Output
created signup
2015-07-01 2
2015-07-02 1
2015-07-03 0
2015-07-04 0
2015-07-05 0
2015-07-06 0
2015-07-07 0
2015-07-08 0
2015-07-09 0
2015-07-10 1
Query used
SELECT
DATE(created) AS created, COUNT(1) AS signup
FROM
customer
WHERE
DATE(created) BETWEEN '2015-07-01' AND '2015-07-10'
GROUP BY DATE(created)
ORDER BY DATE(created)
I am getting the following output:
created signup
2015-07-01 2
2015-07-02 1
2015-07-10 1
What modification should I make in the query to get the required output?

You're looking for a way to get all the days listed, even those days that aren't represented in your customer table. This is a notorious pain in the neck in SQL. That's because in its pure form SQL lacks the concept of a contiguous sequence of anything ... cardinal numbers, days, whatever.
So, you need to introduce a table containing a source of contiguous cardinal numbers, or dates, or something, and then LEFT JOIN your existing data to that table.
There are a few ways of doing that. One is to create yourself a calendar table with a row for every day in the present decade or century or whatever, then join to it. (That table won't be very big compared to the capability of a modern database.)
Let's say you have that table, and it has a column named date. Then you'd do this.
SELECT calendar.date AS created,
IFNULL(a.customer_count, 0) AS customer_count
FROM calendar
LEFT JOIN (
SELECT COUNT(*) AS customer_count,
DATE(created) AS created
FROM customer
GROUP BY DATE(created)
) a ON calendar.date = a.created
WHERE calendar.date BETWEEN '2015-07-01' AND '2015-07-10'
ORDER BY calendar.date
Notice a couple of things. First, the LEFT JOIN from the calendar table to your data set. If you use an ordinary JOIN the missing data in your data set will suppress the rows from the calendar.
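Second, the IFNULL in the top-level SELECT turns the missing (NULL) values from your data set into zeros. (In MySQL, ISNULL() takes a single argument and only tests for NULL, so IFNULL() or COALESCE() is the function you want here.)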
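To see the whole pattern end to end, here is a minimal sketch that runs the same query shape from Python against an in-memory SQLite database, using the sample rows from the question. SQLite is an assumption made so the example runs without a MySQL server; it uses COALESCE, which both engines accept in place of IFNULL. The ten-row `calendar` table is filled just for this example.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL customer table (sample rows from the question).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customerid INTEGER, username TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "mrbean", "2015-06-01"), (2, "tom", "2015-07-01"), (3, "jerry", "2015-07-01"),
     (4, "bond", "2015-07-02"), (5, "superman", "2015-07-10"), (6, "tintin", "2015-08-01"),
     (7, "batman", "2015-08-01"), (8, "joker", "2015-08-01")],
)

# Example calendar table: one row per day from 2015-07-01 to 2015-07-10.
conn.execute("CREATE TABLE calendar (date TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO calendar VALUES (date('2015-07-01', ?))",
    [(f"+{i} days",) for i in range(10)],
)

# LEFT JOIN from the calendar so days without signups survive; COALESCE turns NULL into 0.
rows = conn.execute("""
    SELECT calendar.date AS created,
           COALESCE(a.customer_count, 0) AS customer_count
    FROM calendar
    LEFT JOIN (
        SELECT COUNT(*) AS customer_count, date(created) AS created
        FROM customer
        GROUP BY date(created)
    ) a ON calendar.date = a.created
    WHERE calendar.date BETWEEN '2015-07-01' AND '2015-07-10'
    ORDER BY calendar.date
""").fetchall()

for created, signup in rows:
    print(created, signup)
```

Swapping the LEFT JOIN for a plain JOIN in this sketch drops the zero rows again, which is exactly the behaviour the question complains about.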
Now, you ask, where can I get that calendar table? I respectfully suggest you look that up, and ask another question if you can't figure it out.
I wrote a little essay on this, which you can find here: http://www.plumislandmedia.net/mysql/filling-missing-data-sequences-cardinal-integers/
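One way to fill such a calendar table is a recursive common table expression. The sketch below is a hedged illustration using SQLite from Python; the table name `calendar` and the 2015 range are assumptions. MySQL 8.0 and later also support WITH RECURSIVE, but there you would write DATE_ADD(d, INTERVAL 1 DAY) instead of date(d, '+1 day'), and put the WITH clause after INSERT INTO ... rather than before it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar (date TEXT PRIMARY KEY)")

# Recursive CTE: start at the first day and keep adding one day until the last day.
conn.execute("""
    WITH RECURSIVE days(d) AS (
        SELECT '2015-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM days WHERE d < '2015-12-31'
    )
    INSERT INTO calendar (date)
    SELECT d FROM days
""")

count, first, last = conn.execute(
    "SELECT COUNT(*), MIN(date), MAX(date) FROM calendar"
).fetchone()
print(count, first, last)
```

In MySQL, the server variable cte_max_recursion_depth (1000 by default) caps how many rows one recursive CTE may produce, so a multi-year calendar is easiest to build in yearly chunks or after raising that limit.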

Look here
Create a calendar table and join it in your query.

DECLARE @MinDate DATE = '2015-07-01',
@MaxDate DATE = '2015-07-10';
Create Table tblTempDates
(created date, signup int)
insert into tblTempDates
SELECT TOP (DATEDIFF(DAY, @MinDate, @MaxDate) + 1)
Date = DATEADD(DAY, ROW_NUMBER() OVER(ORDER BY a.object_id) - 1, @MinDate), 0 As Signup
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;
Create Table tblTempQueryDates
(created date, signup int)
INSERT INTO tblTempQueryDates
SELECT
CAST(created AS date) AS created, COUNT(*) AS signup
FROM
customer
WHERE
CAST(created AS date) BETWEEN @MinDate AND @MaxDate
GROUP BY CAST(created AS date)
UPDATE tblTempDates
SET tblTempDates.signup = tblTempQueryDates.signup
FROM tblTempDates INNER JOIN
tblTempQueryDates ON tblTempDates.created = tblTempQueryDates.created
select * from tblTempDates
order by created
Drop Table tblTempDates
Drop Table tblTempQueryDates
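Not pretty, but it gives you what you want. Note that this is SQL Server (T-SQL) syntax (DECLARE with @ variables, TOP, sys.all_objects), so it will not run on MySQL as-is.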
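The pre-fill-then-overwrite idea itself is portable. Here is a minimal sketch in SQLite via Python, assuming the question's sample rows: every day is first inserted with a signup count of 0, then the days that have signups are overwritten. A correlated subquery replaces T-SQL's UPDATE ... FROM ... JOIN (a deliberate swap, since SQLite only gained UPDATE ... FROM in version 3.33).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customerid INTEGER, username TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "mrbean", "2015-06-01"), (2, "tom", "2015-07-01"), (3, "jerry", "2015-07-01"),
     (4, "bond", "2015-07-02"), (5, "superman", "2015-07-10"), (6, "tintin", "2015-08-01"),
     (7, "batman", "2015-08-01"), (8, "joker", "2015-08-01")],
)

min_date, max_date = "2015-07-01", "2015-07-10"

# Step 1: one row per day in the range, each signup count pre-set to 0.
conn.execute("CREATE TEMP TABLE tblTempDates (created TEXT PRIMARY KEY, signup INTEGER)")
conn.execute("""
    WITH RECURSIVE days(d) AS (
        SELECT ? UNION ALL SELECT date(d, '+1 day') FROM days WHERE d < ?
    )
    INSERT INTO tblTempDates SELECT d, 0 FROM days
""", (min_date, max_date))

# Step 2: overwrite the zeros only for days that actually have signups.
conn.execute("""
    UPDATE tblTempDates
    SET signup = (SELECT COUNT(*) FROM customer c
                  WHERE date(c.created) = tblTempDates.created)
    WHERE created IN (SELECT date(created) FROM customer)
""")

rows = conn.execute("SELECT created, signup FROM tblTempDates ORDER BY created").fetchall()
```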

Related

Daily consumption delta based on purchase dates

I need to make a daily (Tableau) graph showing consumption dynamics against the previous day, grouped into clients who increased consumption, clients who decreased consumption, and the net change overall. Sample is below.
Calculation logic for the sample: for every day and every client, calculate the difference vs. the previous day for that client; then sum the differences above 0, sum those below 0, and sum the total.
The sample was made manually from a relatively small data set.
The real table has over 2 million rows and is not very consistent: clients start buying on different days and may skip various periods without buying anything.
The initial table structure is like this:
client_id date sales
1 2018-09-01 4
1 2018-09-02 5
1 2018-09-04 3
2 2018-09-01 2
2 2018-09-02 2
While calculating the overall difference per date is simple, calculating pure growth and pure churn is hard, because the sequence of dates is not continuous for every client.
I thought of adding a delta-to-previous column to each row when loading the initial dataset from the data storage, like this:
WITH orders AS (
SELECT client_id,
date,
SUM(sales) as sales
FROM dwh_orders
GROUP BY client_id, date
)
SELECT
client_id,
date,
sales,
LAG(sales, 1) OVER (
PARTITION BY client_id
ORDER BY date
) as prev_date_order_value,
sales - LAG(sales, 1) OVER (
PARTITION BY client_id
ORDER BY date
) as prev_date_order_delta
FROM
orders;
Then for each date I can just show sum of positive values, negative values, total.
The problem: this approach shows the consumption change only at the client's next purchase date, so if a client buys 5 items on March 1 and then 5 on May 1, there is no change for him at all. What it should do is show -5 for March 2 and +5 for May 1.
I am a bit puzzled about the optimal approach to this. The general solution could probably also use some review.
If someone has dealt with a similar problem, I could really use your advice.
If you are experienced with SQL, I could use your advice on how to convert the initial dataset (see sample above) into something like this:
client_id date sales delta
1 2018-09-01 4 0
1 2018-09-02 5 1
1 2018-09-03 0 -5
1 2018-09-04 3 3
2 2018-09-01 2 0
2 2018-09-02 2 0
If you know a bit about Tableau, I could use help on building graphs like this using its tools.
with cdates as (
select client_id, min(date) as dte, max(date) as maxd
from dwh_orders
group by client_id
union all
select client_id, dateadd(day, 1, dte), maxd
from cdates
where dte < maxd
),
cd as (
select client_id, date, sum(sales) as sales
from dwh_orders
group by client_id, date
)
select cdates.client_id, cdates.dte as date,
coalesce(sales, 0) as sales,
(coalesce(sales, 0) -
lag(coalesce(sales, 0)) over (partition by cdates.client_id order by cdates.dte)
) as delta
from cdates left join
cd
on cdates.client_id = cd.client_id and
cdates.dte = cd.date
option (maxrecursion 0);
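Here is a minimal, hedged sketch of the same approach (one generated row per client per day between that client's first and last purchase, then LAG over the filled rows), run against the question's sample data in SQLite from Python. SQLite is an assumption for runnability; it needs version 3.25+ for window functions. One deliberate difference: the first day of each client gets a delta of 0 (via COALESCE on LAG) to match the desired output, whereas the query above returns NULL there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dwh_orders (client_id INTEGER, date TEXT, sales INTEGER)")
conn.executemany("INSERT INTO dwh_orders VALUES (?, ?, ?)", [
    (1, "2018-09-01", 4), (1, "2018-09-02", 5), (1, "2018-09-04", 3),
    (2, "2018-09-01", 2), (2, "2018-09-02", 2),
])

rows = conn.execute("""
    WITH RECURSIVE cdates(client_id, dte, maxd) AS (
        -- anchor: first and last purchase date per client
        SELECT client_id, MIN(date), MAX(date) FROM dwh_orders GROUP BY client_id
        UNION ALL
        -- recursive step: add one day until the client's last purchase date
        SELECT client_id, date(dte, '+1 day'), maxd FROM cdates WHERE dte < maxd
    ),
    cd AS (
        SELECT client_id, date, SUM(sales) AS sales
        FROM dwh_orders GROUP BY client_id, date
    ),
    filled AS (
        SELECT cdates.client_id, cdates.dte AS date, COALESCE(cd.sales, 0) AS sales
        FROM cdates
        LEFT JOIN cd ON cd.client_id = cdates.client_id AND cd.date = cdates.dte
    )
    SELECT client_id, date, sales,
           -- the first day per client has no previous row, so its delta is 0
           sales - COALESCE(LAG(sales) OVER (PARTITION BY client_id ORDER BY date), sales) AS delta
    FROM filled
    ORDER BY client_id, date
""").fetchall()
```

Because the gap day 2018-09-03 now exists with sales 0, the drop (-5) and the recovery (+3) land on the right days, which is what the question asks for.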

Combine found results and not found results

I have a table that logs page views, so I can see how many hits there were on each page of the website.
This is the query that shows me those counts:
select pageview_page, DATE(pageview_date) as pageview_date, count(*) as view_count
from pageviews
group by pageviews.pageview_page, DATE(pageviews.pageview_date)
order by pageviews.pageview_date desc
Resulting in the following:
Page Day view_count
index 2016-01-12 50
index 2016-01-11 10
index 2016-01-10 20
contact 2016-01-12 5
contact 2016-01-11 5
P.S.: I sort by date descending because the chart must start at the latest date.
Notice: in the above table, contact is not present on day 2016-01-10, meaning no one used that page on that day.
I want the query to show 0 when there is nothing on that date. How can I achieve that? The result should look like this:
Page Day view_count
index 2016-01-12 50
index 2016-01-11 10
index 2016-01-10 20
contact 2016-01-12 5
contact 2016-01-11 5
contact 2016-01-10 0 <-------- (I want this to appear, as it is missing in the table above, in the first table)
Let's take the following 3 dates as an example: 2016-01-10, 2016-01-11, 2016-01-12.
The point is to view the statistics by day. I use the following query to get the dates above:
select DATE(pageview_date) as pageview_date from pageviews GROUP by DATE(pageview_date)
I have tried a combination of IN and NOT IN with the query above, but I can't get it working.
I am not sure about the DAY function in MySQL, but looking at your first query, I think you can do something like this:
select t2.pageview_page AS pageview_page,
t1.pageview_date AS pageview_date,
(select count(*)
from pageviews t3
where t3.pageview_page = t2.pageview_page
and DATE(t3.pageview_date) = t1.pageview_date) as view_count
from (select distinct DATE(pageview_date) pageview_date FROM pageviews) t1,
(SELECT DISTINCT pageview_page FROM pageviews) t2
group by pageview_page, pageview_date
order by pageview_date desc
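A variant of the same cross-join idea replaces the correlated subquery with a LEFT JOIN and COUNT of a column from the joined side (which counts only matching rows, so a missing page/day pair yields 0). This is a sketch in SQLite from Python, assuming page views are stored one row per hit; the timestamps are made up for illustration, but the per-day counts reproduce the question's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (pageview_page TEXT, pageview_date TEXT)")

# Rebuild the question's counts as individual hits (the 12:00:00 time is a placeholder).
counts = [("index", "2016-01-12", 50), ("index", "2016-01-11", 10), ("index", "2016-01-10", 20),
          ("contact", "2016-01-12", 5), ("contact", "2016-01-11", 5)]
for page, day, n in counts:
    conn.executemany("INSERT INTO pageviews VALUES (?, ?)", [(page, day + " 12:00:00")] * n)

rows = conn.execute("""
    SELECT p.pageview_page, d.pageview_date, COUNT(v.pageview_page) AS view_count
    FROM (SELECT DISTINCT pageview_page FROM pageviews) p
    CROSS JOIN (SELECT DISTINCT date(pageview_date) AS pageview_date FROM pageviews) d
    -- every page/day pair exists now; the LEFT JOIN attaches the matching hits, if any
    LEFT JOIN pageviews v
      ON v.pageview_page = p.pageview_page AND date(v.pageview_date) = d.pageview_date
    GROUP BY p.pageview_page, d.pageview_date
    ORDER BY p.pageview_page DESC, d.pageview_date DESC
""").fetchall()
```

Note that the day list here comes from the data itself, so a day on which no page at all was viewed would still be missing; a calendar table, as suggested next, avoids that.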
Create a table of days (or integers) and do a LEFT JOIN from it to your query:
SELECT ...
FROM AllDays a
LEFT JOIN ( your query ) b ON b.date = a.date
WHERE a.date BETWEEN ...;
Even better would be to use a MariaDB "sequence table".

need to fetch all the students who have not paid for the current month

I have two tables.
Students table:
id studentname admissionno
3 test3 3
2 test2 2
1 test 1
The 2nd table is fee:
id studentid created
1 3 2015-06-06 22:55:34
2 2 2015-05-07 13:32:48
3 1 2015-06-07 17:47:46
I need to fetch the students who haven't paid for the current month.
I'm performing the following query:
SELECT studentname FROM students
WHERE studentname != (select students.studentname from students
JOIN submit_fee
ON (students.id=submit_fee.studentid)
WHERE MONTH(CURDATE()) = MONTH(submit_fee.created)) ;
and I'm getting error:
'#1242 - Subquery returns more than 1 row'
Can you tell me what the correct query is to fetch all the students who haven't paid for the current month?
Use NOT IN; try the query below:
SELECT s.*
FROM students s
WHERE s.id NOT IN ( SELECT sf.studentid FROM studentfees sf WHERE month(sf.created) = EXTRACT(month FROM (NOW())) AND year(sf.created) = EXTRACT(year FROM (NOW())) )
You want to use not exists or a left join for this:
select s.*
from students s
where not exists (select 1
from studentfees sf
where s.id = sf.studentid and
sf.created >= date_sub(curdate(), interval day(curdate()) - 1 day) and
sf.created < date_add(date_sub(curdate(), interval day(curdate()) - 1 day), interval 1 month)
)
Note the careful construction of the date arithmetic. All the functions are on curdate() rather than on created. This allows MySQL to use an index for the where clause, if one is appropriate. One error in your query is the use of MONTH() without using YEAR(). In general, the two would normally be used together, unless you really want to combine months from different years.
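The half-open month range can be checked with a small sketch in SQLite from Python, using the sample students and fee rows from the question. Two assumptions: "today" is pinned to 2015-06-15 so the result is reproducible (in MySQL this would be CURDATE()), and SQLite's date(x, 'start of month') stands in for the DATE_SUB arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, studentname TEXT, admissionno INTEGER)")
conn.execute("CREATE TABLE studentfees (id INTEGER PRIMARY KEY, studentid INTEGER, created TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)",
                 [(3, "test3", 3), (2, "test2", 2), (1, "test", 1)])
conn.executemany("INSERT INTO studentfees VALUES (?, ?, ?)", [
    (1, 3, "2015-06-06 22:55:34"), (2, 2, "2015-05-07 13:32:48"), (3, 1, "2015-06-07 17:47:46")])

today = "2015-06-15"  # assumed reference date instead of CURDATE()

rows = conn.execute("""
    SELECT s.studentname
    FROM students s
    WHERE NOT EXISTS (
        SELECT 1 FROM studentfees sf
        WHERE sf.studentid = s.id
          -- half-open range: [first day of this month, first day of next month)
          AND sf.created >= date(:today, 'start of month')
          AND sf.created <  date(:today, 'start of month', '+1 month')
    )
    ORDER BY s.id
""", {"today": today}).fetchall()
```

Only the student whose last payment was in May is returned. As in the answer, the functions are applied to the reference date, never to `created`, so an index on `created` stays usable.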
Also, note that paying or not paying for the current month may not really answer the question. What if a student paid for the current month but missed the previous month's payment?

MySQL right outer join query

I have a question about a query in MySQL.
I have 2 tables: one contains SalesRep details such as name and email; the other contains the sales data, with reportDate, customers served, and a foreign key linking to the sales rep. Note that reportDate is always a Friday.
So the requirement is this: I need to find sales data for a 13-week period for a given list of sales reps, with 0 as customers served if there is no data for a particular Friday. The query result is consumed by a Java application which relies on getting 13 rows of data per sales rep.
I have created a table populated with all the Friday dates and wrote an outer join like the one below:
select * from (
select name, customersServed, reportDate
from Sales_Data salesData
join `SALES_REPRESENTATIVE` salesRep on salesRep.`employeeId` = salesData.`employeeId`
where salesData.employeeId = 1
) as result
right outer join fridays on fridays.datefield = reportDate
where fridays.datefield between '2014-10-01' and '2014-12-31'
order by datefield
Now my doubts:
Is there any way I can get the name populated for all 13 rows in the above query?
If there are 2 sales reps, I'd like to use an IN clause and expect 26 rows in total, 13 rows per salesperson (even if there is no record for that person, I'd still like to see 13 rows of nulls), and 39 for 3 sales reps.
Can these be done in MySQL and, if so, can anyone point me in the right direction?
You must first select your rows (without customersServed) and then make an outer join to get customersServed,
something like this:
select records.name, records.datefield, IFNULL(salesData.customersServed, 0) AS customersServed
from (
select employeeId, name, datefield
from `SALES_REPRESENTATIVE`, fridays
where fridays.datefield between '2014-10-01' and '2014-12-31'
and employeeId in (...)
) as records
left outer join `Sales_Data` salesData on (salesData.employeeId = records.employeeId and salesData.reportDate = records.datefield)
order by records.name, records.datefield
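To confirm the row count this produces, here is a hedged sketch in SQLite from Python. The two reps ("Alice", "Bob"), their IDs, and the three Sales_Data rows are hypothetical sample data; the fridays table is generated with a recursive CTE starting from Friday 2014-10-03. The reps-by-Fridays cross join yields every (rep, Friday) pair before the LEFT JOIN attaches any reported numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALES_REPRESENTATIVE (employeeId INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE Sales_Data (employeeId INTEGER, reportDate TEXT, customersServed INTEGER)")
conn.execute("CREATE TABLE fridays (datefield TEXT PRIMARY KEY)")

# Hypothetical sample data: two reps, only a few reported Fridays.
conn.executemany("INSERT INTO SALES_REPRESENTATIVE VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO Sales_Data VALUES (?, ?, ?)",
                 [(1, "2014-10-03", 12), (1, "2014-10-17", 8), (2, "2014-12-26", 5)])

# Every Friday of Q4 2014, starting from Friday 2014-10-03.
conn.execute("""
    WITH RECURSIVE f(d) AS (
        SELECT '2014-10-03'
        UNION ALL
        SELECT date(d, '+7 days') FROM f WHERE date(d, '+7 days') <= '2014-12-31'
    )
    INSERT INTO fridays SELECT d FROM f
""")

rows = conn.execute("""
    SELECT records.name, records.datefield,
           COALESCE(salesData.customersServed, 0) AS customersServed
    FROM (
        SELECT r.employeeId, r.name, f.datefield
        FROM SALES_REPRESENTATIVE r CROSS JOIN fridays f
        WHERE f.datefield BETWEEN '2014-10-01' AND '2014-12-31'
          AND r.employeeId IN (1, 2)
    ) AS records
    LEFT JOIN Sales_Data salesData
      ON salesData.employeeId = records.employeeId
     AND salesData.reportDate = records.datefield
    ORDER BY records.name, records.datefield
""").fetchall()
```

With two IDs in the IN list this yields 26 rows (13 per rep), and the name is filled in on every row because it comes from the rep side of the cross join, not from Sales_Data.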
You'll need 2 levels of nesting. In your nested query, change to an outer join for salesrep so you have at least 1 record for each rep; then join with fridays without any condition so you have at least 13 records for each rep; then do a final right outer join with the condition (fridays.datefield = innerfriday.datefield and (reportDate is null or reportDate=innerfriday.datefield)).
This is very inefficient; except for very small data sets, try to do it in application code.

How to get values for every day in a month

Data:
values date
14 1.1.2010
20 1.1.2010
10 2.1.2010
7 4.1.2010
...
A sample query for January 2010 should return 31 rows, one for every day, with the values for each day added up. Right now I could do this with 31 queries, but I would like it to work with one. Is it possible?
results:
1. 34
2. 10
3. 0
4. 7
...
This is actually surprisingly difficult to do in SQL. One way to do it is to have a long SELECT statement with UNION ALLs to generate the numbers from 1 to 31. This demonstrates the principle, but I have left out the rows for 5 through 27 for clarity (fill them in before running it):
SELECT MonthDate.Date, COALESCE(SUM(`values`), 0) AS Total
FROM (
SELECT 1 AS Date UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
-- SELECT 5 UNION ALL through SELECT 27 UNION ALL go here
SELECT 28 UNION ALL
SELECT 29 UNION ALL
SELECT 30 UNION ALL
SELECT 31) AS MonthDate
LEFT JOIN Table1 AS T1
ON MonthDate.Date = DAY(T1.Date)
AND MONTH(T1.Date) = 1 AND YEAR(T1.Date) = 2010
WHERE MonthDate.Date <= DAY(LAST_DAY('2010-01-01'))
GROUP BY MonthDate.Date
It might be better to use a table to store these values and join with it instead.
Result:
1, 34
2, 10
3, 0
4, 7
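Rather than typing all 31 SELECTs, the derived table can be generated as a string. A minimal sketch in SQLite from Python, using the question's January 2010 rows; SQLite has no DAY() or LAST_DAY(), so strftime('%d', ...) and date('2010-01-01', '+1 month', '-1 day') are swapped in for them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Table1 ("values" INTEGER, Date TEXT)')
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [
    (14, "2010-01-01"), (20, "2010-01-01"), (10, "2010-01-02"), (7, "2010-01-04")])

# Build "SELECT 1 AS Date UNION ALL SELECT 2 UNION ALL ... SELECT 31".
month_days = "SELECT 1 AS Date UNION ALL " + " UNION ALL ".join(
    f"SELECT {n}" for n in range(2, 32))

rows = conn.execute(f"""
    SELECT MonthDate.Date, COALESCE(SUM(T1."values"), 0) AS Total
    FROM ({month_days}) AS MonthDate
    LEFT JOIN Table1 AS T1
      ON MonthDate.Date = CAST(strftime('%d', T1.Date) AS INTEGER)
     AND strftime('%Y-%m', T1.Date) = '2010-01'
    -- last day of January 2010, computed without LAST_DAY()
    WHERE MonthDate.Date <= CAST(strftime('%d', date('2010-01-01', '+1 month', '-1 day')) AS INTEGER)
    GROUP BY MonthDate.Date
    ORDER BY MonthDate.Date
""").fetchall()
```

The month filter sits in the ON clause rather than in WHERE; moving it to WHERE would discard the NULL-extended rows and lose the zero days.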
Given that for some dates you have no data, you'll need to fill in the gaps. One approach to this is to have a calendar table prefilled with all dates you need, and join against that.
If you want the results to show day numbers as you have showing in your question, you could prepopulate these in your calendar too as labels.
You would join your data table date field to the date field of the calendar table, group by that field, and sum values. You might want to specify limits for the range of dates covered.
So you might have:
CREATE TABLE Calendar (
label varchar(10),
cal_date date,
primary key ( cal_date )
);
Query:
SELECT
c.label,
COALESCE(SUM( d.`values` ), 0) AS total
FROM
Calendar c
LEFT JOIN
Data_table d
ON d.date_field = c.cal_date
WHERE
c.cal_date BETWEEN '2010-01-01' AND '2010-01-31'
GROUP BY
c.cal_date, c.label
ORDER BY
c.cal_date
Update:
Update:
I see you have datetimes rather than dates. You could just use the MySQL DATE() function in the join, but that would probably not be optimal, because wrapping the column in a function prevents MySQL from using an index on it. Another approach would be to have start and end times in the Calendar table, defining a 'time bucket' for each day.
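The time-bucket idea can be sketched as follows, in SQLite from Python. The columns `day_start`/`day_end` are hypothetical names for the bucket boundaries, and the times of day on the data rows are invented for illustration (the question only gives dates). Each bucket covers [day_start, day_end), so the datetime column is compared directly with no function applied to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical bucketed calendar: each row covers [day_start, day_end).
conn.execute("CREATE TABLE Calendar (label TEXT, day_start TEXT PRIMARY KEY, day_end TEXT)")
conn.execute("""
    WITH RECURSIVE d(day) AS (
        SELECT '2010-01-01' UNION ALL SELECT date(day, '+1 day') FROM d WHERE day < '2010-01-31'
    )
    INSERT INTO Calendar
    SELECT CAST(strftime('%d', day) AS INTEGER) || '.',   -- label such as '1.'
           day || ' 00:00:00',
           date(day, '+1 day') || ' 00:00:00'
    FROM d
""")

conn.execute('CREATE TABLE Data_table ("values" INTEGER, date_field TEXT)')
conn.executemany("INSERT INTO Data_table VALUES (?, ?)", [
    (14, "2010-01-01 08:15:00"), (20, "2010-01-01 17:40:00"),
    (10, "2010-01-02 09:00:00"), (7, "2010-01-04 23:59:59")])

rows = conn.execute("""
    SELECT c.label, COALESCE(SUM(d."values"), 0) AS total
    FROM Calendar c
    LEFT JOIN Data_table d
      ON d.date_field >= c.day_start AND d.date_field < c.day_end
    WHERE c.day_start >= '2010-01-01' AND c.day_start < '2010-02-01'
    GROUP BY c.day_start, c.label
    ORDER BY c.day_start
""").fetchall()
```

The half-open comparison means a row stamped exactly at midnight belongs to the day it starts, never to both neighbouring days.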
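This works for me. It's a modification of a query I found on another site. The "INTERVAL 1 MONTH" clause makes the generated dates start on the first day of the current month, so I get the current month's data, including zeros for days that have no hits. Change this to "INTERVAL 2 MONTH" to get last month's data, etc.
I have a table called "payload" with a column "timestamp". I'm then joining the timestamp column onto the dynamically generated dates, casting it so that the dates match in the ON clause. Note that the derived table produces one date per row of payload, so payload needs at least as many rows as the days you want, and you may want to filter out dates beyond the end of the month.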
SELECT `calendarday`,COUNT(P.`timestamp`) AS `cnt` FROM
(SELECT @tmpdate := DATE_ADD(@tmpdate, INTERVAL 1 DAY) `calendarday`
FROM (SELECT @tmpdate :=
LAST_DAY(DATE_SUB(CURDATE(),INTERVAL 1 MONTH)))
AS `dynamic`, `payload`) AS `calendar`
LEFT JOIN `payload` P ON DATE(P.`timestamp`) = `calendarday`
GROUP BY `calendarday`
To dynamically get the dates within a date range using SQL, you can do this (example in MySQL):
Create a table to hold the numbers 0 through 9.
CREATE TABLE ints ( i tinyint(4) );
insert into ints (i)
values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
Run a query like so:
select ((curdate() - interval 2 year) + interval (t.i * 100 + u.i * 10 + v.i) day) AS Date
from
ints t
join ints u
join ints v
having Date between '2015-01-01' and '2015-05-01'
order by t.i, u.i, v.i
This will generate all dates between Jan 1, 2015 and May 1, 2015.
Output
2015-01-01
2015-01-02
2015-01-03
2015-01-04
2015-01-05
2015-01-06
...
2015-05-01
The query joins the ints table to itself 3 times to get an incrementing number (0 through 999). It then adds this number as a day interval to a starting date, in this case the date 2 years ago. Any date range that falls within the 1,000 days starting 2 years ago can be obtained with the example above.
To generate dates for more than 1,000 days, join the ints table once more (adding a thousands digit, e.g. w.i * 1000) to allow a range of up to 10,000 days, and so forth.
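Here is the same digits-table trick as a runnable sketch in SQLite from Python. One assumption differs from the answer: the start date is fixed at 2014-12-01 instead of CURDATE() - INTERVAL 2 YEAR, so the output does not depend on when it is run. The range filter is applied in an outer query, since filtering on the computed date needs it to exist as a column first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ints (i INTEGER)")
conn.executemany("INSERT INTO ints (i) VALUES (?)", [(n,) for n in range(10)])

start = "2014-12-01"  # fixed start date (assumption), covers the next 1,000 days

rows = conn.execute("""
    SELECT dt FROM (
        -- hundreds, tens and units digits combine into an offset 0..999
        SELECT date(:start, '+' || (t.i * 100 + u.i * 10 + v.i) || ' days') AS dt,
               t.i * 100 + u.i * 10 + v.i AS n
        FROM ints t CROSS JOIN ints u CROSS JOIN ints v
    )
    WHERE dt BETWEEN '2015-01-01' AND '2015-05-01'
    ORDER BY n
""", {"start": start}).fetchall()
```

The result has one row per day from 2015-01-01 to 2015-05-01 inclusive (121 rows), which can then be LEFT JOINed to the data table exactly like a stored calendar table.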
If I'm understanding the rather vague question correctly, you want to know the number of records for each date within a month. If that's true, here's how you can do it:
SELECT COUNT(value_column) FROM table WHERE date_column LIKE '2010-01-%' GROUP BY date_column