MySQL - Request to count periodically with accumulation - mysql

Sorry if this is a duplicate but I never found an answer to this.
I have a User table which is as follows :
| id | pseudo | inscription date |
|----|-------------|------------------|
| 1 | johndoe | 01/01/1970 |
| 2 | janeyes | 02/01/1970 |
| 3 | thirdpseudo | 05/01/1970 |
I am looking for a query that produces cumulative statistics: I would like to retrieve, day by day, the total number of users registered so far.
I wrote a query that returns counts only for the days on which someone registered, but I can't work out how to accumulate the counts across every day...
SELECT DATE_FORMAT(date, "%d/%m/%Y") AS 'Day', COUNT(*) AS 'Number of registered users'
FROM User
GROUP BY DATE(date)
ORDER BY date DESC;
This query outputs :
| date | number of registered users |
| ---------- | -------------------------- |
| 01/01/1970 | 1 |
| 02/01/1970 | 1 |
| 05/01/1970 | 1 |
The output I would like for this example is :
| date | number of registered users |
| ---------- | -------------------------- |
| 01/01/1970 | 1 |
| 02/01/1970 | 2 |
| 03/01/1970 | 2 |
| 04/01/1970 | 2 |
| 05/01/1970 | 3 |
| 06/01/1970 | 3 |

I would suggest generating a range of dates, then left joining the users to those dates and counting how many users registered on each day. A running total over those daily counts gives the accumulated number.
Here is the code:
-- creating simple table
create table Users
(
id int not null,
pseudo varchar(15),
date date
);
-- adding some data
insert into Users
values
(1,'jonh','1970-01-01'),
(2,'doe','1970-01-02'),
(3,'janeyes','1970-01-02'),
(4,'third','1970-01-03'),
(5,'pseudo','1970-01-03'),
(6,'title','1970-01-04'),
(7,'somename','1970-01-04'),
(8,'anothername','1970-01-04');
-- defines the start date and the end date
set @startDate = '1970-01-01';
set @endDate = '1970-02-01';
-- recursively generates all dates within the range
with RECURSIVE dateRange (Date) as
(
select cast(@startDate as date) as Date
union ALL
select DATE_ADD(Date, INTERVAL 1 DAY)
from dateRange
where Date < @endDate
)
-- SUM() over (order by Date) produces a running total:
-- each day's count plus the counts of all previous days
select Date, Sum(RegisteredUsersCount) over(order by Date asc
rows between unbounded preceding and current row) as RegisteredUsersCount
from
(
-- left join will join all users, if there is no users that correspond to the date of join, then it would be 0 for that date.
select dr.Date, Count(u.id) as RegisteredUsersCount
from dateRange as dr
left join Users as u
on dr.Date = u.date
group by dr.Date
) as t
order by Date asc;
And working example to test: SQLize Online
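For readers who want to try the idea without a MySQL server, here is a minimal sketch of the same approach (generated date range, left join, running total) using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later for window functions, and it uses SQLite's `date(day, '+1 day')` in place of MySQL's `DATE_ADD`. The names `cumulative_registrations`, `dateRange`, `daily` and `registered_total` are illustrative, not part of the original answer. The data is the three-user sample from the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (id INTEGER NOT NULL, pseudo TEXT, date TEXT);
    INSERT INTO Users VALUES
        (1, 'johndoe',     '1970-01-01'),
        (2, 'janeyes',     '1970-01-02'),
        (3, 'thirdpseudo', '1970-01-05');
""")

RUNNING_TOTAL_SQL = """
WITH RECURSIVE dateRange(day) AS (
    -- one row per calendar day between the two bounds
    SELECT :start_date
    UNION ALL
    SELECT date(day, '+1 day') FROM dateRange WHERE day < :end_date
),
daily AS (
    -- days without registrations still appear, with a count of 0
    SELECT dr.day, COUNT(u.id) AS registered
    FROM dateRange AS dr
    LEFT JOIN Users AS u ON u.date = dr.day
    GROUP BY dr.day
)
-- running total ordered by day, not by the daily count
SELECT day, SUM(registered) OVER (ORDER BY day) AS registered_total
FROM daily
ORDER BY day
"""

def cumulative_registrations(start_date, end_date):
    """Return (day, users registered so far) for every day in the range."""
    params = {"start_date": start_date, "end_date": end_date}
    return conn.execute(RUNNING_TOTAL_SQL, params).fetchall()

rows = cumulative_registrations("1970-01-01", "1970-01-06")
for day, total in rows:
    print(day, total)
```

With the question's sample data this yields totals of 1, 2, 2, 2, 3, 3 for 1 to 6 January, which matches the output the asker wanted.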

Related

SQL Query with all data from left column and fill blank with previous row value

After searching a lot on this forum and the web, I have an issue that I cannot solve without your help.
The requirement looks simple, but the code is not :-(
Basically I need to make a report on cumulative sales by product by week.
I have a table with the calendar (including all the weeks) and a view which gives me all the cumulative values by product, sorted by week. What I need the query to do is give me all the weeks for each product and then add a column with the cumulative values from the view. If a value does not exist for a week, it should give me the last known record.
Can you help?
Thanks,
The principle is to establish all the weeks in which a product could have had sales, sum the sales grouped by week, add the missing weeks, and use SUM as a window function to get a cumulative sum.
DROP TABLE IF EXISTS T;
CREATE TABLE T
(PROD INT, DT DATE, AMOUNT INT);
INSERT INTO T VALUES
(1,'2022-01-01', 10),(1,'2022-01-01', 10),(1,'2022-01-20', 10),
(2,'2022-01-10', 10);
WITH CTE AS
(SELECT MIN(YEARWEEK(DT)) MINYW, MAX(YEARWEEK(DT)) MAXYW FROM T),
CTE1 AS
(SELECT DISTINCT YEARWEEK(DTE) YW ,PROD
FROM DATES
JOIN CTE ON YEARWEEK(DTE) BETWEEN MINYW AND MAXYW
CROSS JOIN (SELECT DISTINCT PROD FROM T) C
)
SELECT CTE1.YW,CTE1.PROD
,SUMAMT,
SUM(SUMAMT) OVER(PARTITION BY CTE1.PROD ORDER BY CTE1.YW) CUMSUM
FROM CTE1
LEFT JOIN
(SELECT YEARWEEK(DT) YW,PROD ,SUM(AMOUNT) SUMAMT
FROM T
GROUP BY YEARWEEK(DT),PROD
) S ON S.PROD = CTE1.PROD AND S.YW = CTE1.YW
ORDER BY CTE1.PROD,CTE1.YW
;
+--------+------+--------+--------+
| YW | PROD | SUMAMT | CUMSUM |
+--------+------+--------+--------+
| 202152 | 1 | 20 | 20 |
| 202201 | 1 | NULL | 20 |
| 202202 | 1 | NULL | 20 |
| 202203 | 1 | 10 | 30 |
| 202152 | 2 | NULL | NULL |
| 202201 | 2 | NULL | NULL |
| 202202 | 2 | 10 | 10 |
| 202203 | 2 | NULL | 10 |
+--------+------+--------+--------+
8 rows in set (0.021 sec)
Your calendar date may be slightly different to mine but you should get the general idea.
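The core of that approach (cross join weeks with products, left join the sales, then take a windowed SUM so that empty weeks carry the previous total forward) can be sketched with Python's built-in sqlite3 module. This is a simplified illustration under stated assumptions: the tables `calendar_weeks` and `weekly_sales` are hypothetical, the sales are already aggregated per week (SQLite has no `YEARWEEK`), and SQLite 3.25 or later is needed for window functions. The figures are the weekly sums from the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical calendar of week numbers (YYYYWW), one row per week
    CREATE TABLE calendar_weeks (yw INTEGER PRIMARY KEY);
    INSERT INTO calendar_weeks VALUES (202152), (202201), (202202), (202203);

    -- hypothetical sales already summed per product and week
    CREATE TABLE weekly_sales (prod INTEGER, yw INTEGER, amount INTEGER);
    INSERT INTO weekly_sales VALUES (1, 202152, 20), (1, 202203, 10), (2, 202202, 10);
""")

CUMULATIVE_SQL = """
SELECT p.prod,
       w.yw,
       s.amount,
       -- weeks without sales contribute NULL, so the total is carried forward
       SUM(s.amount) OVER (PARTITION BY p.prod ORDER BY w.yw) AS cumsum
FROM calendar_weeks AS w
CROSS JOIN (SELECT DISTINCT prod FROM weekly_sales) AS p
LEFT JOIN weekly_sales AS s
       ON s.prod = p.prod AND s.yw = w.yw
ORDER BY p.prod, w.yw
"""

rows = conn.execute(CUMULATIVE_SQL).fetchall()
for prod, yw, amount, cumsum in rows:
    print(prod, yw, amount, cumsum)
```

Weeks before a product's first sale come back as NULL (None in Python), as in the output above; wrapping the windowed sum in `COALESCE(..., 0)` would be one way to show zero there instead.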

How to get the average time between multiple dates

What I'm trying to do is bucket my customers based on their transaction frequency. I have the date recorded for every time they transact, but I can't work out how to get the average delta between consecutive dates. What I effectively want is a table showing me:
| User | Average Frequency
| 1 | 15
| 2 | 15
| 3 | 35
...
The data I currently have is formatted like this:
| User | Transaction Date
| 1 | 2018-01-01
| 1 | 2018-01-15
| 1 | 2018-02-01
| 2 | 2018-06-01
| 2 | 2018-06-18
| 2 | 2018-07-01
| 3 | 2019-01-01
| 3 | 2019-02-05
...
So basically, each customer will have multiple transactions and I want to understand how to get the delta between each date and then average of the deltas.
I know the datediff function and how it works, but I can't work out how to split the transactions up. I also know that the offset function is available in tools like Looker, but I don't know the syntax behind it.
Thanks
In MySQL 8+ you can use LAG to get a delayed Transaction Date and then use DATEDIFF to get the difference between two consecutive dates. You can then take the average of those values:
SELECT User, AVG(delta) AS `Average Frequency`
FROM (SELECT User,
DATEDIFF(`Transaction Date`, LAG(`Transaction Date`) OVER (PARTITION BY User ORDER BY `Transaction Date`)) AS delta
FROM transactions) t
GROUP BY User
Output:
User Average Frequency
1 15.5
2 15
3 35
Demo on dbfiddle.com
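The same LAG-then-average idea can be checked locally with Python's built-in sqlite3 module. This is a sketch rather than the MySQL query itself: it assumes SQLite 3.25 or later, replaces `DATEDIFF` with a difference of `julianday()` values, and uses an illustrative column name `transaction_date` instead of the backtick-quoted name with a space.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (user INTEGER NOT NULL, transaction_date TEXT NOT NULL);
    INSERT INTO transactions VALUES
        (1, '2018-01-01'), (1, '2018-01-15'), (1, '2018-02-01'),
        (2, '2018-06-01'), (2, '2018-06-18'), (2, '2018-07-01'),
        (3, '2019-01-01'), (3, '2019-02-05');
""")

AVG_DELTA_SQL = """
SELECT user, AVG(delta) AS average_frequency
FROM (
    SELECT user,
           -- days since the same user's previous transaction;
           -- NULL for the first transaction, which AVG ignores
           julianday(transaction_date)
             - julianday(LAG(transaction_date) OVER (
                   PARTITION BY user ORDER BY transaction_date)) AS delta
    FROM transactions
) AS t
GROUP BY user
ORDER BY user
"""

average_days = dict(conn.execute(AVG_DELTA_SQL).fetchall())
for user, avg in average_days.items():
    print(user, avg)
```

For the sample data the gaps are 14 and 17 days for user 1, 17 and 13 days for user 2, and 35 days for user 3, giving averages of 15.5, 15 and 35.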
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(user INT NOT NULL
,transaction_date DATE
,PRIMARY KEY(user,transaction_date)
);
INSERT INTO my_table VALUES
(1,'2018-01-01'),
(1,'2018-01-15'),
(1,'2018-02-01'),
(2,'2018-06-01'),
(2,'2018-06-18'),
(2,'2018-07-01'),
(3,'2019-01-01'),
(3,'2019-02-05');
SELECT user
, AVG(delta) avg_delta
FROM
( SELECT x.*
, DATEDIFF(x.transaction_date,MAX(y.transaction_date)) delta
FROM my_table x
JOIN my_table y
ON y.user = x.user
AND y.transaction_date < x.transaction_date
GROUP
BY x.user
, x.transaction_date
) a
GROUP
BY user;
+------+-----------+
| user | avg_delta |
+------+-----------+
| 1 | 15.5000 |
| 2 | 15.0000 |
| 3 | 35.0000 |
+------+-----------+
I don't know what to say other than use a GROUP BY.
SELECT User, AVG(DATEDIFF(...))
FROM ...
GROUP BY User

How can I retrieve all the columns on a timerange aggregation?

I am currently struggling to aggregate my daily data into other time periods (weeks, months, quarters, etc.).
Here is what my raw data looks like:
| date | traffic_type | visits |
|----------|--------------|---------|
| 20180101 | 1 | 1221650 |
| 20180101 | 2 | 411424 |
| 20180101 | 4 | 108407 |
| 20180101 | 5 | 298117 |
| 20180101 | 6 | 26806 |
| 20180101 | 7 | 12033 |
| 20180101 | 8 | 80368 |
| 20180101 | 9 | 69544 |
| 20180101 | 10 | 39919 |
| 20180101 | 11 | 26291 |
| 20180102 | 1 | 1218490 |
| 20180102 | 2 | 410965 |
| 20180102 | 4 | 108037 |
| 20180102 | 5 | 297727 |
| 20180102 | 6 | 26719 |
| 20180102 | 7 | 12019 |
| 20180102 | 8 | 80074 |
First, I would like to check the sum of visits regardless of traffic_type:
SELECT date, SUM(visits) as visits_per_day
FROM visits_tbl
GROUP BY date
Here is the outcome:
| ymd | visits_per_day |
|:--------:|:--------------:|
| 20180101 | 2294563 |
| 20180102 | 2289145 |
| 20180103 | 2300367 |
| 20180104 | 2310256 |
| 20180105 | 2368098 |
| 20180106 | 2372257 |
| 20180107 | 2373863 |
| 20180108 | 2364236 |
However, if I want to find the specific day on which visits_per_day was the highest within each time period (e.g. month), I am struggling to retrieve the right output.
Here is what I did:
SELECT
(date div 100) as y_month, MAX(visits_per_day) as max_visit_per_day
FROM
(SELECT date, SUM(visits) as visits_per_day
FROM visits_tbl
GROUP BY date) as t1
GROUP BY
y_month
And here is the output of my query:
| y_month | max_visit_per_day |
|:-------:|:-----------------:|
| 201801 | 2435845 |
| 201802 | 2519000 |
| 201803 | 2528097 |
| 201804 | 2550645 |
However, I cannot know what was the exact day where the visits_per_day was the highest.
Desired output:
| y_month | max_visit_per_day | ymd |
|:-------:|:-----------------:|:--------:|
| 201801 | 2435845 | 20180130 |
| 201802 | 2519000 | 20180220 |
| 201803 | 2528097 | 20180325 |
| 201804 | 2550645 | 20180406 |
ymd would represent the day in which the visits_per_day was the highest.
This logic would be used in a dashboard with the help of programming in order to automatically select the time aggregation.
Can someone please help me?
This is a job for the structured part of structured query language. That is, you will write some subqueries and treat them as tables.
You already know how to find the number of visits per day. Let's add the month for each day to that query (http://sqlfiddle.com/#!9/a8455e/13/0).
SELECT date DIV 100 as month, date,
SUM(visits) as visits
FROM visits_tbl
GROUP BY date
Next you need to find the largest number of daily visits in each month. (http://sqlfiddle.com/#!9/a8455e/12/0)
SELECT month, MAX(visits) max_daily_visits
FROM (
SELECT date DIV 100 as month, date,
SUM(visits) as visits
FROM visits_tbl
GROUP BY date
) dayvisits
GROUP BY month
Then, the trick is retrieving the date on which that maximum occurred in each month. That requires a join. Without common table expressions (which MySQL lacked before version 8.0) you need to repeat the first subquery. (http://sqlfiddle.com/#!9/a8455e/11/0)
SELECT detail.*
FROM (
SELECT month, MAX(visits) max_daily_visits
FROM (
SELECT date DIV 100 as month, date,
SUM(visits) as visits
FROM visits_tbl
GROUP BY date
) dayvisits
GROUP BY month
) maxvisits
JOIN (
SELECT date DIV 100 as month, date,
SUM(visits) as visits
FROM visits_tbl
GROUP BY date
) detail ON detail.visits = maxvisits.max_daily_visits
AND detail.month = maxvisits.month
The outline of this rather complex query helps explain it. Instead of that subquery, we'll use an imaginary table called dayvisits.
SELECT detail.*
FROM (
SELECT month, MAX(visits) max_daily_visits
FROM dayvisits
GROUP BY month
) maxvisits
JOIN dayvisits detail ON detail.visits = maxvisits.max_daily_visits
AND detail.month = maxvisits.month
You're seeking an extreme value for each month in the subquery. (This is a fairly standard sort of SQL operation.) To do that you find that value with a MAX() ... GROUP BY query. Then you join that to the subquery itself to find the other values corresponding to the extreme value.
If you do have common table expressions, the query looks like this. MySQL 8.0 and later support CTEs, as does the MySQL fork called MariaDB.
WITH dayvisits AS (
SELECT date DIV 100 as month, date,
SUM(visits) as visits
FROM visits_tbl
GROUP BY date
)
SELECT dayvisits.*
FROM (
SELECT month, MAX(visits) max_daily_visits
FROM dayvisits
GROUP BY month
) maxvisits
JOIN dayvisits ON dayvisits.visits = maxvisits.max_daily_visits
AND dayvisits.month = maxvisits.month
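To see the CTE version run end to end, here is a small sketch using Python's built-in sqlite3 module. The visit numbers are invented for illustration (they are not the asker's data), and SQLite's integer division `date / 100` stands in for MySQL's `date DIV 100`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits_tbl (date INTEGER, traffic_type INTEGER, visits INTEGER);
    -- made-up sample rows: two days in each of two months
    INSERT INTO visits_tbl VALUES
        (20180101, 1, 100), (20180101, 2, 50),
        (20180102, 1, 120), (20180102, 2, 60),
        (20180201, 1,  90),
        (20180202, 1,  70), (20180202, 2, 40);
""")

MAX_DAY_PER_MONTH_SQL = """
WITH dayvisits AS (
    -- total visits per day, tagged with the month (YYYYMM)
    SELECT date / 100 AS month, date, SUM(visits) AS visits
    FROM visits_tbl
    GROUP BY date
)
SELECT maxvisits.month, maxvisits.max_daily_visits, dayvisits.date AS ymd
FROM (
    -- the extreme value per month
    SELECT month, MAX(visits) AS max_daily_visits
    FROM dayvisits
    GROUP BY month
) AS maxvisits
-- join back to find the day on which the extreme value occurred
JOIN dayvisits
  ON dayvisits.visits = maxvisits.max_daily_visits
 AND dayvisits.month  = maxvisits.month
ORDER BY maxvisits.month
"""

rows = conn.execute(MAX_DAY_PER_MONTH_SQL).fetchall()
for month, max_visits, ymd in rows:
    print(month, max_visits, ymd)
```

Note that if two days in the same month tie for the maximum, the join returns both of them.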
[Query checked on MSSQL] It's quick and efficient.
select visit_sum_day_wise.date
, visit_sum_day_wise.Max_Visits
, visit_sum_day_wise.traffic_type
, LAST_VALUE(visit_sum_day_wise.visits) OVER(PARTITION BY
visit_sum_day_wise.date ORDER BY visit_sum_day_wise.date ROWS BETWEEN
UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS max_visit_per_day
from (
select visits_tbl.date , visits_tbl.visits , visits_tbl.traffic_type
,max(visits_tbl.visits ) OVER ( PARTITION BY visits_tbl.date ORDER
BY visits_tbl.date ROWS BETWEEN UNBOUNDED PRECEDING AND 0
PRECEDING) Max_visits
from visits_tbl
) as visit_sum_day_wise
where visit_sum_day_wise.visits = (select max(visits_B.visits ) from
visits_tbl visits_B where visits_B.Date = visit_sum_day_wise.date )

With a table of visitor data, how do I get hourly totals of total and unique visitors?

I have a Visits table, structured like the below:
+--------------------------------------+
| ID | Date | Time | Session |
+--------------------------------------+
| 1 | 05-18-2014 | 20:15:10 | 1 |
| 2 | 05-18-2014 | 20:15:20 | 1 |
| 3 | 05-18-2014 | 21:40:20 | 2 |
| 4 | 05-18-2014 | 21:45:30 | 1 |
| 5 | 05-18-2014 | 21:50:50 | 3 |
+--------------------------------------+
The session column is the user's session ID. I would like to query the table to get the hourly total and unique visitors, to get a result like:
+-----------------------+
| Time | Total | Unique |
+-----------------------+
| 20 | 2 | 1 |
| 21 | 3 | 2 |
+-----------------------+
Unique visitors are visitors with sessions that have never been seen before, anywhere in the Visits table.
The below only selects unique visitors inside each hour:
SELECT COUNT(*) Total, COUNT(DISTINCT Session) `Unique`, HOUR(Time) Time
FROM Visits
WHERE Date = '05-18-2014'
GROUP BY HOUR(Time)
The following seems to work, however requires two queries, and a sub-query:
SELECT COUNT(*) Total, HOUR(Time) Time
FROM Visits
GROUP BY HOUR(Time);
SELECT COUNT(*) `Unique`, HOUR(Time) Time
FROM (
SELECT *
FROM Visits
GROUP BY Session
ORDER BY Date, Time DESC
) UniqueVisits
WHERE Date = '05-18-2014'
GROUP BY HOUR(Time);
Is there a simpler way to get the two totals?
I think by "distinct" you mean that you only want one session counted once (during the first hour). If so, you can do this:
select max(h.total) as total, count(firstvisit.session) as Firsts, h.hr
from (select hour(time) as hr, count(*) as total
from visits v
where Date = '05-18-2014'
group by hour(time)
) h left outer join
(select session, min(hour(time))as hr
from visits v
where Date = '05-18-2014'
group by session
) firstvisit
on h.hr = firstvisit.hr
GROUP BY h.hr;
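A quick way to check this against the sample rows from the question is a sketch with Python's built-in sqlite3 module. It assumes SQLite's `strftime('%H', ...)` as a stand-in for MySQL's `HOUR()`; the lower-case column names and the alias `firsts` are illustrative. Like the query above, it only considers sessions within the selected day, not sessions seen on earlier dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (id INTEGER PRIMARY KEY, date TEXT, time TEXT, session INTEGER);
    INSERT INTO visits VALUES
        (1, '05-18-2014', '20:15:10', 1),
        (2, '05-18-2014', '20:15:20', 1),
        (3, '05-18-2014', '21:40:20', 2),
        (4, '05-18-2014', '21:45:30', 1),
        (5, '05-18-2014', '21:50:50', 3);
""")

HOURLY_SQL = """
SELECT h.hr, MAX(h.total) AS total, COUNT(firstvisit.session) AS firsts
FROM (
    -- all visits per hour
    SELECT CAST(strftime('%H', time) AS INTEGER) AS hr, COUNT(*) AS total
    FROM visits
    WHERE date = :day
    GROUP BY CAST(strftime('%H', time) AS INTEGER)
) AS h
LEFT JOIN (
    -- the first hour in which each session appears
    SELECT session, MIN(CAST(strftime('%H', time) AS INTEGER)) AS hr
    FROM visits
    WHERE date = :day
    GROUP BY session
) AS firstvisit
  ON h.hr = firstvisit.hr
GROUP BY h.hr
ORDER BY h.hr
"""

rows = conn.execute(HOURLY_SQL, {"day": "05-18-2014"}).fetchall()
for hr, total, firsts in rows:
    print(hr, total, firsts)
```

For the sample data, hour 20 has 2 visits and 1 new session, and hour 21 has 3 visits and 2 new sessions, which matches the result table in the question.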

How can I fetch last 30 days left joined with values from my own table?

In my Symfony2/Doctrine2 application, I have an entity (that is, a table in my database) where I keep track, for every user, of whether he or she has done a specific action on a specified day.
My table looks like that, let's call it track_user_action:
+---------+------------+
| user_id | date |
+---------+------------+
| 1 | 2013-09-19 |
| 2 | 2013-09-19 |
| 1 | 2013-09-18 |
| 5 | 2013-09-18 |
| 8 | 2013-09-17 |
| 5 | 2013-09-17 |
+---------+------------+
I would like to retrieve a set of rows, where it shows the last 30 days, the corresponding weekday and if the specified user has an entry in this table, e.g. for user with user_id = 1:
+------------+--------------+-----------------+
| date | weekday | has_done_action |
+------------+--------------+-----------------+
| 2013-09-20 | Friday | false |
| 2013-09-19 | Thursday | true |
| 2013-09-18 | Wednesday | true |
| ... | | |
| 2013-08-20 | Tuesday | false |
+------------+--------------+-----------------+
I could think of a LEFT JOIN of a date-table and my track_user_action. But it seems senseless to create a special table just for the dates. MySQL should be able to handle the days, shouldn't it?
Approach:
SELECT
# somehow retrieve last 30 days
date AS date,
DAYNAME(date) AS weekday,
IF ... THEN has_done_action = true ELSE has_done_action = false
# and according weekdays
LEFT JOIN track_user_action AS t
ON t.date = # date field from above
WHERE t.user_id = 1
ORDER BY # date field from above
DESC
LIMIT 0,30
My questions:
What would be a good (My)SQL query that fetches this kind of result?
To what extent can this query be implemented in Doctrine2 (I know for a fact that Doctrine2 doesn't support all MySQL functions, e.g. YEAR() or MONTH())?
This is a working query statement for seven days (adapt query for 30 days accordingly):
SELECT
d.date AS date,
DAYNAME(d.date) AS weekday,
IF(t.user_id IS NOT NULL, 'true', 'false') AS has_done_action
FROM (
SELECT SUBDATE(CURDATE(), 1) AS date UNION
SELECT SUBDATE(CURDATE(), 2) AS date UNION
SELECT SUBDATE(CURDATE(), 3) AS date UNION
SELECT SUBDATE(CURDATE(), 4) AS date UNION
SELECT SUBDATE(CURDATE(), 5) AS date UNION
SELECT SUBDATE(CURDATE(), 6) AS date UNION
SELECT SUBDATE(CURDATE(), 7) AS date
) AS d
LEFT JOIN track_user_action t
ON t.date = d.date