Get accumulated data by dates in MySQL

I need to get the accumulated number of users over a range of dates, i.e. one running total per date for a month. The following query works fine, but I have to run it once for each date, and I cannot use GROUP BY on the date. Please advise.
MySQL version 8
Sample Data
+------------------------+
| id | Registration_Date |
+------------------------+
| 1 | 2020-05-01 |
| 2 | 2020-05-01 |
| 3 | 2020-05-02 |
| 4 | 2020-05-03 |
| 5 | 2020-05-04 |
+------------------------+
Current Query
SELECT COUNT(id) AS 'Registrations'
FROM users
WHERE DATE(Registration_Date) <= "2020-05-04";
Desired Result
+-----------------------------------+
| Registration_Date | Registrations |
+-----------------------------------+
| 2020-05-01 | 2 |
| 2020-05-02 | 3 |
| 2020-05-03 | 4 |
| 2020-05-04 | 5 |
+-----------------------------------+

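You can use window functions to get the result you want, COUNTing the id values on or before the current registration date. With ORDER BY and no explicit frame, the window runs from the first row up to the last row that shares the current date, so every user registered on the same day gets the same running count. We use DISTINCT to remove the duplicate rows this produces when multiple users register on the same day: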
SELECT DISTINCT Registration_Date,
       COUNT(id) OVER (ORDER BY Registration_Date) AS Registrations
FROM users
ORDER BY Registration_Date;
Output:
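+-------------------+---------------+
| Registration_Date | Registrations |
+-------------------+---------------+
| 2020-05-01        |             2 |
| 2020-05-02        |             3 |
| 2020-05-03        |             4 |
| 2020-05-04        |             5 |
+-------------------+---------------+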
Demo on dbfiddle
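To try the window-function query without a MySQL server, here is a minimal sketch that runs it through Python's built-in `sqlite3` module on an in-memory copy of the question's sample data. SQLite is only a stand-in for MySQL 8 here and needs to be version 3.25 or later for window functions; the function name `running_registrations` is hypothetical.

```python
import sqlite3


def running_registrations():
    """Return (Registration_Date, Registrations) rows using a running COUNT window."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, Registration_Date TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "2020-05-01"), (2, "2020-05-01"), (3, "2020-05-02"),
         (4, "2020-05-03"), (5, "2020-05-04")],
    )
    # ORDER BY without a frame clause ends the window at the current row's peers,
    # so both 2020-05-01 rows already see a count of 2; DISTINCT keeps one of them.
    rows = conn.execute(
        """
        SELECT DISTINCT Registration_Date,
               COUNT(id) OVER (ORDER BY Registration_Date) AS Registrations
        FROM users
        ORDER BY Registration_Date
        """
    ).fetchall()
    conn.close()
    return rows


print(running_registrations())
```

The explicit ORDER BY matters: DISTINCT alone does not guarantee any row order.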
To handle registrations that happened before the first reporting date of interest, count the registrations up to and including the first date, then count the registrations for each remaining date in the reporting period (both inside a derived table), and finally take a running sum of those counts in an outer query:
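SELECT Reporting_Date,
       SUM(Registrations) OVER (ORDER BY Reporting_Date) AS Registrations
FROM (
  -- everything registered up to and including the first reporting date
  SELECT '2020-05-01' AS Reporting_Date, COUNT(id) AS Registrations
  FROM users
  WHERE Registration_Date <= '2020-05-01'
  -- the branches cover different dates, so UNION ALL skips a needless de-duplication
  UNION ALL
  -- one count per remaining date in the reporting period
  SELECT Registration_Date, COUNT(id)
  FROM users
  WHERE Registration_Date BETWEEN '2020-05-02' AND '2020-05-04'
  GROUP BY Registration_Date
) r
ORDER BY Reporting_Date;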
In general, generating the result this way is more efficient than wrapping the original query in a derived table, because it requires fewer aggregations.
Demo on dbfiddle
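Here is a minimal sketch of the derived-table approach, again using Python's `sqlite3` as a stand-in for MySQL 8 (SQLite 3.25+ assumed for window functions). The row with id 0 on 2020-04-30 is hypothetical. It is added only to show that registrations before the reporting period are carried into the first row. The helper name `period_running_totals` is also hypothetical. The second branch uses `> start` instead of a BETWEEN that starts one day later, so no date arithmetic is needed.

```python
import sqlite3


def period_running_totals(start, end):
    """Running registrations for each date in [start, end], including earlier sign-ups."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, Registration_Date TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(0, "2020-04-30"),  # hypothetical registration before the period
         (1, "2020-05-01"), (2, "2020-05-01"), (3, "2020-05-02"),
         (4, "2020-05-03"), (5, "2020-05-04")],
    )
    rows = conn.execute(
        """
        SELECT Reporting_Date,
               SUM(Registrations) OVER (ORDER BY Reporting_Date) AS Registrations
        FROM (
            -- everything up to and including the first reporting date
            SELECT ? AS Reporting_Date, COUNT(id) AS Registrations
            FROM users
            WHERE Registration_Date <= ?
            UNION ALL
            -- one count per later date in the period
            SELECT Registration_Date, COUNT(id)
            FROM users
            WHERE Registration_Date > ? AND Registration_Date <= ?
            GROUP BY Registration_Date
        ) r
        ORDER BY Reporting_Date
        """,
        (start, start, start, end),
    ).fetchall()
    conn.close()
    return rows


print(period_running_totals("2020-05-01", "2020-05-04"))
```

Because of the hypothetical earlier row, the first total is 3 rather than 2, and each later date adds its own day's registrations.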

I used Nick's answer as a starting point and modified it slightly to get the grand total up to each date (including earlier registrations) plus the daily increments.
SELECT Reporting_Date, Registrations FROM
(SELECT DISTINCT DATE(Registration_Date) AS Reporting_Date,
COUNT(id) OVER (ORDER BY DATE(Registration_Date)) AS Registrations
FROM users) AS RAW_Result
WHERE Reporting_Date BETWEEN "2020-05-01" AND "2020-05-04";
Result:
+----------------+---------------+
| Reporting_Date | Registrations |
+----------------+---------------+
| 2020-05-01     |          1200 | (grand total up to this date)
| 2020-05-02     |          1201 | (grand total + daily increment)
| 2020-05-03     |          1202 |
| 2020-05-04     |          1203 |
+----------------+---------------+
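The outer query matters because the window is computed over every registration before the date filter is applied. The minimal sketch below contrasts the two places the filter can go. It uses Python's `sqlite3` as a stand-in for MySQL 8 (SQLite 3.25+ assumed) and the question's sample data plus one hypothetical earlier row. The names `SAMPLE`, `INNER` and `running_counts` are hypothetical.

```python
import sqlite3

# Question's sample rows plus one hypothetical earlier registration (id 0).
SAMPLE = [(0, "2020-04-30"),
          (1, "2020-05-01"), (2, "2020-05-01"), (3, "2020-05-02"),
          (4, "2020-05-03"), (5, "2020-05-04")]

INNER = """
    SELECT DISTINCT DATE(Registration_Date) AS Reporting_Date,
           COUNT(id) OVER (ORDER BY DATE(Registration_Date)) AS Registrations
    FROM users
"""


def running_counts(filter_outside, start="2020-05-01", end="2020-05-04"):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, Registration_Date TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", SAMPLE)
    if filter_outside:
        # The window sees every registration; the period is applied afterwards.
        sql = (f"SELECT Reporting_Date, Registrations FROM ({INNER}) AS raw_result "
               "WHERE Reporting_Date BETWEEN ? AND ? ORDER BY Reporting_Date")
    else:
        # WHERE runs before the window, so registrations before start are not counted.
        sql = INNER + " WHERE DATE(Registration_Date) BETWEEN ? AND ? ORDER BY Reporting_Date"
    rows = conn.execute(sql, (start, end)).fetchall()
    conn.close()
    return rows


print(running_counts(True))   # totals include the earlier sign-up
print(running_counts(False))  # totals restart inside the period
```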

Related

Sum datetime difference for values of same column and group by day

I have a table with 'ON' and 'OFF' values in the activity column and a datetime column.
id(AUTOINCREMENT) id_device activity datetime
1 a ON 2017-05-26 22:00:00
2 b ON 2017-05-26 05:00:00
3 a OFF 2017-05-27 04:00:00
4 b OFF 2017-05-26 08:00:00
5 a ON 2017-05-28 12:00:00
6 a OFF 2017-05-28 15:00:00
I need to get the total ON time per device per day:
day id_device total_minutes_on
2017-05-26 a 120
2017-05-26 b 180
2017-05-27 a 240
2017-05-27 b 0
2017-05-28 a 180
2017-05-28 b 0
I have searched and tried answers from other posts. With TimeDifference I get the correct total time, but I can't find a way to get the total time grouped by date.
I appreciate your help.
I'm not posting this as a definitive answer; rather, it's an experiment, and hopefully you'll find it useful in your case. I should also mention that the MySQL version I'm working with is quite old, so my method is very manual, to say the least.
First of all, let's break down your expected output:
The date value in day needs to be repeated for each id_device (a and b).
Minutes are calculated based on the activity: if a device is still 'ON' at the end of a day, that day's minutes are counted up to 24:00:00, and the next day's minutes are counted from 00:00:00 until the activity is 'OFF'.
Here is what I came up with:
Creating condition (1):
SELECT * FROM
(SELECT DATE(datetime) dtt FROM mytable GROUP BY DATE(datetime)) a,
(SELECT id_device FROM mytable GROUP BY id_device) b
ORDER BY dtt,id_device;
The query above will return the following result:
+------------+-----------+
| dtt | id_device |
+------------+-----------+
| 2017-05-26 | a |
| 2017-05-26 | b |
| 2017-05-27 | a |
| 2017-05-27 | b |
| 2017-05-28 | a |
| 2017-05-28 | b |
+------------+-----------+
*The above only covers the dates that appear in the table. If you want every date, whether or not there's any activity, I suggest you create a calendar table (refer: Generating a series of dates).
So this becomes the base query. Then I added an outer query that left joins the query above with the original data table:
SELECT v.*,
GROUP_CONCAT(w.activity ORDER BY w.datetime SEPARATOR ' ') activity,
GROUP_CONCAT(TIME_TO_SEC(TIME(w.datetime)) ORDER BY w.datetime SEPARATOR ' ') tr
FROM
-- this was the first query
(SELECT * FROM
(SELECT DATE(datetime) dtt FROM mytable GROUP BY DATE(datetime)) a,
(SELECT id_device FROM mytable GROUP BY id_device) b
ORDER BY a.dtt,b.id_device) v
--
LEFT JOIN
mytable w
ON v.dtt=DATE(w.datetime) AND v.id_device=w.id_device
GROUP BY DATE(v.dtt),v.id_device
What's new in this query is the GROUP_CONCAT on both activity and the time part of the datetime column, converted to seconds. Notice that both GROUP_CONCAT calls use the same ORDER BY. This is important because it keeps each activity aligned with its corresponding time value.
The query above will return the following result:
+------------+-----------+----------+-------------+
| dtt | id_device | activity | tr |
+------------+-----------+----------+-------------+
| 2017-05-26 | a | ON | 79200 |
| 2017-05-26 | b | ON OFF | 18000 28800 |
| 2017-05-27 | a | OFF | 14400 |
| 2017-05-27 | b | (NULL) | (NULL) |
| 2017-05-28 | a | ON OFF | 43200 54000 |
| 2017-05-28 | b | (NULL) | (NULL) |
+------------+-----------+----------+-------------+
From here, I added another outer query to calculate the minutes and try to get the expected result:
SELECT dtt,id_device,
CASE
WHEN SUBSTRING_INDEX(activity,' ',1)='ON' AND SUBSTRING_INDEX(activity,' ',-1)='OFF'
THEN (SUBSTRING_INDEX(tr,' ',-1)-SUBSTRING_INDEX(tr,' ',1))/60
WHEN activity='ON' THEN 1440-(tr/60)
WHEN activity='OFF' THEN tr/60
WHEN activity IS NULL AND tr IS NULL THEN 0
END AS 'total_minutes_on'
FROM
-- from the last query
(SELECT v.*,
GROUP_CONCAT(w.activity ORDER BY w.datetime SEPARATOR ' ') activity,
GROUP_CONCAT(TIME_TO_SEC(TIME(w.datetime)) ORDER BY w.datetime SEPARATOR ' ') tr
FROM
-- this was the first query
(SELECT * FROM
(SELECT DATE(datetime) dtt FROM mytable GROUP BY DATE(datetime)) a,
(SELECT id_device FROM mytable GROUP BY id_device) b
ORDER BY a.dtt,b.id_device) v
--
LEFT JOIN
mytable w
ON v.dtt=DATE(w.datetime) AND v.id_device=w.id_device
GROUP BY DATE(v.dtt),v.id_device
--
) z
In the last step, if the activity has both ON and OFF on the same day, then total minutes = (OFF seconds - ON seconds) / 60. If the activity is only ON, the device stays on until 24:00:00, which is 24 * 60 = 1440 minutes into the day, so total minutes = 1440 - (ON seconds / 60). If the activity is only OFF, I just convert the seconds to minutes, because the day starts at 00:00:00 anyway.
+------------+-----------+------------------+
| dtt | id_device | total_minutes_on |
+------------+-----------+------------------+
| 2017-05-26 | a | 120 |
| 2017-05-26 | b | 180 |
| 2017-05-27 | a | 240 |
| 2017-05-27 | b | 0 |
| 2017-05-28 | a | 180 |
| 2017-05-28 | b | 0 |
+------------+-----------+------------------+
Hopefully this will give you some ideas. ;)
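The CASE expression in the last query only handles one ON/OFF pair per device per day. Consider a day whose activity string is `OFF ON`, where a device is switched off after midnight and on again later that day. That string matches none of the WHEN branches, so the result is NULL. A day with `ON OFF ON OFF` is measured from the first ON to the last OFF, so the OFF gap in between is counted as ON time. As a cross-check of the expected output, here is a minimal Python sketch of a different technique: pair each ON with the next OFF for the same device, then split each interval at midnight. The function name `minutes_on_by_day` and the rule of ignoring an ON that never gets an OFF are assumptions, not part of the answer above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the question: (id, id_device, activity, datetime)
ROWS = [
    (1, "a", "ON", "2017-05-26 22:00:00"),
    (2, "b", "ON", "2017-05-26 05:00:00"),
    (3, "a", "OFF", "2017-05-27 04:00:00"),
    (4, "b", "OFF", "2017-05-26 08:00:00"),
    (5, "a", "ON", "2017-05-28 12:00:00"),
    (6, "a", "OFF", "2017-05-28 15:00:00"),
]


def minutes_on_by_day(rows):
    """Return {(day, device): minutes ON} for every day x device seen in rows."""
    events = defaultdict(list)
    days = set()
    for _id, device, activity, stamp in rows:
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        events[device].append((ts, activity))
        days.add(ts.strftime("%Y-%m-%d"))

    totals = defaultdict(int)
    for device, evs in events.items():
        on_since = None
        for ts, activity in sorted(evs):
            if activity == "ON" and on_since is None:
                on_since = ts
            elif activity == "OFF" and on_since is not None:
                start = on_since
                # Split the ON interval at every midnight it crosses.
                while start < ts:
                    midnight = datetime.combine(start.date() + timedelta(days=1),
                                                datetime.min.time())
                    end = min(ts, midnight)
                    totals[(start.strftime("%Y-%m-%d"), device)] += int(
                        (end - start).total_seconds() // 60)
                    start = end
                on_since = None
        # An ON with no later OFF is ignored (assumption).

    # Like the cross join in the answer: every day x device, 0 when idle.
    return {(day, dev): totals[(day, dev)]
            for day in sorted(days) for dev in sorted(events)}


for (day, device), minutes in minutes_on_by_day(ROWS).items():
    print(day, device, minutes)
```

Because intervals are paired rather than read off per-day strings, an OFF followed by an ON on the same day contributes both the minutes before the OFF and the minutes after the ON.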

How to get the average time between multiple dates

What I'm trying to do is bucket my customers based on their transaction frequency. I have the date recorded for every transaction, but I can't work out how to get the average delta between consecutive dates. What I effectively want is a table showing me:
| User | Average Frequency
| 1 | 15
| 2 | 15
| 3 | 35
...
The data I currently have is formatted like this:
| User | Transaction Date
| 1 | 2018-01-01
| 1 | 2018-01-15
| 1 | 2018-02-01
| 2 | 2018-06-01
| 2 | 2018-06-18
| 2 | 2018-07-01
| 3 | 2019-01-01
| 3 | 2019-02-05
...
So basically, each customer has multiple transactions, and I want to get the delta between each pair of consecutive dates and then the average of those deltas.
I know the DATEDIFF function and how it works, but I can't work out how to split the transactions up. I also know that an offset function is available in tools like Looker, but I don't know its syntax.
Thanks
In MySQL 8+ you can use LAG to get each user's previous Transaction Date and then use DATEDIFF to get the difference between consecutive dates. You can then take the average of those values. AVG ignores the NULL delta on each user's first transaction:
SELECT User, AVG(delta) AS `Average Frequency`
FROM (SELECT User,
DATEDIFF(`Transaction Date`, LAG(`Transaction Date`) OVER (PARTITION BY User ORDER BY `Transaction Date`)) AS delta
FROM transactions) t
GROUP BY User
Output:
User Average Frequency
1 15.5
2 15
3 35
Demo on dbfiddle.com
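SQLite has LAG but no DATEDIFF, so this minimal sketch (Python's `sqlite3`, SQLite 3.25+ assumed) takes the day difference with `julianday()`. Otherwise it mirrors the MySQL query above. For the sketch, the table and columns are simplified to a hypothetical `transactions(user, transaction_date)` so that no identifier contains a space.

```python
import sqlite3


def average_gap_days():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (user INTEGER, transaction_date TEXT)")
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?)",
        [(1, "2018-01-01"), (1, "2018-01-15"), (1, "2018-02-01"),
         (2, "2018-06-01"), (2, "2018-06-18"), (2, "2018-07-01"),
         (3, "2019-01-01"), (3, "2019-02-05")],
    )
    rows = conn.execute(
        """
        SELECT user, AVG(delta) AS avg_frequency
        FROM (
            SELECT user,
                   julianday(transaction_date)
                   - julianday(LAG(transaction_date) OVER (
                         PARTITION BY user ORDER BY transaction_date)) AS delta
            FROM transactions
        ) t
        GROUP BY user  -- AVG skips the NULL delta of each user's first row
        ORDER BY user
        """
    ).fetchall()
    conn.close()
    return rows


print(average_gap_days())
```

User 1's gaps are 14 and 17 days, which is why the average is 15.5 rather than the 15 sketched in the question.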
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(user INT NOT NULL
,transaction_date DATE
,PRIMARY KEY(user,transaction_date)
);
INSERT INTO my_table VALUES
(1,'2018-01-01'),
(1,'2018-01-15'),
(1,'2018-02-01'),
(2,'2018-06-01'),
(2,'2018-06-18'),
(2,'2018-07-01'),
(3,'2019-01-01'),
(3,'2019-02-05');
SELECT user
, AVG(delta) avg_delta
FROM
( SELECT x.*
, DATEDIFF(x.transaction_date,MAX(y.transaction_date)) delta
FROM my_table x
JOIN my_table y
ON y.user = x.user
AND y.transaction_date < x.transaction_date
GROUP
BY x.user
, x.transaction_date
) a
GROUP
BY user;
+------+-----------+
| user | avg_delta |
+------+-----------+
| 1 | 15.5000 |
| 2 | 15.0000 |
| 3 | 35.0000 |
+------+-----------+
I don't know what to say other than use a GROUP BY.
SELECT User, AVG(DATEDIFF(...))
FROM ...
GROUP BY User
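A plain GROUP BY can in fact answer this without window functions or a self-join. A user's consecutive gaps add up to the span from the first transaction to the last (the sum telescopes), so their average is (last - first) / (number of transactions - 1). For user 1 that is 31 days / 2 = 15.5. In MySQL the idea would read `DATEDIFF(MAX(...), MIN(...)) / (COUNT(*) - 1)`. The minimal sketch below instead uses Python's `sqlite3`, with `julianday()` standing in for `DATEDIFF` and a hypothetical `transactions` table. Users with a single transaction have no gap, so they are filtered out. User 4 is a hypothetical single-transaction user added to show this.

```python
import sqlite3


def average_gap_by_span():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (user INTEGER, transaction_date TEXT)")
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?)",
        [(1, "2018-01-01"), (1, "2018-01-15"), (1, "2018-02-01"),
         (2, "2018-06-01"), (2, "2018-06-18"), (2, "2018-07-01"),
         (3, "2019-01-01"), (3, "2019-02-05"),
         (4, "2019-03-01")],  # hypothetical user with one transaction
    )
    rows = conn.execute(
        """
        SELECT user,
               (julianday(MAX(transaction_date)) - julianday(MIN(transaction_date)))
               / (COUNT(*) - 1) AS avg_frequency
        FROM transactions
        GROUP BY user
        HAVING COUNT(*) > 1  -- a single transaction has no gap to average
        ORDER BY user
        """
    ).fetchall()
    conn.close()
    return rows


print(average_gap_by_span())
```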

mysql need complete count of a column and group by some columns

I need the complete count of each person_id from the database alongside the date-wise report.
SELECT date, person_id, count(person_id)
FROM visits
group by date, person_id
I tried this, but it doesn't give the result I expected:
Date | person_id| count(person_id)
2018-01-01 | 33000 | 10 |
2018-01-01 | 712000 | 111 |
2018-01-01 | 730000 | 30 |
2018-01-01 | 743000 | 5 |
2018-01-01 | 755000 | 123 |
Do you need the total appended to your query result? For example:
Date | person_id| count(person_id) | total
2018-01-01 | 33000 | 10 | 1000
2018-01-01 | 712000 | 111 | 1000
Right? If so, I don't think it's a good idea to do it with a single SQL query. In my case, I would run two queries asynchronously and then merge the results.
Like this:
query1:
SELECT date, person_id, count(person_id)
FROM visits
group by date, person_id
query2:
SELECT count(person_id) as total
FROM visits
and then merge the results in your application code.
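For comparison, the grand total can also come back from a single query. A scalar subquery in the select list is evaluated independently of the GROUP BY. If "complete count of each person_id" instead means each person's total across all dates, a correlated subquery gives that. The minimal sketch below shows both, using Python's `sqlite3`. The visit rows are hypothetical, because the question only shows aggregated counts.

```python
import sqlite3


def visits_with_totals():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (date TEXT, person_id INTEGER)")
    conn.executemany(
        "INSERT INTO visits VALUES (?, ?)",
        [  # hypothetical sample rows
            ("2018-01-01", 33000), ("2018-01-01", 33000),
            ("2018-01-01", 712000),
            ("2018-01-02", 33000),
        ],
    )
    rows = conn.execute(
        """
        SELECT v.date, v.person_id, COUNT(v.person_id) AS visit_count,
               -- this person's visits on every date
               (SELECT COUNT(*) FROM visits p WHERE p.person_id = v.person_id) AS person_total,
               -- every visit in the table
               (SELECT COUNT(*) FROM visits) AS grand_total
        FROM visits v
        GROUP BY v.date, v.person_id
        ORDER BY v.date, v.person_id
        """
    ).fetchall()
    conn.close()
    return rows


for row in visits_with_totals():
    print(row)
```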

Using MySQL group by clause with where clause

I have two tables: one that stores product information and one that stores reviews for the products.
I am now trying to get the number of reviews submitted for the products between two dates, but for some reason I get the same results regardless of the dates I enter.
This is my query:
SELECT
productName,
COUNT(*) as `count`,
avg(rating) as `rating`
FROM `Reviews`
LEFT JOIN `Products` using(`productID`)
WHERE `date` BETWEEN '2015-07-20' AND '2015-07-30'
GROUP BY
`productName`
ORDER BY `count` DESC, `rating` DESC;
This returns:
+-------------+-------+-----------+
| productName | count | rating    |
+-------------+-------+-----------+
| productA    |    23 | 4.3333333 |
| productB    |    17 | 4.25      |
| productC    |    10 | 3.5       |
+-------------+-------+-----------+
Products table:
+-----------+-------------+
| productID | productName |
+-----------+-------------+
| 1         | productA    |
| 2         | productB    |
| 3         | productC    |
+-----------+-------------+
Reviews table
+---------+-----------+--------+---------------------+
|reviewID | productID | rating | date |
+---------+-----------+--------+---------------------+
| 1 | 1 | 4.5 | 2015-07-27 17:47:01|
| 2 | 1 | 3.5 | 2015-07-27 18:54:22|
| 3 | 3 | 2 | 2015-07-28 13:28:37|
| 4 | 1 | 5 | 2015-07-28 18:33:14|
| 5 | 2 | 1.5 | 2015-07-29 11:58:17|
| 6 | 2 | 3.5 | 2015-07-30 15:04:25|
| 7 | 2 | 2.5 | 2015-07-30 18:11:11|
| 8 | 1 | 3 | 2015-07-30 18:26:23|
| 9 | 1 | 3 | 2015-07-30 21:35:05|
| 10 | 1 | 4.5 | 2015-07-31 14:25:47|
| 11 | 3 | 0.5 | 2015-07-31 14:47:48|
+---------+-----------+--------+---------------------+
When I use two random dates that I know for sure are not in the date column, I still get the same results. Even when I try to retrieve records for a single day only, I get the same results.
You should not use a LEFT JOIN, because it keeps every row from one table even when there is no matching row in the other. What you should use is something like:
select
productName,
count(*) as `count`,
avg(rating) as `rating`
from
products p,
reviews r
where
p.productID = r.productID
and `date` >= '2015-07-20' and `date` < '2015-07-31' -- half-open range: includes all of 2015-07-30
group by productName
order by count desc, rating desc;
If the result you're looking for, given your sample data, is:
| productName | count | rating |
|-------------|-------|--------|
| productA | 5 | 4 |
| productB | 3 | 3 |
| productC | 1 | 2 |
This is the count and average of reviews made on any date between 2015-07-20 and 2015-07-30 inclusive.
Then there are two issues with your query. First, you need to change the join to an inner join instead of a left join. More importantly, you need to change the date condition, because you are currently excluding reviews that fall on the last date of the range but after midnight.
This happens because your BETWEEN clause compares datetime values with date values, so the comparison effectively becomes date BETWEEN '2015-07-20 00:00:00' AND '2015-07-30 00:00:00', which excludes every review on 2015-07-30 after midnight.
The fix is to either change the date condition so that the end is a day later:
where date >= '2015-07-20' and date < '2015-07-31'
or cast the date column to a date value, which will remove the time part:
where date(date) between '2015-07-20' and '2015-07-30'
Sample SQL Fiddle
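To see the end-of-range problem on the question's review data, here is a minimal sketch using Python's `sqlite3`. SQLite stores these dates as text, so '2015-07-30 15:04:25' sorts after '2015-07-30' and BETWEEN drops it. That is the same effect as MySQL's midnight comparison, although the mechanism differs. The helper `review_stats` and its raw `condition` string are hypothetical and only suitable for trusted input like this. With ratings stored as decimals, the corrected averages come out as 3.8, 2.5 and 2. The whole numbers in the result table above are consistent with a fiddle that stored rating in an integer column.

```python
import sqlite3

REVIEWS = [  # (reviewID, productID, rating, date) from the question
    (1, 1, 4.5, "2015-07-27 17:47:01"), (2, 1, 3.5, "2015-07-27 18:54:22"),
    (3, 3, 2.0, "2015-07-28 13:28:37"), (4, 1, 5.0, "2015-07-28 18:33:14"),
    (5, 2, 1.5, "2015-07-29 11:58:17"), (6, 2, 3.5, "2015-07-30 15:04:25"),
    (7, 2, 2.5, "2015-07-30 18:11:11"), (8, 1, 3.0, "2015-07-30 18:26:23"),
    (9, 1, 3.0, "2015-07-30 21:35:05"), (10, 1, 4.5, "2015-07-31 14:25:47"),
    (11, 3, 0.5, "2015-07-31 14:47:48"),
]


def review_stats(condition):
    """Count and average reviews per product for a trusted WHERE condition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (productID INTEGER PRIMARY KEY, productName TEXT)")
    conn.execute("CREATE TABLE reviews (reviewID INTEGER PRIMARY KEY, "
                 "productID INTEGER, rating REAL, date TEXT)")
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [(1, "productA"), (2, "productB"), (3, "productC")])
    conn.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?)", REVIEWS)
    rows = conn.execute(
        f"""
        SELECT productName, COUNT(*) AS review_count, AVG(rating) AS avg_rating
        FROM products INNER JOIN reviews USING (productID)
        WHERE {condition}
        GROUP BY productName
        ORDER BY productName
        """
    ).fetchall()
    conn.close()
    return rows


# Inclusive BETWEEN against a bare date: reviews later on 2015-07-30 are dropped.
print(review_stats("date BETWEEN '2015-07-20' AND '2015-07-30'"))
# Half-open range: from 2015-07-20 up to, but not including, 2015-07-31.
print(review_stats("date >= '2015-07-20' AND date < '2015-07-31'"))
```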
You are using a LEFT JOIN between your reviews and products tables. This returns every row of reviews, with the product columns left empty (NULL) for reviews that have no matching product.
You should use an INNER JOIN, as this returns only the matching rows.
(In the end I can only guess, since I don't even know which column belongs to which table ...)
The full query (very similar to Angelo Giannis's solution):
select
productName,
count(*) as `count`,
avg(rating) as `rating`
from
products INNER JOIN reviews USING(productId)
where date >= '2015-07-20' and date < '2015-07-31' -- includes all of 2015-07-30
group by productName
order by count desc, rating desc;
Here is a fiddle with my solution and Angelo's (they both work).

MySQL - how to select id where min/max dates difference is more than 3 years

I have a table like this:
| id | date | user_id |
----------------------------------------------------
| 1 | 2008-01-01 | 10 |
| 2 | 2009-03-20 | 15 |
| 3 | 2008-06-11 | 10 |
| 4 | 2009-01-21 | 15 |
| 5 | 2010-01-01 | 10 |
| 6 | 2011-06-01 | 10 |
| 7 | 2012-01-01 | 10 |
| 8 | 2008-05-01 | 15 |
I'm looking for a way to select each user_id where the difference between its MIN and MAX dates is more than 3 years. For the above data I should get:
| user_id |
-----------------------
| 10 |
Can anyone help?
SELECT user_id
FROM mytable
GROUP BY user_id
HAVING MAX(`date`) > (MIN(`date`) + INTERVAL '3' YEAR);
Tested here: http://sqlize.com/MC0618Yg58
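SQLite has no INTERVAL syntax, so this minimal sketch of the same HAVING filter shifts the earliest date with the `date(..., '+3 years')` modifier instead. It uses Python's `sqlite3`, the sample rows from the question, and the table name `mytable` as in the answer. The function name `users_spanning_more_than` is hypothetical.

```python
import sqlite3


def users_spanning_more_than(years):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, date TEXT, user_id INTEGER)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?, ?)",
        [(1, "2008-01-01", 10), (2, "2009-03-20", 15), (3, "2008-06-11", 10),
         (4, "2009-01-21", 15), (5, "2010-01-01", 10), (6, "2011-06-01", 10),
         (7, "2012-01-01", 10), (8, "2008-05-01", 15)],
    )
    rows = conn.execute(
        """
        SELECT user_id
        FROM mytable
        GROUP BY user_id
        -- keep users whose latest date is more than N years after their earliest
        HAVING MAX(date) > date(MIN(date), ?)
        ORDER BY user_id
        """,
        (f"+{years} years",),
    ).fetchall()
    conn.close()
    return [user_id for (user_id,) in rows]


print(users_spanning_more_than(3))
```

Filtering in HAVING rather than WHERE is what allows the comparison to use MIN and MAX, because aggregates are only available after grouping.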
Similar to bernie's approach, I'd keep the dates in their native format. I'd also list MAX first to avoid an ABS call, which ensures that a positive number is always returned.
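SELECT user_id
FROM my_table
GROUP BY user_id
-- aggregates are not allowed in WHERE, so filter the groups with HAVING;
-- 3 * 365 days approximates 3 years (leap days are ignored)
HAVING DATEDIFF(MAX(date), MIN(date)) > 3 * 365;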
DATEDIFF returns the difference in days between two date values (the first minus the second).
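SELECT user_id
FROM (SELECT user_id, MIN(date) m0, MAX(date) m1
      FROM mytable
      GROUP BY user_id) t  -- MySQL requires an alias for a derived table
-- compares calendar years only, not the exact elapsed time
WHERE EXTRACT(YEAR FROM m1) - EXTRACT(YEAR FROM m0) > 3;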
SELECT DISTINCT A.USER_ID FROM mytable AS A
JOIN mytable AS B
ON A.USER_ID = B.USER_ID
-- some pair of dates for the same user is more than 3 * 365 days apart
WHERE DATEDIFF(A.DATE, B.DATE) > 3 * 365;