Getting total average between dates - mysql

I have a table named sales with the following format.
sale_id user_id sale_date sale_cost
j847bv-6ggd bd48ta36-cn5x 2017-01-10 15:43:12 30
vf87x2-15gr bd48ta36-cn5x 2017-01-05 13:41:16 60
3gfd7f-2cdd 8g4f5ccf-1fet 2017-01-15 14:10:12 100
4bgfd5-12vn 8g4f5ccf-1fet 2017-01-20 19:47:14 20
b58e32-bf87 8g4f5ccf-1fet 2017-01-20 17:35:13 15
bg87db-127g gr4gg1f4-3gbb 2017-01-20 12:26:15 80
How could I get the average amount that a user (user_d) spends within the first X amount of days since their first purchase? I don't want an average for every user, but a total average for all.
I know that I can get the average using select avg(sale_cost) but I'm unsure how to find out the average for a date period.

You can find average of total for each user within 10 days date range from intial sales date like this:
select avg(sale_cost)
from (
select sum(t.sale_cost) sale_cost
from your_table t
join (
select user_id, min(sale_date) start_date, date_add(min(sale_date), interval 10 day) end_date
from your_table
group by user_id
) t2 on t.user_id = t2.user_id
and t.sale_date between t2.start_date and t2.end_date
group by t.user_id
) t;
It finds the first sale_date and date 10 days after this for each user. Then joins it with the table to get total for each user within that range and then finally average of the above calculated totals.
Demo
If you want to find the average between overall first sale_date (not individual) and 10 days from it, use:
select avg(sale_cost)
from (
select sum(t.sale_cost) sale_cost
from your_table t
join (
select min(sale_date) start_date, date_add(min(sale_date), interval 10 day) end_date
from your_table
) t2 on t.sale_date between t2.start_date and t2.end_date
group by t.user_id
) t;
Demo

The between operator comes in handy whenever it comes to checking ranges
SELECT column_name(s)
FROM table_name
WHERE column_name BETWEEN value1 AND value2;
In this case value1 and value2 will be replaced by your dates using:
'2011-01-01 00:00:00' AND '2011-01-31 23:59:59'
or
sale_date AND DATE_ADD(OrderDate,INTERVAL 10 DAY)
The first way is faster and also the between values are inclusive.

Related

MySQL Get Orders From Last 12 Weeks Monday to Sunday

I have a table that stores each order made by a user, recording the date it was made , the amount and the user id. I am trying to create a query that returns the weekly transactions from Monday to Sunday for the last 12 weeks for a particular user. I am using the following query:
SELECT COUNT(*) AS Orders,
SUM(amount) AS Total,
DATE_FORMAT(transaction_date,'%m/%Y') AS Week
FROM shop_orders
WHERE user_id = 123
AND transaction_date >= now()-interval 3 month
GROUP BY YEAR(transaction_date), WEEKOFYEAR(transaction_date)
ORDER BY DATE_FORMAT(transaction_date,'%m/%Y') ASC
This produces the following result:
This however does not return the weeks where the user has made 0 orders, does not sum the orders from Monday to Sunday and does not return the weeks ordered from 1 to 12. Is there a way to achieve these things?
One way to accomplish this is with an self outer join (in this case, I use a right outer join, but of course a left outer join would work as well).
To start your weeks on Monday, subtract the result of WEEKDAY from your column transaction_date with DATE_SUB, as proposed in the most upvoted answer here.
SELECT
COALESCE(t1.Orders, 0) AS `Orders`,
COALESCE(t1.Total, 0) AS `Total`,
t2.Week AS `Week`
FROM
(
SELECT
COUNT(*) AS `Orders`,
SUM(amount) AS `Total`,
DATE(DATE_SUB(transaction_date, INTERVAL(WEEKDAY(transaction_date)) DAY)) AS `Week`
FROM
shop_orders
WHERE 1=1
AND user_id = 123
AND transaction_date >= NOW() - INTERVAL 12 WEEK
GROUP BY
3
) t1 RIGHT JOIN (
SELECT
DATE(DATE_SUB(transaction_date, INTERVAL(WEEKDAY(transaction_date)) DAY)) AS `Week`
FROM
shop_orders
WHERE
transaction_date >= NOW() - INTERVAL 12 WEEK
GROUP BY
1
ORDER BY
1
) t2 USING (Week)
To return the weeks with no Orders you have to create a table with all the weeks.
For the order order by the same fields in the group by

Count number of entries in time interval 1 that appear in time interval 2 - SQL

I am new here and tried to look up the answer to my question but couldn't find anything on it. I am currently learning how to work with SQL queries and am wondering how I can count the amount of unique values that appear in two time intervals?
I have two columns; one is the timestamp while the other is a customer id. What I want to do is to check, for example, the amount of customers that appear in time interval A, let's say January 2014 - February 2014. I then want to see how many of these also appear in another time interval that i specify, for example February 2014-April 2014. If the total sample were 2 people who both bought something in january while only one of them bought something else before the end of April, the count would be 1.
I am a total beginner and tried the query below but it obviously won't return what I want because each entry only having one timestamp makes it not possible to be in two intervals.
SELECT
count(customer_id)
FROM db.table
WHERE time >= date('2014-01-01 00:00:00')
AND time < date('2014-02-01 00:00:00')
AND time >= date('2014-02-01 00:00:00')
AND time < date('2014-05-01 00:00:00')
;
Try this.
select count(distinct t.customer_id) from Table t
INNER JOIN Table t1 on t1.customer_id = t.customer_id
and t1.time >= '2014-01-01 00:00:00' and t1.time<'2014-02-01 00:00:00'
where t.time >='2014-02-01 00:00:00' and t.time<'2014-05-01 00:00:00'
Here's one method of doing this with conditional grouping in an inner-select.
Select Case
When GroupBy = 1 Then 'January - February 2014'
When GroupBy = 2 Then 'February - April 2014'
End As Period,
Count (Customer_Id) As Total
From
(
SELECT Customer_Id,
Case
When Time Between '2014-01-01' And '2014-02-01' Then 1
When Time Between '2014-02-01' And '2014-04-01' Then 2
Else -1
End As GroupBy
From db.Table
) D
Where GroupBy <> -1
Group By GroupBy
Edit: Sorry, misread the question. This will show you those that overlap those two time ranges:
Select Count(Customer_Id)
From db.Table t1
Where Exists
(
Select Customer_Id
From db.Table t2
Where t1.customer_id = t2.customer_id
And t2.Time Between '2014-02-01' And '2014-04-01'
)
And t1.Time Between '2014-01-01' And '2014-02-01'

Return active users in the last 30 days for each day

I have a table, activity that looks like the following:
date | user_id |
Thousands of users and multiple dates and activity for all of them. I want to pull a query that will, for every day in the result, give me the total active users in the last 30 days. The query I have now looks like the following:
select date, count(distinct user_id) from activity where date > date_sub(date, interval 30 day) group by date
This gives me total unique users on only that day; I can't get it to give me the last 30 for each date. Help is appreciated.
To do this you need a list of the dates and join that against the activities.
As such this should do it. A sub query to get the list of dates and then a count of user_id (or you could use COUNT(*) as I presume user_id cannot be null):-
SELECT date, COUNT(user_id)
FROM
(
SELECT DISTINCT date, DATE_ADD(b.date, INTERVAL -30 DAY) AS date_minus_30
FROM activity
) date_ranges
INNER JOIN activity
ON activity.date BETWEEN date_ranges.date_minus_30 AND date_ranges.date
GROUP BY date
However if there can be multiple records for a user_id on any particular date but you only want the count of unique user_ids on a date you need to count DISTINCT user_id (although note that if a user id occurs on 2 different dates within the 30 day date range they will only be counted once):-
SELECT activity.date, COUNT(DISTINCT user_id)
FROM
(
SELECT DISTINCT date, DATE_ADD(b.date, INTERVAL -30 DAY) AS date_minus_30
FROM activity
) date_ranges
INNER JOIN activity
ON activity.date BETWEEN date_ranges.date_minus_30 AND date_ranges.date
GROUP BY date
A bit cruder would be to just join the activity table against itself based on the date range and use COUNT(DISTINCT ...) to just eliminate the duplicates:-
SELECT a.date, COUNT(DISTINCT a.user_id)
FROM activity a
INNER JOIN activity b
ON a.date BETWEEN DATE_ADD(b.date, INTERVAL -30 DAY) AND b.date
GROUP by a.date

Return a zero for a day with no results

I have a query which returns the total of users who registered for each day. Problem is if a day had no one register it doesn't return any value, it just skips it. I would rather it returned zero
this is my query so far
SELECT count(*) total FROM users WHERE created_at < NOW() AND created_at >
DATE_SUB(NOW(), INTERVAL 7 DAY) AND owner_id = ? GROUP BY DAY(created_at)
ORDER BY created_at DESC
Edit
i grouped the data so i would get a count for each day- As for the date range, i wanted the total users registered for the previous seven days
A variation on the theme "build your on 7 day calendar inline":
SELECT D, count(created_at) AS total FROM
(SELECT DATE_SUB(NOW(), INTERVAL D DAY) AS D
FROM
(SELECT 0 as D
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
) AS D
) AS D
LEFT JOIN users ON date(created_at) = date(D)
WHERE owner_id = ? or owner_id is null
GROUP BY D
ORDER BY D DESC
I don't have your table structure at hand, so that would need adjustment probably. In the same order of idea, you will see I use NOW() as a reference date. But that's easily adjustable. Anyway that's the spirit...
See for a live demo http://sqlfiddle.com/#!2/ab5cf/11
If you had a table that held all of your days you could do a left join from there to your users table.
SELECT SUM(CASE WHEN U.Id IS NOT NULL THEN 1 ELSE 0 END)
FROM DimDate D
LEFT JOIN Users U ON CONVERT(DATE,U.Created_at) = D.DateValue
WHERE YourCriteria
GROUP BY YourGroupBy
The tricky bit is that you group by the date field in your data, which might have 'holes' in it, and thus miss records for that date.
A way to solve it is by filling a table with all dates for the past 10 and next 100 years or so, and to (outer)join that to your data. Then you will have one record for each day (or week or whatever) for sure.
I had to do this only for MS SqlServer, so how to fill a date table (or perhaps you can do it dynamically) is for someone else to answer.
A bit long winded, but I think this will work...
SELECT count(users.created_at) total FROM
(SELECT DATE_SUB(CURDATE(),INTERVAL 6 DAY) as cdate UNION ALL
SELECT DATE_SUB(CURDATE(),INTERVAL 5 DAY) UNION ALL
SELECT DATE_SUB(CURDATE(),INTERVAL 4 DAY) UNION ALL
SELECT DATE_SUB(CURDATE(),INTERVAL 3 DAY) UNION ALL
SELECT DATE_SUB(CURDATE(),INTERVAL 2 DAY) UNION ALL
SELECT DATE_SUB(CURDATE(),INTERVAL 1 DAY) UNION ALL
SELECT CURDATE()) t1 left join users
ON date(created_at)=t1.cdate
WHERE owner_id = ? or owner_id is null
GROUP BY t1.cdate
ORDER BY t1.cdate DESC
It differs from your query slightly in that it works on dates rather than date times which your query is doing. From your description I have assumed you mean to use whole days and therefore have used dates.

Calculating a Moving Average MySQL?

Good Day,
I am using the following code to calculate the 9 Day Moving average.
SELECT SUM(close)
FROM tbl
WHERE date <= '2002-07-05'
AND name_id = 2
ORDER BY date DESC
LIMIT 9
But it does not work because it first calculates all of the returned fields before the limit is called. In other words it will calculate all the closes before or equal to that date, and not just the last 9.
So I need to calculate the SUM from the returned select, rather than calculate it straight.
IE. Select the SUM from the SELECT...
Now how would I go about doing this and is it very costly or is there a better way?
If you want the moving average for each date, then try this:
SELECT date, SUM(close),
(select avg(close) from tbl t2 where t2.name_id = t.name_id and datediff(t2.date, t.date) <= 9
) as mvgAvg
FROM tbl t
WHERE date <= '2002-07-05' and
name_id = 2
GROUP BY date
ORDER BY date DESC
It uses a correlated subquery to calculate the average of 9 values.
Starting from MySQL 8, you should use window functions for this. Using the window RANGE clause, you can create a logical window over an interval, which is very powerful. Something like this:
SELECT
date,
close,
AVG (close) OVER (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
FROM tbl
WHERE date <= DATE '2002-07-05'
AND name_id = 2
ORDER BY date DESC
For example:
WITH t (date, `close`) AS (
SELECT DATE '2020-01-01', 50 UNION ALL
SELECT DATE '2020-01-03', 54 UNION ALL
SELECT DATE '2020-01-05', 51 UNION ALL
SELECT DATE '2020-01-12', 49 UNION ALL
SELECT DATE '2020-01-13', 59 UNION ALL
SELECT DATE '2020-01-15', 30 UNION ALL
SELECT DATE '2020-01-17', 35 UNION ALL
SELECT DATE '2020-01-18', 39 UNION ALL
SELECT DATE '2020-01-19', 47 UNION ALL
SELECT DATE '2020-01-26', 50
)
SELECT
date,
`close`,
COUNT(*) OVER w AS c,
SUM(`close`) OVER w AS s,
AVG(`close`) OVER w AS a
FROM t
WINDOW w AS (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
ORDER BY date DESC
Leading to:
date |close|c|s |a |
----------|-----|-|---|-------|
2020-01-26| 50|1| 50|50.0000|
2020-01-19| 47|2| 97|48.5000|
2020-01-18| 39|3|136|45.3333|
2020-01-17| 35|4|171|42.7500|
2020-01-15| 30|4|151|37.7500|
2020-01-13| 59|5|210|42.0000|
2020-01-12| 49|6|259|43.1667|
2020-01-05| 51|3|159|53.0000|
2020-01-03| 54|3|154|51.3333|
2020-01-01| 50|3|155|51.6667|
Use something like
SELECT
sum(close) as sum,
avg(close) as average
FROM (
SELECT
(close)
FROM
tbl
WHERE
date <= '2002-07-05'
AND name_id = 2
ORDER BY
date DESC
LIMIT 9 ) temp
The inner query returns all filtered rows in desc order, and then you avg, sum up those rows returned.
The reason why the query given by you doesn't work is due to the fact that the sum is calculated first and the LIMIT clause is applied after the sum has already been calculated, giving you the sum of all the rows present
an other technique is to do a table:
CREATE TABLE `tinyint_asc` (
`value` tinyint(3) unsigned NOT NULL default '0',
PRIMARY KEY (value)
) ;
​
INSERT INTO `tinyint_asc` VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250),(251),(252),(253),(254),(255);
After you can used it like that:
select
date_add(tbl.date, interval tinyint_asc.value day) as mydate,
count(*),
sum(myvalue)
from tbl inner
join tinyint_asc.value <= 30 -- for a 30 day moving average
where date( date_add(o.created_at, interval tinyint_asc.value day ) ) between '2016-01-01' and current_date()
group by mydate
This query is fast:
select date, name_id,
case #i when name_id then #i:=name_id else (#i:=name_id)
and (#n:=0)
and (#a0:=0) and (#a1:=0) and (#a2:=0) and (#a3:=0) and (#a4:=0) and (#a5:=0) and (#a6:=0) and (#a7:=0) and (#a8:=0)
end as a,
case #n when 9 then #n:=9 else #n:=#n+1 end as n,
#a0:=#a1,#a1:=#a2,#a2:=#a3,#a3:=#a4,#a4:=#a5,#a5:=#a6,#a6:=#a7,#a7:=#a8,#a8:=close,
(#a0+#a1+#a2+#a3+#a4+#a5+#a6+#a7+#a8)/#n as av
from tbl,
(select #i:=0, #n:=0,
#a0:=0, #a1:=0, #a2:=0, #a3:=0, #a4:=0, #a5:=0, #a6:=0, #a7:=0, #a8:=0) a
where name_id=2
order by name_id, date
If you need an average over 50 or 100 values, it's tedious to write, but
worth the effort. The speed is close to the ordered select.