I'm trying to write a query that shows the number of visits per day for the last 7 days. The query I came up with works, but it has a limitation I don't know how to get rid of.
Imagine it is August 4th, 2019. Our table visits stores the timestamps of users' visits to a website:
ID | timestamp
1 | 2019-08-03
2 | 2019-08-03
3 | 2019-08-02
4 | 2019-07-31
5 | 2019-07-31
6 | 2019-07-31
7 | 2019-07-31
8 | 2019-07-30
9 | 2019-07-30
10 | 2019-07-28
Objective: get the number of visits to the website per day for the last 7 days. So the result should be something like:
DATE | NumberOfVisitis
2019-08-04 | 0
2019-08-03 | 2
2019-08-02 | 1
2019-08-01 | 0
2019-07-31 | 4
2019-07-30 | 2
2019-07-29 | 0
My query includes only dates that are present in the DB (it excludes days with no visits). This makes sense, as the query is driven by the data rather than by the calendar.
SELECT DATE_FORMAT(`timestamp`, "%Y-%m-%d") AS Date, COUNT(`id`) AS NumberOfVisitis
FROM `visits`
WHERE `timestamp` >= DATE_ADD(NOW(), INTERVAL -7 DAY)
GROUP BY DAY(`timestamp`)
ORDER BY `timestamp` DESC
Can you please let me know how I can modify my query to include days with no visits in the result?
MySQL lacks anything like Postgres's generate_series, so we have to fake it.
The simplest thing to do is to make a table with a bunch of numbers in it. This is useful for generating lots of things.
create table numbers ( number serial );
insert into numbers () values (), (), (), (), (), (), ();
From that we can generate a list of the last 7 days.
select date_sub(date(now()), interval number-1 day) as date
from numbers
order by number
limit 7
Then, using that as a CTE (available from MySQL 8.0; use a subquery on older versions), we left join it with visits. A left join means all dates will be present.
with dates as (
select date_sub(date(now()), interval number-1 day) as date
from numbers
order by number
limit 7
)
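select dates.date, count(visits.id) as NumberOfVisitis -- count rows, not sum of ids; unmatched days give 0
from dates
left join visits on dates.date = date(visits.timestamp) -- compare by calendar day
group by dates.date
order by dates.date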
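The same idea, a calendar-driven list of days joined against the recorded visits, can be sketched outside the database. This is only an illustration in Python, not part of the answer's SQL; the function name `visits_per_day` and the pinned "today" are assumptions made for the example, and the rows are the sample data from the question.

```python
from collections import Counter
from datetime import date, timedelta


def visits_per_day(visit_dates, today, days=7):
    """Return (day, count) pairs for the last `days` days, newest first."""
    counts = Counter(visit_dates)  # visits actually recorded, keyed by day
    # The generated calendar plays the role of the "dates" CTE.
    calendar = [today - timedelta(days=n) for n in range(days)]
    # Looking up every calendar day mimics the LEFT JOIN: missing days yield 0.
    return [(day, counts.get(day, 0)) for day in calendar]


# Sample rows from the question; "today" is pinned to 2019-08-04 (assumption).
visits = [
    date(2019, 8, 3), date(2019, 8, 3), date(2019, 8, 2),
    date(2019, 7, 31), date(2019, 7, 31), date(2019, 7, 31), date(2019, 7, 31),
    date(2019, 7, 30), date(2019, 7, 30), date(2019, 7, 28),
]

for day, count in visits_per_day(visits, date(2019, 8, 4)):
    print(day, count)
```

The key design point is that the loop runs over the calendar, not over the data, so a day with no visits still produces a row.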
I'm trying to finalize a query that should return an average of two metrics, inbound calls and missed calls. I've never worked at this granularity with day of week and hour blocks, so I'm not sure I'm even totalling correctly, let alone getting the right average.
Basically, starting at 1/1/18, I want an average of inbound calls and missed calls for each hour from 7am to 6pm, Monday to Friday. We were closed 7 days, and I'm getting 48 rows, so that's what I expect.
So if the last 3 Mondays looked like:
creationtimestamp | Legtype1 | answered
07/23/18 08:15:00 | 2 | 0
07/23/18 08:25:00 | 2 | 1
07/23/18 08:35:00 | 2 | 1
07/30/18 08:15:00 | 2 | 0
07/30/18 08:25:00 | 2 | 0
07/30/18 08:35:00 | 2 | 0
07/30/18 08:45:00 | 2 | 1
07/30/18 08:55:00 | 2 | 0
08/06/18 08:15:00 | 2 | 0
08/06/18 08:25:00 | 2 | 1
08/06/18 08:35:00 | 2 | 0
08/06/18 08:45:00 | 2 | 0
That's a total of 12 calls, 4 missed, for monday between 8 and 9 am. If I were querying those three mondays from 8 to 9 I would expect:
Monday | 8 | 4 | 1.3
But I can't figure out how to take the total calls for each individual weekday and hour and divide by the number of occurrences of that weekday. My query below currently uses SUM instead of an average, and I'm not sure how to compute the average I need, since it hinges on the number of occurrences of each weekday.
SELECT
dayname(s.creationtimestamp) as day, -- weekdays
HOUR(s.creationtimestamp) as Hour, -- Hours
sum(case when legtype1 = 2 then 1 else 0 end) as total_calls, -- total inbound
sum(case when legtype1 = 2 and answered = 0 then 1 else 0 end)as total_missed
FROM session s
WHERE (s.creationtimestamp >= '2018-01-01' AND creationtimestamp < now())
and WEEKDAY(s.creationtimestamp) BETWEEN 0 AND 4 -- Monday through friday
AND HOUR(s.creationtimestamp) between 7 and 18 -- 7am to 6pm
GROUP BY dayname(s.creationtimestamp), HOUR(s.creationtimestamp)
order by dayofweek(s.creationtimestamp), hour asc;
To reiterate: The query works but I'm not sure if I'm aggregating correctly based on each weekday and hour block from the 1st of the year to now.
here's a fiddle:
http://sqlfiddle.com/#!9/7b6b72
Your totals are fine; to get averages you need one more level of aggregation. Compute the totals per calendar date and hour in a subquery, then average those totals per weekday and hour. I hope the query below helps:
select day,Hour,Avg(total_calls) as avg_total_calls,
Avg(total_missed) as avg_total_missed from
(
SELECT
date(s.creationtimestamp) as call_date, -- one row per calendar day and hour
dayname(s.creationtimestamp) as day, -- weekdays
HOUR(s.creationtimestamp) as Hour, -- Hours
sum(case when legtype1 = 2 then 1 else 0 end) as total_calls, -- total inbound
sum(case when legtype1 = 2 and answered = 0 then 1 else 0 end)as total_missed
FROM session s
WHERE (s.creationtimestamp >= '2018-01-01' AND creationtimestamp < now())
and WEEKDAY(s.creationtimestamp) BETWEEN 0 AND 4 -- Monday through friday
AND HOUR(s.creationtimestamp) between 7 and 18 -- 7am to 6pm
GROUP BY date(s.creationtimestamp), dayname(s.creationtimestamp), HOUR(s.creationtimestamp)
) as T group by day,Hour
order by day,Hour asc
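To make the two aggregation levels concrete, here is a minimal Python sketch of the same computation on the three sample Mondays from the question. The function name `hourly_averages` and the row layout are assumptions for illustration. With answered = 0 treated as missed (as the query does), the sample Mondays give 1, 4 and 3 missed calls, so the average is about 2.67; the 1.3 in the question appears to correspond to counting rows with answered = 1. Note that a date and hour with no calls at all contributes no row, so it is not averaged in as zero.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def hourly_averages(rows):
    """rows: (timestamp, legtype1, answered) tuples.

    Returns {(weekday_name, hour): (avg_calls, avg_missed)}.
    """
    # Level 1: totals per calendar date and hour (the subquery).
    per_day = defaultdict(lambda: [0, 0])
    for ts, legtype1, answered in rows:
        if legtype1 != 2:          # only inbound calls
            continue
        totals = per_day[(ts.date(), ts.hour)]
        totals[0] += 1             # total calls
        if answered == 0:
            totals[1] += 1         # missed calls

    # Level 2: average those per-date totals per weekday and hour.
    grouped = defaultdict(list)
    for (day, hour), totals in per_day.items():
        grouped[(DAY_NAMES[day.weekday()], hour)].append(totals)
    return {
        key: (mean(t[0] for t in totals), mean(t[1] for t in totals))
        for key, totals in grouped.items()
    }


# Sample rows from the question (three Mondays, 8-9 am).
sample = [
    (datetime(2018, 7, 23, 8, 15), 2, 0), (datetime(2018, 7, 23, 8, 25), 2, 1),
    (datetime(2018, 7, 23, 8, 35), 2, 1), (datetime(2018, 7, 30, 8, 15), 2, 0),
    (datetime(2018, 7, 30, 8, 25), 2, 0), (datetime(2018, 7, 30, 8, 35), 2, 0),
    (datetime(2018, 7, 30, 8, 45), 2, 1), (datetime(2018, 7, 30, 8, 55), 2, 0),
    (datetime(2018, 8, 6, 8, 15), 2, 0), (datetime(2018, 8, 6, 8, 25), 2, 1),
    (datetime(2018, 8, 6, 8, 35), 2, 0), (datetime(2018, 8, 6, 8, 45), 2, 0),
]

print(hourly_averages(sample))
```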
I'm having a bit of an issue with max(date) in SQL.
Basically, the problem is that I have to check whether the latest date entered per id is more than 1 day old and, if so, return that date.
id | user_id | send_date
8 | 90 | 2016-10-21 14:31:14
10 | 90 | 2016-10-25 09:56:28
11 | 18 | 2016-10-22 09:56:28
12 | 19 | 2016-10-21 09:56:28
13 | 19 | 2016-10-23 09:56:28
13 | 20 | 2016-10-25 09:56:28
This is part of a much longer SQL (just the part that I have a problem with):
SELECT max(h.send_date) as lastSent
FROM history h
WHERE (h.send_date < NOW() - INTERVAL 1 DAY);
What happens now is that, instead of returning a row only when the latest entered date is older than 1 day, the query returns the latest date that is older than 1 day, even if there's a newer entry in the table.
Does anyone have an idea how to change it so that the SQL only returns the latest date when it's older than 24h and is also the newest (by user) in the table (in the example, it would have to return nothing because there's an entry less than 24h old)?
Edited the table example a bit. This is what I need to get as a result (user_ids 90 and 20 get ignored because of 2016-10-25 09:56:28):
18 | 2016-10-22 09:56:28
19 | 2016-10-23 09:56:28
A condition on an aggregate function belongs in HAVING, not WHERE. To apply it per user, also group by user_id:
SELECT h.user_id, max(h.send_date) as lastSent
FROM history h
GROUP BY h.user_id
having max(h.send_date) < DATE_SUB(NOW(), INTERVAL 1 DAY);
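The difference between filtering before and after aggregation can be sketched in Python: first take the latest date per user, then keep only the users whose latest date is old enough. The function name `stale_last_sent` and the chosen "now" (2016-10-25 12:00) are assumptions for illustration; the rows are the sample data from the question.

```python
from datetime import datetime, timedelta


def stale_last_sent(rows, now, max_age=timedelta(days=1)):
    """rows: (user_id, send_date). Return {user_id: latest send_date}
    for users whose latest send_date is older than `max_age`."""
    # Aggregate first: the latest send_date per user (MAX ... GROUP BY).
    latest = {}
    for user_id, send_date in rows:
        if user_id not in latest or send_date > latest[user_id]:
            latest[user_id] = send_date
    # Filter afterwards on the aggregated value (the HAVING clause).
    cutoff = now - max_age
    return {user: sent for user, sent in latest.items() if sent < cutoff}


history = [
    (90, datetime(2016, 10, 21, 14, 31, 14)),
    (90, datetime(2016, 10, 25, 9, 56, 28)),
    (18, datetime(2016, 10, 22, 9, 56, 28)),
    (19, datetime(2016, 10, 21, 9, 56, 28)),
    (19, datetime(2016, 10, 23, 9, 56, 28)),
    (20, datetime(2016, 10, 25, 9, 56, 28)),
]

# "now" is an assumed moment shortly after the newest rows.
print(stale_last_sent(history, now=datetime(2016, 10, 25, 12, 0, 0)))
```

Filtering the rows before taking the maximum (the WHERE version) would instead keep user 90's older row and report it, which is the behaviour described in the question.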
For my website, I have a loyalty program where a customer gets some goodies if they've spent $100 within the last 30 days. A query like below:
SELECT u.username, SUM(total-shipcost) as tot
FROM orders o
LEFT JOIN users u
ON u.userident = o.user
WHERE shipped = 1
AND user = :user
AND date >= DATE(NOW() - INTERVAL 30 DAY)
Here :user is their user ID. Column 2 of the result gives how much the customer has spent in the last 30 days; if it's over 100, they get the bonus.
I want to display to the user which day they'll leave the loyalty program. Something like "x days until bonus expires", but how do I do this?
Take today's date, March 16th, and a user's order history:
id | tot | date
-----------------------
84 38 2016-03-05
76 21 2016-02-29
74 49 2016-02-20
61 42 2015-12-28
This user is part of the loyalty program now but leaves it on March 20th. What SQL could I do which returns how many days (4) a user has left on the loyalty program?
If the user then placed another order:
id | tot | date
-----------------------
87 12 2016-03-09
They're still in the loyalty program until the 20th, so the days remaining doesn't change in this instance, but if the total were 50 instead, then they instead leave the program on the 29th (so instead of 4 days it's 13 days remaining). For what it's worth, I care only about 30 days prior to the current date. No consideration for months with 28, 29, 31 days is needed.
Some create table code:
create table users (
userident int,
username varchar(100)
);
insert into users values
(1, 'Bob');
create table orders (
id int,
user int,
shipped int,
date date,
total decimal(6,2),
shipcost decimal(3,2)
);
insert into orders values
(84, 1, 1, '2016-03-05', 40.50, 2.50),
(76, 1, 1, '2016-02-29', 22.00, 1.00),
(74, 1, 1, '2016-02-20', 56.31, 7.31),
(61, 1, 1, '2015-12-28', 43.10, 1.10);
An example output of what I'm looking for is:
userident | username | days_left
--------------------------------
1 Bob 4
This is using March 16th as today for use with DATE(NOW()) to remain consistent with the previous bits of the question.
The following is roughly how to do what you want. Note that references to "30 days" are approximate; you may need "29 days" or "31 days" to get the exact date you want.
Retrieve the list of dates and amounts that are still active, i.e., within the last 30 days (as you did in your example), as a table (I'll call it Active) like the one you showed.
Join that new table (Active) with the original table, so that each row from Active is joined to all of the rows of the original table using the date fields. Compute a total of the amounts from the original table. The new table would have a Date field from Active and a Total field that is the sum of all the amounts in the joined records from the original table.
Select from the resulting table all records where the Total is greater than 100.00 and create a new table with Date and the minimum Total of those records.
Compute 30 days ahead from those dates to find the ending date of their loyalty program.
You would need to take the following steps (per user):
join the orders table with itself to calculate sums for different (bonus) starting dates, for any of the starting dates that are in the last 30 days
select from those records only those starting dates which yield a sum of 100 or more
select from those records only the one with the most recent starting date: this is the start of the bonus period for the selected user.
Here is a query to do that:
SELECT u.userident,
u.username,
MAX(base.date) AS bonus_start,
DATE(MAX(base.date) + INTERVAL 30 DAY) AS bonus_expiry,
30-DATEDIFF(NOW(), MAX(base.date)) AS bonus_days_left
FROM users u
LEFT JOIN (
SELECT o.user,
first.date AS date,
SUM(o.total-o.shipcost) as tot
FROM orders first
INNER JOIN orders o
ON o.user = first.user
AND o.shipped = 1
AND o.date >= first.date
WHERE first.shipped = 1
AND first.date >= DATE(NOW() - INTERVAL 30 DAY)
GROUP BY o.user,
first.date
HAVING SUM(o.total-o.shipcost) >= 100
) AS base
ON base.user = u.userident
GROUP BY u.username,
u.userident
Here is a fiddle.
With this input as orders:
+----+------+---------+------------+-------+----------+
| id | user | shipped | date | total | shipcost |
+----+------+---------+------------+-------+----------+
| 61 | 1 | 1 | 2015-12-28 | 42 | 0 |
| 74 | 1 | 1 | 2016-02-20 | 49 | 0 |
| 76 | 1 | 1 | 2016-02-29 | 21 | 0 |
| 84 | 1 | 1 | 2016-03-05 | 38 | 0 |
| 87 | 1 | 1 | 2016-03-09 | 50 | 0 |
+----+------+---------+------------+-------+----------+
The above query will return this output (when executed on 2016-03-20):
+-----------+----------+-------------+--------------+-----------------+
| userident | username | bonus_start | bonus_expiry | bonus_days_left |
+-----------+----------+-------------+--------------+-----------------+
| 1 | John | 2016-02-29 | 2016-03-30 | 10 |
+-----------+----------+-------------+--------------+-----------------+
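The three steps behind the query can also be sketched in Python for a single user. This is an illustration only; the function name `bonus_window` and the tuple layout are assumptions, the amounts are the ones from the input table above, and "today" is pinned to 2016-03-20 to match the output shown.

```python
from datetime import date, timedelta


def bonus_window(orders, today, threshold=100, window_days=30):
    """orders: (order_date, amount) for one user's shipped orders.

    Returns (bonus_start, bonus_expiry, days_left), or None if no
    starting date in the window reaches the threshold."""
    earliest = today - timedelta(days=window_days)
    candidates = []
    for start, _ in orders:
        if start < earliest:
            continue  # starting dates must lie within the last 30 days
        # Sum every order placed on or after this starting date (the self-join).
        total = sum(amount for placed, amount in orders if placed >= start)
        if total >= threshold:  # the HAVING clause
            candidates.append(start)
    if not candidates:
        return None
    bonus_start = max(candidates)  # most recent qualifying start
    expiry = bonus_start + timedelta(days=window_days)
    return bonus_start, expiry, (expiry - today).days


orders = [
    (date(2015, 12, 28), 42), (date(2016, 2, 20), 49), (date(2016, 2, 29), 21),
    (date(2016, 3, 5), 38), (date(2016, 3, 9), 50),
]

print(bonus_window(orders, today=date(2016, 3, 20)))
```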
Simple solution
Seeing how you do your first query, I guessed that when you are at the point where you look for the "expiration date", you already know that the user meets the 100 points over last 30 days. Then you can do this :
SELECT DATE_ADD(MIN(date),INTERVAL 30 DAY)
FROM orders o
WHERE shipped = 1
AND user = :user
AND date >= (DATE(NOW() - INTERVAL 30 DAY))
It takes the earliest order date of a user within the last 30 days and adds 30 days to it.
But that really is a poor design for what you want.
You would do better to think further ahead and implement what follows.
Advanced solution
In order to reproduce all the following solution, I have used the Fiddle that Trincot kindly built, and expanded it to test on more data : 4 users having 4 orders.
SQL FIddle http://sqlfiddle.com/#!9/668939/1
Step 1 : Design
The following query returns all the users who meet the loyalty program criteria, along with their earliest order date within the last 30 days, the loyalty program expiration date calculated from that date, and the number of days before it expires.
SELECT o.user, u.username, SUM(total-shipcost) as tot, MIN(date) AS mindate,
DATE_ADD(MIN(date),INTERVAL 30 DAY) AS expirationdate,
DATEDIFF(DATE_ADD(MIN(date),INTERVAL 30 DAY), DATE(NOW())) AS daysleft
FROM orders o
LEFT JOIN users u
ON u.userident = o.user
WHERE shipped = 1
AND date >= DATE(NOW() - INTERVAL 30 DAY)
GROUP BY user
HAVING tot >= 100;
Now, create a VIEW with the above query
CREATE VIEW loyalty_program AS
SELECT o.user, u.username, SUM(total-shipcost) as tot, MIN(date) AS mindate,
DATE_ADD(MIN(date),INTERVAL 30 DAY) AS expirationdate,
DATEDIFF(DATE_ADD(MIN(date),INTERVAL 30 DAY), DATE(NOW())) AS daysleft
FROM orders o
LEFT JOIN users u
ON u.userident = o.user
WHERE shipped = 1
AND date >= DATE(NOW() - INTERVAL 30 DAY)
GROUP BY user
HAVING tot >= 100;
Note that creating the view is a one-time action on your database.
Step 2 : Use your new VIEW
Once you have the view, you can easily get the "state" of the loyalty program for all users:
SELECT * FROM loyalty_program
user | username | tot | mindate | expirationdate | daysleft
1 | John | 153 | February, 28 2016 | March, 29 2016 | 9
2 | Joe | 112 | February, 24 2016 | March, 25 2016 | 5
3 | Jack | 474 | February, 23 2016 | March, 24 2016 | 4
4 | Averel | 115 | February, 22 2016 | March, 23 2016 | 3
For a specific user, you can get the date you are looking for like this:
SELECT expirationdate FROM loyalty_program WHERE username='Joe'
You can also request all the users for which the expiration date is today
SELECT user FROM loyalty_program WHERE expirationdate=DATE(NOW())
But there are other easy possibilities that you'll discover after having played with your VIEW.
Conclusion
Make your life easier: learn to use VIEWS !
I am assuming your table looks like this:
user | id | total | date
-------------------------------
12 84 38 2016-03-05
12 76 21 2016-02-29
23 74 49 2016-02-20
23 61 42 2015-12-28
then try this:
SELECT x.user, x.date, x.id, x.cum_sum, d.start_date, DATEDIFF(NOW(), x.date)
FROM (
    SELECT a.user, a.id, a.date, a.total,
        (SELECT SUM(b.total) FROM order_table b WHERE b.date <= a.date AND a.user = b.user ORDER BY b.user, b.id DESC) AS cum_sum
    FROM order_table a
    WHERE a.date >= DATE(NOW() - INTERVAL 30 DAY)
    ORDER BY a.user, a.id DESC
) AS x
LEFT JOIN (
    SELECT c.user, c.date AS start_date, c.id
    FROM (
        SELECT a.user, a.id, a.date, a.total,
            (SELECT SUM(b.total) FROM order_table b WHERE b.date <= a.date AND a.user = b.user ORDER BY b.user, b.id DESC) AS cum_sum
        FROM order_table a
        WHERE a.date >= DATE(NOW() - INTERVAL 30 DAY)
        ORDER BY a.user, a.id DESC
    ) AS c
    WHERE FLOOR(c.cum_sum/100) = MIN(FLOOR(c.cum_sum/100)) AND MOD(c.cum_sum,100) = MAX(MOD(c.cum_sum,100))
    GROUP BY CONCAT(c.user, "_", c.id)
) AS d ON CONCAT(x.user, "_", x.id) = CONCAT(d.user, "_", d.id)
WHERE x.date = d.start_date;
You will get a table something like this:
user | Date | cum_sum | start_date | Time_left
----------------------------------------------------
12 2016-03-05 423 2016-03-05 24
13 2016-02-29 525 2016-02-29 12
23 2016-02-20 944 2016-02-20 3
29 2015-12-28 154 2015-12-28 4
I have not tested this. What I am trying to do is create a table in descending order of id and user, with a cumulative total column alongside it. From that table I build another one holding, for each user, the relevant date (i.e. the date from which the date difference is to be calculated). I then left join these two tables with the condition that the dates match. I have included both start_date and date in the output to check whether the query is working.
Also, this is not the most efficient way of writing this code, but I have tried to stay as safe as possible by using subqueries, since I did not have the data to test this. Let me know if you get any errors.
This question has been asked before but I am facing a slightly different problem.
I have a table which logs events and stores their timestamps (as datetime). I need to be able to break up time into chunks and get number of events that occurred in that interval. The interval can be custom (Say from 5 minutes to 1 hour and even beyond).
The obvious solution is to convert the datetime to a Unix timestamp, divide it by the number of seconds in the interval, take the floor, and multiply it back by the number of seconds. Finally, convert the Unix timestamp back to the datetime format.
This works fine for small intervals.
select
from_unixtime(floor(unix_timestamp(event.timestamp)/300)*300) as start_time,
count(*) as total
from event
where timestamp>='2012-08-03 00:00:00'
group by start_time;
This gives the correct output
+---------------------+-------+
| start_time | total |
+---------------------+-------+
| 2012-08-03 00:00:00 | 11 |
| 2012-08-03 00:05:00 | 4 |
| 2012-08-03 00:10:00 | 4 |
| 2012-08-03 00:15:00 | 7 |
| 2012-08-03 00:20:00 | 8 |
| 2012-08-03 00:25:00 | 1 |
| 2012-08-03 00:30:00 | 1 |
| 2012-08-03 00:35:00 | 3 |
| 2012-08-03 00:40:00 | 3 |
| 2012-08-03 00:45:00 | 5 |
~~~~~OUTPUT SNIPPED~~~~~~~~~~~~
But if I increase the interval to say 1 hour (3600 sec)
mysql> select from_unixtime(floor(unix_timestamp(event.timestamp)/3600)*3600) as start_time, count(*) as total from event where timestamp>='2012-08-03 00:00:00' group by start_time;
+---------------------+-------+
| start_time | total |
+---------------------+-------+
| 2012-08-02 23:30:00 | 35 |
| 2012-08-03 00:30:00 | 30 |
| 2012-08-03 01:30:00 | 12 |
| 2012-08-03 02:30:00 | 18 |
| 2012-08-03 03:30:00 | 12 |
| 2012-08-03 04:30:00 | 4 |
| 2012-08-03 05:30:00 | 3 |
| 2012-08-03 06:30:00 | 13 |
| 2012-08-03 07:30:00 | 269 |
| 2012-08-03 08:30:00 | 681 |
| 2012-08-03 09:30:00 | 1523 |
| 2012-08-03 10:30:00 | 911 |
+---------------------+-------+
The reason, as far as I could gauge, for the boundaries not being set properly is that unix_timestamp will convert time from my local timezone (GMT + 0530) to UTC and then output the numerical value.
So a value like 2012-08-03 00:00:00 will actually be 2012-08-02 18:30:00. Dividing and using floor will set the minutes part to 00. But when I use from_unixtime, it will convert it back to GMT + 0530 and hence give me intervals that begin at 30 mins.
How do I ensure the query works correctly irrespective of the timezone? I use MySQL 5.1.52 so to_seconds() is not available
EDIT:
The query should also fire correctly irrespective of the interval (can be hours, minutes, days). A generic solution would be appreciated
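The effect described in the question can be reproduced in a few lines of Python. This sketch assumes a fixed GMT+05:30 offset (named `IST` here, an assumption for the example) and a hypothetical helper `floor_epoch`; it floors the epoch value exactly as the query does and converts the result back to local time.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+05:30 offset, standing in for the server's local timezone.
IST = timezone(timedelta(hours=5, minutes=30))


def floor_epoch(local_dt, interval_seconds):
    """Floor a timezone-aware datetime on the epoch (UTC) scale and
    convert the result back to the original timezone."""
    epoch = int(local_dt.timestamp())
    floored = epoch // interval_seconds * interval_seconds
    return datetime.fromtimestamp(floored, tz=local_dt.tzinfo)


event = datetime(2012, 8, 3, 0, 10, tzinfo=IST)
print(floor_epoch(event, 300))   # 5-minute buckets line up with local time
print(floor_epoch(event, 3600))  # 1-hour buckets start at :30 local time
```

The 5-minute case works because the 19800-second offset is a multiple of 300; it is not a multiple of 3600, so hourly buckets are shifted by the remaining 1800 seconds.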
You can use TIMESTAMPDIFF to group by intervals of time:
For a specified interval of hours, you can use:
SELECT '2012-08-03 00:00:00' +
INTERVAL FLOOR(TIMESTAMPDIFF(HOUR, '2012-08-03 00:00:00', timestamp) / <n>) * <n> HOUR AS start_time,
COUNT(*) AS total
FROM event
WHERE timestamp >= '2012-08-03 00:00:00'
GROUP BY start_time
Replace the occurrences of 2012-08-03 00:00:00 with your minimum input date.
<n> is your specified interval in hours (every 2 hours, 3 hours, etc.), and you can do the same for minutes:
SELECT '2012-08-03 00:00:00' +
INTERVAL FLOOR(TIMESTAMPDIFF(MINUTE, '2012-08-03 00:00:00', timestamp) / <n>) * <n> MINUTE AS start_time,
COUNT(*) AS total
FROM event
WHERE timestamp >= '2012-08-03 00:00:00'
GROUP BY start_time
Where <n> is your specified interval in minutes (every 45 minutes, 90 minutes, etc).
Be sure you're passing in your minimum input date (in this example 2012-08-03 00:00:00) as the second parameter to TIMESTAMPDIFF.
EDIT: If you don't want to worry about which interval unit to pick in the TIMESTAMPDIFF function, then of course just do the interval by seconds (300 = 5 minutes, 3600 = 1 hour, 7200 = 2 hours, etc.)
SELECT '2012-08-03 00:00:00' +
INTERVAL FLOOR(TIMESTAMPDIFF(SECOND, '2012-08-03 00:00:00', timestamp) / <n>) * <n> SECOND AS start_time,
COUNT(*) AS total
FROM event
WHERE timestamp >= '2012-08-03 00:00:00'
GROUP BY start_time
EDIT2: To address your comment pertaining to reducing the number of areas in the statement where you have to pass in your minimum parameter date, you can use:
SELECT b.mindate +
INTERVAL FLOOR(TIMESTAMPDIFF(SECOND, b.mindate, timestamp) / <n>) * <n> SECOND AS start_time,
COUNT(*) AS total
FROM event
JOIN (SELECT '2012-08-03 00:00:00' AS mindate) b ON timestamp >= b.mindate
GROUP BY start_time
And simply pass in your minimum datetime parameter once into the join subselect.
You can even make a second column in the join subselect for your seconds interval (e.g. 3600) and name the column something like secinterval... then change the <n>'s to b.secinterval, so you only have to pass in your minimum date parameter AND interval one time each.
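The arithmetic behind the TIMESTAMPDIFF approach is to measure elapsed time from a chosen anchor and floor that difference, rather than flooring the epoch value. A minimal Python sketch, with the hypothetical helper name `bucket_start` and naive (timezone-free) datetimes assumed:

```python
from datetime import datetime, timedelta


def bucket_start(ts, anchor, interval_seconds):
    """Start of the interval containing `ts`, with intervals counted from
    `anchor` (the minimum input date). Assumes ts >= anchor."""
    elapsed = int((ts - anchor).total_seconds())  # TIMESTAMPDIFF(SECOND, anchor, ts)
    whole = elapsed // interval_seconds * interval_seconds
    return anchor + timedelta(seconds=whole)


anchor = datetime(2012, 8, 3, 0, 0, 0)
event = datetime(2012, 8, 3, 1, 47, 12)
print(bucket_start(event, anchor, 300))   # 5-minute bucket
print(bucket_start(event, anchor, 3600))  # 1-hour bucket
```

Because both values are in the same local representation, no timezone conversion takes place, and the bucket boundaries are aligned to the anchor for any interval length.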
SQLFiddle Demo
An easier method would be:
Method1
select date(timestamp) as date_timestamp, hour(timestamp) as hour_timestamp, count(*) as total
from event
where timestamp>='2012-08-03 00:00:00'
group by date_timestamp, hour_timestamp
If you would like to keep your original approach, use Method2.
Method2
select from_unixtime(floor((unix_timestamp(event.timestamp)-1800)/3600)*3600+1800) as start_time,
count(*) as total
from event
where timestamp>='2012-08-03 00:00:00'
group by start_time;
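The 1800-second correction in Method2 only makes sense when it is applied to the epoch value, and it is specific to a timezone whose offset ends in a half hour. A small Python check of the arithmetic, assuming a fixed GMT+05:30 offset (`IST`) and a hypothetical helper `local_hour_start`:

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+05:30 offset, standing in for the server's local timezone.
IST = timezone(timedelta(hours=5, minutes=30))


def local_hour_start(local_dt, offset_seconds=1800):
    """Shift the epoch by the half-hour remainder of the UTC offset,
    floor to the hour, then shift back."""
    epoch = int(local_dt.timestamp())
    floored = (epoch - offset_seconds) // 3600 * 3600 + offset_seconds
    return datetime.fromtimestamp(floored, tz=local_dt.tzinfo)


print(local_hour_start(datetime(2012, 8, 3, 0, 10, tzinfo=IST)))
print(local_hour_start(datetime(2012, 8, 3, 9, 59, 59, tzinfo=IST)))
```

For a different timezone or a different interval the constant would have to change, which is why the anchor-based approach is more general.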
EDIT1
The first method also lets the user choose a different interval.
For example, to group the log by 15 minutes:
select date(timestamp) as date_timestamp,
hour(timestamp) as hour_timestamp,
floor(minute(timestamp) / 15) * 15 as minute_timestamp,
count(*) as total
from event
group by date_timestamp, hour_timestamp, minute_timestamp
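The grouping key used above (date, hour, minute rounded down to a multiple of the interval) can be sketched in Python as follows. The function name `count_by_minutes` and the sample events are assumptions made up for the illustration.

```python
from collections import Counter
from datetime import datetime


def count_by_minutes(timestamps, minutes=15):
    """Count events per (date, hour, minute-bucket), where the minute is
    rounded down to a multiple of `minutes`."""
    counts = Counter()
    for ts in timestamps:
        # Same key as: date(ts), hour(ts), floor(minute(ts) / 15) * 15
        key = (ts.date(), ts.hour, ts.minute // minutes * minutes)
        counts[key] += 1
    return dict(counts)


# Hypothetical events, not taken from the question's table.
events = [
    datetime(2012, 8, 3, 0, 2), datetime(2012, 8, 3, 0, 14),
    datetime(2012, 8, 3, 0, 15), datetime(2012, 8, 3, 0, 59),
    datetime(2012, 8, 3, 1, 30),
]
print(count_by_minutes(events))
```

Since the buckets restart every hour, this gives evenly sized intervals only when the interval divides 60 (5, 10, 15, 20, 30 minutes); with 45 minutes, for instance, each hour would be split into a 45-minute and a 15-minute bucket.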