How to get a rolling data set by week with sql - mysql

I had a SQL query that I would run to get a rolling sum (or moving window) data set. I would run it once per 7-day window, increasing the interval numbers by 7 each time (28 in the example below) until I reached the start of the data. It gave me the data split by week, so I could loop through it in the view to create a weekly graph.
SELECT *
FROM `table`
WHERE `row_date` >= DATE_SUB(NOW(), INTERVAL 28 DAY)
AND `row_date` <= DATE_SUB(NOW(), INTERVAL 21 DAY)
This is of course very slow once you have several weeks worth of data. I wanted to replace it with a single query. I came up with this.
SELECT *,
CONCAT(YEAR(row_date), '/', WEEK(row_date)) as week_date
FROM `table`
GROUP BY week_date
ORDER BY row_date DESC
It appeared mostly accurate, except I noticed the current week and the last week of 2015 was much lower than usual. That's because this query gets a week starting on Sunday (or Monday?) meaning that it resets weekly.
Here's a data set of employees that you can use to demonstrate the behavior.
CREATE TABLE employees (
id INT NOT NULL,
first_name VARCHAR(14) NOT NULL,
last_name VARCHAR(16) NOT NULL,
row_date DATE NOT NULL,
PRIMARY KEY (id)
);
INSERT INTO `employees` VALUES
(1,'Bezalel','Simmel','2016-12-25'),
(2,'Bezalel','Simmel','2016-12-31'),
(3,'Bezalel','Simmel','2017-01-01'),
(4,'Bezalel','Simmel','2017-01-05');
This data will return the last 3 rows on the same data point on the old query (last 7 days) assuming you run it today 2017-01-06, but only the last 2 rows on the same data point on the new query (Sunday to Saturday).
For more information on what I mean by rolling or moving window, see this English stack exchange link.
https://english.stackexchange.com/questions/362791/word-for-graph-that-counts-backwards-vs-graph-that-counts-forwards
How can I write a query in MySQL that will bring me rolling data, where the last data point is the last 7 days of data, the previous point is the previous 7 days, and so on?

I've had to interpret your question a lot so this answer might be unsuitable. It sounds like you are trying to get a graph showing data historically grouped into 7-day periods. Your current attempt does this by grouping on calendar week instead of by 7-day period leading to inconsistent size of periods.
So using a modification of your dataset on sql fiddle ( http://sqlfiddle.com/#!9/90f1f2 ) I have come up with this
SELECT
-- Figure out how many periods of 7 days ago this record applies to
FLOOR( DATEDIFF( CURRENT_DATE , row_date ) / 7 ) AS weeks_ago,
-- Count the number of ids in this group
COUNT( DISTINCT id ) AS number_in_week,
-- Because this is grouped, make sure to have some consistency on what we select instead of leaving it to chance
MIN( row_date ) AS min_date_in_week_in_dataset
FROM `sample_data`
-- Groups by weeks ago because that's what you are interested in
GROUP BY weeks_ago
ORDER BY
min_date_in_week_in_dataset DESC;
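To see what the FLOOR( DATEDIFF( ... ) / 7 ) expression does, here is a minimal Python sketch of the same bucketing applied to the four sample rows from the question. The function name weeks_ago and the pinned "today" of 2017-01-06 are assumptions for illustration; this is not part of the SQL answer itself.

```python
from collections import Counter
from datetime import date


def weeks_ago(row_date: date, today: date) -> int:
    """Mirror FLOOR(DATEDIFF(CURRENT_DATE, row_date) / 7)."""
    return (today - row_date).days // 7


# Sample rows from the question; "today" is pinned to 2017-01-06 as in the question.
rows = [date(2016, 12, 25), date(2016, 12, 31), date(2017, 1, 1), date(2017, 1, 5)]
today = date(2017, 1, 6)

# Count rows per rolling 7-day bucket (0 = the most recent 7 days).
counts = Counter(weeks_ago(d, today) for d in rows)
for bucket in sorted(counts):
    print(bucket, counts[bucket])
```

The last three rows land in bucket 0 and the first row in bucket 1, which is the rolling behavior the question asks for rather than the Sunday-to-Saturday split.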

Related

How to query available item leases based on a date range in MySQL?

We have a business that rents out international phone numbers to customers when traveling. When a customer places an order, we want to display the phone numbers that are available for their booking dates, based on their start_date and end_date, showing only numbers that are not already occupied.
Since these phone numbers are rented out, I need to select from the table ONLY those numbers that are not rented out yet for dates that would interfere with the current customers dates.
I also don't want to rent out any phone number prior to 7 days after its end date. Meaning, If a customer booked a phone number for 1-1-2020 through 1-20-2020, I don't want this phone number to be booked by another customer before 1-27-2020. I want the phone number to have a 7 day window of being clear.
I have a table with the phone numbers and a table with the orders that is related to the phone numbers table via phone_number_id. The orders table has the current customers start_date and end_date for travel without the phone number id saved yet to it. The orders table also has the start_date and end_date for all other customers dates of travel as well as which phone_number_id was assigned/booked up for their travel dates.
What would the MySQL query look like to select the phone numbers that are available for the current customer's dates?
This is the query I have built so far:
SELECT x.id
, x.area_code
, x.phone_number
, y.start_date
, y.end_date
FROM vir_num_table x
LEFT
JOIN orderitemsdetail_table y
ON y.vn_id = x.id
WHERE y.start_date BETWEEN '2020-01-11' AND '2020-01-18'
OR y.start_date IS NULL
I've built this query, but I'm stuck on how to add the end_date logic.
Any help would be appreciated! Thanks in advance.
The way I'd approach the problem conceptually is as a cross product of the set of all phone numbers with the proposed reservation timeframe, and then exclude those numbers where there's a conflicting reservation.
A conflict is an overlap: an existing reservation that has a start_date on or before the end of the proposed reservation AND an end_date on or after the start of the proposed reservation.
I'd do an anti-join pattern, something like this:
SELECT pn.phone_number
FROM phone_number pn
LEFT
JOIN reservation rs
ON rs.phone_number = pn.phone_number
AND rs.start_dt <= '2019-12-27' + INTERVAL +7 DAY
AND rs.end_dt > '2019-12-20' + INTERVAL -7 DAY
WHERE rs.phone_number IS NULL
That essentially says get all rows from phone number, along with matching rows from reservations (rows that overlap), but then exclude all the rows that had a match, leaving just phone_number rows that did not have a match.
We can switch a strict comparison to a non-strict one (for example > to >=), or subtract 8 days instead of 7, to tailor the "7 day" window; we can tweak this as we run the query through the test cases.
We can achieve an equivalent result using NOT EXISTS and a correlated subquery. Some people find this easier to comprehend than the anti-join, but it's essentially the same query doing the same thing: get all rows from phone_number, but exclude the rows where there is a matching (overlapping) row in reservation.
SELECT pn.phone_number
FROM phone_number pn
WHERE NOT EXISTS
( SELECT 1
FROM reservation rs
WHERE rs.phone_number = pn.phone_number
AND rs.start_dt <= '2019-12-27' + INTERVAL +7 DAY
AND rs.end_dt > '2019-12-20' + INTERVAL -7 DAY
)
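The overlap test in the two queries above can be sketched outside the database as well. This is a minimal Python illustration under stated assumptions: the function names conflicts and available_numbers, the tuple layout of the reservations, and the sample phone numbers are all hypothetical, and the comparison operators simply mirror the <= and > used in the queries.

```python
from datetime import date, timedelta


def conflicts(res_start, res_end, new_start, new_end, guard_days=7):
    """True if an existing reservation overlaps the proposed one,
    with the proposed range widened by guard_days on both sides."""
    guard = timedelta(days=guard_days)
    return res_start <= new_end + guard and res_end > new_start - guard


def available_numbers(numbers, reservations, new_start, new_end):
    """Anti-join: keep only numbers that have no conflicting reservation.

    reservations is a list of (phone_number, start_date, end_date) tuples.
    """
    blocked = {
        number
        for number, start, end in reservations
        if conflicts(start, end, new_start, new_end)
    }
    return [number for number in numbers if number not in blocked]


# Hypothetical data: number "A" is booked 2020-01-01 through 2020-01-20.
numbers = ["A", "B"]
reservations = [("A", date(2020, 1, 1), date(2020, 1, 20))]

print(available_numbers(numbers, reservations, date(2020, 1, 26), date(2020, 1, 30)))
print(available_numbers(numbers, reservations, date(2020, 1, 27), date(2020, 1, 30)))
```

With these operators, number "A" is blocked for a booking starting 2020-01-26 and becomes available again for one starting 2020-01-27, matching the example in the question.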
There are several questions on StackOverflow about checking for overlap, or no overlap, of date ranges.
See e.g.
How to check if two date ranges overlap in mysql?
PHP/SQL - How can I check if a date user input is in between an existing date range in my database?
MySQL query to select distinct rows based on date range overlapping
EDIT
Based on the SQL added as an edit to the question, I'd do the query like this:
SELECT pn.`id`
, pn.`area_code`
, pn.`phone_number`
FROM `vir_num_table` pn
LEFT
JOIN `orderitemsdetail_table` rs
ON rs.vn_id = pn.id
AND rs.start_date <= '2020-01-18' + INTERVAL +7 DAY
AND rs.end_date > '2020-01-11' + INTERVAL -7 DAY
WHERE rs.vn_id IS NULL
There are two "tricky" parts. The first is the anti-join and understanding how it works: an outer join returns all rows from vir_num_table, and the WHERE clause then excludes any rows that have a matching row in the reservations table (orderitemsdetail_table here). The second is checking for the overlap, coming up with the conditions r.start <= p.end AND r.end >= p.start, then deciding whether equality counts as an overlap and adding the extra seven days (easiest to me is to just subtract the 7 days from the beginning of the proposed reservation).
... now occurs to me like we need to add a guard period of 7 days on the end of the reservation period as well, doh!
Here's a query plus sorting algo to choose the optimal phone number selection for maximum utilization efficiency (i.e. getting as close as possible to exactly 7 days before and after each use).
I set it to give open ends a weight of 9, so that "near perfect" fits (7-8 days before or after) would be selected ahead of open-ended numbers. This will yield a slight efficiency improvement, as open numbers can accommodate any reservation. You can adjust this for your needs. If you set this to 0, for example, it would always select open numbers first.
SELECT ph.phone_number,
COALESCE(
MIN(
IF(res.start_date > '2020-01-18',
NULL, -- ignore before-comparison for reservations starting and ending after date range
DATEDIFF('2020-01-11', res.end_date))
), 9) AS open_days_before,
COALESCE(
MIN(
IF(res.end_date < '2020-01-11',
NULL, -- ignore after-comparison for reservations starting and ending before date range
DATEDIFF(res.start_date, '2020-01-18'))
), 9) AS open_days_after
FROM phone_number ph
LEFT JOIN reservation res
ON res.phone_number = ph.phone_number
AND res.end_date >= CURRENT_DATE() - INTERVAL 6 DAY
GROUP BY ph.phone_number
HAVING open_days_before >= 7
AND open_days_after >= 7
ORDER BY open_days_before + open_days_after
LIMIT 1
Edit: updated to add grouping, because I realize this is an aggregate problem.
Edit 2: bug fix, changed MAX to MIN
Edit 3: added res.end_date >= CURRENT_DATE - INTERVAL 6 DAY to ignore past reservations, limiting aggregate data and treating phone number with no reservations between 6 days ago and the beginning of the new order as "open on the front-end"
Edit 4: added IF conditions to eliminate reservations outside the given before-or-after comparison ranges (e.g. comparing reservations after the selected range from influencing the "open days before" number), to prevent negative numbers, except when there's overlap with the selected range.
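The selection rule described in this answer (smallest total gap wins, open ends weighted 9, at least 7 clear days on each side) can be sketched in Python as follows. This is only an illustration of the described intent: the names score and best_number, the dictionary layout, and the sample numbers are hypothetical, and the filter that ignores reservations older than 6 days is left out for brevity.

```python
from datetime import date

OPEN_WEIGHT = 9  # weight given to an open end, as in the answer


def score(reservations, new_start, new_end):
    """Return (open_days_before, open_days_after) for one phone number.

    reservations is a list of (start_date, end_date) tuples.
    """
    # Reservations lying entirely after the new range do not affect "before".
    before = [(new_start - end).days for start, end in reservations if start <= new_end]
    # Reservations lying entirely before the new range do not affect "after".
    after = [(start - new_end).days for start, end in reservations if end >= new_start]
    return min(before, default=OPEN_WEIGHT), min(after, default=OPEN_WEIGHT)


def best_number(bookings, new_start, new_end, min_gap=7):
    """Pick the number with the smallest total gap among those with enough clearance."""
    candidates = []
    for number, reservations in bookings.items():
        gap_before, gap_after = score(reservations, new_start, new_end)
        if gap_before >= min_gap and gap_after >= min_gap:
            candidates.append((gap_before + gap_after, number))
    return min(candidates)[1] if candidates else None


# Hypothetical bookings for a new order running 2020-01-11 through 2020-01-18.
bookings = {
    "111": [(date(2020, 1, 1), date(2020, 1, 4))],  # ends exactly 7 days before
    "222": [],                                      # fully open
    "333": [(date(2020, 1, 1), date(2020, 1, 9))],  # ends only 2 days before
}
print(best_number(bookings, date(2020, 1, 11), date(2020, 1, 18)))
```

Here "111" scores 7 + 9 = 16 and is chosen ahead of the fully open "222" (9 + 9 = 18), while "333" is rejected for having only 2 clear days before the order.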
Based on the info you've added, you shouldn't need to check the start date of phone numbers which have been booked out.
Your customer provides you with a start date and an end date.
You only rent out phone numbers 7 days after their last lease ended.
All you need to do is fetch back phone numbers which either:
- Are not rented out and therefore aren't in the orderitems table
- OR have an end_date which is at least 7 days before the new customer's start date.
Here you go:
SELECT
`main_table`.`id`,
`main_table`.`area_code`,
`main_table`.`phone_number`,
`orderitemsdetail_table`.`start_date`,
`orderitemsdetail_table`.`end_date`
FROM
`vir_num_table` AS `main_table`
LEFT JOIN
`orderitemsdetail_table` AS `orderitemsdetail_table` ON main_table.id = orderitemsdetail_table.vn_id
WHERE
(DATE_ADD(orderitemsdetail_table.end_date, INTERVAL 7 DAY) < '<CUSTOMER START DATE>'
OR orderitemsdetail_table.start_date > '<CUSTOMER END DATE>')
OR orderitemsdetail_table.id IS NULL

SQL Statement Database

I have a Mysql Table that holds dates that are booked (for certain holiday properties).
Example...
Table "listing_availability"
Rows...
availability_date (this shows the date format 2013-04-20 etc)
availability_bookable (This can be yes/no. "Yes" = the booking changeover day and it is "available". "No" means the property is booked for those dates)
All the other dates in the year (apart from the ones with "No") are available to be booked. These dates are not in the database, only the booked dates.
My question is...
I have to make a SQL Statement that first calls the Get Date Function (not sure if this is correct terminology)
Then removes the dates from "availability_date" WHERE "availability_bookable" = "No"
This will give me the dates that are available for bookings, for the year, for a property.
Can anyone help?
Regards M
Seems like you've almost written the query.
SELECT availability_date FROM listing_availability
WHERE availability_bookable <> 'NO'
AND availability_date >= CURDATE()
AND YEAR(CURDATE()) = YEAR(availability_date)
I think I understand, and you'll obviously confirm. Your "listing_availability" table has some records in it, but not for every single day of the year, only for dates that have had some activity; not all are committed, so some could have "Yes" and some "No".
So, you want to simulate all dates within a given date range, say April 1 - July 1, as someone is looking to book a party within that time period. Without pre-filling your production table, you can't say that April 27th is open and available, since no such record exists.
To SIMULATE a calendar of days for a date range, you can use MySQL user variables and join to "any" table in your database, provided it has enough records to cover the date range you want...
select
@myDate := DATE_ADD( @myDate, INTERVAL 1 DAY ) as DatesForAvailabilityCheck
from
( select @myDate := '2013-03-31' ) as SQLVars,
AnyTableThatHasEnoughRows
limit
120;
This will just give you a list of dates starting with April 1, 2013 (the original @myDate is 1 day before the start date, since the field selection adds 1 day to it to get to April 1, then continues) for a limit of 120 days (or whatever range you are looking for: 30 days, 60, 90, 22, whatever). The "AnyTableThatHasEnoughRows" could actually be your "listing_availability" table, but we are just using it as a table with rows, with no join or where condition, just enough to get 120 records.
Now, we can use this to join to whatever table you want and apply your condition. You just created a full calendar of days to compare against. Your final query may be different, but this should get it most of the way for you.
select
JustDates.DatesForAvailabilityCheck
from
( select
@myDate := DATE_ADD( @myDate, INTERVAL 1 DAY ) as DatesForAvailabilityCheck
from
( select @myDate := '2013-03-31' ) as SQLVars,
listing_availability
limit
120 ) JustDates
LEFT JOIN listing_availability
on JustDates.DatesForAvailabilityCheck = listing_availability.availability_date
where
listing_availability.availability_date IS NULL
OR listing_availability.availability_bookable = "Yes"
So the above uses the sample calendar and looks to the availability. If no such matching date exists (via the IS NULL), then you want it meaning there is no conflict. However, if there IS a record in the table, you only want those where YES, you CAN book it, the entry on file might not be committed and CAN be in your result query of available dates.
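The same idea (build a full calendar, then keep the days that are either missing from the table or marked "Yes") can be sketched in Python. The function name available_dates and the dictionary standing in for the table rows are assumptions for illustration only.

```python
from datetime import date, timedelta


def available_dates(start, days, table_rows):
    """Return the dates in [start, start + days) that can be booked.

    table_rows maps availability_date to availability_bookable ("Yes" or "No");
    dates absent from the mapping have no record and count as available.
    """
    calendar = [start + timedelta(days=offset) for offset in range(days)]
    # Mirrors: availability_date IS NULL OR availability_bookable = "Yes"
    return [day for day in calendar if table_rows.get(day) in (None, "Yes")]


# Hypothetical rows: April 2 is booked, April 3 is a bookable changeover day.
table_rows = {date(2013, 4, 2): "No", date(2013, 4, 3): "Yes"}
for day in available_dates(date(2013, 4, 1), 5, table_rows):
    print(day)
```

Only April 2 is dropped from the five-day calendar, because it is the only date with a "No" record.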

Mysql maximum rows in a variable timeframe

I'm making a fitness logbook where indoor rowers can log their results.
To make it interesting and motivating I'm implementing an achievement system.
I like to have an achievement that if someone rows more than 90 times within 24 weeks they get that achievement.
Does anybody have some hints on how I can implement this in MySQL?
The mysql-table for the logbook is pretty straightforward: id, userid, date (timestamp),etc (rest is omitted because it doesn't really matter)
The gist is that the first row date and the last one can't be more than 24 weeks apart.
I assume from your application that you want the most recent 24 weeks.
In mysql, you do this as:
select lb.userid
from logbook lb
where datediff(now(), lb.date) <= 7*24
group by userid
having count(*) >= 90
If you need it for an arbitrary 24-week period, can you modify the question?
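For the arbitrary-period reading of the question (any 24-week span, not just the most recent one), one possible approach is a sliding window over a single user's sorted log dates. The sketch below is in Python rather than SQL, and the function name has_achievement and its parameters are hypothetical; the threshold of 90 follows the answers here, so raise it to 91 if "more than 90" is meant strictly.

```python
from datetime import date, timedelta


def has_achievement(dates, needed=90, window=timedelta(weeks=24)):
    """True if any span of at most `window` contains at least `needed` entries."""
    ordered = sorted(dates)
    left = 0
    for right, current in enumerate(ordered):
        # Shrink the window until first and last entry are within `window`.
        while current - ordered[left] > window:
            left += 1
        if right - left + 1 >= needed:
            return True
    return False


base = date(2020, 1, 1)
daily = [base + timedelta(days=i) for i in range(90)]              # 90 sessions in 90 days
every_other = [base + timedelta(days=2 * i) for i in range(90)]    # 90 sessions in 179 days

print(has_achievement(daily))
print(has_achievement(every_other))
```

The daily rower qualifies, while the every-other-day rower never fits more than 85 sessions into any 168-day span and does not.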
Just do a sql query to count the number of rows a user has between now and 24 weeks ago. This is a pretty straight forward query to run.
Look at using something with datediff in mysql to get the difference between now and 24 weeks ago.
After you have a script set up to do this, set up a cron job to run either every day or every week and do some automation on this.
I think you should create a table achievers which you populate with the achievers of each day.
You can set a recurrent(daily, right before midnight) event in which you run a query like this:
delete from achievers;
insert into achievers (userid)
select userid
from logbook
where date < CURRENT_TIMESTAMP and date > CURRENT_TIMESTAMP - INTERVAL 24 WEEK
group by userid
having count(*) >= 90;
For events in mysql: http://dev.mysql.com/doc/refman/5.1/en/events-overview.html
This query will give you the list of users total activity in 24 weeks
select userid, count(id) as total from logbook where `date` BETWEEN DATE_SUB( CURDATE( ) ,INTERVAL 168 DAY ) AND CURDATE( ) group by userid having count(id) >= 90

MySQL - Selecting closest row to a certain time on each day of a month

Say I have a table with two columns
TimeStamp of type TIMESTAMP
A of type FLOAT
This table is created and updated by an external application, so inserts and updates are outside of my control. The table design can't be altered in any way.
What I need to do is select each entry closest to and before 10AM for each day during the entire past month.
Thanks in advance.
The inner pre-query should restrict the rows to the month prior to the one you are currently in. This is driven by a SQL variable holding the date formatted as 'YYYY-MM-01': for today, 2012-03-19, keep just the year/month but force the day to 01. This also implies a time of 12:00:00 am (midnight). The NEXT @ variable determines the first of the month PRIOR to the one just computed, thus 2012-02-01. Together they build the bounds for the WHERE clause queried against your table of timestamp/float values.
Now, you can get the maximum time, grouped by just the common date portion of the timestamp, but retaining the full actual date AND time of the entry where the HOUR() of the entry is before 10am...
From that, re-join back to the original table where the FINAL "LastPerDay" time matches the per-day basis. Now, you MAY get multiple entries if the actual last timestamp entry for the same day actually HAS multiple exact time entries to the granularity of hh:mm:ss (or whatever precision)
select
PreQuery.JustTheDate,
YT2.FloatColumnName
from
( select
Date_Format( YT.TimeStampColumn, '%Y-%m-%d' ) JustTheDate,
max( YT.TimeStampColumn ) as LastPerDay
from
( select @FirstOfThisMonth := Date_Format( CURDATE(), '%Y-%m-01' ),
@FirstOfPriorMonth := Date_Sub( @FirstOfThisMonth, interval 1 month ) ) sqlvars,
YourTable YT
where
YT.TimeStampColumn >= @FirstOfPriorMonth
AND YT.TimeStampColumn < @FirstOfThisMonth
AND Hour( YT.TimeStampColumn ) < 10
group by
`JustTheDate`
order by
`JustTheDate` DESC ) PreQuery
JOIN YourTable YT2
ON PreQuery.LastPerDay = YT2.TimeStampColumn
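The grouping step in the query above (latest timestamp per day among entries before 10 AM) can be illustrated with a short Python sketch. The function name last_before_ten and the sample rows are hypothetical, and the prior-month filter is left out to keep the example small.

```python
from datetime import datetime


def last_before_ten(rows):
    """Return {day: (timestamp, value)} holding, for each day, the latest
    entry whose hour is before 10, mirroring HOUR(ts) < 10 plus MAX(ts)."""
    latest = {}
    for timestamp, value in rows:
        if timestamp.hour >= 10:
            continue  # entries at or after 10:00 are not candidates
        day = timestamp.date()
        if day not in latest or timestamp > latest[day][0]:
            latest[day] = (timestamp, value)
    return latest


# Hypothetical rows of (TimeStamp, A).
rows = [
    (datetime(2012, 2, 1, 9, 30), 1.0),
    (datetime(2012, 2, 1, 9, 59), 2.0),
    (datetime(2012, 2, 1, 10, 0), 3.0),
    (datetime(2012, 2, 2, 8, 0), 4.0),
]
for day, (timestamp, value) in sorted(last_before_ten(rows).items()):
    print(day, timestamp.time(), value)
```

For 2012-02-01 the 09:59 entry is kept and the 10:00 entry is ignored, which is the "closest to and before 10AM" behavior the question asks for.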

Select Top Viewed From Last 7 Days

I have a table with a date stamp column, e.g. 1241037505. There's also a column with the number of views.
The date stamp represents when the row was created.
So I want to select the top viewed threads from the past week.
How do I do this?
Try this:
SELECT * FROM `table` WHERE
DATEDIFF(NOW(),created_date) < 7
SELECT * FROM table WHERE createdon > SUBDATE(NOW(), INTERVAL 7 DAY) ORDER BY hits DESC;
See: http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_subdate
The data you're currently tracking isn't going to allow you to select the top viewed in the last week. It will show you the top viewed over all time, or the most viewed items created in the last week. If something was created two weeks ago, but was viewed more than anything else during the last week you cannot determine that from the data you're tracking. One way I can see to do it would be to track the number of hits each content item gets each day of the week.
create table daily_hits (
cid integer, -- content id points to the table you already have
dotw smallint, -- 0-6 or similar
hits integer,
PRIMARY KEY (cid, dotw)
);
Whenever you increase the hit count on the content item, you would also update the daily_hits table for the given content id and day of the week. You would need a function that converted the current date/time to a day of the week. MySql provides DAYOFWEEK for this purpose.
To get the most viewed in the last week, you could query like this:
SELECT cid, SUM(hits) FROM daily_hits GROUP BY cid ORDER BY SUM(hits) DESC
You will need some type of scheduled job that deletes the current day of the week at midnight so you aren't accumulating forever and essentially performing the same accumulation happening on the hits column of the current table.
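To make the bookkeeping concrete, here is a small in-memory Python sketch of the daily_hits idea: one counter per content id and day of the week, a weekly total per content id, and a reset of one weekday's slot. The class and method names are hypothetical, and Python's weekday() numbers days 0-6 starting on Monday, which differs from MySQL's DAYOFWEEK (1-7 starting on Sunday).

```python
from collections import defaultdict
from datetime import date


class WeeklyHits:
    """In-memory stand-in for the daily_hits table: one counter per (cid, dotw)."""

    def __init__(self):
        self.hits = defaultdict(int)  # (cid, dotw) -> hits

    def record_hit(self, cid, day):
        """Increment the slot for this content id and day of the week."""
        self.hits[(cid, day.weekday())] += 1

    def reset_day(self, day):
        """Clear one weekday's slot, as the scheduled midnight job would."""
        dotw = day.weekday()
        for key in [k for k in self.hits if k[1] == dotw]:
            del self.hits[key]

    def top(self, limit=10):
        """Equivalent of SUM(hits) ... GROUP BY cid ORDER BY SUM(hits) DESC."""
        totals = defaultdict(int)
        for (cid, _dotw), count in self.hits.items():
            totals[cid] += count
        return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]


tracker = WeeklyHits()
for _ in range(2):
    tracker.record_hit(1, date(2020, 1, 6))   # a Monday
for _ in range(3):
    tracker.record_hit(2, date(2020, 1, 7))   # a Tuesday
print(tracker.top())

tracker.reset_day(date(2020, 1, 14))          # the following Tuesday
print(tracker.top())
```

After the Tuesday slot is reset, content 2 drops out of the ranking because all of its hits were a week old.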
SELECT * FROM table WHERE Date_Created > (7 days ago value) ORDER BY Hits DESC LIMIT 0,100
or you could use this (per WishCow's answer)
SELECT * FROM table WHERE Date_Created > SUBDATE(NOW(), INTERVAL 7 DAY) ORDER BY Hits DESC LIMIT 0,100