Finding gaps in concurrent date ranges - MySQL

I have a table like the one below. In reality there are 50,000 users, and a technically infinite number of ranges for each user. There is no limit on date gaps, starts, ends, overlaps, etc.
User From To
A 2011-01-03 2013-04-09
A 2012-04-16 2012-03-08
A 2012-12-11 2013-06-17
A 2013-07-17
A 2013-09-22 2013-12-24
B 2011-04-06 2013-01-02
B 2012-02-12 2012-02-14
B 2012-11-10 2013-03-16
B 2013-04-16
B 2013-04-22
I need to calculate the number of weekdays in 2013 not covered by these ranges for each user. The blank 'To' date means the range is ongoing.
In the example above it would be the number of weekdays between 2013-06-18 and 2013-07-16 for user A, and between 2013-03-17 and 2013-04-15 for B.
I have a lookup table of individual weekdays, but anything I do to the date ranges using min and max ends up giving me a 'solid' date range from 2013-01-01 to 2013-12-31.
I'm not bright....
Thank you.

SELECT users.User, COUNT(*)
FROM users
CROSS JOIN weekdays
LEFT JOIN userDates ON
userDates.User = users.User
AND userDates.From <= weekdays.date
AND (userDates.To IS NULL OR userDates.To >= weekdays.date)
WHERE weekdays.date >= '2013-01-01'
AND weekdays.date < '2014-01-01'
AND userDates.User IS NULL
GROUP BY users.User
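
The query above is an anti-join: every (user, weekday) pair is generated, and only the pairs with no covering range survive the `IS NULL` test. Here is a minimal sketch of the same idea that can be run locally. It assumes Python's built-in sqlite3 as a stand-in for MySQL, and the names `user_id`, `day`, `start_date` and `end_date` are made up for the sketch (the question's `From`/`To` would need quoting in SQLite).

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id TEXT PRIMARY KEY);
    CREATE TABLE weekdays (day TEXT PRIMARY KEY);
    CREATE TABLE user_dates (user_id TEXT, start_date TEXT, end_date TEXT);
""")
conn.executemany("INSERT INTO users VALUES (?)", [("A",), ("B",)])

# Calendar of Monday-to-Friday dates in 2013 (ISO strings sort chronologically).
day = date(2013, 1, 1)
while day.year == 2013:
    if day.weekday() < 5:
        conn.execute("INSERT INTO weekdays VALUES (?)", (day.isoformat(),))
    day += timedelta(days=1)

# Ranges copied from the question; None marks an ongoing range.
conn.executemany("INSERT INTO user_dates VALUES (?, ?, ?)", [
    ("A", "2011-01-03", "2013-04-09"),
    ("A", "2012-04-16", "2012-03-08"),
    ("A", "2012-12-11", "2013-06-17"),
    ("A", "2013-07-17", None),
    ("A", "2013-09-22", "2013-12-24"),
    ("B", "2011-04-06", "2013-01-02"),
    ("B", "2012-02-12", "2012-02-14"),
    ("B", "2012-11-10", "2013-03-16"),
    ("B", "2013-04-16", None),
    ("B", "2013-04-22", None),
])

# Anti-join: keep only (user, weekday) pairs that no range covers.
uncovered = dict(conn.execute("""
    SELECT u.user_id, COUNT(*)
    FROM users u
    CROSS JOIN weekdays w
    LEFT JOIN user_dates r
           ON r.user_id = u.user_id
          AND r.start_date <= w.day
          AND (r.end_date IS NULL OR r.end_date >= w.day)
    WHERE w.day >= '2013-01-01'
      AND w.day <  '2014-01-01'
      AND r.user_id IS NULL
    GROUP BY u.user_id
""").fetchall())
print(uncovered)
```

One thing to keep in mind: a user whose ranges cover every weekday has no surviving rows, so that user is missing from the result rather than shown with a count of 0.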

Related

MySQL LEFT JOIN using conditional operators (>= and <) not returning null values for the joined table

I have a simple table for hotel bookings (booking). I'm working with MySQL.
booking_date last_name room_no nights
2016-11-19 McDonnell 207 4
2016-11-20 Jenkins 203 5
2016-11-22 Ross 209 3
2016-11-23 Whitford 207 2
2016-11-27 Berry 207 2
For each day in the period 2016-11-21 to 2016-11-27, I want to know who occupied room 207, excluding anyone who checked out that day. For example, we can see that Whitford checked out on 2016-11-25 so he should not be listed as an occupant. If there are no occupants for a given date, I want the query to return NULL for that date.
I created a simple calendar (hotel_calendar):
StayDate
2016-11-21
2016-11-22
2016-11-23
2016-11-24
2016-11-25
2016-11-26
2016-11-27
I attempted to find the occupant for each day on the calendar by matching:
SELECT StayDate AS 'Date',
last_name AS 'Last Name'
FROM hotel_calendar
LEFT JOIN booking ON
(StayDate >= booking_date AND
StayDate < booking_date + INTERVAL nights DAY)
WHERE room_no = 207 AND
StayDate BETWEEN '2016-11-21' AND '2016-11-27';
My results:
Date Last Name
2016-11-21 McDonnell
2016-11-22 McDonnell
2016-11-23 Whitford
2016-11-24 Whitford
2016-11-27 Berry
The room is unoccupied on 2016-11-25 and 2016-11-26. Even though I used LEFT JOIN, the query didn’t return the NULL values for those dates. Is this due to an incorrect use of conditional operators?
First, you should qualify all your column names, so it is clear where they come from. Second, never use single quotes around column aliases -- just leads to problems.
Your real issue, though, is that conditions on the second table in a LEFT JOIN need to go into the ON clause, not the WHERE clause. Otherwise, the JOIN turns into an INNER JOIN:
SELECT c.StayDate AS Date, b.last_name
FROM hotel_calendar c LEFT JOIN
     booking b
     ON c.StayDate >= b.booking_date AND
        c.StayDate < b.booking_date + INTERVAL b.nights DAY AND
        b.room_no = 207
WHERE c.StayDate BETWEEN '2016-11-21' AND '2016-11-27';
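
To see the difference between filtering in WHERE and filtering in ON, here is a small sketch using the question's data. It assumes Python's built-in sqlite3 as a stand-in for MySQL, so the date arithmetic uses SQLite's `date(..., '+N days')` instead of `+ INTERVAL nights DAY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hotel_calendar (StayDate TEXT PRIMARY KEY);
    CREATE TABLE booking (booking_date TEXT, last_name TEXT,
                          room_no INTEGER, nights INTEGER);
    INSERT INTO hotel_calendar VALUES
        ('2016-11-21'), ('2016-11-22'), ('2016-11-23'), ('2016-11-24'),
        ('2016-11-25'), ('2016-11-26'), ('2016-11-27');
    INSERT INTO booking VALUES
        ('2016-11-19', 'McDonnell', 207, 4),
        ('2016-11-20', 'Jenkins',   203, 5),
        ('2016-11-22', 'Ross',      209, 3),
        ('2016-11-23', 'Whitford',  207, 2),
        ('2016-11-27', 'Berry',     207, 2);
""")

# Shared part of both queries: match each calendar day to overlapping stays.
join_on_dates = """
    SELECT c.StayDate, b.last_name
    FROM hotel_calendar c
    LEFT JOIN booking b
           ON c.StayDate >= b.booking_date
          AND c.StayDate < date(b.booking_date, '+' || b.nights || ' days')
"""

# Room filter in WHERE: rows where b.room_no is NULL are thrown away,
# so the unoccupied days disappear.
filter_in_where = conn.execute(
    join_on_dates + " WHERE b.room_no = 207 ORDER BY c.StayDate"
).fetchall()

# Room filter in ON: unmatched calendar days are kept with a NULL name.
filter_in_on = conn.execute(
    join_on_dates + " AND b.room_no = 207 ORDER BY c.StayDate"
).fetchall()

print(len(filter_in_where), len(filter_in_on))
```

With the filter in ON, 2016-11-25 and 2016-11-26 come back with `None` as the last name; with the filter in WHERE those two days are missing.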

Querying last two days depending on a Workday or not table

We use MySQL and we're trying to get averages for the last two workdays from an hourly data set like this:
Date Price
2016-12-13 00:00 187,68
2016-12-13 01:00 201
2016-12-13 02:00 211,66
2016-12-13 03:00 215,84
So we created a table named (Workdays) that shows if the day is a workday or holiday like this:
Date Workday
2016-12-13 1
2016-12-14 1
2016-12-15 0
2016-12-16 0
1 means workday and 0 means weekend or national holiday.
In the end, we need to query the average price for each of the last two workdays separately, taking the Workdays table into account.
Is this possible?
Thanks a lot.
If I understand correctly, the table workdays really has a single row for each date. If so, you can get the most recent two workdays in a subquery and then use join to choose the rows in the first table:
select wd2.date, avg(h.Price)
from hourly h join
     (select wd.date
      from workdays wd
      where wd.workday = 1 and wd.date <= curdate() -- you might want <
      order by wd.date desc
      limit 2
     ) wd2
     on date(h.date) = wd2.date
group by wd2.date;
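
A runnable sketch of the "pick the last two workdays in a subquery, then join" idea, assuming Python's built-in sqlite3 in place of MySQL. The column names `ts` and `day`, the fixed `today` value and every price outside 2016-12-13 are invented for the sketch; the 2016-12-13 prices are the question's sample with decimal points.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hourly (ts TEXT, price REAL);
    CREATE TABLE workdays (day TEXT PRIMARY KEY, workday INTEGER);
    INSERT INTO workdays VALUES
        ('2016-12-12', 1), ('2016-12-13', 1), ('2016-12-14', 1),
        ('2016-12-15', 0), ('2016-12-16', 0);
    INSERT INTO hourly VALUES
        ('2016-12-12 00:00', 50.0),
        ('2016-12-13 00:00', 187.68), ('2016-12-13 01:00', 201.0),
        ('2016-12-13 02:00', 211.66), ('2016-12-13 03:00', 215.84),
        ('2016-12-14 00:00', 100.0),  ('2016-12-14 01:00', 200.0),
        ('2016-12-15 00:00', 999.0);
""")

# Passed as a parameter instead of curdate() so the result is reproducible.
today = "2016-12-16"

averages = conn.execute("""
    SELECT wd2.day, AVG(h.price)
    FROM hourly h
    JOIN (SELECT day
          FROM workdays
          WHERE workday = 1 AND day <= ?
          ORDER BY day DESC
          LIMIT 2) wd2
      ON date(h.ts) = wd2.day
    GROUP BY wd2.day
    ORDER BY wd2.day DESC
""", (today,)).fetchall()

for day, avg_price in averages:
    print(day, round(avg_price, 3))
```

Only 2016-12-14 and 2016-12-13 come back: 2016-12-15 is not a workday and 2016-12-12 is cut off by the `LIMIT 2`.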
Try this:
select w.Date, avg(p.Price)
from Prices p
join Workdays w on
     w.Date = date(p.Date)
     and w.Workday = 1
     and w.Date <= curdate()
group by w.Date
order by w.Date desc
limit 2;

Summing data for last 7 day look back window

I want a query that returns, for each date, the sum of impressions over a 7-day look-back window.
The output should contain the date and the sum of the last 7 days of impressions for that date.
E.g. I have a table tblFactImps with the data below:
dateFact impressions id
2015-07-01 4022 30
2015-07-02 4021 33
2015-07-03 4011 34
2015-07-04 4029 35
2015-07-05 1023 39
2015-07-06 3023 92
2015-07-07 8027 66
2015-07-08 2024 89
I need output with 2 columns:
dateFact impressions_last_7
The query I have so far:
select dateFact, sum(if(datediff(curdate(), dateFact)<=7, impressions,0)) impressions_last_7 from tblFactImps group by dateFact;
Thanks!
If your fact table is not too big, then a correlated subquery is a simple way to do what you want:
select i.dateFact,
       (select sum(i2.impressions)
        from tblFactImps i2
        where i2.dateFact >= i.dateFact - interval 6 day
          and i2.dateFact <= i.dateFact
       ) as impressions_last_7
from tblFactImps i;
You can achieve this by LEFT OUTER JOINing the table with itself on a date range, and summing the impressions grouped by date, as follows:
SELECT
t1.dateFact,
SUM(t2.impressions) AS impressions_last_7
FROM
tblFactImps t1
LEFT OUTER JOIN
tblFactImps t2
ON
t2.dateFact BETWEEN
DATE_SUB(t1.dateFact, INTERVAL 6 DAY)
AND t1.dateFact
GROUP BY
t1.dateFact;
This should give you a sliding 7-day sum for each date in your table.
Assuming your dateFact column is indexed, this query should also be relatively fast.
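
For anyone who wants to check the numbers, here is a small sketch that runs both approaches (correlated subquery and self-join) on the question's sample rows. It assumes Python's built-in sqlite3 instead of MySQL, so `date(x, '-6 days')` replaces `DATE_SUB(x, INTERVAL 6 DAY)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblFactImps (dateFact TEXT PRIMARY KEY,
                              impressions INTEGER, id INTEGER);
    INSERT INTO tblFactImps VALUES
        ('2015-07-01', 4022, 30), ('2015-07-02', 4021, 33),
        ('2015-07-03', 4011, 34), ('2015-07-04', 4029, 35),
        ('2015-07-05', 1023, 39), ('2015-07-06', 3023, 92),
        ('2015-07-07', 8027, 66), ('2015-07-08', 2024, 89);
""")

# Correlated subquery: the window is bounded on both sides.
subquery_rows = conn.execute("""
    SELECT i.dateFact,
           (SELECT SUM(i2.impressions)
            FROM tblFactImps i2
            WHERE i2.dateFact BETWEEN date(i.dateFact, '-6 days')
                                  AND i.dateFact) AS impressions_last_7
    FROM tblFactImps i
    ORDER BY i.dateFact
""").fetchall()

# Self-join on the same 7-day window, then aggregate per date.
join_rows = conn.execute("""
    SELECT t1.dateFact, SUM(t2.impressions) AS impressions_last_7
    FROM tblFactImps t1
    LEFT OUTER JOIN tblFactImps t2
           ON t2.dateFact BETWEEN date(t1.dateFact, '-6 days')
                              AND t1.dateFact
    GROUP BY t1.dateFact
    ORDER BY t1.dateFact
""").fetchall()

for row in join_rows:
    print(row)
```

Both queries give the same rows here. The window only becomes a full 7 days on 2015-07-07; earlier dates sum whatever rows exist so far.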

MySQL to calculate a value against price at the nearest date in another table

I have 2 tables where I want to calculate a value across. One table provides daily price data (prices) which is historic while the second contains annual data (volumes) which contains both historic and projected data. So far, I can join the two tables so that I can calculate the historic annual value by the price at the corresponding date in the daily one, but not for the future annual values where I would like to calculate against the last daily price in the prices table.
Current table structure
Table 1: Prices (3 fields: date,code,price)
Table 2: Volumes (3 fields: date,code,units)
Current query
SELECT v.date,p.price,v.units,CONCAT(p.price*v.units) AS Value
FROM Prices p,Volumes v
WHERE p.code = v.code AND p.date = v.date
AND v.code = 'X' AND (v.date BETWEEN '2012-12-31' AND '2016-12-31')
GROUP BY v.date
ORDER BY v.date
LIMIT 5;
results in 2012-2013 data rather than 2012-2016:
date 2012-12-31 2013-12-31
price 50 58
units 100 90
Value 5000 5220
My annual table (Volumes) has volume estimates for years 2014,2015 and 2016 (which are not shown) and the last price in the daily table (Prices) is say 65, but the above query will only match the volumes to the corresponding date value in the daily price table rather than using the nearest last value. Can someone advise me on how best to do this please?
example of tables (note: both tables use a composite primary key around 'date' and 'code')
Volumes
date code units
2012-12-31 X 100
2012-12-31 Y 50
2013-12-31 X 90
2013-12-31 Y 45
2014-12-31 X 95
2014-12-31 Y 47
Prices
date code price
2013-12-31 X 50
2013-12-31 Y 25
2014-01-01 X 58
2014-01-01 Y 27
2014-01-02 X 59
2014-01-02 Y 30
-----
2014-03-31 X 48
2014-03-31 Y 26
2014-04-01 X 49
2014-04-01 Y 27
last data point
If you want a query that gets the future values by multiplying the projected unit quantity by the most recent recorded price, this would be the query:
SELECT
    *,
    (future_units * last_price) AS projected_value -- no need for CONCAT in this calculation
FROM (
    SELECT
        v.date,
        v.code,
        v.units AS future_units,
        (SELECT p.price
         FROM prices p
         WHERE p.code = v.code -- latest price for the same code
         ORDER BY p.date DESC -- DESC for the most recent
         LIMIT 1) AS last_price -- just need the first one
    FROM volumes v
    WHERE v.date > NOW() -- to make it a future date
) AS t; -- derived table needs an alias
Updated query per the OP's request.
This one calculates the value from the price on the matching date; for future dates it uses the last recorded price with the projected units:
SELECT
    *,
    IF(the_date > NOW(), -- future date: use last_price, otherwise the price on that date
       (future_units * last_price),
       (future_units * current_price)) AS value
FROM (
    SELECT
        v.date AS the_date,
        v.units AS future_units,
        v.code,
        p.price AS current_price,
        (SELECT p2.price
         FROM prices p2
         WHERE p2.code = v.code
         ORDER BY p2.date DESC
         LIMIT 1) AS last_price
    FROM volumes v
    LEFT JOIN prices p ON p.code = v.code AND p.date = v.date -- NULL when there is no price on that date
    WHERE DATE(v.date) BETWEEN '2012-12-31' AND '2016-12-31' -- DATE() used to cut out timestamp for accurate comparison
) AS t;
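
A slightly different technique from the queries above, in case it helps: instead of testing whether the date is in the future, take the latest price on or before each volume date. That gives the exact price when one exists and the last known price otherwise. This is only a sketch, assuming Python's built-in sqlite3 in place of MySQL and only the sample rows listed in the question (the elided rows are left out); `price_used` is a made-up column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prices  (date TEXT, code TEXT, price REAL,
                          PRIMARY KEY (date, code));
    CREATE TABLE volumes (date TEXT, code TEXT, units INTEGER,
                          PRIMARY KEY (date, code));
    INSERT INTO volumes VALUES
        ('2012-12-31', 'X', 100), ('2012-12-31', 'Y', 50),
        ('2013-12-31', 'X', 90),  ('2013-12-31', 'Y', 45),
        ('2014-12-31', 'X', 95),  ('2014-12-31', 'Y', 47);
    INSERT INTO prices VALUES
        ('2013-12-31', 'X', 50), ('2013-12-31', 'Y', 25),
        ('2014-01-01', 'X', 58), ('2014-01-01', 'Y', 27),
        ('2014-01-02', 'X', 59), ('2014-01-02', 'Y', 30),
        ('2014-03-31', 'X', 48), ('2014-03-31', 'Y', 26),
        ('2014-04-01', 'X', 49), ('2014-04-01', 'Y', 27);
""")

rows = conn.execute("""
    SELECT t.date, t.code, t.units, t.price_used,
           t.units * t.price_used AS value
    FROM (
        SELECT v.date, v.code, v.units,
               -- latest price for the same code on or before the volume date
               (SELECT p.price
                FROM prices p
                WHERE p.code = v.code AND p.date <= v.date
                ORDER BY p.date DESC
                LIMIT 1) AS price_used
        FROM volumes v
    ) AS t
    ORDER BY t.code, t.date
""").fetchall()

for row in rows:
    print(row)
```

Volume dates earlier than the first price (2012-12-31 in this sample) get `None`, because there is no earlier price to fall back on.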

Group by week returning strange intervals

For some odd reason, group by week is returning odd date intervals with a datetime field.
"Completed" is a datetime field, and using this query:
SELECT
Completed,
COUNT( DISTINCT Table1.ID ) AS ActivityCount
FROM Table1
JOIN Table1Items
ON Table1.ID = Table1Items.ID
JOIN database_database.Table2
ON Table2.Item = Table1Items.Item
WHERE Completed != '0000-00-00' AND Completed >= '2012-09-25' AND Completed <= '2012-10-25'
GROUP BY WEEK(Completed)
I'm getting:
Completed ActivityCount CompletedTimestamp
2012-09-25 300 2012-09-25 00:00:00
2012-10-02 764 2012-10-02 00:00:00
2012-10-08 379 2012-10-08 00:00:00
2012-10-17 659 2012-10-17 00:00:00
2012-10-22 382 2012-10-22 00:00:00
some are 7 days apart, others are 6 days apart, others are 5.... and one is 9?
Why does it group the dates by such strange intervals instead of just 7 days?
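The WEEK function does not measure the difference between dates.
The WEEK function returns the week number of a date. If you group by it, each group contains dates from the start of that week, the end of it, and anywhere in between. Since Completed is selected without an aggregate, MySQL (with ONLY_FULL_GROUP_BY disabled) is free to return any Completed value from the group, so the difference between the dates shown can be greater or less than 7 days.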
The answer, as alluded to by juergen d, was to aggregate the date column: use MIN or MAX depending on whether you want the first or the last date found in each week as the consistent interval, e.g.:
SELECT
MIN(Completed),
COUNT( DISTINCT Table1.ID ) AS ActivityCount
FROM Table1
JOIN Table1Items
ON Table1.ID = Table1Items.ID
JOIN database_database.Table2
ON Table2.Item = Table1Items.Item
WHERE Completed != '0000-00-00' AND Completed >= '2012-09-25' AND Completed <= '2012-10-25'
GROUP BY WEEK(Completed)
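
A small sketch of what grouping by week number does to the dates, assuming Python's built-in sqlite3 instead of MySQL. SQLite has no WEEK(), so `strftime('%W', ...)` (Monday-based week of the year) stands in for it; MySQL's default WEEK() mode starts weeks on Sunday, so exact boundaries can differ. The `activity` table and its dates are invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (id INTEGER PRIMARY KEY, completed TEXT)")
conn.executemany(
    "INSERT INTO activity (completed) VALUES (?)",
    [(d,) for d in (
        "2012-09-25", "2012-09-28",  # week of Mon 2012-09-24
        "2012-10-02", "2012-10-06",  # week of Mon 2012-10-01
        "2012-10-08",                # week of Mon 2012-10-08
        "2012-10-17", "2012-10-21",  # week of Mon 2012-10-15
    )],
)

weekly = conn.execute("""
    SELECT strftime('%W', completed) AS week_no,
           MIN(completed)            AS first_completed,
           MAX(completed)            AS last_completed,
           COUNT(DISTINCT id)        AS activity_count
    FROM activity
    GROUP BY week_no
    ORDER BY week_no
""").fetchall()

for row in weekly:
    print(row)
```

MIN gives the earliest date that actually occurs in each week, not the calendar start of the week, so the first dates here are still 7, 6 and 9 days apart. If the date range spans more than one year, grouping on year plus week number keeps equal week numbers from different years apart.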