MySQL query to select distinct rows based on overlapping date ranges

Let's say we have a table (table1) in which we store four values (user_id, name, start_date, end_date), plus an id:
table1
-------------------------------------------------
id  user_id  name   start_date  end_date
-------------------------------------------------
1   1        john   2016-04-02  2016-04-03
2   2        steve  2016-04-06  2016-04-06
3   3        sarah  2016-04-03  2016-04-03
4   1        john   2016-04-12  2016-04-15
I then enter a start_date of 2016-04-03 and an end_date of 2016-04-03 to see whether any of the users are available to be scheduled for a job. The query that checks for and ignores overlapping dates returns the following:
table1
-------------------------------------------------
id  user_id  name   start_date  end_date
-------------------------------------------------
2   2        steve  2016-04-06  2016-04-06
4   1        john   2016-04-12  2016-04-15
The issue I am having is that John is displayed in the list even though he is already booked for a job on the dates I am searching for. The query returns TRUE for his other entry because those dates don't conflict, but I would like to hide John from the list completely, since he will be unavailable.
Is there a way to filter the list and prevent the user info from displaying if the dates entered conflict with another entry for the same user?
An example of the query:
SELECT DISTINCT id, user_id, name, start_date, end_date
FROM table1
WHERE ('{$startDate}' NOT BETWEEN start_date AND end_date
AND '{$endDate}' NOT BETWEEN start_date AND end_date
AND start_date NOT BETWEEN '{$startDate}' AND '{$endDate}'
AND end_date NOT BETWEEN '{$startDate}' AND '{$endDate}');
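To see why John still appears, here is a minimal reproduction that uses Python's built-in sqlite3 module in place of MySQL (an assumption made only so the sketch runs anywhere; ISO date strings compare correctly as text in SQLite). The PHP interpolation is replaced by bound parameters, and the helper names are hypothetical. Because the WHERE clause tests each row on its own, John's non-conflicting row 4 passes even though his row 1 conflicts.

```python
import sqlite3

# Rows from the question: (id, user_id, name, start_date, end_date).
ROWS = [
    (1, 1, "john", "2016-04-02", "2016-04-03"),
    (2, 2, "steve", "2016-04-06", "2016-04-06"),
    (3, 3, "sarah", "2016-04-03", "2016-04-03"),
    (4, 1, "john", "2016-04-12", "2016-04-15"),
]


def make_db(rows):
    """Hypothetical helper: load table1 into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (id INTEGER, user_id INTEGER, name TEXT,"
        " start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def question_query(conn, start, end):
    """The question's per-row filter, with :s/:e bound instead of interpolated."""
    sql = """
        SELECT DISTINCT id, user_id, name, start_date, end_date
        FROM table1
        WHERE (:s NOT BETWEEN start_date AND end_date
          AND :e NOT BETWEEN start_date AND end_date
          AND start_date NOT BETWEEN :s AND :e
          AND end_date NOT BETWEEN :s AND :e)
        ORDER BY id
    """
    return conn.execute(sql, {"s": start, "e": end}).fetchall()


ids = [row[0] for row in question_query(make_db(ROWS), "2016-04-03", "2016-04-03")]
print(ids)  # [2, 4]: John's row 4 survives because each row is checked alone
```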

The "solution" in the question doesn't look right at all. Consider adding this row:
INSERT INTO table1 VALUES (5, 2, 'steve', '2016-04-01', '2016-04-04');
Now Steve has a row that overlaps the requested period, and yet the query proposed as a solution in the question will still return 'steve'.
Here's a demonstration of building a query that returns the users who are "available" during the requested period, that is, users for whom there is no row in table1 that "overlaps" the requested period.
The first problem is getting the users who are not available because a row overlaps the requested period. Assuming that start_date <= end_date for all rows in the table...
A row overlaps the requested period if the end_date of the row is on or after the start of the requested period, and the start_date of the row is on or before the end of the requested period.
-- users that are "unavailable" due to row with overlap
SELECT t.user_id
FROM table1 t
WHERE t.end_date >= '2016-04-03' -- start of requested period
AND t.start_date <= '2016-04-03' -- end of requested period
GROUP
BY t.user_id
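A quick way to check the overlap condition is to run the "unavailable" query against the sample rows, with and without the extra Steve row. This sketch again uses SQLite through Python's sqlite3 (not MySQL), binds the requested period as parameters, and adds ORDER BY only to make the result deterministic; the helper names are hypothetical.

```python
import sqlite3

ROWS = [
    (1, 1, "john", "2016-04-02", "2016-04-03"),
    (2, 2, "steve", "2016-04-06", "2016-04-06"),
    (3, 3, "sarah", "2016-04-03", "2016-04-03"),
    (4, 1, "john", "2016-04-12", "2016-04-15"),
]
EXTRA_STEVE = (5, 2, "steve", "2016-04-01", "2016-04-04")


def make_db(rows):
    """Hypothetical helper: in-memory copy of table1 (ISO dates stored as TEXT)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (id INTEGER, user_id INTEGER, name TEXT,"
        " start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def unavailable_users(conn, req_start, req_end):
    """Users with at least one row overlapping [req_start, req_end].

    A row overlaps when it ends on/after the requested start and starts
    on/before the requested end (assumes start_date <= end_date).
    """
    sql = """
        SELECT t.user_id
        FROM table1 t
        WHERE t.end_date >= ?   -- start of requested period
          AND t.start_date <= ? -- end of requested period
        GROUP BY t.user_id
        ORDER BY t.user_id
    """
    return [r[0] for r in conn.execute(sql, (req_start, req_end))]


print(unavailable_users(make_db(ROWS), "2016-04-03", "2016-04-03"))  # [1, 3]
print(unavailable_users(make_db(ROWS + [EXTRA_STEVE]), "2016-04-03", "2016-04-03"))  # [1, 2, 3]
```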
(If our assumption that start_date <= end_date doesn't hold, we can add that check as a condition in the query.)
To get a list of all users, we could query a table that holds a distinct list of users. We don't see a table like that in the question, so instead we can get a list of all users that appear in table1:
SELECT l.user_id
FROM table1 l
GROUP BY l.user_id
To get the list of all users excluding the users that are unavailable, there are a couple of ways we can write the query. The simplest is an anti-join pattern:
SELECT a.user_id
FROM ( -- list of all users
SELECT l.user_id
FROM table1 l
GROUP BY l.user_id
) a
LEFT
JOIN ( -- users that are unavailable due to overlap
SELECT t.user_id
FROM table1 t
WHERE t.end_date >= '2016-04-03' -- start of requested period
AND t.start_date <= '2016-04-03' -- end of requested period
GROUP
BY t.user_id
) u
ON u.user_id = a.user_id
WHERE u.user_id IS NULL
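Running the anti-join against the same sample data shows the difference from the per-row filter: John drops out for 2016-04-03, and once the extra Steve row is present nobody is available. As before, this is a SQLite sketch through Python's sqlite3 rather than MySQL, with hypothetical helper names and bound parameters in place of the literal dates.

```python
import sqlite3

ROWS = [
    (1, 1, "john", "2016-04-02", "2016-04-03"),
    (2, 2, "steve", "2016-04-06", "2016-04-06"),
    (3, 3, "sarah", "2016-04-03", "2016-04-03"),
    (4, 1, "john", "2016-04-12", "2016-04-15"),
]
EXTRA_STEVE = (5, 2, "steve", "2016-04-01", "2016-04-04")


def make_db(rows):
    """Hypothetical helper: in-memory copy of table1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (id INTEGER, user_id INTEGER, name TEXT,"
        " start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def available_users(conn, req_start, req_end):
    """Anti-join: all users LEFT JOIN overlapping users, keep the non-matches."""
    sql = """
        SELECT a.user_id
        FROM (SELECT l.user_id FROM table1 l GROUP BY l.user_id) a
        LEFT JOIN (
            SELECT t.user_id
            FROM table1 t
            WHERE t.end_date >= ? AND t.start_date <= ?
            GROUP BY t.user_id
        ) u ON u.user_id = a.user_id
        WHERE u.user_id IS NULL
        ORDER BY a.user_id
    """
    return [r[0] for r in conn.execute(sql, (req_start, req_end))]


print(available_users(make_db(ROWS), "2016-04-03", "2016-04-03"))  # [2]
print(available_users(make_db(ROWS + [EXTRA_STEVE]), "2016-04-03", "2016-04-03"))  # []
print(available_users(make_db(ROWS + [EXTRA_STEVE]), "2016-04-07", "2016-04-10"))  # [1, 2, 3]
```

The same anti-join can also be written with NOT EXISTS (a correlated subquery on user_id with the same two date conditions); both forms return the users that have no overlapping row.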

Will this work?
SELECT DISTINCT user_id FROM table1 WHERE (DATEDIFF(_input_,start_date) > 0 AND
DATEDIFF(_input_,end_date) > 0) OR
(DATEDIFF(_input_,start_date) < 0);

Related

User Churn - Final outer statement in a CTE

I have the following table:
timestamp           | user_id | activity
2021-02-01 03:21:11 | mike12  | read
2021-02-02 03:45:22 | bob55   | like
2021-02-03 04:21:33 | sarah22 | post
2021-02-01 04:11:33 | cindy11 | sign-in
I want to calculate the number of users churned in the last 7 days as:
number of all users - number of active users (where active users are those who like, read, comment, or post)
with active_users as
(
select count(distinct user_id)
from table
where activity IN ('comment','post','read','like')
and date_diff(timestamp, current_date()) <= 7
)
, inactive_users as
(select count(distinct user_id)
from table
where activity IN ('sign-in')
and date_diff(timestamp, current_date()) <= 7)
What would be the correct way to subtract the two above? I am unsure how to join the two CTEs in the final query. Thanks for helping!
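One way to combine two counts is to make each CTE return a single row and CROSS JOIN them in the final SELECT, then subtract. The sketch below assumes the table is called activity_log (a hypothetical name, since table is a reserved word), uses an "all users" CTE to match the stated formula rather than the sign-in count, and runs on SQLite via Python's sqlite3 with a fixed "today" parameter instead of CURRENT_DATE so the result is reproducible.

```python
import sqlite3

# Sample rows from the question: (ts, user_id, activity).
LOG = [
    ("2021-02-01 03:21:11", "mike12", "read"),
    ("2021-02-02 03:45:22", "bob55", "like"),
    ("2021-02-03 04:21:33", "sarah22", "post"),
    ("2021-02-01 04:11:33", "cindy11", "sign-in"),
]


def churned_users(conn, today):
    """All distinct users minus users active in the 7 days before `today`."""
    sql = """
        WITH all_users AS (
            SELECT COUNT(DISTINCT user_id) AS n FROM activity_log
        ),
        active_users AS (
            SELECT COUNT(DISTINCT user_id) AS n
            FROM activity_log
            WHERE activity IN ('comment', 'post', 'read', 'like')
              AND ts >= date(:today, '-7 days')
        )
        SELECT all_users.n - active_users.n
        FROM all_users CROSS JOIN active_users  -- each CTE is exactly one row
    """
    return conn.execute(sql, {"today": today}).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity_log (ts TEXT, user_id TEXT, activity TEXT)")
conn.executemany("INSERT INTO activity_log VALUES (?, ?, ?)", LOG)
print(churned_users(conn, "2021-02-08"))  # 1 (only cindy11 was not active)
```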

MySQL - Group By Latest and Join First Instance

I've tried a few things but I've ended up confusing myself.
What I am trying to do is find the most recent record for each account in a table and left join the first record after a certain date.
An example might be:
id | acct_no | created_at | some_other_column
1 | A0001 | 2017-05-21 00:00:00 | x
2 | A0001 | 2017-05-22 00:00:00 | y
3 | A0001 | 2017-05-22 00:00:00 | z
Ideally, I'd like to find the latest record for each acct_no (sorted by created_at DESC) so that the results are grouped by unique account number. From the records above that would be record 3, but obviously there would be multiple different account numbers with records for different days.
Then, what I am trying to achieve is to join on the same table and find the first record with the same account number after a certain date.
For example, record 1 would be returned for a query joining on acct_no A0001 on or after 2017-05-21 00:00:00, because it is the first result on or after that date; so these are sorted by created_at ASC with created_at >= "2017-05-21 00:00:00" (and possibly AND id != latest.id).
It seems quite straightforward, but I just can't get it to work.
I only have my most recent attempt after discarding multiple different queries.
Here I am trying to solve the first part which is to select the most recent of each account number:
SELECT latest.* FROM my_table latest
JOIN (SELECT acct_no, MAX(created_at) FROM my_table GROUP
BY acct_no) latest2
ON latest.acct_no = latest2.acct_no
but that still returns all rows rather than the most recent of each.
I did have something using a join on a subquery, but it took so long to run that I quit it before it finished, even though I have indexes on acct_no and created_at. I've also run into other problems where columns in the SELECT are not in the GROUP BY. I know this check can be turned off, but I'm trying to find a way to write the query that doesn't require that.
Just try a little edit to your initial query:
SELECT latest.* FROM my_table latest
join (SELECT acct_no, MAX(created_at) as max_time FROM my_table GROUP
BY acct_no) latest2
ON latest.acct_no = latest2.acct_no AND latest.created_at = latest2.max_time
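Here is a small check of that join, using SQLite via Python's sqlite3 (an assumption so it runs without MySQL). The B0002 row is hypothetical, added to show more than one account. Rows 2 and 3 share the latest created_at, so the join returns both of them for A0001; an extra tie-break (for example on id) is needed if exactly one row per account is required.

```python
import sqlite3

# Question's rows plus one hypothetical B0002 row: (id, acct_no, created_at, some_other_column).
ROWS = [
    (1, "A0001", "2017-05-21 00:00:00", "x"),
    (2, "A0001", "2017-05-22 00:00:00", "y"),
    (3, "A0001", "2017-05-22 00:00:00", "z"),
    (4, "B0002", "2017-05-20 00:00:00", "w"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER, acct_no TEXT, created_at TEXT, some_other_column TEXT)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", ROWS)

# Join each row to its account's MAX(created_at); only rows at that time match.
LATEST_SQL = """
    SELECT latest.*
    FROM my_table latest
    JOIN (SELECT acct_no, MAX(created_at) AS max_time
          FROM my_table
          GROUP BY acct_no) latest2
      ON latest.acct_no = latest2.acct_no
     AND latest.created_at = latest2.max_time
    ORDER BY latest.id
"""
latest_ids = [row[0] for row in conn.execute(LATEST_SQL)]
print(latest_ids)  # [2, 3, 4]: the tie on 2017-05-22 keeps both 2 and 3
```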
Here is a different approach. I'm not sure about the performance impact, but I'm hoping that avoiding the self-join and GROUP BY will perform better.
SELECT * FROM (
SELECT mytable1.*, IF(@temp <> acct_no, 1, 0) selector, @temp := acct_no FROM `mytable1`
JOIN (SELECT @temp := '') a
ORDER BY acct_no, created_at DESC , id DESC
) b WHERE selector = 1
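If MySQL 8.0 or later is available, a window function does the same "first row per acct_no" selection without relying on user-variable evaluation order. This is a swapped-in technique, not the approach above; the sketch runs it on SQLite (3.25+ supports ROW_NUMBER) through Python's sqlite3, using the question's rows plus a hypothetical B0002 row.

```python
import sqlite3

ROWS = [
    (1, "A0001", "2017-05-21 00:00:00", "x"),
    (2, "A0001", "2017-05-22 00:00:00", "y"),
    (3, "A0001", "2017-05-22 00:00:00", "z"),
    (4, "B0002", "2017-05-20 00:00:00", "w"),  # hypothetical second account
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable1 (id INTEGER, acct_no TEXT, created_at TEXT, some_other_column TEXT)"
)
conn.executemany("INSERT INTO mytable1 VALUES (?, ?, ?, ?)", ROWS)

# Number rows within each account, newest first (id DESC breaks ties), keep rn = 1.
FIRST_PER_ACCT_SQL = """
    SELECT id, acct_no, created_at, some_other_column
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY acct_no
                                  ORDER BY created_at DESC, id DESC) AS rn
        FROM mytable1 t
    ) ranked
    WHERE rn = 1
    ORDER BY acct_no
"""
picked = [row[0] for row in conn.execute(FIRST_PER_ACCT_SQL)]
print(picked)  # [3, 4]
```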
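You need to get the id of the latest row for each acct_no (taking the highest id when several rows share the latest created_at):
SELECT latest.* FROM my_table latest
JOIN (SELECT MAX(t.id) AS id
FROM my_table t
JOIN (SELECT acct_no, MAX(created_at) AS max_time FROM my_table GROUP BY acct_no) m
ON t.acct_no = m.acct_no AND t.created_at = m.max_time
GROUP BY t.acct_no) latest2
ON latest.id = latest2.id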

group by date, even if there is no entry for the date

I want to visualize my entries by counting how many were created on each day.
SELECT dayname(created_at), count(*) FROM logs
group by day(created_at)
ORDER BY created_at desc
LIMIT 7
So I get something like:
Thursday 4
Wednesday 12
Monday 4
Sunday 1
Saturday 20
Friday 23
Thursday 10
But I also want Tuesday to appear with a count of 0, so that I have a full week.
Is there a way to do this purely in MySQL, or do I need to update the result before I can pass it to the chart?
EDIT:
This is the final query:
SELECT
DAYNAME(date_add(NOW(), interval days.id day)) AS day,
count(logs.id) AS amount
FROM days LEFT OUTER JOIN
(SELECT *
FROM logs
WHERE TIMESTAMPDIFF(DAY,DATE(created_at),now()) < 7) logs
on datediff(created_at, NOW()) = days.id
GROUP BY days.id
ORDER BY days.id desc;
The table days contains the numbers 0 to -6.
You only need a table of offsets, which could be a real table or something built on the fly, like select 0 ofs union all select -1 ...
create table days (ofs int);
insert into days (ofs) values
(0), (-1), (-2), (-3),
(-4), (-5), (-6), (-7);
select
date_add('20160121', interval days.ofs day) as created_at,
count(data.id) as cnt
from days left outer join logs data
on datediff(data.created_at, '20160121') = days.ofs
group by days.ofs
order by days.ofs;
http://sqlfiddle.com/#!9/3e6bc7/1
For performance it would probably be better to limit the search in the data (logs) table:
select
date_add('20160121', interval days.ofs day) as created_at,
count(data.id) as cnt
from days left outer join
(select * from logs where created_at between <start> and <end>) data
on datediff(data.created_at, '20160121') = days.ofs
group by days.ofs
order by days.ofs;
One downside is that you do have to parameterize this with a fixed anchor date in a couple of expressions. It might be better to have a table of real dates sitting in a table somewhere so you don't have to do the calculations.
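The offsets-table idea can be checked with a small sketch. It uses SQLite through Python's sqlite3, so MySQL's DATEDIFF/DATE_ADD are replaced by SQLite's date() with an "N days" modifier, the anchor is written as '2016-01-21' (SQLite does not parse '20160121'), and the logs rows are hypothetical. Days with no rows come back with a count of 0 because COUNT(data.id) ignores the NULLs produced by the LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE days (ofs INTEGER)")
conn.executemany("INSERT INTO days VALUES (?)", [(o,) for o in range(0, -7, -1)])  # 0 .. -6
conn.execute("CREATE TABLE logs (id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [  # hypothetical sample data
        (1, "2016-01-21 09:00:00"),
        (2, "2016-01-21 17:30:00"),
        (3, "2016-01-19 12:00:00"),
        (4, "2016-01-15 08:00:00"),
    ],
)

# Each offset becomes a calendar day; LEFT JOIN keeps days with no log rows.
SQL = """
    SELECT date(:anchor, days.ofs || ' days') AS created_at,
           COUNT(data.id) AS cnt
    FROM days
    LEFT JOIN logs data
      ON date(data.created_at) = date(:anchor, days.ofs || ' days')
    GROUP BY days.ofs
    ORDER BY days.ofs
"""
per_day = conn.execute(SQL, {"anchor": "2016-01-21"}).fetchall()
for day, cnt in per_day:
    print(day, cnt)
```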
Use a RIGHT JOIN to a dates table, so you can request data for each and every day, whether or not some days have data; days without data will simply show as zero or NULL.
You can create a dates table, some sort of calendar table.
id_day | day_date |
--------------------
1 | 2016-01-01 |
2 | 2016-01-02 |
.
.
365 | 2016-12-31 |
With this table you can join on the date, then extract the day, month, week, or whatever you want with the MySQL date and time functions:
SELECT DAYNAME(t2.day_date), COUNT(t1.created_at) FROM logs t1 RIGHT JOIN dates_table t2 ON DATE(t1.created_at) = t2.day_date
WHERE t2.day_date <= CURDATE() GROUP BY t2.day_date ORDER BY t2.day_date DESC LIMIT 7
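A minimal sketch of the calendar-table approach, run on SQLite through Python's sqlite3. Assumptions: the table name dates_table follows the answer, the logs rows are hypothetical, dates_table LEFT JOIN logs is used because it is equivalent to logs RIGHT JOIN dates_table (and older SQLite versions lack RIGHT JOIN), and day names are computed in Python because SQLite has no DAYNAME().

```python
import sqlite3
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dates_table (id_day INTEGER, day_date TEXT)")
first = date(2016, 1, 1)
conn.executemany(
    "INSERT INTO dates_table VALUES (?, ?)",
    [(i + 1, (first + timedelta(days=i)).isoformat()) for i in range(366)],  # 2016 is a leap year
)
conn.execute("CREATE TABLE logs (created_at TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?)",
    [("2016-01-21 09:00:00",), ("2016-01-21 17:30:00",), ("2016-01-19 12:00:00",)],  # hypothetical
)


def last_seven_days(conn, today):
    """(day name, count) for the 7 calendar days ending at `today`, newest first."""
    sql = """
        SELECT d.day_date, COUNT(l.created_at) AS cnt
        FROM dates_table d
        LEFT JOIN logs l ON date(l.created_at) = d.day_date
        WHERE d.day_date <= ?
        GROUP BY d.day_date
        ORDER BY d.day_date DESC
        LIMIT 7
    """
    return [
        (DAY_NAMES[date.fromisoformat(day).weekday()], cnt)
        for day, cnt in conn.execute(sql, (today,))
    ]


week = last_seven_days(conn, "2016-01-21")
print(week)
```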

MYSQL get latest job number

I have this table:
TableNumber(Int 0 to 25)|Name(varchar 100)|Project(varchar 15)|StartDate(Datetime)
1 |David |P1 |'2015-02-06 08:00:00'
2 |Sebastien |P2 |'2015-02-06 08:00:00'
1 |David |P4 |'2015-02-06 12:00:00'
2 |Sebastien |P3 |'2015-02-07 08:00:00'
And I am looking to get the latest job for each person on a set day.
I would like to have:
TableNumber(Int 0 to 25)|Name(varchar 100)|Project(varchar 15)|StartDate(Datetime)
2 |Sebastien |P2 |'2015-02-06 08:00:00'
1 |David |P4 |'2015-02-06 12:00:00'
So I want to exclude P3 because it's not on '2015-02-06', and I want to exclude P1 because it's not the latest job for David (that's P4).
Please assume that NOW() returns '2015-02-06 15:00:00' in the following examples.
Here is what I tried:
SELECT * FROM MyTable WHERE DATEDIFF(startdate, NOW()) = 0 ORDER BY tablenum DESC;
But this only excluded P3.
So I tried this instead:
SELECT * FROM MyTable AS p WHERE DATEDIFF(p.startdate, NOW()) = 0 AND TIMEDIFF(p.startdate, NOW()) = (SELECT MAX(TIMEDIFF(p2.startdate, NOW())) FROM MyTable AS p2 WHERE p2.startdate = p.startdate) ORDER BY tablenum DESC;
But it still doesn't exclude P1.
Does anyone know how I could achieve this? BTW, startdate will always be on the hour (08:00:00, 12:00:00, 22:00:00, ...).
UPDATE
Since it wasn't very clear what I wanted, I will clarify here:
I need to know the last project worked on by every person.
So in my table I need to know that Sebastien works on P2 at table number 2 and that David works on P4 at table number 1. I don't want P1 because it's not the last project that David worked on (by last project I also include the project he is working on right now). I also want to rule out everything in the future, so P3 (which is tomorrow) must not be displayed.
The following query will give you the date/time of the latest job for each name on a given day. In the following example I assumed you want the latest job of each user on 2015-02-06.
SELECT Name, MAX(StartDate) AS StartDate
FROM MyTable
WHERE StartDate >= '2015-02-06'
AND StartDate < '2015-02-07'
GROUP BY Name
Using the above query, you can trivially get the final solution:
SELECT t1.TableNumber, t1.Project, t2.Name, t2.StartDate
FROM MyTable t1 INNER JOIN
(SELECT Name, MAX(StartDate) AS StartDate
FROM MyTable
WHERE StartDate >= '2015-02-06'
AND StartDate < '2015-02-07'
GROUP BY Name) t2 ON t1.Name = t2.Name AND t1.StartDate = t2.StartDate
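Here is a check of the join-back pattern on the question's data, using MAX(StartDate) in the derived table (with MIN it would return David's earlier P1 instead). It runs on SQLite via Python's sqlite3 rather than MySQL, with the day bounds passed as parameters; the function name is hypothetical.

```python
import sqlite3

ROWS = [  # (TableNumber, Name, Project, StartDate) from the question
    (1, "David", "P1", "2015-02-06 08:00:00"),
    (2, "Sebastien", "P2", "2015-02-06 08:00:00"),
    (1, "David", "P4", "2015-02-06 12:00:00"),
    (2, "Sebastien", "P3", "2015-02-07 08:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (TableNumber INTEGER, Name TEXT, Project TEXT, StartDate TEXT)"
)
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", ROWS)


def latest_job_per_name(conn, day_start, day_end):
    """Latest job per name within [day_start, day_end), joined back for the full row."""
    sql = """
        SELECT t1.TableNumber, t1.Project, t2.Name, t2.StartDate
        FROM MyTable t1
        INNER JOIN (SELECT Name, MAX(StartDate) AS StartDate
                    FROM MyTable
                    WHERE StartDate >= ? AND StartDate < ?
                    GROUP BY Name) t2
          ON t1.Name = t2.Name AND t1.StartDate = t2.StartDate
        ORDER BY t2.StartDate
    """
    return conn.execute(sql, (day_start, day_end)).fetchall()


jobs = latest_job_per_name(conn, "2015-02-06", "2015-02-07")
print(jobs)
```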

SQL Query for columns with a unique value

I have a table which looks like this:
courseid session_date title published
1 2012-07-01 Training Course A 0
1 2012-07-02 Training Course A 0
2 2012-07-04 Training Course B 1
2 2012-07-07 Training Course B 1
3 2012-07-05 Training Course C 1
3 2012-07-06 Training Course C 1
4 2012-07-07 Training Course D 1
4 2012-07-10 Training Course D 1
The table has two entries for each ID and Title because the session_date column shows the start date and the end date of the course.
I am trying to create a query that will pull the next five courses without showing any courses in the past.
I have gotten this far:
SELECT session_date, title, courseid
FROM table
WHERE published = 1 AND session_date > DATE(NOW())
ORDER BY session_date ASC LIMIT 0,5
This pulls rows from the table for the next five session dates, but it includes both start dates and finish dates, whereas I need the next five courses ordered by start date.
I need to create a query that will pull the earliest session_date for each courseid but ignore the row with the latest session_date for that same courseid, but I am at a complete loss as to how to do this.
Any help or advice would be most gratefully received.
If you group your results by course and select MIN(session_date), you will get the earliest of the dates associated with each course (i.e. the start date):
SELECT courseid, MIN(session_date) AS start_date
FROM `table`
WHERE published = 1
GROUP BY courseid
HAVING start_date > CURRENT_DATE
ORDER BY start_date ASC
LIMIT 5
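A quick check of the GROUP BY/MIN query on the question's rows, run on SQLite through Python's sqlite3. The table is named courses here (a hypothetical name, since table is reserved) and "today" is a parameter instead of CURRENT_DATE so the output is reproducible; HAVING repeats MIN(session_date) rather than relying on the alias.

```python
import sqlite3

ROWS = [  # (courseid, session_date, title, published) from the question
    (1, "2012-07-01", "Training Course A", 0),
    (1, "2012-07-02", "Training Course A", 0),
    (2, "2012-07-04", "Training Course B", 1),
    (2, "2012-07-07", "Training Course B", 1),
    (3, "2012-07-05", "Training Course C", 1),
    (3, "2012-07-06", "Training Course C", 1),
    (4, "2012-07-07", "Training Course D", 1),
    (4, "2012-07-10", "Training Course D", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE courses (courseid INTEGER, session_date TEXT, title TEXT, published INTEGER)"
)
conn.executemany("INSERT INTO courses VALUES (?, ?, ?, ?)", ROWS)


def next_courses(conn, today, limit=5):
    """One row per published course: its start date (MIN), only if after `today`."""
    sql = """
        SELECT courseid, MIN(session_date) AS start_date
        FROM courses
        WHERE published = 1
        GROUP BY courseid
        HAVING MIN(session_date) > ?
        ORDER BY start_date ASC
        LIMIT ?
    """
    return conn.execute(sql, (today, limit)).fetchall()


print(next_courses(conn, "2012-07-03"))  # [(2, '2012-07-04'), (3, '2012-07-05'), (4, '2012-07-07')]
```

Because the filter is on each course's start date, a course that has already started (for example course 3 when today is 2012-07-05) is dropped entirely instead of reappearing through its finish-date row.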
What you need to do is retrieve only the rows with the minimum session_date per courseid group and order by that resulting set:
SELECT
b.*
FROM
(
SELECT courseid, MIN(session_date) AS mindate
FROM tbl
GROUP BY courseid
) a
INNER JOIN
tbl b ON a.courseid = b.courseid AND a.mindate = b.session_date
WHERE
b.session_date > NOW() AND
b.published = 1
ORDER BY
b.session_date
LIMIT 5
But a much better design would be to only have one row per courseid and have two columns specifying start and end dates:
tbl
------------------
courseid [PK]
start_date
end_date
title
published
Then you can simply do:
SELECT *
FROM tbl
WHERE start_date > NOW() AND published = 1
ORDER BY start_date
LIMIT 5
Since values of all the columns in your SELECT clause are repeating, just use DISTINCT
SELECT distinct session_date, title, courseid
FROM table
WHERE published = 1 AND session_date > DATE(NOW())
ORDER BY session_date ASC LIMIT 0,5