MySQL groupwise query - mysql

I'm attempting to calculate the CURRENT location of a person based on a schedule items table (schedules).
The basic premise is, you can schedule a person to be in an office for a certain period of time (let's say start_date=2015-10-01, end_date=2015-12-31). That is a schedule item. It has a 1toM relationship with a location - that's no problem, I have that part sorted.
However, whilst they're scheduled to be in that office, they may also be scheduled to attend an offsite/client office. So there will be another schedule entry for, say, start_date=2015-12-03, end_date=2015-12-04.
Here's the table structure.
Person table
----------------------------------------------
|person_id |person_name |person_email |
----------------------------------------------
|1 |John |john@example.org |
|2 |Jane |jane@example.org |
----------------------------------------------
Schedule table
--------------------------------------------------------------
|schedule_id |person_id |location_id |start_date |end_date |
--------------------------------------------------------------
|1 |1 |1 |2015-10-01 |2015-12-31 |
|2 |2 |2 |2015-10-15 |2016-01-15 |
|3 |1 |5 |2015-12-03 |2015-12-10 |
|4 |2 |7 |2015-12-04 |2015-12-12 |
--------------------------------------------------------------
When I'm querying a single record, I'm easily able to calculate where the person currently is. It's not so complex.
SELECT * FROM schedules
WHERE person_id = 1 AND start_date <= CURDATE() AND end_date >= CURDATE()
ORDER BY end_date ASC, start_date DESC
LIMIT 0,1
However, when I need to generate a list of all people with their current schedule item, I'm running into issues. I had initially thought of just using a GROUP BY statement in the query, but that will only ever return the earliest schedule item that matches the query.
The problem therein, is that there are MULTIPLE schedule items that will match the query (this is part of the domain logic). However, I will always select the SHORTEST current stint as their CURRENT location.
I've used a groupwise query in the past to calculate the status of a person's employment based on the most recent status entry. However, because the schedule item has some slightly more complex logic in and around it (it has future scheduled items in it) I'm really just talking myself in circles as to the best approach.

A method using a sub query with substring_index. This gets all the schedule ids ordered by the length of time between the end and start dates, then uses SUBSTRING_INDEX to just get the first one. Then joins this against schedules to get the rest of the details.
SELECT *
FROM schedules
INNER JOIN
(
SELECT person_id, SUBSTRING_INDEX(GROUP_CONCAT(schedule_id ORDER BY DATEDIFF(end_date, start_date)), ',', 1) AS best_schedule_id
FROM schedules
WHERE person_id = 1
AND start_date <= CURDATE() AND end_date >= CURDATE()
GROUP BY person_id
) sub0
ON schedules.schedule_id = sub0.best_schedule_id
AND schedules.person_id = sub0.person_id
Note, I have also returned the person id from the sub query. Not strictly necessary as the query is at the moment, but put it in place so if you start to want to bring back multiple people it will need little change.
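As a rough sketch of that change, dropping the person_id filter should give one row per person. This assumes the schedules table exactly as shown in the question; the extra schedule_id tiebreak inside GROUP_CONCAT is my own addition, so that two stints of equal length always resolve the same way.

```sql
-- One current schedule item per person: the shortest stint that covers today.
SELECT schedules.*
FROM schedules
INNER JOIN
(
    SELECT person_id,
           SUBSTRING_INDEX(
               GROUP_CONCAT(schedule_id
                            ORDER BY DATEDIFF(end_date, start_date), schedule_id),
               ',', 1) AS best_schedule_id
    FROM schedules
    WHERE start_date <= CURDATE() AND end_date >= CURDATE()
    GROUP BY person_id
) sub0
ON schedules.schedule_id = sub0.best_schedule_id
AND schedules.person_id = sub0.person_id;
```

The join compares a numeric schedule_id with the string produced by SUBSTRING_INDEX, which relies on MySQL's implicit conversion.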

You want to select all records with a start date before and an end date after the current date. You can get one person multiple times. For each person you want to select the occurrence with the earliest end date. That means you have to order those records by end date and number them within the person group. Try this:
select * from (
select a.schedule_id
, a.person_id
, a.location_id
, a.start_date
, a.end_date
, row_number() over (partition by a.person_id order by a.end_date) as rn
from schedules a
where getdate() between a.start_date and a.end_date
) tab
where rn=1
I added this afterwards because I realized that the row_number function is not available in MySQL before version 8.0. So this is a version for older MySQL, using user variables. A bit more complicated but it should work:
select * from (
select @row_num := if(@prev_value=a.person_id,@row_num+1,1) as rn
, a.schedule_id
, a.person_id
, a.location_id
, a.start_date
, a.end_date
, @prev_value := a.person_id as asgmnt
from schedules a,
(select @row_num:=1) x,
(select @prev_value:=0) y
where a.start_date<=curdate() and a.end_date>=curdate()
order by a.person_id, a.end_date
) tab
where rn=1
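On MySQL 8.0 or later, where window functions are available, the numbering can be written directly. The following is only a sketch against the schedules table from the question. Unlike the queries above, it orders by stint length (the question's "shortest current stint" rule) rather than by end_date alone, and the schedule_id tiebreak is an assumption of mine.

```sql
-- Requires MySQL 8.0+ (window functions).
SELECT schedule_id, person_id, location_id, start_date, end_date
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (
               PARTITION BY s.person_id
               ORDER BY DATEDIFF(s.end_date, s.start_date), s.schedule_id
           ) AS rn
    FROM schedules s
    WHERE CURDATE() BETWEEN s.start_date AND s.end_date
) ranked
WHERE rn = 1;
```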

I took some motivation from what you gave me and simply decided to do another subquery on the result set, prior to doing a GROUP BY on the output.
SELECT s.schedule_id, s.person_id, s.location_id FROM (
SELECT * FROM schedules
WHERE person_id = 1 AND start_date <= CURDATE() AND end_date >= CURDATE()
ORDER BY end_date ASC, start_date DESC
) AS s GROUP BY s.person_id
This appears to have given me the result set that I was after, unless anybody can think of a reason this would fail?
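Two reasons it could fail, as far as I know: MySQL does not promise which row of a group supplies the non-aggregated columns, and an ORDER BY inside a derived table may be ignored by the optimizer. With ONLY_FULL_GROUP_BY enabled (the default from MySQL 5.7.5) the query is rejected outright. A sketch of a variant that does not rely on row order, applying the same end_date ASC, start_date DESC preference for every person, could look like this (the final schedule_id tiebreak is my assumption):

```sql
-- Keep a current item only if no other current item for the same person
-- would sort ahead of it (earlier end_date, then later start_date).
SELECT s.schedule_id, s.person_id, s.location_id
FROM schedules s
WHERE CURDATE() BETWEEN s.start_date AND s.end_date
  AND NOT EXISTS (
      SELECT 1
      FROM schedules t
      WHERE t.person_id = s.person_id
        AND CURDATE() BETWEEN t.start_date AND t.end_date
        AND (   t.end_date < s.end_date
             OR (t.end_date = s.end_date AND t.start_date > s.start_date)
             OR (t.end_date = s.end_date AND t.start_date = s.start_date
                 AND t.schedule_id < s.schedule_id))
  );
```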

Related

MySQL - Group By Latest and Join First Instance

I've tried a few things but I've ended up confusing myself.
What I am trying to do is find the most recent records from a table and left join the first after a certain date.
An example might be
id | acct_no | created_at | some_other_column
1 | A0001 | 2017-05-21 00:00:00 | x
2 | A0001 | 2017-05-22 00:00:00 | y
3 | A0001 | 2017-05-22 00:00:00 | z
So ideally what I'd like is to find the latest record of each acct_no sorted by created_at DESC so that the results are grouped by unique account numbers, so from the above record it would be 3, but obviously there would be multiple different account numbers with records for different days.
Then, what I am trying to achieve is to join on the same table and find the first record with the same account number after a certain date.
For example, record 1 would be returned for a query joining on acct_no A0001 after or equal to 2017-05-21 00:00:00 because it is the first result after/equal to that date, so these are sorted by created_at ASC AND created_at >= "2017-05-21 00:00:00" (and possibly AND id != latest.id).
It seems quite straight forward but I just can't get it to work.
I only have my most recent attempt after discarding multiple different queries.
Here I am trying to solve the first part which is to select the most recent of each account number:
SELECT latest.* FROM my_table latest
JOIN (SELECT acct_no, MAX(created_at) FROM my_table GROUP
BY acct_no) latest2
ON latest.acct_no = latest2.acct_no
but that still returns all rows rather than the most recent of each.
I did have something using a join on a subquery but it took so long to run that I quit it before it finished, even though I have indexes on acct_no and created_at. I've also run into other problems where columns in the select are not in the group by. I know this can be turned off but I'm trying to find a way to perform the query that doesn't require that.
Just try a little edit to your initial query:
SELECT latest.* FROM my_table latest
join (SELECT acct_no, MAX(created_at) as max_time FROM my_table GROUP
BY acct_no) latest2
ON latest.acct_no = latest2.acct_no AND latest.created_at = latest2.max_time
Trying a different approach. Not sure about the performance impact. But hoping that avoiding self join and group by would be better in terms of performance.
SELECT * FROM (
SELECT mytable1.*, IF(@temp <> acct_no, 1, 0) selector, @temp := acct_no FROM `mytable1`
JOIN (SELECT @temp := '') a
ORDER BY acct_no, created_at DESC , id DESC
) b WHERE selector = 1
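You need to get the id of the row with the latest created_at for each account (taking the highest id when several rows share that timestamp).
SELECT latest.* FROM my_table latest
join (SELECT max(t.id) as id FROM my_table t
join (SELECT acct_no, MAX(created_at) as max_time FROM my_table GROUP BY acct_no) m
ON t.acct_no = m.acct_no AND t.created_at = m.max_time
GROUP BY t.acct_no) latest2
ON latest.id = latest2.id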
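For the second part of the question (attaching the first record on or after a given date to each account's latest record), a correlated-subquery sketch might look like the following. It assumes the my_table layout from the question, treats the highest id as the tiebreak for equal created_at values, and excludes the latest row itself from the "first after" lookup; all three choices are assumptions on my part.

```sql
SELECT l.id         AS latest_id,
       l.acct_no,
       l.created_at AS latest_created_at,
       -- first other row for the same account on/after the cutoff date
       (SELECT f.id
        FROM my_table f
        WHERE f.acct_no = l.acct_no
          AND f.created_at >= '2017-05-21 00:00:00'
          AND f.id <> l.id
        ORDER BY f.created_at ASC, f.id ASC
        LIMIT 1) AS first_id_after
FROM my_table l
-- keep only the most recent row of each account
WHERE l.id = (SELECT l2.id
              FROM my_table l2
              WHERE l2.acct_no = l.acct_no
              ORDER BY l2.created_at DESC, l2.id DESC
              LIMIT 1);
```

With the three sample rows for A0001, this should give latest_id 3 and first_id_after 1. An index on (acct_no, created_at) would probably help both subqueries.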

MySQL query to select distinct rows based on date range overlapping

Let's say we have a table (table1) in which we store 4 values (user_id, name, start_date, end_date)
table1
------------------------------------------------
id user_id name start_date end_date
------------------------------------------------
1 1 john 2016-04-02 2016-04-03
2 2 steve 2016-04-06 2016-04-06
3 3 sarah 2016-04-03 2016-04-03
4 1 john 2016-04-12 2016-04-15
I then enter a start_date of 2016-04-03 and end_date of 2016-04-03 to see if any of the users are available to be scheduled for a job. The query that checks for and ignores overlapping dates returns the following:
table1
------------------------------------------------
id user_id name start_date end_date
------------------------------------------------
2 2 steve 2016-04-06 2016-04-06
4 1 john 2016-04-12 2016-04-15
The issue I am having is that John is being displayed on the list even though he is already booked for a job for the dates I am searching for. The query returns TRUE for the other entry because the dates don't conflict, but I would like to hide John from the list completely since he will be unavailable.
Is there a way to filter the list and prevent the user info from displaying if the dates entered conflict with another entry for the same user?
An example of the query:
SELECT DISTINCT id, user_id, name, start_date, end_date
FROM table1
WHERE ('{$startDate}' NOT BETWEEN start_date AND end_date
AND '{$endDate}' NOT BETWEEN start_date AND end_date
AND start_date NOT BETWEEN '{$startDate}' AND '{$endDate}'
AND end_date NOT BETWEEN '{$startDate}' AND '{$endDate}');
The "solution" in the question doesn't look right at all.
INSERT INTO table1 VALUES (5,2,'steve', '2016-04-01','2016-04-04')
Now there's a row with Steve having an overlap.
And the query proposed as a SOLUTION in the question will return 'steve'.
Here's a demonstration of building a query to return the users that are "available" during the requested period, because there is no row in table1 for that user that "overlaps" with the requested period.
First problem is getting the users that are not available due to the existence of a row that overlaps the requested period. Assuming that start_date <= end_date for all rows in the table...
A row overlaps the requested period if the end_date of the row is on or after the start of the requested period, and the start_date of the row is on or before the end of the requested period.
-- users that are "unavailable" due to row with overlap
SELECT t.user_id
FROM table1 t
WHERE t.end_date >= '2016-04-03' -- start of requested period
AND t.start_date <= '2016-04-03' -- end of requested_period
GROUP
BY t.user_id
(If our assumption that start_date <= end_date doesn't hold, we can add that check as a condition in the query)
To get a list of all users, we could query a table that has a distinct list of users. We don't see a table like that in the question, so we can get a list of all users that appear in table1 instead
SELECT l.user_id
FROM table1 l
GROUP BY l.user_id
To get the list of all users excluding the users that are unavailable, there are a couple of ways we can write that. The simplest is an anti-join pattern:
SELECT a.user_id
FROM ( -- list of all users
SELECT l.user_id
FROM table1 l
GROUP BY l.user_id
) a
LEFT
JOIN ( -- users that are unavailable due to overlap
SELECT t.user_id
FROM table1 t
WHERE t.end_date >= '2016-04-03' -- start of requested period
AND t.start_date <= '2016-04-03' -- end of requested_period
GROUP
BY t.user_id
) u
ON u.user_id = a.user_id
WHERE u.user_id IS NULL
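Another way to write the same exclusion is with NOT EXISTS. This is only a sketch using the table1 columns and the example dates from the question; the overlap test is the same as in the anti-join above.

```sql
-- Users with no row that overlaps the requested period
SELECT DISTINCT a.user_id
FROM table1 a
WHERE NOT EXISTS (
    SELECT 1
    FROM table1 t
    WHERE t.user_id = a.user_id
      AND t.end_date   >= '2016-04-03'   -- start of requested period
      AND t.start_date <= '2016-04-03'   -- end of requested period
);
```

Against the four original rows this should return only user_id 2 (steve); once the extra row for steve is inserted, it should return nothing.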
Will this work?
SELECT DISTINCT user_id FROM table1 WHERE (DATEDIFF(_input_,start_date) > 0 AND
DATEDIFF(_input_,end_date) > 0) OR
(DATEDIFF(_input_,start_date) < 0);

Complicated MySQL query to find each time a user appears more than once on the same day

I am trying to query a table. There are 3 important fields: attendant_id, client_id, and date.
Each time an attendant works with a client, they add an entry which includes their id, the client's id, and the date. Occasionally, an attendant will work with more than one client on the same day. I would like to capture when this happens. Here is what I have so far:
SELECT *
FROM timesheet_lines tsl1
WHERE EXISTS
(
SELECT *
FROM timesheet_lines tsl2
WHERE tsl1.date = tsl2.date
AND tsl1.attendant_id = tsl2.attendant_id
AND tsl1.client_id <> tsl2.client_id
AND tsl1.date between '2014-04-01' AND '2014-06-30'
LIMIT 2,5
)
I only want to display results where an attendant worked with at least 2 different clients. I don't expect it to be possible to have more than 5 on a single day. This is why I am using LIMIT 2,5.
I am also only interested in April through June of this year.
I think I may have the right syntax, but the query seems to be taking forever to run. Is there a faster query? There should be only about 42000+ entries all together for this particular date range. I am not expecting to get more than about 500-600 results that meet the criteria.
I ended up using the following:
create TEMPORARY table tempTSL1
(date1 date, start1 time, end1 time, attend1 varchar(50), client1 varchar(50), type1 tinyint);
insert into tempTSL1(date1, start1, end1, attend1, client1, type1)
select date, start_time, end_time, attendant_id, client_id, type
from timesheet_lines
WHERE
timesheet_lines.date BETWEEN '2014-04-01' AND '2014-06-30'
and timesheet_lines.type IN (1,2,5,6);
create TEMPORARY table tempTSL2
(date2 date, start2 time, end2 time, attend2 varchar(50), client2 varchar(50), type2 tinyint);
insert into tempTSL2(date2, start2, end2, attend2, client2, type2)
select date, start_time, end_time, attendant_id, client_id, type
from timesheet_lines
WHERE
timesheet_lines.date BETWEEN '2014-04-01' AND '2014-06-30'
and timesheet_lines.type IN (1,2,5,6);
SELECT *
FROM tempTSL1
WHERE (attend1,date1) IN (
SELECT attend2
,date2
FROM tempTSL2 tsl2
GROUP BY attend2
,date2
HAVING COUNT(date2) > 1
)
GROUP BY attend1
,client1
,date1
HAVING COUNT(client1) = 1
ORDER BY date1,attend1,start1
You are likely making it much more complex than it needs to be. Try something like this:
SELECT attendant_id
,client_id
,date
FROM timesheet_lines
WHERE (attendant_id,date) IN (
SELECT attendant_id
,date
FROM timesheet_lines tsl1
GROUP BY attendant_id
,date
HAVING COUNT(date) > 1
)
GROUP BY attendant_id
,client_id
,date
HAVING COUNT(client_id) = 1
The subquery returns results only of attendants performing multiple activities on the same date. The top query will pull from the same table, matching the attendant and dates of activity, and filter the result set to items where there is only 1 client in the grouping. Example:
attendant_id client_id date
1 A 2014-01-01
1 B 2014-01-01
2 C 2014-01-01
2 D 2014-01-02
Will return:
attendant_id client_id date
1 A 2014-01-01
1 B 2014-01-01
Untested, but I think it should be in line with what you are looking for, assuming the following two statements are true:
You are not trying to capture two different attendants working the same client on the same day
An attendant can only perform one activity per client per day
If the second point is not true, then you will need to incorporate additional fields into the subquery (such as an activity_id or something).
Hope this helps.
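If an attendant can log several entries for the same client on one day, counting distinct clients avoids the second assumption. A sketch, using only the timesheet_lines columns named in the question and its April to June date range:

```sql
-- Attendant/date pairs with at least two different clients,
-- joined back to list every matching timesheet line.
SELECT t.attendant_id, t.client_id, t.date
FROM timesheet_lines t
JOIN (
    SELECT attendant_id, date
    FROM timesheet_lines
    WHERE date BETWEEN '2014-04-01' AND '2014-06-30'
    GROUP BY attendant_id, date
    HAVING COUNT(DISTINCT client_id) >= 2
) multi
  ON  multi.attendant_id = t.attendant_id
  AND multi.date = t.date
ORDER BY t.date, t.attendant_id;
```

Filtering by date inside the derived table keeps the grouping limited to the rows of interest; an index on (attendant_id, date) is likely to help, though I have not measured it.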

General time range with SQL

I'm trying to do something in SQL and I just can't figure out how I should do that. I have this table
----------------------------------
|id_visit | visit_date | ssn |
----------------------------------
|1 |1940-01-07 |123125789|
----------------------------------
|2 |1975-03-15 |987743271|
----------------------------------
| ... | ... | ... |
and I need to select SSN's of patients that were visited more than five times within a year. How do I do that? I know it involves a 'HAVING COUNT(id_visit)' but for time part... that's a different story because my goal isn't to select ssn's in a specific time range but within a general range.
From @Gordon Linoff's answer, I modified the query a bit to eliminate repetition in the results and return only the maximum count.
select p_ssn as SSN, max(visits_within_one_year) as "Maximum number of visits"
from (select t.p_ssn,count(*) as visits_within_one_year
from t join
t tyr
on t.p_ssn = tyr.p_ssn and
tyr.visit_date between t.visit_date and adddate(t.visit_date, 365)
group by t.p_ssn,t.visit_date
having visits_within_one_year > 5)results
group by p_ssn;
Assuming that you mean calendar year, the following query retrieves all SSNs and year combinations where the SSN appears more than five times during the year:
select ssn, year(visit_date) as yr
from t
group by ssn, year(visit_date)
having count(*) > 5;
If the question is about an arbitrary year period, then you can use a self join and aggregation:
select t.ssn, t.visit_date, count(*) as visits_within_one_year
from t join
t tyr
on t.ssn = tyr.ssn and
tyr.visit_date between t.visit_date and adddate(t.visit_date, 365)
group by t.ssn, t.visit_date
having visits_within_one_year > 5;
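Note that adding 365 days is not quite a calendar-length year when a leap day falls inside the window. If that matters, a variant with INTERVAL 1 YEAR that returns each SSN once might look like this sketch (same assumed table t with ssn and visit_date; the half-open window, which excludes the anniversary date itself, is my choice):

```sql
-- SSNs with more than five visits inside some one-year window
-- that starts on one of their own visit dates.
SELECT DISTINCT v.ssn
FROM t v
JOIN t w
  ON  w.ssn = v.ssn
  AND w.visit_date >= v.visit_date
  AND w.visit_date <  DATE_ADD(v.visit_date, INTERVAL 1 YEAR)
GROUP BY v.ssn, v.visit_date
HAVING COUNT(*) > 5;
```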
If you mean to get those ssn within a solar year (jan/dec):
select ssn
from tablename
group by ssn,year(visit_date)
having count(ssn)>5

mysql get state on particular date

How could I get the row from a table with the date nearest to some date? If I insert a log entry on February 27th, the state will remain the same for some time and no other record will be added. How can I tell which state was in effect on March 8th, for instance?
I have two tables: State and History
State: History:
id|name id| date |id_state
1 |works 1 |2010-08-06 |1
2 |broken 2 |2011-05-10 |1
3 |active 3 |2009-27-01 |2
If I draw a timeline of when the records were put in the database...
2009-08-06 2010-08-06
---|--------------------------------------------|---------------->
'active' 'broken'
So it was active this entire time. I need all rows from History when state was active and date was March 8th 2010. Thank you
Simple query. Considering your mentioned date of 8th March 2010.
select h.id, h.date,h.id_state from history h
left outer join State s on s.id = h.id_state
where
h.date = '2010-03-08' and s.id = 3
You can rephrase the where clause as below according to your need.
where h.date = '2010-03-08' and s.name = 'active'
This may work
SELECT state.id, history.id, name, date
FROM state
JOIN history ON state.id = history.id_state
WHERE date = '2010-08-06'
Simple joining of 2 tables...
Edit:
To retrieve the last closest date to the given date, use this...
SELECT state.id, history.id, name, date
FROM state
JOIN history ON state.id = history.id_state
WHERE date <= '2012-04-10'
ORDER by date DESC
LIMIT 1
You get exactly ONE, but the closest date...
Edit2:
To retrieve the last closest date to the given date, that IS ACTIVE...
SELECT state.id, history.id, name, date
FROM state
JOIN history ON state.id = history.id_state
WHERE date <= '2012-04-10' AND name = 'active'
ORDER by date DESC
LIMIT 1
You get exactly ONE, but the closest date...
To get the last state before a given date (which will give you the state at the given date), use this query:
select * from (
select *
from log_table
where `date` < $1
and name = 'active'
order by `date` desc) x
limit 1
You can add to the where clause as you like to find the most recent row with some particular condition.
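One caveat with filtering on the state name before taking the latest row: that finds the last 'active' record, even if a later record changed the state before the target date. To ask "was the state active on that date", a sketch could first pick the latest history row on or before the date and only then test its state. The table and column names (history, state, id_state, date, name) follow the question; the lowercase table names are an assumption about how they were created.

```sql
SELECT x.id, x.date, x.name
FROM (
    -- the most recent history entry on or before the target date
    SELECT h.id, h.date, s.name
    FROM history h
    JOIN state s ON s.id = h.id_state
    WHERE h.date <= '2010-03-08'
    ORDER BY h.date DESC
    LIMIT 1
) x
WHERE x.name = 'active';   -- empty result if the state was something else
```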