SQL - Select Closest Preceding Date - mysql

I have a database table that contains one or more entries for each patient. These contain free text and additional information about a test request. For example, querying on one patient returns:
TestID  PatientID  RequestMade          FreeText
1       23         13/12/2015 11:00:00  Feeling breathless
1125    23         07/04/2016 09:31:15  Unexplained fractures
2556    23         04/12/2016 16:20:21  Check liver function – on statins
When viewing test results, I have to pull up the request information for the test, which is the last request made before the test. The results have a TestDate, so a TestDate of '13/04/2016 14:21:30' should display the request made at '07/04/2016 09:31:15'. I am unsure how to code this efficiently, because returning every entry for a patient and comparing dates on each one does not seem the best approach.

If you want the request immediately before another test for a single patient, and the test you are looking for appears only once, you can do this with a single query:
select t.*
from tests t
where t.patientid = 23 and
t.requestmade < (select t2.requestmade
from tests t2
where t2.patientid = t.patientid and
t2.testid = ?
)
order by t.requestmade desc
limit 1;
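The query above looks up the anchor time by TestID. If the result's TestDate is already in hand, a variant is to compare against it directly. The sketch below runs that variant through Python's built-in sqlite3 module as a stand-in for MySQL. The table name tests, the index name and the helper latest_request_before are illustrative assumptions, and the dates are rewritten in ISO format (YYYY-MM-DD HH:MM:SS) so that plain text comparison orders them correctly.

```python
import sqlite3

def latest_request_before(conn, patient_id, test_date):
    """Return the newest request for patient_id made strictly before test_date (or None)."""
    return conn.execute(
        """
        SELECT TestID, PatientID, RequestMade, FreeText
        FROM tests
        WHERE PatientID = ? AND RequestMade < ?
        ORDER BY RequestMade DESC
        LIMIT 1
        """,
        (patient_id, test_date),
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tests (TestID INTEGER, PatientID INTEGER, RequestMade TEXT, FreeText TEXT)"
)
# Hypothetical composite index: lets the engine seek to the patient and walk backwards.
conn.execute("CREATE INDEX idx_patient_request ON tests (PatientID, RequestMade)")
conn.executemany(
    "INSERT INTO tests VALUES (?, ?, ?, ?)",
    [
        (1, 23, "2015-12-13 11:00:00", "Feeling breathless"),
        (1125, 23, "2016-04-07 09:31:15", "Unexplained fractures"),
        (2556, 23, "2016-12-04 16:20:21", "Check liver function - on statins"),
    ],
)

row = latest_request_before(conn, 23, "2016-04-13 14:21:30")
print(row)
```

With an index on (PatientID, RequestMade), MySQL can typically answer ORDER BY RequestMade DESC LIMIT 1 by reading one index entry rather than comparing every entry for the patient.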

Related

Efficiently get latest appointment for every person sorted by oldest first

I already asked this question earlier but forgot a few (important) details or got them wrong.
My table in MySQL 8.0.29 looks like this:
UserID  Appointment  Description
Bob     2022-06-01   Cleaning
Bob     2022-06-03   Toothache
John    2022-06-02   Braces
I'm trying to get the latest appointment for every person, sorted oldest first.
The query should return:
UserID  Appointment  Description
John    2022-06-02   Braces
Bob     2022-06-03   Toothache
Using one of the previous answers, I have:
SELECT UserID, Appointment, Description
FROM (
    SELECT UserID, Appointment, Description,
           ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Appointment DESC) AS rn
    FROM tbl
) t1
WHERE rn = 1
ORDER BY Appointment
The problem is that the table currently has 3 million rows and will continue to grow, so this query ends up being pretty slow.
My plan is to consume the data in chunks, so I'd prefer a query with pagination, something like LIMIT 0, 5000 to get 5000 records at a time.
I'm open to even re-architecting the database if it comes to that.
For now I've resorted to creating a new table that just keeps the latest appointment for each user.
You are halfway there. Use that query as a 'derived table' instead of making it permanent:
SELECT b.*
FROM ( SELECT user_id, MAX(appointment) AS last_date
       FROM tbl
       GROUP BY user_id ) AS x
JOIN tbl AS b ON b.user_id = x.user_id
             AND b.appointment = x.last_date
And be sure to have INDEX(user_id, appointment).
I would be interested to see whether this and the OVER approach give the same results, and which is faster.
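To try the comparison suggested above, here is a small sketch that runs both queries on the sample rows, using Python's sqlite3 as a stand-in for MySQL 8 (ROW_NUMBER() needs SQLite 3.25 or later). It adds ORDER BY plus LIMIT/OFFSET for the chunked consumption mentioned in the question; the page helper and the tie-breaking ORDER BY user_id are illustrative assumptions. Note that the two approaches can differ when a user has two rows on the same latest date: the MAX() join returns both, while ROW_NUMBER() returns only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (user_id TEXT, appointment TEXT, description TEXT)")
conn.execute("CREATE INDEX idx_user_appt ON tbl (user_id, appointment)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        ("Bob", "2022-06-01", "Cleaning"),
        ("Bob", "2022-06-03", "Toothache"),
        ("John", "2022-06-02", "Braces"),
    ],
)

# Approach 1: derived table with MAX() joined back to the base table.
GROUP_MAX = """
SELECT b.user_id, b.appointment, b.description
FROM (SELECT user_id, MAX(appointment) AS last_date
      FROM tbl GROUP BY user_id) AS x
JOIN tbl AS b ON b.user_id = x.user_id AND b.appointment = x.last_date
ORDER BY b.appointment, b.user_id
LIMIT ? OFFSET ?
"""

# Approach 2: ROW_NUMBER() window function, keeping the newest row per user.
WINDOW = """
SELECT user_id, appointment, description
FROM (SELECT user_id, appointment, description,
             ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY appointment DESC) AS rn
      FROM tbl) AS t1
WHERE rn = 1
ORDER BY appointment, user_id
LIMIT ? OFFSET ?
"""

def page(query, size, offset):
    """Fetch one chunk of the oldest-first result."""
    return conn.execute(query, (size, offset)).fetchall()

print(page(GROUP_MAX, 5000, 0))
print(page(WINDOW, 5000, 0))
```

OFFSET still makes the engine walk past the skipped rows, so for very deep pages, remembering the last (appointment, user_id) returned and filtering past it (keyset pagination) usually scales better.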

SQL Capture duplicate records across two DIFFERENT columns

I am writing an Exception Catching Page using MySQL to catch duplicate billing entries in the following scenario.
Item details are entered in a table that has the following two columns (among others):
ItemCode VARCHAR(50), BillEntryDate DATE
It often happens that the same item's bill is entered multiple times over a period of a few days, for example:
"Football","2019-01-02"
"Basketball","2019-01-02"
...
...
"Football","2019-01-05"
"Rugby","2019-01-05"
...
"Handball","2019-01-05"
"Rugby","2019-01-07"
"Rugby","2019-01-10"
In the above example, the item Football is billed twice, first on 2 Jan and again on 5 Jan. Similarly, the item Rugby is billed three times, on 5, 7 and 10 Jan.
I am looking to write simple SQL that can pick up each item [say, using a DISTINCT(ItemCode) clause] and then display all records that are duplicates within a period of 30 days.
In the above case, the expected output should be the following 5 records:
"Football","2019-01-02"
"Football","2019-01-05"
"Rugby","2019-01-05"
"Rugby","2019-01-07"
"Rugby","2019-01-10"
I am trying to run the following SQL:
select * from tablen a, tablen b where a.ItemCode=b.ItemCode and a.BillEntryDate = b.BillEntryDate+30;
However, this seems highly inefficient: it runs for a long time without displaying any records.
Is there a simpler and faster method?
I did explore existing topics (like How do I find duplicates across multiple columns?), but those catch duplicates where BOTH columns have the same value. My requirement is one column with the same value and the second column varying within a month-long date range.
You can use:
select t.*
from tablen t
where exists (select 1
              from tablen t2
              where t2.ItemCode = t.ItemCode and
                    t2.BillEntryDate <> t.BillEntryDate and
                    t2.BillEntryDate >= t.BillEntryDate - interval 30 day and
                    t2.BillEntryDate <= t.BillEntryDate + interval 30 day
             );
This will pick up both duplicates in the pair.
For performance, you want an index on (ItemCode, BillEntryDate).
With EXISTS:
select ItemCode, BillEntryDate
from tablename t
where exists (
select 1 from tablename
where
ItemCode = t.ItemCode
and
abs(datediff(BillEntryDate, t.BillEntryDate)) between 1 and 30
)
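Both answers rely on the same idea: keep a row if another row with the same ItemCode lies within 30 days of it. The sketch below checks the second form against the sample data from the question, using Python's sqlite3 as a stand-in for MySQL. SQLite has no DATEDIFF(), so the day difference is computed from julianday() instead (a dialect substitution); the table name tablen comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablen (ItemCode TEXT, BillEntryDate TEXT)")
conn.execute("CREATE INDEX idx_item_date ON tablen (ItemCode, BillEntryDate)")
conn.executemany(
    "INSERT INTO tablen VALUES (?, ?)",
    [
        ("Football", "2019-01-02"), ("Basketball", "2019-01-02"),
        ("Football", "2019-01-05"), ("Rugby", "2019-01-05"),
        ("Handball", "2019-01-05"), ("Rugby", "2019-01-07"),
        ("Rugby", "2019-01-10"),
    ],
)

# Keep a bill if another bill for the same item falls 1 to 30 days away (either side).
rows = conn.execute("""
SELECT t.ItemCode, t.BillEntryDate
FROM tablen t
WHERE EXISTS (
    SELECT 1 FROM tablen t2
    WHERE t2.ItemCode = t.ItemCode
      AND abs(julianday(t2.BillEntryDate) - julianday(t.BillEntryDate)) BETWEEN 1 AND 30
)
ORDER BY t.ItemCode, t.BillEntryDate
""").fetchall()

for row in rows:
    print(row)
```

Neither form reports two bills for the same item on the same day, because a difference of 0 days is excluded; if that case matters, the lower bound could be 0 with the row itself excluded by a unique key, if the table has one.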

Can a SQL query do this?

I have a table "audit" with a "description" column, a "record_id" column and a "record_date" column. I want to select only those records where the description matches one of two possible strings (say, LIKE "NEW%" OR LIKE "ARCH%") and where the record_id of the two matching rows is the same. I then need to calculate the difference in days between the two rows' record_date values.
For instance, my table may contain:
id  description  record_id  record_date
1   New Sub      1000       04/14/13
2   Mod          1000       04/14/13
3   Archived     1000       04/15/13
4   New Sub      1001       04/13/13
I want to select only rows 1 and 3 and then calculate the number of days between 4/15 and 4/14 to determine how long record 1000 took to go from New to Archived. Both a New and an Archived entry must be present for a record to be counted (I don't care about ones that haven't been archived). Does this make sense, and is it possible to calculate this in a SQL query? I don't know much beyond basic SQL.
I am using MySQL Workbench to do this.
The following is untested, but it should work, assuming that any given record_id shows up only once with "New Sub" and once with "Archived":
select n.id as new_id
      ,a.id as archive_id
      ,record_id
      ,n.record_date as new_date
      ,a.record_date as archive_date
      ,DateDiff(a.record_date, n.record_date) as days_between
from audit n
join audit a using(record_id)
where n.description = 'New Sub'
  and a.description = 'Archived';
I changed OR to AND because I assumed you only wanted the number of days for records that were actually archived.
My test was in SQL Server, so the syntax might need slight tweaks for yours (especially the DATEDIFF function; in MySQL it is DATEDIFF(arc.record_date, newsub.record_date)). You can select from the same table twice, one side grabbing the 'new' rows and the other grabbing the 'archived' rows, then link them by record_id (foo1 is my test table; substitute audit):
SELECT
newsub.id,
newsub.description,
newsub.record_date,
arc.id,
arc.description,
arc.record_date,
DATEDIFF(day, newsub.record_date, arc.record_date) AS DaysBetween
FROM
foo1 arc
, foo1 newsub
WHERE
(newsub.description LIKE 'NEW%')
AND
(arc.description LIKE 'ARC%')
AND
(newsub.record_id = arc.record_id)
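The self-join idea from both answers can be checked on the sample rows. This sketch uses Python's sqlite3 as a stand-in for MySQL, with the dates rewritten in ISO format (an assumption; MM/DD/YY strings would not subtract correctly) and julianday() in place of DATEDIFF(); in MySQL the day count would be DATEDIFF(a.record_date, n.record_date). SQLite's LIKE is case-insensitive for ASCII by default, so 'NEW%' matches 'New Sub', as it would under MySQL's default case-insensitive collations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit (id INTEGER, description TEXT, record_id INTEGER, record_date TEXT)"
)
conn.executemany(
    "INSERT INTO audit VALUES (?, ?, ?, ?)",
    [
        (1, "New Sub", 1000, "2013-04-14"),
        (2, "Mod", 1000, "2013-04-14"),
        (3, "Archived", 1000, "2013-04-15"),
        (4, "New Sub", 1001, "2013-04-13"),
    ],
)

# One alias picks the 'new' row, the other the 'archived' row; record_id links them.
rows = conn.execute("""
SELECT n.record_id,
       n.record_date AS new_date,
       a.record_date AS archive_date,
       CAST(julianday(a.record_date) - julianday(n.record_date) AS INTEGER) AS days_between
FROM audit n
JOIN audit a ON a.record_id = n.record_id
WHERE n.description LIKE 'NEW%'
  AND a.description LIKE 'ARCH%'
""").fetchall()

print(rows)
```

Because it is an inner join, record 1001, which has no Archived row, drops out, which is what the question asks for.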

How can I find days between different paired rows?

I've been racking my brain about how to do this in one query without PHP code.
In a nutshell, I have a table that records email activity. For the sake of this example, here is the data:
recipient_id activity date
1 delivered 2011-08-30
1 open 2011-08-31
2 delivered 2011-08-30
3 delivered 2011-08-24
3 open 2011-08-30
3 open 2011-08-31
The goal: I want to display to users a single number that tells how many recipients open their email within 24 hours.
E.g. "Users that open their email within 24 hours: 13 Readers"
In the case of the sample data, above, the value would be "1". (Recipient one was delivered an email and opened it the next day. Recipient 2 never opened it and recipient 3 waited 5 days.)
Can anyone think of a way to express the goal in a single query?
Reminder: In order to count, the person must have a 'delivered' tag and at least one 'open' tag. Each 'open' tag only counts once per recipient.
** EDIT ** Sorry, I'm using MySQL.
Here is a version in MySQL.
select count(distinct recipient_id)
from email e1
where e1.activity = 'delivered'
and exists
(select * from email e2
where e1.recipient_id = e2.recipient_id
and e2.activity = 'open'
and datediff(e2.action_date,e1.action_date) <= 1)
The basic principle is that you want to find a delivered row for a recipient that also has an open within 24 hours.
DATEDIFF() is a good way to do the date arithmetic in MySQL; other databases vary in the exact method for this step. The rest of the SQL will work anywhere.
SQLFiddle here: http://sqlfiddle.com/#!2/c9116/4
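A runnable sketch of the same EXISTS query, using Python's sqlite3 as a stand-in for MySQL; julianday() replaces datediff(), and the table and column names (email, action_date) follow the answer. One small variation, labelled as such: the difference is bounded BETWEEN 0 AND 1 rather than <= 1, so an 'open' row dated before the delivery would not count. With date-only values the two functions agree; with DATETIME values, MySQL's DATEDIFF() ignores the time part while julianday() does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (recipient_id INTEGER, activity TEXT, action_date TEXT)")
conn.executemany(
    "INSERT INTO email VALUES (?, ?, ?)",
    [
        (1, "delivered", "2011-08-30"), (1, "open", "2011-08-31"),
        (2, "delivered", "2011-08-30"),
        (3, "delivered", "2011-08-24"), (3, "open", "2011-08-30"), (3, "open", "2011-08-31"),
    ],
)

# Count recipients whose delivery is followed by an open within one day.
(count,) = conn.execute("""
SELECT COUNT(DISTINCT e1.recipient_id)
FROM email e1
WHERE e1.activity = 'delivered'
  AND EXISTS (SELECT 1 FROM email e2
              WHERE e2.recipient_id = e1.recipient_id
                AND e2.activity = 'open'
                AND julianday(e2.action_date) - julianday(e1.action_date) BETWEEN 0 AND 1)
""").fetchone()

print(count)
```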
Untested, but it should work ;) I don't know which SQL dialect you use, so I've used the T-SQL DATEDIFF function.
select distinct opened.recipient_id -- or count(distinct opened.recipient_id) if you want to know number
from actions as opened
inner join actions as delivered
on opened.recipient_id = delivered.recipient_id and delivered.activity = 'delivered'
where opened.activity = 'open' and DATEDIFF(day, delivered.date, opened.date) <= 1
Edit: I'd confused opened with delivered - now replaced.
Assumptions: MySQL, table is called "TABLE" (a reserved word in MySQL, so it has to be quoted with backticks).
OK, I am not 100% sure of this, because I don't have a copy of the table to run it against, but I think you could do something like this:
SELECT COUNT(DISTINCT t1.recipient_id) FROM `TABLE` t1
INNER JOIN `TABLE` t2 ON t1.recipient_id = t2.recipient_id AND t1.activity != t2.activity
WHERE t1.activity in ('delivered', 'open') AND t2.activity in ('delivered', 'open')
AND ABS(DATEDIFF(t1.date, t2.date)) = 1
Basically, you join the table onto itself where the activities don't match but the recipient_ids do, and the status is either 'delivered' or 'open'. What you end up with is a result that looks like this:
1 delivered 2011-08-30 1 open 2011-08-31
You are then doing a diff between the two dates (with an absolute value, because we don't know which order they will be in) and making sure that it is equal to 1 (or 24 hours).

How can I find the correct prior status row in this table with a SQL query?

Imagine a workflow for data entry. Some forms come in, they are typed into a system, reviewed, and hopefully approved. However, they can be rejected by a manager and will have to be entered again.
So, an ideal workflow would go like this:
received > entered > approved
But this COULD happen:
received > entered > rejected > entered > rejected > approved
At each stage, we record who updated the form to its current status - who entered it, who rejected it, or who approved it. So the forms status table looks like this:
form_id status updated_by updated_at
1 received Bob (timestamp)
1 entered Bob (timestamp)
1 approved Susan (timestamp)
2 received Bob (timestamp)
2 entered Bob (timestamp)
2 rejected Susan (timestamp)
2 entered Carla (timestamp)
2 rejected Susan (timestamp)
2 entered Sam (timestamp)
2 approved Susan (timestamp)
Here's what I'm trying to do: write a rejection report. I want a row for each rejection, and joined to that row, I want to see who did the work that got rejected.
As a human, I can see that, for a given status row with status 'rejected', the row that tells me who did the faulty work is the one that shares the same form_id and has the prior timestamp closest to the rejection.
But I'm having trouble telling MySQL that.
Can anybody see how to construct this query?
A subselect ended up working for me.
SELECT
`s1`.`form_id`,
(
SELECT
`s2`.`updated_by`
FROM
statuses s2
WHERE
`s2`.`form_id` = `s1`.`form_id`
AND
`s2`.`updated_at` < `s1`.`updated_at`
ORDER BY
`s2`.`updated_at` DESC
LIMIT 1
) AS 'made_rejected_change'
FROM
statuses s1
WHERE
`s1`.`status` = 'rejected'
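On MySQL 8.0 or later, a window function is another option, swapped in here rather than taken from the answers: LAG(updated_by) over each form's history, ordered by updated_at, returns whoever made the previous status change. The sketch runs it through Python's sqlite3 (window functions need SQLite 3.25 or later) as a stand-in for MySQL; the timestamps are hypothetical, since the question only shows "(timestamp)". LAG() has to be computed in a derived table before filtering on status = 'rejected'; filtering first would remove the rows it needs to look back at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statuses (form_id INTEGER, status TEXT, updated_by TEXT, updated_at TEXT)"
)
# Hypothetical timestamps; only their order matters.
conn.executemany(
    "INSERT INTO statuses VALUES (?, ?, ?, ?)",
    [
        (1, "received", "Bob", "2020-01-01 09:00"),
        (1, "entered", "Bob", "2020-01-01 10:00"),
        (1, "approved", "Susan", "2020-01-01 11:00"),
        (2, "received", "Bob", "2020-01-02 09:00"),
        (2, "entered", "Bob", "2020-01-02 10:00"),
        (2, "rejected", "Susan", "2020-01-02 11:00"),
        (2, "entered", "Carla", "2020-01-02 12:00"),
        (2, "rejected", "Susan", "2020-01-02 13:00"),
        (2, "entered", "Sam", "2020-01-02 14:00"),
        (2, "approved", "Susan", "2020-01-02 15:00"),
    ],
)

# LAG() looks one row back within each form's history, ordered by time.
report = conn.execute("""
SELECT form_id, rejected_at, rejected_by, made_rejected_change
FROM (SELECT form_id,
             status,
             updated_at AS rejected_at,
             updated_by AS rejected_by,
             LAG(updated_by) OVER (PARTITION BY form_id ORDER BY updated_at)
                 AS made_rejected_change
      FROM statuses) AS s
WHERE status = 'rejected'
ORDER BY form_id, rejected_at
""").fetchall()

for row in report:
    print(row)
```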
Another solution that uses subselect (this time not a correlated subquery):
SELECT
    w1.*,
    w2.updated_by AS entered_by
FROM (
    SELECT
        wr.form_id,
        wr.updated_at AS rejected_at,
        wr.updated_by AS rejected_by,
        MAX(we.updated_at) AS entered_at
    FROM workflow wr
    INNER JOIN workflow we ON we.status = 'entered'
        AND wr.form_id = we.form_id
        AND wr.updated_at > we.updated_at
    WHERE wr.status = 'rejected'
    GROUP BY
        wr.form_id,
        wr.updated_at,
        wr.updated_by
) w1
INNER JOIN workflow w2 ON w1.form_id = w2.form_id
    AND w1.entered_at = w2.updated_at
    AND w2.status = 'entered'
The subselect lists all the rejecters and the immediately preceding entered timestamps. Then the table is joined once again to extract the names corresponding to the entered_at timestamps.
You want to get the rejected timestamp and then figure out the entry that appeared right before it based on the timestamp. I'm assuming that timestamp actually holds a date/time and isn't an SQL server timestamp field (completely different).
declare @rejectedTimestamp datetime
select @rejectedTimestamp = [timestamp]
from [table]
where status = 'rejected'
select top 1 *
from [table]
where [timestamp] < @rejectedTimestamp
order by [timestamp] desc