Minimum difference between timestamps with window function - sql-server-2014

I have the following base data sets.
I would like to know the ProcessStepLast time stamp for each ProcessStepOne record.
The latest ProcessStepOne record for each PersonId should have the latest ProcessStepLast record for the same PersonId.
The next latest ProcessStepOne record should have the next latest ProcessStepLast record and so on.
A ProcessStepLast record can only belong to one ProcessStepOne record.
There will always be at least one ProcessStepOne per PersonId.
There may be zero, one or more ProcessStepLast records per PersonId.
I'd prefer not to join on timestamps if possible and wanted to know if this was a candidate for a window function? I've given it a go, but can't quite get there.
Any help would be much appreciated.
ProcessStepOne records:
Id       PersonId  ProcessStepOne
1084465  11802     2019-01-18 15:45:44.000
1084507  11802     2019-01-18 16:07:22.000
ProcessStepLast records:
Id       PersonId  ProcessStepLast
1016970  11802     2019-01-24 12:51:52.600
1016996  11802     2019-01-24 12:55:21.953
1013472  11802     2019-01-24 12:51:45.803
Expected output:
Id       PersonId  ProcessStepOne           ProcessStepLast
1084465  11802     2019-01-18 15:45:44.000  2019-01-24 12:51:52.600
1084507  11802     2019-01-18 16:07:22.000  2019-01-24 12:55:21.953

Your expected output does not seem to completely line up with what you describe, because the 2019-01-24 12:51:45.803 record from the second data set does not appear anywhere. But a general solution here which might work would be to union together your two tables, and then just aggregate in pairs:
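WITH cte AS (
    -- Number each person's ProcessStepOne rows, latest first
    SELECT Id, PersonId, ProcessStepOne AS ProcessStep, 1 AS source,
           ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY ProcessStepOne DESC) rn
    FROM table1
    UNION ALL
    -- Number each person's ProcessStepLast rows the same way
    SELECT Id, PersonId, ProcessStepLast, 2,
           ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY ProcessStepLast DESC)
    FROM table2
)
SELECT
    MAX(CASE WHEN source = 1 THEN Id END) AS Id,
    PersonId,
    MAX(CASE WHEN source = 1 THEN ProcessStep END) AS ProcessStepOne,
    MAX(CASE WHEN source = 2 THEN ProcessStep END) AS ProcessStepLast
FROM cte
GROUP BY
    PersonId,
    rn
-- Drop leftover ProcessStepLast rows that have no ProcessStepOne partner
HAVING MAX(CASE WHEN source = 1 THEN 1 END) = 1;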
As a side note, the computed column source which I introduce in the union query CTE is there so that we can remember which Id value corresponds to the first table. This is to meet your requirement of using the first source Id labels in the expected output.
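For comparison, here is a rough sketch of a different way to get the same pairing. It is not the query from the answer above: it numbers the rows of each table separately, latest first, and then LEFT JOINs on (PersonId, rn), so no timestamps are compared in the join. The table variables @StepOne and @StepLast and the CTE names step_one and step_last are made up for this illustration, and datetime2(3) is an assumed column type.

```sql
-- Sample data from the question, held in table variables (hypothetical names)
DECLARE @StepOne  TABLE (Id int, PersonId int, ProcessStepOne  datetime2(3));
DECLARE @StepLast TABLE (Id int, PersonId int, ProcessStepLast datetime2(3));

INSERT INTO @StepOne VALUES
    (1084465, 11802, '2019-01-18 15:45:44.000'),
    (1084507, 11802, '2019-01-18 16:07:22.000');

INSERT INTO @StepLast VALUES
    (1016970, 11802, '2019-01-24 12:51:52.600'),
    (1016996, 11802, '2019-01-24 12:55:21.953'),
    (1013472, 11802, '2019-01-24 12:51:45.803');

WITH step_one AS (
    -- rn = 1 is the latest ProcessStepOne per person
    SELECT Id, PersonId, ProcessStepOne,
           ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY ProcessStepOne DESC) AS rn
    FROM @StepOne
),
step_last AS (
    -- rn = 1 is the latest ProcessStepLast per person
    SELECT PersonId, ProcessStepLast,
           ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY ProcessStepLast DESC) AS rn
    FROM @StepLast
)
SELECT o.Id, o.PersonId, o.ProcessStepOne, l.ProcessStepLast
FROM step_one AS o
LEFT JOIN step_last AS l          -- keeps ProcessStepOne rows that have no partner
       ON l.PersonId = o.PersonId
      AND l.rn = o.rn
ORDER BY o.ProcessStepOne;
```

With the sample rows, the third ProcessStepLast record (12:51:45.803) gets rn = 3 and finds no ProcessStepOne partner, so it drops out, which lines up with the expected output in the question.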

Related

Efficiently get latest appointment for every person sorted by oldest first

I already asked this question earlier but forgot a few (important) details or got them wrong.
My table in MySQL 8.0.29 looks like this
UserID
Appointment
Description
Bob
2022-06-01
Cleaning
Bob
2022-06-03
Toothache
John
2022-06-02
Braces
I'm trying to get the latest appointment for every person sorted by oldest first.
The query should return
UserID  Appointment  Description
John    2022-06-02   Braces
Bob     2022-06-03   Toothache
Using one of the previous answers I get
SELECT Name, Appointment, Description
FROM (
    SELECT Name, Appointment, Description,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Appointment DESC) rn
    FROM tbl  -- table name not given in the question; tbl is a placeholder
) t1
WHERE rn = 1
The problem is the database currently has 3 million rows and it'll continue to grow so this query ends up being pretty slow.
My plan is to consume the data in chunks so I'd prefer the query having "pagination". Something like a LIMIT 0, 5000 to get 5000 records at a time.
I'm open to even re-architecting the database if it comes to that.
For now I've resorted to creating a new table that just keeps the latest appointment for each user.
You are halfway there. Use that query as a 'derived table' instead of making it permanent:
SELECT b.*
FROM ( SELECT user_id, MAX(appointment) AS last_date
       FROM tbl
       GROUP BY user_id ) AS x
JOIN tbl AS b ON b.user_id = x.user_id
             AND b.appointment = x.last_date
And be sure to have INDEX(user_id, appointment)
I would be interested to see if this and the "OVER" approach both give the same results and which is faster.
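Since the question also asks for "oldest first" ordering and chunked reads, here is a minimal sketch of how the derived-table query could be combined with the suggested index, an ORDER BY and a LIMIT. The column types, the description column and the index name idx_user_appointment are assumptions made for the example; tbl, user_id and appointment follow the names used in the answer.

```sql
-- Hypothetical schema using the answer's names (tbl, user_id, appointment)
CREATE TABLE tbl (
    user_id     VARCHAR(50)  NOT NULL,
    appointment DATE         NOT NULL,
    description VARCHAR(100) NOT NULL,
    INDEX idx_user_appointment (user_id, appointment)
);

INSERT INTO tbl (user_id, appointment, description) VALUES
    ('Bob',  '2022-06-01', 'Cleaning'),
    ('Bob',  '2022-06-03', 'Toothache'),
    ('John', '2022-06-02', 'Braces');

SELECT b.user_id, b.appointment, b.description
FROM (
    -- one row per user: the date of that user's latest appointment
    SELECT user_id, MAX(appointment) AS last_date
    FROM tbl
    GROUP BY user_id
) AS x
JOIN tbl AS b
  ON b.user_id = x.user_id
 AND b.appointment = x.last_date
ORDER BY b.appointment, b.user_id  -- oldest latest-appointment first; user_id breaks ties
LIMIT 0, 5000;                     -- first chunk; LIMIT 5000, 5000 would be the next one
```

On the three sample rows this returns John (2022-06-02, Braces) followed by Bob (2022-06-03, Toothache). Keep in mind that offset-based LIMIT still has to produce and skip the earlier rows, so later chunks will probably get slower as the offset grows.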

mysql highly selective query

I have a data set like this:
User Date Status
Eric 1/1/2015 4
Eric 2/1/2015 2
Eric 3/1/2015 4
Mike 1/1/2015 4
Mike 2/1/2015 4
Mike 3/1/2015 2
I'm trying to write a query in which I will retrieve users whose MOST RECENT transaction status is a 4. If it's not a 4 I don't want to see that user in the results. This dataset could have 2 potential results, one for Eric and one for Mike. However, Mike's most recent transaction was not a 4, therefore:
The return result would be:
User Date Status
Eric 3/1/2015 4
As this record is the only record for Eric that has a 4 as his latest transaction date.
Here's what I've tried so far:
SELECT
user, MAX(date) as dates, status
FROM
orders
GROUP BY
status,
user
This would get me a unique record for every user for every status type. This would be a subquery, and the parent query would look like:
SELECT
user, dates, status
WHERE
status = 4
GROUP BY
user
However, this is clearly flawed as I don't want status = 4 records IF their most recent record is not a 4. I only want status = 4 when the latest date is a 4. Any thoughts?
SELECT user, date
, actualOrders.status
FROM (
SELECT user, MAX(date) as date
FROM orders
GROUP BY user) AS lastOrderDates
INNER JOIN orders AS actualOrders USING (user, date)
WHERE actualOrders.status = 4
;
-- Since USING is being used, there is not a need to specify source of the
-- user and date fields in the SELECT clause; however, if an ON clause was
-- used instead, either table could be used as the source of those fields.
Also, you may want to rethink the field names if it is not too late, since user and date are both MySQL keywords.
SELECT user, date, status
FROM orders AS o
WHERE status = 4
  -- keep the row only if it is that user's most recent one
  AND date = (SELECT MAX(date) FROM orders WHERE user = o.user)
The easiest way is to include your order table a second time in a subquery in your from clause in order to retrieve the last date for each user. Then you can add a where clause to match the most recent date per user, and finally filter on the status.
select orders.*
from orders,
(
select ord_user, max(ord_date) ord_date
from orders
group by ord_user
) latestdate
where orders.ord_status = 4
and orders.ord_user = latestdate.ord_user
and orders.ord_date = latestdate.ord_date
Another option is to use the over partition clause:
Oracle SQL query: Retrieve latest values per group based on time
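A rough sketch of what that window-function variant might look like in MySQL, assuming version 8.0 or later (older versions have no window functions). The column types are assumptions, the sample dates are written in ISO format on the guess that 1/1/2015, 2/1/2015 and 3/1/2015 are in chronological order, and the alias ranked is made up for the example.

```sql
-- Sample data from the question; column types are assumed
CREATE TABLE orders (
    `user`  VARCHAR(20) NOT NULL,
    `date`  DATE        NOT NULL,
    status  INT         NOT NULL
);

INSERT INTO orders (`user`, `date`, status) VALUES
    ('Eric', '2015-01-01', 4),
    ('Eric', '2015-02-01', 2),
    ('Eric', '2015-03-01', 4),
    ('Mike', '2015-01-01', 4),
    ('Mike', '2015-02-01', 4),
    ('Mike', '2015-03-01', 2);

SELECT `user`, `date`, status
FROM (
    -- rn = 1 marks each user's most recent row
    SELECT `user`, `date`, status,
           ROW_NUMBER() OVER (PARTITION BY `user` ORDER BY `date` DESC) AS rn
    FROM orders
) AS ranked
WHERE rn = 1        -- latest row per user
  AND status = 4;   -- and only if that latest row is a 4
```

On the sample data this keeps only Eric's 2015-03-01 row, because Mike's latest row has status 2.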
Regards,

Retrieve rows that have a first entry in 2014 in MySQL

I want to retrieve all rows from a table that have their first entry on or after 01/01/2014 but no later than 31/12/2014
Example of the table:
OID FK_OID Treatment Trt_DATE
1 100 19304 2011-05-24
2 100 19304 2011-08-01
3 100 19306 2014-03-05
4 200 19305 2012-02-02
5 300 19308 2014-01-20
6 400 19308 2014-06-06
For example, I would like to pull all entries that STARTED treatment in 2014. So from the table above I would like to extract FK_OIDs 300 and 400, because their first entry is in 2014, but omit FK_OID 100, because it has 2 entries prior to 2014.
How do I go about this? I can extract all entries within a date range, but that brings back every entry in the range and doesn't omit anyone who has an entry prior to the start of the range. It just returns their first entry in 2014.
For those who need to see that I have tried something, see below.
I am not an experienced coder, and this is the best I can manage.
SELECT
mod,
(select NHSNum from person p
WHERE
p.oid = t.fk_oid) as 'NHS'
FROM
timeline t
Where trt_date BETWEEN '2014-01-01' AND '2014-12-31'
ORDER BY trt_date ASC
This returns every treatment for 2014 regardless of whether it is the first ever one for that person. I want to omit anyone from this list who has had treatment before 01/01/2014 as well as only return the first treatment per person. For example, this code returns all treatments for all people in 2014. I only want their first one and only if it is their first one ever.
Thanks.
create table aThing
( oid int auto_increment primary key,
fk_oid int not null,
treatment int not null,
trt_date date not null
);
insert aThing (fk_oid,treatment,trt_date) values
(100, 19304, '2011-05-24'),
(100, 19304, '2011-08-01'),
(100, 19306, '2014-03-05'),
(200, 19305, '2012-02-02'),
(300, 19308, '2014-01-20'),
(400, 19308, '2014-06-06');
select fk_oid,dt
from
( select fk_oid,min(trt_date) as dt
from aThing
group by fk_oid
) xDerived
where year(dt)=2014;
+--------+------------+
| fk_oid | dt         |
+--------+------------+
|    300 | 2014-01-20 |
|    400 | 2014-06-06 |
+--------+------------+
The inner, nested part becomes a derived table and is given the name xDerived. Even though it is just a result set, making it a derived table means it can be referred to by name. It is not a physical table, but a derived (virtual) one.
That derived table is a very simple GROUP BY with an aggregate function. It says: for every fk_oid, bring back exactly one row, holding the minimum trt_date for that fk_oid.
So if you have 10 million rows in aThing but only 17 distinct values of fk_oid, it returns only 17 rows, each being the minimum trt_date for its fk_oid.
With that achieved, the outer wrapper just selects those two columns, with a year check. The reason for the wrapper is that an alias of an aggregate column, such as dt, is not available in the WHERE clause of the query that defines it, because WHERE is evaluated before the grouping produces the aggregate. Once the result is wrapped in a derived table, dt is an ordinary column of that table, and the outer query can filter on it in its WHERE clause.
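To make the alias point concrete, here is a small sketch that assumes the aThing table and the six rows created above already exist. The first statement is left commented out because MySQL rejects it; the second is an alternative that skips the derived table and filters on the aggregate with HAVING, which runs after the grouping.

```sql
-- Rejected: dt is an alias of an aggregate, and WHERE runs before grouping,
-- so there is no dt to filter on at that point.
-- select fk_oid, min(trt_date) as dt
-- from aThing
-- where year(dt) = 2014
-- group by fk_oid;

-- Accepted: HAVING is evaluated after GROUP BY, so the aggregate can be tested
select fk_oid, min(trt_date) as dt
from aThing
group by fk_oid
having year(min(trt_date)) = 2014;
```

On the sample rows this should give the same two rows as the derived-table version, fk_oid 300 and 400.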
There's probably a more elegant solution, but this seems to satisfy the requirement...
SELECT x.*
  FROM my_table x
  JOIN
     ( SELECT fk_oid
            , MIN(trt_date) min_date
         FROM my_table
        GROUP
           BY fk_oid
       HAVING min_date BETWEEN '2014-01-01' AND '2014-12-31'
     ) a
    ON a.fk_oid = x.fk_oid
   AND a.min_date = x.trt_date;
Having gained a few years of experience with this, I decided to revisit it. The solution I now use regularly is:
SELECT t1.column1, t1.column2
FROM MyTable AS t1
LEFT OUTER JOIN MyTable AS t2
    ON t1.fkoid = t2.fkoid
    AND (t1.date > t2.date
        OR (t1.date = t2.date AND t1.oid > t2.oid))
WHERE t2.fkoid IS NULL
    AND t1.date >= '2014-01-01'
    AND t1.date <= '2014-12-31'

Multiple distinct counts with where

I am having an issue creating the most efficient query for multiple distinct counts of a column with different where clauses. My MySQL table looks like this:
id client_id result timestamp
---------------------------------------------------
1 1234566 escalated 2014-01-02 00:00:00
2 1233344 approved 2014-02-03 00:00:00
3 1234566 escalated 2014-01-02 01:00:00
What I am trying to achieve is to build the following data in the return:
Total number of unique client IDs processed from the beginning of time.
Total number of unique client IDs processed escalated from the beginning of time.
Total number of unique client IDs processed approved from the beginning of time.
Count of unique client IDs approved within specified timeframe using between statement on timestamp.
Count of unique client IDs escalated within specified timeframe using between statement on timestamp.
I have thought about running multiple selects, but I think that would be a waste of resources. If this can be done with a single query, that would be the best way to handle it; unfortunately my experience is lacking in this area. I would like the result to simply contain an alias and the count for each item.
Any help would be appreciated.
You want conditional aggregation, something like:
select count(distinct client_id) as NumClients,
       count(distinct case when result = 'approved' then client_id end) as NumApproved,
       count(distinct case when result = 'escalated' then client_id end) as NumEscalated,
       count(distinct case when result = 'approved' and timestamp between @Time1 and @Time2
                           then client_id end) as NumApprovedInRange,
       count(distinct case when result = 'escalated' and timestamp between @Time1 and @Time2
                           then client_id end) as NumEscalatedInRange
from your_table t;  -- your_table is a placeholder; @Time1 and @Time2 hold the timeframe bounds
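To try the conditional aggregation end to end, here is a small sketch on the three sample rows from the question. The table name client_results, the column types, the result aliases and the January 2014 timeframe are all assumptions picked for the example; the timeframe bounds are passed as MySQL user variables.

```sql
-- Hypothetical table holding the sample rows from the question
CREATE TABLE client_results (
    id          INT         PRIMARY KEY,
    client_id   INT         NOT NULL,
    result      VARCHAR(20) NOT NULL,
    `timestamp` DATETIME    NOT NULL
);

INSERT INTO client_results (id, client_id, result, `timestamp`) VALUES
    (1, 1234566, 'escalated', '2014-01-02 00:00:00'),
    (2, 1233344, 'approved',  '2014-02-03 00:00:00'),
    (3, 1234566, 'escalated', '2014-01-02 01:00:00');

-- Example timeframe: January 2014
SET @Time1 = '2014-01-01 00:00:00';
SET @Time2 = '2014-01-31 23:59:59';

SELECT
    COUNT(DISTINCT client_id) AS total_clients,
    COUNT(DISTINCT CASE WHEN result = 'approved'  THEN client_id END) AS total_approved,
    COUNT(DISTINCT CASE WHEN result = 'escalated' THEN client_id END) AS total_escalated,
    -- CASE yields NULL outside the timeframe, and COUNT ignores NULLs
    COUNT(DISTINCT CASE WHEN result = 'approved'
                         AND `timestamp` BETWEEN @Time1 AND @Time2
                        THEN client_id END) AS approved_in_range,
    COUNT(DISTINCT CASE WHEN result = 'escalated'
                         AND `timestamp` BETWEEN @Time1 AND @Time2
                        THEN client_id END) AS escalated_in_range
FROM client_results;
```

For these rows the counts work out to 2 clients in total, 1 approved and 1 escalated overall, and within January 2014 there are 0 approved and 1 escalated (client 1234566 is counted once even though it has two escalated rows).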

mysql first record retrieval

While this is very easy to do in Perl or PHP, I cannot figure out how to use MySQL alone to extract the first unique occurrence of a record.
For example, given the following table:
Name Date Time Sale
John 2010-09-12 10:22:22 500
Bill 2010-08-12 09:22:37 2000
John 2010-09-13 10:22:22 500
Sue 2010-09-01 09:07:21 1000
Bill 2010-07-25 11:23:23 2000
Sue 2010-06-24 13:23:45 1000
I would like to extract the first record for each individual in asc time order.
After sorting the table in ascending time order, I need to extract the first unique record by name.
So the output would be :
Name Date Time Sale
John 2010-09-12 10:22:22 500
Bill 2010-07-25 11:23:23 2000
Sue 2010-06-24 13:23:45 1000
Is this doable in an easy fashion with MySQL?
I think that something along the lines of
select name, date, time, sale from mytable order by date, time group by name;
will get you what you're looking for
you need to perform a groupwise max or groupwise min
see below or http://pastie.org/973117 for an example
select
u.user_id,
u.username,
latest.comment_id
from
users u
left outer join
(
select
max(comment_id) as comment_id,
user_id
from
user_comment
group by
user_id
) latest on u.user_id = latest.user_id;
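Applied to the table from the question, the same groupwise-min idea might look roughly like the sketch below. The table name MyTable and the column types are assumptions, and the alias first_ts is made up; TIMESTAMP(date, time) is used to combine the two columns into a single value that can be compared.

```sql
-- Sample data from the question; table name and column types are assumed
CREATE TABLE MyTable (
    Name   VARCHAR(20) NOT NULL,
    `Date` DATE        NOT NULL,
    `Time` TIME        NOT NULL,
    Sale   INT         NOT NULL
);

INSERT INTO MyTable (Name, `Date`, `Time`, Sale) VALUES
    ('John', '2010-09-12', '10:22:22',  500),
    ('Bill', '2010-08-12', '09:22:37', 2000),
    ('John', '2010-09-13', '10:22:22',  500),
    ('Sue',  '2010-09-01', '09:07:21', 1000),
    ('Bill', '2010-07-25', '11:23:23', 2000),
    ('Sue',  '2010-06-24', '13:23:45', 1000);

SELECT t.Name, t.`Date`, t.`Time`, t.Sale
FROM MyTable AS t
JOIN (
    -- earliest date+time per name
    SELECT Name, MIN(TIMESTAMP(`Date`, `Time`)) AS first_ts
    FROM MyTable
    GROUP BY Name
) AS f
  ON f.Name = t.Name
 AND f.first_ts = TIMESTAMP(t.`Date`, t.`Time`);
```

This keeps John's 2010-09-12 row, Bill's 2010-07-25 row and Sue's 2010-06-24 row. There is no ORDER BY, so the order of the three rows is not guaranteed, and if a name had two rows with the identical earliest date and time, both would come back.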
In databases, there really is no "first" or "last" record; think of each record as its own, non-positional entity in the table. The only positions they have are when you give them one, say, using ORDER BY.
This will give you what you want. It might not be efficient, but it works.
select Name, Date, Time, Sale from
(select Name, Date, Time, Sale from MyTable
order by Date asc, Time asc) MyTable_subquery_name
group by Name
Note: MyTable_subquery_name is just a dummy name for the subquery. MySQL will give the error ERROR 1248 (42000): Every derived table must have its own alias without it.
If only GROUP BY and ORDER BY were commutative operations, then this wouldn't have to be a subquery.