TSQL Determine Duplicates Respecting Date Ranges

TSQL Determine Duplicates Respecting Date Ranges - sql-server-2008

I am struggling with the appropriate query to find duplicates while at the same time respecting the effective start and end dates for the record. Example data below.
ClientName ClientID EffectiveStart EffectiveEnd
A 1 1900-01-01 2100-01-01
A 1 1900-01-01 2100-01-01
B 2 1900-01-01 2012-05-01
C 2 2012-05-01 2100-01-01
D 3 1900-01-01 2012-05-01
E 3 2012-04-30 2100-01-01
F 4 2012-04-15 2100-01-01
The output I am looking for is below.
ClientName ClientID
A 1
D 3
E 3
The logic is that Client A has ID 1 duplicated. Client B and C have a duplicate (2) BUT the date ranges are such that the two duplicates DO NOT overlap each other, which means they should not be considered duplicates. Client D and E have ID 3 duplicated AND the date ranges overlap so it should be considered a duplicate. Client F does not have a duplicate so should not show in the output.
Any thoughts? Questions?

There are two versions. The EXISTS version is simpler but likely slower than the join. For each record, EXISTS checks whether there is an overlapping record with the same clientid. It is bound to find at least one, the record itself, hence the GROUP BY and the HAVING count(*) > 1.
select distinct ClientName, ClientID
from Table1
where exists
(
select null
from table1 test1
where test1.clientid = table1.clientid
and test1.EffectiveStart < table1.EffectiveEnd
and test1.EffectiveEnd > table1.EffectiveStart
group by test1.ClientID
having count (*) > 1
)
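The join does the same, but because the grouping covers all of a client's records, its HAVING has to compare against their total count: every record always pairs with itself, so a client with n records yields at least n pairs, and only pairs beyond that indicate an overlap.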
select test1.clientid
from table1 test1
join table1 test2 on test1.clientid = test2.clientid
where test1.EffectiveStart < test2.EffectiveEnd
and test1.EffectiveEnd > test2.EffectiveStart
group by test1.clientid
having count (*) > (select count (*)
from table1
where clientid = test1.clientid)
I omitted retrieval of clientname because people hate to see nested queries.
Live test is at Sql Fiddle.
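As a quick way to check the EXISTS version against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server here, the outer alias t and the function name overlapping_duplicates are made up for the example, and the dates are stored as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

# Sample rows from the question: (ClientName, ClientID, EffectiveStart, EffectiveEnd)
ROWS = [
    ("A", 1, "1900-01-01", "2100-01-01"),
    ("A", 1, "1900-01-01", "2100-01-01"),
    ("B", 2, "1900-01-01", "2012-05-01"),
    ("C", 2, "2012-05-01", "2100-01-01"),
    ("D", 3, "1900-01-01", "2012-05-01"),
    ("E", 3, "2012-04-30", "2100-01-01"),
    ("F", 4, "2012-04-15", "2100-01-01"),
]

# Same logic as the EXISTS query above; the outer table gets the alias t
# (an assumption for this sketch) so the correlation is unambiguous.
EXISTS_QUERY = """
select distinct t.ClientName, t.ClientID
from Table1 t
where exists (
    select null
    from Table1 test1
    where test1.ClientID = t.ClientID
      and test1.EffectiveStart < t.EffectiveEnd
      and test1.EffectiveEnd > t.EffectiveStart
    group by test1.ClientID
    having count(*) > 1   -- more than the row itself overlaps
)
order by t.ClientName
"""


def overlapping_duplicates(rows=ROWS):
    """Return (ClientName, ClientID) for rows whose range overlaps another row with the same ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table Table1 (ClientName text, ClientID integer, "
        "EffectiveStart text, EffectiveEnd text)"
    )
    conn.executemany("insert into Table1 values (?, ?, ?, ?)", rows)
    result = conn.execute(EXISTS_QUERY).fetchall()
    conn.close()
    return result
```

Calling overlapping_duplicates() returns A 1, D 3 and E 3, which is the output the question asks for; B and C are left out because their ranges only touch at 2012-05-01.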

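You will need a PK so that each pair of rows is compared only once.
select c1.name, c2.name, c1.id
from client c1
join client c2 on c1.id = c2.id and c1.PK < c2.PK
where c1.Start < c2.End and c1.End > c2.Start
Two ranges overlap when each one starts before the other one ends. The opposite test (c1.Start > c2.End or c1.End < c2.Start) would return only the pairs that do not overlap.
See Determine Whether Two Date Ranges Overlap; please give him a +1.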
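To illustrate the PK-based self-join, here is a small sketch that runs it in an in-memory SQLite database from Python. SQLite stands in for SQL Server, and the column names pk, range_start and range_end as well as the function name overlapping_pairs are hypothetical (End is a keyword in SQLite, so the columns are renamed). The test used is the usual strict overlap check: each range starts before the other one ends.

```python
import sqlite3

PAIR_QUERY = """
select c1.name, c2.name, c1.id
from client c1
join client c2 on c1.id = c2.id and c1.pk < c2.pk   -- each pair only once
where c1.range_start < c2.range_end                  -- c1 starts before c2 ends
  and c2.range_start < c1.range_end                  -- c2 starts before c1 ends
order by c1.pk, c2.pk
"""


def overlapping_pairs(rows):
    """rows: iterable of (name, id, start, end) with ISO-8601 date strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table client (pk integer primary key, name text, id integer, "
        "range_start text, range_end text)"
    )
    # pk is assigned automatically in insertion order
    conn.executemany(
        "insert into client (name, id, range_start, range_end) values (?, ?, ?, ?)",
        rows,
    )
    result = conn.execute(PAIR_QUERY).fetchall()
    conn.close()
    return result


SAMPLE = [
    ("A", 1, "1900-01-01", "2100-01-01"),
    ("A", 1, "1900-01-01", "2100-01-01"),
    ("B", 2, "1900-01-01", "2012-05-01"),
    ("C", 2, "2012-05-01", "2100-01-01"),
    ("D", 3, "1900-01-01", "2012-05-01"),
    ("E", 3, "2012-04-30", "2100-01-01"),
    ("F", 4, "2012-04-15", "2100-01-01"),
]
```

With the question's sample data, overlapping_pairs(SAMPLE) yields the pairs (A, A, 1) and (D, E, 3). Unlike the EXISTS version, this returns one row per overlapping pair rather than one row per client name.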

Related

Nested queries and Join

As a beginner with SQL, I'm OK with simple tasks, but right now I'm struggling with multiple nested queries.
My problem is that I have 3 tables like this:
a Case table:
id nd date username
--------------------------------------------
1 596 2016-02-09 16:50:03 UserA
2 967 2015-10-09 21:12:23 UserB
3 967 2015-10-09 22:35:40 UserA
4 967 2015-10-09 23:50:31 UserB
5 580 2017-02-09 10:19:43 UserA
a Value table:
case_id labelValue_id Value Type
-------------------------------------------------
1 3633 2731858342 X
1 124 ["864","862"] X
1 8981 -2.103 X
1 27 443 X
... ... ... ...
2 7890 232478 X
2 765 0.2334 X
... ... ... ...
and a Label table:
id label
----------------------
3633 Value of W
124 Value of X
8981 Value of Y
27 Value of Z
Obviously, I want to join these tables. So I can do something like this:
SELECT *
from Case, Value, Label
where Case.id= Value.case_id
and Label.id = Value.labelValue_id
but I get pretty much everything whereas I would like to be more specific.
What I want is to do some filtering on the Case table and then use the resulting ids to join the two other tables. I'd like to:
1. Filter the Case.nd values so that if there are several instances of the same nd, only the oldest one is kept.
2. Limit the number of nd values in the query. For example, I want to be able to join the tables for just 2, 3, 4, etc. different nd values.
3. Use this query to make a join on the Value and Label tables.
For example, the output of the queries 1 and 2 would be:
id nd date username
--------------------------------------------
1 596 2016-02-09 16:50:03 UserA
2 967 2015-10-09 21:12:23 UserB
if I ask for 2 different nd. The nd 967 appears several times but we take the oldest one.
In fact, I think I found out how to do all these things but I can't/don't know how to merge them.
To select the oldest nd, I can do something like:
select min((date)), nd,id
from Case
group by nd
Then, to limit the number of nd in the output, I found this (based on this and that) :
select *,
@num := if(@type <> t.nd, @num + 1, 1) as row_number,
@type := t.nd as dummy
from(
select min((date)), nd,id
from Case
group by nd
) as t
group by t.nd
having row_number <= 2 -- number of output
It works but I feel it's getting slow.
Finally, when I try to join this subquery with the two other tables, the processing goes on forever.
During my research I could find answers for every part of the problem, but I can't merge them. Also, for the "counting" problem, where I want to limit the number of nd values, my solution feels rather far-fetched.
I realize this is a long question but I think I miss something and I wanted to give details as much as possible.

To filter the case table down to the oldest row for each nd (case is a reserved word, so in MySQL it is quoted with backticks):
select * from `case` c
where date = (select min(date) from `case`
where nd = c.nd)
Then just join this to the other tables:
select * from `case` c
join value v on v.Case_id = c.Id
join label l on l.Id = v.labelValue_id
where date = (select min(date) from `case`
where nd = c.nd)
To limit the result to a certain number of records, MySQL has the LIMIT clause:
select * from `case` c
join value v on v.Case_id = c.Id
join label l on l.Id = v.labelValue_id
where date = (select min(date) from `case`
where nd = c.nd)
Limit 4 -- <=== will limit the returned result set to 4 rows
If you only want records for the top N values of nd, then the Limit goes on a subquery restricting which values of nd to retrieve. MySQL does not accept LIMIT inside an IN (...) subquery, so join to the subquery as a derived table instead:
select * from `case` c
join (select distinct nd from `case`
order by nd desc Limit N) n on n.nd = c.nd
join value v on v.Case_id = c.Id
join label l on l.Id = v.labelValue_id
where date = (select min(date) from `case`
where nd = c.nd)
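The case-filtering part of this answer (oldest row per nd, restricted to the top N values of nd) can be tried out with the sketch below, which runs the query in an in-memory SQLite database from Python. SQLite is a stand-in for MySQL, the table name cases and the function name oldest_rows_for_top_nd are assumptions made for the example, and the Value and Label joins are left out to keep it short.

```python
import sqlite3

# Rows of the Case table from the question: (id, nd, date, username)
CASES = [
    (1, 596, "2016-02-09 16:50:03", "UserA"),
    (2, 967, "2015-10-09 21:12:23", "UserB"),
    (3, 967, "2015-10-09 22:35:40", "UserA"),
    (4, 967, "2015-10-09 23:50:31", "UserB"),
    (5, 580, "2017-02-09 10:19:43", "UserA"),
]

QUERY = """
select c.id, c.nd, c.date, c.username
from cases c
join (select distinct nd from cases order by nd desc limit ?) n
  on n.nd = c.nd                               -- keep only the top N nd values
where c.date = (select min(c2.date)            -- keep only the oldest row per nd
                from cases c2
                where c2.nd = c.nd)
order by c.id
"""


def oldest_rows_for_top_nd(limit, rows=CASES):
    """Return the oldest case row for each of the `limit` highest nd values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table cases (id integer, nd integer, date text, username text)")
    conn.executemany("insert into cases values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, (limit,)).fetchall()
    conn.close()
    return result
```

With a limit of 2, this returns the rows with id 1 (nd 596) and id 2 (nd 967), matching the output the question describes. The Value and Label tables could then be joined to this result on c.id in the same way as in the queries above.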

So finally, here is what worked well for me:
select *
from (
select *
from Case
join (
select nd as T_ND, date as T_date
from Case
where nd in (select distinct nd from Case)
group by T_ND Limit 5 -- <========= Limit of nd's
) as t
on Case.nd = t.T_ND
where date = (select min(date)
from Case
where nd = t.T_ND)
) as subquery
join Value
on Value.context_id = subquery.id
join Label
on Label.id = Value.labelValue_id
Thank you @charlesbretana for leading me on the right track :).

Join two subqueries and have a field: division of the results of two subqueries

I have a table like this:
userid | trackid | path
123 70000 ad
123 NULL abc.com
123 NULL Apply
345 70001 Apply
345 70001 Apply
345 NULL Direct
345 NULL abc.com
345 NULL cdf.com
And I want a query like this. When path='abc.com', num_website +1; when path='Apply', num_apply +1
userid | num_website | num_Apply | num_website/num_Apply
123 1 1 1
345 1 2 0.5
My syntax looks like this:
select * from
(select userid,count(path) as is_CWS
from TABLE
where path='abc.com'
group by userid
having count(path)>1) a1
JOIN
(select userid,count(userid) as Apply_num from TABLE
where trackid is not NULL
group by userid) a2
on a1.userid=a2.userid
My question is
1. how to have the field num_website/num_apply in term of my syntax above?
2. is there any other easier way to get the result I want?
Any pointers would be appreciated.

The simplest way to do it would be to change the select line:
SELECT a1.userid, a1.is_CWS, a2.Apply_num, a1.is_CWS/a2.Apply_num FROM
(select userid,count(path) as is_CWS
from TABLE
where path='abc.com'
group by userid
having count(path)>1) a1
JOIN
(select userid,count(userid) as Apply_num
from TABLE
where trackid is not NULL
group by userid) a2
on a1.userid=a2.userid
and then continue with the rest of your query as you have it. The star means "select everything." If you want to select only a few things, list them in place of the star; if you want to select other values computed from those things, put those expressions in place of the star as well. In this case a1.is_CWS/a2.Apply_num is an expression, and MySQL knows how to evaluate it from the values of a1.is_CWS and a2.Apply_num.
In the same vein, you can do a lot of what those subqueries are doing in a single expression instead of a subquery. objectNotFound has the right idea. Instead of using a subquery to count the rows with a certain attribute, you can select SUM(path='abc.com') as is_CWS, and the a1 subquery is no longer needed. Making that change gives us:
SELECT t.userid,
SUM(t.path='abc.com') as is_CWS,
a2.Apply_num,
SUM(t.path='abc.com')/a2.Apply_num as ratio
FROM TABLE t
JOIN
(select userid,count(userid) as Apply_num
FROM TABLE
where trackid is not NULL
group by userid) a2
on t.userid=a2.userid
GROUP BY t.userid, a2.Apply_num
Notice I moved the GROUP BY to the end of the query. Also notice that the a1 subquery is gone, so the outer table gets its own alias (t), and the ratio repeats the SUM(...) expression because a select list cannot refer to another column's alias.
You can do the same thing to the other subquery; then both counts share the GROUP BY clause and you won't need the join anymore.

to get you started ... you can build on top of this :
select
userid,
SUM(CASE WHEN path='abc.com' then 1 else 0 end) as num_website,
SUM(CASE WHEN path='Apply' and trackid is not NULL then 1 else 0 end) as Apply_Num
from TABLE
WHERE path='abc.com' or path='Apply' -- may not need this ... play with it
group by userid
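Building on this conditional-aggregation idea, here is a sketch that computes both counts and their ratio in one pass and can be checked against the sample data. It runs in an in-memory SQLite database from Python as a stand-in for MySQL. The table name visits and the function name website_apply_ratio are made up for the example, and num_apply follows the rule stated in the question (path='Apply', regardless of trackid), which is an assumption about what the asker intends.

```python
import sqlite3

# Sample rows from the question: (userid, trackid, path)
ROWS = [
    (123, 70000, "ad"),
    (123, None, "abc.com"),
    (123, None, "Apply"),
    (345, 70001, "Apply"),
    (345, 70001, "Apply"),
    (345, None, "Direct"),
    (345, None, "abc.com"),
    (345, None, "cdf.com"),
]

QUERY = """
select userid,
       sum(case when path = 'abc.com' then 1 else 0 end) as num_website,
       sum(case when path = 'Apply' then 1 else 0 end) as num_apply,
       -- 1.0 * forces real division; nullif avoids dividing by zero
       1.0 * sum(case when path = 'abc.com' then 1 else 0 end)
           / nullif(sum(case when path = 'Apply' then 1 else 0 end), 0) as ratio
from visits
group by userid
order by userid
"""


def website_apply_ratio(rows=ROWS):
    """Return (userid, num_website, num_apply, ratio) per user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table visits (userid integer, trackid integer, path text)")
    conn.executemany("insert into visits values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

For the sample data this gives 1, 1, 1.0 for user 123 and 1, 2, 0.5 for user 345, the result shown in the question. A user with no Apply rows gets a NULL ratio rather than an error.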

MySQL calculate only the row-differences for rows in the same group

I have the following table (call it trans):
issue_id: state_one: state_two: timer:
1 A B 1
1 B C 3
2 A B 2
2 B C 4
2 C D 7
I'd like to get the difference in 'timer' between consecutive rows, but only those with the same issue_id.
Expected result:
issue_id: state_one: state_two: timer: time_diff:
1 B C 3 2
2 B C 4 2
2 C D 7 3
When taking the time difference between two rows, I'd like the result displayed next to the later row.
If we only had one, time-ordered issue in the table, the following code works fine:
select
X.issue_id,
X.timer as X_timer,
Y.timer as Y_timer,
(X.timer - Y.timer) as time_diff
from trans X
cross join trans Y
where Y.timer in (
select
max(Z.timer)
from trans Z
where Z.timer < X.timer);
I want to generalize this approach to handle MANY issues with time-ordered state changes.
My idea was to add the following condition, but it only works if consecutive events belong to the same issue (not the case in the real world):
... where Z.timer < X.timer)
and X.issue_id = Y.issue_id;
Question: In MySQL, can I do this iteratively (i.e. calculate differences for issue_id=1, then for issue_id=2, and so on)? A function or subquery?
Other strategies? Constraint: I have read-only privileges. I truly appreciate the help!
EDIT: I added expected output, added a row to my example table, and clarified.

select
issue_id, (MAX(timer)-MIN(timer)) as diff from trans
group by issue_id

Assuming timer or (issue_id,timer) is PRIMARY...
SELECT a.*, a.timer-MAX(b.timer)
FROM trans a
JOIN trans b
ON b.issue_id = a.issue_id
AND b.timer < a.timer
GROUP
BY a.issue_id
, a.timer;
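The self-join above can be checked against the sample table with the following sketch, which runs it in an in-memory SQLite database from Python. SQLite stands in for MySQL, the function name timer_diffs is made up, and the columns are listed explicitly instead of using a.* so that every selected column also appears in the GROUP BY.

```python
import sqlite3

# Sample rows from the question: (issue_id, state_one, state_two, timer)
TRANS = [
    (1, "A", "B", 1),
    (1, "B", "C", 3),
    (2, "A", "B", 2),
    (2, "B", "C", 4),
    (2, "C", "D", 7),
]

QUERY = """
select a.issue_id, a.state_one, a.state_two, a.timer,
       a.timer - max(b.timer) as time_diff   -- latest earlier row of the same issue
from trans a
join trans b
  on b.issue_id = a.issue_id
 and b.timer < a.timer
group by a.issue_id, a.state_one, a.state_two, a.timer
order by a.issue_id, a.timer
"""


def timer_diffs(rows=TRANS):
    """Return each non-first row of an issue with the gap to the previous row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table trans (issue_id integer, state_one text, state_two text, timer integer)"
    )
    conn.executemany("insert into trans values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

The first row of each issue has no earlier row to join to, so the inner join drops it. That is why the result has three rows, with time_diff values 2, 2 and 3, as in the expected output of the question.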

Select * from #Temp
Select T1.Issuerid,T1.stateone,T1.statetwo,MAX(T1.timer)-MIN(T2.timer) as Time_Diff from #Temp T1
left join #Temp T2 on
T1.issuerid=T2.IssuerId
group by T1.Issuerid,T1.stateone,T1.statetwo
Please Give me Reply

MYSQL counting occurrences separately from 2 tables

I have 3 tables in my MySQL database and I need to count how many times each user_ID appears in each of the first 2 tables (table 1 and table 2).
here is my table 1:
user_ID
A
B
A
D
...
here is my table 2 :
user_ID
A
C
A
...
here is my table 3 (with link between user_ID and nickname) :
user_ID // nickname
A // Bob
B // Joe
C // Tom
...
I would like to get a result like this:
Nickname // count occurrences from Table 1 // count occurrences from table 2
Bob // 1 // 2
Joe // 4 // 0
Tom // 0 // 2
So far I have not managed to count separately for each table; I only get a global result for each nickname :(
Could you help me find the right MySQL query?
- ...

This type of query is a little tricky, because some users may not be in the first table and others may not be in the second. To really solve this type of problem, you need to pre-aggregate the counts for each table. To get all the nicknames, you need a left outer join from table3:
select t3.nickname, coalesce(t1n.cnt1, 0) as cnt1, coalesce(t2n.cnt2, 0) as cnt2
from table3 t3 left outer join
(select user_ID, count(*) as cnt1
from table1
group by user_ID
) t1n
on t3.user_ID = t1n.user_ID left outer join
(select user_ID, count(*) as cnt2
from table2
group by user_ID
) t2n
on t3.user_ID = t2n.user_ID;
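Here is a small sketch of the pre-aggregate-then-left-join approach, run in an in-memory SQLite database from Python as a stand-in for MySQL. It uses only the rows actually shown in the question (the rows hidden behind "..." are not guessed), so its counts differ from the question's illustrative output. The function name counts_per_nickname is made up for the example.

```python
import sqlite3

TABLE1 = ["A", "B", "A", "D"]                       # user_ID values shown for table 1
TABLE2 = ["A", "C", "A"]                            # user_ID values shown for table 2
TABLE3 = [("A", "Bob"), ("B", "Joe"), ("C", "Tom")]  # (user_ID, nickname)

QUERY = """
select t3.nickname,
       coalesce(t1n.cnt1, 0) as cnt1,   -- 0 when the user is absent from table1
       coalesce(t2n.cnt2, 0) as cnt2    -- 0 when the user is absent from table2
from table3 t3
left outer join (select user_ID, count(*) as cnt1
                 from table1 group by user_ID) t1n
  on t3.user_ID = t1n.user_ID
left outer join (select user_ID, count(*) as cnt2
                 from table2 group by user_ID) t2n
  on t3.user_ID = t2n.user_ID
order by t3.nickname
"""


def counts_per_nickname():
    """Return (nickname, count in table1, count in table2) for every user in table3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table table1 (user_ID text)")
    conn.execute("create table table2 (user_ID text)")
    conn.execute("create table table3 (user_ID text, nickname text)")
    conn.executemany("insert into table1 values (?)", [(u,) for u in TABLE1])
    conn.executemany("insert into table2 values (?)", [(u,) for u in TABLE2])
    conn.executemany("insert into table3 values (?, ?)", TABLE3)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

Counting each table in its own subquery before joining is what keeps the two counts independent. Joining the raw tables first would multiply the rows of table 1 by the rows of table 2 for each user, which is one way to end up with the "global result" the question mentions.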

joining same table twice in a mysql query

I'm trying to understand the purpose of the joins in this query.
SELECT
DISTINCT o.order_id
FROM
`order` o,
`order_product` as op
LEFT JOIN `provider_order_product_status_history` as popsh
on op.order_product_id = popsh.order_product_id
LEFT JOIN `provider_order_product_status_history` as popsh2
ON popsh.order_product_id = popsh2.order_product_id
AND popsh.provider_order_product_status_history_id <
popsh2.provider_order_product_status_history_id
WHERE
o.order_id = op.order_id
AND popsh2.last_updated IS NULL
LIMIT 10
What's bothering me is that provider_order_product_status_history is joined twice and I'm not sure what the purpose of that is. I would highly appreciate any help.

It's a technique to retrieve the latest order status.
Because of
AND popsh.provider_order_product_status_history_id < popsh2.provider_order_product_status_history_id
and
AND popsh2.last_updated IS NULL
Only those order statuses that don't have any newer status are returned.
For a minimum set example, consider the following status history table:
id status order_id last_updated
--------------------------------
1 A X 1:00
2 B X 2:00
The self join will result in:
id status order_id last_updated id status order_id last_updated
-------------------------------- --------------------------------
1 A X 1:00 2 B X 2:00
2 B X 2:00 NULL NULL NULL
The first row will be filtered out by the IS NULL condition, leaving only the second row, which is the latest one.
For a 3-row case the self join result will be:
id status order_id last_updated id status order_id last_updated
-------------------------------- --------------------------------
1 A X 1:00 2 B X 2:00
1 A X 1:00 3 C X 3:00
2 B X 2:00 3 C X 3:00
3 C X 3:00 NULL NULL NULL
And only the last one will pass the IS NULL condition, leaving the latest one again.
It looks like an unnecessarily complicated way to do the job, but it actually works quite well as RDBMS engines do joins very efficiently.
BTW, as the query retrieves only order_id, the query is not useful as it is. I guess the OP omitted other fields in the select clause. It should be something like SELECT o.order_id, popsh.* FROM ...
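The technique described in this answer can be reproduced on the small status history table with the sketch below, run in an in-memory SQLite database from Python as a stand-in for MySQL. The table name status_history, the alias newer and the function name latest_status are assumptions made for the example.

```python
import sqlite3

QUERY = """
select h.id, h.status, h.order_id, h.last_updated
from status_history h
left join status_history newer
  on newer.order_id = h.order_id
 and h.id < newer.id          -- pair each row with every newer row of the same order
where newer.id is null         -- keep only rows that have no newer row
order by h.order_id
"""


def latest_status(rows):
    """rows: iterable of (id, status, order_id, last_updated); returns the newest row per order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table status_history (id integer primary key, status text, "
        "order_id text, last_updated text)"
    )
    conn.executemany("insert into status_history values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# The 3-row example from the answer
THREE_ROWS = [(1, "A", "X", "1:00"), (2, "B", "X", "2:00"), (3, "C", "X", "3:00")]
```

For the 3-row example, latest_status(THREE_ROWS) returns only the row with id 3 and status C, the one row whose left join found no newer partner.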

Wait, you have an error:
SELECT
DISTINCT o.order_id
FROM
`order` o,
`order_product` as op
LEFT JOIN `provider_order_product_status_history` as popsh
on op.order_product_id = popshs.order_product_id
** YOU HAVE EXCESS 's' HERE ^
LEFT JOIN `provider_order_product_status_history` as popsh2
ON popsh.order_product_id = popsh2.order_product_id
AND popsh.provider_order_product_status_history_id < popsh2.provider_order_product_status_history_id
WHERE
o.order_id = op.order_id
AND popsh2.last_updated IS NULL
LIMIT 10
Based on my analysis, the query is trying to extract the o.order_id of the latest entry (the one with the highest provider_order_product_status_history.provider_order_product_status_history_id) in provider_order_product_status_history. However, the join semantics used in this query are not recommendable.

Both joins, being of the inner kind, restrict the result set by their conditions. It's like saying "give me only the values from table 1 that have a corresponding row in table 2 on condition 1 and, at the same time, a row in table 2 on condition 2".
