Reduce MySQL database to eliminate duplicates - mysql

I have a table called "lane" with the following properties.
CREATE TABLE `lane` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`origRegion` varchar(45) NOT NULL,
`origState` char(2) NOT NULL,
`destRegion` varchar(45) NOT NULL,
`destState` char(2) NOT NULL,
PRIMARY KEY (`id`)
)
There are duplicate rows in this table with respect to the following columns: origState, origRegion, destState, destRegion. I'd like to select all rows, each joined to the min(id) of the first occurrence in its duplicate group.
For example, with data:
1 ALL MA ALL OH
2 ALL MA ALL OH
3 ALL MA ALL OH
and SQL similar to this (which does not return all the duplicate rows):
select l.*, l2.count, l2.minId from tmpLane l
JOIN (SELECT id, min(ID) as minId from tmpLane
GROUP BY origRegion, origState, destRegion, destState) l2 on l.id = l2.id;
Desired result (note the count and minId at the end):
1 ALL MA ALL OH 3 1
2 ALL MA ALL OH 3 1
3 ALL MA ALL OH 3 1
Note, that the query used above is an adaptation of the solution here (which doesn't work in this situation)

SELECT ID,
origRegion,
origState,
destRegion,
destState,
(SELECT COUNT(*)
FROM lane l3
WHERE l.origRegion = l3.origRegion
and l.origState = l3.origState
and l.destRegion = l3.destRegion
and l.destState = l3.destState) as 'Count',
(SELECT MIN(ID)
FROM lane l2
WHERE l.origRegion = l2.origRegion
and l.origState = l2.origState
and l.destRegion = l2.destRegion
and l.destState = l2.destState) as minID
FROM lane l
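The same count and minId can also be produced by aggregating once per duplicate group and joining every row back to its group on the four columns, rather than on id. A minimal sketch, assuming an in-memory SQLite database driven from Python stands in for MySQL so that it runs standalone; the fourth row and the aliases g and cnt are hypothetical additions for illustration:

```python
import sqlite3

# Assumption: in-memory SQLite replaces MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lane (id INTEGER PRIMARY KEY, origRegion TEXT, origState TEXT,"
    " destRegion TEXT, destState TEXT)"
)
conn.executemany(
    "INSERT INTO lane VALUES (?, ?, ?, ?, ?)",
    [
        (1, "ALL", "MA", "ALL", "OH"),
        (2, "ALL", "MA", "ALL", "OH"),
        (3, "ALL", "MA", "ALL", "OH"),
        (4, "ALL", "NY", "ALL", "OH"),  # hypothetical non-duplicate row
    ],
)

# Aggregate once per duplicate group, then join each row to its group
# on the four grouping columns.
query = """
SELECT l.id, l.origRegion, l.origState, l.destRegion, l.destState,
       g.cnt, g.minId
FROM lane l
JOIN (SELECT origRegion, origState, destRegion, destState,
             COUNT(*) AS cnt, MIN(id) AS minId
      FROM lane
      GROUP BY origRegion, origState, destRegion, destState) g
  ON  l.origRegion = g.origRegion AND l.origState = g.origState
  AND l.destRegion = g.destRegion AND l.destState = g.destState
ORDER BY l.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The SELECT itself uses only standard SQL, so it should carry over to MySQL unchanged.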

You can run this query to remove all the duplicate rows from your table:
ALTER IGNORE TABLE `lane`
ADD UNIQUE INDEX (`origRegion`, `origState`, `destRegion`, `destState`);
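This adds a unique index to your table, removes all duplicate rows, and ensures that no duplicate rows can be inserted in the future. Note that the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so this statement only works on older versions.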
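Where ALTER IGNORE TABLE is unavailable, the same outcome can be reached in two explicit steps: delete every row that is not the lowest id of its group, then add the unique index. A minimal sketch, assuming in-memory SQLite via Python as a stand-in for MySQL; the index name lane_route_uq is hypothetical:

```python
import sqlite3

# Assumption: in-memory SQLite replaces MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lane (id INTEGER PRIMARY KEY, origRegion TEXT NOT NULL,"
    " origState TEXT NOT NULL, destRegion TEXT NOT NULL, destState TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO lane VALUES (?, ?, ?, ?, ?)",
    [(1, "ALL", "MA", "ALL", "OH"),
     (2, "ALL", "MA", "ALL", "OH"),
     (3, "ALL", "MA", "ALL", "OH")],
)

# Step 1: keep only the lowest id of each duplicate group.
conn.execute("""
    DELETE FROM lane
    WHERE id NOT IN (SELECT MIN(id)
                     FROM lane
                     GROUP BY origRegion, origState, destRegion, destState)
""")

# Step 2: prevent new duplicates (index name is hypothetical).
conn.execute(
    "CREATE UNIQUE INDEX lane_route_uq"
    " ON lane (origRegion, origState, destRegion, destState)"
)

remaining = conn.execute("SELECT id FROM lane ORDER BY id").fetchall()

# A further duplicate insert should now be rejected by the index.
try:
    conn.execute(
        "INSERT INTO lane (origRegion, origState, destRegion, destState)"
        " VALUES ('ALL', 'MA', 'ALL', 'OH')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(remaining, duplicate_rejected)
```

MySQL generally refuses a DELETE whose subquery reads the table being deleted from (error 1093), so there the subquery would probably need to be wrapped in a derived table or rewritten as a self-join DELETE.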

Related

MySQL complex semi-join without group by

Summary
I am looking for a semi-join(ish) query that selects a number of customers and joins their most recent data from other tables.
At a later time, I wish to directly append conditions to the end of the query: WHERE c.id IN (1,2,3)
Problem
As far as I am aware, my requirement rules out GROUP BY:
SELECT * FROM customer c
LEFT JOIN customer_address ca ON ca.customer_id = c.id
GROUP BY c.id
# PROBLEM: Cannot append conditions *after* GROUP BY!
With most subquery-based attempts, my problem is the same.
As an additional challenge, I cannot strictly use a semi-join, because I allow at least two types of phone numbers (mobile and landline), which come from the same table. As such, from the phone table I may be joining multiple records per customer, i.e. this is no longer a semi-join. My current solution below illustrates this.
Questions
The EXPLAIN result at the bottom looks performant to me. Am I correct? Is each of the subqueries executed only once? Update: It appears that a DEPENDENT SUBQUERY is executed once for each row in the outer query. It would be great if we could avoid this.
Is there a better solution to what I am doing?
DDLs
DROP TABLE IF EXISTS customer;
CREATE TABLE `customer` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
);
DROP TABLE IF EXISTS customer_address;
CREATE TABLE `customer_address` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`customer_id` bigint(20) unsigned NOT NULL,
`street` varchar(85) DEFAULT NULL,
`house_number` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`)
);
DROP TABLE IF EXISTS customer_phone;
CREATE TABLE `customer_phone` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`customer_id` bigint(20) unsigned NOT NULL,
`phone` varchar(32) DEFAULT NULL,
`type` tinyint(3) unsigned NOT NULL COMMENT '1=mobile,2=landline',
PRIMARY KEY (`id`)
);
insert ignore customer values (1);
insert ignore customer_address values (1, 1, "OldStreet", 1),(2, 1, "NewStreet", 1);
insert ignore customer_phone values (1, 1, "12345-M", 1),(2, 1, "12345-L-Old", 2),(3, 1, "12345-L-New", 2);
SELECT * FROM customer;
+----+
| id |
+----+
| 1 |
+----+
SELECT * FROM customer_address;
+----+-------------+-----------+--------------+
| id | customer_id | street | house_number |
+----+-------------+-----------+--------------+
| 1 | 1 | OldStreet | 1 |
| 2 | 1 | NewStreet | 1 |
+----+-------------+-----------+--------------+
SELECT * FROM customer_phone;
+----+-------------+-------------+------+
| id | customer_id | phone | type |
+----+-------------+-------------+------+
| 1 | 1 | 12345-M | 1 |
| 2 | 1 | 12345-L-Old | 2 |
| 3 | 1 | 12345-L-New | 2 |
+----+-------------+-------------+------+
Solution so far
SELECT *
FROM customer c
# Join the most recent address
LEFT JOIN customer_address ca ON ca.id = (SELECT MAX(ca.id) FROM customer_address ca WHERE ca.customer_id = c.id)
# Join the most recent mobile phone number
LEFT JOIN customer_phone cphm ON cphm.id = (SELECT MAX(cphm.id) FROM customer_phone cphm WHERE cphm.customer_id = c.id AND cphm.`type` = 1)
# Join the most recent landline phone number
LEFT JOIN customer_phone cphl ON cphl.id = (SELECT MAX(cphl.id) FROM customer_phone cphl WHERE cphl.customer_id = c.id AND cphl.`type` = 2)
# Yay conditions appended at the end
WHERE c.id IN (1,2,3)
Fiddle
This fiddle gives the appropriate result set using the given solution. See my questions above.
http://sqlfiddle.com/#!9/98c57/3
I would avoid those dependent subqueries. Instead, try this:
SELECT
*
FROM customer c
LEFT JOIN (
SELECT
customer_id
, MAX(id) AS currid
FROM customer_phone
WHERE type = 1
GROUP BY
customer_id
) gm ON c.id = gm.customer_id
LEFT JOIN customer_phone mobis ON gm.currid = mobis.id
LEFT JOIN (
SELECT
customer_id
, MAX(id) AS currid
FROM customer_phone
WHERE type = 2
GROUP BY
customer_id
) gl ON c.id = gl.customer_id
LEFT JOIN customer_phone lands ON gl.currid = lands.id
WHERE c.id IN (1, 2, 3)
;
or, perhaps:
SELECT
*
FROM customer c
LEFT JOIN (
SELECT
customer_id
, MAX(case when type = 1 then id end) AS mobid
, MAX(case when type = 2 then id end) AS lndid
FROM customer_phone
GROUP BY
customer_id
) gp ON c.id = gp.customer_id
LEFT JOIN customer_phone mobis ON gp.mobid = mobis.id
LEFT JOIN customer_phone lands ON gp.lndid = lands.id
WHERE c.id IN (1, 2, 3)
;
see: http://sqlfiddle.com/#!9/ef983/1/
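The conditional-aggregation variant can be checked against the sample data from the question. A minimal sketch, assuming in-memory SQLite via Python in place of MySQL; the address join through the derived table ga is an extension that the queries above do not include, and the column aliases mobile and landline are hypothetical:

```python
import sqlite3

# Assumption: in-memory SQLite replaces MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY);
CREATE TABLE customer_address (id INTEGER PRIMARY KEY, customer_id INTEGER,
                               street TEXT, house_number INTEGER);
CREATE TABLE customer_phone (id INTEGER PRIMARY KEY, customer_id INTEGER,
                             phone TEXT, type INTEGER);
INSERT INTO customer VALUES (1);
INSERT INTO customer_address VALUES (1, 1, 'OldStreet', 1), (2, 1, 'NewStreet', 1);
INSERT INTO customer_phone VALUES (1, 1, '12345-M', 1),
                                  (2, 1, '12345-L-Old', 2),
                                  (3, 1, '12345-L-New', 2);
""")

# Each derived table is grouped once; no subquery depends on the outer row.
query = """
SELECT c.id, ca.street, mobis.phone AS mobile, lands.phone AS landline
FROM customer c
LEFT JOIN (SELECT customer_id, MAX(id) AS currid
           FROM customer_address
           GROUP BY customer_id) ga ON ga.customer_id = c.id
LEFT JOIN customer_address ca ON ca.id = ga.currid
LEFT JOIN (SELECT customer_id,
                  MAX(CASE WHEN type = 1 THEN id END) AS mobid,
                  MAX(CASE WHEN type = 2 THEN id END) AS lndid
           FROM customer_phone
           GROUP BY customer_id) gp ON gp.customer_id = c.id
LEFT JOIN customer_phone mobis ON mobis.id = gp.mobid
LEFT JOIN customer_phone lands ON lands.id = gp.lndid
WHERE c.id IN (1, 2, 3)
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because the WHERE clause is still the last part of the statement, further conditions on c can be appended at the end as the question requires.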

SQL RIGHT JOIN misunderstanding

I'm working on an ASP.NET application whose SQL backend (MySQL 5.6) has 4 tables.
The first table is defined this way:
CREATE TABLE `items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`descr` varchar(45) NOT NULL,
`modus` varchar(8) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
);
These are the items managed in the application.
The second table:
CREATE TABLE `files` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`file_path` varchar(255) NOT NULL,
`id_item` int(11) NOT NULL,
`id_type` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
);
These are the files required for item management. Each 'item' can have zero or more files (the 'id_item' field holds a valid 'id' from the 'items' table).
The third table:
CREATE TABLE `file_types` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`file_type` varchar(32) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
);
This table describes the type of each file.
The fourth table:
CREATE TABLE `checklist` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_type` int(11) NOT NULL,
`modus` varchar(8) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
);
This table, as its name suggests, is a checklist. It describes which types of files need to be collected for a particular 'modus'. The 'modus' field holds the same values as 'modus' in the 'items' table, and 'id_type' holds valid 'id' values from the 'file_types' table.
Let's suppose that the first table holds those items:
id descr modus
--------------------
1 First M
2 Second P
3 Third M
4 Fourth M
--------------------
The second:
id file_path id_item id_type
--------------------------------------
1 file1.jpg 1 1
2 file2.jpg 1 2
3 file3.jpg 2 1
4 file4.jpg 1 4
5 file5.jpg 1 1
--------------------------------------
The third:
id file_type
--------------
1 red
2 blue
3 green
4 default
--------------
The fourth table:
id id_type modus
--------------------
1 1 M
2 2 M
3 3 M
4 4 M
5 1 P
6 4 P
--------------------
What I need to obtain is a table with such items (referred to id_item = 1):
id_item file_path id_type file_type
--------------------------------------------
1 file1.jpg 1 red
1 file5.jpg 1 red
1 file2.jpg 2 blue
1 file4.jpg 4 default
<null> <null> 3 green
--------------------------------------------
While the result table for id_item = 2 should be the following:
id_item file_path id_type file_type
--------------------------------------------
2 file3.jpg 1 red
<null> <null> 4 default
--------------------------------------------
where of course 'id_item' is the 'id' of the 'items' table, 'id_type' is the 'id' of the 'file_types' table, etc.
In short, I need a table that depicts the checklist status for a particular 'item' id, i.e. which files have been collected and which of them are still missing.
I tried to use a RIGHT JOIN clause without success:
SELECT
items.id AS id_item,
files.file_path AS file_path,
file_types.id AS id_type,
file_types.file_type AS file_type
FROM
files
RIGHT JOIN
checklist ON (files.id_type = checklist.id_type )
INNER JOIN
items ON (files.id_item = items.id)
AND (items.modus = checklist.modus)
INNER JOIN
file_types ON (checklist.id_type = file_types.id)
WHERE (items.id = 1);
The result of this query is:
id_item file_path id_type file_type
------------------------------------------
1 file1.jpg 1 red
1 file5.jpg 1 red
1 file2.jpg 2 blue
1 file4.jpg 4 default
It lacks the last row (the missing file from the checklist).
The following query gives you the status of each item as follows (a kind of checklist). I had to change some of the column names that were reserved words in my test environment.
select item_id,
fp filepath,
m_type,
item_desc,
modee,
(select t.type from typess t where t.id = m_type)
from (select null item_id,
i.descr item_desc,
c.modee modee,
c.id_type m_type,
null fp
from items i, checklist c
where c.modee = i.modee
and i.id = 0
and c.id_type not in
(select f.id_type from files f where f.id_item = i.id)
union all
select i.id item_id,
i.descr item_desc,
c.modee modee,
c.id_type m_type,
f.file_path fp
from items i, checklist c, files f
where c.modee = i.modee
and i.id = 0
and f.id_item = i.id
and f.id_type = c.id_type)
order by item_id asc, m_type asc
Try this:
SELECT
files.file_path,
types.type
FROM files
LEFT JOIN checklist ON (files.id_type = checklist.id_type )
LEFT JOIN items ON (files.id_item = items.id)
AND (items.mode = checklist.mode)
LEFT JOIN types ON (checklist.id_type = types.id)
WHERE (items.id = 0);
I have created and populated your tables, but I see a discrepancy between what you request (for each item) and your example output (for each item type). However, I have created a query based on the output:
;with cte as (
SELECT i.id, f.file_path, f.id_type
from checklist ck
JOIN files f on f.id_type = ck.id_type
JOIN items i on i.id = f.id_item AND i.mode = ck.mode AND i.id = 0
)
SELECT cte.id, cte.file_path, T.id, T.[type]
FROM types T
LEFT JOIN cte on cte.id_type = T.id
[edit]
My result is the following (SQL):
id file_path id type
---------------------------------
0 file1.jpg 0 red
0 file5.jpg 0 red
0 file2.jpg 1 blue
NULL NULL 2 green
0 file4.jpg 3 default
No CTE version:
SELECT cte.id, cte.file_path, T.id, T.[type]
FROM types T
LEFT JOIN (
SELECT i.id, f.file_path, f.id_type
from checklist ck
JOIN files f on f.id_type = ck.id_type
JOIN items i on i.id = f.id_item AND i.mode = ck.mode AND i.id = 0
) cte on cte.id_type = T.id
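Using the table and column names from the question, another way to get the missing checklist rows is to start from items and checklist and LEFT JOIN files last, so that no later INNER JOIN can discard the NULL-extended rows. A minimal sketch, assuming in-memory SQLite via Python in place of MySQL 5.6; the helper checklist_status is hypothetical:

```python
import sqlite3

# Assumption: in-memory SQLite replaces MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, descr TEXT, modus TEXT);
CREATE TABLE files (id INTEGER PRIMARY KEY, file_path TEXT,
                    id_item INTEGER, id_type INTEGER);
CREATE TABLE file_types (id INTEGER PRIMARY KEY, file_type TEXT);
CREATE TABLE checklist (id INTEGER PRIMARY KEY, id_type INTEGER, modus TEXT);
INSERT INTO items VALUES (1, 'First', 'M'), (2, 'Second', 'P'),
                         (3, 'Third', 'M'), (4, 'Fourth', 'M');
INSERT INTO files VALUES (1, 'file1.jpg', 1, 1), (2, 'file2.jpg', 1, 2),
                         (3, 'file3.jpg', 2, 1), (4, 'file4.jpg', 1, 4),
                         (5, 'file5.jpg', 1, 1);
INSERT INTO file_types VALUES (1, 'red'), (2, 'blue'), (3, 'green'), (4, 'default');
INSERT INTO checklist VALUES (1, 1, 'M'), (2, 2, 'M'), (3, 3, 'M'),
                             (4, 4, 'M'), (5, 1, 'P'), (6, 4, 'P');
""")

# The checklist rows for the item's modus drive the result; files are optional.
QUERY = """
SELECT files.id_item, files.file_path,
       file_types.id AS id_type, file_types.file_type
FROM items
JOIN checklist ON checklist.modus = items.modus
JOIN file_types ON file_types.id = checklist.id_type
LEFT JOIN files ON files.id_item = items.id
               AND files.id_type = checklist.id_type
WHERE items.id = ?
ORDER BY file_types.id, files.id
"""

def checklist_status(item_id):
    """Return one row per collected file plus one row per missing file type."""
    return conn.execute(QUERY, (item_id,)).fetchall()

item1 = checklist_status(1)
item2 = checklist_status(2)
print(item1)
print(item2)
```

In the RIGHT JOIN attempt from the question, the INNER JOIN to items compares files.id_item, which is NULL for a missing file, so that row is filtered out again. Filtering on items.id while keeping every files condition inside the LEFT JOIN avoids this.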

Query is very slow when I add a third LEFT JOIN

Hi there, I have been playing with this query for hours and I can't get it to return results in a reasonable execution time.
Here is the case:
I have three tables -
Table 1 called : rowsall
1 id int(11)
2 masterCaseId varchar(50)
3 RowNum int(11)
4 fullCaseNumber varchar(50)
5 rowKtavNameFull varchar(250)
6 DateOpen varchar(50)
7 DateProccess varchar(50)
8 rowStatus varchar(50)
9 rowCourt varchar(100)
10 rowProcedure varchar(50)
11 rowCaseType varchar(50)
12 rowIntrest varchar(50)
13 rowDetailsGen varchar(250)
14 rowTypeTeanot varchar(50)
15 rowHisayon varchar(50)
16 rowAmount varchar(50)
17 rowZacautPtor varchar(50)
18 rowZacautApproove varchar(50)
19 rowStatIravon varchar(50)
20 rowDateClose varchar(50)
21 rowCloseReason varchar(50)
22 rowResultTaken varchar(50)
23 rowOldFile varchar(50)
24 rowOpenedInCourse varchar(50)
25 rowGniza varchar(50)
26 rowReasonDeposit varchar(50)
27 rowTypeJudgeType varchar(50)
28 rowJudgeTypeDate
29 rowJudgeTypeName varchar(50)
30 rowGishurType varchar(50)
31 rowGishurDetails varchar(250)
Total rows: 13001, size 11.7mb
Indexes:
PRIMARY BTREE Yes No id 13001 A No
RowNum BTREE No No RowNum 12 A No
rowStatus 12 A No
rowResultTaken 12 A No
rowJudgeTypeName BTREE No No rowJudgeTypeName 1083 A No
masterCaseId BTREE No No masterCaseId 13001 A No
RowNum_2 BTREE No No rowJudgeTypeName 1857 A No
RowNum 1857 A No
fullCaseNumber BTREE No No fullCaseNumber 203 A No
Table 2 called : casses_rows
1 id int(11)
2 caseFullNum varchar(50)
3 statusCrawl varchar(50)
4 courtPlace text
5 rowsNum int(11)
6 caseJudge varchar(50)
7 caseFullName text
8 whenCrawled datetime
9 yearVal varchar(5)
10 monthVal varchar(5)
11 caseVal int(11)
Total rows: ~23,846, size 4.8mb
Indexes:
PRIMARY BTREE Yes No id 26302 A No
Table 3 called : casedocs
1 id int(11)
2 caseNum varchar(20)
3 DocTitle varchar(250)
4 DocDateStr varchar(20)
5 KeyWords text
6 content text
7 DocDateParsed timestamp
Total rows: ~1,163,669, size 4.1g
Indexes:
PRIMARY BTREE Yes No id 895132 A No
caseNum BTREE No No caseNum 895132 A No
My goal:
I need to join those tables to get most of the cols in table1 + one col in table 2 + one col in table 3 or NULL if there is no match:
My Query is:
SELECT
A.`id` AS idRowCase,
C.`caseNum` AS isPaperAva,
A.`rowCaseType`,
A.`fullCaseNumber`,
A.`rowProcedure`,
B.`caseFullName`,
A.`rowCourt`,
A.`rowAmount`,
A.`rowResultTaken`, A.`rowStatus`, A.`rowIntrest` ,A.`DateOpen` ,A.`DateProccess`, A.`rowDateClose`, A.`rowJudgeTypeDate`
FROM (SELECT * FROM `rowsall` WHERE `rowJudgeTypeName` LIKE '%#value1%' AND `RowNum` ='1' ) A
INNER JOIN ( SELECT `id`,`caseFullName` FROM `casses_rows` ) B
ON A.`masterCaseId` = B.`id`
LEFT JOIN (SELECT `caseNum` FROM `casedocs` GROUP BY `caseNum` ORDER BY NULL ) C
ON A.`fullCaseNumber` = C.`caseNum`
The result is as I wanted, but the problem is that it takes 1 min to return the results...
Here is the EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 121
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 24185 Using where; Using join buffer
1 PRIMARY <derived4> ALL NULL NULL NULL NULL 343438
4 DERIVED casedocs index NULL caseNum 62 NULL 768024 Using index
3 DERIVED casses_rows ALL NULL NULL NULL NULL 29872
2 DERIVED rowsall ref RowNum RowNum 4 6500 Using where
As you can see I'm grouping table 3 to prevent the join creating duplicate rows in the results - actually the third join is to test if there are docs that correspond to the case or not (will be NULL).
More information:
If I remove the third join, the query takes 1 sec.
If I execute only the third join's select statement, it takes 0.003 sec.
When profiling the query, "sending data" accounts for 99.9% of the time.
Any ideas why it takes so long to execute the third join?
Mission accomplished!
Thanks to @Turophile and @Joel Coehoorn, new test results are around 0.004 sec!!!
Here is the final query:
SELECT DISTINCT A.`id` AS idRowCase, C.`caseNum` AS isPaperAva, A.`rowCaseType` , A.`fullCaseNumber` , A.`rowProcedure` , B.`caseFullName` , A.`rowCourt` , A.`rowAmount` , A.`rowResultTaken` , A.`rowStatus` , A.`rowIntrest` , A.`DateOpen` , A.`DateProccess` , A.`rowDateClose` , A.`rowJudgeTypeDate`
FROM `rowsall` A
INNER JOIN `casses_rows` B ON A.`masterCaseId` = B.`id`
LEFT JOIN `casedocs` C ON A.`fullCaseNumber` = C.`caseNum`
WHERE A.`rowJudgeTypeName` LIKE '%#value1%'
AND A.`RowNum` = '1'
My advice would be to not sort and group unnecessarily. So, something like this:
SELECT
A.`id` AS idRowCase,
C.`caseNum` AS isPaperAva,
A.`rowCaseType`,
A.`fullCaseNumber`,
A.`rowProcedure`,
B.`caseFullName`,
A.`rowCourt`,
A.`rowAmount`,
A.`rowResultTaken`,
A.`rowStatus`,
A.`rowIntrest`,
A.`DateOpen` ,
A.`DateProccess`,
A.`rowDateClose`,
A.`rowJudgeTypeDate`
FROM `rowsall` AS A
INNER JOIN `casses_rows` AS B
ON A.`masterCaseId` = B.`id`
LEFT JOIN `casedocs` AS C
ON A.`fullCaseNumber` = C.`caseNum`
WHERE `rowJudgeTypeName` LIKE '%#value1%'
AND `RowNum` ='1'
(may return different results (multiple rows) if caseNum isn't unique).
You could also turn the LEFT JOIN into a sub-select (note that, unlike the LEFT JOIN, this keeps only the rows that have a matching document):
SELECT
A.`id` AS idRowCase,
A.`fullCaseNumber` AS isPaperAva,
A.`rowCaseType`,
A.`fullCaseNumber`,
A.`rowProcedure`,
B.`caseFullName`,
A.`rowCourt`,
A.`rowAmount`,
A.`rowResultTaken`,
A.`rowStatus`,
A.`rowIntrest`,
A.`DateOpen` ,
A.`DateProccess`,
A.`rowDateClose`,
A.`rowJudgeTypeDate`
FROM `rowsall` AS A
INNER JOIN `casses_rows` AS B
ON A.`masterCaseId` = B.`id`
WHERE `rowJudgeTypeName` LIKE '%#value1%'
AND `RowNum` ='1'
AND A.`fullCaseNumber` in (SELECT `caseNum` FROM `casedocs` )
But this shows that using table casedocs is kind of redundant - is it really needed?
Firstly, the first two tables have no need for subqueries at all. This can be better expressed directly through join conditions and the WHERE clause.
Also, the last join uses a sub query with a group by:
LEFT JOIN (SELECT caseNum FROM casedocs GROUP BY caseNum ORDER BY NULL )
This breaks MySQL's ability to use any indexes when computing that last join. If you can rewrite this to join the table first and do the GROUP BY in the outer query, so that you get the same results, it might perform much better because the indexes can be used.
SELECT
A.`id` AS idRowCase,
C.`caseNum` AS isPaperAva,
A.`rowCaseType`,
A.`fullCaseNumber`,
A.`rowProcedure`,
B.`caseFullName`,
A.`rowCourt`,
A.`rowAmount`,
A.`rowResultTaken`, A.`rowStatus`, A.`rowIntrest` ,A.`DateOpen` ,A.`DateProccess`, A.`rowDateClose`, A.`rowJudgeTypeDate`
FROM `rowsall` A
INNER JOIN `casses_rows` B ON A.`masterCaseId` = B.`id`
LEFT JOIN (SELECT `caseNum` FROM `casedocs` GROUP BY `caseNum` ) C ON C.`caseNum` = A.`fullCaseNumber`
WHERE A.`rowJudgeTypeName` LIKE '%#value1%' AND A.`RowNum` ='1'
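Since the third join only tests whether any document exists for a case, an EXISTS subquery is another way to express it: it cannot multiply rows, so neither GROUP BY nor DISTINCT is needed. A minimal sketch, assuming in-memory SQLite via Python and heavily reduced tables; the sample values, the index name, and the alias hasDocs are hypothetical, and the flag is 1/0 rather than caseNum/NULL:

```python
import sqlite3

# Assumption: in-memory SQLite and trimmed-down tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rowsall (id INTEGER PRIMARY KEY, fullCaseNumber TEXT);
CREATE TABLE casedocs (id INTEGER PRIMARY KEY, caseNum TEXT);
CREATE INDEX casedocs_casenum_idx ON casedocs (caseNum);
INSERT INTO rowsall VALUES (1, 'A-1'), (2, 'B-2');
-- Case 'A-1' has two documents, 'B-2' has none.
INSERT INTO casedocs VALUES (1, 'A-1'), (2, 'A-1'), (3, 'C-3');
""")

# EXISTS yields one flag per outer row, however many documents match.
query = """
SELECT A.id,
       A.fullCaseNumber,
       EXISTS (SELECT 1
               FROM casedocs C
               WHERE C.caseNum = A.fullCaseNumber) AS hasDocs
FROM rowsall A
ORDER BY A.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Whether this beats the plain LEFT JOIN with DISTINCT on the real data would have to be measured. The index on caseNum is what lets the lookup stay cheap in either form.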

mysql update multiple tables with single sql to get sum(qty)

I have two tables with data
CREATE TABLE `MASTER` (
`NAME` VARCHAR(10) NOT NULL,
`QTY` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`NAME`)
);
NAME | QTY
----------
'ABC' | 0
'XYZ' | 0
CREATE TABLE `DETAIL` (
`NAME` VARCHAR(10) NOT NULL,
`QTY` INT(10) UNSIGNED NOT NULL,
`FLAG` TINYINT(1) UNSIGNED NOT NULL
);
NAME | QTY| FLAG
--------------------
'ABC' | 10 | 0
'ABC' | 20 | 0
'PQR' | 15 | 0
'PQR' | 25 | 0
I want to add sum(detail.qty) to master and set the detail flag to 1,
so I have written this query:
UPDATE MASTER M, DETAIL D
SET M.QTY = M.QTY + D.QTY,
D.FLAG =1
WHERE M.NAME = D.NAME;
I expected MASTER.QTY to be 30 (10 + 20) from the detail table,
but it only picks up the first value:
the actual value is MASTER.QTY = 10 (only the first matching detail row was applied).
How can I get MASTER.QTY = 30?
Try this query:
update `MASTER` m,`DETAIL` d,
(
SELECT `NAME`, SUM( `QTY` ) as `QTY`
FROM `DETAIL`
GROUP BY `NAME`
) s
SET m.QTY = s.QTY,
d.FLAG = 1
WHERE
m.NAME = s.NAME
AND m.NAME = d.NAME
;
SQLFiddle demo --> http://www.sqlfiddle.com/#!2/ab355/1
IMO, your MASTER table is unnecessary. You don't need it unless the number of rows runs to more than five digits.
This equals the MASTER table:
SELECT NAME, SUM(QTY) AS QTY, MAX(FLAG) AS FLAG FROM DETAIL GROUP BY NAME;
You can create a view from that easily.
Your answer anyways:
UPDATE MASTER m
JOIN DETAIL d ON m.NAME = d.NAME
SET
d.FLAG = 1,
m.QTY = (SELECT SUM(QTY) FROM DETAIL WHERE NAME = d.NAME GROUP BY NAME)
WHERE m.NAME = d.NAME
Also, always follow normalization rules: https://en.wikipedia.org/wiki/Database_normalization
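The original attempt ends at 10 because a multi-table UPDATE in MySQL updates each matching row only once, even if it matches the join several times, so M.QTY + D.QTY is applied for a single DETAIL row. One way around this is to split the work into two single-table statements inside one transaction. A minimal sketch, assuming in-memory SQLite via Python in place of MySQL, and assuming that only rows with FLAG = 0 should be added:

```python
import sqlite3

# Assumption: in-memory SQLite replaces MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MASTER (NAME TEXT PRIMARY KEY, QTY INTEGER NOT NULL);
CREATE TABLE DETAIL (NAME TEXT NOT NULL, QTY INTEGER NOT NULL,
                     FLAG INTEGER NOT NULL);
INSERT INTO MASTER VALUES ('ABC', 0), ('XYZ', 0);
INSERT INTO DETAIL VALUES ('ABC', 10, 0), ('ABC', 20, 0),
                          ('PQR', 15, 0), ('PQR', 25, 0);
""")

with conn:  # both statements commit together, or neither does
    # Add the sum of the not-yet-processed detail rows to each master row.
    conn.execute("""
        UPDATE MASTER
        SET QTY = QTY + (SELECT COALESCE(SUM(D.QTY), 0)
                         FROM DETAIL D
                         WHERE D.NAME = MASTER.NAME AND D.FLAG = 0)
    """)
    # Mark exactly those detail rows as processed.
    conn.execute("""
        UPDATE DETAIL
        SET FLAG = 1
        WHERE FLAG = 0 AND NAME IN (SELECT NAME FROM MASTER)
    """)

master = conn.execute("SELECT NAME, QTY FROM MASTER ORDER BY NAME").fetchall()
detail = conn.execute(
    "SELECT NAME, QTY, FLAG FROM DETAIL ORDER BY NAME, QTY"
).fetchall()
print(master)
print(detail)
```

The 'PQR' rows stay unflagged because no MASTER row exists for them, which matches the behaviour of the join-based updates above.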

Lost on creating a query for this scenario

Here is a simplified version of the table structure.
Employee
(
ID GUID NOT NULL
OldID GUID NULL
Name VARCHAR(50) NOT NULL
CreationDate DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
)
It contains employee information as well as any changes made to employee attributes, so we can get a complete audit of the changes. When OldID is NULL, the row holds the latest data. Here is an example; I am using integer values for the identifiers to make it easier to follow.
ID OldId Name CreationDate
13 NULL John 15-July-2013
12 13 John1 14-July-2013
11 12 John2 13-July-2013
10 11 John3 12-July-2013
121 NULL Smith 15-July-2013
To start with I can get the unique employees from table by
SELECT ID, Name FROM Employee WHERE OldId IS NULL
I am looking to get the latest ID together with its earliest name, so the result should be two rows:
ID Name
13 John3
121 Smith
I am not sure how I can get these results. Any help would be highly appreciated.
Here's one approach that works for your data:
with groups as
(
select groupID = ID, *
from Employee
where OldID is null
union all
select g.groupID, e.*
from groups g
inner join Employee e on g.ID = e.OldID
)
, ranks as
(
select *
, firstRank = row_number () over (partition by groupID order by creationDate)
from groups
)
select ID = groupID
, Name
from ranks
where firstRank = 1
SQL Fiddle with demo.
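The query above uses SQL Server syntax (for example select groupID = ID). The same idea can be written with a standard WITH RECURSIVE clause, which SQLite and MySQL 8.0 and later support. A minimal sketch, assuming in-memory SQLite via Python and integer IDs as in the example data; instead of ranking by CreationDate it picks the row at the end of each chain, i.e. the one that no other row points to through OldID:

```python
import sqlite3

# Assumption: in-memory SQLite with integer IDs, as in the example data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (ID INTEGER PRIMARY KEY, OldID INTEGER,
                       Name TEXT NOT NULL, CreationDate TEXT NOT NULL);
INSERT INTO Employee VALUES
    (13, NULL, 'John', '2013-07-15'),
    (12, 13, 'John1', '2013-07-14'),
    (11, 12, 'John2', '2013-07-13'),
    (10, 11, 'John3', '2013-07-12'),
    (121, NULL, 'Smith', '2013-07-15');
""")

# Walk each chain from its latest row (OldID IS NULL) to older rows,
# carrying the latest ID along as groupID.
query = """
WITH RECURSIVE chain (groupID, ID, Name) AS (
    SELECT ID, ID, Name FROM Employee WHERE OldID IS NULL
    UNION ALL
    SELECT c.groupID, e.ID, e.Name
    FROM chain c
    JOIN Employee e ON e.OldID = c.ID
)
SELECT c.groupID AS ID, c.Name
FROM chain c
-- The earliest row of a chain is the one nothing else refers to.
WHERE NOT EXISTS (SELECT 1 FROM Employee e WHERE e.OldID = c.ID)
ORDER BY c.groupID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

This relies on every chain being linear (at most one row per OldID value). If chains could branch, ranking by CreationDate as in the answer above would be the safer choice.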