SQL Teradata - mark duplicate records in the column - duplicates

I know how to identify a duplicate row, but I also need to identify the row it duplicates and mark both as duplicates.
Example:
Row Name ID State Date Dup
---------------------------------------
001 Jim 001 NJ jan2020
002 Jim 001 NJ jan2020
003 Tan 002 NY feb2020
004 Allen 003 CA Feb2020
Output should look like:
Row Name ID State Date Dup
---------------------------------------
001 Jim 001 NJ jan2020 Y
002 Jim 001 NJ jan2020 Y
003 Tan 002 NY feb2020 N
004 Allen 003 CA Feb2020 N
I can use ROW_NUMBER() with a partition, but that will not flag record 001 as Y. What approach would work?

If you have a small number of columns, you can do something like this:
SELECT Row, Name, ID, State, Date,
CASE
WHEN COUNT(*) OVER(PARTITION BY Name, ID, State, Date) > 1 THEN 'Y'
ELSE 'N'
END AS Dup
FROM MyTable
This marks a given row as a duplicate based on the columns specified in the PARTITION BY expression. Also, be careful with your column names (e.g. Row, Date), as they may be reserved words.
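To see the windowed count in action, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite stands in for Teradata here only so the example can be executed (it needs SQLite 3.25 or later for window functions), and the table name my_table and the column names row_no and dt are assumptions chosen to avoid the possibly reserved words Row and Date.

```python
import sqlite3


def flag_duplicates():
    """Return (row_no, name, dup) with dup = 'Y' for every row in a duplicate group."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and column names; row_no/dt avoid reserved words.
    conn.execute(
        "CREATE TABLE my_table (row_no TEXT, name TEXT, id TEXT, state TEXT, dt TEXT)"
    )
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?, ?, ?, ?)",
        [
            ("001", "Jim", "001", "NJ", "jan2020"),
            ("002", "Jim", "001", "NJ", "jan2020"),
            ("003", "Tan", "002", "NY", "feb2020"),
            ("004", "Allen", "003", "CA", "Feb2020"),
        ],
    )
    rows = conn.execute(
        """
        SELECT row_no, name,
               -- COUNT(*) OVER counts every row in the partition, so all
               -- members of a duplicate group get 'Y', including the first.
               CASE WHEN COUNT(*) OVER (PARTITION BY name, id, state, dt) > 1
                    THEN 'Y' ELSE 'N' END AS dup
        FROM my_table
        ORDER BY row_no
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in flag_duplicates():
        print(row)
```

Unlike ROW_NUMBER(), which numbers the first row of each group 1 and so leaves it looking unique, the windowed COUNT(*) gives every row in the group the same value.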

How to join based on max timestamp in SQL?

So I have a df like this:
ID fruit
001 grapes
002 apples
002 mangos
003 bananas
004 oranges
004 grapes
And I want to join the following onto it:
ID store_time
001 2021-04-02 03:02:00.321
002 2021-04-02 02:02:00.319
002 2021-04-03 12:02:00.319
002 2021-04-04 13:02:00.312
003 2021-04-02 19:02:00.313
004 2021-04-02 15:02:00.122
004 2021-04-01 11:02:00.121
All I want to do is join on just the most recent timestamp for each ID and leave the others behind, so the result has exactly as many rows as the fruit df.
Final output:
ID fruit timestamp
001 grapes 2021-04-02 03:02:00.321
002 apples 2021-04-04 13:02:00.312
002 mangos 2021-04-04 13:02:00.312
003 bananas 2021-04-02 19:02:00.313
004 oranges 2021-04-02 15:02:00.122
004 grapes 2021-04-02 15:02:00.122
Aggregate in the 2nd table to get the most recent store_time for each ID and then join to the 1st table:
SELECT t1.ID, t1.fruit, t2.timestamp
FROM table1 t1
LEFT JOIN (
SELECT ID, MAX(store_time) timestamp
FROM table2
GROUP BY ID
) t2 ON t2.ID = t1.ID
I used a LEFT join just in case table2 does not contain all the IDs of table1.
If this is not the case then you can change it to an INNER join.
You need a subquery that gets the max timestamp per id:
select a.id, a.fruit, b.max_time
from my_table_fruit a
inner join (
select id, max(store_time) max_time
from my_table_time
group by id
) b on b.id = a.id
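The aggregate-then-join approach can be checked with a small runnable sketch. It uses Python's sqlite3 module purely as a convenient engine, stores the timestamps as text (this format sorts chronologically as a string), and adds one hypothetical fruit row with ID 005 that has no store_time, to show what the LEFT JOIN does in that case.

```python
import sqlite3


def latest_time_per_fruit():
    """Join each fruit row to the most recent store_time for its ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id TEXT, fruit TEXT)")
    conn.execute("CREATE TABLE table2 (id TEXT, store_time TEXT)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        [
            ("001", "grapes"), ("002", "apples"), ("002", "mangos"),
            ("003", "bananas"), ("004", "oranges"), ("004", "grapes"),
            ("005", "kiwi"),  # hypothetical row with no matching timestamp
        ],
    )
    conn.executemany(
        "INSERT INTO table2 VALUES (?, ?)",
        [
            ("001", "2021-04-02 03:02:00.321"),
            ("002", "2021-04-02 02:02:00.319"),
            ("002", "2021-04-03 12:02:00.319"),
            ("002", "2021-04-04 13:02:00.312"),
            ("003", "2021-04-02 19:02:00.313"),
            ("004", "2021-04-02 15:02:00.122"),
            ("004", "2021-04-01 11:02:00.121"),
        ],
    )
    rows = conn.execute(
        """
        SELECT t1.id, t1.fruit, t2.max_time
        FROM table1 t1
        LEFT JOIN (
            -- One row per id, so the join cannot multiply fruit rows.
            SELECT id, MAX(store_time) AS max_time
            FROM table2
            GROUP BY id
        ) t2 ON t2.id = t1.id
        ORDER BY t1.id, t1.fruit
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_time_per_fruit():
        print(row)
```

Because the subquery is grouped by id, the result has one row per fruit row; the 005 row comes back with a NULL (None) timestamp and would be dropped by an INNER join.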

Output isn't ordered properly

How can I order the result by sector in the order C, D, A, B, and then by pincode in ascending order?
Original result :
S Pincode
== =======
A 001
B 002
C 003
D 004
D 005
C 006
B 007
A 008
Expected result:
S Pincode
== =======
C 003
C 006
D 004
D 005
A 001
A 008
B 002
B 007
Code:
SELECT
id,
sector,
pincode
FROM
sh_av_spform
WHERE
type='ticket' and
status='new' and
date(`createdate`) = CURDATE()
ORDER BY
FIELD( sector, 'C','D','A','B' ) ASC
limit 5
The SQL above sometimes returns pincodes that are not in ascending order within a sector.
Example of the invalid output I get:
S Pincode
== =======
C 003
C 006
D 005
D 004 <<< ???
A 001
A 008
B 007
B 002 <<< ???
Anyone know how to fix this?
You only order by one column. Add the second one too:
SELECT
id,
sector,
pincode
FROM
sh_av_spform
WHERE
type='ticket' and
status='new' and
date(`createdate`) = CURDATE()
ORDER BY
FIELD( sector, 'C','D','A','B' ) ASC,
pincode ASC
You didn't include pincode in your ORDER BY clause:
ORDER BY
FIELD( sector, 'C','D','A','B' ) , pincode
No need to write ASC as it is the default ordering.
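The effect of the second sort key can be reproduced in a small sketch. SQLite has no FIELD() function, so this example swaps in a CASE expression to express the custom sector order; the two-key ORDER BY is otherwise the same idea. Only the sector and pincode columns are modelled, and the rows are inserted in the question's original order.

```python
import sqlite3


def ordered_rows():
    """Sort by custom sector order (C, D, A, B), then by pincode."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sh_av_spform (sector TEXT, pincode TEXT)")
    conn.executemany(
        "INSERT INTO sh_av_spform VALUES (?, ?)",
        [
            ("A", "001"), ("B", "002"), ("C", "003"), ("D", "004"),
            ("D", "005"), ("C", "006"), ("B", "007"), ("A", "008"),
        ],
    )
    rows = conn.execute(
        """
        SELECT sector, pincode
        FROM sh_av_spform
        ORDER BY
            -- Stand-in for MySQL's FIELD(sector, 'C','D','A','B').
            -- Note: FIELD() returns 0 for unlisted values (sorting them
            -- first); this CASE sends unlisted values to the end instead.
            CASE sector
                WHEN 'C' THEN 1 WHEN 'D' THEN 2
                WHEN 'A' THEN 3 WHEN 'B' THEN 4
                ELSE 5
            END,
            pincode  -- tie-breaker: without it, order within a sector is unspecified
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in ordered_rows():
        print(row)
```

With only the first key, rows sharing a sector may come back in any order, which is why the original query was correct on some runs and wrong on others.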

How to get highest salary city wise

I was asked this question in an interview (hopefully you guys can help me; thanks in advance).
In Hive, how would you get the highest salary city wise from employee table?
003 Amit Delhi India 12000
004 Anil Delhi India 15000
005 Deepak Delhi India 34000
006 Fahed Agra India 45000
007 Ravi Patna India 98777
008 Avinash Punjab India 120000
009 Saajan Punjab India 54000
001 Harit Delhi India 20000
002 Hardy Agra India 20000
Try this:
SET @rank := 0;
SET @dept := '';
SET @desiredrank := 8; -- For example.
SELECT ename, salary_rank, salary
FROM
(
SELECT
e.ename, e.salary,
@rank := CASE WHEN @dept = e.deptid THEN @rank + 1 ELSE 1 END AS salary_rank,
@dept := e.deptid AS department
FROM employees e
JOIN departments d
ON e.deptid = d.deptid
ORDER BY d.deptid, e.salary
) ranked
WHERE salary_rank = @desiredrank
Basically you have to use two extra variables. One to simulate the grouping, and one to keep track of the rank. When that query is done, filter it for the rank you want.
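SELECT city, MAX(salary) FROM employee GROUP BY city;
This returns the highest salary for each city. Id cannot be selected here because it is neither aggregated nor listed in the GROUP BY; to show the matching employee as well, join this result back to the employee table on city and salary.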
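As a minimal sketch of the per-city maximum, the following runs a GROUP BY aggregate and joins it back to the table to recover the employee holding that salary. It uses SQLite from Python only to be executable, not Hive, and the column names id, name, city, country and salary are assumptions, since the question shows the data without headers.

```python
import sqlite3


def top_earners_by_city():
    """Return (city, id, name, salary) for the highest-paid employee per city."""
    conn = sqlite3.connect(":memory:")
    # Column names are assumed; the question lists values without headers.
    conn.execute(
        "CREATE TABLE employee (id TEXT, name TEXT, city TEXT, country TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
        [
            ("003", "Amit", "Delhi", "India", 12000),
            ("004", "Anil", "Delhi", "India", 15000),
            ("005", "Deepak", "Delhi", "India", 34000),
            ("006", "Fahed", "Agra", "India", 45000),
            ("007", "Ravi", "Patna", "India", 98777),
            ("008", "Avinash", "Punjab", "India", 120000),
            ("009", "Saajan", "Punjab", "India", 54000),
            ("001", "Harit", "Delhi", "India", 20000),
            ("002", "Hardy", "Agra", "India", 20000),
        ],
    )
    rows = conn.execute(
        """
        SELECT e.city, e.id, e.name, e.salary
        FROM employee e
        JOIN (
            -- Highest salary per city.
            SELECT city, MAX(salary) AS max_salary
            FROM employee
            GROUP BY city
        ) m ON m.city = e.city AND m.max_salary = e.salary
        ORDER BY e.city
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in top_earners_by_city():
        print(row)
```

If two employees in a city tie for the top salary, the join returns both of them.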

Datediff in date format

I have 2 tables with the following structure:
Emp Table
id name
001 Smith
002 Jerry
Leave
sr.no reason from_date to_date request_by status
1 PL 2011-12-11 2011-12-15 001 Declined
2 PL 2011-11-13 2011-11-13 001 Approved
3 PL 2011-10-02 2011-10-05 002 Declined
I have written this query:
select DATEDIFF(Leave.from_date,Leave.to_date)as cdate,
Emp.id as emp
from Leave left join Emp
on Leave.request_by=Emp.id
It gives me the difference between the 2 dates, like this:
cdate emp
-4 001
0 001
-3 002
First, the difference between '2011-12-11' and '2011-12-15' needs to be 5, since the employee is absent for 5 consecutive days. We can achieve that.
But I need cdate in date format ('%Y%m%d'), and if the date difference is, say, -4, then a separate record should be displayed for each day in that range.
So I want to write a query that gives output like this:
cdate emp
2011-12-11 001
2011-12-12 001
2011-12-13 001
2011-12-14 001
2011-12-15 001
2011-11-13 001
2011-10-02 002
2011-10-03 002
2011-10-04 002
2011-10-05 002
Can anybody tell me how I should write my query to get this output?
Try this query -
CREATE TABLE temp_days(d INT(11));
INSERT INTO temp_days VALUES
(0),(1),(2),(3),(4),(5),
(6),(7),(8),(9),(10),
(11),(12),(13),(14),(15); -- maximum day difference, add more days here
SELECT l.from_date + INTERVAL td.d DAY cdate, e.id emp
FROM
`leave` l
LEFT JOIN Emp e
ON l.request_by = e.id
JOIN temp_days td
ON DATEDIFF(l.to_date, l.from_date) >= td.d
ORDER BY
e.id
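The numbers-table join can be tried out with the sketch below, which uses SQLite from Python so it is runnable. SQLite has no DATEDIFF or INTERVAL, so date() with a '+N days' modifier and a julianday() difference are swapped in for them. The table name leave_requests is an assumption (chosen to avoid quoting the word leave), and only the columns the query needs are modelled.

```python
import sqlite3


def expand_leave_days():
    """Return one (cdate, emp) row per calendar day of each leave request."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table name; only the columns used by the query are kept.
    conn.execute(
        "CREATE TABLE leave_requests (from_date TEXT, to_date TEXT, request_by TEXT)"
    )
    conn.execute("CREATE TABLE temp_days (d INTEGER)")
    conn.executemany(
        "INSERT INTO leave_requests VALUES (?, ?, ?)",
        [
            ("2011-12-11", "2011-12-15", "001"),
            ("2011-11-13", "2011-11-13", "001"),
            ("2011-10-02", "2011-10-05", "002"),
        ],
    )
    # Offsets 0..15: the largest leave length this sketch can expand.
    conn.executemany("INSERT INTO temp_days VALUES (?)", [(d,) for d in range(16)])
    rows = conn.execute(
        """
        SELECT date(l.from_date, '+' || td.d || ' days') AS cdate,
               l.request_by AS emp
        FROM leave_requests l
        JOIN temp_days td
          -- Keep offsets 0..(to_date - from_date), i.e. both end dates inclusive.
          ON td.d <= julianday(l.to_date) - julianday(l.from_date)
        ORDER BY emp, cdate
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in expand_leave_days():
        print(row)
```

Each leave row is matched with every offset up to its length, so a 5-day leave yields 5 rows and a single-day leave yields 1. A leave longer than the largest offset in temp_days would be silently truncated, which is why the offsets table has to cover the maximum expected difference.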

Mysql Query Studied Days

I have Emp table with following values
Emp_Id Emp_Name Subject Dates
001 Smith Java 07-02-2012
001 Smith oracle 08-02-2012
001 smith C++ 10-02-2012
002 john java 01-01-2012
002 john SE 10-01-2012
002 john c 10-01-2012
001 smith physics 04-01-2012
001 smith c# 07-02-2012
001 smith javascript 07-02-2012
As we can see here, smith studied on only 3 days in February and 1 day in January,
while john studied on only 2 days in January.
How can we calculate this count for any employee?
For example, the output should look like this:
Emp_Id Emp_Name Month_Year No_Of_Days_Studied_In_Month
001 smith Feb12 3
001 smith Jan12 1
002 john Jan12 2
You can GROUP BY Emp_Id, YEAR(Dates), MONTH(Dates) and do a COUNT(DISTINCT Dates), so that several subjects studied on the same day count as one day.
TRY:
-- Assumes Dates is stored as a DATE column.
SELECT emp_id,
emp_name,
DATE_FORMAT(Dates, '%b%y') AS Month_Year,
COUNT(DISTINCT Dates) AS No_Of_Days_Studied_In_Month
FROM emp
GROUP BY emp_id, emp_name, DATE_FORMAT(Dates, '%b%y')
ORDER BY emp_id
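Counting study days rather than study rows depends on counting distinct dates, which the sketch below checks against the question's data. It runs on SQLite from Python, so strftime('%Y-%m', ...) is swapped in for DATE_FORMAT and the month appears as 2012-02 rather than Feb12. The dates are stored in ISO form (YYYY-MM-DD) as an assumption, and the query groups on emp_id alone because the names in the sample differ in capitalisation.

```python
import sqlite3


def days_studied_per_month():
    """Return (emp_id, month, number of distinct days studied)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emp (emp_id TEXT, emp_name TEXT, subject TEXT, dates TEXT)"
    )
    # Dates rewritten from DD-MM-YYYY to ISO format for this sketch.
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?, ?, ?)",
        [
            ("001", "Smith", "Java", "2012-02-07"),
            ("001", "Smith", "oracle", "2012-02-08"),
            ("001", "smith", "C++", "2012-02-10"),
            ("002", "john", "java", "2012-01-01"),
            ("002", "john", "SE", "2012-01-10"),
            ("002", "john", "c", "2012-01-10"),
            ("001", "smith", "physics", "2012-01-04"),
            ("001", "smith", "c#", "2012-02-07"),
            ("001", "smith", "javascript", "2012-02-07"),
        ],
    )
    rows = conn.execute(
        """
        SELECT emp_id,
               strftime('%Y-%m', dates) AS month_year,
               -- DISTINCT: three subjects on 2012-02-07 count as one day.
               COUNT(DISTINCT dates) AS days_studied
        FROM emp
        GROUP BY emp_id, strftime('%Y-%m', dates)
        ORDER BY emp_id, month_year
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in days_studied_per_month():
        print(row)
```

A plain COUNT(*) would report 5 for employee 001 in February, because it counts one row per subject; COUNT(DISTINCT dates) gives the expected 3.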