I am writing a query that uses the window function ROW_NUMBER() to find duplicates, and I am trying to delete those duplicates.
To do this I wrote a CTE that includes the window function and then tried to delete the duplicate rows from it. However, I get an error saying the target of the DELETE is not updatable.
select * from housingdata;
.
.
.
with rownumcte as (
select * ,row_number() over (partition by ParcelID, PropertyAddress,
SalePrice,saledate,LegalReference order by UniqueID) as rownum
from housingdata)
delete
from rownumcte
where rownum>1;
If I use SELECT instead of DELETE, I get the expected output containing the duplicates, which is 104 rows.
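CTEs are very good for many things, but not for this purpose: in MySQL you cannot delete through a CTE, because the CTE is not an updatable target.
Use an INNER JOIN against the base table instead.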
CREATE TABLE housingdata (UniqueID int,
ParcelID int
, PropertyAddress varchar(50)
,
SalePrice DECIMAL(10,2)
,saledate Date
,LegalReference int);
INSERT INTO housingdata VALUES (1,1,'test',1.1, NOW(), 1),(2,1,'test',1.1, NOW(), 1);
delete hd
FROM housingdata hd INNER JOIN
(
select UniqueID ,row_number() over (partition by ParcelID, PropertyAddress,
SalePrice,saledate,LegalReference order by UniqueID) as rownum
from housingdata) t1 ON hd.UniqueID = t1.UniqueID
WHERE t1.rownum>1;
SELECT * FROM housingdata
UniqueID | ParcelID | PropertyAddress | SalePrice | saledate | LegalReference
-------: | -------: | :-------------- | --------: | :--------- | -------------:
1 | 1 | test | 1.10 | 2022-02-25 | 1
UPDATE
You could also have used the CTE as the joined table:
with rownumcte as (
select UniqueID ,row_number() over (partition by ParcelID, PropertyAddress,
SalePrice,saledate,LegalReference order by UniqueID) as rownum
from housingdata)
delete hd
from housingdata hd INNER JOIN rownumcte r ON hd.UniqueID = r.UniqueID
where rownum>1;
SELECT * FROM housingdata
UniqueID | ParcelID | PropertyAddress | SalePrice | saledate | LegalReference
-------: | -------: | :-------------- | --------: | :--------- | -------------:
1 | 1 | test | 1.10 | 2022-02-25 | 1
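The same de-duplication idea can be tried without a MySQL server. The following is only a sketch under assumptions: SQLite (through Python's sqlite3 module, which needs SQLite 3.25 or later for window functions) stands in for MySQL, the third sample row is made up, and because SQLite has no DELETE ... JOIN the ranked subquery is used through IN instead of a join.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. It has no DELETE ... JOIN,
# so the ranked subquery is consumed through IN instead of a join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE housingdata (
    UniqueID INTEGER, ParcelID INTEGER, PropertyAddress TEXT,
    SalePrice REAL, saledate TEXT, LegalReference INTEGER
);
INSERT INTO housingdata VALUES
    (1, 1, 'test',  1.1, '2022-02-25', 1),
    (2, 1, 'test',  1.1, '2022-02-25', 1),  -- duplicate of row 1
    (3, 2, 'other', 2.2, '2022-02-25', 2);  -- hypothetical unique row
""")

# Rank rows inside each group of identical values, then delete rank > 1.
conn.execute("""
DELETE FROM housingdata
WHERE UniqueID IN (
    SELECT UniqueID
    FROM (
        SELECT UniqueID,
               row_number() OVER (
                   PARTITION BY ParcelID, PropertyAddress, SalePrice,
                                saledate, LegalReference
                   ORDER BY UniqueID
               ) AS rownum
        FROM housingdata
    ) AS ranked
    WHERE rownum > 1
)
""")

remaining_ids = [row[0] for row in conn.execute(
    "SELECT UniqueID FROM housingdata ORDER BY UniqueID")]
print(remaining_ids)
```

Ordering by UniqueID inside the window means the row with the lowest UniqueID in each duplicate group is the one that survives.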
I have to write a query and I can't figure it out. I have an actions table (user_id, action, created_at), and I need to retrieve all users who performed the same actions as current_user (in the exact same order).
ex.
current_user delete 2022/03/19 13:40
current_user add_post 2022/03/19 13:45
current_user write_comment 2022/03/22 13:48
Query result:
user_5 delete 2021/03/15 14:50
user_5 add_post 2021/05/15 13:50
user_5 write_comment 2022/06/06 14:30
user_6 delete 2021/03/15 14:50
user_6 add_post 2021/05/15 13:50
user_6 write_comment 2022/06/06 14:30
( all users with same actions )
You don't stipulate that the exact matching has to be in groups of 3. The following query will identify matching action sequences of length 1, 2, 3 or even more than 3:
CREATE TABLE actions(
user_id VARCHAR(50) NOT NULL
,action VARCHAR(50) NOT NULL
,created_at TIMESTAMP NOT NULL
);
INSERT INTO actions(user_id,action,created_at)
VALUES
('current_user','delete','2022/03/19 13:40')
, ('current_user','add_post','2022/03/19 13:45')
, ('current_user','write_comment','2022/03/22 13:48')
, ('other_user','delete','2022/02/19 13:40')
, ('other_user','add_post','2022/02/19 13:45')
, ('other_user','write_comment','2022/02/22 13:48')
, ('diff_user','delete','2022/02/19 12:40')
, ('diff_user','add_post','2022/02/19 12:42')
, ('diff_user','other_action','2022/02/19 12:45')
, ('diff_user','write_comment','2022/02/22 12:48')
;
It appears (from a comment) that you can use window functions such as lag(), so I suggest using row_number() to match the action sequence of the current user against the sequences of the other users, as follows:
SELECT ou.*
FROM (
SELECT *
, row_number() OVER (
PARTITION BY user_id ORDER BY created_at
) AS rn
FROM actions
WHERE user_id <> 'current_user'
) AS OU
INNER JOIN (
SELECT *
, row_number() OVER (
ORDER BY created_at
) AS rn
FROM actions
WHERE user_id = 'current_user'
) AS CU ON ou.rn = cu.rn
AND ou.action = cu.action
result
+------------+---------------+---------------------+----+
| user_id | action | created_at | rn |
+------------+---------------+---------------------+----+
| diff_user | delete | 2022-02-19 12:40:00 | 1 |
| diff_user | add_post | 2022-02-19 12:42:00 | 2 |
| other_user | delete | 2022-02-19 13:40:00 | 1 |
| other_user | add_post | 2022-02-19 13:45:00 | 2 |
| other_user | write_comment | 2022-02-22 13:48:00 | 3 |
+------------+---------------+---------------------+----+
Now, if you really did want to limit this to a sequence match of exactly 3, you could additionally compute count(*) over(partition by user_id) and filter for rows where that count is 3:
SELECT *
FROM (
SELECT ou.*
, count(*) OVER (PARTITION BY ou.user_id) AS cn
FROM (
SELECT *
, row_number() OVER (
PARTITION BY user_id ORDER BY created_at
) AS rn
FROM actions
WHERE user_id <> 'current_user'
) AS OU
INNER JOIN (
SELECT *
, row_number() OVER (
ORDER BY created_at
) AS rn
FROM actions
WHERE user_id = 'current_user'
) AS CU ON ou.rn = cu.rn
AND ou.action = cu.action
) d
WHERE cn = 3
result
+------------+---------------+---------------------+----+----+
| user_id | action | created_at | rn | cn |
+------------+---------------+---------------------+----+----+
| other_user | write_comment | 2022-02-22 13:48:00 | 3 | 3 |
| other_user | add_post | 2022-02-19 13:45:00 | 2 | 3 |
| other_user | delete | 2022-02-19 13:40:00 | 1 | 3 |
+------------+---------------+---------------------+----+----+
for reference db<>fiddle (nb: using postgres as MySQL 8 wasn't available at the time)
btw: you probably also need to introduce the concept of "session" into this logic but that is left for you to consider.
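If only users whose whole history equals the current user's history should be returned, one possible variant is to compare the total number of actions as well as the matched positions. This is a sketch under assumptions, not the query from the answer: it runs on SQLite through Python's sqlite3 (3.25 or later for window functions), stores timestamps as ISO text, and the names ranked and total are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actions (user_id TEXT, action TEXT, created_at TEXT);
INSERT INTO actions VALUES
  ('current_user', 'delete',        '2022-03-19 13:40'),
  ('current_user', 'add_post',      '2022-03-19 13:45'),
  ('current_user', 'write_comment', '2022-03-22 13:48'),
  ('other_user',   'delete',        '2022-02-19 13:40'),
  ('other_user',   'add_post',      '2022-02-19 13:45'),
  ('other_user',   'write_comment', '2022-02-22 13:48'),
  ('diff_user',    'delete',        '2022-02-19 12:40'),
  ('diff_user',    'add_post',      '2022-02-19 12:42'),
  ('diff_user',    'other_action',  '2022-02-19 12:45'),
  ('diff_user',    'write_comment', '2022-02-22 12:48');
""")

# Number each user's actions in time order and record how many they have.
# A user matches when the totals are equal and every position agrees.
query = """
WITH ranked AS (
    SELECT user_id, action,
           row_number() OVER (PARTITION BY user_id ORDER BY created_at) AS rn,
           count(*)     OVER (PARTITION BY user_id) AS total
    FROM actions
)
SELECT ou.user_id
FROM ranked AS ou
JOIN ranked AS cu
  ON cu.user_id = 'current_user'
 AND ou.user_id <> 'current_user'
 AND ou.rn = cu.rn
 AND ou.action = cu.action
 AND ou.total = cu.total
GROUP BY ou.user_id, cu.total
HAVING count(*) = cu.total
ORDER BY ou.user_id
"""
matching_users = [row[0] for row in conn.execute(query)]
print(matching_users)
```

Here diff_user drops out because it has four actions rather than three, even though its first two positions match.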
create table dt
(
id varchar(20),
user_id int,
name varchar(20),
td DATE,
amount float
);
INSERT INTO dt VALUES('blah',1, 'Rodeo', '2018-01-20', 10.12);
INSERT INTO dt VALUES('blahblah',1, 'Rodeo', '2019-01-01', 40.44);
INSERT INTO dt VALUES('sas',2, 'Janice', '2018-02-05', 18.18);
INSERT INTO dt VALUES('dsdcd',3, 'Sam', '2019-01-26', 16.13);
INSERT INTO dt VALUES('sdc',2, 'Janice', '2018-02-01', 12.19);
INSERT INTO dt VALUES('scsc',2, 'Janice', '2017-12-06', 5.10);
+----------+---------+--------+------------+--------+
| id | user_id | name | td | amount |
+----------+---------+--------+------------+--------+
| blah | 1 | Rodeo | 2018-01-20 | 10.12 |
| blahblah | 1 | Rodeo | 2019-01-01 | 40.44 |
| sas | 2 | Janice | 2018-02-05 | 18.18 |
| dsdcd | 3 | Sam | 2019-01-26 | 16.13 |
| sdc | 2 | Janice | 2018-02-01 | 12.19 |
| scsc | 2 | Janice | 2017-12-06 | 5.1 |
+----------+---------+--------+------------+--------+
For the above table, how can I get the output below? I can achieve this with a window function, but I am not sure how to do it with a correlated subquery. Appreciate any help!
Output
Basically, the difference between each user's latest transaction amount and their first transaction amount. If the user has only one transaction, the difference is 0.
user_id | name | amount
1 | Rodeo | 30.32 [40.44 (latest trans) - 10.12 (first trans)]
3 | Sam | 0
2 | Janice | 13.08 [18.18 (latest trans) - 5.1 (first trans)]
With 2 subqueries to get the latest and earliest amounts:
select distinct t.user_id, t.name,
(select amount from dt
where user_id = t.user_id
order by td desc limit 1
)
-
(select amount from dt
where user_id = t.user_id
order by td limit 1
) amount
from dt t
Or:
select t.user_id, t.name,
max(t.latest * t.amount) - max(t.earliest * t.amount) amount
from (
select d.user_id, d.name, d.amount,
d.td = g.earliestdate earliest, d.td = g.latestdate latest
from dt d inner join (
select user_id, min(td) earliestdate, max(td) latestdate
from dt
group by user_id
) g on d.user_id = g.user_id and d.td in (earliestdate, latestdate)
) t
group by t.user_id, t.name
Results:
| user_id | name | amount |
| ------- | ------ | ------ |
| 1 | Rodeo | 30.32 |
| 2 | Janice | 13.08 |
| 3 | Sam | 0 |
This is similar to SQL select only rows with max value on a column, but you need to do it twice: once for the earliest row, again for the latest row.
SELECT t1.user_id, t1.name, t1.amount - t2.amount AS amount
FROM (
SELECT dt1.user_id, dt1.name, dt1.amount
FROM dt AS dt1
JOIN (
SELECT user_id, name, MAX(td) AS maxdate
FROM dt
GROUP BY user_id, name) AS dt2
ON dt1.user_id = dt2.user_id AND dt1.name = dt2.name AND dt1.td = dt2.maxdate
) AS t1
JOIN (
SELECT dt1.user_id, dt1.name, dt1.amount
FROM dt AS dt1
JOIN (
SELECT user_id, name, MIN(td) AS mindate
FROM dt
GROUP BY user_id, name) AS dt2
ON dt1.user_id = dt2.user_id AND dt1.name = dt2.name AND dt1.td = dt2.mindate
) AS t2
ON t1.user_id = t2.user_id AND t1.name = t2.name
Approach using Correlated Subquery:
Query
SELECT user_id,
name,
Round(Coalesce ((SELECT t1.amount
FROM dt t1
WHERE t1.user_id = dt.user_id
ORDER BY t1.td DESC
LIMIT 1) - (SELECT t2.amount
FROM dt t2
WHERE t2.user_id = dt.user_id
ORDER BY t2.td ASC
LIMIT 1), 0), 2) AS amount
FROM dt
GROUP BY user_id,
name;
| user_id | name | amount |
| ------- | ------ | ------ |
| 1 | Rodeo | 30.32 |
| 2 | Janice | 13.08 |
| 3 | Sam | 0 |
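You can try this as well. Note that it returns the largest pairwise difference per user, which equals latest minus earliest only when the latest amount is the largest and the earliest is the smallest, as in the sample data: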
Select t3.user_id, t3.name, max(t3.new_amount) FROM (
Select t1.user_id, t2.name, (t1.amount - t2.amount) as new_amount
FROM dt t1
INNER JOIN dt t2
ON t1.user_id=t2.user_id
Order by t1.user_id ASC, t1.td DESC, t2.user_id ASC, t2.td ASC
) as t3
group by t3.user_id,t3.name;
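To see how the self-join with max() relates to "latest minus earliest", the two can be compared on rows where the largest amount falls in the middle. This is a small sketch with made-up rows, run on SQLite through Python's sqlite3 as a stand-in for MySQL (an assumption).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dt (id TEXT, user_id INTEGER, name TEXT, td TEXT, amount REAL);
-- Hypothetical rows: the middle transaction is the largest one.
INSERT INTO dt VALUES
  ('a', 1, 'Rodeo', '2018-01-20', 10.0),
  ('b', 1, 'Rodeo', '2018-06-01', 50.0),
  ('c', 1, 'Rodeo', '2019-01-01', 20.0);
""")

# Correlated subqueries: amount on the latest date minus amount on the earliest.
latest_minus_earliest = conn.execute("""
SELECT (SELECT amount FROM dt WHERE user_id = t.user_id
        ORDER BY td DESC LIMIT 1)
     - (SELECT amount FROM dt WHERE user_id = t.user_id
        ORDER BY td ASC LIMIT 1)
FROM dt AS t
GROUP BY t.user_id
""").fetchone()[0]

# Self-join: the largest difference over every pair of the user's rows.
largest_pairwise = conn.execute("""
SELECT max(t1.amount - t2.amount)
FROM dt AS t1
JOIN dt AS t2 ON t1.user_id = t2.user_id
GROUP BY t1.user_id
""").fetchone()[0]

print(latest_minus_earliest, largest_pairwise)  # 10.0 40.0
```

On the question's sample data both expressions agree, because each user's amounts happen to grow over time.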
I have a DATETIME column that stores when the values were inserted, as this example shows:
CREATE TABLE IF NOT EXISTS salary (
change_id INT(11) NOT NULL AUTO_INCREMENT,
emp_salary FLOAT(8,2),
change_date DATETIME,
PRIMARY KEY (change_id)
);
I am going to fill the example table like this:
+-----------+------------+---------------------+
| change_id | emp_salary | change_date |
+-----------+------------+---------------------+
| 1 | 200.00 | 2018-06-18 13:17:17 |
| 2 | 700.00 | 2018-06-25 15:20:30 |
| 3 | 300.00 | 2018-07-02 12:17:17 |
+-----------+------------+---------------------+
I want to get the last inserted value of each month for every year.
So for the example I made, this should be the output of the Select:
+-----------+------------+---------------------+
| change_id | emp_salary | change_date |
+-----------+------------+---------------------+
| 2 | 700.00 | 2018-06-25 15:20:30 |
| 3 | 300.00 | 2018-07-02 12:17:17 |
+-----------+------------+---------------------+
Row 1 won't appear because it is an outdated version of row 2 (same month, earlier date).
You could use a self join to pick the group-wise maximum row. In the inner query, select the max of change_date while grouping your data by year and month:
select t.*
from your_table t
join (
select max(change_date) max_change_date
from your_table
group by date_format(change_date, '%Y-%m')
) t1
on t.change_date = t1.max_change_date
If you can use MySQL 8, which supports window functions, you could use a common table expression and the rank() function to pick the row with the highest change_date for each year and month:
with cte as(
select *,
rank() over (partition by date_format(change_date, '%Y-%m') order by change_date desc ) rnk
from your_table
)
select * from cte where rnk = 1;
The below query should work for you.
It uses group by on month and year to find max record for each month and year.
SELECT s1.*
FROM salary s1
INNER JOIN (
SELECT MAX(change_date) maxDate
FROM salary
GROUP BY MONTH(change_date), YEAR(change_date)
) s2 ON s2.maxDate = s1.change_date;
Fiddle link : http://sqlfiddle.com/#!9/1bc20b/15
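The group-wise maximum join can also be checked locally. The sketch below makes some assumptions: it uses SQLite through Python's sqlite3 instead of MySQL, so strftime('%Y-%m', ...) takes the place of date_format or MONTH()/YEAR(), and row 4 is a made-up row added to show that the same month in different years stays separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salary (
    change_id   INTEGER PRIMARY KEY,
    emp_salary  REAL,
    change_date TEXT
);
INSERT INTO salary VALUES
  (1, 200.00, '2018-06-18 13:17:17'),
  (2, 700.00, '2018-06-25 15:20:30'),
  (3, 300.00, '2018-07-02 12:17:17'),
  (4, 400.00, '2019-06-01 09:00:00');  -- hypothetical: June of another year
""")

# Inner query: latest change_date per year-month.
# Outer join: fetch the full row that carries that date.
rows = conn.execute("""
SELECT s.change_id, s.emp_salary, s.change_date
FROM salary AS s
JOIN (
    SELECT max(change_date) AS max_date
    FROM salary
    GROUP BY strftime('%Y-%m', change_date)
) AS m ON m.max_date = s.change_date
ORDER BY s.change_id
""").fetchall()

last_ids = [row[0] for row in rows]
print(last_ids)
```

If two rows share the exact same latest change_date within a month, this join returns both of them.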
I have a single SQL table that contains multiple entries for each customerID (some customerID's only have one entry which I want to keep). I need to remove all but the most recent entry per customerID, using the invoiceDate field as my marker.
So I need to go from this:
+------------+-------------+-----------+
| customerID | invoiceDate | invoiceID |
+------------+-------------+-----------+
| 1 | 1393995600 | xx |
| 1 | 1373688000 | xx |
| 1 | 1365220800 | xx |
| 2 | 1265220800 | xx |
| 2 | 1173688000 | xx |
| 3 | 1325330800 | xx |
+------------+-------------+-----------+
To this:
+------------+-------------+-----------+
| customerID | invoiceDate | invoiceID |
+------------+-------------+-----------+
| 1 | 1393995600 | xx |
| 2 | 1265220800 | xx |
| 3 | 1325330800 | xx |
+------------+-------------+-----------+
Any guidance would be greatly appreciated!
Write a query to select all the rows you want to delete:
SELECT * FROM t
WHERE invoiceDate NOT IN (
SELECT MAX(invoiceDate)
-- "FROM t AS t2" isn't supported by MySQL, see http://stackoverflow.com/a/14302701/227576
FROM (SELECT * FROM t) AS t2
WHERE t2.customerId = t.customerId
GROUP BY t2.customerId
)
This may take a long time on a big database.
If you're satisfied, change the query to a DELETE statement:
DELETE FROM t
WHERE invoiceDate NOT IN (
SELECT MAX(invoiceDate)
-- "FROM t AS t2" isn't supported by MySQL, see http://stackoverflow.com/a/14302701/227576
FROM (SELECT * FROM t) AS t2
WHERE t2.customerId = t.customerId
GROUP BY t2.customerId
)
See http://sqlfiddle.com/#!9/6e031/1
If you have multiple rows whose date is the most recent for the same customer, you would have to look for duplicates and decide which one you want to keep yourself. For instance, look at customerId 2 on the SQL fiddle link above.
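A compact way to try the "keep only the newest row per customer" delete is shown below. It is a sketch under assumptions: SQLite through Python's sqlite3 stands in for MySQL, and because SQLite allows the subquery to read the table being deleted from, the extra (SELECT * FROM t) wrapper is not needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (customerID INTEGER, invoiceDate INTEGER, invoiceID TEXT);
INSERT INTO t VALUES
  (1, 1393995600, 'xx'),
  (1, 1373688000, 'xx'),
  (1, 1365220800, 'xx'),
  (2, 1265220800, 'xx'),
  (2, 1173688000, 'xx'),
  (3, 1325330800, 'xx');
""")

# Delete every row that is older than the newest row of the same customer.
conn.execute("""
DELETE FROM t
WHERE invoiceDate < (
    SELECT max(t2.invoiceDate)
    FROM t AS t2
    WHERE t2.customerID = t.customerID
)
""")

kept = conn.execute(
    "SELECT customerID, invoiceDate FROM t ORDER BY customerID").fetchall()
print(kept)
```

Rows that tie on the newest invoiceDate for a customer all survive this delete, so ties still have to be resolved separately.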
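Try this one. It deletes through a CTE, which works in SQL Server; MySQL rejects it because a CTE is not an updatable target there: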
with todelete as
(
select
CustomerId, InvoiceId, InvoiceDate, Row_Number() over (partition by CustomerId order by InvoiceDate desc) as Count
from DeleteDuplicate
)
delete from todelete
where count > 1
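Let us assume that the table name is transaction_table. Note that this relies on MySQL's permissive GROUP BY handling (ONLY_FULL_GROUP_BY disabled), and which row is kept per group is not guaranteed.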
create table test1 AS
select * from (
select * from transaction_table order by customerID, invoiceDate desc) temp
group by customerID
You will have the output data in test1 table.
delete from ex_4 a
where to_date(a.invoicedate,'DDMMYYYY') <
(select max(to_date(b.invoicedate,'DDMMYYYY'))
from ex_4 b
where b.customerid = a.customerid)
This is how it can be done in Oracle. The query deletes every row except the most recent one for each customer. Looking at your table structure, I am assuming that the invoicedate column is of type varchar2, so it is converted to a date with the to_date function here.
I have a table like this:
+----+---------+------------+
| id | conn_id | read_date |
+----+---------+------------+
| 1 | 1 | 2010-02-21 |
| 2 | 1 | 2011-02-21 |
| 3 | 2 | 2011-02-21 |
| 4 | 2 | 2013-02-21 |
| 5 | 2 | 2014-02-21 |
+----+---------+------------+
I want the second highest read_date for particular 'conn_id's i.e. I want a group by on conn_id. Please help me figure this out.
Here's a solution for a particular conn_id :
select max(read_date) from my_table
where conn_id=1
and read_date<(
select max(read_date) from my_table
where conn_id=1
)
If you want to get it for all conn_id using group by, do this:
select t.conn_id, (select max(i.read_date) from my_table i
where i.conn_id=t.conn_id and i.read_date<max(t.read_date))
from my_table t group by conn_id;
The following answer should work in MSSQL (the derived table needs an alias):
select id,conn_id,read_date from (
select *,ROW_NUMBER() over(Partition by conn_id order by read_date desc) as RN
from my_table
) as ranked
where RN = 2
There is an interesting article on the use of rank functions in MySQL here: ROW_NUMBER() in MySQL
If your table design guarantees that ids and dates increase together (i.e. a bigger id always has a later date), you can group on the id instead; otherwise do the following:
$sql_max = '(select conn_id, max(read_date) max_date from tab group by 1) as tab_max';
$sql_max2 = "(select tab.conn_id,max(tab.read_date) max_date2 from tab, $sql_max
where tab.conn_id = tab_max.conn_id and tab.read_date < tab_max.max_date
group by 1) as tab_max2";
$sql = "select tab.* from tab, $sql_max2
where tab.conn_id = tab_max2.conn_id and tab.read_date = tab_max2.max_date2";
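For engines with window functions, the second-highest date per conn_id can also be read off a row_number() ranking. The sketch below assumes SQLite 3.25 or later through Python's sqlite3 rather than MySQL or MSSQL, and the alias ranked is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER, conn_id INTEGER, read_date TEXT);
INSERT INTO my_table VALUES
  (1, 1, '2010-02-21'),
  (2, 1, '2011-02-21'),
  (3, 2, '2011-02-21'),
  (4, 2, '2013-02-21'),
  (5, 2, '2014-02-21');
""")

# Rank each conn_id's rows from newest to oldest and keep rank 2.
second_highest = conn.execute("""
SELECT conn_id, read_date
FROM (
    SELECT conn_id, read_date,
           row_number() OVER (
               PARTITION BY conn_id ORDER BY read_date DESC
           ) AS rn
    FROM my_table
) AS ranked
WHERE rn = 2
ORDER BY conn_id
""").fetchall()
print(second_highest)
```

A conn_id with only one row produces no output row here; if duplicate dates should count as one value, dense_rank() may be a better fit than row_number().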