I have a MySQL table in which each row has about 20 fields. Among others, it has:
table: origin, destination, date, price
Now I want to remove any rows that are duplicates with respect to one specific set of fields only: origin, destination, date.
I tried:
delete from mytable where id not in
(select id from (
SELECT MAX(p.id) as id from mytable p group by p.origin, p.destination, p.date
) x)
Problem: this retains the row with the highest id (i.e., the most recently added one).
Instead I'd like to retain only the row that has the lowest price. But how?
Side note: I cannot add a unique index, because the table is filled by mass inserts via LOAD DATA, which must not throw errors. At load time I don't know which row is the "best price" one.
Also, I don't want to introduce any additional or temporary tables and copy data from one to another; I just want to modify the existing table.
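Self-join solution (it deletes every row for which a cheaper row exists in the same group, so rows that tie for the lowest price are all kept):
delete t1
from mytable t1
join mytable t2
on t1.origin = t2.origin
and t1.destination = t2.destination
and t1.date = t2.date
and t1.price > t2.price;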
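A minimal sketch of the same rule, runnable with Python's built-in sqlite3 module as a stand-in for MySQL (the sample rows are made up). SQLite has no multi-table DELETE ... JOIN, so the self-join becomes a correlated EXISTS; MySQL rejects that form with error 1093, which is why the answer joins instead. The id tie-break is an assumption added here, not part of the answer, so that exactly one row per group survives even when prices tie.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, origin TEXT,"
    " destination TEXT, date TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "BER", "LON", "2024-05-01", 120),
        (2, "BER", "LON", "2024-05-01", 90),  # cheapest in its group
        (3, "BER", "LON", "2024-05-01", 90),  # ties with id 2
        (4, "BER", "PAR", "2024-05-01", 70),  # only row in its group
    ],
)

# Delete a row if a cheaper row exists in its (origin, destination, date)
# group; the second branch breaks price ties by keeping the lowest id.
conn.execute(
    """
    DELETE FROM mytable
    WHERE EXISTS (
        SELECT 1 FROM mytable AS t2
        WHERE t2.origin = mytable.origin
          AND t2.destination = mytable.destination
          AND t2.date = mytable.date
          AND (t2.price < mytable.price
               OR (t2.price = mytable.price AND t2.id < mytable.id))
    )
    """
)
remaining = [row[0] for row in conn.execute("SELECT id FROM mytable ORDER BY id")]
print(remaining)  # [2, 4]
```

Drop the tie-break branch to reproduce the answer's behavior exactly (ids 2 and 3 would both survive).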
Alternatively, join each row to its group's minimum price and delete the rows that find no match:
delete t1
from mytable t1
left join
(
SELECT origin, destination, date, min(price) as price
from mytable
group by origin, destination, date
) t2 on t1.origin = t2.origin
and t1.destination = t2.destination
and t1.date = t2.date
and t1.price = t2.price
where t2.origin is null
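The same anti-join, sketched with Python's sqlite3 module and made-up rows: NOT EXISTS against the per-group minimum plays the role of LEFT JOIN ... WHERE t2.origin IS NULL (SQLite cannot DELETE through a join). Ids 2 and 3 both survive because they tie for the lowest price in their group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, origin TEXT,"
    " destination TEXT, date TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "BER", "LON", "2024-05-01", 120),
        (2, "BER", "LON", "2024-05-01", 90),
        (3, "BER", "LON", "2024-05-01", 90),
        (4, "BER", "PAR", "2024-05-01", 70),
    ],
)

# Delete every row that does not match its group's minimum price.
conn.execute(
    """
    DELETE FROM mytable
    WHERE NOT EXISTS (
        SELECT 1
        FROM (SELECT origin, destination, date, MIN(price) AS price
              FROM mytable
              GROUP BY origin, destination, date) AS t2
        WHERE t2.origin = mytable.origin
          AND t2.destination = mytable.destination
          AND t2.date = mytable.date
          AND t2.price = mytable.price
    )
    """
)
remaining = [row[0] for row in conn.execute("SELECT id FROM mytable ORDER BY id")]
print(remaining)  # [2, 3, 4]
```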
Related
I have a table named consignment that contains duplicate rows among those with service='CLRC'.
select * from consignment where service='CLRC'
When I select these rows, I get 2023 rows in total, including duplicates.
I wrote the query below to delete the duplicates, but I want to run it as a SELECT first to make sure it deletes the correct records.
When the SELECT runs, it returns 64431 records. Is that correct?
select t1.hawb FROM consignment t1
INNER JOIN consignment t2
WHERE
t1.id < t2.id AND
t1.hawb = t2.hawb
and t1.service='CLRC'
If you expect your query to return the number of duplicates, then no, it is not correct.
The condition t1.id < t2.id pairs every row of t1 with all rows of t2 that have the same hawb and a greater id. A hawb that appears n times therefore produces n(n-1)/2 rows: fewer than n for a pair, exactly n for three copies, and more than n for four or more, so the total only rarely equals the expected number.
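To see the arithmetic, here is a small sketch with Python's sqlite3 module and hypothetical data (hawb A appears twice, B three times, C four times). It counts the join rows per hawb and, for comparison, the distinct t1.id values, which are exactly the rows a keep-the-highest-id delete would remove (n - 1 per hawb).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE consignment (id INTEGER PRIMARY KEY, hawb TEXT, service TEXT)")

# Hypothetical data: A x2, B x3, C x4, all with service 'CLRC'.
rows = []
next_id = 1
for hawb, copies in [("A", 2), ("B", 3), ("C", 4)]:
    for _ in range(copies):
        rows.append((next_id, hawb, "CLRC"))
        next_id += 1
conn.executemany("INSERT INTO consignment VALUES (?, ?, ?)", rows)

# Join rows produced per hawb by the question's condition: n(n-1)/2.
pairs = conn.execute(
    """
    SELECT t1.hawb, COUNT(*) FROM consignment t1
    JOIN consignment t2 ON t1.id < t2.id AND t1.hawb = t2.hawb
    WHERE t1.service = 'CLRC'
    GROUP BY t1.hawb ORDER BY t1.hawb
    """
).fetchall()
print(pairs)  # [('A', 1), ('B', 3), ('C', 6)]

# Distinct t1 rows: every row except the highest id of each hawb.
to_delete = conn.execute(
    """
    SELECT COUNT(DISTINCT t1.id) FROM consignment t1
    JOIN consignment t2 ON t1.id < t2.id AND t1.hawb = t2.hawb
    WHERE t1.service = 'CLRC'
    """
).fetchone()[0]
print(to_delete)  # 6
```

Nine duplicate rows produce ten join rows here, while six of them would actually be deleted.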
If you want to see all the duplicates:
select * from consignment t
where t.service = 'CLRC'
and exists (
select 1 from consignment
where service = t.service and id <> t.id and hawb = t.hawb
)
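If you want to delete the duplicates and keep only the row with the max id for each hawb, use the query below (the extra derived table t is needed because MySQL does not let a DELETE select from its own target table in a subquery):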
delete from consignment
where service='CLRC'
and id not in (
select id from (
select max(id) id from consignment
where service='CLRC'
group by hawb
) t
);
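A quick check of that query with Python's sqlite3 module and made-up rows. SQLite does not need the extra derived table t, but accepts it, so the statement runs as written; the non-CLRC row shows that other services are never touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE consignment (id INTEGER PRIMARY KEY, hawb TEXT, service TEXT)")
conn.executemany(
    "INSERT INTO consignment VALUES (?, ?, ?)",
    [
        (1, "H1", "CLRC"),
        (2, "H1", "CLRC"),   # highest id for H1 among CLRC rows: kept
        (3, "H2", "CLRC"),   # no duplicate: kept
        (4, "H1", "OTHER"),  # not CLRC: never touched
    ],
)

# Keep MAX(id) per hawb among CLRC rows, delete the other CLRC rows.
conn.execute(
    """
    DELETE FROM consignment
    WHERE service = 'CLRC'
      AND id NOT IN (
          SELECT id FROM (
              SELECT MAX(id) AS id FROM consignment
              WHERE service = 'CLRC'
              GROUP BY hawb
          ) AS t
      )
    """
)
remaining = [row[0] for row in conn.execute("SELECT id FROM consignment ORDER BY id")]
print(remaining)  # [2, 3, 4]
```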
Alternatively, include all columns except id (the primary key) in the matching condition:
delete t1
from consignment t1
join consignment t2
where t1.id < t2.id
and t1.hawb = t2.hawb
and t1.col1=t2.col1
and t1.col2=t2.col2
......
and t1.service='CLRC';
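You can check how many rows the delete should remove. First count the distinct combinations:
select count(*) from
(
select distinct hawb, col1, col2, service -- (all columns except `id`)
from consignment
where service = 'CLRC'
) q
The delete should remove exactly (number of CLRC rows) minus (this count) rows. Compare that with the number of deleted records before committing the changes (run the delete inside a transaction, since autocommit is on by default).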
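A sketch of that check using Python's sqlite3 module, with made-up rows and a single hypothetical extra column col1 standing in for "all columns except id". SQLite lacks DELETE ... JOIN, so the delete is written as a correlated EXISTS, and the transaction is committed only when the deleted-row count matches the expected one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE consignment (id INTEGER PRIMARY KEY, hawb TEXT, col1 TEXT, service TEXT)"
)
conn.executemany(
    "INSERT INTO consignment VALUES (?, ?, ?, ?)",
    [
        (1, "A", "x", "CLRC"),
        (2, "A", "x", "CLRC"),
        (3, "A", "y", "CLRC"),
        (4, "B", "x", "CLRC"),
        (5, "B", "x", "CLRC"),
        (6, "B", "x", "OTHER"),
    ],
)
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM consignment WHERE service = 'CLRC'").fetchone()[0]
distinct = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT DISTINCT hawb, col1, service FROM consignment
        WHERE service = 'CLRC'
    ) AS q
    """
).fetchone()[0]
expected = total - distinct  # 5 - 3 = 2

# Drop a CLRC row when an identical row (all columns but id) with a
# greater id exists; sqlite3 opens a transaction implicitly here.
cur = conn.execute(
    """
    DELETE FROM consignment
    WHERE service = 'CLRC'
      AND EXISTS (
          SELECT 1 FROM consignment AS t2
          WHERE t2.id > consignment.id
            AND t2.hawb = consignment.hawb
            AND t2.col1 = consignment.col1
            AND t2.service = consignment.service
      )
    """
)
deleted = cur.rowcount
if deleted == expected:
    conn.commit()
else:
    conn.rollback()
print(expected, deleted)  # 2 2
```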
I'm trying to delete duplicate rows from a MySQL table but still keep one of each.
However, the following query seemingly deletes every duplicate row, and I'm not sure why. Basically, I want to delete a row if its outputID, title and type all match.
DELETE DupRows.*
FROM output AS DupRows
INNER JOIN (
SELECT MIN(Output_ID) AS Output_ID, Title, Type
FROM output
GROUP BY Title, Type
HAVING COUNT(*) > 1
) AS SaveRows
ON SaveRows.Title = DupRows.Title
AND SaveRows.Type = DupRows.Type
AND SaveRows.Output_ID = DupRows.Output_ID;
Just use a self-join:
DELETE DupRows
FROM output AS DupRows
INNER JOIN output AS SaveRows
ON SaveRows.Title = DupRows.Title
AND SaveRows.Type = DupRows.Type
AND DupRows.Output_ID > SaveRows.Output_ID
This deletes all duplicates on Title and Type while keeping the record with the lowest Output_ID.
If you are running MySQL 8.0, you can use the window function ROW_NUMBER() to number each record within its Title/Type group, ordered by Output_ID, and then delete all records whose row number is not 1.
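The self-join can be checked with Python's sqlite3 module and made-up rows. SQLite has no multi-table DELETE, so the sketch expresses the same condition (a row with the same Title and Type and a lower Output_ID exists) as a correlated EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE output (Output_ID INTEGER PRIMARY KEY, Title TEXT, Type TEXT)")
conn.executemany(
    "INSERT INTO output VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "a", "y"), (4, "b", "x"), (5, "a", "x")],
)

# Delete a row when the same Title/Type already exists with a lower Output_ID.
conn.execute(
    """
    DELETE FROM output
    WHERE EXISTS (
        SELECT 1 FROM output AS SaveRows
        WHERE SaveRows.Title = output.Title
          AND SaveRows.Type = output.Type
          AND SaveRows.Output_ID < output.Output_ID
    )
    """
)
remaining = [row[0] for row in conn.execute("SELECT Output_ID FROM output ORDER BY Output_ID")]
print(remaining)  # [1, 3, 4]
```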
DELETE FROM output
WHERE Output_ID IN (
SELECT Output_ID
FROM (
SELECT Output_ID, ROW_NUMBER() OVER(PARTITION BY Title, Type ORDER BY Output_ID) rn
FROM output
) x
WHERE rn > 1
)
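The ROW_NUMBER() version, run against the same made-up rows with Python's sqlite3 module (window functions require SQLite 3.25 or later). It should leave exactly the same rows as the self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE output (Output_ID INTEGER PRIMARY KEY, Title TEXT, Type TEXT)")
conn.executemany(
    "INSERT INTO output VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "a", "y"), (4, "b", "x"), (5, "a", "x")],
)

# Number rows within each Title/Type group by Output_ID; delete all but rn = 1.
conn.execute(
    """
    DELETE FROM output
    WHERE Output_ID IN (
        SELECT Output_ID
        FROM (
            SELECT Output_ID,
                   ROW_NUMBER() OVER (PARTITION BY Title, Type ORDER BY Output_ID) AS rn
            FROM output
        ) AS x
        WHERE rn > 1
    )
    """
)
remaining = [row[0] for row in conn.execute("SELECT Output_ID FROM output ORDER BY Output_ID")]
print(remaining)  # [1, 3, 4]
```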
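Or keep the lowest Output_ID of each Title/Type group. Do not filter the keep-list with HAVING COUNT(*) > 1: groups without duplicates would drop out of it, and their only row would be deleted. The inner derived table works around MySQL's refusal to select from the table being deleted from:
Delete From output Where Output_ID NOT IN (
Select Output_ID From (
Select MIN(Output_ID) AS Output_ID from output Group By Title, Type
) AS keep_rows
)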
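The effect of the HAVING COUNT(*) > 1 filter is easy to see with Python's sqlite3 module and made-up rows. The hypothetical helper doomed() lists the ids a DELETE ... WHERE Output_ID NOT IN (keep-list) would remove, without deleting anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE output (Output_ID INTEGER PRIMARY KEY, Title TEXT, Type TEXT)")
conn.executemany(
    "INSERT INTO output VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "a", "y"), (4, "b", "x"), (5, "a", "x")],
)


def doomed(keep_query):
    """Ids a 'DELETE ... WHERE Output_ID NOT IN (keep_query)' would remove."""
    sql = f"SELECT Output_ID FROM output WHERE Output_ID NOT IN ({keep_query}) ORDER BY Output_ID"
    return [row[0] for row in conn.execute(sql)]


with_having = doomed(
    "SELECT MIN(Output_ID) FROM output GROUP BY Title, Type HAVING COUNT(*) > 1"
)
without_having = doomed("SELECT MIN(Output_ID) FROM output GROUP BY Title, Type")
print(with_having)     # [2, 3, 4, 5]  (rows 3 and 4 have no duplicates)
print(without_having)  # [2, 5]
```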
The query below deletes duplicate rows that match the condition and keeps the oldest one.
Note: this assumes the id column is an AUTO_INCREMENT column.
DELETE t1
FROM output t1, output t2
WHERE t1.Title = t2.Title
AND t1.Type = t2.Type
AND t1.Output_ID = t2.Output_ID
AND t1.id>t2.id
If you want to keep the most recently inserted row instead, just reverse the last condition:
DELETE t1
FROM output t1, output t2
WHERE t1.Title = t2.Title
AND t1.Type = t2.Type
AND t1.Output_ID = t2.Output_ID
AND t1.id<t2.id
I have a MySQL table with the columns customer and dateOrder.
One customer can have orders on multiple dates. I want to add a new column holding the earliest order date for each customer. So far I tried this:
UPDATE mytable
SET MINDATE = (SELECT min(DATEORDER)
FROM (SELECT *
FROM mytable
GROUP
BY CUSTOMER
) tblTmp
)
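where tblTmp is a temporary table. The problem is that it returns the same date for all my customers (the earliest date in the whole table). Any ideas?
The subquery is not correlated with the row being updated, so it returns the table-wide minimum for every row. Use a JOIN to match the original table with a per-customer subquery: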
UPDATE mytable AS t1
JOIN (SELECT customer, MIN(dateorder) AS mindate
FROM mytable
GROUP BY customer) AS t2 ON t1.customer = t2.customer
SET t1.mindate = t2.mindate
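For comparison, a sketch with Python's sqlite3 module and made-up orders. It uses a correlated subquery instead of the JOIN, because SQLite has no UPDATE ... JOIN (UPDATE ... FROM only arrived in SQLite 3.33); MySQL would reject this correlated form with error 1093, so the JOIN above remains the MySQL way. ISO-formatted date strings compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (customer TEXT, dateorder TEXT, mindate TEXT)")
conn.executemany(
    "INSERT INTO mytable (customer, dateorder) VALUES (?, ?)",
    [
        ("alice", "2024-03-01"),
        ("alice", "2024-01-15"),
        ("bob", "2024-02-10"),
        ("bob", "2024-02-01"),
    ],
)

# Correlated subquery: MIN only sees the rows of the customer being updated.
conn.execute(
    """
    UPDATE mytable
    SET mindate = (SELECT MIN(t2.dateorder) FROM mytable AS t2
                   WHERE t2.customer = mytable.customer)
    """
)
result = sorted(set(conn.execute("SELECT customer, mindate FROM mytable")))
print(result)  # [('alice', '2024-01-15'), ('bob', '2024-02-01')]
```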
I'm writing a complex MySQL query and I'm having trouble figuring out how to finish it.
Here's the part that's giving me trouble (it's only a part of my query):
SELECT * FROM table AS t1
WHERE date < (
SELECT date FROM table AS t2
WHERE phase="B" AND t2.target = t1.target
)
Basically, I have items, each with a date, a phase (A, B, C) and a target. For a target, there are several items of phase A, then a single, optional item of phase B, then items of phase C.
For each target, I want to select all the rows following these conditions:
If there is an item with phase "B" (let's call it itemX), I want to return all items with a date earlier than the date of itemX
If there is no item with phase "B", I want to return all rows
The date parameter is very important. In most cases, the 3 phases are distinct, and cannot overlap, but there are some cases in which that happens.
The problem is that my subquery returns no rows in case 2 (so the comparison yields NULL and every row is filtered out) and a single value in case 1.
In case 2, the whole condition WHERE date < (...) is irrelevant and should not be applied.
I tried several possibilities with IFNULL and EXISTS, but I think I did it wrong because I keep getting syntax errors.
SELECT m.*
FROM (
SELECT target, MAX(date) AS maxdate
FROM mytable
GROUP BY target
) md
JOIN mytable m
ON m.target = md.target
AND m.date <
COALESCE
(
(
SELECT date
FROM mytable mb
WHERE mb.target = md.target
AND mb.phase = 'B'
ORDER BY
mb.target, mb.phase, mb.date
LIMIT 1
),
-- no phase-B row: fall back to a date later than every row of the target
maxdate + INTERVAL 1 SECOND
)
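The COALESCE fallback can be checked with Python's sqlite3 module and made-up items. Dates are stored as integers here for simplicity, so MySQL's maxdate + INTERVAL 1 SECOND becomes maxdate + 1; target 1 has a phase-B item on date 3, target 2 has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, target INTEGER, phase TEXT, date INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, 1, "A", 1), (2, 1, "A", 2), (3, 1, "B", 3), (4, 1, "C", 4),  # has a B
        (5, 2, "A", 1), (6, 2, "C", 2),                                  # no B
    ],
)

# Compare each row with its target's B date, or with maxdate + 1 when
# the target has no B row (so every row of that target qualifies).
ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT m.id
        FROM (SELECT target, MAX(date) AS maxdate FROM mytable GROUP BY target) AS md
        JOIN mytable AS m
          ON m.target = md.target
         AND m.date < COALESCE(
                (SELECT mb.date FROM mytable AS mb
                 WHERE mb.target = md.target AND mb.phase = 'B'
                 ORDER BY mb.date LIMIT 1),
                md.maxdate + 1)
        ORDER BY m.id
        """
    )
]
print(ids)  # [1, 2, 5, 6]
```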
Create two indexes:
mytable (target, date)
mytable (target, phase, date)
for this to work fast.
Perhaps
SELECT *
FROM table AS t1
LEFT JOIN table AS t2 ON t2.target = t1.target AND (t1.date < t2.date)
WHERE (phase = 'B')
I'm assuming the table in your query is actually two tables and you're not doing a self join? If so, then you'll have to specify which table's phase you're referring to.
You might try
SELECT * FROM table AS t1
left join
table as t2
on t1.Target = t2.Target
and t2.phase="B"
where t2.target is null
OR t1.date < t2.Date
The pattern in the code you posted is called the "one subquery per condition" anti-pattern. Use CASE WHEN ... THEN instead.
SELECT t1.*
FROM table t1
LEFT
JOIN ( SELECT t.target
, MIN(t.date) AS b_date
FROM table t
WHERE t.phase = 'B'
GROUP BY t.target
) t2
ON t1.target = t2.target
WHERE t2.b_date IS NULL OR t1.date < t2.b_date
If there is a guarantee that a given target has at most one row with phase = 'B', you can do without the MIN and GROUP BY, like this:
SELECT t1.*
FROM table t1
LEFT
JOIN ( SELECT t.target
, t.date AS b_date
FROM table t
WHERE t.phase = 'B'
) t2
ON t1.target = t2.target
WHERE t2.b_date IS NULL OR t1.date < t2.b_date
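A LEFT JOIN returns every t1 row whatever its ON condition says, so the date comparison only filters when it sits in WHERE. A sketch with Python's sqlite3 module and made-up items (mytable is a placeholder for the answer's `table`, which is a reserved word):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, target INTEGER, phase TEXT, date INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, 1, "A", 1), (2, 1, "A", 2), (3, 1, "B", 3), (4, 1, "C", 4),  # has a B
        (5, 2, "A", 1), (6, 2, "C", 2),                                  # no B
    ],
)

b_dates = "(SELECT target, MIN(date) AS b_date FROM mytable WHERE phase = 'B' GROUP BY target)"

# Date test in ON: unmatched rows are still returned, with NULL b_date.
date_in_on = [r[0] for r in conn.execute(
    f"SELECT t1.id FROM mytable t1 LEFT JOIN {b_dates} t2 "
    "ON t1.target = t2.target AND t1.date < t2.b_date ORDER BY t1.id"
)]

# Date test in WHERE: rows before the B date, or all rows if there is no B.
date_in_where = [r[0] for r in conn.execute(
    f"SELECT t1.id FROM mytable t1 LEFT JOIN {b_dates} t2 "
    "ON t1.target = t2.target "
    "WHERE t2.b_date IS NULL OR t1.date < t2.b_date ORDER BY t1.id"
)]
print(date_in_on)     # [1, 2, 3, 4, 5, 6]
print(date_in_where)  # [1, 2, 5, 6]
```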
I have the following two tables:
Table1 {T1ID, Name}
Table2 {T2ID, T1ID, Date, Value}
Date is of type DATE.
I am looking for an SQL query that fetches only the latest value (by Date) for each T1ID whose Name matches a specific string.
SELECT `Table2`.`T1ID`,
`Table2`.`Value`,
`Table2`.`Date`,
`Table1`.`Name`
FROM `Table1`
INNER JOIN `Table2` ON `Table2`.`T1ID` = `Table1`.`T1ID`
WHERE `Table1`.`Name` LIKE 'Smith'
but this returns the value for several dates for the same T1ID.
How do I get only the latest value by Date?
Edit:
I am using MySQL 5.5.8
If I've understood the question correctly:
Assuming MySQL:
SELECT `Table2`.`T1ID`,
`Table2`.`Value`,
`Table2`.`Date`,
`Table1`.`Name`
FROM `Table1`
INNER JOIN `Table2` ON `Table2`.`T1ID` = `Table1`.`T1ID`,
(SELECT T1ID, MAX(Date) AS 'Date' FROM Table2 GROUP BY T1ID) Table3
WHERE
`Table3`.`T1ID` = `Table2`.`T1ID`
AND
`Table3`.`Date` = `Table2`.`Date`
AND
`Table1`.`Name` LIKE 'Smith'
EDIT: Updated the code to bring back the correct result set. Removed MSSQL answer as it wasn't relevant
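The derived-table join can be tried with Python's sqlite3 module and made-up data; it is written with explicit JOINs instead of the comma join, which returns the same rows. If two rows share the latest Date for one T1ID, both are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (T1ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Table2 (T2ID INTEGER PRIMARY KEY, T1ID INTEGER, Date TEXT, Value INTEGER)"
)
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [(1, "Smith"), (2, "Jones"), (3, "Smith")])
conn.executemany(
    "INSERT INTO Table2 VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2024-01-01", 10),
        (2, 1, "2024-03-01", 30),
        (3, 1, "2024-02-01", 20),
        (4, 2, "2024-05-01", 99),
        (5, 3, "2024-04-01", 7),
    ],
)

# Match each Table2 row against the latest Date of its T1ID; only the
# latest rows survive the join.
rows = conn.execute(
    """
    SELECT Table2.T1ID, Table2.Value, Table2.Date, Table1.Name
    FROM Table1
    JOIN Table2 ON Table2.T1ID = Table1.T1ID
    JOIN (SELECT T1ID, MAX(Date) AS Date FROM Table2 GROUP BY T1ID) AS Table3
      ON Table3.T1ID = Table2.T1ID AND Table3.Date = Table2.Date
    WHERE Table1.Name LIKE 'Smith'
    ORDER BY Table2.T1ID
    """
).fetchall()
print(rows)  # [(1, 30, '2024-03-01', 'Smith'), (3, 7, '2024-04-01', 'Smith')]
```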
You have two options.
select t1.t1id, max(t1.Name) Name, max(t2.date) Date,
(select Value from table2 t22
where t22.date = max(t2.date) and t22.t1id = t2.t1id) Value
from table1 t1 left join table2 t2 on t1.t1id = t2.t1id
where Name like '%Smith%'
group by t1.t1id order by 2
OR
select mx.t1id, mx.Name, mx.Date, t2.Value
from
(
select t1.t1id, max(t1.Name) Name, max(t2.date) Date
from table1 t1 left join table2 t2 on t1.t1id = t2.t1id
where Name like '%Smith%'
group by t1.t1id
) mx left join table2 t2 on (t2.t1id = mx.t1id and t2.date = mx.date)
order by 2
Both produce the same result. The first needs less code but may perform poorly on very large data sets; the second needs a little more code but is somewhat better optimized. Notes on the JOIN option:
If you use LEFT JOIN (as the example shows), items in Table1 with no corresponding records in Table2 are included in the result, but their Date and Value columns are NULL.
If you use INNER JOIN, items in Table1 with no corresponding records in Table2 are not included.
EDIT
I missed one of the requirements, which was the Name matching a specific string. The code is now updated. The '%' acts as a wildcard, so it will match names like 'Will Smith' and 'Wail Smithers'. If you want an exact match, remove the wildcards ('%').
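Add this to your SQL (use backticks: 'Date' in single quotes is a string literal, so ordering by it sorts nothing). Note that LIMIT 1 returns a single row overall, so this only works when the name matches one T1ID:
ORDER BY `Date` DESC LIMIT 1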