Deleting duplicate entries with search criteria - mysql

I have a table like this:
table_id item_id vendor_id category_id
1 1 33 4
2 1 33 4
3 1 33 2
4 2 33 4
5 2 33 2
6 3 33 4
7 3 33 4
8 1 34 4
9 1 34 4
10 3 35 4
Here table_id is the primary key. The table has 98,000 entries in total, including 61 duplicate entries, which I found by running this query:
SELECT * FROM my_table
WHERE vendor_id = 33
AND category_id = 4
GROUP BY item_id having count(item_id)>1
In the table above, table_id 1, 2 and 6, 7 are duplicates. I need to delete 2 and 7 from my table (61 duplicate entries in total). How can I delete duplicate entries with a query restricted to vendor_id = 33 AND category_id = 4? I don't want to delete other duplicate entries, such as table_id 8 and 9.
I cannot add a unique index to the table, since I need to keep some duplicate entries that are required. I only need to delete the duplicates that match certain criteria.

Always take a backup before running any deletion query.
Try using a LEFT JOIN like this:
DELETE my_table
FROM my_table
LEFT JOIN
(SELECT MIN(table_id) AS IDs FROM my_table
GROUP BY `item_id`, `vendor_id`, `category_id`
)A
ON my_table.table_id = A.IDs
WHERE A.ids IS NULL;
Result after deletion:
| TABLE_ID | ITEM_ID | VENDOR_ID | CATEGORY_ID |
------------------------------------------------
| 1 | 1 | 33 | 4 |
| 3 | 1 | 33 | 2 |
| 4 | 2 | 33 | 4 |
| 5 | 2 | 33 | 2 |
| 6 | 3 | 33 | 4 |
See this SQLFiddle
Edit: (after OP's edit)
If you want to add more conditions, you can add them to the outer WHERE clause like this:
DELETE my_table
FROM my_table
LEFT JOIN
(SELECT MIN(table_id) AS IDs FROM my_table
GROUP BY `item_id`, `vendor_id`, `category_id`
)A
ON my_table.table_id = A.IDs
WHERE A.ids IS NULL
AND vendor_id = 33 -- additional condition
AND category_id = 4 -- additional condition
See this SQLFiddle
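To check the scoped delete against the question's sample rows without touching a real server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption of this sketch): it has no multi-table DELETE, so the same "keep the lowest table_id per group" rule is written with NOT IN. The function name delete_scoped_duplicates is hypothetical.

```python
import sqlite3

# Sample rows from the question: (table_id, item_id, vendor_id, category_id)
SAMPLE_ROWS = [
    (1, 1, 33, 4), (2, 1, 33, 4), (3, 1, 33, 2), (4, 2, 33, 4), (5, 2, 33, 2),
    (6, 3, 33, 4), (7, 3, 33, 4), (8, 1, 34, 4), (9, 1, 34, 4), (10, 3, 35, 4),
]


def delete_scoped_duplicates(vendor_id, category_id):
    """Delete duplicates for one vendor/category, keeping the lowest table_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (table_id INTEGER PRIMARY KEY, item_id INTEGER, "
        "vendor_id INTEGER, category_id INTEGER)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    # Rows outside the requested vendor/category are never touched.
    conn.execute(
        """
        DELETE FROM my_table
        WHERE vendor_id = ? AND category_id = ?
          AND table_id NOT IN (
              SELECT MIN(table_id)
              FROM my_table
              GROUP BY item_id, vendor_id, category_id
          )
        """,
        (vendor_id, category_id),
    )
    remaining = [
        row[0]
        for row in conn.execute("SELECT table_id FROM my_table ORDER BY table_id")
    ]
    conn.close()
    return remaining


print(delete_scoped_duplicates(33, 4))
```

With vendor 33 and category 4, only table_id 2 and 7 are removed; the duplicate pair 8 and 9 (vendor 34) stays in place.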

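What about this? MySQL does not allow a subquery to read directly from the table being deleted from (error 1093), so the subquery is wrapped in a derived table:
DELETE FROM my_table
WHERE vendor_id = 33
AND category_id = 4
AND table_id NOT IN
(SELECT keep_id FROM
    (SELECT MIN(table_id) AS keep_id
    FROM my_table
    GROUP BY item_id, vendor_id, category_id) AS keep_ids)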

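Try the code below. It keeps the highest table_id in each duplicate group and deletes the rest, limited to the vendor and category in question. The subquery must group by the duplicated columns, not by table_id:
DELETE FROM my_table
WHERE vendor_id = 33
AND category_id = 4
AND table_id NOT IN (SELECT keep_id FROM
    (SELECT MAX(table_id) AS keep_id
    FROM my_table
    GROUP BY item_id, vendor_id, category_id) AS keep_ids)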

Try
DELETE m
FROM my_table m JOIN
(
SELECT MAX(table_id) table_id
FROM my_table
WHERE vendor_id = 33
AND category_id = 4
GROUP BY item_id, vendor_id, category_id
HAVING COUNT(*) > 1
) q ON m.table_id = q.table_id
After delete you'll have
| TABLE_ID | ITEM_ID | VENDOR_ID | CATEGORY_ID |
------------------------------------------------
| 1 | 1 | 33 | 4 |
| 3 | 1 | 33 | 2 |
| 4 | 2 | 33 | 4 |
| 5 | 2 | 33 | 2 |
| 6 | 3 | 33 | 4 |
| 8 | 1 | 34 | 4 |
| 9 | 1 | 34 | 4 |
| 10 | 3 | 35 | 4 |
Here is SQLFiddle demo
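One caveat with deleting MAX(table_id) per group: each run removes a single row from every duplicate group, so a group with three or more copies needs more than one run. The sketch below illustrates that with Python's sqlite3 module standing in for MySQL (an assumption; SQLite has no DELETE ... JOIN, so IN is used instead). The rows in TRIPLE_ROWS are made up for illustration and the function name is hypothetical.

```python
import sqlite3

# Hypothetical rows: item 1 for vendor 33 / category 4 appears three times.
TRIPLE_ROWS = [
    (1, 1, 33, 4), (2, 1, 33, 4), (3, 1, 33, 4),
    (4, 2, 33, 4), (5, 1, 34, 4), (6, 1, 34, 4),
]


def delete_newest_until_clean(rows, vendor_id, category_id):
    """Repeat the MAX(table_id) delete until no duplicate group is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (table_id INTEGER PRIMARY KEY, item_id INTEGER, "
        "vendor_id INTEGER, category_id INTEGER)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
    passes = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM my_table
            WHERE table_id IN (
                SELECT MAX(table_id)
                FROM my_table
                WHERE vendor_id = ? AND category_id = ?
                GROUP BY item_id, vendor_id, category_id
                HAVING COUNT(*) > 1
            )
            """,
            (vendor_id, category_id),
        )
        if cur.rowcount == 0:  # nothing deleted: no duplicates remain
            break
        passes += 1
    remaining = [
        row[0]
        for row in conn.execute("SELECT table_id FROM my_table ORDER BY table_id")
    ]
    conn.close()
    return passes, remaining


print(delete_newest_until_clean(TRIPLE_ROWS, 33, 4))
```

In the question's sample data every duplicate group has exactly two rows, so a single run is enough there.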

From your question, I take it you need to remove the duplicate rows that share the same item_id, vendor_id and category_id, like the rows with table_id 1 and 2. That can be done by making those three columns unique together. Try the following:
alter ignore table table_name add unique index(item_id, vendor_id, category_id);
Note: I haven't tested this yet; I will add an SQLFiddle shortly. ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this only works on older versions, and it removes every duplicate in the table, not only those for vendor_id = 33 and category_id = 4.

Related

How to select rows, using group by with minimum field values?

Today I posted a question and got a good answer: Stuck in building mysql query.
I thought it helped me, but I've discovered that it returns wrong data. So I'm reposting the question here with the answer I received, and I will explain why it does not work for me.
Example of data:
id | item_id | user_id | bid_price
----------------------------------
1 | 1 | 11 | 1
2 | 1 | 12 | 2
3 | 1 | 13 | 3
4 | 1 | 14 | 1
5 | 1 | 15 | 4
6 | 2 | 16 | 2
7 | 2 | 17 | 1
8 | 3 | 18 | 2
9 | 3 | 19 | 3
10 | 3 | 18 | 2
Expected result:
id | item_id | user_id | bid_price
----------------------------------
1 | 1 | 11 | 1
7 | 2 | 17 | 1
8 | 3 | 18 | 2
Offered solution:
select m.id, m.item_id, m.user_id, m.bid_price
from my_table m
inner join (
select item_id, min(id) min_id, min(bid_price) min_price
from my_table
where item_id IN (1,2,3)
group by item_id
) t on t.item_id = m.item_id
and t.min_price= m.bid_price
and t.min_id = m.id
The problem:
In the subquery, the minimum id is taken over the whole item_id group and is not tied to the minimum bid_price.
In other words, the minimum id is selected independently of the price column. So the result contains the minimum price and the minimum id of the group, but these need not come from the same row: the id can belong to a row with a different bid_price.
How can this query be adjusted? Thank you in advance!
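SELECT m.id, m.item_id, m.user_id, m.bid_price
FROM my_table m
INNER JOIN (
    -- lowest id among the rows that carry each item's minimum price
    SELECT MIN(p.id) AS min_id
    FROM my_table p
    INNER JOIN (
        SELECT item_id, MIN(bid_price) AS min_price
        FROM my_table
        GROUP BY item_id
    ) t ON t.item_id = p.item_id
    AND t.min_price = p.bid_price
    GROUP BY p.item_id
) k ON k.min_id = m.id
ORDER BY m.item_id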
Output
id item_id user_id bid_price
1 1 11 1
7 2 17 1
8 3 18 2
Live Demo
http://sqlfiddle.com/#!9/a52dc6/13
SELECT DISTINCT
t1.item_id,
t1.bid_price
FROM tab1 t1
WHERE NOT exists(SELECT 1
FROM tab1 t2
WHERE t2.item_id = t1.item_id
AND t2.bid_price < t1.bid_price)
AND t1.item_id IN (1, 2, 3);
http://sqlfiddle.com/#!9/615e0a/5
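For comparison, the requirement "lowest bid_price per item, ties broken by lowest id, whole row returned" can also be written as a correlated subquery with ORDER BY ... LIMIT 1. This is a different technique from the joins above, shown as a minimal sketch with Python's sqlite3 module standing in for MySQL (an assumption); the function name lowest_bid_rows is hypothetical and the rows are the sample data from the question.

```python
import sqlite3

# Bids from the question: (id, item_id, user_id, bid_price)
BIDS = [
    (1, 1, 11, 1), (2, 1, 12, 2), (3, 1, 13, 3), (4, 1, 14, 1), (5, 1, 15, 4),
    (6, 2, 16, 2), (7, 2, 17, 1), (8, 3, 18, 2), (9, 3, 19, 3), (10, 3, 18, 2),
]


def lowest_bid_rows(bids):
    """Return one full row per item: the lowest bid, ties broken by lowest id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER PRIMARY KEY, item_id INTEGER, "
        "user_id INTEGER, bid_price INTEGER)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", bids)
    rows = conn.execute(
        """
        SELECT m.id, m.item_id, m.user_id, m.bid_price
        FROM my_table m
        WHERE m.id = (
            -- id of the cheapest bid for this item; the earliest id wins ties
            SELECT c.id
            FROM my_table c
            WHERE c.item_id = m.item_id
            ORDER BY c.bid_price, c.id
            LIMIT 1
        )
        ORDER BY m.item_id
        """
    ).fetchall()
    conn.close()
    return rows


for row in lowest_bid_rows(BIDS):
    print(row)
```

Because the id and the price are picked from the same candidate row, the id can never come from a row with a different bid_price.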

Latest datetime from unique mysql index

I have a table. It has a pk of id and an index of [service, check, datetime].
id service check datetime score
---|-------|-------|----------|-----
1 | 1 | 4 |4/03/2009 | 399
2 | 2 | 4 |4/03/2009 | 522
3 | 1 | 5 |4/03/2009 | 244
4 | 2 | 5 |4/03/2009 | 555
5 | 1 | 4 |4/04/2009 | 111
6 | 2 | 4 |4/04/2009 | 322
7 | 1 | 5 |4/05/2009 | 455
8 | 2 | 5 |4/05/2009 | 675
Given a service 2 I need to select the rows for each unique check where it has the max date. So my result would look like this table.
id service check datetime score
---|-------|-------|----------|-----
6 | 2 | 4 |4/04/2009 | 322
8 | 2 | 5 |4/05/2009 | 675
Is there a short query for this? The best I have is this, but it returns too many checks. I just need each unique check at its latest datetime.
SELECT * FROM table where service=?;
First you need to find the latest date for each check:
SELECT `check`, MAX(`datetime`)
FROM YourTable
WHERE `service` = 2
GROUP BY `check`
Then join back to get the rest of the data.
SELECT Y.*
FROM YourTable Y
JOIN ( SELECT `check`, MAX(`datetime`) AS m_date
FROM YourTable
WHERE `service` = 2
GROUP BY `check`) AS `filter`
ON Y.`check` = `filter`.`check`
AND Y.`datetime` = `filter`.m_date
WHERE Y.`service` = 2
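The two steps (latest date per check, then join back for the full row) can be tried on the sample data with the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL, a hypothetical table name scores, and ISO date strings instead of the 4/03/2009 format so that MAX compares dates correctly as text; all three are assumptions of this sketch.

```python
import sqlite3

# Sample rows from the question: (id, service, check, datetime, score)
SCORES = [
    (1, 1, 4, "2009-04-03", 399), (2, 2, 4, "2009-04-03", 522),
    (3, 1, 5, "2009-04-03", 244), (4, 2, 5, "2009-04-03", 555),
    (5, 1, 4, "2009-04-04", 111), (6, 2, 4, "2009-04-04", 322),
    (7, 1, 5, "2009-04-05", 455), (8, 2, 5, "2009-04-05", 675),
]


def latest_per_check(service):
    """Return the full row with the latest datetime for each check of a service."""
    conn = sqlite3.connect(":memory:")
    # "check" is a keyword, so it is quoted (MySQL would use backticks).
    conn.execute(
        'CREATE TABLE scores (id INTEGER PRIMARY KEY, service INTEGER, '
        '"check" INTEGER, "datetime" TEXT, score INTEGER)'
    )
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?, ?, ?)", SCORES)
    rows = conn.execute(
        """
        SELECT y.id, y.service, y."check", y."datetime", y.score
        FROM scores y
        JOIN (
            SELECT "check", MAX("datetime") AS m_date
            FROM scores
            WHERE service = ?
            GROUP BY "check"
        ) f ON y."check" = f."check" AND y."datetime" = f.m_date
        WHERE y.service = ?
        ORDER BY y.id
        """,
        (service, service),
    ).fetchall()
    conn.close()
    return rows


for row in latest_per_check(2):
    print(row)
```

The join has to match on check as well as datetime; matching on datetime alone would also pull in rows of other checks that happen to share the same date.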

SQL select rows which are identical in two values in a way that retains Edit features in output

Apologies if the answer is dead obvious but in spite of a lot of research and trying out different commands, the solution escapes me (I'm more of a lexicographer than a dev).
We have a table which for various reasons has ended up with some rows which have duplicated values in critical cells. A mockup looks like this:
Unique_ID | E_ID | Date | User_ID | V_value
1 | 500 | 2012-05-12 | 23 | 3
2 | 501 | 2012-05-12 | 23 | 3
3 | 501 | 2012-05-13 | 23 | 1
4 | 502 | 2012-05-13 | 23 | 2
5 | 503 | 2012-05-12 | 23 | 2
6 | 7721 | 2012-05-22 | 8845 | 3
7 | 7722 | 2012-05-22 | 8845 | 3
8 | 7722 | 2012-05-22 | 8845 | 3
9 | 7723 | 2012-05-22 | 8845 | 3
So the rows I need as output are Unique_ID 2 & 3 and 7 & 8 as they are identical as regards the E_ID and User_ID field. The values of the other fields are not relevant to our problem. So what I want is this, ideally:
Unique_ID | E_ID | Date | User_ID | V_value
2 | 501 | 2012-05-12 | 23 | 3
3 | 501 | 2012-05-13 | 23 | 1
7 | 7722 | 2012-05-22 | 8845 | 3
8 | 7722 | 2012-05-22 | 8845 | 3
For reasons to do with the data, I need the output to appear with the Edit features (in particular the tick-box or at least the Delete feature) because I need to go through the table manually and discard one or the other duplicate based on decisions/conditions that can't be determined with SQL commands.
The closest I have come is this:
SELECT *
FROM ( SELECT E_ID, User_ID, COUNT(Unique_ID)
AS V_Count
FROM TableName
GROUP BY E_ID, User_ID
ORDER BY E_ID )
AS X
WHERE V_Count > 1
ORDER BY User_ID ASC, E_ID ASC
which does give me the rows with the duplications but because I'm creating the V_Count column to give me the duplicates:
E_ID | User_ID | V_Count
501 | 23 | 2
7722 | 8845 | 2
the output does not give me the Delete option I need - it says it's because there is no unique ID and I get that, as it puts them together in the same row. Is there a way to do this without losing the Unique_ID so I don't lose the Delete function?
You can use aggregation to find each (e_id, user_id) pair that occurs in more than one row, then join that result back to your table to get all the columns.
select t1.*
from tablename t1
join (
select e_id,
user_id
from tablename
group by e_id,
user_id
having count(*) > 1
) t2
on t1.e_id = t2.e_id
and t1.user_id = t2.user_id
Which can be more cleanly expressed using the USING clause as:
select *
from tablename t1
join (
select e_id,
user_id
from tablename
group by e_id,
user_id
having count(*) > 1
) t2 using (e_id, user_id)
A fairly simple method uses EXISTS, matching only on the two columns that define a duplicate:
select t.*
from tablename t
where exists (select 1
from tablename t2
where t2.e_id = t.e_id and
t2.user_id = t.user_id and
t2.unique_id <> t.unique_id
);
An alternative that puts each duplicated combination on a single row, together with all of its ids, is:
select e_id, user_id,
group_concat(unique_id) as unique_ids
from tablename t
group by e_id, user_id
having count(*) > 1;
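To confirm that the aggregate-and-join approach returns whole rows, including Unique_ID, for the mockup data, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The table name entries and the function name are hypothetical; the date column is named entry_date here only to avoid quoting.

```python
import sqlite3

# Mockup rows from the question: (unique_id, e_id, date, user_id, v_value)
ENTRIES = [
    (1, 500, "2012-05-12", 23, 3), (2, 501, "2012-05-12", 23, 3),
    (3, 501, "2012-05-13", 23, 1), (4, 502, "2012-05-13", 23, 2),
    (5, 503, "2012-05-12", 23, 2), (6, 7721, "2012-05-22", 8845, 3),
    (7, 7722, "2012-05-22", 8845, 3), (8, 7722, "2012-05-22", 8845, 3),
    (9, 7723, "2012-05-22", 8845, 3),
]


def rows_with_duplicate_pairs(entries):
    """Return every full row whose (e_id, user_id) pair occurs more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (unique_id INTEGER PRIMARY KEY, e_id INTEGER, "
        "entry_date TEXT, user_id INTEGER, v_value INTEGER)"
    )
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?, ?)", entries)
    rows = conn.execute(
        """
        SELECT t1.unique_id, t1.e_id, t1.entry_date, t1.user_id, t1.v_value
        FROM entries t1
        JOIN (
            SELECT e_id, user_id
            FROM entries
            GROUP BY e_id, user_id
            HAVING COUNT(*) > 1
        ) t2 ON t1.e_id = t2.e_id AND t1.user_id = t2.user_id
        ORDER BY t1.unique_id
        """
    ).fetchall()
    conn.close()
    return rows


for row in rows_with_duplicate_pairs(ENTRIES):
    print(row)
```

Rows 2 and 3 are both returned even though their dates and V_value differ, since only E_ID and User_ID decide what counts as a duplicate.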

Find columns which come in pair with multiple/various other-column values

I have a points table, where important columns are:
id userid orderid
1 10 150
2 10 150
3 15 151
4 12 152
5 11 152
I need to find all orderid which have multiple/various userid. The result would be:
id userid orderid
4 12 152
5 11 152
I can do it in PHP, but I hope someone has time to help me with a MySQL query. What I have tried so far is probably irrelevant.
Use COUNT(DISTINCT) and HAVING to find each orderid that has more than one distinct userid.
SqlFiddleDemo
SELECT t.*
FROM tab t
JOIN (SELECT orderid, COUNT(DISTINCT userid)
FROM tab
GROUP BY orderId
HAVING COUNT(DISTINCT userid) > 1) AS sub
ON t.orderid = sub.orderid
ORDER BY t.id
If you want just the rows that share an orderid but have different userid values, use this:
SELECT P1.* FROM points P1
INNER JOIN points P2
ON P1.orderid = P2.orderid AND P1.id != P2.id AND P1.userid != P2.userid;
Note that this first select returns what you expect in your question:
+----+--------+---------+
| id | userid | orderid |
+----+--------+---------+
| 4 | 12 | 152 |
| 5 | 11 | 152 |
+----+--------+---------+
If you want every row whose orderid appears more than once, regardless of userid, use this:
SELECT P1.* FROM points P1
INNER JOIN points P2
ON P1.orderid = P2.orderid and P1.id != P2.id;
In this case, rows with the same userid are not excluded, so the query returns:
+----+--------+---------+
| id | userid | orderid |
+----+--------+---------+
| 1 | 10 | 150 |
| 2 | 10 | 150 |
| 4 | 12 | 152 |
| 5 | 11 | 152 |
+----+--------+---------+
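The COUNT(DISTINCT userid) variant can be checked against the sample points table with the sketch below, which uses Python's sqlite3 module as a stand-in for MySQL (an assumption). The function name rows_for_multi_user_orders is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, userid, orderid)
POINTS = [(1, 10, 150), (2, 10, 150), (3, 15, 151), (4, 12, 152), (5, 11, 152)]


def rows_for_multi_user_orders(points):
    """Return all rows whose orderid is shared by more than one distinct userid."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE points (id INTEGER PRIMARY KEY, userid INTEGER, orderid INTEGER)"
    )
    conn.executemany("INSERT INTO points VALUES (?, ?, ?)", points)
    rows = conn.execute(
        """
        SELECT t.id, t.userid, t.orderid
        FROM points t
        JOIN (
            SELECT orderid
            FROM points
            GROUP BY orderid
            HAVING COUNT(DISTINCT userid) > 1
        ) sub ON t.orderid = sub.orderid
        ORDER BY t.id
        """
    ).fetchall()
    conn.close()
    return rows


print(rows_for_multi_user_orders(POINTS))
```

Order 150 is skipped because both of its rows belong to userid 10. Unlike the self-join, this form returns each row once even when an order has three or more different users.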

Remove duplicates from one column keeping whole rows

id | userid | total_points_spent
1 | 1 | 10
2 | 2 | 15
3 | 2 | 50
4 | 3 | 5
5 | 1 | 15
With the above table, I would first like to remove duplicates of userid keeping the rows with the largest total_points_spent, like so:
id | userid | total_points_spent
3 | 2 | 50
4 | 3 | 5
5 | 1 | 15
And then I would like to sum the values of total_points_spent, which would be the easy part, resulting in 70.
I am not sure whether by "remove" you mean delete or just select. Here is a query that selects only the row with the highest total_points_spent for each userid:
SELECT tblA.*
FROM ( SELECT userid, MAX(total_points_spent) AS maxtotal
FROM tblA
GROUP BY userid ) AS dt
INNER JOIN tblA
ON tblA.userid = dt.userid
AND tblA.total_points_spent = dt.maxtotal
ORDER BY tblA.userid
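Both parts of the question (keep the row with the largest total_points_spent per userid, then sum those values) can be tried with the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL and the placeholder table name tblA from the answer; the function name is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, userid, total_points_spent)
SPENDING = [(1, 1, 10), (2, 2, 15), (3, 2, 50), (4, 3, 5), (5, 1, 15)]


def top_rows_and_total(spending):
    """Return the top-spending row per user and the sum of those maxima."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tblA (id INTEGER PRIMARY KEY, userid INTEGER, "
        "total_points_spent INTEGER)"
    )
    conn.executemany("INSERT INTO tblA VALUES (?, ?, ?)", spending)
    kept = conn.execute(
        """
        SELECT a.id, a.userid, a.total_points_spent
        FROM tblA a
        JOIN (
            SELECT userid, MAX(total_points_spent) AS maxtotal
            FROM tblA
            GROUP BY userid
        ) dt ON a.userid = dt.userid AND a.total_points_spent = dt.maxtotal
        ORDER BY a.id
        """
    ).fetchall()
    # Summing the per-user maxima directly avoids double counting if a user
    # has two rows tied at the same maximum.
    total = conn.execute(
        """
        SELECT SUM(maxtotal)
        FROM (
            SELECT MAX(total_points_spent) AS maxtotal
            FROM tblA
            GROUP BY userid
        ) AS dt
        """
    ).fetchone()[0]
    conn.close()
    return kept, total


kept_rows, total_points = top_rows_and_total(SPENDING)
print(kept_rows, total_points)
```

For the sample data the kept rows are ids 3, 4 and 5, and the total is 70, matching the figures in the question.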