Remove Duplicates from MYsql table using MinID and Complex Select - mysql

I found this here to delete records with min ID:
DELETE FROM Table WHERE id NOT IN (SELECT MIN(id) FROM Table GROUP BY FieldA)
However I don't want all the found dupes in the table to have the one with the lower ID removed, only a subset of them. I have other criteria for other dupes patterns. So I made my select to get the subset of records that have dupes AND other conditions where I then DO want the min ids removed:
Select min(Z),Max(Z),count(*) from Table
group by P,N
having count(*)>1 and Min(Z)!=Max(Z) and Min(Z)>0
I am unclear how to first get that subset of records and THEN remove the minID from the dupes in that subset

In MySQL use LEFT JOIN:
delete t
from table t left join
(Select min(Z), Max(Z), count(*)
from Table
group by P, N
having count(*) > 1 and Min(Z) <> Max(Z) and Min(Z) > 0
) tt
on t.? = tt.?
where tt.? is null;
It is unclear how the id is defined in your expression. It is also unclear whether the subquery generates the ids to keep or to delete. The version above assumes it is generating the ids to keep. (If it generates the ids to delete, then use inner join and get rid of the where clause.)

Related

In query with joins and multi-table/field ORDER BY, how to set LIMIT offset to start from a particular row identified by a unique id field?

Suppose I have four tables: tbl1 ... tbl4. Each has a unique numerical id field. tbl1, tbl2 and tbl3 each has a foreign key field for the next table in the sequence. E.g. tbl1 has a tbl2_id foreign key field, and so on. Each table also has a field order (and other fields not relevant to the question).
It is straightforward to join all four tables to return all rows of tbl1 together with corresponding fields from the other three fields. It is also easy to order this result set by a specific ORDER BY combination of the order fields. It is also easy to return just the row that corresponds to some particular id in tbl1, e.g. WHERE tbl1.id = 7777.
QUESTION: what query most efficiently returns (e.g.) 100 rows, starting from the row corresponding to id=7777, in the order determined by the specific combination of order fields?
Using ROW_NUMBER or (an emulation of it in MySQL version < 8) to get the position of the id=7777 row, and then using that in a new version of the same query to set the offset in the LIMIT clause would be one approach. (With a read lock in between.) But can it be done in a single query?
# FIRST QUERY: get row number of result row where tbl1.id = 7777
SELECT x.row_number
FROM
(SELECT #row_number:=#row_number+1 AS row_number, tbl1.id AS id
FROM (SELECT #row_number:=0) AS t, tbl1
INNER JOIN tbl2 ON tbl2.id = tbl1.tbl2_id
INNER JOIN tbl3 ON tbl3.id = tbl2.tbl3_id
INNER JOIN tbl4 ON tbl4.id = tbl3.tbl4_id
WHERE <some conditions>
ORDER BY tbl4.order, tbl3.order, tbl2.order, tbl1.order
) AS x
WHERE id=7777;
Store the row number from the above query and use it to bind :offset in the following query.
# SECOND QUERY : Get 100 rows starting from the one with id=7777
SELECT x.field1, x.field2, <etc.>
FROM
(SELECT #row_number:=#row_number+1 AS row_number, field1, field2
FROM (SELECT #row_number:=0) AS t, tbl1
INNER JOIN tbl2 ON tbl2.id = tbl1.tbl2_id
INNER JOIN tbl3 ON tbl3.id = tbl2.tbl3_id
INNER JOIN tbl4 ON tbl4.id = tbl3.tbl4_id
WHERE <same conditions as before>
ORDER BY tbl4.order, tbl3.order, tbl2.order, tbl1.order
) AS x
LIMIT :offset, 100;
Clarify question
In the general case, you won't ask for WHERE id1 > 7777. Instead, you have a tuple of (11,22,33,44) and you want to "continue where you left off".
Two discussions, with
That is messy, but not impossible. See Iterating through a compound key . Ig gives an example of doing it with 2 columns; 4 columns coming from 4 tables is an extension of such.
A variation
Here is another discussion of such: https://dba.stackexchange.com/questions/164428/should-i-store-data-pre-ordered-rather-than-ordering-on-the-fly/164755#164755
In actually implementing such, I have found that letting the "100" (LIMIT) be flexible can be easier to think through. The idea is: reach forward 100 rows (with LIMIT 100,1). Let's say you get (111,222,333,444). If you are currently at (111, ...), then deal with id2/3/4. If it is, say, (113, ...), then do WHERE id1 < 113 and leave off any specification of id2/3/4. This means fetching less than 100 rows, but it lands you just shy of starting id1=113.
That is, it involves constructing a WHERE clause with between 1 and 4 conditions.
In all cases, your query says ORDER BY id1, id2, id3, id4. And the only use for LIMIT is in the probe to figure out how far ahead the 100th row is (with LIMIT 100,1).
I think I can dig out some old Perl code for that.

Delete a MySQL selection

I would like to delete my MySQL selection.
Here is my MySQL selection request:
SELECT *
FROM Items
WHERE id_user=1
ORDER
BY id_user
LIMIT 2,1
With this working request, I select the third item on my table which has as id_user: 1.
Now, I would like to delete the item that has been selected by my request.
I am looking for a same meaning request which would look like this :
DELETE FROM Items (
SELECT * FROM Items WHERE id_user=1 ORDER BY id_user LIMIT 2,1
)
The first thing to note is that there is an issue with your query. You are filtering on a unique value of id_user and sorting on the same column. As all records in the resultset will have the same id_user, the actual order of the resultset is undefined, and we cannot reliably tell which record comes third.
Assuming that you have another column to disanbiguate the resultset (ie some value that is unique amongst each group of records having the same id_user), say id, here is a solution to your question, that uses a self-join with ROW_NUMBER() to locate the third record in each group.
DELETE i
FROM items i
INNER JOIN (
SELECT
id,
id_user,
ROW_NUMBER() OVER(PARTITION BY id_user ORDER BY id) rn
FROM items
) c ON c.id = i.id AND c.id_user = i.id_user AND c.rn = 3
WHERE i.id_user=1 ;
Demo on DB Fiddle
You didn't provide the definition of your table. I guess it has a primary key column called id.
In that case you can use this
CREATE TEMPORARY TABLE doomed_ids
SELECT id FROM Items WHERE id_user = 1 ORDER BY id_user LIMIT 2,1;
DELETE FROM Items
WHERE id IN ( SELECT id FROM doomed_ids);
DROP TABLE doomed_ids;
It's a pain in the neck, but it works around the limitation of MySQL and MariaDB disallowing LIMITs in ... IN (SELECT ...) clauses.
You can use the select query to create a derived table and join it back to your main table to determine which record(s) to delete. Derived tables can use the limit clause.
Assuming that the PK is called id, the query would look as follows:
delete i from items i
inner join (SELECT id FROM Items
WHERE id_user=1
ORDER BY id_user LIMIT 2,1) i2 on i.id=i2.id
You need to substitute your PK in place of id. If you have a multi-column PK, then you need to select all the PK fields in the derived table and join on all of them.

Deleting duplicate rows on MySQL, getting a max row error

I am deleting duplicate rows on MySQL and only leaving behind the old row (least id) but I am getting a max row error
DELETE n1
FROM item_audit n1, item_audit n2
WHERE n1.id > n2.id AND n1.description = n2.description
Keep in mind, with that join condition you are joining each row to every row before it (with the same description). This is one of those cases where a subquery will be much more effective than a join.
DELETE a
FROM item_audit a
WHERE (a.id, a.description) NOT IN
(SELECT * FROM
(
SELECT MIN(id), description
FROM item_audit
GROUP BY description
) AS realSubQ
)
Actually, assuming id is unique, it can even be simplier:
DELETE a
FROM item_audit a
WHERE a.id NOT IN
(SELECT * FROM
( SELECT MIN(id)
FROM item_audit
GROUP BY description
) AS realSubQ
)
As you discovered, MySQL needs to be "tricked" into being able to use the delete target in a subquery with the extra select * wrapper.
Alternatively, a join on the subquery could be used to reduce the size of the intermediate result set created behind the scenes.
DELETE a
FROM item_audit a
LEFT JOIN (SELECT MIN(id) AS firstId FROM item_audit GROUP BY description) AS aFirst
ON a.id = aFirst.firstId
WHERE aFirst.firstId IS NULL
;
If that fails, you can insert the first id's into a temp table, and should be able to do subquery version with that.
CREATE TEMPORARY TABLE `old_ids`
SELECT MIN(ID) AS id
FROM item_audit
GROUP BY description;
DELETE a
FROM item_audit a
LEFT JOIN old_ids ON a.id = old_ids.id
WHERE old_ids.id IS NULL
;
In any of these cases, a LIMIT clause can be placed very last to accomplish an incremental delete. The last, temp table, version has the benefit that the subquery will not need re-evaluated after every incremental delete (and the temporary table can be indexed to speed things up as well).

Proper way to use MySQL GROUP BY for returning one result from a referenced table

I often have a situation with two tables in MySQL where I need one record for each foreign key. For example:
table post {id, ...}
table comment {id, post_id, ...}
SELECT * FROM comment GROUP BY post_id ORDER BY id ASC
-- Oldest comment for each post
or
table client {id, ...}
table payment {id, client_id, ...}
SELECT * FROM payment GROUP BY client_id ORDER BY id DESC
-- Most recent payment from each client
These queries often fail because the "SELECT list is not in GROUP BY clause" and contains nonaggregated columns.
Failed Solutions
I can usually work around this with a min()/max() but that creates a very slow query with mis-matched results (row with min(id) isn't equal to row with min(textfield))
SELECT min(id), min(textfield), ... FROM table GROUP BY fk_id
Adding all the columns to GROUP BY results in duplicate records (from the fk_id) which defeats the purpose of GROUP BY.
SELECT id, textfield, ... FROM table GROUP BY fk_id, id, textfield
Same idea as #GurV but using a join instead of a correlated subquery. The basic idea here is that the subquery finds, for each post which has comments, the oldest post and its corresponding id in the comments table. We then join back to comments again to restrict to the records we want.
SELECT t1.*
FROM comments t1
INNER JOIN
(
SELECT post_id, MIN(id) AS min_id
FROM comments
GROUP BY post_id
) t2
ON t1.post_id = t2.post_id AND
t1.id = t2.min_id
You can use a correlated query with aggregation to find out the earliest comment for each post:
select *
from comments c1
where id = (
select min(id)
from comments c2
where c1.post_id = c2.post_id
)
Compound index - comments(id, post_id) should be helpful.
If you are querying the whole table with many rows, then it will. This query is more useful and performant if you are querying for a small subset of posts. If you are querying the whole table, then #Tim's answer is better suited I think.

select A, B , C group by B with A from the row that has the highest C

I have collected informations from different sources about certain IDs that should match a single name. Some sources are more trustworthy than others in giving the correct name for a given ID.
I created a table (name, id, source_trustworthiness) and I want to get the most trustworthy name for each ID.
I tried
SELECT name, id, MAX( source_trustworthiness )
FROM table
GROUP BY id
this returns th highest trustworthiness available for each ID but with the first name it finds, regarless of its trustworthiness.
Is there a way I can get that right ?
Mysql has special functionality to help:
SELECT * FROM (
SELECT name, id, source_trustworthiness
FROM table
ORDER BY 3 DESC ) x
GROUP BY id
Although this wouldn't even execute in other databases (not naming all non-aggregate columns in the GROUP BY clause), with mysql it returns the first row encountered for each unique value of the grouped by columns. By ordering the rows greatest first, the first row for each id will be the most trustworthy.
Since this question is tagged mysql, this query is OK. Not only is it really simple, it's also quite fast.
SELECT a.*
FROM TableName a
INNER JOIN
(
SELECT id, MAX(source_trustworthiness) max_val
FROM TableName
GROUP BY ID
) b ON a.ID = b.ID AND
a.source_trustworthiness = b.max_val