Possible speed up to delete via temp table in mysql - mysql

I am running a delete which removes all of the duplicates within a table. A duplicate is defined as a row where the tag_id, user_id, and is_self are all the same. My technique here is pretty standard, to preform this delete, since the tags_users table itself needs to be referenced to know if a duplicate exists a temp table is created so that a delete can be preformed from the same table that is being referenced. The problem is that this table is about a million rows so this query takes about an hour to run. I know this is related to the slow speed of defining this temp table and then referencing it as it is un-indexed.
DELETE FROM tags_users WHERE id IN (
SELECT id FROM (
SELECT A.id FROM tags_users as A, tags_users as B WHERE A.id > B.id AND A.user_id = B.user_id AND A.tag_id = B.tag_id AND A.is_self = B.is_self GROUP BY A.id
) temp_dup_delete
);
I have reviewed the explain from this query listed here (Please note I'm on mysql 5.5 so I'm using EXPLAIN SELECT 1 to simulate EXPLAIN DELETE). I think the best possible solution to this is to define an index on the temp table, but I cannot figure out how to do this yet. The crux of my question here is: is there a way to improve the speed of this query considering the way it defines a temp table. Thank you to anyone that can help.

Here is an alternative approach. Use an aggregation query to find the minimum id for each set of key values -- this seems to be the row you want to keep.
Then, use left outer join to match to this table and delete all the rows in the original data that do not match.
delete tu
from tags_users tu left outer join
(select tag_id, user_id, is_self, min(id) as minid
from tags_users
group by tag_id, user_id, is_self
) tui
on tui.id = tu.id
where tui.id is null;

Related

MySQL: delete query taking too long

I am trying to delete records from table with duplicate column values but it's taking forever. Basically it gets stuck and no response for hours. I have a significantly large table with over 1.3M records. Is the query inefficient? any wat to optimize it?
delete n1 from ids n1, ids n2 where n1.id > n2.id and n1.user_id = n2.user_id
Database is remote, and am using putty to run queries.
Add an index:
ALTER TABLE ids ADD INDEX (user_id, id);
This makes it efficient to find all the rows with the same user ID and higher IDs.
It will also help to join with a subquery.
DELETE n1
FROM ids AS n1
JOIN (SELECT user_id, MIN(id) AS minid
FROM ids
GROUP BY user_id) AS n2
ON n1.user_id = n2.user_id AND n1.id > n2.minid
This will still be faster with the above index.
yes, that query is very inefficient. Even if you used explicit joins you need to keep in mind that basically every row "N" is being matched up with every row before "N", and every row "N-1" is being matched up with the rows before it.
Try something like this:
DROP TEMPORARY TABLE IF EXISTS keeps;
CREATE TEMPORARY TABLE keeps (
user_id INT,
keepID INT,
INDEX (user_id, keepID)
)
INSERT INTO keeps (user_id, keepID)
SELECT user_id, MIN(id) As keepID
FROM ids
GROUP BY user_id;
DELETE FROM ids WHERE (user_id, id) NOT IN (SELECT user_id, keepID FROM keeps);
DROP TEMPORARY TABLE IF EXISTS keeps;
I'm also tempted to suggest trying something like the below, but I can't remember if MySQL allows subquerying the delete table in the delete query ... which is why I suggested the temp table in the first one.
DELETE a
FROM ids AS a
WHERE EXISTS (
SELECT *
FROM ids AS b
WHERE b.id < a.id
AND b.user_id = a.user_id
)

Deleting duplicate rows on MySQL, getting a max row error

I am deleting duplicate rows on MySQL and only leaving behind the old row (least id) but I am getting a max row error
DELETE n1
FROM item_audit n1, item_audit n2
WHERE n1.id > n2.id AND n1.description = n2.description
Keep in mind, with that join condition you are joining each row to every row before it (with the same description). This is one of those cases where a subquery will be much more effective than a join.
DELETE a
FROM item_audit a
WHERE (a.id, a.description) NOT IN
(SELECT * FROM
(
SELECT MIN(id), description
FROM item_audit
GROUP BY description
) AS realSubQ
)
Actually, assuming id is unique, it can even be simplier:
DELETE a
FROM item_audit a
WHERE a.id NOT IN
(SELECT * FROM
( SELECT MIN(id)
FROM item_audit
GROUP BY description
) AS realSubQ
)
As you discovered, MySQL needs to be "tricked" into being able to use the delete target in a subquery with the extra select * wrapper.
Alternatively, a join on the subquery could be used to reduce the size of the intermediate result set created behind the scenes.
DELETE a
FROM item_audit a
LEFT JOIN (SELECT MIN(id) AS firstId FROM item_audit GROUP BY description) AS aFirst
ON a.id = aFirst.firstId
WHERE aFirst.firstId IS NULL
;
If that fails, you can insert the first id's into a temp table, and should be able to do subquery version with that.
CREATE TEMPORARY TABLE `old_ids`
SELECT MIN(ID) AS id
FROM item_audit
GROUP BY description;
DELETE a
FROM item_audit a
LEFT JOIN old_ids ON a.id = old_ids.id
WHERE old_ids.id IS NULL
;
In any of these cases, a LIMIT clause can be placed very last to accomplish an incremental delete. The last, temp table, version has the benefit that the subquery will not need re-evaluated after every incremental delete (and the temporary table can be indexed to speed things up as well).

Update a field in table A with the max value of field in table B

I want to update a field, let's call it 'field_A', in table 'table_A' with the maximum value that exists of field 'field_B' in 'table_B', but only IF there is a max value for that field 'field_B' in table 'table_B'.
Table 'table_B' has a 'reference' field which contains the 'id' of the table_A record we want to update.
Now I have the following query, which works perfectly.
UPDATE table_A a SET a.field_A = (SELECT MAX(b.field_B)
FROM table_B WHERE b.reference = a.id)
WHERE a.id IN (
SELECT reference
FROM table_B
GROUP BY reference
HAVING COUNT(reference) > 0
)
So it only updates field_A IF there are records found for that reference because I don't want to end up setting fields 'field_A' to zero when no related records were found.
As I said before, this query already works perfectly, but now I have to run a query for table_B two times, which seems a little bit inefficient and it is probably possible to do it with only 1 join statement but I can't seem to tackle the issue.
Since this query has to cross reference a lot, really a lot, of records, performance is really an issue here.
With these two nested statements, your UPDATE statements seems quite
inefficient to me. Try this SQL statement below, that should do the job.
SQL Fiddle here: http://sqlfiddle.com/#!2/5825a/2
UPDATE table_A a1
JOIN
(
SELECT a.id as id, max(b.field_B) as max_val
FROM
table_A a
LEFT JOIN table_B b ON a.id = b.reference
GROUP BY a.id
) t on a1.id = t.id
SET
a1.field_A = t.max_val
WHERE
(t.max_val IS NOT NULL)

Eliminating duplicates from SQL query

What would be the best way to return one item from each id instead of all of the other items within the table. Currently the query below returns all manufacturers
SELECT m.name
FROM `default_ps_products` p
INNER JOIN `default_ps_products_manufacturers` m ON p.manufacturer_id = m.id
I have solved my question by using the DISTINCT value in my query:
SELECT DISTINCT m.name, m.id
FROM `default_ps_products` p
INNER JOIN `default_ps_products_manufacturers` m ON p.manufacturer_id = m.id
ORDER BY m.name
there are 4 main ways I can think of to delete duplicate rows
method 1
delete all rows bigger than smallest or less than greatest rowid value. Example
delete from tableName a where rowid> (select min(rowid) from tableName b where a.key=b.key and a.key2=b.key2)
method 2
usually faster but you must recreate all indexes, constraints and triggers afterward..
pull all as distinct to new table then drop 1st table and rename new table to old table name
example.
create table t1 as select distinct * from t2; drop table t1; rename t2 to t1;
method 3
delete uing where exists based on rowid. example
delete from tableName a where exists(select 'x' from tableName b where a.key1=b.key1 and a.key2=b.key2 and b.rowid >a.rowid) Note if nulls are on column use nvl on column name.
method 4
collect first row for each key value and delete rows not in this set. Example
delete from tableName a where rowid not in(select min(rowid) from tableName b group by key1, key2)
note that you don't have to use nvl for method 4
Using DISTINCT often is a bad practice. It may be a sing that there is something wrong with your SELECT statement, or your data structure is not normalized.
In your case I would use this (in assumption that default_ps_products_manufacturers has unique records).
SELECT m.id, m.name
FROM default_ps_products_manufacturers m
WHERE EXISTS (SELECT 1 FROM default_ps_products p WHERE p.manufacturer_id = m.id)
Or an equivalent query with IN:
SELECT m.id, m.name
FROM default_ps_products_manufacturers m
WHERE m.id IN (SELECT p.manufacturer_id FROM default_ps_products p)
The only thing - between all possible queries it is better to select the one with the better execution plan. Which may depend on your vendor and/or physical structure, statistics, etc... of your data base.
I think in most cases EXISTS will work better.

MYSQL delete all results having count(*)=1

I have a table taged with two fields sesskey (varchar32 , index) and products (int11), now I have to delete all rows that having group by sesskey count(*) = 1.
I'm trying a fews methods but all fails.
Example:
delete from taged where sesskey in (select sesskey from taged group by sesskey having count(*) = 1)
The sesskey field could not be a primary key because its repeated.
DELETE si
FROM t_session si
JOIN (
SELECT sesskey
FROM t_session so
GROUP BY
sesskey
HAVING COUNT(*) = 1
) q
ON q.sesskey = si.sesskey
You need to have a join here. Using a correlated subquery won't work.
See this article in my blog for more detail:
Keeping rows
Or if you're using an older (pre 4.1) version of MySQL and don't have access to subqueries you need to select your data into a table, then join that table with the original:
CREATE TABLE delete_me_table (sesskey varchar32, cur_total int);
INSERT INTO delete_me_table SELECT sesskey, count(*) as cur_total FROM orig_table
WHERE cur_total = 1 GROUP BY sesskey;
DELETE FROM orig_table INNER JOIN delete_me_table USING (sesskey);
Now you have a table left over named delete_me_table which contains a history of all the rows you deleted. You can use this for archiving, trending, other fun and unusual things to surprise yourself with.
The SubQuery should work
Delete from taged
Where sesskey in
(Select sesskey
From taged
Group by sesskey
Having count(*) = 1)
EDIT: Thanks to #Quassnoi comment below... The above will NOT work in MySql, as MySql restricts referencing the table being updated or deleted from, in a Subquery i you must do the same thing using a Join ...