Opening Same Table Twice In Update - mysql

I'm trying to normalise a large dataset. I've built a table with all the relationships, called App(earances). Then I loop through another table to build a temp table which contains the duplicates, with MasterID being the one I want to keep.
The data in the duplicate table looks like this:
I then try to update the app table, swapping any duplicate IDs for the corresponding master ID, but I get the error: Can't reopen table: 'd'.
Here is the code:
DROP TABLE IF EXISTS Duplicates;
CREATE TEMPORARY TABLE Duplicates (
MasterID int NOT NULL,
DuplicateID int NOT NULL
);
INSERT INTO Duplicates(MasterID, DuplicateID)
SELECT p1.PlayerID as MasterID, p2.PlayerID as DuplicateID
FROM Player p1
LEFT JOIN Player p2 on p1.Name = p2.Name
WHERE p1.name = p2.name
AND p1.PlayerID < p2.PlayerID
ORDER BY p1.PlayerID;
UPDATE app a
SET a.PlayerID = ( SELECT d.MasterID FROM Duplicates d WHERE a.PlayerID = d.DuplicateID LIMIT 1 )
WHERE a.PlayerID in (SELECT d.DuplicateID FROM Duplicates d);
DELETE FROM Player
WHERE PlayerID IN ( SELECT d.DuplicateID FROM Duplicates d );
DROP TABLE Duplicates;
The problem is with the UPDATE query; I've included the other queries so you can get a better idea of what's going on. I think a CTE would be better here, but I don't know how to write it. I'm running this in MySQL at the moment, but I could use another SQL variant.
Thanks for the help

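MySQL cannot refer to a TEMPORARY table more than once in the same query, which is why the UPDATE with two subqueries on Duplicates fails. One method that avoids this uses a join: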
UPDATE app a JOIN
Duplicates d
ON a.PlayerID = d.DuplicateID
SET a.PlayerID = d.MasterID;
If d has multiple matches for a given row of a, a.PlayerID is set from an arbitrary one of them.
It is not a great idea to let multiple candidate rows update a single row, so you could aggregate before the join. MIN() is used here because the lowest PlayerID for a name never appears as a DuplicateID, so it is not removed by the later DELETE:
UPDATE app a JOIN
(SELECT d.DuplicateID, MIN(d.MasterID) as MasterID
FROM Duplicates d
GROUP BY d.DuplicateID
) d
ON a.PlayerID = d.DuplicateID
SET a.PlayerID = d.MasterID;
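To see what the aggregate picks when a name occurs three times, here is a minimal sketch that rebuilds the Duplicates pairs in an in-memory SQLite database through Python's sqlite3 module. The sample players are made up, and SQLite is only an assumption to make the sketch runnable; it is not the MySQL setup from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data: three players share the name 'Ann'.
    CREATE TABLE Player (PlayerID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Player VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Ann'), (4, 'Bob');

    CREATE TABLE Duplicates (MasterID INT NOT NULL, DuplicateID INT NOT NULL);
    INSERT INTO Duplicates (MasterID, DuplicateID)
    SELECT p1.PlayerID, p2.PlayerID
    FROM Player p1
    JOIN Player p2 ON p1.Name = p2.Name AND p1.PlayerID < p2.PlayerID;
""")

# Every (lower ID, higher ID) pair with the same name.
pairs = conn.execute(
    "SELECT MasterID, DuplicateID FROM Duplicates ORDER BY MasterID, DuplicateID"
).fetchall()

# One row per duplicate, with the smallest and largest candidate master.
per_duplicate = conn.execute("""
    SELECT DuplicateID, MIN(MasterID), MAX(MasterID)
    FROM Duplicates
    GROUP BY DuplicateID
    ORDER BY DuplicateID
""").fetchall()

print(pairs)          # [(1, 2), (1, 3), (2, 3)]
print(per_duplicate)  # [(2, 1, 1), (3, 1, 2)]
```

For player 3 the largest candidate master is 2, which is itself listed as a duplicate of 1, while the smallest candidate is 1, the lowest PlayerID for that name.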

Related

MySQL select max record from each group and insert into another table

There are 4 columns in table A, id, name, create_time and content.
create table A
(
id int primary key,
name varchar(20),
create_time datetime,
content varchar(4000)
);
create table B like A;
I want to select the rows with the max create_time for each name and insert them into another table B.
I run the following SQL, but it takes unacceptably long.
insert into B
select A.*
from A,
(select name, max(create_time) create_time from A group by name) tmp
where A.name = tmp.name
and A.create_time = tmp.create_time;
Table A has 10 million rows (about 10 GB), and the statement takes 200 s.
Is there any way to do this faster, or are there MySQL Server parameters I can change to make it run faster?
P.S.:
Table A can be of any type, a partitioned table or something else.
First be sure you have proper indexes on A (name, create_time) and B (name, create_time),
then try using an explicit join with an ON condition:
insert into B
select A.*
from A
inner join (
select name, max(create_time) create_time
from A
group by name) tmp on ( A.name = tmp.name and A.create_time = tmp.create_time);
The query you need is:
INSERT INTO B
SELECT m.*
FROM A m # m from "max"
LEFT JOIN A l # l from "later"
ON m.name = l.name # the same name
AND m.create_time < l.create_time # "l" was created later than "m"
WHERE l.name IS NULL # there is no "later"
How it works:
It joins A aliased as m (from "max") against itself aliased as l (from "later" than "max"). The LEFT JOIN ensures that, in the absence of a WHERE clause, all the rows from m are present in the result set. Each row from m is combined with all rows from l that have the same name (m.name = l.name) and were created after the row from m (m.create_time < l.create_time). The WHERE condition keeps in the result set only the rows from m that have no match in l (there is no record with the same name and a greater creation time).
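The self-join described above can be tried on a tiny data set. This is a minimal sketch that runs the same INSERT ... SELECT in an in-memory SQLite database via Python's sqlite3 module; the three sample rows are invented, and the datetime values are stored as ISO-formatted text, which is an assumption that keeps the `<` comparison chronological in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT, create_time TEXT, content TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, name TEXT, create_time TEXT, content TEXT);

    -- Hypothetical sample rows: two versions of 'x', one of 'y'.
    INSERT INTO A VALUES
        (1, 'x', '2020-01-01 00:00:00', 'old x'),
        (2, 'x', '2020-02-01 00:00:00', 'new x'),
        (3, 'y', '2020-01-15 00:00:00', 'only y');

    -- Keep each row m for which no later row l with the same name exists.
    INSERT INTO B
    SELECT m.*
    FROM A m
    LEFT JOIN A l
           ON m.name = l.name
          AND m.create_time < l.create_time
    WHERE l.name IS NULL;
""")

latest = conn.execute("SELECT id, name, content FROM B ORDER BY id").fetchall()
print(latest)  # [(2, 'x', 'new x'), (3, 'y', 'only y')]
```

Row 1 is dropped because row 2 has the same name and a later create_time; rows 2 and 3 have no later match, so they are the ones copied into B.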
Discussion
If more than one row in A has the same name and create_time, the query returns all of them. To keep only one of them, an additional condition is required.
Replace the create_time comparison in the ON clause with:
(m.create_time < l.create_time OR (m.create_time = l.create_time AND m.id < l.id))
The outer parentheses matter: AND binds more tightly than OR, so without them the tie-break would no longer be restricted to rows with the same name. Adjust or replace the m.id < l.id part of the condition to suit your needs (this version keeps the row with the highest id among the tied rows).
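A small sketch of the tie-break, again using an in-memory SQLite database through Python's sqlite3 module as an assumed stand-in for MySQL. The sample rows are hypothetical, and the tie-break is grouped in parentheses together with the time comparison so that the name condition applies to both branches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT, create_time TEXT);
    -- Hypothetical rows: ids 1 and 2 tie on the latest create_time.
    INSERT INTO A VALUES
        (1, 'x', '2020-02-01 00:00:00'),
        (2, 'x', '2020-02-01 00:00:00'),
        (3, 'x', '2020-01-01 00:00:00');
""")

# Without a tie-break, both tied rows survive the anti-join.
without_tiebreak = conn.execute("""
    SELECT m.id
    FROM A m
    LEFT JOIN A l
           ON m.name = l.name
          AND m.create_time < l.create_time
    WHERE l.name IS NULL
    ORDER BY m.id
""").fetchall()

# With the tie-break, a tied row is also beaten by a row with a higher id.
with_tiebreak = conn.execute("""
    SELECT m.id
    FROM A m
    LEFT JOIN A l
           ON m.name = l.name
          AND (m.create_time < l.create_time
               OR (m.create_time = l.create_time AND m.id < l.id))
    WHERE l.name IS NULL
    ORDER BY m.id
""").fetchall()

print(without_tiebreak)  # [(1,), (2,)]
print(with_tiebreak)     # [(2,)]
```

With m.id < l.id the surviving row is the one with the highest id among the ties; flipping the comparison to m.id > l.id would keep the lowest id instead.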
Make sure the table A has indexes on the columns used by the query (name and create_time). Otherwise the performance improvement compared with your original query is not significant.

MySQL: delete rows with "WHERE ... NOT IN" from only one single table

In a MySQL database I have a many-to-many relationship between two tables. For the sake of simplicity let's assume those tables map homes and their residents. I have a third table to map those relations (home_resident_relations). The latter table has an additional column datemodified that stores the date of the latest update of each row via triggers.
Now I want to get rid of all former residents for each home and only keep the current ones - that is those with the newest date.
I already have a working SELECT statement that lists all the old relations I want to delete:
SELECT * FROM `home_resident_relations` WHERE `resident_id` NOT IN
(SELECT tbl.`resident_id`
FROM `home_resident_relations` tbl
WHERE tbl.`datemodified` =
(SELECT max(tbl2.`datemodified`)
FROM `home_resident_relations` tbl2
WHERE tbl2.`home` = tbl.`home`
GROUP BY tbl2.`home`)
OR tbl.`datemodified` IS NULL
);
Now it would be a straight-forward idea to simply replace the SELECT * with a DELETE command to remove all those rows. However, this does not work due to error
#1093 - You can't specify target table 'home_resident_relations' for update in FROM clause
So here's my question:
How do I delete from a table while using it in the WHERE ... NOT IN clause?
Use a left join instead:
DELETE hrr
FROM `home_resident_relations` hrr LEFT JOIN
(SELECT tbl.`resident_id`
FROM `home_resident_relations` tbl
WHERE tbl.`datemodified` = (SELECT max(tbl2.`datemodified`)
FROM `home_resident_relations` tbl2
WHERE tbl2.`home` = tbl.`home`
GROUP BY tbl2.`home`
) OR
tbl.`datemodified` IS NULL
) tt
ON hrr.resident_id = tt.resident_id
WHERE tt.resident_id IS NULL;
This works for both the SELECT and DELETE.
Try using DELETE with join:
DELETE `home_resident_relations` FROM `home_resident_relations`
LEFT OUTER JOIN
(SELECT tbl.`resident_id`
FROM `home_resident_relations` tbl
WHERE tbl.`datemodified` =
(SELECT max(tbl2.`datemodified`)
FROM `home_resident_relations` tbl2
WHERE tbl2.`home` = tbl.`home` )
OR tbl.`datemodified` IS NULL) s
ON(s.`resident_id` = `home_resident_relations`.`resident_id`)
WHERE s.`resident_id` is null

Eliminating duplicates from SQL query

What would be the best way to return one item from each id instead of all of the other items within the table. Currently the query below returns all manufacturers
SELECT m.name
FROM `default_ps_products` p
INNER JOIN `default_ps_products_manufacturers` m ON p.manufacturer_id = m.id
I have solved my question by using the DISTINCT value in my query:
SELECT DISTINCT m.name, m.id
FROM `default_ps_products` p
INNER JOIN `default_ps_products_manufacturers` m ON p.manufacturer_id = m.id
ORDER BY m.name
There are 4 main ways I can think of to delete duplicate rows (the examples use Oracle syntax: rowid, nvl, rename).
Method 1
Delete all rows with a rowid greater than the smallest (or less than the greatest) rowid for the key. Example:
delete from tableName a where rowid > (select min(rowid) from tableName b where a.key1 = b.key1 and a.key2 = b.key2)
Method 2
Usually faster, but you must recreate all indexes, constraints and triggers afterward.
Copy the distinct rows into a new table, then drop the first table and rename the new table to the old table name. Example:
create table t2 as select distinct * from t1; drop table t1; rename t2 to t1;
Method 3
Delete using WHERE EXISTS based on rowid. Example:
delete from tableName a where exists (select 'x' from tableName b where a.key1 = b.key1 and a.key2 = b.key2 and b.rowid > a.rowid)
Note: if the key columns can contain nulls, wrap them in nvl.
Method 4
Collect the first row for each key value and delete the rows not in this set. Example:
delete from tableName a where rowid not in (select min(rowid) from tableName b group by key1, key2)
Note that you don't have to use nvl for method 4.
Using DISTINCT is often bad practice. It may be a sign that there is something wrong with your SELECT statement, or that your data structure is not normalized.
In your case I would use this (assuming that default_ps_products_manufacturers has unique records).
SELECT m.id, m.name
FROM default_ps_products_manufacturers m
WHERE EXISTS (SELECT 1 FROM default_ps_products p WHERE p.manufacturer_id = m.id)
Or an equivalent query with IN:
SELECT m.id, m.name
FROM default_ps_products_manufacturers m
WHERE m.id IN (SELECT p.manufacturer_id FROM default_ps_products p)
One caveat: among all the possible queries, it is better to pick the one with the best execution plan, which may depend on your vendor and on the physical structure, statistics, etc. of your database.
I think in most cases EXISTS will work better.
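To check that the EXISTS form returns the same manufacturers as the DISTINCT form, here is a minimal sketch using an in-memory SQLite database via Python's sqlite3 module. The table names come from the question; the sample rows and the reduced column lists are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE default_ps_products_manufacturers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE default_ps_products (id INTEGER PRIMARY KEY, manufacturer_id INT);

    -- Hypothetical data: 'Acme' has two products, 'Unused' has none.
    INSERT INTO default_ps_products_manufacturers VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Unused');
    INSERT INTO default_ps_products VALUES (10, 1), (11, 1), (12, 2);
""")

# Plain join: one output row per product, so 'Acme' repeats.
plain_join = conn.execute("""
    SELECT m.name
    FROM default_ps_products p
    INNER JOIN default_ps_products_manufacturers m ON p.manufacturer_id = m.id
    ORDER BY m.name
""").fetchall()

# DISTINCT removes the repeats after the join has produced them.
with_distinct = conn.execute("""
    SELECT DISTINCT m.id, m.name
    FROM default_ps_products p
    INNER JOIN default_ps_products_manufacturers m ON p.manufacturer_id = m.id
    ORDER BY m.name
""").fetchall()

# EXISTS never produces the repeats in the first place.
with_exists = conn.execute("""
    SELECT m.id, m.name
    FROM default_ps_products_manufacturers m
    WHERE EXISTS (SELECT 1 FROM default_ps_products p WHERE p.manufacturer_id = m.id)
    ORDER BY m.name
""").fetchall()

print(plain_join)     # [('Acme',), ('Acme',), ('Bolt',)]
print(with_distinct)  # [(1, 'Acme'), (2, 'Bolt')]
print(with_exists)    # [(1, 'Acme'), (2, 'Bolt')]
```

Both forms agree on this data; which one is faster on a real database depends on the execution plan, as noted above.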

MYSQL delete all results having count(*)=1

I have a table taged with two fields, sesskey (varchar(32), indexed) and products (int(11)). I need to delete all rows whose sesskey occurs only once, i.e. where grouping by sesskey gives count(*) = 1.
I've tried a few methods, but they all fail.
Example:
delete from taged where sesskey in (select sesskey from taged group by sesskey having count(*) = 1)
The sesskey field cannot be a primary key because its values repeat.
DELETE si
FROM t_session si
JOIN (
SELECT sesskey
FROM t_session so
GROUP BY
sesskey
HAVING COUNT(*) = 1
) q
ON q.sesskey = si.sesskey
You need to have a join here. Using a correlated subquery won't work.
See this article in my blog for more detail:
Keeping rows
Or if you're using an older (pre-4.1) version of MySQL and don't have access to subqueries, you need to select your data into a table, then join that table with the original:
CREATE TABLE delete_me_table (sesskey varchar(32), cur_total int);
INSERT INTO delete_me_table SELECT sesskey, count(*) as cur_total FROM orig_table
GROUP BY sesskey HAVING count(*) = 1;
DELETE orig_table FROM orig_table INNER JOIN delete_me_table USING (sesskey);
Now you have a table left over named delete_me_table which contains a history of all the rows you deleted. You can use this for archiving, trending, other fun and unusual things to surprise yourself with.
The SubQuery should work
Delete from taged
Where sesskey in
(Select sesskey
From taged
Group by sesskey
Having count(*) = 1)
EDIT: Thanks to @Quassnoi's comment below. The above will NOT work in MySQL, as MySQL restricts referencing the table being updated or deleted from in a subquery; you must do the same thing using a join.
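Besides a join, a commonly used workaround for this restriction is to wrap the grouped subquery in a derived table, so that the DELETE reads from the derived result rather than directly from the target table. The sketch below only checks the logic of that statement in an in-memory SQLite database via Python's sqlite3 module; SQLite does not have the MySQL restriction, the sample rows are made up, and whether MySQL accepts the statement should be verified on your version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taged (sesskey TEXT, products INT);
    -- Hypothetical data: 'b' is the only sesskey that occurs once.
    INSERT INTO taged VALUES
        ('a', 1), ('a', 2),
        ('b', 3),
        ('c', 4), ('c', 5), ('c', 6);

    -- The inner grouped query is wrapped in a derived table named q.
    DELETE FROM taged
    WHERE sesskey IN (
        SELECT sesskey
        FROM (
            SELECT sesskey
            FROM taged
            GROUP BY sesskey
            HAVING COUNT(*) = 1
        ) AS q
    );
""")

remaining = conn.execute(
    "SELECT sesskey, products FROM taged ORDER BY sesskey, products"
).fetchall()
print(remaining)  # [('a', 1), ('a', 2), ('c', 4), ('c', 5), ('c', 6)]
```

Only the single-row group is removed; groups with two or more rows are left untouched.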

Merge and then Delete duplicate entries

I have a MySQL database with some duplicate entries. They share the same value in one field, phone, but differ in other fields. For example, I have two entries with the same phone, but the first entry has the rating field = default_value and the second entry has rating = 5.
So I must merge these entries and only then delete the duplicates.
More common example:
entry1.phone==123
entry1.phone==entry2.phone
entry1.rating!=entry2.rating
entry1.rating==default_value(0)
entry2.rating==5
merge
entry1.phone==123
entry1.rating==5
entry2 is deleted
I don't think you can do this in SQL efficiently. One slow way to do it is something like:
CREATE TEMPORARY TABLE tmp_table (...);
INSERT INTO tmp_table SELECT phone, max(rating) FROM table GROUP BY phone;
TRUNCATE table;
INSERT INTO table SELECT * FROM tmp_table;
A better way would be a stored procedure or an external script. Select all rows from the table ordered by phone and do the grouping/merging/deleting manually (iterate over the results, compare to the phone value from the previous row, if it's different you have a new group, etc.). Writing stored procedures in MySQL is painful though, so I'm not going to write the code for you. :)
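The external-script approach described above could look roughly like this. It is a minimal sketch in Python with an in-memory SQLite database standing in for the real one; the table name entries, its columns, the sample rows, and the rule "keep the highest rating on the first row of each phone group" are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and rows; rating 0 plays the role of the default value.
    CREATE TABLE entries (id INTEGER PRIMARY KEY, phone TEXT, rating INT DEFAULT 0);
    INSERT INTO entries VALUES (1, '123', 0), (2, '123', 5), (3, '456', 2);
""")


def merge_duplicates(conn):
    """Walk the rows ordered by phone, merge each group into its first row."""
    rows = conn.execute(
        "SELECT id, phone, rating FROM entries ORDER BY phone, id"
    ).fetchall()
    prev_phone = object()  # sentinel that never equals a real phone value
    keep_id = None
    for row_id, phone, rating in rows:
        if phone != prev_phone:
            # A different phone starts a new group; this row is the one kept.
            prev_phone, keep_id = phone, row_id
            continue
        # Same phone as the kept row: fold the rating in, then drop the row.
        conn.execute(
            "UPDATE entries SET rating = MAX(rating, ?) WHERE id = ?",
            (rating, keep_id),
        )
        conn.execute("DELETE FROM entries WHERE id = ?", (row_id,))
    conn.commit()


merge_duplicates(conn)
merged = conn.execute("SELECT id, phone, rating FROM entries ORDER BY id").fetchall()
print(merged)  # [(1, '123', 5), (3, '456', 2)]
```

The rows are fetched up front, so updating and deleting inside the loop does not disturb the iteration. Other fields would need their own merge rule in place of MAX.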
It sounds like you don't really need to merge any records if you are just trying to update the first record with the non-default rating. I think you can just delete any records with the default rating.
Select a.*
from tbl a
inner join tbl b
on a.Phone = b.Phone
and a.Rating < b.Rating
Delete a
from tbl a
inner join tbl b
on a.Phone = b.Phone
and a.Rating < b.Rating
If you truly have to update the first record and delete the second record, you can do something similar if you have an autoincrement ID. The next example is what I would do to update the first record if an ID exists. This is only reliable if you only have phone numbers duplicated one time.
Update tbl a
inner join tbl b
on a.Phone = b.Phone
and a.Rating < b.Rating
and a.ID < b.ID
Set a.Rating = b.Rating;
Delete b
from tbl a
inner join tbl b
on a.Phone = b.Phone
and a.Rating = b.Rating
and b.ID > a.ID;
Hope this helps.
-Ranthalion