Deleting duplicate rows on MySQL (leaving at least one) - mysql

I have this table in MySQL:
id_word
lang_original (the language from the original word) VARCHAR(2)
lang_target (the language from the translated word) VARCHAR(2)
word (the word itself) VARCHAR(50)
translation (the translation) VARCHAR(50)
There should be no duplicates. Is it possible to write a SQL query that finds duplicates and deletes them (leaving the first match undeleted)?
Update: a duplicate is any row that has the same lang_original, lang_target and word (only those 3 fields).

It's simpler to create a new table. Previous answers are good, but I like it this way:
Create a new table with a unique key on (lang_original, lang_target, word):
CREATE TABLE new_table_can_be_renamed_later (
..your-fields...
UNIQUE KEY uq_word (lang_original, lang_target, word)
);
Then fill the new table by selecting from the old one, using IGNORE in the INSERT so that rows violating the unique key are skipped:
INSERT IGNORE INTO new_table_can_be_renamed_later
SELECT * FROM original_table
Please check the MySQL docs for the exact syntax.
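A minimal sketch of the whole swap, assuming the source table is called original_table as above. The names words_dedup, original_table_old and uq_word are hypothetical, chosen only for illustration. The ORDER BY is there so that, within each duplicate group, the row with the smallest id_word is inserted first and is therefore the one that survives.

```sql
-- Copy the structure (columns and existing indexes) of the original table.
CREATE TABLE words_dedup LIKE original_table;

-- Add the unique key that defines what counts as a duplicate.
ALTER TABLE words_dedup
  ADD UNIQUE KEY uq_word (lang_original, lang_target, word);

-- Rows that collide with an already inserted row are skipped by IGNORE.
INSERT IGNORE INTO words_dedup
SELECT * FROM original_table
ORDER BY id_word;

-- Swap the tables in one statement; keep the old data until verified.
RENAME TABLE original_table TO original_table_old,
             words_dedup    TO original_table;
```

Keeping original_table_old around until the row counts have been checked makes the operation easy to undo.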

Could work like this:
DELETE FROM tbl
WHERE EXISTS (
SELECT *
FROM tbl t
WHERE (t.lang_original, t.lang_target, t.word)
= (tbl.lang_original, tbl.lang_target, tbl.word)
AND tbl.id_word > t.id_word
)
If @Jason is right, and MySQL does not allow the subquery to reference the table being deleted from, here is another form that works independently. It uses MySQL's multi-table DELETE and joins against a derived table:
DELETE FROM tbl
USING tbl
JOIN (
SELECT min(id_word) AS min_id, lang_original, lang_target, word
FROM tbl t
GROUP BY lang_original, lang_target, word
HAVING count(*) > 1
) x
ON (tbl.lang_original, tbl.lang_target, tbl.word)
= ( x.lang_original, x.lang_target, x.word)
WHERE tbl.id_word > x.min_id
Both variants leave the duplicate with the smallest id alive and kill the rest.
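Before running a DELETE it can help to look at the rows that would go. The following is a sketch using a plain self-join, which MySQL accepts for both the preview and the delete. The aliases d (row to delete) and k (row to keep) are just illustrative.

```sql
-- Preview: every row that has a twin with a smaller id_word.
SELECT DISTINCT d.*
FROM tbl AS d
JOIN tbl AS k
  ON  k.lang_original = d.lang_original
  AND k.lang_target   = d.lang_target
  AND k.word          = d.word
  AND k.id_word       < d.id_word;

-- Same join as a multi-table DELETE: only rows aliased d are removed.
DELETE d
FROM tbl AS d
JOIN tbl AS k
  ON  k.lang_original = d.lang_original
  AND k.lang_target   = d.lang_target
  AND k.word          = d.word
  AND k.id_word       < d.id_word;
```

Keep in mind that with a case-insensitive collation, 'Word' and 'word' compare equal in these joins.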
If you want to first save all translations onto the row with the smallest id in each group of dupes (widen the translation column beforehand if the combined text can exceed 50 characters):
UPDATE tbl
JOIN (
SELECT min(id_word) AS min_id, group_concat(translation) AS all_trans
FROM tbl
GROUP BY lang_original, lang_target, word
HAVING count(*) > 1
) t ON tbl.id_word = t.min_id
SET tbl.translation = t.all_trans

I'm not sure that you can do that.
You are probably better off doing something like
CREATE TABLE yournewtable SELECT DISTINCT * FROM originaltable
That may work, but only for rows that are identical in every column, including id_word.

Related

What sql query to use for only deleting duplicate results for wp_comments table?

I need to finish the SELECT query below. It shows me the count of comments with the same comment_ID. Ultimately I just want to delete the duplicates and leave the non-duplicates alone. This is a WordPress database.
SELECT `comment_ID`, `comment_ID`, count(*) FROM `wp_comments` GROUP BY `comment_ID` HAVING COUNT(*) > 1 ORDER BY `count(*)` ASC
First back up your bad table in case you goof something up.
CREATE TABLE wp_comments_bad_backup SELECT * FROM wp_comments;
Do you actually have duplicate records here (duplicate in all columns) ? If so, try this
CREATE TABLE wp_comments_deduped SELECT DISTINCT * FROM wp_comments;
RENAME TABLE wp_comments TO wp_comments_not_deduped;
RENAME TABLE wp_comments_deduped TO wp_comments;
If they don't have exactly the same contents and you don't care which contents you keep from each pair of duplicate rows, try something like this:
CREATE TABLE wp_comments_deduped
SELECT comment_ID,
MAX(comment_post_ID) comment_post_ID,
MAX(comment_author) comment_author,
MAX(comment_author_email) comment_author_email,
MAX(comment_author_url) comment_author_url,
MAX(comment_author_IP) comment_author_IP,
MAX(comment_date) comment_date,
MAX(comment_date_gmt) comment_date_gmt,
MAX(comment_content) comment_content,
MAX(comment_karma) comment_karma,
MAX(comment_approved) comment_approved,
MAX(comment_agent) comment_agent,
MAX(comment_type) comment_type,
MAX(comment_parent) comment_parent,
MAX(user_id) user_id
FROM wp_comments
GROUP BY comment_ID;
RENAME TABLE wp_comments TO wp_comments_not_deduped;
RENAME TABLE wp_comments_deduped TO wp_comments;
Then you'll need to double-check that the deduplication worked; this should return no rows:
SELECT comment_ID, COUNT(*) num FROM wp_comments GROUP BY comment_ID HAVING COUNT(*) > 1;
Then, once you're happy with it, put back WordPress's indexes.
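CREATE TABLE ... SELECT copies the columns and data but not the indexes or the AUTO_INCREMENT attribute. Here is a sketch of putting the primary key back, assuming the stock WordPress definition of comment_ID (unsigned bigint, auto-increment). Compare against the output of SHOW CREATE TABLE and recreate the secondary indexes listed there as well.

```sql
-- See the original definition, including every index, for reference.
SHOW CREATE TABLE wp_comments_not_deduped;

-- Restore the primary key and auto-increment on the deduplicated table.
-- The column type is an assumption; copy it from the output above.
ALTER TABLE wp_comments
  MODIFY comment_ID BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  ADD PRIMARY KEY (comment_ID);
```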
Pro tip: Use a plugin like Duplicator when you migrate from one WordPress setup to another; its authors have sorted out all this data migration for you.
I would recommend adding an auto-increment unique key to the table, called tempid, so that you can tell the rows within one duplicate set apart. Use the query below to remove the duplicate copies, and drop the `tempid` column at the end:
DELETE `c`
FROM `wp_comments` AS `c`
JOIN (
SELECT `comment_ID`, MIN(`tempid`) AS `keep_id`
FROM `wp_comments`
GROUP BY `comment_ID`
HAVING COUNT(*) > 1
) AS `dups`
ON `dups`.`comment_ID` = `c`.`comment_ID`
WHERE `c`.`tempid` > `dups`.`keep_id`
I'm not clear on why the query selects comment_ID twice from the same table, but I believe this will delete only the first of the two identical records. Before running a DELETE statement, however, be sure to make a backup of the original table. MySQL has no DELETE TOP; LIMIT does the same job, one comment_ID at a time (123 is a placeholder for a duplicated comment_ID reported by your GROUP BY query):
DELETE
FROM
`wp_comments`
WHERE
`comment_ID` = 123
LIMIT 1

How to delete rows from a table in database X, where the ID exists in Database Y

I've got 2 mysql 5.7 databases hosted on the same server (we're migrating from 1 structure to another)
I want to delete all the rows from database1.table_x where there is a corresponding row in database2.table_y
The column which contains the data to match on is called code
I'm able to do a SELECT which returns everything that is expected - this is effectively the set of data I want to delete.
An example select would be:
SELECT *
FROM `database1`.`table_x`
WHERE `code` NOT IN (SELECT `code`
FROM `database2`.`table_y`);
This works and returns 5 rows within 138 ms.
However, if I change the SELECT to a DELETE, e.g.:
DELETE
FROM `database1`.`table_x`
WHERE `code` NOT IN (SELECT `code`
FROM `database2`.`table_y`);
The query seems to hang: no errors are returned, and I have to cancel it manually after about 3 minutes.
Could anyone advise the most efficient/fastest way to achieve this?
Try it like below; wrapping the subquery in a derived table should work:
DELETE FROM table_a WHERE `code` NOT IN (
select * from
(
SELECT `code` FROM `second_database`.`table_b`
) as t
);
Try the following query:
DELETE a
FROM first_database.table_a AS a
LEFT JOIN second_database.table_b AS b ON b.code = a.code
WHERE b.code IS NULL;
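A NOT EXISTS form of the same anti-join, sketched with the table names from the question. The index is an assumption about what might be missing: without an index on code in table_y, every candidate row can force a scan of that table. idx_table_y_code is a hypothetical name.

```sql
-- Hypothetical index so each lookup on code is cheap (skip if one exists).
CREATE INDEX idx_table_y_code ON database2.table_y (code);

-- Delete rows of table_x that have no row with the same code in table_y.
DELETE x
FROM database1.table_x AS x
WHERE NOT EXISTS (
    SELECT 1
    FROM database2.table_y AS y
    WHERE y.code = x.code
);
```

Unlike NOT IN, this form still deletes the expected rows when table_y.code contains NULLs.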

Before Insert MySql trigger replace new.id

I have the following scenario: I have a product_category table that manage relationships between products and categories with these columns:
id, product_id, category_id
I have another table (cat_eq) that groups some categories with another category with these columns:
id, mykey, source_cat_id, destiny_cat_id
source_cat_id column is a VARCHAR that store comma separated id's. For example: 12,25
I need to write a trigger that before inserting in product_category table checks if the new.category_id is in the set that results when making a SELECT given some mykey , for example:
if(new.category_id in
(select source_cat_id from cat_eq where clave = 'man') )
then
set new.category_id:=(select destiny_cat_id from cat_eq where clave = 'man');
When source_cat_id holds more than one value, say 12,15, the trigger works only if new.category_id is the first value in the list.
What I want is the equivalent of this, which works fine in the trigger:
if(new.category_id in (12,25) )
then
set new.category_id:=(select destiny_cat_id from cat_eq where clave = 'man');
How can I do this?
Thanks!
I think REGEXP is the easy solution here:
IF EXISTS (
select 1 from cat_eq where clave = 'man' AND
source_cat_id REGEXP CONCAT('(^|,)', new.category_id, '(,|$)')
) THEN
Basically, the pattern matches the category_id when it is preceded by either the start of the string or a comma, and followed by either a comma or the end of the string.
A second approach might just be a LIKE clause:
select 1 from cat_eq where clave = 'man'
AND CONCAT(',',source_cat_id,',') LIKE CONCAT('%,',new.category_id,',%')
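For completeness, here is a sketch of a full trigger. It swaps in MySQL's FIND_IN_SET function, which is not what the answers above use but is built for comma-separated lists. It assumes the list contains no spaces, that destiny_cat_id is an integer, and that the key column is called clave as in the question's code. The trigger name product_category_bi is hypothetical, and DELIMITER is a mysql client command rather than SQL.

```sql
DELIMITER //

CREATE TRIGGER product_category_bi
BEFORE INSERT ON product_category
FOR EACH ROW
BEGIN
    -- Assumed integer type; adjust to match destiny_cat_id.
    DECLARE v_target INT;

    -- FIND_IN_SET returns the 1-based position in the list, or 0 if absent.
    -- The scalar subquery yields NULL when no row matches.
    SET v_target = (
        SELECT destiny_cat_id
        FROM cat_eq
        WHERE clave = 'man'
          AND FIND_IN_SET(NEW.category_id, source_cat_id) > 0
        LIMIT 1
    );

    -- Replace the category only when a mapping was found.
    IF v_target IS NOT NULL THEN
        SET NEW.category_id = v_target;
    END IF;
END//

DELIMITER ;
```

The original IN (select source_cat_id ...) only matched the first value because MySQL converts the string '12,25' to the number 12 when comparing it with an integer.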

Table is specified twice, both as a target for 'UPDATE' and as a separate source for data in mysql

I have the query below in MySQL. I want to check whether the branch id and year of the 'finance' type rows in branch_master match the branch id and year in manager, and if so update status in the manager table for that branch id:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
SELECT m2.branch_id FROM manager as m2
WHERE (m2.branch_id,m2.year) IN (
(
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance'
)
)
)
but I get this error:
Table 'm1' is specified twice, both as a target for 'UPDATE' and as a separate source for data
This is a typical MySQL thing and can usually be circumvented by selecting from a derived table instead of the table itself, i.e. instead of
FROM manager AS m2
use
FROM (select * from manager) AS m2
The complete statement:
UPDATE manager
SET status = 'Y'
WHERE branch_id IN
(
select branch_id
FROM (select * from manager) AS m2
WHERE (branch_id, year) IN
(
SELECT branch_id, year
FROM branch_master
WHERE type = 'finance'
)
);
The correct answer is in this SO post.
The problem with the accepted answer here is, as was already mentioned multiple times, that it creates a full copy of the whole table. This is far from optimal and the most space-hungry option. The idea is to materialize only the subset of data used for the update, so in your case it would be like this:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
SELECT * FROM(
SELECT m2.branch_id FROM manager as m2
WHERE (m2.branch_id,m2.year) IN (
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance')
) t
)
Basically you just encapsulate your previous source for data query inside of
SELECT * FROM (...) t
Try to use the EXISTS operator:
UPDATE manager as m1
SET m1.status = 'Y'
WHERE EXISTS (SELECT 1
FROM (SELECT m2.branch_id
FROM branch_master AS bm
JOIN manager AS m2
WHERE bm.type = 'finance' AND
bm.branch_id = m2.branch_id AND
bm.year = m2.year) AS t
WHERE t.branch_id = m1.branch_id);
Note: The query uses an additional nesting level, as proposed by @Thorsten, as a means to circumvent the "Table is specified twice" error.
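Try this (note that it checks branch_id and year independently, so it can also match pairs that do not occur together in branch_master):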
UPDATE manager as m1
SET m1.status = 'Y'
WHERE m1.branch_id IN (
(SELECT DISTINCT branch_id
FROM branch_master
WHERE type = 'finance'))
AND m1.year IN ((SELECT DISTINCT year
FROM branch_master
WHERE type = 'finance'))
The problem I had with the accepted answer is that it creates a copy of the whole table, which wasn't an option for me: I tried to execute it, but after several hours I had to cancel it.
A very fast way, if you have a huge amount of data, is to create a temporary table:
Create TMP table (adjust the column types to match your manager table)
CREATE TEMPORARY TABLE tmp_manager
(branch_id bigint not null,
year int not null,
primary key (branch_id, year));
Populate TMP table with the (branch_id, year) pairs that qualify
insert into tmp_manager (branch_id, year)
select distinct m.branch_id, m.year
from manager m
join branch_master bm on bm.branch_id = m.branch_id and bm.year = m.year
where bm.type = 'finance';
Update with join
UPDATE manager as m
inner JOIN tmp_manager as tmp_m on tmp_m.branch_id = m.branch_id and tmp_m.year = m.year
SET m.status = 'Y';
This is by far the fastest way:
UPDATE manager m
INNER JOIN branch_master b on m.branch_id=b.branch_id AND m.year=b.year
SET m.status='Y'
WHERE b.type='finance'
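Note that if it is a 1:n relationship, a manager row can match several branch_master rows. MySQL's multi-table UPDATE still updates each matching row only once, so a constant assignment like this one is safe. If the SET expression read values from b, however, which of the matching rows supplies them would not be defined.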
Maybe not a solution, but some thoughts about why it doesn't work in the first place:
Reading data from a table and also writing data into that same table is somewhat an ill-defined task. In what order should the data be read and written? Should newly written data be considered when reading it back from the same table? MySQL refusing to execute this isn't just because of a limitation, it's because it's not a well-defined task.
The solutions involving SELECT ... FROM (SELECT * FROM table) AS tmp just dump the entire content of a table into a temporary table, which can then be used in any further outer queries, like for example an update query. This forces the order of operations to be: Select everything first into a temporary table and then use that data (instead of the data from the original table) to do the updates.
However if the table involved is large, then this temporary copying is going to be incredibly slow. No indexes will ever speed up SELECT * FROM table.
I might be having a slow day today... but isn't the original query identical to this one, which shouldn't have any problems?
UPDATE manager as m1
SET m1.status = 'Y'
WHERE (m1.branch_id, m1.year) IN (
SELECT DISTINCT branch_id,year
FROM `branch_master`
WHERE type = 'finance'
)

sql query for deleting rows with NOT IN using 2 columns

I have a table with a composite key made up of 2 columns, say Name and ID. A service gives me the keys (name, id combinations) of the rows to keep; the rest I need to delete. If the key were only 1 column, I could use
delete from table_name where name not in (list_of_valid_names)
but how do I make the query so that I can say something like
name not in (valid_names) and id not in(valid_ids)
// this won't work since separately they don't identify a unique record, or will it?
Use MySQL's row constructor ("multiple value") IN syntax:
delete from table_name
where (name, id) not in (select name, id from some_table where some_condition);
If your list is a literal list, you can still use this approach:
delete from table_name
where (name, id) not in (select 'john', 1 union select 'sally', 2);
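When the list of keys comes from application code, the pairs can also be written directly as row constructors, without the UNION. A sketch with the same two example pairs:

```sql
-- Keep ('john', 1) and ('sally', 2); delete every other (name, id) pair.
DELETE FROM table_name
WHERE (name, id) NOT IN (('john', 1), ('sally', 2));
```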
Actually, no, I retract my comment about needing special juice or being stuck AND/OR-ing all your options.
Since you have a list of the values you want to retain, dump that into a temporary table. Then delete from the base table whatever does not exist in the temporary table (left outer join). I'm not good with MySQL syntax, so this pseudocode is approximate:
DELETE
B
FROM
BASE B
LEFT OUTER JOIN
RETAIN R
ON R.key1 = B.key1
AND R.key2 = B.key2
WHERE
R.key1 IS NULL
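The pseudocode above could look like this in MySQL. This is a sketch only: the table names base and retain, the column types, and the sample pairs are all assumptions for illustration.

```sql
-- Temporary table holding the keys to keep; types are assumptions.
CREATE TEMPORARY TABLE retain (
    key1 VARCHAR(50) NOT NULL,
    key2 INT NOT NULL,
    PRIMARY KEY (key1, key2)
);

-- Load the key pairs delivered by the service (sample values).
INSERT INTO retain (key1, key2)
VALUES ('john', 1), ('sally', 2);

-- Anti-join: base rows with no partner in retain are deleted.
DELETE b
FROM base AS b
LEFT JOIN retain AS r
  ON  r.key1 = b.key1
  AND r.key2 = b.key2
WHERE r.key1 IS NULL;
```

The primary key on retain lets the join look up each base row by index, which matters when the list of keys to keep is long.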
The NOT EXISTS version:
DELETE
b
FROM
BaseTable b
WHERE
NOT EXISTS
( SELECT
*
FROM
RetainTable r
WHERE
(r.key1, r.key2) = (b.key1, b.key2)
)