How to delete only duplicate rows? - mysql

How can I write an SQL command that deletes rows from a table where two or more specific columns have the same values as another row, so that one row of each group is kept and only the duplicates are deleted?
For example:
Id value1 value2
1 71 5
2 8 8
3 8 8
4 8 8
5 23 26
Id2, Id3 and Id4 have the same value1 and value2.
I need to delete all duplicate rows but one, i.e. (Id3 and Id4) or (Id2 and Id4) or (Id2 and Id3).

delete t
from table1 t
inner join table1 t2
on t.id>t2.id and t.value1=t2.value1 and t.value2=t2.value2

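MySQL allows ungrouped columns in a GROUP BY query only when ONLY_FULL_GROUP_BY is disabled (it is enabled by default since MySQL 5.7.5), so pick the id to keep explicitly with MIN():
CREATE TEMPORARY TABLE ids AS
(SELECT MIN(id) AS id
FROM your_table
GROUP BY value1, value2);
DELETE FROM your_table
WHERE id NOT IN (SELECT id FROM ids);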
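To check the keep-the-lowest-id idea against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here, and the function name dedupe_keep_lowest_id is made up for illustration. SQLite accepts the direct NOT IN (SELECT MIN(id) ...) form; MySQL rejects a subquery on the table being deleted from (error 1093), which is why the answers above use a self-join or a temporary table instead.

```python
import sqlite3


def dedupe_keep_lowest_id(rows):
    """Delete duplicate (value1, value2) rows, keeping the lowest id per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (id INTEGER PRIMARY KEY, value1 INTEGER, value2 INTEGER)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    # MIN(id) names exactly one survivor for every (value1, value2) group;
    # every other id in the group is deleted.
    conn.execute(
        """
        DELETE FROM table1
        WHERE id NOT IN (
            SELECT MIN(id) FROM table1 GROUP BY value1, value2
        )
        """
    )
    remaining = conn.execute(
        "SELECT id, value1, value2 FROM table1 ORDER BY id"
    ).fetchall()
    conn.close()
    return remaining


# Sample data from the question: ids 2, 3 and 4 are duplicates of each other.
sample = [(1, 71, 5), (2, 8, 8), (3, 8, 8), (4, 8, 8), (5, 23, 26)]
print(dedupe_keep_lowest_id(sample))
```

Rows 3 and 4 are removed and row 2 survives, which matches the "(Id3 and Id4)" option in the question.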

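What you can do is copy the distinct records into a new table. MySQL has no SELECT ... INTO NewTable (that is SQL Server syntax), and DISTINCT * would not remove anything because Id differs on every row, so select only the compared columns:
CREATE TABLE NewTable AS
SELECT DISTINCT value1, value2 FROM MyTable;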

Related

How to keep from inserting duplicated rows into a table?

I have a table MY_TABLE that contains records as follows:
NAME ID1 ID2 FLAG
JESSICA 12 34 TRUE
NULL 12 34 TRUE
I want to insert the values from the last 3 columns of MY_TABLE into another table TEST, but I don't want to duplicate the rows.
I am trying to do:
INSERT ALL
WHEN ID1 IS NOT NULL AND FLAG THEN
INTO TEST VALUES (
ID1,
ID2,
FLAG
)
SELECT *
FROM MY_TABLE
LEFT JOIN TEMP ON ID1;
This is resulting in my table looking like:
ID1 ID2 FLAG
12 34 TRUE
12 34 TRUE
instead of:
ID1 ID2 FLAG
12 34 TRUE
The issue I am running into is that these duplicated values for these last 3 columns are resulting from my join and I can't select only the last 3 columns in my query because I need the first column for another table I am inserting into as well (not shown and also is the reason why I need to use INSERT ALL here). Is there a way to solve this duplicate rows issue within the INSERT itself?
You can project a column with a row number for each group of rows and use the row number to decide which row to use in your multi-table insert.
create or replace temp table T1 as
select
COLUMN1::string as "NAME",
COLUMN2::string as "ID1",
COLUMN3::string as "ID2",
COLUMN4::string as "FLAG"
from (values
('JESSICA','12','34','TRUE'),
('NULL','12','34','TRUE')
);
select NAME
,ID1
,ID2
,FLAG
,row_number() over (partition by ID1, ID2, FLAG order by NAME nulls last) ROWNUMBER
from t1
;
That select will produce a result set like this:
NAME ID1 ID2 FLAG ROWNUMBER
JESSICA 12 34 TRUE 1
NULL 12 34 TRUE 2
In your multi-table insert, you can then key off the ROWNUMBER column:
INSERT ALL
WHEN ID1 IS NOT NULL AND FLAG AND ROWNUMBER = 1 THEN
[etc., etc.]
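As a rough illustration of keying off the row number, the sketch below runs the same ROW_NUMBER() idea in SQLite through Python's sqlite3 module. It is not the asker's multi-table INSERT ALL, only the filtering step. Assumptions: the bundled SQLite is 3.25 or newer (window functions), the function name first_row_per_group is hypothetical, and "ORDER BY name IS NULL, name" stands in for "nulls last".

```python
import sqlite3


def first_row_per_group(rows):
    """Return one (id1, id2, flag) tuple per group, chosen by row number 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (name TEXT, id1 TEXT, id2 TEXT, flag TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT id1, id2, flag
        FROM (
            SELECT id1, id2, flag,
                   ROW_NUMBER() OVER (
                       PARTITION BY id1, id2, flag
                       -- non-NULL names sort first, emulating NULLS LAST
                       ORDER BY name IS NULL, name
                   ) AS rownumber
            FROM t1
        )
        WHERE rownumber = 1
        ORDER BY id1, id2, flag
        """
    ).fetchall()
    conn.close()
    return result


# The two rows from the question; None is stored as SQL NULL.
rows = [("JESSICA", "12", "34", "TRUE"), (None, "12", "34", "TRUE")]
print(first_row_per_group(rows))
```

Only one row per (ID1, ID2, FLAG) combination passes the rownumber = 1 filter, so the duplicate produced by the join never reaches the target table.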

How to insert unique records from one table to another in MySQL

I've a sample table table1:
id transaction_number net_amount category type
1 100000 2000 A ZA
2 100001 4000 A ZA
3 100002 6000 B ZB
I've a sample table table2:
id transaction_number net_amount category type
1 100002 6000 B ZB
How do I insert unique records that are not in table2, but present in table1?
Desired result:
id transaction_number net_amount category type
1 100002 6000 B ZB
2 100000 2000 A ZA
3 100001 4000 A ZA
INSERT INTO table2 ( transaction_number, net_amount, category, type )
/* Rows in table1 that don't exist in table2: */
SELECT table1.transaction_number, table1.net_amount, table1.category, table1.type
FROM table1
LEFT JOIN table2 ON ( table1.transaction_number = table2.transaction_number )
WHERE table2.transaction_number IS NULL;
If you don't want to duplicate transaction numbers in table2, then create a unique index or constraint on that column (or the columns you want to be unique). Let the database handle the integrity of the data:
alter table table2 add constraint unq_table2_transaction_number
unique (transaction_number);
Then use on duplicate key update with a dummy update:
insert into table2 (transaction_number, net_amount, category, type)
select transaction_number, net_amount, category, type
from table1
on duplicate key update transaction_number = values(transaction_number);
Why do I recommend this approach? First, it is thread-safe, so it works even when multiple queries are modifying the database at the same time. Second, it puts the database in charge of data integrity, so the transactions will be unique regardless of how they are changed.
Note that MySQL 8.0.20 and later deprecate the VALUES() function inside ON DUPLICATE KEY UPDATE; the ON DUPLICATE KEY UPDATE clause itself is still supported. MySQL has no ON CONFLICT clause (that syntax belongs to PostgreSQL and SQLite).
Try this
INSERT INTO table2 (transaction_number,net_amount,category,type)
(SELECT transaction_number,net_amount,category,type from table1) ON DUPLICATE KEY UPDATE
net_amount=VALUES(net_amount),category=VALUES(category),type=VALUES(type);
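Use NOT EXISTS as follows (list the columns explicitly so that table1's id values are not copied into table2's id column):
INSERT INTO table2 (transaction_number, net_amount, category, type)
SELECT t1.transaction_number, t1.net_amount, t1.category, t1.type
FROM table1 t1
WHERE NOT EXISTS
(SELECT 1 FROM table2 t2
WHERE t1.transaction_number = t2.transaction_number);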
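The NOT EXISTS approach can be tried on the sample tables with the sketch below, which uses Python's sqlite3 module as a stand-in for MySQL. The helper names build_db and insert_missing are invented for this example, and the ORDER BY is only there to make the assigned ids predictable.

```python
import sqlite3

COLUMNS = (
    "(id INTEGER PRIMARY KEY, transaction_number INTEGER, "
    "net_amount INTEGER, category TEXT, type TEXT)"
)


def build_db():
    """Create table1 and table2 with the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 " + COLUMNS)
    conn.execute("CREATE TABLE table2 " + COLUMNS)
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
        [
            (1, 100000, 2000, "A", "ZA"),
            (2, 100001, 4000, "A", "ZA"),
            (3, 100002, 6000, "B", "ZB"),
        ],
    )
    conn.execute("INSERT INTO table2 VALUES (1, 100002, 6000, 'B', 'ZB')")
    return conn


def insert_missing(conn):
    """Copy rows whose transaction_number is not yet present in table2."""
    conn.execute(
        """
        INSERT INTO table2 (transaction_number, net_amount, category, type)
        SELECT t1.transaction_number, t1.net_amount, t1.category, t1.type
        FROM table1 t1
        WHERE NOT EXISTS (
            SELECT 1 FROM table2 t2
            WHERE t2.transaction_number = t1.transaction_number
        )
        ORDER BY t1.id
        """
    )


conn = build_db()
insert_missing(conn)
for row in conn.execute("SELECT * FROM table2 ORDER BY id"):
    print(row)
```

Running insert_missing a second time inserts nothing, because every transaction_number is then already present. Unlike a unique constraint, this check alone does not protect against concurrent writers.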

MySQL SELECT Substring of rows that do not exist from other TABLE

I am sure this has been answered before, but I am just learning mysql and so I do not know how to properly search for the solution. I have two tables:
Table1 Table2
id email id domain
-- ---- -- ----
1 name#domain1.com 1 domain1.com
2 name#domain2.com 2 domain4.com
3 name#domain3.com
4 name#domain4.com
Using the emails in Table1, I would like to return the domains that do not exist in table2, and then write them to Table2, so I have a complete, unique list of domains in Table 2.
Table1 Table2
id email id domain
-- ---- -- ----
1 name#domain1.com 1 domain1.com
2 name#domain2.com 2 domain4.com
3 name#domain3.com 3 domain2.com
4 name#domain4.com 4 domain3.com
You can achieve this using WHERE ... NOT IN with a subquery:
INSERT INTO Table2 ( domain )
SELECT DISTINCT SUBSTRING_INDEX(email,'#',-1)
FROM Table1
WHERE SUBSTRING_INDEX(email,'#',-1) NOT IN (SELECT domain FROM Table2);
Make the domain names a unique key in Table2, then use INSERT IGNORE with SELECT DISTINCT SUBSTRING_INDEX(...) from Table1:
ALTER TABLE Table2
ADD UNIQUE KEY k1 (domain);
INSERT IGNORE INTO Table2 (domain)
SELECT DISTINCT SUBSTRING_INDEX(email,'#',-1)
FROM Table1;
Here IGNORE makes MySQL skip rows that would raise errors such as duplicate-key violations instead of aborting the statement.
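For a quick check of the unique-key approach, here is a sketch in Python's sqlite3 module. SQLite differs from MySQL in two ways that matter here, so treat both as substitutions: it has no SUBSTRING_INDEX, so substr() plus instr() extracts the part after '#', and it spells the statement INSERT OR IGNORE rather than INSERT IGNORE. The function name sync_domains is hypothetical.

```python
import sqlite3


def sync_domains(emails, existing_domains):
    """Add every domain found in emails to Table2 unless it is already there."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (id INTEGER PRIMARY KEY, email TEXT)")
    # The UNIQUE constraint is what lets the database reject duplicates.
    conn.execute("CREATE TABLE Table2 (id INTEGER PRIMARY KEY, domain TEXT UNIQUE)")
    conn.executemany("INSERT INTO Table1 (email) VALUES (?)", [(e,) for e in emails])
    conn.executemany(
        "INSERT INTO Table2 (domain) VALUES (?)", [(d,) for d in existing_domains]
    )
    conn.execute(
        """
        INSERT OR IGNORE INTO Table2 (domain)
        SELECT DISTINCT substr(email, instr(email, '#') + 1)
        FROM Table1
        ORDER BY 1
        """
    )
    domains = [d for (d,) in conn.execute("SELECT domain FROM Table2 ORDER BY id")]
    conn.close()
    return domains


emails = [
    "name#domain1.com",
    "name#domain2.com",
    "name#domain3.com",
    "name#domain4.com",
]
print(sync_domains(emails, ["domain1.com", "domain4.com"]))
```

The two domains already in Table2 are skipped silently and the two missing ones are appended, so Table2 ends up with one row per domain.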

Delete duplicates where field1 and field2 are identical

I have a table like
productId retailerId
1 2
1 2
1 4
1 6
1 8
1 8
2 3
2 6
2 6
Now, I need to remove the duplicates. I've figured out how to remove duplicates when one field is the same. But I need to remove the duplicates such as 1 2, 1 8 and 2 6, where both fields are identical.
Any help would be very gratefully received.
Use MySQL's multiple-table DELETE syntax as follows:
delete mytable
from mytable
join mytable t
on t.productId = mytable.productId
and t.retailerId = mytable.retailerId
and t.id < mytable.id
See this running on SQLFiddle.
Note that I have assumed that you have an id column as well.
Edit:
Since there is no id column, the simplest approach is to copy the desired data to a temporary table, delete all data, then copy it back, as follows:
CREATE TEMPORARY TABLE temptable
SELECT DISTINCT productId, retailerId
FROM mytable;
DELETE FROM mytable;
INSERT INTO mytable
SELECT *
FROM temptable;
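The copy-out, delete, copy-back sequence can be sketched with Python's sqlite3 module on the sample data from the question. SQLite stands in for MySQL, and the function name dedupe_without_id is made up for illustration. The three data-changing steps run inside one transaction here so that a failure halfway through does not leave mytable empty.

```python
import sqlite3


def dedupe_without_id(pairs):
    """Remove exact duplicate (productId, retailerId) rows from an id-less table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (productId INTEGER, retailerId INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", pairs)
    conn.commit()
    # "with conn" commits on success and rolls back if any statement raises.
    with conn:
        conn.execute(
            "CREATE TEMPORARY TABLE temptable AS "
            "SELECT DISTINCT productId, retailerId FROM mytable"
        )
        conn.execute("DELETE FROM mytable")
        conn.execute(
            "INSERT INTO mytable SELECT productId, retailerId FROM temptable"
        )
        conn.execute("DROP TABLE temptable")
    rows = conn.execute(
        "SELECT productId, retailerId FROM mytable ORDER BY productId, retailerId"
    ).fetchall()
    conn.close()
    return rows


# Sample data from the question: (1, 2), (1, 8) and (2, 6) each appear twice.
pairs = [(1, 2), (1, 2), (1, 4), (1, 6), (1, 8), (1, 8), (2, 3), (2, 6), (2, 6)]
print(dedupe_without_id(pairs))
```

Nine rows go in and six distinct pairs come out.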

Slow query on update using select count(*)

For each row of table2, I have to count how many values of table1.start fall in the range from table2.a to table2.b,
i.e. how many times a <= start <= b holds (BETWEEN includes both endpoints).
I ran the following query :
UPDATE table2
SET occurrence =
(SELECT COUNT(*) FROM table1 WHERE start BETWEEN table2.a AND table2.b);
table2
ID a b occurrence
1 1 10
2 1 20
3 1 25
4 2 30
table1
ID start col1 col2 col3
1 1
2 7
3 10
4 21
5 25
6 27
7 30
table2 has
3 indexes on a, b and occurrence
1567 rows (so we will SELECT COUNT(*) over table1 1567 times..)
ID column as PK
table1 has
1 index on start
42,000,000 rows
Column start was "ordered by column start"
ID column as PK
==> it took 2.5hours to do 2/3 of it. I need to speed this up... any suggestions ? :)
You could try to add the id column to the index on table 1:
CREATE INDEX start_index ON table1 (start,id);
And rewrite the query to
UPDATE table2
SET occurrence =
(SELECT COUNT(id) FROM table1 WHERE start BETWEEN table2.a AND table2.b);
This is called a "covering index": http://www.simple-talk.com/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/
-> The whole query on table1 can be served from the data in the index -> no additional page lookup for the actual record.
Use a stored procedure. Keep the result from COUNT in a local variable, then use it to run the UPDATE query.
I would do this:
-- use one expensive join
create table tmp
select table2.id, count(*) as occurrence
from table2
inner join table1
on table1.start between table2.a and table2.b
group by table2.id;
update table2, tmp
set table2.occurrence=tmp.occurrence
where table2.id=tmp.id;
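To sanity-check the join-then-update idea on the small sample tables, here is a sketch using Python's sqlite3 module as a stand-in for MySQL. Two things differ from the answer above and are my own choices: a LEFT JOIN with COUNT(table1.id) is used so that ranges without any match get 0 rather than being skipped, and the final UPDATE uses a scalar subquery on tmp because SQLite has no multi-table UPDATE in the MySQL form. The function name count_occurrences is hypothetical, and this says nothing about timing on 42,000,000 rows.

```python
import sqlite3


def count_occurrences(ranges, starts):
    """Fill table2.occurrence with the number of table1.start values in [a, b]."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, start INTEGER)")
    conn.execute("CREATE INDEX start_index ON table1 (start)")
    conn.execute(
        "CREATE TABLE table2 "
        "(id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, occurrence INTEGER)"
    )
    conn.executemany("INSERT INTO table1 (start) VALUES (?)", [(s,) for s in starts])
    conn.executemany("INSERT INTO table2 (a, b) VALUES (?, ?)", ranges)
    # One grouped join computes every count in a single pass.
    conn.execute(
        """
        CREATE TEMPORARY TABLE tmp AS
        SELECT table2.id AS id, COUNT(table1.id) AS occurrence
        FROM table2
        LEFT JOIN table1 ON table1.start BETWEEN table2.a AND table2.b
        GROUP BY table2.id
        """
    )
    # tmp has one row per table2.id, so this lookup is cheap.
    conn.execute(
        """
        UPDATE table2
        SET occurrence = (SELECT tmp.occurrence FROM tmp WHERE tmp.id = table2.id)
        """
    )
    rows = conn.execute(
        "SELECT id, a, b, occurrence FROM table2 ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


# Sample data from the question.
ranges = [(1, 10), (1, 20), (1, 25), (2, 30)]
starts = [1, 7, 10, 21, 25, 27, 30]
for row in count_occurrences(ranges, starts):
    print(row)
```

For the sample data the counts come out as 3, 3, 5 and 6, since BETWEEN includes both endpoints.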
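I think count(*) makes the database read the data rows when in your case it only needs to read the index (although the MySQL manual says InnoDB handles COUNT(*) and COUNT(1) the same way, so a speed-up is not guaranteed). Try: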
UPDATE table2
SET occurrence =
(SELECT COUNT(1) FROM table1 WHERE start BETWEEN table2.a AND table2.b);