Need to find and delete duplicate records in Netezza - duplicates

I need a Netezza query to delete exact duplicate records and keep only the unique records.
In how many ways can we delete duplicates in Netezza?
Thanks in advance.
RG

This is probably the least sophisticated solution:
Create table X as select distinct * from YOURTABLE
;
---- make sure X looks as you expect ----
Truncate table YOURTABLE
;
Insert into YOURTABLE select * from X
;
You may choose to create the table X as ‘temp’, but the risk of losing data is potentially higher if you are not 100% sure what you are doing...
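The three steps above can be rehearsed end to end on a scratch database before touching real data. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for Netezza (an assumption made so the example is runnable), a hypothetical two-column table, and DELETE FROM in place of TRUNCATE because SQLite has no TRUNCATE statement.

```python
import sqlite3

# Throwaway in-memory database, used here as a stand-in for Netezza
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table containing exact duplicate rows
cur.execute("CREATE TABLE yourtable (id INTEGER, name TEXT)")
cur.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (3, "c")],
)

# Step 1: copy the distinct rows into a holding table
cur.execute("CREATE TABLE x AS SELECT DISTINCT * FROM yourtable")

# Step 2: check the holding table before destroying anything
holding_count = cur.execute("SELECT COUNT(*) FROM x").fetchone()[0]

# Step 3: empty the original (DELETE replaces TRUNCATE in SQLite) and reload it
cur.execute("DELETE FROM yourtable")
cur.execute("INSERT INTO yourtable SELECT * FROM x")
cur.execute("DROP TABLE x")
conn.commit()

deduped_rows = cur.execute("SELECT * FROM yourtable ORDER BY id").fetchall()
```

Keeping the holding table as a regular table until the reload has been verified is what protects against the data loss mentioned above.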

Related

Best way to write SQL delete statement, deleting pairs of records

I have a MySQL database with just 1 table:
Fields are: blocknr (not unique), btcaddress (not unique), txid (not unique), vin, vinvoutnr, netvalue.
Indexes exist on both btcaddress and txid.
Data in it looks like this:
I need to delete all "deletable" record pairs. An example is given in red.
Conditions are:
txid must be the same (there can be more than 2 records with same txid)
vinvoutnr must be the same
vin must be different (it can have only 2 values, 0 and 1, so one must be 0 and the other must be 1)
In a table of 36M records, about 33M records will be deleted.
I've used this:
delete t1
from registration t1
inner join registration t2
where t1.txid=t2.txid and t1.vinvoutnr=t2.vinvoutnr and t1.vin<>t2.vin;
It works but takes 5 hours.
Maybe this would work too (not tested yet):
delete t1
from registration as t1, registration as t2
where t1.txid=t2.txid and t1.vinvoutnr=t2.vinvoutnr and t1.vin<>t2.vin;
Or should I forget about a delete query and try to make a new table with all the non-deletables in it and then drop the original?
Database can be offline for this delete query.
Based on your question, you are deleting most of the rows in the table. That is just really expensive. A better approach is to empty the table and re-populate it:
create table temp_registration as
<query for the rows to keep here>;
truncate table registration;
insert into registration
select *
from temp_registration;
Your logic is a bit hard to follow, but I think the logic on the rows to keep is:
select r.*
from registration r
where not exists (select 1
from registration r2
where r2.txid = r.txid and
r2.vinvoutnr = r.vinvoutnr and
r2.vin <> r.vin
);
For best performance, you want an index on registration(txid, vinvoutnr, vin).
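To check that the NOT EXISTS condition keeps the intended rows, the rebuild can be tried on a handful of rows first. This is a sketch under assumptions: Python's built-in sqlite3 stands in for MySQL, the sample rows are invented, only some of the columns are included, and DELETE FROM replaces TRUNCATE, which SQLite lacks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # scratch database, stand-in for MySQL
cur = conn.cursor()
cur.execute(
    "CREATE TABLE registration "
    "(txid TEXT, vin INTEGER, vinvoutnr INTEGER, netvalue INTEGER)"
)

# Invented sample rows
cur.executemany(
    "INSERT INTO registration VALUES (?, ?, ?, ?)",
    [
        ("tx1", 0, 5, 100),   # pairs with the next row, so both are deletable
        ("tx1", 1, 5, -100),
        ("tx1", 0, 6, 40),    # same txid but no partner with vin = 1: kept
        ("tx2", 1, 5, 70),    # no partner at all: kept
    ],
)

# Rows to keep: no other row with the same txid and vinvoutnr but another vin
cur.execute(
    """
    CREATE TABLE temp_registration AS
    SELECT r.*
    FROM registration r
    WHERE NOT EXISTS (SELECT 1
                      FROM registration r2
                      WHERE r2.txid = r.txid
                        AND r2.vinvoutnr = r.vinvoutnr
                        AND r2.vin <> r.vin)
    """
)

# Empty the original table and reload it from the kept rows
cur.execute("DELETE FROM registration")
cur.execute("INSERT INTO registration SELECT * FROM temp_registration")
conn.commit()

kept = cur.execute(
    "SELECT txid, vin, vinvoutnr FROM registration ORDER BY txid, vinvoutnr"
).fetchall()
```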
Given that you expect to remove the majority of your data it does sound like the simplest approach would be to create a new table with the correct data and then drop the original table as you suggest. Otherwise ADyson's corrections to the JOIN query might help to alleviate the performance issue.

MySQL copy distinct values from table to table

One of my tables contains data (numbers) that I would like to copy to another table, but the problem is that the data is not unique: there can be 2 or more rows with the same data I would like to copy (I need to copy each number only once). The table has around 3 million records. Is there any efficient way to do this?
Would this work for you?
INSERT INTO destination_table (the_value_field) SELECT DISTINCT the_value_field FROM origin_table;
Suppose there are two columns a, b in your table
INSERT INTO new_table (a, b) SELECT
a, b FROM old_table GROUP BY
a, b;
You can extend this with more columns.
This will be a slow process and may never complete with huge data.
So, instead, copy all values into new_table using
Insert into new_table select * from old_table;
and then delete the duplicate records from the new table. This can be relatively faster and is assured to complete.
You can use SELECT DISTINCT to select only the unique values.
https://www.w3schools.com/sql/sql_distinct.asp
SELECT DISTINCT `val` FROM `table_name`
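A quick way to confirm that each value is copied only once is to run the statement against a few rows. The sketch below assumes Python's built-in sqlite3 as a stand-in for MySQL and a hypothetical single column named val; it also shows that grouping by the column yields the same set of values as DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # scratch database, stand-in for MySQL
cur = conn.cursor()
cur.execute("CREATE TABLE origin_table (val INTEGER)")
cur.execute("CREATE TABLE destination_table (val INTEGER)")

# Invented sample data with repeated numbers
cur.executemany(
    "INSERT INTO origin_table VALUES (?)", [(7,), (7,), (9,), (3,), (9,)]
)

# Copy each distinct number exactly once
cur.execute(
    "INSERT INTO destination_table (val) SELECT DISTINCT val FROM origin_table"
)
conn.commit()

copied = sorted(row[0] for row in cur.execute("SELECT val FROM destination_table"))

# GROUP BY on the same column returns the same set of values
grouped = sorted(
    row[0] for row in cur.execute("SELECT val FROM origin_table GROUP BY val")
)
```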

Dropping duplicate MySQL rows based on column data

I have a table called sg with the following columns:
player_uuid, player_name, coins, kills, deaths, and wins
However, I ran into an issue that caused some duplicate rows, and some of those rows have been modified. So, I am wondering how to drop the rows with older data. That said...
How do I drop the duplicate rows where player_uuid is the same? But I only want to drop the rows where coins, kills, deaths, or wins is smaller than in its duplicate.
Example data: http://i.stack.imgur.com/Xieod.png
In this case, I want to keep the row with 46 deaths and delete the row with 43 deaths.
Failing to come up with a single delete statement due to the way the data is structured, here are 3 delete statements instead:
The way it works is: find whether there are multiple rows for a given UUID and determine which row is to be kept (the max value of the given column), then join the table back on itself to determine which rows are not to be kept, store those in a temporary table, and delete everything marked in that temporary table from the main data table (called someTable). The benefit of this approach is that if you have more than 1 duplicate (3, 4, 5 rows and so on), they will also be deleted.
CREATE TEMPORARY TABLE tempTable AS
SELECT a.player_uuid, a.kills
FROM someTable a
JOIN (SELECT MAX(kills) AS kills, player_uuid
FROM someTable
GROUP BY player_uuid
HAVING COUNT(*)>1
) b ON a.player_uuid=b.player_uuid
WHERE a.kills<b.kills;
DELETE a.* FROM someTable a, tempTable b
WHERE a.player_uuid=b.player_uuid AND a.kills=b.kills;
Repeat for the other columns (wins,coins,deaths) by replacing all kills with the other column names.
Always test delete code first :)
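In that spirit, the mark-then-delete steps can be rehearsed on a scratch database. This is a minimal sketch, assuming Python's built-in sqlite3 as a stand-in for MySQL, invented sample rows, and only the kills column. The marking query here uses an inner join with a `<` comparison against the per-player maximum, and the delete uses EXISTS instead of MySQL's multi-table DELETE; both are choices of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # scratch database, stand-in for MySQL
cur = conn.cursor()
cur.execute(
    "CREATE TABLE someTable (player_uuid TEXT, player_name TEXT, kills INTEGER)"
)

# Invented sample rows: u1 has three versions, u2 has a single row
cur.executemany(
    "INSERT INTO someTable VALUES (?, ?, ?)",
    [("u1", "ann", 10), ("u1", "ann", 12), ("u1", "ann", 7), ("u2", "bob", 3)],
)

# Step 1: mark rows whose kills is below the maximum for a duplicated player_uuid
cur.execute(
    """
    CREATE TEMPORARY TABLE tempTable AS
    SELECT a.player_uuid, a.kills
    FROM someTable a
    JOIN (SELECT player_uuid, MAX(kills) AS max_kills
          FROM someTable
          GROUP BY player_uuid
          HAVING COUNT(*) > 1) b
      ON a.player_uuid = b.player_uuid
    WHERE a.kills < b.max_kills
    """
)
marked = cur.execute("SELECT COUNT(*) FROM tempTable").fetchone()[0]

# Step 2: delete every row that was marked in the temporary table
cur.execute(
    """
    DELETE FROM someTable
    WHERE EXISTS (SELECT 1
                  FROM tempTable t
                  WHERE t.player_uuid = someTable.player_uuid
                    AND t.kills = someTable.kills)
    """
)
conn.commit()

remaining = cur.execute(
    "SELECT player_uuid, kills FROM someTable ORDER BY player_uuid"
).fetchall()
```

Inspecting tempTable between the two steps shows exactly which rows are about to be removed.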
Also, while you are at it, add a unique index to prevent this from happening again:
CREATE UNIQUE INDEX idx_st_nn_1 ON someTable(player_uuid);
When you then try to insert a faulty record, your code will just get an error in return. The best code to handle inserts in that case would be:
INSERT INTO someTable(player_uuid,kills) VALUES ('someplayer',1000)
ON DUPLICATE KEY UPDATE kills=1000;
What also helps is having some time indicator column: Then only one delete would have to be executed:
ALTER TABLE someTable ADD COLUMN (last_updated TIMESTAMP);
A TIMESTAMP column can update itself (for example, when declared with ON UPDATE CURRENT_TIMESTAMP), so no code changes are required to use this.
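With such a column in place, the single delete could look roughly like the sketch below. It assumes Python's built-in sqlite3 as a stand-in for MySQL, invented timestamps stored as ISO strings (which compare correctly as text), and SQLite's implicit rowid to identify rows; in MySQL a multi-table DELETE with a self-join would play the same role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # scratch database, stand-in for MySQL
cur = conn.cursor()
cur.execute(
    "CREATE TABLE someTable (player_uuid TEXT, deaths INTEGER, last_updated TEXT)"
)

# Invented sample rows: u1 has an older and a newer version
cur.executemany(
    "INSERT INTO someTable VALUES (?, ?, ?)",
    [
        ("u1", 43, "2015-01-01 10:00:00"),
        ("u1", 46, "2015-01-02 09:30:00"),
        ("u2", 5, "2015-01-01 08:00:00"),
    ],
)

# Delete every row for which a newer row with the same player_uuid exists
cur.execute(
    """
    DELETE FROM someTable
    WHERE rowid IN (SELECT a.rowid
                    FROM someTable a
                    JOIN someTable b
                      ON a.player_uuid = b.player_uuid
                     AND a.last_updated < b.last_updated)
    """
)
conn.commit()

remaining = cur.execute(
    "SELECT player_uuid, deaths FROM someTable ORDER BY player_uuid"
).fetchall()
```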

MySQL delete multiple rows in one query conditions unique to each row

So I know in MySQL it's possible to insert multiple rows in one query like so:
INSERT INTO table (col1,col2) VALUES (1,2),(3,4),(5,6)
I would like to delete multiple rows in a similar way. I know it's possible to delete multiple rows based on the exact same conditions for each row, i.e.
DELETE FROM table WHERE col1='4' and col2='5'
or
DELETE FROM table WHERE col1 IN (1,2,3,4,5)
However, what if I wanted to delete multiple rows in one query, with each row having a set of conditions unique to itself? Something like this would be what I am looking for:
DELETE FROM table WHERE (col1,col2) IN (1,2),(3,4),(5,6)
Does anyone know of a way to do this? Or is it not possible?
You were very close, you can use this:
DELETE FROM table WHERE (col1,col2) IN ((1,2),(3,4),(5,6))
Please see this fiddle.
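When the pairs come from application code, the statement can be assembled with placeholders instead of literal values. The helper below is a hypothetical sketch (the name build_pair_delete, the table name, and the column names are all made up for illustration). It uses the `%s` parameter style common to MySQL drivers for Python, and it only builds the SQL string and parameter list without executing anything.

```python
def build_pair_delete(table, columns, pairs):
    """Return (sql, params) for DELETE ... WHERE (c1, c2) IN ((%s, %s), ...).

    The table and column names are interpolated directly into the SQL text,
    so they must come from trusted code, never from user input.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    # One "(%s, %s)" group per row, with one placeholder per column
    row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    placeholders = ", ".join([row] * len(pairs))
    sql = "DELETE FROM {} WHERE ({}) IN ({})".format(
        table, ", ".join(columns), placeholders
    )
    # Flatten the pairs into a single parameter list, in placeholder order
    params = [value for pair in pairs for value in pair]
    return sql, params


sql, params = build_pair_delete(
    "my_table", ["col1", "col2"], [(1, 2), (3, 4), (5, 6)]
)
```

The resulting sql and params would then be passed together to a driver's cursor.execute call.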
A slight extension to the answer given, so, hopefully useful to the asker and anyone else looking.
You can also SELECT the values you want to delete. But watch out for the Error 1093 - You can't specify the target table for update in FROM clause.
DELETE FROM
    orders_products_history
WHERE
    (branchID, action) IN (
        SELECT
            branchID,
            action
        FROM
            (
                SELECT
                    branchID,
                    action
                FROM
                    orders_products_history
                GROUP BY
                    branchID,
                    action
                HAVING
                    COUNT(*) > 10000
            ) a
    );
I wanted to delete all history records where the number of history records for a single action/branch exceeds 10,000. And thanks to this question and chosen answer, I can.
Hope this is of use.
Richard.
It took a lot of googling, but here is what I do in Python for MySQL when I want to delete multiple items from a single table using a list of values.
# Create an empty list to collect the values to delete
values = []
# Keep appending the values you want to delete to it,
# BUT make sure each one is a single-value tuple rather than a bare string
values.append((your_variable,))
# Once the list is loaded, run the DELETE once for each tuple
cursor.executemany("DELETE FROM YourTable WHERE ID = %s", values)
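The same executemany pattern extends to the original question, where each row has its own pair of conditions. A minimal sketch, assuming Python's built-in sqlite3 in place of a MySQL driver (so the placeholder is `?` rather than `%s`) and a hypothetical table my_table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # scratch database, stand-in for MySQL
cur = conn.cursor()
cur.execute("CREATE TABLE my_table (col1 INTEGER, col2 INTEGER)")
cur.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, 2), (3, 4), (5, 6), (1, 4), (7, 8)],
)

# Each tuple carries the conditions for one row to delete
to_delete = [(1, 2), (3, 4), (5, 6)]
cur.executemany("DELETE FROM my_table WHERE col1 = ? AND col2 = ?", to_delete)
conn.commit()

left = cur.execute(
    "SELECT col1, col2 FROM my_table ORDER BY col1, col2"
).fetchall()
```

Note that (1, 4) survives even though 1 and 4 each appear in the delete list, because the conditions are matched as pairs.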

mySQL find dupes and remove them

I am wondering if there is a way to do this through one query.
It seems that when I was initially populating my DB with 10k records of dummy data to work with, somewhere in the mess of it all the script dumped an extra 1,044 rows that are duplicates. I determined this using
SELECT x.ID, x.firstname FROM info x
INNER JOIN (SELECT ID FROM info
GROUP BY ID HAVING count(id) > 1) d ON x.ID = d.ID
What I am trying to figure out is whether I can add another piece to this single query that will remove one of the matching dupes from each dupe found.
Also, I realize the ID column should have been set to auto increment, but it wasn't.
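My favorite way of removing duplicates would be the following (note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so it only works on older versions):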
ALTER IGNORE TABLE info ADD UNIQUE (ID);
To explain a bit further (for reference, take a look here)
UNIQUE - you are adding unique index to ID column.
IGNORE - is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
The query that I use is generally something like
Delete from table where id in (
Select id from (
Select Max(id) as id from table
Group by DUPFIELD
Having count(*)>1) t)
You have to run this several times, since each run removes only one duplicated row per group, but it's fast.
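The "run it several times" part can be automated by looping until a pass deletes nothing. The sketch below assumes Python's built-in sqlite3 as a stand-in for MySQL, invented sample rows, and SQLite's implicit rowid as the unique row identifier (playing the role of id in the query above, with the info table's ID column as the duplicated field).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # scratch database, stand-in for MySQL
cur = conn.cursor()
cur.execute("CREATE TABLE info (id INTEGER, firstname TEXT)")

# Invented sample rows: id 1 appears three times, id 2 twice, id 3 once
cur.executemany(
    "INSERT INTO info VALUES (?, ?)",
    [(1, "ann"), (1, "ann"), (1, "ann"), (2, "bob"), (2, "bob"), (3, "cy")],
)

passes = 0
while True:
    # Each pass removes one row (the highest rowid) from every duplicated group
    cur.execute(
        """
        DELETE FROM info
        WHERE rowid IN (SELECT MAX(rowid)
                        FROM info
                        GROUP BY id
                        HAVING COUNT(*) > 1)
        """
    )
    if cur.rowcount == 0:
        break  # no duplicated groups left
    passes += 1
conn.commit()

remaining = cur.execute("SELECT id, firstname FROM info ORDER BY id").fetchall()
```

The number of passes that delete something equals the largest group size minus one, which is why the approach stays fast when most groups contain only two rows.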
The most efficient way is to do it in the steps below:
Step 1: Move the non duplicates (unique tuples) into a temporary table
CREATE TABLE new_table as
SELECT * FROM old_table WHERE 1 GROUP BY [column to remove duplicates by];
Step 2: Delete the old table. We no longer need the table with all the duplicate entries, so drop it!
DROP TABLE old_table;
Step 3: rename the new_table to the name of the old_table
RENAME TABLE new_table TO old_table;
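The three steps can be rehearsed on a scratch database before running them for real. This is a sketch under assumptions: Python's built-in sqlite3 stands in for MySQL, the rows are invented, SQLite's ALTER TABLE ... RENAME TO replaces RENAME TABLE, and an explicit MIN() is used for the non-grouped column so that the surviving value is deterministic (a choice of this sketch, not part of the steps above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # scratch database, stand-in for MySQL
cur = conn.cursor()
cur.execute("CREATE TABLE old_table (id INTEGER, firstname TEXT)")
cur.executemany(
    "INSERT INTO old_table VALUES (?, ?)",
    [(1, "ann"), (1, "ann"), (2, "bob"), (3, "cy"), (3, "cy")],
)

# Step 1: copy one row per id into a new table
cur.execute(
    """
    CREATE TABLE new_table AS
    SELECT id, MIN(firstname) AS firstname
    FROM old_table
    GROUP BY id
    """
)

# Step 2: drop the table that still holds the duplicates
cur.execute("DROP TABLE old_table")

# Step 3: give the new table the old name
cur.execute("ALTER TABLE new_table RENAME TO old_table")
conn.commit()

rows = cur.execute("SELECT id, firstname FROM old_table ORDER BY id").fetchall()
tables = [
    r[0] for r in cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
```

A table created with CREATE TABLE ... AS SELECT does not carry over the indexes of the original, so any indexes or keys need to be recreated after the swap.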