I have a table called sg with the following columns:
player_uuid, player_name, coins, kills, deaths, and wins
However, I ran into an issue that caused some duplicate rows, and some of those rows have since been modified. So, I am wondering how to drop the rows with older data. That said...
How do I drop the duplicate rows where player_uuid is the same? I only want to drop the rows where coins, kills, deaths, or wins is smaller than in its duplicate.
Example data: http://i.stack.imgur.com/Xieod.png
In this case, I want to keep the row with 46 deaths and delete the row with 43 deaths.
I could not come up with a single DELETE statement because of the way the data is structured, so here are a few statements instead.
The way it works is: find the UUIDs that have multiple rows and determine which row is to be kept (the one with the max value of the given column), then join that back onto the table to determine which rows are not to be kept. Store those in a temporary table and delete everything marked in that temporary table from the main data table (called someTable). The benefit of this approach is that if you have more than one duplicate (3, 4, 5 rows and so on), they will all be deleted.
-- Mark every row whose kills value is below the maximum for its UUID.
CREATE TEMPORARY TABLE tempTable AS
SELECT a.player_uuid, a.kills
FROM someTable a
INNER JOIN (SELECT MAX(kills) AS kills, player_uuid
FROM someTable
GROUP BY player_uuid
HAVING COUNT(*)>1
) b ON a.player_uuid=b.player_uuid
WHERE a.kills<b.kills;
DELETE a.* FROM someTable a, tempTable b
WHERE a.player_uuid=b.player_uuid AND a.kills=b.kills;
Repeat for the other columns (wins,coins,deaths) by replacing all kills with the other column names.
Always test delete code first :)
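One way to test the delete first is to run it against a throwaway copy. Below is a minimal sketch in Python using an in-memory SQLite database as a stand-in for MySQL, so the delete is written with rowid instead of a multi-table DELETE. The helper name drop_older_duplicates and the sample rows are made up for illustration; only the 43/46 deaths pair comes from the question.

```python
import sqlite3

ALLOWED_COLUMNS = {"coins", "kills", "deaths", "wins"}

def drop_older_duplicates(conn, column):
    """Delete rows that have a same-UUID sibling with a larger value in `column`."""
    # The column name is interpolated into SQL, so only accept known names.
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    conn.execute("DROP TABLE IF EXISTS tempTable")
    # Step 1: mark every row whose value is below the per-UUID maximum.
    conn.execute(f"""
        CREATE TEMPORARY TABLE tempTable AS
        SELECT a.rowid AS rid
        FROM someTable a
        JOIN (SELECT player_uuid, MAX({column}) AS top
              FROM someTable
              GROUP BY player_uuid
              HAVING COUNT(*) > 1) b ON a.player_uuid = b.player_uuid
        WHERE a.{column} < b.top
    """)
    # Step 2: delete everything that was marked.
    cur = conn.execute(
        "DELETE FROM someTable WHERE rowid IN (SELECT rid FROM tempTable)"
    )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE someTable (player_uuid TEXT, player_name TEXT, "
    "coins INTEGER, kills INTEGER, deaths INTEGER, wins INTEGER)"
)
conn.executemany(
    "INSERT INTO someTable VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("u1", "alice", 10, 5, 43, 1),  # older copy of u1
        ("u1", "alice", 10, 5, 46, 1),  # newer copy of u1
        ("u2", "bob", 7, 2, 9, 0),      # no duplicate
    ],
)

deleted = drop_older_duplicates(conn, "deaths")
remaining = conn.execute(
    "SELECT player_uuid, deaths FROM someTable ORDER BY player_uuid"
).fetchall()
print(deleted, remaining)
```

Calling the helper once per column mirrors the "repeat for the other columns" step. Rows that tie on every column are not touched by this approach.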
Also, while you are at it, add a unique index to prevent this from happening again:
CREATE UNIQUE INDEX idx_st_nn_1 ON someTable(player_uuid);
When you then try to insert a faulty record, your code will just get an error in return. The best code to handle inserts in that case would be:
INSERT INTO someTable(player_uuid,kills) VALUES ('someplayer',1000)
ON DUPLICATE KEY UPDATE kills=1000;
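To see the unique index and the insert-or-update pattern working together, here is a small sketch in Python with an in-memory SQLite database. This is an analogue, not MySQL: SQLite spells the upsert as ON CONFLICT ... DO UPDATE (available in SQLite 3.24 and later) rather than ON DUPLICATE KEY UPDATE. The helper name save_kills is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE someTable (player_uuid TEXT, kills INTEGER)")
conn.execute("CREATE UNIQUE INDEX idx_st_nn_1 ON someTable(player_uuid)")

def save_kills(conn, player_uuid, kills):
    """Insert the player, or update kills if the UUID already exists."""
    conn.execute(
        """
        INSERT INTO someTable (player_uuid, kills) VALUES (?, ?)
        ON CONFLICT(player_uuid) DO UPDATE SET kills = excluded.kills
        """,
        (player_uuid, kills),
    )

save_kills(conn, "someplayer", 900)
save_kills(conn, "someplayer", 1000)  # same UUID: updates instead of duplicating
rows = conn.execute("SELECT player_uuid, kills FROM someTable").fetchall()

# A plain INSERT for an existing UUID is rejected by the unique index.
try:
    conn.execute("INSERT INTO someTable (player_uuid, kills) VALUES ('someplayer', 1)")
    plain_insert_failed = False
except sqlite3.IntegrityError:
    plain_insert_failed = True

print(rows, plain_insert_failed)
```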
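What also helps is having a time indicator column; then only one delete would have to be executed:
ALTER TABLE someTable ADD COLUMN last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;
Declared this way, the timestamp updates itself whenever the row changes, so no code changes are required to use it. (Older MySQL versions apply these defaults to the first TIMESTAMP column automatically; newer ones need them spelled out.)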
I have a MySQL database with just 1 table:
Fields are: blocknr (not unique), btcaddress (not unique), txid (not unique), vin, vinvoutnr, netvalue.
Indexes exist on both btcaddress and txid.
Data in it looks like this:
I need to delete all "deletable" record pairs. An example is given in red.
Conditions are:
txid must be the same (there can be more than 2 records with same txid)
vinvoutnr must be the same
vin must be different (it can have only 2 values, 0 and 1, so one must be 0 and the other must be 1)
In a table of 36M records, about 33M records will be deleted.
I've used this:
delete t1
from registration t1
inner join registration t2
where t1.txid=t2.txid and t1.vinvoutnr=t2.vinvoutnr and t1.vin<>t2.vin;
It works but takes 5 hours.
Maybe this would work too (not tested yet):
delete t1
from registration as t1, registration as t2
where t1.txid=t2.txid and t1.vinvoutnr=t2.vinvoutnr and t1.vin<>t2.vin;
Or do I forget about a delete query and try to make a new table with all the non-deletable rows in it and then drop the original?
Database can be offline for this delete query.
Based on your question, you are deleting most of the rows in the table. That is just really expensive. A better approach is to empty the table and re-populate it:
create table temp_registration as
<query for the rows to keep here>;
truncate table registration;
insert into registration
select *
from temp_registration;
Your logic is a bit hard to follow, but I think the logic on the rows to keep is:
select r.*
from registration r
where not exists (select 1
from registration r2
where r2.txid = r.txid and
r2.vinvoutnr = r.vinvoutnr and
r2.vin <> r.vin
);
For best performance, you want an index on registration(txid, vinvoutnr, vin).
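The keep-and-repopulate idea can be tried out on a few rows before running it on 36M. The sketch below uses Python with an in-memory SQLite database as a stand-in for MySQL; SQLite has no TRUNCATE, so a plain DELETE takes its place. The sample rows, the reduced column list, and the index name idx_reg are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registration (txid TEXT, vin INTEGER, vinvoutnr INTEGER, netvalue INTEGER)"
)
conn.executemany(
    "INSERT INTO registration VALUES (?, ?, ?, ?)",
    [
        ("tx1", 0, 1, 50),   # deletable pair: same txid and vinvoutnr,
        ("tx1", 1, 1, -50),  # different vin
        ("tx1", 0, 2, 20),   # no partner with the other vin: keep
        ("tx2", 1, 0, -5),   # no partner: keep
    ],
)
conn.execute("CREATE INDEX idx_reg ON registration (txid, vinvoutnr, vin)")

# Step 1: copy the rows to keep into a side table.
conn.execute("""
    CREATE TABLE temp_registration AS
    SELECT r.*
    FROM registration r
    WHERE NOT EXISTS (SELECT 1
                      FROM registration r2
                      WHERE r2.txid = r.txid
                        AND r2.vinvoutnr = r.vinvoutnr
                        AND r2.vin <> r.vin)
""")
# Step 2: empty the original table (TRUNCATE TABLE in MySQL).
conn.execute("DELETE FROM registration")
# Step 3: put the kept rows back.
conn.execute("INSERT INTO registration SELECT * FROM temp_registration")

kept = conn.execute(
    "SELECT txid, vin, vinvoutnr FROM registration ORDER BY txid, vinvoutnr"
).fetchall()
print(kept)
```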
Given that you expect to remove the majority of your data it does sound like the simplest approach would be to create a new table with the correct data and then drop the original table as you suggest. Otherwise ADyson's corrections to the JOIN query might help to alleviate the performance issue.
I am trying to delete all rows from a table with a particular id.
my query is:
DELETE FROM table_name WHERE x_id='46';
the error returned is:
#1136 - Column count doesn't match value count at row 1
My table has a composite primary key; x_id is one of the columns in the primary key.
Please Help!
That error is strange for a delete statement. It is most likely coming from a badly written trigger that is being executed as a result of the delete.
This error would most likely be encountered on an insert statement such as the following:
insert into foo(bar, baz)
select bar, baz, foobar, 2
from myTable
Note how the insert statement specifies 2 columns, but provides 4 values.
You might try to provide a second value to the delete query to match the composite index for the row.
DELETE FROM CPI
WHERE (CountryID, Year) IN (('AD', 2010), ('AF', 2009), ('AG', 1992))
Cause:
You may have a trigger on this table, then you changed the table structure.
Now, you may get this error when you delete, insert, or update in this table (depending on the trigger event you specified).
Solution:
To solve this issue, you have to update the trigger as well; the number of columns in the trigger should match the number of columns in the table.
I have a table that has some duplicate results. For example:
`person_url` `movie_url`
1 2
1 2
2 3
Would become -->
`person_url` `movie_url`
1 2
2 3
I know how to do it by creating a new table,
create table tmp_credits (select distinct * from name);
However, it is a pretty large table and I have a couple indexes on it which will need to be re-created. How would I do this transformation in place, that is, without creating a new table?
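You can add a UNIQUE index over your table's columns using the IGNORE keyword (note that ALTER IGNORE TABLE was deprecated in MySQL 5.6.17 and removed in MySQL 5.7.4, so this only works on older servers):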
ALTER IGNORE TABLE name ADD UNIQUE INDEX (person_url, movie_url);
As stated in the manual:
IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
This will also prevent duplicates from being added in the future.
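Where ALTER IGNORE TABLE is not available (it is MySQL-specific and was removed in MySQL 5.7.4), a similar in-place result can be had in two steps: delete all but one row per group, then add the unique index. Here is a rough sketch in Python against an in-memory SQLite database. It leans on SQLite's implicit rowid, which a MySQL table does not have, so treat it as an illustration of the idea rather than a drop-in replacement. The index name uq_name is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name (person_url INTEGER, movie_url INTEGER)")
conn.executemany("INSERT INTO name VALUES (?, ?)", [(1, 2), (1, 2), (2, 3)])

# Keep the first physical row of each (person_url, movie_url) group.
conn.execute("""
    DELETE FROM name
    WHERE rowid NOT IN (SELECT MIN(rowid)
                        FROM name
                        GROUP BY person_url, movie_url)
""")
# With the duplicates gone, the unique index can be created in place.
conn.execute("CREATE UNIQUE INDEX uq_name ON name (person_url, movie_url)")

rows = conn.execute(
    "SELECT person_url, movie_url FROM name ORDER BY person_url"
).fetchall()

# The index now rejects new duplicates.
try:
    conn.execute("INSERT INTO name VALUES (1, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(rows, duplicate_rejected)
```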
create table temp
(col1 varchar(20),col2 varchar(20));
INSERT INTO temp VALUES
('1','one'),('2','two'),('2','two');
select col1,col2 from temp
union
select col1,col2 from temp;
Have you considered just putting a semantic layer/view on top of the table that de-dups?
select person_url, movie_url
from name
group by person_url, movie_url
I'm able to display duplicates in my table
table name reportingdetail and column name ReportingDetailID
SELECT DISTINCT ReportingDetailID from reportingdetail group by ReportingDetailID HAVING count(ReportingDetailID) > 1;
+-------------------+
| ReportingDetailID |
+-------------------+
| 664602311 |
+-------------------+
1 row in set (2.81 sec)
Does anyone know how I can go about deleting duplicates and keeping only one record?
I tried the following
DELETE FROM reportingdetail USING reportingdetail, reportingdetail AS vtable WHERE (reportingdetailID > vtable.id) AND (reportingdetail.reportingdetailID=reportingdetailID);
But it just deleted everything and kept single duplicates records!
The quickest way (that I know of) to remove duplicates in MySQL is by adding an index.
E.g., assuming reportingdetailID is going to be the PK for that table:
mysql> ALTER IGNORE TABLE reportingdetail
-> ADD PRIMARY KEY (reportingdetailID);
From the documentation:
IGNORE is a MySQL extension to standard SQL. It controls how ALTER
TABLE works if there are duplicates on unique keys in the new table or
if warnings occur when strict mode is enabled. If IGNORE is not
specified, the copy is aborted and rolled back if duplicate-key errors
occur. If IGNORE is specified, only the first row is used of rows with
duplicates on a unique key. The other conflicting rows are deleted.
Incorrect values are truncated to the closest matching acceptable
value.
Adding this index will both remove duplicates and prevent any future duplicates from being inserted. If you do not want the latter behavior, just drop the index after creating it.
The following MySQL commands create a temporary table and populate it with all columns, grouped by one column (the column that has duplicates) and ordered by the primary key ascending. The second command creates a real table from the temporary table. The third command drops the table currently in use, and the last command renames the second table to the name of the table that was in use.
That's a really fast solution. Here are the four commands:
CREATE TEMPORARY TABLE videos_temp AS
SELECT * FROM videos GROUP BY title ORDER BY videoid ASC;
CREATE TABLE videos_temp2 AS SELECT * FROM videos_temp;
DROP TABLE videos;
ALTER TABLE videos_temp2 RENAME videos;
This should give you duplicate entries.
SELECT `ReportingDetailID`, COUNT(`ReportingDetailID`) AS Number_of_Occurrences FROM reportingdetail GROUP BY `ReportingDetailID` HAVING ( COUNT(`ReportingDetailID`) > 1 );
I am wondering if there is a way to do this through one query.
It seems that when I was initially populating my DB with 10k records of dummy data to work with, somewhere in the mess of it all the script dumped an extra 1,044 rows that are duplicates. I determined this using
SELECT x.ID, x.firstname FROM info x
INNER JOIN (SELECT ID FROM info
GROUP BY ID HAVING count(id) > 1) d ON x.ID = d.ID
What I am trying to figure out is whether I can add another piece to this single query that will remove one of the matching dupes from each dupe found.
Also, I realize the ID column should have been set to auto increment, but it wasn't.
My favorite way of removing duplicates would be:
ALTER IGNORE TABLE info ADD UNIQUE (ID);
To explain a bit further (for reference, take a look here)
UNIQUE - you are adding unique index to ID column.
IGNORE - is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
The query that I use is generally something like
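-- The extra derived table (dups) avoids MySQL error 1093 (target table in FROM clause).
DELETE FROM table WHERE id IN (
SELECT max_id FROM (
SELECT MAX(id) AS max_id FROM table
GROUP BY DUPFIELD
HAVING COUNT(*) > 1
) AS dups
);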
You have to run this several times, since each run only removes one duplicate row per group, but it's fast.
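The "run it several times" part can be wrapped in a loop that stops once a pass deletes nothing. Below is a minimal sketch in Python with an in-memory SQLite database standing in for MySQL. The table name items, the column dupfield, and the helper remove_duplicates are placeholders chosen for the example.

```python
import sqlite3

def remove_duplicates(conn):
    """Repeat the MAX(id) delete until no duplicates are left.

    Returns the number of passes that actually deleted rows.
    """
    passes = 0
    while True:
        cur = conn.execute("""
            DELETE FROM items WHERE id IN (
                SELECT MAX(id) FROM items
                GROUP BY dupfield
                HAVING COUNT(*) > 1)
        """)
        if cur.rowcount == 0:
            return passes
        passes += 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, dupfield TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "a"), (4, "b"), (5, "c"), (6, "c")],
)

passes = remove_duplicates(conn)
left = conn.execute("SELECT id, dupfield FROM items ORDER BY id").fetchall()
print(passes, left)
```

Each pass removes the highest id from every group that still has duplicates, so the number of passes equals the largest number of extra copies in any one group, and the row with the lowest id survives.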
The most efficient way is to do it in the steps below:
Step 1: Move the non duplicates (unique tuples) into a temporary table
CREATE TABLE new_table as
SELECT * FROM old_table WHERE 1 GROUP BY [column to remove duplicates by];
Step 2: delete the old table. We no longer need the table with all the duplicate entries, so drop it!
DROP TABLE old_table;
Step 3: rename the new_table to the name of the old_table
RENAME TABLE new_table TO old_table;