I need to delete around 300,000 duplicates in my database. I want to check the Card_id column for duplicates, then check for duplicate timestamps. Then delete one copy and keep one. Example:
| Card_id | Time |
| 1234 | 5:30 |
| 1234 | 5:45 |
| 1234 | 5:30 |
| 1234 | 5:45 |
So remaining data would be:
| Card_id | Time |
| 1234 | 5:30 |
| 1234 | 5:45 |
I have tried several different DELETE statements, and merging into a new table, but with no luck.
UPDATE: Got it working!
Alright after many failures I got this to work for DB2.
delete from(
select card_id, time, row_number() over (partition by card_id, time) rn
from card_table) as A
where rn > 1
rn increments within each group of rows that share the same card_id and time, so every row with rn > 1 is a duplicate and gets deleted.
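For anyone who wants to try the idea without a DB2 instance, here is a minimal sketch using Python's built-in sqlite3 module. It is not the DB2 statement above: SQLite cannot delete from a subselect, so this version keeps the row with the lowest rowid in each (card_id, time) group instead of using row_number(). The function name delete_duplicates is made up for the example.

```python
import sqlite3


def delete_duplicates(conn):
    """Keep the first row of each (card_id, time) group and delete the rest."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            """
            DELETE FROM card_table
            WHERE rowid NOT IN (
                SELECT MIN(rowid) FROM card_table GROUP BY card_id, time
            )
            """
        )
    return cur.rowcount  # number of rows removed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE card_table (card_id INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO card_table VALUES (?, ?)",
    [(1234, "5:30"), (1234, "5:45"), (1234, "5:30"), (1234, "5:45")],
)
removed = delete_duplicates(conn)
remaining = conn.execute(
    "SELECT card_id, time FROM card_table ORDER BY time"
).fetchall()
print(removed, remaining)
```

On the sample data from the question this leaves one 5:30 row and one 5:45 row.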
I strongly suggest you take this approach:
create temporary table tokeep as
select distinct card_id, time
from t;
truncate table t;
insert into t(card_id, time)
select *
from tokeep;
That is, store the data you want. Truncate the table, and then regenerate it. By truncating the table, you get to keep triggers and permissions and other things linked to the table.
This approach should also be faster than deleting many, many duplicates.
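A rough sketch of the same store, empty, refill sequence, run against an in-memory SQLite database from Python so it can be tried safely. Assumptions: SQLite has no TRUNCATE, so a plain DELETE stands in for it, and the helper name rebuild_without_duplicates is invented for the example.

```python
import sqlite3


def rebuild_without_duplicates(conn):
    """Store the distinct rows, empty the table, then put them back."""
    with conn:
        conn.execute(
            "CREATE TEMP TABLE tokeep AS SELECT DISTINCT card_id, time FROM t"
        )
        conn.execute("DELETE FROM t")  # stand-in for TRUNCATE TABLE t
        conn.execute(
            "INSERT INTO t (card_id, time) SELECT card_id, time FROM tokeep"
        )
        conn.execute("DROP TABLE tokeep")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (card_id INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1234, "5:30"), (1234, "5:45"), (1234, "5:30"), (1234, "5:45"), (99, "7:00")],
)
rebuild_without_duplicates(conn)
rows = conn.execute("SELECT card_id, time FROM t ORDER BY card_id, time").fetchall()
print(rows)
```

The table itself is never dropped, which is the point of the approach: whatever is attached to it stays in place.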
If you are going to do that, you ought to insert a proper id as well:
create temporary table tokeep as
select distinct card_id, time
from t;
truncate table t;
alter table t add column id int auto_increment primary key; -- MySQL requires an auto_increment column to be a key
insert into t(card_id, time)
select *
from tokeep;
If you don't have a primary key or candidate key, there is probably no way to do this with a single command. Try the solution below.
Create table with duplicates
select Card_id,Time
into COPY_YourTable
from YourTable
group by Card_id,Time
having count(1)>1
Remove duplicates using COPY_YourTable
delete from YourTable
where exists
(
select 1
from COPY_YourTable c
where c.Card_id = YourTable.Card_id
and c.Time = YourTable.Time
)
Copy data without duplicates
insert into YourTable
select Card_id,Time
from COPY_YourTable
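The three steps above can be sketched end to end like this, using Python's sqlite3 module as a stand-in engine. SELECT ... INTO is not available in SQLite, so CREATE TEMP TABLE ... AS is used instead; the lowercase table names and the function name dedupe_via_copy are assumptions for the example.

```python
import sqlite3


def dedupe_via_copy(conn):
    """Copy the duplicated keys aside, delete every copy, re-insert one of each."""
    with conn:
        # Step 1: one row per key that occurs more than once
        conn.execute(
            """
            CREATE TEMP TABLE copy_yourtable AS
            SELECT card_id, time FROM yourtable
            GROUP BY card_id, time
            HAVING COUNT(*) > 1
            """
        )
        # Step 2: remove all rows that belong to a duplicated key
        conn.execute(
            """
            DELETE FROM yourtable
            WHERE EXISTS (
                SELECT 1 FROM copy_yourtable c
                WHERE c.card_id = yourtable.card_id AND c.time = yourtable.time
            )
            """
        )
        # Step 3: put a single copy of each key back
        conn.execute(
            "INSERT INTO yourtable (card_id, time) "
            "SELECT card_id, time FROM copy_yourtable"
        )
        conn.execute("DROP TABLE copy_yourtable")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (card_id INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [(1234, "5:30"), (1234, "5:30"), (1234, "5:45"),
     (5678, "6:00"), (5678, "6:00"), (5678, "6:00")],
)
dedupe_via_copy(conn)
rows = conn.execute(
    "SELECT card_id, time FROM yourtable ORDER BY card_id, time"
).fetchall()
print(rows)
```

Rows that were never duplicated (1234 at 5:45 here) are not touched at all, so this moves less data than rebuilding the whole table.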
How to retrieve odd rows from the table?
In the base table, each Cr_id always appears twice.
Base table
I want a SELECT statement that retrieves only the rows with c_id = 1 where that row is the first one for its Cr_id, as shown in the output table.
Output table
Just look at the base table and the output table and you should see what I want. Thanks.
Just testing min date should be enough
drop table if exists t;
create table t(c_id int,cr_id int,dt date);
insert into t values
(1,56,'2020-12-17'),(56,56,'2020-12-17'),
(1,8,'2020-12-17'),(56,8,'2020-12-17'),
(123,78,'2020-12-17'),(1,78,'2020-12-18');
select c_id,cr_id,dt
from t
where c_id = 1 and
dt = (select min(dt) from t t1 where t1.cr_id = t.cr_id);
+------+-------+------------+
| c_id | cr_id | dt |
+------+-------+------------+
| 1 | 56 | 2020-12-17 |
| 1 | 8 | 2020-12-17 |
+------+-------+------------+
2 rows in set (0.002 sec)
What you're looking for could be "partition by", at least if you're working on MSSQL.
(In the future, please include more background; SQL is not just SQL.)
https://codingsight.com/grouping-data-using-the-over-and-partition-by-functions/
I have an old query lying around that can put a sorting index on data that lacks one, although the underlying reason is 99.9% sure to be bad data design.
Typically I use this query to remove bad data, but you may rewrite it as a join instead, so that you can identify the data you need.
The reason I'm not putting that rewritten answer here is to point out that bad data design results in more work when reading the data afterwards, which seems to be the real root cause here.
DELETE t
FROM
(
SELECT ROW_NUMBER () OVER (PARTITION BY column_1 ,column_2, column_3 ORDER BY column_1,column_2 ,column_3 ) AS Seq
FROM Table
)t
WHERE Seq > 1
Briefly: database imported from foreign source, so I cannot prevent duplicates, I can only prune and clean the database.
Foreign db changes daily, so, I want to automate the pruning process.
It resides on:
MariaDB v10.4.6 managed predominantly by phpMyadmin GUI v4.9.0.1 (both pretty much up to date as of this writing).
This is a radio browsing database.
It has multiple columns, but for me there are only few important:
StationID (it is unique entry number, thus db does not consider new entries as duplicates, all of them are unique because of this primary key)
There are no row numbers.
Name, url, home-page, country, etc
I want to remove entries with duplicated urls, based on the following:
some duplicated urls have a country attached, but for others the country value is NULL (empty),
so I want to remove all duplicates except one that contains a country name, if there is such an entry. If there is none, keep just one row for the url, regardless of name (names are multilingual, so some duplicated urls also have various names, which I do not care about).
StationID (unique number, but not consecutive, also this is primary db key)
Name (variable, least important)
url (variable, but I do want to remove the duplicates)
country (variable, frequently NULL/empty, I want to eliminate those with empty entries as much as possible, if possible)
One url has to stay by any means (not to be deleted)
I have tried a multitude of queries; some work for SELECT but NOT for DELETE, and some hang my machine when executed. Here are some queries I tried (remember I use MariaDB, not Oracle or MS SQL):
SELECT * from `radio`.`Station`
WHERE (`radio`.`Station`.`Url`, `radio`.`Station`.`Name`) IN (
SELECT `radio`.`Station`.`Url`, `radio`.`Station`.`Name`
FROM `radio`.`Station`
GROUP BY `radio`.`Station`.`Url`, `radio`.`Station`.`Name`
HAVING COUNT(*) > 1)
This one should show all entries (not only one grouped), but this query hangs my machine
This query gets me as close as possible:
SELECT *
FROM `radio`.`Station`
WHERE `radio`.`Station`.`StationID` NOT IN (
SELECT MAX(`radio`.`Station`.`StationID`)
FROM `radio`.`Station`
GROUP BY `radio`.`Station`.`Url`,`radio`.`Station`.`Name`,`radio`.`Station`.`Country`)
However this query lists more entries:
SELECT *, COUNT(`radio`.`Station`.`Url`) FROM `radio`.`Station` GROUP BY `radio`.`Station`.`Name`,`radio`.`Station`.`Url` HAVING (COUNT(`radio`.`Station`.`Url`) > 1);
But all of these queries group them and display only one row.
I also tried UNION, INNER JOIN, but failed.
WITH cte AS..., but phpMyadmin does NOT like this query, and mariadb cli also did not like it.
I also tried something of this kind, published at oracle blog, which did not work, and I really had no clue what was what in this function:
select *
from (
select f.*,
count(*) over (
partition by `radio`.`Station`.`Url`, `radio`.`Station`.`Name`
) ct
from `radio`.`Station` f
)
where ct > 1
I did not know what f.* was, query did not like ct.
Given
drop table if exists radio;
create table radio
(stationid int,name varchar(3),country varchar(3),url varchar(3));
insert into radio values
(1,'aaa','uk','a/b'),
(2,'bbb','can','a/b'),
(3,'bbb',null,'a/b'),
(4,'bbb',null,'b/b'),
(5,'bbb',null,'b/b');
You could give the null countries a unique value (using coalesce). Fortunately stationid is unique, so:
select t.stationid,t.name,t.country,t.url
from radio t
join
(select url,max(coalesce(country,stationid)) cntry from radio t group by url) s
on s.url = t.url and s.cntry= coalesce(t.country,t.stationid);
Yields
+-----------+------+---------+------+
| stationid | name | country | url |
+-----------+------+---------+------+
| 1 | aaa | uk | a/b |
| 5 | bbb | NULL | b/b |
+-----------+------+---------+------+
2 rows in set (0.00 sec)
Translated to a delete
delete t from radio t
join
(select url,max(coalesce(country,stationid)) cntry from radio t group by url) s
on s.url = t.url and s.cntry <> coalesce(t.country,t.stationid);
MariaDB [sandbox]> select * from radio;
+-----------+------+---------+------+
| stationid | name | country | url |
+-----------+------+---------+------+
| 1 | aaa | uk | a/b |
| 5 | bbb | NULL | b/b |
+-----------+------+---------+------+
2 rows in set (0.00 sec)
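If it helps to experiment before touching the real data, below is a small sketch of the same pruning rule (one row per url, a row with a country wins over a NULL one) on an in-memory SQLite database via Python. It is a variation, not the coalesce join above: the survivor is picked with ORDER BY country IS NULL, stationid, so ties go to the lowest stationid rather than the highest. The helper name and the temp table keepers are made up for the example.

```python
import sqlite3


def keep_one_row_per_url(conn):
    """Keep one row per url, preferring a row whose country is not NULL."""
    with conn:
        # Pick the surviving stationid for every url first...
        conn.execute(
            """
            CREATE TEMP TABLE keepers AS
            SELECT (SELECT r2.stationid
                    FROM radio r2
                    WHERE r2.url = u.url
                    ORDER BY r2.country IS NULL, r2.stationid
                    LIMIT 1) AS stationid
            FROM (SELECT DISTINCT url FROM radio) u
            """
        )
        # ...then delete everything else.
        conn.execute(
            "DELETE FROM radio WHERE stationid NOT IN (SELECT stationid FROM keepers)"
        )
        conn.execute("DROP TABLE keepers")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE radio (stationid INTEGER PRIMARY KEY, name TEXT, country TEXT, url TEXT)"
)
conn.executemany(
    "INSERT INTO radio VALUES (?, ?, ?, ?)",
    [(1, "aaa", "uk", "a/b"), (2, "bbb", "can", "a/b"), (3, "bbb", None, "a/b"),
     (4, "bbb", None, "b/b"), (5, "bbb", None, "b/b")],
)
keep_one_row_per_url(conn)
remaining = conn.execute("SELECT * FROM radio ORDER BY stationid").fetchall()
print(remaining)
```

Computing the survivors into a separate table before deleting keeps the DELETE itself simple, which may also be easier to schedule for a daily import.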
Fix 2 problems at once:
Dup rows already in table
Dup rows can still be put in table
Do this for each table:
CREATE TABLE new LIKE real;
ALTER TABLE new ADD UNIQUE(x,y); -- will prevent future dups
INSERT IGNORE INTO new -- IGNORE dups
SELECT * FROM real;
RENAME TABLE real TO old, new TO real;
DROP TABLE old;
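Here is a hedged sketch of that recipe on SQLite through Python, in case you want to see both effects (old dups gone, new dups rejected) before running it for real. Differences from the MySQL version: SQLite has no CREATE TABLE ... LIKE, so the new table is spelled out, INSERT OR IGNORE replaces INSERT IGNORE, and the rename takes two statements. The table items and its columns x, y, note are invented for the example.

```python
import sqlite3


def rebuild_with_unique_key(conn):
    """Copy rows into a table with UNIQUE(x, y), skipping dups, then swap it in."""
    with conn:
        conn.execute(
            "CREATE TABLE items_new (x INTEGER, y INTEGER, note TEXT, UNIQUE (x, y))"
        )
        # Rows that would violate the unique key are silently skipped
        conn.execute("INSERT OR IGNORE INTO items_new SELECT x, y, note FROM items")
        conn.execute("ALTER TABLE items RENAME TO items_old")
        conn.execute("ALTER TABLE items_new RENAME TO items")
        conn.execute("DROP TABLE items_old")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (x INTEGER, y INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, 1, "first"), (1, 1, "second"), (2, 3, "only")],
)
rebuild_with_unique_key(conn)
rows = conn.execute("SELECT x, y, note FROM items ORDER BY x, y").fetchall()
print(rows)

# A later duplicate is now rejected by the unique key
try:
    conn.execute("INSERT INTO items VALUES (1, 1, 'again')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```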
I am generating a mySQL query from PHP.
Part of the query re-orders a table based on some variables (which do not include the primary key).
The code doesn't produce errors, however the table is not sorted.
I echo'd out the SQL code, and it looks correct, I tried running it directly in phpMyAdmin, and it runs also without error, but the table is still not sorted as requested.
alter table anavar order by dset_name, var_id;
I am pretty sure that this has to do with the fact that I have a primary key variable (UID) which is not present in the sort.
Both prior and post running the query the table remains ordered by UID. Deleting UID and re-running the query results in a correctly sorted table, but this seems like an overkill solution.
Any suggestions?
create table t2
( id int auto_increment primary key,
someInt int not null,
thing varchar(100) not null,
theWhen datetime not null,
key(theWhen) -- creates an index on theWhen
);
-- my table now has 2 indexes on it
-- see it by running `show indexes from t2`
-- truncate table t2;
insert t2(someInt,thing,theWhen) values
(17,'chopstick','2016-05-08 13:00:00'),
(14,'alligator','2016-05-01'),
(11,'snail','2016-07-08 19:00:00');
select * from t2; -- returns in physical order (the primary key `id`)
select * from t2 order by thing; -- returns via thing, which has no index anyway
select * from t2 order by theWhen,thing; -- partial index use
Note that indexes aren't even used until you have a significant number of rows in a table anyway.
Edit (new data comes in)
insert t2 (someInt,thing,theWhen) values (777,'apple',now());
select t2.id,t2.thing,t2.theWhen,@rnk:=@rnk+1 as rank
from t2
cross join (select @rnk:=0) xParams
order by thing;
+----+-----------+---------------------+------+
| id | thing | theWhen | rank |
+----+-----------+---------------------+------+
| 2 | alligator | 2016-05-01 00:00:00 | 1 |
| 4 | apple | 2016-09-04 15:04:50 | 2 |
| 1 | chopstick | 2016-05-08 13:00:00 | 3 |
| 3 | snail | 2016-07-08 19:00:00 | 4 |
+----+-----------+---------------------+------+
Focus on the fact that you can maintain your secondary indices and generate a rank on the fly whenever you want.
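The same "rank on the fly" idea can also live in application code rather than in SQL user variables. A minimal sketch, assuming the rows are read from Python (SQLite stands in for MySQL here, and ranked_by_thing is a made-up helper): the stored order of the table never matters, only the ORDER BY of the query.

```python
import sqlite3


def ranked_by_thing(conn):
    """Return (id, thing, rank) tuples, ranking rows in ORDER BY order."""
    rows = conn.execute("SELECT id, thing FROM t2 ORDER BY thing")
    return [(id_, thing, rank) for rank, (id_, thing) in enumerate(rows, start=1)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t2 (id INTEGER PRIMARY KEY AUTOINCREMENT, thing TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO t2 (thing) VALUES (?)",
    [("chopstick",), ("alligator",), ("snail",), ("apple",)],
)
ranking = ranked_by_thing(conn)
print(ranking)
```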
I have some historical data tables in my Mysql database.
I want to repeat a day's historical data for another day in the same table.
Table structure, with some sample data:
Id | Date | Value
1 | 2012-04-30 | 5
2 | 2012-04-30 | 10
3 | 2012-04-30 | 15
I want to repeat those ids & values, but for a new date - e.g. 2012-05-01. i.e. adding:
1 | 2012-05-01 | 5
2 | 2012-05-01 | 10
3 | 2012-05-01 | 15
I feel that there should be a straightforward way of doing this... I've tried playing with UPDATE statements with sub-queries and using multiple LEFT JOINs, but haven't got there yet.
Any ideas on how I can do this?
EDIT: To clarify...
- I do NOT want to add these to a new table
- Nor do I want to change the existing records in the table.
- The ids are intentionally duplicated (they are a foreign_key to another table that records what the data refers to...).
INSERT INTO yourTable
SELECT ID, "2012-05-01" As Date, Value
FROM yourTable
WHERE Date = "2012-04-30"
Usually, your ID would be an autoincrement though, so having the same ID in the same table would not work. Either use a different ID, or a different table.
Different ID (next autoincrement):
INSERT INTO yourTable
SELECT NULL as ID, "2012-05-01" As Date, Value
FROM yourTable
WHERE Date = "2012-04-30"
Different table (referring to original ID)
INSERT INTO yourTable_hist
SELECT NULL as ID, ID as old_ID, "2012-05-01" As Date, Value
FROM yourTable
WHERE Date = "2012-04-30"
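For the case in the question, where the ids are deliberately repeated and the rows go into the same table, a parameterized version might look like the sketch below. It runs on SQLite via Python purely for illustration; the table name history and the function copy_day are assumptions, and the table has no primary key on id because the question says ids are a foreign key to another table.

```python
import sqlite3


def copy_day(conn, source_date, target_date):
    """Insert a copy of every row from source_date, stamped with target_date."""
    with conn:
        cur = conn.execute(
            "INSERT INTO history (id, date, value) "
            "SELECT id, ?, value FROM history WHERE date = ?",
            (target_date, source_date),
        )
    return cur.rowcount  # number of rows copied


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (id INTEGER, date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [(1, "2012-04-30", 5), (2, "2012-04-30", 10), (3, "2012-04-30", 15)],
)
copied = copy_day(conn, "2012-04-30", "2012-05-01")
new_rows = conn.execute(
    "SELECT id, value FROM history WHERE date = ? ORDER BY id", ("2012-05-01",)
).fetchall()
print(copied, new_rows)
```

The existing 2012-04-30 rows are left untouched; only new rows are added.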
Maybe something like this:
UPDATE Table1
SET Date=DATE_ADD(Date, INTERVAL 1 DAY)
Or if you want to insert them to a new table:
INSERT INTO Table1
SELECT
ID,
DATE_ADD(Date, INTERVAL 1 DAY),
Value
FROM
Table2
I have a table:
+--------+-------------------+-----------+
| ID | Name | Order |
+--------+-------------------+-----------+
| 1 | John | 1 |
| 2 | Mike | 3 |
| 3 | Daniel | 4 |
| 4 | Lisa | 2 |
| 5 | Joe | 5 |
+--------+-------------------+-----------+
The order can be changed by the admin, hence the Order column. On the admin side I have a form with an "Insert After:" select box for adding entries to the database. What query should I use to increment the order (order+1) of every row after the inserted one?
I want to do this in a such way that keeps server load to a minimum because this table has 1200 rows at present. Is this the correct way to save an order of the table or is there a better way?
Any help appreciated
EDIT:
Here's what I want to do, thanks to itsmatt:
want to reorder row number 1 to be after row 1100, you plan to leave 2-1100 the same and then modify 1 to be 1101 and increment 1101-1200
You need to do this in two steps:
UPDATE MyTable
SET `Order` = `Order` + 1
WHERE `Order` > (SELECT `Order`
FROM MyTable
WHERE ID = <insert-after-id>);
...which will shift the order number of every row further down the list than the person you're inserting after.
Then:
INSERT INTO MyTable (Name, `Order`)
VALUES (Name, (SELECT `Order` + 1 FROM MyTable WHERE ID = <insert-after-id>));
This inserts the new row (assuming ID is auto increment) with an order number one more than that of the person you're inserting after.
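A sketch of the two steps wrapped in one transaction, so that a failure between the shift and the insert does not leave a gap. It reads the anchor row's order first and passes it in as a parameter, which sidesteps the restriction in MySQL on updating a table while selecting from it in a subquery (treat that as an assumption about your server version). SQLite via Python is used only so the example is runnable; insert_after and the new name Anna are made up.

```python
import sqlite3


def insert_after(conn, after_id, name):
    """Shift every row below the anchor down by one, then insert the new row."""
    with conn:  # both statements commit together or not at all
        (anchor,) = conn.execute(
            'SELECT "Order" FROM MyTable WHERE ID = ?', (after_id,)
        ).fetchone()
        conn.execute(
            'UPDATE MyTable SET "Order" = "Order" + 1 WHERE "Order" > ?', (anchor,)
        )
        conn.execute(
            'INSERT INTO MyTable (Name, "Order") VALUES (?, ?)', (name, anchor + 1)
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE MyTable (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT, "Order" INTEGER)'
)
conn.executemany(
    'INSERT INTO MyTable (Name, "Order") VALUES (?, ?)',
    [("John", 1), ("Mike", 3), ("Daniel", 4), ("Lisa", 2), ("Joe", 5)],
)
insert_after(conn, 4, "Anna")  # ID 4 is Lisa, who has Order 2
names = [row[0] for row in conn.execute('SELECT Name FROM MyTable ORDER BY "Order"')]
print(names)
```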
Just add the new row in any normal way and let a later SELECT use ORDER BY to sort. 1200 rows is infinitesimally small by MySQL standards. You really don't have to (and don't want to) keep the physical table sorted. Instead, use keys and indexes to access the table in a way that will give you what you want.
You can do:
insert into tablename (name, `order`)
values ('name', (select `order`+1 from tablename where name='name'))
You can also use id=id_val in your inner select.
Hopefully this is what you're after, the question isn't altogether clear.